Hacker News | bron123's comments

We introduce the Adversarial Confusion Attack as a new mechanism for protecting websites from MLLM-powered AI agents. Embedding these "Adversarial CAPTCHAs" into web content pushes models into systematic decoding failures, ranging from confident hallucinations to full incoherence. The perturbations disrupt all white-box models we test and transfer to proprietary systems such as GPT-5 in the full-image setting. Technically, the attack uses PGD to maximize next-token entropy across a small surrogate ensemble of MLLMs.


We’ve released preliminary work introducing a new attack vector against multimodal LLMs. Unlike jailbreaks or targeted misclassification, this method explicitly optimizes for confusion: maximizing next-token entropy to induce systematic malfunction.
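A minimal sketch of the entropy-maximization idea, using a toy linear "model" in place of an MLLM surrogate ensemble (the matrix `W`, the dimensions, and the step sizes are all illustrative assumptions, not the released implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def pgd_max_entropy(x, W, eps=0.1, alpha=0.02, steps=20):
    """L-inf PGD that ASCENDS on output entropy of a toy linear model.

    A real attack would backpropagate through one or more MLLM surrogates;
    here the gradient of H(softmax(Wx)) w.r.t. x is computed analytically.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv)
        H = entropy(p)
        # dH/dz_k = -p_k (log p_k + H); chain rule through z = W x
        grad = W.T @ (-p * (np.log(p + 1e-12) + H))
        x_adv = x_adv + alpha * np.sign(grad)      # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixel range
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))   # stand-in for a surrogate model
x = rng.random(64)              # stand-in for an input image
x_adv = pgd_max_entropy(x, W)
```

The sign-step-plus-projection loop is standard PGD; the only twist is that the objective being ascended is the entropy of the output distribution rather than a classification loss.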

The attack successfully fooled GPT-5, producing structured hallucinations from an adversarial image.

The ultimate goal is to prevent AI Agents from reliably operating on websites by embedding adversarial “confusion images” into their visual environments.


Łysa Góra in Poland might once have been a pyramid. Crypts were discovered beneath the peak of the mountain by Adolf Kudlinski (see YouTube).

Polish archaeologists refuse to investigate further, dismissing it as pseudohistory.

They can't handle the possibility that an advanced civilization could have existed as far back as the Neolithic period.

