
These watermarks are not robust to paraphrasing attacks: AUC ROC falls from 0.95 to 0.55 (barely better than guessing) for a 100-token passage.

The existing impossibility results imply that these attacks are essentially unavoidable (https://arxiv.org/abs/2311.04378) and not very costly, so this line of inquiry into LLM watermarking seems like a dead end.
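For intuition, here's a minimal sketch of a token-level detector in the Kirchenbauer-style "green list" family (an assumption on my part -- the linked paper may evaluate a different scheme) and of why paraphrasing drags its score back toward chance:

    import hashlib

    def is_green(prev_token: str, token: str, key: str = "secret") -> bool:
        # Hypothetical keyed hash that splits the vocabulary into green/red
        # halves, conditioned on the previous token.
        digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
        return digest[0] % 2 == 0

    def detection_z_score(tokens: list) -> float:
        # Under the null (unwatermarked text), each token lands in the green
        # half with probability ~0.5, so the green count is Binomial(n, 0.5);
        # the z-score measures how far the observed count exceeds that.
        n = len(tokens) - 1
        greens = sum(is_green(tokens[i], tokens[i + 1]) for i in range(n))
        return (greens - 0.5 * n) / (0.25 * n) ** 0.5

    # A paraphrase rewrites most (prev_token, token) pairs, so the green
    # fraction drifts back toward 0.5; with only ~100 tokens the z-score is
    # then too noisy to separate watermarked from human text, which is what
    # the AUC drop reflects.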



I spent the last five years doing PhD research into steganography, with a particular focus on how to embed messages into LLM outputs. Watermarking is basically one-bit steganography.

The first serious investigations into "secure" steganography were about 30 years ago and it was clearly a dead end even back then. Sure, watermarking might be effective against lazy adversaries--college students, job applicants, etc.--but can be trivially defeated otherwise.

All this time I'd been lamenting my research area as unpopular and boring when I should've been submitting to Nature!


Though, surely secure steganography with LLMs should be quite easy?

Presumably there are things like key exchanges that look like randomness, and then you could choose LLM output using that randomness in such a way that you can send messages that look like an LLM conversation?

Someone starts the conversation with a real message 'Hello!', and then you do some kind of key exchange where what is exchanged is hard to distinguish from randomness, using those keys to drive the selection of the coming tokens from the LLM. Then once the keys are established you use some kind of cipher to generate random-looking ciphertext and use that as the randomness for selecting words in the final part?

Surely that would work? If there is guaranteed insecurity, it's for things like watermarking, not for steganography?
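Concretely, something like this minimal sketch of that last step is what I have in mind -- assuming both parties already share a key and run an identical model, with `next_token_probs` as a hypothetical stand-in for a real model call:

    from typing import Dict, Iterator

    def bits_to_unit_interval(bits: Iterator[int], precision: int = 32) -> float:
        # Consume `precision` ciphertext bits and map them to a point in [0, 1).
        x = 0
        for _ in range(precision):
            x = (x << 1) | next(bits)
        return x / (1 << precision)

    def embed_step(next_token_probs: Dict[str, float], bits: Iterator[int]) -> str:
        # Standard inverse-CDF sampling, except the randomness is ciphertext.
        # Because a secure cipher's output is indistinguishable from uniform
        # bits, the chosen token is distributed as under honest sampling.
        r = bits_to_unit_interval(bits)
        cumulative = 0.0
        for token, prob in sorted(next_token_probs.items()):
            cumulative += prob
            if r < cumulative:
                return token
        return token  # guard against floating-point rounding

    # The receiver, holding the same key, model and conversation history, can
    # recompute the intervals each token fell into; practical schemes use
    # arithmetic coding to make the bit extraction exact and efficient, but
    # the security intuition is the same.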


I’ve been working in the space since 2018. Watermarking and fingerprinting (of models themselves and outputs) are useful tools but they have a weak adversary model.

Yet that doesn’t stop companies from making claims like these, and, what’s worse, it doesn’t stop people from buying into them.


Watermarking is not the way to go. It relies on the honesty of the producers, and watermarks can be easily stripped. With images, the way to go is to detect authentic images, not fake ones. I've written about this extensively: https://dev.to/kylepena/addressing-the-threat-of-deep-fakes-...
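As a toy illustration of the "authenticate the real thing" direction -- a generic sketch of cryptographic provenance (C2PA-style signing) using the third-party `cryptography` package; the key handling is purely illustrative and this isn't meant to reproduce the linked post's exact approach:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    def sign_image(image_bytes: bytes, private_key: Ed25519PrivateKey) -> bytes:
        # A camera or publisher signs the bytes at capture/publication time
        # and ships the signature as provenance metadata alongside the file.
        return private_key.sign(image_bytes)

    def verify_image(image_bytes: bytes, signature: bytes, public_key) -> bool:
        # Any modification of the bytes after signing invalidates the
        # signature, so verification attests to authenticity rather than
        # trying to "detect fakes".
        try:
            public_key.verify(signature, image_bytes)
            return True
        except InvalidSignature:
            return False

    # Usage sketch:
    #   key = Ed25519PrivateKey.generate()
    #   sig = sign_image(open("photo.jpg", "rb").read(), key)
    #   verify_image(open("photo.jpg", "rb").read(), sig, key.public_key())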


I think this misses a key point.

If there were a law that AI-generated text must be watermarked, then major corporations would take pains to apply the watermark, because if they didn't they would be exposed to regulatory and reputational problems.

Watermarking the text would enable people training models to avoid it, and it would allow search engines to determine not to rely on it (if that was the search engine preference).

It would not mean that all text not watermarked was human generated, but it would mean that all text not watermarked and provided by institutional actors could be trusted.


> It would not mean that all text not watermarked was human generated, but it would mean that all text not watermarked and provided by institutional actors could be trusted.

What?


well - trusted in the sense that the unwatermarked text was human generated ;o)


You simply cannot trust that non-watermarked text was human generated. Laws can be broken. Companies are constantly being found in violation of the law.

You're trading away real awareness of and protection against even the mildest attempt at obfuscation for the warm feeling of an illusion of trust. This means that people who want to hurt or trick you will have free rein to do it, even when the target -- say, your 90-year-old grandmother -- lacks the skill to spot it.


Here's an example of why I think this would work.

GDPR.

How many breaches of privacy by large organizations occur in the EU? When they occur, what happens?

On the other hand - what's the story in the USA?

Alternatively what would have happened if we simply said "data privacy cannot be maintained, no laws will help"?


Even if you achieved perfect compliance from law-abiding organizations, that would do nothing to protect you against any organization which does not abide by local laws.

Consider any hacker from a non-extraditing rogue state.

Consider any nation state actor or well-equipped NGO. They are more motivated to manipulate you than Starbucks.

Consider the slavish, morbid conditions faced by foreign workers who manufacture your shoes and mine your lithium. All of your favorite large companies look the other way while continuing to employ such labor today, and have a long history of partnering with the US government to overthrow legitimate foreign democratic regimes in order to maintain economic control. Why would these companies have better ethics regarding AI-generated output?

And consider the US government, whose own intelligence agencies are no longer forbidden from employing domestic propaganda, and which will certainly get internal permission to circumvent any such laws while still exploiting them to their benefit.


Ok, so what protects you from these folks? What positive measure can be suggested here - that is better than the measures I suggest and subsumes them?


The solution is not to watermark anything, because it is futile. Teach your citizens that anything that can be machine generated, will be machine generated. Where exactly is the problem here?


> How many breaches of privacy by large organizations occur in the EU? When they occur, what happens?

Malicious non-compliance is still common IME. Enforcement is happening, but so far it has focused only on the very large, egregious abuses.



