
That's a digital signature, same as sending an email with GPG to prove you sent it. You wouldn't say that because some people use GPG you can somehow detect who wrote every email on earth; it's a push model vs. a pull model. This is why I wrote "any sentence" vs "some sentences".


Watermarking is not at all like a digital signature and a lot like steganography. I only have a surface level understanding of the process, but it works by biasing token selection to encode information into the resulting text in a way that's resistant to later modifications and rephrasing.
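To make "biasing token selection" concrete, here is a toy sketch in Python, assuming a green-list style scheme (one common way to do this, not necessarily what any particular vendor ships). Every name here (`VOCAB`, `green_list`, `generate`, `green_fraction`) and the 0.9 bias are made up for illustration. The "model" is just a uniform random choice, where a real scheme would nudge the model's logits, and the toy says nothing about robustness to rephrasing.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = [f"tok{i}" for i in range(50)]

def green_list(prev_token, fraction=0.5):
    """Pick a pseudo-random 'green' subset of the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(prev_token.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(n_tokens, bias=0.9, seed=0):
    """Stand-in for an LLM: samples from the green list with probability `bias`, else from the rest."""
    rng = random.Random(seed)
    tokens, prev = [], "<s>"
    for _ in range(n_tokens):
        green_set = green_list(prev)
        green = sorted(green_set)
        red = [t for t in VOCAB if t not in green_set]
        prev = rng.choice(green if rng.random() < bias else red)
        tokens.append(prev)
    return tokens

def green_fraction(tokens):
    """Detector: fraction of tokens that fall in the green list implied by their predecessor."""
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)

watermarked = generate(200)            # biased toward green tokens
plain = generate(200, bias=0.5)        # bias=0.5 is the same as no watermark here
```

With these settings `green_fraction(watermarked)` should land near 0.9 and `green_fraction(plain)` near 0.5, which is the statistical gap a detector looks for.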

I have my doubts about the effectiveness of this method. Realistically, it won't make any difference, because the bad actors will just use an LLM that doesn't snitch on them, so you're technically correct.


The only way to make that steganography robust is to have the encoded message be generated with some secret key that can be verified. Otherwise anyone could manually fake the steganography in human-typed messages with the help of some encoder, and you'd have no way of telling whether it was really written by an LLM. That line of thinking is why it has to be like a signature to work, as you said, for "any sentence". I also think these methods only work above a certain length; short messages are impossible to tell.
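A minimal sketch of the secret-key idea, assuming the same kind of green-list scheme with the green set derived from an HMAC of the previous token. The names (`keyed_green_list`, `generate`, `green_fraction`) and the toy vocabulary are hypothetical. Note that this is a symmetric key: anyone who can verify can also forge, so it is closer to a MAC than to a public-key signature.

```python
import hashlib
import hmac
import random

# Hypothetical toy vocabulary, for illustration only.
VOCAB = [f"tok{i}" for i in range(50)]

def keyed_green_list(key, prev_token, fraction=0.5):
    """Derive the green subset from HMAC(key, prev_token), so it is unpredictable without the key."""
    digest = hmac.new(key, prev_token.encode(), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(key, n_tokens, seed=0):
    """Toy generator that only ever emits green tokens (a real model would merely be biased)."""
    rng = random.Random(seed)
    tokens, prev = [], "<s>"
    for _ in range(n_tokens):
        prev = rng.choice(sorted(keyed_green_list(key, prev)))
        tokens.append(prev)
    return tokens

def green_fraction(key, tokens):
    """Fraction of tokens that are green under the given key."""
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += tok in keyed_green_list(key, prev)
        prev = tok
    return hits / len(tokens)

text = generate(b"secret", 200)
right = green_fraction(b"secret", text)   # 1.0 by construction
wrong = green_fraction(b"guess", text)    # hovers around 0.5, indistinguishable from chance
```

Without the key, the text looks unmarked and an outsider can't construct text that passes the check, which is the property the comment above is asking for.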


If you look here: GitHub.com/HNx1/IdentityLM you can see that it's relatively easy to sign LLM output with a private key using an adaptation of the watermarking method.


This application is exactly what I was describing. I'll look it over to see how it scales the signature strength with token count and how it deals with short messages, which is the one part I'd expect to be very hard. If you print 2 paragraphs, it's easy to change some tokens with a secret key mask, but if you print "Yes", it's not so easy. Thanks for the great share.
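Rough arithmetic for why "Yes" is the hard case, assuming a detector that counts green tokens and that each unwatermarked token is green with probability 0.5. This is a generic one-sided z-test, not taken from the linked repo, and `z_score` is a hypothetical helper.

```python
import math

def z_score(green_hits, n_tokens, p=0.5):
    """Standard deviations by which the green count exceeds what chance would give."""
    expected = p * n_tokens
    std_dev = math.sqrt(n_tokens * p * (1 - p))
    return (green_hits - expected) / std_dev

one_token = z_score(1, 1)          # 1.0: a single green token proves nothing
sixteen = z_score(16, 16)          # 4.0: even a perfect run needs ~16 tokens to reach z = 4
two_paragraphs = z_score(180, 200) # ~11.31: 90% green over 200 tokens is overwhelming
```

With p = 0.5 a perfect run of n green tokens gives z = sqrt(n), so the evidence can only grow with the square root of the length, and a one-token reply caps out at z = 1.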



