That's a digital signature, same as sending an email with GPG to prove you sent ...

dns_snek · on Feb 21, 2024

Watermarking is not at all like a digital signature; it's a lot closer to steganography. I only have a surface-level understanding of the process, but it works by biasing token selection to encode information into the resulting text in a way that's resistant to later modification and rephrasing.

I have my doubts about the effectiveness of this method, and realistically it won't make any difference, because the bad actors will just use an LLM that doesn't snitch on them. So you're technically correct.
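For anyone who wants to see what "biasing token selection" could look like, here is a toy sketch, assuming a "green list" style scheme in which the previous token pseudo-randomly marks half the vocabulary as preferred. The vocabulary, the function names and the uniform sampler standing in for an LLM are all made up for illustration. A real scheme would nudge the model's logits rather than hard-restricting the choice.

```python
import hashlib
import random

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
VOCAB = [f"tok{i}" for i in range(50)]

def green_list(prev_token, fraction=0.5):
    """Pseudo-randomly pick a 'green' subset of VOCAB, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
    return set(random.Random(seed).sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(n_tokens, bias, seed=0):
    """Stand-in for an LLM: uniform sampling, restricted to the green list with probability `bias`."""
    rng = random.Random(seed)
    tokens = ["tok0"]  # fixed start token so the first real token has a predecessor
    for _ in range(n_tokens):
        greens = sorted(green_list(tokens[-1]))
        pool = greens if rng.random() < bias else VOCAB
        tokens.append(rng.choice(pool))
    return tokens

def green_fraction(tokens):
    """Fraction of tokens that fall in the green list implied by their predecessor."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

With `bias=0.0` the green fraction of a long output hovers around one half, which is what unwatermarked text would look like. With `bias=1.0` every token is green, and a detector only has to count.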

vasco · on Feb 21, 2024

The only way to make that steganography robust is to have the encoded message be generated with some secret key that can be verified. Otherwise anyone could manually fake the steganography in human-typed messages with the help of some encoder, and you'd have no way of telling whether the text really came from an LLM. That line of thinking is why it has to be like a signature to work, as you said, for "any sentence". I also think these methods only work above a certain character limit. For short messages it's impossible to tell.
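A rough sketch of the secret-key idea, assuming the green/red decision for each token pair comes from an HMAC rather than a public hash. Everything here (the toy vocabulary, `is_green`, `watermark`, `green_count`) is hypothetical. Note that HMAC is symmetric, so whoever can verify can also forge; this is not a true public-key signature.

```python
import hashlib
import hmac
import random

VOCAB = [f"tok{i}" for i in range(50)]  # hypothetical toy vocabulary

def is_green(key, prev_token, token):
    """Keyed coin flip: about half of all (prev, token) pairs are green under any given key."""
    msg = f"{prev_token}|{token}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[0] % 2 == 0

def watermark(key, n_tokens, seed=0):
    """Stand-in for biased LLM sampling: pick a random token that is green under `key`."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(n_tokens):
        greens = [t for t in VOCAB if is_green(key, tokens[-1], t)]
        tokens.append(rng.choice(greens or VOCAB))  # fall back if no token is green
    return tokens

def green_count(key, tokens):
    """Number of adjacent token pairs that are green under `key`."""
    return sum(is_green(key, p, t) for p, t in zip(tokens, tokens[1:]))
```

Checking a 200-token output with the right key should find essentially every pair green, while a wrong key should see roughly half, the same as for unmarked text. Without the key, an outsider can neither detect the mark nor fake it.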

tempusalaria · on Feb 21, 2024

If you look at GitHub.com/HNx1/IdentityLM, you can see that it's relatively easy to sign LLM output with a private key using an adaptation of the watermarking method.

vasco · on Feb 21, 2024

This application is exactly what I was describing. I'll look it over to see how it scales the encryption strength with token length and how it deals with short messages, which is the only part I'd expect to be very hard. If you print two paragraphs, it's easy to change some tokens using a secret-key mask, but if you print "Yes", it's not so easy. Thanks for the great share.
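Some back-of-the-envelope arithmetic on the short-message problem, assuming a detector that counts "marked" tokens and compares the count against chance with a one-proportion z-test. The chance rate of 0.5 and the threshold of 4 are assumptions picked for illustration, not values taken from any particular tool.

```python
import math

def z_score(green, total, gamma=0.5):
    """How many standard deviations the green count sits above the chance rate `gamma`."""
    return (green - gamma * total) / math.sqrt(total * gamma * (1 - gamma))

def min_tokens_for(threshold, gamma=0.5):
    """Smallest length at which even an all-green text can reach `threshold`.

    With every token green, z simplifies to sqrt(total * (1 - gamma) / gamma).
    """
    return math.ceil(threshold ** 2 * gamma / (1 - gamma))

print(z_score(1, 1))        # 1.0: a one-token reply like "Yes" can never be significant
print(min_tokens_for(4.0))  # 16: fewer tokens cannot reach z = 4 even if all are green
```

So under these assumptions the evidence grows only with the square root of the length, and anything much shorter than a sentence or two carries too little signal.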