I'm fascinated that this approach works at all, but that said, I don't believe watermarking text will ever be practical. Yes, you can do an academic study where you have exactly one version of an LLM in exactly one parameter configuration, and you can have an algorithm that tweaks the logits of different tokens in a way that produces a recognizable pattern. But the pattern will be recognizable only while the LLM version and the parameter configuration stay locked, which they won't be in the real world. There will be a bunch of different models, and people will use them with a bunch of different parameter combinations. If your "detector" has to recognize AI-generated text from a variety of models and a variety of parameter combinations, it's no longer going to work. Even if you imagine someone brute-forcing all those combinations, the trouble is that some of them will produce false positives simply because you tested so many. Want to get rid of those false positives? Go ahead, make the pattern stronger. And now you're visibly altering the generated text to the point where it becomes a quality issue.
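For context, here's a toy sketch of the kind of logit-tweaking scheme I mean, along the lines of the published "green list" approaches (my own illustration, not anyone's actual implementation; all names and numbers are made up): the previous token seeds a PRNG that marks half the vocabulary as "green", those tokens get a small logit bonus, and the detector just counts how often the text lands on green tokens.

    import hashlib, random

    def green_list(prev_token_id, vocab_size, fraction=0.5):
        # Deterministically split the vocabulary based on the previous token.
        seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16)
        rng = random.Random(seed)
        ids = list(range(vocab_size))
        rng.shuffle(ids)
        return set(ids[: int(vocab_size * fraction)])

    def watermark_logits(logits, prev_token_id, delta=2.0):
        # Nudge "green" tokens up before sampling; delta controls how strong
        # the watermark is (and how much it distorts the text).
        greens = green_list(prev_token_id, len(logits))
        return [x + delta if i in greens else x for i, x in enumerate(logits)]

    def green_fraction(token_ids, vocab_size):
        # Detector side: human text should hit ~50% green, watermarked text more.
        hits = sum(1 for prev, cur in zip(token_ids, token_ids[1:])
                   if cur in green_list(prev, vocab_size))
        return hits / max(len(token_ids) - 1, 1)

Note that the detector only works because it reuses the exact same vocabulary and seeding scheme as the generator, which is precisely the lock-in I'm talking about.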
Even with temperature = 0, LLMs are still non-deterministic, as their internal, massively parallelized calculations are done with floating point arithmetic, which is order-dependent. Running the same LLM with the exact same parameters multiple times might still yield slightly different probabilities in the output, making this watermarking scheme even less robust.
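A trivial, non-LLM demonstration of that order dependence: float addition isn't associative, and parallel reductions on a GPU accumulate the same sum in whatever order the scheduler happens to pick.

    vals = [1e16, 1.0, -1e16, 1.0]

    print(sum(vals))          # 1.0
    print(sum(sorted(vals)))  # 0.0 -- same numbers, different order, different result
    # (the exact sum is 2.0; neither order gets it right)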
This isn't necessarily true; it depends on the implementation. I can say that because I've published research that embeds steganographic text in the output of GPT-2, and we had to deal with exactly this. Running everything locally was usually fine: the model was deterministic as long as you had the same initial conditions. The problems appeared when trying to run the model on different hardware.
That's not my experience, unless LLM providers are caching results. It's frustratingly difficult to get an LLM to output substantially different text for a given prompt. It's as if, internally, step one always follows mostly the same reasoning, and step two applies light fudging to the output to give the appearance of randomness, but the underlying structure stays the same. That's why so much blog spam reads pretty much the same, except that one post "delves" into a topic while another "dives" into it.
How long until they can write genuinely unique output without piles of additional prompting?
Hmm, I ask LLMs to write me stories all the time, giving only a couple of sentences as a prompt, loosely describing the setting. And if I prompt the exact same way, the events of the story usually come out very different.
In practice, every programmer or writer who gets LLM output does a lot of rewriting of already existing code or text. Stitching together parts of many LLM outputs is the only way to use an LLM effectively, sometimes even stitching together parts from different LLMs, which I do all the time.
Recognizing only parts of a watermark, with many watermarked fragments scattered all around, doesn't seem possible at all, in my mind.
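To put rough numbers on that intuition: assume a green-list style detector that expects about 50% "green" tokens in human text and about 90% in fully watermarked text (made-up but plausible figures). Once the watermarked tokens are a minority of a stitched-together document, the detection signal collapses into the noise.

    import math

    def detection_z(total_tokens, watermarked_fraction,
                    base_rate=0.5, green_rate_when_marked=0.9):
        # Rough z-score a green-list detector would report when only a
        # fraction of the tokens still carry the watermark (toy model).
        expected_green = (watermarked_fraction * green_rate_when_marked
                          + (1 - watermarked_fraction) * base_rate)
        hits = expected_green * total_tokens
        mean = base_rate * total_tokens
        std = math.sqrt(total_tokens * base_rate * (1 - base_rate))
        return (hits - mean) / std

    for frac in (1.0, 0.5, 0.2, 0.05):
        print(f"{frac:.0%} watermarked: z ~ {detection_z(500, frac):.1f}")
    # 100% watermarked: z ~ 17.9
    # 50% watermarked:  z ~ 8.9
    # 20% watermarked:  z ~ 3.6
    # 5% watermarked:   z ~ 0.9  (indistinguishable from chance)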
They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a very guilty person who uses the LLM all the time, doesn't even try to improve the answer, and always hands in the LLM output in one piece.
At the end of the day, it will lead to so many false accusations that people will stop trusting it. In chess, false accusations of cheating against players and tournaments have been happening all the time, for 15 years or more. Right now, former world chess champion Kramnik has accused over 50 top chess players of cheating in the span of two months, including the five-time US champion Nakamura.
If software like that gets applied in schools and universities, we're gonna have the time of our lives.
In summary, this will not work in practice. Ever.