I'm fascinated that this approach works at all, but that said, I don't believe watermarking text will ever be practical. Yes, you can do an academic study where you have exactly one version of an LLM in exactly one parameter configuration, and you can have an algorithm that tweaks the logits of different tokens in a way that produces a recognizable pattern. But the pattern will be recognizable only while the LLM version and the parameter configuration stay locked, which they won't be in the real world. There will be a bunch of different models, and people will use them with a bunch of different parameter combinations. If your "detector" has to recognize AI-generated text from a variety of models and a variety of parameter combinations, it's no longer going to work. Even if you imagine someone brute-forcing all those combinations, the trouble is that some of them will produce false positives simply because you tested so many. Want to get rid of those false positives? Go ahead, make the pattern stronger. And now you're visibly altering the generated text to the point where it becomes a quality issue.
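For context, here's a toy sketch of the kind of logit-tweaking scheme I mean, along the lines of the published "green list" approaches (my own illustration, not anyone's actual implementation; all names and numbers are made up): the previous token seeds a PRNG that marks half the vocabulary as "green", those tokens get a small logit bonus, and the detector just counts how often the text lands on green tokens.

    import hashlib, random

    def green_list(prev_token_id, vocab_size, fraction=0.5):
        # Deterministically split the vocabulary based on the previous token.
        seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16)
        rng = random.Random(seed)
        ids = list(range(vocab_size))
        rng.shuffle(ids)
        return set(ids[: int(vocab_size * fraction)])

    def watermark_logits(logits, prev_token_id, delta=2.0):
        # Nudge "green" tokens up before sampling; delta controls how strong
        # the watermark is (and how much it distorts the text).
        greens = green_list(prev_token_id, len(logits))
        return [x + delta if i in greens else x for i, x in enumerate(logits)]

    def green_fraction(token_ids, vocab_size):
        # Detector side: human text should hit ~50% green, watermarked text more.
        hits = sum(1 for prev, cur in zip(token_ids, token_ids[1:])
                   if cur in green_list(prev, vocab_size))
        return hits / max(len(token_ids) - 1, 1)

Note that the detector only works because it reuses the exact same vocabulary and seeding scheme as the generator, which is precisely the lock-in I'm talking about.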
Even with temperature = 0, LLMs are still non-deterministic, as their internal, massively parallelized calculations are done with floating point arithmetic, which is order-dependent. Running the same LLM with the exact same parameters multiple times might still yield slightly different probabilities in the output, making this watermarking scheme even less robust.
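A trivial, non-LLM demonstration of that order dependence: float addition isn't associative, and parallel reductions on a GPU accumulate the same sum in whatever order the scheduler happens to pick.

    vals = [1e16, 1.0, -1e16, 1.0]

    print(sum(vals))          # 1.0
    print(sum(sorted(vals)))  # 0.0 -- same numbers, different order, different result
    # (the exact sum is 2.0; neither order gets it right)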
This isn't necessarily true; it depends on the implementation. I can say that because I've published research that embeds steganographic text in the output of GPT-2, and we had to deal with exactly this. Running everything locally was usually fine: the model was deterministic as long as you had the same initial conditions. The problems appeared when trying to run the model on different hardware.
That's not my experience, unless LLM providers are caching results. It's frustratingly difficult to get an LLM to output substantially different text for a given prompt. It's as if, internally, step one always follows mostly the same reasoning, and step two applies light fudging to the output to give the appearance of randomness, but the underlying structure stays the same. That's why so much blog spam reads pretty much the same, except that one post "delves" into a topic while another "dives" into it.
How long until they can write genuinely unique output without piles of additional prompting?
Hmm, I ask LLMs to write me stories all the time, giving only a couple of sentences as a prompt, loosely describing the setting. And if I prompt the exact same way, the events of the story usually come out very different.
In practice, every programmer or writer who gets LLM output does a lot of rewriting of already existing code or text. Stitching together parts of many LLM outputs is the only way to use an LLM effectively, sometimes even stitching together parts from different LLMs, which I do all the time.
Recognizing only parts of a watermark, with many watermarked fragments scattered all around, doesn't seem possible at all, in my mind.
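To put rough numbers on that intuition: assume a green-list style detector that expects about 50% "green" tokens in human text and about 90% in fully watermarked text (made-up but plausible figures). Once the watermarked tokens are a minority of a stitched-together document, the detection signal collapses into the noise.

    import math

    def detection_z(total_tokens, watermarked_fraction,
                    base_rate=0.5, green_rate_when_marked=0.9):
        # Rough z-score a green-list detector would report when only a
        # fraction of the tokens still carry the watermark (toy model).
        expected_green = (watermarked_fraction * green_rate_when_marked
                          + (1 - watermarked_fraction) * base_rate)
        hits = expected_green * total_tokens
        mean = base_rate * total_tokens
        std = math.sqrt(total_tokens * base_rate * (1 - base_rate))
        return (hits - mean) / std

    for frac in (1.0, 0.5, 0.2, 0.05):
        print(f"{frac:.0%} watermarked: z ~ {detection_z(500, frac):.1f}")
    # 100% watermarked: z ~ 17.9
    # 50% watermarked:  z ~ 8.9
    # 20% watermarked:  z ~ 3.6
    # 5% watermarked:   z ~ 0.9  (indistinguishable from chance)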
They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a very guilty person who uses the LLM all the time, doesn't even try to improve the answer, and always hands in the LLM output in one piece.
At the end of the day, it will lead to so many false accusations that people will stop trusting it. In chess, false accusations of cheating against players and tournaments have been happening all the time, for 15 years or more. Right now, former world chess champion Kramnik has accused over 50 top chess players of cheating in the span of two months, including the five-time US champion Nakamura.
If software like that gets applied in schools and universities, we're gonna have the time of our lives.
In summary, this will not work in practice. Ever.