> the team tested it on 20 million prompts given to Gemini. Half of those prompts were routed to the SynthID-Text system and got a watermarked response, while the other half got the standard Gemini response. Judging by the “thumbs up” and “thumbs down” feedback from users, the watermarked responses were just as satisfactory to users as the standard ones.
Three comments here:
1. I wonder how many of the 20M prompts actually got a thumbs up or down. I don't think people click that very often, unless the UI enforces it. I haven't used Gemini, so I might be unaware.
2. Judging a single response might not be enough to tell whether watermarking is acceptable. For instance, imagine the watermarking adds "However," to the start of each paragraph. In a single GPT interaction you might not notice it, but once you get 3 or 4 responses it might stand out.
3. Since when is Google happy with measuring by self-declared satisfaction? Aren't they the kings of A/B testing and high-volume analysis of usage behavior?
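To make comment 3 concrete, here is a minimal sketch of the kind of A/B comparison they could run on the thumbs-up data: a two-proportion z-test on the rating rates of the watermarked vs. standard arms. All counts and the 0.5% rating rate are made-up assumptions for illustration, not figures from the article.

```python
# Hypothetical sketch: compare "thumbs up" rates between the watermarked
# and standard arms with a two-proportion z-test. Counts are placeholders.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(up_a, n_a, up_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    in thumbs-up rates between arm A and arm B."""
    p_a, p_b = up_a / n_a, up_b / n_b
    p_pool = (up_a + up_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Per comment 1, only a small fraction of the ~10M prompts in each arm would
# realistically get rated at all; assume ~0.5% here purely for illustration.
z, p = two_proportion_z_test(up_a=38_000, n_a=50_000,   # watermarked arm
                             up_b=38_300, n_b=50_000)   # standard arm
print(f"z = {z:.2f}, p = {p:.4f}")
```

With volumes like these, even small differences in satisfaction would be detectable, which is why "judging by the thumbs up and thumbs down feedback" feels thin for a company that usually runs this kind of analysis.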
My timesheet SaaS constantly asks for feedback, which I give 0/10, as constantly asking for feedback really annoys me.
They then contact me and ask why, so I tell them, and they say there is nothing they can do. A week later I'll get a pop-up asking for feedback and we go round the same loop again.