Evaluators with CriticGPT outperform those without 60% of the time.
So, slightly better than random chance. I guess a win is a win, but I would have thought this would be higher. I'd have kind of assumed that just asking GPT itself if it's sure would give this kind of lift.
I'm not sure why 60 vs. 40 counts as slightly better than random chance. A person using this system has a 50% higher success rate than someone not using it; I wouldn't call that a slightly better result.
You can see the plots if you prefer, or think of it this way: out of a total of 100 trials, one team wins 40 and the other wins 60, and 60 = 40 + 40 × 50%, i.e. the second team wins 50% more often.
Or take a 75% win rate as a more extreme example: you could say it's 25% above random, or you could say one team wins 3 times as many cases as the other. Both statements are equivalent, but I think the second conveys the strength of the difference much better.
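To make the arithmetic concrete, here's a minimal sketch in Python (the relative_lift helper is just an illustrative name, not anything from the paper) that turns a head-to-head win rate into the relative lift over the losing side:

    # Given a head-to-head win rate p for team A (so team B wins 1 - p),
    # compute how much more often A wins relative to B.
    def relative_lift(win_rate):
        lose_rate = 1.0 - win_rate
        return win_rate / lose_rate - 1.0  # e.g. 0.6 / 0.4 - 1 = 0.5

    for p in (0.60, 0.75):
        print(f"{p:.0%} win rate -> {relative_lift(p):+.0%} more wins than the other team")

    # 60% win rate -> +50% more wins than the other team
    # 75% win rate -> +200% more wins than the other team

Same numbers as above: 60/40 is a 50% lift, and 75/25 means one side wins 3 times as often.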
The results in this work are statistically significant and substantial.
Not sure what this means, but if someone asked me to critique iOS code, for example, I wouldn't be much help, since I don't know the first thing about it other than some generic best practices.
I'm sure ChatGPT would outperform me, and I could only aid it in very limited ways.
That doesn't mean an expert iOS programmer wouldn't run circles around it.