Hacker News

Those benchmarks are so cynical.

Every test prep tutor has taught dozens or hundreds of students the implicit patterns behind the tests and drilled them in with countless sample questions, raising scores by hundreds of points. Those students were not getting smarter from that work; they were becoming more familiar with a format, and their scores improved because of it.

And what do LLMs do? Exactly that. And what’s in their training data? Countless standardized tests.

These things are absolutely incredible innovations capable of so many things, but the business opportunity is so big that this kind of cynical misrepresentation is rampant. It would be great if we could just stay focused on the things they actually do incredibly well instead of making them do stage tricks for publicity.




This is what they claim:

We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative—see our technical report for details.


Yes, and none of the tutored students encounter the exact problems they’ll see on their own tests either.

In the language of ML, test prep for students is about sharing the inferred parameters that underlie the way test questions are constructed, obviating the need for knowledge or understanding.

Doing well on tests, after this prep, doesn’t demonstrate what the tests purport to measure.

It’s a pretty ugly truth about standardized tests, honestly, and drives some of us to feel pretty uncomfortable with the work. But it’s directly applicable to how LLMs engage with them as well.


You can always argue that the model has seen some variation of a given problem. The question is whether there are problems that are not a variation of something that already exists. How often do you encounter truly novel problems in your life?


I doubt they reliably verified that only a minority of the problems were seen during training.
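For context on what such verification typically looks like: contamination checks are commonly done by measuring n-gram or substring overlap between each exam question and the training corpus. Here is a minimal illustrative sketch of that idea; the function names and the threshold are made up for the example, and a real check would run against billions of documents with an index rather than a Python set:

```python
def ngrams(text, n=4):
    """Return the set of word-level n-grams in a text, for overlap checks."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(exam_question, training_docs, n=4, threshold=0.5):
    """Flag a question if a large share of its n-grams appear anywhere
    in the training documents. Threshold is illustrative, not standard."""
    question_grams = ngrams(exam_question, n)
    if not question_grams:
        return False
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    overlap = len(question_grams & train_grams) / len(question_grams)
    return overlap >= threshold

docs = ["the quick brown fox jumps over the lazy dog every single day"]
print(is_contaminated("quick brown fox jumps over the lazy dog", docs))  # True
print(is_contaminated("an unrelated question about number theory proofs", docs))  # False
```

The weakness the parent comment points at is visible even in this toy: paraphrased or reformatted versions of a seen problem produce little literal n-gram overlap, so overlap-based checks systematically undercount contamination.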



