Hacker News

Those benchmarks are so cynical.

Every test prep tutor has taught dozens or hundreds of students the implicit patterns behind the tests and drilled them in with countless sample questions, raising scores by hundreds of points. Those students were not getting smarter from that work; they were becoming more familiar with a format, and their scores improved because of it.

And what do LLMs do? Exactly that. And what’s in their training data? Countless standardized tests.

These things are absolutely incredible innovations capable of so many things, but the business opportunity is so big that this kind of cynical misrepresentation is rampant. It would be great if we could just stay focused on the things they actually do incredibly well instead of making them do stage tricks for publicity.




This is what they claim:

We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative—see our technical report for details.


Yes, and none of the tutored students encounter the exact problems they’ll see on their own tests either.

In the language of ML, test prep for students is about sharing the inferred parameters that underlie the way test questions are constructed, obviating the need for knowledge or understanding.

Doing well on tests, after this prep, doesn’t demonstrate what the tests purport to measure.

It’s a pretty ugly truth about standardized tests, honestly, and drives some of us to feel pretty uncomfortable with the work. But it’s directly applicable to how LLMs engage with them as well.


You can always argue that the model has seen some variation of a given problem. The question is whether there are problems that are not a variation of something that already exists. How often do you encounter truly novel problems in your life?


I doubt they reliably verified that only a minority of the problems were seen during training.
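For context on what such verification typically looks like: contamination checks are commonly done by measuring n-gram or substring overlap between each exam question and the training corpus. Here is a minimal illustrative sketch of that idea; the function names and the threshold are made up for the example, and a real check would run against billions of documents with an index rather than a Python set:

```python
def ngrams(text, n=4):
    """Return the set of word-level n-grams in a text, for overlap checks."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(exam_question, training_docs, n=4, threshold=0.5):
    """Flag a question if a large share of its n-grams appear anywhere
    in the training documents. Threshold is illustrative, not standard."""
    question_grams = ngrams(exam_question, n)
    if not question_grams:
        return False
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    overlap = len(question_grams & train_grams) / len(question_grams)
    return overlap >= threshold

docs = ["the quick brown fox jumps over the lazy dog every single day"]
print(is_contaminated("quick brown fox jumps over the lazy dog", docs))  # True
print(is_contaminated("an unrelated question about number theory proofs", docs))  # False
```

The weakness the parent comment points at is visible even in this toy: paraphrased or reformatted versions of a seen problem produce little literal n-gram overlap, so overlap-based checks systematically undercount contamination.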



