It got a 4 or 5 on every AP test except the English ones, for what it's worth. Even the calculus ones, which surprised me since past LLMs have been bad at math.



This strikes me as kind of ironic -- you'd think a language model would do better on essay prompts and multiple-choice reading comprehension questions about passages than it would on calculations. I wonder if there are more details about these benchmarks somewhere, so we can see what's actually happening in these cases.


I don't find it ironic, because a language model is (currently?) the wrong tool for the job. When you are asked to write an essay, the essay itself is a byproduct. Of course it should be factually and grammatically correct, but that's not the point. The real task is forming a coherent argument and expressing it clearly. And ideally also making it interesting and convincing.


I guess I was referring to the 3.5 version, since that one had much more variation in test scores across the AP exams. But yes, 4 seems to have made mincemeat of them all!



