It got a 4 or 5 on every AP test except the English ones, for what it's worth. Ev...

Syntheticate · on March 14, 2023

This strikes me as kind of ironic -- you'd think a language model would do better on essay prompts and multiple-choice reading comprehension questions than on calculations. I wonder if there are more details about these benchmarks somewhere, so we can see what's actually happening in these cases.

jltsiren · on March 14, 2023

I don't find it ironic, because a language model is (currently?) the wrong tool for the job. When you are asked to write an essay, the essay itself is a byproduct. Of course it should be factually and grammatically correct, but that's not the point. The real task is forming a coherent argument and expressing it clearly. And ideally also making it interesting and convincing.

mym1990 · on March 14, 2023

I guess I was referring to the 3.5 version, since that one had much more variation in test scores across the AP exams. But yes, 4 seems to have made mincemeat of them all!