Ugh that testing graph confirms that AP Environmental Science was indeed the eas...

AndrewKemendo · on March 14, 2023

This right here. This is the goalposts shifting.

Obviously your comment is somewhat tongue in cheek, but your claim is that a benchmark for human pride ("I needn't be proud of passing that exam") is no longer relevant because a machine can do it. Or maybe a better way to say it is, "This computer proved what I already assumed."

It's so interesting to see it happen in real time

mym1990 · on March 14, 2023

Yeah, I didn't even think of it like that, but good point. To me it's not even that a machine can do the thing. GPT-4 crushing it across the board resets my baseline, but GPT-3.5 having such variation and excelling at that specific thing was what made my ears perk up.

adammarples · on March 14, 2023

I think it's more that the exam was shown to be the easiest of all the exams

mustacheemperor · on March 14, 2023

I am interested that GPT-4 botched AP Lang and Comp and AP English Lit and Comp just as badly as GPT-3.5, with a failing grade of 2/5 (and many colleges also consider a 3 on those exams a failure). Is it because of gaps in the training data or something else? Why does it struggle so hard with those specific tests? Especially since it seems to do fine on the SAT writing section.

dragonwriter · on March 14, 2023

> Ugh that testing graph confirms that AP Environmental Science was indeed the easiest AP class

No, it just indicates that it was the one whose subject matter was best covered by GPT-3.5’s training data.

mym1990 · on March 14, 2023

Do we know what the training data was?

HDThoreaun · on March 14, 2023

It got a 4 or 5 on every AP test except the English ones, for what it's worth. Even the calculus ones, which surprised me since past LLMs have been bad at math.

Syntheticate · on March 14, 2023

This strikes me as kind of ironic -- you'd think a language model would do better on questions like essay prompts and multiple choice reading comprehension questions regarding passages than it would in calculations. I wonder if there are more details about these benchmarks somewhere, so we can see what's actually happening in these cases.

jltsiren · on March 14, 2023

I don't find it ironic, because a language model is (currently?) the wrong tool for the job. When you are asked to write an essay, the essay itself is a byproduct. Of course it should be factually and grammatically correct, but that's not the point. The real task is forming a coherent argument and expressing it clearly. And ideally also making it interesting and convincing.

mym1990 · on March 14, 2023

I guess my reference was to the 3.5 version, since that one had much more variation in test scores across all the AP exams. But yes, 4 seems to have made mincemeat of them all!

Der_Einzige · on March 14, 2023

Funny you claim this, because the AP Environmental Science pass rate is really low compared to other APs, at least it was when I took it. Maybe it's because the quality of the avg test taker was lower, but I'm not especially convinced that this is the case.

mym1990 · on March 14, 2023

I had no idea! My assessment was based on other students at the time expressing that it was an easy test and also myself passing after a semester of goofing off.

FrojoS · on March 15, 2023

[sarcasm]

Cause there was only one correct answer for every question: "97% of scientists agree ..."

[/sarcasm]