Hacker News

Ugh that testing graph confirms that AP Environmental Science was indeed the easiest AP class and I needn't be proud of passing that exam.


This right here. This is the goalposts shifting.

Obviously your comment is somewhat tongue-in-cheek, but you're claiming that a benchmark for human pride ("I needn't be proud of passing that exam") is no longer relevant because a machine can do it. Or maybe a better way to say it is, "This computer proved what I already assumed."

It's so interesting to see it happen in real time.


Yeah, I didn't even think of it like that, but good point. To me it's not even that a machine can do the thing. GPT-4 crushing it across the board resets my baseline, but GPT-3.5 having such variation and excelling at that specific thing was what made my ears perk up.


I think it's more that the exam was shown to be the easiest of all the exams.


I find it interesting that GPT-4 botched AP Lang and Comp and AP English Lit and Comp just as badly as GPT-3.5, with a failing grade of 2/5 (and many colleges also consider a 3 on those exams a failure). Is it because of gaps in the training data or something else? Why does it struggle so hard with those specific tests, especially since it seems to do fine on the SAT writing section?


> Ugh that testing graph confirms that AP Environmental Science was indeed the easiest AP class

No, it just indicates that it was the one whose subject matter was best covered by GPT-3.5’s training data.


Do we know what the training data was?


It got a 4 or 5 on every AP test except the English ones, for what it's worth. Even the calculus ones, which surprised me since past LLMs have been bad at math.


This strikes me as kind of ironic: you'd think a language model would do better on essay prompts and multiple-choice reading comprehension questions about passages than it would on calculations. I wonder if there are more details about these benchmarks somewhere, so we can see what's actually happening in these cases.


I don't find it ironic, because a language model is (currently?) the wrong tool for the job. When you are asked to write an essay, the essay itself is a byproduct. Of course it should be factually and grammatically correct, but that's not the point. The real task is forming a coherent argument and expressing it clearly. And ideally also making it interesting and convincing.


I guess I was referring to the 3.5 version, since that one had much more variation in test scores across all the AP exams. But yes, 4 seems to have made mincemeat of them all!


Funny you claim this, because the AP Environmental Science pass rate is really low compared to other APs, or at least it was when I took it. Maybe it's because the quality of the average test taker was lower, but I'm not especially convinced that this is the case.


I had no idea! My assessment was based on other students at the time saying it was an easy test, and on my own passing after a semester of goofing off.


[sarcasm]

Cause there was only one correct answer for every question: "97% of scientists agree ..."

[/sarcasm]



