
Bingo. These are very 'human' tasks.

As others have said elsewhere, the issue remains accuracy. I wish every response came with an accurate estimate of how likely the answer is to be true, because at the moment the model gives wrong answers as confidently as right ones.




So the thing is, giving wrong answers with confidence is literally what we train students to do when they are unsure.

I can remember my GRE coach telling me that it was better to confidently choose an answer I only had 50% confidence in, rather than punt on the entire question.

AIs hallucinate because, statistically, it is 'rewarding' for them to do so (in RLHF).
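The incentive above is just expected-value arithmetic. A toy sketch (hypothetical scoring scheme, not any actual eval's rubric): if a wrong answer costs nothing and skipping also scores zero, then any nonzero confidence makes guessing the dominant strategy.

```python
def expected_score(p_correct, guess_reward=1.0, wrong_penalty=0.0,
                   skip_score=0.0, guess=True):
    """Expected score for one question under a simple scoring scheme.

    Hypothetical parameters: guess_reward for a correct answer,
    wrong_penalty (usually <= 0) for a wrong one, skip_score for
    abstaining entirely.
    """
    if not guess:
        return skip_score
    return p_correct * guess_reward + (1 - p_correct) * wrong_penalty

# At 50% confidence with no penalty for wrong answers,
# guessing beats skipping: 0.5 vs 0.0.
print(expected_score(0.5))                     # guess
print(expected_score(0.5, guess=False))        # skip
# A negative penalty for wrong answers (as on the old SAT) shrinks
# the gap and, if large enough, makes abstaining rational again.
print(expected_score(0.5, wrong_penalty=-0.25))
```

The same logic applies to a model under RLHF: if a confident wrong answer is penalized no more than "I don't know," the training signal favors confident guessing.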


In the context of standardized testing, sure. I don't think I'd try that in a research paper.


This is literally in the context of standardized testing? GPT 'evals'?



