
Long quote, but I think this is useful context for the argument:

"LLM believers will probably demur: But humans also make mistakes, and surely we’re not prepared to say that humans can’t reason just because they make mistakes? First, it is not accurate to say without qualification that “humans can reason,” certainly not in the sense that we can randomly pluck any person from the street and expect them to reliably perform normatively correct reasoning. Most neurobiologically normal humans have the capacity to become proficient in reasoning, but actually attaining such proficiency takes significant training and discipline. ... But if a human made these mistakes, the ones reported in this article, then I would conclude without any hesitation that they cannot reason. Even if they went on to list a large number of other examples demonstrating impeccable reasoning, I would suspect that other factors (such as rote memorization or cheating) were behind the performance discrepancy. For the mistakes reported here are not performance mistakes, the sort of innocuous errors that humans might make—and promptly correct—when they are careless or tired. If a human made these mistakes, and made them consistently under repeated questioning, that would indicate without doubt that they don’t have the necessary logical competence, that they lack fundamental concepts that are part and parcel of the fabric of reasoning, such as logical entailment and set membership."

So really what this is saying is: "GPT-4 makes certain categories of reasoning mistakes that indicate it is not, in general, doing 'true reasoning', even if it says the right things in other cases." And yeah, if that's the basis of your argument, sure. But how would it be doing rote memorization or "cheating" in the cases where it does get things right? A weird notion...

Anyway, it feels rather pointless to make this a binary quality. As this article points out, humans (on average) make various reasoning mistakes due to cognitive biases as well. GPT-4 *can* output valid explanations for its reasoning on various questions, but fails to do so correctly in many cases (as shown in this piece), and to me it is more interesting to discuss the implications of that than to just establish the fact (which is not news to anyone afaik). This does have a 'Conclusions' section that delves into it a little, but it is rather over-general and weak.

Still, this is pretty well written, and it is good to have a compilation of examples demonstrating that GPT-4 is not a "human-like reasoner" for anyone unaware that these models still have such flaws, I suppose.




The paper is not reproducible lol.



