> Two humans in this thread just read the solution and thought it was correct.
My guess is that they just skim-read it and missed what ChatGPT actually wrote; it's not that they misunderstood what "vegan wolf" means [1]. On the other hand, you cannot skim-read what you are writing yourself; that's not how the mind works.
The gist of the problem here is that, unlike a human, ChatGPT doesn't understand the words it generates, which leads to hilarious results.
As another example, look at the "debugging" of GPT-4's assumptions someone posted in a sibling comment: it "knows" the vegan wolf will eat plant-based food and it "knows" a cabbage is a plant, yet it "thinks" the wolf "will not harm the cabbage"... a misunderstanding no human would make (if they know what "vegan" and "cabbage" mean). And this doesn't happen in a long chain of reasoning (where a human could lose the thread), but in very short paragraphs, one right after the other! This failure mode requires not understanding the individual assumptions, which is what prevents GPT from making the connection. I was asked for an error showing GPT misunderstand something no person would, and I provided one.
[1] A question for you: did you think the wrong solution was right because you thought a vegan wolf cannot eat the cabbage (let me bet this is NOT what crossed your mind), or because the person who posted it made it look as if it were the right solution and you skim-read it without paying attention, assuming "this person says it's right and is posting it as a rebuttal, so it's likely right" (this is my bet)?
If the latter, this failure mode is not one of misunderstanding what "vegan wolf" means (which is what debugging GPT's process shows), but one of very human laziness/jumping to conclusions. Do note this cannot happen when you write the solution yourself!
> Two humans in this thread just read the solution and thought it was correct.

Me being one of them.
Another person further down the thread manually wrote up a solution making the exact same mistake.
I think you want things to be different, but they're not. You're answering based on how you think humans would respond, not how people actually respond.
Does it mean everyone made that same mistake? No, but I bet a bunch did.