The crucial difference there is the presence of an external agent intelligent enough to spot that the answer is wrong; humans can do that for themselves. ChatGPT doesn't self-reflect.
Interestingly, many (most?) humans don't self-reflect or correct themselves either, unless challenged by an external agent, which doesn't necessarily have to be another human.
Also of note: GPT-4 so far seems to show big improvements over GPT-3 when it comes to "thinking out loud" to reach a (better) answer on more complex problems. It's a kind of front-loaded check that the overall goal is on track before diving into the implementation weeds, something that definitely helps me (as a human) avoid unnecessary mistakes in the first place.
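To make that concrete, here's a minimal sketch of the kind of prompting I mean. The model name, the prompt wording, and the use of the pre-1.0 `openai` Python client are my own assumptions for illustration, not anything from this thread:

```python
# Minimal sketch (assumes the pre-1.0 openai Python package and an
# OPENAI_API_KEY env var; question and prompts are illustrative).
import openai

question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

# Direct ask: the model may jump straight to the intuitive (wrong) answer.
direct = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
)

# "Think out loud" variant: asking for step-by-step reasoning first
# tends to surface the arithmetic check before a final answer is committed.
reflective = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": question + " Reason step by step before giving a final answer.",
    }],
)

print(direct.choices[0].message.content)
print(reflective.choices[0].message.content)
```

Nothing about the second call makes the model "know" more; it just spends tokens on intermediate steps, which empirically seems to improve the final answer.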
> Interestingly, many (most?) humans don't self-reflect or correct themselves either, unless challenged by an external agent
Disagree with you here: why do you say that? Maybe we don't apply self-reflection consistently (when it comes to political beliefs, for example), but even toddlers know when they haven't achieved the goal they were aiming for. ChatGPT has no clue unless you prod it, because it doesn't know anything; it's just stringing words together by probability.