even if it got it right, that wouldn't be reasoning. reasoning isn't supposed to be probabilistic. once it gets every variation right every time, then there can be a debate about how it arrives there and what we should call that process
Not sure what you're communicating. I wouldn't say anything. I didn't say they couldn't ever get anywhere.
My point is that people reason. But they are probabilistic. They solve hard problems yet still make mistakes on simple ones, or even fail a problem they solved before.
Holding language model reasoning to higher standards than the kind of reasoning humans do (and that they were trained on) seems unreasonable.
Neither language models nor humans are deterministic mathematical deduction systems.
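To make that concrete, here's a toy sketch (my own illustration, not any real model's API) of why sampled decoding is inherently probabilistic: the model emits a distribution over next tokens, and decoding draws from it, so the less-likely answer still comes up some fraction of the time.

```python
# Toy illustration only: a made-up two-token "model", not a real LLM.
import math
import random

def softmax(logits, temperature=1.0):
    # Convert raw scores into a probability distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature=1.0, rng=random):
    # Draw one token according to the softmax distribution.
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["left", "right"]
logits = [2.0, 1.0]  # the model favors "left", but not overwhelmingly

# At temperature 1.0 the dispreferred answer is still sampled sometimes;
# only as temperature -> 0 does decoding approach deterministic argmax.
draws = [sample_token(tokens, logits, temperature=1.0) for _ in range(1000)]
print("times it said 'right':", draws.count("right"))
```

With these logits the "wrong" token has probability of roughly 0.27 per draw at temperature 1.0, so it shows up regularly over 1000 draws, while at a temperature near zero the output collapses to "left" essentially every time.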
Knowing which hand is your left is not probabilistic in theory or practice. Unless you're going to cop out and say everything is probabilistic because of quantum mechanics or some banal thing like that.
If someone is temporarily impaired or otherwise unmotivated to answer your inane and meaningless question, that doesn't mean they could not do so with one hundred percent accuracy, no matter how many subtle variations you throw at them or how many times you repeat the same question verbatim.
What we know for certain is that OpenAI is highly motivated to answer these sorts of questions correctly.
people do not make random errors like hallucinating which hand is their left unless the test administrator uses mk ultra-style interventions on them. either they can reason about it or they can't. if you ask them the same question verbatim, or slight variations on it with different grammar, their answers won't change. if you give someone a dollar for every time he correctly identifies his left arm, he's not going to suddenly break just because his training data includes transcripts from the twilight zone and he's programmed to "mix it up" so that the people questioning him don't get bored and his parent corporation can get him invited to more test-taking opportunities.
putting someone on the spot in an odd moment when they have no reason to even answer you, let alone answer correctly, is not the same as sitting them down upon mutual agreement and rewarding them for correct answers and/or punishing them for wrong ones