If you ask an AI chatbot a question like "count the number of occurrences of 'l' in the word 'lollapalooza'" (and then follow up on its responses), it becomes obvious that it is not "reasoning"; it is simply following a sort-of script. Here is a summarized transcript of this exercise with ChatGPT:
---
Me: Count the number of occurrences of the letter 'l' in the word 'lollapalooza'.
AI: The letter 'l' appears four times in the word 'lollapalooza'.
Me: Describe the process by which you arrived at this answer.
AI: [Provided a description of counting characters and then highlighted the first 'l' in the word four separate times to justify its answer.]
Me: You counted the first occurrence four times. That is incorrect.
AI: I apologize for the mistake in my previous response. You are absolutely right. Here's the correct count of occurrences of the letter 'l' in the word 'lollapalooza':
AI: [Provided a list of 8 "occurrences" of 'l' in the word 'lollapalooza', including replacing the first 'o' with an 'l' ("occurrence" 3) and counting letters like 'p', 'z', and 'a'.]
---
So even though the correct answer was given at first, the model was unable to faithfully explain how it arrived at that conclusion, and when pressed to double-check its work it simply made things up. Even a toddler would not have claimed 'z' and 'a' were instances of 'l'.
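For contrast, here is what an actual, verifiable counting procedure looks like. This is just a few lines of throwaway Python to make the point (nothing the chatbot itself runs): the count and the positions are mechanically checkable, and the "work shown" is exactly the work done.

```python
# Deterministic counting for comparison: both the total and the positions
# are verifiable, unlike the chatbot's post-hoc justification.
word = "lollapalooza"
count = word.count("l")
positions = [i for i, ch in enumerate(word) if ch == "l"]
print(count)      # 4
print(positions)  # [0, 2, 3, 7]
```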
AI chatbots do not reason. They produce text responses to prompts based on stochastic methods. Trying to conflate the issue by suggesting that "we don't know how humans reason, so how do we know AI bots don't reason" is, frankly, absurd. We can easily demonstrate that they are inconsistent and have no concept of what they are writing responses about, as shown above.
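To be clear about what I mean by "stochastic methods": roughly, the model scores candidate next tokens and samples from the resulting distribution. The sketch below is only a toy illustration of that idea (made-up scores, made-up function), not any particular model's implementation.

```python
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Toy next-token sampling: softmax over scores, then draw one token.
    Purely illustrative; real models score tens of thousands of tokens."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    max_s = max(scaled.values())
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    weights = [exps[tok] / total for tok in exps]
    return random.choices(list(exps), weights=weights, k=1)[0]

# Hypothetical scores for the word following "The letter 'l' appears ..."
scores = {"four": 2.1, "five": 1.4, "three": 0.6}
print(sample_next_token(scores))  # usually "four", but not always
```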
Exactly, and this is what shows me that many people didn't seriously read much of the paper they're commenting on.
The question is not whether an AI can get logical questions right. The question is whether it used reasoning to do it.
And, like it or not, we have a formal definition of reasoning and logic, and long expertise in analyzing how that works.
And it so happens that the paper's author not only has a PhD in computer science but also a master's in philosophy, and has previously worked on proof engineering and logical deduction systems.
So the bulk of the paper is not about "ha ha, it got it wrong"; it's about: how did you get that answer? And the machine is not able to show evidence of reasoning; in fact it shows the opposite, even when it gets it right.
Reasoning is a verb. It's an interactive, dialectical process. LLMs don't seem to do that. They model a problem based on the relational/linguistic structures within it and related materials, but do not reason about it.
> And, like it or not, we have a formal definition of reasoning and logic, and long expertise in analyzing how that works.
Well then I guess all the humans in this thread would come to the exact same conclusion, because according to some expectations we are perfectly consistent and capable of logical reasoning.
How would you describe the process where, if you add context to a prompt - i.e., if you prod the AI in a certain direction - you can drastically influence its results? Is it not using this context as an "argument"? Sure, we know for a fact that all they are doing is exploring a certain corner of a high-dimensional space, and that prompting gets us closer to a desired spot where the right answers reside. If this is true, then at least some logic and reasoning is encoded in language. And if this is also true, then perhaps what humans are performing is a similar trick.
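A toy way to picture that claim (made-up 4-dimensional vectors, nothing to do with how any real model's embeddings work): adding a hint to the prompt nudges its representation toward the region where the right answer lives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

# Made-up vectors, purely for illustration.
prompt        = [0.2, 0.9, 0.1, 0.0]   # the bare question
hint          = [0.7, 0.0, 0.6, 0.1]   # the extra context we prod it with
answer_region = [0.8, 0.3, 0.6, 0.1]   # where the "right answers reside"

nudged = [p + h for p, h in zip(prompt, hint)]
print(round(cosine(prompt, answer_region), 2))  # ~0.5
print(round(cosine(nudged, answer_region), 2))  # ~0.93: context moved us closer
```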
> Well then I guess all the humans in this thread would come to the exact same conclusion, because according to some expectations we are perfectly consistent and capable of logical reasoning.
You've missed the point (hopefully not intentionally).
Neither I nor anybody else suggested that all humans must arrive at identical conclusions as one another for the process to be considered "reasoning".
But any individual human should be self-consistent, which is what I would expect of a chatbot that "reasons". Because, allegedly, the bot keeps all of the prompts in the same context, suggesting a single continuous conversation with one "entity" rather than treating each response as a separate instance. So when the chatbot suddenly cannot back up its own prior conclusion, it's demonstrating a lack of self-consistency, which shows that it is not "reasoning" for any typical definition of the term. It has no self-awareness (nor can it, though AI proponents seem to claim otherwise).
> But any individual human should be self-consistent, which is what I would expect of a chatbot that "reasons".
OK, should be. Are we, though? I can recall an American president who could not even complete two sentences without a direct contradiction. And if we consider how billions of people claim to base their lives on self-contradictory fictitious books, maybe we are not such a self-consistent species after all.
Self-consistency is not a great criterion for reasoning. If I answer "because Jesus told me so" to every question, that's consistent but not interesting. It would be trivial to emulate consistency, in fact.
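For instance, this trivially "consistent" responder (a throwaway sketch) would pass any self-consistency test without doing anything you'd call reasoning:

```python
# A trivially consistent responder: self-consistent by construction,
# yet obviously not reasoning about anything.
def answer(question: str) -> str:
    return "Because Jesus told me so."

print(answer("Why is the sky blue?"))
print(answer("Count the 'l's in 'lollapalooza'."))  # identical answer, every time
```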
I think we are all talking past each other because everyone has a different definition of reasoning. My main point - which I have hopefully presented consistently! - is that we don't really know how humans reason, so we should not be making categorical statements about it at all.
If so, then why does ChatGPT get stuck in a loop of producing the same wrong answers, or sometimes repeatedly produce new wrong ones?
Does it immediately forget the context?
It is told, in no uncertain terms, how and where it is wrong. Yet it goes back to the same mistake, or skips steps whenever that is convenient for its "reasoning".
> Trying to conflate the issue by suggesting that "we don't know how humans reason, so how do we know AI bots don't reason" is, frankly, absurd. We can easily demonstrate that they are inconsistent and have no concept of what they are writing responses about, as shown above.
The point of comparing it to human cognition is that this reveals we simply cannot make categorical statements based on how we believe we reason. At our current level of knowledge about the brain and consciousness, it is still a possibility that we are a bunch of neural networks that decode language and, in doing so, produce justifications for our actions which, in some contexts, can lead to what you would describe as logical or reasonable output. Sometimes this output is incorrect, and we are definitely not internally consistent. In particular, some of us are very often both incorrect and inconsistent. I doubt you would call a human with an IQ below 60 incapable of reasoning, for example, and yet I suspect such a person would have similar difficulties with most of the tests described in the paper.
So, in short, I would reverse the question here: if your only claim is that AIs don't reason like us, this is a very weak argument in favor of the claim that they are incapable of reasoning.
This is the real big question. We don't know how human reasoning works, but we are happy to identify, entirely based on external interaction, what is and isn't correct human reasoning.
Then someone comes up with "P(A|B) is not reasoning", which seems like a claim about an internal mechanism.