That's remarkably dismissive. He addresses your argument right away in the paper, and in multiple places. Did you read it?
It's not about testing the ability to do arithmetic. It's testing the ability to plan a reasoning process / argument.
It's also not the sum of his paper, only one section.
The problem is not that GPT-4 can't do the math problems, it's that it can't reason out how it would even begin to approach a math problem -- it would be totally okay for it to get them wrong, if it was actually making an attempt and could show evidence of working through them.
Instead it just produces "answers" which are a statistical guess based on other things it has seen on the internet. It's true humans do this, too -- often as a first lazy approximation to a problem -- but the key difference is that a human can reason out, through interrogation and introspection, where they might have gone wrong. GPT-4 appears to be unable to do that.
And worse, my experience with these systems (and the paper's) is that during dialogue about errors, the quality of their answers rapidly degrades.
I'm probably as bad as a high school student at formal logic, too. But if you sit down with me with a problem and we talk about it, and I'm interested, it will become evident I am capable of reasoning through it, even if I make mistakes. That's not the case with GPT-4.
> Instead it just produces "answers" which are a statistical guess based on other things it has seen on the internet.
It boggles my mind that folks expect otherwise from a Machine Learning tool, no matter how advanced and stuffed with data it may be. Perhaps it's the same phenomenon that causes us humans to see faces in clouds, smiles on dogs, and Jesus' likeness on toast?
Somewhere I definitely read about how human psychology makes us prone to that sort of thing. Even as far back as Eliza, cognitive scientists were commenting on how our thinking can be fooled.
I think there's an ideological bias in our culture that pushes people to believe that intelligent or structured phenomena inevitably emerge organically and progressively from complex phenomena.
Teleological thinking -- a tendency to read purpose and cause into chaotic/natural events and entities -- riddles popular thinking, especially among people in our profession. Science fiction is especially full of it.
It's not restricted to this domain at all. IMHO a similar bias underlies thinking around economics and the magical hand of the free market economy.
It's also a bias evident in the way some people talk about nature, gardening, etc. E.g. permaculture / natural farming people show it all the time.
> I think there's an ideological bias in our culture that pushes people to believe that intelligent or structured phenomena inevitably emerge organically and progressively from complex phenomena.
All science points to this being the case, for us. I think the only ones opposed are those that believe in young earth creationism, and only some portion of those that believe in old earth creationism.
Performing arithmetic is one kind of reasoning process. But getting the answers right is not necessarily the same as performing that reasoning.
If you go on to read, what he's trying to test is the system's ability to even attempt to plan out a problem solving "route". Which it doesn't really do. If it could, it could defer to another system (fancy calculators or solvers) to do the work. But its lack of ability to reason means it can't even be made to do that.
(EDIT: I do think the paper would be stronger if he put the math and formal logic etc problems later. E.g. the problem he puts forward in 3.14, 3.15 etc is more immediately damning as it reflects the "kind" of daily life reasoning that people would expect these systems to be able to perform.)
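To make the "defer to another system" idea above concrete, here is a rough sketch of a plan-then-delegate pattern (my own illustration; the function names and the trivial eval-based solver are invented, and are not anything from the paper or from OpenAI's tooling):

    # Hypothetical plan-then-defer pattern (illustrative only; no real LLM
    # or solver API is being called here).
    def solver(expression: str) -> float:
        # Stand-in for the calculator / CAS the model could hand work to.
        return eval(expression, {"__builtins__": {}})

    def answer(plan: list[str]) -> float:
        # The "reasoning" part is producing the plan; each arithmetic step
        # is delegated to the solver rather than guessed at.
        result = None
        for step in plan:
            result = solver(step)
        return result

    # e.g. a plan the model might emit for "multiply 1381 by 1453":
    print(answer(["1381 * 1453"]))  # 2006593

The argument being made is that GPT-4 struggles with the planning half of this split, not merely the arithmetic half.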
A language model sees a pile of examples with digits and imitates those examples. A reasoning model sees the inner principle behind this pile, and instead of imitating examples, it uses the learnt principle to produce answers.
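A toy sketch of that distinction (illustrative only; the lookup table stands in for memorized training examples, and multiplication stands in for the learnt principle -- this is not a claim about how either kind of model is implemented):

    # Toy illustration of imitation vs. applying the underlying principle.
    seen_examples = {(12, 13): 156, (7, 8): 56}

    def imitate(a, b):
        # Can only echo answers it has literally seen; unseen pairs fail.
        return seen_examples.get((a, b))

    def apply_principle(a, b):
        # Uses the rule itself, so it generalizes to any pair.
        return a * b

    print(imitate(1381, 1453))          # None -- never seen this pair
    print(apply_principle(1381, 1453))  # 2006593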
How do you know this? What's an example of a "reasoning model"?
If the only example is the human mind, for all we know our reasoning capability and ability to discern principles could work much the same way, and it's just some more subtle differences that lead to the differences in capabilities. There are plenty of cases where it appears as though GPT has discerned the "inner principle" behind something to produce answers.
Language models aren't really optimized for imitation though, they're optimized to predict. One means of prediction, which models have found to be effective in many contexts (especially when short on training time/compute), is comparable to imitation.
But this isn't to say that language models are incapable of establishing "inner principles".
This paper is not even reproducible lol. It makes a nonsensical claim it can't even back with results. Look at multiple comments here actually trying them out.
This is absolutely dismissive of the claim that an advanced LLM is capable of "reasoning", i.e. the action of thinking about something in a logical, sensible way.
That is the sum of the paper. Further, the author even goes on to say that if they asked a human these questions, they would conclude the same:
> Of course, even sophisticated human reasoners make mistakes, just like trained singers can hit false notes. But if a human made these mistakes, the ones reported in this article, then I would conclude without any hesitation that they cannot reason. Even if they went on to list a large number of other examples demonstrating impeccable reasoning, I would suspect that other factors (such as rote memorization or cheating) were behind the performance discrepancy.
So the author admits their own biases, which are used to bolster the argument that, if reasoning appears to be lacking in an answer, the system or entity itself is absolutely incapable of any reasoning and something else must explain why it appears to be reasoning in the first place. That's a VERY convenient way of dismissing any evidence that counters the claim.
> The problem is not that GPT-4 can't do the math problems
The problem is that the system was not allowed, or provided a path, to answer the math problems in a language better suited to analytical questions: code. That the author "denied" the LLM the ability to write code is the issue here, not a limitation of the model itself. An analogy: if a user asks in English a question that requires Pali, the LLM would be "prevented" from answering in Pali unless the user said they could understand it. In the same vein, it doesn't make sense to output Python by default if the system is unsure whether the user understands Python or knows how to run it.
If you say "I understand Python. Select two random numbers between 1381 and 1453 and multiply them together, reporting the result." the LLM will be capable of answering this question by generating code to solve the problem. This is likely to work every single time a question of this type is asked, but it does require that the user run the code.
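For concreteness, a minimal sketch of the kind of Python such a prompt tends to elicit (my own illustration, not an actual GPT-4 transcript); run by the user, it answers the question directly:

    # Assumed shape of the generated code; exact output varies run to run.
    import random

    a = random.randint(1381, 1453)  # randint is inclusive of both bounds
    b = random.randint(1381, 1453)
    print(f"{a} * {b} = {a * b}")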
GPT-4 has the ability to do this with code interpreter, so the question becomes: why did OpenAI choose to make the user explicitly indicate that code can be written? The answer likely lies in the fact that not everyone can read or run Python, a programming language, and therefore it remains an OPTION for the user to choose first. By not allowing the LLM to show the answers to analytical questions in code, the author "blocks" the LLM's ability to show off reasoning. And by treating those failures as "proof" of non-reasoning, the author gets the result they want.
From a scientific standpoint, a good hypothesis about reasoning ability must be one that can be disproven. If an experiment is run on a hypothesis that is absolute ("this thing can't reason"), then the results are not scientific, but opinion.
What can I say, we've seen a piiiile of lazy dismissals of LLM work based on examples from arithmetic and string manipulation. They aren't novel or interesting.