I saw an LLM having this kind of problem when I was doing some testing a ways back. I asked it to order three fruits from largest to smallest. I think it was orange, blueberry and grapefruit. It could do that easily with a simple prompt. When the prompting included something to the effect of “think step by step”, it would try to talk through the problem and it would usually get it wrong.
How much does this align with how we learn math? We kind of instinctively learn the answers to simple math questions. We can even, at some point, develop an intuition for things like integrals and derivatives. But the moment we are asked to explain why, or worse, provide a proof, things become a lot harder, even though the initial answer may be correct.
I definitely don’t learn math by means of gradient descent.
We could say math is not so much learned as mental models of abstraction are developed. How? We dunno, but what we do know is that we don’t learn by figuring out the common features of all previously seen equations only to guess at them later…
The mind operates on higher and higher levels of abstraction, each building on the last in a fascinating way, very often not with words but with structure and images.
Of course there are people with aphantasia, but I really fail to see how any reasoning happens at a purely language level. Someone on this forum also noted that in order to reason one needs an ontology to facilitate the reasoning process. LLMs don’t do ontologies…
And finally, not least, LLM and ML people in general seem to equate intuition with some sort of biased.random(). Well, intuition is not random, and it is hard to describe in words. So are awe and inspiration. And these ARE part of (a precondition to, fuel for) humanity’s thought process more than we like to admit.
The fact that it (is suggested / we are led to believe / was recently implied) that neurons can be explained as doing something like this at the underlying layer still says little about the process of forming the ontological context needed for any kind of syllogism.
Humans learn skills like basic mathematics by reasoning about their environment and building internal models of problems they’re trying to solve. LLMs do not reason and they cannot model their environment.
>It's not thinking
>it compressed the internet into a clever, lossy format with nice interface and it retrieves stuff from there.
Humans do both, so why can't LLMs?
>Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.
More like pulling out a deep-fried meme, looking for context, then searching Google Images until you find the most "original" JPG representation with the fewest artifacts.
There is more data it can add confidently; it just has to re-think the problem with a renewed perspective and a higher-level, abstracted-away context/attention mechanism.
> Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.
Empirically speaking, I have a set of evals with an objective pass/fail result and a prompt. I'm doing codegen, so I'm using syntax linting, tests passing, etc. to determine success. With chain-of-thought included in the prompting, the evals pass at a significantly higher rate. A lot of research has been done demonstrating the same in various domains.
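For concreteness, here's roughly the shape of such a harness as a minimal sketch, not my actual setup: generate() is a stand-in for whatever model call you use, and each task ships its own pytest file.

    # Minimal sketch of the eval loop, with placeholder names (generate, tasks).
    # Pass/fail is objective: the generated file must compile and its tests must pass.
    import os
    import subprocess
    import tempfile

    COT_HINT = "\n\nThink step by step before writing the final code."

    def passes(code, test_code):
        with tempfile.TemporaryDirectory() as d:
            with open(os.path.join(d, "solution.py"), "w") as f:
                f.write(code)
            with open(os.path.join(d, "test_solution.py"), "w") as f:
                f.write(test_code)
            # Syntax check stands in for linting; pytest stands in for the test suite.
            if subprocess.run(["python", "-m", "py_compile", os.path.join(d, "solution.py")]).returncode != 0:
                return False
            return subprocess.run(["python", "-m", "pytest", "-q"], cwd=d).returncode == 0

    def pass_rate(tasks, generate, use_cot=False):
        # tasks: list of (prompt, test_code) pairs; generate: prompt -> code string
        results = [passes(generate(p + COT_HINT if use_cot else p), t) for p, t in tasks]
        return sum(results) / len(results)

Running pass_rate with use_cot=True vs. False over the same task list is the whole comparison, and the CoT variant passes at a significantly higher rate.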
If chain-of-thought can't improve quality, how do you explain the empirical results which appear to contradict you?
The paper is interesting because CoT has been so widely demonstrated as effective. The point is that it "can" hurt performance on a subset of tasks, not that CoT doesn't work at all.
It's literally in the second line of the abstract: "While CoT has been shown to improve performance across many tasks..."
I have no idea how accurate it actually is, but I've had the process used by LLMs described to me as follows: "Think of it like a form of UV mapping, applied to language constructs rather than 3D points in space, and the limitations and approximations you experience are similar to those emerging when having to project a 2D image over a 3D surface."
These kinds of reductive, thought-terminating clichés are not helpful. You are using a tautology (it doesn't think because it is retrieving data, and retrieving data is not thinking) without addressing the why (why does this preclude thinking?) or the how (is it doing anything else to generate results?).
There is nothing in the LLM that would have the capability to create new information by reasoning, when the existing information does not already include what we need.
There is logic and useful thought in the comment, but you choose not to see it because you disagree with the conclusion. That is not useful.
It would be interesting to think about how it got it wrong. My hunch is that in the "think step by step" section it made an early and incorrect conclusion (maybe even a subtly inferred conclusion) and LLMs are terrible at walking back mistakes so it made an internally consistent conclusion that was incorrect.
A lot of CoT to me is just slowing the LLM down and keeping it from making that premature conclusion... but it can backfire when it then accidentally makes a conclusion early on, often in a worse context than it would use without the CoT.
I always found it interesting how sorting problems can get different results when you add additional qualifiers like colors or smells or locations, etc.
Naively, I understand these to influence the probability space enough to weaken the emergent patterns we frequently overestimate.
The model has likely already seen the exact phrase from its last iteration. Adding variation generalizes the inference away from over-trained quotes.
Every model has the model before it, and its academic papers, in its training data.
Changing the qualifiers pulls the inference far away from quoting over-trained data, and back to generalization.
I am sure it has picked up on this mesa-optimization along the way, especially if I can summarize it.
Wonder why it isn't more generally intelligent yet.
I'll rank those three fruits from largest to smallest:
1. Grapefruit
2. Orange
3. Blueberry
The grapefruit is definitely the largest of these three fruits - they're typically around 4-6 inches in diameter. Oranges are usually 2-3 inches in diameter, and blueberries are the smallest at roughly 0.5 inches in diameter.