To paraphrase what you're saying: an AI can't reason, because it is built to stochastically predict tokens, which is not reasoning and is different from the activity of reasoning.
Which I agree with.
But also, by observation, it can (at least some of the time) emit token sequences that emulate reasoning (at least on some simple tasks.)
So perhaps it can reason in the same way a submarine can swim.
So what's happening is this: valid arguments have a structure which can be modelled statistically.
But validity isn't a statistical notion: P(¬A|A) is zero.
Since P(premise|contradiction) is frequently above zero in LLMs, they engage in randomly "irrational" reasoning. That's what makes them particularly unreliable.
The reason for any given P(premise|contradiction) having any given frequency is just that "it's that frequency in the corpus". This is a pretty insane reason, and so incomprehensibly insane that many -- I guess -- cannot fathom how an LLM can appear to reason without being able to.
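For concreteness, this is easy to check empirically. A minimal sketch of my own, assuming the Hugging Face transformers library and GPT-2 purely as a stand-in model (the strings and the helper function are illustrative, not anyone's benchmark):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Stand-in model for illustration; any causal LM makes the same point.
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def logprob_of(prompt, continuation):
        # Sum of log P(each continuation token | everything before it).
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
        with torch.no_grad():
            logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
        total = 0.0
        for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
            # logits at position pos-1 predict the token at position pos
            total += logprobs[0, pos - 1, full_ids[0, pos]].item()
        return total

    premise = "All men are mortal. Socrates is a man. Therefore Socrates is"
    print(logprob_of(premise, " mortal."))      # higher...
    print(logprob_of(premise, " not mortal."))  # ...lower, but finite: P(¬A|A) > 0

The contradictory continuation scores lower, but never at probability zero -- which is exactly the difference between a frequency constraint and a validity constraint.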
I suppose there's a sort of Truman Show effect: the reality of the underlying mechanism is so far outside anything most people could analogise to that they fail to see the trick taking place.
People talk about "human failures", but we never ascribe confidences to propositions based on their frequency in a text corpus; and the apparent "reasoning" which arises out of this is incomprehensible.
That is, without some systematic training in applied statistics, so you can set up the model and reason about it regardless of its outputs -- which are irrelevant to its mechanism.
It's true. Validity isn't a statistical notion, and that's why I agree with you that LLMs can't "reason" if we're being precise with language. Just like they can't "know" or "believe" or basically any other verb we use in epistemology.
But the argument is that by dint of repetition, the models actually encode some of the structure of basic logical primitives: syllogisms, entailment, modus ponens/tollens, etc.
This then weights their output so that (for simple enough stuff) they're more likely to emit outputs that are logically sound than outputs that aren't. Indeed, this has to be the case, or they couldn't maintain any level of coherence in their output at all (which, like it or not, they can).
Like you, I'm not comfortable calling this reasoning. But it's also something that is not 100% entirely unlike reasoning, either, at least in terms of output.
The heart of science is distinguishing illusion from reality: measures of events from models of events; shadows from what casts them.
The spherical shadow of an object here, alas, isn't cast by a spherical object. It's a spiky, deformed object whose shadow is spherical given the right prompt. This is easy to show.
Engineers are people who put on light shows. Engineers make the magic lanterns --- theatres full of people who believe there's another world in front of them.
Scientists are interested in the quality of the film grain. The spikiness of the supposed sphere. Or the failure of an LLM's ability to "reason".
Look, of course statistics isn't reasoning because you can't build a proof with statistics (or probabilities).
But we have to wonder: when people say "reasoning", do they really, really mean drawing inferences from axioms and theorems using some set of inference rules? Or do they just mean that they can ask a question and get back an answer that makes sense?
I certainly think it's the latter. People are imprecise when they speak about reasoning, just as they are imprecise when reasoning. Most people who are going to be using LLMs will not be people looking for precise, correct answers derived by sound inference procedures. Those who need precision will seek it elsewhere, where precision can be obtained. The rest will be happy with "reasoning", quote-unquote.
Basically the use case for LLMs reminds me of a couple of pieces of work that were published in the past, where people used neural nets to approximate the results of a precise calculation. One team trained a neural net to predict the chaotic motion of three astronomical bodies ("the three body problem"). Another trained a neural net to approximate an automated planner programmed to drive a drone around in a thicket of trees without crashing. In both cases the trained model could obviously only return approximately correct results, and it could only approximate results that had already been calculated classically, but, at least in the second case, if I remember correctly, there was a significant speedup in the operation of the drone compared with the automated planner -- very reasonably so, since the model approximating the planner didn't have to do all the hard work of actually, you know, calculating a plan.
My bottom line is that no matter what you (or I, or the article's author, or anyone else) can say or show about the true capabilities of LLMs to reason, people are totally going to use them _as if_ they could reason, and the job of proving that they can't is going to get all that much harder for that experience, misguided as it may be.
Ultimately the question of whether LLMs can reason is going to be, outside specialist circles, as relevant as "Does Netflix distribute art?". Sure, it doesn't. But people watch it anyway. Most people seemingly don't need art as much, and they don't need reasoning as much, either.
Which is a tragic conclusion, of course. At least some of us are still working on AI whose purpose is to make humans better at thinking, not AI that takes away their motivation to think.
>an AI can't reason, because it is built to stochastically predict tokens, which is not reasoning and is different from the activity of reasoning.
A feature indispensable to the generation of a sequence is recovered in the limit of predicting the sequence. So prediction does not exclude higher-level cognitive processes. Transformers being universal sequence-to-sequence modelers gives reason to believe they can reach that limit.
There are an infinite number of models which generate the same exact (infinite) sequence.
No model is "recovered in the limit".
And, more severely, we're blind to the future. So of the infinite models of all of history, we're not even interested in the ones which are maximally retrospectively predictive.
Almost all of those are maximally non-predictive of the future, and indeed, much worse than the ones which fail to predict the past well.
So your 'recovering in the limit' is, alas, a dangerous kind of pseudoscience: fitting the past.
We want models which are wrong for the right reasons. Not models which are right for the wrong ones.
The latter fail catastrophically.
The models we require enable us to simulate unrealised futures: the causal, abductive models of science, say.
>There are an infinite number of models which generate the same exact (infinite) sequence.
Only if there is no constraint on the model. But the fixed number of parameters and the inductive bias of attention limit the space of models that are learnable. In the limit of infinite data but finite capacity for memorization, the only available solution will be to recover the information dynamic of the generating process.
>So of the infinite models of all of history, we're not even interested in the ones which are maximally retrospectively predictive.
Presumably the same processes that operated in the past will operate in the future. So accurately modelling the past is certainly very informative for the future.
> Only if there is no constraint on the model. But the fixed number of parameters
Nope. cf. the under-determination of evidence by theory.
> So accurately modelling the past is certainly very informative for the future.
Nope. cf. we don't have infinite measures on infinite events in the past.
Consider that any given measure of an event, say M(E), is really a measure of a near-infinite number of causes, say C1..Cn. Now, how many independent measures of these do we have? A handful.
So we really have M(E | Controlling(a handful of C1..Cn)).
Do we want to model that?! No... that's insane. That's superstition. That's what all of science stands against.
Here's what we do: we build models not fit to any data. We build models that can generate the data we observe, but we build them by familiarity with reality *NOT* with the measures M.
How do we do that? Many ways, but the body is a key component of the answer. We have practically certain causal models of the body.
By iterating through such sensory-motor actions we can find concepts which produce these discrete splits. We call these concepts 'objects'. And from these we assemble models of reality whose shadow is our measures.
We handle the cup, and by handling the cup can imagine the cup, and by imagining the cup can generate measures of the cup.
You cannot reverse from a shadow of a vase to the clay vase itself: there are an infinite number, given arbitrarily many parameters. The task is to find the right parameterisation, not to suppose that any given one has one solution (it doesn't, so a fool's hope anyway).
Such a condition is just a formalisation of superstition: my model is decided by the data; my parameterisation is 'free'. My model of the world is coincidence.
Indeed, the truth is, in a sense, only a single parameter.
>Nope. cf. the under-determination of evidence by theory.
You mean the underdetermination of theory by evidence? This isn't really relevant. Given no prior information and no constraints on the model, the theory is underdetermined. But given some strong constraints, there will be one model that best explains the data. Of course a model can't deal with an arbitrary distribution shift, but no one expects a model to be insensitive to a change in the underlying data. The question is whether the constraints of natural systems and an LLM with a given inductive bias and finite capacity are sufficient constraints. It's not totally clear that it is or isn't, but this isn't decided by the underdetermination principle.
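To make the constraint point concrete, here's a toy sketch of my own (assuming only numpy; the numbers are arbitrary): with more parameters than observations, infinitely many models fit the data exactly, but a capacity constraint -- here, preferring the minimum-norm solution, which is what ridge regression approaches as its penalty shrinks -- singles out one of them.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 20))   # 5 observations, 20 free parameters
    y = rng.normal(size=5)

    # One exact fit among infinitely many: the minimum-norm solution.
    w_constrained = np.linalg.pinv(X) @ y

    # Another exact fit: add anything from the null space of X.
    null_space = np.linalg.svd(X)[2][5:]          # rows spanning null(X)
    w_unconstrained = w_constrained + 3.0 * null_space[0]

    print(np.allclose(X @ w_constrained, y),      # True: fits the data
          np.allclose(X @ w_unconstrained, y))    # True: also fits the data
    print(np.linalg.norm(w_constrained) < np.linalg.norm(w_unconstrained))  # True

Whether the analogous constraints on an LLM (finite parameters, attention's inductive bias) are strong enough to single out the generating process is exactly the open question.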
>Nope. cf. we don't have infinite measures on infinite events in the past.
I don't see how this is relevant to the point it is in response to.
>Such a condition is just a formalisation of superstition: my model is decided by the data; my parameterisation is 'free'. My model of the world is coincidence.
Everything is data in the end; your interactions with the world by touch are just data. Our brains have an inductive bias towards certain interpretations of certain data, and these interpretations tend to promote proliferation in organisms. But LLMs have an inductive bias as well, one that allows them to search for circuits that generate training data. LLMs tend to land on interesting models that result in non-trivial generalization abilities in some contexts. This ability goes beyond just frequency modeling. Of course LLMs are limited to what they can know of the world through their data paradigm. But so are we. Such a limit in itself doesn't imply an in-principle limit to modeling/understanding in LLMs.
You're projecting statistics onto reality. Reality is a place of necessity, not frequency. Our bodies are a place of causes, not consequences. Our interaction with the world is causal knowledge.
If everything were P(A|B), knowledge would be impossible. Thankfully we're in the world and we know directly and without inference: we move our hands according to a technique and they so move. Absent this direct, immediate, causal, certain knowledge of our own bodies --- there is no way of knowing anything.
All knowledge is a recursion from the certain causal mechanism of the body: hands to tools, tools to models, models to data; from data to refinement of models.
Otherwise all "knowledge" would be LLM-like, based merely on pre-occurrent patterns of data. There would be no imagination, no possibility, no necessity, no reason.. indeed, no knowledge.
The world painted by P(A | B, stupid, limited, dumb measures of the past) is a dark, dangerous and fickle one.
You should not wish to live there. It makes no sense, nor could it. Thankfully, you're able to walk around; imagine things that have never been; actually learn; grow (organically); adapt (physiologically); develop skills (sensory-motorily). So you aren't so severely disabled that you're reduced to replies which are maximally consistent with whatever happens to be in all the ebooks ever written by people who could write them in the first place.
You aren't so disabled that the very basis of writing is precluded: direct access to the world.
You need to elaborate on your theory of knowledge/concept formation, as I have not seen it explained anywhere else in this manner. Particularly: "Absent this direct, immediate, causal, certain knowledge of our own bodies --- there is no way of knowing anything.", "All knowledge is a recursion from the certain causal mechanism of the body: hands to tools, tools to models, models to data; from data to refinement of models.", and this: "Reality is a place of necessity, not frequency. Our bodies are a place of causes, not consequences. Our interaction with the world is causal knowledge."
May I enquire about the sources for the above? Spinoza?
Let me know if you have any longer write-ups on it, and if not I would urge you to write them.
Thanks! That the world could be modal is a revelation to me, as I viewed it as a deterministic unfolding of interrelated events. The links and material are nice for me to sit back and reflect on.
Using ChatGPT as the philosopher's assistant is a nice touch.
Can I ask a follow-up question: what sort of ethics derives naturally out of the above?
I don't think modality and determinism are in tension -- this is a misunderstanding behind the free will debate.
Imv, it is literally true that "you could have done otherwise" without there being a violation of determinism.
How? Determinism, in this narrow sense at least, is about how events in the physical world relate across time, i.e., that necessarily P(later|earlier) = 1. These events are to be thought of as infinitely precise states of the maximally basic stuff of reality, i.e., all the info that possibly exists.
But these aren't mechanisms, these are states. As soon as you describe relationships between states, i.e., causal mechanisms, you're talking about what would happen if the universe were in some state that it may never enter.
I take it to be a basic property of reality that these mechanisms are (at least as) basic as these states. E.g., that the motion "of the most basic stuff" is as fundamental as "where that stuff is".
So, e.g., suppose there's a basic atom A and a basic atom B, and they move this way: A repels A, A attracts B, B repels AB, B attracts BA, etc.
Now this behaviour is a basic part of their existence: were there to be a universe of AAABBB, then "this would happen"; if BBBBBAAA then "something else would happen".
The "initial conditions" of the universe, ie., it's state prevents its mechanisms from ever entering certain states. But those states are possible given those mechanisms. It's in part what "mechanism" means that it is possible to enter more states than just which ones happen to occur.
So, on free will, what does it mean to say "I could have done otherwise" -- it means that the relevant causal mechanisms make genuinely possible many states. (But the actual initial conditions precluded observing more than one).
Or as a layman would put it: you would have been kind were you a different person; so your cruelty was determined by the kind of person you are. It is because of who you are (state) that what you did (free causal mechanism) was cruel (a particular state obtained by the operation of the causal mechanism).
This may make it clear what people mean when they say, "well, your brother wasn't cruel!" as if that mattered. Well: it does matter! It shows that the causal mechanisms we call "people acting in the world" are so wide open (free) that kindness is possible.
Thus we do have free will. We are free insofar as we are in motion: our possible behaviour is much greater than our actuality. And we are determined: we, by bad luck, aren't in a world where our better behaviours are realised.
It is a fundamental property of those particles above (A, B) that were they alone they wouldn't move. You cannot eliminate that property in favour of talk about what they happened to do in the actual world. In the actual world they are really freer than can ever be observed directly.
But it is trivial to observe this indirectly: we can pull A far away from B and see what happens (etc.).
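If it helps, the distinction can be put in code. A toy sketch of my own -- the 1D force rule and the names are made up for illustration, loosely following the A/B example above, not meant as physics: the update rule (the "mechanism") is defined over every configuration, but from any one "initial condition of the universe" only a single trajectory of states is ever realised.

    # Toy mechanism: unlike kinds attract, like kinds repel.
    def step(positions, kinds, dt=0.01):
        new_positions = []
        for i, (x, k) in enumerate(zip(positions, kinds)):
            force = 0.0
            for j, (x2, k2) in enumerate(zip(positions, kinds)):
                if i == j:
                    continue
                direction = 1.0 if x2 > x else -1.0
                sign = 1.0 if k != k2 else -1.0   # attract unlike, repel like
                force += sign * direction
            new_positions.append(x + dt * force)
        return new_positions

    def run(positions, kinds, steps=200):
        # The mechanism is the same function whatever configuration it is
        # handed; the trajectory of states is not.
        for _ in range(steps):
            positions = step(positions, kinds)
        return positions

    # One mechanism, different "initial conditions of the universe":
    print(run([0.0, 1.0, 2.0], ["A", "A", "B"]))
    print(run([0.0, 4.0, 5.0], ["A", "B", "B"]))
    print(run([0.0], ["A"]))   # a lone particle never moves

Same mechanism in every case; which states actually get visited depends entirely on the initial configuration, and the rule remains perfectly well defined for configurations the actual run never enters.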
Likewise when people say "You (qua causal agent) could have done otherwise, and you didn't, so you're guilty!" there is no error here at all.
You were guilty precisely because your actions were not accidental, were not indeterminate or random. Your actions were determined by your state. And we judge that state to be one of guilt for a crime: possessing some intent and means to kill, say.
It is in this way that determinism is required for free will, and required for a modal universe. If the motions of particles were indeterminate they wouldn't be causal.