Why should odd failure modes invalidate the claim of reasoning or intelligence in LLMs? Humans also have odd failure modes, in some ways very similar to LLMs. Normal functioning humans make assumptions, lose track of context, or just outright get things wrong. And then there are people with rare neurological disorders like somatoparaphrenia, a disorder where people deny ownership of a limb and will confabulate wild explanations for it when prompted. Humans are prone to the very same kind of wild confabulation from impaired self-awareness that plagues LLMs.
Rather than a denial of intelligence, to me these failure modes raise the credence that LLMs are really onto something.
The dimension of this issue that never gets air time is that we've made having kids almost completely intentional. The richer a country becomes, the more intentional having kids becomes. The dynamic we see in rich countries is that as having kids becomes more intentional, there is also an increase in the reasons why people would choose to delay or forgo having kids.
I think you're spot on. And all of the various theories and analyses are pretty laughable if one has any sort of historical context.
- "People don't have kids because they're afraid of climate change" - Wildly overestimates the number of people who figure climate change into their life plans, and it discounts the numerous catastrophes people have feared and experience in the past while continuing to have high birth rates.
- "People don't have kids because everything is too expensive" - My father-in-law has 11 siblings and they grew up in a 2 bedroom, 1 bathroom home. His story is not unique.
"having kids is almost completely intentional"....in countries where this is the case due to birth control, abortion, feminism (and other cultural shifts), the birth rate plummets.
Delving into the reasons why people opt to have fewer or no children when given the choice, consistently across races, religions, cultural backgrounds, etc., would be a book-length endeavor, but to me it really is that simple. There are numerous reasons someone wouldn't want to have more children, and they tend to find one of them when given the choice.
Yes, this absolutely appears to be the main reason. Both in practical terms through birth control, and in cultural terms, in that it's now seen as a choice rather than as an obvious thing you do. To change this course, we probably need to change the culture first so that a birth control ban will be supported. That's currently not looking likely, so population collapse it is.
An LLM has an internal linguistic model (i.e. it knows token patterns), and that linguistic model models humans' linguistic models (a stream of tokens) of their actual world models (which involve far, far more than linguistics and tokens, such as logical relations beyond mere semantic relations, sensory representations like imagery and sounds, and, yes, words and concepts).
So LLMs are linguistic (token pattern) models of linguistic models (streams of tokens) describing world models (more than tokens).
It thus does not in fact follow that LLMs model the world (as they are missing everything that is not encoded in linguistic semantics).
At this point, anyone claiming that LLMs are "just" language models isn't arguing in good faith. LLMs are a general purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Tokens can represent anything, not just words. Roughly the same architecture can generate passable images, music, or even video.
No, it's extremely silly to use the incidental name of a thing as an argument for the limits of its relevance. LLMs were designed to model language, but that does not determine the range of their applicability, or even the class of problems they are most suited for. It turns out that LLMs are a general computing architecture. What they were originally designed for is incidental. Any argument that starts off "but they are language models" is specious out of the gate.
Sorry, but using "LLM" when you mean "AI" is a basic failure to understand simple definitions, and also is ignoring the meat of the blog post and much of the discussion here (which is that LLMs are limited by virtue of being only / mostly trained on language).
Everything you are saying is either incoherent because you actually mean "AI" or "transformer", or is just plain wrong, since not all problems can be solved using, e.g., single-channel, recursively-applied transformers, as I mention elsewhere here: https://news.ycombinator.com/item?id=46948612. The design of LLMs absolutely determines the range of their applicability and the class of problems they are most suited for. This isn't even a controversial take; lots of influencers and certainly most serious researchers recognize the fundamental limitations of the LLM approach to AI.
You literally have no idea what you are talking about and clearly do not read or understand any actual papers where these models are developed, and are just repeating simplistic metaphors from blog posts, and buying into marketing.
In this case this is not so. The primary model is not a model at all, and the surrogate has bias added to it. It's also missing any way to actually check the internal consistency of statements or otherwise combine information from its corpus, so it fails as a world model.
Compiler output can be inconsistent and still correct. For any source code there is an infinite number of machine code sequences that maintain the semantic constraints of the source code. Correctness is defined semantically, not by consistency.
There's almost a good point here, but you're misusing concepts that obfuscate the point you're trying to make. Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic. Inference produces scores for every word in their vocabulary. This score map is then sampled from according to the temperature to produce the next token. But this non-determinism is artificially injected.
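To make this concrete, here's a toy sketch (purely illustrative, not any particular implementation): for a given input, the forward pass yields a fixed vector of scores, and the only place randomness enters is the sampling draw, which disappears entirely at temperature 0.

    import numpy as np

    def sample_next_token(logits, temperature, rng):
        # logits: fixed scores produced by the (deterministic) forward pass
        if temperature == 0:
            return int(np.argmax(logits))              # greedy: fully repeatable
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))   # randomness injected here

    logits = np.array([2.0, 1.0, 0.5, -1.0])           # same input -> same logits
    rng = np.random.default_rng(42)
    print([sample_next_token(logits, 0.8, rng) for _ in range(5)])  # varies with the seed
    print(sample_next_token(logits, 0.0, rng))          # always the argmax

The model itself only ever produces the logits; everything after that is a sampling policy bolted on top.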
But the determinism/non-determinism axis isn't the core issue here. The issue is that they are trained by gradient descent, which produces instability/unpredictability in their output. I can give it a set of rules and a broad collection of examples in its context window. How often it will correctly apply the supplied rules to the input stream is entirely unpredictable. LLMs are fundamentally unpredictable as a computing paradigm. LLMs' training process is stochastic, though I hesitate to call them "fundamentally stochastic".
> Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic.
You cannot formally verify prose or the text that LLMs generate when comparing them to what a compiler does. So even in this sense that is completely false.
No one can guarantee that the outputs will conform 100% to the instructions you are giving the LLM, which is why you do not trust it. As long as it is made up of artificial neurons that predict the next token, it is fundamentally a stochastic model and unpredictable.
One can maliciously craft an input to mess up the network to get the LLM to produce a different output or outright garbage.
Compilers have reproducible builds and formal verification of their functionality. No such thing exists for LLMs. Thus, comparing LLMs to a compiler and suggesting that LLMs are 'fundamentally deterministic', or are even more than a compiler, is completely absurd.
You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic. Your points are largely correct but you're not using the right words which just obfuscates your meaning.
Nope. You have not shown how a large scale collection of neural networks irrespective of their architecture is more deterministic when compared to a 'compiler'; you are only repeating the known misconception that tweaking the temperature to 0 brings the determinism you claim it brings with LLMs [0] [1] [2], otherwise you would not have this problem in the first place.
Even by doing that, the resulting outputs are useless anyway. So this really does not help your point at all. Therefore:
> You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic.
There is nothing deterministic or predictable about an LLM even when you compare it to a compiler, unless you can guarantee that the individual neurons through inference give a predictable output, which would make it useful enough to be a drop-in compiler replacement.
> You have not shown how a large scale collection of neural networks irrespective of their architecture is more deterministic
It's software. Without an external randomness source, it's 100% deterministic, excluding impacts of hardware glitches. This...isn't debatable. You can make it seem non-deterministic by concealing inputs (e.g., when batching multiple requests, any given request is “nondeterministic” when viewed in isolation in many frameworks because batches use shared state and aren't isolated), but even then it is still deterministic; you are just choosing to look at an incomplete set of the inputs that determine the output.
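For what it's worth, the batching point can be shown with a tiny numpy sketch (my own illustration, not from any of the linked papers): floating-point addition isn't associative, so changing how requests are grouped changes the reduction order and hence the last bits of the result, even though each run is a pure function of all of its inputs.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000).astype(np.float32)

    a = x.sum()                                   # one reduction order
    b = x.reshape(1000, 100).sum(axis=0).sum()    # another reduction order
    print(a, b, a == b)                           # can differ in the last bits

    # Re-running either line reproduces its own result exactly: the computation
    # is deterministic, but the answer depends on the grouping (the "hidden input").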
> It's software. Without an external randomness source, it's 100% deterministic, excluding impacts of hardware glitches. This...isn't debatable.
I don't think anyone would even go as far as to call all deep neural networks, which are indeed large-scale collections of neural networks, "100% deterministic", regardless of their architecture. Neither you nor I can transparently explain why a model sometimes works and sometimes doesn't, especially with arbitrary inputs (and adversarial inputs can really mess up the model).
But first of all, the entire sentence that you should be quoting for complete context is:
>> You have not shown how a large scale collection of neural networks irrespective of their architecture is more deterministic when compared to a 'compiler'; you are only repeating the known misconception that tweaking the temperature to 0 brings the determinism you claim it brings with LLMs [0] [1] [2], otherwise you would not have this problem in the first place.
So given this "100% determinism" you just said, surely that means that LLMs can replace a traditional compiler which needs this said determinism, since that LLMs are so useful for such a use case in production?
Then, as we test this in practice, all of this quickly falls back to my secondary point:
>> Even by doing that, the resulting outputs are useless anyway. So this really does not help your point at all.
Again, there is just no point in repeating such myths from AI boosters that deep neural networks like LLMs are '100% deterministic', even with temp=0 tweaks, in any practical sense.
Yes, there are some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point. The core algorithm is deterministic. And you're still conflating deterministic and predictable. It's strange to have such disregard for the meaning of words and their correct usage.
> Yes, there are some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point.
It is directly relevant and supports my whole point, which debunks your assertion that LLMs are ‘deterministic’. That determinism doesn't exist in a fundamental sense, since you can't guarantee that the behaviour, or even the outputs, will be the same.
> The core algorithm is deterministic. And you're still conflating deterministic and predictable.
The entire LLM is still non-deterministic, and it is still considered to be unpredictable even if you take that into account.
> It's strange to have such disregard for the meaning of words and their correct usage.
Nope. Not only have you shown absolutely zero sources at all to prove the deterministic nature of LLMs to where they can function as a “compiler”, you ultimately conceded by agreeing with the linked paper(s) recognising that LLMs still do not have deterministic or predictable properties at all, even if you tweak the temp, parameters, etc.
Therefore, once again, LLMs are NOT compilers, as even feeding them adversarial inputs can mess up the entire network to the point of becoming useless.
>Not only have you shown absolutely zero sources at all to prove the deterministic nature of LLMs to where they can function as a “compiler”
Note that I never defended using LLMs as a compiler. In fact I argued it would be inappropriate. I simply disagreed that the reason is that they are non-deterministic. If you weren't conflating the meaning of deterministic and predictable, you wouldn't keep misreading me.
> As a consequence, the model is no longer deterministic at the sequence-level, but only at the batch-level
therefore they are deterministic when the batch size is 1
Your second source lists a large number of ways to make LLMs deterministic. The title of your third source is "Defeating Nondeterminism in LLM Inference", which also means that they can be made deterministic.
Every single one of your sources proves you wrong, so no more sources need to be cited.
> therefore they are deterministic when the batch size is 1
This is like saying: "C++ is 'safe' when you turn off all the default features and if you know what you are doing", but up to the point where it becomes absolutely useless, and it's still not safe.
The language is still fundamentally memory unsafe, just like how LLMs are fundamentally deep neural networks, which come with downsides such as being unpredictable black boxes that carry lots of non-determinism in their outputs.
> Your second source lists a large number of ways to make LLMs deterministic. The title of your third source is "Defeating Nondeterminism in LLM Inference", which also means that they can be made deterministic.
That is the point: "when", "can be made", "ways to make LLMs deterministic".
It just tells you something that both papers recognise the problem of non-determinism in LLMs, which makes my whole point even more valid and is why I linked those papers.
Those papers have highlighted the fundamental nature of these LLMs right at the start of the paper.
> Every single one of your sources proves you wrong, so no more sources need to be cited.
It doesn't have to be a reference to dualism. We can draw a distinction between specific patterns of brain activity and the body that realizes it. "I" exist only when the characteristic property of neural activity that realizes the self is present. I am the realization of this second-order property. Here the "soul" is this specific pattern of dynamics realized by my body's neurons.
LLMs are extrapolation machines. They have some amount of hardcoded knowledge, and they weave a narrative around this knowledge base while extrapolating claims that are likely given the memorized training data. This extrapolation can be in the form of logical entailment, high-probability guesses, or just wild guessing. The training regime doesn't distinguish between different kinds of prediction, so it never learns to heavily weigh logical entailment and suppress wild guessing. It turns out that much of the text we produce is highly amenable to extrapolation, so LLMs learn to be highly effective at bullshitting.
LLMs are a general purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.
The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs. But the specific parameters discovered are what determine the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequence. LLMs are not just matmuls and logits.
Notice that the Rule 110 string picks out a machine, it is not itself the machine. To get computation out of it, you have to actually do computational work, i.e. read the current state and perform operations to generate the subsequent state. This doesn't just automatically happen in some non-physical realm once the string is put to paper.
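A minimal sketch of that work (my own illustration, not from the thread): to get anything out of the Rule 110 specification, something has to repeatedly read the current row and compute the next one.

    RULE = 110  # binary 01101110: maps each 3-cell neighborhood to the next cell value

    def step(cells):
        # The "computational work": read the current state, produce the next state.
        out = []
        for i in range(len(cells)):
            left = cells[i - 1] if i > 0 else 0
            right = cells[i + 1] if i < len(cells) - 1 else 0
            neighborhood = (left << 2) | (cells[i] << 1) | right
            out.append((RULE >> neighborhood) & 1)
        return out

    row = [0] * 30 + [1]            # initial tape: a single live cell at the right edge
    for _ in range(15):
        print("".join(".#"[c] for c in row))
        row = step(row)

The rule number by itself sits inert; the loop above is where the computation actually happens.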