I think it’s undeniable that LLMs encode knowledge, but the way they do so and what their answers imply, compared to what the same answer from a human would imply, are completely different.
For example, if a human explains the process for solving a mathematical problem, we know that person knows how to solve that problem. That’s not necessarily true of an LLM. LLMs can give such explanations because they have been trained on many texts explaining those procedures, and can therefore generate texts of that form. However, texts containing an actual mathematical problem and the workings for solving it are a completely different class of text for an LLM. The probabilistic token weightings learned for the explanatory maths texts don’t help at all.
So yes, these are fascinating, knowledgeable and even in some ways very intelligent systems. However it is a radically different form of intelligence from ours, in ways we find difficult to reason about.
Well it's like birds and airplanes. Do airplanes "fly" in the same sense that birds do? Of course not: birds flap their wings, and airplanes need to be built, fueled and flown by humans. You could argue that the way birds fly is "more natural" or superior in some ways, but I've yet to see a bird fly Mach 3.
If you replace the analogy with humans and LLMs, LLMs won't ever reason or understand things in the same way we do, but if/when their output gets much smarter than us across the board, will it really matter?
I think the issue is there are good reasons to think LLMs architected and trained the way they are now can never approach human reasoning capability. That’s because the corpus of human written material is simply grossly inadequate to communicate or encode the knowledge necessary for that.
Our written material assumes huge swathes of contextual knowledge, real world experience, and human lived experience that LLMs don’t and can’t have. At least architected and trained as they are now.
That’s on top of the crippling inability LLMs have to generalise from being able to generate a description of how to do a task to being able to actually perform it. Plus there are many other similar limitations that would be inexplicable if displayed by a human.
Of course LLMs aren’t the final word in AI development. I think they’re a vitally important step towards general AI, and we’ll get there eventually as we develop ever more capable architectures.
> LLMs architected and trained the way they are now can never approach human reasoning capability
Not sure if you’ve played with GPT-4, but honestly it’s getting there. On the bar exam, ChatGPT scored in the bottom 10% of test takers, while GPT-4 scores in the top 10% (around the 90th percentile).
It obviously isn’t the ultimate test of reasoning/intelligence, but I think we would agree that a human who scores in the top 10% is likely to be pretty smart.
> Of course LLMs aren’t the final word in AI development
Couldn’t agree more. AGI will come from plugging a few of these systems together.
GPT-4 still suffers from the same limitations I outlined earlier though. For example, being able to explain how to do something is independent of being able to actually do it. That’s a crippling cognitive limitation. It’s just not as obvious, because for some tasks it has been trained to do them through other methods.
Let’s imagine a map of cognitive capabilities. Humans cover a big area on that map. Previous AI systems were small dots or lines on it, some of them, like AlphaZero, extending outside the human zone. ChatGPT is an archipelago of several decent-sized blobs disconnected from each other, and some of those edge slightly outside the human zone. It’s better at some specific tasks than humans are.
The problem is the sometimes large gaps between some of the blobs. Capability at some tasks tells you nothing about its ability at what a human would think of as closely related tasks. Even for GPT-4 these are utterly different tasks, and if it can do them both, it often does so for completely different reasons than a human does.
If you test it on, say, 10 tasks that all happen to fall within its capabilities, within those widely separated blobs of ability, you’d think it was incredibly intelligent across a huge range of tasks, unaware of the gaps. With a human you’d know those areas would be connected, but with GPT they are not. It’s by probing the gaps where it fails that we begin to understand how much, and in what ways, it fundamentally differs from us.
This map is getting harder for outsiders to probe, though, because OpenAI is papering over some gaps with tuned training. This is like adding new blobs in a different colour: they appear to close some gaps and add new capabilities, but the mechanisms in the model that implement them aren’t related to the features that give the model its other abilities.
Yes I know, as I said they are very knowledgeable and in some ways very intelligent. We just need to bear in mind that their processing architecture is radically different from ours. This makes our intuitions about their abilities highly error prone.
Absolutely. The shoggoth metaphor is extremely apt here.
What I was specifically responding to is the claim that they can only solve certain kinds of problems because those kinds of problems (and their solutions) were in the training set. By now there's plenty of counter-examples of unique problems that are nevertheless solved. At which point I think we do have to call it "understanding" and "reasoning", even as we acknowledge that it is a very alien form of understanding and reasoning that we just barely managed to squeeze into something that kinda sorta feels humanish.
simonh says >"We just need to bear in mind their processing architecture is radically different from ours."<
The hardware architectures are certainly different but there is a possibility that at least parts of the "software" architectures may be remarkably similar.