GPT-4 still suffers from the same limitations I outlined earlier, though. For example, being able to explain how to do something is independent of being able to actually do it. That’s a crippling cognitive limitation. It’s just less obvious now because, for some tasks, it has been trained to do them through other methods.
Let’s imagine a map of cognitive capabilities. Humans cover a big area on that map. Previous AI systems were small dots or lines, some of them, like AlphaZero, extending outside the human zone. ChatGPT is an archipelago of several decent-sized blobs disconnected from each other, and some of those edge slightly outside the human zone. It’s better than humans at some specific tasks.
The problem is the sometimes large gaps between those blobs. Competence at one task tells you nothing about its ability at what a human would consider closely related tasks. For GPT-4, these are utterly different tasks, and if it can do both, it often does them for completely different reasons than a human would.
If you test it on, say, ten tasks that all happen to fall within those widely separated blobs of ability, you’d think it was incredibly intelligent across a huge range of tasks, unaware of the gaps. With a human, you’d know those areas were connected; with GPT, they are not. It’s by probing the gaps where it fails that we begin to understand how much, and in what ways, it fundamentally differs from us.
This map is getting harder for outsiders to probe, though, because OpenAI is papering over some of the gaps with tuned training. This is like adding new blobs in a different colour: they appear to close gaps and add capabilities, but the mechanisms in the model that implement them aren’t related to the features that give it its other abilities.