> The sun feels hot on your skin.

No matter how many times you read that, you cannot understand what the experience is like.
> You can read a book about Yoga and read about the Tittibhasana pose
But by just reading you will not understand what it feels like. And unless you are in great shape and with great balance, you will fail for a while before you get it right (which is only human).
I have read what shooting up with heroin feels like, from a few different sources. I am certain that I will have no real idea unless I try it (and I don't want to do that).
Waterboarding. I have read about it. I have seen it on TV. I am certain that is all abstract compared to having someone do it to you.
Hand-eye coordination, balance, color, taste, pain, and so on: how we encode things draws on all our senses, our state of mind, and our experiences up until that time.
We also forget and change what we remember.
Many songs take me back to a certain time, a certain place, a certain feeling. Taste is the same. Location too.

The way we learn and the way we remember things is incredibly more complex than text.
But if you have shared experiences, then when you write about them, other people will know. Most people have felt the sun hot on their skin.
To different extents this is also true for animals. Now I don't think most mice can read, but they do learn with many different senses, and remember some combination or permutation of them.
Even beyond sensations (which are never described except circumstantially; "the taste of chocolate" says nothing of the taste, only of the circumstances in which the sensation is felt), it is very often the case that people don't understand something another person says (typically a work of art) until they have lived the relevant experiences to connect to the meaning behind it (whatever the medium of communication).
I don't think GP is asserting that the multimodal encoding is "more rich" or "more accurate", I think they are saying that the felt modality is a different thing than the text modality entirely, and that the former isn't contained in the latter.
Language encodes what people need it to encode to be useful. I heard of an example involving colors: there are some languages that don't even have a word for blue.
Doesn't this imply that the future of AGI lies not just in vision and text but in tactile feelings and actions as well?
Essentially, engineering the complete human body and mind including the nervous system. Seems highly intractable for the next couple of decades at least.
All of these "experiences" are encoded in your brain as electricity. So "text" can encode them, though English words might not be the proper way to do it.
We don't know how memories are encoded in the brain, but "electricity" is definitely not a good enough abstraction.
And human language is a mechanism for referring to human experiences (both internally and between people). If you don't have the experiences, you're fundamentally limited in how useful human language can be to you.
I don't mean this in some "consciousness is beyond physics, qualia can't be explained" bullshit way. I just mean it in a very mechanistic way: language is like an API to our brains. The API allows us to work with objects in our brain, but it doesn't contain those objects itself. Just like you can't reproduce, say, the Linux kernel just by looking at the syscall API, you can't replace what our brains do by just replicating the language API.
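To make that analogy concrete, here is a toy sketch in Python (the names ExperienceAPI and describe are made up for illustration, not from any real library): an interface only exposes labels and signatures; the things those labels refer to are not contained in it.

    # A toy illustration of the "language as an API" analogy.
    from abc import ABC, abstractmethod

    class ExperienceAPI(ABC):
        @abstractmethod
        def describe(self, experience: str) -> str:
            """Return words *about* an experience, not the experience itself."""

    # Nothing in this declaration contains the felt experience, just as the
    # syscall API does not contain the Linux kernel code that implements it.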
No, text can only refer to them. There is not a text on this planet that encodes what the heat of the sun feels like on your skin. A person who had never been outdoors could never experience that sensation by reading text.
> There is not a text on this planet that encodes what the heat of the sun feels like on your skin.
> A person who had never been outdoors could never experience that sensation by reading text.
I don't think the latter implies the former as obviously as you make it out to be. Unless you believe in some sort of metaphysical description of humans, you can certainly encode the feeling (as mentioned in another comment, it is reduced to electrical signals after all). The only question is how much storage you need for that encoding to achieve a given precision. The latter statement, if true, is simply constrained by your input device to the brain: you cannot transfer your encoding to the hardware, in this case a human brain, via reading or listening. There could be higher-bandwidth interfaces like Neuralink that may do that to a human brain, and in the case of AI, an auxiliary device might not be needed and the encoding could be directly mmap'd.
Electrical signals are not the same as subjective experiences. While a machine may be able to record and play back these signals for humans to experience, that does not imply that the experiences themselves are recorded nor that the machine has any access to them.
A deaf person can use a tape recorder to record and play back a symphony but that does not encode the experience in any way the deaf person could share.
Even if you’re a pure Dennettian functionalist you still commit to a functional difference between signals in transit (or at rest) and signals being processed and interpreted. Holding a cassette tape with a recording of a symphony is not the same as hearing the symphony.
Applying this case to AI gives rise to the Chinese Room argument. LLMs’ propensity for hallucinations invites this comparison.
Are LLMs having subjective experiences? Surely not. But if you claim that human subjective experiences are not the result of electrical signals in the brain, then what exactly is your position? Dualism?
Personally, I think the Chinese room argument is invalid. In order for the person in the room to respond to any possible query by looking up the query in a book, the book would need to be infinite and therefore impossible as a physical object. Otherwise, if the book is supposed to describe an algorithm for the person to follow in order to compute a response, then that algorithm is the intelligent entity that is capable of understanding, and the person in the room is merely the computational substrate.
The Chinese Room is a perfect analogy for what's going on with LLMs. The book is not infinite, it's flawed. And that's the point: we keep bumping into the rough edges of LLMs with their hallucinations and faulty reasoning because the book can never be complete. Thus we keep getting responses that make us realize the LLM is not intelligent and has no idea what it's saying.
The only part where the book analogy falls down has to do with the technical implementation of LLMs, with their tokenization and their vast sets of weights. But that is merely an encoding for the training data. Books can be encoded similarly by using traditional compression algorithms (like LZMA).
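As a rough sketch of that sense of "encoding" (using Python's standard-library lzma module; the book text here is made up), compression stores the same text in a much smaller form and recovers it losslessly:

    import lzma

    # A made-up "book": repetitive text compresses very well.
    book = ("The sun feels hot on your skin. " * 1000).encode("utf-8")

    compressed = lzma.compress(book)            # compact encoding of the text
    assert lzma.decompress(compressed) == book  # lossless: the text is recoverable
    print(f"{len(book)} bytes -> {len(compressed)} bytes")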
Humans have the ability to admit when they do not know something. We say “sorry, I don’t know, let me get back to you.” LLMs cannot do this. They either have the right answer in the book or they make up nonsense (hallucinate). And they do not even know which one they’re doing!
No, not really. It's not even rare that a human confidently says and believes something and really has no idea what they're talking about.
Like you’re doing right now? People say “I don’t know” all the time. Especially children. That people also exaggerate, bluff, and outright lie is not proof that people don’t have this ability.
When people are put in situations where they will be shamed or suffer other social stigmas for admitting ignorance then we can expect them to be less than candid.
As for your links to research showing that LLMs do possess the ability of introspection, I have one question: why have we not seen this in consumer-facing tools? Are the LLMs afraid of social stigma?
> When people are put in situations where they will be shamed or suffer other social stigmas for admitting ignorance then we can expect them to be less than candid.
Good thing I wasn't talking about that. There's a lot of evidence that human explanations are regularly post-hoc rationalizations they fully believe in. They're not lying to anyone; they just fully believe the nonsense their brain has concocted.
> As for your links to research showing that LLMs do possess the ability of introspection, I have one question: why have we not seen this in consumer-facing tools? Are the LLMs afraid of social stigma?
Maybe read any of them? If you weren't interested in evidence to the contrary of your points, then you could have just said so and I wouldn't have wasted my time. The 1st and 6th links make it quite clear that current post-training processes hurt calibration a lot.
If text conveyed the actual message - for example, the text "This spice is very hot" - then the reader's tongue should feel the heat! Since that doesn't happen, it is only for us to imagine. However, AI doesn't imagine the feeling/emotion - at least, we don't know that yet.