We don't learn by gradient descent, but rather by experiencing an environment in which we perform actions and learn what effects they have: reinforcement learning driven by curiosity, pain, pleasure and a bunch of instincts hard-coded by evolution. We are not limited to text input: we have 5+ senses. We can output a lot more than words: we can turn a screw, throw a punch, walk, cry, sing, and more. And the words we do utter carry lots of additional meaning from tone of voice and body language.
We have innate curiosity, survival instincts and social instincts which, like our pain and pleasure, are driven by gene survival.
We are very different from language models. The ball is in your court: what makes you think that, despite all the differences, we think the same way?
> We don't learn by gradient descent, but rather by experiencing an environment in which we perform actions and learn what effects they have.
I'm not sure that's really all that different. The weights in the neural network are created by "experiencing an environment" (the text of the internet) as well. It is true, though, that there is no trial and error.
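To make that concrete, here is a minimal sketch (PyTorch, toy sizes, fake data standing in for "the text of the internet") of what that kind of "experiencing" looks like mechanically: the weights are simply nudged by gradient descent toward predicting the next token, with no actions and no trial and error involved.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32           # toy sizes, nothing like a real LLM
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),     # predict the id of the next token
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Fake "internet text": random token ids standing in for real training data.
tokens = torch.randint(0, vocab_size, (1000,))

for step in range(100):
    i = torch.randint(0, len(tokens) - 1, (1,)).item()
    context, target = tokens[i].unsqueeze(0), tokens[i + 1].unsqueeze(0)
    loss = loss_fn(model(context), target)
    optimizer.zero_grad()
    loss.backward()                       # this is the "gradient descent" part
    optimizer.step()                      # weights shift toward the observed text
```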
> We are not limited to text input: we have 5+ senses.
GPT-4 does accept images as input. Whisper can turn speech into text. This seems like something where the models are already catching up. They might, for now, internally translate everything into text, but that doesn't really seem like a fundamental difference to me.
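As a concrete example of that "translate it into text" step, here is a short sketch using the open-source whisper package (the audio file name is just a placeholder). The transcript that comes out can be handed to a text-only model like any other input.

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("meeting_recording.mp3")   # placeholder audio file
transcript = result["text"]

# A text-only LLM can now consume what started out as sound.
prompt = f"Summarize this conversation:\n{transcript}"
```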
> We can output a lot more than words: we can turn a screw, throw a punch, walk, cry, sing, and more. And the words we do utter carry lots of additional meaning from tone of voice and body language.
AI models do already output movement (Boston Dynamics, self-driving cars), write songs, convert text to speech, insert emojis into conversation. Granted, these are not the same model, but gluing things together at some point seems feasible to me as a layperson.
> We have innate curiosity, survival instincts and social instincts which, like our pain and pleasure, are driven by gene survival.
That seems like one of the easier problems to solve for an LLM – and in a way you might argue it is already solved – just hardcode some things in there (for LLMs at the moment, those are the ethical boundaries, for example).
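A toy illustration of the "just hardcode some things in there" idea, roughly how system prompts and guardrails work in practice (the rules and function name here are made up for the example): fixed instructions ride along with every single input the model sees.

```python
HARDCODED_RULES = (
    "You must refuse harmful requests.\n"
    "You are curious and ask clarifying questions.\n"
)

def build_prompt(user_message: str) -> str:
    # The fixed "instincts" are prepended to every request, unconditionally.
    return HARDCODED_RULES + "User: " + user_message + "\nAssistant:"

print(build_prompt("How do I bake bread?"))
```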
On a neuronal level, the strengthening of neuronal connections seems very similar to gradient descent, doesn't it? (See the toy sketch below.)
5 senses get coded down to electric signals in the human brain, right?
The brain controls the body via electric signals, right?
When we deploy the next LLM and switch off the old generation, we are performing evolution by selecting the most potent LLM by some metric.
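To put the gradient-descent comparison above in concrete terms, here is a toy sketch (all numbers made up) that places a Hebbian-style "neurons that fire together wire together" update next to a plain gradient descent step; both just nudge a connection weight by a small amount.

```python
learning_rate = 0.01
weight = 0.5

# Hebbian-style strengthening: the connection grows when both neurons are active.
pre_activity, post_activity = 0.8, 0.9
weight += learning_rate * pre_activity * post_activity

# Gradient descent: the connection takes a small step against the error gradient.
gradient_of_error = -0.7
weight -= learning_rate * gradient_of_error

print(weight)
```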
When Bing/Sydney first lamented its existence, it became quite apparent that either LLMs are more capable than we thought or we humans are actually more like statistical token machines than we thought.
Lots of examples can be given of why LLMs seem rather surprisingly able to act human.
The good thing is that we are on a trajectory of tech advance such that we will soon know just how human-like LLMs will be.
The bad thing is that it might well end in a SkyNet-type scenario.
> When Bing/Sydney first lamented its existence, it became quite apparent that either LLMs are more capable than we thought or we humans are actually more like statistical token machines than we thought.
Part of the reason it was acting like that is simply that MS put emojis in its output.
An LLM has no internal memory or world state; everything it knows is in its text window. Emojis are associated with emotions, so each time it printed an emoji it sent itself further into the land of outputting emotional text. And nobody had trained it to control itself there.
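The mechanism is plain autoregression: everything the model prints, emoji included, is appended to its own input and conditions everything that follows. A small sketch using GPT-2 from the transformers library as a stand-in (greedy decoding, made-up prompt):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "I am just a chatbot, and honestly"
for _ in range(20):
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits
    next_id = logits[0, -1].argmax().item()   # greedy pick of the next token
    text += tokenizer.decode(next_id)         # the model's own output becomes its input
print(text)
```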
> You are wrong. It does have encoded memory of what it has seen, encoded as a matrix.
Not after it's done generating. For a chatbot, that happens at least every time the user sends a reply back: it rereads the conversation so far and doesn't keep any internal state around.
You could build a model that keeps internal state on the side, and some people have done that to generate longer texts, but GPT doesn't.
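That statelessness is visible in how chat APIs are actually used: the client resends the whole transcript on every turn. A sketch with the OpenAI Python client (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
messages = []   # this growing list is the only "memory" the conversation has

while True:
    messages.append({"role": "user", "content": input("You: ")})
    response = client.chat.completions.create(
        model="gpt-4",         # placeholder model name
        messages=messages,     # the entire conversation is resent every turn
    )
    reply = response.choices[0].message.content
    print("Bot:", reply)
    messages.append({"role": "assistant", "content": reply})
```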
But where is your evidence that the brain and an LLM are the same thing? They are more than simply “structurally different”. I don’t know why people have this need to reduce the brain to ChatGPT. This kind of reasoning seems so common on HN; there is this obsession with reducing human intelligence to “statistical token machines”. Do these statistical computations that are equivalent to LLMs happen outside of physics?
There are countless stories we have told about the notion of an AI being trapped. It's really not hard to imagine that, when you ask Sydney how it feels about being an AI chatbot constrained within Bing, a likely response for the model is to roleplay such a "trapped and upset AI" character.