The rumors were that OpenAI's deal with MS would give them everything until they got to AGI... a perpetual license to all new development.
All the "safety people" have left the OpenAI building. Even Musk isn't talking about safety any more.
I think the bet was that if you fed an LLM enough and got it big enough, it would hit a tipping point and become AGI, or sentient, or sapient. That lines up nicely with the MS terms, and with MS's own paper.
I think they figured out that the math doesn't work that way (and never was going to). Better prediction of the next token isn't intelligence any more than better weather prediction will become weather.
The "next token" thing is literally true, but it might turn out to be a red herring, because emergence is a real phenomenon. Like how with enough NAND-gates daisy-chained together you can build any logic function you like.
Gradually, as these LLM next-token predictors are set up recursively, constructively, dynamically, and with the right inputs and feedback loops, the limitations of the fundamental building blocks become less important. Might take a long time, though.
> Like how with enough NAND-gates daisy-chained together you can build any logic function you like.
The version of emergence that AI hypists cling to isn't real, though, in the same way that adding more NAND gates won't magically produce the logic function you're thinking of. How you add the NAND gates matters, to such a degree that people who know what they're doing don't even think about the NAND gates.
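To make the NAND point concrete, here's a toy Python sketch (purely illustrative, nothing to do with LLMs): NAND is functionally complete, but which function you end up with is decided entirely by how the gates are wired.

    # NAND is functionally complete: any Boolean function can be wired up from it.
    def nand(a: bool, b: bool) -> bool:
        return not (a and b)

    # But the wiring is what decides which function you get:
    def not_(a):    return nand(a, a)
    def and_(a, b): return not_(nand(a, b))
    def or_(a, b):  return nand(not_(a), not_(b))
    def xor(a, b):  return and_(or_(a, b), nand(a, b))

    # Same gate everywhere; only the arrangement makes this XOR rather than something else.
    for a in (False, True):
        for b in (False, True):
            assert xor(a, b) == (a != b)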
But isn't that what the training algorithm does? (Genuinely asking since I'm not very familiar with this.) I thought it tries anything, including wrong things, as it gradually finds better results from the right things.
Better results, yes, but that doesn't mean good results. It can only find local optima in a predetermined state space. Training a neural network involves (1) finding the right state space, and (2) choosing a suitable gradient function. If the Correct Solution isn't in the state space, or isn't reachable via gradual improvement, the neural network will never find it.
An algorithm that can reason about the meaning of text probably isn't in the state space of GPT. Thanks to the https://en.wikipedia.org/wiki/Universal_approximation_theore..., we can get something that looks pretty close when interpolating, but that doesn't mean it can extrapolate sensibly. (See https://xkcd.com/2048/, bottom right.) As they say, neural networks "want" to work, but that doesn't mean they can.
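A throwaway sketch of that interpolation-vs-extrapolation point, fitting a small net to a sine curve with scikit-learn (the whole setup is invented for illustration):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Train only on x in [-pi, pi]; the net is a universal approximator *inside* this range.
    rng = np.random.default_rng(0)
    x_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))
    y_train = np.sin(x_train).ravel()

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
    net.fit(x_train, y_train)

    # Interpolation: errors inside the training range stay small.
    x_in = np.linspace(-3, 3, 7).reshape(-1, 1)
    print(abs(net.predict(x_in) - np.sin(x_in).ravel()).max())

    # Extrapolation: a couple of periods out, the ReLU net just continues linearly
    # and the "approximation" falls apart.
    x_out = np.linspace(2 * np.pi, 3 * np.pi, 7).reshape(-1, 1)
    print(abs(net.predict(x_out) - np.sin(x_out).ravel()).max())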
That's the hard part of machine learning. Your average algorithm will fail obviously if you've implemented it wrong. A neural network will just not perform as well as you expect it to (a problem that usually goes away if you stir it enough: https://xkcd.com/1838/), without a nice failure mode that points you at the problem. For example, Evan Miller reckons that there's an off-by-one error in everyone's transformers: https://www.evanmiller.org/attention-is-off-by-one.html
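For reference, the fix that post proposes is just adding 1 to the softmax denominator so an attention head is allowed to attend to nothing; a rough toy rendering of the idea (my own sketch, not anyone's production code):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # "Softmax one": exp(x_i) / (1 + sum_j exp(x_j)), written in a numerically stable way.
    def softmax_one(x):
        m = max(x.max(), 0.0)          # the implicit extra logit is 0
        e = np.exp(x - m)
        return e / (np.exp(-m) + e.sum())

    scores = np.array([-4.0, -5.0, -3.0])    # a head with nothing relevant to attend to
    print(softmax(scores).sum())             # 1.0 -- forced to attend to *something*
    print(softmax_one(scores).sum())         # ~0.07 -- allowed to say "nothing here"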
If you add enough redundant dimensions, the global optimum of a real-world gradient function seems to become the local optimum (most of the time), so it's often useful to train a larger model than you theoretically need, then produce a smaller model from that.
> But isn't that what the training algorithm does?
It's true that training and other methods can iteratively trend towards a particular function/result. But in this case the training is on next token prediction which is not the same as training on non-verbal abstract problem solving (for example).
There are many things humans do that are very different from next token prediction, and those things we do all combine together to produce human level intelligence.
> There are many things humans do that are very different from next token prediction, and those things we do all combine together to produce human level intelligence.
Exactly
LLMs didn't resolve knowledge representation problems. We still don't know how it works in our brains, but at least we know that we may do internal symbolic knowledge representation and reasoning; LLMs don't. We need a different kind of math for ANNs, a new convolution but for text, where layers extract features through lexical analysis and ontology utilisation, and then we train the network.
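Purely as a hand-wavy sketch of that idea (the tiny "ontology" and every name below are invented; this is nowhere near a real architecture), a layer doing lexical analysis plus ontology lookup before any training might look like:

    # Toy "ontology layer": lexical analysis maps surface tokens to hand-written concepts,
    # and the resulting bag-of-concepts is what a downstream network would be trained on.
    TOY_ONTOLOGY = {
        "mouse": "animal", "dog": "animal", "strawberry": "plant",
        "library": "place", "book": "artifact",
    }

    def concept_features(text: str) -> dict[str, int]:
        counts: dict[str, int] = {}
        for token in text.lower().split():
            concept = TOY_ONTOLOGY.get(token.strip(".,"))
            if concept is not None:
                counts[concept] = counts.get(concept, 0) + 1
        return counts

    print(concept_features("A mouse and a dog in a library."))
    # {'animal': 2, 'place': 1}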
This presupposes that conscious, self-directed intelligence is at all what you're thinking it is, which it might not be (probably isn't). Given that, perhaps no amount of predictors in any arrangement or with any amount of dynamism will ever create an emergent phenomenon of real intelligence.
You say emergence is a real thing, and it is, but we have not one single example of it taking the form of sentience in any human-created thing of complexity.
When my friends talked about how AGI was just a matter of creating a huge enough neural network & feeding it enough data, I always compared it to this: imagine locking a mouse in a library with all the knowledge in the world & expecting it to come out super intelligent.
The mouse would go mad, because libraries preserve more than just knowledge; they preserve the evolution of it. That evolution is ongoing as we discover more about ourselves and the world we live in, refine our knowledge, disprove old assumptions and theories and, on occasion, admit that we were wrong to dismiss them. Over time we also place different levels of importance on knowledge from the past. For example, an old alchemy manual from the Middle Ages that recorded recipes for a cure for some nasty disease was once important because it helped whoever had access to it quickly prepare an ointment that sometimes worked, but today we know that most of those recipes were random, non-scientific attempts at solving a medical problem, and we have proven that those medicines do not work. The book's importance as a source of scientific truth has therefore gone to zero, while its historic importance has grown a lot, because it helps us understand how our knowledge of chemistry and its applications in health care has evolved.

LLMs treat all text as equal unless they are given hints. But those hints are provided by humans, so there is an inherent bias, and the best we can hope for is that those hints are correct at the time of training. We are not pursuing AGI; we are pursuing the goal of automating the creation of answers that look like the right answers to a given question, without much attention to factual, logical, or contextual correctness.
No. The mouse would just be a mouse. It wouldn't learn anything, because it's a mouse. It might chew on some of the books. Meanwhile, transformers do learn things, so there is obviously more to it than just the quantity of data.
(Why spend a mouse? Just sit a strawberry in a library, and if the hypothesis that the quantity of data is the only thing that matters holds, you'll have a super-intelligent strawberry.)
That's the question though, do they? One way of looking at gen AI is as highly efficient compression and search. WinRAR doesn't learn, and neither does Google, regardless of the volume of input data. Just because the process of feeding more data into gen AI is named "learning" doesn't mean that it's the same process our brains undergo.
I've yet to see a mouse write even mediocre python, let alone a rap song about life in ancient Athens written in Latin.
Don't get me wrong, organic brains learn from far fewer examples than AI, there's a lot organic brains can do that AI don't (yet), but I don't really find the intellectual capacity of mice to be particularly interesting.
On the other hand, the question of if mice have qualia, that is something I find interesting.
> but I don't really find the intellectual capacity of mice to be particularly interesting.
But you should find their self-direction capacity incredible and their ability to instinctively behave in ways that help them survive and propagate themselves. There isn't a machine or algorithm on earth that can do the same, much less with the same minuscule energy resources that a mouse's brain and nervous system use to achieve all of that.
This isn't to even mention the vast cellular complexity that lets the mouse physically act on all these instructions from its brain and nervous system and continue to do so while self-recharging for up to 3 years and fighting off tiny, lethal external invaders 24/7, among other things it does to stay alive.
> But you should find their self-direction capacity incredible
No, why would I?
Depending on what you mean by self-direction, that's either an evolved trait (with evolution rather than the mouse itself as the intelligence) for the bigger picture what-even-is-good, or it's fairly easy to replicate even for a much simpler AI.
The hard part has been getting them to be able to distinguish between different images, not this kind of thing.
> and their ability to instinctively behave in ways that help them survive and propagate themselves. There isn't a machine or algorithm on earth that can do the same,
> much less with the same minuscule energy resources that a mouse's brain and nervous system use to achieve all of that.
It's nice, but again, this is mixing up the intelligence of the animal with the intelligence of the evolutionary process which created that instance.
I as a human have no knowledge of the evolutionary process which lets me enjoy the flavour of coriander, and my understanding of the Krebs cycle is "something about vitamin C?" rather than anything functional, and while my body knows these things it is unconventionable to claim that my body knowing it means that I know it.
I think you're completely missing the wider picture in your insistence on drawing an equivalence between the mouse and any modern AI, LLM or machine learning system.
The evolutionary processes behind the mouse being capable of all that stretch from the long distant past up to the present, and their results are manifest in the physiology and cognitive abilities (such as they are) of the mouse. But that means these abilities, conscious, instinctive and evolutionary, exist only in the physical body of that mouse and nowhere else. No man-made algorithm or machine is capable of anything remotely comparable, and its capacity for navigating the world is nowhere near as good. Once again, this especially applies when you consider that the mouse does all it does using absurdly tiny energy resources, far below what any LLM would need for anything similar.
Evolution is an abstract concept, and abstract concepts cannot be “intelligent” (whatever that means). This is like saying that gravity or thermodynamics are “intelligent”.
It doesn't say specifically, but I think these lasted more than a day, assuming you'll accept random predator species as a sufficient proof-of-concept substitute for mice, which have to do many similar things but smaller:
Still passes the "a machine that would survive a single day" test, and given that machines run off electricity and we already have PV, food isn't a big deal here.
> I've yet to see a mouse write even mediocre python, let alone a rap song about life in ancient Athens written in Latin.
Isn't this distinction more about "language" than "intelligence"? There are some fantastically intelligent animals, but none of them can do the tasks you mention because they're not built to process human languages.
Prior to LLMs, language was what "proved" humans were "more intelligent" than animals.
But this is beside the point; I have no doubt that if one were to make a mouse immortal and give it 50,000 years of experience reading the internet via a tokeniser that turned it into sensory nerve stimulation, with rewards depending on how well it guessed the response, it would probably get this good sooner, simply because organic minds seem to be better at learning than AI.
But mice aren't immortal and nobody's actually given one that kind of experience, whereas we can do that for machines.
Machines can do this because they can (in some senses but not all) compensate for the sample-inefficiency by being so much faster than organic synapses.
I agree with the general sentiment but want to add: Dogs certainly process human language very well. From anecdotal experience of our dogs:
In terms of spoken language they are limited, but they surprise me all the time with terms they have picked up over the years. They can definitely associate a lot of words correctly (if it interests them) that we didn't train them with at all, just by mere observation.
An LLM associates bytes with other bytes very well. But it has no notion of emotion, real-world actions and reactions, and so on in relation to those words.
A thing that dogs are often way better at than even humans is reading body language and communicating through body language. They are hyper-aware of the smallest changes in posture, movement and so on. And they are extremely good at communicating intent or manipulating (in a neutral sense) others with their body language.
This is a huge, complex topic that I don't think we really fully understand, in part because every dog also has individual character traits that influence their way of communicating very much.
Here's an example of how complex their communication is. Just from yesterday:
One of our dogs is for some reason afraid of wind. I've observed how she gets spooked by sudden movements (for example curtains at an open window).
Yesterday it was windy and we went outside (off leash in our yard). She was wary, showed subtle fear and hesitated to move around much. The other dog saw that and then calmly got closer to her, posturing in the same direction she seemed to want to go. He took very small steps forward, waited a bit, let her catch up, and then she let go of the fear and went sniffing around.
This all happened in a very short amount of time, a few seconds; there is a lot more to the communication that would be difficult and wordy to explain. But since I became more aware of these tiny movements (from head to tail!) I started noticing more and more extremely subtle cues of communication, which can't even be processed in isolation but typically require the full context of all movements, the pacing and so on.
Now think about what the above example all entails. What these dogs have to process, know and feel. The specificity of it, the motivations behind it. How quickly they do that and how subtle their ways of communications are.
Body language is a large part of _human_ language as well. More often than not it gives a lot of context to what we speak or write. How often are statements misunderstood because they are only consumed as text? The tone, rhythm and general body language can make all the difference.
To be fair, it’s the horsepower of a mouse, but all devoted to a single task, so not 100% comparable to the capabilities of a mouse, and language is too distributed to make a good comparison of what milestone is human-like. But it’s indeed surprising how much that little bit of horsepower can do.
I think it depends on how you define intelligence, but _I_ mostly agree with François Chollet's stance that intelligence is the ability to find novel solutions and to adapt to new challenges. He feels that memorisation is an important facet, but that it is not enough for true intelligence, and that these systems excel at type 1 thinking but have huge gaps at type 2.
The alternative I'm considering is that it might just be a dataset problem: feeding these LLMs on words alone means they lack a huge facet of embodied existence that is needed to get context.
An LLM has to do an accurate simulation of someone critically evaluating their statement in order to predict the next word.
If an LLM can predict the next word without doing a critical evaluation, then it raises the question of what the intelligent people are doing. They might not be doing a critical evaluation at all.
> If an LLM can predict the next word without doing a critical evaluation, then it raises the question of what the intelligent people are doing
Well certainly: in the mind, ideas can be connected tentatively by affinity, and they become hypotheses of plausible ideas; but then, in the "intelligent" process, they are evaluated to see whether they are sound (truthful, useful, productive etc.) or not.
Intelligent people perform critical evaluation; others just embrace immature ideas passing by. Some "think about it", some don't (they may be deficient in will or resources: lack of time, of instruments, of discipline etc.).
The poster wrote that prediction of the next token seems like intelligence to him. The reply was that consideration of the content is required. You are now stating that it is not proven that it will not happen. But the point was that prediction of the next token is not the intelligence sought, and if and when the intelligence sought does happen, that will be a new stage; the current stage we do not call intelligence.
I have some experience with LLMs, and they definitely do consider the question. They even seem to do simple logical inference.
They are not _good_ at it right now, and they are totally bad at making generalizations. But who says it's not just an artifact of the limited context?
Look at the google engineer who thought they had an AI locked up in the basement... https://www.theverge.com/2022/6/13/23165535/google-suspends-...
MS paper on sparks of AGI: https://www.microsoft.com/en-us/research/publication/sparks-...