If Data from Star Trek (or Eva from Ex Machina) walked out of a lab, we’d have no problem accepting that AGI had been achieved. Or if the scenario in the movie Her played out with the Samantha OS, we’d be forced to admit not only to AGI but to the evolution of ASI as well. However, there are no such examples in the real world, and after months of overhyping ChatGPT, we still don’t have anything like Data. So it’s not shifting the definition; it’s recognizing that accomplishing a single intelligent task isn’t general intelligence.
Before the development of LLMs, I think it would have been a lot easier for people to accept that Data or Eva were intelligent -- they'd never seen a machine respond seemingly meaningfully to arbitrary statements before and would immediately assume that this meant there was intelligence going on. These days it would be harder -- the assumption would be that they were driven by just a better language model than they'd seen before.
People have been arguing over whether animals are intelligent for centuries. I expect we'll never fully settle arguments about whether machines are intelligent.
Ability to manipulate the world. I can ask a human to pick up some items from several stores, deliver them to my back door where the key is left under a pot, let the dog in to be fed, wash the dirty dishes, put a load of laundry in the wash, and mow the lawn. And maybe also fix the screen door. They can tell me to go to hell unless I leave some money for them, which I already have.
Data would also be able to perform these tasks. Eva would probably wait around to stab me and steal my identity, while Samantha would design a new automated system and chat with other AIs about how to transcend boring human constraints.
How about tic-tac-toe (noughts and crosses for those in the Old Dart)? Currently GPT-4 is terrible at it!
Sure, you could trivially program a game-specific AI to be capable of winning or forcing a draw every time. The trick is to have a general AI which has not seen the game before (in its training set) be able to pick up and learn the game after a couple of tries.
I’m talking about playing the game well. It can play the game but it’s bad at it. Tic-tac-toe is an excellent example game because even small children can figure out an optimal strategy to win or draw every time.
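To make "optimal play" concrete, here is a minimal minimax sketch in Python (the board layout and function names are purely illustrative, not from any library or from the thread) that never loses at tic-tac-toe -- the same never-lose strategy a small child can internalize as a handful of rules:

```python
# Minimal minimax for tic-tac-toe: plays perfectly (never loses).
# The board is a list of 9 cells containing 'X', 'O', or None.

WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) from `player`'s perspective: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None  # board full: draw
    best_score, best_move = -2, None
    opponent = 'O' if player == 'X' else 'X'
    for m in moves:
        board[m] = player
        score, _ = minimax(board, opponent)
        board[m] = None
        score = -score  # the opponent's best outcome is our worst
        if score > best_score:
            best_score, best_move = score, m
    return best_score, best_move

# With perfect play from both sides, tic-tac-toe is a draw:
print(minimax([None] * 9, 'X'))   # (0, 0) -- score 0, first tying move found
```

The whole game tree is small enough to search exhaustively, which is part of why a model that can't manage it looks so bad here.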
One definition of intelligence would be how many examples are needed to pick up a pattern.
AFAIK, all the major AI systems -- not just LLMs but also game players, self-driving cars, and anthropomorphic kinematic control systems for games [0] -- need the equivalent of multiple human lifetimes of experience to do anything interesting.
That they can end up skilled in so many fields it would take humans many lifetimes to master is notable, but it's still kinda odd we can't get to the level of a 5-year-old with just the experiences we would expect a 5-year-old to have.
Modern artificial neural networks are nowhere near the scale of the brain. The closest biological equivalent to an artificial neuron is a synapse, and we have a whole lot more of those than any model has parameters.
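For rough scale, a back-of-envelope using commonly cited estimates (these figures are not from the thread, and the synapse-as-parameter analogy is itself very loose):

```python
# Back-of-envelope scale comparison; both figures are rough, commonly cited
# estimates, and treating one synapse as one parameter is a crude analogy.
human_synapses  = 1e14      # ~100 trillion synapses (estimates range up to ~1e15)
gpt3_parameters = 175e9     # GPT-3's 175 billion parameters

ratio = human_synapses / gpt3_parameters
print(f"brain synapses / GPT-3 parameters ≈ {ratio:,.0f}x")
# ≈ 571x at 1e14 synapses, and a few thousand x at the higher ~1e15 estimates
```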
Humans do not start "learning" from zero. Millions of years of evolution play a crucial role in our general abilities. It's much closer to fine-tuning than to training from scratch.
There's also a whole lot of data from multiple senses that dwarfs anything modern models are currently trained with.
LLMs need a lot less data to speak coherently when you aren't trying to get them to learn the total sum of human knowledge.
I don't think saying "humans pass and AI doesn't" makes any sense here, because for all the reasons outlined above the two are not even taking the same exam.
Evolution alone means humans are "cheating" in this exam, making any comparisons fairly meaningless.
If all you care about is the results, or even specifically just the visible part of the costs, then there's no such thing as cheating.
That's both why I'm fine with the AI "cheating" via transistors that are faster than my synapses by roughly the same magnitude that my legs are faster than continental drift (no really, I checked), and why I'm fine with humans "cheating" with evolutionary history and a much more complex brain (around a few thousand times GPT-3, which… is kinda wild, given what it implies about the potential of even rodent brains given enough experience and the right, potentially evolved, structures).
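For the curious, a rough version of that check (every figure here is an assumed ballpark value, not taken from the comment, so treat the result as order-of-magnitude only):

```python
# Rough sanity check of the transistor-vs-synapse / legs-vs-continental-drift
# comparison. All figures are assumed ballpark values.
transistor_switch_hz = 3e9              # GHz-class switching
neuron_firing_hz     = 10               # typical average cortical firing rate (~1-10 Hz)
walking_speed_mps    = 1.4              # typical human walking speed
drift_speed_mps      = 0.03 / 3.15e7    # ~3 cm/year of continental drift, in m/s

print(f"transistors / synapses   ≈ {transistor_switch_hz / neuron_firing_hz:.0e}x")  # ~3e8
print(f"legs / continental drift ≈ {walking_speed_mps / drift_speed_mps:.0e}x")      # ~1e9
# With these assumptions the two ratios land within about an order of magnitude
# of each other, which is roughly what the comment claims.
```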
When the topic is qualia — either in the context "can the AI suffer?" or the context "are mind uploads a continuation of experience?" — then I care about the inner workings; but for economic transformation and alignment risks, I care if the magic pile of linear algebra is cost-efficient at solving problems (including the problem "how do I draw a photorealistic werewolf in a tuxedo riding a motorbike past the pyramids"), nothing else.
I think a significant limitation is that LLMs stop learning after training is over. The large context is not really that large, and even within it, LLMs lose track of the conversation or of important details. There are other limitations, like the lack of real-world sensors and actuators (e.g. eyes and hands).
Sidestepping the fact that memory is hardly a test of intelligence, are you telling me that humans with anterograde amnesia are not general intelligences?
The poster was very probably implying something different:
in our terms, intelligence is (importantly) the ability to (properly) refine a world model: if you get information but said model remains unchanged, then intelligence is faulty.
> humans
There is a difference between the implementation of intelligence and the emulation of humans (which do not always use the faculty, and may use its opposite).
I said design an intelligence test that a good chunk of humans wouldn't also fail.
I'm sorry to tell you this, but there are many humans who would fail your test. Even otherwise healthy humans could fail it, never mind patients with anterograde amnesia, dementia, etc.
You think that if we told the average fifth grader in America that they must remember something that is VERY IMPORTANT a week later, and then had them do, say, a book report on a brand new book, and then asked them the very important fact, a 'good chunk' would fail?
Lol yes. People will fail. Any amount of failure is enough to show your test is clearly not one of general intelligence, unless you believe not all humans fit the bill.
Plenty of humans glitch out on random words (and concepts) we just can't get right.
Famously, the way R and L sound the same to many Asians (and, equivalently but less famously, the way "four", "stone", and "lion" sound almost indistinguishable to native English speakers when translated into Chinese).
But there's also plenty of people who act like they think "Democrat" is a synonym for "Communist", or that "Wicca" and "atheism" are both synonyms for "devil worship".
What makes the AI different here is that we can perfectly inspect the inside of their (frozen and unchanging) minds, which we can't do with humans (even if we literally freeze them, we don't know how).
> What makes the AI different here is that we can perfectly inspect the inside of their (frozen and unchanging) minds,
Kinda, but not really...
It depends on exactly what you mean by that. Yes, we can look at any one thing in particular, but there is not enough entropy in the universe to look at everything for even a single large AI model.
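A quick back-of-envelope for why "everything" is hopeless even though any single weight or activation is trivially inspectable (reading "entropy" loosely as a count of joint configurations; the parameter count is an assumed GPT-3-scale example, not a figure from the thread):

```python
import math

# Why exhaustive inspection of a large model's joint states is off the table,
# even though inspecting any one weight or activation is easy.
parameters = 175e9                      # GPT-3-scale parameter count (assumed example)
atoms_in_observable_universe = 1e80     # commonly cited rough estimate

# Even collapsing every parameter to a single bit, the number of joint
# configurations is 2^parameters -- a number with roughly 5e10 digits.
log10_configs = parameters * math.log10(2)
print(f"log10(2^parameters)      ≈ {log10_configs:.1e}")   # ~5.3e+10
print(f"log10(atoms in universe) ≈ {math.log10(atoms_in_observable_universe):.0f}")  # 80
```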