The model only has about 1B parameters, which is relatively small.
The language models that produced very impressive results have >>50B parameters, e.g. GPT-3 with 175B, Aleph Alpha Luminous (200B), and Google PaLM (540B). GPT-3 can understand and answer basic trivia questions and impressively mimic various writing styles, but it fails at basic arithmetic. PaLM can do basic arithmetic much better and can explain jokes. DALL-E 2 (specialized in image generation) has 3.5B parameters for the image generation alone, and it uses a 15B language model to read in text (a version of GPT-3).
Imagine what the alternative would imply. AI would be solved, and thus, intelligence itself. Predicting tokens is not actually true intelligence, and that's not really the point of these models. This is a step on the ladder, not the rooftop. It looks a lot like we'll get there though, if you compare the state of the art to ANYTHING labeled AI five years ago. That's the exciting part.
[edit] to emphasize: predicting tokens is a very interesting mechanic, but in a design of intelligent software, it would be no more than that: the mechanic of one or more of its components/modules/subsystems. The real deal is to figure out what those components are. Once you have that part done, you can implement it in a language of your choice, be it token prediction, asm or powerpoint :-)
Yeah, the captions are in the right arena but fundamentally wrong. In the baseball picture it recognizes the ball, pitcher, and the act of throwing, but calls the action wrong. Its object recognition and pattern matching are excellent, but higher level thinking and self-correction are totally absent.
Which is exactly where GPT, etc., are capping out. It's easier to see the flaws in this one because it's more general, so spread out more thinly.
To get to the next step (easy to say from an armchair!), these models need a sense of self and relational categories. Right now a 5-year-old can tell a more coherent story than GPT. Not a more sophisticated one, but it will have a central character and some tracking of emotional states.
> It's easier to see the flaws in this one because it's more general, so spread out more thinly.
I really think this is due to the very limited number of parameters in GATO: 1.2B vs. 175B for GPT-3. They intentionally restricted the model size so that they could control a robot arm (!) in real time.
> these models need a sense of self and relational categories.
The places where I personally see GPT-3 getting hung up on higher level structure seem very related to the limited context window. It can't remember more than a few pages at most, so it essentially has to infer what the plot is from a limited context window. If that's not possible, then it either flails (with higher temperatures) or outputs boring safe completions that are unlikely to be contradicted (with lower temperatures).
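Rough illustration of the temperature effect, just a toy sketch (made-up sample_token helper, not anything from the actual models): at low temperature the softmax over the next-token logits collapses onto the most likely token, at high temperature it flattens out and improbable tokens get sampled.

    # toy sketch of temperature-scaled sampling (hypothetical, not GPT-3's actual decoding code)
    import numpy as np

    def sample_token(logits, temperature=1.0):
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        probs = np.exp(scaled - scaled.max())   # softmax with max-subtraction for stability
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)

    # temperature=0.2 -> almost always the argmax ("boring safe completions")
    # temperature=1.5 -> much flatter distribution, unlikely tokens show up ("flailing")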
It's a very small model, I think due to the intent to use it for robotics. It's not that it's good per se (even if it were just a language model, it would be smaller than GPT-2); it's that it's bad at a lot of different things. I hope to see analysis of how much of it is multi-purpose, but as of now it's looking really cool.
That could be solved with accurate lookups from trusted sources. Humans do the same thing: we have associations and trusted facts. AI has the associations, they just need to add the trusted facts compendium. "Hmm, I know that Marseille is associated with France, but I don't remember the capital. Hey Google..."
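Something like this, as a totally hand-wavy sketch (the TRUSTED_FACTS table and answer() helper are made up for illustration, not anyone's actual system):

    # hand-wavy sketch: consult a curated fact table before trusting the model's association
    TRUSTED_FACTS = {
        "capital of france": "Paris",
    }

    def answer(question, model_guess):
        q = question.lower()
        for key, fact in TRUSTED_FACTS.items():
            if key in q:
                return fact       # grounded answer from the trusted compendium
        return model_guess        # otherwise fall back to the model's association

    print(answer("What is the capital of France?", "Marseille"))  # -> Paris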
> What is the capital of France?
> Marseille
And many of the generated image captions are inaccurate.