To clarify what I mean on this specific bit: the SOTA results in 2D and 3D vision, audio, translation, NLP, etc. are all transformers. Past results do not necessarily predict future performance, and it would be absurd to claim it's an immutable state of affairs, but it's certainly interesting that all of the domain-specific architectures have been flattened in a very short period of time.
Thanks for clarifying. Well, my argument is that the state of the art is more the result of trends in research than of the true capabilities of different approaches.
Take my little rant about Rich Sutton's (a god, btw) Bitter Lesson with respect to RL. So, there's AlphaGo, AlphaZero and μZero, yes? AlphaGo knows the rules of Go and starts with some expert knowledge, and beats every human Go player. AlphaZero knows the rules of Go but has no expert knowledge, and it beats AlphaGo. And μZero neither knows the rules of Go, nor has expert knowledge, and it beats AlphaZero, and can also play chess, shogi and Atari games, with one hand while eating a banana. Do you know how hard it is to eat a banana with one hand? Unpeeled!
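(For anyone who hasn't read the papers, the core distinction I'm gesturing at is roughly this: an AlphaZero-style planner needs the true game rules to expand its search, while a μZero-style planner rolls everything out inside a learned model. Here's a minimal, hypothetical sketch of that difference; none of this is DeepMind's code, all the names and signatures are made up for illustration.)

```python
# Hypothetical sketch of the distinction: AlphaZero-style planning queries the
# real game rules at every step, MuZero-style planning never sees them and
# instead unrolls a learned dynamics model in latent space.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GameRules:
    """True environment: only the AlphaZero-style planner gets to use this."""
    legal_actions: Callable[[object], List[int]]
    next_state: Callable[[object, int], object]


def alphazero_rollout(state, rules: GameRules, policy, depth: int):
    """Plan by consulting the real rules at each step (rules must be known)."""
    for _ in range(depth):
        actions = rules.legal_actions(state)
        if not actions:
            break
        action = policy(state, actions)       # network-guided move choice
        state = rules.next_state(state, action)
    return state


def muzero_rollout(observation, encode, dynamics, policy, depth: int):
    """Plan entirely inside a learned model: no game rules are consulted."""
    latent = encode(observation)              # representation network
    for _ in range(depth):
        action = policy(latent)               # prediction network
        latent = dynamics(latent, action)     # learned transition, not real rules
    return latent
```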
Easy to draw a conclusion from that. Except all those systems were developed and used by DeepMind, and there are very few entities besides DeepMind that can even train them, so all we know is what DeepMind claims and we have no way to check their claims. For example, can I test different configurations of μZero, with and without knowledge of the rules of the game and expert knowledge? Not really. And it's clear to me that DeepMind is pushing very, very hard for a form of AI that relies on having gigantic resources, like the ones they just completely coincidentally happen to be among the few entities to have access to. So I remain unconvinced.
(I need to re-read the μZero paper; it's in my pdf buffer. I didn't get it the first time I read it, and it might well be that they did include enough ablation studies to convince even me and I just don't remember.)