The title of this paper is a reference to a previous paper titled "Attention Is All You Need"[0][1]. This seminal work described the transformer model that is the basis for almost all LLMs, and is almost certainly the most cited paper on AI even though it was only published in 2017.
It has definitely been overused by too many authors.
This reminds me of a passage from Orwell's essay "Politics and the English Language":
> A newly-invented metaphor assists thought by evoking a visual image, while on the other hand a metaphor which is technically "dead" (e.g., iron resolution) has in effect reverted to being an ordinary word and can generally be used without loss of vividness. But in between these two classes there is a huge dump of worn-out metaphors which have lost all evocative power and are merely used because they save people the trouble of inventing phrases for themselves.
By that argument you must also hate anything that mentions the term "considered harmful", or makes any form of derivative cultural reference (like just about every episode of the Simpsons). Why do you let it get to you?
Then why waste your time getting upset about people making tired cultural references? It's a chuckle at best and a meh at worst; getting bothered by it is a waste of effort.
Transformers are what made ML massively scalable and drove a huge amount of progress in just a few years, since everyone could simply scale things up. That said, idk how many of those papers actually even cite the transformer paper.
I just checked Google Scholar; it's not perfect, but it's good for an indication. "A logical calculus of the ideas immanent in nervous activity" [WS McCulloch, W Pitts - The bulletin of mathematical biophysics, 1943] has ~33,000 citations, and "Attention is all you need" [A Vaswani, N Shazeer, et al, Advances in Neural Information Processing Systems, 2017] has ~180,000 citations.
As I understand it, the transformer architecture is built on deep learning.
Would you say that transformers made bigger progress RELATIVE to the progress made by deep learning?
AFAIK, the first wave of AI-powered apps visible to users appeared thanks to deep learning in the early 2010s. Users went from nothing to fancy AI features. The question is likely subjective, but is the jump from nothing to fancy AI features the same, in relative terms, as the jump from fancy AI features to GenAI?
We can't forget that new tech builds upon older tech, hence merits need to be judged relatively.
Probably because the modern "publish or perish" mantra led to exponential growth in publications, and "newer is better" means that newer impactful papers get cited more than older impactful ones. But that thesis is probably a paper in itself (of the meta-analysis navel-gazing variety).
Because the McCulloch-Pitts neuron is taken as common knowledge and people do not feel the need to explicitly reference it (and haven't for some time), and because the pace of publishing has increased in recent years.