I always thought the query/key/value analogy was confusing and unnecessary. That tired analogy is why I don’t think Attention is All You Need is a particularly good paper. The BERT paper is much more readable.
If you actually look at what a self-attention head does, it's much easier to understand and really not that complicated.
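Concretely, here's a minimal sketch of one self-attention head in plain NumPy (the variable names, the explicit scaling, and the lack of masking are my own simplifications, not anyone's reference implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_head(X, W_q, W_k, W_v):
    """One self-attention head over a sequence X of shape (seq_len, d_model).

    W_q, W_k, W_v are three learned linear projections of the same input,
    each mapping d_model -> d_head.
    """
    Q = X @ W_q                           # (seq_len, d_head)
    K = X @ W_k                           # (seq_len, d_head)
    V = X @ W_v                           # (seq_len, d_head)
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)    # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V                    # each position gets a weighted mix of V rows
```

That's the whole thing: three linear projections of the same sequence, a scaled dot product, a softmax, and a weighted sum.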
Once you get self-attention, multi-head attention is just doing that N times in parallel over the same sequence.
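As a rough sketch (reusing the `self_attention_head` function above; the parameter layout here is just one way to wire it up), the heads run independently and their outputs get concatenated and projected back:

```python
def multi_head_attention(X, heads, W_o):
    """Run N independent heads over the same sequence X and mix the results.

    `heads` is a list of (W_q, W_k, W_v) triples, one per head; W_o projects the
    concatenated head outputs (n_heads * d_head) back to d_model.
    """
    outputs = [self_attention_head(X, W_q, W_k, W_v) for W_q, W_k, W_v in heads]
    return np.concatenate(outputs, axis=-1) @ W_o    # (seq_len, d_model)

# Tiny usage example with random weights, just to show the shapes line up.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d_model))
out = multi_head_attention(X, heads, W_o)            # shape (5, 16)
```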