Hacker News

DeepSeek proposed the multi-head latent attention technique! :)

As far as I know, they are the only ones using it so far
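For anyone curious what the technique actually does: the core idea of MLA is to cache a small per-token latent vector instead of full per-head keys and values, and up-project it when attention is computed. Here's a minimal numpy sketch of that KV-compression idea; all shapes, names, and the random weights are illustrative, and it omits details from the paper such as query compression and the decoupled RoPE path.

```python
import numpy as np

# Sketch of the KV-compression idea in Multi-head Latent Attention
# (DeepSeek-V2, arXiv:2405.04434). Illustrative shapes, random weights.
rng = np.random.default_rng(0)
d_model, d_latent, n_heads, d_head, seq = 64, 16, 4, 16, 8

h = rng.standard_normal((seq, d_model))  # token hidden states

# Down-project each token to one small shared KV latent.
# This latent (d_latent floats/token) is all that needs to be cached,
# instead of 2 * n_heads * d_head floats/token for full K and V.
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
c_kv = h @ W_dkv  # (seq, d_latent)

# Up-project the cached latent into per-head keys and values.
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
k = (c_kv @ W_uk).reshape(seq, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq, n_heads, d_head)

# Queries straight from hidden states (the paper also compresses
# queries; omitted here for brevity).
W_q = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)
q = (h @ W_q).reshape(seq, n_heads, d_head)

# Standard scaled dot-product attention per head.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, n_heads * d_head)

print(c_kv.shape, out.shape)  # cache is (seq, 16) vs (seq, 128) for full KV
```

The cache savings come purely from storing `c_kv` rather than `k` and `v`; the up-projections can also be folded into the query/output weights at inference time so the latents are attended over directly.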



Fair point, thanks for the clarification. It seems this was first proposed in https://arxiv.org/pdf/2405.04434? I was confused because your title mentions DeepSeek, but the first paragraph then reverts to "...language models like ChatGPT and DeepSeek faster at generating text".


Right, that's a good point. I'll adjust the intro a bit. We wanted to provide a more holistic overview of what MLA is, what came before it, and why it matters :) hope it was useful!


Just refined it a bit; I hope it's clearer now!



