karmakaze | 23 hours ago | on: TransMLA: Multi-head latent attention is all you n...
I'm not "in the field", though I like to read about and use LLMs. This video, "How DeepSeek Rewrote the Transformer [MLA]"[0], is really good at explaining MHA, MQA, GQA, and MLA with clear visuals/animations, and at showing how DeepSeek's MLA is 57x more efficient.
[0] https://www.youtube.com/watch?v=0VLAoVGf_74&t=960s
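For anyone curious where a figure like that ~57x comes from, here's a rough back-of-the-envelope sketch in Python. The dimensions are my own assumptions (roughly DeepSeek-V2-like head counts and latent sizes), not exact model configs; the point is just how the per-token KV-cache size compares across the four attention variants.

    # Per-token, per-layer KV-cache size for the attention variants above.
    # All dimensions below are assumed, roughly DeepSeek-V2-like values.
    n_heads = 128      # query heads
    head_dim = 128     # dimension per head
    n_kv_groups = 8    # KV heads for the GQA case (assumed)
    latent_dim = 512   # MLA's compressed KV latent width (assumed)
    rope_dim = 64      # MLA's decoupled RoPE key dimension (assumed)

    kv_per_token = {
        # MHA: every head caches its own key and value vectors
        "MHA": 2 * n_heads * head_dim,
        # MQA: one key/value head shared by all query heads
        "MQA": 2 * head_dim,
        # GQA: one key/value head per group of query heads
        "GQA": 2 * n_kv_groups * head_dim,
        # MLA: one low-rank latent plus a small shared RoPE key, from which
        # per-head keys/values are reconstructed at compute time
        "MLA": latent_dim + rope_dim,
    }

    for name, size in kv_per_token.items():
        print(f"{name}: {size} values/token/layer "
              f"({kv_per_token['MHA'] / size:.0f}x smaller than MHA)")

With those (assumed) numbers the MLA cache works out to roughly 1/57th of the MHA cache, which is where a number like 57x comes from; the exact ratio depends on the real model's dimensions.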