Hacker News

DeepSeek proposed the multi-head latent attention technique! :)

As far as I know, they are the only ones using it so far
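For anyone curious what the technique actually does: the core idea of MLA is to cache a small per-token latent vector instead of full per-head keys and values, and up-project it when attention is computed. Here's a minimal numpy sketch of that KV-compression idea; all shapes, names, and the random weights are illustrative, and it omits details from the paper such as query compression and the decoupled RoPE path.

```python
import numpy as np

# Sketch of the KV-compression idea in Multi-head Latent Attention
# (DeepSeek-V2, arXiv:2405.04434). Illustrative shapes, random weights.
rng = np.random.default_rng(0)
d_model, d_latent, n_heads, d_head, seq = 64, 16, 4, 16, 8

h = rng.standard_normal((seq, d_model))  # token hidden states

# Down-project each token to one small shared KV latent.
# This latent (d_latent floats/token) is all that needs to be cached,
# instead of 2 * n_heads * d_head floats/token for full K and V.
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
c_kv = h @ W_dkv  # (seq, d_latent)

# Up-project the cached latent into per-head keys and values.
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
k = (c_kv @ W_uk).reshape(seq, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq, n_heads, d_head)

# Queries straight from hidden states (the paper also compresses
# queries; omitted here for brevity).
W_q = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)
q = (h @ W_q).reshape(seq, n_heads, d_head)

# Standard scaled dot-product attention per head.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, n_heads * d_head)

print(c_kv.shape, out.shape)  # cache is (seq, 16) vs (seq, 128) for full KV
```

The cache savings come purely from storing `c_kv` rather than `k` and `v`; the up-projections can also be folded into the query/output weights at inference time so the latents are attended over directly.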



Fair point, thanks for the clarification. It seems this was first proposed in https://arxiv.org/pdf/2405.04434? I was confused because your title mentions DeepSeek, but the first paragraph then reverts to "...language models like ChatGPT and DeepSeek faster at generating text".


Right, that's a good point. I'll adjust the intro a bit. We wanted to provide a more holistic overview of what MLA is, what came before it, and why it matters :) hope it was useful!


Just refined it a bit; I hope it's clearer now!



