
Also, there are so many innovations in their papers (DeepSeekMath, DeepSeek-V2/V3, R1) that I honestly wouldn’t even care. They figured out a way to train on only 2048 H800s when big companies are buying GPUs in the hundreds of thousands. They created a new RL algorithm. They improved MoE. They improved the KV cache. They built a super efficient training framework.

