Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's not, it uses locality-sensitive hashing to reduce attention complexity from O(n^2) to O(nlogn) while maintaining the same performance in 16GB as a best model that could fit into 100GB but nobody scaled it up to 1000 GPUs as its purpose was the opposite.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: