Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They are using RingAttention that scales self attention computation linearly by number of devices by passing KV result blocks in a ring:

https://arxiv.org/abs/2310.01889 (Submitted on 3 Oct 2023)



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: