
Oh wow, you’re right. Though it seems they are using very small weight group sizes: either 16 or 32, with an fp16 scaling factor per group. This paper doesn’t seem to use weight grouping at all, so it’s a bit apples to oranges.
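To put rough numbers on why the grouping matters, here is a minimal sketch of the storage overhead from keeping one fp16 scale per group. The function names are made up for illustration, and the 2-bit weight width in the last line is purely a placeholder, not a figure from either source.

```python
def scale_overhead_bits(group_size: int, scale_bits: int = 16) -> float:
    """Extra bits per weight spent storing one scale per group of weights."""
    return scale_bits / group_size


def effective_bits(weight_bits: float, group_size: int, scale_bits: int = 16) -> float:
    """Total bits per weight: the quantized weight plus its share of the group scale."""
    return weight_bits + scale_overhead_bits(group_size, scale_bits)


# Overhead of an fp16 scale for the two group sizes mentioned above.
for group_size in (16, 32):
    print(f"group size {group_size}: +{scale_overhead_bits(group_size)} bits per weight")

# Hypothetical 2-bit weights, used only to illustrate the effect.
print(effective_bits(weight_bits=2, group_size=32))
```

So with groups that small, the scales alone add between half a bit and a full bit per weight, which is a big chunk of the budget at low bit widths and not something an ungrouped scheme pays.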

