Major breakthrough on the LLM scene: it achieves performance and perplexity equivalent to full FP16 models of the same parameter size.

And you can fit a 120B model on a single card with 24GB of VRAM. This is mind-blowing.
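
Rough back-of-the-envelope check of that claim (my own numbers, assuming ~1.58 bits per weight, i.e. log2(3) for ternary weights, and ignoring activation/KV-cache overhead):

    # Weight memory for a 120B-parameter ternary model,
    # assuming ~1.58 bits/weight and no other overhead.
    params = 120e9
    bits_per_weight = 1.58          # log2(3) for {-1, 0, +1}
    weight_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{weight_gb:.1f} GB")   # ~23.7 GB -- just under 24 GB

So the weights alone land just under 24 GB, which is where the single-card figure comes from; activations and KV cache would still eat into that in practice.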



I mean, it expands the hardware selection, but until there are models, leaderboards, etc., we can't really call it a breakthrough.


I would assume a GPU isn't specifically optimized for ternary computation, and that specialized accelerators would beat the pants off a GPU.
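
For anyone wondering why: with weights restricted to {-1, 0, +1}, every multiply in a matrix-vector product collapses into an add, a subtract, or a skip, which is exactly the kind of thing a dedicated accelerator can do far more cheaply than FP16 MAC units. A toy numpy sketch (names and shapes are mine, purely illustrative):

    import numpy as np

    def ternary_matvec(W, x):
        # W: (out, in) matrix with entries in {-1, 0, +1}; x: activations.
        # Each output element is built from adds and subtracts only.
        y = np.empty(W.shape[0], dtype=x.dtype)
        for i in range(W.shape[0]):
            y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()  # zeros skipped
        return y

    rng = np.random.default_rng(0)
    W = rng.integers(-1, 2, size=(4, 8))       # random ternary weights
    x = rng.standard_normal(8).astype(np.float32)
    assert np.allclose(ternary_matvec(W, x), W @ x)  # matches a real matmul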



