
The higher the precision, the better. Use what works within your memory constraints.


With serious diminishing returns. At inference time there's no reason to use fp64, and you should probably use fp8 or less; the accuracy loss is far smaller than you'd expect. AFAIK Llama 3.2 3B at fp4 will outperform Llama 3.2 1B at fp32 in both accuracy and speed, despite the latter using 8x the bits per weight (and more total weight memory, since 3B x 4 bits is still smaller than 1B x 32 bits).
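A minimal sketch of what "fp8 or less at inference" looks like in practice, here using Hugging Face transformers with bitsandbytes 4-bit loading (the library choice and model id are my assumptions, not something the comment above specifies; requires `pip install transformers accelerate bitsandbytes` and a CUDA GPU, and the Llama repo is gated so any other causal LM id works the same way):

  # Back-of-envelope weight memory for the two configs mentioned above:
  #   3B params * 4 bits  ~= 1.5 GB
  #   1B params * 32 bits ~= 4.0 GB
  # so the 3B/fp4 model is the smaller one despite 3x the parameters.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "meta-llama/Llama-3.2-3B"  # hypothetical choice for illustration

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,                      # quantize weights to 4-bit on load
      bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual LLM default
      bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=bnb_config,
      device_map="auto",
  )

  inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
  print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))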



