
The higher the precision, the better. Use what works within your memory constraints.


With serious diminishing returns. At inference time there's no reason to use fp64, and you should probably use fp8 or less; the accuracy loss is far smaller than you'd expect. AFAIK Llama 3.2 3B at fp4 will outperform Llama 3.2 1B at fp32 in both accuracy and speed, despite the latter using 8x the bits per weight (and more total weight memory, since 3B x 4 bits is still smaller than 1B x 32 bits).
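A minimal sketch of what "fp8 or less at inference" looks like in practice, here using Hugging Face transformers with bitsandbytes 4-bit loading (the library choice and model id are my assumptions, not something the comment above specifies; requires `pip install transformers accelerate bitsandbytes` and a CUDA GPU, and the Llama repo is gated so any other causal LM id works the same way):

  # Back-of-envelope weight memory for the two configs mentioned above:
  #   3B params * 4 bits  ~= 1.5 GB
  #   1B params * 32 bits ~= 4.0 GB
  # so the 3B/fp4 model is the smaller one despite 3x the parameters.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "meta-llama/Llama-3.2-3B"  # hypothetical choice for illustration

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,                      # quantize weights to 4-bit on load
      bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual LLM default
      bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=bnb_config,
      device_map="auto",
  )

  inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
  print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))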



