Archit3ch | 5 months ago | on: Run LLMs on Apple Neural Engine (ANE)
Not really. That's 15.8 fp16 TOPS compared to 14.7 fp32 TFLOPS (and fp32 ops are actually useful outside AI). It would be interesting to see whether the ANE can be configured to recover fp32 precision at lower throughput [1].
[1] https://arxiv.org/abs/2203.03341
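For context, the scheme in [1] was developed for GPU tensor cores (whether the ANE exposes enough control to do the same is the open question here): split each fp32 operand into a high fp16 part plus an fp16 residual, issue several fp16 products, and accumulate them in fp32. Below is a minimal numpy sketch of that splitting idea, not the paper's actual implementation (which also handles exponent scaling to avoid fp16 underflow); function names are illustrative.

    import numpy as np

    def split_fp32(x: np.ndarray):
        """Split fp32 values into a high fp16 part and an fp16 residual."""
        hi = x.astype(np.float16)                            # top ~11 significand bits
        lo = (x - hi.astype(np.float32)).astype(np.float16)  # next ~11 bits
        return hi, lo

    def matmul_fp16_recovered(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Emulate an fp32 matmul with three fp16-input products,
        each accumulated in fp32."""
        a_hi, a_lo = split_fp32(a)
        b_hi, b_lo = split_fp32(b)
        f32 = np.float32
        # fp16 x fp16 products are exact in fp32 (11-bit significands
        # yield 22-bit products), so the remaining error comes from the
        # dropped lo*lo term and fp32 accumulation rounding.
        return (a_hi.astype(f32) @ b_hi.astype(f32)
                + a_hi.astype(f32) @ b_lo.astype(f32)
                + a_lo.astype(f32) @ b_hi.astype(f32))

Three products per multiply means roughly a third of peak fp16 throughput, which is exactly the "fp32 precision at lower throughput" trade being asked about.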
brigade | 5 months ago
Apple GPUs run fp16 at the same rate as fp32 except on phones, so the comparison is fair for ML; no one runs inference from fp32 weights anyway.
But the point was about area efficiency.