
Heh? Surely a fast convert from 8-bit int to 16-bit FP, followed by rcp+mul (or div), is a no-brainer? Edit: make that fast convert, rcp, fma (FP16 constant 1.0), and xor (same constant).



Unfortunately none of the hardware used for testing supports FP16 arithmetic. Between Intel and AMD, the only platform that supports AVX512-FP16 is currently Sapphire Rapids.


Alder Lake supports AVX512-FP16. It's still only 9.6x faster than div. Most likely the reciprocal is just too slow.


I tried a similar approach with 32-bit FP before, and the problem here is that fast conversion is only fast in the sense of latency. Throughput-wise, it takes 2 uops instead of one, so in the end, a plain float<->int conversion wins.
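For readers following along, here is a minimal scalar sketch of the two 32-bit FP variants being compared. It is not the code from the thread. The function names are hypothetical, and reading "fast conversion" as the magic-constant trick (OR the integer into the mantissa of 2^23, then subtract 2^23) is an assumption. A vectorized version would apply the same steps per lane.

```c
#include <stdint.h>
#include <string.h>

/* Plain variant: convert both operands to float, divide, truncate back.
 * For 8-bit inputs the truncated result matches integer division, because
 * a non-integer quotient a/b sits at least 1/255 away from any integer,
 * far more than float32 rounding error. Caller must ensure b != 0. */
static uint8_t div_u8_via_float(uint8_t a, uint8_t b) {
    return (uint8_t)((float)a / (float)b);
}

/* Assumed "fast conversion": 0x4B000000 is the bit pattern of 2^23.
 * OR-ing x into the low mantissa bits yields 2^23 + x exactly, and
 * subtracting 2^23 leaves x. That is two operations (OR + SUB) where a
 * dedicated convert instruction would be one. */
static float u8_to_float_magic(uint8_t x) {
    uint32_t bits = 0x4B000000u | x;
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits without UB */
    return f - 8388608.0f;         /* 8388608 = 2^23 */
}

/* Same division, using the magic-constant conversion on the way in. */
static uint8_t div_u8_via_magic(uint8_t a, uint8_t b) {
    return (uint8_t)(u8_to_float_magic(a) / u8_to_float_magic(b));
}
```

Both variants return the same quotients. The difference discussed above is only in how many operations the int-to-float step costs.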



