Heh? Surely a fast conversion from 8-bit int to 16-bit FP, followed by rcp + mul (or a div), is a no-brainer? edi...
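A minimal Python sketch of the suggested idea, under stated assumptions: FP16 is emulated by rounding through the `struct` module's `'e'` (IEEE binary16) format, and all helper names (`to_half`, `u8_to_half_fast`, `div_u8_estimate`, `div_u8`) are hypothetical. The "fast" u8 to FP16 conversion ORs the byte into the fraction bits of 1024.0 (bit pattern `0x6400`) and subtracts 1024.0. The reciprocal here is correctly rounded, whereas a hardware approximate-reciprocal instruction may be less precise. With two FP16 roundings the estimate stays within about 0.25 of `a / b` for 8-bit operands, so the truncated estimate is off by at most one, and a one-step integer fixup makes it exact.

```python
import struct

# Bit pattern of binary16 1024.0: sign 0, biased exponent 25 (= 10 + 15), fraction 0.
HALF_1024_BITS = 0x6400


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE binary16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def u8_to_half_fast(n: int) -> float:
    """Hypothetical 'fast' u8 -> FP16 conversion: OR n into the fraction bits of
    1024.0, reinterpret the bits as binary16 (giving 1024 + n), subtract 1024.0."""
    assert 0 <= n <= 255
    biased = struct.unpack("<e", struct.pack("<H", HALF_1024_BITS | n))[0]
    return to_half(biased - 1024.0)  # exact: both operands and n are representable


def div_u8_estimate(a: int, b: int) -> int:
    """Quotient estimate via reciprocal + multiply, each rounded to FP16."""
    rcp = to_half(1.0 / u8_to_half_fast(b))       # stands in for an FP16 reciprocal
    return int(to_half(u8_to_half_fast(a) * rcp))  # int() truncates toward zero


def div_u8(a: int, b: int) -> int:
    """Estimate, then correct by at most one step so the result equals a // b."""
    q = div_u8_estimate(a, b)
    if q * b > a:          # estimate landed one too high
        q -= 1
    elif (q + 1) * b <= a:  # estimate landed one too low
        q += 1
    return q
```

The fixup costs an integer multiply and compare per lane. The sketch only shows that the FP16 path is numerically workable for 8-bit operands; it says nothing about speed on real hardware.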

bremac · 2024-12-22T02:09:07

Unfortunately, none of the hardware used for testing supports FP16 arithmetic. Among Intel and AMD platforms, the only one that currently supports AVX512-FP16 is Sapphire Rapids.

Cold_Miserable · 2024-12-24T00:04:01

Alder Lake supports AVX512-FP16. It's still only 9.6x faster than div; most likely the reciprocal is just too slow.

purplesyringa · 2024-12-22T02:05:17

I tried a similar approach with 32-bit FP before, and the problem is that the fast conversion is only fast in terms of latency. Throughput-wise, it takes 2 uops instead of one, so in the end a plain float<->int conversion wins.
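For reference, a sketch of the kind of 32-bit "fast conversion" being discussed, assuming it is the common magic-number trick (the comment does not spell out the exact variant). An integer `n < 2**23` is ORed into the fraction bits of 2**23 (bit pattern `0x4B000000`), the bits are reinterpreted as a float, and 2**23 is subtracted. The function names are hypothetical, and Python's `struct` stands in for the free register reinterpretation hardware would do.

```python
import struct

# Bit pattern of binary32 8388608.0 = 2**23: sign 0, biased exponent 150, fraction 0.
F32_2P23_BITS = 0x4B000000


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as binary32 (free in hardware: same register)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def int_to_f32_magic(n: int) -> float:
    """Two operations: an integer OR, then a floating-point subtract."""
    assert 0 <= n < 1 << 23
    biased = f32_from_bits(F32_2P23_BITS | n)  # op 1: OR into fraction bits -> 2**23 + n
    return biased - 8388608.0                  # op 2: subtract 2**23 -> exactly n


def int_to_f32_direct(n: int) -> float:
    """One operation: a plain int -> float conversion (e.g. x86 cvtdq2ps)."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]
```

Both paths give identical results for every `n` below 2**23. The difference is the operation count: the magic-number path issues an OR and a subtract where the direct path issues a single conversion, which is why it can lose on throughput even when its latency is competitive.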