Anyone with a comparison with Intel's deep learning boost or VNNI, which is avai...

brrrrrm · on Jan 5, 2023

I don't see throughput information on Intel's AMX, but their VNNI has information: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

At best 0.5 cycles per instruction on a 3.7GHz clock, that's 7.4e9 instructions per second. If I'm reading it right, that instruction does 16 4-wide dot products, which is ~128 ops (64 multiplies plus 64 adds). So ~950Gops peak in int8 precision on a server class Xeon assuming no clock throttling.
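The arithmetic can be checked with a rough sketch like the one below. The function name and parameters are made up for illustration. It assumes each multiply and each add counts as one op (which is how 16 4-wide dot products come out to ~128 ops), and that the figure is for a single core issuing this one instruction back to back.

```python
def peak_int8_ops_per_second(clock_hz, cycles_per_instruction, lanes, dot_width):
    """Back-of-envelope peak throughput for a dot-product instruction.

    Hypothetical helper, not from any Intel documentation.
    Assumption: every multiply and every add counts as one op.
    """
    instructions_per_second = clock_hz / cycles_per_instruction
    # Each lane does `dot_width` multiplies and `dot_width` adds.
    ops_per_instruction = lanes * dot_width * 2
    return instructions_per_second * ops_per_instruction


# Numbers from the comment: 3.7GHz, 0.5 cycles per instruction,
# 16 lanes, 4-wide int8 dot product per lane.
peak = peak_int8_ops_per_second(3.7e9, 0.5, 16, 4)
print(f"{peak / 1e9:.1f} Gops")  # 947.2 Gops
```

That lands at 947.2 Gops, which is where the ~950Gops comes from. Sustained clocks under heavy AVX-512 load are usually lower than the nominal clock, so treat it as an upper bound.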

(edit: flops -> ops)

stephencanon · on Jan 5, 2023

Int8 operations are not "flops" =)