Hacker News

Does anyone have a comparison with Intel's Deep Learning Boost (VNNI), which is available on AVX-512 processors such as https://ark.intel.com/content/www/us/en/ark/products/213805/...



I don't see throughput information for Intel's AMX, but VNNI is documented: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

A best-case throughput of 0.5 cycles per instruction (i.e. 2 instructions per cycle) at a 3.7 GHz clock is 7.4e9 instructions per second. If I'm reading it right, each instruction does 16 4-wide int8 dot products, which (counting the multiplies and adds) is ~128 ops. So ~950 Gops peak at int8 precision on a server-class Xeon, assuming no clock throttling.

(edit: flops -> ops)
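The arithmetic above can be sketched out explicitly. This is a back-of-the-envelope estimate, not a benchmark; the instruction assumed here is a 512-bit VNNI int8 dot product (e.g. VPDPBUSD), and the clock and throughput figures are the ones quoted in the comment:

```python
# Peak int8 throughput estimate for a VNNI-capable Xeon (assumptions as above).
clock_hz = 3.7e9                  # assumed clock, no throttling
cycles_per_instr = 0.5            # reciprocal throughput: 2 instructions/cycle
instrs_per_sec = clock_hz / cycles_per_instr   # 7.4e9

lanes = 16                        # 512-bit register / 32-bit accumulators
ops_per_lane = 4 + 4              # 4 int8 multiplies + 4 adds per dot product
ops_per_instr = lanes * ops_per_lane           # 128

peak_ops = instrs_per_sec * ops_per_instr
print(f"{peak_ops / 1e9:.0f} Gops")            # ~947 Gops
```

Note the "4 + 4": each 4-wide dot product is 4 multiplies plus 3 adds to reduce them, plus 1 add into the 32-bit accumulator, which is where the ~128 ops per instruction (and hence ~950 Gops) comes from.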


Int8 operations are not "flops" =)



