
Indeed a great article, well worth reading in full for anyone who uses AVX-512.

Two other things that jumped out at me: on Zen 4, VPCONFLICT is 10x as fast, while compressstoreu is >10x slower. Those might be enough to warrant a Zen4-specific codepath in Highway.




The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...

I benchmarked it on Intel, and it was indeed quite fast, a good improvement over the scalar version. It will be interesting to try that on AMD.
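For anyone who hasn't used the instruction: below is a rough scalar model in Python of what VPCONFLICTD computes per lane (a bitmask of the earlier lanes holding the same value), plus one common way such masks get used, merging duplicate indices before a scatter-add. This is not the code from the Intel manual, and it is only a sketch of the semantics, not real SIMD; the names `vpconflictd_model` and `merged_scatter_add` are made up for illustration.

```python
def vpconflictd_model(lanes):
    """Scalar model of VPCONFLICTD (hypothetical helper name).

    For lane i, bit j of the result is set when j < i and lanes[j] == lanes[i].
    A result of 0 means no earlier lane holds the same value.
    """
    out = []
    for i, v in enumerate(lanes):
        mask = 0
        for j in range(i):
            if lanes[j] == v:
                mask |= 1 << j
        out.append(mask)
    return out


def merged_scatter_add(table, indices, values):
    """Add values[i] into table[indices[i]], merging duplicate indices first.

    Illustrative use of the conflict masks (an assumption, not the manual's
    method): each duplicate lane folds its value into the first lane with the
    same index, then only conflict-free lanes are scattered.
    """
    conflicts = vpconflictd_model(indices)
    merged = list(values)
    for i, mask in enumerate(conflicts):
        if mask:
            # Lowest set bit identifies the first lane with the same index.
            first = (mask & -mask).bit_length() - 1
            merged[first] += merged[i]
    for i, mask in enumerate(conflicts):
        if mask == 0:
            table[indices[i]] += merged[i]
    return table
```

With indices like `[3, 1, 3, 3, 0, 1, 2, 7]`, lanes 2, 3 and 5 get non-zero masks because they repeat an earlier index, so a plain vector scatter would lose updates there; the masks tell you which lanes need merging first.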


Nice! Thanks for linking it :)



