
With CFLAGS="-march=tigerlake -O2", gcc will not use AVX-512. You need at least CFLAGS="-march=tigerlake -O3" to get it to use AVX2 at all, and Tiger Lake's AVX-512 implementation is so poor (clock throttling, etc.) that gcc will not use AVX-512 when targeting it. gcc does use AVX-512 with -march=znver4, though, so the support for autovectorizing to AVX-512 is clearly there.

https://godbolt.org/z/1a39Mf3bv
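For anyone who wants to check this locally rather than on Godbolt, here is a minimal sketch. The saxpy kernel and the file name saxpy.c are my own assumptions, not the code behind the link above; the flags in the comment are ordinary GCC options, and what each one emits will depend on your GCC version.

```c
#include <stddef.h>

/* Hypothetical kernel (not taken from the linked Godbolt example):
 * a simple loop of the kind GCC's autovectorizer usually handles.
 * `restrict` tells the compiler that x and y do not alias. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/*
 * Emit assembly for each flag combination discussed in the thread
 * (saxpy.c is an assumed file name):
 *
 *   gcc -O2 -march=tigerlake -S saxpy.c -o tgl_O2.s
 *   gcc -O3 -march=tigerlake -S saxpy.c -o tgl_O3.s
 *   gcc -O3 -march=znver4    -S saxpy.c -o zen4_O3.s
 *
 * Then look at which vector registers appear in the loop body:
 *   xmm = 128-bit, ymm = 256-bit, zmm = 512-bit.
 *
 *   grep -c zmm tgl_O2.s tgl_O3.s zen4_O3.s
 *
 * To ask GCC for 512-bit vectors regardless of the target's default:
 *
 *   gcc -O3 -march=tigerlake -mprefer-vector-width=512 -S saxpy.c -o tgl_512.s
 */
```

Adding -fopt-info-vec to any of those commands should also make GCC report which loops it vectorized, which is quicker than reading the assembly.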



Is it actually that bad on Tiger Lake? Or just for really high-width vectors? On my old Ice Lake laptop, single-core AVX-512 workloads do not decrease frequency at all even with wider registers, and multi-core workloads only lower the clock speed by a small amount, maybe 100 MHz or so.

It depends on a couple of factors (e.g. Ice Lake client only has 1 FMA unit), but I'd be surprised if Tiger Lake was a major regression relative to Ice Lake. It seems like they had it in an OK spot by then.



