GCC will not use AVX-512 if you have CFLAGS="-march=tigerlake -O2". You will, at the very least, need CFLAGS="-march=tigerlake -O3" before it even uses AVX2, and Tiger Lake's AVX-512 implementation is poor enough (clock throttling etc.) that GCC still won't emit AVX-512 for it. AVX-512 is used if you have -march=znver4 though, so the support for autovectorizing to AVX-512 is clearly there.
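A quick way to check this on your own toolchain (a sketch; the file and function names are just illustrative, and the exact result depends on your GCC version and its per-CPU tuning defaults, which -mprefer-vector-width=512 can override):

    /* saxpy.c: trivial loop GCC can autovectorize. Build with different
     * flags and compare which vector registers show up in the assembly,
     * ymm (AVX2) vs zmm (AVX-512):
     *
     *   gcc -march=tigerlake -O2 -S -o - saxpy.c | grep -E 'ymm|zmm'
     *   gcc -march=tigerlake -O3 -S -o - saxpy.c | grep -E 'ymm|zmm'
     *   gcc -march=znver4    -O3 -S -o - saxpy.c | grep -E 'ymm|zmm'
     */
    #include <stddef.h>

    void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }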
Is it actually that bad on Tiger Lake? Or only for the widest (512-bit) vectors? On my old Ice Lake laptop, single-core AVX-512 workloads do not decrease frequency at all even with the wider registers, and multi-core workloads only drop the clock by a small amount, maybe 100 MHz or so.
Depends on a couple of factors (e.g. Ice Lake client only has 1 FMA unit), but I'd be surprised if Tiger Lake was a major regression relative to Ice Lake. It seems like they had it in an OK spot by then.
In my experience it depends on the compiler. clang seems far more willing to autovectorise than gcc. You also have to write the code in a way that strongly hints to the compiler that it can be autovectorised. So lots of handholding, along the lines of the sketch below.
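For example (a sketch; the function name and file name are made up, and the flags are just one way to surface the vectoriser reports): restrict-qualified pointers so the compiler can rule out aliasing, an alignment hint, and a plain countable loop with no early exits.

    /* dot.c: written to make autovectorisation easy. Compare what each
     * compiler reports:
     *   gcc   -O3 -march=native -ffast-math -fopt-info-vec        -c dot.c
     *   clang -O3 -march=native -ffast-math -Rpass=loop-vectorize -c dot.c
     */
    #include <stddef.h>

    float dot(const float *restrict a, const float *restrict b, size_t n)
    {
        const float *pa = __builtin_assume_aligned(a, 64);
        const float *pb = __builtin_assume_aligned(b, 64);
        float sum = 0.0f;
        /* FP reductions generally need -ffast-math (or at least
         * -fassociative-math) before either compiler will reorder the
         * adds and vectorise this loop. */
        for (size_t i = 0; i < n; i++)
            sum += pa[i] * pb[i];
        return sum;
    }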
My question is just... does it? (And does it use AVX-512 profitably?)