
No one is going to seriously use and support AVX512 (or be sufficiently motivated to implement support for it in their libraries, and especially applications) until Intel finally gets its act together and decides it actually wants to commit to AVX512 being a thing.

The AVX2 rollout was (comparatively) flawless. The gains AVX512 brings over AVX2 are, for most people (specialty libraries excluded), not worth dealing with the terrible CPU support. And Intel just keeps making the situation worse, taking one step forward and two steps back.




Imagine the next-gen consoles; suppose they stick with AMD. Then every game studio and game-engine studio is going to love flinging AVX-512 around. Developers will get more experience with it, and any game that runs on both PC and console is going to look slow on PCs with Intel CPUs that have poor support. More libraries and tools that people want to use will get created.

Adoption could accelerate quickly!


Next-next gen consoles are probably still a good 5+ years away. AVX-512 for consumers will either have already become "a thing" or it'll be dead & buried by then.


People said that about it 5 years ago too. Yet here we are. Nobody is going to just get rid of it; servers are already using it.


The biggest problem is not support for the instruction set in the silicon, but the performance penalty it brings.

SIMD hardware is the most power-hungry block on Intel CPUs, and the frequency penalty it brings is never completely disclosed in the tech docs. Sometimes Intel won't share that information even with you as a serious customer.

In the HPC world, no instruction is too obscure or niche to use. However, when you use these instructions too frequently, the heat load they generate can slow you down instead of speeding you up over the course of your job, so AVX512 is a pretty mixed bag on Intel CPUs.

Regardless of this penalty, numeric code benefits from wider SIMD pipelines in most cases. At worst, you see no speedup, but you're investing for the future.

On the other hand, we have seen applications which run faster on previous-generation hardware due to over-optimization.


> However, when you use these instructions too frequently, the heat load it generates can slow you down

It's not the heat load that slows you down. If you are using them heavily enough to produce enough heat that you have to downclock, it's still a win, because the instructions improve your throughput by more than what you lose in clocks (illustrative numbers: if AVX-512 doubles per-cycle throughput at the cost of a 15% clock reduction, the net is still a 2.0 × 0.85 = 1.7x speedup).

The problem with Intel's initial AVX-512 implementation was that they didn't clock down because of heat; they clocked down pre-emptively and substantially whenever the CPU executed even a single AVX-512 instruction, even if there was no added heat load, and stayed at the lower clocks for a long period. This worked fine for proper SIMD loads, but was crushing in any situation where there was just a handful of AVX-512 ops between long scalar stretches, such as using an AVX-512-optimized version of some library function.
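A hedged sketch of that pathological shape (the workload here is toy code; the point is the sparse 512-bit bursts inside long scalar stretches, as described above):

    // Compile with -mavx512f. Mostly scalar work with a rare 512-bit burst,
    // e.g. from an "optimized" library call. On early Skylake-SP each burst
    // could hold the core at a lower frequency license for a long while
    // afterwards, taxing all of the surrounding scalar work.
    #include <immintrin.h>
    #include <cstddef>

    float mostly_scalar(const float* data, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; ++i) {
            acc += data[i] * data[i];                // dominant scalar work
            if (i % 100000 == 0 && i + 16 <= n) {    // rare AVX-512 moment
                __m512 v = _mm512_loadu_ps(data + i);
                acc += _mm512_reduce_add_ps(v);      // horizontal sum
            }
        }
        return acc;
    }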


> [T]hey clocked down pre-emptively and substantially whenever the CPU executed even a single AVX-512 instruction...

Because you were hitting the CPU's power envelope limits in these cases too. You might not see the heat, but the CPU cannot deliver the power required to keep that core at non-AVX speeds while these power-hungry blocks operate at full speed.

As I said, to add insult to injury, Intel didn't share the exact details of its AVX implementations and the frequency ranges they operate in, either.

Ah, publicly sharing your findings is/was forbidden too.


No, as the above poster said, Intel slows down the CPU before any actual increase in power consumption or temperature occurs, because they fear that their power-limit and temperature controllers will not be able to react fast enough when the power increase eventually happens.

Whatever control mechanism is used in the AMD Zen CPUs is better than Intel's: they downclock only when the power consumption really increases, and the clock frequency recovers when the power consumption decreases, so there is no penalty for sporadically using some 512-bit instructions, as there is on the Intel CPUs.


No, actually the grandparent is correct here - Skylake-X/Skylake-SP never clocked down the first time they saw an AVX-512 instruction. The transition happens when the AVX instructions get dense enough to justify a voltage swing upwards. The same mechanism exists on Haswell as well - certain AVX2 instructions are designated "heavy", and if you execute enough of them you'll enter a voltage or frequency transition.

On Skylake-X there are more states... AVX-512 light and heavy as well.

https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html#...

https://travisdowns.github.io/blog/2020/08/19/icl-avx512-fre...


Skylake-X on X299 can be configured not to downclock at all; the light and heavy stages are referred to in the BIOS as AVX2 and AVX512 offsets, but of course that comes with extra heat and power draw. The voltage transition period can't be mitigated, AFAIK.


If I understood this correctly, this is for desktop chips and systems. My experience is solely based on the Xeon family of processors.

Our systems don't feature a similar override, but we can adjust the thermal and power envelope of the processors and the system in general.

When we get a new bunch of systems, I’ll look into it, but my hopes are not that high.

Maybe we’ll get AMD systems this time, who knows.


> The biggest problem is not support for the instruction set in the silicon, but the performance penalty it brings.

Why is that sentence present tense instead of past tense? I suppose it continues to be a problem for Intel, but your comment appears to be presenting Intel downsides as if they were universal. Zen 4 apparently implements AVX-512 efficiently, without the problems Intel implementations experienced. That's what this whole discussion is about, and that's what Phoronix found as well.[0]

Hopefully Intel will catch up to AMD on AVX-512, but in the mean time, people optimizing software should be aware that AVX-512 has few (if any) downsides on certain platforms. Phoronix found zero performance penalty, but perhaps more testing is required.

[0]: https://www.phoronix.com/review/amd-zen4-avx512/6


Because it's hard to say "conditionally enable AVX-512 only on platforms supporting it BUT not on platforms where it actually brings performance penalties."
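
A minimal sketch of why that condition is awkward to express (GCC/Clang builtins; the policy below is illustrative, not an authoritative list): CPUID can tell you the instructions are present, but there is no architectural bit for "no frequency penalty", so real code falls back on hand-maintained microarchitecture checks.

    // Compile for x86-64 with GCC or Clang.
    bool use_avx512() {
        if (!__builtin_cpu_supports("avx512f")) return false;  // not present
        // Presence != profitability. There is no CPUID bit for "downclocks
        // heavily", so in practice this becomes a lookup table of known
        // microarchitectures; the check below is purely illustrative.
        if (__builtin_cpu_is("skylake-avx512")) return false;
        return true;
    }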


Can you give a real example of AVX-512 actually causing a net performance penalty on any CPU?

The only way I can see that happening is using AVX-512 in small, infrequently called functions such as strcmp, and the solution is: don't do that.

If proper SIMD code runs for, say, 1 ms at a time, it's pretty much guaranteed to benefit from any implementation of AVX-512.
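
For contrast with the sparse case discussed above, a hedged sketch of a "proper" SIMD load: a long, dense AVX-512 loop that easily amortizes any license-transition cost.

    // Compile with -mavx512f. Dense 512-bit work: 16 floats per iteration.
    #include <immintrin.h>
    #include <cstddef>

    float sum_avx512(const float* data, size_t n) {
        __m512 acc = _mm512_setzero_ps();
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
        }
        float total = _mm512_reduce_add_ps(acc);  // horizontal sum
        for (; i < n; ++i) total += data[i];      // scalar tail
        return total;
    }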


> The gains AVX512 brings over AVX2 are, for most people (specialty libraries excluded)

The last two things I worked on, image compression and quicksort, see 1.4-1.6x end-to-end speedups from AVX-512 vs AVX2. Is that sufficiently motivating? Especially because the only thing we had to do was ensure that CI machines are AVX-512 capable, so that those test codepaths also run.

The "terrible CPU support" is a fact of life, not just in x86 (AES is 'optional' in SVE2, sigh), and so we deal with it via runtime dispatch - using what the CPU supports.


I don't see why performance-critical code wouldn't have an AVX512 implementation in addition to a scalar or SSE or AVX2 fallback, if AVX512 gives a big enough speed-up on a large enough number of relevant devices.



