I wonder how much of that 59% gain comes from the 512-bit registers/instructions ...

TinkersW · on Sept 26, 2022

Mystical's report indicates much of it does come from the wider instructions, because they make it easier to saturate the core. Zen 3 was front-end bottlenecked, so Zen 4 running AVX512 can more often hit 4x256. The new instructions are useful and some help perf, but mostly only for pretty specialized stuff. Masking is nice, but I think people really exaggerate the improvement from it; vblend was only 2 cycles.
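
To make the masking vs. vblend comparison concrete, here is a rough scalar model in Python. It is not real SIMD code: the function names, the 8-lane width, and the mask layout (bit i selects lane i) are all assumptions for illustration. It only shows that a compute-then-blend sequence and a merge-masked operation produce the same lanes; the difference is whether the select is a separate step.

```python
# Scalar model of two ways to do a conditional per-lane add.
# All names here are hypothetical; lists stand in for vector registers.

def add_then_blend(a, b, src, mask):
    """Pre-AVX512 style: full-width add, then a separate blend selects lanes."""
    tmp = [x + y for x, y in zip(a, b)]  # models one full-width add
    # Models a separate blend: take tmp where the mask bit is set, else src.
    return [t if (mask >> i) & 1 else s
            for i, (t, s) in enumerate(zip(tmp, src))]

def masked_add(a, b, src, mask):
    """AVX512 merge-masking style: the add itself keeps src in unselected lanes."""
    return [x + y if (mask >> i) & 1 else s
            for i, (x, y, s) in enumerate(zip(a, b, src))]

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
src = [0.0] * 8
mask = 0b00001111  # only the low four lanes are updated

print(add_then_blend(a, b, src, mask))
print(masked_add(a, b, src, mask))
```

Both calls print the same list, which is the point being made above: masking folds the select into the operation, so the saving is roughly the one extra blend step rather than a change in what can be computed.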