IIRC, there were issues with it causing frequency throttling on Intel CPUs, whereas AMD's chips avoid that by "double pumping" (running 512-bit instructions through 256-bit execution units in two halves). It would be very interesting to compare and contrast the two.
Seems like there'd be value in it for all the new ML hotness that's come about. The AMD 7950X seems to hit the sweet spot for that.
The vqsort README says it is a non-issue on that generation of CPU (Skylake-X), based on their benchmarks: https://github.com/google/highway/tree/master/hwy/contrib/so...
At least on their low-clock-speed server CPU (<=3GHz), the downclocking was hard to measure compared to the clock speed variability they saw with std::sort.
The 10980XE used for the Windows benchmarks here normally boosts much higher, so bigger differences could be expected. The OP's author mentioned measuring clock speeds with perf and seeing some difference.