> Shuffle is the SIMD's killer app

A shame that AVX512 only has pshufb (aka: permute), and is missing the GPU-instruction "bpermute", aka backwards permute.

pshufb is effectively a "gather" instruction over an AVX register. Equivalent to GPU permutes.

bpermute, in GPU land, is a "scatter" instruction over a vector register. There's no CPU / AVX equivalent of it. But I keep coming up with good uses of the bpermute instruction (much like pshufb is crazy flexible, its inverse, the backwards permute, is also crazy flexible).
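To make the gather/scatter distinction concrete, here is a minimal scalar sketch in Python that models a vector register as a list of lanes. The function names (`gather_permute`, `scatter_permute`, `invert_indices`) are made up for illustration and are not real intrinsics. The sketch ignores pshufb details such as the zeroing high bit and the 128-bit lane boundaries. It also uses "scatter" in the sense of the comment above; vendor docs differ on which direction gets the "backwards" name (AMD's GCN ISA calls the push form ds_permute_b32 and the pull form ds_bpermute_b32), so check your hardware manual before mapping these onto real instructions.

```python
def gather_permute(src, idx):
    """Gather (pshufb-style): output lane i pulls its value from src[idx[i]]."""
    return [src[j] for j in idx]


def scatter_permute(src, idx):
    """Scatter: input lane i pushes its value into output lane idx[i]."""
    dst = [0] * len(src)  # assumption: lanes that nobody writes stay 0
    for i, j in enumerate(idx):
        dst[j] = src[i]  # assumption: on duplicate indices the highest lane wins
    return dst


def invert_indices(idx):
    """For a true permutation, build the gather indices that emulate a scatter."""
    inv = [0] * len(idx)
    for i, j in enumerate(idx):
        inv[j] = i
    return inv


src = [10, 11, 12, 13]
idx = [2, 0, 3, 1]

gathered = gather_permute(src, idx)        # [12, 10, 13, 11]
restored = scatter_permute(gathered, idx)  # [10, 11, 12, 13], scatter undoes gather
emulated = gather_permute(gathered, invert_indices(idx))  # same result via gather only
```

The last line is the workaround when only a gather is available: invert the index vector first, then gather. That inversion is itself a scatter, which is where the extra cost comes from when the indices are only known at runtime.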

--------

Almost any code that finds itself "gathering" data across a vector register will inevitably "scatter" the data back at some point.

Much like how "pext" is the "gather" instruction for the bits of a 64-bit register, you need pdep to handle the equal-and-opposite case. It's incredibly silly that AVX / AVX512 has implemented only one-half of this concept (gather / pshufb / aka Permute).
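For anyone who hasn't used the pair, here is a rough bit-by-bit model of pext and pdep in Python. It is a sketch of the semantics only (the real instructions are single BMI2 opcodes), and the function names simply mirror the mnemonics.

```python
def pext(value, mask):
    """Gather: pack the bits of value selected by mask into the low bits."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that mask bit and move on
    return result


def pdep(value, mask):
    """Scatter: spread the low bits of value out to the positions set in mask."""
    result = 0
    in_bit = 0
    while mask:
        lowest = mask & -mask
        if value & (1 << in_bit):
            result |= lowest
        in_bit += 1
        mask &= mask - 1
    return result


mask = 0b10110100
value = 0b11100101

packed = pext(value, mask)    # 0b1101: the four selected bits, packed low
spread = pdep(packed, mask)   # 0b10100100: the same bits back in place
```

Here `pdep(pext(value, mask), mask)` gives back `value & mask`, which is the gather-then-scatter round trip described above, just on bits instead of bytes.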

I wish for the day that Intel/AMD implements (scatter / backwards-pshufb / aka Backwards-Permute).

-------

Fortunately, I got Vega64 and NVidia Graphics Cards with both permute and bpermute instructions for high-speed shuffling of data. But CPU-space should benefit from this concept too.




OK that's cool, didn't know about bpermute. Made sense there should be a counterpart. Well when you only have pshufb, it works OK, yeah there's tons of gaps but if you're clever and...and if you compromise speed...thanks for telling me about bpermute!



