Btw, here is a VLA vector register sort: https://godbolt.org/z/Env64961q It has ...

janwas · 2025-04-26T17:29:08

Nice work :) Clang x86 indeed unrolls, which is good. But setting the CC and AA mask constants looks fairly expensive compared to fixed-pattern shuffles.
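To make the mask cost concrete, here is a scalar sketch of where such constants come from. It is not taken from the linked code; `UpperLaneMask` and `CompareExchange` are hypothetical names, and the assumption is that the masks select the upper lane of each compare-exchange pair. With a fixed 8 lanes the masks are just the immediates 0xAA and 0xCC, whereas a VLA version has to derive them from the lane index for whatever lane count the hardware has.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the linked code): bit i is set if lane i is
// the upper lane of its pair at the given power-of-two stride.
// With 8 lanes: stride 1 gives 0xAA, stride 2 gives 0xCC, stride 4 gives 0xF0.
uint64_t UpperLaneMask(size_t num_lanes, size_t stride) {
  assert(num_lanes <= 64);
  assert(stride != 0 && (stride & (stride - 1)) == 0);
  uint64_t mask = 0;
  for (size_t i = 0; i < num_lanes; ++i) {
    if (i & stride) mask |= uint64_t{1} << i;
  }
  return mask;
}

// One compare-exchange step, scalar emulation: lane i is paired with lane
// i ^ stride; the lower lane keeps the min and the upper lane keeps the max.
std::vector<int> CompareExchange(const std::vector<int>& v, size_t stride) {
  assert(v.size() % (2 * stride) == 0);  // every lane must have a partner
  const uint64_t mask = UpperLaneMask(v.size(), stride);
  std::vector<int> out(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    const int partner = v[i ^ stride];
    const bool upper = ((mask >> i) & 1) != 0;
    out[i] = upper ? std::max(v[i], partner) : std::min(v[i], partner);
  }
  return out;
}
```

A fixed-width kernel can bake both the mask and the partner shuffle into the instruction encoding, while the length-agnostic form needs the per-lane index computation above (or a vector equivalent) before the blend can run.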

Yes, the 2D aspect of the sorting network complicates things. Transposing is already harder to make VLA, and fusing it with the other shuffles certainly doesn't help.