Thanks for the enlightenment here, some of those I didn't know. Being used to ex...

dragontamer · on Dec 1, 2023

> I was never sure nvidia would maintain over time.

PTX is maintained over time. It's a high-level assembly, so to speak: the full details of the machine remain abstracted so that code can be more portable.

SASS is not. SASS changes from architecture to architecture. SASS is the actual machine code of NVidia cards. There's an overall understanding of SASS in the GPU world, but it's not really documented and you "shouldn't" want to learn about it.

--------

I should note that Intel's "pshufb" instruction is very similar to the permute instruction in NVidia/AMD. So yeah, there's a high-speed generic shuffle that's key to Intel/AMD AVX512 code.

But having the backwards direction (bpermute) available too, as well as __shared__ memory for all other cases, is great.
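
To make the two directions concrete, here is a minimal sketch in plain C++ that models lanes as array slots. It is not GPU code and does not use any vendor intrinsic; `kLanes`, `permute_push` and `permute_pull` are hypothetical names made up for illustration, and the collision rule in the forward version (last writer wins, untouched lanes stay 0) is an assumption of this model, not a statement about what the hardware does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count, kept small so the example is easy to follow.
constexpr std::size_t kLanes = 8;
using Lanes = std::array<int, kLanes>;
using Index = std::array<std::size_t, kLanes>;

// Forward ("push") permute: lane i sends its value to lane idx[i].
// Model assumption: if two lanes target the same slot, the higher lane wins,
// and slots that nobody writes to stay 0.
Lanes permute_push(const Lanes& src, const Index& idx) {
    Lanes dst{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        assert(idx[i] < kLanes);
        dst[idx[i]] = src[i];
    }
    return dst;
}

// Backward ("pull") permute: lane i reads the value held by lane idx[i].
// Several lanes may read the same source, so there are no collisions.
Lanes permute_pull(const Lanes& src, const Index& idx) {
    Lanes dst{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        assert(idx[i] < kLanes);
        dst[i] = src[idx[i]];
    }
    return dst;
}
```

With the same index vector the two functions move data in opposite directions, so when the indices form a true permutation, pulling with them undoes a push with them. The pull form also covers broadcasts (every lane reads lane 0), which the push form cannot express.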