
That's great. Having a usable baseline is important for shipping it in more than a handful of handpicked functions.

But the whole approach of fixed-length instructions seems terrible to me. It takes Intel a decade to add another batch of instructions for another width, and existing applications don't benefit from the new instructions, even if they already process wide batches of data.
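To make the complaint concrete, here is a minimal sketch (function names are mine, loop tails omitted) of the same loop written twice against fixed-width intrinsics. A binary built with only the AVX2 version never touches the wider units, even on an AVX-512 machine:

    #include <immintrin.h>
    #include <cstddef>

    // AVX2: hard-coded to 8 float lanes (256 bits).
    void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
      for (std::size_t i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
      }
    }

    // AVX-512: the same logic, duplicated for 16 lanes (512 bits).
    void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
      for (std::size_t i = 0; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
      }
    }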

The CUDA approach is so much more appealing: here's my data, and you can process it in as many small or large units as you want.



The CUDA approach is just a software abstraction layer. The hardware of the NVIDIA GPUs is no more similar to the CUDA model than AVX-512 is (NVIDIA GPUs effectively execute a warp of 32 lanes of 32 bits each, i.e. 1024-bit vectors, instead of 512-bit vectors).

A compiler that implements the CUDA approach also exists for Intel AVX/AVX-512: the Intel Implicit SPMD Program Compiler (ISPC). Such compilers could be written to translate any programming language into AVX-512 while using the same concurrency model as CUDA.

Moreover, as a software model the "CUDA approach" is essentially the same as the OpenMP approach, except that the NVIDIA CUDA compilers know the structure of the NVIDIA GPUs, so they can automatically map the concurrent threads specified by the programmer onto GPU hardware cores, threads and SIMD lanes.
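As a rough illustration of that equivalence (my sketch, with made-up function names): in OpenMP style the programmer only declares the iterations independent, and the compiler picks whatever vector width the target supports, much like a CUDA kernel declares per-thread work:

    #include <cstddef>

    // "These iterations are independent" - the compiler chooses the
    // width (SSE, AVX2, AVX-512, ...). Build with e.g. -fopenmp-simd
    // on GCC or Clang.
    void saxpy(float a, const float* x, float* y, std::size_t n) {
      #pragma omp simd
      for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }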


In addition to ISPC, it is possible to do this kind of vector-length abstraction at the library level, e.g. in our Highway library.

We routinely write code that works on 128- to 512-bit vectors. Some use cases are harder than others, e.g. transposing.
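For anyone curious what that looks like, here is a minimal sketch along the lines of Highway's quick-start example (the function name and the scalar tail handling are mine): the vector width is a property of the target, and Lanes(d) tells you how many floats fit, so the same source covers SSE4, AVX2, AVX-512, NEON, SVE, etc.

    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    void MulArrays(const float* a, const float* b, float* out, size_t n) {
      const hn::ScalableTag<float> d;  // widest available vector of floats
      const size_t N = hn::Lanes(d);   // lane count chosen by the target
      size_t i = 0;
      for (; i + N <= n; i += N) {
        const auto va = hn::Load(d, a + i);
        const auto vb = hn::Load(d, b + i);
        hn::Store(hn::Mul(va, vb), d, out + i);
      }
      // Remainder done in scalar code for simplicity.
      for (; i < n; ++i) out[i] = a[i] * b[i];
    }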



