
That's great. Having a usable baseline is important for shipping it in more than a handful of handpicked functions.

But the whole approach of fixed-length instructions seems terrible to me. It takes Intel a decade to add another batch of instructions for another width, and existing applications don't benefit from the new instructions, even if they already process wide batches of data.
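To make the complaint concrete, here is a minimal sketch (function names are mine, loop tails omitted) of the same loop written twice against fixed-width intrinsics. A binary built with only the AVX2 version never touches the wider units, even on an AVX-512 machine:

    #include <immintrin.h>
    #include <cstddef>

    // AVX2: hard-coded to 8 float lanes (256 bits).
    void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
      for (std::size_t i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
      }
    }

    // AVX-512: the same logic, duplicated for 16 lanes (512 bits).
    void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
      for (std::size_t i = 0; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
      }
    }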

The CUDA approach is so much more appealing: here's my data, and you can process it in as many small or large units as you want.



The CUDA approach is just a software abstraction layer. The hardware of the NVIDIA GPUs is no more similar to the CUDA model than AVX-512 is (NVIDIA GPUs effectively execute a warp of 32 lanes of 32 bits each, i.e. 1024-bit vectors, instead of 512-bit vectors).

A compiler that implements the CUDA approach also exists for Intel AVX/AVX-512: the Intel Implicit SPMD Program Compiler (ISPC). Such compilers could be written to translate any programming language into AVX-512 while using the same concurrency model as CUDA.

Moreover, as a software model the "CUDA approach" is essentially the same as the OpenMP approach, except that the NVIDIA CUDA compilers know the structure of the NVIDIA GPUs, so they can automatically map the concurrent threads specified by the programmer onto GPU hardware cores, threads and SIMD lanes.
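As a rough illustration of that equivalence (my sketch, with made-up function names): in OpenMP style the programmer only declares the iterations independent, and the compiler picks whatever vector width the target supports, much like a CUDA kernel declares per-thread work:

    #include <cstddef>

    // "These iterations are independent" - the compiler chooses the
    // width (SSE, AVX2, AVX-512, ...). Build with e.g. -fopenmp-simd
    // on GCC or Clang.
    void saxpy(float a, const float* x, float* y, std::size_t n) {
      #pragma omp simd
      for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }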


In addition to ISPC, it is possible to do this kind of vector-length abstraction at the library level, e.g. in our Highway library.

We routinely write code that works on 128- to 512-bit vectors. Some use cases are harder than others, e.g. transposing.
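For anyone curious what that looks like, here is a minimal sketch along the lines of Highway's quick-start example (the function name and the scalar tail handling are mine): the vector width is a property of the target, and Lanes(d) tells you how many floats fit, so the same source covers SSE4, AVX2, AVX-512, NEON, SVE, etc.

    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    void MulArrays(const float* a, const float* b, float* out, size_t n) {
      const hn::ScalableTag<float> d;  // widest available vector of floats
      const size_t N = hn::Lanes(d);   // lane count chosen by the target
      size_t i = 0;
      for (; i + N <= n; i += N) {
        const auto va = hn::Load(d, a + i);
        const auto vb = hn::Load(d, b + i);
        hn::Store(hn::Mul(va, vb), d, out + i);
      }
      // Remainder done in scalar code for simplicity.
      for (; i < n; ++i) out[i] = a[i] * b[i];
    }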



