Hacker News

The code in this article is incorrect. The CUDA kernel is never called: https://github.com/RijulTP/GPUToolkit/blob/f17fec12e008d0d37...

I'd also like to point out that roughly 90% of the time taken to "compute" the Mandelbrot set with the JIT-compiled code goes into compiling the function, not into the computation itself.

If you actually want to learn something about CUDA, implementing matrix multiplication is a great exercise. Here are two tutorials:

https://cnugteren.github.io/tutorial/pages/page1.html

https://siboehm.com/articles/22/CUDA-MMM
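For a sense of what the exercise involves: the naive kernel both tutorials start from looks roughly like this (a sketch, not a tuned implementation; the 16x16 launch configuration is just an assumption for illustration):

```cuda
// Naive matrix multiplication C = A * B for square N x N row-major matrices.
// Each thread computes one element of C.
__global__ void matmul(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Hypothetical launch: one thread per output element,
// with enough 16x16 blocks to cover the whole N x N matrix.
// dim3 threads(16, 16);
// dim3 blocks((N + 15) / 16, (N + 15) / 16);
// matmul<<<blocks, threads>>>(d_A, d_B, d_C, N);
```

Most of the learning is in what comes after this version: the tutorials walk through memory coalescing, shared-memory tiling, and so on, each step closing some of the gap to cuBLAS.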




>If you actually want to learn something about CUDA, implementing matrix multiplication is a great exercise.

There is also SAXPY (A*X + Y, with scalar A and vectors X and Y), purportedly ([1]) the hello world of parallel math code.

>SAXPY stands for “Single-Precision A·X Plus Y”. It is a function in the standard Basic Linear Algebra Subroutines (BLAS) library. SAXPY is a combination of scalar multiplication and vector addition, and it’s very simple: it takes as input two vectors of 32-bit floats X and Y with N elements each, and a scalar value A. It multiplies each element X[i] by A and adds the result to Y[i].

[1]: https://developer.nvidia.com/blog/six-ways-saxpy/
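The CUDA C version of that description really is just a few lines (a sketch along the lines of the kernel in the linked NVIDIA post; the block size of 256 is an assumed choice):

```cuda
// SAXPY: y[i] = a * x[i] + y[i], one thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the grid may overshoot n
        y[i] = a * x[i] + y[i];
}

// Hypothetical launch with 256 threads per block,
// rounding up so every element gets a thread:
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

It's a good first kernel precisely because there is no inter-thread communication at all; matmul is the natural next step once this makes sense.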


Thank you for this, comments like yours are exactly why I keep coming back to HN.


Thanks a lot for pointing it out. I've fixed the code and updated the blog.



