I'd also like to point out that 90% of the time it takes to "compute" the Mandelbrot set with the JIT-compiled code is spent compiling the function, not computing.

If you actually want to learn something about CUDA, implementing matrix multiplication is a great exercise. Here are two tutorials:

https://cnugteren.github.io/tutorial/pages/page1.html

https://siboehm.com/articles/22/CUDA-MMM
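To give a sense of the starting point those tutorials build on, here is a minimal sketch of a naive CUDA matrix-multiplication kernel (the kernel name, the square row-major layout, and the launch configuration are my own illustrative choices, not taken from the tutorials):

    // Naive C = A * B for square N x N row-major matrices;
    // each thread computes one element of C.
    __global__ void matmul_naive(const float *A, const float *B,
                                 float *C, int N) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[row * N + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
    }

    // Launch: cover the N x N output with 16x16 thread blocks.
    // dim3 block(16, 16);
    // dim3 grid((N + 15) / 16, (N + 15) / 16);
    // matmul_naive<<<grid, block>>>(d_A, d_B, d_C, N);

The tutorials then show how far you can get beyond this version with tiling, shared memory, and other optimizations.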
>If you actually want to learn something about CUDA, implementing matrix multiplication is a great exercise.
There is SAXPY (vector math A*X + Y), purportedly ([1]) the hello world of parallel math code.
>SAXPY stands for “Single-Precision A·X Plus Y”. It is a function in the standard Basic Linear Algebra Subroutines (BLAS) library. SAXPY is a combination of scalar multiplication and vector addition, and it’s very simple: it takes as input two vectors of 32-bit floats X and Y with N elements each, and a scalar value A. It multiplies each element X[i] by A and adds the result to Y[i].
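That description maps almost line for line onto a CUDA kernel. A minimal sketch (kernel name and launch parameters are illustrative):

    // SAXPY: y[i] = a * x[i] + y[i], one element per thread.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Launch, assuming x and y already live in device memory:
    // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);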