I doubt that you can make it faster on the GPU than on CPU when utilizing SIMD, ...

fragmede · on June 21, 2024

I've got it working on a T4 via Google Colab. The PDF takes 178 milliseconds to the 206 listed in the readme for the C version, so 15%?

https://github.com/fragmede/wc-gpu/blob/main/wc_gpu.ipynb

Dylan16807 · on June 21, 2024

It's only at a limit like that if you don't parallelize. And sure you could use more cores, but you can go a lot faster on 20% of a GPU than on 20% of your CPU cores.