
I doubt you can make it faster on the GPU than on a CPU using SIMD, because the work done on each byte in sequence is close to trivial. You end up transferring the data from CPU memory to GPU memory in order to do almost nothing with it.
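To make "close to trivial" concrete, here is a minimal sketch of the per-byte work in a word count. It is not taken from the linked repo; `count_words` and the whitespace set are assumptions chosen for illustration (roughly the ASCII whitespace that `wc` treats as separators in the C locale).

```python
def count_words(data: bytes) -> int:
    """Count whitespace-delimited words by looking at each byte once."""
    # Assumed separator set: ASCII space, tab, newline, CR, VT, FF.
    whitespace = b" \t\n\r\v\f"
    words = 0
    in_word = False
    for byte in data:
        if byte in whitespace:
            # A separator ends any word we were inside.
            in_word = False
        elif not in_word:
            # First non-whitespace byte after a separator starts a new word.
            in_word = True
            words += 1
    return words
```

Per byte that is one membership check and at most one increment, which is why the cost of moving the data can easily outweigh the cost of processing it.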


I've got it working on a T4 via Google Colab. The PDF takes 178 milliseconds versus the 206 listed in the readme for the C version, so about 15% faster?

https://github.com/fragmede/wc-gpu/blob/main/wc_gpu.ipynb


It's only at a limit like that if you don't parallelize. And sure, you could use more cores, but you can go a lot faster on 20% of a GPU than on 20% of your CPU cores.
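As a rough sketch of why this parallelizes well: a word start can be detected from a byte and the byte just before it, so each worker can count its own range independently and the results are simply summed. The names `count_word_starts` and `parallel_count_words` are hypothetical, and Python threads are used only to show the decomposition (they will not give a real speedup here because of the GIL); a GPU kernel or SIMD loop would apply the same idea per thread or per lane.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed separator set: ASCII space, tab, newline, CR, VT, FF.
WHITESPACE = b" \t\n\r\v\f"


def count_word_starts(data: bytes, start: int, end: int) -> int:
    """Count positions in [start, end) where a word begins."""
    count = 0
    for i in range(start, end):
        # A word begins at a non-whitespace byte that is either the first
        # byte of the input or is preceded by whitespace. Looking at
        # data[i - 1] across the range boundary removes any merge step.
        if data[i] not in WHITESPACE and (i == 0 or data[i - 1] in WHITESPACE):
            count += 1
    return count


def parallel_count_words(data: bytes, workers: int = 4) -> int:
    """Split the input into ranges, count each independently, and sum."""
    if not data:
        return 0
    chunk = -(-len(data) // workers)  # ceiling division
    ranges = [(s, min(s + chunk, len(data))) for s in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_word_starts(data, *r), ranges))
```

Because no range depends on another's result, the same split works whether the workers are a handful of CPU cores or thousands of GPU threads.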



