I'm not sure why this is a moat. Isn't it just a matter of translation from CUDA to some other instruction set? If AMD or someone else makes cheaper hardware that does the same thing, it doesn't seem like a stretch for them to release a PyTorch patch or whatever.
Most of the computation happens inside NVIDIA's proprietary libraries (cuBLAS, cuDNN, and friends), not in CUDA code you can read and port. And if you saw what goes on inside those libraries, I think you would agree that it is a substantial moat.
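You can see this from PyTorch itself. A rough sketch of a profiler run on a single matmul (kernel names vary by GPU and library version, so treat the comments as illustrative, not exact):

    import torch
    from torch.profiler import profile, ProfilerActivity

    # A plain matmul in PyTorch dispatches to NVIDIA's closed-source cuBLAS.
    x = torch.randn(4096, 4096, device="cuda")
    w = torch.randn(4096, 4096, device="cuda")

    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        y = x @ w

    # The kernels listed here (names along the lines of "ampere_sgemm_..."
    # or "sm90_xmma_gemm_...", depending on the GPU) come out of cuBLAS
    # binaries, not out of any CUDA source you could translate.
    print(prof.key_averages().table(sort_by="cuda_time_total"))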
Geohot has multiple (and ongoing) rants about the sheer instability of AMD RDNA3 drivers. Lisa Su engaged directly with him on this, and she didn't seem to give a shit about their problems.
AMD is not taking ML applications seriously, outside of their marketing hype.
Are you suggesting that Scale can take cuDNN kernels and run them at anything resembling peak performance on AMD GPUs?
Because functional compatibility is hardly useful if the performance is not up to par, and cuDNN will run specific kernels that are tuned not only to a specific GPU model but also to the specific inputs the user is submitting. NVIDIA does a ton of work behind the scenes, both developing high-performance kernels for each exact architecture and figuring out which one is best for a particular workload.
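You can even watch the second half of that from PyTorch: with cuDNN benchmark mode on, cuDNN times its candidate kernels for the exact shape/dtype/GPU combination on the first call and caches the winner. A minimal sketch, timings obviously depend on your hardware:

    import time
    import torch

    torch.backends.cudnn.benchmark = True  # let cuDNN pick kernels per input shape

    conv = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
    x = torch.randn(32, 64, 56, 56, device="cuda")

    # First call: cuDNN benchmarks its candidate kernels for this exact
    # (shape, dtype, GPU) combination and caches the fastest one.
    torch.cuda.synchronize(); t0 = time.time()
    conv(x); torch.cuda.synchronize()
    print(f"first call (includes kernel selection): {time.time() - t0:.4f}s")

    # Later calls with the same input shape reuse the cached choice;
    # change the shape and the selection runs again.
    torch.cuda.synchronize(); t0 = time.time()
    conv(x); torch.cuda.synchronize()
    print(f"cached kernel: {time.time() - t0:.4f}s")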
This is probably the main reason I was hesitant to join AMD a few years ago, and to this day it seems like that was the right call.
Sure, you can probably do a rough translation of the code and get something that "works", but the thousands of small optimizations baked in don't just carry over.