I'm not sure why this is a moat. Isn't it just a matter of translation from CUDA to some other instruction set? If AMD or someone else makes cheaper hardware that does the same thing, it doesn't seem like a stretch for them to release a PyTorch patch or whatever.
Most of the computation happens inside NVIDIA's proprietary libraries (cuBLAS, cuDNN, and friends), not in CUDA code you can read and port. And if you saw what goes on inside those libraries, I think you would agree that it is a substantial moat.
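You can see this from PyTorch itself. A rough sketch of a profiler run on a single matmul (kernel names vary by GPU and library version, so treat the comments as illustrative, not exact):

    import torch
    from torch.profiler import profile, ProfilerActivity

    # A plain matmul in PyTorch dispatches to NVIDIA's closed-source cuBLAS.
    x = torch.randn(4096, 4096, device="cuda")
    w = torch.randn(4096, 4096, device="cuda")

    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        y = x @ w

    # The kernels listed here (names along the lines of "ampere_sgemm_..."
    # or "sm90_xmma_gemm_...", depending on the GPU) come out of cuBLAS
    # binaries, not out of any CUDA source you could translate.
    print(prof.key_averages().table(sort_by="cuda_time_total"))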
Geohot has multiple (and ongoing) rants about the sheer instability of AMD RDNA3 drivers. Lisa Su engaged directly with him on this, and she didn't seem to give a shit about their problems.
AMD is not taking ML applications seriously, outside of their marketing hype.
Are you suggesting that Scale can take cuDNN kernels and run them at anything resembling peak performance on AMD GPUs?
Because functional compatibility is hardly useful if the performance is not up to par, and cuDNN will run specific kernels that are tuned not only to a specific GPU model but also to the specific inputs the user is submitting. NVIDIA does a ton of work behind the scenes, both developing high-performance kernels for each exact architecture and figuring out which one is best for a particular workload.
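You can even watch the second half of that from PyTorch: with cuDNN benchmark mode on, cuDNN times its candidate kernels for the exact shape/dtype/GPU combination on the first call and caches the winner. A minimal sketch, timings obviously depend on your hardware:

    import time
    import torch

    torch.backends.cudnn.benchmark = True  # let cuDNN pick kernels per input shape

    conv = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
    x = torch.randn(32, 64, 56, 56, device="cuda")

    # First call: cuDNN benchmarks its candidate kernels for this exact
    # (shape, dtype, GPU) combination and caches the fastest one.
    torch.cuda.synchronize(); t0 = time.time()
    conv(x); torch.cuda.synchronize()
    print(f"first call (includes kernel selection): {time.time() - t0:.4f}s")

    # Later calls with the same input shape reuse the cached choice;
    # change the shape and the selection runs again.
    torch.cuda.synchronize(); t0 = time.time()
    conv(x); torch.cuda.synchronize()
    print(f"cached kernel: {time.time() - t0:.4f}s")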
This is probably the main reason I was hesitant to join AMD a few years ago, and to this day it seems like that was the right call.
Sure, you can probably do a rough translation of the code and get something that "works", but the thousands of small optimizations baked in don't just carry over.