I've always been surprised that AMD hasn't already thrown a bunch of money at this problem. Maybe they have and are just incompetent in this area.
My prediction is that AMD is already working on this internally, except more oriented around PyTorch than Hotz's Tinygrad, which I doubt will get much traction.
He mentioned ROCm, and apparently had a lackluster experience with it.
>The software is called ROCm, it’s open source, and supposedly it works with PyTorch. Though I’ve tried 3 times in the last couple years to build it, and every time it didn’t build out of the box, I struggled to fix it, got it built, and it either segfaulted or returned the wrong answer. In comparison, I have probably built CUDA PyTorch 10 times and never had a single issue.
Not surprising lol. This was also the experience I had while experimenting with MLIR approximately 3 years ago. You'd need to git checkout a very specific commit and then even change some flags in code to have a successful build. I'm sure things are better now but I haven't messed with it since then.
AMD is not going down the path of ROCm; perhaps they claim to do so, but as evidenced by the lack of both effort and results, they clearly are not.
The parent post is surprised that they still aren't making the appropriate investments to make it work. They kind of started to do that a few years ago, but then it fell by the wayside without reaching even table stakes. In my opinion, table stakes would mean providing a ROCm distribution that works out of the box for most of their recent consumer cards, i.e. the cards that enthusiasts, students, advocates, and researchers might use while choosing which software stack to learn. Those same people later base corporate compute cluster purchasing decisions on whether the hardware supports the software they wrote, e.g. for CUDA+PyTorch. AMD seems to be failing at that.
AMD is limited by numerous patent and other legal issues. For this reason, a small company that releases everything as open source has some chance of beating AMD on its own hardware.