> So why not just fix AMD accelerators in pytorch? Both ROCm and pytorch are open sourced. Isn't the point of the OSS community to use the community to solve problems?
Because there's no real evidence that AMD cares about this problem, and without them caring your efforts may well be replaced by whatever AMD does next in the space. Their Brook language[1] is abandoned, OpenCL doesn't compare well, and ROCm is like the SharePoint of GPU APIs (it ticks boxes but doesn't actually work very well).
> So why not just fix AMD accelerators in pytorch
Why not just buy NVidia? They care deeply about the space, will actually help you if you have trouble, etc etc.
Even using Google TPUs is better: Google will help you too.
While everyone using NVidia isn't great for the market as a whole, as an individual company or person it makes a lot of sense.

[1] https://en.wikipedia.org/wiki/BrookGPU
Read "The Red Team (AMD)" section in the linked article:
> The software is called ROCm, it’s open source, and supposedly it works with PyTorch. Though I’ve tried 3 times in the last couple years to build it, and every time it didn’t build out of the box, I struggled to fix it, got it built, and it either segfaulted or returned the wrong answer. In comparison, I have probably built CUDA PyTorch 10 times and never had a single issue.
This is geohot. He knows how to build software, and how to fix problems.
Note that "Our short term goal is to get AMD on MLPerf using the tinygrad framework."
> There's a clear difference in how AMD and Nvidia measure TFLOPS. techpowerup shows AMD at 2-3x Nvidia, but performance is similar. Either AMD is crazy underutilized or something is wrong. Does anyone know the answer?
From the linked article:
> That’s the kernel space, the user space isn’t better. The compiler is so bad that clpeak only gets half the max possible FLOPS. And clpeak is a completely contrived workload attempting to maximize FLOPS, never mind how many FLOPS you get on a real program
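On the utilization question above: one way to see the gap concretely is to measure achieved TFLOPS yourself and divide by the datasheet number. A rough sketch with PyTorch (PEAK_TFLOPS is a placeholder; substitute the vendor's advertised fp32 peak for your card, and note that an N x N matmul does 2*N^3 floating point operations):

```python
import time
import torch

PEAK_TFLOPS = 30.0  # placeholder: substitute the vendor's advertised fp32 peak for your card

N, iters = 8192, 10
dev = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm builds also expose "cuda"
a = torch.randn(N, N, device=dev)
b = torch.randn(N, N, device=dev)

for _ in range(3):          # warm-up
    a @ b
if dev == "cuda":
    torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(iters):
    a @ b
if dev == "cuda":
    torch.cuda.synchronize()
dt = time.perf_counter() - t0

achieved = 2 * N**3 * iters / dt / 1e12   # one multiply + one add per inner-product step
print(f"{achieved:.1f} TFLOPS achieved, {achieved / PEAK_TFLOPS:.0%} of the advertised peak")
```

If one card hits a much smaller fraction of its paper peak than the other, that's the underutilization being asked about: the ALUs are there, the software stack just isn't keeping them fed, which is the article's clpeak point.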
> This is geohot. He knows how to build software, and how to fix problems.
This is a non sequitur. This feels like when my uncle, having learned that I know how to program, asks me to build a website. These are two different things. I do ML and scientific computing; I'm not your guy. Hotz is a wiz kid but why should we expect his talents to be universal? Generalists don't exist.
And we're talking about the guy who tweeted that he believes the integers and the reals have the same cardinality, right? Between that and his tweets on quantum mechanics, we definitely have strong evidence that his jailbreaking skills don't translate to math or physics.
He's clearly good at what he does. There's no doubt about that. But why should I believe that his skills translate to other domains?
STOP MAKING GODS OUT OF MEN. Seriously, can we stop this? What does stanning accomplish? It's creepy. It's creepy whether it's BTS, Bieber, Elon, Robert Downey Jr., or Hotz.
> Read "The Red Team (AMD)" section in the linked article:
Clearly I did; I quoted from it. You quoted from the next section ("So why does no one use it?").
geohot wrote tinygrad. This is not about expecting his skills to translate to other domains; it is his domain.
You definitely shouldn't trust what geohot says about infinitary mathematics or (god forbid) quantum mechanics. On the other hand, you generally should trust what he says about the machine learning software stack.
Tinygrad isn't a big selling point; I'd expect most people to be able to build something similar after watching Karpathy's micrograd tutorial (a sketch of what I mean is at the end of this comment). Tinygrad doesn't mean expertise in ML, and it similarly doesn't mean expertise in accelerator programming. I wouldn't expect a front end developer to understand template metaprogramming, and I wouldn't expect an engineer who programs acoustic simulations to be good at front end. You act like there are actually fullstack developers and not just people who do both poorly.
This project isn't even primarily about ML skill, which is part of the misunderstanding here. The project requires writing accelerator code. Go learn CUDA and tell me how different it is. It isn't something you're going to pick up in a weekend, or a month, and realistically not even a year. A lot of people can write kernels; not a lot of people can do it well.
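To be concrete about the micrograd comparison, the kind of scalar autograd engine that tutorial builds fits in well under a page of Python. A rough sketch of the idea, not tinygrad's actual code (tinygrad layers tensors, lazy evaluation, and accelerator backends on top of this):

```python
class Value:
    """Minimal scalar autograd node, in the spirit of Karpathy's micrograd."""

    def __init__(self, data, parents=(), _backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = _backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# d(x*y + x)/dx = y + 1 = 4, d(x*y + x)/dy = x = 2
x, y = Value(2.0), Value(3.0)
(x * y + x).backward()
print(x.grad, y.grad)  # 4.0 2.0
```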
> You act like there are actually fullstack developers and not just people who do both poorly.
If you haven't worked with someone who's smarter and more motivated than you are, then I can see how you'd draw that conclusion, but if you have, then you'd know that there are full stack developers out there who do both better than you. It's humbling to code in their repos. I've never worked with geohot so I don't know if he is such a person, but they're out there.
> Hotz is a wiz kid but why should we expect his talents to be universal?
No, of course not. But this is literally his field of expertise, and there are plenty of reasons to think he knows what he is doing. Specifically, the combination of reverse engineering and writing ML libraries means I'd certainly expect him to have reasonable experience compiling things.