The trouble is that AMD just didn't take AI seriously. For a long time, their equivalent of CUDA was not only Linux-only and a pain to use but outright broken on all consumer cards - as in, they dropped official support for the only consumer cards it officially ran on, promptly broke it so that machine learning runs failed, and dismissed bug reports from the users who were left high and dry because their cards were no longer officially supported. The only way to use AMD for machine learning was to pay much more than the price of NVidia's consumer cards for server-focused AMD cards that worked worse, were harder to use, and that AMD didn't support for long either. They just never got the small-scale desktop usage that led to NVidia's cards becoming the choice for bigger machine learning once scaled up, because it simply didn't work.
> The trouble is that AMD just didn't take AI seriously.
Until a couple of years ago, AMD was in survival mode, fighting Intel on one side and Nvidia on the other. Two rivals that were making money hand over fist while AMD was bleeding money.
AMD picked open standards and made investments in open source frameworks and libraries commensurate with their financials, the hope being that the community could help pick up some of the slack. The community, understandably, went with the proprietary solution that worked well at the time and had resources behind it.
The net result is that the Nvidia ecosystem has gained a dominant position in the industry and benefits from being perceived as a quasi-standard. On the other hand, open source efforts by AMD or others get viewed as "not serious".
The financial situation of AMD has improved somewhat over the last couple years. So AMD is "taking AI more seriously now". But it might be too late and the proprietary ecosystem has probably won.
For what it's worth, AMD is also incredibly proprietary. The drivers being open source really helps with compatibility and your kernel, but you're still interacting with a massive computer running its own OS with its own trusted code solution. And that computer also has DMA to your computer.
I would consider their open efforts to be "not serious" for anyone but the consumer space - games, desktop users, maybe even professional text editors. If you're using the GPUs for "professional" applications in a one-off scenario, even AMD falls short.
I'm honestly not sure what the moral of this story is.
AMD's focus was always on pure compute power at a good price. And they always beat NVidia at that game. AMD cards always had the highest hash rate per dollar in crypto mining. AMD has 100% of the console market and the fastest iGPUs by 2x over Intel.
NVidia decided to use gimmicks to sell their cards including texture compression, lighting tricks, improved antique video encoders, motion smoothing, bad proprietary variable refresh rate, ray tracing, cuda and now machine learning features.
Nvidia is fortunate that machine learning has taken off. That is masking AMD winning market share from weak overpriced NVidia 3D products!
You're mashing together a lot under "gimmicks" there.
Texture compression: Useful for games, ongoing work, although I wish they would make cards with appropriate amounts of VRAM
Lighting tricks: Not sure what this is referencing
Improved antique video encoders: NVENC started out with only h.264, but now it supports h.265 and AV1, which aren't antique at all. Niche, but widely used in the streaming industry.
Motion smoothing: The hardware optical flow accelerators in newer cards are important for DLSS, which is a bit gimmicky but works mostly as advertised.
Bad proprietary vrr: No argument here, gsync sucked.
Ray tracing: All 3d games are going to be ray traced sooner or later. Getting a head start on it is a good move, and it's a big head start. The 4090 is ~100% faster than the 7900xtx.
CUDA: No one can seriously call CUDA a gimmick.
Machine learning features: Tensor cores are great.
CUDA isn't a "technology", it's a shader language that has been supplanted by better industry-wide standards... the same standards whose shader languages are compiled by the same Nvidia shader compiler.
CUDA is a moat whose muddy waters have long since run dry, and you're drinking koolaid if you think it's still relevant for greenfield projects.
> and you're drinking koolaid if you think it's still relevant for greenfield projects.
So I want to start a new GPU compute project today. Obviously this will primarily be deployed to AWS/Azure/etc, which means only high-end GPUs available are Nvidia. What do you recommend developing this application with?
The way I see it, you would have to be drinking koolaid to use anything besides CUDA.
Why do you think I can't use standard APIs on Nvidia? I literally just said the same compiler does both; Nvidia sits on the Khronos committee! They co-wrote the API that everyone uses, which their compiler also speaks!
Vulkan Compute is not an alternative to CUDA. There's a reason PyTorch doesn't provide Vulkan in their official binaries. It's in the source, though--build it yourself, try running any recent ML project, and see what happens.
That's a weird strawman; compute in Vulkan is a replacement for compute in OpenGL and legacy D3D, and a twin sibling to compute in D3D12.
OpenCL is the actual intended replacement for all the pre-standard APIs, and has achieved its goals. If you want SPIR-V IR, OpenCL allows this and all the major vendor impls support it.
CUDA has no equivalent of SPIR-V, and never will. Nvidia's own internal IR is not, and never will be, documented or stable across driver versions. This is a massive downside for ML middlewares, as they have no way of directly emitting optimized code for anything that can't easily be represented in CUDA's HLSL-flavored syntax.
> CUDA isn't a "technology", it's a shader language that has been supplanted by better industry-wide standards
As someone who uses industry-wide standards in a related field...
The proprietary implementation often has the benefit of several more years of iteration with real products than the open standards. 'Supplanted' can only really be evaluated in terms of popularity, not newness or features, because features on paper aren't features in practice until they pay for their migration cost.
That's a wild perspective. I don't know how you can really come to that conclusion either. One attempt at getting Blender to render something using an AMD vs Nvidia card will paint a very very clear picture.
You're entitled to your opinion (which I agree with in broad strokes), but with respect, the OP article is specifically about ML. Calling CUDA a "gimmick" is silly and completely underestimates the datacenter/ML cluster market share (it dwarfs consumer GPU), and the fact of the matter is that AMD's CUDA equivalent segfaults. So if "being actually usable to the biggest market" is a gimmick, so be it.
And yet lately AMD has quietly just been slightly cheaper than Nvidia for a worse product. AMD sucks, that's just it. Their market share is crumbling and Nvidia's is getting stronger, because people are like: fuck it, at that price might as well just buy the better one that Just Works™.
I personally don't have any insider information, but what you're saying fits with the mood on the gaming community side, where commentators are frustrated that Nvidia has so much hubris that they think they can sell essentially last-generation technology without the expected step up (I think it was 3xxx vs 4xxx or something like that, where you'd expect the 4060 Ti to be at least as good as the 3070 Ti) and just make up for it in "software".
It probably takes a lot of confidence in your software developers to make that kind of decision.
Are they "incredibly proprietary" compared to the competition? Clearly they aren't. Nvidia offers blobs in both consumer and professional markets, even going to the extent of gimping hardware performance through drivers on more than one occasion.
That said, I think AMD isn't really competing with Nvidia. Sure, their R&D budget is smallish but it feels like they're somewhat fine with the current status quo.
And while they have an open version of the userland, it's also missing features compared to the proprietary one, etc.
Besides, in the end it hardly matters whether the firmware is loaded at runtime or lives in updateable flash. It's still not "your PC" in the Stallman sense either way; it's been tivoized regardless of whether the firmware is injected at runtime or during assembly. You cannot load unsigned firmware on AMD anymore either: firmware signing started with Vega (iirc), and checksums now cover almost all of the card configuration, similar to NVIDIA.
Firmware is also the only way to get proper HDMI support... which is why AMD still does not support HDMI 2.1 on Linux. The HDMI Forum will not license the spec openly, and implementations must contain blobs or omit those features.
Hey, I am not white knighting for AMD here. For all we know, they could only have been pursuing open standards because they've been forced to, as the underdog.
Can we really assign blame to them specifically for not fighting the HDMI Forum on our behalf?
Isn't this sort of how specialized hardware kind of works?
At some point, hardware (necessarily?) evolves to become optimized to do one thing, and then you have to just treat the driver as an API to the hardware.
Even "simple" things like keyboards and mice are now small computers that run their own code, let alone more complex devices like sound cards and hard drives.
And since graphics card performance seems to be the bottleneck in a lot of computing, it has become super specialized and you just hand off a high-level chunk of data and it does magic in parallel with fast memory and spits it out the hdmi cable.
As for keyboards/mice being small computers, that's been true since the 1970s. For a period of about 30 years, almost all keyboards had an 8048 or 8051 CPU; that's how they serialized the keystrokes. From the Model M keyboard through to everything up until the USB era.
What OS do you mean? The closest thing I can think of is the embedded CPU that gets called CP in the ISA docs, which mostly schedules work onto the compute units. That has firmware which is probably annoying to disassemble, but it's hard to imagine it doing anything particularly interesting.
Nah. AMD was already profitable in 2018. This is just big mismanagement.
Just having 30 extra good software engineers focusing on AI would have made such a massive difference, because it's so bad that there's a lot of low hanging fruit.
As someone who was pretty invested in AMD stock since 2018, it always made me pretty angry how bad they managed the AI side. Had they done it well, just from the current AI hype the stock would probably be worth 50 bucks more.
> Nah. AMD was already profitable in 2018. This is just big mismanagement.
Hindsight bias much?
How easily we forget in today's speculative AI bubble that AMD rolled into 2018[1] with 6.1x levered D/E and substantial business uncertainty while the Fed was actively ratcheting interest rates up, and ended the fiscal year still 3.3x levered despite turning operationally profitable[2].
> Had they done it well, just from the current AI hype the stock would probably be worth 50 bucks more.
It strikes me as pretty audacious and quite unconscionable to assert "big mismanagement" while simultaneously crying about speculative short-term profit taking opportunities.
As someone who had like 25% of their portfolio in AMD, it was pretty infuriating being forced to buy Nvidia GPUs every single time because the AMD ones were literally useless to me (lack of AI support and cuda in general).
Yes, there's AI hype right now. But Nvidia GPU datacenter growth isn't new. And AMD was asleep.
Not asleep; they just directed their efforts at things that haven't worked out. With their APU lines it looked like they wanted to integrate GPUs completely into the CPU - that was hardly asleep to the importance of GPU compute.
The problem they ran into looks to me to be that they focused on targeting a cost-effective low-end market and were caught off guard by how machine learning workloads work in practice - a huge burst of compute to train, then much lower requirements to do inference. That isn't something they were strategically prepared for, and it isn't something the software industry has seen before either.
Won't save them from market forces, but their choices to date have been reasonable.
Seriously, look long and hard at those numbers, and when you think you understand what they might mean, consider them again and again until the feeling of insurmountable adversity sinks in and you're on your knees begging public equity markets for an ounce of capital and a pinch of courtesy faith...on the promise of meaningful risk-adjusted ROIC to be delivered in just a few years.
> But Nvidia GPU datacenter growth isn't new. And AMD was asleep
...which is why this remark comes off as sheer arrogance (no disrespect).
Su and the rest of AMD leadership certainly weren't asleep. The difference here is while you're busy scouting speculative waters defended by competition with deep battle pockets and an even deeper technical moat, Su was simply preoccupied bringing a zombie company back to life and building up enough health to slay a weaker giant.
Personally, I was already beyond impressed with one miracle delivered.
>As someone who had like 25% of their portfolio in AMD
>Nah. AMD was already profitable in 2018. This is just big mismanagement.
I guess you know they had debt, and they were paying it off, and were battling other issues all the way until 2019/2020, when Intel had their misstep so AMD could gain something in the server CPU market?
Yes. And they still could have afforded 30 software engineers to work on ai/compute painpoints.
But let's assume they thought it was too expensive back then. There's still no reason not to invest in software in 2020, when their gross margin was absurd.
Yup. Have 15 of those software engineers contribute pull requests to PyTorch to make its OpenCL support on par with CUDA and take the other 15 engineers to do the same for TensorFlow and AMD would already be a serious contender in the AI space.
I'm not so sure anymore. The big reason is that now that the ML framework ecosystem has fragmented into different "layers" of the stack, very few people are directly writing CUDA kernels anymore.
As a result, with things like XLA now supporting AMD GPUs using ROCm under the hood, the feature gap has closed A LOT.
Sure, Nvidia still has the performance lead, with cuDNN, NCCL, and other libraries providing major boosts. But AMD is starting to catch up quite fast.
> it might be too late and the proprietary ecosystem has probably won.
Compiler ecosystems can and have changed rather quickly. Especially given that most NNs run on a handful of frameworks. Not _that_ many people are writing directly on top of CUDA/cuDNN.
Make an equivalent toolchain that runs on cheaper hardware and the migration would be swift.
Currently AMD hardware is a bit behind and the toolchain is frustratingly buggy, but it's probably not as big of a moat as NVIDIA are trading on. Especially since NV's toolchain isn't particularly polished either.
>AMD picked open standards and made investments on open source frameworks and libraries commensurate with their financials, the hope being that the community could help pick up some of the slack.
This has been their claim, but more often than not they haven't actually done anything to encourage the community to pick up slack. So many of their graphics tools have been released with promises of some sort of support or of working with the community yet have basically had nothing to help the community help them.
Even accepting the unreasonable idea that they can't afford the full-time developers for the various tools and libraries they come up with, they often don't even really work with the community to build and maintain those.
One of the bigger cases which contributed to turning me off from AMD GPUs was buying a 5700XT at launch, eager to work on stuff using AMD-specific features, only to be led on for over a year about how ROCm support was coming soon. Every few months they'd push back the date further, until they eventually just stopped responding at all. Trying to develop on their OpenGL drivers was a similar nightmare as soon as you wandered off the old well-worn paths toward more modern pipeline designs.
Another glaring example would be Blender's OpenCL version of Cycles, which was always marred with problems and hacks to work around driver issues. They tried to work with AMD for years before finally just dropping it and going for CUDA (and thus HIP) even though AMD's HIP support, especially on Windows, is still in a very early state.
They've been getting piles of money from Ryzen for 5-6 years now. How long am I supposed to wait?
According to the latest ROCm release notes, it supports Navi 21. Well, at least the pro models. It doesn't even mention the 5000 or 7000 cards. My current understanding is that 7000 support is mostly there a few months late and 5000 was abandoned partway done after years of vague promises.
At least it might support Windows soon. Not my sub-4-year-old GPU, of course, god forbid. But most of the rest of them.
AMD wasn't very profitable until 2018. The company's debt to equity ratio was terrible (due to previous CEO mistakes 2000-2012) until they paid off their huge debts with Ryzen 3 in ~2020. Be patient, grasshopper ..
> They've been getting piles of money from Ryzen for 5-6 years now
Hardware is very capital intensive. They hadn't been making much until much more recently: from 2012 through 2017, almost all years were a net loss, and they hit $1B net profit only in 2020. I imagine quite a bit of that money went into keeping up/accelerating the pace of Ryzen and paying off debts. Only now do they have more breathing room for other endeavors. Had they diverted a chunk of that change to AI, they would probably have a lower-performing Ryzen right now.
Nvidia didn't pull their AI leadership out of thin air overnight: they shipped the first CUDA-capable consumer cards with the GeForce 8 series way back in 2007 and committed to this ecosystem over the years, consistently investing in the hardware and the software.
By the time AMD woke up and shipped ROCm in 2016, Nvidia already had nearly a 10-year head start and a cemented moat in this field. AMD now has a huge mountain to climb to catch up to Nvidia.
AMD invested significantly into OpenCL prior to 2016. It seemed like a safe bet -- the open industry standard usually ends up beating the proprietary standard in the long run.
Especially for something like this, with massive open source / open standard companies like Google as heavy users. It seems surprising to me that Google didn't ensure that open standards won in an area that they are so heavily dependent on.
In the link provided, the CUDA example only shows the compute kernel itself and not the boilerplate required to run it. On the other hand, your OpenCL example only shows the boilerplate.
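For a fair comparison you'd want both halves of each. As a rough sketch (a hypothetical saxpy kernel, not the example from the link), here's what a complete minimal CUDA program looks like - the host-side boilerplate is a handful of calls, whereas the OpenCL equivalent would also need platform/device discovery, a context, a command queue, and a clBuildProgram step:

    // Hypothetical minimal CUDA program: the kernel plus all the host "boilerplate".
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        // Unified memory keeps the host setup short; the OpenCL version also needs
        // platform/device discovery, a context, a command queue, and clBuildProgram.
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);  // expect 4.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }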
Google is part of the Khronos group. They were well positioned to steer the standard towards one that doesn't suck. Or they could have championed a different standard. Google has the scale that only Google is to blame that they are still heavily dependent on a closed standard.
Open Standards almost always beat closed ones. IMO AMD was right to bet on open standards. They lost the bet but I think it was the right bet.
Because all the researchers that used GPUs for CV and ML used what they had at their disposal, which was Nvidia GPUs and CUDA.
OpenCL brought no advantage here considering it only worked on AMD GPUs, which were lackluster in performance, and switching from CUDA to OpenCL meant extra work that researchers already iterating on CUDA weren't willing to do.
OpenCL did (and still does I think) work on nvidia cards. People I talked to back in the day complained more about OpenCL being "C but on GPUs" while cuda was more akin to C++. They could move faster and do more in cuda and the nvidia lock in didn't matter as the fastest cards of the day were nvidia. I think vega cards were faster (or faster per dollar maybe) for some of the code that was relevant, but not by much and by that point legacy code lock in had taken over.
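To make that concrete, here is a minimal illustrative sketch (my own example, not from anyone in this thread) of the kind of thing CUDA's C++ allowed that classic OpenCL C, being a C99 dialect, could not express:

    // Illustrative sketch: device-side templates and functors, which CUDA ("C++ on
    // the GPU") supports and plain OpenCL C does not.
    #include <cuda_runtime.h>

    template <typename T>
    struct Scale {
        T factor;
        __device__ T operator()(T v) const { return factor * v; }
    };

    template <typename T, typename Op>
    __global__ void transform(const T* in, T* out, int n, Op op) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = op(in[i]);
    }

    // Usage (hypothetical): transform<<<blocks, 256>>>(d_in, d_out, n, Scale<float>{2.0f});
    // The same generic kernel is instantiated for float, double, half, etc. at compile time.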
CUDA is compiled into PTX, an intermediate language. PTX is then compiled into a specific NVidia assembly language (often called SASS, though the SASS for each generation of cards is different). This way, NVidia can make huge changes to the underlying assembly code from generation to generation but still have portability.
OpenCL, especially OpenCL 1.2 (the version of OpenCL that works on the widest set of cards), does not have an intermediate language. SPIR is an OpenCL 2.x concept.
This means that OpenCL 1.2 code is, in practice, distributed as source and recompiled on the user's machine. But that means compiler errors can kill your code before it even runs. This is especially annoying because the OpenCL 1.2 compiler is part of a device driver, meaning that if the end user updates the device driver, the compiler may have a new bug (or an old bug) that changes the behavior of your code.
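A rough sketch of the difference, with hypothetical file and kernel names: a CUDA app can ship PTX built once with "nvcc -ptx" and let whatever driver is installed JIT-compile it to the local card's SASS, while an OpenCL 1.2 app ships C source and rebuilds it with whatever compiler the user's driver happens to bundle.

    // Illustrative sketch (hypothetical file/kernel names). Ship PTX built once with
    // `nvcc -ptx saxpy.cu`; the installed driver JIT-compiles it to the local GPU's
    // SASS at load time, whatever generation that GPU is.
    #include <cuda.h>
    #include <fstream>
    #include <sstream>
    #include <string>

    int main() {
        std::ifstream f("saxpy.ptx");           // PTX shipped with the application
        std::stringstream ss;
        ss << f.rdbuf();
        std::string ptx = ss.str();

        cuInit(0);
        CUdevice dev;
        cuDeviceGet(&dev, 0);
        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);

        CUmodule mod;
        cuModuleLoadData(&mod, ptx.c_str());    // driver JITs PTX -> this card's SASS
        CUfunction kernel;
        // "saxpy" assumes the kernel was declared extern "C" so the name isn't mangled.
        cuModuleGetFunction(&kernel, mod, "saxpy");

        // ... allocate device buffers and call cuLaunchKernel(kernel, ...) as usual ...

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }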
-------------
This doesn't matter for DirectX because, like CUDA, Microsoft compiles HLSL into DXIL (the DirectX Intermediate Language), and then has device drivers compile that intermediate language into the final assembly code on a per-device basis.
-------------
It is this intermediate layer that AMD is missing, and IMO is the key to their problems in practice.
SPIR (OpenCL's standard intermediate layer) has spotty support across cards. I'm guessing NVidia knows that the PTX intermediate language is their golden goose and doesn't want to offer good SPIR support. Microsoft probably prefers people to use DirectX / DXIL as well. So that leaves AMD and Intel as the only groups who could possibly push SPIR and align together. SPIR is a good idea, but I'm not sure if the politics will allow it to happen.
It's really difficult to tell whether the PTX layer approach is something AMD _should_ adopt. That's roughly what the (I think now abandoned) HSAIL thing was.
It's one where packaging concerns and compiler dev concerns are probably in tension. Compiling for N different GPUs is really annoying for library distribution and probably a factor in the shortish list of officially supported ROCm cards.
However, translating between IRs is usually lossy, so LLVM to PTX to SASS makes me nervous as a pipeline. Intel is doing LLVM to SPIR-V to LLVM to machine code, which can't be ideal. Maybe that's a workaround for LLVM's IR being unstable, but equally, stability in an IR comes at a development cost.
I think amdgpu should use a single llvm IR representation for multiple hardware revisions and specialise in the backend. That doesn't solve binary stability hazards but would take the edge off the packaging challenge. That seems to be most of the win spirv markets at much lower engineering cost.
But as an OpenCL programmer, you don't distribute PTX intermediate code. You distribute OpenCL kernels around and recompile every time. That's more or less the practice.
And the resulting PTX is worse when it's generated from OpenCL C instead of CUDA C. I tested that recently with a toy FFT kernel and the CUDA pipeline produced a lot more efficient FMA instructions.
Nvidia took a big gamble with CUDA, and it took years and a ton of investment to get there. Jensen Huang talks about it in a commencement speech he gave in Taiwan recently: https://www.youtube.com/watch?v=oi89u6q0_AY It's a big moat to cross.
I always had hope that ROCm would be able to compete with CUDA, but it's nowhere near that despite all the time it's had. It seems funny that Intel is doing a better job of it with oneAPI.
This is my thought as well. Their devs who work on the graphics drivers are heavily underpaid in Canada. As an ex-AMD employee, I took AMD's offer at 20k less than another software company's because I like low-level stuff and have always been an AMD fan since the Athlon days. But when Amazon offered me double, I easily took Amazon's offer and left AMD after less than a year.
I think a lot of companies are overpaying their software people, but if there is any one that should pay their devs much more it would be AMD, because they are in a position to compete against Nvidia if their software integrated well with the AI training stuff.
Why do you think that people are being overpaid if you literally left a job you liked to make more money? Being underpaid is just being underpaid; just because the numbers are high at the big tech companies doesn't mean that those companies are overpaying. They're probably underpaying! The dollar just isn't worth what it used to be.
(Canada does chronically underpay its software engineers, though.)
Not going to argue whether we are being overpaid or not (I certainly hope we aren't because now I'm getting even crazier compensation than my Amazon days lol). But I think the current layoffs which are putting downward pressures on salaries will prove that we were getting overpaid.
My main point was still that AMD should really pay a lot more than what they currently are paying, they actually already increased it quite a bit compared to 3 years ago, but not nearly enough! In a way, I think this reflects poorly on Lisa Su because she didn't invest enough into AI while it should have been obvious from the start.
The AMD RX 580 was released in April 2017. AMD had already dropped ROCm/HIP support for it by 2021. They only supported the card for about four years. Four years. It's so lame it's bordering on fraudulent, even if not legally fraud.
I know CUDA (which they chase via HIP) is a moving target they don't control, making it hard and expensive for them to prevent bit rot, but this is still an AMD-caused problem, since OpenCL isn't getting any love from anyone anymore. AMD included.
Also, while AMD's OpenCL implementation has more features on paper, the runtime is frequently broken where NVIDIA's claimed features actually all work. Everything I've heard from people who've used it is that they ended up with so much vendor-specific code to patch around AMD's bugs and deficiencies that they might as well have just written CUDA in the first place.
This is an old article but the old "vendor B" stuff still rings incredibly true with at least AMD's OpenCL stack as well.
Thus NVIDIA actually has even less of a lock-in than people think. If you want to write a better OneAPI ecosystem and run it on OpenCL runtime... go hog wild! NVIDIA is best at that too! You just don't get the benefit of NVIDIA's engineers writing libraries for you.
I think Intel is still pushing opencl on GPUs. Maybe with other layers on top. Sycl or oneapi or similar. AMD mostly shares one implementation between hip and opencl so the base plumbing should work about as well on either, though I can believe the user experience is challenging.
I wrote some code that compiles as opencl and found it an intensely annoying experience. There's some C++ extension model to it now which might help but it was still missing function pointers last time I looked. My lasting impression was that I didn't want to write opencl code again.
Sticking to an old kernel and libraries is a pain if your hardware is not purpose-specific. Newer downstream dependencies change and become incompatible: e.g. TensorFlow 2 (IIRC) was incompatible with the ROCm versions that work with the 580. New models on places like HuggingFace tend to require recent libraries, so not moving to a new toolchain locks you into a state of the art a few years in the past. In my case, the benchmarking I did for my workloads showed comparable perf between my RX 580 and Google Colab. So I chose to upgrade my kernel and break ROCm.
Yeah, that's fair. Staying in the past doesn't work forever.
There are scars in the implementation which suggest the HSA model was really difficult to implement on the hardware at the time.
It doesn't look like old hardware gets explicitly disabled, the code that runs them is still there. However writing new things that only work on newer hardware seems likely, as does prioritising testing on the current gen. So in practice the older stuff is likely to rot unless someone takes an interest in fixing it.
The range of technology that needs to come together for AI training is underestimated. There is CUDA of course, but there is also NCCL, InfiniBand, and GPUDirect, each of which requires years of SW and HW maturity. Unlike the CPU, which has a clean interface (the instruction set), the GPU has no such thing - it is more like an octopus with tentacles into networking, compute, storage, etc.
> The trouble is that AMD just didn't take AI seriously.
No worries, AI is not very complicated tech. It's just a core that can do arithmetic (something AMD already knows how to do very well) copied a very large number of times on a chip, plus some interconnect.
CPUs with all their speculative execution and random memory access patterns are much more complicated.
AI is more than just the underlying math. The software ecosystem is very important, which is what NVIDIA's lead is built on. AMD has a very hard time providing an "it just works" type experience in the way that NVIDIA offers these days.
Machine learning engineers (or most people writing GPU code) do not typically have the time, knowledge or interest to diagnose driver issues and beg AMD engineers to address them in a reasonable time frame.
That would explain my Radeon graphics card, which I never managed to get working properly. It arbitrarily froze. I was told that it did that on Linux and that it was guaranteed to work on Windows. But when I tried it on Windows, it did the exact same thing. They were unresponsive.