
I wonder if someone knowledgeable could comment on OneAPI vs Cuda. I feel like if Intel is going to be a serious competitor to Nvidia, both software and hardware are going to be equally important.



Apparently, Google, Qualcomm, Samsung, and ARM are rallying around oneAPI:

https://uxlfoundation.org/


I'm not familiar with the particulars of OneAPI, but porting is largely a matter of rewriting CUDA kernels in OneAPI, which is pretty trivial for the vast majority of small (<5 LoC) kernels. Unlike AMD, it looks like Intel is serious about dogfooding its own chips, and Intel has a much better reputation for driver quality.
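To give a sense of what that kind of rewrite looks like, here's a hand-written sketch of a trivial vector add in both, assuming a, b and out are USM device pointers (names and allocation scheme are purely illustrative, not from any particular codebase):

    // CUDA original (illustrative):
    //
    //   __global__ void vadd(const float* a, const float* b, float* out, int n) {
    //     int i = blockIdx.x * blockDim.x + threadIdx.x;
    //     if (i < n) out[i] = a[i] + b[i];
    //   }
    //
    // Rough SYCL equivalent, assuming a, b and out come from
    // sycl::malloc_device on the same queue:

    #include <sycl/sycl.hpp>

    void vadd(sycl::queue& q, const float* a, const float* b, float* out, int n) {
      q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        out[i[0]] = a[i[0]] + b[i[0]];
      }).wait();
    }

The grid/block arithmetic and the bounds check disappear because the SYCL range covers exactly n work-items; for kernels this small the translation really is mechanical.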


All the dev work at AMD is on our own hardware. Even things like the corporate laptops are Ryzen-based. The first-gen Ryzen laptop I got was terrible, but it wasn't Intel. We also do things like develop ROCm on the non-qualified cards and build our tools with our tools. It would be crazy not to.


Why isn't AMD part of the UXL Foundation? What does AMD gain from not working with other companies to make an open alternative to CUDA?

Please make SYCL a priority; cross-platform code would make AMD GPUs a viable alternative in the future.


Like opencl was an open alternative? Or HSA? Or HIP? Or openmp? Or spir-v? There are lots of GPU programming languages for amdgpu.

Opencl and hip compilers are in llvm trunk, just bring a runtime from GitHub. Openmp likewise though with much more of the runtime in trunk, just bring libhsa.so from GitHub or debian repos. All of it open source.
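For example, the OpenMP offload path is just standard directives on top of an upstream clang. A minimal sketch, assuming a clang built with the AMDGPU offload runtime and libhsa.so available (the arch and flags below are illustrative, not a recommendation):

    // Build sketch (flags illustrative):
    //   clang++ -O2 -fopenmp --offload-arch=gfx90a saxpy.cpp -o saxpy

    void saxpy(float a, const float* x, float* y, int n) {
      // map() clauses copy x to the device and copy y back when the region ends
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }

There's nothing vendor-specific in the source; the device libraries and libhsa.so only show up at build and run time.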

There's also a bunch of machine learning stuff. Pytorch and Triton, maybe others. And non-C++ languages, notably Fortran, but Julia and Mojo have mostly third party implementations as well.

I don't know what the UXL foundation is. I do know what sycl is, but aside from using code from intel I don't see what it brings over any of the other single source languages.

At some point sycl will probably be implemented on the llvm offload infra Johannes is currently deriving from the openmp runtime, maybe by intel or maybe by one of my colleagues, at which point I expect people to continue using cuda and complaining about amdgpu. It seems very clear to me that extra GPU languages aren't the solution to people buying everything from Nvidia.


Yes, that's why I qualified it as "serious" dogfooding. Of course you use your own hardware for your own development work, but it's clearly not enough given that showstopper driver issues are going unfixed for half a year.


Way more than half a year. The 7900XTX came out two years ago and still hits hardware resets with Stable Diffusion.


IMO dogfooding Gaudi would mean training a model on it (and the only way to "prove" it would be to release that model).


Only for CUDA kernels that happen to be C++; good luck with C, Fortran, and the PTX toolchains for Java, .NET, Haskell, Julia, Python JITs, ...

Although at least for Python JITs, Intel seems to also be doing something.

And then there is the graphical debugging experience for GPGPU on CUDA, which feels like doing CPU debugging.


Trivial??


That statement has two qualifications.


You have SYCLomatic to help.
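For anyone who hasn't seen it: SYCLomatic migrates CUDA sources to SYCL. Very roughly, host code like the commented-out CUDA below comes out as something along these lines (hand-written illustration of the shape of the result, not actual tool output):

    // CUDA host code (illustrative):
    //   float* d_a; cudaMalloc(&d_a, n * sizeof(float));
    //   cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
    //   vadd<<<blocks, threads>>>(d_a, d_b, d_out, n);

    #include <sycl/sycl.hpp>

    void run(const float* h_a, int n) {
      sycl::queue q;                                        // device queue
      float* d_a = sycl::malloc_device<float>(n, q);        // replaces cudaMalloc
      q.memcpy(d_a, h_a, n * sizeof(float)).wait();         // replaces cudaMemcpy
      // the <<<>>> launch becomes a q.parallel_for(...) submission
      sycl::free(d_a, q);
    }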


(reply to Zoomer from further down, moving up because I ended up writing a lot)

This experience is largely a misalignment between what AMD thinks their product is and what the Linux world thinks software is. My pet theory is that it's a holdover from the GPU being primarily a games-console product, as that's what kept the company alive through the recent dark times. There's money now, but some of the old practices are sticky.

In games dev, you ship an SDK. Speaking from personal experience here, as I was on the PlayStation dev tools team. That's a compiler, debugger, profiler, language runtimes, a bunch of math libs etc., all packaged together with a single version number for the whole thing. A games studio downloads that and uses it for the entire dev cycle of the game. They've noticed that compiler bugs move between releases, so each game is essentially dependent on the "characteristics" of that toolchain, and persuading them to gamble on a toolchain upgrade mid-cycle requires some feature they really badly want.

HPC has some things in common with this. You "module load rocm-5.2" or whatever and now your whole environment is that particular toolchain release. That's where the math libraries are and where the compiler is.

With that context, the internal testing process makes a lot of sense. At some point AMD picks a target OS. I think it's literally "LTS Ubuntu" or a RedHat release or similar. Something that is already available anyway. That gets installed on a lot of CI machines, test machines, developer machines. Most of the boxes I can ssh into have Ubuntu on them. The userspace details don't matter much but what this does do is fix the kernel version for a given release number. Possibly to one of two similar kernel versions. Then there's a multiple month dev and testing process, all on that kernel.

Testing involves some largish number of programs that customers care about. Whatever they're running on the clusters, or some AI things these days. It also involves a lot of performance testing, where things getting slower is a bug. The release team are very clear that nothing goes out the door if it's broken or slower, and it's not a fun time to have your commit from months ago pulled out of the bisection as the root cause. That as-shipped configuration - kernel 5.whatever, the driver you build yourself as opposed to the one that kernel shipped with, the ROCm userspace version 4.1 or so - taken together is pretty solid. It sometimes falls over in the field anyway when running applications that aren't in the internal testing set, but users of it don't seem anything like as cross as the HN crowd.

This pretty much gives you the discrepancy in user experience. If you've got a rocm release running on one of the HPC machines, or you've got a gaming SDK on a specific console version, things work fairly well and because it's a fixed point things that don't work can be patched around.

In contrast, you can take whatever Linux kernel you like and use the amdkfd driver in that, combined with whatever ROCm packages your distribution has bundled. Last I looked it was ROCm 5.2 in Debian, lightly patched. A colleague runs Arch, which I think is more recent. Gentoo will be different again. I don't know about the others. That kernel probably isn't on the magic list of versions hammered on under testing. The driver definitely isn't. The driver people work largely upstream, but the GitLab fork can be quite divergent from it, much like the ROCm llvm can be quite divergent from the upstream llvm.

So when you take the happy path on Linux and use whatever kernel you happen to have installed, that's a codebase that went through whatever testing the kernel project does on the driver and reflects the fraction of a kernel dev branch that was upstream at that point in time. Sometimes it's very stable, sometimes it's really not. I stubbornly refuse to use the binary release of ROCm and use whatever driver is in Debian testing, and occasionally have a bad time with stability as a result. But that's because I'm deliberately running a bleeding-edge dev build, because bugs I stumble across have a chance of me fixing them before users run into them.

I don't think people using apt-get install rocm necessarily know whether they're using a kernel that the userspace is expected to work with or a dev version of excitement, since they look the same. The documentation says to use the approved Linux release - some Ubuntu flavour with a specific version number - but doesn't draw much attention to the expected experience if you ignore that instruction.

This is strongly related to the "approved cards list" that HN also hates. It literally means the release testing passed on the cards in that list, and the release testing was not run on the other ones. So you're back into the YMMV region, along with people like me stubbornly running non-approved gaming hardware on non-approved kernels with a bunch of code I built from source using a different compiler to the one used for the production binaries.

None of this is remotely apparent to me from our documentation but it does follow pretty directly from the games dev / HPC design space.


Wow thank you for the insight. I appreciate you taking the time to write all of this out, and also for your stubbornness in testing!



