I am totally out of the loop here, but wasn't the main problem with Nouveau the lack of documentation from Nvidia? They had to reverse engineer things to get it working.
I get that some of the architectural choices no longer make sense, and starting from scratch will address those. But is the goal to have performance that is somewhat comparable to the proprietary drivers? Or just good enough to run the desktop environment with hardware acceleration?
The problem with Nouveau was that Nvidia hardware could not be reclocked (the cores were basically always in low power mode) without a signed firmware blob. That blob couldn't be legally distributed except by Nvidia, so the open source folks more or less gave up on a high performance driver.
Now, for newer hardware, Nvidia has changed some aspects of the firmware and allows redistribution. So it's feasible to make a good open source driver.
AMD and Intel also use different drivers for different hardware generations, since eventually things change so much that it's better to start clean.
With regards to reverse engineering, Mesa has a number of reverse engineered drivers. That isn't anything new.
the problem was that nouveau used to do this back when the firmware was easy to intercept from the driver (and could be extracted easily). the newer drivers use different methods of firmware upload and it's a real chore to do so now.
so nouveau gave up on it. they also expected nvidia to drop some firmware.
now that newer cards have GSP.bin firmware, which can be interfaced with easily, things are different. i would hazard a guess that it's similar to atombios from amd: you just call a function in GSP and it knows what registers to poke with the right values to achieve what you need.
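To make that idea concrete, here's a minimal sketch of the difference between a driver that pokes registers directly and one that asks the firmware to do it. Everything here is invented for illustration; none of these names, offsets, or calls reflect NVIDIA's actual GSP interface:

```rust
// Illustrative only: all names and register offsets are made up.

/// Old-school approach: the driver itself knows which registers to poke.
/// This is exactly the knowledge Nouveau had to reverse engineer.
fn set_core_clock_direct(mmio: &mut [u32], mhz: u32) {
    mmio[0x10] = 0x8000_0000 | mhz; // hypothetical clock register
    mmio[0x11] = 0x1;               // hypothetical "latch new value" register
}

/// Firmware-mediated approach: the driver sends a request and the firmware
/// decides which registers to poke, conceptually like an AtomBIOS table call.
fn set_core_clock_via_firmware(mhz: u32) -> Result<(), &'static str> {
    firmware_call("SET_CORE_CLOCK", mhz)
}

/// Stand-in for the message channel to the GSP; in reality this would be
/// an RPC over shared memory, not a plain function call.
fn firmware_call(cmd: &str, arg: u32) -> Result<(), &'static str> {
    println!("firmware handles {cmd}({arg})");
    Ok(())
}

fn main() {
    let mut fake_mmio = vec![0u32; 64];
    set_core_clock_direct(&mut fake_mmio, 1500);
    set_core_clock_via_firmware(1500).unwrap();
}
```

The point is that in the second model most of the hardware-specific knowledge lives in the firmware, so an open driver mainly needs to know the calling convention.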
My understanding is that nowadays most of the heavy lifting is done by magic going on in the firmware, so the actual driver is relatively simple and is open source: https://github.com/NVIDIA/open-gpu-kernel-modules
Any efforts by Nvidia are welcome, but that raises the question of why Red Hat are writing a new driver. Presumably there is some aspect of the OSS component that is dissatisfying Red Hat.
It is weird for a 3rd party to be maintaining a 2nd driver when the first party has a reasonable OSS driver.
There are a few things someone would pay Red Hat dearly to do. In my opinion the most likely are:
- No telemetry.
- Enabling software blocked features.
- Emulation.
There are a few possibilities when it comes to making a performant OSS driver:
- It's impracticable, because the software you want to run on the GPU is so complex, and the hardware is so complex, that you will never get enough insight to make something that compares with what the people who can see it all can build.
- It's eminently practicable, because either (1) the software and the GPU hardware are much simpler than they look (perhaps there is a lot of obfuscation of the simplicity), or (2) the application that best utilizes the hardware only needs a limited feature set that is within scope.
I'm leaning toward "application limited scope" and "telemetry." That aligns best with what is actually happening, which is that NVIDIA is scooping up a lot of valuable intelligence on LLM workloads, and that there isn't enough competition for LLM "ASICs" to make them cheap enough to be worthwhile.
I never understood all the hate we in the OSS world have for telemetry. Why? I mean, if Nvidia and Red Hat paid for this to be developed, wouldn't it help them continue to develop it by knowing how, where, and how often it was used?
Apple, which collects a lot of telemetry, made privacy a tenet of their brand, and they've successfully conflated the two in the mind of a layperson.
Separately, most telemetry would tell stories like, "This project is a failure." Little incentive for people to adopt it. Then again, most OSS is ordered by the mania of its programmer-creators, not product or engineering quality informed by telemetry. Maybe in a Darwinian way, we only have the OSS that can thrive without telemetry and reactive product and engineering decisions.
Debian, RedHat and Firefox do have telemetry. Not a lot, but enough to prove it is possible to do it in a way that doesn't piss most people off.
Naturally the way they collect it is open source, it's largely de-identified, and because it's open source you can verify it's de-identified enough for you. And if it isn't, you can turn it off.
So it's possible to do it well, where "well" means it gets you the data you need without pissing off the users. Most proprietary vendors don't bother to do it well, for whatever reason.
But they should be careful: it's a big factor driving people toward things like Home Assistant, the Linux desktop, and now this, apparently.
Because it's never just for engineering purposes anymore. Marketing departments are CRAZY about consumer data of any kind and you can be assured the value of such collected data will be maximized on the information exchange market. Advertisers pay good money for trustable data streams, which can be essentially de-anonymized with enough cross correlation between sources. A GPU happens to be a good place to capture user activity, since from the screen buffer you can tell what website they visit, what videos they watch and what games they play.
This stuff is kept rather hush-hush so as not to scare people, but it's also well documented and certainly not an unfounded conspiracy. It's also why Europeans have adopted far-reaching privacy laws (GDPR): their industries don't rely on consumer surveillance the way American industries have come to over the last few decades.
The telemetry aspect is easily resolved by any enterprise that cares enough about it.
The conclusion that this is more to do with the overall architecture and deployment of the existing driver is much more plausible.
> The telemetry aspect is easily resolved by any enterprise that cares enough about it.
Yeah, this is how, by writing your own driver. NVIDIA sells you turnkey DGX machines. It doesn't give you firmware. You have to be Internet-connected to refresh your various licenses, at some point, which is the moment the telemetry is shared. Google "NVIDIA telemetry."
If you are using NVIDIA on the cloud, well all bets are off. You are using their drivers. Amazon can't force you, in your VM, to install a different driver for the GPU you are using - there's no alternative to the proprietary one. Hence, my theory for why Red Hat could be paid to do this.
Telemetry is something that NVIDIA doesn't budge on for enterprises. You're welcome to see for yourself and start a sales call. I hear they're pretty busy.
> but that raises the question of why Red Hat are writing a new driver
Nvidia's driver cannot be included in the upstream Linux kernel: it doesn't follow kernel coding style and code organization, but more importantly it is tightly coupled to a single version of their GSP firmware - the two have to be updated at the same time.
Shader compilation happens on CPU, and is neither simple nor (for the nvidia compiler) open source.
GPU drivers are significantly more complicated than any other drivers because they do things that are not considered part of a driver for any other device; the proprietary nV driver contains roughly as much code as the Linux kernel.
I'm not sure if it's different for Nvidia, but the Gamers Nexus video about Intel drivers suggests a lot of the work of getting good performance is done on the driver side for all GPU vendors: https://www.youtube.com/watch?v=Qp3BGu3vixk
Well, concerning memory safety, that might be a more important feature than you think. Allow me to quote a comment I just got an email notification for on a GNOME thread about NVIDIA:
> I wish I could be more optimistic about mutter!3304 coming soon, though... it's been 5 months already, and it doesn't seem anywhere closer to being merged into main. To make matters worse, patched mutter 45.5 seems to be causing a use-after-free in NVIDIA's kernel driver.
That argument is questionable. C is trivial. The PC architecture is not, and the Linux architecture absolutely is not. What makes kernel programming hard is that there is a lot of knowledge that must be internalized: exactly how DMA works for certain hardware, or why this specific bit needs to be set under that lock. There's often no simple way of debugging or single-stepping these things without simulating the hardware.
The kernel could have been written in assembly with macros, it wouldn't make its development any harder. The best Rust can do is not make kernel more difficult than it already is.
Linus himself made that point a few months ago (https://youtu.be/YyRVOGxRKLg). If you take a look at how devoid large, important, old OSS projects are of young talent in their 20s, 30s and sometimes even 40s, it's looking incredibly bleak.
Rust is technically an improvement over memory-unsafe languages, but it has also created enthusiasm among largely young and proficient coders. The ecosystem has a lot of dynamism. Drawing those people in is one of the most important things, if not the single most important thing, for the health of these projects going forward, given how many leaders are close to retirement.
Not sure I agree with that. C is a very easy language to learn. The problem with it is that you have to be careful with memory management. That does take effort to learn, and I still make mistakes after writing C for over 20 years.
I love Rust, but it is a very complex language, with a complex, full-featured stdlib, and rich, sometimes-inscrutable type system. It is more difficult to learn than C, but also more difficult (and sometimes impossible) to make many of the mistakes that you can make in C.
Rust isn't that hard to learn, and many of its users come from managed languages and never dared to learn C because "it's too hard".
C looks easy on the surface, but the syntax is pretty dated and full of footguns (yes, even just the syntax, not even talking about UB), and learning the language is a pretty intimidating experience because every time you think you know something, you actually don't and get bitten later.
Rust on the other hand is a good language for CS students: you have a lot of things to assimilate upfront, but by the time you've reached the level required to pass the class, you're actually ready to use it in production, and the resulting code produced by a sophomore will be more stable than C code written by wizards.
I know of two old syntactic footguns (assignment vs equality comparison; precedence of bitwise operators) and a single new one no one cares about (sizeof(int)+1 vs sizeof(int){+1}). The rest of the common bugbears look to be caused solely by bad pedagogy (the declaration syntax is not TYPE NAME; the switch statement is a computed goto not a multiarm conditional; there is no separate struct declaration). What am I missing?
Yep. I like to compare C to Brainfuck. Brainfuck is an even smaller, simpler language than C. You can learn Brainfuck in 10 minutes or less. If a language being easy to learn implies that it's easy to use productively, then Brainfuck ought to be one of the most productive languages on the planet! Of course in practice it's so small & so simple & provides so little to the programmer that it's nearly impossibly difficult to use for non-trivial programs.
The ease of writing software in a given programming language is not a linear function of the complexity of the programming language.
Very simple languages are very difficult to use. Very complex languages are very difficult to use. Languages of intermediate complexity tend to be much easier to use than those at the extremes. C is more towards the "simple" extreme than the ideal, Rust is (IMO) a bit more towards the "complex" extreme than the ideal, but is closer to the ideal than C.
I do somewhat doubt that. `no_std` Rust is still quite different from normal, userspace Rust. You don't get all the fancy libraries or whatever (of course Linus would probably just veto them anyway) and the Linux kernel development model is probably quite different (IIRC no cargo and whatnot).
So… what should we reckon? Would there be a difference in getting new developers into `no_std` Rust in the kernel, and how different would that be versus having people learn freestanding C, with all the kernel add-ons and knickknacks?
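To make the gap concrete, here's a minimal sketch of what `no_std` library code looks like; purely illustrative, not actual kernel code:

```rust
// A tiny no_std library: no heap, no std collections, only what `core` provides.
#![no_std]

/// Compute a simple checksum over a byte slice using only `core`.
/// In userspace Rust you might pull in third-party crates and `std` helpers;
/// here you work with what `core` gives you.
pub fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

// No `println!`, no `std::fs`, and no allocation unless you bring in the
// `alloc` crate with an allocator; panics are handled by whoever links the
// final binary (in-tree, the kernel provides its own panic handling).
```

So the language is the same, but the day-to-day experience is closer to freestanding C than to writing a cargo-managed userspace application.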
Systems programming is always a full step sideways from traditional applications and I do see where you are coming from.
I would still reckon that familiarity with standard Rust (even with std) will leave more programmers willing to make the leap than having to learn C for this one project.
I know of fewer than a handful of C projects started in 2024; I know there are a bunch in Rust, even with no_std.
Yeah it’s a weird take. The hard part about learning rust is borrow semantics and understanding the kinds of architectures the compiler will and won’t allow.
Well maybe it's just me, but the borrow semantics were never all that bad. Of course there were things that needed to be relearnt, but it's not all that bad all things considered.
Coming from C, the borrow semantics are basically what I try to get anyway, violating them only with suspicion and great care. Rust just makes it easy to check that I didn't screw up what I was already trying to do.
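A small sketch of what that looks like in practice; the error in the comment is what rustc reports for this pattern:

```rust
// The borrow checker enforcing the discipline a careful C programmer
// already tries to follow by hand.
fn main() {
    let mut buf = vec![1u8, 2, 3];

    // A shared borrow, like holding a pointer into the buffer in C.
    let first = &buf[0];

    // In C you'd have to *remember* not to reallocate or free the buffer
    // while that pointer is live. In Rust the compiler remembers for you:
    // buf.push(4); // error[E0502]: cannot borrow `buf` as mutable
    //              // because it is also borrowed as immutable

    println!("first element: {first}");

    // Once the borrow ends, mutation is fine again.
    buf.push(4);
    println!("len is now {}", buf.len());
}
```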
20 years is not a very long time, and people are still learning c. My 17 year old intern learned c in college and prefers it over 'modern crap' (his words, but he wasn't talking about rust; he meant JavaScript). We do embedded (low-powered MCUs with a few KB of memory) and I too prefer c over rust for it; it's a lot easier to attract c than rust people, in my experience, for this work. Most have never touched rust and wouldn't see a reason for it. Many of them are young.
> My 17 year old intern learned c in college and prefers it over ‘modern crap’
I had a lot of opinions like this too when I was 17, but age and experience has disabused me of many of them.
The embedded story on Rust still has many rough edges, but it's improving every day, and I could easily see it replace C in many places, given enough time (I wouldn't be too surprised if we eventually see companies distributing a BSP that's written in Rust).
Frankly, at this point in time, I think it's foolish to start a new project in C unless you have a really good reason to do so. Many embedded systems certainly qualify as a really good reason, but I very much hope that reason diminishes over time.
> I had a lot of opinions like this too when I was 17, but age and experience has disabused me of many of them.
Sure, I didn't say I agree with them, I barely remember when I was 17, it was that long ago.
I'm saying that the young people I meet are more interested in c, so the 'in 20 years no one knows c' fear is not exactly true.
We try Rust now and then; it's not worth it yet, in my opinion, for what we do. The tooling and libs we have for c are vast and, like I said, c people are really easy to get; Rust people, not so much.
I hope this changes, but for now it's just too much of a struggle to warrant it. And I was only responding to the fear of not having capable c devs in 20 years. There will be plenty.
That's a very optimistic view on the future of Rust, and a very pessimistic view on the future of C. Chances are that C will be longer around than Rust just because of the Lindy Effect ;)
Kernel C isn't userspace C. The bar for kernel C is much higher and less forgiving than for userspace C.
I can't speak to 'telco grade C', because I have never experienced it. I have, however, seen telco-submitted code in the upstream kernel, and it's not above average quality.
Honestly? Given I've seen crashes and printk messages from AMDGPU with words like "General Protection Fault," I'd say memory safety is probably the most important thing missing in these GPU drivers.
A coworker of mine formerly worked on Nvidia's proprietary drivers. They remarked on just how wrong the Nouveau driver was about certain things, including the functionality and reasons for certain registers.
Around a decade ago, I was introduced to a couple people who it turned out worked at nvidia. After a bunch of conversation, nouveau came up; they were intensely negative about it, mocking it and its developers, criticizing its existence, even, and declaring it would never amount to anything. (Meanwhile, I was running it on a computer at home and it was significantly more stable than the nvidia proprietary driver at the time, and performed well enough for my use.)
I wish I'd had more conversational courage at the time to "stand up" for nouveau with them; it was super rude to shit on someone else's hard work, especially when the reason why they had to do that hard work was because nvidia was a bunch of jerks with a shitty FOSS policy.
That's completely fucked up and has soured my opinion of nvidia and the people involved with it more than anything they've ever officially done as a company.
NVIDIA started publishing more of their driver as open-source in 2022, which while not hardware documentation probably helps a lot.
You may be wondering why Red Hat is bothering with this effort, then. I assume it's so that the code can be added to Linux directly, as opposed to being out of tree.
They're still reverse engineering. The hardware is so good that people would rather use an Nvidia GPU with reverse engineered drivers than AMD with open documentation and official open source drivers. Same with Apple and Asahi.
I was hoping they officially partnered with NVIDIA in that effort, perhaps eventually even replacing the proprietary one on Linux. But then I awoke from that dream, it's less exciting. Still cool though.
Unfortunately Linux Desktop still seems to be too irrelevant for NVIDIA. All current drivers have issues with suspend, multiple displays and Wayland among other things.
It probably makes it even worse: it's the enterprise/data center cards for ML that are making huge profits and where the growth is, and those don't even have a video out.
The kernel driver manages things like display setup, memory management (isolation between processes), and funneling of commands from user space to the hardware. This is more a replacement for Nouveau than NVK.
NVK generates the commands to send to the hardware, by converting vulkan APIs to Nvidia-specific instructions, and feeds them to the kernel driver.
This new driver (along with the modules Nvidia have released as OSS [1]) are kernel drivers and do not break userspace. NVK is a Vulkan implementation for Nvidia GPUs as part of the Mesa project (the same project that AMD's userspace drivers are a part of, even though AMDs kernel drivers are in tree). They are both needed to make a cohesive driver.
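For a rough picture of that split, here's a purely illustrative sketch; every name is invented and bears no relation to the real NVK or kernel interfaces (which actually talk over DRM ioctls, not function calls):

```rust
// Userspace (the NVK role): translate API-level requests into
// hardware-specific commands.
struct CommandBuffer {
    words: Vec<u32>, // encoded GPU commands
}

fn record_draw(vertex_count: u32) -> CommandBuffer {
    CommandBuffer {
        words: vec![0xD3A0_0001, vertex_count], // hypothetical opcode + argument
    }
}

// Kernel (the Nouveau/Nova role): own the hardware, manage memory and
// isolation, sanity-check the submission, and queue it to the GPU.
fn kernel_submit(cmds: &CommandBuffer) -> Result<(), &'static str> {
    if cmds.words.is_empty() {
        return Err("empty submission");
    }
    // ... map buffers, set up the GPU context, ring the doorbell ...
    Ok(())
}

fn main() {
    let cmds = record_draw(3);
    kernel_submit(&cmds).expect("submission failed");
    println!("submitted {} command words", cmds.words.len());
}
```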
I don't get it - they want to sell hardware. Microsoft v Linux v MacOS aside, why don't they care at all? I can understand Linux might get a little ... fickle. But why weigh into this? Put all your energy into drivers for all, and sell more product.
>and without the Nouveau baggage that's built up over the years in supporting NVIDIA GPUs going back to its early days.
Not supporting most Nvidia hardware is not a good thing. I hope its adoption by commercial users (home users generally don't have the latest Nvidia GPU) doesn't mean Nouveau bitrots from lack of companies with deep pockets paying for dev work. As for the inherent memory safety, well, remember how safe memory-safe Java was?
Java the language was memory safe. The runtime and libraries weren't all memory safe. The JITs can break memory safety. Applets had integration issues. So what hurt Java were the opportunities to go around memory safety instead of attacking the memory-safe code itself.
Memory-safe systems languages (e.g. Ada, Rust) address this by keeping as much as possible memory safe, with either little or zero runtime. Rust makes things safe that weren't before. It has unsafe, which is used less than in some other languages. The type checker can also catch many logic-level errors. The safety situation is better with Rust code than with Java code.
That doesn’t mean it will solve NVIDIA users’ problems, though. They are usually worried about compatibility, performance, and reliability. Rust mostly helps in one area. We’ll see about the rest, especially compatibility, as you said.
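As a small illustration of how Rust keeps unsafe contained (a generic sketch, nothing to do with any particular driver): the unsafe part is a few lines inside a function that exposes a safe, checked interface, so callers can't misuse it.

```rust
/// Returns the first element of a slice without a bounds check.
/// Safe to call, because the emptiness check happens before the unsafe block.
fn first_or_zero(data: &[u8]) -> u8 {
    if data.is_empty() {
        return 0;
    }
    // SAFETY: we just verified the slice is non-empty, so index 0 is valid.
    unsafe { *data.get_unchecked(0) }
}

fn main() {
    assert_eq!(first_or_zero(&[7, 8, 9]), 7);
    assert_eq!(first_or_zero(&[]), 0);
    println!("ok");
}
```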
Java is forever stuck in my head as a bad language because its UI library performance was so bad. I grew up being annoyed with anything written in Java because of its poor performance on Mac OS X. Notable was the persistent UI lag in anything written in Java.
Java was crippled by a terrible standard UI library and a failure to recognize the opportunity for a natively-compiled systems language other than C++.
JetBrains is doing quite well with that terrible UI library, which I would take any day over Electron crap, and AOT compilers have been available since 2000, even if only in commercial JDKs, which naturally did not do well in communities that won't pay for their work tools.
I think the GP is talking about AWT, which is indeed terrible. JetBrains uses Swing, which is... still not really the best, but IIRC they've done a lot of work to write their own abstractions and helpers and whatnot on top of it to make it more bearable.
Swing is quite powerful; the only downsides are that it requires reading books like "Filthy Rich Clients: Developing Animated and Graphical Effects for Desktop Java Applications" to actually understand how to take advantage of its features, and the bad decision not to use the platform's L&F by default.
> JetBrains uses Swing, which is... still not really the best, but IIRC they've done a lot of work to write their own abstractions and helpers and whatnot on top of it to make it more bearable.
Swing feels pretty okay to me, at least in the times I've used it, especially when IDEs have GUI builders when you just want to do RAD.
I do wonder whatever happened to JavaFX, there was some hype around it years back but it doesn't seem like it got super widespread adoption: https://openjfx.io/
JavaFX was born as a scripting language[0], which most devs didn't like, and then Sun started the process to port it to Java.
In the middle of this, Sun ran into trouble and was acquired by Oracle, and while Oracle took JavaFX development further, they didn't see any value in adding yet another GUI framework to Java, and handed it over to the community as open source around the time Java 11 was released.
A company, Gluon, took over it [1], making its business case the use of JavaFX to also target mobile OSes [3]. They also took over the JavaFX GUI designer [2].
That was mostly ignored: Java's strong points are on the server, Android is its own story with its own frameworks, and so Swing was good enough for the market of desktop applications written in Java.
Additionally, it didn't help that JavaFX is an extra dependency, with binary libraries, which complicates the deployment story.
However, Oracle has recently decided to be a bit more supportive of JavaFX; whether it still matters remains to be seen.
Actually, AWT wasn't awful. It was certainly more performant than Swing; in fact, Swing was the main thing that had me avoiding UI work and eventually dropping Java in favour of .NET.
I thought jetbrains used their own UI layer for their IDE, or am I confusing it with Eclipse there?
indeed, the new official nvidia drivers already don't support the aging Nvidia card I have in my desktop at home (the desktop is newer than the video card; I moved the card over from an older computer).
I had this problem at $JOB-1 where I had a big fat Dell desktop workstation with an nVidia card in it. I don't remember the model but it was a "pro" grade device: Fire or Quadro or some such BS marketing name.
There were lots of old LCD monitors around the office. As soon as I could, I went dual screen. Then I decided to go triple-head.
Problem.
I found a 2nd card, another nVidia, but it was a different GPU generation. The same Nouveau driver can run it, but not the nVidia driver, and you can't have two nVidia drivers installed side-by-side.
I tried multiple nVidia cards from the office spares pile but I couldn't find two the same generation of GPU for ages. When I finally did, the next kernel upgrade nuked the nVidia legacy driver -- it only supports a certain range of kernel versions.
In the end, a colleague took pity and lent me one of his own cards, an old gamer's card with 4 outputs, 3 usable at once. Perfect.
But when he wanted it back, to sell it, it was back into driver hell.
In the end, I managed to get a huge fat double-slot AMD card from the IT department, with 4 outputs, and it Just Worked™ with a FOSS driver.
NVidia driver versions are a massive cluster-fsck and perfectly good working cards are now e-waste because nVidia doesn't maintain its proprietary drivers, doesn't support more than a handful of GPU families in each release, and won't let you install >1 driver at a time.
I am not a gamer. I don't give a rat's ass about 3D performance or CUDA or any of those toys. I just want a shedload of pixels in front of me, updated quickly, with some screens in portrait and some in landscape. (And a desktop that properly supports vertical panels so I can use those screens in any orientation I want with effective use of space.)
Looks like the oldest card nouveau supports is the Riva TNT from 1998. I think it's fine for nouveau to support that and a new driver not to. I didn't get the impression that nouveau is going away any time soon.
When's the last time you compiled the kernel? I bet waiting for Rust drivers to compile on a modern CPU wouldn't take much longer than the last time I compiled the kernel by hand on a Core 2 Duo, circa 2013.
>When's the last time you have compiled the kernel?
2010. I needed to do so for my Linux Kernel university class.
It took 8 hours on this piece of junk Pentium 4 I found in the school's e-waste bin. You know, those black Dell OptiPlex machines that were everywhere like a plague.
They were days of such stress but also so much bliss.
In 2005, I compiled the Linux kernel in 15 minutes on an AMD Duron 700. True, that was a kernel heavily tailored to my architecture, but I think a typical kernel with all modules could take an hour or so.
Not sure how much can really be done wrong; the worst case would probably be an allyesconfig, which is huge, and while it was still big in 2010, it was considerably less so than now.
1. "Nobody compiles a kernel these days". Sure, maybe not for daily use for most, but lots of people do for work and experimentation. Also, Gentoo.
2. "Everybody runs a modern CPU". Plenty of people still run decade-old hardware, and even older, because it works; in that decade the objectives of a PC have not significantly changed, so why would the hardware?
I honestly do not think the Linux Kernel team should worry too much about the very small niche of people that insist on compiling on slow CPUs. Cross compiling exists, and 99% of Linux users run a precompiled vmlinuz binary.
On the other hand, the Rust language team should do their utmost to improve the speed of their compiler.
Sigh. So yet another rewrite, which will result in yet another round of regressions. Hopefully, at the end of it we will have something better than yet another unofficial driver with partial functionality that lags behind in support for newer hardware.
It's a real shame that many things in Linux land are so badly designed/maintained that they have to be re-written from scratch every few years. The major exception being the base kernel itself I suppose.
Which regressions for Nvidia hardware are you talking about?
Nouveau supports seriously ancient Nvidia hardware. It's perfectly reasonable to make a clean break once in a while (both AMD and Intel have done it in their open source drivers).
Even though the newest nvidia drivers have started to support GBM, the Wayland compatibility story is still not great. In my experience on Wayland, several OpenGL programs refuse to work, and Vulkan does not work at all. This is with driver 545.
Ok long story, I'll try to keep it short.
New motherboard, Asus w680 ace ipmi. It has an extra ipmi card.
nvidia-drm.modeset=1 is required for Wayland to render at anything other than 1920×1080.
I have a 4k monitor.
I insert the IPMI card and disable the ASPEED GPU on the card via jumper.
The card is still active. But the BMC isn't working, correct cabling and all.
The ASPEED GPU being active clashes with the Nvidia 4070 card.
I have to disable modeset.
Effectively I can't do Wayland with the IPMI card plugged in.
Bonus fun: I ordered the motherboard via Amazon DE, they imported it from Amazon US, because with import taxes it's over 150€ cheaper than directly from DE.
I contact Asus support (which was hard enough; I can't talk to Asus US support, only DE), and waste about a month doing pointless things, going past Amazon's instant-return window.
They tell me to contact ASUS RMA.
I can only get in touch with ASUS DE RMA.
They tell me to talk to Amazon, because I don't have an invoice, Amazon refuses to give me one because it's an imported product.
I contact Amazon 2 times.
Both times some Indian support people promise me the heavens; I have to tell them, "I can't send the motherboard back, I'm using it and about 2400€ worth of components on it; I use the motherboard for work".
They say, no problem we'll send you a replacement in 3 days.
2 months have now passed, no replacement in sight.
amd uses gbm properly; nvidia's support is half-cooked, and they previously used eglstreams instead because they did not like gbm. the reality is that every compositor just uses gbm.
I've used the prop-drivers for a couple of years now with absolutely zero issues so I'd say so. But I guess your use case might vary. I don't use Wayland.