
Ah, this is awesome! I currently run k3s on a decently specced NixOS rig. I tried getting k3s to recognize my Nvidia GPU, even following the short guide in nixpkgs for getting GPUs to work in k3s[0], but without success.

For now I’m just using Docker’s Nvidia container runtime for containers that need GPU acceleration.

Will likely spend more time digging into your findings, hoping it leads me to a solution for my setup!

[0] https://github.com/NixOS/nixpkgs/blob/master/pkgs/applicatio...




There's a bug in k8s-device-plugin that stops the plugin from even launching, as I mentioned in the article:

https://github.com/NVIDIA/k8s-device-plugin/issues/1182

I opened a PR to fix it here:

https://github.com/NVIDIA/k8s-device-plugin/pull/1183

I am unsure whether this bug is specific to NixOS, since its library paths and other quirks differ from those of the major Linux distros.

Another major problem was that the "default_runtime_name" in the Containerd config didn't work as expected. I had to create a RuntimeClass and assign it to the pod to make it pick up the Nvidia runtime.
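For anyone following along, here is a minimal sketch of that RuntimeClass workaround. It is not my exact config: the handler name `nvidia`, the pod name, and the placeholder image are assumptions for illustration, so match the handler to whatever runtime name your containerd config actually registers. The script just prints JSON manifests, which kubectl accepts the same way it accepts YAML.

```python
import json

# Assumption: "nvidia" is the runtime handler name registered in your
# containerd config. Adjust it to match your own setup.
HANDLER = "nvidia"

# Hypothetical placeholder image; substitute any image that ships nvidia-smi.
TEST_IMAGE = "example.invalid/cuda-base:latest"


def runtime_class(name, handler):
    """RuntimeClass that maps a cluster-level name to a containerd handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def gpu_test_pod(runtime_class_name, image):
    """Pod that explicitly selects the RuntimeClass instead of relying on
    default_runtime_name in the containerd config."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "gpu-test"},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    "command": ["nvidia-smi"],
                    # Resource name advertised by the NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


def build_manifests():
    """Bundle both objects into a single v1 List."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            runtime_class(HANDLER, HANDLER),
            gpu_test_pod(HANDLER, TEST_IMAGE),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_manifests(), indent=2))
```

Save the output to a file and feed it to `kubectl apply -f`. If the pod's `nvidia-smi` output lists your GPU, the runtime is being picked up through the RuntimeClass.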

Other than that, I haven't tried k3s; the cluster I am running is full-blown Kubernetes. I guess the two should be similar.

There's no guarantee, but if you find any hints showing why your Nvidia plugin won't work, I might be able to help, as I skipped some minor issues I encountered when writing the article. If they happen to be ones I faced, I can share how I solved them.


By the way, one problem I encountered but didn't mention in the article is that libnvidia-container has trouble with the paths it uses to read the Nvidia drivers and libraries under NixOS, with its non-POSIX paths. I had to create a patch to modify the paths. I just created a Gist with the patch content here:

https://gist.github.com/fangpenlin/1cc6e80b4a03f07b79412366b...

Later on, though, since I am taking the CDI route, it appears that libnvidia-container (nvidia-container-cli) is not really used. If you are going with just the container runtime approach instead of CDI, you may need a patch like this for the libnvidia-container package.


Oooo, thanks for the pointers! Will be revisiting this tomorrow!



