Building unorthodox deep learning GPU machines (kyleboddy.com)
100 points by simonpure on Feb 28, 2024 | 21 comments


Do note that, due to Nvidia's absurd EULAs, you cannot run an RTX 3090 in a data center for compute purposes (https://www.nvidia.com/content/DriverDownloads/licence.php?l...).

Should you care about the EULA? Not really, until you're a business of any significant scale.


That only applies if you update your drivers; you could stay on an older driver and its older license.


Nvidia is known to retaliate (by delaying future shipments) if you do stuff like this.


I run a basement compute server[^1]; what's Nvidia gonna do? Not let me buy their hella expensive H100s? At least now I get to learn ML skills without my failed experiments running up an exponentially scaling cloud bill.

[^1]: https://prayag.bhakar.org/apollo-ai-compute-cluster-for-the-...


I think I recognize the author of this from /r/localllama, where plenty of other people are building similar Frankenstein rigs. This post only mentions Intel setups, but AMD Epyc Milan- and Rome-based rigs are also very viable alternatives. They're a bit more expensive, but they offer much better perf/watt, and the incremental price increase after factoring in a lot of GPUs is fairly slim. With 7 PCIe bifurcators on a motherboard such as the ASRock ROMED8-2T and 14 risers, you can get up to 14 GPUs at PCIe 4.0 x8.
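
A quick way to sanity-check that each card on a bifurcated riser actually negotiated PCIe 4.0 x8 is to ask the driver. A minimal sketch, assuming a Linux box with the Nvidia driver installed and nvidia-smi on the PATH:

    # Query each GPU's currently negotiated PCIe generation and link width.
    # Assumes the Nvidia driver (and nvidia-smi) is installed; the query
    # fields below are standard nvidia-smi query fields.
    import subprocess

    def pcie_report():
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            idx, name, gen, width = [field.strip() for field in line.split(",")]
            print(f"GPU {idx} ({name}): PCIe gen {gen} x{width}")

    if __name__ == "__main__":
        pcie_report()

Note that idle cards often drop to a lower link generation to save power, so put a small load on the GPUs if the reported generation looks low.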


A bit sad that hobbyists have to resort to such measures just to tinker, not to mention the initial capital needed. We're all slaves to Nvidia's VRAM monopoly until AMD or Intel steps in and releases a competitive alternative with beefy VRAM.


Beefy VRAM is a start, but Nvidia's real moat is CUDA. If PyTorch runs on AMD's ROCm, or the Intel equivalent, as well as it runs on CUDA, then we'll see some real competition here.

Chris Lattner's Mojo programming language may present an alternative here, but it's still closed source.
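
For what it's worth, PyTorch's ROCm builds already expose AMD GPUs through the same torch.cuda API, so the question is less about code changes and more about how well those kernels actually perform. A minimal sketch of a backend-agnostic check:

    # Backend-agnostic device check: PyTorch's ROCm build reuses the torch.cuda
    # namespace, so the same script runs on an Nvidia (CUDA) or AMD (ROCm) box.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if device.type == "cuda":
        # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
        backend = "ROCm" if torch.version.hip else "CUDA"
        print(f"{backend} backend, GPU: {torch.cuda.get_device_name(0)}")
    else:
        print("no GPU backend available, falling back to CPU")

    # A quick matmul to confirm kernels actually dispatch to the device.
    x = torch.randn(4096, 4096, device=device)
    y = x @ x
    print(y.shape)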


How will Mojo solve what is essentially a GPU driver issue (ROCm vs CUDA)? AMD's track record of supporting GPU compute has been too poor to be trusted so far.


Unorthodox machines, and possibly not a load-bearing site, but luckily it's in an archive: https://archive.is/fARR4


Hopefully not hosted on Netlify.


It's insane that a 2020 gaming GPU fills such a niche for deep learning. That's ancient history in the world of graphics, and it's only that way because the market is straight up anticompetitive (and AMD is too dense to break out of it with a 48GB consumer card).


A 48GB consumer card won't change much if people are already passing over the much cheaper AMD 7900 XTX that has 24GB. Nvidia is winning because of the perceived CUDA-to-ROCm gap; that's all it comes down to. (And they're winning in the consumer space because of the gap between DLSS and FSR.)


I passed over the 7900 XTX because it's not worth the trouble over a 3090.

That calculus totally changes with 48GB. I would put up with A LOT of trouble for that.


The RTX 3090 is still one of the best gaming/consumer GPUs, beaten only by the 4080 and 4090. And the 5000 series might still be almost a year away.


And that it was considered highly overpriced at the time (for gaming) but is now seen as a bargain choice.


It wasn't available at MSRP for a long time because of its mining performance. The price dropped a lot once ETH went PoS and the cheaper, lower-end 40-series cards turned out to be nearly as fast in gaming.


It's not anticompetitive; AMD was just too late to the GPU compute punch and was deservedly punished by the market for it. Near-unlimited upside for world-changing technology is a positive part of the system, IMO.


My unorthodox contribution is an Intel Optane drive used as a swap drive to give my CPU-only box 96GB of memory to work with.
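
For reference, a rough sketch of checking the headroom that kind of setup gives you; the device name in the comment is hypothetical, so check lsblk before formatting anything:

    # Report RAM vs. swap so you can see the headroom an Optane swap device adds.
    # Linux only. One-time setup (run as root; /dev/nvme1n1 is a hypothetical
    # device name -- verify with lsblk before formatting!):
    #   mkswap /dev/nvme1n1 && swapon --priority 100 /dev/nvme1n1

    def meminfo_gib(key: str) -> float:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(key + ":"):
                    return int(line.split()[1]) / (1024 ** 2)  # kB -> GiB
        return 0.0

    ram = meminfo_gib("MemTotal")
    swap = meminfo_gib("SwapTotal")
    print(f"RAM {ram:.1f} GiB + swap {swap:.1f} GiB = {ram + swap:.1f} GiB effective")

Optane's low latency makes this far less painful than swapping to a NAND SSD, but it's still nowhere near RAM speed, so how well it works depends on your access patterns.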


Maybe they should build an unorthodox web server, which these days seems to mean one that's self-hosted, instead of trying to use a weak DigitalOcean instance.

I bet the machine mentioned in the article could host this site, even with its Xeons, and handle two orders of magnitude more traffic before the load even registered.


Website just gives me "Error establishing a database connection".

(Did he achieve AGI internally?)


HN hug of death



