My impression is that a lot of the open-source action is around the just-about-runs-in-12GB region - lots of models coming out at 7B/13B with 4-bit quantisation, a few 70B models (which won't fit in 24GB anyway), and only limited stuff in between.
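A quick back-of-envelope sketch of why those sizes cluster where they do (my own numbers, weights only, ignoring KV cache and quantisation overhead like scales/zero-points):

    # Rough 4-bit weight sizes in GB (hypothetical helper, weights only).
    def weight_gb(params_billion, bits_per_param=4):
        return params_billion * bits_per_param / 8

    for size in (7, 13, 70):
        print(f"{size}B @ 4-bit ~ {weight_gb(size):.1f} GB")
    # 7B  ~ 3.5 GB  -> comfortable on a 12 GB card, with room for context
    # 13B ~ 6.5 GB  -> still fits in 12 GB
    # 70B ~ 35 GB   -> too big even for a 24 GB card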
I suppose I could be getting a biased impression though, as of course many more people are in a position to recommend the more accessible models.
What sort of things are you running that take full advantage of that 24GB?
Training - at least the method I tried - has to be run in fp16 mode. So a 7B net needs 14 GB for the model weights alone, plus some extra for the context and the stuff I don't really understand (some gradient values - oh, that makes sense now that I've written it).
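For what it's worth, the 14 GB figure is just 2 bytes per parameter in fp16; a rough sketch of the arithmetic (my own assumptions - how much "extra" you need depends heavily on whether you do full fine-tuning or something parameter-efficient like LoRA):

    # fp16 = 2 bytes per parameter
    params = 7e9
    weights_gb = params * 2 / 1e9   # ~14 GB for the weights alone
    grads_gb   = params * 2 / 1e9   # another ~14 GB if you keep full fp16 gradients
    print(weights_gb, grads_gb)
    # Full fine-tuning also adds optimizer state on top of this; adapter-style
    # methods only keep gradients/optimizer state for a small set of extra
    # weights, which is what makes a 24 GB card workable.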
This is only supported on the previous-generation NVidia 3090: it is apparently possible to combine two 3090s (24 GB VRAM each) and 'fuse' them with NVLink so they act as a single high-powered GPU with 48 GB of combined VRAM.
NVidia no longer supports this on the 40-series; I think that's because they want anyone interested in using their GPUs for LLMs to buy the pricier models with more VRAM.
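For what it's worth, you can check from PyTorch whether two cards can talk to each other directly; a small sketch (this only reports peer-to-peer access, which also works over PCIe, so it doesn't prove an NVLink bridge is present):

    import torch

    # Check direct GPU-to-GPU (peer) access between the first two cards.
    if torch.cuda.device_count() >= 2:
        print(torch.cuda.can_device_access_peer(0, 1))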
Theoretically you can use as many GPUs as you want in parallel. LLMs are easy to split and run in a model-parallel configuration (for big models that don't fit on one card), or data-parallel for performance, where the same model runs different batches on different GPUs. PyTorch has full support for both modes, afaik.
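A minimal sketch of the model-parallel idea in plain PyTorch (toy layer sizes and device ids made up; data parallel would instead wrap the whole model, e.g. with DistributedDataParallel, and feed each copy a different batch):

    import torch
    import torch.nn as nn

    # Naive model parallelism: first half of the net on one GPU, second half
    # on another, with activations moved between them in forward().
    class SplitModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(1024, 4096).to("cuda:0")
            self.part2 = nn.Linear(4096, 1024).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            return self.part2(x.to("cuda:1"))

    model = SplitModel()
    out = model(torch.randn(8, 1024))  # activations hop from cuda:0 to cuda:1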
With both you and GP, I would imagine the answer is that people tend to build models to the hardware that is available. If 12GB and 24GB are the thresholds people's hardware hits, you'll get "open-source action" at the 12GB and 24GB sizes, because people want to build things that run on the hardware they own.
(Which is of course how CUDA built its success more generally, vs the "you have to buy the $5k workstation card to get started" strategy from ROCm.)
More generally you'd call this optimization: targeting the hardware that's available. No sense releasing Crysis when everyone is running a Commodore 64, after all.
I actually have a 12GB card, which I purchased specifically for AI (24GB cards are too expensive for me). You're correct that 12GB is also a sweet spot in terms of what you get per dollar spent.