My impression is that a lot of the open-source action is around the just-about-runs-in-12GB region - lots of models coming out at 7B/13B with 4-bit quantisation, a few 70B models (which won't fit in 24GB anyway), and only limited stuff in between.
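A quick back-of-envelope sketch of why those sizes cluster where they do (my own numbers, weights only, ignoring KV cache and quantisation overhead like scales/zero-points):

    # Rough 4-bit weight sizes in GB (hypothetical helper, weights only).
    def weight_gb(params_billion, bits_per_param=4):
        return params_billion * bits_per_param / 8

    for size in (7, 13, 70):
        print(f"{size}B @ 4-bit ~ {weight_gb(size):.1f} GB")
    # 7B  ~ 3.5 GB  -> comfortable on a 12 GB card, with room for context
    # 13B ~ 6.5 GB  -> still fits in 12 GB
    # 70B ~ 35 GB   -> too big even for a 24 GB card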
I suppose I could be getting a biased impression though, as of course many more people are in a position to recommend the more accessible models.
What sort of things are you running that take full advantage of that 24GB?
Training - at least the method I tried - has to be run in fp16 mode. So a 7B net needs 14 GB for the model weights alone, plus some extra for the context and the stuff I don't really understand (some gradient values - oh, that makes sense now that I've written it).
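For what it's worth, the 14 GB figure is just 2 bytes per parameter in fp16; a rough sketch of the arithmetic (my own assumptions - how much "extra" you need depends heavily on whether you do full fine-tuning or something parameter-efficient like LoRA):

    # fp16 = 2 bytes per parameter
    params = 7e9
    weights_gb = params * 2 / 1e9   # ~14 GB for the weights alone
    grads_gb   = params * 2 / 1e9   # another ~14 GB if you keep full fp16 gradients
    print(weights_gb, grads_gb)
    # Full fine-tuning also adds optimizer state on top of this; adapter-style
    # methods only keep gradients/optimizer state for a small set of extra
    # weights, which is what makes a 24 GB card workable.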
This is only supported on the previous-generation NVidia 3090: it is apparently possible to combine two 3090s (24 GB VRAM each) and 'fuse' them with NVLink so they act as a single high-powered GPU with 48 GB of combined VRAM.
NVidia no longer supports this on the 40-series; I think that's because they want anyone interested in using their GPUs for LLMs to buy the pricier models with more VRAM.
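For what it's worth, you can check from PyTorch whether two cards can talk to each other directly; a small sketch (this only reports peer-to-peer access, which also works over PCIe, so it doesn't prove an NVLink bridge is present):

    import torch

    # Check direct GPU-to-GPU (peer) access between the first two cards.
    if torch.cuda.device_count() >= 2:
        print(torch.cuda.can_device_access_peer(0, 1))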
Theoretically you can use as many GPUs as you want in parallel. LLMs are easy to split and run in a model-parallel configuration (for big models that don't fit on one card), or data-parallel for performance, where the same model runs different batches on different GPUs. PyTorch has full support for both modes, afaik.
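A minimal sketch of the model-parallel idea in plain PyTorch (toy layer sizes and device ids made up; data parallel would instead wrap the whole model, e.g. with DistributedDataParallel, and feed each copy a different batch):

    import torch
    import torch.nn as nn

    # Naive model parallelism: first half of the net on one GPU, second half
    # on another, with activations moved between them in forward().
    class SplitModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(1024, 4096).to("cuda:0")
            self.part2 = nn.Linear(4096, 1024).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            return self.part2(x.to("cuda:1"))

    model = SplitModel()
    out = model(torch.randn(8, 1024))  # activations hop from cuda:0 to cuda:1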
With both you and GP, I would imagine the answer is that people tend to build models to the hardware that is available. If 12GB and 24GB are the thresholds people's hardware hits, you'll get "open-source action" at the 12GB and 24GB sizes, because people want to build things that run on the hardware they own.
(Which is of course how CUDA built its success more generally, vs the "you have to buy the $5k workstation card to get started" strategy from ROCm.)
More generally you'd call this optimization: targeting the hardware that's available. No sense releasing Crysis when everyone is running a Commodore 64, after all.
I actually have a 12GB card, which I purchased specifically for AI (24GB cards are too expensive for me). You're correct that 12GB is also a sweet spot in terms of what you get per dollar spent.