Are you aware that cards with “LLM-level” VRAM (40-80 GB) cost substantially more, while the status quo for consumer cards hovers around 4-12 GB, topping out at 24 GB for the high end?
And this is exactly the way NVidia intends to keep it, methinks.
Give the consumers / gamers a consumer-priced GPU with a max of 16-24 GB VRAM for the high-end models. By consumer-priced, I mean $500-2000.
And make anyone interested in AI / ML / LLM / 3D / creatives pay $3000-10000 for GPUs that are similar in performance but have much higher VRAM.
Then top it off with six-figure (or higher) priced GPUs for the FAANG companies, which can afford them for their data centers and currently contribute the most revenue (and profit) to NVidia.
Your comment, pre-edit, had something of a severe tone given that consideration.
Having said that, I've trained/finetuned image models just fine on an RTX 2070 Super with 8 GB of VRAM. That was back when finetuning was more fruitful than simply waiting for a more robust base model to be trained in the first place. Given that better base models are now the status quo, I'm curious what sort of whole-network training you're doing that actually produces results noticeably better than few-shot prompting at inference time or LoRA finetuning? The latter brings you back into the realm of tuning on low-VRAM configs.
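To make the low-VRAM point concrete, here's a minimal sketch of what I mean by LoRA finetuning, using Hugging Face's peft library. The base model and hyperparameters are placeholders, not anything from this thread; the point is just that only the small adapter matrices are trainable, which is why this fits on an 8-16 GB consumer card:

```python
# LoRA sketch: base weights stay frozen, only low-rank adapters train.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "gpt2"  # placeholder; swap in whatever base model you're adapting
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],  # gpt2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total params
```

From here you train with a normal loop or Trainer; optimizer state only exists for the adapter parameters, which is where most of the VRAM savings come from.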
In general, a single GPU's memory constraint is only one of many when training a model _from scratch_. In that case you're bottlenecked by data and by data parallelism: you don't need one or a few GPUs, you need more than would fit in a consumer setup in the first place.
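For anyone unfamiliar with what "data parallelism" means here, a rough sketch assuming PyTorch DDP launched with torchrun: every process/GPU holds a full replica of the model and sees a different shard of the data, and gradients are averaged across processes each step. The model and dataset below are stand-ins; scaling from-scratch training is mostly about adding more of these processes, not fitting everything on one card.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(512, 10).cuda(rank)   # stand-in model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    data = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(data)            # shards the data per rank
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    for x, y in loader:
        x, y = x.cuda(rank), y.cuda(rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                           # DDP all-reduces gradients here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
```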