I swear I'm getting déjà vu right now; I coulda sworn I've seen this thread before. There's gonna be a guy commenting "you don't need it", somebody else saying "but I want it!", and a few trying to figure out the economics of it and whether or not it makes any sense.
Personally I'd love to have as much VRAM (and as much bandwidth) as possible to mess around with simulations in, but that's definitely a pro workload.
I'd love to see a flagship card offer a spec option with a stupid amount of VRAM, like an RTX 4090 with 32-48GB, just to see what happens with it on the market.
A friend just got an M3 Max with 128GB of (V)RAM and he's extremely happy with it for AI workloads. That could be an option if you can run your simulations on macOS.
That machine with 128GB is $5000; with 48GB it's still well over $3000, and it has about as much memory bandwidth as a $400 GPU. At current spot prices, 128GB of GDDR6 is <$400 and 48GB is <$150, implying they could be paired with any existing <$1000 GPU to produce something significantly faster for dramatically less money. If anyone could be bothered to make one.
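Back-of-envelope, if you want it spelled out (the per-GB figure is just what those two spot prices imply, roughly $3.1/GB; the $900 base-GPU price and 1.5x markup below are made-up assumptions, not quotes):

    # Rough cost sketch under assumed numbers, not real product pricing.
    # 128 GB < $400 and 48 GB < $150 both imply roughly $3.1/GB GDDR6.
    GDDR6_PER_GB = 400 / 128          # ~$3.1/GB spot price (assumed)

    def hypothetical_card_cost(base_gpu_price, vram_gb, margin=1.5):
        """Base GPU plus memory at spot price, times an assumed 1.5x markup."""
        return (base_gpu_price + vram_gb * GDDR6_PER_GB) * margin

    print(hypothetical_card_cost(900, 128))   # ~$1950 for a 128 GB card
    print(hypothetical_card_cost(900, 48))    # ~$1575 for a 48 GB card
    # vs. ~$5000 for the 128 GB M3 Max and $3000+ for the 48 GB config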
If all you want is VRAM, you can get old P40s with 24GB for $175, and six of them gets you 144GB for $1050. Then you need a big machine to put six of them in, but that doesn't cost $4000.
But all of these are kludges. The Radeon RX 7900 XTX has more than twice the memory bandwidth of the M3 Max and much better performance per watt than an array of P40s. What you want is that card with more VRAM, not any of this misery.
That checks out in principle, but given that the P40 doesn't support NVLink, I wouldn't count too much on using six of them together in a performant manner.
But yeah, the best option remains an MI300 if you can afford one.
Yeah, my M2 MacBook has 96GB @ 400GB/s. For $4k or so, it feels like cheating. Does it beat 4x24GB NVIDIA cards? Absolutely not! It's slower and occasionally runs into CUDA-moat software issues. But the capability to daily drive Mixtral 8x7B locally, with great token speeds, is phenomenal.
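Rough sketch of why the bandwidth number is the one that matters for single-user decoding; the ~13B active parameters and ~4.5 bit/weight quant are assumptions about Mixtral 8x7B for illustration, not measurements:

    # Memory-bandwidth-bound decode ceiling (assumed figures, not benchmarks).
    # Mixtral 8x7B activates roughly 13B parameters per token (2 of 8 experts),
    # and a ~4.5 bit/weight quant means ~0.56 bytes read per parameter per token.
    bandwidth_gb_s = 400              # M2 Max memory bandwidth
    active_params  = 13e9             # assumed active params per token
    bytes_per_w    = 4.5 / 8          # assumed quantization

    bytes_per_token = active_params * bytes_per_w        # ~7.3 GB read per token
    ceiling_tok_s   = bandwidth_gb_s * 1e9 / bytes_per_token
    print(f"~{ceiling_tok_s:.0f} tok/s ceiling")         # ~55 tok/s; real-world lands lower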
You overestimate the 'semi-pro' market for graphics cards. Gamers are barely willing to pay for 20GB. There's no market for consumer cards with an order of magnitude more RAM until games are built to use that memory.
I would personally love that project, but there are already so many versioning issues in the space that it would be a nightmare if ROCm randomly broke things all the time.
And we're talking about Intel here. AMD is going to price competitively against Nvidia, but they'd still rather you buy a $20,000 MI300 than a hypothetical 128GB Radeon for $2000.
Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.
Assuming 50 input tokens per second, you could still be waiting over ten minutes for prompt processing on a full 32k-token prompt.
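For the record, the arithmetic (32k context at the assumed 50 tok/s prefill rate):

    # Prompt-processing (prefill) wait time at an assumed 50 tok/s input rate
    context_tokens = 32 * 1024        # full 32k-token prompt
    prefill_tok_s  = 50               # assumed input-token throughput
    wait_s = context_tokens / prefill_tok_s
    print(f"{wait_s:.0f} s = {wait_s / 60:.1f} min")   # ~655 s, about 11 minutes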
What you are talking about is highly optimized inference using accelerators, batching, and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost, but if all you have is a small consumer-grade GPU you will be compute-limited at the extreme limits of your context window.
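Crude sketch of where that compute limit bites during prefill; the model shape, context length, and GPU throughput numbers below are all assumed round figures, not benchmarks of any particular card:

    # Crude prefill compute estimate: ~2 FLOPs per active parameter per token,
    # plus an attention term that grows with the square of the context length.
    # Model size, hidden dims, and GPU throughput are assumed round numbers.
    def prefill_seconds(active_params, ctx, layers, d_model, gpu_tflops):
        mlp_flops  = 2 * active_params * ctx
        attn_flops = 4 * layers * d_model * ctx ** 2   # rough attention term
        return (mlp_flops + attn_flops) / (gpu_tflops * 1e12)

    # e.g. a ~13B-active model at 32k context:
    print(prefill_seconds(13e9, 32 * 1024, 32, 4096, 40))   # ~35 s on a fast card
    print(prefill_seconds(13e9, 32 * 1024, 32, 4096, 10))   # ~140 s on a weak one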
But "basement ML" is a thing, the market of people who are interested in PC gaming but not to the point of being lifestyle gamers who throw every cent they can spare at that altar. The GPU they bought long before the pandemic is still running every game they throw at it, but they never completely stop eyeing the new stuff. Dipping their toes in ML, even if it's just getting through 80% of some stable diffusion setup tutorial, can be a very welcome excuse to upgrade their gaming. A card sold for gaming but with generously overprovisioned VRAM (ideally in the range of the lowest bin of the biggest or second-biggest chip I think) could match that market segment very well - and it would not only compete with other price points, it would actually increase the market by some buyers (those who would not upgrade without the "ML excuse").
It's possible to come up with many strategies, and different companies will. Why are you so sure that Nvidia's strategy is right for AMD or Intel, who need to offer differentiation to get over the CUDA moat?
48GB consumer cards (or 96GB pro cards) would sell like hotcakes if AMD/Intel dared to break the artificial VRAM segmentation status quo.