I swear I'm getting déjà vu right now; I coulda sworn I've seen this thread before. There's gonna be a guy commenting "you don't need it", somebody else saying "but I want it!", and a few trying to figure out the economics of it and whether or not it makes any sense.
Personally I'd love to have as much VRAM (and as much bandwidth) as possible to mess around with simulations in, but that's definitely a pro workload.
I'd love to see a flagship card offer a spec option with a stupid amount of VRAM, like an RTX 4090 with 32-48GB, just to see what happens with it on the market.
A friend just got an M3 Max with 128GB of (V)RAM and he's extremely happy with it for AI workloads. That could be an option if you can run your simulations on macOS.
That machine with 128GB is $5000; with 48GB it's still well over $3000, and it has about as much memory bandwidth as a $400 GPU. At current spot prices, 128GB of GDDR6 is <$400 and 48GB is <$150, implying they could be paired with any existing <$1000 GPU to produce something significantly faster for dramatically less money. If anyone could be bothered to make one.
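Back-of-envelope, if you want it spelled out (the per-GB figure is just what those two spot prices imply, roughly $3.1/GB; the $900 base-GPU price and 1.5x markup below are made-up assumptions, not quotes):

    # Rough cost sketch under assumed numbers, not real product pricing.
    # 128 GB < $400 and 48 GB < $150 both imply roughly $3.1/GB GDDR6.
    GDDR6_PER_GB = 400 / 128          # ~$3.1/GB spot price (assumed)

    def hypothetical_card_cost(base_gpu_price, vram_gb, margin=1.5):
        """Base GPU plus memory at spot price, times an assumed 1.5x markup."""
        return (base_gpu_price + vram_gb * GDDR6_PER_GB) * margin

    print(hypothetical_card_cost(900, 128))   # ~$1950 for a 128 GB card
    print(hypothetical_card_cost(900, 48))    # ~$1575 for a 48 GB card
    # vs. ~$5000 for the 128 GB M3 Max and $3000+ for the 48 GB config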
If all you want is VRAM, you can get old P40s with 24GB for $175, and six of them gets you 144GB for $1050. Then you need a big machine to put six of them in, but that doesn't cost $4000.
But all of these are kludges. The Radeon RX 7900 XTX has more than twice the memory bandwidth of the M3 Max and much better performance per watt than an array of P40s. What you want is that card with more VRAM, not any of this misery.
That checks out in principle, but given that the P40 doesn't support NVLink, I wouldn't count too much on using six of them together in a performant manner.
But yeah, the best option remains an MI300 if you can afford one.
Yeah, my M2 MacBook has 96GB @ 400GB/s. For $4k or so, it feels like cheating. Does it beat 4x24GB NVIDIA cards? Absolutely not! It's slower and occasionally runs into CUDA-moat software issues. But the capability to daily drive Mixtral 8x7B locally, with great token speeds, is phenomenal.
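Rough sketch of why the bandwidth number is the one that matters for single-user decoding; the ~13B active parameters and ~4.5 bit/weight quant are assumptions about Mixtral 8x7B for illustration, not measurements:

    # Memory-bandwidth-bound decode ceiling (assumed figures, not benchmarks).
    # Mixtral 8x7B activates roughly 13B parameters per token (2 of 8 experts),
    # and a ~4.5 bit/weight quant means ~0.56 bytes read per parameter per token.
    bandwidth_gb_s = 400              # M2 Max memory bandwidth
    active_params  = 13e9             # assumed active params per token
    bytes_per_w    = 4.5 / 8          # assumed quantization

    bytes_per_token = active_params * bytes_per_w        # ~7.3 GB read per token
    ceiling_tok_s   = bandwidth_gb_s * 1e9 / bytes_per_token
    print(f"~{ceiling_tok_s:.0f} tok/s ceiling")         # ~55 tok/s; real-world lands lower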
You overestimate the 'semi-pro' market for graphics cards. Gamers are barely willing to pay for 20GB. There's no market for consumer cards with an order of magnitude more RAM until games are built to use that memory.
I would personally love that project, but there are already so many versioning issues in the space that it would be a nightmare if ROCm randomly broke things all the time.
And we're talking about Intel here. AMD is going to price competitively against Nvidia, but they'd still rather you buy a $20,000 MI300 than a hypothetical 128GB Radeon for $2000.
Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.
Assuming 50 input tokens per second, you could still be waiting over ten minutes for prompt processing on a full 32k-token prompt.
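For the record, the arithmetic (32k context at the assumed 50 tok/s prefill rate):

    # Prompt-processing (prefill) wait time at an assumed 50 tok/s input rate
    context_tokens = 32 * 1024        # full 32k-token prompt
    prefill_tok_s  = 50               # assumed input-token throughput
    wait_s = context_tokens / prefill_tok_s
    print(f"{wait_s:.0f} s = {wait_s / 60:.1f} min")   # ~655 s, about 11 minutes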
What you are talking about is highly optimized inference using accelerators, batching, and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost, but if all you have is a small consumer-grade GPU you will be compute-limited at the extreme limits of your context window.
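Crude sketch of where that compute limit bites during prefill; the model shape, context length, and GPU throughput numbers below are all assumed round figures, not benchmarks of any particular card:

    # Crude prefill compute estimate: ~2 FLOPs per active parameter per token,
    # plus an attention term that grows with the square of the context length.
    # Model size, hidden dims, and GPU throughput are assumed round numbers.
    def prefill_seconds(active_params, ctx, layers, d_model, gpu_tflops):
        mlp_flops  = 2 * active_params * ctx
        attn_flops = 4 * layers * d_model * ctx ** 2   # rough attention term
        return (mlp_flops + attn_flops) / (gpu_tflops * 1e12)

    # e.g. a ~13B-active model at 32k context:
    print(prefill_seconds(13e9, 32 * 1024, 32, 4096, 40))   # ~35 s on a fast card
    print(prefill_seconds(13e9, 32 * 1024, 32, 4096, 10))   # ~140 s on a weak one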
But "basement ML" is a thing, the market of people who are interested in PC gaming but not to the point of being lifestyle gamers who throw every cent they can spare at that altar. The GPU they bought long before the pandemic is still running every game they throw at it, but they never completely stop eyeing the new stuff. Dipping their toes in ML, even if it's just getting through 80% of some stable diffusion setup tutorial, can be a very welcome excuse to upgrade their gaming. A card sold for gaming but with generously overprovisioned VRAM (ideally in the range of the lowest bin of the biggest or second-biggest chip I think) could match that market segment very well - and it would not only compete with other price points, it would actually increase the market by some buyers (those who would not upgrade without the "ML excuse").
It's possible to come up with many strategies, and different companies will. Why are you so sure that Nvidia's strategy is right for AMD or Intel, who need to offer differentiation to get over the CUDA moat?
48GB consumer cards (or 96GB pro cards) would sell like hotcakes if AMD/Intel dared to break the artificial VRAM segmentation status quo.