> All three GeForce RTX 40 SUPER Series GPUs are faster than their predecessors
It's weird that they're only comparing the new cards to the RTX 30's and 20's, and not the "v1" 40's. I assume the 4080 SUPER is faster than the 4080 (based on name?) but it seems cheaper and there's absolutely no comparison data
The big difference with the 4070 Ti Super is that it's using the AD103 chip (with a full 256-bit memory bus and 16GB of VRAM) found in the 4080, which is a huge leap over the AD104 chip found in the 4070 Ti (non-Super), which only touts a 192-bit memory bus and 12GB of VRAM.
While the TFLOPS of the Super variant only sees a ~10% increase as you note, memory bandwidth jumps by 42% and memory capacity jumps by 33%, while the launch price is the same in my currency.
It basically bridges half the distance between a 4070 Ti (non-Super) and a 4080 (non-Super) for the same launch price as a 4070 Ti (non-Super).
Great card for memory intensive workloads like LLM inference with big context windows, IMO.
EDIT1: 4070 Ti Super TDP is 320W (same as 4080), higher than the anticipated 285W
EDIT2: launch price confirmed to be same as the 4070 Ti (non-Super), lower than anticipated!
Appreciate the extra insight here! I hope for the sake of purchasers it is only a 12% cost increase, but I have a suspicion that if there's more than 12% of extra value, we'll see it in the price
Just checked the CES announcement and updated my post to reflect that it actually has the same launch price that the 4070 Ti (non-Super) had! Amazing bargain!
Not only that, the RTX 4070 Ti Super gets nearly the same performance as the RTX 4080 non-Super for $400 less. But that's MSRP. I have a feeling this card will be selling for a lot more than that.
Cheers, the 4070 Ti certainly hits a sweet spot.
I got a 4090 a few months ago before the prices increased, and I'm beyond stoked with the performance for (typically triple-QHD simulation) gaming. It's just a beast.
I have a 2nd PC I'd like to upgrade too, though, and the 4070 Ti looks like it would be fantastic in it.
For running AI models etc. the 4070 Ti is the best value of the bunch by far. Memory size and bandwidth are the most important things, in that order (which makes the 4050, er, 4060 Ti 16GB a weak card)
Ergo, there's a decent chance it won't sell for MSRP.
It sure would be nice if Nvidia just named the new card 4075 or something. The whole 4070 vs 4070 Super vs 4070 Ti vs 4070 Ti Super naming scheme sucks.
It's not weird at all. Those cards aren't meant as the next buy for owners of non-Super 40xx cards. Cards are compared with the cards that potential buyers currently have.
Can that really be true? I figure most people just stick with whatever they have, then buy the best thing within their means when the old one gets too slow for their needs. I can't imagine upgrading from a 1070 to a 2070; in fact, right now most people I know who are considering upgrading are on the 900 series
You're not getting what I'm saying - people stay in the same tier when they upgrade. They might upgrade every generation, every other generation, or skip two generations, but the point is that people who have an xx70 (or xx80) will buy an xx70 (or xx80) from a newer generation.
The 10xx to 20xx upgrade made little sense to most gamers because RTX was a thing you turned on, looked at the pretty reflections, and turned off to regain the performance. The 10xx generation was a weird one for NVIDIA, and I doubt they will ever make such a consumer-friendly generation again.
I'm holding out for the "RTX 6090 Ti Super MAX XXXtreme 197Hz Mr. Manager"
I've a lot of AMD and Nvidia machines - two high-end gaming machines. The naming conventions of Nvidia cards are just odd to me, and I can't tell what anything actually means.
I get that it's supposed to be funny, but I wonder how many people still think bitcoin mining happens on GPUs, particularly Nvidia ones. Pretty sure that stopped being the case like 14 years ago or something? Anyway...
All tokens are effectively "slaved" to BTC due to the paper-thin real liquidity of all of them. Therefore GPUs were very much affected by BTC volatility, just by proxy, and likely still are. It should be obvious, really, to anyone.
Litecoin and Doge and Ethereum were mined on GPUs throughout that period. And in fact all the other coins are tied to bitcoin in the first place, bitcoin price runs also trigger huge mining booms in everything else, so yeah, it's kinda understandable that people tend to view them as a single linked thing, because they kinda are.
What, precisely, is the point of making "actually you mean crypto, not bitcoin" posts, other than demonstrating that you are, indeed, "very smart"? Like, this person doesn't even exist, it's just a "heh aren't those no-coiners dumb" strawperson that you imagine to be some big dummy.
But everybody who is upgrading is trying to figure out which card to upgrade to, and is doing comparisons between the current-gen cards, not between their old card and the new ones.
It is likely that there will be no choice between Super and non-Super.
At least the 4070 Ti and 4080 have become completely obsolete now that their Super variants are much better, and in the case of the 4080 Super, even cheaper too.
I suppose that they have stopped producing the non-Super variants, as nobody would want those when the Super cards are available.
I am still using a 2060 Super from 2019, and the same thing happened that year, when the RTX 2000 Super series replaced the original RTX 2000 series.
This stands out like a sore thumb for anyone even taking a glance at the graphs. Who there thought this was a good idea? Now I think the Supers are just going to be 10% faster while drawing 30% more power or something else hacky and desperate-seeming.
It seems they’re being very careful not to undercut their enterprise offerings or even the 4090. Assuming they’re not completely tone deaf, I can only assume this is the explanation.
5 FP32 TFLOPS; if not doing sparse low-precision inference, it seems to be about in line with mid-to-high-end 2014 Nvidia consumer card performance (GTX 980), one decade old.
For running sparsified/quantized llama2 it might be good, not sure about for fine tuning. I didn't see any FP16 numbers.
Per chip? Not the full story when discussing a system which can integrate multiple. The Orin has more memory bandwidth than an RTX 4050 even though the latter uses GDDR6. The M3 Max has double the bandwidth of the Orin, but also uses LPDDR5.
Are you aware that cards containing “LLM” (40-80GB) levels of VRAM cost substantially more and the status quo for consumer cards hovers around 4-12GB, only going to 24GB for top end cards?
And this is exactly the way NVidia intends to keep it, methinks.
Give the consumers / gamers a consumer-priced GPU with a max of 16-24 GB VRAM for the high-end models. By consumer-priced, I mean $500-2000.
And make anyone interested in AI / ML / LLM / 3D / creatives pay $3000-10000 for GPUs that are similar in performance but have much higher VRAM.
Then top it out with six-figure (or higher) priced GPUs for the FAANG companies which can afford them for their data centers and currently contribute the most revenue (and profit) to NVidia.
Your comment, pre-edit, had something of a severe tone given that consideration.
Having said that, I've trained/finetuned image models just fine on an RTX 2070 Super with 8 GB of VRAM. This was back when doing so was more fruitful than simply training a more robust model in the first place. Given that is the current status quo - I'm curious what sort of training you're doing whole-network that actually produces results that are noticeably better than doing something few-shot during inference or doing LoRA finetuning? The latter brings you back into the realm of tuning on low-VRAM configs.
In general, a single GPU's memory constraints are one of many when training a model _from scratch_. In that case, you're bottlenecked by data and data parallelism. You don't need one or a few GPUs; you need more than would fit in a consumer setup in the first place.
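To make the LoRA point above concrete, here is a minimal sketch of what a LoRA finetuning setup typically looks like with Hugging Face's peft library. The checkpoint name and target_modules are placeholders, and the 4-bit loading assumes bitsandbytes is installed; only the small adapter matrices are trained, which is what keeps this within low-VRAM configs.

    # Sketch of a LoRA finetuning setup with Hugging Face peft. The model
    # name and target_modules are placeholders; pick ones that match your
    # architecture. 4-bit loading of the frozen base keeps VRAM usage low.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained(
        "your-org/your-7b-model",   # placeholder checkpoint
        load_in_4bit=True,          # quantize the frozen base weights
        device_map="auto",
    )

    config = LoraConfig(
        r=8,                        # rank of the low-rank adapters
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections, typically
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # usually well under 1% of the base model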
My impression is a lot of the open source action is around the just-about-runs-in-12GB region - lots of models coming out with 7B/13B and 4-bit quantisation, a few 70B models (which won't fit in 24GB anyway) and only limited stuff in between.
I suppose I could be getting a biased impression though, as of course many more people are in a position to recommend the more accessible models.
What sort of things are you running that take full advantage of that 24GB?
Training - at least the run I tried - requires fp16 mode. So a 7B net needs 14 GB for the model weights alone, plus some extra for the context and the stuff I don't really understand (some gradient values; oh, that makes sense now that I've written it).
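As a rough, hedged back-of-the-envelope (assuming a plain Adam-style optimizer and ignoring activations, context/KV cache, and framework overhead): fp16 weights are 2 bytes per parameter, fp16 gradients add another 2, and fp32 optimizer moments add roughly 8 more.

    # Rough VRAM estimate for naive fp16 training with an Adam-style optimizer.
    # These byte counts are common rules of thumb, not exact figures;
    # activations and context add more on top.
    def training_vram_gib(n_params_billion: float) -> dict:
        n = n_params_billion * 1e9
        to_gib = lambda nbytes: nbytes / 1024**3
        return {
            "weights_gib": to_gib(2 * n),    # fp16 weights: 2 bytes/param
            "gradients_gib": to_gib(2 * n),  # fp16 gradients: 2 bytes/param
            "optimizer_gib": to_gib(8 * n),  # fp32 Adam moments: 8 bytes/param
            "total_gib": to_gib(12 * n),
        }

    # Weights alone ~13 GiB (~14 decimal GB); ~78 GiB total before activations.
    print(training_vram_gib(7))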
This is only supported on the previous-generation NVidia 3090: it is apparently possible to combine two 3090s with 24 GB of VRAM each and 'fuse' them with NVLink so they act as a single high-powered GPU with 48 GB of VRAM combined.
NVidia no longer supports this for the 40-series; I think this is because they want anyone interested in using their GPUs for LLMs to buy the pricier models with more VRAM.
Theoretically you can use as many GPUs as you want in parallel. LLMs are easy to split and run in a model-parallel configuration (for big models which don't fit on one card), or data-parallel for performance, where the same model runs different batches on different GPUs. PyTorch has full support for both modes, afaik.
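For reference, a minimal sketch of both modes in PyTorch, assuming two CUDA devices and a toy two-layer model; real LLM serving would typically use a library that handles the sharding for you.

    # Toy illustration of model parallelism vs data parallelism in PyTorch.
    import torch
    import torch.nn as nn

    class TwoGpuModel(nn.Module):
        """Naive model parallelism: half the layers per GPU."""
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(4096, 4096).to("cuda:0")
            self.part2 = nn.Linear(4096, 4096).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            return self.part2(x.to("cuda:1"))  # activations hop between cards

    # Data parallelism: the same weights replicated, batches split across GPUs.
    replicated = nn.DataParallel(nn.Linear(4096, 4096).to("cuda:0"))

    split_model_out = TwoGpuModel()(torch.randn(8, 4096))
    replicated_out = replicated(torch.randn(8, 4096).to("cuda:0"))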
With both you and GP, I would imagine the answer is that people tend to build models to the hardware that is available. If 12GB and 24GB are the hardware thresholds that people have, you'll get "open-source action" in the 12GB and 24GB models, because people want to build things that run on the hardware they own.
(Which is of course how CUDA built its success more generally, vs the "you have to buy the $5k workstation card to get started" strategy from ROCm.)
More generally you'd call this optimization and targeting the hardware that's available. No sense releasing Crysis when everyone is running a Commodore 64, after all.
I actually have a 12GB card, which I purchased specifically for AI (24GB cards are too expensive for me). You're correct that 12GB is also a sweet spot in terms of what you get per dollar spent.
When you can sell an enterprise-grade card with 40-80GB of VRAM for $50k, selling consumer cards with 24GB for $2k is almost a form of charity, by comparison.
AMD and Intel GPUs do not have the software ecosystem for AI workloads that Nvidia does, though AMD is rapidly improving. Nvidia has had an effective monopoly on the AI hardware space for the last year or so, and continues to have an effective near-monopoly, but that won't last forever as AMD and Intel catch up.
The VRAM is one of the largest differentiators of their cards. Sufficient VRAM allows you to run huge LLMs like 65B in-memory, which is orders of magnitude faster than system RAM + CPU. Smaller amounts of VRAM require swapping between VRAM and system RAM and incur a major performance penalty.
Businesses are fighting to fork over $50k+/card for 40/80GB cards with the same processor as the 24GB consumer cards - it doesn't make economic sense for Nvidia to offer more on the consumer cards, lest they start cannibalizing demand for the enterprise cards.
> When you can sell an enterprise-grade card with 40-80GB of VRAM for $50k
The RTX 6000 Ada (48GB) — an enterprise (workstation) card — is about $10K sticker price. The differentiator between those and the data center cards in the 40GB-80GB range is, obviously, not just VRAM.
Also, as samspenc points out, that RTX 6000 is using the same AD102 chip found in a consumer RTX 4090, just with marginally more CUDA cores, TMUs, ROPs, etc.
The most substantial difference between the two is that the $10k card has twice the VRAM of the $2k card.
I know it sounds outrageously oversimplified, but Nvidia can indeed more or less print money by attaching a few extra memory chips to what is otherwise a flagship consumer-oriented graphics processor.
The profit margins on the H100, for reference, are estimated to be around one thousand percent (1000%), i.e., they sell for ~10x as much as it costs Nvidia to make them, and demand for them still grossly exceeds the total supply. See: https://the-decoder.com/nvidias-h100-gpu-sells-like-hot-cake...
The RTX 6000 Ada actually performs slightly worse than a consumer-grade 4090 (~10% less performant), even though it retails for 4-5x more ($8000 for the RTX 6000 Ada compared to $1500-2000 for a 4090).
Because the 5090 ostensibly targets gaming, which only needs sufficient VRAM for 4K textures on 4K, 5K, and ultrawide monitors. A large portion of the 5090 audience is not doing ML training, and that VRAM would sit idle for a few monitor generations. As a gamer I would be kind of upset if they included that very expensive, unneeded VRAM in their already very expensive cards.
As someone who is a gamer but also wants to dabble in ML, it kind of hurts tbh that this is probably true.
The xx90 cards (3090, 4090, etc.) have always been aimed at the gamer with a lot of cash. But games aren't really designed with that hardware in mind, so you won't see games that take advantage of that much VRAM, and NVIDIA isn't inclined to increase the memory on them.
I have a 3080 right now, so I haven't really been able to play around with training a model. I'd like to see the 5090 have 32 GB, but I'm having my doubts.
The 5880 is the same generation as the 4090 and is a workstation card (it has more memory but fewer CUDA cores available). The price is expected to be around $5-6k. What I mean by the 5000 series is the non-workstation next-gen lineup.
The attention mechanism, the core of LLMs, is universal enough to be brought back to standard vision models. Which is kind of ironic, since vision models were dominated by convolutions, and the transformer is dubbed a "convolution for text".
The real reason is that it doesn't deteriorate with regard to input length in the case of text, or distant neighbourhoods in the case of vision. It's just a universal new building block that allows shallower neural networks to perform more like their bigger versions.
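For the curious, the core operation is small enough to sketch in a few lines — a minimal single-head scaled dot-product attention, with made-up dimensions; real transformer layers add multiple heads, learned projections, masking, and so on.

    # Minimal single-head scaled dot-product attention (illustrative only).
    import torch
    import torch.nn.functional as F

    def attention(q, k, v):
        # q, k, v: (sequence_length, d_model). Every position attends to every
        # other position directly, which is why distance doesn't degrade it.
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)
        return weights @ v

    seq_len, d_model = 16, 64
    x = torch.randn(seq_len, d_model)
    out = attention(x, x, x)   # self-attention: q = k = v = x
    print(out.shape)           # torch.Size([16, 64])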
many text generation models run on my 11 GB 1080 Ti
you can run quantized versions of these models
if you aren't running it quantized, I'd say even 24 gig is not enough
If you want to get the most bang for your buck, you definitely need to run quantized versions. Yes, there are models that run in 11G, just like there are models that run in 8G, and for any other amount of VRAM - my point is that 24G is the sweet spot.
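As a rough, hedged illustration of why quantization matters for fitting into 24 GB (weight memory only; the KV cache and runtime overhead add a few more GB on top):

    # Rough weight-memory estimate at different precisions.
    # Ignores the KV cache, activations, and runtime overhead.
    def weight_gib(n_params_billion: float, bits_per_weight: float) -> float:
        return n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3

    for model_b in (7, 13, 34, 70):
        print(f"{model_b}B: fp16 ~{weight_gib(model_b, 16):.1f} GiB, "
              f"4-bit ~{weight_gib(model_b, 4):.1f} GiB")

    # 7B:  fp16 ~13.0 GiB, 4-bit ~3.3 GiB
    # 13B: fp16 ~24.2 GiB, 4-bit ~6.1 GiB
    # 34B: fp16 ~63.3 GiB, 4-bit ~15.8 GiB
    # 70B: fp16 ~130.4 GiB, 4-bit ~32.6 GiB   (still doesn't fit in 24 GB)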
This is another very good release that is also being somewhat bizarrely panned. Yeah, it's essentially a 7600 16GB/4060 16GB, and that's what people wanted a few months ago: more VRAM. It also opens up a bunch of possibilities for cheap ML (in the same way the 4060 Ti 16GB does).
IMO this pretty well displaces the 4060 8GB and 16GB - it's cheaper (than even the already below-MSRP street prices) on 4060 Ti 8GB, it's way cheaper than the 16GB model. $50 over 7600 MSRP for twice the VRAM is a very fair deal, and street prices will probably float just as much as 7600 street prices have.
Clearance-priced 6700XT is a great deal but 7600/7600XT is ultimately a 6600/6600XT replacement and it's not a knock on the 7600 that it doesn't have the wider memory bus/etc - it is a lower-tier product that is only in the same price tier due to clearance markdowns.
I maintain that people are just mad about the whole last 5 years (since RTX launched) at this point and pretty much just give an automatic thumbs-down to anything that isn't an absurdly out-of-band good product. The pandemic shortages and mining boom have embittered a fair number of people to the point that I don't think they're coming back to the hobby; instead they sit on social media and complain.
It's tough because most of the games being benchmarked by reviewers were designed to work well on 4GB for 1080p and on 8GB high settings. So the extra VRAM doesn't help much for benchmarking.
But future games are likely to run better in >8GB simply because the PS5 and XBox Series X have more than 8.
I agree that you need to optimize for the reasonably forseeable future. You are buying games for the next 5 years, not the last 5 years - although I do weight things towards the frontend, it is better to optimize for the next 3 years and 2 more of relevance than to try to aim for 5 years of relevance or 2 years of worse/3 years of relevance. Generally tail-end relevance is of low value, by the time 5 years are up things have still shifted enough it's time to upgrade anyway, unless you just don't care by that point.
I think 8GB is going to continue to be a long-lived target especially at 1080p resolutions (with whatever gains can be squeezed from upscaling etc too - although generally DLSS needs to inference against the full-quality textures etc). Series S has 8GB (of fast-partition ram, the rest is GTX 970 style slow-partition) and even Series X only has 10GB.
People also aren't giving enough credit to mesh shaders etc, the GTX 1650 is actually still in the game with 4GB in Alan Wake 2[0], it does make a difference. The "but a 2060 super isn't relevant anymore!" argument relies on the assumption that you're deciding not to turn on upscaling etc. 1650 can run AW2 on lowest-settings 1080p with 4gb with FSR2 and it looks fine, and it'd be even better with RTX/DLSS. 2060 with DLSS can do a console-like experience on AW2 zero problem.
Consoles have always been a mixed bag. Yeah, they get a lot of specific tweaking and they also have special hardware which helps somewhat. But overall you're working from a (later in the gen) fairly low baseline. 6700 non-XT performance is ok but nothing stellar, and optimization doesn't save that. But honestly what they are good at is removing "paralysis of choice", having too many choices really hurts people and the emotional feeling of having to turn the setting onto lowest hurts people, even if that's what the console does itself! It's at least pre-tweaked lowest settings etc (although often worse tech etc - FSR3 is blown away by DLSS 3.5 image quality let alone 4.0 and future iterations which aren't far away). You don't have to think about it, you just say "framerate or quality" and you probably know which you want.
That is the problem that people will struggle with. 8GB will still work. It just also will be a Series S level experience, modulo things like mesh shading that occasionally differentiate the consoles (PS5 lacks it iirc, as well as DP4a). And that can still often look fine. Will you get more from spending more? Yes. But it also doesn't take that much - series X is the 6700, series S is like APU territory. A 3080 blows away the series X, etc. But you will have to stomach through moving that slider from "native" to "performance" and the texture quality from "ultra" to "medium". Etc. People have lost touch of the world of yesterday when "can you run crysis" was an actual question and not a meme, slamming every setting to ultra is not a given when you buy an entry-level card, and people also can't handle the fact that $200-300 is now entry-level. Midrange is $500-700, high-end is $800 to "how much have you got".
And that's not NVIDIA, that's really just wafer costs. If you want to compare die sizes and MSRPs against 10+ years ago (look up GTX 670/GK104 lol), you have to bear in mind that a given die size might cost 5x what it did back then. And it increases ~30% every node-family since 28nm, more or less. It's gonna go up over time, if you aren't moving up in price you're moving down in product-design-bracket and are going to have to deal with more design compromises to hit those lower price-points in the face of rising costs. It sucks, but nobody has any better ideas - to paraphrase what someone once told me, "the industrial and creative poles of several societies and continents are laser-focused on pushing this backwards, and yet the problems only become more difficult after each success". There is no easy answer, lots of smart people are working at this.
Keep in mind that for local LLM inference, Nvidia's software support is qualitatively superior to AMD's, for the time being. That said, it's worth noting that AMD is catching up quickly.
My 3080 laptop GPU can still play 99% of games at their highest settings. Or at least they did before the latest batch of nVidia drivers went to shit. It's not the hardware that's the problem. What's the point of upgrading to a 4-series when we're left with stuttering and defects from software issues?
I didn't expect an announcement of the Supers this early in the year, nor did I expect them to be cheaper than the non-Super cards. I bought the 4080 recently with the thought that Nvidia's trend in the past 4 or 5 years has been to increase prices, so even if the Supers were announced this early in the year the performance increase would cost proportionally more money anyway.
Sucks for me, but overall I'm glad that Nvidia is getting prices under control to some extent.
> nor did I expect them to be cheaper than the non-Super cards.
to be blunt, that's because you bought into ayymd propaganda. it is so endemic that people don't even see it for what it is anymore, people are constantly bombarded with absurdly pro-AMD and absurdly anti-NVIDIA takes, it's just the sea in which we swim on social media.
you should take it as a learning experience and not constantly buy into the ayymd bandwagon of the week next time. because there will absolutely be a next time - probably people will move onto the next insane thing within a few weeks here.
Last year it was that the 4090 was going to be >900W... people talked themselves into thinking that a two-node shrink was going to result in zero efficiency gain. This ada gen is a dud, just wait for AMD, the 7900XTX is gonna blow the doors off!
And it's happened to RDNA3, Vega, Fury X, Zen2, etc, and against every single technology deployed via RTX or DLSS. The flip on framegen the day AMD released FSR3 was amazing, and instantly all the complaints about latency etc vanished within a single day, despite being significantly worse latency because of forced vsync/incompatibility with VRR, let alone the reflex-only baseline. "Possibly the best part of FSR now" etc.
it's like the runup to the iraq war or something, there were counter-voices, but why would you want to listen to them when everybody knows the truth already? Going against the grain constantly is tiresome and frames you as an iconoclast, and even if you're right people still think you're a troublemaker for having contradicted them earlier. The people who blocked you are not gonna unblock you just because you were right. It's like trying to be the voice of reason in a failing project, even if you save the project you're still a troublemaker. So eventually the dialogue just fades into an echo chamber. It is a fast road to what was eulogized as "epistemic closure" - aka "we bandwagoned too hard and drowned out all the opposing voices, and it turns out they were correct".
So here we are: green man bad, everyone knows it, and this exception really only proves the rule. Now if you'll excuse me I've got some very important posts to make about how you'll never be able to buy one for MSRP anyway, like it's still 2020 or something.
Since crypto has crashed so much and most of these are borderline useless for AI/ML, there's not likely to be too much of an issue. The MSRP is already massively marked up.
Considering the poor generational performance uplift the original 4000 series cards had vs the 3000 series - it almost seems like these super variants are what nVidia should have originally launched. :-/
The 4080/4090 are outliers of the generation though unfortunately, and also are themselves still a rather typical generational gain at that.
Most of the rest of the 40xx stack was either unexpectedly slow or unexpectedly expensive (or both), such that performance per dollar stayed flat or regressed
Ah that makes sense. I don't pay close enough attention to the mid tier or the low end. I'm sure I saw at some point that there were problems with those cards but it's hard to remember when it doesn't stand out because people are negative about every GPU.
I checked a sampling of a few games there and the only one where it's not ahead by a solid margin at 2560x1440 is Far Cry 6... the game that I gave up trying to play because it would not run smoothly on my 3090 + 5900X even with the settings turned down.
Maybe all those years where Intel was stuck on 14nm made me forget how big leaps could be generation to generation, but to me those jumps of more than 30% are huge especially considering that while the 4080 is a gen ahead of the 3090 it's also a tier down.
Also if you look at 4K performance, the % gaps for all these games are even larger (and I'm not looking at 4K now because it's better for my point. My next monitor will be one of the 4K QD-OLEDs that were announced at CES, so those charts are now more relevant to me than the 1440p ones)
lol, not going to happen. The low MSRP is too low for partners to make any money, so they'll keep the prices inflated as much as the market will bear, regardless of the availability of the chips
Waiting for Intel Battlemage; the latest rumors say 16 GB of memory, a 256-bit wide bus, and RTX 4080-level compute performance for a $450 MSRP. Wait and see, Q3 2024. Intel's software stack is more likely to challenge Nvidia's than AMD's is.
All I want is a low-power passively-cooled replacement for my aging 5GB Quadro P2000 that doesn't crash out while running the new Lightroom denoise algorithms. None of these seem likely choices in that area, sigh.
Is it a noise issue? If you take a newer open-air style card and lower the power limit it should be fairly quiet and still much faster than what you've got. You can also tinker with limiting the clock speed and maybe get even lower, enough that you could take the fans off.
Generally a wise decision when it comes to anything that has this type of release schedule, from phones to 3D printers. Buy one generation, skip one, maybe upgrade the next, depending on relevant improvements.
Yeah, my mistake buying an ultrawide monitor. My 4yo rig can push 1080 okay but 3440x1440 is a bit much. I don't dare aspire to anything higher than 60fps.
Hoping this drives down the price for the 7800XT or older used models in the near future.
That being said, these look competent enough, just still stingy with VRAM, which makes them less desirable for longer use (4+ years) in either playing games or training models.
I feel the exact opposite. There are too many! There are now nine models of the 4000-series, and that's not including laptop models or the cancelled 4080 12 GB.
IMO, there should be 5 models at the maximum. I shouldn't have to sit here and do a bunch of research to find out if the 4070 SUPER is faster than a 4070 Ti, and whether I should go for a 4070 SUPER Ti.
4060, 4070, 4080, 4090. That's all they ever needed. Budget, mid-grade, enthusiast, top-of-the-line. That's all that's needed.
They did something similar for the 30-series, I think 20-series as well.
Though given that NVidia seems to release its consumer GPUs once every 2 years (20 series in late 2018, 30 series in late 2020, 40 series in late 2022), I wonder if this is more of a marketing ploy - release the main series once every 2 years, but bring out "Super" refreshes in the middle of the cycle to make sure you're still in the news, and get some segment of consumers / gamers to upgrade to those.
3060 Ti - yes, it's still a great card that I use today
However, if you can't find a 3060 Ti at a lower price point than the 4060 Ti... then I reluctantly have to say you are probably going to be better served with the 4060 Ti.
It really does seem to be the same branding problem as the Dragon Ball series. How do you say “this version is more powerful than any before” after having said that a dozen times in a row?