E-CVTs are extremely reliable and are a completely different mechanism from conventional CVTs (a CVT runs a belt between two cone-shaped pulleys; an E-CVT is just a single planetary gear set), but a lot of car guys and even some mechanics don't realize that.
Globally, gas car sales peaked in 2018. EVs are already >20% of new car sales worldwide, and US EV adoption is a joke compared to Europe or China.
Part of the Rust dependency issue is that the compiler currently only multithreads at the crate level (this is slowly being improved on nightly, but there are still some bugs to fix before they can roll out the parallel compiler), so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy-in from the megacorps the way cargo-vet does, and last I checked it didn't have as many or as consistent reviews.
It can do. Additionally, because each part is smaller, it's easier to ensure that each part, in isolation, does what it says on the tin. It also means that other projects can reuse the parts. An example of the last point would be the Regex crate.
Regex is split into subcrates, one of which is regex-syntax: the parser. But that crate is also a dependency of over 150 other crates, including lalrpop, proptest, treesitter, and polars. So other projects have benefited from Regex being split up.
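To make that reuse concrete, here's a minimal sketch of using regex-syntax on its own to parse a pattern into its Hir without pulling in the full regex engine (the dependency version and the example pattern are my own assumptions):

```rust
// Cargo.toml (assumed): regex-syntax = "0.8"
use regex_syntax::Parser;

fn main() {
    // Parse a pattern into regex-syntax's high-level intermediate
    // representation (Hir) -- no regex matching engine involved.
    let hir = Parser::new()
        .parse(r"[a-z]+[0-9]{2,4}")
        .expect("valid pattern");
    println!("{hir:?}");
}
```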
Linux on ARM is actually pretty terrible outside of the server space because the integrated GPUs (Qualcomm's, Imagination's, and ARM's own) are bad and have terrible drivers.
The R5S has a garbage CPU that probably won't be able to handle QoS above ~250Mbps.
I'd avoid anything ARM-based that doesn't have A7x cores (ideally A76/A78 or newer, though I don't think there are any SBC SoCs using the A710/715/720 yet). A55 cores are old, stupidly slow efficiency cores (area efficient, not power efficient).
Hell, modern audio codecs (Opus and AAC, though not the ffmpeg Opus/AAC encoders) are transparent at ~160-192kbps. MP3 is a legacy codec these days and generally needs ~30% more bitrate for similar quality.
Intel GPU drivers have always been terrible. There are so many features that are just broken if you try to actually use them, on top of the drivers generally being extremely slow.
Hell, the B580 is CPU bottlenecked on everything that isn't a 7800X3D or 9800X3D, which is insane for a low-to-midrange GPU.
The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power hungry to expand) and the available GDDR chips (which generally require 32 bits of the bus per chip). We've been using 16Gbit (2GB) chips for a while, and 24Gbit (3GB) GDDR7 modules are just starting to roll out, but they're expensive and in limited supply. You also have to account for VRAM being somewhat power hungry (~1.5-2.5W per module under load).
Once you've filled all the slots, your only real option is a clamshell setup, which doubles the VRAM capacity by putting chips on the back of the PCB in the same spots as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5W per module depending on speed and whether it's GDDR6/6X/7, meaning you could have up to ~40W on the back).
Some basic math puts us at 16 modules for a 512-bit bus (only the 5090 today; you have to go back a decade+ to find the previous 512-bit consumer GPU), 12 with 384-bit (4090, 7900 XTX), or 8 with 256-bit (5080, 4080, 7800 XT).
A clamshell 5090 with 2GB modules tops out at 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy that at some point as the RTX 6000 Blackwell at stupid prices).
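To spell out that math, here's a quick back-of-the-envelope sketch (the 32-bits-per-module rule and the ~2.5W/module figure are the rough numbers from above, not vendor specs):

```rust
// Rough VRAM-capacity math: each GDDR module occupies 32 bits of the bus;
// clamshell doubles the module count by mirroring chips on the back.
fn vram_gb(bus_bits: u32, module_gb: u32, clamshell: bool) -> u32 {
    let modules = (bus_bits / 32) * if clamshell { 2 } else { 1 };
    modules * module_gb
}

fn main() {
    // Non-clamshell, 2GB (16Gbit) modules:
    println!("512-bit: {} GB", vram_gb(512, 2, false)); // 32 GB (5090)
    println!("384-bit: {} GB", vram_gb(384, 2, false)); // 24 GB (4090, 7900 XTX)
    println!("256-bit: {} GB", vram_gb(256, 2, false)); // 16 GB (5080, 4080, 7800 XT)

    // Clamshell 512-bit with 2GB vs 3GB modules:
    println!("clamshell 512-bit, 2GB: {} GB", vram_gb(512, 2, true)); // 64 GB
    println!("clamshell 512-bit, 3GB: {} GB", vram_gb(512, 3, true)); // 96 GB

    // Worst-case back-side heat on a clamshell 512-bit card at ~2.5 W/module:
    let back_modules = 512 / 32;
    println!("back-side VRAM power: ~{} W", back_modules as f32 * 2.5); // ~40 W
}
```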
HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc.), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'd still have to partially redesign the GPU), and requires expensive packaging to assemble.
What about previous generations of HBM? Older consumer AMD GPUs (Vega) and the Titan V had HBM2. According to https://en.wikipedia.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get 16GB with 1TB/s for $700 at release. It's no longer used in data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
Interesting. So a 32-chip GDDR6 clamshell design could pack 64GB of VRAM with about 2TB/s on a 1024-bit bus, consuming around 100W for the memory subsystem? With current chip prices [1], the memory chips alone would apparently cost only about $200 (!). So theoretically it should be possible to build fairly powerful AI accelerators in the 300W and <$1000 range. If one wanted to, that is :)
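Plugging in the numbers behind that estimate (the per-chip price is just the quoted $200 spread over 32 chips, the per-pin data rate is an assumed GDDR6-class 16Gbps, and power uses the 1.5-2.5W/module range from upthread):

```rust
// Back-of-the-envelope for the hypothetical 32-chip build above.
fn main() {
    let chips = 32u32;
    let gb_per_chip = 2u32;               // 16Gbit GDDR6 modules
    let bits_per_chip = 32u32;            // full-width interface per chip
    let gbps_per_pin = 16.0_f64;          // assumed GDDR6-class data rate
    let watts_per_chip = 2.5_f64;         // upper end of the 1.5-2.5 W range
    let usd_per_chip = 200.0_f64 / 32.0;  // ≈ $6.25, derived from the quoted $200 total

    let capacity_gb = chips * gb_per_chip;                    // 64 GB
    let bus_bits = chips * bits_per_chip;                     // 1024 bits
    let bandwidth_gbs = bus_bits as f64 * gbps_per_pin / 8.0; // ~2048 GB/s
    let mem_power_w = chips as f64 * watts_per_chip;          // ~80 W
    let mem_cost_usd = chips as f64 * usd_per_chip;           // ~$200

    println!(
        "{capacity_gb} GB, {bus_bits}-bit, ~{bandwidth_gbs:.0} GB/s, ~{mem_power_w:.0} W, ~${mem_cost_usd:.0}"
    );
}
```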
Hardware-wise, instead of putting the chips on the PCB surface, one would mount a 16-gonal arrangement of perpendicular daughterboards, each containing 2-16 GDDR chips where there would normally be one, with external liquid cooling, power delivery, and a PCIe control connection.
Each daughterboard would then feature a multiplexer with a dual-ported SRAM holding a table that stores, for each memory page, the chip number it maps to; the multiplexer would use that table to route requests from the GPU, while the second port would let the extra PCIe interface change the mapping.
API-wise, each resource would have N overlays, plus a new operation to switch a resource's active overlay (which would require a custom driver that properly invalidates caches).
This would depend on the GPU tolerating the much higher latency of this setup and providing good enough support for cache flushing and invalidation, as well as a deterministic mapping from physical addresses to chip addresses, and on the ability to manufacture all this in a reasonably affordable fashion.
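As a purely hypothetical software model of that remapping scheme (real hardware would implement this in the multiplexer's SRAM; the page size, table size, and names here are made up for illustration):

```rust
// Hypothetical model of a daughterboard's remap table: one entry per memory
// page, storing which GDDR chip currently backs that page. A real dual-ported
// SRAM would serve lookups (GPU side) and updates (PCIe side) concurrently;
// this sketch only shows the mapping logic.
const PAGE_BITS: u32 = 16; // assumed 64 KiB pages

struct RemapTable {
    page_to_chip: Vec<u8>, // chip index per page
}

impl RemapTable {
    fn new(pages: usize) -> Self {
        Self { page_to_chip: vec![0; pages] }
    }

    // GPU-side port: route an address to (chip, offset within page).
    fn route(&self, addr: u64) -> (u8, u64) {
        let page = (addr >> PAGE_BITS) as usize;
        (self.page_to_chip[page], addr & ((1u64 << PAGE_BITS) - 1))
    }

    // PCIe-side port: repoint a page at a different chip ("switch overlay").
    fn remap(&mut self, page: usize, chip: u8) {
        self.page_to_chip[page] = chip;
    }
}

fn main() {
    let mut table = RemapTable::new(1 << 10); // 1024 pages = 64 MiB window
    table.remap(3, 7);
    let (chip, offset) = table.route(3 * (1u64 << PAGE_BITS) + 0x42);
    println!("page 3 -> chip {chip}, offset {offset:#x}"); // chip 7, offset 0x42
}
```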
GPUs use special DRAM that has much higher bandwidth than the DRAM used with CPUs. The main reason they can achieve this higher bandwidth at low cost is that the connection between the GPU and the DRAM chip is point-to-point, very short, and very clean. Today, even a clamshell memory configuration isn't done by plugging two memory chips into the same bus; it works by having the interface in the GDDR chips internally split into two halves, and each chip can either serve requests using both halves at the same time or using only one half over twice the time.
You are definitely not passing that link through some kind of daughterboard connector, or a flex cable.
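In per-channel terms, with an assumed GDDR6-class 16Gbps per pin, the clamshell trade-off looks like this (a sketch of the arithmetic, not a hardware spec):

```rust
// Clamshell per channel: the channel's width and data rate are fixed, so
// splitting it across two chips doubles capacity, not bandwidth.
fn main() {
    let pin_gbps = 16.0_f64;     // assumed GDDR6-class per-pin data rate
    let channel_bits = 32.0_f64; // one chip's worth of bus
    let channel_gbs = channel_bits * pin_gbps / 8.0; // 64 GB/s per channel

    // Normal: one 2GB chip serves the full 32-bit channel.
    println!("normal:    1 x 2GB chip,  {channel_gbs} GB/s");
    // Clamshell: two 2GB chips each serve half the width, each taking
    // twice as long for its share of the data.
    println!("clamshell: 2 x 2GB chips, {channel_gbs} GB/s");
}
```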