E-CVTs are extremely reliable and are a completely different mechanism from conventional CVTs (a CVT runs a belt between two cone-shaped pulleys; an E-CVT is just a single planetary gear set), but a lot of car guys and even some mechanics don't realize that.
Globally, gas car sales peaked in 2018. EVs are already >20% of new car sales worldwide, and US EV adoption is a joke compared to Europe or China.
Part of the Rust dependency issue is that the compiler currently only multithreads at the crate level (this is slowly being improved on nightly, but there are still some bugs to fix before they can roll out the parallel compiler), so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy-in from the megacorps the way cargo-vet does, and last I checked it didn't have as many or as consistent reviews.
It can do. Additionally, because each part is smaller, it's easier to ensure that each part, in isolation, does what it says on the tin. It also means that other projects can reuse the parts. An example of the last point would be the Regex crate.
Regex is split into subcrates, one of which is regex-syntax: the parser. But that crate is also a dependency of over 150 other crates, including lalrpop, proptest, treesitter, and polars. So other projects have benefited from Regex being split up.
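To make that reuse concrete, here's a minimal sketch of using regex-syntax on its own to parse a pattern into its Hir without pulling in the full regex engine (the dependency version and the example pattern are my own assumptions):

```rust
// Cargo.toml (assumed): regex-syntax = "0.8"
use regex_syntax::Parser;

fn main() {
    // Parse a pattern into regex-syntax's high-level intermediate
    // representation (Hir) -- no regex matching engine involved.
    let hir = Parser::new()
        .parse(r"[a-z]+[0-9]{2,4}")
        .expect("valid pattern");
    println!("{hir:?}");
}
```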
Linux on ARM is actually pretty terrible outside of the server space because the integrated GPUs (Qualcomm's, Imagination's, and ARM's own) are bad and have terrible drivers.
The R5S has a garbage CPU that probably won't be able to handle QoS above ~250Mbps.
I'd avoid anything ARM-based that doesn't have A7x cores (ideally A76/A78 or newer, though I don't think there are any SBC SoCs using the A710/715/720 yet). A55 cores are old, stupidly slow efficiency cores (area efficient, not power efficient).
Hell, modern audio codecs (Opus and AAC, though not the ffmpeg Opus/AAC encoders) are transparent at ~160-192kbps. MP3 is a legacy codec these days and generally needs ~30% more bitrate for similar quality.
Intel GPU drivers have always been terrible. There are so many features that are just broken if you try to actually use them, on top of the drivers generally being extremely slow.
Hell, the B580 is CPU bottlenecked on everything that isn't a 7800X3D or 9800X3D, which is insane for a low-to-midrange GPU.
The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power hungry to expand) and the available GDDR chips (which generally require 32 bits of the bus per chip). We've been using 16Gbit (2GB) chips for a while, and 24Gbit (3GB) GDDR7 modules are just starting to roll out, but they're expensive and in limited supply. You also have to account for VRAM being somewhat power hungry (~1.5-2.5W per module under load).
Once you've filled all the slots, your only real option is a clamshell setup, which doubles the VRAM capacity by putting chips on the back of the PCB in the same spots as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5W per module depending on speed and whether it's GDDR6/6X/7, meaning you could have up to ~40W on the back).
Some basic math puts us at 16 modules for a 512-bit bus (only the 5090 today; you have to go back a decade+ to find the previous 512-bit consumer GPU), 12 with 384-bit (4090, 7900 XTX), or 8 with 256-bit (5080, 4080, 7800 XT).
A clamshell 5090 with 2GB modules tops out at 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy that at some point as the RTX 6000 Blackwell at stupid prices).
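To spell out that math, here's a quick back-of-the-envelope sketch (the 32-bits-per-module rule and the ~2.5W/module figure are the rough numbers from above, not vendor specs):

```rust
// Rough VRAM-capacity math: each GDDR module occupies 32 bits of the bus;
// clamshell doubles the module count by mirroring chips on the back.
fn vram_gb(bus_bits: u32, module_gb: u32, clamshell: bool) -> u32 {
    let modules = (bus_bits / 32) * if clamshell { 2 } else { 1 };
    modules * module_gb
}

fn main() {
    // Non-clamshell, 2GB (16Gbit) modules:
    println!("512-bit: {} GB", vram_gb(512, 2, false)); // 32 GB (5090)
    println!("384-bit: {} GB", vram_gb(384, 2, false)); // 24 GB (4090, 7900 XTX)
    println!("256-bit: {} GB", vram_gb(256, 2, false)); // 16 GB (5080, 4080, 7800 XT)

    // Clamshell 512-bit with 2GB vs 3GB modules:
    println!("clamshell 512-bit, 2GB: {} GB", vram_gb(512, 2, true)); // 64 GB
    println!("clamshell 512-bit, 3GB: {} GB", vram_gb(512, 3, true)); // 96 GB

    // Worst-case back-side heat on a clamshell 512-bit card at ~2.5 W/module:
    let back_modules = 512 / 32;
    println!("back-side VRAM power: ~{} W", back_modules as f32 * 2.5); // ~40 W
}
```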
HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc.), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'd still have to partially redesign the GPU), and requires expensive packaging to assemble.
What about previous generations of HBM? Older consumer AMD GPUs (Vega) and the Titan V had HBM2. According to https://en.wikipedia.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get 16GB with 1TB/s for $700 at release. It's no longer used in data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
Interesting. So a 32-chip GDDR6 clamshell design could pack 64GB of VRAM with about 2TB/s on a 1024-bit bus, consuming around 100W for the memory subsystem? With current chip prices [1], the memory chips alone would apparently cost only about $200 (!). So theoretically it should be possible to build fairly powerful AI accelerators in the 300W and <$1000 range. If one wanted to, that is :)
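Plugging in the numbers behind that estimate (the per-chip price is just the quoted $200 spread over 32 chips, the per-pin data rate is an assumed GDDR6-class 16Gbps, and power uses the 1.5-2.5W/module range from upthread):

```rust
// Back-of-the-envelope for the hypothetical 32-chip build above.
fn main() {
    let chips = 32u32;
    let gb_per_chip = 2u32;               // 16Gbit GDDR6 modules
    let bits_per_chip = 32u32;            // full-width interface per chip
    let gbps_per_pin = 16.0_f64;          // assumed GDDR6-class data rate
    let watts_per_chip = 2.5_f64;         // upper end of the 1.5-2.5 W range
    let usd_per_chip = 200.0_f64 / 32.0;  // ≈ $6.25, derived from the quoted $200 total

    let capacity_gb = chips * gb_per_chip;                    // 64 GB
    let bus_bits = chips * bits_per_chip;                     // 1024 bits
    let bandwidth_gbs = bus_bits as f64 * gbps_per_pin / 8.0; // ~2048 GB/s
    let mem_power_w = chips as f64 * watts_per_chip;          // ~80 W
    let mem_cost_usd = chips as f64 * usd_per_chip;           // ~$200

    println!(
        "{capacity_gb} GB, {bus_bits}-bit, ~{bandwidth_gbs:.0} GB/s, ~{mem_power_w:.0} W, ~${mem_cost_usd:.0}"
    );
}
```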
Hardware-wise, instead of putting the chips on the PCB surface, one would mount a 16-gonal arrangement of perpendicular daughterboards, each containing 2-16 GDDR chips where there would normally be one, with external liquid cooling, power delivery, and a PCIe control connection.
Each daughterboard would then feature a multiplexer with a dual-ported SRAM holding a table that stores, for each memory page, the chip number it maps to; the multiplexer would use that table to route requests from the GPU, while the second port would let the extra PCIe interface change the mapping.
API-wise, each resource would have N overlays, plus a new operation to switch a resource's active overlay (which would require a custom driver that properly invalidates caches).
This would depend on the GPU tolerating the much higher latency of this setup and providing good enough support for cache flushing and invalidation, as well as a deterministic mapping from physical addresses to chip addresses, and on the ability to manufacture all this in a reasonably affordable fashion.
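As a purely hypothetical software model of that remapping scheme (real hardware would implement this in the multiplexer's SRAM; the page size, table size, and names here are made up for illustration):

```rust
// Hypothetical model of a daughterboard's remap table: one entry per memory
// page, storing which GDDR chip currently backs that page. A real dual-ported
// SRAM would serve lookups (GPU side) and updates (PCIe side) concurrently;
// this sketch only shows the mapping logic.
const PAGE_BITS: u32 = 16; // assumed 64 KiB pages

struct RemapTable {
    page_to_chip: Vec<u8>, // chip index per page
}

impl RemapTable {
    fn new(pages: usize) -> Self {
        Self { page_to_chip: vec![0; pages] }
    }

    // GPU-side port: route an address to (chip, offset within page).
    fn route(&self, addr: u64) -> (u8, u64) {
        let page = (addr >> PAGE_BITS) as usize;
        (self.page_to_chip[page], addr & ((1u64 << PAGE_BITS) - 1))
    }

    // PCIe-side port: repoint a page at a different chip ("switch overlay").
    fn remap(&mut self, page: usize, chip: u8) {
        self.page_to_chip[page] = chip;
    }
}

fn main() {
    let mut table = RemapTable::new(1 << 10); // 1024 pages = 64 MiB window
    table.remap(3, 7);
    let (chip, offset) = table.route(3 * (1u64 << PAGE_BITS) + 0x42);
    println!("page 3 -> chip {chip}, offset {offset:#x}"); // chip 7, offset 0x42
}
```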
GPUs use special DRAM that has much higher bandwidth than the DRAM used with CPUs. The main reason they can achieve this higher bandwidth at low cost is that the connection between the GPU and the DRAM chip is point-to-point, very short, and very clean. Today, even a clamshell memory configuration isn't done by plugging two memory chips into the same bus; it works by having the interface in the GDDR chips internally split into two halves, and each chip can either serve requests using both halves at the same time or using only one half over twice the time.
You are definitely not passing that link through some kind of daughterboard connector, or a flex cable.
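In per-channel terms, with an assumed GDDR6-class 16Gbps per pin, the clamshell trade-off looks like this (a sketch of the arithmetic, not a hardware spec):

```rust
// Clamshell per channel: the channel's width and data rate are fixed, so
// splitting it across two chips doubles capacity, not bandwidth.
fn main() {
    let pin_gbps = 16.0_f64;     // assumed GDDR6-class per-pin data rate
    let channel_bits = 32.0_f64; // one chip's worth of bus
    let channel_gbs = channel_bits * pin_gbps / 8.0; // 64 GB/s per channel

    // Normal: one 2GB chip serves the full 32-bit channel.
    println!("normal:    1 x 2GB chip,  {channel_gbs} GB/s");
    // Clamshell: two 2GB chips each serve half the width, each taking
    // twice as long for its share of the data.
    println!("clamshell: 2 x 2GB chips, {channel_gbs} GB/s");
}
```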