> How much faster is GPU access to the unified memory model on the new Macs/how much less of a hit do you take?

Intel Core i9-13900F memory bandwidth: 89.6 GB/s, memory size up to 192 GB

Apple M3 Pro memory bandwidth: 150GB/s, memory size up to 36GB

Apple M3 Max memory bandwidth: 300GB/s, memory size up to 128GB

GeForce RTX 4090 memory bandwidth: 1008 GB/s, memory size 24GB fixed, no more than two cards per PC.
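
To put those figures in inference terms, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth-bound and that every weight is read once per token, which is a simplification. The 20 GB model size and the function name are made up for illustration and are not from the thread.

```python
# Memory bandwidth figures (GB/s) as quoted above.
BANDWIDTH_GB_S = {
    "Core i9-13900F": 89.6,
    "M3 Pro": 150.0,
    "M3 Max": 300.0,
    "RTX 4090": 1008.0,
}


def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 20.0  # hypothetical model size, chosen so it fits in 24GB of VRAM

for name, bw in BANDWIDTH_GB_S.items():
    print(f"{name}: <= {max_tokens_per_s(bw, MODEL_GB):.1f} tokens/s")
```

Real throughput will be lower, and the ranking only holds while the whole model fits in the memory attached to that bandwidth.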




I don't think the numbers sufficiently capture the limitation. The Intel memory bandwidth you quoted applies to CPU-based inference, not to GPU inference that spills a model too large for the dedicated VRAM over into shared system memory. I think that case would necessarily limit parts of the inference procedure to the available PCIe 3.0 or 4.0 bandwidth (not sure how the split would work, and it would probably depend on whether you're using something like flash attention), as the GPU has to communicate over the PCIe bus and then over the chipset memory bus.

A GPU on a PCIe 3.0 x16 electrical link would be constrained to ~16GB/s, or ~32GB/s on a PCIe 4.0 link. Although those numbers are lower than the CPU's memory bandwidth, the bottleneck would only apply when paging in or out (or directly accessing?) layers that overflowed to shared system RAM, so they don't say much on their own.
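
For reference, the ~16GB/s and ~32GB/s figures follow from the per-lane transfer rate and the line encoding. A rough sketch, assuming the published rates of 8 GT/s (PCIe 3.0) and 16 GT/s (PCIe 4.0) with 128b/130b encoding, and ignoring packet and protocol overhead. The function name is only illustrative.

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int = 16,
                        payload_bits: int = 128, line_bits: int = 130) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    # Each transfer moves one bit per lane; the line encoding leaves
    # payload_bits out of every line_bits usable for data.
    usable_gbit_per_lane = gt_per_s * payload_bits / line_bits
    return usable_gbit_per_lane * lanes / 8  # bits -> bytes


for gen, rate in (("3.0", 8.0), ("4.0", 16.0)):
    print(f"PCIe {gen} x16: {pcie_bandwidth_gb_s(rate):.2f} GB/s")
```

Achievable throughput is somewhat lower than these theoretical values once packet headers and flow control are accounted for.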


Excellent comparison. However, I am confused by

> no more than two cards per PC

I've seen quad 4090 builds, e.g. here[0]. What do you mean by no more than two cards? Yes, power is definitely an issue with multiple 4090s, though you can limit the max power using `nvidia-smi`, which IME doesn't hurt (mem-bottlenecked) inference.

[0] https://old.reddit.com/r/watercooling/comments/16ed8fu/quad_...


Apple M2 Ultra: "up to 192GB of memory with 800GB/s of unified memory bandwidth for workstation-class performance."


So M2 is more advanced than M3?


For memory bandwidth at the lower tiers, yeah. The top-spec M3 Max still has 400GB/s, and since the M2 Ultra (800GB/s) is just two M2 Maxes glued together (400GB/s each), the eventual M3 Ultra should be comparable.


That’s like Threadripper, thanks for the info. That’s the bandwidth from the CPU to the memory controllers, though. Is there really no bottleneck to the iGPU?



