Hacker News

Not for inference. The M3 Ultra runs big LLMs twice as fast as an RTX 5090.

https://creativestrategies.com/mac-studio-m3-ultra-ai-workst...

The RTX 5090 has only 32 GB of VRAM. The M3 Ultra has up to 512 GB of unified memory with 819 GB/s bandwidth, so it can run models that simply will not fit on an RTX card.
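For single-stream decoding, a common rule of thumb is that generation speed is bounded by memory bandwidth, since every generated token streams all active weights from memory once. A rough sketch of that ceiling (the 1,792 GB/s figure for the RTX 5090 and the 32B-at-4-bit example model are my assumptions, not from the thread):

```python
# Bandwidth-bound decode ceiling for a dense model: each token requires
# reading all weights once, so tok/s <= bandwidth / weight_bytes.

def max_tokens_per_sec(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on decode tokens/sec for a memory-bound dense model."""
    return bandwidth_gb_s / weight_gb

# Hypothetical example: a 32B-parameter model at 4-bit -> ~16 GB of weights,
# small enough to fit on either machine.
weights_gb = 32 * 0.5

print(f"M3 Ultra  (819 GB/s):  ~{max_tokens_per_sec(819, weights_gb):.0f} tok/s ceiling")
print(f"RTX 5090 (1792 GB/s, assumed spec): ~{max_tokens_per_sec(1792, weights_gb):.0f} tok/s ceiling")
```

Real throughput lands well below these ceilings (KV-cache reads, compute, overhead), but it shows why bandwidth, not just capacity, matters for inference.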

EDIT: The benchmark may not be properly utilizing the 5090, but the M3 Ultra is far more capable than an entry-level RTX card at LLM inference.




My little $599 Mac Mini does inference about 15-20% slower than a 5070 in my kids’ gaming rig. They cost about the same, and I got a free computer.

Nvidia makes an incredible product, but Apple's different market-segmentation strategy might make it a real player in the long run.


It can run models that cannot fit on TEN RTX 5090s. Yes, it runs DeepSeek V3/R1, quantized at 4-bit, at an honest 18-19 tok/s, and that's a model you cannot fit into ten 5090s.


Right, that's the $9500 Mac Studio with 512GB RAM and 80-core GPU.

16x the RAM of RTX 5090.

There are two versions of the M3 Ultra:

28-core CPU, 60-core GPU

32-core CPU, 80-core GPU

Both have a 32-core Neural Engine.




