You can run the 4-bit quantized version of it on an M3 Ultra with 512GB of unified memory. That's quite expensive, though. Another option is a fast CPU with ~500GB of DDR5 RAM; that's also not cheap, and slower than the M3 Ultra. Or you buy multiple Nvidia cards to reach ~500GB of VRAM, which is probably the most expensive option but also the fastest.
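Back-of-the-envelope math for where that ~500GB figure comes from (a sketch: the 671B parameter count is R1's published size, but the overhead factor is my rough assumption):

    # Rough memory estimate for a 4-bit quantized 671B-parameter model.
    params = 671e9            # R1's published parameter count
    bytes_per_param = 0.5     # 4-bit quantization = half a byte per weight
    weights_gb = params * bytes_per_param / 1e9   # ~335 GB of weights alone
    overhead = 1.3            # assumed: KV cache, activations, runtime buffers
    print(f"~{weights_gb * overhead:.0f} GB total")  # ~436 GB -> a ~500GB-class machine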
Vast.ai has a bunch of 1x H100 SXM machines available; right now the cheapest is $1.554/hr.
Not affiliated, just a (mostly) happy user. Don't trust the bandwidth numbers, though; there's a lot of variance (not surprising, since it's a user-to-user marketplace).
Every time someone asks me what hardware to buy to run these at home, I show them how many thousands of hours on vast.ai you could get for the same money.
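For a rough sense of the break-even (prices here are illustrative assumptions, not quotes):

    # Break-even: buying a local machine vs. renting per-hour on Vast.ai.
    machine_cost = 9500.0   # assumed price of a 512GB M3 Ultra, USD
    rental_rate = 1.554     # USD/hr for the cheapest 1x H100 quoted above
    hours = machine_cost / rental_rate
    print(f"~{hours:.0f} rental hours")  # ~6113 hours, i.e. 8+ months of 24/7 use
    # (Caveat: a single H100 can't actually hold R1, so treat this as per-GPU math.)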
I don't even know how these Vast hosts make money; there's no way you can ever pay off your hardware from the pennies you're getting.
Worth mentioning that a single H100 (80-96GB) is not enough to run R1. You're looking at 6-8 GPUs on the lower end, and you should factor in setup and model download time as well.
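Quick sanity check on the GPU count, using the same rough assumptions as the memory estimate above:

    import math

    # How many 80GB GPUs to hold the 4-bit quant?
    total_gb = 671e9 * 0.5 / 1e9 * 1.3   # ~436 GB with assumed ~30% overhead
    print(math.ceil(total_gb / 80))      # 6 GPUs, matching the low end above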
An alternative is serverless GPU or LLM providers, which abstract some of this away for you, albeit at a higher per-hour cost and with slow cold starts when your model hasn't been used for a while.
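Many of those providers expose an OpenAI-compatible endpoint, so calling a hosted R1 looks roughly like this (the base URL, model id, and env var are placeholders for whatever provider you pick):

    import os
    from openai import OpenAI

    # Placeholder endpoint and model id; substitute your provider's actual values.
    client = OpenAI(
        base_url="https://api.example-provider.com/v1",
        api_key=os.environ["PROVIDER_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="deepseek-r1",   # provider-specific model id
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)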