
> For example, training and serving Llama 3.1 on Google TPUs is about 30% cheaper than NVIDIA GPUs

When you say this, you should specify which NVIDIA GPU you mean (I assume H100 SXM) and the price you're assuming for that GPU.

One can't simply compare based on the on-demand price on GCP, because the NVIDIA GPUs there are extremely overpriced.



Runpod charges $3.49/hr for an H100 SXM, which is fairly cheap as far as on-demand H100s go. A v5p TPU is $4.20/hr, but has 95GB RAM instead of 80GB on the H100 — so you'll need fewer TPUs to get the same amount of RAM.

On-demand, Runpod is ever-so-slightly cheaper than Google TPUs on a per-GB basis: about 4.36 cents per GB per hour for Runpod vs 4.42 cents for a TPU. But let's look at how they compare with reserved pricing. Runpod is $2.79/hr with a 3-month commitment (the longest commitment period they offer), whereas Google offers v5p TPUs at $2.94/hr for a 1-year commitment (the shortest period they offer; and to be honest, you probably don't want to make 3-year commitments in this space, since successive generations bring large perf gains).

If you're willing to reserve capacity, Google is cheaper than Runpod per GB of RAM you need for training or inference: about 3.49 cents per GB per hour for Runpod vs about 3.09 cents for Google. Additionally, Google presumably has far more TPU capacity than Runpod has GPU capacity, and multi-node training is a pain with GPUs and much less so with TPUs.

Another cheap option to benchmark against is Lambda Labs. Lambda is pretty slow to boot and considerably more annoying to work with (e.g. they only offer preconfigured VMs, so you'll need some kind of management layer on top of them). They offer H100s at $2.99/hr "on-demand" (although in my experience, prepare to wait 20+ minutes for the machines to boot); if cold-boot times don't matter to you, they're even better than Runpod if you need large machines (they only offer 8xH100 nodes, though: nothing smaller). For a 1-year commitment, they'll drop the price to $2.49/hr... which is still more expensive on a per-GB basis than TPUs (3.11 cents per GB per hour vs 3.09), and again I'd trust Google's TPU capacity more than Lambda's H100 capacity.
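
If anyone wants to redo this math with current prices, here's a minimal sketch of the per-GB-hour calculation. The prices are just the ones quoted above, so treat them as illustrative inputs that will be stale by the time you read this:

    # Per-GB-hour cost, using the list prices quoted in this thread.
    # name: (hourly price in $, accelerator memory in GB)
    OPTIONS = {
        "Runpod H100 on-demand":     (3.49, 80),
        "Runpod H100 3-mo reserved": (2.79, 80),
        "Lambda H100 on-demand":     (2.99, 80),
        "Lambda H100 1-yr reserved": (2.49, 80),
        "GCP TPU v5p on-demand":     (4.20, 95),
        "GCP TPU v5p 1-yr reserved": (2.94, 95),
    }

    # Sort cheapest-first by cents per GB per hour and print.
    for name, (price, gb) in sorted(OPTIONS.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1]):
        print(f"{name:27s} {100 * price / gb:.2f} cents/GB/hr")

Running that puts the 1-year reserved TPU at the top (~3.09 cents/GB/hr) and the on-demand TPU at the bottom (~4.42), which is exactly the pattern described above.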

So TPUs aren't dramatically cheaper than the cheapest GPU options available, but they are cheaper if you're working with reserved capacity, and probably more reliably available in large quantities.


Thank you for the detailed analysis. We need to spend some time working out a price comparison like this ourselves. We'll use it as inspiration!


VRAM per GPU isn't such an interesting metric. If it were, everyone would be fine-tuning on 80GB A100s :)

What matters is steps per $, and to some degree speed (I'm happy to pay a premium sometimes to get the fine-tuning results faster).
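
For concreteness, here's a minimal sketch of the steps-per-$ metric. The step times and prices are made-up placeholders; you'd substitute numbers from your own benchmark runs:

    # steps/$ = (steps per hour) / (price per hour). Higher is better.
    def steps_per_dollar(sec_per_step: float, price_per_hr: float) -> float:
        return (3600 / sec_per_step) / price_per_hr

    # Hypothetical step times for illustration only -- benchmark your own.
    print(steps_per_dollar(sec_per_step=2.0, price_per_hr=3.49))  # H100, ~516
    print(steps_per_dollar(sec_per_step=2.1, price_per_hr=4.20))  # v5p, ~408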


True, but a TPU v5p is supposedly much closer to an H100 than to an A100 (the A100 and TPU v4 were fairly similar), and you need the RAM as a baseline just to fit the model. I haven't seen very thorough benchmarking between the two, but Google claims similar numbers. So $/GB/hr is sadly all I can really look at without benchmarking.
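
To make the "fits the model" baseline concrete, here's a rough sketch of the device count needed just to hold the weights. The 2 bytes/param (bf16) and 20% overhead are assumptions; a real training footprint with optimizer state and activations is considerably larger:

    import math

    # Minimum accelerators needed to hold the weights alone, assuming
    # bf16 (2 bytes/param) plus a rough 20% overhead for everything else.
    def min_devices(params_b: float, device_gb: float,
                    bytes_per_param: float = 2.0, overhead: float = 1.2) -> int:
        weights_gb = params_b * bytes_per_param  # 70B params -> ~140 GB
        return math.ceil(weights_gb * overhead / device_gb)

    print(min_devices(70, device_gb=80))  # H100 80GB   -> 3 devices
    print(min_devices(70, device_gb=95))  # TPU v5p 95GB -> 2 devices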


GCP is one of the cheapest places you can get them at scale.


Wouldn't really say it's the cheapest option... there are other providers like Lambda Labs or Ori.co where you can find them much cheaper.


Tell me more.

At what scale were you able to get a significant discount and how much?

Most people will be doing (full) fine-tuning on 8xH100 or 16xH100 for a few days at a time.



