
> For example, training and serving Llama 3.1 on Google TPUs is about 30% cheaper than NVIDIA GPUs

When you say this, you should specify which NVIDIA GPU you mean (I assume H100 SXM) and the price you're assuming for that GPU.

One can't simply compare based on the on-demand price on GCP, because the NVIDIA GPUs there are extremely overpriced.



Runpod charges $3.49/hr for an H100 SXM, which is fairly cheap as far as on-demand H100s go. A v5p TPU is $4.20/hr, but has 95GB RAM instead of 80GB on the H100 — so you'll need fewer TPUs to get the same amount of RAM.

On-demand, Runpod is ever-so-slightly cheaper than Google TPUs on a per-GB basis: about 4.36 cents per GB per hour for Runpod vs 4.42 cents for a TPU. But let's look at how they compare with reserved pricing. Runpod is $2.79/hr with a 3-month commitment (the longest commitment period they offer), whereas Google offers v5p TPUs at $2.94/hr for a 1-year commitment (the shortest period they offer; and to be honest, you probably don't want to make 3-year commitments in this space, since successive generations bring large perf gains).

If you're willing to reserve capacity, Google is cheaper than Runpod per GB of RAM you need for training or inference: about 3.49 cents per GB per hour for Runpod vs about 3.09 cents for Google. Additionally, Google presumably has far more TPU capacity than Runpod has GPU capacity, and multi-node training is a pain with GPUs and much less so with TPUs.

Another cheap option to benchmark against is Lambda Labs. Lambda is pretty slow to boot and considerably more annoying to work with (e.g. they only offer preconfigured VMs, so you'll need some kind of management layer on top of them). They offer H100s at $2.99/hr "on-demand" (although in my experience, prepare to wait 20+ minutes for the machines to boot); if cold-boot times don't matter to you, they're even better than Runpod if you need large machines (they only offer 8xH100 nodes, though: nothing smaller). For a 1-year commitment, they'll drop the price to $2.49/hr... which is still more expensive on a per-GB basis than TPUs (3.11 cents per GB per hour vs 3.09), and again I'd trust Google's TPU capacity more than Lambda's H100 capacity.
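
If anyone wants to redo this math with current prices, here's a minimal sketch of the per-GB-hour calculation. The prices are just the ones quoted above, so treat them as illustrative inputs that will be stale by the time you read this:

    # Per-GB-hour cost, using the list prices quoted in this thread.
    # name: (hourly price in $, accelerator memory in GB)
    OPTIONS = {
        "Runpod H100 on-demand":     (3.49, 80),
        "Runpod H100 3-mo reserved": (2.79, 80),
        "Lambda H100 on-demand":     (2.99, 80),
        "Lambda H100 1-yr reserved": (2.49, 80),
        "GCP TPU v5p on-demand":     (4.20, 95),
        "GCP TPU v5p 1-yr reserved": (2.94, 95),
    }

    # Sort cheapest-first by cents per GB per hour and print.
    for name, (price, gb) in sorted(OPTIONS.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1]):
        print(f"{name:27s} {100 * price / gb:.2f} cents/GB/hr")

Running that puts the 1-year reserved TPU at the top (~3.09 cents/GB/hr) and the on-demand TPU at the bottom (~4.42), which is exactly the pattern described above.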

So TPUs aren't dramatically cheaper than the cheapest GPU options available, but they are cheaper if you're working with reserved capacity, and probably more reliably available in large quantities.


Thank you for the detailed analysis. We need to spend some time working out a price comparison like this ourselves. We'll use it as inspiration!


VRAM per GPU isn't such an interesting metric. If it were, everyone would be fine-tuning on 80GB A100s :)

What matters is steps per $, and to some degree speed (I'm happy to pay a premium sometimes to get the fine-tuning results faster).
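
For concreteness, here's a minimal sketch of the steps-per-$ metric. The step times and prices are made-up placeholders; you'd substitute numbers from your own benchmark runs:

    # steps/$ = (steps per hour) / (price per hour). Higher is better.
    def steps_per_dollar(sec_per_step: float, price_per_hr: float) -> float:
        return (3600 / sec_per_step) / price_per_hr

    # Hypothetical step times for illustration only -- benchmark your own.
    print(steps_per_dollar(sec_per_step=2.0, price_per_hr=3.49))  # H100, ~516
    print(steps_per_dollar(sec_per_step=2.1, price_per_hr=4.20))  # v5p, ~408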


True, but a TPU v5p is supposedly much closer to an H100 than to an A100 (the A100 and TPU v4 were fairly similar), and you need the RAM as a baseline just to fit the model. I haven't seen very thorough benchmarking between the two, but Google claims similar numbers. So $/GB/hr is sadly all I can really look at without benchmarking.
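
To make the "fits the model" baseline concrete, here's a rough sketch of the device count needed just to hold the weights. The 2 bytes/param (bf16) and 20% overhead are assumptions; a real training footprint with optimizer state and activations is considerably larger:

    import math

    # Minimum accelerators needed to hold the weights alone, assuming
    # bf16 (2 bytes/param) plus a rough 20% overhead for everything else.
    def min_devices(params_b: float, device_gb: float,
                    bytes_per_param: float = 2.0, overhead: float = 1.2) -> int:
        weights_gb = params_b * bytes_per_param  # 70B params -> ~140 GB
        return math.ceil(weights_gb * overhead / device_gb)

    print(min_devices(70, device_gb=80))  # H100 80GB   -> 3 devices
    print(min_devices(70, device_gb=95))  # TPU v5p 95GB -> 2 devices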


GCP is one of the cheapest places you can get them at scale.


Wouldn't really say it's the cheapest option... there are other providers like Lambda Labs or Ori.co where you can find them much cheaper.


Tell me more.

At what scale were you able to get a significant discount and how much?

Most people will be doing (full) fine-tuning on 8xH100 or 16xH100 for a few days at a time.



