Also, calculating GPU costs is getting quite nuanced, with a wide range of prices (https://cloud-gpus.com/) and other variables that make it harder to do apples-to-apples comparisons.
I think you slightly misunderstood, and I wasn't clear enough—sorry! It's not a 30-70% speedup; it's 30-70% more cost-efficient. This is mainly due to non-NVIDIA chipsets (e.g., Google TPU) being cheaper, with some additional efficiency gains from JAX being more closely integrated with the XLA architecture.
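To illustrate the distinction with a quick sketch (the numbers here are made up for illustration, not our benchmark results): cost efficiency is about dollars per completed run, so a chip can be slower per step and still come out cheaper overall.

```python
# Hypothetical numbers only, to show "more cost-efficient" != "faster".
gpu_price, gpu_hours = 4.44, 10.0  # $/hr and wall-clock hours for one run
tpu_price, tpu_hours = 2.20, 11.0  # slower per run, but cheaper per hour

gpu_cost = gpu_price * gpu_hours   # total $ for the GPU run
tpu_cost = tpu_price * tpu_hours   # total $ for the TPU run

saving = 1 - tpu_cost / gpu_cost   # fraction saved per completed run
print(f"GPU: ${gpu_cost:.2f}, TPU: ${tpu_cost:.2f}, saving: {saving:.0%}")
```

With these made-up inputs the TPU run is 10% slower yet roughly 45% cheaper end to end, which is the sense in which "30-70% more cost-efficient" is meant.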
No, we haven't run our JAX + XLA stack on NVIDIA chipsets yet. I'm not sure how good NVIDIA's XLA backend support is.
At the bottom, it shows the calculations around the 30% cost efficiency of TPU vs GPU.
Our 30-70% range is based on numbers we collected from fine-tuning runs on TPU, compared against similar runs on NVIDIA GPUs (though those used other OSS libraries rather than our code).
It would be a lot more convincing if you actually ran it yourself and did a proper apples-to-apples comparison, especially considering that's the whole idea behind your project.
It's also comparing prices on Google Cloud, which has its own markup and is a lot more expensive than, say, RunPod. RunPod charges $1.64/hr for an A100 on Secure Cloud, while the A100 on Google is $4.44/hr. So in that context, a 30% price beat is actually a huge loss overall.
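Working that out with the prices quoted above (and assuming, for the sake of argument, that the 30% saving applies against the Google Cloud A100 rate):

```python
# Prices quoted above; the 30% figure is the claimed TPU cost advantage.
gcp_a100 = 4.44     # $/hr, A100 on Google Cloud
runpod_a100 = 1.64  # $/hr, A100 on RunPod Secure Cloud

# A 30% cost-efficiency win measured against the GCP price:
tpu_effective = gcp_a100 * (1 - 0.30)  # effective $/hr after the saving

# Compared to RunPod, that "win" is still far more expensive:
premium = tpu_effective / runpod_a100  # multiple of the RunPod price
print(f"TPU effective: ${tpu_effective:.2f}/hr, {premium:.1f}x RunPod")
```

The 30%-discounted rate comes out around $3.11/hr, still nearly twice RunPod's price, which is the point being made.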