
I know a lot depends on architecture and number representation, but do people have a sense of how big a compute cluster is needed to train these classes of models (1.5B, 3B, 7B, 13B, 70B parameters)?

Didn’t Meta say they trained on 2k A100s for Llama 2?



We're on a budget :) We trained on 128 H100-80GB GPUs for a week (200B tokens over 5 epochs, i.e. 1T tokens total).

Tech talk here with timestamp: https://www.youtube.com/live/veShHxQYPzo?si=UlcU9j2kC-C4oWvj...
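
As a quick sanity check on what that implies for throughput, here's a back-of-the-envelope sketch using only the figures above (128 GPUs, ~1T tokens, ~one week):

    # Implied throughput of the run described above.
    gpus = 128
    tokens = 200e9 * 5                # 200B tokens x 5 epochs = 1T
    seconds = 7 * 24 * 3600           # one week
    cluster_tps = tokens / seconds    # tokens/s across the whole cluster
    per_gpu_tps = cluster_tps / gpus  # tokens/s per GPU
    print(f"{cluster_tps:,.0f} tok/s total, {per_gpu_tps:,.0f} tok/s per GPU")

That works out to roughly 1.65M tokens/s for the cluster, or about 13k tokens/s per GPU.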


Each H100 is ~$30,000, so 128 of them is ~$3.84M in capex.

Roughly $1/hr/GPU in power cost, so a one-week run is looking at 128 × 24 × 7 = $21,504.

Cheap compared to OpenAI, but not something an indie hacker can do on their own unless they have millions to burn.
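
Spelled out as a script (the ~$30k/GPU and ~$1/hr/GPU figures are the rough estimates from this comment, not quoted prices):

    # Back-of-the-envelope cost of the run, using the rough
    # per-GPU estimates above (assumptions, not quotes).
    gpus = 128
    gpu_price = 30_000                          # ~$30k per H100
    capex = gpus * gpu_price                    # $3,840,000
    power_rate = 1.0                            # ~$1/hr/GPU in power
    run_hours = 24 * 7                          # one-week run
    power_cost = gpus * run_hours * power_rate  # 128 * 24 * 7 = $21,504
    print(f"capex ${capex:,}  power ${power_cost:,.0f}")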


The Hugging Face page for Replit's 3B model says "The model has been trained on the MosaicML platform on 128 H100-80GB GPUs."

Source: https://huggingface.co/replit/replit-code-v1_5-3b

I'm not an ML engineer, just interested in the space - but as a general ballpark, training these models from scratch needs hundreds to thousands of GPUs.
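
One way to put rough numbers on that is the common ~6 × params × tokens approximation for transformer training FLOPs, with a Chinchilla-style budget of 20 tokens per parameter and an assumed ~4e14 FLOP/s sustained per H100 (~40% utilization of its ~1e15 dense BF16 peak). All of those constants are my assumptions, not figures from this thread:

    # Rough sketch: how many H100s does a one-week, Chinchilla-optimal
    # run need at each model size? All constants are assumptions
    # (6*N*D FLOPs rule, 20 tokens/param, ~40% of H100 BF16 peak).
    EFFECTIVE_FLOPS = 4e14          # sustained FLOP/s per H100
    WEEK_SECONDS = 7 * 24 * 3600

    for params in [1.5e9, 3e9, 7e9, 13e9, 70e9]:
        tokens = 20 * params                    # Chinchilla-style budget
        train_flops = 6 * params * tokens
        gpus = train_flops / (EFFECTIVE_FLOPS * WEEK_SECONDS)
        print(f"{params/1e9:>4.1f}B params: ~{gpus:,.0f} H100s for one week")

That lands around 1 GPU for 1.5B, ~4 for 3B, ~24 for 7B, ~84 for 13B, and ~2,400 for 70B. Real runs usually train on far more than 20 tokens/param (Replit's 3B above used ~1T tokens, roughly 17x the Chinchilla budget), which is why actual cluster sizes end up in the hundreds-to-thousands range.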



