I know a lot depends on architecture and number representation, but do people have a sense of how big a compute cluster is needed to train these classes of models, from 1.5B and 3B up through 7B, 13B, and 70B parameters?
Didn’t Meta say they trained on 2k A100s for Llama 2?
I'm not an ML engineer, just interested in the space, but as a general ballpark: training these models from scratch takes hundreds to thousands of GPUs.
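If you want a rough sanity check on that ballpark, here's a back-of-envelope sketch. The assumptions are mine, not from anyone above: training cost of roughly 6 × parameters × tokens FLOPs, an A100's BF16 peak of ~312 TFLOPS, and ~40% hardware utilization, which is on the optimistic end for large clusters.

```python
# Back-of-envelope: how many GPUs to pretrain a dense transformer in a given time.
# Assumptions (mine): ~6 * params * tokens training FLOPs,
# A100 BF16 peak of 312 TFLOPS, ~40% model FLOPs utilization (MFU).

A100_PEAK_FLOPS = 312e12   # dense BF16/FP16 peak for one A100
MFU = 0.40                 # assumed fraction of peak actually achieved

def gpus_needed(params_b: float, tokens_b: float, days: float) -> float:
    """GPUs needed to finish training in `days` of wall-clock time."""
    total_flops = 6 * (params_b * 1e9) * (tokens_b * 1e9)
    flops_per_gpu = A100_PEAK_FLOPS * MFU * days * 86_400
    return total_flops / flops_per_gpu

# e.g. 7B on 2T tokens in 30 days, 70B on 2T tokens in 60 days
print(f"7B,  2T tokens, 30 days: ~{gpus_needed(7, 2000, 30):,.0f} GPUs")
print(f"70B, 2T tokens, 60 days: ~{gpus_needed(70, 2000, 60):,.0f} GPUs")
```

With those assumptions you get on the order of a few hundred A100s for a 7B model and a couple of thousand for a 70B model trained on ~2T tokens in a reasonable wall-clock time, which lines up with the "hundreds to thousands" range and the ~2k A100 figure mentioned above. Change the token budget, precision, or utilization and the numbers move proportionally.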