You're onto something. The BabyLM competition had caps on training data. Many LLMs have been using 1 TB of training data for some time.

In many cases, I can't even see how many GPU hours, or what size cluster of which GPUs, the pretraining required. If I can't afford it, then it doesn't matter what it achieved. What I can afford is what I have to choose from.