They outlined the methodology. They didn't publish their code or the training se...

cruffle_duffle · 2025-01-29T19:46:41 1738180001

How could they publish the terabytes of training data? A million RAR files?

Honestly, would that part even be useful? I want to know how they did the training so I can repro it with my own training data, right?

I mean, isn't that the future? Somebody figures out how to do P2P distributed training, and groups can crawl the web and train their own open source models?

tgtweak · 2025-01-29T22:29:10 1738189750

I'd torrent it :D