
> How does it help to know the steps when creating a base model still costs >tens of millions of dollars?

You can still learn web development even if you don't have tens of thousands of users or a large fleet of distributed servers. Thanks to FOSS, it's trivial to go through GitHub and find projects you can learn a bunch from, which is exactly what I did when I started out.

With LLMs, you don't have a lot of options. Sure, you can download and fine-tune the weights, but what if you're interested in how the weights are created in the first place? Some companies are doing a good job of creating those resources (like the folks building OLMo), but others seem to just want to use FOSS because it's good marketing vs. OpenAI et al.




Learning resources are nice, but I don't think it's analogous to web dev. I can download nginx and make a useful website right now, no fleet of servers needed. I can even get it hosted for free. Making a useful LLM absolutely, 100% requires huge GPU clusters. There is no entry level; or rather, that is the entry level. Because of the scale requirements, FOSS model training frameworks (see GPT-NeoX) are only helpful to large, well-funded labs. It's also difficult to open-source training data because of copyright.

Finetuning weights and building infrastructure around that involves almost all the same things as building a model, except it's actually possible. That's where I've seen most small-scale FOSS development take place over the last few years.
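To make the claim concrete: fine-tuning reuses essentially the same training loop as pretraining, just starting from existing weights and usually freezing most of them. A minimal PyTorch sketch, with a stand-in "pretrained" network and toy data rather than a real LLM (all names and sizes here are illustrative):

```python
# Sketch: fine-tuning is the same forward/loss/backward/step loop as
# pretraining, applied to a loaded model with most parameters frozen.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "pretrained" model; in practice you'd load real weights from disk.
pretrained = nn.Sequential(nn.Embedding(100, 32), nn.Flatten(),
                           nn.Linear(32 * 8, 64), nn.ReLU())
head = nn.Linear(64, 2)  # new task-specific head, trained from scratch

for p in pretrained.parameters():  # freeze the base model
    p.requires_grad = False

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
x = torch.randint(0, 100, (16, 8))  # toy token batch
y = torch.randint(0, 2, (16,))      # toy labels
for _ in range(50):                 # the usual loop: forward, loss, backward, step
    loss = nn.functional.cross_entropy(head(pretrained(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in pretrained.parameters())
print(f"trainable={trainable}, frozen={frozen}")
```

The loop itself is identical to pretraining; what changes is where the weights come from and which parameters get gradients, which is why the surrounding infrastructure (data loading, checkpointing, serving) overlaps so much.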


This isn't true. Learning how to train a 124M-parameter model is just as instructive as training a 700B one, and it's possible on a laptop. https://github.com/karpathy/nanoGPT
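For a sense of scale, here's a toy decoder-only Transformer in the spirit of nanoGPT (not the actual repo code): a character-level model small enough to train on a laptop CPU in seconds. Hyperparameters, the training text, and the single-block architecture are all illustrative:

```python
# Toy character-level GPT: embed tokens, apply one causally-masked
# self-attention block plus an MLP, and train with next-token prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

text = "hello world. hello there. hello again. " * 50
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

block_size, n_embd, n_head = 16, 32, 4

class TinyGPT(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, n_embd)
        self.pos = nn.Embedding(block_size, n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                nn.Linear(4 * n_embd, n_embd))
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        T = idx.shape[1]
        x = self.tok(idx) + self.pos(torch.arange(T))
        # Causal mask: True entries (the strict upper triangle) are blocked,
        # so each position only attends to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + a
        x = x + self.ff(self.ln2(x))
        return self.head(x)

model = TinyGPT(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)
for step in range(200):
    # Sample random windows; targets are the inputs shifted by one character.
    ix = torch.randint(len(data) - block_size - 1, (8,))
    xb = torch.stack([data[i:i + block_size] for i in ix])
    yb = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    logits = model(xb)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.3f}")
```

The resulting model is useless as a product, which is the point of contention above, but the mechanics (tokenization, causal masking, next-token loss, the optimizer loop) are the same ones a 700B run uses, just scaled down.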


To clarify my point:

Learning how to make a small website is useful, and so is the website.

Learning how to finetune a large GPT is useful, and so is the finetuned model.

Learning how to train a 124M GPT is useful, but the resulting model is useless.


> Finetuning weights and building infrastructure around that involves almost all the same things as building a model

Those are two completely different roles? One is mostly around infrastructure and the other is actual ML. There are people who know both, I'll give you that, but I don't think that's the default or even common. Fine-tuning is trivial compared to building your own model and deployments/infrastructure is something else entirely.



