
> How does it help to know the steps when creating a base model still costs >tens of millions of dollars?

You can still learn web development even if you don't have tens of thousands of users or a large fleet of distributed servers. Thanks to FOSS, it's trivial to go through GitHub and find projects you can learn a bunch from, which is exactly what I did when I started out.

With LLMs, you don't have a lot of options. Sure, you can download and fine-tune the weights, but what if you're interested in how the weights are created in the first place? Some companies are doing a good job of creating those resources (like the folks building OLMo), but others seem to just want to use FOSS because it's good marketing vs. OpenAI et al.




Learning resources are nice, but I don't think it's analogous to web dev. I can download nginx and make a useful website right now, no fleet of servers needed. I can even get it hosted for free. Making a useful LLM absolutely, 100% requires huge GPU clusters. There is no entry level; or rather, that is the entry level. Because of the scale requirements, FOSS model training frameworks (see GPT-NeoX) are only helpful to large, well-funded labs. It's also difficult to open-source training data because of copyright.

Finetuning weights and building infrastructure around that involves almost all the same things as building a model, except it's actually possible. That's where I've seen most small-scale FOSS development take place over the last few years.
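To make the claim concrete: fine-tuning reuses essentially the same training loop as pretraining, just starting from existing weights and usually freezing most of them. A minimal PyTorch sketch, with a stand-in "pretrained" network and toy data rather than a real LLM (all names and sizes here are illustrative):

```python
# Sketch: fine-tuning is the same forward/loss/backward/step loop as
# pretraining, applied to a loaded model with most parameters frozen.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "pretrained" model; in practice you'd load real weights from disk.
pretrained = nn.Sequential(nn.Embedding(100, 32), nn.Flatten(),
                           nn.Linear(32 * 8, 64), nn.ReLU())
head = nn.Linear(64, 2)  # new task-specific head, trained from scratch

for p in pretrained.parameters():  # freeze the base model
    p.requires_grad = False

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
x = torch.randint(0, 100, (16, 8))  # toy token batch
y = torch.randint(0, 2, (16,))      # toy labels
for _ in range(50):                 # the usual loop: forward, loss, backward, step
    loss = nn.functional.cross_entropy(head(pretrained(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in pretrained.parameters())
print(f"trainable={trainable}, frozen={frozen}")
```

The loop itself is identical to pretraining; what changes is where the weights come from and which parameters get gradients, which is why the surrounding infrastructure (data loading, checkpointing, serving) overlaps so much.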


This isn't true. Learning how to train a 124M-parameter model is just as instructive as training a 700B one, and it's possible on a laptop. https://github.com/karpathy/nanoGPT
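For a sense of scale, here's a toy decoder-only Transformer in the spirit of nanoGPT (not the actual repo code): a character-level model small enough to train on a laptop CPU in seconds. Hyperparameters, the training text, and the single-block architecture are all illustrative:

```python
# Toy character-level GPT: embed tokens, apply one causally-masked
# self-attention block plus an MLP, and train with next-token prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

text = "hello world. hello there. hello again. " * 50
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

block_size, n_embd, n_head = 16, 32, 4

class TinyGPT(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, n_embd)
        self.pos = nn.Embedding(block_size, n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                nn.Linear(4 * n_embd, n_embd))
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        T = idx.shape[1]
        x = self.tok(idx) + self.pos(torch.arange(T))
        # Causal mask: True entries (the strict upper triangle) are blocked,
        # so each position only attends to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + a
        x = x + self.ff(self.ln2(x))
        return self.head(x)

model = TinyGPT(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)
for step in range(200):
    # Sample random windows; targets are the inputs shifted by one character.
    ix = torch.randint(len(data) - block_size - 1, (8,))
    xb = torch.stack([data[i:i + block_size] for i in ix])
    yb = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    logits = model(xb)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.3f}")
```

The resulting model is useless as a product, which is the point of contention above, but the mechanics (tokenization, causal masking, next-token loss, the optimizer loop) are the same ones a 700B run uses, just scaled down.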


To clarify my point:

Learning how to make a small website is useful, and so is the website.

Learning how to finetune a large GPT is useful, and so is the finetuned model.

Learning how to train a 124M GPT is useful, but the resulting model is useless.


> Finetuning weights and building infrastructure around that involves almost all the same things as building a model

Those are two completely different roles? One is mostly around infrastructure and the other is actual ML. There are people who know both, I'll give you that, but I don't think that's the default or even common. Fine-tuning is trivial compared to building your own model and deployments/infrastructure is something else entirely.



