1. Training took 21 yottaflops. When was the last time you saw the yotta- prefix for anything?
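For scale, 21 yottaflops is 2.1×10^25 floating-point operations. A back-of-the-envelope sketch of what that means in wall-clock time (the per-GPU sustained throughput and cluster size here are illustrative assumptions, not reported figures):

```python
# Back-of-the-envelope: how long does 2.1e25 FLOPs take on a big cluster?
# Throughput and cluster size are assumptions for illustration only.
total_flops = 21e24          # 21 yottaFLOPs
flops_per_gpu = 4e14         # assumed sustained FLOP/s per accelerator
n_gpus = 10_000              # assumed cluster size

seconds = total_flops / (flops_per_gpu * n_gpus)
days = seconds / 86_400
print(f"~{days:.0f} days on {n_gpus:,} GPUs")
```

Even with generous assumptions, you land in the range of months of cluster time, which is why the yotta- prefix shows up.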
2. The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.
>> The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.
That really doesn't change anything at all. The cheaper it gets to train large models, the larger the models that big corporations can train relative to everyone else.
Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
At a certain point though, models become good enough for particular tasks. Once that happens for whatever my application is, I don't care if OpenAI has a model that's twice as good on some metric, because it's overkill for my use-case. I'm going to be happy using a smaller, cheaper model from a competitor.
I think we're far from that point though. For the vast majority of use cases, I always wish that the answers could be more accurate.
Sure - they might be 'good enough' to build a business on. But if a competitor builds their business on top of a more accurate model, their product will work better, and they will win the market.
Yeah, but the benchmark being discussed here is FOSS. Which for me, and many others, translates to: can I run something useful in my closet or on my phone? I've found LLaMA neat and yeah, some FOSS models are getting decent - but they're a far cry from GPT4. I pay for GPT4, use it almost daily, and that's my benchmark.
Yes, when I can run GPT4 in my closet, OpenAI will have GPT7 or whatever - but it doesn't change the fact that I have something useful running in my closed network, and that opens up all kinds of data integration that I'm unwilling to ship to OpenAI. On that day I'll probably still use GPT7, but I'll _also_ have GPT4 running in my closet and integrating with a ton of things on my local network.
Am I right in my layman's understanding that context windows scaling up requires (mainly) much more compute at run time? Or do longer context models require different/longer training?
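Both, roughly: with standard self-attention, compute grows quadratically in context length at inference time, and long-context models are typically also fine-tuned on longer sequences (e.g. with adjusted positional embeddings). A toy sketch of the quadratic term (the dimensions chosen are illustrative, not any particular model's):

```python
# Toy illustration: the attention-score matrix is (seq_len x seq_len),
# so that part of the compute grows quadratically with context length.
def attention_score_flops(seq_len: int, d_model: int) -> int:
    # QK^T is seq_len * seq_len * d_model multiply-adds (x2 for mul + add)
    return 2 * seq_len * seq_len * d_model

base = attention_score_flops(2_048, 4_096)
extended = attention_score_flops(32_768, 4_096)
print(extended / base)  # 16x the context -> 256x the attention-score compute
```

So a 16× longer context costs 256× more in this term alone, which is why long-context serving is expensive even when the weights are unchanged.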
One important milestone is a model that is good enough to produce an acceptable-quality answer to x% of public users' questions without any data being sent to the megacorps.
> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
I think a better frame is: if rice got so absolutely cheap to make that anybody could spin up a bag of rice on demand, anybody whose business model was based on selling rice sacks would be in trouble, especially if their specialty was selling rice in bulk instead of, e.g., mom-and-pop restaurants selling cooked rice with flavors and a focus on customer experience.
(Not sure the metaphor is a good fit for AI. Maybe OpenAI comes up with GPT-5 and makes something so powerful that by the time OSS projects get to GPT-4 level nobody cares. But if GPT-5 is only incrementally better than GPT-4, then yeah, they have no moat.)
Surely there are diminishing returns for AI compute though? I mean, is a model with 10x the parameter count 10x better? I think it's still possible that training costs will become irrelevant for all players at some point given this non-linear scaling. Access to data is another story.
10x the parameters? Maybe not in a single model, but maybe 10x the expert models has 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that.
It's not clear. Scaling laws still seem to hold AFAICT.
Right now the bottleneck is "how big a model can you fit on an H100 GPU". It's possible that in a few years, when bigger cards come out and/or we get better at compressing models, we'll get even better models just by increasing the scale.
> if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
And all that rice would be useless since you could only eat one cup a day.
The richest person in the world and someone who is solidly middle class both use the exact same iPhone. After a point more dollars doesn't necessarily mean better or more useful technology. If training "good enough" models becomes cheap enough to be achievable by small-time developers then OpenAI/Google/Anthropic etc. will definitely lose some of their edge in the space.
...the market for rice will totally collapse because it would cost more to transport it than the farmer would make by selling it. Feel free to substitute any commodity that becomes "too cheap to meter" for "rice".
The "invisible hand" has a tendency to bitchslap people who don't have even a modest understanding of economic principles.
Training data quality and quantity is the bottleneck.
"Chinchilla showed that we need to be using 11× more data during training than that used for GPT-3 and similar models. This means that we need to source, clean, and filter to around 33TB of text data for a 1T-parameter model." https://lifearchitect.ai/chinchilla/
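The Chinchilla rule of thumb is roughly 20 training tokens per parameter. A quick sketch of where the "11× more data than GPT-3" figure comes from (using GPT-3's commonly cited 175B parameters and 300B training tokens):

```python
# Chinchilla rule of thumb: compute-optimal training uses
# roughly 20 tokens per parameter.
TOKENS_PER_PARAM = 20

gpt3_params = 175e9
gpt3_tokens = 300e9  # what GPT-3 was actually trained on

optimal_tokens = gpt3_params * TOKENS_PER_PARAM   # 3.5e12 tokens
shortfall = optimal_tokens / gpt3_tokens
print(f"GPT-3 was trained on ~{shortfall:.1f}x too few tokens")
```

Scale that ratio up to a 1T-parameter model and you're into tens of trillions of tokens, which is why sourcing and cleaning the data becomes the hard part.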
GPT4 has been trained on images exactly for this reason (it might not have been worth it separately from multi-modality, but together these two advantages seem decisive).
>Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
...and billions would be lifted out of poverty, and world hunger would be solved. The rice metaphor doesn't quite apply here.
If the price of GPU training continues to drop at the present rate, then it would be possible to train a GPT-4 level LLM on a $3000 card in 10 years. The ability to run inference on it would come way sooner.
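A sketch of that extrapolation, using the 3x-per-year cost drop mentioned upthread (the starting cost is an illustrative assumption, and whether the rate holds for a decade is of course the big question):

```python
# If training cost falls 3x per year, a fixed workload gets cheaper
# by a factor of 3**years overall. Starting cost is an assumption.
start_cost = 100e6   # assume ~$100M to train a GPT-4-class model today
rate = 3             # cost divisor per year
years = 10

future_cost = start_cost / rate**years
print(f"${future_cost:,.0f}")  # roughly $1,700
```

Compounding does most of the work here: 3^10 is about 59,000, so even a nine-figure training run lands in consumer-hardware territory if the trend holds.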
Well, OpenAI raised eyebrows by crawling the internet and using everyone's data to make a commercial product.
One day some new startup will train on all of Libgen and the torrent networks, but it will be very hard to prove. You'll keep getting this one-upping into questionable morality and legality, and even OpenAI will complain about playing fair.
Google Classroom - teenagers' essays, written by humans, for learning what it means to be human, and graded by humans - is a richer dataset than anything else I can think of that nobody else could get their hands on.
Yup, and they're doing it the whole country over, putting that data into Google Classroom for Bard to know "this is C-grade work" and "this is A-grade work". Knowing what's deemed good and bad writing is where I'm thinking this dataset shines for training LLMs.
Yeah they have the internet from before LLMs were used for anything, so the data is not poisoned. Not unlike carbon dating becoming useless for estimating age of anything made after nuclear atmospheric tests, or low-background steel.
And those blogs took a decade+ to make, and now in another year we'll make that much information again. Then it will be that much information in a month. Then that much pap in a day.
And in the past it was still a million people making that much crap. Now it's a single "entity" making that much crap with its own style and mistakes.
> The google memo was right about the lack of a moat.
5 months on, and nobody has yet beaten their result quality. I think there is a moat.
Also, I think for many use cases, smarter is better. If a few cents can buy a more accurate answer, then it is always worth paying those few cents. So as long as more hardware and more data can train a bigger, better model, that is the moat.
And that gets more difficult every day, as previously accessible sources of data turn off their APIs.
Though Google may have something up its sleeve with the corpus of Google Books! I have been wondering if OpenAI secretly pulled in Sci-Hub or Z-Library to neutralize that potential advantage.
Are there any stats on how many words are in Google Books vs. how many words are on the open web?
My feeling is that the web has a lot more on it than the total of all libraries - simply because anyone can start a blog, but publishing a book requires quite some commitment.
I think you're right but I also think the text in published books would be at least an order of magnitude more valuable than the same length of text from the web
Yes, and great news for shills, bad actors, agitators, trolls, foreign intel, and propagandists. I'm impressed by the tech but terrified because for once I cannot conceive of what this means for the future. My guess is that this kills the open web and laws get passed which bury it.
Everybody is self-soothing with the idea that OpenAI's (frankly, half-hearted) push for regulation is just mundane regulatory capture and profit-seeking, and not the fact that it will, at best, absolutely destroy everything about the internet and technology that we've come to know and love. Should a torrent show up on 4chan like LLaMA's did, with weights and code for a base GPT4-level model, modern society is done. Golden age over.
GPT-4 is far more capable than LLaMA. Just as one area of impact: captchas would become permanently ineffective. If you're experienced in developing captchas and know everything they do for us, you know the implications of that alone lead to a very dystopian internet and world.
I like to answer a question with a question: if you sit and think about it, what unintentional misuses and intentional abuses can you think of? It helps to write down a list of known abilities, think up several "what if..." negative utilities or implications of each, then iterate further to see second-, third-, and fourth-order effects.