1. Training took 21 yottaflops. When was the last time you saw the yotta- prefix for anything?
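For scale, 21 yottaflops is 2.1×10^25 floating-point operations. A back-of-the-envelope sketch of what that means in wall-clock time (the per-GPU sustained throughput and cluster size here are illustrative assumptions, not reported figures):

```python
# Back-of-the-envelope: how long does 2.1e25 FLOPs take on a big cluster?
# Throughput and cluster size are assumptions for illustration only.
total_flops = 21e24          # 21 yottaFLOPs
flops_per_gpu = 4e14         # assumed sustained FLOP/s per accelerator
n_gpus = 10_000              # assumed cluster size

seconds = total_flops / (flops_per_gpu * n_gpus)
days = seconds / 86_400
print(f"~{days:.0f} days on {n_gpus:,} GPUs")
```

Even with generous assumptions, you land in the range of months of cluster time, which is why the yotta- prefix shows up.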
2. The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.
>> The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.
That really doesn't change anything at all. The cheaper it gets to train large models, the larger the models that big corporations can train relative to everyone else.
Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
At a certain point though, models become good enough for particular tasks. Once that happens for whatever my application is, I don't care if OpenAI has a model that's twice as good on some metric, because it's overkill for my use-case. I'm going to be happy using a smaller, cheaper model from a competitor.
I think we're far from that point though. For the vast majority of use cases, I always wish that the answers could be more accurate.
Sure - they might be 'good enough' to build a business on. But if a competitor builds their business on top of a more accurate model, their product will work better, and they will win the market.
Yeah, but the benchmark being discussed here is FOSS. Which for me, and many others, translates to: can I run something useful in my closet or on my phone? I've found LLaMA neat and yeah, some FOSS models are getting decent - but they're a far cry from GPT4. I pay for GPT4, use it almost daily, and that's my benchmark.
Yes, when I can run GPT4 in my closet, OpenAI will have GPT7 or whatever - but it doesn't change the fact that I have something useful running in my closed network, and that opens up all kinds of data integration that I'm unwilling to ship to OpenAI. On that day I'll probably still use GPT7, but I'll _also_ have GPT4 running in my closet and integrating with a ton of things on my local network.
Am I right in my layman's understanding that context windows scaling up requires (mainly) much more compute at run time? Or do longer context models require different/longer training?
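Both, roughly: with standard self-attention, compute grows quadratically in context length at inference time, and long-context models are typically also fine-tuned on longer sequences (e.g. with adjusted positional embeddings). A toy sketch of the quadratic term (the dimensions chosen are illustrative, not any particular model's):

```python
# Toy illustration: the attention-score matrix is (seq_len x seq_len),
# so that part of the compute grows quadratically with context length.
def attention_score_flops(seq_len: int, d_model: int) -> int:
    # QK^T is seq_len * seq_len * d_model multiply-adds (x2 for mul + add)
    return 2 * seq_len * seq_len * d_model

base = attention_score_flops(2_048, 4_096)
extended = attention_score_flops(32_768, 4_096)
print(extended / base)  # 16x the context -> 256x the attention-score compute
```

So a 16× longer context costs 256× more in this term alone, which is why long-context serving is expensive even when the weights are unchanged.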
One important milestone is a model that is good enough to produce an acceptable-quality answer to x% of public users' questions without any data being sent to the megacorps.
> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
I think a better frame is: if rice got so absolutely cheap to make that anybody could spin up a bag of rice on demand, anybody whose business model was based on selling rice sacks would be in trouble, especially if their specialty was selling rice in bulk instead of, e.g., mom-and-pop restaurants selling cooked rice with flavors and a focus on customer experience.
(Not sure the metaphor is a good fit for AI. Maybe OpenAI comes up with GPT-5 and makes something so powerful that by the time OSS projects get to GPT-4 level nobody cares. But if GPT-5 is only incrementally better than GPT-4, then yeah, they have no moat.)
Surely there are diminishing returns for AI compute though? I mean, is a model with 10x the parameter count 10x better? I think it's still possible that training costs will become irrelevant for all players at some point given this non-linear scaling. Access to data is another story.
10x the parameters? Maybe not in a single model, but maybe 10x the expert models has 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that.
It's not clear. Scaling laws still seem to hold AFAICT.
Right now the bottleneck is "how big a model can you fit on an H100 GPU". It's possible that in a few years, when bigger cards come out and/or we get better at compressing models, we'll get even better models just by increasing the scale.
> if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
And all that rice would be useless since you could only eat one cup a day.
The richest person in the world and someone who is solidly middle class both use the exact same iPhone. After a point more dollars doesn't necessarily mean better or more useful technology. If training "good enough" models becomes cheap enough to be achievable by small-time developers then OpenAI/Google/Anthropic etc. will definitely lose some of their edge in the space.
...the market for rice will totally collapse because it would cost more to transport it than the farmer would make by selling it. Feel free to substitute any commodity that becomes "too cheap to meter" for "rice".
The "invisible hand" has a tendency to bitchslap people who don't have even a modest understanding of economic principles.
Training data quality and quantity is the bottleneck.
"Chinchilla showed that we need to be using 11× more data during training than that used for GPT-3 and similar models. This means that we need to source, clean, and filter to around 33TB of text data for a 1T-parameter model." https://lifearchitect.ai/chinchilla/
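The Chinchilla rule of thumb is roughly 20 training tokens per parameter. A quick sketch of where the "11× more data than GPT-3" figure comes from (using GPT-3's commonly cited 175B parameters and 300B training tokens):

```python
# Chinchilla rule of thumb: compute-optimal training uses
# roughly 20 tokens per parameter.
TOKENS_PER_PARAM = 20

gpt3_params = 175e9
gpt3_tokens = 300e9  # what GPT-3 was actually trained on

optimal_tokens = gpt3_params * TOKENS_PER_PARAM   # 3.5e12 tokens
shortfall = optimal_tokens / gpt3_tokens
print(f"GPT-3 was trained on ~{shortfall:.1f}x too few tokens")
```

Scale that ratio up to a 1T-parameter model and you're into tens of trillions of tokens, which is why sourcing and cleaning the data becomes the hard part.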
GPT4 has been trained on images exactly for this reason (it might not have been worth it separately from multi-modality, but together these two advantages seem decisive).
>Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.
...and billions would be lifted out of poverty, and world hunger would be solved. The rice metaphor doesn't quite apply here.
If the price of GPU training continues to drop at the present rate, then it would be possible to train a GPT-4 level LLM on a $3000 card in 10 years. The ability to run inference on it would come way sooner.
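A sketch of that extrapolation, using the 3x-per-year cost drop mentioned upthread (the starting cost is an illustrative assumption, and whether the rate holds for a decade is of course the big question):

```python
# If training cost falls 3x per year, a fixed workload gets cheaper
# by a factor of 3**years overall. Starting cost is an assumption.
start_cost = 100e6   # assume ~$100M to train a GPT-4-class model today
rate = 3             # cost divisor per year
years = 10

future_cost = start_cost / rate**years
print(f"${future_cost:,.0f}")  # roughly $1,700
```

Compounding does most of the work here: 3^10 is about 59,000, so even a nine-figure training run lands in consumer-hardware territory if the trend holds.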
Well, OpenAI raised eyebrows by crawling the internet and using everyone's data to make a commercial product.
One day some new startup will train on all of Libgen and the torrent networks, but it will be very hard to prove. You'll keep getting this one-upping into questionable morality and legality, and even OpenAI will complain about playing fair.
Google Classroom - teenagers' essays, written by humans, for learning what it means to be human, and graded by humans - is a richer dataset than anything else I can think of that nobody else could get their hands on.
Yup, and they're doing it the whole country over, putting that data into Google Classroom for Bard to know "this is C-grade work" and "this is A-grade work". Knowing what's deemed good and bad writing is where I'm thinking this dataset shines for training LLMs.
Yeah they have the internet from before LLMs were used for anything, so the data is not poisoned. Not unlike carbon dating becoming useless for estimating age of anything made after nuclear atmospheric tests, or low-background steel.
And those blogs took a decade+ to make, and now in another year we'll make that much information again. Then it will be that much information in a month. Then that much pap in a day.
And in the past it was still a million people making that much crap. Now it's a single "entity" making that much crap with its own style and mistakes.
> The google memo was right about the lack of a moat.
5 months on, and nobody has yet beaten their result quality. I think there is a moat.
Also, I think for many use cases, smarter is better. If a few cents can buy a more accurate answer, then it is always worth paying those few cents. So as long as more hardware and more data can train a bigger, better model, that is the moat.
And that gets more difficult every day, as previously accessible sources of data turn off their APIs.
Though Google may have something up its sleeve with the corpus of Google Books! I have been wondering if OpenAI secretly pulled in Sci-Hub or Z-Library to neutralize that potential advantage.
Are there any stats on how many words are in Google Books vs. how many words are on the open web?
My feeling is that the web has a lot more on it than the total of all libraries - simply because anyone can start a blog, but publishing a book requires quite some commitment.
I think you're right but I also think the text in published books would be at least an order of magnitude more valuable than the same length of text from the web
Yes, and great news for shills, bad actors, agitators, trolls, foreign intel, and propagandists. I'm impressed by the tech but terrified because for once I cannot conceive of what this means for the future. My guess is that this kills the open web and laws get passed which bury it.
Everybody is self-soothing with the idea that OpenAI's (frankly, half-hearted) push for regulation is just mundane regulatory capture and profit-seeking, and not the fact that it will, at best, absolutely destroy everything about the internet and technology that we've come to know and love. Should a torrent show up on 4chan like LLaMA's did, with weights and code for a base GPT4-level model, modern society is done. Golden age over.
GPT-4 is far more capable than LLaMA. Just as one area of impact: captchas would become permanently ineffective. If you're experienced in developing captchas and know everything they do for us, you know the implications of that alone lead to a very dystopian internet and world.
I like to answer a question with a question: if you sit and think about it, what unintentional misuses and intentional abuses can you think of? It helps to write down a list of known abilities, think up several "what if..." negative utilities or implications of each, then iterate further to see second-, third-, and fourth-order effects.