
> I have no idea how OpenAI can make money on this.

I did a quick calculation. We know the number of floating point operations per token for inference is approximately twice the number of parameters (175B). Assuming they use 16-bit floating point and hit 50% of peak efficiency, an A100 could do 300 trillion flop/s (peak 624 [0]). One hour of an A100 earns OpenAI $0.002/ktok * (300,000/175/2/1000) ktok/sec * 3600 = $6.1. The public price per A100 is $2.25/hour with a one-year reservation [1].

[0]: https://www.nvidia.com/en-us/data-center/a100/

[1]: https://azure.microsoft.com/en-in/pricing/details/machine-le...
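
For anyone who wants to check the arithmetic, here it is as a small Python sketch. Every input is one of the assumptions above (2 flops per parameter per token, ~50% of the 624 TFLOPS peak, $0.002/ktok, $2.25 per A100-hour), not a confirmed figure from OpenAI:

    # All inputs are back-of-envelope assumptions, not confirmed figures.
    params = 175e9                    # assumed GPT-3-scale model
    flops_per_token = 2 * params      # ~2 flops per parameter per token for inference
    effective_flops = 300e12          # ~50% of the A100's 624 TFLOPS sparse FP16 peak
    price_per_ktok = 0.002            # API price, $ per 1,000 tokens
    rent_per_hour = 2.25              # Azure list price per A100-hour, 1-year reservation

    tokens_per_sec = effective_flops / flops_per_token               # ~857 tokens/s
    revenue_per_hour = tokens_per_sec * 3600 / 1000 * price_per_ktok
    print(f"${revenue_per_hour:.2f}/hr revenue vs ${rent_per_hour}/hr rent")  # ~$6.17 vs $2.25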



It's also worth mentioning that, because Microsoft is an investor, they're likely getting these at cost or subsidized.

OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

As the cost of GPUs goes down, or they develop an ASIC or a more efficient model, they can keep their pricing the same and make money later.

They can also likely make money in other ways, like allowing fine-tuning of the model or charging to let people use it with sensitive data.


Who will they be making money from? OpenAI is looking for companies willing to:

- tolerate the current state of the chatbots

- tolerate the high per-query latency

- tolerate having all queries sent to OpenAI

- tolerate OpenAI [presumably] having 0 liability for ChatGPT just randomly hallucinating inappropriate nonsense

- be willing to pay a lot of money for the above

I'm kind of making an assumption on that last point, but I suspect this is going to end up being more small-market business-to-business than mass-market business-to-consumer. A lot of these constraints make it not really usable for many things. It's even somewhat suspect for the most obvious use case, search, not only because of latency but also because the provider needs to make more money per search with the bot than without it. There's also the caching issue. Many potential users are probably going to be more inclined to get the answers and cache them to reduce latency/costs/'failures' than to endlessly pay per use.

Anyhow, probably a lack of vision on my part. But I'd certainly like to know what I'm not seeing.


Sadly, it will likely mostly be used to generate endless streams of SEO spam. Not for interactive use.


> Who will they be making money from?

Videogames maybe?

https://www.youtube.com/watch?v=ejw6OI4_lJw

This prototype is certainly something to keep an eye on.


Lots of use cases actually need creative "hallucinations"; that's where they're valuable.

Even e.g. to develop hardware such as planes and cars: https://assistedeverything.substack.com/p/todays-ai-sucks-at...


A lot of companies use third parties to provide customer support, and the results are often very low quality and full of misunderstandings and what we now call hallucinations. I think a good LLM could do a better job and I bet it'd be cheaper, too. And as a bonus training the bots to handle new products is practically instant when compared to training humans.


Their new AI safety strategy is to slow the development of the technology by dumping: lowering the price so much that bootstrapped competitors can't fund themselves.


I highly doubt it. OpenAI, Google and Meta are not the only ones who can implement these systems. The race for AGI is one for power and power is survival.


An LLM can do amazing things, but it's basically just an autocomplete system. It has the same potential to take over the world as your phone's keyboard. It's just a tool.


They want this; the interview with their CEO sort of confirmed that to me. He said some crap about wanting to release it slowly for "safety" (we all know this is a lie).

But he can't get away with it, with all the competition from other companies, on top of China, Russia, and others also pursuing AI development.


Yeah we're in an AI landgrab right now where at- or below-cost pricing is buying marketshare, lock-in, and underdevelopment of competitors. Smart move for them to pour money into it.


We have got to find a word for plans that are plainly harmful yet advantageous to their executors that's more descriptive than "smart..."



Agree. I didn't want to moralize, just wanted to point out it's a shrewd business move. It's rather anticompetitive, though that is hard to prove in such a dynamic market. Who knows, we may soon be calling it 'antitrust'.


Shrewd or cunning


For that you need 2 words: venture capital


Tactical


Economists call this price dumping


Uberly?


I prefer Webvan-esque. From https://en.wikipedia.org/wiki/Webvan:

> The company's investors pressured it to grow very fast to obtain first-mover advantage. This rapid growth was cited as one of the reasons for the downfall of the company.

IMO, selling at a loss to gain market share only makes sense if there are network effects that lead to a winner-takes-all situation. Of which there are some for ChatGPT (training data when people press the thumbs up/down buttons), but is that sufficient?


Also useful for bootstrapping a dev ecosystem.

If engineers are getting into AI development through OpenAI, they're using tools and systems within the OpenAI ecosystem.

Daily on HN there's a post on some AI implementation faster than chatgpt. But my starting point is OpenAI. If you can capture the devs, especially at this stage, you get a force multiplier.


I prefer uber-esque


anti-competitive predatory pricing


capitalism


capitalistic, monopolistic


Not very effective considering that it will be remade in open source 1-2 years from now.


Yeah, if I was an owner or investor like Jasper.ai (AI written content generation SaaS) I'd be pretty worried right now.


> OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

Maybe I'm just old but back in my day this would be called "dumping" or "anti-competitive" or "market distortion" or "unfair competition". Now it's just the standard way of doing things.


Sure, it would be called those things, and then nothing would come of it. If a country uses morally compromised methods to win a war, history just calls it winning the war.


That seems to be changing. I've seen an uptick in criticism of the US for unnecessarily (according to top military advisors, experts, generals, etc. at the time) dropping the atomic bomb on Japan, for example.


Absolutely. The bombing of Dresden has been viewed as a mistake - verging on a war crime - in Britain for the last 20 or so years.


Verging on? It was a mass murder of civilians. The US holocausted Japan and got away with it.


And Japan did similar levels of atrocities to Korea, China, and others in the region.

We can acknowledge that things were historically pretty horrible and strive to be better in the future.


By some people - that's certainly not a universal view.


Wouldn't stop anyone from doing it again if the stakes were high enough


The winners write the history books


You kidding? What do you think a business loan is? Almost every business needs some form of subsidy to get off the ground.


Microsoft isn't using Nvidia A100s for inference, are they? Seems like they'd use their Project Brainwave custom FPGAs.


> As the cost of GPUs goes down

Has that been happening? I guess there's been a bit of a dip after the crypto crash, but are prices staying significantly lower?

> or they develop at ASIC or more efficient model

This seems likely. Probably developing in partnership with Microsoft.


It's definitely not happening at the high end of the market (NVIDIA A100s with 40GB or 80GB of RAM).

The cards that were used for mining have since crashed in price, but those were always gamer cards and very rarely datacenter cards.


The market segmentation is likely a result of Nvidia's monopoly position. They double the RAM and flops, improve the thermals and housing, and sell for tenfold the price. It doesn't make sense to me. A cheap 4090 theoretically outperforms even the RTX 6000 Ada. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...

Nvidia needs to satisfy gamers, who individually can't spend more than a few $k on a processor. But they also have the server sector on lockdown due to CUDA. Seems they can easily make money in both places. Maybe those H100s aren't such a good deal...

If someone understands these dynamics better I'd be curious to learn!


Nope, this is about it. They try to force the larger users into the expensive cards by prohibiting datacenter use in the driver EULA. This works sufficiently well in America, but it also means that you can find German companies like Hetzner that will happily rent you lots of consumer cards.

(There are also some density advantages to the SXM form factor, and the datacenter cards are passively cooled so you can integrate them into your big-fan server or whatnot. But those differences are relatively small and certainly not on their own worth the price difference. It's mostly market segmentation.)


The main limiters in the datacenter setting are licensing, interconnects, and RAM.

By contract, you can't sell 4090s in a data center. You'll find a few shops skirting this, but nobody can get their hands on 100k 4090s without raising legal concerns.

Likewise, Nvidia A100s have more than a few optimizations through NVLink which are only available on datacenter chips.

Lastly, per-card memory matters a lot, and Nvidia has led the market on the high end here.


I understood this as $/FLOP, I think it's plausible that that has been happening.


"We know the number of floating point operations per token for inference is approximately twice the number of parameters"

Does someone have a source for this?

(By the way, it is unknown how many parameters GPT-3.5 has, the foundation model which powers fine-tuned models like ChatGPT and text-davinci-003. GPT-3 had 175 billion parameters, but per the Hoffmann et al. Chinchilla paper it wasn't trained compute-efficiently, i.e. it had too many parameters relative to its amount of training data. It seems likely that GPT-3.5 was trained on more data with fewer parameters, similar to Chinchilla. GPT-3: 175B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens.)


https://arxiv.org/pdf/2001.08361.pdf. See the C_forward formula approximation.


Thank you. Though it isn't quite clear to me whether the additive part is negligible?


From the paper

> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT-3, n_ctx is 4096 and d_model is 12288 >> 4096/12.
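
As a rough sanity check, plugging the published GPT-3 175B dimensions (96 layers, d_model = 12288) into the paper's per-token formula, with the 4096-token context assumed above (the exact context length of the current API models is an assumption):

    # C_forward ≈ 2*N + 2*n_layer*n_ctx*d_model  (Kaplan et al. 2020, per token)
    N = 175e9                                    # GPT-3 non-embedding parameters (approx.)
    n_layer, n_ctx, d_model = 96, 4096, 12288    # GPT-3 dims; n_ctx as assumed above

    dominant = 2 * N                              # ~3.5e11 flops/token
    context_term = 2 * n_layer * n_ctx * d_model  # ~9.7e9 flops/token
    print(context_term / (dominant + context_term))  # ~0.027, i.e. a few percent of the total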


From eq. 2.2, the additive part is usually in the few tens of millions. So for N > 1B the approximation should be good, but it doesn't seem to work: for example, GPT-3 inference flops is actually 3.4E+18, so the ratio is 19,000, not 2.


It's speculated that ChatGPT uses 8x A100s, which flips the conclusion. Although the ChatGPT optimizations done to reduce costs could have also reduced the number of GPUs needed to run it.


No, the amount of math done is (approximately) the same; if you make the denominator 8x bigger, you make the numerator 8x bigger too.


Would multiplying the GPUs by 8 decrease another part of the equation by 1/8, i.e. X flops on 1 GPU = Y seconds, X flops on 8 GPUs = Y / 8?

(Btw I keep running into you or your content the past couple months, thanks for all you do and your well thought out contributions -@jpohhhh)


I checked the price of an A100, and it costs $15k? Is that right?


And $2.25 per hour on a 1-year reservation means 8,760 hours x $2.25 = $19,710 rent for the year. Not a bad yield for the provider at all, but it makes sense given overheads and the expected ROI.


Cost of power usage is marginal compared to that too:

300W per A100 * 8766 hours per year * $0.12 per kWh = $316 to power an A100 for a year


$0.12 per kWh is a very low price these days


Is this a low price for a datacenter negotiating their load with a utility provider (as most do?)


Yes, especially since you don't have to deal with buying it, maintaining it, etc.


Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.

I’ve had servers locked up in a cage for years without seeing them. And the cost for bandwidth has plummeted over the last two decades. (Not at AWS, lol)


The problem isn't the good times, the problem is when something happens in the middle of the night, when a RAM stick goes bad or when you suddenly need triple the compute power. Usually, you get to feel the pain when you need it the least.

I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.


Exactly. You pay for the case where downtime happens on a Sunday, or you're on vacation out of the city and something breaks. I had this issue back in the day with my bitcoin miners: whenever I was out of the city, one of them went down and I wanted to get back ASAP.


Wait 8x total? For everyone at once?


Per instance (worker serving an API request) it requires 8x GPUs. I believe they have thousands of these instances and they scale them up with load.

Because the model isn't dynamic (it doesn't learn) it is stateless and can be scaled elastically.


Ah okay, that makes a lot more sense thank you!


I expect some level of caching and even request bucketing by similarity is possible.

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache between different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...


Each model needs 8x to run at the same time per request.


Does OpenAI actually specify the size of the model?

InstructGPT 1.3B outperformed GPT-3 175B, and ChatGPT has a huge corpus of distilled prompt -> response data now.

I’m assuming most of these requests are being served from a much smaller model to justify the price.

OpenAI is fundamentally about training larger models, I doubt they want to be in the business of selling A100 capacity at cost when it could be used for training


But those A100s only come in sets of eight, and it's speculated the model requires eight (for VRAM).

For a three-year reservation that comes to over $96k/yr - to support one concurrent request.


What do you mean one concurrent request? Can't you have a huge batch size to basically support a huge number of concurrent requests?

e.g. Endpoint feeds a queue, queue fills a batch, batched results generate replies. You are simultaneously fulfilling many requests.
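
A toy sketch of that queue-and-batch idea (purely illustrative, not how OpenAI actually does it; run_model is a hypothetical function doing one batched forward pass, and the batch size and wait time are made-up numbers):

    import queue

    MAX_BATCH = 32       # assumed max batch size the GPUs can serve at once
    MAX_WAIT_S = 0.05    # assumed latency budget for filling a batch

    requests = queue.Queue()   # the endpoint pushes (prompt, reply_callback) pairs here

    def batching_loop(run_model):
        """Drain the queue into batches and answer many requests per forward pass."""
        while True:
            batch = [requests.get()]                      # block until the first request
            try:
                while len(batch) < MAX_BATCH:
                    batch.append(requests.get(timeout=MAX_WAIT_S))
            except queue.Empty:
                pass                                      # batch window closed
            outputs = run_model([prompt for prompt, _ in batch])  # one batched inference
            for (_, reply), output in zip(batch, outputs):
                reply(output)                             # many requests served per pass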


Hopefully they’re doing plenty of batching - you don’t even need to roll your own as you’re describing. Inference servers like Triton will dynamically batch requests with SLA params for max response time (for example).

That said, I don't think anyone outside of OpenAI knows what's going on operationally. Same goes for VRAM usage, potential batch sizes, etc. This is all wild speculation. Same goes for whatever terms OpenAI is getting out of MS/Azure.

What isn't wild speculation is that even with three-year reserve pricing, a last-gen A100x8 instance (H100 is shipping now) will set you back $100k/yr - plus all of the usual cloud bandwidth, etc. fees that would likely increase that by at least 10-20%.

We’re talking about their pricing and costs here. This gives a general idea what anyone trying to self host this would be up against - even if they could get the model.


> will set you back $100k/yr

This is six months of one average developer's salary there. And BTW, they are likely doing inference on 100s or 1000s of GPUs, not just 8.


Yes, and a devops engineer to manage an even moderately complex cloud deployment is an extra $150k/yr on average. I don't know where this "cloud labor skill, knowledge, experience, and time is free" thinking comes from.

8, 80k, or 800k GPUs depending on requirements and load - the point remains the same.


I really wonder if one way they are able to make money on it is by monetizing all the data that pours into these products by the second.


They could probably live off of NSA sponsorship alone.


Spot on


They also mention in the new API docs that they are no longer keeping data submitted to ChatGPT. Or at least not to the ChatGPT API.


Would probably pile up to an inhuman amount of data storage. Imagine having to pay for storing the equivalent of 1000 tokens of text within that budget of only 0.0002 dollars


That's one zero too many. Storage cost of 1000 tokens (6000 bytes) on a single HDD is $0.000000096 assuming $16/TB


the only one making money on this is NVIDIA


Selling shovels in the goldrush…


Note that they also charge equally for input and output tokens, but as far as I understand, processing input tokens is computationally much cheaper, which drops their cost further.


Isn't it $2.25 per hour per A100?


Yes, he means 2.25 per hour with a 1 yr reservation.


You can get A100 on Lambda Labs cloud for $1.1/hr ($8.8/hr per 8xA100) without any reservation.


It's a good baseline, but I very much doubt that OpenAI is paying anywhere near the public cost for their compute allocation.


Direct purchasing isn't too much cheaper. An H100 costs $35k new. OpenAI and MS are probably getting those for around $16k, which works out to about $1.82 per hour over a year.


This would be a really fun optimization challenge for sure!


The ~600 TFLOPS performance in the spec is with sparsity. I think the price is nearly break-even if sparsity is not used in the model.


I reckon they will (if not already) use 4-bit or 8-bit precision, and may not need 175B params.



