
> I have no idea how OpenAI can make money on this.

I did a quick calculation. We know the number of floating point operations per token for inference is approximately twice the number of parameters (175B). Assuming they use 16-bit floating point and hit 50% of peak efficiency, an A100 could do 300 trillion flop/s (peak 624 [0]). One hour of an A100 earns OpenAI $0.002/ktok * (300,000/175/2/1000) ktok/sec * 3600 = $6.1. The public price per A100 is $2.25/hour with a one-year reservation [1].

[0]: https://www.nvidia.com/en-us/data-center/a100/

[1]: https://azure.microsoft.com/en-in/pricing/details/machine-le...
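
For anyone who wants to check the arithmetic, here it is as a small Python sketch. Every input is one of the assumptions above (2 flops per parameter per token, ~50% of the 624 TFLOPS peak, $0.002/ktok, $2.25 per A100-hour), not a confirmed figure from OpenAI:

    # All inputs are back-of-envelope assumptions, not confirmed figures.
    params = 175e9                    # assumed GPT-3-scale model
    flops_per_token = 2 * params      # ~2 flops per parameter per token for inference
    effective_flops = 300e12          # ~50% of the A100's 624 TFLOPS sparse FP16 peak
    price_per_ktok = 0.002            # API price, $ per 1,000 tokens
    rent_per_hour = 2.25              # Azure list price per A100-hour, 1-year reservation

    tokens_per_sec = effective_flops / flops_per_token               # ~857 tokens/s
    revenue_per_hour = tokens_per_sec * 3600 / 1000 * price_per_ktok
    print(f"${revenue_per_hour:.2f}/hr revenue vs ${rent_per_hour}/hr rent")  # ~$6.17 vs $2.25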



It's also worth mentioning that, because Microsoft is an investor, they're likely getting these at cost or subsidized.

OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

As the cost of GPUs goes down, or they develop an ASIC or a more efficient model, they can keep their pricing the same and make money later.

They can also likely make money in other ways, like allowing fine-tuning of the model or charging to let people use it with sensitive data.


Who will they be making money from? OpenAI is looking for companies willing to:

- tolerate the current state of the chatbots

- tolerate the high per-query latency

- tolerate having all queries sent to OpenAI

- tolerate OpenAI [presumably] having 0 liability for ChatGPT just randomly hallucinating inappropriate nonsense

- be willing to pay a lot of money for the above

I'm kind of making an assumption on that last point, but I suspect this is going to end up being more small-market business-to-business than mass-market business-to-consumer. A lot of these constraints make it not really usable for many things. It's even somewhat suspect for the most obvious use case, search, not only because of latency but also because the provider needs to make more money per search with the bot than without it. There's also the caching issue. Many potential users are probably going to be more inclined to get the answers and cache them to reduce latency/costs/'failures' than to endlessly pay per use.

Anyhow, probably a lack of vision on my part. But I'd certainly like to know what I'm not seeing.


Sadly, it will likely mostly be used to generate endless streams of SEO spam. Not for interactive use.


> Who will they be making money from?

Videogames maybe?

https://www.youtube.com/watch?v=ejw6OI4_lJw

This prototype is certainly something to keep an eye on.


Lots of use cases actually need creative "hallucinations"; that's where they're valuable.

Even e.g. to develop hardware such as planes and cars: https://assistedeverything.substack.com/p/todays-ai-sucks-at...


A lot of companies use third parties to provide customer support, and the results are often very low quality and full of misunderstandings and what we now call hallucinations. I think a good LLM could do a better job and I bet it'd be cheaper, too. And as a bonus training the bots to handle new products is practically instant when compared to training humans.


Their new AI safety strategy is to slow the development of the technology by dumping: lowering the price so much that bootstrapped competitors can't fund themselves.


I highly doubt it. OpenAI, Google and Meta are not the only ones who can implement these systems. The race for AGI is one for power and power is survival.


An LLM can do amazing things, but it's basically just an autocomplete system. It has the same potential to take over the world as your phone's keyboard. It's just a tool.


They want this; the interview with their CEO sort of confirmed that to me. He said some crap about wanting to release it slowly for "safety" (we all know this is a lie).

But he can't get away with it, with all the competition from other companies, on top of China, Russia, and others also pursuing AI development.


Yeah we're in an AI landgrab right now where at- or below-cost pricing is buying marketshare, lock-in, and underdevelopment of competitors. Smart move for them to pour money into it.


We have got to find a word for plans that are plainly harmful yet advantageous to their executors that's more descriptive than "smart..."



Agree. I didn't want to moralize, just wanted to point out it's a shrewd business move. It's rather anticompetitive, though that is hard to prove in such a dynamic market. Who knows, we may soon be calling it 'antitrust'.


Shrewd or cunning


For that you need 2 words: venture capital


Tactical


Economists call this price dumping


Uberly?


I prefer Webvan-esque. From https://en.wikipedia.org/wiki/Webvan:

> The company's investors pressured it to grow very fast to obtain first-mover advantage. This rapid growth was cited as one of the reasons for the downfall of the company.

IMO, selling at a loss to gain market share only makes sense if there are network effects that lead to a winner-takes-all situation. Of which there are some for ChatGPT (training data when people press the thumbs up/down buttons), but is that sufficient?


Also useful for bootstrapping a dev ecosystem.

If engineers are getting into AI development through OpenAI, they're using tools and systems within the OpenAI ecosystem.

Daily on HN there's a post on some AI implementation faster than chatgpt. But my starting point is OpenAI. If you can capture the devs, especially at this stage, you get a force multiplier.


I prefer uber-esque


anti-competitive predatory pricing


capitalism


capitalistic, monopolistic


Not very effective considering that it will be remade in open source 1-2 years from now.


Yeah, if I was an owner or investor like Jasper.ai (AI written content generation SaaS) I'd be pretty worried right now.


> OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

Maybe I'm just old but back in my day this would be called "dumping" or "anti-competitive" or "market distortion" or "unfair competition". Now it's just the standard way of doing things.


Sure, it would be called those things, and then nothing would come of it. If a country uses morally compromised methods to win a war, history just calls it winning the war.


That seems to be changing. I've seen an uptick in criticism of the US for unnecessarily (according to top military advisors, experts, generals, etc. at the time) dropping the atomic bomb on Japan, for example.


Absolutely. The bombing of Dresden has been viewed as a mistake - verging on a war crime - in Britain for the last 20 or so years.


Verging on? It was a mass murder of civilians. The US holocausted Japan and got away with it.


And Japan did similar levels of atrocities to Korea, China, and others in the region.

We can acknowledge that things were historically pretty horrible and strive to be better in the future.


By some people - that's certainly not a universal view.


Wouldn't stop anyone from doing it again if the stakes were high enough


The winners write the history books


You kidding? What do you think a business loan is? Almost every business needs some form of subsidy to get off the ground.


Microsoft isn't using Nvidia A100s for inference, are they? Seems like they'd use their Project Brainwave custom FPGAs.


> As the cost of GPUs goes down

Has that been happening? I guess there's been a bit of a dip after the crypto crash, but are prices staying significantly lower?

> or they develop at ASIC or more efficient model

This seems likely. Probably developing in partnership with Microsoft.


It's definitely not happening at the high end of the market (NVIDIA A100s with 40GB or 80GB of RAM).

The cards that were used for mining have since crashed in price, but those were always gamer cards and very rarely datacenter cards.


The market segmentation is likely a result of Nvidia's monopoly position. They double the RAM and flops, improve the thermals and housing, and sell for tenfold the price. It doesn't make sense to me. A cheap 4090 theoretically outperforms even the RTX 6000 Ada. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...

Nvidia needs to satisfy gamers, who individually can't spend more than a few $k on a processor. But they also have the server sector on lockdown due to CUDA. Seems they can easily make money in both places. Maybe those H100s aren't such a good deal...

If someone understands these dynamics better I'd be curious to learn!


Nope, this is about it. They try to force the larger users into the expensive cards by prohibiting datacenter use in the driver EULA. This works sufficiently well in America, but it also means that you can find German companies like Hetzner that will happily rent you lots of consumer cards.

(There are also some density advantages to the SXM form factor, and the datacenter cards are passively cooled so you can integrate them into your big-fan server or whatnot. But those differences are relatively small and certainly not on their own worth the price difference. It's mostly market segmentation.)


The main limiters in the datacenter setting are licensing, interconnects, and RAM.

By contract, you can't sell 4090s in a data center. You'll find a few shops skirting this, but nobody can get their hands on 100k 4090s without raising legal concerns.

Likewise, Nvidia A100s have more than a few optimizations through NVLink which are only available on datacenter chips.

Lastly, per-card memory matters a lot, and Nvidia has led the market on the high end here.


I understood this as $/FLOP, I think it's plausible that that has been happening.


"We know the number of floating point operations per token for inference is approximately twice the number of parameters"

Does someone have a source for this?

(By the way, it is unknown how many parameters GPT-3.5 has, the foundation model which powers fine-tuned models like ChatGPT and text-davinci-003. GPT-3 had 175 billion parameters, but per the Hoffmann et al. Chinchilla paper it wasn't trained compute-efficiently, i.e. it had too many parameters relative to its amount of training data. It seems likely that GPT-3.5 was trained on more data with fewer parameters, similar to Chinchilla. GPT-3: 175B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens.)


https://arxiv.org/pdf/2001.08361.pdf. See the C_forward formula approximation.


Thank you. Though it isn't quite clear to me whether the additive part is negligible?


From the paper

> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT-3, n_ctx is 4096 and d_model is 12288 >> 4096/12.
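
As a rough sanity check, plugging the published GPT-3 175B dimensions (96 layers, d_model = 12288) into the paper's per-token formula, with the 4096-token context assumed above (the exact context length of the current API models is an assumption):

    # C_forward ≈ 2*N + 2*n_layer*n_ctx*d_model  (Kaplan et al. 2020, per token)
    N = 175e9                                    # GPT-3 non-embedding parameters (approx.)
    n_layer, n_ctx, d_model = 96, 4096, 12288    # GPT-3 dims; n_ctx as assumed above

    dominant = 2 * N                              # ~3.5e11 flops/token
    context_term = 2 * n_layer * n_ctx * d_model  # ~9.7e9 flops/token
    print(context_term / (dominant + context_term))  # ~0.027, i.e. a few percent of the total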


From eq. 2.2, the additive part is usually in the few tens of millions. So for N > 1B the approximation should be good, but it doesn't seem to work: for example, GPT-3 inference flops is actually 3.4E+18, so the ratio is 19,000, not 2.


It's speculated that ChatGPT uses 8x A100s, which flips the conclusion. Although the ChatGPT optimizations done to reduce costs could have also reduced the number of GPUs needed to run it.


No, the amount of math done is (approximately) the same; if you make the denominator 8x bigger, you make the numerator 8x bigger too.


Would multiplying the GPUs by 8 decrease another part of the equation by 1/8, i.e. X flops on 1 GPU = Y seconds, X flops on 8 GPUs = Y / 8?

(Btw I keep running into you or your content the past couple months, thanks for all you do and your well thought out contributions -@jpohhhh)


I checked the price of an A100, and it costs $15k? Is that right?


And $2.25 per hour on a 1-year reservation means 8,760 hours x $2.25 = $19,710 rent for the year. Not a bad yield for the provider at all, but it makes sense given overheads and the expected ROI.


Cost of power usage is marginal compared to that too:

300W per A100 * 8766 hours per year * $0.12 per kWh = $316 to power an A100 for a year


$0.12 per kWh is a very low price these days


Is this a low price for a datacenter negotiating their load with a utility provider (as most do?)


Yes, especially since you don't have to deal with buying it, maintaining it, etc.


Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.

I’ve had servers locked up in a cage for years without seeing them. And the cost for bandwidth has plummeted over the last two decades. (Not at AWS, lol)


The problem isn't the good times, the problem is when something happens in the middle of the night, when a RAM stick goes bad or when you suddenly need triple the compute power. Usually, you get to feel the pain when you need it the least.

I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.


Exactly. You pay for the case where downtime happens on a Sunday, or you're on vacation out of the city and something breaks. I had this issue back in the day with my bitcoin miners: whenever I was out of the city, one of them went down and I wanted to get back ASAP.


Wait 8x total? For everyone at once?


Per instance (worker serving an API request) it requires 8x GPUs. I believe they have thousands of these instances and they scale them up with load.

Because the model isn't dynamic (it doesn't learn) it is stateless and can be scaled elastically.


Ah okay, that makes a lot more sense thank you!


I expect some level of caching and even request bucketing by similarity is possible.

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache between different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...


Each model needs 8x to run at the same time per request.


Does OpenAI actually specify the size of the model?

InstructGPT 1.3B outperformed GPT-3 175B, and ChatGPT has a huge corpus of distilled prompt -> response data now.

I’m assuming most of these requests are being served from a much smaller model to justify the price.

OpenAI is fundamentally about training larger models, I doubt they want to be in the business of selling A100 capacity at cost when it could be used for training


But those A100s only come in sets of eight, and it's speculated the model requires eight (for VRAM).

For a three-year reservation that comes to over $96k/yr - to support one concurrent request.


What do you mean one concurrent request? Can't you have a huge batch size to basically support a huge number of concurrent requests?

e.g. Endpoint feeds a queue, queue fills a batch, batched results generate replies. You are simultaneously fulfilling many requests.
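
A toy sketch of that queue-and-batch idea (purely illustrative, not how OpenAI actually does it; run_model is a hypothetical function doing one batched forward pass, and the batch size and wait time are made-up numbers):

    import queue

    MAX_BATCH = 32       # assumed max batch size the GPUs can serve at once
    MAX_WAIT_S = 0.05    # assumed latency budget for filling a batch

    requests = queue.Queue()   # the endpoint pushes (prompt, reply_callback) pairs here

    def batching_loop(run_model):
        """Drain the queue into batches and answer many requests per forward pass."""
        while True:
            batch = [requests.get()]                      # block until the first request
            try:
                while len(batch) < MAX_BATCH:
                    batch.append(requests.get(timeout=MAX_WAIT_S))
            except queue.Empty:
                pass                                      # batch window closed
            outputs = run_model([prompt for prompt, _ in batch])  # one batched inference
            for (_, reply), output in zip(batch, outputs):
                reply(output)                             # many requests served per pass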


Hopefully they’re doing plenty of batching - you don’t even need to roll your own as you’re describing. Inference servers like Triton will dynamically batch requests with SLA params for max response time (for example).

That said, I don't think anyone outside of OpenAI knows what's going on operationally. Same goes for VRAM usage, potential batch sizes, etc. This is all wild speculation. Same goes for whatever terms OpenAI is getting out of MS/Azure.

What isn't wild speculation is that even with three-year reserve pricing, a last-gen A100x8 instance (H100 is shipping now) will set you back $100k/yr - plus all of the usual cloud bandwidth, etc. fees that would likely increase that by at least 10-20%.

We’re talking about their pricing and costs here. This gives a general idea what anyone trying to self host this would be up against - even if they could get the model.


> will set you back $100k/yr

This is six months of one average developer's salary there. And BTW, they are likely doing inference on 100s or 1000s of GPUs, not just 8.


Yes, and a devops engineer to manage an even moderately complex cloud deployment is an extra $150k/yr on average. I don't know where this "cloud labor skill, knowledge, experience, and time is free" thinking comes from.

8, 80k, or 800k GPUs depending on requirements and load - the point remains the same.


I really wonder if one way they are able to make money on it is by monetizing all the data that pours into these products by the second.


They could probably live off of NSA sponsorship alone.


Spot on


They also mention in the new API docs that they are no longer keeping data submitted to ChatGPT. Or at least not to the ChatGPT API.


Would probably pile up to an inhuman amount of data storage. Imagine having to pay for storing the equivalent of 1000 tokens of text within that budget of only 0.0002 dollars


That's one zero too many. Storage cost of 1000 tokens (6000 bytes) on a single HDD is $0.000000096 assuming $16/TB


the only one making money on this is NVIDIA


Selling shovels in the goldrush…


Note that they also charge equally for input and output tokens, but as far as I understand, processing input tokens is computationally much cheaper, which drops their cost further.


Isn't it $2.25 per hour per A100?


Yes, he means 2.25 per hour with a 1 yr reservation.


You can get A100 on Lambda Labs cloud for $1.1/hr ($8.8/hr per 8xA100) without any reservation.


It's a good baseline, but I very much doubt that OpenAI is paying anywhere near the public cost for their compute allocation.


Direct purchasing isn't too much cheaper. An H100 costs $35k new. OpenAI and MS are probably getting those for around $16k, which works out to about $1.82 per hour over a year.


This would be a really fun optimization challenge for sure!


The ~600 TFLOPS performance in the spec is with sparsity. I think the price is nearly break-even if sparsity is not used in the model.


I reckon they will (if not already) use 4-bit or 8-bit precision, and may not need 175B params.



