I think what we'll eventually see is frontier models getting priced dramatically more expensive (or rate limited), and more people getting pickier about what they send to frontier models vs cheaper, less powerful ones. This is already happening to some extent, with Opus being opt-in and much more restricted than Sonnet within Claude Code.
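To make "pickier" concrete, here's a toy sketch of the sort of routing I mean. The model names, prices, and the is_hard() heuristic are all invented for illustration; a real setup would use a proper difficulty classifier or explicit task labels.

    # Toy cost-aware router: send easy prompts to a cheap model and
    # reserve the expensive frontier model for hard ones.
    # Model names and prices are invented for illustration.
    PRICE_PER_MTOK = {"cheap-model": 0.25, "frontier-model": 15.00}

    def is_hard(prompt: str) -> bool:
        # Placeholder heuristic; a real router might use a small
        # classifier model or explicit task labels instead.
        return len(prompt) > 2000 or "prove" in prompt.lower()

    def pick_model(prompt: str) -> str:
        return "frontier-model" if is_hard(prompt) else "cheap-model"

    for p in ["Summarise this paragraph.", "Prove this invariant holds."]:
        m = pick_model(p)
        print(m, PRICE_PER_MTOK[m])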
An unknown to me: are the less powerful models cheaper to serve, proportional to how much less capable they are than frontier models? One possible explanation for why e.g. OpenAI was eager to retire GPT-4 is that those older models are still money losers.
Everything I've seen makes me suspect that models have been getting steadily more efficient to serve.
The strongest evidence is that the models I can run on my own laptop got massively better over the last three years, despite me keeping the same M2 64GB machine without upgrading it.
Compare the original LLaMA from 2023 to gpt-oss-20b from this year - same hardware, huge difference.
The next clue is the continuing drop in API prices - at least prior to the reasoning rush of the last few months.
One more clue: o3. OpenAI's o3 had an 80% price drop a few months ago, which I believe was due to them finding further efficiencies in serving that model at the same quality.
My hunch is that there are still efficiencies to be wrung out here. We'll know that hunch has stopped holding if API prices stop falling over time.
Why do you think OpenAI wanted to get rid of GPT-4 etc so aggressively?
I suppose there's a distinction here: I can see why new, less capable models would be more efficient. But maybe the older frontier models are less efficient to serve?
Definitely less efficient to serve. They used to charge $60/million input tokens for GPT-3 Da Vinci. They charge $1.25/million for GPT-5.
Plus I believe they have to keep each model in GPU memory to serve it, which means that any GPU serving an older model is unavailable to serve the newer ones.
In most use cases the main cost is input, not output. Agentic workflows, on the other hand, do eat up a ton of tokens across multiple calls. That can usually be optimized, but few people bother.
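A rough back-of-envelope for why agentic runs get expensive: in a naive loop the whole conversation is re-sent as input on every call, so input tokens grow roughly quadratically with the number of steps. All numbers below are invented, just to show the shape.

    # Naive agent loop: each call re-sends the entire history as input.
    # All token counts here are made-up illustrative numbers.
    system_and_tools = 2_000   # system prompt + tool definitions
    step_output      = 500     # tokens the model emits per step
    steps            = 20

    total_input = 0
    history = system_and_tools
    for _ in range(steps):
        total_input += history      # whole history re-sent as input
        history += step_output      # model output appended for next call

    total_output = steps * step_output
    print(total_input, total_output)  # 135000 input vs 10000 output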
The price of a token doesn't necessarily reflect the true cost of running a model.
After Claude Opus 4 was released, the price of OpenAI's o3 tokens was slashed practically overnight.[0] If you think that happened because inference costs went down, I have a bridge to sell you.
Generally I'm skeptical of the idea that any of the major providers are selling inference at a loss. Obviously they're losing money when you include the cost of research and training, but every indication I've seen is that they're not keen to sell $1 for 80 cents.
If you want a hint at the real costs of inference, look to the companies that sell access to hosted open source models. They don't have any research costs to cover, so their priority is to serve as inexpensively as possible while still turning a profit.
Cost to run a million tokens through GPT-3 Da Vinci in 2022: $60
Cost to run a million tokens through GPT-5 today: $1.25
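For scale, that's roughly a 48x drop in list price per input token; a trivial check using the quoted figures:

    # Quoted list prices, per million input tokens
    davinci_2022 = 60.00
    gpt5_today = 1.25
    print(davinci_2022 / gpt5_today)  # 48.0, i.e. roughly 48x cheaper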