
These two trends seem somewhat contradictory:

> Generalist scaling is stalling. This was the whole message behind the release of GPT-4.5: capacities are growing linearly while compute costs are on a geometric curve. Even with all the efficiency gains in training and infrastructure of the past two years, OpenAI can't deploy this giant model at remotely affordable pricing.

> Inference costs are in free fall. The recent optimizations from DeepSeek mean that all the available GPUs could cover a demand of 10k tokens per day from a frontier model for… the entire earth population. There is nowhere near this level of demand. The economics of selling tokens do not work anymore for model providers: they have to move higher up in the value chain.
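
A rough sanity check of that supply-vs-demand claim; every number in the sketch below is an illustrative assumption (population rounded to 8 billion, a guessed per-GPU decode throughput), not a figure from the article:

    # Back-of-envelope check of "10k tokens/day for the entire earth population".
    # All inputs are illustrative assumptions, not sourced numbers.
    world_population = 8e9             # people, rounded
    tokens_per_person_per_day = 1e4    # the article's hypothetical demand level
    daily_demand = world_population * tokens_per_person_per_day  # ~8e13 tokens/day

    tokens_per_gpu_per_second = 2_000  # assumed serving throughput for an efficiency-tuned frontier model
    seconds_per_day = 86_400
    gpus_needed = daily_demand / (tokens_per_gpu_per_second * seconds_per_day)

    print(f"daily demand:  {daily_demand:.1e} tokens")
    print(f"GPUs required: {gpus_needed:,.0f}")   # ~463,000 under these assumptions

Under those (very rough) assumptions, whole-planet demand at that level fits on a few hundred thousand accelerators, which is the shape of the argument the quote is making.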

Wouldn’t the market find a balance then, where the marginal utility of additional computation is aligned with customer value? That fixed point could potentially be much higher than where we are now in terms of compute.



> Wouldn’t the market find a balance then, where the marginal utility of additional computation is aligned with customer value? That fixed point could potentially be much higher than where we are now in terms of compute.

I think the author's point here is that the costs are going to continue to fall for inference at an astonishing rate. We're in a situation where the large frontier companies had all converged on "inference is computationally expensive", and then DeepSeek - the talented R&D arm of a hedge fund - was able to cut orders of magnitude out of that cost. To me, that hints that nobody was focusing on inference efficiency. It's unlikely that DeepSeek found 100% of the efficiency gains available, so we can expect the cost of inference to continue to be volatile for some time to come.

It's difficult for any market to find equilibrium when price points move around that much.


I don't think those statements are contradictory at all. Making the thing is getting more expensive, but using it is getting cheaper. Electric cars could be a good analogy here: compared to an ICE, the upfront cost is higher, but once you have it, it's cheaper to use.


That doesn’t make sense, though, if scaling is actually stalling. The reason so much compute goes into training now is scaling, which keeps base-model lifetimes short, so the upfront cost never gets spread over a long service life the way a car's does.
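
To make the analogy and the objection concrete, here is a toy amortization model; all the dollar figures and traffic numbers are placeholders invented for illustration, not taken from the thread:

    # Toy model: amortized cost per million served tokens =
    #   (training cost spread over lifetime traffic) + marginal inference cost.
    # Every input below is a made-up placeholder.
    def cost_per_million_tokens(training_cost_usd, lifetime_days,
                                tokens_served_per_day, inference_cost_per_million):
        lifetime_tokens = lifetime_days * tokens_served_per_day
        amortized_training = training_cost_usd / lifetime_tokens * 1e6
        return amortized_training + inference_cost_per_million

    # Long-lived base model: the "electric car" case, upfront cost mostly amortizes away.
    print(cost_per_million_tokens(100e6, 720, 1e12, 0.50))   # ~0.64 $/1M tokens

    # Base model obsoleted after six months by the next scaling step.
    print(cost_per_million_tokens(100e6, 180, 1e12, 0.50))   # ~1.06 $/1M tokens

The made-up numbers only show the direction of the effect: the shorter the base model's useful life, the less the "expensive to build, cheap to use" framing holds.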



