I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices?
The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
Surely if it's this cheap, and we're talking massive margins according to this, I should be able to get a cheap hosted version of, or run my own, 600B-param model.
Am I missing something?
It seems that reality (i.e., the absence of anyone actually doing this so cheaply) is the biggest critic of this set of calculations.
> but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices
There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known API provider that is an aggregator of other API providers and offers lots of models at $0.
> The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?
At 4-bit quant, R1 takes 300+ gigs just for weights. You can certainly run smaller models into which R1 has been distilled on a modest laptop, but I don't see how you can run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
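To put the 300+ GB figure in context, here is a rough back-of-envelope sketch. It assumes R1's commonly cited total of 671B parameters and counts only the raw weights (no KV cache, activations, or per-block quantization scales), so real memory needs would be somewhat higher.

```python
# Rough weight-memory estimate: parameters x bits-per-weight / 8 bytes.
# Assumption: 671e9 total parameters for DeepSeek-R1. Ignores KV cache,
# activations, and quantization overhead (scales / zero-points).

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


R1_PARAMS = 671e9  # assumption: total (not per-token active) parameter count

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gb(R1_PARAMS, bits):,.1f} GB")
```

Even the 4-bit line lands well above what any laptop GPU or unified-memory machine typically ships with.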
There are 7 providers on that page with an output token price higher than $3.08. There is even 1 with an input token price higher than that. So that "all" is not true either.
> I should be able to get a cheap / run my own 600B param model.
if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin).
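A minimal sketch of that break-even arithmetic, assuming (hypothetically) that your own hardware costs the same per token at 100% utilization as the provider's, and that the provider's price is its cost divided by (1 - margin):

```python
def breakeven_utilization(provider_margin: float) -> float:
    """Utilization above which self-hosting is cheaper than buying from a provider.

    Hypothetical model: provider price = c / (1 - margin), where c is the cost
    per token at 100% utilization; your own cost per useful token is
    c / utilization. Setting the two equal gives utilization = 1 - margin.
    """
    if not 0.0 <= provider_margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return 1.0 - provider_margin


def self_hosting_is_cheaper(utilization: float, provider_margin: float) -> bool:
    # c cancels out of the comparison, so only utilization vs. (1 - margin) matters.
    return utilization > breakeven_utilization(provider_margin)


print(f"break-even at 80% margin: {breakeven_utilization(0.8):.0%}")
print(self_hosting_is_cheaper(0.01, 0.8))  # a GPU that sits idle 99% of the time
print(self_hosting_is_cheaper(0.50, 0.8))  # a GPU kept busy half the day
```

At the ~1% utilization described in the next comment, self-hosting comes out well behind under these assumptions.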
i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.
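The numbers in this comment can be checked with a few lines of Python. The 1,200-token response length is an assumption back-derived from 2 tokens/sec for 10 minutes, and the 120 tokens/sec figure for the $30000 GPU is likewise just what a 10-second wait would require, not a measured spec.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wait_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a response at a given decode speed (ignores prefill)."""
    return output_tokens / tokens_per_sec


def idle_fraction(prompts_per_day: int, seconds_per_prompt: float) -> float:
    """Share of the day the GPU does nothing, assuming prompts run one at a time."""
    busy = prompts_per_day * seconds_per_prompt
    return 1 - busy / SECONDS_PER_DAY


TOKENS = 1200  # assumption: 2 tok/s * 600 s

print(wait_seconds(TOKENS, 2))    # the ~$300 GPU
print(wait_seconds(TOKENS, 120))  # hypothetical speed needed for a 10 s wait
print(f"idle: {idle_fraction(100, 10):.1%}")
```

100 prompts at 10 seconds each is about 17 minutes of work in a 24-hour day, which is where the "idle 99% of the time" comes from.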
coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.
I also have no idea on the numbers. But I do know that these same companies are pouring many billions of dollars into training models, paying very expensive staff, and building out infrastructure. These costs would need to be factored in to come up with the actual profit margins.
There's zero basis for assuming any of that. The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.
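One way to test the power-law claim, given real per-user token counts, is to compute the share held by the top decile and compare the mean to the median. The usage numbers below are invented purely for illustration.

```python
import math
import statistics


def top_share(usage: list[float], top_frac: float = 0.1) -> float:
    """Fraction of total usage accounted for by the heaviest `top_frac` of users."""
    if not usage:
        raise ValueError("usage must be non-empty")
    ranked = sorted(usage, reverse=True)
    k = max(1, round(len(ranked) * top_frac))  # number of users in the top slice
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical daily token counts for 100 users: 10 heavy users, 90 light ones.
usage = [81_000] * 10 + [1_000] * 90

print(f"top 10% share: {top_share(usage):.0%}")
print("mean:", statistics.mean(usage), "median:", statistics.median(usage))
```

With a skew like this, the mean "average user" sits far above what the typical (median) user actually consumes.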
It is very likely that you are in the top 10% of users.
True. The article also has zero basis for its estimate of the average usage in each tier's user base.
I somewhat doubt my usage is so close to the edge of the curve, since I don't even pay for any plan. It could be that I'm very frugal with money and heavy on consumption while most are more balanced, but 1M tokens per day in any case sounds slim for any user who pays for the service.
Another giant problem with this article is that we have no idea what optimizations are used on their end. These large AI companies use some wildly complex optimizations.
What I'm trying to say is that hosting your own model is in an entirely different league from what the pros do.
Even if errors in the article imply higher costs, I would argue the numbers would swing straight back to profit because of how advanced inference optimization has become.
If actual model intelligence is not a moat (which looks likely), the real sauce of profitable AI companies is advanced optimization across the entire stack.
OpenAI is NEVER going to release their specialized kernels, routing algorithms, quantizations, or model compilation methods. These are all really hard and really specific.
> I'm here to provide helpful, respectful, and appropriate content for all users. If you have any other requests or need assistance with a different type of story or topic, feel free to ask!