
> As a result, Indeed was able to improve cost and latency by reducing the number of tokens in prompt by 80%.

There's some exact-words shenanigans here. Indeed may have reduced the number of tokens in the prompt by 80%, but they didn't reduce the cost by 80%: the prompt cost of inferring from a fine-tuned GPT-3.5-turbo ($3.00 / 1M tokens) is 6x the prompt cost of inferring from the base GPT-3.5-turbo ($0.50 / 1M tokens). If prompt tokens are cut to 20% of the original, the fine-tuned prompt spend is 0.20 × 6 = 120% of the base prompt spend: a net cost increase! That's not even getting into the 4x cost of the completion tokens for a fine-tuned model.
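The arithmetic is easy to check. A back-of-the-envelope sketch, using the per-1M-token prompt prices quoted above (the 80% reduction figure is from the article; the token volume here is made up for illustration):

```python
BASE_PROMPT_PRICE = 0.50   # $ per 1M prompt tokens, base gpt-3.5-turbo
FT_PROMPT_PRICE = 3.00     # $ per 1M prompt tokens, fine-tuned gpt-3.5-turbo

tokens = 1_000_000                  # hypothetical original prompt volume
reduced_tokens = tokens * 0.20      # after the claimed 80% reduction

base_cost = (tokens / 1e6) * BASE_PROMPT_PRICE        # $0.50
ft_cost = (reduced_tokens / 1e6) * FT_PROMPT_PRICE    # $0.60

print(f"relative prompt cost: {ft_cost / base_cost:.0%}")  # 120%
```

So at list prices the fine-tuned prompt spend comes out 20% higher, not lower.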

Of course, Indeed likely has an enterprise contract reducing costs further.




I was somewhat involved in this project. Can't get into details but there were other factors/efforts not mentioned which allowed us to scale this while reducing cost per recommendation. As someone mentioned, I do believe we benefited from a price drop over time.

Regarding the monthly scale mentioned in the article: we are way beyond that now.

A lot of really smart people worked on this and it was fun to watch unfold.


Number of tokens is still a useful metric, as their endpoints have Tokens Per Minute quotas. Decreasing the number of tokens used means increasing throughput, up until you hit the Requests Per Minute quota.
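Concretely, throughput is capped by whichever of the two quotas binds first. A minimal sketch (the quota values here are made up for illustration, not OpenAI's actual limits):

```python
def max_requests_per_minute(tokens_per_request: int, tpm: int, rpm: int) -> float:
    """Effective request throughput under both a Tokens Per Minute (TPM)
    and a Requests Per Minute (RPM) quota."""
    # The token quota allows tpm / tokens_per_request requests per minute,
    # but never more than the RPM quota.
    return min(tpm / tokens_per_request, rpm)

# Shrinking the prompt raises throughput until the RPM cap takes over.
print(max_requests_per_minute(2000, tpm=1_000_000, rpm=3_500))  # 500.0  (TPM-bound)
print(max_requests_per_minute(400,  tpm=1_000_000, rpm=3_500))  # 2500.0 (TPM-bound)
print(max_requests_per_minute(100,  tpm=1_000_000, rpm=3_500))  # 3500   (RPM-bound)
```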


But the sentence was "Indeed was able to improve cost..."


From OpenAI’s perspective the cost improved!


"...and latency" Higher throughput = lower latency*

*Under certain conditions


It may simply be a timing thing. 3.5-turbo saw a price drop between the launch of fine tuning and now.


I'm surprised by their fine-tuned model pricing.

Google & Cohere charge the same for tuned and non-tuned models.

OpenAI's fine-tuning offering is uncompetitive.

I hope they adjust their pricing.


The SK telecom one is highly suspect too. My guess is we are not going to see ChatGPT-5 until the AI bubble begins to deflate.

My question is: will it be before or after the US elections?


The "bubble" you're speaking of now is completely divorced from reality in both directions, as the doomer narrative spread across the blogosphere/news world with the recent inflection talent acquisition and the issues with stability.ai/emad.

The truth is that foundation model companies, while in the limelight now, are probably the smallest slice of the pie: cloud providers and end-user applications are going to take the lion's share of the profit, with GPU manufacturers next in line. Foundation model companies got hyped because every VC wanted to invest in the company that might monopolize AGI, but in reality we're going to have a landscape dominated by fine-tuned open source models in a few years, with closed source models being used for certain niches but generally not worth the cost.


I remember reading somewhere that GPT-5 will be released this summer.


The costs when using reserved instances are much better (comparable to non-fine-tuned models).


Indeed was going to use a fine-tuned endpoint on their enterprise contract regardless.



