> As a result, Indeed was able to improve cost and latency by reducing the number of tokens in prompt by 80%.
There's some exact-words shenanigans here. Indeed may have reduced the number of tokens in the prompt by 80%, but they didn't reduce the cost by 80%: the prompt cost of inferring from a fine-tuned GPT-3.5-turbo ($3.00 / 1M tokens) is 6x the prompt cost of inferring from the base GPT-3.5-turbo ($0.50 / 1M tokens). If prompt tokens are cut to 20% of the original, the prompt-token cost comes out to 20% × 6 = 120% of the original: a net cost increase! That's not even getting into the 4x cost of the completion tokens for a fine-tuned model.
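The arithmetic can be sketched out explicitly. This uses the list prices quoted above; the token volume is a made-up illustration, and real enterprise pricing may differ:

```python
# Rough prompt-cost comparison using the per-token list prices quoted above.
BASE_PROMPT_PRICE = 0.50  # $ per 1M prompt tokens, base GPT-3.5-turbo
FT_PROMPT_PRICE = 3.00    # $ per 1M prompt tokens, fine-tuned GPT-3.5-turbo

original_tokens = 1_000_000               # hypothetical prompt volume
reduced_tokens = original_tokens * 0.20   # after the claimed 80% reduction

base_cost = original_tokens / 1e6 * BASE_PROMPT_PRICE  # $0.50
ft_cost = reduced_tokens / 1e6 * FT_PROMPT_PRICE       # $0.60

print(f"relative prompt cost: {ft_cost / base_cost:.0%}")  # prints "120%"
```

So even with 80% fewer prompt tokens, the 6x price multiplier leaves the prompt spend slightly higher than before.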
Of course, Indeed likely has an enterprise contract reducing costs further.
I was somewhat involved in this project. Can't get into details but there were other factors/efforts not mentioned which allowed us to scale this while reducing cost per recommendation. As someone mentioned, I do believe we benefited from a price drop over time.
Regarding the monthly scale mentioned in the article: we are way beyond that now.
A lot of really smart people worked on this and it was fun to watch unfold.
Number of tokens is still a useful metric, as their endpoints have Tokens Per Minute (TPM) quotas. Decreasing the number of tokens used means increasing throughput, up until you hit the Requests Per Minute (RPM) quota.
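The interplay between the two quotas is easy to sketch. The quota values below are hypothetical placeholders, not any account's actual limits:

```python
# Illustrative throughput math under TPM/RPM quotas.
# These quota values are made up; check your account's actual limits.
TPM_QUOTA = 1_000_000  # tokens per minute
RPM_QUOTA = 3_500      # requests per minute

def max_requests_per_minute(tokens_per_request: int) -> int:
    """Throughput is capped by whichever quota binds first."""
    tpm_limited = TPM_QUOTA // tokens_per_request
    return min(tpm_limited, RPM_QUOTA)

print(max_requests_per_minute(1_000))  # prints 1000 (TPM-bound)
print(max_requests_per_minute(200))    # prints 3500 (RPM-bound: 5000 > 3500)
```

Cutting a 1,000-token prompt to 200 tokens would raise the TPM-limited ceiling 5x, but past the RPM quota the extra headroom buys nothing.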
The "bubble" you're speaking of is now completely divorced from reality in both directions, as the doomer narrative has spread across the blogosphere/news world with the recent Inflection talent acquisition and the issues with stability.ai/emad.
The truth is that foundation model companies, while in the limelight now, are probably the smallest slice of the pie: cloud providers and end-user applications are going to take the lion's share of the profit, with GPU manufacturers next in line. Foundation model companies got hyped because every VC wanted to invest in the company that might monopolize AGI, but in reality we're going to have a landscape dominated by fine-tuned open source models in a few years, with closed source models used for certain niches but generally not worth the cost.