Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Number of tokens is still a useful metric, as their endpoints have Tokens Per Minute quotas. Decreasing number of tokens used means increasing throughput, up until the Request Per Minute quota.



But the sentence was "Indeed was able to improve cost..."


From OpenAI’s perspective the cost improved!


"...and latency" Higher throughput = lower latency*

*Under certain conditions




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: