
Thank you for pointing that out - I had assumed that things were not how they are.

Although performance has varied over time (https://arxiv.org/pdf/2307.09009.pdf), I also notice that the API allows you to use a frozen version of the model, which avoids the worries I mentioned.



That was a pretty deeply flawed paper; one of the largest drops it recorded was caused by simple parsing errors in their testing:

https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-tim...

Overall, evals and pinning against checkpoints are how you avoid those worries. But in general, if you solve a problem robustly, it's going to be rare for changes in the LLM to suddenly break what you're doing. Investing in handling a wide range of inputs gracefully also pays off when the underlying model changes.
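To make the eval-plus-pinning idea concrete, here is a minimal sketch. The model callable is a stub standing in for an API call; the point is the shape: pin a dated snapshot name (e.g. "gpt-4-0613") rather than a floating alias like "gpt-4", and run a fixed set of eval cases against it so regressions show up as a dropped pass rate. All names and cases here are illustrative, not from the thread.

```python
# Pin a dated checkpoint, not the floating "gpt-4" alias, so the model
# under test cannot change underneath you between eval runs.
PINNED_MODEL = "gpt-4-0613"  # illustrative snapshot name

def run_evals(model_fn, cases):
    """Run (prompt, expected_substring) cases; return the pass rate."""
    passed = sum(1 for prompt, expected in cases if expected in model_fn(prompt))
    return passed / len(cases)

def stub_model(prompt):
    # Stand-in for a real API call made with model=PINNED_MODEL.
    canned = {"Is 17077 prime? Answer yes or no.": "yes"}
    return canned.get(prompt, "")

cases = [("Is 17077 prime? Answer yes or no.", "yes")]
pass_rate = run_evals(stub_model, cases)
```

Re-running the same cases against a new checkpoint before switching `PINNED_MODEL` is the cheap insurance the comment describes.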




