Thank you for pointing that out - I had assumed otherwise.
Although performance has varied over time (https://arxiv.org/pdf/2307.09009.pdf), I also notice that the API lets you pin a frozen version of the model, which avoids the worries I mentioned.
Overall, evals and pinning against checkpoints are how you avoid those worries. In general, though, if you solve a problem robustly, it's rare for changes in the LLM to suddenly break what you're doing. Investing in handling a wide range of inputs gracefully also pays off when the underlying model changes.
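To make the evals-plus-pinning idea concrete, here's a minimal sketch of an eval harness: a set of prompt/check cases run against whatever model callable you pass in. The `stub_model` function is a hypothetical stand-in for a call to a pinned model snapshot (e.g. passing a dated model identifier to the API); the harness itself is just illustrative, not any particular library's API.

```python
def run_evals(model, cases):
    """Run each (prompt, check) case through the model; return the pass rate."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)

# Hypothetical stand-in for a call against a pinned model checkpoint.
# In practice this would hit the API with a frozen model version string.
def stub_model(prompt):
    return prompt.upper()

cases = [
    ("hello", lambda out: out == "HELLO"),       # exact-match check
    ("abc", lambda out: out.isupper()),          # property-based check
]

print(run_evals(stub_model, cases))  # 1.0
```

Re-running the same cases after switching (or un-pinning) the model version tells you immediately whether behavior you depend on has drifted.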