Yeah, I find it hard to imagine this aging well. Gemini 2.5 solved (or at least did much better on) multiple real-world systems questions I've had in the past that other models could not. Its visual reasoning on charts also jumped significantly (e.g. planning around train schedules).
Even Sonnet 3.7 was able to do refactoring work on my codebase that Sonnet 3.6 could not.
Really not seeing the "LLMs not improving" story.