That sort of "REPL" system is why I really liked when they integrated a Python VM into ChatGPT - it wasn't perfect, but it could at least catch itself when the code didn't execute properly.
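The mechanism behind all of these tools is basically the same small loop: generate code, actually run it, and feed any traceback back into the next attempt. A minimal sketch of that loop, assuming you have some ask_model callable for whatever LLM you're using (that name is just a placeholder, not any product's real API):

    import subprocess
    import sys
    import tempfile

    def run_snippet(code: str) -> tuple[bool, str]:
        """Run a snippet in a fresh Python process; return (succeeded, combined output)."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=30
        )
        return result.returncode == 0, result.stdout + result.stderr

    def generate_with_feedback(ask_model, prompt: str, max_attempts: int = 3) -> str:
        """Ask the model for code, execute it, and feed errors back until it runs."""
        feedback = ""
        code = ""
        for _ in range(max_attempts):
            code = ask_model(prompt + feedback)  # ask_model: any "prompt in, code out" callable
            ok, output = run_snippet(code)
            if ok:
                return code  # ran cleanly, keep it
            feedback = f"\n\nThe previous attempt failed with:\n{output}\nPlease fix it."
        return code  # last attempt, even if still broken

The agent tools wrap this same idea with file editing and test running, but the self-correction comes from that execute-and-retry step.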
Sure. But it's 2025, and however you want to get this feature, be it something integrated into VSCode (Cursor, Windsurf, Copilot), a command-line Python thing (aider), a command-line Node thing (OpenAI Codex and Claude Code), with a specific frontier coding model or with an abstracted multi-model thingy, or even as an Emacs library, it's available now.
I see people getting LLMs to generate code in isolation, pasting it into a text editor, trying it, and then getting frustrated, and it's like, that's not how you're supposed to be doing it anymore. That's 2024 praxis.
The churn of staying on top of this means, to me, that we'll also chew through experts in specific tools much faster. Gone are the days of established, trusted top performers, as every other week somebody creates a newer, better way of doing things. Everybody is going to drop off the hot tech at some point. Very exhausting.
It is a little crazy how fast this has changed in the past year. I got VSCode's agent mode to write, run, and read the output of unit tests the other day, and boy, it's a game changer.