I think we still need an LLM to enable the system as a whole to understand vague and half-baked human input.
I can easily ask an LLM to write a function in a random programming language, then feed the output to a compiler, and pipe any errors from the compiler back to the LLM.
What doesn't work so well is typing "pong in java" into a bash shell.
This isn't a perfect solution (not even for small projects), but it does demonstrate that automated validation can improve the output.
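Roughly the loop I mean, as a minimal sketch in Python — `ask_llm` is a hypothetical stand-in for whatever model call you're using, and `javac` is just one example of an automated checker:

```python
import subprocess
from pathlib import Path

def ask_llm(prompt: str) -> str:
    # Hypothetical helper: plug in whatever LLM API or local model you use.
    raise NotImplementedError("plug in your LLM call here")

def generate_with_compiler_feedback(task: str, max_attempts: int = 3) -> str:
    """Ask the LLM for Java code, compile it, and feed any errors back."""
    source = ask_llm(f"Write a complete Java class named Main that does: {task}")
    for _ in range(max_attempts):
        Path("Main.java").write_text(source)
        result = subprocess.run(["javac", "Main.java"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return source  # it compiles; good enough for this loop
        # Pipe the compiler errors back to the LLM and try again.
        source = ask_llm(
            f"This Java code failed to compile:\n{source}\n"
            f"Compiler errors:\n{result.stderr}\n"
            "Return a corrected version of the full file."
        )
    return source
```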
This is what ChatGPT's Code Interpreter does (writes code in Python and then runs it to check for errors). I'm not sure if it's enabled for everyone yet though.
Even something as simple as censoring swear words would be in line with what OpenAI is trying to accomplish, but they keep lobotomizing the model instead.