If the old rule of thumb is right, that debugging code is ten times harder than writing it, then having something that writes buggy code 100x faster than you can understand it is a problem.
Especially given that you can ask an LLM to optimise code and, across multiple runs, it cannot tell whether the code is improving or degenerating.