The more interesting question IMO is not how good the code can get. It is what must change for the AI to attain the introspective ability needed to say "sorry, I can't think of any more ideas."
You should get decent results by asking for that in the prompt. Just add "if you are uncertain, answer I don't know" or "give the answer or say I don't know", or something along those lines.
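For example, with the OpenAI Python client (a minimal sketch; the model name, prompt wording, and question are just placeholders, not a recommended recipe):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # System prompt with an explicit escape hatch so the model can abstain.
    messages = [
        {"role": "system",
         "content": "Answer the question. If you are uncertain, reply exactly: I don't know."},
        {"role": "user",
         "content": "What year was the first photograph of a unicorn taken?"},
    ]

    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)  # ideally: "I don't know"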
LLMs are far from perfect at knowing their limits, but they are better at it than most people give them credit for. They just rarely do it unless prompted to.
Fine-tuning can improve that ability further. For example, the thinking-tokens paper [1] is, at some level, training the model to output a special token when it hasn't reached a good answer yet (and then try again, hence "thinking").
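The basic mechanics of that kind of setup look roughly like this (a hypothetical sketch with Hugging Face transformers; the token name, model, and training example are placeholders, not the paper's actual recipe):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Register a dedicated "not there yet" token and grow the embedding table for it.
    tokenizer.add_special_tokens({"additional_special_tokens": ["<|rethink|>"]})
    model.resize_token_embeddings(len(tokenizer))

    # Fine-tuning targets would insert <|rethink|> wherever a draft answer was judged
    # inadequate, so the model learns to emit it and then continue reasoning.
    example = "Q: 17 * 24 = ? A: 398 <|rethink|> 17 * 24 = 408"
    ids = tokenizer(example, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # standard next-token loss over the augmented text

The point is just that "I'm not confident in this answer" becomes an ordinary token the model can learn to predict, rather than something it has to be coaxed into saying.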