I see this a lot: it fails to catch the terminal output from its own tests and assumes something went wrong when they actually passed, so it goes through several iterations of trying simpler approaches until it finally succeeds in reading the terminal output. Lots of wasted time and tokens.

(Using Claude Sonnet with VS Code, where it consistently has issues reading the output of terminal commands it executes.)