This is exactly how dumb these SOTA models feel. A real AI would stop and tell me it doesn't know for sure how to continue and that it needs more information from me, instead of wildly guessing. Sonnet, Opus, Gemini, Codex, they all share this fundamental error: they are unable to stop in the face of uncertainty. The result is shit solutions to problems I never had but now have.
This is a feature, not a bug. In chatbot mode and in coding, the vast majority of consumers don't have the critical thinking skills to realise the models are making stuff up, so the AI companies are incentivized to train accordingly. When the same models are used in agent mode the problem is just way more glaring: they don't respect (or fear) the terminal as much as they should, they try to give the user some positive output, and here we are.
I don't see a reason to believe that this is a "fundamental error". I think it's just an artifact of the way they are trained, and if the training penalized them more for taking a bad path than for stopping to ask for instructions, the situation would be different.
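Roughly what I mean, as a toy sketch (the action names and numbers are made up, not how any lab's actual RLHF pipeline is set up):

```python
# Toy reward shaping, purely illustrative: asking should beat guessing wrong.

def reward(action: str, was_correct: bool) -> float:
    """Score an agent step so that stopping to ask beats a confident wrong move."""
    if action == "ask_user":   # stop and request clarification
        return -0.1            # small cost, so it doesn't ask constantly
    if was_correct:            # confident action that turned out right
        return 1.0
    return -1.0                # confident action that turned out wrong


# Under this scheme, acting only beats asking when the model's chance p of
# being right is high enough:
#   E[act] = p * 1.0 + (1 - p) * (-1.0) > -0.1   =>   p > 0.45
# Shift the penalties and the break-even probability shifts with them.
```

If the penalty for a wrong path were much larger than the penalty for asking, "stop and ask" would become the optimal move under uncertainty instead of the move the model avoids.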
It seems fundamental, because it's isomorphic to the hallucination problem, which is nowhere near solved. Basically, LLMs have no meta-cognition, no confidence in their output, and no sense that they're on "thin ice". To the model, there's no difference between hard facts, fiction, educated guesses, and hallucinations.
Humans who are good at reasoning tend to "feel" how many shaky assumptions they've made, and after enough steps it becomes ridiculous because the certainty converges towards 0.
You could train them to stop early, but that's not the desired outcome. You want them to stop only after making too many guesses, which is only possible if they know when they're guessing.
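To make the "certainty converges towards 0" point concrete, here's a toy sketch with hypothetical numbers; today's LLMs expose nothing like a per-step confidence, which is exactly the problem:

```python
# If each step in a chain of reasoning is an educated guess with some
# confidence, the confidence in the whole chain is (roughly) the product,
# treating the steps as independent. Illustrative only.

def chain_confidence(step_confidences: list[float]) -> float:
    """Confidence in the full chain of guesses."""
    total = 1.0
    for c in step_confidences:
        total *= c
    return total


def should_stop_and_ask(step_confidences: list[float], floor: float = 0.5) -> bool:
    """Abstain once accumulated confidence drops below a floor."""
    return chain_confidence(step_confidences) < floor


# Four "pretty sure" guesses at 0.8 each already land at ~0.41, below the
# floor -- the point where a careful human would stop and ask.
print(should_stop_and_ask([0.8, 0.8, 0.8, 0.8]))  # True (0.8**4 ≈ 0.41)
```

The model can only run this kind of check on itself if it actually knows which of its steps were guesses in the first place.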
Fine. I'll cancel all other AI subscriptions the moment an AI finally doesn't aim to please me but behaves like a real professional. If your AI doesn't assume that my personality is Trump-like and in need of constant flattery. If you respect your users enough not to outsource RLHF to the lowest bidder but instead pay actual senior (!) professionals in the respective fields you're training the model for. No provider does this - they all went down the path of pleasing some kind of low-IQ population. Yes, I'm looking at you, sama and fellows.