I agree. I often see Opus 4.1 and GPT-5 (Thinking) make astoundingly stupid decisions with full confidence, even on trivial tasks requiring minimal context. Assuming they would make better decisions "if only they had more context" is a fallacy.
Is there a good example you could provide of that? I just haven't seen that personally, so I'd be interested in any examples on these current models. I'm sure we all remember the early days, when lots of examples of stupidity were being posted and it was interesting. It'd be great if people kept doing that so we could get a better sense of which types of problems they are still failing on with astounding levels of stupidity.
One example I ran into recently is asking Gemini CLI to do something that isn't possible: use multiple tokens in a Gemini CLI custom command (https://github.com/google-gemini/gemini-cli/blob/main/docs/c...). It pretended it was possible and produced a nonsense .toml that defined multiple arguments using a syntax it had invented, so the file couldn't be loaded, even after multiple rounds of "that doesn't work, Gemini can't load this."
So in any situation where something can't actually be done, my assumption is that it's just going to hallucinate a solution.
It's been good for busywork that I know how to do but want to save time on. When I'm directing it, it works well. When I'm asking it to direct me, it's gonna lead me off a cliff if I let it.
I've had every single LLM I've tried (Opus, Sonnet, GPT-5 (Codex), and Grok light) tell me that Go embeds[0] support relative paths UPWARDS in the tree.
They all have a very specific misunderstanding. Go embeds _do_ support relative paths like:
//go:embed files/hello.txt
But they DO NOT support paths with ".." in them:
//go:embed ../files/hello.txt
is not correct.
All of them confidently claimed that ".." is correct and will work, and tried to make it work in multiple different ways until I pointed each one to the documentation.
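For anyone who wants to see the difference concretely, here's a minimal sketch (the package name and file paths are made up):

package assets

import _ "embed"

// This compiles: the embedded file lives in (or below) the package directory.
//go:embed files/hello.txt
var hello string

// This does NOT compile: embed patterns may not contain "..", so you can't
// reach a file above the package directory.
// //go:embed ../files/hello.txt
// var parent string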
I don’t really find that so surprising or particularly stupid. I was hoping to learn about serious issues with bad logic or reasoning, not missing-the-dot-on-an-i type stuff.
I can’t remember the exact example, but there was another frequent hallucination where people kept submitting bug reports that it wasn’t working. The project looked at it and realized that, actually, it kind of would make sense and maybe their tool should work like that, and they changed the code to work just like the LLM hallucination expected!
Also, in general, remember that human developers hallucinate ALL THE TIME and then realize it or check the documentation. So my point is that hallucinations don't feel particularly important to me and don't bother me as much as flawed reasoning does.