I agree with you at this point. Even though Google is performing well on benchmarks and releasing impressive models like the Genie 3 world model, the Gemini CLI's suggestions and changes feel overly formulaic. Its priorities are almost those of an OCD coder who cares more about tabs vs. spaces than about building a useful feature. For example, in a recent project, Gemini CLI spent my entire token allotment for the day on trivial tasks like tweaking ESLint configs or modularizing code that didn't need modularization.
In contrast, Claude Code seems to interpret my prompts better and helps me ship real product features for users.
Maybe it's a system prompt issue, or maybe my prompting is causing the problem. Either way, Claude Code seems to understand my intent better.
It's how these models and their harnesses (e.g., the Claude Code JS program) are being trained together in the RL stages.
I think the harness software is now a very important part of the training process, which is why I think only frontier labs are capable of shipping "actual" agents.
Anthropic has figured something out here that others have not.
I'm being downvoted, but I don't have an agenda; I'm simply sharing my experience. If you're getting good results with Gemini CLI as an alternative to Claude Code, please let me know what you're doing to get that performance.
I'm impressed by Gemini 2.5 Pro's NLP capabilities, and I use that model in production on several projects. My comments are directed only at Gemini CLI, which FWIW is better than OpenAI's Codex CLI but much worse (for me) than Claude Code.
Even with Pro, the strict token limits combined with the model's tendency to add unrequested modifications mean I run out of tokens before completing my intended tasks. Others have the same issue:
https://github.com/google-gemini/gemini-cli/issues/4300
Perhaps this is the modern version of "every company ships its own org chart"? Maybe Gemini's priorities are those of a Google engineer, and Claude's are those of an engineer at Anthropic...
Thinking the same. I don't want a GitHub approval process to sit between me and the changes. The killer feature of Claude Code is being able to head it off as it starts down a bad path, and to code myself in between its steps.
Do you let juniors complete full features without asking questions, or do you make them check in when they get flustered?
I do want to try out some background agents, but given how frequently Cursor's frontier-model agents go off the rails despite having rules and context meant to avoid producing slop, I can't see background agents being generally useful yet.
o3 is a great oracle I use as well - in my dumb reddit/theater mode I mention that.
I'm building integrations for both Claude Code and AMP! AMP also provides some really important harness features that others haven't quite caught up on. OpenCode comes close, but it's driven in a bit of a cultish open-source way.
That said, Gemini is very powerful for its high-quality long-context capabilities: https://www.reddit.com/r/ClaudeAI/comments/1miweuv/comment/n...