Claude has always been better at making pretty frontends, which is crucial for people who vibecode entire apps in a couple of prompts. And those people drive a lot of the hype.
Codex ever since ~5.2 has been better at long tasks in large codebases.
I also work with C++, and I use Codex (desktop) which writes 99.99% of my code, plus Visual Studio, which is nice for reading and navigating code. For webdev I do VSCode + Codex.
I started with Cursor back in the day, but switched to Claude Code and then Codex when Cursor got too expensive.
If price weren't an issue, maybe I'd prefer Cursor, only because I can easily switch between models. But that's it. I always disliked the "accept/reject" workflow in Cursor, but that's probably optional nowadays I guess?
I love the accept/reject flow because I still constantly have to stop AI models from writing awful architecture or reimplementing code we already wrote elsewhere.
Yeah, I have found the same. A lot of times it does get things right, but if it deviates, man, it can just drift hard.
For example, sometimes Claude just obsessively reads files and goes on massive tangents. Then when I stop it and ask, "why are you doing that?", it kindly apologizes and admits it shouldn't have gone on a tangent.
The token burn if I don't stop it would be quite high.
Granted, this might be because I'm not giving it optimal prompt/negative-prompt instructions though.
How is it different from Keep / Discard in other tools? I've been slowly converting my git repositories to jj locally because that gives me more granular fallback and mix and match options.
Well, I tried Claude Code for the first time in a while (I am building my own coding app, www.propelcode.app, so I can code on my phone when I take my kids to classes and such), and it literally ignored my question and suggestion and just kept coding away.
Fable makes any IDE AI integration almost entirely unnecessary. Claude one-shots pretty much everything, and fixing any small errors is easier when just talking to Claude again.
Anthropic is going to offer better pricing using their agentic harness. Why pay more for less?
An IDE at this point is best as a tool for code review. They need to start building better code review tools.
I can't quite understand the "fixing small errors is easier when just talking to Claude" flow.
I tried having it write some tests today. It got very close to what I want, but picked a stupid set of input values (two fields that look independent that should only be used with related values). I thought about "how do I explain this" and then just went in and fixed it myself.
How is it easier to write "Okay, go back to testBlah and change xxx to yyy" versus clicking on XXX in the IDE and typing YYY by hand? Maybe if you had 500 faulty tests and were forbidden from using search-and-replace for some reason.
It makes sense when code generation is the limiting factor, but I end up with a lot of changes where the actual code delta is smaller than the necessary prompt to convince the bot to produce it.
Try the superpowers plugin, let it write a spec (what do you want?) and a plan (how is it implemented). Then let it implement the plan.
Review each step as much as you care. These things take time so you can just do other stuff while it’s cooking.
With proper isolation of projects you can easily have multiple sessions in parallel. I frequently have 4 to 8 parallel Claude Code sessions, each with whole trees of agents reproducing, speccing, planning, implementing and reviewing things.
For common mistakes, you can make it remember things or rely on reviews.
RSI most likely does not exist. At least not in the sci-fi sense that AI becomes superintelligent overnight.
It will be like any other technology. Do computers make it faster and easier to design better computers? Yes. But that doesn't mean a step change overnight.
Models that you can run at home (like Qwen 35B) aren't remotely close to Opus or GPT 5.5. Not even close. The only open models that are in that neighborhood are around 1T params, so forget about running them at home.
It's kind of like driving a shitbox. It can often drive you from A to B, and some people will try to convince you it's fine. It's not.
There's no logical reason to do it other than absolutely requiring the privacy, doing it for fun, or niche use cases like airplanes and so on. If you can't spend the insanely subsidized $20 for Codex, you can use an API for Chinese models, which will run circles around these tiny models.
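For a rough sense of why ~1T params rules out home hardware, here is a back-of-the-envelope sketch. It only counts weight storage (parameters times bytes per parameter); the 16-bit and 4-bit precisions are my assumptions for illustration, and real usage is higher once you add KV cache and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only model weights are counted. KV cache, activations,
# and runtime overhead are ignored, so real requirements are higher.

def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the comment above; bit widths are assumed.
    for name, params in [("35B model", 35e9), ("1T model", 1e12)]:
        for bits in (16, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:,.1f} GB")
```

Even with aggressive 4-bit quantization, a 1T-param model needs on the order of hundreds of GB just for weights, while a 35B model fits in the tens of GB.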
You need tools sufficient to do the job in an economical way, optimizing for both cost and quality. That is what 'best' means. We don't give every engineer all the resources under the sun, only what is appropriate.
I suspect many will realize millions more dollars are being spent than needed to achieve the highest marginal productivity gains, and reallocate accordingly. Who wants more of their money going to developer tooling, rather than bonuses?
Of course. I have a $20/mo Codex subscription that has been serving me very well. Occasionally when I run out of quota, I switch to one of my backup $20/mo subscriptions.
That's way more economical and produces far better results than any self-hosted model today.
The number of parameters doesn't make the model smarter, it just makes it know more stuff out of the box.
At some point there are diminishing returns and your coding LLM performs worse because you encoded useless stuff like Pokemon combinations or languages you don't speak into its parameter space.
The "smartness" of the model comes from RLHF post-training, which is orthogonal to model size.
Also, if you're using an agentic harness, a much better approach is to let the model control its own context. If you ever reach a point where your coding LLM needs to know about Pokemon, just give it a web search tool and let it google the Pokemon.
Nukes are much cheaper to build actually. And easier. Most countries that have nukes probably couldn't train a frontier model, and that's considering you can "just buy" GPUs. Imagine if they had to make semiconductors too.