Hacker News | 1ucky's comments

Why would they use their most expensive model when Sonnet or Opus can do the job just as well?

In my experience Sonnet trails Opus by a long shot for code review. Sonnet often flags things as errors that are not, because it fails to grasp the big picture… and it also misses structural issues where every piece is perfectly coded and the problem only shows up at the meta scale.

I have no reason to believe that the next generation won’t offer similar gains in verification, and there is some evidence that the cybersecurity implications are the result of exactly this expansion of ability.


It depends on how you review. In an orchestrated per-task review workflow with clearly defined acceptance criteria and implementation requirements, using anything other than Sonnet (handed those criteria and requirements) hasn’t really led to much improvement, but it drives up usage and takes longer. I even tried Haiku, but, yeah, Haiku is just not viable for review, even tightly scoped, lol.

Siccing Sonnet on a codebase or PR without guidance does indeed lead to worse results than using Opus, though.
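A minimal sketch of what "handing the model the criteria" can look like in such an orchestrated workflow: the orchestrator renders each task's acceptance criteria and requirements into a tightly scoped review prompt, rather than pointing the model at the whole PR. All names and the prompt shape here are hypothetical illustration, not any real tool's API:

```python
# Hypothetical orchestrator step: turn per-task acceptance criteria
# into a tightly scoped review prompt for a smaller model like Sonnet.
from dataclasses import dataclass

@dataclass
class ReviewTask:
    task_id: str
    diff: str                      # only this task's diff, not the whole PR
    acceptance_criteria: list[str]
    requirements: list[str]

def build_review_prompt(task: ReviewTask) -> str:
    """Render one task into a self-contained, narrowly scoped review prompt."""
    criteria = "\n".join(f"- {c}" for c in task.acceptance_criteria)
    reqs = "\n".join(f"- {r}" for r in task.requirements)
    return (
        f"Review the diff for task {task.task_id}.\n"
        f"Acceptance criteria:\n{criteria}\n"
        f"Implementation requirements:\n{reqs}\n"
        "Flag only violations of the items above.\n\n"
        f"```diff\n{task.diff}\n```"
    )

task = ReviewTask(
    task_id="AUTH-42",
    diff="+ def verify(token): ...",
    acceptance_criteria=["Expired tokens are rejected"],
    requirements=["No new dependencies"],
)
prompt = build_review_prompt(task)
```

The point of the scoping is the last instruction: the model is told what "correct" means up front, so it has less room to invent big-picture objections it can't actually verify.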


That makes sense — if your scope is tight enough, good enough is good enough. I’ve got the expected specifications and code style guides, including some aerospace engineering ones, but in complex systems I still run into difficult-to-suss-out corner cases where the code works but the system breaks, usually due to unresolved conflicts in operational requirements.

There’s definitely a ceiling for what LLMs are capable of, and I think aerospace engineering might just currently be it, haha.

Lol yeah, I don’t think I’m ready to ride in the jet that Claude built lol. I should clarify that I use the code guidelines because they are solid guardrails for making things that perform predictably, not because I’m building MCAS lol. Let’s hope that “vibe aerospace engineering” is a way off for now.

Anthropic is also doing this for long contexts (≥ 200k tokens) on Sonnet 4.5


It hard-resets limits every 5 hours instead of using a sliding window?


That’s what their usage warning prompts seem to indicate.
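The difference matters for bursty use. A toy sketch of the two accounting schemes (this is my reading of the warning prompts, not documented limit mechanics):

```python
# Toy comparison: fixed 5-hour reset vs. a 5-hour sliding window.
# Timestamps are in hours; each event costs 1 unit against the limit.

WINDOW = 5.0

def used_fixed_window(events: list[float], now: float) -> int:
    """Usage counted since the start of the current 5-hour block."""
    block_start = (now // WINDOW) * WINDOW
    return sum(1 for t in events if t >= block_start)

def used_sliding_window(events: list[float], now: float) -> int:
    """Usage counted over the trailing 5 hours."""
    return sum(1 for t in events if now - WINDOW < t <= now)

events = [0.5, 4.9]          # two requests, late in the first block
# Just after the hard reset at t=5.0, the fixed window forgets both:
fixed = used_fixed_window(events, now=5.1)      # -> 0
# A sliding window would still count them:
sliding = used_sliding_window(events, now=5.1)  # -> 2
```

With a hard reset, usage right before the boundary is effectively free again right after it; a sliding window never grants that cliff.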


Since last week it’s possible to use Claude Code in the VS Code terminal, where it now automatically installs an extension to display the diffs.


thanks! i never set this up properly. did it now though, really cool!


Prompt preprocessing (prefill) is heavily compute-bound, so it depends mostly on raw compute throughput. Memory bandwidth mostly affects token generation (decode) speed.
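A back-of-the-envelope illustration of why: during decode, each generated token streams all the weights from memory once, so tokens/s is bounded by bandwidth; during prefill, many prompt tokens share one batched weight pass, so the bound shifts to FLOPs. The hardware numbers below are round illustrative figures I picked, not any specific accelerator:

```python
# Rough roofline-style estimate for a dense transformer (illustrative
# round numbers only; batch size 1, no KV-cache traffic, no overlap).
params = 70e9            # 70B parameters
bytes_per_param = 2      # fp16/bf16 weights
compute = 1e15           # 1 PFLOP/s of usable compute
bandwidth = 3e12         # 3 TB/s of memory bandwidth

# Decode: one forward pass per token reads every weight from memory.
decode_tok_per_s = bandwidth / (params * bytes_per_param)

# Prefill: the whole prompt is processed in one batched pass, so the
# ~2 FLOPs per parameter per token dominate instead of weight traffic.
prefill_tok_per_s = compute / (2 * params)

print(f"decode  ~{decode_tok_per_s:,.0f} tok/s (bandwidth-bound)")
print(f"prefill ~{prefill_tok_per_s:,.0f} tok/s (compute-bound)")
```

Even with generous assumptions, prefill throughput comes out orders of magnitude above decode throughput, which is why bandwidth is the number that matters for generation speed.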


You should check out MCP by Anthropic, which solves some of the issues you mentioned.


Wait until you find out that the company behind it was running a crypto scam before they hopped on the new AI hype.


