redox99 · 2026-06-16T21:20:36 1781644836

Claude has always been better at making pretty frontends, which is crucial for people that vibecode entire apps in a couple of prompts. And those people drive a lot of the hype.

Codex ever since ~5.2 has been better at long tasks in large codebases.

redox99 · 2026-06-16T17:55:17 1781632517

Dumping GitHub into a model is not post-training, that's pre-training. And every base model already has all of GitHub.

Composer's post-training is clearly very good, second only to Anthropic and OpenAI.

It does irk me a bit that they try to hide the fact that it's based on a Chinese pretrained model though.

redox99 · 2026-06-16T17:38:05 1781631485

I also work with C++, and I use Codex (desktop) which writes 99.99% of my code, plus Visual Studio, which is nice for reading and navigating code. For webdev I do VSCode + Codex.

I started with Cursor back in the day, but switched to Claude Code and then Codex when Cursor got too expensive.

If price wasn't an issue, maybe I'd prefer Cursor only because I can easily switch between models. But that's it. I always disliked the "accept/reject" workflow in Cursor, but that's probably optional nowadays I guess?

digitaltrees · 2026-06-16T18:17:04 1781633824

I love the accept/reject flow because I still constantly have to stop AI models from writing awful architecture or reimplementing code we already wrote elsewhere.

flyingoat · 2026-06-16T19:48:03 1781639283

Yeah, I have found the same. A lot of times it does get things right, but if it deviates, man, it can just drift hard.

For example, sometimes Claude just obsessively reads files and goes on massive tangents. Then when I stop it and ask, "why are you doing that?", it kindly apologizes and admits it shouldn't have gone on a tangent.

The token burn if I don't stop it would be quite high.

Granted, this might be because I'm not giving it optimal prompt/negative-prompt instructions though.

chamomeal · 2026-06-16T22:11:09 1781647869

I just check the git diff after Claude Code writes stuff. Stage things before letting it run wild so I can undo whatevs.
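
One way to script that stage-then-diff habit, as a minimal sketch rather than anyone's actual setup. It assumes `git` is on the PATH and the working directory is inside a repository; the helper names `git`, `checkpoint`, `review`, and `undo` are made up for illustration.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command in `cwd` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def checkpoint(cwd="."):
    # Stage everything so the index holds a snapshot taken before the agent runs.
    git("add", "-A", cwd=cwd)


def review(cwd="."):
    # A plain `git diff` compares the working tree against the index,
    # so it shows edits made to tracked files since the checkpoint.
    return git("diff", cwd=cwd)


def undo(cwd="."):
    # Restore tracked files from the index, discarding edits made since the checkpoint.
    # Files the agent newly created are untracked and are left in place.
    git("checkout", "--", ".", cwd=cwd)
```

Call `checkpoint()` before starting the agent, `review()` once it finishes, and `undo()` if the diff looks wrong. Note that `review()` will not list brand-new files, since a plain `git diff` only covers paths already in the index.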

tclancy · 2026-06-16T23:53:28 1781654008

How is it different from Keep / Discard in other tools? I've been slowly converting my git repositories to jj locally because that gives me more granular fallback and mix and match options.

digitaltrees · 2026-06-17T01:37:43 1781660263

Well, I tried Claude Code for the first time in a while (I am building my own coding app, www.propelcode.app, so I can code on my phone when I take my kids to classes and such) and it literally ignored my question and suggestion and just kept coding away.

echelon · 2026-06-16T18:21:39 1781634099

Fable makes any IDE AI integration almost entirely unnecessary. Claude one-shots pretty much everything, and fixing any small errors is easier when just talking to Claude again.

Anthropic is going to offer better pricing using their agentic harness. Why pay more for less?

An IDE at this point is best as a tool for code review. They need to start building better code review tools.

hakfoo · 2026-06-17T02:42:18 1781664138

I can't quite understand the "fixing small errors is easier when just talking to Claude" flow.

I tried having it write some tests today. It got very close to what I want, but picked a stupid set of input values (two fields that look independent but should only be used with related values). I thought about "how do I explain this" and then just went in and fixed it myself.

How is it easier to write "Okay, go back to testBlah and change xxx to yyy" versus clicking on XXX in the IDE and typing YYY by hand? Maybe if you had 500 faulty tests and were forbidden from using search-and-replace for some reason.

It makes sense when code generation is the limiting factor, but I end up with a lot of changes where the actual code delta is smaller than the necessary prompt to convince the bot to produce it.

tobyhinloopen · 2026-06-17T05:54:40 1781675680

Try the superpowers plugin, let it write a spec (what do you want?) and a plan (how is it implemented). Then let it implement the plan.

Review each step as much as you care. These things take time so you can just do other stuff while it’s cooking.

With proper isolation of projects you can easily have multiple sessions in parallel. I frequently have 4 to 8 parallel Claude Code sessions, each with whole trees of agents reproducing, speccing, planning, implementing and reviewing things.

For common mistakes, you can make it remember things or rely on reviews.

slopinthebag · 2026-06-17T00:49:09 1781657349

Some of us are working on things that Claude can't one-shot. Like, not even close.

Also https://xcancel.com/mitchellh/status/2066657032938442833#m

I really don't see IDEs going out of fashion anytime soon.

redox99 · 2026-06-16T12:34:03 1781613243

>"fix this code"

>it fixes it

oh my god.

itopaloglu83 · 2026-06-17T01:53:17 1781661197

> oh my god.

Sounds like a fake movie prop, doesn’t it? Makes me think that the ban was caused by other reasons.

redox99 · 2026-06-16T08:04:06 1781597046

RSI most likely does not exist. At least not in the sci-fi sense that AI becomes superintelligent overnight.

It will be like any other technology. Do computers make it faster and easier to design better computers? Yes. But that doesn't mean a step change overnight.

redox99 · 2026-06-16T07:54:57 1781596497

This is called denial.

redox99 · 2026-06-15T18:37:47 1781548667

Models that you can run at home (like Qwen 35B) aren't remotely close to Opus or GPT 5.5. Not even close. The only open models that are in that neighborhood are around 1T params, so forget about running them at home.

It's kind of like driving a shitbox. It can often drive you from A to B, and some people will try to convince you it's fine. It's not.

There's no logical reason to do it other than absolutely requiring the privacy, doing it for fun, or niche use cases like airplanes and so on. If you can't spend the insanely subsidized $20 for Codex, you can use an API for Chinese models, which will run circles around these tiny models.

pbasista · 2026-06-15T18:56:18 1781549778

> Models that you can run at home (Like Qwen 35B) aren't remotely close to Opus or GPT 5.5.

Is that characterization based on some objective facts or benchmarks?

kube-system · 2026-06-15T19:03:06 1781550186

Yes, there aren't any 35B models that are beating frontier models at just about anything generalized

redox99 · 2026-06-15T18:59:53 1781549993

Based on private test prompts I've run through OpenRouter.

xgulfie · 2026-06-15T20:33:14 1781555594

I don't need a Ferrari to get to work

orangeisthe · 2026-06-15T20:44:51 1781556291

But you need the best tools to do the job

cayley_graph · 2026-06-16T00:30:09 1781569809

You need tools sufficient to do the job in an economical way, optimizing for both cost and quality. That is what 'best' means. We don't give every engineer all the resources under the sun, only what is appropriate.

I suspect many will realize millions more dollars are being spent than needed to achieve the highest marginal productivity gains, and reallocate accordingly. Who wants more of their money going to developer tooling, rather than bonuses?

orangeisthe · 2026-06-16T05:28:24 1781587704

Of course. I have a $20/mo Codex subscription that has been serving me very well. Occasionally when I run out of quota, I switch to another one of my backup $20/mo subscriptions.

That's way more economical and produces far better results than any self-hosted models today.

redox99 · 2026-06-14T08:27:02 1781425622

Qwen 35B isn't even remotely close to the big models. It's just people overhyping small models. Ignore the benchmarks; they are almost meaningless.

If you want something comparable you need the trillion-parameter open models like DeepSeek.

otabdeveloper4 · 2026-06-14T13:07:28 1781442448

Number of parameters doesn't make the model smarter, it just makes it know more stuff out of the box.

At some point there are diminishing returns and your coding LLM performs worse because you encoded useless stuff like Pokemon combinations or languages you don't speak into its parameter space.

The "smartness" of the model comes from RLHF post-training, which is orthogonal to model size.

Also, if you're using an agentic harness a much better approach is to let the model control its own context. If you ever reach a point where your coding LLM needs to know about Pokemon, just give it a web search tool and let it google the Pokemons.

redox99 · 2026-06-14T13:39:45 1781444385

That's just... not true. Just compare any open model which is trained with the same recipe but multiple sizes.

oneshtein · 2026-06-15T08:29:20 1781512160

You can compare models on the OpenRouter site. Qwen 3.6 dense is in the top 24% for coding.

otabdeveloper4 · 2026-06-15T10:33:06 1781519586

> Just compare any open model which is trained with the same recipe but multiple sizes.

That's exactly what I did.

redox99 · 2026-06-14T08:18:40 1781425120

Probably like 1% of the energy an average person spends on driving.

Raphael_Amiard · 2026-06-14T08:25:33 1781425533

Average American is what you mean

redox99 · 2026-06-14T07:48:05 1781423285

Nukes are much cheaper to build actually. And easier. Most countries that have nukes probably couldn't train a frontier model, and that's considering you can "just buy" GPUs. Imagine if they had to make semiconductors too.