> it's not just about cost reduction, it's about solving some long-term structural deficiencies of industry.
You know, I hate that we live in a world where I have to ask myself whether something is LLM-written, because "it's not just X, it's Y" is one of those patterns.
But that is beside the point of what I wanted to say anyway. Those deficiencies aren't going to be solved by LLMs, I reckon. In fact, they will likely make things worse. As you said, a lot of human devs didn't understand the context when they wrote code previously. True, but LLMs are even worse at context in many areas and still need human prompting for input.
The only thing I really see happening is that the blast radius of people not fully grasping the context and still producing something is going to be larger. More specifically, it is already larger. Previously incompetence limited the damage people could do, now that is less of a factor.
Except that a lot of software is likely already broken in fun ways we currently don't know about. That is what makes it such a "fun" challenge. Supply chain attacks are one thing, but CVEs in already-released software that open the door to other attackers are another.
As always, I know most of us work in IT, but things rarely are actually binary.
With Noctua I highly doubt that is the case, given their overall track record for quality and all the other information available about their design and engineering process. As far as I can tell, all the design and engineering is done in Austria. They also have a track record of only releasing things once they are satisfied that a product performs within their standards, something that would be next to impossible when relying solely on external fabs and process engineering.
They also utilize different manufacturers afaik (historically Taiwan, but also China these days), meaning they need pretty solid in-house knowledge and expertise to make sure different factories produce similar results. When they first started using Chinese factories, people noticed visual differences and were worried about that. But Noctua claimed at the time that they made sure performance was still the same, a claim that was put to the test by various review outlets (I want to say Gamers Nexus did a big piece about it?) and confirmed to be true.
Having said that, if you do utilize external factories, you automatically make use of their process engineering to some degree as well. But, and this is difficult for many people to understand, that isn't a binary thing either. You can rely on the factory to do basically everything for you and just send feedback on iterations, or you can work closely with them and actually get involved in the process itself.
The other thing to consider is that while China is known for making cheap items for the American market (because that is what Americans want), they have become experts in the tooling and dies needed to make those cheap items.
If you want top-class injection molding tooling, machines, or processes, you are probably going to contract a Chinese company to do it.
Yep, if you're a big enough source of income for the factory you can basically do whatever you want, up to and including stationing your employees in their factory year round.
> If you’re using agents to program, what are you doing while they work?
If I am using agents I try to do something that is closely related to the task they are on. Otherwise I am just context switching once they are done and I want to review the work, which makes it difficult to focus on that task.
I also don't try to run too many agents at the same time, as that quickly becomes madness; it's just herding cats at that point.
> As a side note, having Codex review Claude’s work (or vice-versa) throws up so many show-stopper issues (even with plan, revise, implement, review loops), I feel like you’d have to be nuts to just have a bunch of agents YOLOing it
Solely relying on agents is bad regardless. It certainly is the easy route, and our brains are wired to take the easy/lazy approach. But even with how good models have gotten in the past year, they still make mistakes. In fact, they are now at a level where the hallucinations aren't obvious, making it even more important to keep a close eye on the result.
If you do want to lean more heavily into agents doing most of the work, try to make sure they follow proper development practices. They don't do this by default, but using something like the superpowers skills makes a world of difference: https://github.com/obra/superpowers
Having them follow TDD helps a lot. I've even considered adding a QA agent/skill that expands things further: not just unit tests and some basic manual tests, but proper automated tests (following test-automation-pyramid principles) that build up an entire test suite.
Not to give LLMs more control, but because it allows me to more easily spot where things go sideways: more tests, including e2e and UI tests where possible, let me review more aspects of the work they do.
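As a rough illustration of the pyramid idea (all names here are hypothetical, my own toy example, not taken from the superpowers repo):

```python
# Toy sketch of the test-automation pyramid: many fast unit tests at the
# bottom, fewer integration tests, very few slow e2e tests at the top.
# Every name here is a made-up illustration.

def slugify(title: str) -> str:
    """Example unit under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())

# --- Unit layer: fast, isolated, run on every change ---
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

# --- Integration layer: components working together, run less often ---
def test_slug_used_as_store_key():
    store = {slugify("My Post"): "My Post"}
    assert store["my-post"] == "My Post"

# --- E2E layer: whole-system flows; only a small handful of these ---
def test_publish_flow():
    # Would drive the real app (browser/API) here; stubbed in this sketch.
    pass

if __name__ == "__main__":
    test_slugify_basic()
    test_slug_used_as_store_key()
    print("unit + integration layers pass")
```

The point isn't this specific code; it's that a suite shaped like this gives you many cheap checkpoints for catching an agent going sideways early, instead of one expensive manual review at the end.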
Having said that, I haven't created that skill yet, as reviewing the work of multiple agents is already exhausting as is. I'm not in a position where my company forces me to use AI, so instead I have dialed down my agentic usage quite a lot, to the point where I barely use agents anymore. To me, using LLMs mostly as tools outside the process is still the sweet spot.
At the very least I'd add release cadence to it, along with the quality of releases. Mature, good software will have hotfixes and patch releases every now and then, but not in every release, and certainly not 50% of the changes. In the same sense, I will often look at the effort put into changelogs. If they took the effort to put things into categories, write about possible breaking changes, etc., it is a possible indicator of some level of quality. At the very least, I will have a lot more faith in software with good changelogs than in something that is just a list of the last N commit messages.
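To make that hotfix heuristic concrete, here's a tiny sketch of my own (a toy, not an existing tool; the version list is made up) that estimates what fraction of a project's releases were patch releases, given its semver tags:

```python
# Toy heuristic: flag projects where too many releases are patch/hotfix
# releases. In practice the versions would come from the project's tags
# or changelog; this list is invented for illustration.

def patch_release_ratio(versions: list[str]) -> float:
    """Fraction of releases whose semver patch component is non-zero."""
    patches = sum(1 for v in versions if int(v.split(".")[2]) != 0)
    return patches / len(versions)

releases = ["1.0.0", "1.0.1", "1.1.0", "1.1.1", "1.1.2", "2.0.0"]
ratio = patch_release_ratio(releases)
print(f"{ratio:.0%} of releases were patch releases")  # 50% here
```

A project hovering near 50% by this measure is shipping a hotfix for every feature release, which is exactly the pattern I'd be wary of.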
To be honest, these days I have more faith in an application or library with a moderate development pace, where the last commit wasn't two seconds ago and co-authored by Claude (in the most blatant examples).
The same is true for the number of commits, the type of commits, release cadence, and the number of fixes and hotfixes in releases. I don't feel like being a glorified alpha tester, so I look for maturity in a project.
Which more often than not means that, yes, there needs to be activity. But it is also fine if the last commit was two days ago and there is a clear sign of the same pattern over a longer period, combined with a stable release cycle, sane versioning, and clear changelogs that aren't just a list of the last 10 commit messages.
On your point about stars, I think they used to be a valid metric in a similar category: the community behind the software. But it has been a while since that was true. Ever since I saw those star-tracking graphs pop up on repos, I knew there was no sense in paying attention to them anymore.
There is truth in that. A lot of Claude co-authored repos look frantic and unstable. It still depends on the contributors managing things properly to maintain stability and not succumb to AI addiction and insanity.
> community behind the software
Right. You can't just look at stars. You have to look to see that there is an actual community, along with other contributors.
> That’s not to say that there is no microplastics pollution, the U-M researchers are quick to say.
>
> “We may be overestimating microplastics, but there should be none. There’s still a lot out there, and that’s the problem,”
And with some actual numbers, when digging in further:
> They found that on average, the gloves imparted about 2,000 false positives per millimeter squared area.
> Clough prepared the substrates while wearing nitrile gloves, which is recommended by the guidance of literature in the microplastics field. But when she examined the substrates to estimate how many microplastics she captured, the results were many thousands of times greater than what she expected to find.
The reason this is important is that one flawed dataset reports a hopeless situation, while the other at least provides an "if we stop now" message.