The LLM is forced to eat its own output. If the output is garbage, its inputs will be garbage in future passes. How code is structured changes how the LLM implements new features.
Why would “messy” code be garbage? Also LLMs do a great job even today at assessing what code is trying to do and/or asking you for more context. I think the article is well balanced though: it’s probably worth it for the next few months to try to help the agent out a bit with code quality and high level guidance on coding practices. But as OP says this is clearly temporary.
The definitions of what is messy or clean will change with LLMs…
But there will always be a spectrum of structures that are better for the LLM to code with, and coding with less optimal patterns will have negative feedback effects as the loop goes on.
I agree with you, but you can dedicate tokens to fixing the bad code that agents write today. I don’t disagree with anything you’re saying. I think the practical implication is that instead of pain and Jira we’ll just have dedicated audit and refactor token budgets.
I'm dealing with a situation right now where a critical mass of "messy" code means that nobody, human or LLM, can understand what it is trying to do or how a straightforward user-specified update should be applied to the underlying domain objects. Multiple proposed semantics have failed so far.
On the plus side, AI is pretty good at creating (often excessive) tests around a given codebase in order to (re)implement the utility using different backends or structures. The one thing to look out for is that the agent does NOT try to change a failing test where the test is valid but the code isn't.
Interesting that in English we had a special pronoun for plurals of exactly 2, but in Russian, for instance, they have special case declensions for plurals less than 5.
Is that significant? I have no idea. Is there a language with special case for exactly 2 with another case for a “few” and with yet another for “a lot”? Interesting to compare different cultures.
Russian used to have dual pronouns too, but they all were lost somewhere in the 13th century, as in all other Slavic languages other than Slovenian.
The system used for small numbers is probably a broad extension of an earlier dual number for nouns, i.e. something like a plural but just for two things. For (some) masculine nouns, the nominative dual ending was the same as the genitive singular, which was then extended to all other nouns even when this correspondence didn't hold, and from just 2 things to 3 and 4 as well. Nowadays the dual has been completely forgotten for nouns, and the only interpretation of the rule is that it's a genitive singular.
It’s not just numbers under 5; the pattern repeats for 21 to 24, 31 to 34, etc. However, some Slavic languages (e.g. Slovak and Czech) don’t do that, and only have those special forms for numbers under 5.
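For anyone who wants the Russian rule spelled out, here is a rough sketch of how the form is usually chosen for whole numbers. The function names are made up for illustration, and the three categories follow the common "one / few / many" convention used in localization plural rules. Strictly, the special forms stop at 4, 24, 34 and so on, and 11 to 14 behave like 5 and up.

```python
def russian_plural_category(n: int) -> str:
    """Return 'one', 'few', or 'many' for a non-negative whole number.

    Hypothetical helper for illustration; not taken from any library.
    """
    last_two = n % 100
    last = n % 10
    if last == 1 and last_two != 11:
        return "one"   # 1, 21, 31, ... take the nominative singular
    if 2 <= last <= 4 and not 12 <= last_two <= 14:
        return "few"   # 2-4, 22-24, ... take the genitive singular
    return "many"      # 0, 5-20, 25-30, ... take the genitive plural


def count_phrase(n: int, one: str, few: str, many: str) -> str:
    """Join a number with the noun form that matches its category."""
    forms = {"one": one, "few": few, "many": many}
    return f"{n} {forms[russian_plural_category(n)]}"


if __name__ == "__main__":
    # "стол" (table): nominative singular, genitive singular, genitive plural
    for n in (1, 3, 5, 11, 21, 22, 25):
        print(count_phrase(n, "стол", "стола", "столов"))
```

The teens are the one exception to the last-digit pattern, which is why the check looks at the last two digits as well.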
I once knew a guy who was disabled and walked on crutches. Jobs got mad at him for being late to a meeting, and the guy replied, "Well, someone parked in the handicapped parking spot, and it took me a while to walk from a normal parking spot."
No joke, Jobs looks him (a disabled person) directly in the eye, and says "oh, that was me; I think the country built an excess of disabled parking spaces after WW2." To the disabled guy!!!
Yeah, it’s so frustrating to have to constantly ask for the best solution, not the easiest / quickest / least disruptive.
I have in my CLAUDE.md that it’s a greenfield project, to only present complete holistic solutions and not fast patches, etc., but I still have to watch its output.
With respect to the market, every single sandbox sucks. I'm not gonna shit talk competitors but there is not a good sandboxing platform out there yet — including me — compared to where we'll be in 6 months.
We've heard that all the platforms have consistent issues with uptime, feature completeness, networking, and debugging. And on our own platform we're not 1/10th of the way through solving the requests we've gotten.
The next generation of agents needs computers, and those computers are gonna look really different than "sandboxes" do today.
I don't think you're wrong, but if you really want to rethink the approach, building an orchestration layer for Firecracker like every other company in the space is doing is probably not it.
They can’t write maintainable code because they don’t have real world experience of getting their hands dirty in a company. The only way to get startup experience is to build a startup or work for one.
What. Are you saying maintainable code is specifically related to startups? I can accept companies as an answer (although there are other places to cut your teeth), but startups is a weird carveout.
Writing maintainable code is learned by writing large codebases. Working in an existing codebase doesn't teach you it, so most people working at large companies do not build the skill since they don't build many large new projects. Some do but most don't. But at startups you basically have to build a big new codebase.
Duh, the only way to get startup experience is indeed to get startup experience.
My point is that getting into the weeds of writing CRUD software is not the only way to gain the ability to write complex algorithms, or to debug complex issues, or do performance optimization. It's only common because the stuff you make on the journey used to be economically valuable.
> write complex algorithms, or to debug complex issues, or do performance optimization
That’s the stuff that AI is eating. The stuff I’m talking about (scaling orgs, maintaining a project long term, deciding what features to build or not build, etc.) is stuff that's very hard for AI.
AI is only eating some of that though. For instance, everyone who does performance work knows that perhaps the most important part of optimization is constructing the right benchmark. This is already the thing that makes intractable problems tractable. That effect is now exacerbated: AI can optimize anything given a benchmark, but AI isn’t making great progress at constructing the benchmark itself.
What about open-source projects?
Much as how aspiring authors can learn to write fiction from reading the fiction of others and then imitating that, getting feedback on their work, and iterating, it seems like aspiring programmers could learn by reading/contributing to the open-source projects of others and then writing their own.
Example: Linus Torvalds had never worked for a company when he made the original Linux as a university student, and seems to be doing fine (I'm writing this message on a ThinkPad running Linux Mint).
Or Bill Joy with BSD at Berkeley, before his time at Sun.
Or heck, why not go all the way back to Ken Thompson and Dennis Ritchie building Unix and C?
Sounds like maybe you have some mixed feelings about becoming more effective with AI, but at the same time everyone else is too, so the praise you're expecting is diluted.
I see it all the time now too. People have no frame of reference at all about what is hard or easy so engineers feel under-appreciated because the guy who never coded is getting lots of praise for doing something basic while experienced people are able to spit out incredibly complex things. But to an outsider, both look like they took the same work.
I'm sure that HN's preferred app would be <5MB and have zero third party SDKs or telemetry, but half a dozen SDKs and third party domains is basically most mass market apps these days. Is it bad? Yes, but the White House isn't being egregiously bad, and "White House app is bad, just like most other apps" isn't going to get clicks.
If only. It would be a far better state of affairs if the US government sucked like every other first world country. No other first world country is waging war in the Middle East, having paramilitary forces terrorize residents, or undergoing a partial government shutdown.
For all our faults I am genuinely impressed by gov.uk. It's not pretty, it's not particularly fast, and it's certainly not flashy, but I've never once not been able to find what I needed or had a flow not work.