> * I want to apply some repetitive change across a large codebase that's just too complicated for a clever regex; bam, work you literally would never have bothered to do before, done in 2 minutes.
You would naively think that, as did I, but I've tested this against several big-name models and they all eventually get "lazy", sometimes make unrelated changes, and get worse as the context fills up.
On a small toy example they will do it flawlessly, but as you scale up to more and more code that requires repetitive changes, the errors compound.
Agentic loops help the situation, but now you aren't getting it done in 2 minutes, because you have to review the output to find out it wasn't done and then tell the model to do it again N times until it is.
Having the LLM write a program to make the changes is much more reliable.
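To sketch what I mean (the `oldApi` -> `newApi` rename and the `src/` layout here are hypothetical, not from any real codebase): the model only has to get the transform right once, and the script then applies it uniformly with no context-window decay.

```ts
// Walk a source tree and apply one mechanical rewrite everywhere.
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join, extname } from "node:path";

function walk(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) =>
    entry.isDirectory() ? walk(join(dir, entry.name)) : [join(dir, entry.name)]
  );
}

let touched = 0;
for (const file of walk("src")) {
  if (extname(file) !== ".ts") continue;
  const before = readFileSync(file, "utf8");
  // The transform itself is trivial; the win is that it runs identically
  // on file 1 and file 1,000, which an LLM editing in-context does not.
  const after = before.replace(/\boldApi\(/g, "newApi(");
  if (after !== before) {
    writeFileSync(file, after);
    touched++;
  }
}
console.log(`rewrote ${touched} files`);
```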
> Having the LLM write a program to make the changes is much more reliable.
I ended up doing this when switching our 50k-LOC codebase to pnpm workspaces, and it was such a good experience. It still took me a day or two of moulding that script to get it to handle the dozens of edge cases, but it would have taken me far longer to split things up by hand.
I still feel like I am under-using the ability of LLMs to spit out custom scripts to handle one-off use-cases.
There is more to it than that; it's about modularization as well.
I run LLMs against a 500k LoC poker engine and they do well because the engine is modularized into many small parts with a focus on good naming schemes and DRY.
If it doesn't require a lot of context for an LLM to figure out how to direct effort, then the codebase size is irrelevant -- what becomes relevant in those scenarios is module size and the number of modules implicated in any change or problem-solving. The LLM codebase 'navigation' becomes near-free with good naming and structure. If you code in a style that allows an LLM to navigate the codebase via just an `ls` output, it can handle things deftly.
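To make that concrete, here is the kind of layout I mean -- the module names below are invented for illustration, not the actual engine's structure:

```
$ ls src/
betting/  dealing/  evaluation/  pots/  rng/  seating/
$ ls src/evaluation/
hand_rank.ts  kickers.ts  showdown.ts  tie_break.ts
```

Two `ls` calls and the model already knows where the showdown logic lives without reading a line of code.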
The LLMification of things has definitely made me embrace the concept of program-as-plugin-loader more so than ever before.
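Roughly this shape, as a minimal sketch (the `plugins/` directory convention and the `Plugin` interface are my invention, not production code): the core only discovers and registers modules, so an LLM touching a feature only ever needs one small plugin file in context.

```ts
import { readdirSync } from "node:fs";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// The contract every plugin file must satisfy.
interface Plugin {
  name: string;
  register(app: unknown): void;
}

export async function loadPlugins(dir: string): Promise<Plugin[]> {
  const plugins: Plugin[] = [];
  for (const file of readdirSync(dir)) {
    if (!file.endsWith(".js")) continue;
    // Each plugin default-exports an object implementing Plugin.
    const mod = await import(pathToFileURL(join(dir, file)).href);
    plugins.push(mod.default as Plugin);
  }
  return plugins;
}
```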
The app I work on is highly modular, to the point that we split the app in half and unwinding the two halves of the code only took about 2 weeks.
> The LLM codebase 'navigation' becomes near-free with good naming and structure
I have not found this to be true. They seem to break badly if you have a lot of files with similar-ish names even if they're descriptive.
Yeah, I was thinking about this recently. A semantic patch is more reliable, but prompting an AI might be easier. So why not prompt the AI to write the semantic patch?
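For a JS/TS codebase that could be a jscodeshift transform; a minimal sketch, with `oldApi`/`newApi` as placeholder names. Unlike a regex, it matches actual call expressions, so strings and comments are left alone:

```ts
import type { FileInfo, API } from "jscodeshift";

// Rename every call to oldApi(...) into newApi(...), leaving arguments intact.
export default function transform(file: FileInfo, api: API): string {
  const j = api.jscodeshift;
  return j(file.source)
    .find(j.CallExpression, { callee: { type: "Identifier", name: "oldApi" } })
    .forEach((path) => {
      path.node.callee = j.identifier("newApi");
    })
    .toSource();
}
```

Run with `jscodeshift -t transform.ts src/`, and the model's one-shot output gets applied deterministically across the whole tree.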