I'm most interested in how well these tools can tackle complex legacy systems.
We have tonnes of code built up over a decade, with all kinds of idioms and stylistic conventions that are enforced primarily through manual review. Part of this comes from working in a regulated environment: certain kinds of code need radical transparency and auditability, so writing it the "normal" way a developer would is problematic.
So I'm curious how well these tools can pick up the existing code style and implicitly emulate it. My testing of other tools so far suggests they don't handle this well; typically I get code that looks completely foreign next to the existing codebase. It exhibits the true "regression to the mean" spirit of LLMs: it gives me "how the average competent engineer would write this", which is not at all how we need the code written.
Currently, this is the main barrier to us using these tools in our codebase.
You need to provide agentic tools with enough context about the project so they can find their way around. In Claude Code this is typically done via a CLAUDE.md document at the root of the codebase.
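For concreteness, here's a rough skeleton of what a CLAUDE.md like that might contain. Every section name and rule below is invented for illustration; it's just the shape of the thing, not anyone's actual document:

```markdown
# CLAUDE.md — project context for coding agents (illustrative skeleton)

## Architecture
- One-paragraph map of the major components and how they talk to each other.

## Folder layout
- `src/core/` — domain logic; no I/O allowed here.
- `src/adapters/` — all external integrations live here.

## Coding style
- Recoverable failures are returned as Result values, never thrown across
  module boundaries.
- Every public function gets a doc comment explaining the audit rationale.

## Review conventions
- Reference the relevant compliance ticket in every commit message.
- Match the style of the surrounding file, not general best practice.
```

The point is less any individual rule and more that the agent reads this before touching code, so house idioms are stated explicitly instead of being left for it to infer.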
I work on Chromium and my experience improved immensely by using a detailed context document (~3,000 words) with all sorts of relevant information, from the software architecture and folder organisation to the C++ coding style.
(The first draft of that document was created by Claude itself from the project documentation.)
I've had a lot of luck with Claude on my eight-year-old, multi-language codebase. But I do have to babysit it and provide a lot of context.
I created some tutorial files which contain ways to do a lot of standard things. Turns out humans found these useful too. With the examples, I've found Opus generally does a good job following existing idioms, while Sonnet struggles.
Ultimately it depends on how many examples in that language showed up on Stack Overflow or in public GitHub repos. YMMV if it's not Python, C++, Rust, or JavaScript.