
Thank you for the thoughtful response. I really appreciate your willingness to discuss what you've seen in your experience. I think your observations are pretty much exactly right about where agents do best. I'd qualify them in just a couple of areas:

1. In my experience, Claude Code (I've used several other models and tools, but CC performs the best for me, so it's my go-to) can do well with proprietary APIs and services as long as there's some sort of documentation it can reach (internal docs, Swagger, etc.), and you ensure the model has that documentation prominently in context.

2. CC can also do well with brownfield development, but the scope _has_ to be constrained, either to a small standalone program or a defined slice of a larger application where you can draw real boundaries.

The best illustration I've seen of this is in a project that is going through final testing prior to release. The original "application" (I use the term loosely) was a C# DLL used to generate data-driven prescription monitoring program reporting.

It's not ultra-complicated, but there's a two-step process: you retrieve the report configuration data, then use that data to drive retrieval and assembly of the data elements needed for the final report. Formatting can differ based on the state, on the data available (reports with no data need special formatting), and on whether you're outputting for transmission or for user review.
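For the sake of illustration, that two-step flow can be sketched roughly like this. All names here are hypothetical (the real module is a C# DLL; this sketch uses Java), and the actual field logic is far richer than shown:

```java
// Hypothetical sketch of the two-step report pipeline: load per-state
// configuration, then use it to drive assembly of the report's data elements.
import java.util.List;
import java.util.Map;

record ReportConfig(String state, List<String> fields) {}

class ReportPipeline {
    // Step 1: retrieve the configuration data that drives the rest of the run.
    ReportConfig loadConfig(String state, Map<String, ReportConfig> store) {
        return store.get(state);
    }

    // Step 2: assemble the data elements the config calls for. Output differs
    // by mode (transmission vs. user review) and by whether any data exists.
    String assemble(ReportConfig cfg, Map<String, String> data, boolean forTransmission) {
        StringBuilder sb = new StringBuilder();
        for (String field : cfg.fields()) {
            sb.append(field).append('=').append(data.getOrDefault(field, "")).append('\n');
        }
        if (sb.length() == 0) {
            // Reports with no data need special formatting.
            return forTransmission ? "EMPTY\n" : "(no data for review)\n";
        }
        return sb.toString();
    }
}
```

The point of separating the steps is that the state-by-state variation lives in configuration data, not in branching code.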

The original DLL was written in a very simplistic way, with no testing and no way to exercise the program without invoking it from its link points embedded in our main application. Fixing bugs and testing those fixes were both very painful: for a production release we had to test all 50 states across a range of different data conditions, and do so by automating the parent application.

I used Claude Code to refactor this module, add DI and testing, and add a CLI that could easily exercise the logic in all different supported configurations. It took probably $50 worth of tokens (this was before I had a Max account, so it was full price) over the course of a few hours, most of which time I was in other meetings.
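The DI-plus-CLI shape is the interesting part, so here's a rough sketch of the pattern (hypothetical names again, Java rather than the project's C#): the report logic sits behind an interface, the CLI takes that interface as a constructor dependency, and a sweep method exercises every supported state/mode combination without touching the parent application.

```java
// Hypothetical sketch of a CLI harness that exercises the report logic
// in all supported configurations, decoupled from the parent application.
import java.util.List;

interface ReportService {                 // seam introduced for dependency injection
    String generate(String state, boolean forTransmission);
}

class ReportCli {
    private final ReportService service;  // injected, so tests can substitute a fake

    ReportCli(ReportService service) { this.service = service; }

    // Run one state/mode combination and return the output for inspection.
    String run(String state, boolean forTransmission) {
        return service.generate(state, forTransmission);
    }

    // Sweep every supported configuration, e.g. all 50 states in both modes.
    List<String> runAll(List<String> states) {
        return states.stream()
                .flatMap(s -> List.of(run(s, true), run(s, false)).stream())
                .toList();
    }
}
```

Because `ReportService` is a seam, unit tests can drop in a stub, and the same CLI can drive the real implementation against all 50 states in minutes instead of automating the parent application.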

The final result did exhibit some classic LLM problems -- some of the tests were overspecific, it restructured without always fully cleaning up the existing functions, and it messed up a couple of paths through the business logic that I needed to debug and fix. But it easily saved me a couple days of wrestling with it myself, as I'm not super strong with C#. Our development teams are fully committed, and if I hadn't used CC for this it wouldn't have gotten done at all. Being able to run this on the side and get a 90% result I could then take to the finish line has real value for us, as the improved testing alone will see an immediate payback with future releases.

This isn't a huge application by any means, but it's one example of where I've seen real value that is hitting production, and it seems representative of a decently large category of line-of-business modules. I don't think there's any reason this wouldn't replicate on similarly-scoped products.



