I’ve used it for everything from “change this text on a webpage” to squashing complex migrations across multiple apps in a Django monolith, where migrations in one app depend on migrations in other apps.

My apologies if anyone finds this offensive, but I sorta see Devin as a fresh junior SWE hire. It sometimes struggles with tasks that require deep knowledge, but it has shallow or better knowledge of everything. I would describe it as working with a brand new SWE with an IQ of about 85 who is also on the low end of being high-functioning autistic. By that I mean that it takes most things literally and sometimes has difficulty with nuance.

> burns a ton of money, gets stuck, doesn’t implement the changes you want

The first time you use it, I think that’s pretty fair. Every time it gets stuck or does the wrong thing and you correct it, it gives you the option to add the correction to its “knowledge base”. That’s a bunch of additional context that it applies only in certain situations. Within a week or so of regular use, it’s significantly more valuable. It “learns” much faster than a human.

Example:

About a dozen of our projects all rely on a shared repository (“Enki”) that contains a Compose file, configs, and some light automation. Tests are run in Docker, and you have to navigate to the Enki repo’s directory to bring up the service. Some of those projects have service names in the Compose file that differ from the project name. I was able to run the steps interactively on “Devin’s machine”, tell Devin what I had done, and then tell it that this is the correct approach for any project that depends on that repository. I didn’t tell it which projects those are, or how to find out.

The next time I used Devin on a project like that, it tried to run the tests directly in a local Python environment. That didn’t work, so it tried the correct approach next. That worked, and it added a line to its knowledge base: “Project <foo> uses Enki.” From that point forward it did the right thing the first time.
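For anyone who wants the shape of that workflow, here is a rough sketch in Python. It is only an illustration under assumptions: the project name, the override map, the `pytest` command, and the Enki path are all hypothetical, and the real automation in the shared repo may look nothing like this. The one thing it shows is the detail that mattered: the command runs from the shared repo’s directory, and the Compose service name is looked up rather than assumed to match the project name.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping: projects whose Compose service name differs
# from the project name. The entry below is invented for illustration.
SERVICE_NAME_OVERRIDES = {"billing-api": "billing"}


def compose_test_command(project, enki_dir, test_cmd=("pytest",)):
    """Return (argv, cwd) for running a project's tests via the shared repo.

    Assumes the tests run as a one-off container with `docker compose run`;
    `test_cmd` is a placeholder for whatever the project actually uses.
    """
    # Fall back to the project name when no override is recorded.
    service = SERVICE_NAME_OVERRIDES.get(project, project)
    argv = ["docker", "compose", "run", "--rm", service, *test_cmd]
    # The command must be run from the shared repo's directory,
    # because that is where the Compose file lives.
    return argv, Path(enki_dir)


def run_tests(project, enki_dir):
    """Actually invoke Docker; requires Docker and the shared repo checkout."""
    argv, cwd = compose_test_command(project, enki_dir)
    return subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_tests("billing-api", "/path/to/enki")` would run `docker compose run --rm billing pytest` from the shared repo’s directory. The knowledge-base entry Devin wrote is effectively one more row in that kind of lookup.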

> For large codebases (greater than 15k or 20k LOC) the context size seems like a real problem right now.

The primary project I’m working on is a Django app. I don’t have it in front of me right now, but it’s about five years old, has been under very active development the entire time, and is composed of about twenty apps. It’s not the largest codebase I’ve worked on, but it’s far from the smallest. I can do a line count tomorrow if you’d like.