Personally, I'd just use one of the models I run locally on my MacBook (e.g. Mixtral 8x7B) and forget about any wasted branches and cents. My debugging time costs many orders of magnitude more than SWE-agent does, so even a 5% backlog savings would be spectacular!
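For anyone curious about the local route, here's a minimal sketch of querying a locally served Mixtral 8x7B through Ollama's HTTP API. It assumes Ollama is installed, `ollama serve` is running, and you've already done `ollama pull mixtral`; the prompt is a made-up placeholder:

    import requests

    # Minimal sketch: ask a locally served Mixtral 8x7B for debugging help
    # via Ollama's /api/generate endpoint (default port 11434).
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mixtral",  # pulled beforehand with `ollama pull mixtral`
            "prompt": "Suggest a fix for this failing test: ...",  # placeholder
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=300,
    )
    print(resp.json()["response"])

The draw is zero marginal cost per call; whether the output is worth your time is a separate question.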
> My debugging time costs many orders of magnitude more than SWE-agent
Unless your job is primarily to clean up somebody else's mess, your debugging time is a key part of a career-long feedback loop that improves your craft. Be careful not to shrug it off as something less. Many, many people are spending a lot of money to let you forget that, and once you do, you'll be right there in the ranks of the cheaply replaceable.
(And on the off chance that cleaning up other people's messes is your job, you should probably still be the one doing it, for largely the same reasons.)
I totally agree. My solution was to limit my AI use to (a) whatever didn't impair creativity, and (b) more generally, whatever kept my brain sharp. If you use AI regularly, you can just solve a percentage of the problems manually.
I've tried this with a similar system. FOSS LLMs, Mixtral included, are currently too weak to handle something like this. For me they run out of steam after only a few turns and start going in circles unproductively.
That's assuming the other 95% stays the same with this new agent, rather than it creating more work for you, since you now also have to parse what the model is saying.
Given that they only got 12% with GPT-4, which is vastly better than any open model, I doubt this would be particularly productive. And the cost of powering compute at full load is going to add up.
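To put rough numbers on the power point (all figures below are my own assumptions, not from the thread): a GPU pinned at full load isn't free, though the electricity alone is smaller than people sometimes assume.

    # Back-of-envelope electricity cost for continuous local inference.
    # Assumed numbers: a ~300 W GPU at full load, $0.15/kWh power price.
    watts = 300
    price_per_kwh = 0.15
    hours_per_day = 24

    kwh_per_day = watts / 1000 * hours_per_day  # 7.2 kWh
    cost_per_day = kwh_per_day * price_per_kwh  # ~$1.08
    print(f"~${cost_per_day:.2f}/day, ~${cost_per_day * 30:.0f}/month")

So on the order of tens of dollars a month per GPU in electricity; the hardware itself and your time babysitting the runs are usually the bigger costs.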