I think we've got a long time yet for that. We're going to be writing code a lot faster, but getting these things to 90-95% on such a wide variety of tasks is going to be a monumental effort. The first 60-70% on anything is always much easier than the last 5-10%.
There's also a matter of taste. As commented above, the best way to use these is going to be running multiple runs at once. That's super expensive right now, so we'll need inference improvements on today's SOTA models before we can reasonably do it on every task. Then somebody needs to pick which run produced the best code, and even then you're probably going to want a human to review anything written by a machine.
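To make the "multiple runs, then pick one" idea concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: `best_of_n`, `fake_generate` and `fake_score` are hypothetical names, the generator stands in for a real model call, and the scoring rule is a toy heuristic, not a claim about how anyone should actually rank code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    run_id: int
    code: str
    score: float


def best_of_n(
    task: str,
    n: int,
    generate: Callable[[str, int], str],
    score: Callable[[str], float],
) -> List[Candidate]:
    """Run `generate` n times in parallel, score each output, return best-first."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves input order, so outputs[i] belongs to run i.
        outputs = list(pool.map(lambda i: generate(task, i), range(n)))
    candidates = [Candidate(i, code, score(code)) for i, code in enumerate(outputs)]
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Hypothetical stand-in for a model call (assumption, not a real API).
def fake_generate(task: str, run_id: int) -> str:
    return f"# attempt {run_id} at: {task}\n" + "pass\n" * (run_id + 1)


# Toy heuristic (assumption): shorter output scores higher.
def fake_score(code: str) -> float:
    return -float(len(code.splitlines()))


ranked = best_of_n("add retry logic", 3, fake_generate, fake_score)
winner = ranked[0]  # still goes to a human for review before merging
```

The expensive part is the `n` parallel generations, and the hard part is `score`: that's where the taste lives, whether it's a person, a test suite, or another model.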
Trusting the machine and just vibe coding is fine for small projects, or maybe even smaller features, but for a codebase that's going to be around for a while I expect we'll want a lot of human involvement in the architecture. AI can help us explore different paths faster, but humans will need to keep driving for quite some time, whether by encoding their taste into other models or by manually reviewing the output. Either way, it's going to take maintenance work.
In the near term, I expect engineering teams to start looking for ways to leverage background agents more. New engineering flows need to be built around them, and I'm bearish on the current status quo of just outsourcing everything to the beefiest models and hoping they can one-shot it. Reviewing a bunch of AI code is also terrible, and we have to find a better way of doing that.
Since we're going to be stuck figuring out background agents for a while, I expect teams will start to get in the weeds and view these agents as critical infra that needs to be designed and maintained in-house. For most companies, foundation labs will just be an API call rather than the ones hosting the agents. There's a lot that can be done with agents that hasn't been explored much yet. We're still super early here, and that's where a lot of new engineering infra work is going to come from in the next 3-5 years.