As far as I understand, the coding ability of AIs is now driven almost entirely by RL, as well as synthetic data generated by inference-time compute combined with code-execution tool use.
Coding is arguably the single thing least affected by a shortage of training data.
We're still in the very early stages of this new cycle of AI coding advancements.
Yeah... There are improvements to be made by increasing the context window and having agents reference documentation more. Half the issues I see come from agents just doing their own thing instead of following established best practices they could/should be picking up from the codebase or looking up in the documentation.
Which, believe it or not, is the same issue I see in my own code.