I agree retrieval will need to be aligned with the semantic layout of the codebase. But that should be pretty straightforward, given the number of static analysis and refactoring tools we already have available and use daily as part of our IDE workflows.
This also implies that the first codebases to really benefit from LLM collaboration will be those written in strongly typed languages, which are already amenable to static analysis.
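To make the "align retrieval with semantic layout" idea concrete, here's a rough sketch of what I have in mind: instead of splitting files into fixed-size text windows, let a parser decide where the chunk boundaries are. This uses Python's standard `ast` module purely as an illustration. The `semantic_chunks` helper, the dictionary layout, and the `SAMPLE` module are all made up for this example, and a real tool would more likely lean on a language server or the IDE's own index.

```python
import ast


def semantic_chunks(source: str) -> list[dict]:
    """Split a Python module into one chunk per top-level function or class.

    Hypothetical helper: the chunk boundaries come from the parser,
    not from a fixed character or token count.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start": node.lineno,        # 1-based first line of the definition
                "end": node.end_lineno,      # 1-based last line (Python 3.8+)
                "text": ast.get_source_segment(source, node),
            })
    return chunks


# A tiny made-up module to run the helper against.
SAMPLE = '''\
import math

def area(r):
    return math.pi * r ** 2

class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return area(self.r)
'''

for chunk in semantic_chunks(SAMPLE):
    print(chunk["kind"], chunk["name"], chunk["start"], chunk["end"])
```

Each chunk is a complete definition, so whatever gets retrieved is something a human would also recognize as a unit. The same approach should carry over to any language with a decent parser, which is part of why I'd expect the statically analyzable codebases to get there first.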
And in terms of context windows, it's not like humans keep the entire codebase in their heads at all times either. As a developer, when I'm focused on a single task, I'm only ever switching between a handful of files. And that's by design: we build our codebases using abstractions that are understandable to humans, given our own limited context window, with working memory's well-known limit of about seven items at a time. So if anything, the risk of introducing an LLM to a codebase may be that it creates abstractions more complicated than a human would want to read and maintain.