Before I started using advanced IDEs that could navigate project structures very quickly, it was normal to have relatively poor visibility -- call it "fog of war/code". In a 500,000 line C++ project (I have seen a few in my career), as a junior dev I might only understand a few thousand lines from the handful of files I had studied, and I had very little idea of the overall architecture. I see LLMs as a big opportunity here. I assume that most huge software projects developed by non-tech companies look pretty similar: organic, poorly documented, and poorly tested.
I have a question: many people have spoken about their experience of using LLMs to summarise long, complex PDFs. I am pretty ignorant on this matter. What is so different about reading a long PDF vs. reading a large codebase? Or can a modern LLM handle, say, 100 pages, but 10,000 pages is way too much? What happens when an LLM tries to read 10,000 pages and summarise them? Is the summary rubbish?
Get the LLM to read and summarise N pages at a time, and store the outputs. Then, you concatenate those outputs into one "super summary" and use _that_ as context.
There's some fidelity loss, but it works for text because there's usually so much redundancy. Roughly, it looks like the sketch below.
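A minimal sketch of that map-reduce style summarisation, in Python. The `call_llm` helper, the prompts, and the 50-pages-per-chunk figure are all made up for illustration; wire `call_llm` up to whatever model client you actually use:

    def call_llm(prompt: str) -> str:
        """Placeholder: send `prompt` to your model, return its text reply."""
        raise NotImplementedError("connect this to your LLM client")

    def chunk(pages: list[str], pages_per_chunk: int = 50) -> list[str]:
        """Group page texts into chunks small enough for the model's context."""
        return ["\n".join(pages[i:i + pages_per_chunk])
                for i in range(0, len(pages), pages_per_chunk)]

    def summarise_document(pages: list[str]) -> str:
        # Map step: summarise each chunk independently, store the outputs.
        partials = [call_llm(f"Summarise the following pages:\n\n{c}")
                    for c in chunk(pages)]
        # Reduce step: concatenate the partial summaries into one "super
        # summary" prompt and summarise once more.
        combined = "\n\n".join(partials)
        return call_llm(f"Combine these partial summaries into one coherent "
                        f"summary:\n\n{combined}")

If the concatenated partial summaries still exceed the context window, you can apply the same reduce step recursively, at the cost of further fidelity loss at each level.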
However, I'm not sure this technique could work on code.
You raise a good point. I had a former teammate who swore by Source Insight. But note what I wrote: "Before I started using advanced IDEs that could navigate project structures very quickly". I was really talking about my life before I started using advanced IDEs, when it was so hard to get a good grasp of a project and navigate it quickly.