So on a project I was working on, one of the compilation units got so unwieldy that the last part of the build was basically waiting for it alone to compile (it contained a lot of template instantiations). The solution was of course to manually "shard" it into several files, even if semantically that didn't make much sense. Now, you might argue that it was our shitty code that caused this, but surely this is a fairly simple thing for a compiler to do automatically and not require the user to do it.
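For anyone who hasn't had to do this: the manual "sharding" in a case like ours mostly comes down to explicit instantiation declarations in the header plus a handful of tiny .cpp files that each carry one slice of the instantiations, so the slices compile in parallel. A minimal sketch of the pattern (all names hypothetical, not our actual code):

    // heavy.h -- stand-in for the template-heavy header
    #pragma once
    #include <vector>

    template <typename T>
    struct Codec {
        std::vector<T> encode(const std::vector<T>& in) { return in; }
    };

    // Instantiation declarations: includers must NOT instantiate these;
    // exactly one shard below provides each definition instead.
    extern template struct Codec<int>;
    extern template struct Codec<float>;
    extern template struct Codec<double>;

    // codec_shard_int.cpp
    #include "heavy.h"
    template struct Codec<int>;      // this small TU pays for the int instantiation

    // codec_shard_float.cpp
    #include "heavy.h"
    template struct Codec<float>;    // ...and so on, one shard per slice

    // codec_shard_double.cpp
    #include "heavy.h"
    template struct Codec<double>;

It works, but the shard boundaries end up driven purely by compile times, which is exactly the kind of bookkeeping a compiler or build system could do for you.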
Even better: throw differential dataflow machinery at all the template handling of C++ code, at interprocedural constant propagation (and similar passes), and of course at the linking itself.
It's sad these foundations only seem to exist in Rust.
I recently learned about C-Reduce, which minimizes C/C++ test cases that reproduce a crash by iteratively transforming the source code and re-invoking the compiler to check that the crash still occurs.
I can imagine a similar tool that takes a set of input files, carefully instruments them somehow to determine the data interactions (a bit like the dataflow analysis mentioned in the sibling comment), and then iterates through different groupings of the code into files (automatically invoking the compiler each time) until it finds some locality-optimized arrangement of the input that also happens to compile the fastest.
On the one hand, this process would take hours; on the other hand, it could be lifted out of the compile/test cycle and run, e.g., overnight instead.
Optimizations might include tracing what you're editing right now and what it depends on, so that active work can be relocated to the smallest discrete files possible. The system could just aim to minimize the size of all input files, but weighting toward what you're currently working on might produce additional speedups; I'm not sure.
In such a model, feeding in something like Boost would result in it stripping out all the templates that are never referenced.
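To make the brute-force part of that concrete, here's a rough sketch of just the outer search loop (not a real tool; the source file names and the plain `c++ -c` invocation are placeholders): randomly group the existing sources into unity-build shards, compile, time it, and keep the best grouping seen so far. A real version would time a parallel build and use the instrumentation/dataflow information to prune the search rather than sampling at random:

    // shard_search.cpp -- hypothetical brute-force searcher
    #include <chrono>
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    int main() {
        // Placeholder inputs; a real tool would discover these itself.
        const std::vector<std::string> sources = {
            "parser.cpp", "codegen.cpp", "ir.cpp", "opt.cpp", "driver.cpp"};
        const int shards = 2;
        std::mt19937 rng(42);
        double best = 1e300;

        for (int attempt = 0; attempt < 100; ++attempt) {
            // Assign each source file to a random shard.
            std::vector<std::vector<std::string>> groups(shards);
            for (const auto& src : sources)
                groups[rng() % shards].push_back(src);

            // Emit unity-build shard files that just #include their members.
            std::string cmd = "c++ -c";
            for (int s = 0; s < shards; ++s) {
                std::ofstream out("shard" + std::to_string(s) + ".cpp");
                for (const auto& src : groups[s])
                    out << "#include \"" << src << "\"\n";
                cmd += " shard" + std::to_string(s) + ".cpp";
            }

            // Time one full compile of this grouping and remember the fastest.
            auto t0 = std::chrono::steady_clock::now();
            if (std::system(cmd.c_str()) != 0) continue;  // grouping didn't build
            double secs = std::chrono::duration<double>(
                              std::chrono::steady_clock::now() - t0).count();
            if (secs < best) {
                best = secs;
                std::cout << "new best: " << secs << "s (attempt " << attempt << ")\n";
            }
        }
    }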
To me, the biggest problem is that this entire infrastructure would need to understand very large parts of C/C++, and of course it would also need to be faster than the current infrastructure in order to actually speed anything up. I don't think there are any production-capable research analysis systems out there that can do this.
So, the likeliest path forward would be turning LLVM/GCC into something that can a) stay resident in memory (not fundamentally hard, just don't exit() :) ), b) be fed modified source code and update its analysis graph(s) accordingly, and c) (most importantly) do (b) efficiently (hah).
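Point (a) really is the easy part; below is a toy stand-in for it (the source list and the `c++` command are made up): a resident loop that keeps per-file hashes in memory and only re-invokes the compiler for files whose contents changed. All of the actual difficulty is in (b) and (c), i.e. keeping the analysis graphs themselves resident and updating them incrementally, which this does not even attempt:

    // compile_server.cpp -- crude sketch of "stay resident" only.
    // Drive it with e.g.:  printf 'build\nbuild\n' | ./compile_server
    #include <cstdlib>
    #include <fstream>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    static std::size_t hash_file(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        std::ostringstream buf;
        buf << in.rdbuf();
        return std::hash<std::string>{}(buf.str());
    }

    int main() {
        const std::vector<std::string> sources = {"a.cpp", "b.cpp"};  // placeholders
        std::map<std::string, std::size_t> last_seen;  // the "resident" state

        std::string line;
        while (std::getline(std::cin, line)) {
            if (line != "build") continue;
            for (const auto& src : sources) {
                const std::size_t h = hash_file(src);
                if (last_seen.count(src) && last_seen[src] == h)
                    continue;  // unchanged since the last request; skip it
                std::cout << "recompiling " << src << "\n";
                if (std::system(("c++ -c " + src).c_str()) == 0)
                    last_seen[src] = h;
            }
        }
    }

(It also ignores header dependencies, which a real server would have to track; the point is only the shape of the resident loop.)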
One major downside, apart from the actually-compiled output no longer corresponding to anything semantically meaningful, would be the introduction of yet another hurdle to jump over on the way to reproducible builds.
I wonder if a design like this could be [part of] an intermediate first stage toward getting something like incremental compilation into LLVM/GCC. I.e., it could start out as a (temporary) binary of its own, which would allow these features to be developed in a production-usable context without affecting the behavior of the compiler itself. Then, once it was properly built out, the compiler could be made to depend on it more and more, until either a) the compiler itself had the server mode built in, or b) the changes were so dramatic that the server mode was no longer needed (unlikely).
I say all the above as a not-compiler person. I have no idea what I'm talking about.