I just don't understand how the decision of which bits of a project need rebuilding can be so complex.
If I edit 50 lines of code in a 10GB project, then rebuild, the parts that need rebuilding are the parts that read those files when they were last built.
So... The decision of what to rebuild should take perhaps a millisecond and should certainly be doable locally.
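Concretely, the model I have in mind is roughly this (a minimal sketch with made-up names, not any real build tool's API): if every task recorded which files it read during the last build, deciding what to rebuild is a set lookup.

    # Hypothetical record of task -> files it read during the last build.
    last_build_reads = {
        "compile:foo.o": {"foo.cc", "foo.h", "common.h"},
        "compile:bar.o": {"bar.cc", "bar.h"},
        "link:app":      {"foo.o", "bar.o"},
    }

    def tasks_to_rebuild(changed_files):
        """Every task that read one of the changed files when it last ran."""
        changed = set(changed_files)
        return {task for task, reads in last_build_reads.items() if reads & changed}

    print(tasks_to_rebuild(["common.h"]))  # {'compile:foo.o'}

(Of course this assumes the read sets were recorded completely and correctly, which is where the trouble starts.)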
50 lines in a single .cc source file which is only compiled once to produce the final artifact - sure, easy to handle.
Now consider that you are editing 50 lines of source for a tool which will then need to be executed on some particular platform to generate some parts of your project.
Now consider that you are editing 50 lines defining the structure and dependency graph of your project itself.
• Adding a file. It hasn't been read before, so no tasks in your graph know about it. It helps if you can intercept and record which file patterns a build tool looks for, but you often can't know that, because programs frequently match against directory contents themselves, in ways you can't easily intercept.
• File changes that are effectively no-ops: editing a comment in a core utility shouldn't recompile the entire project. More subtly, editing method bodies in a Java program doesn't require its users to be recompiled, but editing class definitions or exposed method prototypes does.
• "Building" test cases.
• You don't want to repeat work that has been done before, so you want to cache it (e.g. switching branches back and forth shouldn't rebuild everything).
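That last point is usually handled with a content-addressed action cache: key each action by a digest of its command line plus the contents of its inputs, and reuse stored outputs on a hit, so switching branches back and forth mostly hits cache. A rough sketch, with made-up names rather than any particular system's API:

    import hashlib, json

    def action_key(command, input_paths):
        """Digest of the command plus the content of every declared input."""
        h = hashlib.sha256(json.dumps(command).encode())
        for path in sorted(input_paths):
            with open(path, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
        return h.hexdigest()

    cache = {}  # action key -> stored outputs

    def run_cached(command, input_paths, run_fn):
        key = action_key(command, input_paths)
        if key not in cache:
            cache[key] = run_fn(command)  # only executed on a cache miss
        return cache[key]

Note the catch hiding in input_paths: the cache is only correct if the declared inputs really are the complete set of things the action reads, which loops back to the first bullet.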
If your system is C based, tup [0] fulfills your request by watching the outputs of the specified commands. It isn't, however, appropriate for systems like Java that create intermediate files the developer didn't write [1].
Back to Bazel, I am of the impression that some of its complexity comes from the requirement to handle heterogeneous builds. For example, some Python development requires resolving both Python dependencies and C ones. Being good at either is a bunch of work; handling both means either rolling your own polyglot system or coordinating two existing systems as second-class citizens.
It is entirely possible that by changing 50 lines of code in a 10GB project you have to rebuild the entire project, if everything (indirectly) depends on what you just changed.
It is not at all uncommon for changes to percolate out into larger impacts than you'd expect, though, especially in projects that attempt whole-program optimizations as part of the build.
Consider anything that builds a program which is then used at build time. That's not uncommon now that ML models have grown significantly. Change that tool, and suddenly you have to rebuild the entire project if you didn't split it out into a separate graph. (I say ML, but really any simple linter or similar tool is the same here.)
> the parts that need rebuilding are the parts that read those files when they were last built...
...and the transitive closure of those parts, which is where things get complicated. It may be that the output didn't actually change, so you can prune the graph there with some smarts. It may be that the thing changed was a tool used in many other rules.
And you have to know the complete set of outputs and inputs of every action.
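A rough sketch of that pruning, which only works if you do know every action's full input and output sets (names and structure are purely illustrative): walk the graph in dependency order, re-run an action only if one of its input digests changed, and record output digests so an unchanged output doesn't dirty its consumers.

    import hashlib
    from graphlib import TopologicalSorter

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def incremental_build(actions, deps, last_digests):
        """actions: name -> (inputs, outputs, run_fn); deps: name -> set of upstream action names."""
        new_digests = dict(last_digests)
        for name in TopologicalSorter(deps).static_order():
            inputs, outputs, run = actions[name]
            if any(digest(p) != last_digests.get(p) for p in inputs):
                run()  # some input changed on disk since the last build: re-run
            # Record current digests. If a re-run produced byte-identical
            # outputs, downstream actions see no changed inputs and are
            # skipped (the "prune the graph there" early cutoff).
            for p in list(inputs) + list(outputs):
                new_digests[p] = digest(p)
        return new_digests

The same structure also covers the tool case: a codegen tool's binary just appears in the input set of every action that invokes it, so changing it dirties all of them.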