Improving C++ Builds with Split DWARF

mewse · on Oct 11, 2018

In my moderate-sized real-world project (about 700 C++ source files, about 130k lines of code), adding -gsplit-dwarf reduced the time of a single-file recompile+relink from 7.2 seconds to 5.8 seconds, which isn't bad at all for a single extra compiler flag that you never have to think about again (assuming that it works with all your other debugging tools).

But much better than that was switching to using `gold` instead of the regular `ld` (mentioned in the "remarks" section near the bottom of the article); doing this brought my time down substantially further, from 5.8 seconds down to 1.6 seconds. Even without using -gsplit-dwarf, it brought the recompile+relink time down from the original 7.2 seconds to 2.6 seconds.

I guess my codebase might be hitting a particularly bad case for `ld`. I'm kind of startled to see such a large difference for what seemed like a throwaway extra suggestion at the end of the article!

glandium · on Oct 11, 2018

Try lld; it's even faster than gold.

Also note that not so long ago, ccache didn't support split dwarf. If you're using ccache, check that your version supports it.

mewse · on Oct 11, 2018

With lld, that single-file rebuild+relink drops from 1.6 seconds (with gold) to 1.2 seconds.

lld spits out a bunch of warnings that neither ld nor gold cared about: it finds local symbols in the global symbol table of some of the third-party .so libraries I link against. Not sure to what extent I should care about that; the final build product runs the same as always.

And just so that nobody else needs to go searching: it looks like ccache added support for split DWARF debugging data in version 3.2.3, about three years ago (although I haven't actually tried it to confirm yet).

gpderetta · on Oct 11, 2018

gold can link using multiple threads (its `--threads` option).

claudius · on Oct 11, 2018

Relatedly, I found it useful to pass -g1 instead of -g to GCC. The resulting debug information is a fair bit smaller, and GCC itself also takes less RAM and less CPU time to compile template-heavy code (going from tens of GB to 1-2 GB of RAM on one particularly nasty object file).

saagarjha · on Oct 11, 2018

Have you seen any noticeable drop in the quality of the debug information available with -g1?

glandium · on Oct 11, 2018

-g1 omits information about local variables; it keeps descriptions of functions and external variables plus line-number tables. It's fine if you want stack traces, not so good when you actually want to debug.

LHxB · on Oct 11, 2018

13 minutes -> 11 minutes: "Elapsed time goes down by 15%."

3:16 minutes -> 1:42 minutes: "We get a roughly 90% speedup in elapsed time."

What's the formula for the "speedup"? I think it would be clearer to compare both results in the same terms.
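To put both quoted results in the same terms, here is a minimal sketch that computes two common conventions for each pair: the fraction by which elapsed time shrank, (old - new) / old, and the throughput gain, old / new - 1. The source does not say which formula the article used; treating the "90%" as the throughput-gain convention is an assumption. The helper names are illustrative only.

```python
def time_reduction(old_s: float, new_s: float) -> float:
    """Fraction by which elapsed time shrank: (old - new) / old."""
    return (old_s - new_s) / old_s


def speedup_gain(old_s: float, new_s: float) -> float:
    """Throughput gain: old / new - 1 (assumed to be the article's convention)."""
    return old_s / new_s - 1.0


def mmss(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes:seconds duration to seconds."""
    return minutes * 60 + seconds


# The two results quoted from the article.
pairs = {
    "13:00 -> 11:00": (mmss(13), mmss(11)),
    "3:16 -> 1:42": (mmss(3, 16), mmss(1, 42)),
}

for label, (old_s, new_s) in pairs.items():
    # The :.1% format multiplies by 100 and appends a percent sign.
    print(f"{label}: time -{time_reduction(old_s, new_s):.1%}, "
          f"speedup +{speedup_gain(old_s, new_s):.1%}")
```

This prints `time -15.4%, speedup +18.2%` for the first pair and `time -48.0%, speedup +92.2%` for the second. The article's 15% matches the time-reduction convention, while its "roughly 90%" matches the throughput-gain convention, which is why the two numbers are hard to compare directly.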

tom_ · on Oct 11, 2018

Or just print the numbers, and let the reader get whatever they want out of that...

On this subject: https://randomascii.wordpress.com/2018/02/04/what-we-talk-ab...

EdSchouten · on Oct 11, 2018

I think the math was done inversely.

1:42 -> 3:16: 90% slowdown?

leni536 · on Oct 11, 2018

We could use a logarithmic scale instead of percentages. If we defined a decibel scale for time (say, dB-sec as 10*log10(time/(1 sec))), this would be a change of about 2.84 dB in either direction.
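Worked out with that definition, taking 3:16 as 196 s and 1:42 as 102 s: the 1-second reference cancels, so the change depends only on the ratio of the two times, and the speedup and the matching slowdown differ only in sign.

```latex
\begin{aligned}
\Delta &= 10\log_{10}\frac{196\,\mathrm{s}}{1\,\mathrm{s}} - 10\log_{10}\frac{102\,\mathrm{s}}{1\,\mathrm{s}}
        = 10\log_{10}\frac{196}{102} \approx 10 \times 0.2837 \approx 2.84\ \text{dB},\\
-\Delta &= 10\log_{10}\frac{102}{196} \approx -2.84\ \text{dB}.
\end{aligned}
```

That symmetry is exactly what percentage changes lack.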

cryptonector · on Oct 11, 2018

It might be useful to have things like version-script mapfiles define interfaces separately from the actual libraries (ELF objects). Then, for shared linking, the build system (make, whatever) could notice that although an object changed, its interfaces haven't, and skip re-linking its dependents.
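A minimal sketch of that idea, assuming the build tool can already get a library's exported interface as a list of symbol names (for example, parsed from its version-script mapfile). The functions `interface_stamp` and `needs_relink` are hypothetical and not part of make or any existing build system. The stamp hashes the sorted, de-duplicated symbol list, so reordering entries does not trigger a relink, but adding or removing an exported symbol does.

```python
import hashlib
from typing import Iterable, Optional


def interface_stamp(exported_symbols: Iterable[str]) -> str:
    """Hash a library's exported interface, independent of symbol order.

    Hypothetical helper: assumes the caller already extracted the symbol
    names (e.g. from a version-script mapfile).
    """
    canonical = "\n".join(sorted(set(exported_symbols)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_relink(previous_stamp: Optional[str], exported_symbols: Iterable[str]) -> bool:
    """Relink dependents only if the interface changed or was never recorded."""
    return previous_stamp != interface_stamp(exported_symbols)


if __name__ == "__main__":
    old = interface_stamp(["foo", "bar"])
    print(needs_relink(old, ["bar", "foo"]))         # same interface, reordered
    print(needs_relink(old, ["bar", "foo", "baz"]))  # a new exported symbol
```

The first call prints `False` and the second `True`. A symbol list only captures what the linker resolves against; changes to type layouts in headers would still have to trigger recompilation through ordinary header dependencies, which this sketch does not address.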

roel_v · on Oct 11, 2018

Is this a new thing? Isn't this like the PDB files that MSVC has had since the late 1990s?

hohenheim · on Oct 11, 2018

Not new, but I for one wasn't using it. Thinking about all the time lost waiting for the linker to finish makes me sad, but I'll change my workflow now!

jcelerier · on Oct 11, 2018

it's been officially in gcc since 2012

grandinj · on Oct 11, 2018

Just be aware that various downstream dev tools like profilers don't support this very well yet