Something that stands out to me from this talk is the idea of linearizing ASTs; it reminded me of Aaron Hsu's work on Co-dfns. There was an excellent YouTube video from years ago (does anyone have the link?) where he described how his compiler, written in APL, manages its data in an array-friendly linear fashion.
I'm interested in this because these linearized tree representations lend themselves well to processing on the GPU. We use them in Vello for representing clip and blend groups, with quite deep nesting levels allowed. By contrast, the traditional pointer-chasing approach to ASTs limits parallelism and makes poor use of memory bandwidth.
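For anyone who hasn't seen the idea before, here's a minimal sketch of a linearized tree (the names and layout are mine, purely for illustration): nodes live in one flat array, and "pointers" are just indices into that array, so the whole structure can be traversed, copied, or uploaded to a GPU as one contiguous buffer.

    /* A minimal sketch of a linearized AST; names and layout are made up. */
    #include <stdint.h>

    typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

    typedef struct {
        uint8_t  kind;      /* a NodeKind */
        uint32_t lhs, rhs;  /* child indices into the same array (unused for leaves) */
        int32_t  value;     /* payload for NODE_CONST */
    } Node;

    /* (1 + 2) * 3, flattened in pre-order: index 0 is the root. */
    static const Node nodes[] = {
        { NODE_MUL,   1, 4, 0 },   /* 0: lhs = node 1, rhs = node 4 */
        { NODE_ADD,   2, 3, 0 },   /* 1: lhs = node 2, rhs = node 3 */
        { NODE_CONST, 0, 0, 1 },   /* 2: literal 1 */
        { NODE_CONST, 0, 0, 2 },   /* 3: literal 2 */
        { NODE_CONST, 0, 0, 3 },   /* 4: literal 3 */
    };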
"Linear, pointer-free IR: The typed IR is SSA-based and highly
orthogonal. An instruction takes up only 64 bits. It has up to
two operands which are 16 bit references. It's implemented with
a bidirectionally growable array. No trees, no pointers, no cry.
Heavily optimized for minimal D-cache impact, too." [0]
I am a compiler engineer, and this is not as great as it seems.
If you are replacing pointer chasing with arbitrary indexes into arrays, you are likely going to make it orders of magnitude slower for large arrays. (It's barely worth it even if the array fits in L1.)
It's counterintuitive because pointers have been optimized in hardware for about 50 years, and array indices are just pointers that your hardware doesn't know about.
The main things that go wrong are hardware prefetching and compiler optimizations taking advantage of alias analysis.
This is conjecture. OTOH, I measured this while designing the LuaJIT IR.
1. An array index is just as suitable as a pointer for dereferencing.
2. What matters is how many dereferences are needed and their locality.
3. Data structure density is important to get high cache utilization.
References show a lot of locality: 40% of all IR operands reference the previous node. 70% reference the previous 10 nodes. A linear IR is the best cache-optimized data structure for this.
That said, dereferencing an operand happens less often than one might think. Most of the time, one really needs the operand index itself, e.g. for hashes or comparisons. Again, indexes have many advantages over pointers here.
What paid off the most was to use a fixed-size IR instruction format (only 64 bits!) with 2 operands and 16-bit indexes. The restriction to 2 operands is actually beneficial, since it helps with commoning and makes you think about IR design. The 16-bit index range is not a limitation in practice (split IR chunks if you need to). The high orthogonality of the IR avoids many iterations and unpredictable branches in the compiler itself.
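For illustration, here is roughly what such a fixed-size instruction can look like in C. This is a sketch in the spirit of the description above, not LuaJIT's actual definition; the field names and the 8-bit opcode/type split are my assumptions.

    #include <stdint.h>

    typedef uint16_t IRRef;   /* a reference is just an index into the IR array */

    typedef struct {
        IRRef   op1;          /* first operand (IR index)  */
        IRRef   op2;          /* second operand (IR index) */
        uint8_t opcode;       /* which operation           */
        uint8_t type;         /* result type               */
        IRRef   prev;         /* spare 16-bit field, reused for the per-opcode
                                 chains quoted further down in the thread */
    } IRIns;                  /* 8 bytes: eight instructions per 64-byte cache line */

    /* The IR itself is just a growable array of these; ir[ref] is a dense,
     * O(1), prefetch-friendly lookup. */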
The 16-bit indexes also enable the use of tagged references in the compiler code (not in the IR). The tag caches node properties: type, flags, constness. This avoids even more dereferences. LuaJIT uses this in the frontend pipeline for fast type checks and on-the-fly folding.
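A tagged reference can be as simple as packing the 16-bit IR index together with cached properties into a single 32-bit word, so hot paths can check a type or constness without touching the IR array at all. The layout below is illustrative only; the field widths and the "const" flag bit are assumptions, not LuaJIT's actual encoding.

    #include <stdint.h>

    typedef uint32_t TRef;    /* assumed layout: [ flags:8 | type:8 | ref:16 ] */

    #define TREF(ref, type, flags) \
        ((TRef)(uint16_t)(ref) | ((TRef)(uint8_t)(type) << 16) | ((TRef)(uint8_t)(flags) << 24))

    static inline uint16_t tref_ref(TRef tr)     { return (uint16_t)tr; }
    static inline uint8_t  tref_type(TRef tr)    { return (uint8_t)(tr >> 16); }
    static inline int      tref_isconst(TRef tr) { return (tr >> 24) & 1; }  /* flag bit 0 = const (assumed) */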
I dunno, is that really true? The indices into the array can be 4 bytes instead of 8 on a 64-bit system, for example, which doubles the number of indices that fit in one cache line. And then you have the possibility of storing the in/out node references as indices as well, halving the memory there. And I know that work lists and stacks of nodes are common for certain transformations, so that would also have an impact. And that's just one thing: having an index effectively means that you can split your data structure into multiple arrays ('struct of arrays' fashion), which might mean that you're doing fewer accesses and jumps overall.
Considering all of that, I'm really not convinced that plain pointers will beat indices. I also don't see what alias analysis has to do with it in this case; it sounds like a compiler could easily do alias analysis on A being accessed only as a T by virtue of A having the type Array<T,N>.
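To make the struct-of-arrays point concrete, here is a minimal sketch (the store and field names are hypothetical): 32-bit indices halve the size of operand references versus 64-bit pointers, and splitting hot fields from cold ones means a pass that only reads opcodes streams through a dense byte array instead of striding over full node structs.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t NodeId;          /* 4 bytes instead of an 8-byte pointer */

    typedef struct {
        size_t    count;
        uint8_t  *opcode;             /* hot: scanned by nearly every pass */
        NodeId   *lhs, *rhs;          /* hot: operand references           */
        uint32_t *line, *column;      /* cold: only touched for diagnostics */
    } NodeStore;

    static size_t count_opcode(const NodeStore *s, uint8_t op) {
        size_t n = 0;
        for (size_t i = 0; i < s->count; i++)
            n += (s->opcode[i] == op);    /* one dense array, no pointer chasing */
        return n;
    }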
Are you saying a move from/to [register+offset] is an order (or orders) of magnitude slower than a move from/to [register]? And that prefetching doesn't work with it? Because that's what array indices effectively result in. And... that's not true in either case, as far as I know.
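For reference, both accesses below typically compile to a single load; the indexed one uses base-plus-scaled-index addressing (roughly "mov eax, [rdi + rsi*4]" on x86-64), not a slower multi-step sequence.

    #include <stdint.h>

    /* Dereference a plain pointer: one load via [register]. */
    uint32_t load_via_pointer(const uint32_t *p)              { return *p; }

    /* Dereference via an array index: one load via [base + index*4]. */
    uint32_t load_via_index(const uint32_t *base, uint64_t i) { return base[i]; }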
This is still really vague. Doesn't pan out how, exactly? Arrays aren't a thing at the hardware level and are only sometimes a thing in the compiler backend. Prefetching the next line helps, but it's irrelevant how the type is represented in the code. (And you can explicitly prefetch from an arbitrary address as well.) What downsides are you talking about here specifically? Where do the orders of magnitude come from?
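On the explicit-prefetch point: with GCC or Clang this is a single builtin, and it works just as well with an address computed from a base and an index. The helper below is only a hypothetical illustration.

    #include <stddef.h>

    /* Hint the CPU to pull the node at `idx` into cache ahead of its use.
     * __builtin_prefetch is GCC/Clang-specific; whether it helps depends on
     * issuing it far enough ahead of the actual access. */
    void prefetch_node(const void *nodes, size_t idx, size_t node_size) {
        __builtin_prefetch((const char *)nodes + idx * node_size);
    }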
Maybe the next paragraph in the link is a crucial bit of missing context here?
> Skip-list chains: The IR is threaded with segregated, per-opcode skip-list chains. The links are stored in a multi-purpose 16 bit field in the instruction. This facilitates low-overhead lookup for CSE, DSE and alias analysis. Back-linking enables short-cut searches (average overhead is less than 1 lookup). Incremental build-up is trivial. No hashes, no sets, no complex updates.
Seems like there is more going on than just replacing pointers with arrays 1:1.
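For a sense of how those chains get used, here is a simplified sketch of a CSE lookup over a per-opcode chain. This is my reconstruction from the quoted description, not LuaJIT's actual code; it assumes IR index 0 is reserved to mean "no instruction".

    #include <stdint.h>

    typedef uint16_t IRRef;

    typedef struct {
        IRRef   op1, op2;
        uint8_t opcode, type;
        IRRef   prev;              /* previous instruction with the same opcode */
    } IRIns;

    typedef struct {
        IRIns *ins;                /* the linear IR array                  */
        IRRef  chain[256];         /* head of chain per opcode (0 = none)  */
    } IRBuf;

    /* Return an existing instruction with the same opcode and operands, or 0. */
    static IRRef cse_lookup(const IRBuf *ir, uint8_t op, IRRef a, IRRef b) {
        for (IRRef ref = ir->chain[op]; ref != 0; ref = ir->ins[ref].prev) {
            if (ir->ins[ref].op1 == a && ir->ins[ref].op2 == b)
                return ref;        /* common subexpression found: reuse it */
        }
        return 0;                  /* not found: caller emits and links a new one */
    }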
The goal of lowering to binary at 0.1 million lines/sec isn't very ambitious. Virgil bootstraps its 59,000-line compiler in 400 milliseconds, which is already above that rate, and it does whole-program reachability analysis and optimization.
On the other hand, the 10 million lines/sec parsing goal is probably not achievable. That's roughly 400 MB/s (at ~40 bytes per line). The fastest JavaScript parser I know of, V8's, is on the order of 60-80 MB/s. Virgil's is about 45 MB/s. Maybe if you are parsing Lisp you can get to 200 MB/s, but no curly-braced language with actual syntax is going to parse that fast.
And you don't have to parse 10X faster than semantic analysis. In my experience, semantic analysis is only about 2X as expensive as parsing, so the 1 million lines/sec goal for semantic analysis is easily achievable. (Virgil is at about 800 KLoC/sec.)
Keep in mind the above is all just one core. 10- and 20-core CPUs are becoming widespread, and these two phases parallelize nicely. Just don't do C's stupid O(n^2) header madness and Carbon will compile plenty fast.
Header files in C/C++ boil down to textual inclusion. They generally end up with such complicated directive logic that a compiler can't reasonably pre-compile or even pre-parse them and get the semantics right. Large projects tend to scale up by having more and more compilation units and headers. Eventually, textual inclusion leads to an O(n^2) explosion in the amount of work the compiler has to do, and generates object files that are O(n^2) big, most of which is then de-duped by the linker. For example, compiling V8 generates a couple of gigabytes of .o files that get linked into a ~30 MB binary in the end. It all starts with header madness.
Google has experience designing languages so they can be parsed quickly; that's one of Go's design goals. JavaScript, on the other hand, was made fast out of necessity, not by design. I wouldn't take a JavaScript parser to be the be-all and end-all of parser performance. If Google has applied any of the lessons learned from Go, they could easily exceed V8, even with curly braces.
> In my experience, semantic analysis is only about 2X as expensive as parsing
Can you elaborate? Semantic analysis might cover anything from simple type checking to full program data flow analysis, and I understand the latter can be very expensive.
Good talk; however, I recommend people watch Andrew Kelley's talk [1] first. His talk has clearly had a major influence on Carbon's compiler design, plus I think Andrew's talk is a lot more practical for those not in the compiler space.
I find Carbon and the development of a modern replacement for C++ intriguing to watch. But for the life of me, I can't understand why someone would want this over a more mature language like D.
The problem with D is that it changes direction every couple of years regarding the next big feature that will finally bring the masses into the language, while not quite finishing the last big feature that predated it.
Even Andrei Alexandrescu is back doing C++ at NVidia.
I also don't know if the code I write in D today will compile 20 years from now. A few months ago there was a discussion on the D forums which spilled over here about constant churn due to breaking changes in the language.
I think Google is exploring other options as well, such as Rust. Carbon is a more aggressive experiment with higher risks. They've been very clear about Carbon's experimental status and that users should consider Rust or something else instead of Carbon if they can use those today.