Something that stands out to me from this talk is the idea of linearizing ASTs; it reminded me of Aaron Hsu's work on Co-dfns. There was an excellent YouTube video from years ago (does anyone have the link?) where he described how his compiler, written in APL, manages its data in an array-friendly linear fashion.
I'm interested in this because these linearized tree representations lend themselves well to processing on the GPU. We use them in Vello for representing clip and blend groups, with quite deep nesting levels allowed. By contrast, the traditional pointer-chasing approach to ASTs limits parallelism and makes poor use of memory bandwidth.
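For anyone who hasn't seen the idea before, here's a minimal sketch of a linearized tree (the names and layout are mine, purely for illustration): nodes live in one flat array, and "pointers" are just indices into that array, so the whole structure can be traversed, copied, or uploaded to a GPU as one contiguous buffer.

    /* A minimal sketch of a linearized AST; names and layout are made up. */
    #include <stdint.h>

    typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

    typedef struct {
        uint8_t  kind;      /* a NodeKind */
        uint32_t lhs, rhs;  /* child indices into the same array (unused for leaves) */
        int32_t  value;     /* payload for NODE_CONST */
    } Node;

    /* (1 + 2) * 3, flattened in pre-order: index 0 is the root. */
    static const Node nodes[] = {
        { NODE_MUL,   1, 4, 0 },   /* 0: lhs = node 1, rhs = node 4 */
        { NODE_ADD,   2, 3, 0 },   /* 1: lhs = node 2, rhs = node 3 */
        { NODE_CONST, 0, 0, 1 },   /* 2: literal 1 */
        { NODE_CONST, 0, 0, 2 },   /* 3: literal 2 */
        { NODE_CONST, 0, 0, 3 },   /* 4: literal 3 */
    };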
"Linear, pointer-free IR: The typed IR is SSA-based and highly
orthogonal. An instruction takes up only 64 bits. It has up to
two operands which are 16 bit references. It's implemented with
a bidirectionally growable array. No trees, no pointers, no cry.
Heavily optimized for minimal D-cache impact, too." [0]
I am a compiler engineer, and this is not as great as it seems.
If you are replacing pointer chasing with arbitrary indexes into arrays, you are likely going to make it orders of magnitude slower for large arrays. (It's barely worth it even if the array fits in L1.)
It's counterintuitive because pointers have been optimized in hardware for about 50 years, and array indices are just pointers that your hardware doesn't know about.
The main things that go wrong are hardware prefetching and compiler optimizations taking advantage of alias analysis.
This is conjecture. OTOH, I measured this while designing the LuaJIT IR.
1. An array index is just as suitable as a pointer for dereferencing.
2. What matters is how many dereferences are needed and their locality.
3. Data structure density is important to get high cache utilization.
References show a lot of locality: 40% of all IR operands reference the previous node. 70% reference the previous 10 nodes. A linear IR is the best cache-optimized data structure for this.
That said, dereferencing an operand happens less often than one might think. Most of the time, one really needs the operand index itself, e.g. for hashes or comparisons. Again, indexes have many advantages over pointers here.
What paid off the most was to use a fixed-size IR instruction format (only 64 bits!) with 2 operands and 16-bit indexes. The restriction to 2 operands is actually beneficial, since it helps with commoning and makes you think about IR design. The 16-bit index range is not a limitation in practice (split IR chunks if you need to). The high orthogonality of the IR avoids many iterations and unpredictable branches in the compiler itself.
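For illustration, here is roughly what such a fixed-size instruction can look like in C. This is a sketch in the spirit of the description above, not LuaJIT's actual definition; the field names and the 8-bit opcode/type split are my assumptions.

    #include <stdint.h>

    typedef uint16_t IRRef;   /* a reference is just an index into the IR array */

    typedef struct {
        IRRef   op1;          /* first operand (IR index)  */
        IRRef   op2;          /* second operand (IR index) */
        uint8_t opcode;       /* which operation           */
        uint8_t type;         /* result type               */
        IRRef   prev;         /* spare 16-bit field, reused for the per-opcode
                                 chains quoted further down in the thread */
    } IRIns;                  /* 8 bytes: eight instructions per 64-byte cache line */

    /* The IR itself is just a growable array of these; ir[ref] is a dense,
     * O(1), prefetch-friendly lookup. */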
The 16-bit indexes also enable the use of tagged references in the compiler code (not in the IR). The tag caches node properties: type, flags, constness. This avoids even more dereferences. LuaJIT uses this in the frontend pipeline for fast type checks and on-the-fly folding.
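A tagged reference can be as simple as packing the 16-bit IR index together with cached properties into a single 32-bit word, so hot paths can check a type or constness without touching the IR array at all. The layout below is illustrative only; the field widths and the "const" flag bit are assumptions, not LuaJIT's actual encoding.

    #include <stdint.h>

    typedef uint32_t TRef;    /* assumed layout: [ flags:8 | type:8 | ref:16 ] */

    #define TREF(ref, type, flags) \
        ((TRef)(uint16_t)(ref) | ((TRef)(uint8_t)(type) << 16) | ((TRef)(uint8_t)(flags) << 24))

    static inline uint16_t tref_ref(TRef tr)     { return (uint16_t)tr; }
    static inline uint8_t  tref_type(TRef tr)    { return (uint8_t)(tr >> 16); }
    static inline int      tref_isconst(TRef tr) { return (tr >> 24) & 1; }  /* flag bit 0 = const (assumed) */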
I dunno, is that really true? The indices into the array can be 4 bytes instead of 8 on a 64-bit system, for example, which doubles the number of indices that fit in one cache line. And then you have the possibility of storing the in/out node references as indices as well, halving the memory there. And I know that work lists and stacks of nodes are common for certain transformations, so that would also have an impact. And that's just one thing: having an index effectively means that you can split your data structure into multiple arrays ('struct of arrays' fashion), which might mean that you're doing fewer accesses and jumps overall.
Considering all of that, I'm really not convinced that plain pointers will beat indices. I also don't see what alias analysis has to do with it in this case; it sounds like a compiler could easily do alias analysis on A being accessed only as a T by virtue of A having the type Array<T,N>.
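To make the struct-of-arrays point concrete, here is a minimal sketch (the store and field names are hypothetical): 32-bit indices halve the size of operand references versus 64-bit pointers, and splitting hot fields from cold ones means a pass that only reads opcodes streams through a dense byte array instead of striding over full node structs.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t NodeId;          /* 4 bytes instead of an 8-byte pointer */

    typedef struct {
        size_t    count;
        uint8_t  *opcode;             /* hot: scanned by nearly every pass */
        NodeId   *lhs, *rhs;          /* hot: operand references           */
        uint32_t *line, *column;      /* cold: only touched for diagnostics */
    } NodeStore;

    static size_t count_opcode(const NodeStore *s, uint8_t op) {
        size_t n = 0;
        for (size_t i = 0; i < s->count; i++)
            n += (s->opcode[i] == op);    /* one dense array, no pointer chasing */
        return n;
    }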
Are you saying a move from/to [register+offset] is an order (or orders) of magnitude slower than a move from/to [register]? And that prefetching doesn't work with it? Because that's what array indices effectively result in. And... that's not true in either case, as far as I know.
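For reference, both accesses below typically compile to a single load; the indexed one uses base-plus-scaled-index addressing (roughly "mov eax, [rdi + rsi*4]" on x86-64), not a slower multi-step sequence.

    #include <stdint.h>

    /* Dereference a plain pointer: one load via [register]. */
    uint32_t load_via_pointer(const uint32_t *p)              { return *p; }

    /* Dereference via an array index: one load via [base + index*4]. */
    uint32_t load_via_index(const uint32_t *base, uint64_t i) { return base[i]; }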
This is still really vague. Doesn't pan out how, exactly? Arrays aren't a thing at the hardware level and are only sometimes a thing in the compiler backend. Prefetching the next line helps, but it's irrelevant how the type is represented in the code. (And you can explicitly prefetch from an arbitrary address as well.) What downsides are you talking about here specifically? Where do the orders of magnitude come from?
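On the explicit-prefetch point: with GCC or Clang this is a single builtin, and it works just as well with an address computed from a base and an index. The helper below is only a hypothetical illustration.

    #include <stddef.h>

    /* Hint the CPU to pull the node at `idx` into cache ahead of its use.
     * __builtin_prefetch is GCC/Clang-specific; whether it helps depends on
     * issuing it far enough ahead of the actual access. */
    void prefetch_node(const void *nodes, size_t idx, size_t node_size) {
        __builtin_prefetch((const char *)nodes + idx * node_size);
    }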
Maybe the next paragraph in the link is a crucial bit of missing context here?
> Skip-list chains: The IR is threaded with segregated, per-opcode skip-list chains. The links are stored in a multi-purpose 16 bit field in the instruction. This facilitates low-overhead lookup for CSE, DSE and alias analysis. Back-linking enables short-cut searches (average overhead is less than 1 lookup). Incremental build-up is trivial. No hashes, no sets, no complex updates.
Seems like there is more going on than just replacing pointers with arrays 1:1.
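For a sense of how those chains get used, here is a simplified sketch of a CSE lookup over a per-opcode chain. This is my reconstruction from the quoted description, not LuaJIT's actual code; it assumes IR index 0 is reserved to mean "no instruction".

    #include <stdint.h>

    typedef uint16_t IRRef;

    typedef struct {
        IRRef   op1, op2;
        uint8_t opcode, type;
        IRRef   prev;              /* previous instruction with the same opcode */
    } IRIns;

    typedef struct {
        IRIns *ins;                /* the linear IR array                  */
        IRRef  chain[256];         /* head of chain per opcode (0 = none)  */
    } IRBuf;

    /* Return an existing instruction with the same opcode and operands, or 0. */
    static IRRef cse_lookup(const IRBuf *ir, uint8_t op, IRRef a, IRRef b) {
        for (IRRef ref = ir->chain[op]; ref != 0; ref = ir->ins[ref].prev) {
            if (ir->ins[ref].op1 == a && ir->ins[ref].op2 == b)
                return ref;        /* common subexpression found: reuse it */
        }
        return 0;                  /* not found: caller emits and links a new one */
    }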
The goal of lowering to binary at 0.1 million lines/sec isn't very ambitious. Virgil bootstraps its 59,000-line compiler in 400 milliseconds, which is already above that rate, and it does whole-program reachability analysis and optimization.
On the other hand, the 10 million lines/sec parsing goal is probably not achievable. That's roughly 400 MB/s (at ~40 bytes per line). The fastest JavaScript parser I know of, V8's, is on the order of 60-80 MB/s. Virgil's is about 45 MB/s. Maybe if you are parsing Lisp you can get to 200 MB/s, but no curly-braced language with actual syntax is going to parse that fast.
And you don't have to parse 10X faster than semantic analysis. In my experience, semantic analysis is only about 2X as expensive as parsing, so the 1 million lines/sec goal for semantic analysis is easily achievable. (Virgil is at about 800 KLoC/sec.)
Keep in mind the above is all just one core. 10- and 20-core CPUs are becoming widespread, and these two phases parallelize nicely. Just don't do C's stupid O(n^2) header madness and Carbon will compile plenty fast.
Header files in C/C++ boil down to textual inclusion. They generally end up with such complicated directive logic that a compiler can't reasonably pre-compile or even pre-parse them and get the semantics right. Large projects tend to scale up by having more and more compilation units and headers. Eventually, textual inclusion leads to an O(n^2) explosion in the amount of work the compiler has to do, and generates object files that are O(n^2) big, most of which is then de-duped by the linker. For example, compiling V8 generates a couple of gigabytes of .o files that get linked into a ~30 MB binary in the end. It all starts with header madness.
Google has experience designing languages so they can be parsed quickly; that's one of Go's design goals. JavaScript, on the other hand, was made fast out of necessity, not by design. I wouldn't take a JavaScript parser to be the be-all and end-all of parser performance. If Google has applied any of the lessons learned from Go, they could easily exceed V8, even with curly braces.
> In my experience, semantic analysis is only about 2X as expensive as parsing
Can you elaborate? Semantic analysis might cover anything from simple type checking to full program data flow analysis, and I understand the latter can be very expensive.
Good talk; however, I recommend people watch Andrew Kelley's talk [1] first. His talk has clearly had a major influence on Carbon's compiler design, plus I think Andrew's talk is a lot more practical for those not in the compiler space.
I find Carbon and the development of a modern replacement for C++ intriguing to watch. But for the life of me, I can't understand why someone would want this over a more mature language like D.
The problem with D is that it changes direction every couple of years regarding the next big feature that will finally bring the masses into the language, while not quite finishing the last big feature that predated it.
Even Andrei Alexandrescu is back doing C++ at NVidia.
I also don't know if the code I write in D today will compile 20 years from now. A few months ago there was a discussion on the D forums which spilled over here about constant churn due to breaking changes in the language.
I think Google is exploring other options as well, such as Rust. Carbon is a more aggressive experiment with higher risks. They've been very clear about Carbon's experimental status and that users should consider Rust or something else instead of Carbon if they can use those today.