
I am a compiler engineer and this is not as great as it seems.

If you are replacing pointer chasing with arbitrary indexes into arrays, you are likely going to make it orders of magnitude slower for large arrays. (It's barely worth it even if the array fits in L1)

It's counterintuitive because pointers have been optimized on hardware for about 50 years, and array indices are just pointers that your hardware doesn't know about.

The main things that go wrong are hardware prefetching and compiler optimizations taking advantage of alias analysis.




This is conjecture. OTOH I measured while I designed the LuaJIT IR.

1. An array index is just as suitable as a pointer for dereferencing.

2. What matters is how many dereferences are needed and their locality.

3. Data structure density is important to get high cache utilization.

References show a lot of locality: 40% of all IR operands reference the previous node. 70% reference the previous 10 nodes. A linear IR is the best cache-optimized data structure for this.

That said, dereferencing of an operand happens less often than one might think. Most of the time, one really needs the operand index itself, e.g. for hashes or comparisons. Again, indexes have many advantages over pointers here.

What paid off the most was to use a fixed size IR instruction format (only 64 bit!) with 2 operands and 16 bit indexes. The restriction to 2 operands is actually beneficial, since it helps with commoning and makes you think about IR design. The 16 bit index range is not a limitation in practice (split IR chunks, if you need to). The high orthogonality of the IR avoids many iterations and unpredictable branches in the compiler itself.

The 16 bit indexes also enable the use of tagged references in the compiler code (not in the IR). The tag caches node properties: type, flags, constness. This avoids even more dereferences. LuaJIT uses this in the front pipeline for fast type checks and on-the-fly folding.


Did you happen to do any long form prose writing (paper, online article, etc) about the design of the LuaJIT IR? Sounds extremely fascinating.


I dunno, is that really true? The indices into the array can be 4 instead of 8 bytes on a 64-bit system for example, that would double the indices available in one cache line. And then you have the possibility of storing the in/out node references as indices also, halving the memory there. And I know that work lists and stacks of nodes are common for certain transformations, so that would also have an impact. And that's just one thing: having an index effectively means that you can split your data structure into multiple arrays ('struct of arrays' fashion), which might mean that you're doing fewer accesses and jumps overall.

Considering all of that, I'm really not convinced that plain pointers will beat indices. I also don't see what alias analysis has to do with it in this case, sounds like a compiler could easily do alias analysis on A being accessed only as a T by virtue of A having the type Array<T,N>.


Might be worth exploring linear ASTs with relative pointers, whose size could be truncated along with N-byte alignment.


profile it!


Are you saying a move from/to [register+offset] is a magnitude(s) slower than a move from/to [register]? And that prefetching doesn't work with them? Because that's what array indices effectively result in. And... that's not true for either case as far as I know.


Arrays shine when you access the elements in a predictable order.

That's what compiler backends and hardware optimize for.

Blindly replacing pointers with random indexes into large arrays doesn't pan out.


This is still really vague. Doesn't pan out how exactly? Arrays aren't a thing at hardware level and are only sometimes a thing in compiler backend. Prefetching the next line helps, but it's irrelevant how the type is represented in the code. (And you can explicitly prefetch from a random address as well) What downsides are you talking about here specifically? Where do the orders of magnitude come from?


Maybe the next paragraph in the link is a crucial bit of missing context here?

> Skip-list chains: The IR is threaded with segregated, per-opcode skip-list chains. The links are stored in a multi-purpose 16 bit field in the instruction. This facilitates low-overhead lookup for CSE, DSE and alias analysis. Back-linking enables short-cut searches (average overhead is less than 1 lookup). Incremental build-up is trivial. No hashes, no sets, no complex updates.

Seems like there is more going on than just replacing pointers with arrays 1:1.





