In practice, it's not "~500 lines". Whole control-flow statements, each with several semicolon-separated statements in its body, are crammed onto single lines. The short variable names keep the physical lines from getting too long, and there is even some symmetry to it, but it's not as if each line were a single statement with many nested subexpressions.
The real size of this code, after putting each statement on its own line, would be on the order of 300% of the current line count.
Imagine this:
if (tk == Mul) { next(); *++e = PSH; expr(Inc); *++e = MUL; ty = INT; }
Turning into this:
if (tk == Mul) {
  next();
  *++e = PSH;
  expr(Inc);
  *++e = MUL;
  ty = INT;
}
The line count isn't any less "real" just because it isn't how a mindless autoformatter would do it. The formatting conveys actual information. A line expresses one "thought". Laying it out horizontally lets the vertical direction visually convey the repeating pattern:
else if (tk == Mul) { next(); *++e = PSH; expr(Inc); *++e = MUL; ty = INT; }
else if (tk == Div) { next(); *++e = PSH; expr(Inc); *++e = DIV; ty = INT; }
else if (tk == Mod) { next(); *++e = PSH; expr(Inc); *++e = MOD; ty = INT; }
You know what else would communicate that? A function or macro.
else if (tk == Mul) { applyOperator(MUL); }
else if (tk == Div) { applyOperator(DIV); }
else if (tk == Mod) { applyOperator(MOD); }
But then you're not conforming to their arbitrary idea of "minimalism = fewer functions".
I definitely have some admiration for their picking a goal and following through on it, and there are a few tricks in there that are downright brilliant, but let's not pretend this is about effective communication.
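For what it's worth, the applyOperator helper suggested above could be sketched roughly like this. It is not taken from c4: the enum values, the buffer size, and the stubbed-out next() and expr() are assumptions made only so the fragment compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-ins for c4's globals. The names mirror the snippets
   above, but the values and definitions here are assumptions. */
enum { PSH = 1, MUL, DIV, MOD };   /* opcodes */
enum { INT = 1 };                  /* expression type tag */
enum { Inc = 1 };                  /* precedence level passed to expr() */

int code[64];                      /* emitted code buffer (size assumed) */
int *e = code;                     /* points at the last emitted slot */
int ty;                            /* type of the current expression */

static void next(void) { /* stub: would advance to the next token */ }
static void expr(int lev) { (void)lev; /* stub: would compile the right operand */ }

/* The five statements shared by every branch, parameterized on the opcode. */
void applyOperator(int op) {
    assert(e + 2 < code + 64);     /* room for the two slots written below */
    next();
    *++e = PSH;
    expr(Inc);
    *++e = op;
    ty = INT;
}
```

Each branch then shrinks to a single call such as applyOperator(MUL), at the cost of one more function than the "4 functions" in the title.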
I can agree that there's a repeating pattern in handling each of the binary operators (there are around a dozen of them), but I'm failing to see a pattern in fragments like these:
if (tk == ']') next(); else { printf("%d: close bracket expected\n", line); exit(-1); }
if (t > PTR) { *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = MUL; }
else if (t < PTR) { printf("%d: pointer type expected\n", line); exit(-1); }
This code implements pointer indexing: `p[i]`. The first line reads the closing `]`. The second line computes the pointer offset, which is equal to `i * sizeof(int)`. The third line produces an error if `p` does not have a pointer type.
I think I agree with you that this part could be refactored a bit. I would be tempted to put the "PSH" corresponding to the "i" next to when we parse the "i". I would also write the check that "p" has a pointer type before the code that indexes it.
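A rough sketch of the second suggestion, checking the type before emitting anything, might look like the following. The function name emitIndexScale is made up, the enum values and buffer size are assumptions, and it returns -1 instead of calling exit(-1) only so the fragment is easy to try in isolation. It also assumes a type encoding where plain types sort below PTR and a pointer to anything wider than char sorts above it, which is what the `t > PTR` and `t < PTR` tests in the fragment suggest.

```c
#include <assert.h>
#include <stdio.h>

enum { IMM = 1, PSH, MUL };        /* opcodes, values assumed */
enum { CHAR, INT, PTR };           /* assumed type encoding: pointer types are >= PTR */

int code[64];                      /* emitted code buffer (size assumed) */
int *e = code;                     /* points at the last emitted slot */
int line = 1;                      /* current source line, for diagnostics */

/* Emit the scaling code for p[i], validating the type of p first.
   Returns 0 on success, -1 on error (the original calls exit(-1)). */
int emitIndexScale(int t) {
    if (t < PTR) {                 /* not a pointer: report before emitting */
        printf("%d: pointer type expected\n", line);
        return -1;
    }
    if (t > PTR) {                 /* element wider than char: scale the index */
        assert(e + 4 < code + 64); /* room for the four slots written below */
        *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = MUL;
    }
    return 0;
}
```

The emitted sequence is the same as in the fragment above; only the order of the check and the emission changes.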
This is why I mostly hate (but partly love for the sake of reducing bikeshedding) it when teams add an autoformatter as a mandatory part of a code pipeline -- it destroys relevant spatial information.
If we are going to force autoformatters, we might as well just use annotated ASTs instead of text so we all see our own chosen view of the code.
The number of lines isn't really the point; it's C in 4 Functions, not C In 500 Lines.
What's more impressive is that it's self-hosted and implements just the subset of C required to compile itself, which makes it harder to keep the code short, but it manages anyways.