CPS is a cool technique, and makes advanced constructs like call/cc trivial to implement. Using CPS techniques in code can feel like magic.
But CPS isn't really the state of the art for functional compilers anymore. It's complex for a human to read and hard to optimize. Something like A-normal form is much easier to work with for a compiler intermediate representation
I wouldn’t call either “state of the art”, it all depends on what you’re doing and what you want to achieve. Many compilers use several different IRs to achieve different things
ANF transform will rewrite that to “let a0 = if x then an else b in E a0”.
This isn’t identical to floating the call to E inside the “if”, obviously, but would have (within predictable boundaries) the same performance. If this code went through LLVM to produce machine code I suspect it would wind up identical as LLVM will likely rewrite the duplicated call to be outside the “if”.
Let's say E[h] is "if h == 1 then 2 else 3", a is 1, and b is 2. Then before:
if (if x then 1 else 2) == 1 then 2 else 3
After:
if x then (if 1 == 1 then 2 else 3) else (if 2 == 1 then 2 else 3)
which trivially simplifies to if x then 2 else 3
Your proposed rewrite is "let a0 = if x then 1 else 2 in if a0 == 1 then 2 else 3" which is not a simplification at all - it makes the expression longer and actually introduces a variable indirection, making the expression harder to analyze. The call will require some sort of non-local analysis to pierce through the variable and do the duplication and elimination.
The ANF form is amenable to rewriting the “if” outside the “let”.
if x then
let a0 = a in E a0
else
let a0 = b in E a0
If a and b are simple variables or constants the new “let”s will be eliminated by ANF.
What happens to E isn’t so straightforward. In your example, though, it’s a small expression tree and would just be duplicated inline with the same obvious constant reductions available.
If it’s ‘large’ (heuristically) it might instead be assigned to a “let” binding outside the “if”, to avoid duplicating a large block of code.
CPS should be making that same determination on E.
let
x k = if x then k 1 else k 2
E k h = if h == 1 then k 2 else k 3
in
\k -> x (E k)
and then when you inline E into x it becomes
let
E k h = if h == 1 then k 2 else k 3
in
\k -> if x then E k 1 else E k 2
which can again be locally simplified to
\k -> if x then k 2 else k 3
Unlike ANF, no blind duplication is needed, it is obvious from inspection that E can be reduced. That's why ANF doesn't handle this transformation well - determining which expressions can be duplicated is not straightforward.
It looks like the compiler would need to determine it can inline E twice before it can perform a constant case fold on the "if h == 1" expression, right? It's 'obvious' if you understand what E does, but that's not a program transform. Unless I'm very mistaken, this is two steps: inline E twice, then constant case fold (or constant "if" fold, if you aren't translating all the "if"s to "case"s).
If E were a different expression, perhaps "E h = let F i = if z == i + h then 7 else 8 in if y then F 1 else F 2", then the inlining choice (made twice, here) would be very poor - F would be inlined four times all up, and since "z" is a free variable here there's no constant case/if to fold, and everything would wind up worse.
The ANF transforms should look something like:
let
a0 = if x then 1 else 2
E h = if h == 1 then 2 else 3
in
E a0
Departing from my previous analysis, I'd say that the next obvious step is the inlining of the now-only-called-once E:
let
a0 = if x then 1 else 2
in
if a0 == 1 then 2 else 3
But (either here or earler) that's not ANF, the "a0 == 1" isn't a simple constant or variable:
let
a0 = if x then 1 else 2
a1 = a0 == 1
in
if a1 then 2 else 3
Let's take the time to rewrite all those conditionals and such as case statements:
let
a0 = case x of { True -> 1; False -> 2 }
a1 = case a0 of { 1 -> True; _ -> False }
in
case a1 of { True -> 2; False -> 3 }
Now we can inline a0 into a1, since it's only called once:
let
a1 = case (case x of { True -> 1; False -> 2 }) of
{ 1 -> True; _ -> False }
in
case a1 of { True -> 2; False -> 3 }
A case-of-case transform applies to a1:
let
a1 = case x of
True -> case 1 of { 1 -> True ; _ -> False }
False -> case 2 of { 1 -> True ; _ -> False }
in
case a1 of { True -> 2; False -> 3 }
Two case of constant transforms on a1 apply:
let
a1 = case x of { True -> True ; False -> False }
in
case a1 of { True -> 2; False -> 3 }
a1 can be inlined and case-of-case once again applied:
case x of
True -> case True of { True -> 2; False -> 3 }
False -> case False of { True -> 2; False -> 3}
Case-of-constant again leaves us with just:
case x of { True -> 2; False -> 3 }
And that's our desired outcome. We took some risks doing case-of-case because of the duplicated outer case, but that can be dealt with using join points for the expressions that get duplicated; a bit more long winded but the same end result here due to the case-of-constant transform eliminating the duplicate immediately each time, followed by an inlining of an expression evaluated only once.
If there was a clear distinction between let-bindings and join points, I would be happy. But there is not - there is this contification process and, rather than proving that their algorithm finds all join points, they just say "it covers most of the ground [in practice]", and then they cite (Section 4) two other papers that suggest contification is undecidable - whether a function returns to a specific continuation or function is a behavioral property, not syntactic. Even if you accept that the dominator-based analysis is optimal, it is a global property. not a local property like in a proper IR. So what I see is a muddy mess.
Last I looked, join points hadn't made it into GHC Core directly - do you know if the ideas in the join points paper were introduced to (mainline) GHC in the end?
edit: it looks like there's distinct Core join points now, never mind!
But CPS isn't really the state of the art for functional compilers anymore. It's complex for a human to read and hard to optimize. Something like A-normal form is much easier to work with for a compiler intermediate representation
https://en.wikipedia.org/wiki/A-normal_form