Lambda lifting is an alternative to closure conversion, so it doesn't get rid of closures so much as obviate introducing them at all.
The two transformations are fairly closely related, actually. You can view lambda lifting as closure conversion plus flattening, in the case where the code pointer is unnecessary.
This is nonsense. Modern compilers don't work by generating the results you see with -O0 and then optimising; the poor quality of -O0 code is the result of skipping register allocation.
> Dataflow-directed, "work backwards" techniques might be the solution
Destination driven code generation is a known technique. It doesn't generate good enough results to have gotten much attention (better than what you get from -O0, though).
Mutually recursive functions can be closure converted without cycles by deriving closure values from each other rather than storing them in each other.
Environment copying ruses are revealed when something mutates a lexical variable, and the mutation doesn't appear everywhere as it should. (So you have to ban that.)
First, deriving is just pointer arithmetic and doesn't copy anything. Second, standard flat closure representations already involve copying parts of environments, with any sharing problems addressed by assignment conversion (turning variables that are assigned to into mutable cells, a reference to which can be copied into however many environments is necessary).
> standard flat closure representations already involve copying parts of environments, with any sharing problems addressed by assignment conversion (turning variables that are assigned to into mutable cells)
In fact, the simple assoc list representation of environments (whereby simple consing extends the environment) does this. It doesn't eliminate circularity.
If a function has a certain binding in scope, and that binding refers back to the function, you can shuffle that binding around between different environment vectors all you want. Wherever you stick that binding, as long as the binding is in scope of that function, you have circularity.
This turns out to be much simpler than expected, on paper at least.
First, this is a tracing JIT and so it basically inlines everything. There is no argument passing convention at all within a block of JIT code because subroutines bodies are completely inline. So - passing tagged values as arguments is actually a fairly infrequent operation.
Then in cases where tagged values really do need to be passed around they will always be passed by pointer reference. The JIT will write them to memory - the Lua stack or the heap - and pass a pointer reference in a register.
The really happy circumstance is that the whole C runtime system is already written to reference tagged values using pointers, and there is already a port that supports 64-bit Lua values using 32-bit machine words.
I see. It seems like that would still make calling supporting routines annoyingly expensive, but perhaps that (and the extra tag memory) doesn't matter as much as I think it does.
There are really two kinds of supporting routines here.
Runtime system routines operate on tagged values and do have to swallow the stack convention. Thankfully the code is already written in this style and the JIT seldom invokes any of these routines in optimized programs (it generates inline versions.)
Second is C routines called via FFI. These use native C data types and the standard platform calling convention. They never see the tagged values (get an untagged uint64_t in a register instead of a TValue struct.) These are easy, cheap, and frequently used.
There's a decent textbook written in the incremental style argued for by the author: Essentials of Compilation. It features compilers for seven languages, written in Racket, each with successively more advanced features.
My complaint with |> is that it makes code with nesting look quite different based on argument position. Argument position is not a very interesting property, so it doesn't seem right for it to be influencing the shape of code in that way.
My preference is to use parens and introduce single-shot variables if nesting gets too deep. This works the same way for all code.
The two transformations are fairly closely related, actually. You can view lambda lifting as closure conversion plus flattening, in the case where the code pointer is unnecessary.