Coincidentally enough, I've been neck deep in this area for the last month (I haven't read this paper, but I've read others from the group). There are a bunch of related techniques around this that look vaguely the same: abstract interpretation, staged execution, symbolic execution, and partial evaluation.
I'll give a use case (the one I'm interested in) that might help illustrate the point: dynamic neural nets. How do you optimize neural nets that have control flow and dynamic shapes (variable-size tensors)? Most (all?) hardware fast paths (GPUs, accelerators, SIMD registers) have hard (pardon the pun) requirements on memory layout; e.g., to use vector/SIMD units on CPUs effectively you have to align and chunk your data to fit the fixed-width registers. So for nets with dynamic shapes it's pretty hard to generate machine code that uses these things - most people pad to some "good enough" width and pay the price in throughput.
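To make the padding workaround concrete, here's a minimal NumPy sketch (the bucket widths are made up for illustration, not taken from any real backend): every variable-length batch gets rounded up to the nearest pre-compiled width, and the padded columns are wasted work.

    # Toy "pad to a good-enough width" workaround (NumPy only; bucket sizes are hypothetical).
    import numpy as np

    BUCKETS = [64, 128, 256, 512]  # fixed widths the kernels are assumed to be compiled for

    def pad_to_bucket(x: np.ndarray) -> np.ndarray:
        """Right-pad a (batch, length) tensor to the smallest bucket that fits."""
        length = x.shape[1]
        width = next(b for b in BUCKETS if b >= length)  # assumes length <= max bucket
        out = np.zeros((x.shape[0], width), dtype=x.dtype)
        out[:, :length] = x
        return out  # wasted columns = width - length, i.e. the throughput you pay

    batch = np.random.rand(8, 100).astype(np.float32)
    padded = pad_to_bucket(batch)  # shape (8, 128): 28 columns of padding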
All of these techniques speak to this problem (among others, obviously) by inferring as much as possible at compile time and kicking the can to runtime for the rest. Partial evaluation, the one I'm most familiar with, propagates as much statically known information as it can infer (usually over an IR, and depending on the quality of the implementation) and simplifies as it goes (think substituting known lengths of lists to simplify comparisons). After the partial-eval loop hits a fixed point (no more changes discovered), you're left with a representation of the neural net that still has unbound variables for some tensor shapes, plus code for quickly determining those unknown shapes at runtime.
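Here's a toy sketch of what that shape-level partial evaluation might look like - nothing from a real framework, just an illustration where known dims are ints and unknown dims are symbolic strings. The loop keeps propagating shape facts until nothing new is discovered; whatever stays symbolic is what has to be bound at runtime.

    # Toy partial evaluation over a tiny shape "IR" (hypothetical, not any real framework's API).
    def matmul_shape(a, b):
        # Fold the check statically when both inner dims are known; otherwise defer to runtime.
        if isinstance(a[1], int) and isinstance(b[0], int):
            assert a[1] == b[0], "static shape mismatch"
        return (a[0], b[1])

    def partial_eval(env, program):
        """Propagate shape facts until a fixed point (no more changes discovered)."""
        changed = True
        while changed:
            changed = False
            for dst, op, x, y in program:
                if dst in env:
                    continue
                if op == "matmul" and x in env and y in env:
                    env[dst] = matmul_shape(env[x], env[y])
                    changed = True
        return env

    # Weights are fully known; the input has a symbolic batch dim "n".
    env = {"w1": (784, 256), "w2": (256, 10), "x": ("n", 784)}
    program = [("h", "matmul", "x", "w1"), ("y", "matmul", "h", "w2")]
    print(partial_eval(env, program))
    # adds h -> ("n", 256) and y -> ("n", 10): everything is known up to "n",
    # which is all you can say until runtime binds the batch size.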
A really good pedagogical paper/dissertation is from the guy responsible for large parts of TVM:
https://digital.lib.washington.edu/researchworks/handle/1773...
Also, these blog posts on how abstract interpretation works in Julia are good:
https://mikeinnes.github.io/2020/07/29/mjolnir.html
https://aviatesk.github.io/posts/data-flow-problem/?s=09
The last is pretty formal but good if you're comfortable with abstraction (no pun intended).
And here's one of their papers on staged programming for NNs:
https://research.google/pubs/pub47990/