In fact, I took one of the techniques (traces) directly from a paper describing how the Bochs x86 virtual machine works :-)
But it goes deeper than that. I believe the whole "trace" thing in both JITs and VMs comes from a few papers describing trace-based instruction predecoding for hardware CPUs.
Yes, I believe that's true. The idea of a trace cache originated in the 90s to work around perceived I-cache limitations; "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching" is the seminal paper on it.
The idea of straight-line traces goes back even further, to Josh Fisher's work on trace scheduling in compilers in the early 80s (linearizing one control-flow path gives a much wider scope for optimizations).
He combined this with a VLIW processor architecture to build Multiflow, a hardware startup. (Interesting history tidbit: Robert Colwell, who architected the P6, the first out-of-order Intel core, started his career at Multiflow before joining Intel. The P6 didn't have any trace-cache influences, but the P4, a few years later, infamously did...)
Yes, nowadays x86 cores have µop caches, which store the decomposition of individual instructions, plus other "optimizers" that target specific constructs (e.g. the loop stream detector).