
Why is static recompilation pointless and why is dynamic better? If emulating a newer game console, couldn't you get better performance by running a statically recompiled game, since it doesn't have to do the extra work at runtime? Or better yet, couldn't you cross-recompile a game to run it on a platform that couldn't normally handle emulation of the target platform?



Dynamic recompilation lets you do lots of nasty tricks, and also lets you detect and work around various nasty tricks at runtime (worst case by falling back to emulation). Consider that old games often used self-modifying code, for example. Reliably detecting self-modifying code statically can be extremely hard even when it's not intentionally obfuscated. But at runtime it is "easy": write-protect all the code pages, trap the resulting page faults, and either fix up the offending code or fall back to emulation.
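
To make the trap concrete, here's a minimal POSIX sketch of that approach (the buffer name and handler are made up, and a real emulator would look up and retranslate the affected translated block instead of simply unprotecting the page):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define GUEST_CODE_SIZE 4096

    static unsigned char *guest_code;   /* hypothetical copy of the guest's code */

    /* Fires when anything writes into a page we marked read-only. A real
       implementation would first check that info->si_addr falls inside the
       guest code range, then invalidate/retranslate the block covering it.
       Here we just make the page writable again so the write can proceed. */
    static void on_code_write(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        mprotect(guest_code, GUEST_CODE_SIZE, PROT_READ | PROT_WRITE);
    }

    static void protect_guest_code(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = on_code_write;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        guest_code = mmap(NULL, GUEST_CODE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guest_code == MAP_FAILED) { perror("mmap"); exit(1); }

        /* ... load/translate guest code into guest_code here ... */

        /* Any later write into this range now traps into on_code_write. */
        mprotect(guest_code, GUEST_CODE_SIZE, PROT_READ);
    }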

In general, I think dynamic approaches are ideal when you're dealing with a "hostile" environment: the software you're translating was written with no expectation that it would be translated, and (as with old games) the programmer may have tried to squeeze everything out of the hardware. When static analysis fails to determine that something weird is going on, it's often much simpler to just detect attempts to violate your assumptions at runtime.

You can do hybrid approaches, statically making a "best effort" and including similar traps for stuff that breaks your assumptions, with a fallback to JIT or emulation. But if you do that, there's a tradeoff: the more dynamic machinery you need anyway, the less you gain over just doing everything dynamically from the start.

The performance question is not so easy to settle. JIT'ing code takes a bit of time, but not much compared to how long the program is typically run afterwards. Static compilation can spend more time on optimisations, but a JIT at least in theory has more information to work with: it can detect the specific processor version and use specialised instructions or alter instruction selection, for example, and could in principle even do tricks like rearranging data for better cache behaviour based on profiling access patterns of the current run (I have no idea if any existing JITs actually do that).
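
To illustrate the instruction-selection point: this is the kind of dispatch a JIT gets "for free" by checking the CPU once and emitting the matching instructions. The sketch below does it the ahead-of-time way with GCC/Clang builtins on x86; the function names are made up.

    #include <stddef.h>

    static void add_arrays_scalar(const float *a, const float *b,
                                  float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    /* Same loop, but the compiler is allowed to vectorise it with AVX2. */
    __attribute__((target("avx2")))
    static void add_arrays_avx2(const float *a, const float *b,
                                float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    void add_arrays(const float *a, const float *b, float *out, size_t n)
    {
        /* A JIT would make this decision once, at code-generation time,
           instead of branching on every call. */
        if (__builtin_cpu_supports("avx2"))
            add_arrays_avx2(a, b, out, n);
        else
            add_arrays_scalar(a, b, out, n);
    }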


Thanks for the thorough response. I really appreciate it.


You can't: as long as you have self-modifying code (including loading more code into memory from somewhere else), static recompilation is simply not possible. It's not a matter of dynamic being "better"; it's the only solution that works.

As video game consoles look more and more like PCs, self-modifying code becomes less frequent and easier to detect. On the Nintendo 3DS, for example, only the OS can allocate executable pages, and executable pages are forced read-only. This means you could in theory do static recompilation of a 3DS game. That's the only case I know of, though: all other recent video game consoles allow self-modifying code or dynamic code loading.


That's not really true. A lot of self-modifying code follows very predictable patterns in which the modification treats an address embedded in the code as a variable. In 6502 code in particular this is a common idiom for looping over arrays of more than 256 bytes. Many of these patterns are easy to detect and easy to rewrite statically.
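
As a made-up illustration (not from any particular recompiler): the idiom patches the operand bytes of an absolute-addressed LDA/STA inside the loop, and once that pattern is recognised, a static recompiler can lower the patched operand to an ordinary pointer in its output, e.g. in C:

    #include <stddef.h>
    #include <stdint.h>

    /* Recompiled equivalent of a self-modifying 6502 copy loop. The address
       that the original code patched into the instruction stream becomes a
       plain pointer, and the 8-bit index plus operand-increment trick becomes
       a normal loop counter. */
    static void copy_block(const uint8_t *src, uint8_t *dst, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            dst[i] = src[i];   /* originally LDA abs,X / STA abs,X with patched operands */
    }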

You likely can't handle the general case, but that's a lot less critical, especially for old consoles or computers, where the pool of software whose tricks are too complex for generic analysis is small enough that you can reasonably add special cases for most of the titles you care about.


Even so, I wonder if you could get better speed for, say, Dolphin, by pausing to do as much statically as possible (perhaps going all the way through LLVM's optimizations) rather than being in a hurry to do everything dynamically without pausing.


I wanted to experiment with this idea at some point (without pausing: using a background thread that does the optimization and patches the code with the optimized version when it's done), but our current JITCache is just not fit for this. We perform JIT compilation per basic block, which does not allow for a lot of optimization compared to per function (following all direct branches as far as possible). Changing that would've required too much time and I got bored of it :(
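
For anyone unfamiliar with the distinction, this is roughly what per-basic-block caching looks like (a simplified sketch, not Dolphin's actual JITCache; names and sizes are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef void (*compiled_block)(void);

    /* Compiles exactly one guest basic block (up to its first branch)
       starting at guest_pc; provided elsewhere in this hypothetical JIT. */
    extern compiled_block compile_basic_block(uint32_t guest_pc);

    #define CACHE_SLOTS 4096   /* power of two for cheap masking */

    static struct {
        uint32_t       guest_pc;   /* start address of the guest basic block */
        compiled_block code;       /* host code for that single block */
    } cache[CACHE_SLOTS];

    /* Every branch target becomes its own tiny compilation unit, so the
       optimiser never sees more than one block at a time; per-function
       translation would instead follow direct branches and hand the
       optimiser a much larger region to work on. */
    static compiled_block lookup_or_compile(uint32_t guest_pc)
    {
        size_t slot = (guest_pc >> 2) & (CACHE_SLOTS - 1);
        if (cache[slot].guest_pc != guest_pc || cache[slot].code == NULL) {
            cache[slot].guest_pc = guest_pc;
            cache[slot].code = compile_basic_block(guest_pc);
        }
        return cache[slot].code;
    }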



