There are of course cases that can be slower, like code that depends heavily on SIMD, as wasm support for SIMD is still being improved. But in general compiling to wasm as an intermediate IR and then using an optimizing compiler can lead to very fast code.
A year ago I tested JSON Link compiled to wasm and run in wasmtime, and the overhead was around 20-25%. That isn't bad at all considering it's the cost of running on almost anything. I didn't test it transpiled to C89 (that obviously wasn't available yet), but I think wasmtime compiled it to native machine code anyway.
No need. For the Rust compiler, like most modern compilers, the list of systems it can produce binaries for is independent of which system it is running on. Either the compiler supports the instruction set and executable format used on Mac OS 9, or it doesn’t. If it does, then you could cross-compile Rust code for Mac OS 9 from any other supported system – including cross-compiling the compiler itself if desired. If it doesn’t, then it doesn’t, and even if you managed to run the compiler on Mac OS 9 (through a translation layer or otherwise), it would still only be able to produce code for its supported targets.
An exception to this principle is the linker: the Rust compiler relies on an external linker, and it may happen that the standard linker for a system only runs on that system itself. This is true on modern macOS and Windows. In this case it’s impossible to cross-compile for the target normally (at least without using a nondefault linker such as LLD). But even in that case, it’s possible to separate the compile and link steps, so you could compile an object file on another system, copy the object file to the target system, and run the linker from there. That would be inconvenient but good enough for bootstrapping purposes, so a translation layer still wouldn’t really help.
I wonder: do those WASM-to-C transpilers actually "patch back" the original external function calls from the wasm import table into the C source, so that system API calls would be resolved directly by the system's linker? Or is there still a WASI-style indirection through a user-provided 'system API wrapper'? For example, what would be needed to call into a random Mac OS 9 UI function with this solution to create a UI 'Hello World' demo, and can that be handled purely with Rust's FFI features?
From that link, it also sounds like 14% is the best case that requires some help from the underlying OS:
"Two results are shown for WasmBoxC, representing two implementations of memory sandboxing. The first is explicit sandboxing, in which each memory load and store is explicitly verified to be within the sandboxed memory using an explicit check (that is, an if statement is done before each memory access). This has 42% overhead.
The OS-based implementation uses the “signal handler trick” that wasm VMs use. This technique reserves lots of memory around the valid range and relies on CPU hardware to give us a signal if an access is out of bounds (for more background see section 3.1.4 in Tan, 2017). That is fully safe and has the benefit of avoiding explicit bounds checks. It has just 14% overhead! However, it cannot be used everywhere (it needs signals and CPU memory protection, and only works on 64-bit systems).
There are more options in between those 14% and 42% figures. Explicit and OS-based sandboxing preserve wasm semantics perfectly, that is, a trap will happen exactly when a wasm VM would have trapped. If we are willing to relax that (but we may not want to call it wasm if we do) then we can use masking sandboxing instead (see section 3.1.3 in Tan, 2017), which is 100% portable like explicit sandboxing and also prevents any accesses outside of the sandbox, and is somewhat faster at 29% overhead. Other sandboxing improvements are possible too - almost no effort has gone into this yet."
It sounds like this last one is the most relevant to porting code to obscure platforms (which usually means embedded these days). 29% overhead for verifiably safe sandboxing is a good trade-off, but when you don't actually need that sandboxing, I wouldn't call it insignificant, especially on hardware that's slow by modern standards to begin with.
Most of the obscure architectures people talk about are rather slow—after all, if performance were desired, then there would be a modern compiler available for them—and so code running on them isn't performance critical.
That conclusion does not follow. Often the slower architectures need the higher performance code where modern computers can afford to waste some. 8051s are ridiculously common, because they're stupidly cheap, but they're so under-powered that software performance is absolutely critical.
https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
See also here, where wasm2c is the fastest wasm runtime:
https://00f.net/2023/01/04/webassembly-benchmark-2023/