It would be cool if it compiled either LCC assembly or QVM bytecode to LLVM IR. Now you have all of LLVM's optimizations! Finally, spit out the JS using Emscripten like the rest of the project.
It worked, but the main issue was related to object file generation.
Basically, you have to allocate one big i8 array to represent all of the QVM's data segments (including its bss segment). This doesn't translate well once compiled with Emscripten, as bss relocation is on a per-variable basis. For one of the Quake 3 Fortress QVMs, which was only a few hundred KB, it output a ~50 MB .bc file and a ~70 MB .js file of mostly zeros.
I experimented with storing each segment as its own i8 array (which could be zero-initialized for the bss segment) in a packed struct, which significantly lowered the .bc file size. However, while I was looking into what I'd need to change in Emscripten to support this, I decided instead to go the route of writing the runtime compiler.
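The segment layout above is roughly what a runtime compiler can do for free: copy the initialized segments into one flat heap and let the bss tail be zero by construction, so no zeros ever need to be stored in the compiled artifact. A minimal sketch (segment names and sizes here are illustrative, not taken from a real .qvm header):

```javascript
// Hypothetical sketch: lay out QVM segments in one flat heap.
// dataSeg and litSeg carry real bytes; bss contributes only a length,
// since Uint8Array memory is zero-initialized by construction.
function buildHeap(dataSeg, litSeg, bssLength) {
  const heap = new Uint8Array(dataSeg.length + litSeg.length + bssLength);
  heap.set(dataSeg, 0);             // initialized data
  heap.set(litSeg, dataSeg.length); // string literals
  // bss occupies the tail; no bytes stored for it anywhere.
  return heap;
}

const heap = buildHeap(new Uint8Array([1, 2]), new Uint8Array([3]), 4);
console.log(Array.from(heap)); // [ 1, 2, 3, 0, 0, 0, 0 ]
```

This is the property the one-big-i8-array LLVM global lost: once the zeros are materialized into an initializer, every downstream artifact has to carry them.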
Fun fact: Quake 3 mods didn't have access to malloc/free, so it wasn't uncommon to have large amounts of static, zero-initialized data.
I thought that too, but remember this all has to be done at runtime, so depending on LLVM would require an Emscripten-ized version of LLVM (does such a thing exist?) to be distributed with the generated JS, and similarly, depending on Emscripten would require a copy of Emscripten to be distributed with it. And from what I hear, the runtime performance of Emscripten itself ain't that great.
Why does this have to be done at runtime, exactly? How many mods (or rather, QVM files) were there? And are people still making them?
If the answer is "less than 100K" and "no, of course not," then you can probably get away with:
1. compiling all the QVM files to their JavaScript equivalents ahead of time;
2. stuffing them all on a CDN (with each JS blob named after the MD5 of the relevant QVM source); and
3. having a "compiler" in your client that just hashes the source it's about to "compile" and requests the blob with that hash from your CDN.
Of course, if the amount of generated JS is small enough, you could even just serve it all with the client. I'm doubting that one, though; it's probably at least 50 MB of code. (Though it might be heavily redundant code... it could compress very well!)
> Why does this have to be done at runtime, exactly?
Because that's how Quake 3 works. What you're describing would be incompatible with all other Quake 3 clients. And part of the point of implementing the QVM in the first place is compatibility.
I'm impressed, but I'm not particularly surprised. The initial code was interpreted, while the new code is JIT compiled. I have no data to back this up, but I suspect that if instead of generating javascript, he generated x86 assembly, you'd see a similar speedup. Possibly a larger one.
To add to this, transpiling is a kind of compiling where the source code and target code have roughly the same level of abstraction. asm.js does appear to be more "low-level" than the Quake bytecode, so this is compiling, not transpiling.
He's trying to conjure up a difference that just isn't there in practice. Compilation is compilation, regardless of the number of times it's performed, and regardless of whether or not the result is stored for future use.
Thanks for the pointers. Looks like asm.js and C are on par as portable assembly targets. One difference is that asm.js is ready to use by half a billion users; another is that no sane human should write asm.js directly, whereas some people swear by C as a language for humans. Allow me to clarify the initial statement:
> For 99.9% of end users, asm.js is even closer to "portable native code" than C, as C is usually delivered to end users as a precompiled binary via a separate compilation step, and is therefore no longer portable.
The reason people call C portable assembly is because it's about as low as you can go without coding in assembly, and still have it compile to various platforms.
You are really comparing asm.js and C as portable assembly emitters rather than targets. That said, people do very occasionally use C as a target the way they use asm.js, e.g. the original C++ compilers and the Mars rover.
asm.js is seemingly as low as you can go and still have it run in a modern version of Chrome or Firefox, but those requirements prevent it from being a portable assembler in the sense that C is. In addition, as I understand it, it's sensible to write your program in another language and compile it to asm.js rather than write asm.js directly.
There's a whole swathe of machines (rather than users) out there that cannot run a modern version of those browsers, i.e. asm.js is not portable to them.
If the code were written in asm.js and transpiled from it to some other destination language, then asm.js would be a portable assembler in the sense that C is.
Asm.js is just a subset of JavaScript that is far more machine-friendly than it is human-friendly. Having a worse syntax than C or normal JavaScript doesn't mean that it's closer to native code or anything like that.
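To make the "subset of JavaScript" point concrete, here is a minimal asm.js-flavored module (a sketch, not guaranteed to pass a strict asm.js validator). The `|0` coercions are type annotations in disguise, which is exactly what makes the syntax machine-friendly rather than human-friendly:

```javascript
// Minimal asm.js-style module: "|0" pins values to int32 for the
// validator, so an engine can compile add() without type guards.
// It still runs as ordinary JavaScript in any engine.
function AsmModule(stdlib, foreign, heap) {
  "use asm";
  function add(a, b) {
    a = a | 0;          // parameter a: int32
    b = b | 0;          // parameter b: int32
    return (a + b) | 0; // int32 result (wraps on overflow)
  }
  return { add: add };
}

const m = AsmModule(globalThis, {}, new ArrayBuffer(0x10000));
console.log(m.add(2, 3)); // 5
```

Note the int32 wrap-around semantics the coercions buy you: `m.add(2147483647, 1)` yields -2147483648, just like native 32-bit addition.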
And the compilation step is still there in both cases. It doesn't matter if it's just-in-time compilation or ahead-of-time compilation; it's still compilation, and it's still there.
'Translation' would be the word of choice from PL literature. But there's such a significant transformation going on here that compiling seems quite apt.