There's a post by Mike Pall, author of LuaJIT, which explains why writing C is v...

mraleph · on Aug 10, 2015

> Worth noting that IIRC, for a while LuaJIT in interpreted mode was able to beat V8 in optimized mode not all that infrequently

V8 had no optimizing compiler when Mike Pall sent his (in)famous mail about "LuaJIT interpreter beating V8 compiler"[1].

Also, the usual disclaimers about cross-language benchmarks apply (e.g. nobody looked at how those benchmarks differ between the JS and Lua implementations).

[1] http://lua-users.org/lists/lua-l/2010-03/msg00305.html

barrkel · on Aug 10, 2015

Common register allocation across different bytecode interpretation sequences was one of the things specifically on my mind that could be tuned using a high-level assembler.

"Very suboptimal" might be a slight overstatement. Given known register calling conventions, I can see a way to write an interpreter as a chain of tail calls and post-process the machine code so that each handler effectively JMPs to the next one instead of CALLing it. Guaranteed tail calls would save you a bunch of effort, and a register calling convention would give you some guarantees about consistent allocation.
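
To make the tail-call idea concrete, here is a minimal sketch in C of an interpreter whose opcode handlers each end by calling the next handler in tail position. Everything in it is made up for illustration: the three-opcode accumulator bytecode, and the names `handler_fn`, `dispatch_table`, `DISPATCH`, `run` and `demo_program`. Because every handler has the same signature, the interpreter state (`pc`, `acc`) is passed in the same argument registers on every hop, which is the "consistent allocation" part. The sketch assumes well-formed bytecode and does no bounds checking. Under clang it uses the `musttail` statement attribute when available, so the call is guaranteed to become a jump; on other compilers it falls back to a plain call that an optimizer will usually, but not necessarily, turn into a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* clang can guarantee the tail call; elsewhere we rely on the optimizer. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical bytecode: one opcode byte, optionally one operand byte. */
enum { OP_HALT = 0, OP_ADD = 1, OP_MUL = 2, OP_COUNT };

/* All handlers share one signature, so pc and acc always travel in the
 * same argument registers under the platform calling convention. */
typedef int64_t handler_fn(const uint8_t *pc, int64_t acc);

static handler_fn op_halt, op_add, op_mul;

static handler_fn *const dispatch_table[OP_COUNT] = { op_halt, op_add, op_mul };

/* Jump to the handler for the opcode at pc (assumes a valid opcode). */
#define DISPATCH(pc, acc) MUSTTAIL return dispatch_table[*(pc)]((pc), (acc))

static int64_t op_halt(const uint8_t *pc, int64_t acc)
{
    (void)pc;
    return acc;                 /* the only handler that actually returns */
}

static int64_t op_add(const uint8_t *pc, int64_t acc)
{
    acc += pc[1];               /* operand is the byte after the opcode */
    pc += 2;
    DISPATCH(pc, acc);
}

static int64_t op_mul(const uint8_t *pc, int64_t acc)
{
    acc *= pc[1];
    pc += 2;
    DISPATCH(pc, acc);
}

/* Entry point: start with acc = 0 and hand control to the first handler. */
int64_t run(const uint8_t *code)
{
    assert(code != NULL);
    return dispatch_table[code[0]](code, 0);
}

/* (0 + 2) * 3 + 1 */
static const uint8_t demo_program[] = {
    OP_ADD, 2, OP_MUL, 3, OP_ADD, 1, OP_HALT
};
```

Calling `run(demo_program)` evaluates (0 + 2) * 3 + 1 and returns 7. Without a guaranteed tail call, each `DISPATCH` may compile to a real CALL and grow the stack by one frame per bytecode instruction, which is the case the post-processing step would be patching up.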