Now, if you could reduce the overhead of OS threads by using lightweight processes instead ... Wait a moment, isn't that what Erlang does? Well, Project Loom might help you out on the JVM at some point.
The BEAM VM handles the memory copying between processes, afaik. It is very much multi-process: it has very lightweight processes, of which you can spawn thousands. Those are then scheduled onto as many OS threads as make sense and are available to the virtual machine. It is unclear what you mean by Erlang (the language) being single-threaded. The VM easily runs on as many cores as you give it; by default it starts one scheduler thread per logical core, plus a few extra OS threads.
If I wanted to get a little more performance per watt, I would probably rewrite it in C with arrays of atomic variables.
But you need a VM with GC to be able to be productive during the day and sleep at night, so probably not...
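For the curious, here is a rough sketch of what "C with arrays of atomic variables" could look like. I don't know what the real workload is, so the shared counters and the names `NUM_SLOTS`, `MAX_WORKERS`, `worker`, and `run_counters` are all made up for illustration. It assumes C11 atomics and POSIX threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_SLOTS 8     /* hypothetical: number of shared counters */
#define MAX_WORKERS 16  /* hypothetical: upper bound on OS threads */

/* Shared state: a fixed array of atomic counters. No locks, no GC. */
static atomic_long slots[NUM_SLOTS];

struct worker_args {
    long increments;    /* how many updates each worker performs */
};

static void *worker(void *p) {
    const struct worker_args *args = p;
    for (long i = 0; i < args->increments; i++) {
        /* Relaxed ordering is enough here: the counters are independent
           and are only read after all threads have been joined. */
        atomic_fetch_add_explicit(&slots[i % NUM_SLOTS], 1,
                                  memory_order_relaxed);
    }
    return NULL;
}

/* Runs `workers` OS threads against the shared array and returns the
   sum over all slots, or -1 on bad input or thread creation failure. */
long run_counters(int workers, long increments_per_worker) {
    if (workers < 1 || workers > MAX_WORKERS) return -1;

    pthread_t threads[MAX_WORKERS];
    struct worker_args args = { increments_per_worker };

    for (size_t i = 0; i < NUM_SLOTS; i++) atomic_store(&slots[i], 0);

    int started = 0;
    for (; started < workers; started++) {
        if (pthread_create(&threads[started], NULL, worker, &args) != 0)
            break;
    }
    /* Join whatever was started, even on partial failure. */
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
    if (started != workers) return -1;

    long total = 0;
    for (size_t i = 0; i < NUM_SLOTS; i++) total += atomic_load(&slots[i]);
    return total;
}
```

Calling `run_counters(4, 10000)` should give 40000, since every atomic increment lands exactly once. What you don't get is everything the VM would otherwise do for you: no isolation between workers, no supervision, and any memory you allocate is yours to free.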