Interesting - it resembles a network of heterogeneous systems that can share a memory space used primarily for explicit data exchange. Not quite what I was imagining, but probably much simpler to implement than a Unix where the kernel can see processes running on different ISAs on a shared memory space.
I guess hardware availability is an issue, as there aren't many computers with, say, an ARM, a RISC-V, an x86, and an AMD iGPU sharing a common memory pool.
OTOH, there are many where a 32-bit ARM shares the memory pool with 64-bit cores. Usually the big cores run applications while the small ARM does housekeeping or other low-latency tasks.
> Not quite what I was imagining, but probably much simpler to implement than a Unix where the kernel can see processes running on different ISAs on a shared memory space.
Indeed. The other argument is that treating the computer as a distributed system can make it scale better to, say, hundreds of cores than a lock-based SMP system does.
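A toy sketch of the idea, not how any real kernel does it: each simulated "core" owns its state privately and is updated only through messages, so no code path takes a lock on shared data. The names `core` and `run` are made up for illustration, and Python's `queue.Queue` (which does lock internally) merely stands in for a hardware message channel.

```python
import queue
import threading


def core(inbox, results):
    """Simulated core: owns its counter privately, updated only by messages."""
    local_count = 0
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message
            break
        local_count += msg
    # Report the final private state as a message too.
    results.put(local_count)


def run(num_cores=4, updates_per_core=1000):
    """Send increments to every core and sum the per-core replies."""
    inboxes = [queue.Queue() for _ in range(num_cores)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=core, args=(inbox, results)) for inbox in inboxes
    ]
    for t in threads:
        t.start()
    for inbox in inboxes:
        for _ in range(updates_per_core):
            inbox.put(1)
        inbox.put(None)
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(num_cores))


if __name__ == "__main__":
    print(run())
```

The point is only the structure: adding cores adds inboxes rather than adding contention on one shared lock.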
Until GPGPUs, there was no reason to build a machine with multiple CPUs of different architectures except to run different OSs on them (such as the Macs, Suns and Unisys mainframes with x86 boards for running Windows side-by-side with a more civilized OS). With GPGPUs you have machines with a set of processors that are good at many things but not great at SIMD, and one that's awesome at SIMD but sucks at most other things.
And, as I mentioned before, there are lots of ARM machines with 64-bit and ultra-low-power 32-bit cores sharing the same memory map. Also, even x86 variants with different ISA extensions can be treated as different architectures by the OS - Intel had to disable AVX-512 on the fast cores of its early asymmetric parts because the low-power cores couldn't execute it and OSs would not support migrating a process to the right core on an invalid instruction fault.
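For what it's worth, the migrate-on-fault policy is easy to model as a toy. This is a simulation under made-up assumptions, not what any shipping OS does: the core names, the feature sets in `CORES`, and the functions `execute` and `run_task` are all hypothetical, and an "instruction" is just the name of the ISA feature it needs.

```python
class IllegalInstruction(Exception):
    """Raised when a simulated core lacks the feature an instruction needs."""


# Hypothetical machine: one big core with AVX-512, one little core without.
CORES = {
    "big0": {"base", "avx512"},
    "little0": {"base"},
}


def execute(core, needed):
    """Simulate running one instruction; fault if the core lacks the feature."""
    if needed not in CORES[core]:
        raise IllegalInstruction(needed)


def run_task(instructions, start_core):
    """Run a task, migrating it to a capable core on an illegal-instruction fault."""
    core = start_core
    trace = []
    for needed in instructions:
        try:
            execute(core, needed)
        except IllegalInstruction:
            # Fault handler: pick any core that supports the feature and retry.
            candidates = [c for c in sorted(CORES) if needed in CORES[c]]
            if not candidates:
                raise  # no core can run this; the task really is broken
            core = candidates[0]
            execute(core, needed)
        trace.append((needed, core))
    return trace


if __name__ == "__main__":
    for step in run_task(["base", "avx512", "base"], "little0"):
        print(step)
```

In this toy the task simply stays on the big core after migrating; a real scheduler would also have to decide when it is safe to move it back.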