On x86, it was presumably for performance, so that the TLB does not have to be flushed when switching from user to kernel mode. x86 requires some kernel memory to be mapped always, for example the stack for syscall and trap handlers. So by keeping everything mapped into memory, the kernel did not have to worry about which parts were needed to handle syscalls and which were not. These kernel pages were marked as “supervisor only”, so only the kernel code could actually read and write them.
I say all of this in the past tense, since Meltdown makes it possible to read all that kernel memory. Kernels now keep most of the kernel memory unmapped when user mode is executing.
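To make the “supervisor only” marking concrete, here is a rough toy model of the permission check, not real MMU code. The bit positions match the x86 page-table entry layout (Present, R/W, U/S); the function name `can_access` is made up for illustration, and the model ignores CR0.WP, SMEP/SMAP and NX.

```python
# Toy model of the x86 page-table permission check (illustrative only).
PTE_PRESENT = 1 << 0   # P bit: the page is mapped
PTE_WRITABLE = 1 << 1  # R/W bit: writes allowed
PTE_USER = 1 << 2      # U/S bit: set = user-accessible, clear = supervisor only


def can_access(pte: int, user_mode: bool, write: bool = False) -> bool:
    """Hypothetical helper: would this access be permitted architecturally?"""
    if not pte & PTE_PRESENT:
        return False  # unmapped page: always a fault
    if user_mode and not pte & PTE_USER:
        return False  # supervisor-only page touched from user mode
    if write and not pte & PTE_WRITABLE:
        return False  # simplification: treats kernel writes like CR0.WP=1
    return True


# A kernel page that is mapped in every address space but supervisor only.
kernel_pte = PTE_PRESENT | PTE_WRITABLE
# An ordinary user page.
user_pte = PTE_PRESENT | PTE_WRITABLE | PTE_USER
```

In this model the kernel page is mapped but unreachable from user mode, which is the pre-Meltdown arrangement. Meltdown is a side channel around that architectural check, which is why the mitigation clears the mapping itself rather than relying on the U/S bit.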
>"x86 requires some kernel memory to be mapped always, for example the stack for syscall and trap handlers."
Can you elaborate on what you mean by x86 requiring that the kernel stack always be mapped into a process's address space in order for system calls to work?
The kernel always knows where a process's kernel stack is located, as there is a pointer to it in the user process's task_struct. It is only in kernel mode that the kernel switches the CPU's stack pointer to use that process's kernel stack.
You can't unmap the kernel stack as part of the Meltdown mitigation, because the syscall instruction will want to push to the kernel stack before you, as the kernel, have a chance to map it.
It sounded like the OP's comment wasn't strictly about the post-Meltdown era and that they were commenting on the general case. But maybe I misinterpreted that?
OK, in the context of 'why can't you cleanly have the kernel in a different address space from user processes on x86', the same reasons apply. It's a chicken/egg thing, as a syscall instruction executes and touches the kernel stack before you have a chance to change MMU mappings.
There are versions of Darwin for x86 (but no released versions of full OSX AFAIK) that separate the address spaces, but they reserve a (albeit much smaller) piece of virtual address space at the top for the kernel in all address spaces in order to facilitate the transition to the full kernel address space.
This is it. Most of the RISC chips had an ASID tag in the MMU metadata that allowed you to switch address spaces without flushing the TLBs, but x86 added this super late. It ended up being added in the second round of virtualization extensions on x86 (and it's different between AMD and Intel).
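A rough sketch of why the ASID tag matters, as a toy simulation rather than a model of any real chip. The class `ToyTLB`, the function `run`, and the page numbers are all made up for illustration. It counts TLB misses when bouncing between two address spaces, with and without tagging.

```python
class ToyTLB:
    """Hypothetical TLB model: optionally tags entries with an ASID."""

    def __init__(self, tagged: bool):
        self.tagged = tagged
        self.entries = {}      # (asid, virtual page) -> physical frame
        self.current_asid = 0
        self.misses = 0

    def switch(self, asid: int) -> None:
        # Without tags, stale entries could alias, so flush everything.
        if not self.tagged:
            self.entries.clear()
        self.current_asid = asid

    def translate(self, vpn: int, page_table: dict) -> int:
        key = (self.current_asid if self.tagged else 0, vpn)
        if key not in self.entries:
            self.misses += 1                   # page-table walk needed
            self.entries[key] = page_table[vpn]
        return self.entries[key]


def run(tagged: bool, round_trips: int = 3) -> int:
    """Bounce between a 'user' and a 'kernel' address space; count misses."""
    tlb = ToyTLB(tagged)
    user_pt = {0x10: 0xA0}
    kernel_pt = {0x80: 0xB0}
    for _ in range(round_trips):
        tlb.switch(1)
        tlb.translate(0x10, user_pt)
        tlb.switch(2)
        tlb.translate(0x80, kernel_pt)
    return tlb.misses


print("tagged misses:", run(True))
print("untagged misses:", run(False))
```

With tagging, only the first touch of each page misses; without it, every switch throws away the other side's entries. That flush cost is what keeping the kernel mapped in every process's address space was avoiding.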