Because the OS has no say. A running program issues an assembly instruction to the CPU to read or write this register, and the CPU complies.
For the OS to have a say, the CPU would need to provide a way for the OS to tell it (usually by setting certain values in other registers) that it should not allow access, at least under certain circumstances.
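For concreteness, this is roughly all it takes from userspace; a sketch, assuming a toolchain that accepts the generic s3_5_c15_c10_1 system-register name used in the article:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Read and write the register directly from an ordinary process.
     * No syscall is involved; the CPU simply executes the MRS/MSR
     * instructions, because nothing at EL1 is set up to trap them. */
    static inline uint64_t read_reg(void) {
        uint64_t v;
        __asm__ volatile("mrs %0, s3_5_c15_c10_1" : "=r"(v));
        return v;
    }

    static inline void write_reg(uint64_t v) {
        __asm__ volatile("msr s3_5_c15_c10_1, %0" : : "r"(v));
    }

    int main(void) {
        write_reg(0x2);   /* the register is per-cluster state, visible to other cores */
        printf("register reads back as 0x%" PRIx64 "\n", read_reg());
        return 0;
    }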
The article actually does go into certain situations where the access is more restricted (search for "VHE"), and also into why that does not really apply here.
Yes, you can introduce new code, but the kernel should also watch for that (JIT compilation etc.) and check the resulting code. It's quite involved, and the whole process looks more like a sandbox or emulator, but it's possible.
> originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel.
There is no indication that the M1 has updatable microcode, nor any other features that might allow such mitigation. (If it did, Apple would've fixed it; I did give them a 90 day disclosure warning and they're not lazy about fixing actual fixable bugs.)
There are more specific answers here, but in general the answer to this question is "only partly". The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code. During your time slice, you can do anything you want with the CPU, and in general only interrupts (timer interrupts, hardware interrupts, page faults, etc.) will cause the kernel to get involved again. There are some specific features that CPU designers add to give extra control to the kernel, but those are features of the CPU, and the kernel only has that kind of control where the CPU explicitly provides it.
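You can actually watch this from userspace. The sketch below (AArch64-only) reads the EL0-visible virtual counter in a tight loop with no syscalls at all; the occasional huge gap between two consecutive reads is an interrupt, typically the scheduler's timer, pulling execution back into the kernel for a while:

    #include <stdint.h>
    #include <stdio.h>

    /* cntvct_el0 is the generic-timer virtual counter, readable at EL0. */
    static inline uint64_t vct(void) {
        uint64_t v;
        __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
        return v;
    }

    int main(void) {
        uint64_t prev = vct(), max_gap = 0;
        /* No syscalls in here, yet the thread still gets preempted. */
        for (long i = 0; i < 100000000; i++) {
            uint64_t now = vct();
            if (now - prev > max_gap)
                max_gap = now - prev;
            prev = now;
        }
        printf("largest gap between consecutive reads: %llu ticks\n",
               (unsigned long long)max_gap);
        return 0;
    }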
> The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code.
Somewhat critically, it will also drop down to EL0.
Registers aren't resources you access through syscalls; there's no way for the kernel to control them unless you're running under virtualization or the CPU architecture specifically allows access control for the register. (As the site notes, virtualization allows controlling access to this register.)
Can the kernel scan each page it maps as executable and return an error if it finds instructions interacting with the 'bad' register? Assuming the kernel requires executable pages to be read-only (W^X), this may even be doable (but probably very, very slow).
It does require that, but it allows flipping between RX and RW at will (for JITs), and the M1 actually has proprietary features to allow userspace to do this without involving the kernel, so the kernel couldn't re-scan when those flips happen (plus it would kill performance anyway).
Plus, as I said above, this is prone to false positives anyway because the executable section on ARM also includes constant pools.
Ah, yes, I forgot about that. So indeed there is no non-racy hook point for the kernel to do such a check, even if it made sense and the RX/RW switch went through the kernel, which it doesn't.
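To make the scan (and the false-positive problem) concrete, it would boil down to something like this. The constants are what MRS/MSR of s3_5_c15_c10_1 should encode to under the standard ARMv8 system-register-move layout, derived by hand here, so treat them as an assumption:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical check over an executable page: look for MRS/MSR
     * instructions targeting s3_5_c15_c10_1.
     *
     * Encoding assumption (standard ARMv8 "system register move" layout):
     *   MRS x<t>, s3_5_c15_c10_1  ->  0xd53dfa20 | Rt
     *   MSR s3_5_c15_c10_1, x<t>  ->  0xd51dfa20 | Rt
     * Masking out bit 21 (read vs. write) and bits 4:0 (the GPR) covers both. */
    #define SYSREG_MOVE_MASK  0xffdfffe0u
    #define BAD_SYSREG_MATCH  0xd51dfa20u

    static int page_touches_bad_reg(const uint32_t *words, size_t nwords) {
        for (size_t i = 0; i < nwords; i++)
            if ((words[i] & SYSREG_MOVE_MASK) == BAD_SYSREG_MATCH)
                return 1;
        return 0;
    }

    /* The catch: a match only proves the 4-byte pattern is present, not that
     * it is ever executed as an instruction. A constant pool (or any data
     * embedded in the executable section) containing the same 32-bit value
     * trips the check just as well. */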
> Because pthread_jit_write_protect_np changes only the current thread’s permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region.
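For reference, the per-thread toggle those docs describe is used roughly like this; a sketch for Apple Silicon, where emit_code() is a hypothetical stand-in for the actual code generator, error handling is omitted, and hardened-runtime builds additionally need the JIT entitlement:

    #include <libkern/OSCacheControl.h>   /* sys_icache_invalidate */
    #include <pthread.h>                  /* pthread_jit_write_protect_np */
    #include <stddef.h>
    #include <sys/mman.h>

    void emit_code(void *buf, size_t len);   /* hypothetical codegen */

    typedef void (*jit_fn)(void);

    void emit_and_run(size_t len) {
        /* One RWX-capable region requested with MAP_JIT; each thread then
         * sees it as either writable or executable, never both at once. */
        void *region = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);

        pthread_jit_write_protect_np(0);   /* this thread: region is now writable   */
        emit_code(region, len);
        pthread_jit_write_protect_np(1);   /* this thread: region is now executable */
        sys_icache_invalidate(region, len);

        ((jit_fn)region)();                /* jump into the freshly generated code  */
    }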
The kernel doesn't get a say in what instructions a userspace program can run, other than what the CPU is designed to allow it to control. The bug is that the CPU designers forgot to allow it to control this one.
Let's say someone submits a malicious keyboard app with the bad instructions hidden in a constant pool.
Apple can't just scan for a bad byte sequence in executable pages because it could also represent legitimate constants used by the program. (not sure if this part is correct?)
If so, doesn't that make detection via static analysis infeasible unless LLVM is patched to avoid writing bad byte sequences in constant pools? Otherwise they have to risk rejecting some small number of non-malicious binaries, which might be OK, depending on the likelihood of it happening.
I believe that Rice's theorem is about computability, not about whether or not it is possible to validate which CPU instructions a program can contain.
With certain restrictions, it is possible to do this: Google Native Client [1] has a verifier which checks that the programs it executes do not jump into the middle of other instructions, forbids run-time code generation inside such programs, etc.
(What other kinds of instructions? Genuinely asking.)
I don't think Rice's Theorem applies here. As a counterexample: On a hypothetical CPU where all instructions have fixed width (e.g. 32 bits), if accessing a register requires the instruction to have, say, the 10th bit set, and all other instructions don't, and if there is no way to generate new instructions (e.g. the CPU only allows execution from ROM), then it is trivial to check whether there is any instruction in ROM that has bit 10 set.
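A sketch of what the whole "verifier" reduces to in that hypothetical (bit 10 and the fixed-width/ROM-only rules are the made-up conventions from the example above):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical CPU: fixed 32-bit instructions, execution only from ROM,
     * and the forbidden register is reachable only by instructions with
     * bit 10 set. Checking the whole ROM is a single linear pass. */
    static int rom_touches_register(const uint32_t *rom, size_t ninstr) {
        for (size_t i = 0; i < ninstr; i++)
            if (rom[i] & (1u << 10))
                return 1;   /* contains an instruction that can touch the register */
        return 0;
    }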
The next part I'm less sure how to state rigorously (I'm not in the field): in our hypothetical CPU, disallowing that instruction either leaves you Turing complete or it doesn't. In the former case, you can still compute everything a Turing machine can.
You'd have to add one extra condition to your hypothetical CPU: that it can't execute unaligned instructions. Given that, then yes, that lets you bypass Rice's theorem, even though it is indeed still Turing-complete.
But the M1 does have a way to "generate new instructions" (i.e., JIT), so that counterexample doesn't hold for it.
Yes, indeed, I should have stated "cannot execute unaligned instructions". Or have said 8 bit instead, then it would be immediately obvious what I mean. (You cannot jump into the middle of a byte because you cannot even address it.)
But I wanted to show how Rice's Theorem does not generally apply here. You can make up other examples: a register that needs an instruction with a length of 1000 bytes, yet the ROM only has 512 bytes of space, etc.
As for JIT, also correct (hence my condition), though that's also a property of the OS and not just the M1 (and on iOS, for example, which code is allowed to JIT is far more restricted, as was stated in the thread already).
With the way Apple allows implementation of JIT on the M1 (with their custom MAP_JIT flag and pthread_jit_write_protect_np) it is actually possible to do this analysis even with JIT code. Since it enforces W^X (i.e. pages cannot be writable and executable at the same time), it gives the OS an opportunity to inspect the code synchronously before it is rendered executable. Rosetta 2's JIT support already relies on this kind of inspection to do translation of JIT apps.
It does when running native ARM code (but not x86 code), but AFAIK nothing stops Apple from changing this to being kernel-mediated by updating libSystem in the ARM case as well. Of course I doubt they would take the performance hit just to get rid of this issue.
1) the program does not contain an instruction that touches s3_5_c15_c10_1
2) the program contains an instruction that touches s3_5_c15_c10_1, but never executes that instruction
3) the program contains an instruction that touches s3_5_c15_c10_1, and uses it
Rice's theorem means we cannot tell whether a program will touch the register at runtime (as that's a dynamic property of the program). But that's because we cannot tell case 2 from case 3. It's perfectly decidable whether a program is in case 1 (as that's a static property of the program).
Any sound static analysis must have false positives -- but those are exactly the programs in case 2. It doesn't mean we end up blocking other kinds of instructions.