While Marcan has written in a very entertaining fashion, there is perhaps one application of this vulnerability that wasn't considered.
If this can be reproduced on the iPhone, it can lead to 3rd party keyboards exfiltrating data. By default, keyboard app extensions are sandboxed away from their owning applications [0], but they may communicate with the app over this channel and leak data. It's not as easy as I describe because the app would have to be alive and scheduled on the same cluster, but it's within the realm of possibility.
> However, since iOS apps distributed through the App Store are not allowed to build code at runtime (JIT), Apple can automatically scan them at submission time and reliably detect any attempts to exploit this vulnerability using static analysis (which they already use). We do not have further information on whether Apple is planning to deploy these checks (or whether they have already done so), but they are aware of the potential issue and it would be reasonable to expect they will. It is even possible that the existing automated analysis already rejects any attempts to use system registers directly.
As I mentioned below and on the disclosure page, it's trivial for Apple to reliably detect this in apps submitted to the App Store and reject them, so I'm not worried. There's no such thing as "obfuscated" malware in the traditional sense on the App Store. You can obfuscate the code flow all you want, but all executable code has to be signed to run on iDevices. If you try to use this register, the instruction will be there for all to see. You can't use self-modifying code or packers on iOS.
I expect Apple to include checks for this in their App Store static analyzer, if they aren't already rejecting sysreg instructions, which mitigates the issue. Obviously JIT isn't allowed in the App Store, so this should be an effective strategy.
JITC is irrelevant actually. This is not an argument for blocking it.
Firstly, no normal JITC will ever emit instructions that access undocumented system registers. Any JITC that comes from a known trusted source (and they're expensive to develop, so they basically all do) would be signed/whitelisted already and not be a threat anyway.
So what about new/unrecognised programs or dylibs that request JITC access? Well, Apple already insist on creating many categories of disallowed things in the App Store that can't be detected via static analysis. For example, they disallow changing the behaviour of the app after it is released via downloaded data files, which is both very vague and impossible to enforce statically. So it doesn't fundamentally change the nature of things.
But what if you insist on being able to specifically fix your own obscure CPU bugs via static analysis? Well, then XNU can just implement the following strategy:
1. If a dylib requests a JITC entitlement, and the Mach-O CD Hash is on a whitelist of "known legit" compilers, allow.
2. Otherwise, require pages to be W^X. So the JITC requests some writeable pages, fills them with code, and then requests the kernel to make the pages executable. At that point XNU suspends the process and scans the requested pages for illegal instruction sequences. The pages are hot in the cache anyway and the checks are simple, so it's no big deal. If the static checks pass, the page is flipped to be executable but not writeable and the app can proceed.
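A minimal sketch of the step-2 check, assuming hypothetical hook names (nothing like this exists in XNU today; only the instruction encodings are real):

```c
/* Hypothetical sketch of the W^X flip check from step 2 above. The hook
 * name and types are invented for illustration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* msr/mrs s3_5_c15_c10_1, xN: everything is fixed except Rt (bits 4:0)
 * and the read/write bit (bit 21), i.e. 26 fixed bits to match against. */
#define SYSREG_MASK  0xFFDFFFE0u
#define SYSREG_MATCH 0xD51DFA20u

static bool page_touches_covert_reg(const uint32_t *insn, size_t count) {
    for (size_t i = 0; i < count; i++)
        if ((insn[i] & SYSREG_MASK) == SYSREG_MATCH)
            return true;
    return false;
}

/* Called when a non-whitelisted JIT asks to flip a page from RW to RX;
 * the page is still hot in the cache, so the scan is cheap. */
bool allow_rw_to_rx_flip(const void *page, size_t len) {
    return !page_touches_covert_reg(page, len / sizeof(uint32_t));
}
```

A real implementation would also have to deal with constant pools that happen to contain the same byte pattern, which is the false-positive problem discussed further down.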
Apple's ban on JITC has never really made much technical sense to me. It feels like a way to save costs on static analysis investment and to try to force developers to use Apple's own languages and toolchains, with security being used as a fig leaf. It doesn't make malware harder to write, but it definitely exposes Apple to possible legal hot water, as it means competitors can't build competitive first-party web browsers for the platform. The only thing that saves them is their own high prices and refusal to try to grab high enough market share.
Possibly, the article has been updated in the last couple of hours, but it now says:
*What about iOS?*
iOS is affected, like all other OSes. There are unique privacy implications to this vulnerability on iOS, as it could be used to bypass some of its stricter privacy protections. For example, keyboard apps are not allowed to access the internet, for privacy reasons. A malicious keyboard app could use this vulnerability to send text that the user types to another malicious app, which could then send it to the internet.
Only if detection requires solving the halting problem. It does not. You just look for certain instructions that normal code shouldn't use. JIT isn't allowed (which means all instructions the program uses can be checked statically), so it should be easy enough.
Marcan said elsewhere in the thread that the executable section on ARM also includes constant pools, so if I understand correctly, you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.
The real saving grace here is that iOS app binaries are submitted as LLVM IR instead of ARM machine code.
> you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.
Uh, no? This is very tractable - O(N) in the size of the binary - just check, for every single byte offset in executable memory, whether that offset, if jumped to or continued to from the previous instruction, would decode into a `msr s3_5_c15_c10_1, reg` or `mrs reg, s3_5_c15_c10_1` instruction.
IIUC, the decoding of an M1 ARM instruction doesn't depend on anything other than the instruction pointer, so you only need one pass, and you only need to decode one instruction at each offset, since the following instruction will occur at a later byte address.
Edit: unless its executable section isn't read-only, in which case static analyzers can't prove much of anything with any real confidence.
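A sketch of that scan over a buffer holding a binary's executable section, checking every byte offset as described (conservative, since AArch64 can only execute 4-byte-aligned instructions):

```c
/* Sketch of the byte-offset scan proposed above. AArch64 instructions are
 * fixed 32-bit little-endian words, so each candidate offset needs only a
 * single mask-and-compare against the fixed bits of the two encodings. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SYSREG_MASK  0xFFDFFFE0u   /* fixed bits of msr/mrs s3_5_c15_c10_1 */
#define SYSREG_MATCH 0xD51DFA20u

void scan_text_section(const uint8_t *text, size_t len) {
    for (size_t off = 0; off + 4 <= len; off++) {  /* every byte offset */
        uint32_t word;
        memcpy(&word, text + off, sizeof(word));   /* unaligned-safe read */
        if ((word & SYSREG_MASK) == SYSREG_MATCH)
            printf("possible s3_5_c15_c10_1 access at offset 0x%zx\n", off);
    }
}
```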
Yes but if program constants are in executable memory, then you can end up with byte sequences that represent numeric values but also happen to decode into the problematic instructions.
For example, this benign line of code would trip a static analyzer looking for `msr s3_5_c15_c10_1, x15` in the way you described:
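Something like this, say (an illustrative reconstruction; 0xD51DFA2F is, I believe, the encoding of `msr s3_5_c15_c10_1, x15`):

```c
/* Illustrative only: an ordinary-looking constant whose value happens to
 * equal the encoding of `msr s3_5_c15_c10_1, x15` (0xD51DFA2F, assumed).
 * If the compiler emits it into a constant pool inside the text section,
 * a naive scan for that byte pattern flags it even though it is data. */
#include <stdint.h>

static const uint32_t table_seed = 0xD51DFA2F;  /* hypothetical app constant */
```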
I said false positives are an issue in the context of a "dumb" real-time kernel-side scan. App Store submission is different. They can afford to have false positives and have a human look at them to see if they look suspicious.
There are 26 fixed bits in the problem instructions, which means a false positive rate of one in 256MiB of uniformly distributed constant data (the false positive rate is, of course, zero for executable code, which is the majority of the text section of a binary). Constant data is not uniformly distributed. So, in practice, I expect this to be a rather rare occurrence.
I just looked at some mac binaries, and it seems movk and constant section loads have largely superseded arm32 style inline constant pools. I still see some data in the text section, but it seems to mostly be offset tables before functions (not sure what it is, might have to do with stack unwinding), none of which seems like it could ever match the instruction encoding for that register. So in practice I don't think any of this will be a problem. It seems this was changed in gcc in 2015 [0], I assume LLVM does the same.
Only on watchOS is Bitcode required (to support the watch's 32-bit to 64-bit transition), on all other platforms it's optional and often turned off, as it makes a variety of things harder, like generating dSYMs for crash reporting.
Oh. Then I don't see how this can be reliably mitigated, other than patching LLVM to avoid writing the `msr s3_5_c15_c10_1` byte sequence in constant pools and then rejecting any binary that contains the byte sequence in an executable section. That seems difficult to get done before someone is able to submit a PoC malicious keyboard to the store, potentially turning this "joke" bug into a real problem. What am I missing?
W^X, except for transmuting user code pages to data pages (reading its own code should be fine, since it was loaded from a user binary anyhow), or a supervisor-level JIT helper to check and transmute user data pages into user code pages (checking that user-mode JITs aren't being naughty).
There are often two kinds of loadable data pages: initialized constants (RO) and initialized variables (RW), so some will need to be writable, because pesky globals never seem to die. Neither of these should ever have execute permission, or that will cross the streams and end the universe. I'm annoyed when constants or constant pools are loaded into RW data pages, because it doesn't make sense.
So, it's basically an honor system. You cannot detect JIT, because there aren't "certain instructions" that aren't allowed - it's just certain registers that programs shouldn't access (but access patterns can be changed in branching code to ensure Apple won't catch it in their sandboxes).
Besides, even if certain instructions are not allowed, a program can modify itself, and it's hard to detect whether a program modifies itself without executing it under specific conditions, or running it in a hypervisor.
You're missing the point: JIT not being allowed means programs may not modify themselves. They're in read+execute only memory and cannot allocate writable+executable memory.
iPhones use A12/13/14 chips, and the vulnerability is not confirmed there. Also, the post mentions that if you have two malware apps on your device, they can communicate in many other ways, so I'm not sure what's new here.
iPhones do not use the A1 chip as of quite a few years ago. Besides, the M1 and the A12+ have significant microarchitectural similarities, to the point that the DTK used the A12Z.
Furthermore, the keyboard app extension and the keyboard app are installed as a single package whose components are not supposed to communicate, hence why I brought this up.
Hearing an S-Tier hacker call a fellow S-Tier hacker B-Tier is certainly entertaining, but from my lowly perspective they're still far more capable than 99% of devs I'll ever encounter.
> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo
I tried, but I also talked about it on public IRC before I knew it was a bug and not a feature, so I couldn't do much about that part. ¯\_(ツ)_/¯
This whole site is a good read. A great mix of real information, jokes, and a good send-up of how some security releases appear these days (I understand to a degree the incentives that cause those sites to be as they are, and I don't think they are all bad, but it's still good and useful to poke fun at them, I think).
This is Mark Kettenis, who, despite comments made jokingly by marcan, has been working with a few other OpenBSD developers to bring up OpenBSD/arm64 on the Apple M1. At least on the Mac mini the Gigabit Ethernet works, as does Broadcom Wi-Fi, and work on the internal NVMe storage is progressing.
I'm almost as impressed that m1racles.com was available as I am with people who are good enough at this kind of reverse engineering that they can do it for fun.
I'll give them a couple too: M1GHT, M1CRO*, M1ASMA (M1ASTHMA?), M1D*, M1FFED (with some 0xFFED somewhere?), M1GRATE (for some particularly pesky data extraction hack?), M1LES (for some unit conversion bug that makes the first MacOs-based spaceship crash)
I'm constantly surprised what domains are still available. I've registered many 2/3-letter domains (with 3-4 letter TLDs) in the past year, as well as ones for very common nouns (some also 3 letters), almost always for under $40. Admittedly it's mostly for the newer TLDs, though.
Similar story. I own a half-dozen relatively recently-registered three-letter domains at two-letter ccTLDs. I’m surprised every time one turns out to be available at normal rates.
> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo
You can already communicate between apps without going through the kernel by using shared memory - with much higher bandwidth. And even just the regular write/sendmsg/etc. calls are probably more efficient despite going through the kernel, due to being able to carry many more bytes.
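For instance, plain POSIX shared memory already gives two cooperating processes a direct data path (a minimal sketch; error handling trimmed, and the segment name is made up):

```c
/* Minimal POSIX shared-memory sketch of the point above: cooperating
 * processes that are allowed to share memory can already exchange data
 * directly, no covert channel needed. "/demo_shm" is an arbitrary name. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    strcpy(buf, "hello from the other app");   /* a second process mapping
                                                  the same name sees this */
    munmap(buf, 4096);
    close(fd);
    return 0;
}
```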
This was really just a good joke touching on how the game industry in the past used uncommon hardware features for optimization purposes.
Synchronization primitives AFAIK don't need to transfer huge amounts of data in a short time. One bit for every "okay" signal would suffice. At the given speed you can perform 8 million syncs per second between two threads.
I checked this out to find out just... information I guess? I don’t own an M1 but plan to get an ARM Mac when I can budget it. Good to be aware of the landscape.
I was not expecting such an entertaining FAQ. Good job, very informative, very amusing!
Why would you spend money on crappy and locked down hardware that can't be fixed. A computer that you don't own but basically rent. Get a Lenovo Thinkpad and join the light side, you'll be amazed!
Whatever your opinions on Apple's policies and behavior, it's just ignorant to call the M1 'crappy' when it absolutely annihilates any processor in its class and doesn't at all get embarrassed when compared to high-end desktop CPUs.
CPUs are a chump's game, and it's no surprise that Apple, the company with sole access to next-generation silicon, was able to reach last-generation performance on a laptop chip. Nobody freaked out when AMD's Ryzen 7 4800U hit 4 GHz over 8 cores; I don't see a reason why I should freak out now when Apple's doing it with 10 fewer watts.
Plus, that's only the CPU side of things. The M1's GPU is annihilated by most GPUs in its class... from 2014. Fast forward to 2021, and its graphics performance is honestly pathetic. Remember our friend the 4800U? Its integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it.
So yeah, I think there are a lot of workloads where the M1 is a pretty crappy CPU. Unless your workload is CPU-bound, there's not really much of a reason to own one. And even still, the M1 doesn't guarantee compatibility with legacy software. It doesn't have a functional hypervisor, and it has lower IO bandwidth than most CPUs from a decade ago. Not really something I'd consider viable as a "daily driver", at least for my workload.
"CPUs are a chump's game" - what? High performance CPUs which nevertheless use very little power are extremely difficult to design.
"AMD's Ryzen 7 4800u hit 4ghz over 8 cores" - It doesn't. AMD specifies it as having 1.8 GHz base clock, 4.2 GHz max boost clock. AMD's cores use ~15W each at max frequency. Since the 4800U's configurable TDP range is 10W to 25W for the whole chip, there is no way that all 8 cores run at 4.2 GHz simultaneously for any substantial period of time. In fact, running even one core in its max performance state probably isn't sustainable in a lot of systems which opt to use the 4800U's default 15W TDP configuration.
On the other side of things, Apple M1 performance cores use ~6W each at max frequency. It is actually possible for all four to run at full performance indefinitely with the whole chip using about 25W, provided there is little GPU load.
"Remember our friend the 4800u? It's integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it." - Say what? The only direct comparison I've been able to find is 4700U vs M1, in Anandtech's M1 article, and it shows the M1 GPU as 2.6x faster in GFXBench 5.0 Aztec Ruins 1080p offscreen and 2.5x faster in 1440p high.
Granted, the 4700U GPU is a bit slower than the 4800U GPU, but not by a factor of 2 or more.
This isn't an unexpected result given that the M1's GPU offers ~2.6 single-precision TFLOPs while the 4800U's is ~1.8 TFLOPs.
Literally everything you wrote about M1 being bad is wrongheaded in the extreme, LOL.
Not being viable as your daily driver does not make it crappy.
But you heard it here first, guys: building CPUs is a chump's game. And you see no reason to celebrate the first genuinely viable, power-efficient and fast non-x86 CPU being a mass success. Fine I guess, but I don't agree.
Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
> Not being viable as your daily driver does not make it crappy.
What does it make it then? Some unicorn device that I'm unworthy of? Is there something wrong with my workload, or Apple's? Apple is marketing the M1 to computer users. I'm a computer user, and I cannot use it as part of my workflow, so I have every right to voice that concern to Apple.
> And you see no reason to celebrate the first genuinely viable, power-efficient and fast non x86 CPU being a mass success.
You must be late to the party; ARM has been around for years. Apple's power efficiency is about on par with what should be expected from a 5nm ARM chip with a gimped GPU. What is there to celebrate, that Apple had the initiative to buy out the entirety of the 5nm node at TSMC, plunging the entire world into a semiconductor shortage unlike anything ever seen before? Yeah, great job Apple. I think it was worth disrupting the global economy so you could ship your supercharged Raspberry Pi /s
> Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
CPU-bound workloads absolutely exist, but who's running them on a Mac? Hell, more importantly, who's running them on ARM? x86 still has a better value proposition than ARM in the datacenter/server market, and most local workloads are hardware-accelerated these days. I really don't know what to tell you.
Yeah, after two failed MacBooks from 2016 because of their SSDs, I can just say: stay away from Apple hardware until they reverse course on storage devices.
I've been stumbling through writing a pile of secure software development lifecycle management and disclosure practices documentation all evening, and desperately needed a bit of levity. This post delivered. Thank you.
Also, I am still not sure if this is a disclosure, performance art, or extremely dry comedy, but it certainly covered all the bases.
> Newton OS users: I guess those are technically Apple Silicon but...
The Newton wasn't really Apple Silicon:
The OMP/MP100/MP110/MP120/MP130 ran an ARM610.
The eMate300 ran an ARM710.
The MP2000/MP2100 ran a DEC StrongARM SA-110 CPU.
None of which were designed or manufactured by Apple.
I did say "designed or manufactured" ... but I'll concede the point that they had some ownership of the 610/710, at least.
On 27 Nov 1990, ARM was formed with Apple owning 43% alongside Acorn (the designer), and VLSI Technology (the manufacturer).
Funny thing: I've found two articles that claim two different purchase prices for that 43%: one $3M [0] and the other $1.5B [1]. That's quite a difference!
This is the best thing I've seen on the internet for a long time. Hopefully some people (tech journalists and twitter folks) will "fall for it" and learn along the way...
I suppose you could use it to create a "covert suite" of apps for the M1 iPad that talk to each other where they aren't supposed to. Sharing permission X from app 1 with app 2 that isn't supposed to have permission X, etc.
If you put this in your app directly, Apple can just find it and reject it at submission time. If JIT were an option, that wouldn't be enough, because the app could do it at runtime. Since it isn't, there is no way to "hide" something like this from the App Store static analyzer.
Hrm. It seems like inline ASM allows for passing the register name dynamically, though I can't tell for sure. If that's the case, it seems like it would be hard to tell ahead of time, other than "app calls msr/mrs".
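For what it's worth, the direct access looks like this from C (a minimal sketch; the system register name is part of each mrs/msr instruction's encoding, so whichever register a signed binary touches is right there in the bytes):

```c
/* Minimal sketch: touching the s3_5_c15_c10_1 register from userspace on
 * an affected core. The register name is baked into the instruction
 * encoding, so these opcodes are visible to any static scan of the binary. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t covert_read(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, s3_5_c15_c10_1" : "=r"(v));
    return v;
}

static inline void covert_write(uint64_t v) {
    __asm__ volatile("msr s3_5_c15_c10_1, %0" : : "r"(v));
}

int main(void) {
    covert_write(covert_read() ^ 1);   /* flip one of the two writable bits */
    printf("register now reads: 0x%llx\n", (unsigned long long)covert_read());
    return 0;
}
```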
> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.
> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)
That's reassuring to read. I opened the page, read a bit of it, pressed play on the video and scrubbed around a bit, got irritated and closed the tab. I figured if it mattered I would wait until better coverage came out.
> It violates the OS security model. You're not supposed to be able to send data from one process to another secretly.
I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
Without this vulnerability, there would still be a million ways to send data between cooperative processes running as different users on Mac OS X.
For example, a process could start subprocesses at a deterministic rate and the other end of the covert link observes how fast the pid counter is going up.
This is a non-vulnerability, because it targets something there was no effort to protect.
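A toy sketch of that pid-counter trick, receiver side (thresholds and timing are made up, and real pid allocation is messier than this):

```c
/* Toy receiver for the pid-counter covert channel described above: sample
 * how fast the system pid counter advances by forking a throwaway child
 * each interval; a cooperating sender modulates the rate by spawning (or
 * not spawning) bursts of short-lived processes. Illustrative only. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t sample_pid_counter(void) {
    pid_t pid = fork();
    if (pid == 0) _exit(0);        /* child exits immediately */
    waitpid(pid, NULL, 0);
    return pid;                    /* parent learns the freshly allocated pid */
}

int main(void) {
    pid_t prev = sample_pid_counter();
    for (int i = 0; i < 16; i++) {         /* read 16 bits */
        usleep(100 * 1000);                /* 100 ms per "symbol" */
        pid_t cur = sample_pid_counter();
        int bit = (cur - prev) > 50;       /* big jump => sender was forking */
        printf("%d", bit);
        prev = cur;
    }
    printf("\n");
    return 0;
}
```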
It's not really a vulnerability as the FAQ states, but it violates the operating system's own application isolation policies. If you don't want your Facebook app to talk to your Instagram app (e.g. different accounts for different purposes), you should be able, as a user, to block communication between the two. This is a backdoor to circumvent that.
I mean not that anyone has a native Facebook or Instagram app on their device, but just to name an example.
> I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
All of them.
A piece of software able to read my mail but not use the Internet could credibly be a tool to help me index and find my email using search keywords. It promises to not use the Internet, and indeed nm/objdump shows no use of networking tools.
Another piece of software able to monitor RSS feeds I am interested in and alert me to their changes is expected to use the Internet, but not the filesystem, and surely not the part of the filesystem that contains my email. I can use strace/dtruss to verify it never touches the filesystem, and use chroot/jail to keep it honest.
This being said, I agree that "mainstream operating systems" (meaning Windows and macOS, but not perhaps iOS) don't do enough and it might be impossible for them without changing user expectations[1], but I think they're trying. Web browsers disabled high resolution timers specifically to protect against this sort of thing. iOS doesn't permit arbitrary background tasks from running to protect battery and ostensibly privacy. But they could all do better.
[1]: For example, for me high CPU load is a red flag - a program that does this to me regularly gets put into a VM so that I can mess with its time-- Zoom now loses about a minute every three if it's not focused which is annoying because it messes with the calendar view, but I'm pretty sure it can't do anything else I don't want it to. Who should do this work? My operating system? Zoom? Neither will do it if users don't demand it.
So my point as it applies to this example: the email indexing program could communicate towards the rss program using cpu or storage load spikes. And no widely used multitasking OS tries to prevent this.
> What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
The answer will depend on whether you consider Multi-Level Security (MLS) https://en.wikipedia.org/wiki/Multilevel_security "mainstream". It's certainly a well-established approach if only in an academic sense, and the conflux of new use cases (such as secretive, proprietary "apps" being expected to manage sensitive user data) and increasingly-hard-to-mitigate info disclosure vulnerabilities has made it more relevant than ever.
There are two bits in a CPU register that are shared between all of its processes and that any process can write to. The result is that two sandboxed processes that are supposed to be totally isolated from each other can use this to communicate anyway. One example of how this can be exploited is cross-app tracking: if you told one app your name and another your location, they could secretly communicate with each other so both apps end up with both pieces of information.
Because the OS has no say. A running program issues an assembly instruction to the CPU to read or write this register, and the CPU complies.
For the OS to have a say, the CPU would need to provide a way for the OS to tell it (usually by setting certain values in other registers) that it should not allow access, at least under certain circumstances.
The article actually does go into certain situations where the access is more restricted (search for "VHE"), but also in how that does not really apply here.
Yes, you can introduce new code but the kernel should also watch for that (JIT compilation etc.) and check the resulting code. It's quite involved, and the whole process looks more like a sandbox or emulator, but it's possible.
> originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel.
There is no indication that the M1 has updatable microcode, nor any other features that might allow such mitigation. (If it did, Apple would've fixed it; I did give them a 90 day disclosure warning and they're not lazy about fixing actual fixable bugs.)
There's more specific answers here, but in general the answer to this question is "only partly". The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code. During your time slice, you can do anything you want to the CPU, and in general only interrupts (timer interrupts, hardware interrupts, page faults, etc) will cause the kernel to get involved again. There are some specific features that CPU designers add to give extra control to the kernel, but that's a feature of the CPU and it's only when the CPU has explicitly added that type of control.
> The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code.
Somewhat critically, it will also drop down to EL0.
Registers aren't resources you access through syscalls, there's no way for the kernel to control them unless you're running under virtualization or the CPU architecture specifically allows access control for the register. (As the site notes, virtualization allows controlling access to this register)
Can the kernel scan each page it maps as executable and return an error if it finds instructions interacting with the 'bad' register? Assuming the kernel requires executable pages to be read-only (W^X), this may even be doable (but probably very very slow).
It does require that, but it allows flipping between RX and RW at will (for JITs), and the M1 actually has proprietary features to allow userspace to do this without involving the kernel, so the kernel couldn't re-scan when those flips happen (plus it would kill performance anyway).
Plus, as I said above, this is prone to false positives anyway because the executable section on ARM also includes constant pools.
Ah, yes, I forgot about that. So indeed there is no non-racy hook point for the kernel to do such a check, even if it made sense and the RX/RW switch went through the kernel, which it doesn't.
> Because pthread_jit_write_protect_np changes only the current thread’s permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region.
The kernel doesn't get a say in what instructions a userspace program can run, other than what the CPU is designed to allow it to control. The bug is the CPU designers forgot to allow it to control this one.
Let's say someone submits a malicious keyboard with the bad instructions hidden in a constant pool.
Apple can't just scan for a bad byte sequence in executable pages because it could also represent legitimate constants used by the program. (not sure if this part is correct?)
If so, doesn't that make detection via static analysis infeasible unless LLVM is patched to avoid writing bad byte sequences in constant pools? Otherwise they have to risk rejecting some small number of non-malicious binaries, which might be OK, depending on the likelihood of it happening.
I believe that Rice's theorem is about computability, not about whether or not it is possible to validate which CPU instructions a program can contain.
With certain restrictions, it is possible to do this: Google Native Client [1] has a verifier which checks that the programs it executes do not jump into the middle of other instructions, forbids run-time code generation inside such programs, etc.
(What other kinds of instructions? Genuinely asking.)
I don't think Rice's Theorem applies here. As a counterexample: On a hypothetical CPU where all instructions have fixed width (e.g. 32 bits), if accessing a register requires the instruction to have, say, the 10th bit set, and all other instructions don't, and if there is no way to generate new instructions (e.g. the CPU only allows execution from ROM), then it is trivial to check whether there is any instruction in ROM that has bit 10 set.
The next part I'm less sure how to state rigorously (I'm not in the field): in our hypothetical CPU, disallowing that instruction either leaves you Turing-complete or it doesn't. In the former case, you can still compute everything a Turing machine can.
You'd have to add one extra condition to your hypothetical CPU: that it can't execute unaligned instructions. Given that, then yes, that lets you bypass Rice's theorem, even though it is indeed still Turing-complete.
But the M1 does have a way to "generate new instructions" (i.e., JIT), so that counterexample doesn't hold for it.
Yes, indeed, I should have stated "cannot execute unaligned instructions". Or have said 8 bit instead, then it would be immediately obvious what I mean. (You cannot jump into the middle of a byte because you cannot even address it.)
But I wanted to show how Rice's Theorem does not generally apply here. You can make up other examples: A register that needs an instruction with a length of 1000 bytes, yet the ROM only has 512 bytes space etc...
As for JIT, also correct (hence my condition), though that's also a property of the OS and not just the M1 (and on iOS for example, it is far more restricted what code is allowed to do JIT, as was stated in the thread already).
With the way Apple allows implementation of JIT on the M1 (with their custom MAP_JIT flag and pthread_jit_write_protect_np) it is actually possible to do this analysis even with JIT code. Since it enforces W^X (i.e. pages cannot be writable or executable at the same time) it gives the OS opportunity to inspect the code synchronously before it is rendered executable. Rosetta 2’s JIT support already relies on this kind of inspection to do translation of JIT apps.
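Concretely, the flow looks roughly like this on Apple Silicon macOS (a sketch using the documented MAP_JIT / pthread_jit_write_protect_np APIs; error handling omitted, and depending on how the binary is signed the com.apple.security.cs.allow-jit entitlement may be required):

```c
/* Sketch of the Apple Silicon JIT flow referenced above: pages come from
 * mmap(MAP_JIT), the thread toggles them between writable and executable
 * with pthread_jit_write_protect_np, and the instruction cache is
 * invalidated before jumping into the freshly written code. */
#include <libkern/OSCacheControl.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

int main(void) {
    uint32_t *page = mmap(NULL, 0x4000, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT, -1, 0);

    pthread_jit_write_protect_np(0);   /* make the thread's JIT pages writable */
    page[0] = 0x52800540;              /* mov w0, #42 */
    page[1] = 0xD65F03C0;              /* ret         */
    pthread_jit_write_protect_np(1);   /* back to executable, no longer writable */
    sys_icache_invalidate(page, 2 * sizeof(uint32_t));

    jit_fn fn = (jit_fn)page;
    return fn();                       /* returns 42 */
}
```

The pthread_jit_write_protect_np(1) transition is the natural point for the kind of inspection described above.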
It does when running native ARM code (but not x86 code), but AFAIK nothing stops Apple from changing this to being kernel-mediated by updating libSystem in the ARM case as well. Of course I doubt they would take the performance hit just to get rid of this issue.
1) the program does not contain an instruction that touches s3_5_c15_c10_1
2) the program contains an instruction that touches s3_5_c15_c10_1, but never executes that instruction
3) the program contains an instruction that touches s3_5_c15_c10_1, and uses it
Rice's theorem means we cannot tell whether a program will touch the register at runtime (as that's a dynamic property of the program). But that's because we cannot tell case 2 from case 3. It's perfectly decidable whether a program is in case 1 (as that's a static property of the program).
Any sound static analysis must have false positives -- but those are exactly the programs in case 2. It doesn't mean we end up blocking other kinds of instructions.
Sounds like this is by design and not really a newly discovered vulnerability. Maybe more of a discovery of deceptive advertising/documentation? Which is to say that Apple's engineers are reading this as non-news.
There is a small bit of memory that all programs on your computer share that isn’t protected in any way. If two misbehaving programs on your computer wanted to communicate in a really really secret way, they could use it.
If you don’t have misbehaving programs on your computer that want to secretly communicate, then it doesn’t matter.
How about randomising/resetting these bits from the kernel whenever there is a syscall? Not a great workaround, but this should limit the effectiveness of leaking. Yeah, there will be a tiny perf hit due to the extra register read and write.
> Wait, didn't you say on Twitter that this could be mitigated really easily?
> Yeah, but originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel. Other than running in EL1/0 with TGE=0 (i.e. inside a VM guest), there's no known way to block it.
In other words: this register is shared between cores, so if the two processes are running simultaneously on different cores, they can communicate by reading & writing directly to & from this register, without any operating system interaction.
Unfortunately, you can use this to send thousands of bits between syscalls, so the simplest error correction would work around that, with very little effort or overhead.
The demo already uses error correction (I'm not sure exactly what causes the errors, but I'm guessing the processes sometimes end up briefly scheduled on the other core cluster)
It seems like there's a partial mitigation available to the OS here. When scheduling a task, write a random value to the two user-writable bits. When the task is unscheduled, if the bits do not match, terminate the task. This effectively makes writing to the register an OS-enforced illegal operation with a 75% chance of being caught within 10 ms if the channel is being used at full bandwidth. (The writer can reduce the chance of it being caught proportional to reduced use of channel bandwidth by resetting it to the OS-chosen value after a bit is transmitted.) The reader can't be detected this way, but since the channel requires cooperation between the writer and reader, catching either is fine. Not a perfect fix, but would help, and would also give visibility into whether this is used in the wild -- e.g., report to Apple via crash reporting mechanism if a process is terminated this way, which would allow prompt discovery of app store apps that abuse the channel.
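A rough sketch of that scheme (hypothetical hook names; the register is simulated with a variable here so the sketch stands alone, where the real thing would be a privileged mrs/msr on s3_5_c15_c10_1 in the context-switch path):

```c
/* Hypothetical sketch of the canary mitigation described above; none of
 * these hooks exist in any kernel. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint64_t simulated_reg;                        /* stand-in for the sysreg */
static uint64_t read_reg(void)        { return simulated_reg; }
static void     write_reg(uint64_t v) { simulated_reg = v; }

struct task { uint64_t canary; bool flagged; };

void on_context_switch_in(struct task *t) {
    t->canary = (uint64_t)rand() & 0x3;               /* seed the two writable bits */
    write_reg((read_reg() & ~(uint64_t)0x3) | t->canary);
}

void on_context_switch_out(struct task *t) {
    if ((read_reg() & 0x3) != t->canary)
        t->flagged = true;    /* the bits changed while this task was on-core:
                                 terminate and/or report, as described above */
}
```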
I don't think so, no. If it has microcode it's probably burned into sequencer tables, not updatable. I was kind of hoping Apple would have some chicken bit register up their sleeve as a last resource fix (e.g. "trap on instruction encodings matching this mask"), but given that they seem to have no useful mitigation for it, I don't think they do.
Is it possible Apple have the silicon functionality to fix this, but have decided it isn't worth fixing?
After all, process isolation between cooperating processes is nearly impossible to achieve. If Apple close this loophole, there will be other, lower-bandwidth side channels, like spinning up the fan in Morse code while the other process notices the clock speed scaling up and down...
They're using zero so far [0], and until they need it for something else it wouldn't make sense not to use it for this. The CPU tunables aren't fuses or anything, the OS configures them (m1n1 in our case)
It's an implementation-defined register, which means it's up to Apple to define it. We have no idea what it does; we haven't observed any visible effects from flipping those bits. Given that it's per-cluster, we can infer that it has something to do with cluster-specific logic. Perhaps memory interface or power control.
There are hundreds of Apple implementation-defined registers; we're documenting them as we learn more about them [0] [1] [2]
I googled it for you and, err, came up blank; there are just two code references in some ASM code, and the rest points to this resource. Weird, I would have thought things like this would have public documentation.
In all seriousness, I wonder what the actual issue is.
Could anyone comment as to the implications of only supporting a Type 2 hypervisor that is (as said on the site) "in violation of the ARMv8 specification"?
The implications are just that OSes that assume otherwise won't run; Linux used to work (by chance) until a patch that just about coincided with our project went in that used the non-VHE ("type 1") mode by default, which broke it, and then we had to add an explicit workaround for the M1.
It's just a very unfortunate coincidence that precisely that support would allow this bug to be trivially mitigated on Linux. (Wouldn't help macOS, as they'd have to implement this from scratch anyway; it's just that existing OSes that support this mode could use it).
The actual issue is just what I described: the hardware implementation of this register neglects to check for and reject accesses from EL0 (userspace). It's a chip logic design flaw. I don't know exactly where it is (whether in the core/instruction decoder, or in the cluster component that actually holds the register, depending on where they do access controls), but either way that's what the problem is.
You can still solve the issue in VHE mode, since you can still implement a Type 1 hypervisor in VHE mode. It's just that, well, nobody does that, because why would they? That's what non-VHE mode is for.
So it's not that not following the spec prevents the workaround, it's just that had they followed the spec it would just take a single kernel command line argument (to force non-VHE mode) to fix this in Linux, while instead, now we'd have to make major changes to KVM to make the non-VHE code actually work with VHE, and really nobody wants to do that just to mitigate this silly thing.
Had this been a more dangerous flaw (e.g. with DoS or worse consequences), OSes would be scrambling to make major reworks right now to mitigate it in that way. macOS would have to turn its entire hypervisor design on its head. Possible, but not fun.
I had to use that one for this demo for obvious reasons, but if I'm allowed the shameless plug, I actually make my own music in the same genre (Touhou rearrangements) [0]. I'm actually very much looking forward to moving my music production to M1 and seeing what the real-time performance is like, though that will depend on us having at least a usable Rosetta-like thing on Linux to run x86 Windows apps (which will allow me to bridge the few x86 Windows plug-ins I rely on with yabridge, as I do today on x86) :-)
That's awesome! I'm definitely thinking about getting an M1 for realtime keys, though I'm all set up in Logic/MainStage so I'll probably stick with macOS for now :)
On the Linux side, would qemu user-mode emulation work for that (maybe with a patch to take advantage of the M1's switchable-memory-order thing)?
I think qemu would work fine, but it's pretty slow, so I'm hoping it can either be improved or another project more focused on this use case can do it better.
If nothing else though, I plan to expose at least the TSO feature of the M1 so qemu can reduce the overhead of its memory accesses.
It seems like a single bit available to all apps but that no one is really using now. I wonder if an easy software mitigation could be just polluting it intentionally.
Thankfully, Apple should be able to statically analyze apps to look for this on App Store submission, as the App Store does not allow dynamic code (JITs).
At the core, DO NOT TRACK prevents Apps having access to the Advertising Identifier. So different Apps cannot aggregate their analytics data about the users.
This vulnerability enables different Apps to communicate a super cookie for cross-app tracking. A possible exploit would be to implement this feature in an AD SDK to be used by different Apps.
Oooo! Depending on your taste you're in for either a very boring movie or the experience of a lifetime. Complete with an associated rabbit hole of mystery surrounding the director: https://en.wikipedia.org/wiki/The_Room
> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.
> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)
> If you already have malware on your computer, that malware can communicate with other malware on your computer in an unexpected way.
> Chances are it could communicate in plenty of expected ways anyway.
> That doesn't sound too bad.
> Honestly, I would expect advertising companies to try to abuse this kind of thing for cross-app tracking, more than criminals. Pretty sure Apple could catch them if they tried, though (for App Store apps).
I was actually planning to pick up a new Mac today and I’ve been on the fence over M1 or Intel for months. My biggest con for the M1 is how proprietary it is. With Intel, you know it’s been battle-tested for years. Things like this (at best, is an oversight, at worst, it’s the tip of the iceberg) make the decision a little bit easier…
> With Intel, you know it’s been battle-tested for years.
This is quite an interesting statement to make in the wake of Spectre and Meltdown (and vulnerabilities in that class being discovered what seems like every couple of months or so).
It's a bit amusing to me how much we overlook clear, even huge flaws in products and processes because they're old - sorry, Battle-Tested™ - while zeroing in on lesser flaws in newer ones.
Don't know why you've been downvoted. It all depends on the software that you use everyday (job or whatever).
If the software does not run on an M1 Mac or requires involved workarounds to get it 'working' and it breaks on any update or it prevents you from doing your work, then don't waste your time and just skip it for now.
The content of the website basically makes that point itself. It's mocking the whole concept of these vulnerability websites, while also presenting a real (but not very impactful) vulnerability.
Wait, did he just, as an exploit proof of concept, infiltrate some catchy music & video clip? With live rendering on the same CPU? :>
Anyway - Apple did it again! In shiny new hardware for "creative" ppl they introduced hardware backdoors... like FireWire and Thunderbolt. Seriously, there must be some market for spying on writers and painters. Or anyone who does things and is rich...
[0]: https://developer.apple.com/library/archive/documentation/Ge...