While Marcan has written in a very entertaining fashion, there is perhaps one application of this vulnerability that wasn't considered.
If this can be reproduced on the iPhone, it can lead to 3rd party keyboards exfiltrating data. By default, keyboard app extensions are sandboxed away from their owning applications [0], but they may communicate with the app over this channel and leak data. It's not as easy as I describe because the app would have to be alive and scheduled on the same cluster, but it's within the realm of possibility.
> However, since iOS apps distributed through the App Store are not allowed to build code at runtime (JIT), Apple can automatically scan them at submission time and reliably detect any attempts to exploit this vulnerability using static analysis (which they already use). We do not have further information on whether Apple is planning to deploy these checks (or whether they have already done so), but they are aware of the potential issue and it would be reasonable to expect they will. It is even possible that the existing automated analysis already rejects any attempts to use system registers directly.
As I mentioned below and on the disclosure page, it's trivial for Apple to reliably detect this in apps submitted to the App Store and reject them, so I'm not worried. There's no such thing as "obfuscated" malware in the traditional sense on the App Store. You can obfuscate the code flow all you want, but all executable code has to be signed to run on iDevices. If you try to use this register, the instruction will be there for all to see. You can't use self-modifying code or packers on iOS.
I expect Apple to include checks for this in their App Store static analyzer, if they aren't already rejecting sysreg instructions, which mitigates the issue. Obviously JIT isn't allowed in the App Store, so this should be an effective strategy.
JITC is irrelevant actually. This is not an argument for blocking it.
Firstly, no normal JITC will ever emit instructions that access undocumented system registers. Any JITC that comes from a known trusted source (and they're expensive to develop, so they basically all do) would be signed/whitelisted already and not be a threat anyway.
So what about new/unrecognised programs or dylibs that request JITC access? Well, Apple already insist on creating many categories of disallowed things in the App Store that can't be detected via static analysis. For example, they disallow changing the behaviour of the app after it is released via downloaded data files, which is both very vague and impossible to enforce statically. So it doesn't fundamentally change the nature of things.
But what if you insist on being able to specifically fix your own obscure CPU bugs via static analysis? Well, then XNU can just implement the following strategy:
1. If a dylib requests a JITC entitlement, and the Mach-O CD Hash is on a whitelist of "known legit" compilers, allow.
2. Otherwise, require pages to be W^X. So the JITC requests some writeable pages, fills them with code, and then requests the kernel to make the pages executable. At that point XNU suspends the process and scans the requested pages for illegal instruction sequences. The pages are hot in the cache anyway and the checks are simple, so it's no big deal. If the static checks pass, the page is flipped to be executable but not writeable and the app can proceed.
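A minimal sketch of the step-2 check, assuming hypothetical hook names (nothing like this exists in XNU today; only the instruction encodings are real):

```c
/* Hypothetical sketch of the W^X flip check from step 2 above. The hook
 * name and types are invented for illustration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* msr/mrs s3_5_c15_c10_1, xN: everything is fixed except Rt (bits 4:0)
 * and the read/write bit (bit 21), i.e. 26 fixed bits to match against. */
#define SYSREG_MASK  0xFFDFFFE0u
#define SYSREG_MATCH 0xD51DFA20u

static bool page_touches_covert_reg(const uint32_t *insn, size_t count) {
    for (size_t i = 0; i < count; i++)
        if ((insn[i] & SYSREG_MASK) == SYSREG_MATCH)
            return true;
    return false;
}

/* Called when a non-whitelisted JIT asks to flip a page from RW to RX;
 * the page is still hot in the cache, so the scan is cheap. */
bool allow_rw_to_rx_flip(const void *page, size_t len) {
    return !page_touches_covert_reg(page, len / sizeof(uint32_t));
}
```

A real implementation would also have to deal with constant pools that happen to contain the same byte pattern, which is the false-positive problem discussed further down.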
Apple's ban on JITC has never really made much technical sense to me. It feels like a way to save costs on static analysis investment and to try to force developers to use Apple's own languages and toolchains, with security being used as a fig leaf. It doesn't make malware harder to write, but it definitely exposes Apple to possible legal hot water, as it means competitors can't build competitive first-party web browsers for the platform. The only thing that saves them is their own high prices and refusal to try to grab high enough market share.
Possibly, the article has been updated in the last couple of hours, but it now says:
*What about iOS?*
iOS is affected, like all other OSes. There are unique privacy implications to this vulnerability on iOS, as it could be used to bypass some of its stricter privacy protections. For example, keyboard apps are not allowed to access the internet, for privacy reasons. A malicious keyboard app could use this vulnerability to send text that the user types to another malicious app, which could then send it to the internet.
Only if detection requires solving the halting problem. It does not. You just look for certain instructions that normal code shouldn't use. JIT isn't allowed (which means all instructions the program uses can be checked statically), so it should be easy enough.
Marcan said elsewhere in the thread that the executable section on ARM also includes constant pools, so if I understand correctly, you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.
The real saving grace here is that iOS app binaries are submitted as LLVM IR instead of ARM machine code.
> you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.
Uh, no? This is very tractable - O(N) in the size of the binary - just check, for every single byte offset in executable memory, whether that offset, if jumped to or continued to from the previous instruction, would decode into a `msr s3_5_c15_c10_1, reg` or `mrs reg, s3_5_c15_c10_1` instruction.
IIUC, the decoding of an M1 ARM instruction doesn't depend on anything other than the instruction pointer, so you only need one pass, and you only need to decode one instruction at each offset, since the following instruction will occur at a later byte address.
Edit: unless its executable section isn't read-only, in which case static analyzers can't prove much of anything with any real confidence.
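A sketch of that scan over a buffer holding a binary's executable section, checking every byte offset as described (conservative, since AArch64 can only execute 4-byte-aligned instructions):

```c
/* Sketch of the byte-offset scan proposed above. AArch64 instructions are
 * fixed 32-bit little-endian words, so each candidate offset needs only a
 * single mask-and-compare against the fixed bits of the two encodings. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SYSREG_MASK  0xFFDFFFE0u   /* fixed bits of msr/mrs s3_5_c15_c10_1 */
#define SYSREG_MATCH 0xD51DFA20u

void scan_text_section(const uint8_t *text, size_t len) {
    for (size_t off = 0; off + 4 <= len; off++) {  /* every byte offset */
        uint32_t word;
        memcpy(&word, text + off, sizeof(word));   /* unaligned-safe read */
        if ((word & SYSREG_MASK) == SYSREG_MATCH)
            printf("possible s3_5_c15_c10_1 access at offset 0x%zx\n", off);
    }
}
```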
Yes but if program constants are in executable memory, then you can end up with byte sequences that represent numeric values but also happen to decode into the problematic instructions.
For example, this benign line of code would trip a static analyzer looking for `msr s3_5_c15_c10_1, x15` in the way you described:
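Something like this, say (an illustrative reconstruction; 0xD51DFA2F is, I believe, the encoding of `msr s3_5_c15_c10_1, x15`):

```c
/* Illustrative only: an ordinary-looking constant whose value happens to
 * equal the encoding of `msr s3_5_c15_c10_1, x15` (0xD51DFA2F, assumed).
 * If the compiler emits it into a constant pool inside the text section,
 * a naive scan for that byte pattern flags it even though it is data. */
#include <stdint.h>

static const uint32_t table_seed = 0xD51DFA2F;  /* hypothetical app constant */
```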
I said false positives are an issue in the context of a "dumb" real-time kernel-side scan. App Store submission is different. They can afford to have false positives and have a human look at them to see if they look suspicious.
There are 26 fixed bits in the problem instructions, which means a false positive rate of one in 256MiB of uniformly distributed constant data (the false positive rate is, of course, zero for executable code, which is the majority of the text section of a binary). Constant data is not uniformly distributed. So, in practice, I expect this to be a rather rare occurrence.
I just looked at some mac binaries, and it seems movk and constant section loads have largely superseded arm32 style inline constant pools. I still see some data in the text section, but it seems to mostly be offset tables before functions (not sure what it is, might have to do with stack unwinding), none of which seems like it could ever match the instruction encoding for that register. So in practice I don't think any of this will be a problem. It seems this was changed in gcc in 2015 [0], I assume LLVM does the same.
Only on watchOS is Bitcode required (to support the watch's 32-bit to 64-bit transition), on all other platforms it's optional and often turned off, as it makes a variety of things harder, like generating dSYMs for crash reporting.
Oh. Then I don't see how this can be reliably mitigated, other than patching LLVM to avoid writing the `msr s3_5_c15_c10_1` byte sequence in constant pools and then rejecting any binary that contains the byte sequence in an executable section. That seems difficult to get done before someone is able to submit a PoC malicious keyboard to the store, potentially turning this "joke" bug into a real problem. What am I missing?
W^X, except for transmuting user code pages to data pages (reading its own code should be fine, since it was loaded from a user binary anyhow), or a supervisor-level JIT helper to check and transmute user data pages into user code pages (checking that user-mode JITs aren't being naughty).
There are often two kinds of loadable data pages: initialized constants (RO) and initialized variables (RW), so some will need to be writable, because pesky globals never seem to die. Neither of these should ever have execute permission, or that will cross the streams and end the universe. I'm annoyed when constants or constant pools are loaded into RW data pages, because it doesn't make sense.
So, it's basically an honor system. You cannot detect JIT, because there aren't "certain instructions" that aren't allowed - it's just certain registers that programs shouldn't access (but access patterns can be changed in branching code to ensure Apple won't catch it in their sandboxes).
Besides, even if certain instructions are not allowed, a program can modify itself, and it's hard to detect whether a program modifies itself without executing it under specific conditions, or running it in a hypervisor.
You're missing the point: JIT not being allowed means programs may not modify themselves. They're in read+execute only memory and cannot allocate writable+executable memory.
iPhones use A12/13/14 chips, and the vulnerability is not confirmed there. Also, the post mentions that if you have two malware apps on your device, they can communicate in many other ways, so I'm not sure what's new here.
iPhones do not use the A1 chip as of quite a few years ago. Besides, the M1 and the A12+ have significant microarchitectural similarities, to the point that the DTK used the A12Z.
Furthermore, the keyboard app extension and the keyboard app are installed as a single package whose components are not supposed to communicate, hence why I brought this up.
Hearing an S-Tier hacker call a fellow S-Tier hacker B-Tier is certainly entertaining, but from my lowly perspective they're still far more capable than 99% of devs I'll ever encounter.
> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo
I tried, but I also talked about it on public IRC before I knew it was a bug and not a feature, so I couldn't do much about that part. ¯\_(ツ)_/¯
This whole site is a good read. A great mix of real information, jokes, and a good send-up of how some security releases appear these days (I understand to a degree the incentives that cause those sites to be as they are, and I don't think they are all bad, but it's still good and useful to poke fun at them, I think).
This is Mark Kettenis, who, despite comments made jokingly by marcan, has been working with a few other OpenBSD developers to bring up OpenBSD/arm64 on the Apple M1. At least on the Mac mini the Gigabit Ethernet works, as does Broadcom Wi-Fi, and work on the internal NVMe storage is progressing.
I'm almost as impressed that m1racles.com was available as I am with people who are good enough at this kind of reverse engineering that they can do it for fun.
I'll give them a couple too: M1GHT, M1CRO*, M1ASMA (M1ASTHMA?), M1D*, M1FFED (with some 0xFFED somewhere?), M1GRATE (for some particularly pesky data extraction hack?), M1LES (for some unit conversion bug that makes the first MacOs-based spaceship crash)
I'm constantly surprised what domains are still available. I've registered many 2/3-letter domains (with 3-4 letter TLDs) in the past year, as well as ones for very common nouns (some also 3 letters), almost always for under $40. Admittedly it's mostly for the newer TLDs, though.
Similar story. I own a half-dozen relatively recently-registered three-letter domains at two-letter ccTLDs. I’m surprised every time one turns out to be available at normal rates.
> Wait. Oh no. Some game developer somewhere is going to try to use this as a synchronization primitive, aren't they. Please don't. The world has enough cursed code already. Don't do it. Stop it. Noooooooooooooooo
You can already communicate between apps without going through the kernel by using shared memory - with much higher bandwidth. And even just the regular write/sendmsg/etc. calls are probably more efficient despite going through the kernel, due to being able to carry many more bytes.
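For instance, plain POSIX shared memory already gives two cooperating processes a direct data path (a minimal sketch; error handling trimmed, and the segment name is made up):

```c
/* Minimal POSIX shared-memory sketch of the point above: cooperating
 * processes that are allowed to share memory can already exchange data
 * directly, no covert channel needed. "/demo_shm" is an arbitrary name. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    strcpy(buf, "hello from the other app");   /* a second process mapping
                                                  the same name sees this */
    munmap(buf, 4096);
    close(fd);
    return 0;
}
```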
This was really just a good joke touching on how the game industry in the past used uncommon hardware features for optimization purposes.
Synchronization primitives AFAIK don't need to transfer huge amounts of data in a short time. One bit for every "okay" signal would suffice. At the given speed you can perform 8 million syncs per second between two threads.
I checked this out to find out just... information I guess? I don’t own an M1 but plan to get an ARM Mac when I can budget it. Good to be aware of the landscape.
I was not expecting such an entertaining FAQ. Good job, very informative, very amusing!
Why would you spend money on crappy and locked down hardware that can't be fixed. A computer that you don't own but basically rent. Get a Lenovo Thinkpad and join the light side, you'll be amazed!
Whatever your opinions on Apple's policies and behavior, it's just ignorant to call the M1 'crappy' when it absolutely annihilates any processor in its class and doesn't at all get embarrassed when compared to high-end desktop CPUs.
CPUs are a chump's game, and it's no surprise that Apple, the company with sole access to next-generation silicon, was able to reach last-generation performance on a laptop chip. Nobody freaked out when AMD's Ryzen 7 4800U hit 4 GHz over 8 cores; I don't see a reason why I should freak out now when Apple's doing it with 10 fewer watts.
Plus, that's only the CPU side of things. The M1's GPU is annihilated by most GPUs in its class... from 2014. Fast forward to 2021, and its graphics performance is honestly pathetic. Remember our friend the 4800U? Its integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it.
So yeah, I think there are a lot of workloads where the M1 is a pretty crappy CPU. Unless your workload is CPU-bound, there's not really much of a reason to own one. And even still, the M1 doesn't guarantee compatibility with legacy software. It doesn't have a functional hypervisor, and it has lower IO bandwidth than most CPUs from a decade ago. Not really something I'd consider viable as a "daily driver", at least for my workload.
"CPUs are a chump's game" - what? High performance CPUs which nevertheless use very little power are extremely difficult to design.
"AMD's Ryzen 7 4800u hit 4ghz over 8 cores" - It doesn't. AMD specifies it as having 1.8 GHz base clock, 4.2 GHz max boost clock. AMD's cores use ~15W each at max frequency. Since the 4800U's configurable TDP range is 10W to 25W for the whole chip, there is no way that all 8 cores run at 4.2 GHz simultaneously for any substantial period of time. In fact, running even one core in its max performance state probably isn't sustainable in a lot of systems which opt to use the 4800U's default 15W TDP configuration.
On the other side of things, Apple M1 performance cores use ~6W each at max frequency. It is actually possible for all four to run at full performance indefinitely with the whole chip using about 25W, provided there is little GPU load.
"Remember our friend the 4800u? It's integrated GPU is able to beat the M1's GPU in raw benchmarks, and it came out 18 months before it." - Say what? The only direct comparison I've been able to find is 4700U vs M1, in Anandtech's M1 article, and it shows the M1 GPU as 2.6x faster in GFXBench 5.0 Aztec Ruins 1080p offscreen and 2.5x faster in 1440p high.
Granted, the 4700U GPU is a bit slower than the 4800U GPU, but not by a factor of 2 or more.
This isn't an unexpected result given that the M1's GPU offers ~2.6 single-precision TFLOPs while the 4800U's is ~1.8 TFLOPs.
Literally everything you wrote about M1 being bad is wrongheaded in the extreme, LOL.
Not being viable as your daily driver does not make it crappy.
But you heard it here first, guys: building CPUs is a chump's game. And you see no reason to celebrate the first genuinely viable, power-efficient and fast non-x86 CPU being a mass success. Fine I guess, but I don't agree.
Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
> Not being viable as your daily driver does not make it crappy.
What does it make it then? Some unicorn device that I'm unworthy of? Is there something wrong with my workload, or Apple's? Apple is marketing the M1 to computer users. I'm a computer user, and I cannot use it as part of my workflow, so I have every right to voice that concern to Apple.
> And you see no reason to celebrate the first genuinely viable, power-efficient and fast non x86 CPU being a mass success.
You must be late to the party; ARM has been around for years. Apple's power efficiency is about on par with what should be expected from a 5nm ARM chip with a gimped GPU. What is there to celebrate, that Apple had the initiative to buy out the entirety of the 5nm node at TSMC, plunging the entire world into a semiconductor shortage unlike anything ever seen before? Yeah, great job Apple. I think it was worth disrupting the global economy so you could ship your supercharged Raspberry Pi /s
> Also not sure why you wave away CPU bound workloads as though they don't exist or somehow lesser.
CPU-bound workloads absolutely exist, but who's running them on a Mac? Hell, more importantly, who's running them on ARM? x86 still has a better value proposition than ARM in the datacenter/server market, and most local workloads are hardware-accelerated these days. I really don't know what to tell you.
Yeah, after two failed MacBooks from 2016 because of their SSDs, I can just say: stay away from Apple hardware until they reverse course on storage devices.
I've been stumbling through writing a pile of secure software development lifecycle management and disclosure practices documentation all evening, and desperately needed a bit of levity. This post delivered. Thank you.
Also, I am still not sure if this is a disclosure, performance art, or extremely dry comedy, but it certainly covered all the bases.
> Newton OS users: I guess those are technically Apple Silicon but...
The Newton wasn't really Apple Silicon:
The OMP/MP100/MP110/MP120/MP130 ran an ARM610.
The eMate300 ran an ARM710.
The MP2000/MP2100 ran a DEC StrongARM SA-110 CPU.
None of which were designed or manufactured by Apple.
I did say "designed or manufactured" ... but I'll concede the point that they had some ownership of the 610/710, at least.
On 27 Nov 1990, ARM was formed with Apple owning 43% alongside Acorn (the designer), and VLSI Technology (the manufacturer).
Funny thing: I've found two articles that claim two different purchase prices for that 43%: one $3M [0] and the other $1.5B [1]. That's quite a difference!
This is the best thing I've seen on the internet for a long time. Hopefully some people (tech journalists and twitter folks) will "fall for it" and learn along the way...
I suppose you could use it to create a "covert suite" of apps for the M1 iPad that talk to each other where they aren't supposed to. Sharing permission X from app 1 with app 2 that isn't supposed to have permission X, etc.
If you put this in your app directly, Apple can just find it and reject it at submission time. If JIT were an option, that wouldn't be enough, because the app could do it at runtime. Since it isn't, there is no way to "hide" something like this from the App Store static analyzer.
Hrm. It seems like inline ASM allows for passing the register name dynamically, though I can't tell for sure. If that's the case, it seems like it would be hard to tell ahead of time, other than "app calls msr/mrs".
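For what it's worth, the direct access looks like this from C (a minimal sketch; the system register name is part of each mrs/msr instruction's encoding, so whichever register a signed binary touches is right there in the bytes):

```c
/* Minimal sketch: touching the s3_5_c15_c10_1 register from userspace on
 * an affected core. The register name is baked into the instruction
 * encoding, so these opcodes are visible to any static scan of the binary. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t covert_read(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, s3_5_c15_c10_1" : "=r"(v));
    return v;
}

static inline void covert_write(uint64_t v) {
    __asm__ volatile("msr s3_5_c15_c10_1, %0" : : "r"(v));
}

int main(void) {
    covert_write(covert_read() ^ 1);   /* flip one of the two writable bits */
    printf("register now reads: 0x%llx\n", (unsigned long long)covert_read());
    return 0;
}
```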
> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.
> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)
That's reassuring to read. I opened the page, read a bit of it, pressed play on the video and scrubbed around a bit, got irritated and closed the tab. I figured if it mattered I would wait until better coverage came out.
> It violates the OS security model. You're not supposed to be able to send data from one process to another secretly.
I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
Without this vulnerability, there would still be a million ways to send data between cooperative processes running as different users on Mac OS X.
For example, a process could start subprocesses at a deterministic rate and the other end of the covert link observes how fast the pid counter is going up.
This is a non-vulnerability, because it targets something there was no effort to protect.
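A toy sketch of that pid-counter trick, receiver side (thresholds and timing are made up, and real pid allocation is messier than this):

```c
/* Toy receiver for the pid-counter covert channel described above: sample
 * how fast the system pid counter advances by forking a throwaway child
 * each interval; a cooperating sender modulates the rate by spawning (or
 * not spawning) bursts of short-lived processes. Illustrative only. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t sample_pid_counter(void) {
    pid_t pid = fork();
    if (pid == 0) _exit(0);        /* child exits immediately */
    waitpid(pid, NULL, 0);
    return pid;                    /* parent learns the freshly allocated pid */
}

int main(void) {
    pid_t prev = sample_pid_counter();
    for (int i = 0; i < 16; i++) {         /* read 16 bits */
        usleep(100 * 1000);                /* 100 ms per "symbol" */
        pid_t cur = sample_pid_counter();
        int bit = (cur - prev) > 50;       /* big jump => sender was forking */
        printf("%d", bit);
        prev = cur;
    }
    printf("\n");
    return 0;
}
```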
It's not really a vulnerability as the FAQ states, but it violates the operating system's own application isolation policies. If you don't want your Facebook app to talk to your Instagram app (e.g. different accounts for different purposes), you should be able, as a user, to block communication between the two. This is a backdoor to circumvent that.
I mean not that anyone has a native Facebook or Instagram app on their device, but just to name an example.
> I'd argue this is not the case. What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
All of them.
A piece of software able to read my mail but not use the Internet could credibly be a tool to help me index and find my email using search keywords. It promises to not use the Internet, and indeed nm/objdump shows no use of networking tools.
Another piece of software able to monitor RSS feeds I am interested in and alert me to their changes is expected to use the Internet, but not the filesystem, and surely not the part of the filesystem that contains my email. I can use strace/dtruss to verify it never touches the filesystem, and use chroot/jail to keep it honest.
This being said, I agree that "mainstream operating systems" (meaning Windows and macOS, but not perhaps iOS) don't do enough and it might be impossible for them without changing user expectations[1], but I think they're trying. Web browsers disabled high resolution timers specifically to protect against this sort of thing. iOS doesn't permit arbitrary background tasks from running to protect battery and ostensibly privacy. But they could all do better.
[1]: For example, for me high CPU load is a red flag - a program that does this to me regularly gets put into a VM so that I can mess with its time-- Zoom now loses about a minute every three if it's not focused which is annoying because it messes with the calendar view, but I'm pretty sure it can't do anything else I don't want it to. Who should do this work? My operating system? Zoom? Neither will do it if users don't demand it.
So my point as it applies to this example: the email indexing program could communicate towards the rss program using cpu or storage load spikes. And no widely used multitasking OS tries to prevent this.
> What mainstream operating systems have made credible attempts to eliminate covert channels from eg timing or resources that can be made visible by cooperating processes across user account boundaries?
The answer will depend on whether you consider Multi-Level Security (MLS) https://en.wikipedia.org/wiki/Multilevel_security "mainstream". It's certainly a well-established approach if only in an academic sense, and the conflux of new use cases (such as secretive, proprietary "apps" being expected to manage sensitive user data) and increasingly-hard-to-mitigate info disclosure vulnerabilities has made it more relevant than ever.
There are two bits in a CPU register that are shared between all of its processes and that any process can write to. The result is that two sandboxed processes that are supposed to be totally isolated from each other can use this to communicate anyway. One example of how this can be exploited is cross-app tracking: if you told one app your name and another your location, they could secretly communicate with each other so both apps end up with both pieces of information.
Because the OS has no say. A running program issues an assembly instruction to the CPU to read or write this register, and the CPU complies.
For the OS to have a say, the CPU would need to provide a way for the OS to tell it (usually by setting certain values in other registers) that it should not allow access, at least under certain circumstances.
The article actually does go into certain situations where the access is more restricted (search for "VHE"), but also in how that does not really apply here.
Yes, you can introduce new code but the kernel should also watch for that (JIT compilation etc.) and check the resulting code. It's quite involved, and the whole process looks more like a sandbox or emulator, but it's possible.
> originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel.
There is no indication that the M1 has updatable microcode, nor any other features that might allow such mitigation. (If it did, Apple would've fixed it; I did give them a 90 day disclosure warning and they're not lazy about fixing actual fixable bugs.)
There's more specific answers here, but in general the answer to this question is "only partly". The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code. During your time slice, you can do anything you want to the CPU, and in general only interrupts (timer interrupts, hardware interrupts, page faults, etc) will cause the kernel to get involved again. There are some specific features that CPU designers add to give extra control to the kernel, but that's a feature of the CPU and it's only when the CPU has explicitly added that type of control.
> The kernel is what initially gives your process a time slice on the CPU, by setting an alarm for the CPU to return control to the kernel at the end of the time slice, and then just jumping into your code.
Somewhat critically, it will also drop down to EL0.
Registers aren't resources you access through syscalls, there's no way for the kernel to control them unless you're running under virtualization or the CPU architecture specifically allows access control for the register. (As the site notes, virtualization allows controlling access to this register)
Can the kernel scan each page it maps as executable and return an error if it finds instructions interacting with the 'bad' register? Assuming the kernel requires executable pages to be read-only (W^X), this may even be doable (but probably very very slow).
It does require that, but it allows flipping between RX and RW at will (for JITs), and the M1 actually has proprietary features to allow userspace to do this without involving the kernel, so the kernel couldn't re-scan when those flips happen (plus it would kill performance anyway).
Plus, as I said above, this is prone to false positives anyway because the executable section on ARM also includes constant pools.
Ah, yes, I forgot about that. So indeed there is no non-racy hook point for the kernel to do such a check, even if it made sense and the RX/RW switch went through the kernel, which it doesn't.
> Because pthread_jit_write_protect_np changes only the current thread’s permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region.
The kernel doesn't get a say in what instructions a userspace program can run, other than what the CPU is designed to allow it to control. The bug is the CPU designers forgot to allow it to control this one.
Let's say someone submits a malicious keyboard with the bad instructions hidden in a constant pool.
Apple can't just scan for a bad byte sequence in executable pages because it could also represent legitimate constants used by the program. (not sure if this part is correct?)
If so, doesn't that make detection via static analysis infeasible unless LLVM is patched to avoid writing bad byte sequences in constant pools? Otherwise they have to risk rejecting some small number of non-malicious binaries, which might be OK, depending on the likelihood of it happening.
I believe that Rice's theorem is about computability, not about whether or not it is possible to validate which CPU instructions a program can contain.
With certain restrictions, it is possible to do this: Google Native Client [1] has a verifier which checks that the programs it executes do not jump into the middle of other instructions, forbids run-time code generation inside such programs, etc.
(What other kinds of instructions? Genuinely asking.)
I don't think Rice's Theorem applies here. As a counterexample: On a hypothetical CPU where all instructions have fixed width (e.g. 32 bits), if accessing a register requires the instruction to have, say, the 10th bit set, and all other instructions don't, and if there is no way to generate new instructions (e.g. the CPU only allows execution from ROM), then it is trivial to check whether there is any instruction in ROM that has bit 10 set.
The next part I'm less sure how to state rigorously (I'm not in the field): in our hypothetical CPU, disallowing that instruction either leaves you Turing-complete or it doesn't. In the former case, you can still compute everything a Turing machine can.
You'd have to add one extra condition to your hypothetical CPU: that it can't execute unaligned instructions. Given that, then yes, that lets you bypass Rice's theorem, even though it is indeed still Turing-complete.
But the M1 does have a way to "generate new instructions" (i.e., JIT), so that counterexample doesn't hold for it.
Yes, indeed, I should have stated "cannot execute unaligned instructions". Or have said 8 bit instead, then it would be immediately obvious what I mean. (You cannot jump into the middle of a byte because you cannot even address it.)
But I wanted to show how Rice's Theorem does not generally apply here. You can make up other examples: A register that needs an instruction with a length of 1000 bytes, yet the ROM only has 512 bytes space etc...
As for JIT, also correct (hence my condition), though that's also a property of the OS and not just the M1 (and on iOS for example, it is far more restricted what code is allowed to do JIT, as was stated in the thread already).
With the way Apple allows implementation of JIT on the M1 (with their custom MAP_JIT flag and pthread_jit_write_protect_np) it is actually possible to do this analysis even with JIT code. Since it enforces W^X (i.e. pages cannot be writable or executable at the same time) it gives the OS opportunity to inspect the code synchronously before it is rendered executable. Rosetta 2’s JIT support already relies on this kind of inspection to do translation of JIT apps.
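Concretely, the flow looks roughly like this on Apple Silicon macOS (a sketch using the documented MAP_JIT / pthread_jit_write_protect_np APIs; error handling omitted, and depending on how the binary is signed the com.apple.security.cs.allow-jit entitlement may be required):

```c
/* Sketch of the Apple Silicon JIT flow referenced above: pages come from
 * mmap(MAP_JIT), the thread toggles them between writable and executable
 * with pthread_jit_write_protect_np, and the instruction cache is
 * invalidated before jumping into the freshly written code. */
#include <libkern/OSCacheControl.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

int main(void) {
    uint32_t *page = mmap(NULL, 0x4000, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT, -1, 0);

    pthread_jit_write_protect_np(0);   /* make the thread's JIT pages writable */
    page[0] = 0x52800540;              /* mov w0, #42 */
    page[1] = 0xD65F03C0;              /* ret         */
    pthread_jit_write_protect_np(1);   /* back to executable, no longer writable */
    sys_icache_invalidate(page, 2 * sizeof(uint32_t));

    jit_fn fn = (jit_fn)page;
    return fn();                       /* returns 42 */
}
```

The pthread_jit_write_protect_np(1) transition is the natural point for the kind of inspection described above.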
It does when running native ARM code (but not x86 code), but AFAIK nothing stops Apple from changing this to being kernel-mediated by updating libSystem in the ARM case as well. Of course I doubt they would take the performance hit just to get rid of this issue.
1) the program does not contain an instruction that touches s3_5_c15_c10_1
2) the program contains an instruction that touches s3_5_c15_c10_1, but never executes that instruction
3) the program contains an instruction that touches s3_5_c15_c10_1, and uses it
Rice's theorem means we cannot tell whether a program will touch the register at runtime (as that's a dynamic property of the program). But that's because we cannot tell case 2 from case 3. It's perfectly decidable whether a program is in case 1 (as that's a static property of the program).
Any sound static analysis must have false positives -- but those are exactly the programs in case 2. It doesn't mean we end up blocking other kinds of instructions.
Sounds like this is by design and not really a newly discovered vulnerability. Maybe more of a discovery of deceptive advertising/documentation? Which is to say that Apple's engineers are reading this as non-news.
There is a small bit of memory that all programs on your computer share that isn’t protected in any way. If two misbehaving programs on your computer wanted to communicate in a really really secret way, they could use it.
If you don’t have misbehaving programs on your computer that want to secretly communicate, then it doesn’t matter.
How about randomising/resetting these bits from the kernel whenever there is a syscall? Not a great workaround, but this should limit the effectiveness of leaking. Yeah, there will be a tiny perf hit due to the extra register read and write.
> Wait, didn't you say on Twitter that this could be mitigated really easily?
> Yeah, but originally I thought the register was per-core. If it were, then you could just wipe it on context switches. But since it's per-cluster, sadly, we're kind of screwed, since you can do cross-core communication without going into the kernel. Other than running in EL1/0 with TGE=0 (i.e. inside a VM guest), there's no known way to block it.
In other words: this register is shared between cores, so if the two processes are running simultaneously on different cores, they can communicate by reading & writing directly to & from this register, without any operating system interaction.
Unfortunately, you can use this to send thousands of bits between syscalls, so the simplest error correction would work around that, with very little effort or overhead.
The demo already uses error correction (I'm not sure exactly what causes the errors, but I'm guessing the processes sometimes end up briefly scheduled on the other core cluster)
It seems like there's a partial mitigation available to the OS here. When scheduling a task, write a random value to the two user-writable bits. When the task is unscheduled, if the bits do not match, terminate the task. This effectively makes writing to the register an OS-enforced illegal operation with a 75% chance of being caught within 10 ms if the channel is being used at full bandwidth. (The writer can reduce the chance of it being caught proportional to reduced use of channel bandwidth by resetting it to the OS-chosen value after a bit is transmitted.) The reader can't be detected this way, but since the channel requires cooperation between the writer and reader, catching either is fine. Not a perfect fix, but would help, and would also give visibility into whether this is used in the wild -- e.g., report to Apple via crash reporting mechanism if a process is terminated this way, which would allow prompt discovery of app store apps that abuse the channel.
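A rough sketch of that scheme (hypothetical hook names; the register is simulated with a variable here so the sketch stands alone, where the real thing would be a privileged mrs/msr on s3_5_c15_c10_1 in the context-switch path):

```c
/* Hypothetical sketch of the canary mitigation described above; none of
 * these hooks exist in any kernel. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint64_t simulated_reg;                        /* stand-in for the sysreg */
static uint64_t read_reg(void)        { return simulated_reg; }
static void     write_reg(uint64_t v) { simulated_reg = v; }

struct task { uint64_t canary; bool flagged; };

void on_context_switch_in(struct task *t) {
    t->canary = (uint64_t)rand() & 0x3;               /* seed the two writable bits */
    write_reg((read_reg() & ~(uint64_t)0x3) | t->canary);
}

void on_context_switch_out(struct task *t) {
    if ((read_reg() & 0x3) != t->canary)
        t->flagged = true;    /* the bits changed while this task was on-core:
                                 terminate and/or report, as described above */
}
```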
I don't think so, no. If it has microcode it's probably burned into sequencer tables, not updatable. I was kind of hoping Apple would have some chicken bit register up their sleeve as a last resource fix (e.g. "trap on instruction encodings matching this mask"), but given that they seem to have no useful mitigation for it, I don't think they do.
Is it possible Apple have the silicon functionality to fix this, but have decided it isn't worth fixing?
After all, process isolation between cooperating processes is nearly impossible to achieve. If Apple close this loophole, there will be other, lower-bandwidth side channels, like spinning up the fan in Morse code while the other process notices the clock speed scaling up and down...
They're using zero so far [0], and until they need it for something else it wouldn't make sense not to use it for this. The CPU tunables aren't fuses or anything, the OS configures them (m1n1 in our case)
It's an implementation-defined register, which means it's up to Apple to define it. We have no idea what it does; we haven't observed any visible effects from flipping those bits. Given that it's per-cluster, we can infer that it has something to do with cluster-specific logic. Perhaps memory interface or power control.
There are hundreds of Apple implementation-defined registers; we're documenting them as we learn more about them [0] [1] [2]
I googled it for you and, err, came up blank; there are just two code references in some ASM code, and the rest points to this resource. Weird, I would have thought things like this would have public documentation.
In all seriousness, I wonder what the actual issue is.
Could anyone comment as to the implications of only supporting a Type 2 hypervisor that is (as said on the site) "in violation of the ARMv8 specification"?
The implications are just that OSes that assume otherwise won't run; Linux used to work (by chance) until a patch that just about coincided with our project went in that used the non-VHE ("type 1") mode by default, which broke it, and then we had to add an explicit workaround for the M1.
It's just a very unfortunate coincidence that precisely that support would allow this bug to be trivially mitigated on Linux. (Wouldn't help macOS, as they'd have to implement this from scratch anyway; it's just that existing OSes that support this mode could use it).
The actual issue is just what I described: the hardware implementation of this register neglects to check for and reject accesses from EL0 (userspace). It's a chip logic design flaw. I don't know exactly where it is (whether in the core/instruction decoder, or in the cluster component that actually holds the register, depending on where they do access controls), but either way that's what the problem is.
You can still solve the issue in VHE mode, since you can still implement a Type 1 hypervisor in VHE mode. It's just that, well, nobody does that, because why would they? That's what non-VHE mode is for.
So it's not that not following the spec prevents the workaround, it's just that had they followed the spec it would just take a single kernel command line argument (to force non-VHE mode) to fix this in Linux, while instead, now we'd have to make major changes to KVM to make the non-VHE code actually work with VHE, and really nobody wants to do that just to mitigate this silly thing.
Had this been a more dangerous flaw (e.g. with DoS or worse consequences), OSes would be scrambling to make major reworks right now to mitigate it in that way. macOS would have to turn its entire hypervisor design on its head. Possible, but not fun.
I had to use that one for this demo for obvious reasons, but if I'm allowed the shameless plug, I actually make my own music in the same genre (Touhou rearrangements) [0]. I'm actually very much looking forward to moving my music production to M1 and seeing what the real-time performance is like, though that will depend on us having at least a usable Rosetta-like thing on Linux to run x86 Windows apps (which will allow me to bridge the few x86 Windows plug-ins I rely on with yabridge, as I do today on x86) :-)
That's awesome! I'm definitely thinking about getting an M1 for realtime keys, though I'm all set up in Logic/MainStage so I'll probably stick with macOS for now :)
On the Linux side, would qemu user-mode emulation work for that (maybe with a patch to take advantage of the M1's switchable-memory-order thing)?
I think qemu would work fine, but it's pretty slow, so I'm hoping it can either be improved or another project more focused on this use case can do it better.
If nothing else though, I plan to expose at least the TSO feature of the M1 so qemu can reduce the overhead of its memory accesses.
It seems like a single bit available to all apps but that no one is really using now. I wonder if an easy software mitigation could be just polluting it intentionally.
Thankfully, Apple should be able to statically analyze apps to look for this on App Store submission, as the App Store does not allow dynamic code (JITs).
At the core, DO NOT TRACK prevents Apps having access to the Advertising Identifier. So different Apps cannot aggregate their analytics data about the users.
This vulnerability enables different Apps to communicate a super cookie for cross-app tracking. A possible exploit would be to implement this feature in an AD SDK to be used by different Apps.
Oooo! Depending on your taste you're in for either a very boring movie or the experience of a lifetime. Complete with an associated rabbit hole of mystery surrounding the director: https://en.wikipedia.org/wiki/The_Room
> Poking fun at how ridiculous infosec clickbait vulnerability reporting has become lately. Just because it has a flashy website or it makes the news doesn't mean you need to care.
> If you've read all the way to here, congratulations! You're one of the rare people who doesn't just retweet based on the page title :-)
> If you already have malware on your computer, that malware can communicate with other malware on your computer in an unexpected way.
> Chances are it could communicate in plenty of expected ways anyway.
> That doesn't sound too bad.
> Honestly, I would expect advertising companies to try to abuse this kind of thing for cross-app tracking, more than criminals. Pretty sure Apple could catch them if they tried, though (for App Store apps).
I was actually planning to pick up a new Mac today and I’ve been on the fence over M1 or Intel for months. My biggest con for the M1 is how proprietary it is. With Intel, you know it’s been battle-tested for years. Things like this (at best, is an oversight, at worst, it’s the tip of the iceberg) make the decision a little bit easier…
> With Intel, you know it’s been battle-tested for years.
This is quite an interesting statement to make in the wake of Spectre and Meltdown (and vulnerabilities in that class being discovered what seems like every couple of months or so).
It's a bit amusing to me how much we overlook clear, even huge flaws in products and processes because they're old - sorry, Battle-Tested™ - while zeroing in on lesser flaws in newer ones.
Don't know why you've been downvoted. It all depends on the software that you use everyday (job or whatever).
If the software does not run on an M1 Mac or requires involved workarounds to get it 'working' and it breaks on any update or it prevents you from doing your work, then don't waste your time and just skip it for now.
The content of the website basically makes that point itself. It's mocking the whole concept of these vulnerability websites, while also presenting a real (but not very impactful) vulnerability.
Wait, did he just, as an exploit proof of concept, infiltrate some catchy music & video clip? With live rendering on the same CPU? :>
Anyway - Apple did it again! In shiny new hardware for "creative" ppl they introduced hardware backdoors... like FireWire and Thunderbolt. Seriously, there must be some market for spying on writers and painters. Or anyone who does things and is rich...
[0]: https://developer.apple.com/library/archive/documentation/Ge...