Huh. So my initial response was, "why on earth would you need a whole OS for that", but memory snapshotting and improved virtual memory performance might actually be a good justification. Linux does have CRIU which might be made to work for such a purpose, but I could see a reasonable person preferring to do it from a clean slate. On the other hand, if you need qemu to run applications (which I'm really unclear about; I can't tell if the plan is to run stuff natively on this OS or just to provide enough system to run qemu and then run apps on linux on qemu) then I'm surprised that it's not easier to just make qemu do what you want (again, I'm pretty sure qemu already has its own memory snapshotting features to build on).
Of course, writing an OS can be its own reward, too:)
Oooh, wasn't really expecting this to make it to HN cause it was meant to be more of an announcement than a description.
But yes, I've done about 7 or 8 operating systems for fuzzing in the past and it's a massive performance (and cleanliness) improvement. This one is going to be like an operating system I wrote 2-3 years ago for my vectorized emulation work.
To answer your QEMU questions: the goal is to effectively build QEMU with MUSL (just to make it static so I don't need a dynamic loader), and modify MUSL to turn all syscalls into `call` instructions. This means a "syscall" is just a call to another area, which will be my Rust Linux emulator. I'll implement the bare minimum syscalls (and enum variants to those syscalls) to get QEMU to work, nothing more. The goal is not to run Linux applications, but to run a QEMU+MUSL combination which may be modified lightly if it means a lower emulation burden (eg. getting rid of threading in QEMU [if possible] so we can avoid fork()).
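Roughly, the shape of the thing looks like this (an illustrative sketch only, not the actual code; the function name, signature, and the pair of stubbed syscalls are made up for the example):

```rust
// Hypothetical sketch of the emulator-side entry point that a patched MUSL
// syscall stub would `call` directly instead of executing a `syscall`
// instruction. Syscall numbers follow the x86_64 Linux ABI, but everything
// here is illustrative, not the real dispatch.

/// Linux x86_64 syscall numbers we might minimally support.
const SYS_WRITE: u64 = 1;
const SYS_EXIT_GROUP: u64 = 231;

/// Called directly by the patched MUSL stub: the syscall number and arguments
/// arrive as ordinary function arguments rather than in kernel-visible
/// registers, so the whole "syscall" costs a near call and a return.
#[no_mangle]
pub extern "C" fn emulated_syscall(
    nr: u64,
    a0: u64,
    a1: u64,
    a2: u64,
    _a3: u64,
    _a4: u64,
    _a5: u64,
) -> i64 {
    match nr {
        SYS_WRITE => {
            // Pretend the write succeeded and consumed every byte.
            let _fd = a0;
            let _buf = a1 as *const u8;
            a2 as i64
        }
        SYS_EXIT_GROUP => {
            // In a real emulator this would hand control back to the fuzzer.
            panic!("guest called exit_group({})", a0)
        }
        // Anything unimplemented returns -ENOSYS so it fails loudly.
        _ => -38,
    }
}
```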
The main point of this isn't performance, it's determinism, but performance is a nice side effect. A normal syscall instruction involves a context switch to the kernel, potentially cr3 swaps depending on CPU mitigation configuration, and the same to return back. This can easily be hundreds of cycles. A `call` instruction to something that handles the syscall is on the order of 1-4 cycles.
While for syscalls this isn't a huge deal, it's even more pronounced when it comes to KVM hypercalls. Transitions to a hypervisor are very expensive, and in this case the kernel, the hypervisor, and QEMU (eg. device emulation) will all be running at the same privilege level, and there won't be a weird QEMU -> OS -> KVM -> other guest OS device -> KVM -> OS -> QEMU transition on every device interaction.
But then again, it's mainly for determinism. By emulating Linux deterministically (eg. not providing entropy through timestamps or other syscall returns), we can ensure that QEMU has no source of external entropy, and thus will always do the same thing. Even if it uses a random-seeded hash table, the seed would be derived from syscalls, and thus will be the same every time. This determinism means the guest will always do the same thing, down to the instruction. Interrupts happen on the same instructions, context switches do, etc. This means any bug, regardless of how complex, will reproduce every time.
All of this syscall emulation + determinism I have also done before, in a tool called tkofuzz that I wrote for Microsoft. That used Linux emulation + Bochs, and it was written in userspace. It has proven incredibly successful and it's what most researchers at Microsoft are using now. That being said, Bochs is about 100x slower than native execution, and now that people have gotten a good hold of snapshot fuzzing (there's a steep learning curve), it's time for a more performant implementation. With QEMU we get this with a JIT, which at least gets us a 2-5x improvement over Bochs while still "emulating", but even more value could be found if we get the KVM emulation working and can use a hypervisor. That being said, I do plan to support a "mode" where guests which do not touch devices (or more specifically, snapshots which are taken after device I/O has occurred) will be able to run without QEMU at all. We're really only using QEMU for device emulation + interrupt control; thus, if you take a snapshot to a function that just parses everything in one thread, without process IPC or device access (which is rarer than you'd think; when you "read" from a disk, you're likely just hitting OS RAM caches, and thus not touching devices), we can cut out all the "bloat" of QEMU and run in a very, very thin hypervisor instead.
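As a toy illustration of what "no external entropy" means (again just a sketch, not the real handlers): time and randomness can come from counters and fixed-seed generators inside the emulator, so every run of the guest observes exactly the same values.

```rust
/// Deterministic "time": a virtual nanosecond counter that only advances
/// when the guest asks, so identical runs see identical timestamps.
struct DeterministicClock {
    virtual_nanos: u64,
}

impl DeterministicClock {
    fn new() -> Self {
        Self { virtual_nanos: 0 }
    }

    /// What a clock_gettime() emulation might hand back to the guest.
    fn now(&mut self) -> u64 {
        // Advance by a fixed amount per query instead of reading the TSC.
        self.virtual_nanos += 1_000;
        self.virtual_nanos
    }
}

/// Deterministic "randomness": a fixed-seed xorshift stream, so a guest that
/// seeds a hash table from getrandom() gets the same seed every run.
struct DeterministicRng {
    state: u64,
}

impl DeterministicRng {
    fn new() -> Self {
        Self { state: 0x1234_5678_9abc_def0 }
    }

    fn next(&mut self) -> u64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        self.state
    }
}

fn main() {
    let mut clock = DeterministicClock::new();
    let mut rng = DeterministicRng::new();
    // Every run of this program prints exactly the same values.
    println!("time = {}, rand = {:#x}", clock.now(), rng.next());
}
```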
In fuzzing it's critical to have ways to quickly map and unmap memory, as most fuzz cases last for only hundreds of microseconds. This means that after a few hundred microseconds, I want to restore all memory back to the state "before I handled user input" and continue again. This is extremely slow in every conventional operating system, and there's really no way around it. It's of course possible to write a driver or use CRIU, but neither is exactly the solution that is needed here. I'd rather just make an OS that trivially runs in KVM/Hyper-V/Xen, and thus can run in a VM to get the cross-platform support, rather than writing a driver for every OS I plan to use this on.
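To give a flavor of why a purpose-built reset path is so much cheaper, here's a toy differential-restore sketch (illustrative only; a real hypervisor would get dirty-page information from EPT dirty bits or write-protection faults rather than logging writes by hand):

```rust
// Keep a pristine copy of guest memory, remember which pages were written
// during the fuzz case, and copy back only those pages on reset.

const PAGE_SIZE: usize = 4096;

struct GuestMemory {
    snapshot: Vec<u8>,       // pristine copy taken "before user input was handled"
    current: Vec<u8>,        // memory the fuzz case actually mutates
    dirty_pages: Vec<usize>, // page indices written since the last reset
}

impl GuestMemory {
    fn new(size: usize) -> Self {
        Self {
            snapshot: vec![0; size],
            current: vec![0; size],
            dirty_pages: Vec::new(),
        }
    }

    /// Write into guest memory, logging the dirtied pages.
    fn write(&mut self, addr: usize, data: &[u8]) {
        self.current[addr..addr + data.len()].copy_from_slice(data);
        for page in addr / PAGE_SIZE..=(addr + data.len() - 1) / PAGE_SIZE {
            if !self.dirty_pages.contains(&page) {
                self.dirty_pages.push(page);
            }
        }
    }

    /// Reset cost scales with how much the fuzz case touched, not with the
    /// total size of the guest, which is what makes sub-millisecond fuzz
    /// cases viable.
    fn reset(&mut self) {
        for &page in &self.dirty_pages {
            let start = page * PAGE_SIZE;
            let end = start + PAGE_SIZE;
            self.current[start..end].copy_from_slice(&self.snapshot[start..end]);
        }
        self.dirty_pages.clear();
    }
}

fn main() {
    let mut mem = GuestMemory::new(16 * 1024 * 1024);
    mem.write(0x1000, b"fuzz case input");
    mem.reset(); // only the one dirtied page is copied back
    assert_eq!(&mem.current[0x1000..0x1004], &[0, 0, 0, 0]);
}
```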
Your YouTube looks awesome, and I just followed you on Twitch. Looking forward to a marathon stream. :)
A question about fuzzing and determinism:
It seems like this requirement of determinism is connected to (and reinforces) a brute force approach to discovering problematic inputs.
This precludes strategies like performing a truly stochastic search (where you put complete faith in the randomness) over your input space.
Are people attempting this kind of thing? Are there additional requirements to fuzzing that make probabilistically finding a thin trajectory through the input space undesirable?
Determinism is already fairly important in fuzzing for 2 major reasons.
One is that determinism makes it easier to triage bugs. If I find a crash while fuzzing, but there's no way to reproduce it, it's pretty much never going to get fixed (unless the call stack from the first crash is obvious enough to prepare a fix). For fuzzing systems, this is effectively mandatory. Imagine a memory allocation failure leading to a NULL-deref: the odds of hitting that same bug again are low because it requires going OOM at the same point in a subsequent execution. Snapshot fuzzing (each fuzz case starts from a snapshotted state of memory/registers/whatever) mitigates some of this, but there's still noise from context switches that will affect RNGs, which will affect heap layouts, which will affect everything on the system. That being said, most fuzzing out there publicly is not systems fuzzing, and thus this is typically not something people think about.
But two is what most people use determinism for. In modern fuzzing, coverage guidance is pretty much mandatory. This means saving off an input whenever it hits new code, so that it can be built upon. At a very simple level, this means a problem which is 256^4 turns into 256*4, as all requirements do not need to be satisfied simultaneously; as long as the previous requirements cause new code to get hit, they can be built upon. Of course, if the program does not behave the same way every time, the noise can start to erode the value provided here, since you're not actually building upon the same thing.
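A toy example of that 256^4 vs 256*4 point (my own illustration): a 4-byte magic check compiled as four separate comparisons can be solved one byte at a time, because each newly-matched byte produces new coverage and the input gets kept as a stepping stone.

```rust
// Made-up target, not from the thread. Each byte comparison is its own
// branch, so matching one more byte lights up new coverage and a
// coverage-guided fuzzer keeps that input instead of having to guess all
// four bytes at once.
fn parse_header(input: &[u8]) -> bool {
    if input.len() < 4 {
        return false;
    }
    if input[0] != b'F' { return false; } // new coverage once byte 0 matches
    if input[1] != b'U' { return false; } // ... then byte 1, and so on
    if input[2] != b'Z' { return false; }
    if input[3] != b'Z' { return false; }
    true // reaching here blind is a 1-in-256^4 event; guided, it's ~256*4 tries
}

fn main() {
    assert!(parse_header(b"FUZZ"));
    assert!(!parse_header(b"FUZA"));
}
```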
I think you're mixing two different concepts up here.
Determinism in the context of fuzzing is related to the reproducibility of a program state. It's deterministic in the sense that all the inputs to the system are known and controlled. This allows us to repeat all inputs and reproduce the exact same behaviour as before, e.g. an error state we stumbled upon or an interesting program state we want to continue exploring.
This in no way precludes sampling the input space stochastically. Brute forcing by sampling all possible inputs sequentially is usually untenable and wasteful. However, once you do encounter a new program state, you'll be able to perfectly recreate it forever.
I think you're right, I was mixing two concepts and my question wasn't really about determinism.
Writing a specialized OS suggests to me that someone is very focused on... the best way I can describe it is cutting a fat trajectory through the input space. I am curious if anyone is spending their effort on sparser (but more intelligent) sampling of the input space instead.
Yes, there's a lot of work being done on more intelligent fuzzing. To throw some terms into the mix, there's coverage-guided fuzzing (which is now an old technique), concolic testing (which combines concrete execution with symbolic execution in order to reach new branches in a targeted way) and grammar fuzzers (which generate valid inputs according to a grammar).
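For a concrete (toy) example of the last of those, a tiny grammar fuzzer for arithmetic expressions might look like this; it's just a sketch, not anyone's actual tool:

```rust
// Inputs are generated from a grammar, so every test case is syntactically
// valid and exercises code past the parser's error paths.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        // xorshift, good enough for input generation
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// expr := number | "(" expr op expr ")"
fn gen_expr(rng: &mut Rng, depth: u32) -> String {
    if depth == 0 || rng.next() % 2 == 0 {
        (rng.next() % 100).to_string()
    } else {
        let ops = ["+", "-", "*", "/"];
        let op = ops[(rng.next() % 4) as usize];
        format!("({}{}{})", gen_expr(rng, depth - 1), op, gen_expr(rng, depth - 1))
    }
}

fn main() {
    let mut rng = Rng(0xdead_beef);
    for _ in 0..5 {
        // Each generated string would be fed to the parser under test.
        println!("{}", gen_expr(&mut rng, 3));
    }
}
```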
These are not really mutually exclusive with the type of work gamozolabs is doing because even with hyperintelligent input generation, you still ideally want raw speed.
I’ve been watching this guy stream on Twitch and I can tell you that he is legit.
Also his streams are often insanely long, going 7 to 13 hours. So I only ever watch his streams live for a while and then I catch the remainder on VOD.
He also has a YouTube page with archive of past streams beyond the retention of Twitch.
Honestly the knowledge he shares is so interesting that I selfishly did not want other people to even know about it. But realistically speaking I am not going to have time to make real use of the knowledge myself anytime soon.
He puts out quality content and he deserves all the attention he can get. And even though the competition to find security bugs and earn bounties might become so tough that neither you nor I ever get to find one and claim some money, the products that we all use will be more secure the more people in the world work on finding these bugs and reporting them.
Do you have any thoughts on how to approach the videos? I have a good OS and fuzzing background, but an 8-hour video seems like an ordeal, and harder to extract value from than something written.
I think my best advice would be that you tune in on one of the streams on Twitch when it is live and ask him about it. Then maybe the two of you can figure out what is most relevant to you of his content compared to what you already know?
Testing code via semi-random inputs[1]. The most common fuzzers, AFL-Fuzz[2] and libFuzzer[3], are coverage-guided: they compile the program with special instrumentation to determine code coverage, then call the program repeatedly, changing the inputs via a genetic algorithm to try to maximize the code paths executed. When unexpected behavior is observed (typically the test harness crashing), the fuzzer saves the test's input for future use.
Basically automatic generation of test case inputs. It's non-deterministic, so it won't always find problems, but it can save a lot of manual effort.
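For a concrete picture, a minimal harness for one of these coverage-guided fuzzers might look like the following (a Rust sketch using the libfuzzer-sys crate as set up by cargo-fuzz; `parse_header` is a made-up stand-in for whatever code you actually want to test):

```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

// The fuzzer calls this closure over and over with mutated inputs and keeps
// any input that reaches new coverage.
fuzz_target!(|data: &[u8]| {
    // Panics, crashes, or sanitizer reports in here are what get flagged.
    let _ = parse_header(data);
});

/// Hypothetical function under test.
fn parse_header(data: &[u8]) -> bool {
    data.starts_with(b"FUZZ")
}
```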