Unikernels, meet Docker (unikernel.org)
120 points by amirmc on Nov 19, 2015 | 31 comments



We'll be releasing the code sometime tomorrow afternoon. We need a bit of extra time to clear out the unused code, make a readme, and also get some rest. :)

I'll add a link to the blog post when it's live.


Did you do that already?


Not yet (sorry) but will do soon!


Just to close the loop, we did release the code.

See the post at: http://unikernel.org/blog/2015/contain-your-unikernels/


This reminds me of a recent presentation posted here about OpenBSD's upcoming support for "pledges" [1].

I suppose the difference here is that pledges use a single kernel but restrict kernel interfaces for each process while the unikernel approach creates a kernel subset for each VM (and thus each process, for VMs dedicated to a single process).

Can anyone knowledgeable comment on the advantages or disadvantages of each? I'm guessing unikernels will be more portable (only OpenBSD is doing pledges, to my knowledge) and more popular with the containerization movement. Can pledges accomplish the same objective with better use of system resources?

Edit: clarity about portability point

[1] http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man2/...


They're compatible approaches: you could use pledges inside a unikernel for exactly the same reason, to further reduce the application's attack surface. A unikernel includes only what the application needs, but it can't take advantage of the application's needs shrinking over time[0], which is what pledge(2) allows.

[0] One of the reasons for pledge(2) was the realisation that many applications have an initial setup phase followed by a "steady state", and the steady state needs significantly fewer services than the setup. For example, setup might require reading config files, while the steady state just listens on a socket. So you can call pledge(2) multiple times and progressively reduce the application's abilities (attempting to increase them will fail with EPERM).
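
To make that concrete, here is a minimal sketch of the setup-then-steady-state pattern, assuming OpenBSD with pledge(2) available; the promise strings are illustrative rather than taken from any real daemon:

    #include <err.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* Setup phase: we still need to read configuration files. */
        if (pledge("stdio rpath inet", NULL) == -1)
            err(1, "pledge");

        /* ... read config, open the listening socket ... */

        /* Steady state: drop filesystem access; only stdio and the
         * network remain.  A later pledge() call can only narrow the
         * set (widening it fails with EPERM), and using a facility
         * that was dropped kills the process. */
        if (pledge("stdio inet", NULL) == -1)
            err(1, "pledge");

        /* ... main event loop ... */
        return 0;
    }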


Neat!


(I'm a slacking OpenBSD developer and a unikernel hacker, so I guess I should respond :-)

Pledging is the most usable form of priv-dropping I've seen yet in any OS, since it was designed by looking at the "flow" of how a typical daemon interacts with the kernel, rather than around a low-level syscall interface that the typical programmer has no idea about. I've used many of the different privilege-dropping interfaces in OpenBSD over the years, so to recall the journey:

- Separating out syslogd back in 2003 was a fun adventure in manual privsepping before it became mainstream; I think just sshd used it in base before that. This required building an automaton of how the privileged messages from syslog would work, and carefully coordinating the state machine and message passing around that. http://marc.info/?l=openbsd-cvs&m=105967566808306&w=2

At this point we had two processes, but the root process couldn't do fine-grained dropping of privileges and just had to be carefully audited by lots of people.

- systrace came along, which allowed policies to be built about which syscalls could be called. The policy language was limited, so I got really excited by the possibilities and wrote a DSL to make it easier to build systrace policies: http://anil.recoil.org/papers/sam03-secpol.pdf. In the end I gave up on systrace since it was so brittle: unrelated changes in libc, or dependent libraries changing the order of system calls, would cause apps to break all the time.

- Pledge is almost the opposite of systrace. You provide a series of human-readable strings, and the OS takes care of mapping those to groups of syscalls. This makes so much more sense, since the person making the change can also update the pledge, and applications don't break! If you look at OpenBSD -current, over half of the base daemons are now pledging. At no point are there insanely complex policies like SELinux, nor are there the race conditions of systrace. In fact, the most similar approach I've found is Mac OS X entitlements, which also let the application specify what functionality it would like (as opposed to what syscalls it needs).

So, what's the relation to unikernels? Well, they are completely the opposite approach. Instead of starting with a wide kernel interface and then restricting it, unikernels start with a very narrow interface (the hypervisor) and build up higher level abstractions as they are demanded by the application.

This happens via a series of libraries, and so the programmer can choose at compile time how to weave in privilege levels and the use of hardware enforcement (such as processes).

I can see both pledge and unikernels converging in the future, specifically by a linker that would be aware of the pledges that a library needs, and mapping the appropriate hardware enforcement into the resulting unikernel. I'm not aware of anyone actually working on this, but get in touch with me if you are interested... (anil@recoil.org)


What this is ignoring is that the hypervisor is already a full operating system. If the Linux people finally make isolation secure, I see no future for unikernels.


> If the Linux people finally make isolation secure, I see no future for unikernels.

You're forgetting that a unikernel is a library OS and not tied to a particular hypervisor at all. MirageOS code can currently be compiled to target:

- the Xen hypervisor via MiniOS, with Mirage-supplied implementations of XenStore, device drivers and TCP/IP

- bare metal and the KVM hypervisor via Rump Kernel

- UNIX binaries via tuntap (which work great with Linux containers).

Future backends are the same story: the MirageOS frontend just needs to swap out and link in the right libraries for the desired platform. And even when Linux containers get a complete isolation story, if you build applications as unikernels you can also choose to isolate kernel components that will never be covered by the current Linux container architecture (such as the TCP/IP stack).


Or you could use LX-branded zones on Illumos, as Joyent's Triton does. Too bad Joyent's public cloud isn't competitive with the others on price.

Edit: So as long as hardware virtualization is dominant in public clouds, a unikernel is a nice optimization for applications that only require a single process.


A hypervisor has a much smaller attack surface than a full-blown OS. Xen, for instance, has fewer than 150,000 lines of code. The Linux kernel alone has about 15 million, and a full Linux distro probably has around 200-300 million. So even the kernel of a mainstream OS is about two orders of magnitude (100x) more code, with correspondingly more potential for exploits.


Is there any Xen-based public cloud that's not running a full conventional OS kernel (Linux or otherwise) in dom0? If not, then the theoretical smaller TCB of Xen doesn't matter in practice. And of course, some major public clouds use KVM, so they're effectively using a full Linux kernel as the hypervisor.


Most of those 15 million lines will be dead code on your machine. There's architecture-specific code that won't even get compiled under any circumstances, and most of the rest is drivers, very few of which will ever run on your computer because you don't have the corresponding hardware.

You can't just take the lines of code as a comparison. You have to look at how much code is actually exposed to potential attackers.


I think rump kernels are definitely the future. It will be interesting to see if any of the other *nixes attempt to reorganize their kernels a la NetBSD to make them potentially usable as rump kernels.


Personally I'd like to see the monolithic Unixes decomposed into multiserver designs, as was attempted for Linux with the IBM SawMill project in the late 90s.


Personally I'd like to see drivers and operating systems decoupled. No single OS model solves all problems. Of course no driver code scales to unlimited uses, but at least steps should be taken to ensure that current driver code goes as far as it can. (Then again, I already did that, so now I'm just putting my mouth where my money went.)


Personally I much prefer the approach Plan 9 took to decomposition over what SawMill attempted.


Going the Plan 9 way would be a blank-slate reset of everything, whereas doing a retroactive modularization would retain the existing environment.


I agree; even Microsoft is doing it, with Windows Containers running on Hyper-V directly.


Why rump specifically?


I think because they play very nicely with legacy software, without needing any modifications, i.e. the demo has unmodified, off-the-shelf versions of nginx, MySQL and PHP.

The other unikernel projects (i.e. MirageOS and HaLVM) take a clean-slate approach, which means application code also has to be in the same language (OCaml and Haskell, respectively). However, there's also ongoing work to make pieces of the different implementations play nicely together (but it's early days).


> I think because they play very nicely with legacy software, without needing any modifications, i.e. the demo has unmodified, off-the-shelf versions of nginx, MySQL and PHP.

That's a good argument but seems to be one for rump being the past's future (unikernel-ifying legacy applications) rather than for it being the future itself.


I do see what you mean, but asking people to re-write all the things isn't a successful strategy for achieving mass adoption.

Being able to build your unikernel (micro)services using legacy things and then swapping out certain services for clean-slate versions seems like a much more palatable approach. This is why unikernels fit so well with the microservices approach — you only have to re-write things one piece at a time if you choose to.


> Being able to build your unikernel (micro)services using legacy things and then swapping out certain services for clean-slate versions seems like a much more palatable approach. This is why unikernels fit so well with the microservices approach — you only have to re-write things one piece at a time if you choose to.

That's kind of my point: in that worldview rump kernels are not really the future; they're a transitional phase bridging the present and the future.


You can't ignore "legacy" applications if your new technology needs acceptance and deployment, first and foremost, to get out of the toy stage.

Besides, think of all the future cool stuff you can more easily build on top of rump kernel (almost a year in the IRC channel and I still can't recall how I'm supposed to refer to... it) rather than being "stuck" with just Erlang on LING or Haskell on HaLVM etc.


I'm not saying you can ignore legacy applications, but "the future" implies it's the goal. If rump is only for legacy applications until they're reworked or replaced, it's not the future; it's a transitional tool, and the future doesn't contain rump.


Often the two are one and the same. Just look at a modern x86 chip.


This looks great. Hopefully this can be configured to set off a big fat alarm if any unexpected syscalls get called. Will be a huge security win.


Well, seccomp already does this, no?
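
Roughly, yes. With a seccomp-BPF filter you whitelist the syscalls you expect and choose what happens on anything else: kill the process outright, or deliver SIGSYS as the "big fat alarm". A minimal, Linux-only sketch follows; the three-syscall allowlist is purely illustrative, and real code would also check the architecture field in seccomp_data:

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* If the loaded syscall number equals nr, allow it; otherwise
     * fall through to the next rule. */
    #define ALLOW(nr) \
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

    int
    main(void)
    {
        struct sock_filter filter[] = {
            /* Load the syscall number from struct seccomp_data. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            ALLOW(__NR_read),
            ALLOW(__NR_write),
            ALLOW(__NR_exit_group),
            /* Anything unexpected: kill the process.  Use
             * SECCOMP_RET_TRAP instead to get a catchable SIGSYS
             * ("big fat alarm") rather than an outright kill. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };

        /* Needed so an unprivileged process may install the filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1) {
            perror("prctl");
            return 1;
        }

        write(STDOUT_FILENO, "still alive\n", 12);
        return 0;
    }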


So what if I need to run CentOS and Ubuntu containers on a CentOS host?





