It's the same codebase AFAIK, just ported to multiple kernels.
I think part of the issue here is how overloaded the term "bare metal" has become. In context, I believe the parent is contrasting a unikernel on a hypervisor with a container on a kernel: there's one coarse privilege boundary in both cases, which reduces the obvious benefits of a unikernel design.
> It's the same codebase AFAIK, just ported to multiple kernels.
Presumably plenty of the code is specific to Windows.
Docker/Linux works by using various features of the Linux kernel to, well, contain containers. As I understand it, this is in contrast to BSD Jails and Solaris Zones, where the whole job is done 'officially' by the kernel.
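
To make that concrete, here's a minimal C sketch (not Docker's actual code, just an illustration of the mechanism) of how Linux containment is assembled piecemeal: each unshare(2) flag detaches one independent kernel facility, whereas a Jail or Zone is a single cohesive kernel object:

    /* Minimal sketch: Linux "containment" is a composition of separate
     * kernel features, selected flag by flag, not one jail-like object. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* New mount, hostname, and PID namespaces -- three independent
         * facilities; typically requires root or CAP_SYS_ADMIN. */
        if (unshare(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID) == -1) {
            perror("unshare");
            return EXIT_FAILURE;
        }
        sethostname("sandbox", 7);  /* visible only inside the new UTS ns */

        /* The new PID namespace applies to children, not the caller. */
        pid_t child = fork();
        if (child == 0) {
            printf("child pid in its namespace: %d\n", (int)getpid()); /* 1 */
            return EXIT_SUCCESS;
        }
        waitpid(child, NULL, 0);
        return EXIT_SUCCESS;
    }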
> In context, I believe the parent is contrasting a unikernel on a hypervisor with a container on a kernel: there's one coarse privilege boundary in both cases, which reduces the obvious benefits of a unikernel design.
> Presumably plenty of the code is specific to Windows.
There's some code specific to Windows, but way less than you might think.
> Docker/Linux works by using various features of the Linux kernel to, well, contain containers. As I understand it, this is in contrast to BSD Jails and Solaris Zones, where the whole job is done 'officially' by the kernel.
Yes, that's technically true, but it doesn't matter much to the discussion at hand. In all these cases (including if Solaris Zone/FreeBSD Jail support were added to Docker), Docker has roughly the same amount of work to do, and that work looks very similar. Linux's containerization features are absolutely more ad hoc than NT's, Solaris's, or FreeBSD's, but a complete user-space component for managing those containers ends up doing about the same work either way. The difference mainly comes in:
* What happens with an incomplete container-managing daemon. When Linux gains a new containerizable resource, it has historically arrived as a new interface, and until the container-managing daemon grows support for it, that resource stays attached to the root namespace. Systems with a cohesive concept of a container have a better chance of isolating a new resource from the root namespace by default, without configuration. A complete implementation in both cases still requires manual config and setup.
* Linux's containerization features can be split apart and used in ways not originally intended. You can see the same pattern in clone(2): fork and thread creation have been coalesced into a single syscall, and you pass a bitmask of which resources to share. Share nothing? That's essentially a fork. Share the virtual address space, file descriptor table, tgid, etc.? You've just created a thread. Because of that, you can do genuinely neat things like sharing a virtual address space but not the file descriptor table, if that's useful to you for some reason (see the sketch after this list).
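
A minimal sketch of that last combination; the variable names are mine, but the flags are real clone(2) flags:

    /* Share memory (CLONE_VM) but not the fd table: more sharing than
     * fork(), less than a thread (which also passes CLONE_FILES etc.). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared_counter = 0;

    static int child_fn(void *arg) {
        (void)arg;
        shared_counter = 42;   /* parent sees this: address space is shared */
        close(STDOUT_FILENO);  /* parent doesn't: its fd table is separate */
        return 0;
    }

    int main(void) {
        char *stack = malloc(1024 * 1024);
        if (!stack) return EXIT_FAILURE;

        /* The stack grows down, so pass its top to clone(). */
        pid_t pid = clone(child_fn, stack + 1024 * 1024,
                          CLONE_VM | SIGCHLD, NULL);
        if (pid == -1) { perror("clone"); return EXIT_FAILURE; }

        waitpid(pid, NULL, 0);
        printf("counter = %d, stdout still open\n", shared_counter); /* 42 */
        free(stack);
        return EXIT_SUCCESS;
    }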
At the end of the day, it's more or less the same work for Docker regardless of the mechanism used to push the container's config to the kernel.
To correct my earlier point then: containers, pretty much by definition, run atop a shared kernel, not on bare metal.