If you take a look at LXC and LXD, I would very much argue you can use them as a security boundary. One of the main problems with Docker is that the most powerful isolation primitive available in Linux -- user namespaces -- is not used by default and doesn't fully utilise the underlying feature. LXC uses unprivileged user namespaces by default, and LXD defaults to user namespaces as well. You can even isolate containers from each other with a basic config option.
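With LXD, for instance, something like this should do it (a hedged sketch; `security.idmap.isolated` is the option I have in mind):

```
# map this container to its own non-overlapping uid/gid range on the host,
# so its uids don't coincide with any other container's
lxc config set mycontainer security.idmap.isolated true
lxc restart mycontainer
```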
All of that being said, this bug is caused by container runtimes trusting the rootfs too much. This is something I've been trying to improve but it's definitely not a trivial problem (lots of things require trusting the container processes in specific ways due to limitations in the corresponding kernel APIs -- though I am working on fixing those too).
Apparently even user namespaces can't be trusted for secure isolation, so much so that Arch Linux has them disabled by default[1]. That said, it's possible that security has improved since then, and I don't know when the most recent user namespace vulnerability was found.
That mail is outdated. Arch, like some other distros such as Debian, now applies a kernel patch that allows toggling userns support via the `kernel.unprivileged_userns_clone` sysctl.
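For example, on a kernel carrying that patch you can check and flip the setting at runtime:

```
# check the current state (1 = unprivileged user namespaces allowed)
sysctl kernel.unprivileged_userns_clone

# disable or re-enable at runtime (persist via /etc/sysctl.d/ if you want it to stick)
sudo sysctl -w kernel.unprivileged_userns_clone=0
sudo sysctl -w kernel.unprivileged_userns_clone=1
```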
Oh, I'm aware that you can toggle it via sysctl, but it's still not on by default. That said, I can't find any user namespace CVE from 2019, only 2018, so maybe it's safe enough now. I guess "safe enough" is the operative phrase. If you really worry about the kernel's attack surface, you'll use a separation kernel, VMs, or separate machines altogether.
The issue is not that user namespaces cannot be used for secure isolation -- the problem is that they have been used for privilege escalation in the past. They are definitely more secure than they were 5+ years ago, and there are ways of restricting their use on running systems through a couple of sysctls (in addition to the out-of-tree patch that Debian and Arch use).
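On a mainline kernel (without the out-of-tree patch), the knob I have in mind is `user.max_user_namespaces`:

```
# setting the limit to 0 prevents any new user namespaces from being created
sudo sysctl -w user.max_user_namespaces=0
```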
But, in the case of running things in containers, you can stop exploits of user namespaces through seccomp filters that block `unshare(CLONE_NEWUSER)` -- Docker does this by default.
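A quick way to see that, hedged since the default profile has changed across Docker versions: with the default seccomp profile, creating a user namespace from inside the container should be refused, while it may work with seccomp disabled (depending on the host's userns settings).

```
# default profile: expected to fail with "Operation not permitted"
docker run --rm ubuntu unshare --user echo ok

# seccomp disabled (don't do this in production): the same call may succeed
docker run --rm --security-opt seccomp=unconfined ubuntu unshare --user echo ok
```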
I wish it were possible to run Docker as a regular user, or run a separate Docker-in-Docker in CI. (I assume the Docker CI runners on things like GitLab are either running as root or sharing the host daemon via `-v /var/run/docker.sock:/var/run/docker.sock`, since Docker-in-Docker is only recommended for actually developing Docker.)
It is possible to run Docker in Docker in CI. At a previous job I built containers that ran docker as Bamboo build agents. The containers did not use the host's docker socket; each had its own socket and its own `/var/lib/docker` directory. However, the containers have to run docker as root (I started docker and then dropped privileges to run the Bamboo agent) and have to run with the `--privileged` option. The advantage of doing it that way was that each agent's image storage was cleaned up along with its container and kept separate from the host's. The disadvantage was that you have to use loopback-based storage, which makes docker a little slower. I don't think there's a huge difference in security, since docker would end up being accessible via the socket anyway, and by dropping privileges for the build agent you lose the capabilities that you get from `--privileged`.
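Roughly what that looked like (a sketch from memory; the image and volume names are placeholders):

```
# each agent gets its own /var/lib/docker, kept separate from the host's
docker volume create agent1-docker-storage

# the image's entrypoint starts dockerd as root, then drops privileges
# and launches the Bamboo agent
docker run -d --name agent1 \
  --privileged \
  -v agent1-docker-storage:/var/lib/docker \
  my-bamboo-agent-image
```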
The issue is that if you want to communicate with the outside world, you need to create a network bridge, which only a sufficiently privileged user on the host system can do.
An unprivileged-user docker daemon would be limited to either communicating with an isolated network namespace on the parent side or doing userspace forwarding of network traffic. Or it would require a privileged helper for the network parts.
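You can see the privilege requirement directly (exact error text varies by distro):

```
# as an unprivileged user, creating a bridge is refused
$ ip link add name br-test type bridge
RTNETLINK answers: Operation not permitted

# with CAP_NET_ADMIN (e.g. via sudo) the same command works
$ sudo ip link add name br-test type bridge
```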
Containerization certainly ought to be an isolation layer to boost security. What's unfortunate is how little we can rely on docker to provide any additional security.
Given that docker is simply an abstraction of various kernel functions, it is arguable that eliminating the docker piece and directly managing the kernel functions would be more secure.
The problem there is that it involves a significant learning curve, and when you get to that point you start evaluating the ROI of containers. Deployment is easy. Operations is easy. Until either one of the two fails. Then debugging becomes a seriously complex problem.
Fairly few vulnerabilities have been discovered in docker itself.
The last one was in runc, the underlying container executor... these are very hard to get right.
This is kind of a stinky one, though, because docker runs in the root context (unless you are experimenting with the rootless docker mode).
You could take this same argument to absurd extremes: the kernel is just an abstraction over the hardware, so surely you could ditch the kernel, manage the hardware yourself, and it would be more secure.
The reality is, in both cases, no you can't. Doing this stuff right requires expertise, and it generally needs more than one or two people looking at it.
Absolutely correct. I've got a cross-compile toolchain set up right now that needs Debian (Ubuntu won't do because of different glibc versions in the cross-compile toolchains). I used debootstrap to make a basic Debian Stretch filesystem, then chroot into it, apt-get the remaining pieces, and run the build. Works like a charm, no Docker required. And, unlike containers, it's intentionally persistent, so future builds go very quickly.
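The whole setup is only a few commands (a rough sketch; the cross-toolchain package is just an example):

```
# build a minimal Debian Stretch rootfs and enter it
sudo debootstrap stretch ./stretch-rootfs http://deb.debian.org/debian
sudo chroot ./stretch-rootfs /bin/bash

# inside the chroot: pull in the remaining pieces and build
apt-get update
apt-get install -y build-essential gcc-arm-linux-gnueabihf
```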
Containers' raison d'être is to provide isolation. Just like chroot, they are used for security.
If the security mechanism fails, it's good to have defense-in-depth, as usual.
From reading the article it appears the issue is in Docker, not the container mechanism offered by the kernel upon which Docker builds.
Isolation doesn't mean secure. Some containerisation solutions are designed to provide a security mechanism. Other containerisation solutions, such as Docker, are designed as a development and/or orchestration tool.
I do agree that it's important to have defence in depth.
His statement was: "but containers and Docker specifically shouldn't be used to isolate systems for security".
Well, thousands of companies, such as ISPs offering VPSes, are using containers for exactly that reason. Containers use kernel namespaces and cgroups under the hood -- namespaces isolate what a collection of processes can see, while cgroups limit and account for their resource usage (CPU, memory, disk I/O, network, etc.). As long as there aren't any bugs in those kernel features, security is provided.
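For example, docker's resource flags are just thin wrappers over cgroup controllers (a hedged sketch; the raw paths assume a cgroup v1 layout):

```
# via docker: memory, CPU, and pid limits enforced through cgroups
docker run --rm --memory=256m --cpus=1 --pids-limit=100 alpine true

# or directly against the cgroup filesystem
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
```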
Saying containers shouldn't be used for security is like saying kernel functions shouldn't be used for security.
I'm not suggesting all containers shouldn't be used for security. I'm just saying that docker - specifically - isn't an isolation tool; it wasn't designed primarily as a security tool. There are other solutions on Linux if you want security - such as LXC and OpenVZ - which I suspect is what those ISPs you're referring to use.
The problem with docker isn't at the kernel level, it's the userspace tooling. It's pretty insecure by default. For example, it creates bridged networks as the default network interface and actively encourages (by design) developers to run code as root (since creating non-root users becomes a manual RUN command). Then you have vulnerabilities in the userspace tools to contend with, in addition to the same concerns about sharing a kernel that crop up in any discussion of security and containerisation. That said, there are some things it does right from a security standpoint, but generally speaking docker is a tool you need to harden rather than something that comes hardened.
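As a rough illustration of that hardening (`my-app-image` is a placeholder, and which flags make sense depends on the workload):

```
# run as a non-root user, drop all capabilities, forbid privilege escalation,
# mount the root filesystem read-only, and skip the default bridge network
docker run --rm \
  --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --read-only \
  --network none \
  my-app-image
```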
I don't hate docker though. It's a great productivity tool and it can be run securely if you have proper defence in depth. But I would advise against running docker as your only sandboxing. To be honest, I'd advise security at all levels regardless of the docker discussion anyway.
You do get container-based VPSes too, though you're right that they wouldn't be docker. Usually OpenVZ (last time I checked), but I've not kept up to date with LXC development.
The Unix process, with its uids, is also a form of isolation. But most reasonable people would guess that there are undiscovered privilege escalation bugs in any given kernel and thus be careful who is allowed to put code on a machine.
If you put docker inside a VM, your hypervisor is running in a zone, and you have different zones based on "role", then of course you get the benefits of the zone and the hypervisor.
The parent said "docker solves deployment, not isolation" -- if you get your isolation another way, then there's no issue with using docker.
>Run well written software. As a user. In a cgroup. With SELinux. On a VM. On Different Tin. With a security monitoring. Patch.
The analogy you're trying for is surely not that this is as likely to solve the deployment problem for most people as "Eat food. Not too much. Mostly Plants" is to solve the obesity epidemic for most people? Not at all?
There are things that can be done to enhance the defaults that docker currently provides (b/c defaults are hard to change when you have millions of users), but a process running in a default docker container is absolutely more secure than a process running outside of a container.
It should be used to ease deployment.