Like, copyfail and dirtyfrag would punch through containers, but also punch through SELinux.
User namespaces and optionally limited capabilities severely limit the usefulness of both of these exploits. K8s containers with user namespaces or rootless podman prevent host-root and only allow elevating to container root (host uid != 0) and cross container cache pollution (jump to other containers that use the same base image?)
Copyfail would punch through user namespaces to get root straight on the host. User namespaces only really protect you against vulnerabilities in non kernel applications.
Limited capibilities/seccomp policies did help, though. In my admittedly limited testing, some of the vulnerabilities wouldn’t work in podman, but they would work in docker. This wasn’t due to user namespaces, but this was due to podman having stricter capibilities/seccomp policies than docker by default.
This implies that even if you were using docker rootless, they still would have been able to break out and get root in one go.
User namespaces don’t add that much security, in my opinion. Assuming your container has a non root user inside, adding user namespaces just changes the amount of cve’s/zerodays from 2 to maybe 3:
With a rootful container it’s:
Escalate to root (can be done after or before container escape)
Escape container (can be done after or before escalation to root)
With user namespaces it becomes:
Maybe escalate to root within the container first to get privileges or access to binaries needed to take advantage of a container escape exploit
Escape container
Escalate to root
User namespaces are like every other Linux security solution, they are extremely complex, hard to configure, and they don’t actually add that much security for the trouble The article I linked above has a section about them:
Another example of these features is user namespaces. User namespaces allow unprivileged users to interact with lots of kernel code that is normally reserved for the root user. It adds a massive amount of networking, mount, etc. functionality as new attack surface. It has also been the cause of numerous privilege escalation vulnerabilities, which is why many distributions, such as Debian, had started to restrict access to this functionality by default
Their complexity makes them difficult to secure and execute properly, and adds a ton of attack surface to the kernel.
I mostly use docker and rootfull podman for everything. You already need a CVE/zeroday to do a container break out in the first place, so just keep your runtimes up to date and you should be good. If you really care about being proactive with security, and trying to preemptively prevent issues, user namespaces are not really a good solution, better is just to use a VM container runtime like kata or microvm, or a userspace kernel like gvisor or syd. They are pretty easy to use. You can just set them as your container runtime, in docker, podman, or kubes, and things will mostly just work. Those (and other kernel isolation solutions) would have actually beaten dirtyfrag, copyfail, and the like of recent vulns.
Small correction:
User namespaces and optionally limited capabilities severely limit the usefulness of both of these exploits. K8s containers with user namespaces or rootless podman prevent host-root and only allow elevating to container root (host uid != 0) and cross container cache pollution (jump to other containers that use the same base image?)
Kind of.
Copyfail would punch through user namespaces to get root straight on the host. User namespaces only really protect you against vulnerabilities in non kernel applications.
Limited capibilities/seccomp policies did help, though. In my admittedly limited testing, some of the vulnerabilities wouldn’t work in podman, but they would work in docker. This wasn’t due to user namespaces, but this was due to podman having stricter capibilities/seccomp policies than docker by default.
This implies that even if you were using docker rootless, they still would have been able to break out and get root in one go.
User namespaces don’t add that much security, in my opinion. Assuming your container has a non root user inside, adding user namespaces just changes the amount of cve’s/zerodays from 2 to maybe 3:
With a rootful container it’s:
With user namespaces it becomes:
User namespaces are like every other Linux security solution, they are extremely complex, hard to configure, and they don’t actually add that much security for the trouble The article I linked above has a section about them:
Their complexity makes them difficult to secure and execute properly, and adds a ton of attack surface to the kernel.
Dirty frag, for example, was using user namespaces as one of the ways it would escalate. Most container runtimes restrict user namespace creation within user namespaced containers (via seccomp/capabilities), so running dirtyfrag in a container wouldn’t have worked. But, at the same time, dirtyfrag is only possible in the first place because of the attack surface user namespaces cause.
I mostly use docker and rootfull podman for everything. You already need a CVE/zeroday to do a container break out in the first place, so just keep your runtimes up to date and you should be good. If you really care about being proactive with security, and trying to preemptively prevent issues, user namespaces are not really a good solution, better is just to use a VM container runtime like kata or microvm, or a userspace kernel like gvisor or syd. They are pretty easy to use. You can just set them as your container runtime, in docker, podman, or kubes, and things will mostly just work. Those (and other kernel isolation solutions) would have actually beaten dirtyfrag, copyfail, and the like of recent vulns.