Docker — Deep Dive

Under the hood: how Linux namespaces and cgroups power containers, the overlay filesystem, layer caching mechanics, and what Docker actually can't do for you.

What Docker Actually Is (And Isn’t)

Docker is not a virtualization technology. It’s a set of process isolation tools wrapped in a friendly interface. This distinction matters more than most tutorials admit — when Docker behaves unexpectedly, understanding what’s underneath is the only way to debug it properly.

At its core, Docker is three things:

A runtime (containerd) that manages the lifecycle of containers
A daemon (dockerd) that exposes a REST API for managing images and containers
A CLI (docker) that talks to that daemon

When you run docker run ubuntu, you’re making an API call to dockerd, which asks containerd to create a container from the ubuntu image. containerd in turn calls runc, the low-level runtime that actually starts the container process with specific isolation properties.
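To make the "CLI talks to a daemon over a REST API" point concrete, here is a minimal sketch that queries dockerd directly from the Python standard library. It assumes the daemon listens on the default Unix socket at /var/run/docker.sock and that your user may read it. The helper names (UnixHTTPConnection, docker_get) are made up for this illustration; /version and /containers/json are Docker Engine API endpoints.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills in the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def docker_get(path, socket_path="/var/run/docker.sock"):
    """Send GET <path> to the daemon and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"dockerd returned HTTP {response.status}")
        return json.loads(body)
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        print("Engine version:", docker_get("/version").get("Version"))
        # Roughly what `docker ps` asks the daemon for.
        for container in docker_get("/containers/json"):
            print(container["Id"][:12], container["Image"])
    except (OSError, RuntimeError) as exc:
        print("Could not query dockerd:", exc)
```

If the socket is missing or unreadable, the script reports the error instead of crashing, which is also a quick way to check whether a daemon is running at all.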

Linux Primitives: Namespaces

Docker containers look isolated because Linux namespaces create separate views of system resources. Docker works with seven namespace types (the user namespace is opt-in unless you run rootless Docker):

Namespace	Isolates
`pid`	Process IDs — container thinks it’s the only process tree
`net`	Network interfaces, routing tables, sockets
`mnt`	Filesystem mount points
`uts`	Hostname and domain name
`ipc`	Shared memory, semaphores, message queues
`user`	User and group IDs (rootless Docker uses this heavily)
`cgroup`	The container’s view of its own resource limits

When a container starts, the container process is placed into a new set of namespaces. From its perspective, it’s PID 1 with its own network stack on its own machine. From the host’s perspective, it’s just process #47823 with some unusual kernel flags set.
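The kernel exposes each process's namespace membership as symlinks under /proc/<pid>/ns, so you can see this for yourself. The sketch below reads those links. The function names are invented for this example, and it assumes a Linux /proc layout where link targets look like pid:[4026531836].

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def namespaces_of(pid="self", proc_root="/proc"):
    """Return {entry_name: namespace_inode} for one process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            result[name] = parse_ns_link(target)[1]
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
    return result


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Names of the namespaces that two processes have in common."""
    a = namespaces_of(pid_a, proc_root)
    b = namespaces_of(pid_b, proc_root)
    return sorted(name for name in a if b.get(name) == a[name])


if __name__ == "__main__":
    try:
        for name, inode in namespaces_of().items():
            print(f"{name:20} {inode}")
    except OSError as exc:
        print("No namespace information available here:", exc)
```

Two processes are in the same namespace exactly when the inode numbers match. Comparing a containerized process with a host shell (reading another user's process usually needs root) should show mostly different inodes.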

Here’s the thing most people miss: namespaces provide isolation, not security. Unless user-namespace remapping is enabled, root inside a container is the same UID 0 as root on the host, so a root process that somehow breaks out of its namespaces has root on the host. This is why “don’t run containers as root” is a real security concern, not just best practice cargo-culting.

Linux Primitives: cgroups

While namespaces control what a process can see, control groups (cgroups) control what it can use.

cgroups v1 (still common) organize processes into hierarchies, typically one per resource controller. The controllers let you set limits on CPU time, memory consumption, and block I/O. (Network bandwidth is not limited directly: the v1 net_cls and net_prio controllers only tag and prioritize packets.) Docker translates --memory=512m into cgroup configuration: the container’s memory cgroup gets a hard limit of 512 MiB. Exceed it, and the kernel’s OOM killer starts terminating processes inside the container.

cgroups v2 (default on modern kernels) unified the controller hierarchy, fixing some nasty edge cases with nested containers that made Kubernetes operators suffer through the late 2010s.
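Which version a host runs changes where the limit files live, so it is worth checking. The sketch below uses a common heuristic, not an official API: a v2 hierarchy has a cgroup.controllers file at its root, while a v1 setup mounts one directory per controller (memory, cpu, and so on). The function name is invented for this example, and hybrid setups are reported as v1.

```python
import os


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 for a unified (v2) hierarchy, 1 for v1, or None if undetected."""
    # cgroup v2 exposes the list of available controllers in this file.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    # cgroup v1 mounts each controller as its own directory.
    if os.path.isdir(os.path.join(root, "memory")):
        return 1
    return None


if __name__ == "__main__":
    print("cgroup version:", cgroup_version())
```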

Practically, this means:

docker run --cpus=0.5 --memory=256m my-app

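…constrains the container to half a CPU core’s worth of time and 256 MiB of RAM at the kernel level, not through application-level tricks. The kernel charges every page the container touches against the limit. Once usage would exceed it and nothing can be reclaimed, the OOM killer steps in, no matter how the application tries to allocate.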
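As a rough illustration of that translation, the sketch below turns the two flags into the values you would expect to find in a cgroup v2 directory. It is a simplified model, not Docker's actual code. It assumes the default 100 ms CPU period and the cgroup v2 file names cpu.max and memory.max, and the function names are invented for this example.

```python
import re

# Docker-style size suffixes, interpreted as binary multiples.
UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
CPU_PERIOD_US = 100_000  # assumed default scheduling period: 100 ms


def parse_memory(value):
    """Convert a string such as '256m' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    number, suffix = match.groups()
    return int(number) * UNITS[suffix]


def cpu_max(cpus, period_us=CPU_PERIOD_US):
    """Express a fractional CPU count as a cgroup v2 'quota period' pair."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)
    return f"{quota_us} {period_us}"


def cgroup_v2_settings(cpus, memory):
    """Map --cpus / --memory style values to cgroup v2 file contents."""
    return {"cpu.max": cpu_max(cpus), "memory.max": str(parse_memory(memory))}


if __name__ == "__main__":
    print(cgroup_v2_settings(0.5, "256m"))
    # {'cpu.max': '50000 100000', 'memory.max': '268435456'}
```

Read "50000 100000" as: the group may run for 50,000 microseconds out of every 100,000, which is half of one core.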

The Overlay Filesystem

Container storage is managed through a layered union filesystem: OverlayFS by default (via Docker’s overlay2 storage driver), with AUFS playing the same role on older installations. Understanding this explains Docker’s build cache behavior and why some disk usage patterns seem puzzling.

An OverlayFS mount has three components:

Lower layers (read-only): the image layers
Upper layer (read-write): the container’s writable layer
Merged view: what the container sees

When a container reads a file, OverlayFS checks the upper layer first, then searches lower layers in order. When a container writes to a file that exists only in a lower (read-only) layer, OverlayFS performs a copy-on-write: it copies the file into the upper layer, then modifies the copy. The original in the lower layer is untouched.

This is why multiple containers from the same image share most of their disk footprint — they all use the same read-only lower layers and only their individual upper layers differ.

The performance implication: Copy-on-write is cheap for small files but expensive for large ones. A container that modifies a 2GB database file will trigger a full 2GB copy into its upper layer before the first byte changes. This is why “don’t store persistent data inside containers” is actual technical advice, not just a convention.
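The lookup order and the copy-on-write cost can be sketched with a toy in-memory model. This is not how OverlayFS is implemented: layers here are plain dictionaries mapping paths to bytes, deletions (whiteouts) are left out, and the class and attribute names are invented for the example.

```python
class OverlayMount:
    """Toy overlay: shared read-only lower layers plus one private upper layer."""

    def __init__(self, lower_layers):
        self.lowers = list(lower_layers)  # topmost image layer first, never mutated
        self.upper = {}                   # this container's writable layer
        self.bytes_copied_up = 0          # how much data copy-on-write had to move

    def read(self, path):
        # The upper layer wins; otherwise search the lower layers top to bottom.
        if path in self.upper:
            return self.upper[path]
        for layer in self.lowers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        if path not in self.upper:
            try:
                original = self.read(path)
            except FileNotFoundError:
                original = b""
            # Copy-up: the whole file moves to the upper layer before any change.
            self.upper[path] = original
            self.bytes_copied_up += len(original)
        self.upper[path] += data


if __name__ == "__main__":
    base = {"/etc/os-release": b"alpine\n"}
    app = {"/app/data.db": b"x" * 2048}
    image = [app, base]

    first, second = OverlayMount(image), OverlayMount(image)
    first.append("/app/data.db", b"!")

    print(first.bytes_copied_up)             # 2048: one byte written, whole file copied
    print(len(second.read("/app/data.db")))  # 2048: the other container is unaffected
    print(second.upper)                      # {}: nothing private until it writes
```

Scale the 2,048 bytes up to a multi-gigabyte database file and the cost of the first write becomes obvious, which is the argument for putting such files on a volume instead.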

Build Cache Mechanics

Docker’s layer cache is keyed on build history: each layer’s cache key combines the instruction that created it with the identity of the layer below it and, for COPY and ADD, a checksum of the files being copied. When you build:

FROM node:20-alpine
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "server.js"]

If nothing changes in package.json, the RUN npm install layer hash is identical to the previous build and Docker skips it. Change one character in package.json and the cache invalidates from that point downward — npm install reruns, and so does every instruction after it.

This is why layer ordering matters enormously for build performance. Dependencies that change infrequently go early; application code that changes constantly goes late. A naive Dockerfile that does COPY . . before RUN npm install forces npm install to re-run on every single source file change, even if package.json didn’t change.
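The invalidation rule can be sketched as a chain of hashes. This is a simplified model rather than Docker's or BuildKit's real algorithm: each step's key hashes the parent key, the instruction text, and the contents of any files that step copies. The function names and the sample file contents are invented for the example.

```python
import hashlib


def _digest(*parts):
    """Hash a sequence of strings into one hex digest."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(part.encode("utf-8"))
        hasher.update(b"\0")  # separator so adjacent parts cannot run together
    return hasher.hexdigest()


def cache_keys(steps, files):
    """steps: list of (instruction, [copied paths]); files: {path: content}."""
    keys, parent = [], ""
    for instruction, copied in steps:
        contents = [f"{path}={files[path]}" for path in sorted(copied)]
        parent = _digest(parent, instruction, *contents)  # chained to the layer below
        keys.append(parent)
    return keys


def first_cache_miss(old_keys, new_keys):
    """Index of the first step that must rebuild, or None if all are cached."""
    for index, (old, new) in enumerate(zip(old_keys, new_keys)):
        if old != new:
            return index
    return None


EVERYTHING = ["package.json", "server.js"]
GOOD = [("FROM node:20-alpine", []), ("WORKDIR /app", []),
        ("COPY package.json .", ["package.json"]), ("RUN npm install", []),
        ("COPY . .", EVERYTHING), ('CMD ["node", "server.js"]', [])]
NAIVE = [("FROM node:20-alpine", []), ("WORKDIR /app", []),
         ("COPY . .", EVERYTHING), ("RUN npm install", []),
         ('CMD ["node", "server.js"]', [])]

if __name__ == "__main__":
    before = {"package.json": '{"name": "app"}', "server.js": "console.log(1)"}
    after = dict(before, **{"server.js": "console.log(2)"})  # source-only change
    for name, steps in (("good", GOOD), ("naive", NAIVE)):
        miss = first_cache_miss(cache_keys(steps, before), cache_keys(steps, after))
        print(name, "rebuilds from:", steps[miss][0])
    # good rebuilds from: COPY . .   (npm install, one step earlier, stays cached)
    # naive rebuilds from: COPY . .  (npm install comes after it, so it reruns)
```

Both orderings miss at COPY . ., but only the naive one has RUN npm install sitting downstream of it.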

Multi-stage builds extend this further:

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
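RUN npm run build
# Drop devDependencies so the node_modules copied below holds runtime packages only
RUN npm prune --omit=dev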

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]

The final image contains only what the second stage copies in. Build tools, source files, test dependencies — gone. A typical React app goes from a 1.2GB build image to a 180MB production image this way.

What Docker Can’t Do For You

It doesn’t fix broken software. A container with a race condition has a race condition everywhere it runs, consistently. Reproducibility cuts both ways.

It doesn’t solve distributed systems problems. Connecting five containers so they can find each other, stay healthy, restart on failure, and handle traffic spikes is what Kubernetes (or Docker Swarm, or Nomad) exists for. Docker Compose handles this locally in development; at production scale you almost certainly need an orchestrator.

Linux containers don’t run natively on Mac or Windows. Docker Desktop runs a lightweight Linux VM transparently, which is why it feels native but occasionally surprises you with file system permission weirdness: you’re crossing a VM boundary. Docker Desktop is also a proprietary product, separate from the open-source Docker Engine, which is why larger organizations need a paid subscription to use it.

Security isolation has a surface area. Shared kernel means a kernel vulnerability could affect all containers on a host. For true isolation between untrusted workloads (multi-tenant SaaS, running user-submitted code), tools like gVisor (Google) or Kata Containers add an additional sandboxing layer at the cost of performance.

The `containerd` Split

Early versions of Docker were a single monolith. Under pressure from the wider ecosystem, and from Kubernetes in particular, which needed a standardized container runtime, Docker split its runtime out as containerd (donated to the CNCF in 2017) and contributed runc to the Open Container Initiative as the basis for a standard low-level runtime. Kubernetes removed its built-in Docker integration (dockershim) in version 1.24 (2022), which caused widespread panic and was mostly misunderstood: Kubernetes still runs containers the same way. It just talks to containerd directly instead of going through the Docker daemon as a middleman.

This history matters because “Docker” in 2026 refers to several different things depending on context: the CLI tool, the daemon, the image format (OCI), the runtime, or the company. The OCI (Open Container Initiative) image format that Docker popularized is now a standard — images built with Docker run with Podman, containerd, or any OCI-compliant runtime without modification.

One thing to remember: Containers are isolated processes, not isolated machines. The Linux kernel underneath is always shared. Everything Docker does (the familiar CLI, the image layering, the port mappings) is ultimately a convenience layer over clone() syscalls and kernel namespaces, the core set of which has been complete since Linux 3.8.
