Containers Are Not VMs
A new engineer joins the team. They ssh into a production Kubernetes node and run `ps aux`. Thousands of lines scroll past — nginx workers, Python processes, Redis, their own shell, and hundreds of processes they do not recognize. They ask the senior on call: "Are all these from the host, or from the containers?" The senior shrugs: "Both. Every container you see in `kubectl get pods` is just a group of processes down there." The new engineer looks confused. "But containers are isolated. They are like tiny VMs, right?"

They are not. And carrying the "lightweight VM" mental model into production is the source of more confusion than almost any other misconception in modern infrastructure. Containers do not have their own kernel. They do not have their own network stack (unless you count a namespace-scoped slice of the host's). They do not boot. They have no BIOS, no UEFI, no init ramdisk, no bootloader. A container is a process — or a tree of processes — that the kernel has made look isolated by lying about what it can see. The kernel a containerized process talks to is the same kernel running on the host.
Once you see that, everything else about Docker becomes easy. Images, volumes, networking, security, debugging — they all collapse into standard Linux concepts with a thin wrapper around them. This lesson is the mental reset.
The One-Picture Difference
VMs virtualize hardware. Each guest has its own kernel, its own memory manager, its own scheduler, its own device drivers. The hypervisor's job is to give every guest the illusion of its own computer.
Containers virtualize nothing. There is one kernel. There is one scheduler. There is one set of device drivers. The kernel uses namespaces and cgroups to show each container a custom view of the host — but underneath the view, every read(), every clone(), every page fault is handled by the same kernel for every container on the machine.
A VM runs a kernel. A container runs as a process. This is the single most important mental model shift. VMs have boot sequences, guest operating systems, and virtualized hardware. Containers have namespaces, cgroups, and a pivoted rootfs — plain Linux features you can invoke by hand from a shell. Every Docker concept collapses into this one distinction.
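Those "plain Linux features you can invoke by hand" are worth invoking once. A minimal sketch, assuming `unshare` from util-linux and a kernel that allows unprivileged user namespaces (no Docker involved):

```shell
# Hand-rolled "container": new user, PID, and mount namespaces.
# --map-root-user maps your unprivileged UID to root inside the namespace;
# --fork makes the child the first process in the new PID namespace;
# --mount-proc remounts /proc so ps reflects that PID namespace.
unshare --user --map-root-user --pid --fork --mount-proc ps -ef
# Typically prints just the header and ps itself, running as PID 1 —
# while on the host this same process is an ordinary numbered process.
```

No image, no daemon, no runtime: one syscall-level feature per flag. Docker's job is to automate exactly this, plus a pivoted rootfs and cgroup limits.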
What Is Shared, What Is Not
| What | VM | Container |
|---|---|---|
| Kernel | Separate per guest | Shared with host and all other containers |
| Device drivers | In the guest kernel | In the host kernel |
| CPU scheduler | Guest schedules guest threads; host schedules guests | Host scheduler handles all container processes |
| Memory manager | Guest memory manager + hypervisor | Host kernel only |
| Filesystem | Entire disk image | Overlay layers on top of host filesystem |
| Network stack | Virtual NIC + guest TCP/IP | Host TCP/IP with namespaced interfaces |
| Boot time | Seconds to minutes (full OS boot) | Milliseconds (just a fork + exec) |
| Image size | GB (includes whole OS) | MB (just the app + deps) |
| Memory overhead | Hundreds of MB per VM | A few MB per container |
| Isolation strength | Strong (separate kernel) | Weaker (shared kernel = shared attack surface) |
The sharing is the whole point. Containers are fast because there is no kernel to boot, no drivers to load, no BIOS to initialize. They are small because you ship only the userspace. They are dense — you can run hundreds of containers on one host — because they cost almost nothing beyond the process itself.
The trade-off is the isolation story. A kernel bug that affects containers is still there for every container on the host. A container escape (a user inside a container gaining access to the host) usually comes from a kernel-level vulnerability, which is why some workloads still use VMs (or VM-per-container technologies like Firecracker, Kata, gVisor) for stronger isolation.
See It With Your Own Eyes
Everything we have said is testable. No special tools — just `ps`, `docker inspect`, and `/proc`.
```shell
# Start a simple container in the background
docker run -d --name demo alpine sleep 1000

# Find its PID on the host (yes — the host can see the container's PID)
docker inspect --format='{{.State.Pid}}' demo
# 18342

# Confirm: the container's main process is a normal host process
ps -p 18342 -o pid,ppid,user,cmd
#   PID  PPID USER CMD
# 18342 18320 root sleep 1000

# The whole "container" is just a cgroup containing this PID
cat /proc/18342/cgroup
# 0::/system.slice/docker-abc123def456....scope

# And its namespaces (see Linux course Module 5)
ls -l /proc/18342/ns
# lrwxrwxrwx 1 root root 0 ... cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 root root 0 ... ipc -> 'ipc:[4026532152]'
# lrwxrwxrwx 1 root root 0 ... mnt -> 'mnt:[4026532150]'
# lrwxrwxrwx 1 root root 0 ... net -> 'net:[4026532155]'
# lrwxrwxrwx 1 root root 0 ... pid -> 'pid:[4026532153]'
# lrwxrwxrwx 1 root root 0 ... uts -> 'uts:[4026532151]'

# Compare the kernel seen from inside the container vs the host
docker exec demo uname -r
uname -r
# Both return the same version string — the host kernel

docker rm -f demo
```
`docker inspect --format='{{.State.Pid}}' NAME` gives you the host-side PID for a container's main process. From there you can reach every debug tool you know — `ps`, `cat /proc/[pid]/*`, `nsenter`, `strace` — and operate on the container as if it were any other process on the host. Most "I cannot debug inside the container because the image is too minimal" problems disappear when you realize you do not have to go inside.
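A concrete sketch of that workflow: the container image ships no `ss` or `ls`, so we enter only the namespaces we need and bring the host's binaries with us. The PID 18342 below is the example PID from the demo above, and `nsenter` generally needs root:

```shell
# Enter ONLY the container's network namespace and run the HOST's ss binary
sudo nsenter -t 18342 -n ss -tlnp

# Same trick for the mount namespace: inspect the container's filesystem view
sudo nsenter -t 18342 -m ls /etc

# Or enter all of its namespaces at once — a beefed-up docker exec
# that does not depend on any binary existing in the image
sudo nsenter -t 18342 -a sh
```

Because the "wall" between host and container is just a set of `/proc/[pid]/ns` links, you can step through exactly as many of them as the problem requires.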
`uname`: the smoking gun
```shell
# Start a container with a different distro than the host
docker run --rm ubuntu:22.04 uname -r

# Host
uname -r

# Both print the SAME kernel version.
# Ubuntu's container does NOT ship its own kernel. It uses yours.
```
This is the moment the "containers are VMs" theory breaks for most people. An Ubuntu container running on a Fedora host is not running Ubuntu's kernel — it is running Fedora's kernel, with Ubuntu's userspace (glibc, coreutils, apt) on top. The /lib/x86_64-linux-gnu/libc.so.6 inside the container is Ubuntu's, but every syscall that libc makes goes to the host's kernel.
This also explains why you cannot run a Windows container on a Linux host (or vice versa) — Windows userspace makes Windows syscalls; there is no Linux kernel under it to translate.
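The userspace/kernel split described above can be seen in two commands. A sketch — the exact glibc version printed depends on the image you pull:

```shell
# Ubuntu's glibc: userspace, shipped INSIDE the image
docker run --rm ubuntu:22.04 sh -c 'ldd --version | head -n1'

# The kernel those glibc syscalls land in: NOT shipped in the image —
# whatever kernel the host happens to be running
docker run --rm ubuntu:22.04 uname -r
```

Swap the image for `alpine` and the first line changes (musl instead of glibc) while the second stays identical — userspace comes from the image, the kernel comes from the host.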
Why This Matters in Production
The "shared kernel" fact reshapes every production decision you make about containers.
- Kernel-level bugs affect every container on the host. Spectre, Meltdown, CVE-2022-0185 (a cgroup v1 vulnerability), Dirty Pipe — one unpatched kernel exposes every container on the node, and one host-kernel patch protects them all. You patch the host, not the images.
- Resource limits are kernel features. When you say "limit this container to 2 CPUs and 4 GiB of memory," you are setting cgroup values. There is no guest kernel to convince — the host's cgroup controllers enforce the limit.
- Syscall profiles matter. seccomp is a kernel facility that filters syscalls; Docker's default seccomp profile sandboxes a container by denying certain syscalls at the kernel boundary. This is why `--privileged` is dangerous: it disables that profile (along with other restrictions) and gives the process full syscall access.
- Docker is not a security boundary by itself. Kubernetes runs both trusted and semi-trusted workloads on the same kernel. For untrusted workloads you want a VM boundary — Firecracker microVMs, Kata Containers, AWS Lambda's Firecracker backend.
- Performance is almost native. Because there is no guest kernel or hypervisor, container CPU and memory performance is within 1–2% of bare metal. I/O performance depends on the path (overlayfs, bind mounts, network); we cover this in Modules 2 and 3.
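The "resource limits are kernel features" point is directly observable. A sketch assuming a cgroup v2 host (the path layout differs under cgroup v1) and a throwaway container named `capped`:

```shell
# Ask Docker for a 256 MiB memory limit...
docker run -d --name capped --memory 256m alpine sleep 300

# ...and find the exact same number as a cgroup value set by the HOST kernel.
PID=$(docker inspect --format '{{.State.Pid}}' capped)
CG=$(cut -d: -f3 /proc/"$PID"/cgroup)      # e.g. /system.slice/docker-<id>.scope
cat "/sys/fs/cgroup${CG}/memory.max"
# 268435456   (256 * 1024 * 1024 — no guest kernel was consulted)

docker rm -f capped
```

There is nothing Docker-specific about the enforcement: write a different number into that file by hand and the limit changes, because the cgroup controller in the host kernel is the only thing doing the limiting.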
A team ran their Kubernetes cluster on Ubuntu 20.04 nodes. The application ran in Alpine-based images. A critical musl-libc bug affected Alpine's DNS resolver specifically — but the ops team was sure the fix was "wait for Ubuntu to patch its kernel." Four days of production incidents later, someone pointed out that the resolver is in userspace (inside the container image), not the kernel. A one-line FROM-tag bump in the Dockerfile and a rebuild fixed the issue. The lesson: kernel bugs are fixed on the host; userspace bugs are fixed in the image. Knowing which is which is fundamental.
When You Want VMs, Not Containers
If your mental model was "containers = lightweight VMs," you never thought about when to choose a VM. With the shared-kernel model, the decision sharpens:
| Use a VM when… | Use a container when… |
|---|---|
| Running untrusted code (customer workloads on a shared host) | Running your own code at scale |
| You need a different OS or kernel version | Your workload runs fine on the host kernel |
| You need stronger security isolation (PCI, HIPAA, defense) | You accept shared-kernel risk for speed/density |
| You need hardware emulation (GPU passthrough, specific NIC behavior) | You can run on the host's hardware directly |
| Boot once per hour, run for days | Start in ms, die in seconds, repeat |
| Memory/CPU overhead is acceptable | You need to pack hundreds of workloads on a node |
Most modern infrastructure uses both. Kubernetes nodes are VMs (for cloud isolation from the provider). The workloads inside those VMs are containers (for fast deploy and density). Firecracker-based systems do container-per-microVM, getting both benefits at the cost of more complexity.
Where This Course Fits
We spent the Linux Fundamentals course on the primitives — namespaces and cgroups (Module 5). Docker is the automation on top. Specifically:
- Module 2 (Images) — how Docker packages your userspace so the "no guest kernel" model works in practice.
- Module 3 (Running containers) — what `docker run` does in namespace/cgroup terms.
- Module 4 (Compose) — how multi-container apps share the host.
- Module 5 (Security) — dropping privileges, seccomp, and image hardening.
- Module 6 (Debugging) — using the host's tools to diagnose container issues.
If you have not read the Linux course's Module 5, skim it after this lesson. The two courses interlock: Linux gives you the primitives; Docker automates them.
Key Concepts Summary
- Containers are processes, not VMs. They share the host kernel. There is no guest OS.
- VMs virtualize hardware; containers virtualize views. The kernel uses namespaces and cgroups to show each container a tailored slice of the host.
- `uname -r` inside a container returns the host kernel version. This is the clearest proof the kernel is shared.
- Shared kernel means shared attack surface. Kernel CVEs affect every container on the host. Userspace CVEs live in the image.
- Containers are fast because they skip the boot. No kernel to load, no drivers, no BIOS — just a fork + exec + cgroup assignment.
- Resource limits are cgroup values. Set by the host kernel, enforced by the host kernel. There is no guest to negotiate with.
- For untrusted workloads or exotic OS needs, choose VMs. For your own code at scale, containers win on every axis except strong isolation.
- Debug containers with host tools. `ps`, `cat /proc/[pid]/*`, `nsenter`, `strace` — they all work because the container is just a process.
Common Mistakes
- Saying "the container's OS" when you mean "the image's userspace." The image contains libc, coreutils, package managers — but no kernel.
- Assuming container resource limits are separate from host limits. They are not — they are host-kernel cgroups, enforced by the host.
- Treating the container as a black box that needs its own monitoring. Standard `ps`, `top`, `iostat`, `ss`, and `dmesg` on the host see everything.
- Running `--privileged` because "containers are isolated anyway." `--privileged` disables the kernel-level isolation (seccomp, capability drops, some namespace restrictions) and is effectively running the process on the host.
- Expecting a Windows container to run on a Linux host. The kernel is Linux; a Windows binary's syscalls have nothing to call.
- Forgetting that overlayfs, the thing that makes container images layered, is a host kernel feature. If it's buggy on your host, every container on the host is affected.
- Using "Docker" as a synonym for "container." Docker is one implementation of the OCI spec; containerd, CRI-O, podman, and runc are others. The underlying primitives are identical.
You run `uname -r` on an Ubuntu 22.04 host and get `6.5.0-generic`. You then run `docker run --rm alpine:3.19 uname -r`. What do you expect to see, and what does it tell you?