Containers Are Not VMs
A new engineer joins the team. They ssh into a production Kubernetes node and run `ps aux`. Thousands of lines scroll past — nginx workers, Python processes, Redis, their own shell, and hundreds of processes they do not recognize. They ask the senior on call: "Are all these from the host, or from the containers?" The senior shrugs: "Both. Every container you see in `kubectl get pods` is just a group of processes down there." The new engineer looks confused. "But containers are isolated. They are like tiny VMs, right?"

They are not. And carrying the "lightweight VM" mental model into production is the source of more confusion than almost any other misconception in modern infrastructure. Containers do not have their own kernel. They do not have their own network stack (unless you count a namespace-scoped slice of the host's). They do not boot. They have no BIOS, no UEFI, no init ramdisk, no bootloader. A container is a process — or a tree of processes — that the kernel has made look isolated by lying about what it can see. The kernel a containerized process talks to is the same kernel running on the host.
Once you see that, everything else about Docker becomes easy. Images, volumes, networking, security, debugging — they all collapse into standard Linux concepts with a thin wrapper around them. This lesson is the mental reset.
The One-Picture Difference
VMs virtualize hardware. Each guest has its own kernel, its own memory manager, its own scheduler, its own device drivers. The hypervisor's job is to give every guest the illusion of its own computer.
Containers virtualize nothing. There is one kernel. There is one scheduler. There is one set of device drivers. The kernel uses namespaces and cgroups to show each container a custom view of the host — but underneath the view, every read(), every clone(), every page fault is handled by the same kernel for every container on the machine.
A VM runs a kernel. A container runs as a process. This is the single most important mental model shift. VMs have boot sequences, guest operating systems, and virtualized hardware. Containers have namespaces, cgroups, and a pivoted rootfs — plain Linux features you can invoke by hand from a shell. Every Docker concept collapses into this one distinction.
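Those "plain Linux features you can invoke by hand" are worth invoking once. A minimal sketch, assuming `unshare` from util-linux and a kernel that allows unprivileged user namespaces (no Docker involved):

```shell
# Hand-rolled "container": new user, PID, and mount namespaces.
# --map-root-user maps your unprivileged UID to root inside the namespace;
# --fork makes the child the first process in the new PID namespace;
# --mount-proc remounts /proc so ps reflects that PID namespace.
unshare --user --map-root-user --pid --fork --mount-proc ps -ef
# Typically prints just the header and ps itself, running as PID 1 —
# while on the host this same process is an ordinary numbered process.
```

No image, no daemon, no runtime: one syscall-level feature per flag. Docker's job is to automate exactly this, plus a pivoted rootfs and cgroup limits.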
What Is Shared, What Is Not
| What | VM | Container |
|---|---|---|
| Kernel | Separate per guest | Shared with host and all other containers |
| Device drivers | In the guest kernel | In the host kernel |
| CPU scheduler | Guest schedules guest threads; host schedules guests | Host scheduler handles all container processes |
| Memory manager | Guest memory manager + hypervisor | Host kernel only |
| Filesystem | Entire disk image | Overlay layers on top of host filesystem |
| Network stack | Virtual NIC + guest TCP/IP | Host TCP/IP with namespaced interfaces |
| Boot time | Seconds to minutes (full OS boot) | Milliseconds (just a fork + exec) |
| Image size | GB (includes whole OS) | MB (just the app + deps) |
| Memory overhead | Hundreds of MB per VM | A few MB per container |
| Isolation strength | Strong (separate kernel) | Weaker (shared kernel = shared attack surface) |
The sharing is the whole point. Containers are fast because there is no kernel to boot, no drivers to load, no BIOS to initialize. They are small because you ship only the userspace. They are dense — you can run hundreds of containers on one host — because they cost almost nothing beyond the process itself.
The trade-off is the isolation story. A kernel bug that affects containers is still there for every container on the host. A container escape (a user inside a container gaining access to the host) usually comes from a kernel-level vulnerability, which is why some workloads still use VMs (or VM-per-container technologies like Firecracker, Kata, gVisor) for stronger isolation.
See It With Your Own Eyes
Everything we have said is testable. No special tools — just `ps`, `docker inspect`, and `/proc`.
```shell
# Start a simple container in the background
docker run -d --name demo alpine sleep 1000

# Find its PID on the host (yes — the host can see the container's PID)
docker inspect --format='{{.State.Pid}}' demo
# 18342

# Confirm: the container's main process is a normal host process
ps -p 18342 -o pid,ppid,user,cmd
#   PID  PPID USER CMD
# 18342 18320 root sleep 1000

# The whole "container" is just a cgroup containing this PID
cat /proc/18342/cgroup
# 0::/system.slice/docker-abc123def456....scope

# And its namespaces (see Linux course Module 5)
ls -l /proc/18342/ns
# lrwxrwxrwx 1 root root 0 ... cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 root root 0 ... ipc -> 'ipc:[4026532152]'
# lrwxrwxrwx 1 root root 0 ... mnt -> 'mnt:[4026532150]'
# lrwxrwxrwx 1 root root 0 ... net -> 'net:[4026532155]'
# lrwxrwxrwx 1 root root 0 ... pid -> 'pid:[4026532153]'
# lrwxrwxrwx 1 root root 0 ... uts -> 'uts:[4026532151]'

# Compare the kernel seen from inside the container vs the host
docker exec demo uname -r
uname -r
# Both return the same version string — the host kernel

docker rm -f demo
```
`docker inspect --format='{{.State.Pid}}' NAME` gives you the host-side PID for a container's main process. From there you can reach every debug tool you know — `ps`, `cat /proc/[pid]/*`, `nsenter`, `strace` — and operate on the container as if it were any other process on the host. Most "I cannot debug inside the container because the image is too minimal" problems disappear when you realize you do not have to go inside.
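A concrete sketch of that workflow: the container image ships no `ss` or `ls`, so we enter only the namespaces we need and bring the host's binaries with us. The PID 18342 below is the example PID from the demo above, and `nsenter` generally needs root:

```shell
# Enter ONLY the container's network namespace and run the HOST's ss binary
sudo nsenter -t 18342 -n ss -tlnp

# Same trick for the mount namespace: inspect the container's filesystem view
sudo nsenter -t 18342 -m ls /etc

# Or enter all of its namespaces at once — a beefed-up docker exec
# that does not depend on any binary existing in the image
sudo nsenter -t 18342 -a sh
```

Because the "wall" between host and container is just a set of `/proc/[pid]/ns` links, you can step through exactly as many of them as the problem requires.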
`uname`: the smoking gun
```shell
# Start a container with a different distro than the host
docker run --rm ubuntu:22.04 uname -r

# Host
uname -r

# Both print the SAME kernel version.
# Ubuntu's container does NOT ship its own kernel. It uses yours.
```
This is the moment the "containers are VMs" theory breaks for most people. An Ubuntu container running on a Fedora host is not running Ubuntu's kernel — it is running Fedora's kernel, with Ubuntu's userspace (glibc, coreutils, apt) on top. The /lib/x86_64-linux-gnu/libc.so.6 inside the container is Ubuntu's, but every syscall that libc makes goes to the host's kernel.
This also explains why you cannot run a Windows container on a Linux host (or vice versa) — Windows userspace makes Windows syscalls; there is no Linux kernel under it to translate.
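The userspace/kernel split described above can be seen in two commands. A sketch — the exact glibc version printed depends on the image you pull:

```shell
# Ubuntu's glibc: userspace, shipped INSIDE the image
docker run --rm ubuntu:22.04 sh -c 'ldd --version | head -n1'

# The kernel those glibc syscalls land in: NOT shipped in the image —
# whatever kernel the host happens to be running
docker run --rm ubuntu:22.04 uname -r
```

Swap the image for `alpine` and the first line changes (musl instead of glibc) while the second stays identical — userspace comes from the image, the kernel comes from the host.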
Why This Matters in Production
The "shared kernel" fact reshapes every production decision you make about containers.
- Kernel-level bugs affect every container on the host. Spectre, Meltdown, CVE-2022-0185 (a cgroup v1 vulnerability), Dirty Pipe — one unpatched kernel exposes every container on the node, and one host-kernel patch protects them all. You patch the host, not the images.
- Resource limits are kernel features. When you say "limit this container to 2 CPUs and 4 GiB of memory," you are setting cgroup values. There is no guest kernel to convince — the host's cgroup controllers enforce the limit.
- Syscall profiles matter. seccomp is a kernel facility that filters syscalls; Docker's default seccomp profile sandboxes a container by denying certain syscalls at the kernel boundary. This is why `--privileged` is dangerous: it disables that profile (along with other restrictions) and gives the process full syscall access.
- Docker is not a security boundary by itself. Kubernetes runs both trusted and semi-trusted workloads on the same kernel. For untrusted workloads you want a VM boundary — Firecracker microVMs, Kata Containers, AWS Lambda's Firecracker backend.
- Performance is almost native. Because there is no guest kernel or hypervisor, container CPU and memory performance is within 1–2% of bare metal. I/O performance depends on the path (overlayfs, bind mounts, network); we cover this in Modules 2 and 3.
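The "resource limits are kernel features" point is directly observable. A sketch assuming a cgroup v2 host (the path layout differs under cgroup v1) and a throwaway container named `capped`:

```shell
# Ask Docker for a 256 MiB memory limit...
docker run -d --name capped --memory 256m alpine sleep 300

# ...and find the exact same number as a cgroup value set by the HOST kernel.
PID=$(docker inspect --format '{{.State.Pid}}' capped)
CG=$(cut -d: -f3 /proc/"$PID"/cgroup)      # e.g. /system.slice/docker-<id>.scope
cat "/sys/fs/cgroup${CG}/memory.max"
# 268435456   (256 * 1024 * 1024 — no guest kernel was consulted)

docker rm -f capped
```

There is nothing Docker-specific about the enforcement: write a different number into that file by hand and the limit changes, because the cgroup controller in the host kernel is the only thing doing the limiting.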
A team ran their Kubernetes cluster on Ubuntu 20.04 nodes. The application ran in Alpine-based images. A critical musl-libc bug affected Alpine's DNS resolver specifically — but the ops team was sure the fix was "wait for Ubuntu to patch its kernel." Four days of production incidents later, someone pointed out that the resolver is in userspace (inside the container image), not the kernel. A one-line FROM-tag bump in the Dockerfile and a rebuild fixed the issue. The lesson: kernel bugs are fixed on the host; userspace bugs are fixed in the image. Knowing which is which is fundamental.
When You Want VMs, Not Containers
If your mental model was "containers = lightweight VMs," you never thought about when to choose a VM. With the shared-kernel model, the decision sharpens:
| Use a VM when… | Use a container when… |
|---|---|
| Running untrusted code (customer workloads on a shared host) | Running your own code at scale |
| You need a different OS or kernel version | Your workload runs fine on the host kernel |
| You need stronger security isolation (PCI, HIPAA, defense) | You accept shared-kernel risk for speed/density |
| You need hardware emulation (GPU passthrough, specific NIC behavior) | You can run on the host's hardware directly |
| Boot once per hour, run for days | Start in ms, die in seconds, repeat |
| Memory/CPU overhead is acceptable | You need to pack hundreds of workloads on a node |
Most modern infrastructure uses both. Kubernetes nodes are VMs (for cloud isolation from the provider). The workloads inside those VMs are containers (for fast deploy and density). Firecracker-based systems do container-per-microVM, getting both benefits at the cost of more complexity.
Where This Course Fits
We spent the Linux Fundamentals course on the primitives — namespaces and cgroups (Module 5). Docker is the automation on top. Specifically:
- Module 2 (Images) — how Docker packages your userspace so the "no guest kernel" model works in practice.
- Module 3 (Running containers) — what `docker run` does in namespace/cgroup terms.
- Module 4 (Compose) — how multi-container apps share the host.
- Module 5 (Security) — dropping privileges, seccomp, and image hardening.
- Module 6 (Debugging) — using the host's tools to diagnose container issues.
If you have not read the Linux course's Module 5, skim it after this lesson. The two courses interlock: Linux gives you the primitives; Docker automates them.
Key Concepts Summary
- Containers are processes, not VMs. They share the host kernel. There is no guest OS.
- VMs virtualize hardware; containers virtualize views. The kernel uses namespaces and cgroups to show each container a tailored slice of the host.
- `uname -r` inside a container returns the host kernel version. This is the clearest proof the kernel is shared.
- Shared kernel means shared attack surface. Kernel CVEs affect every container on the host. Userspace CVEs live in the image.
- Containers are fast because they skip the boot. No kernel to load, no drivers, no BIOS — just a fork + exec + cgroup assignment.
- Resource limits are cgroup values. Set by the host kernel, enforced by the host kernel. There is no guest to negotiate with.
- For untrusted workloads or exotic OS needs, choose VMs. For your own code at scale, containers win on every axis except strong isolation.
- Debug containers with host tools. `ps`, `cat /proc/[pid]/*`, `nsenter`, `strace` — they all work because the container is just a process.
Common Mistakes
- Saying "the container's OS" when you mean "the image's userspace." The image contains libc, coreutils, package managers — but no kernel.
- Assuming container resource limits are separate from host limits. They are not — they are host-kernel cgroups, enforced by the host.
- Treating the container as a black box that needs its own monitoring. Standard `ps`, `top`, `iostat`, `ss`, and `dmesg` on the host see everything.
- Running `--privileged` because "containers are isolated anyway." `--privileged` disables the kernel-level isolation (seccomp, capability drops, some namespace restrictions) and is effectively running the process on the host.
- Expecting a Windows container to run on a Linux host. The kernel is Linux; a Windows binary's syscalls have nothing to call.
- Forgetting that overlayfs, the thing that makes container images layered, is a host kernel feature. If it's buggy on your host, every container on the host is affected.
- Using "Docker" as a synonym for "container." Docker is one implementation of the OCI spec; containerd, CRI-O, podman, and runc are others. The underlying primitives are identical.
You run `uname -r` on an Ubuntu 22.04 host and get `6.5.0-generic`. You then run `docker run --rm alpine:3.19 uname -r`. What do you expect to see, and what does it tell you?