Namespaces Explained
An engineer asks: "What is a container, actually?" The usual answer — "a lightweight VM" — is wrong. A container is not a virtualized machine. It is a normal Linux process that the kernel has lied to. The process sees a filesystem that is not the host's filesystem. It sees a network interface that does not exist outside of it. It sees PID 1 where there is no PID 1 on the host. It sees its own hostname, its own IPC, its own users. None of it is emulated — the kernel simply shows this process a custom view of reality.
The mechanism that enables those lies is namespaces. Seven of them, each isolating one kind of kernel resource. Once you understand namespaces as a list of seven things the kernel can lie about, containers stop being magic — they are a kernel feature you could invoke by hand from a shell. This lesson is that list: what each namespace isolates, how to see them on a running system, and how to build your own.
What a Namespace Is
A namespace is a kernel-level abstraction that isolates some class of global resource, so that the processes inside the namespace see a different instance of that resource than processes outside it. The kernel manages many global things: the set of PIDs, the set of mounts, the hostname, the routing table. A namespace says "make a separate set of these, visible only to members of this namespace."
Three operations define the interface:
- clone() or unshare() with one or more CLONE_NEW* flags: creates new namespaces, for the new child process (clone) or for the calling process itself (unshare).
- setns(): moves the calling process into an existing namespace.
- Bind-mounting /proc/[pid]/ns/<kind>: pins a namespace so it outlives its last process (the core trick ip netns add uses).
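The bind-mount pin can be tried by hand without ip netns. The sketch below follows the pattern shown in the util-linux unshare documentation: create a UTS namespace, pin it to a file, and re-enter it later. The path /run/demo-uts and the hostname "pinned" are arbitrary names chosen for illustration. A real run needs root, so by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: pin a UTS namespace to a file so it outlives its creator.
# Assumptions: util-linux unshare/nsenter are installed; the path and
# hostname are illustrative. DRY_RUN=1 (the default) only prints the plan.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pin_uts_demo() {
    pin=$1                                     # file that will hold the namespace
    run touch "$pin"                           # the bind-mount target must exist
    run unshare --uts="$pin" hostname pinned   # new UTS namespace, pinned to $pin
    run nsenter --uts="$pin" hostname          # re-enter later and read the hostname
    run umount "$pin"                          # unpin the namespace
    run rm -f "$pin"
}

pin_uts_demo /run/demo-uts
```

Running it as root with DRY_RUN=0 executes the same five steps for real.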
Every process belongs to exactly one namespace of each kind. Linux has seven long-established kinds, plus the newer time namespace:
| Namespace | CLONE_NEW* flag | What it isolates |
|---|---|---|
| mnt | CLONE_NEWNS | The mount table: what is mounted where |
| pid | CLONE_NEWPID | The PID number space: who is PID 1 |
| net | CLONE_NEWNET | Network interfaces, routing tables, firewall rules, sockets |
| ipc | CLONE_NEWIPC | System V IPC, POSIX message queues |
| uts | CLONE_NEWUTS | Hostname and NIS domain name |
| user | CLONE_NEWUSER | User and group IDs, capabilities |
| cgroup | CLONE_NEWCGROUP | The view into /sys/fs/cgroup |
| time (newer) | CLONE_NEWTIME | The monotonic and boot clocks (available but rarely used) |
A container is a process (and its children) that belongs to a set of namespaces different from the host's. That is the entire definition. Everything else a container runtime does — pulling images, setting up overlay mounts, configuring networking, applying seccomp filters — is plumbing around the core idea: clone a process with new namespaces, then make sure the inside looks the way you want.
Seeing Namespaces on a Live System
Every process has a /proc/[pid]/ns/ directory with a symlink per namespace kind. The symlink's target contains a unique inode — two processes with the same inode are in the same namespace.
# Your shell's namespaces
ls -l /proc/$$/ns
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 ipc -> 'ipc:[4026531839]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 mnt -> 'mnt:[4026531840]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 net -> 'net:[4026531992]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 pid -> 'pid:[4026531836]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 time -> 'time:[4026531834]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 user -> 'user:[4026531837]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 uts -> 'uts:[4026531838]'
# Compare with a containerized process: find a container PID on the host
PID=$(pgrep -f nginx | head -1) # pick any running container process
sudo ls -l /proc/$PID/ns
# Some inodes differ: those are the namespaces the container runtime created
# Group processes by what namespace they live in
lsns # list every namespace on the system
# NS TYPE NPROCS PID USER COMMAND
# 4026531836 pid 245 1 root /lib/systemd/systemd
# 4026532420 pid 8 12345 65535 nginx: master
# 4026532421 net 8 12345 65535 nginx: master
# 4026532422 mnt 8 12345 65535 nginx: master
# ...
# Just one kind of namespace
lsns --type net
If two processes have the same inode for net:, they share a network namespace (same interfaces, same routing table). The host and a container almost always differ on mnt, pid, net, and uts; they may share ipc and user depending on the runtime.
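The inode comparison is easy to script. The helper below, ns_diff (a hypothetical name, not a standard tool), walks the namespace kinds and prints only those where two PIDs point at different identifiers. It is a minimal sketch: kinds the running kernel does not expose are skipped, and links that cannot be read (usually another user's process without root) are flagged rather than guessed.

```shell
#!/bin/sh
# ns_diff PID_A PID_B: print each namespace kind whose identifier differs.
ns_diff() {
    a=$1
    b=$2
    for kind in cgroup ipc mnt net pid time user uts; do
        # Skip kinds this kernel does not expose (e.g. time before Linux 5.6).
        [ -L "/proc/$a/ns/$kind" ] || continue
        id_a=$(readlink "/proc/$a/ns/$kind" 2>/dev/null) || id_a=unreadable
        id_b=$(readlink "/proc/$b/ns/$kind" 2>/dev/null) || id_b=unreadable
        if [ "$id_a" = unreadable ] || [ "$id_b" = unreadable ]; then
            printf '%-7s cannot compare (try running as root)\n' "$kind"
        elif [ "$id_a" != "$id_b" ]; then
            printf '%-7s %s  vs  %s\n' "$kind" "$id_a" "$id_b"
        fi
    done
}

# A process compared with itself shares every namespace, so this prints nothing.
ns_diff "$$" "$$"
```

Called as ns_diff 1 $PID under sudo, it would list the namespaces in which a container process differs from the host's PID 1.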
The Seven Namespaces in Detail
mnt — Mount Namespace
Isolates the mount table. A process in its own mnt namespace sees a different set of mounts from the host — the same kernel, a different view of the filesystem tree.
This is what lets a container have / be its overlay rootfs while the host has / on nvme0n1p2. The container's mounts do not appear on the host, and the host's mounts do not appear in the container (unless the runtime explicitly bind-mounts them in).
# Inside the container's mount namespace
sudo nsenter -t $CPID -m -- cat /proc/self/mountinfo | head -5
# Different from the host's /proc/self/mountinfo
# Or create one by hand right now
sudo unshare -m bash
# Inside: every mount you make is invisible to the host
mount -t tmpfs tmpfs /tmp
findmnt /tmp
exit
# Back in the host: /tmp is still its original mount
pid — PID Namespace
Isolates the PID number space. The first process in a new PID namespace gets PID 1. It is also the namespace's init — if it dies, every process in the namespace dies with it.
# Create a new PID namespace — the shell inside is PID 1
sudo unshare --pid --fork --mount-proc bash
# Inside:
ps aux
# USER PID ... COMMAND
# root 1 ... bash <- our shell is PID 1
# root 8 ... ps
# nothing from the host is visible
# Exit the shell and the namespace is cleaned up
exit
Two important properties:
- PIDs in a child namespace are different numbers from the host. The same process has a different PID depending on which namespace is asking.
- The host can still see every process; it just sees them with their host-side PIDs.
/proc/$HOSTPID/status shows the NSpid: field, which lists the PIDs this process has in each namespace it belongs to.
cat /proc/$PID/status | grep NSpid tells you the PID a process has in its own PID namespace. On the host, the container's "PID 1" process might be PID 18234; inside the container, the same process sees itself as PID 1. The NSpid: 18234 1 field captures both views.
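As a small illustration of reading that field, the helper below (pid_views, a hypothetical name) takes the text of a status file on stdin and prints the first and last entries of the NSpid: line. The first entry is the PID as seen from the /proc being read, and the last is the PID in the process's innermost namespace. The sample input reuses the 18234 and 1 values from the paragraph above.

```shell
#!/bin/sh
# pid_views: read /proc/<pid>/status text on stdin and print the first
# (outer) and last (innermost) PID from its NSpid: line.
pid_views() {
    awk '/^NSpid:/ { printf "outer=%s inner=%s\n", $2, $NF }'
}

# Sample input mirroring the example in the text.
printf 'Name:\tnginx\nNSpid:\t18234\t1\n' | pid_views
# prints: outer=18234 inner=1
```

Against a live process the call would be pid_views < /proc/$PID/status. For a process that is not in a nested PID namespace, both values are the same.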
net — Network Namespace
Isolates networking: interfaces, routing tables, ARP tables, firewall rules (iptables/nftables), sockets. Inside a fresh net namespace you see only the loopback (lo) and you need to add interfaces to it — typically by moving a veth pair in from the host.
# Create a network namespace with the iproute2 tool
sudo ip netns add demo
sudo ip netns list
# demo
# Run a command in it
sudo ip netns exec demo ip link
# 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...
# only lo — no eth0, no wireless, nothing
# Bring lo up inside it
sudo ip netns exec demo ip link set lo up
sudo ip netns exec demo ip addr
# 1: lo: <LOOPBACK,UP> ... inet 127.0.0.1/8 ...
# Clean up
sudo ip netns del demo
All container networking (Docker bridges, Kubernetes CNI plugins, Istio sidecars) is built on manipulations of network namespaces: create the namespace, add a virtual interface (veth) with one end inside and one end on the host, set up routing, apply iptables rules, done.
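The first steps of that recipe can be sketched with plain iproute2 commands. Everything specific here is an assumption made for illustration: the vh-/vc- interface names, the 10.200.0.0/24 addresses, and the wire_netns function itself. The sketch stops before routing and firewall rules, and because the commands need root it only prints them by default.

```shell
#!/bin/sh
# Sketch: connect a network namespace to the host with a veth pair.
# Assumptions: iproute2 is installed; names and addresses are illustrative.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

wire_netns() {
    ns=$1
    run ip netns add "$ns"
    run ip link add "vh-$ns" type veth peer name "vc-$ns"   # create the pair on the host
    run ip link set "vc-$ns" netns "$ns"                    # move one end inside
    run ip addr add 10.200.0.1/24 dev "vh-$ns"              # address the host end
    run ip link set "vh-$ns" up
    run ip netns exec "$ns" ip addr add 10.200.0.2/24 dev "vc-$ns"
    run ip netns exec "$ns" ip link set "vc-$ns" up
    run ip netns exec "$ns" ip link set lo up
    run ip netns exec "$ns" ping -c 1 10.200.0.1            # reach the host end
}

wire_netns demo
```

With DRY_RUN=0 and root, the same calls build the link for real. Afterwards, sudo ip netns del demo cleans up, and once the namespace is gone the veth pair goes with it.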
ipc — IPC Namespace
Isolates System V IPC (shared memory segments, message queues, semaphores) and POSIX message queues. Processes in different IPC namespaces cannot see each other's IPC objects.
This matters for legacy software that uses shmget() / msgget() / semget() — multiple instances in separate containers will not collide. For modern code, which mostly uses Unix sockets or shared memory via shm_open()/mmap(), this namespace rarely affects you.
ipcs # see IPC objects in your namespace
# ------ Message Queues --------
# ...
# ------ Shared Memory Segments --------
# ...
# ------ Semaphore Arrays --------
# ...
sudo unshare --ipc bash # new IPC namespace
ipcs # completely empty — host's IPC not visible
uts — UTS Namespace
Isolates hostname and NIS domain. Lets a container sethostname("web-01") without affecting the host.
hostname
# host.example.com
sudo unshare --uts bash
hostname container-1
hostname
# container-1 <- changed inside the namespace
exit
hostname
# host.example.com <- host unaffected
Small namespace, but every container runtime uses it: when hostname inside a pod reports the pod name (or whatever else the runtime chose), that is the UTS namespace at work.
user — User Namespace
The most powerful and dangerous one. Isolates user IDs and group IDs — a process can be UID 0 (root) inside the namespace while being a regular unprivileged user on the host. Combined with capabilities, this is how rootless containers work.
# Create a user namespace and map your host UID 1000 to UID 0 inside
unshare --user --map-root-user bash
# Inside:
id
# uid=0(root) gid=0(root) groups=0(root),...
cat /proc/self/uid_map
# 0 1000 1 <- inside UID 0 = host UID 1000
# But real privileges are still limited: files owned by host root stay off limits
touch /etc/hosts
# touch: cannot touch '/etc/hosts': Permission denied
exit
User namespaces are how Docker-rootless, Podman, and Kubernetes' user-namespace support let non-root host users run containers with "root" inside. It is a big deal for security: a container escape as "root-in-namespace" still lands on the host as UID 1000, limiting blast radius.
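To make the mapping concrete, the sketch below translates an inside UID to its host UID from uid_map text. Each uid_map line has three fields: the first UID inside, the first UID outside, and the length of the range. The function name host_uid is hypothetical, and the sample line is the 0 1000 1 mapping shown earlier.

```shell
#!/bin/sh
# host_uid INSIDE_UID: read uid_map lines (inside-start host-start length)
# on stdin and print the host UID that INSIDE_UID maps to, if any.
host_uid() {
    awk -v u="$1" '
        u + 0 >= $1 && u + 0 < $1 + $3 { print $2 + (u - $1); found = 1 }
        END { if (!found) print "unmapped" }
    '
}

# Mapping from the example above: inside UID 0 is host UID 1000.
printf '0 1000 1\n' | host_uid 0
# prints: 1000
```

For a running process the input would come from /proc/$PID/uid_map. An inside UID that falls outside every mapped range is reported as unmapped.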
cgroup — Cgroup Namespace
Isolates the view into /sys/fs/cgroup. Processes in a cgroup namespace see their cgroup hierarchy rooted at their own cgroup, not the host's.
Before cgroup namespaces existed, a container could cat /proc/self/cgroup and see /docker/abc123... — exposing the container runtime's paths. With cgroup namespaces, it sees / — a clean, rooted view.
cat /proc/self/cgroup
# 0::/user.slice/user-1000.slice/session-3.scope
# Inside a container (with a cgroup namespace):
# cat /proc/self/cgroup
# 0::/ <- rooted at the container's own cgroup
time — Time Namespace
Newer (Linux 5.6+). Isolates CLOCK_MONOTONIC and CLOCK_BOOTTIME — the clocks that count forward from boot. Lets checkpoint/restore tools preserve monotonic time when a process moves between hosts.
Rarely used directly. Docker and most container runtimes do not use it today.
How Containers Assemble Them
A Docker container typically creates new namespaces for:
- mnt: so the overlay rootfs is the container's /.
- pid: so the container has its own PID 1.
- net: so the container has its own interfaces (unless --network=host).
- uts: so the container can set its own hostname.
- ipc: for isolation of IPC objects.
- Sometimes user: for rootless mode or explicit user namespace mapping.
- Usually cgroup: for a clean cgroup view (private by default on cgroup v2 hosts; on cgroup v1 hosts Docker defaults to sharing the host's cgroup namespace).
# See what namespaces a specific container uses
docker run -d --name demo alpine sleep 1000
PID=$(docker inspect --format='{{.State.Pid}}' demo)
ls -l /proc/$PID/ns
# Compare to your shell
diff <(ls -l /proc/$$/ns | awk '{print $9, $11}') \
<(ls -l /proc/$PID/ns | awk '{print $9, $11}')
docker rm -f demo
The differences you will see are exactly the namespaces Docker chose to create a new instance of.
Tools That Work With Namespaces
unshare — run a command in new namespaces
# New UTS, PID, mount, and IPC namespaces: own hostname, own PID tree, private mounts
sudo unshare --uts --pid --fork --mount --mount-proc --ipc bash
# Now you are effectively in a mini-container:
hostname sandbox
ps aux
mount -t tmpfs tmpfs /mnt
exit
# Everything dissolves when the shell exits
nsenter — enter an existing namespace
The essential tool for debugging containers from the host.
# Enter all namespaces of a container's PID
sudo nsenter -t $PID -a
# Enter just the network namespace (super handy for debugging container networking)
sudo nsenter -t $PID -n ip addr
sudo nsenter -t $PID -n ss -tlnp
# Enter mount namespace to see what the container sees in the FS
sudo nsenter -t $PID -m ls /etc
sudo nsenter -t $PID -n ip addr is the 30-second debug for "why can the container not reach X?" It runs host-side tools like ip, ss, tcpdump, and ping inside the container's network namespace, so none of them has to be installed in the container image. Learning nsenter removes most of the pain of debugging minimal container images.
ip netns — dedicated to network namespaces
sudo ip netns add lab
sudo ip netns exec lab ip addr
sudo ip netns exec lab bash # drop into a shell in the namespace
sudo ip netns del lab
lsns — list every namespace
lsns # all of them
lsns --type net # just network namespaces
lsns -p 1 # what namespaces does PID 1 belong to
Namespaces Are Not Security
Namespaces isolate resource views. They do not by themselves sandbox a process. A process with CAP_SYS_ADMIN inside a mount namespace can mount arbitrary things. A process with CAP_NET_ADMIN in a network namespace can manipulate routes, and if it can see the host network (shared namespace), it can manipulate the host's. A process that can see /proc or /sys paths outside its namespace can read or write them.
Containers are secure because namespaces are combined with:
- Capabilities: dropping CAP_SYS_ADMIN, CAP_NET_ADMIN, etc. from the container's process.
- seccomp: a BPF filter that blocks specific syscalls (like mount, reboot, kexec_load).
- LSMs: AppArmor or SELinux policies that restrict what the process can touch.
- cgroups: resource limits, preventing one container from starving others.
- User namespaces: root-in-namespace mapped to unprivileged on the host.
- Read-only rootfs / no-new-privileges / dropped SUID: belt-and-suspenders.
A container escape is not a "namespace escape" — it is usually finding a syscall or kernel bug that bypasses one of these layers. Namespaces are the enabling primitive, not the security story.
Running a container with --privileged disables most of the extra layers: full capability set, no seccomp, writable /sys and /dev, host devices available. The namespaces are still there — so it still "looks" isolated — but a privileged container can trivially escape to the host. Avoid --privileged unless you genuinely need it, and when you do, treat the container's security as identical to running the process directly on the host.
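One way to check which capabilities a process actually holds is to test bits in the CapEff: mask from /proc/[pid]/status. The sketch below does that with shell arithmetic. The helper name has_cap is hypothetical, the bit numbers come from linux/capability.h, and the sample mask is the value commonly reported for a default, non-privileged Docker container; treat that exact value as an assumption and read the real one from the process you care about.

```shell
#!/bin/sh
# has_cap MASK BIT: report whether capability number BIT is set in a
# hexadecimal capability mask such as CapEff: in /proc/<pid>/status.
has_cap() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

CAP_NET_ADMIN=12   # bit numbers as defined in <linux/capability.h>
CAP_SYS_ADMIN=21

# Assumed sample: mask typically seen for a default Docker container.
has_cap 00000000a80425fb "$CAP_SYS_ADMIN"
# prints: no
```

For a live process the mask can be extracted with awk '/^CapEff:/ {print $2}' /proc/$PID/status. A privileged container would be expected to answer yes for both constants.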
Debugging With Namespaces
A few techniques you will reach for:
# Is this process in the same netns as the host? Compare the two inode numbers.
sudo readlink /proc/1/ns/net /proc/$PID/ns/net
# net:[4026531992] <- PID 1 (the host)
# net:[4026532421] <- $PID: a different inode means isolated; two identical lines mean shared
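# See the connections in every network namespace from the host
# (lsns reports one PID per namespace; containerd-shim itself runs in the host netns)
for pid in $(sudo lsns --type net --noheadings --output PID); do
    echo "=== PID $pid ==="
    sudo nsenter -t "$pid" -n ss -tanp 2>/dev/null
done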
# Which container is a mystery process in?
grep -E '^[0-9]+' /proc/$PID/cgroup
# 0::/system.slice/docker-abc123deadbeef.scope
# Compare two processes' namespace memberships
diff <(ls -l /proc/$PID1/ns/ | awk '{print $9, $11}') \
<(ls -l /proc/$PID2/ns/ | awk '{print $9, $11}')
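The "mystery process" lookup can be taken one step further by pulling the container ID out of the cgroup path. The sketch below assumes the systemd-style docker-<id>.scope layout shown in the sample output above. Other runtimes and cgroup drivers lay out their paths differently, so the pattern and the docker_id name are illustrative only.

```shell
#!/bin/sh
# docker_id: read /proc/<pid>/cgroup text on stdin and print the container ID
# embedded in a "docker-<id>.scope" path. Prints nothing if no such path exists.
docker_id() {
    sed -n 's/.*docker-\([0-9a-f][0-9a-f]*\)\.scope.*/\1/p'
}

# Sample line taken from the output shown above.
echo '0::/system.slice/docker-abc123deadbeef.scope' | docker_id
# prints: abc123deadbeef
```

On a real host the input would be /proc/$PID/cgroup, and the resulting ID could then be passed to docker inspect.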
Key Concepts Summary
- Namespaces isolate kernel resources. One per kind: mount, PID, network, IPC, UTS, user, cgroup, (time).
- Processes belong to one namespace per kind. /proc/[pid]/ns/<kind> is a symlink with a unique inode: same inode = same namespace.
- Containers are processes in custom namespace sets. No virtualization, just the kernel lying consistently.
- unshare creates; nsenter enters; ip netns manages network namespaces. Learn all three.
- mnt isolates the mount table; net isolates interfaces and routing; pid isolates the PID numbering; uts isolates the hostname. The others (ipc, user, cgroup, time) matter in specific situations.
- User namespaces map UIDs. Root inside can be unprivileged outside: the basis of rootless containers.
- Namespaces are not security alone. They isolate views. Security comes from combining them with capabilities, seccomp, LSMs, cgroups, and rootfs hardening.
- lsns lists every namespace; NSpid in /proc/[pid]/status shows nested PIDs. Core inspection tools.
Common Mistakes
- Treating namespaces as virtualization. They are view isolation: the kernel is still shared, and a bug in one namespace's kernel path affects everyone.
- Assuming the PID inside a container is the same as the host-side PID. They are almost always different; use NSpid: in /proc/[pid]/status or docker inspect to translate.
- Debugging container networking from the host without nsenter. You look at the wrong interfaces and get the wrong answer.
- Running --privileged containers in production. It bypasses the layered security that makes namespaces useful and defeats the whole point.
- Confusing user namespaces with UIDs. A user namespace is a mapping: a process can be root inside while being unprivileged outside, and that is a feature, not a hack.
- Thinking a network namespace carries DNS configuration. A new network namespace has nothing but lo until you add interfaces and routes to it. Resolver config lives in the container's /etc/resolv.conf, which comes from the mount namespace.
- Deleting a network namespace with ip netns del while processes are still inside it. The command only removes the name; the namespace itself persists until its last process exits, but ip netns exec can no longer reach it. Stop the processes first, or use nsenter -t <pid> -n to get back in.
- Believing that mount namespace isolation also isolates the underlying storage. A mount namespace shares the underlying storage; two containers on the same host can fight over page cache, I/O bandwidth, and disk space if they share storage. Namespaces isolate what you see, not what you share.
You have two running processes on a host. `readlink /proc/123/ns/net` returns `net:[4026532421]`, and `readlink /proc/456/ns/net` returns `net:[4026531992]`. What does that tell you, and what practical consequence does it have?