Linux Fundamentals for Engineers

Namespaces Explained

An engineer asks: "What is a container, actually?" The usual answer, "a lightweight VM", is wrong. A container is not a virtualized machine. It is a normal Linux process that the kernel has lied to. The process sees a filesystem that is not the host's filesystem. It sees a network interface that does not exist outside of it. It sees PID 1 where the host sees an ordinary, much higher PID. It sees its own hostname, its own IPC, its own users. None of it is emulated; the kernel simply shows this process a custom view of reality.

The mechanism that enables those lies is namespaces. There are seven classic kinds plus a newer time namespace, each isolating one kind of kernel resource. Once you understand namespaces as a short list of things the kernel can lie about, containers stop being magic: they are a kernel feature you could invoke by hand from a shell. This lesson is that list: what each namespace isolates, how to see them on a running system, and how to build your own.

What a Namespace Is

A namespace is a kernel-level abstraction that isolates some class of global resource, so that the processes inside the namespace see a different instance of that resource than processes outside it. The kernel manages many global things: the set of PIDs, the set of mounts, the hostname, the routing table. A namespace says "make a separate set of these, visible only to members of this namespace."

Three operations define the interface:

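clone() or unshare() with one or more CLONE_NEW* flags: create new namespaces. clone() places the new child process in them; unshare() moves the calling process itself (for a PID namespace, only children created afterwards are affected).
setns(): move the calling process into an existing namespace, identified by a file descriptor opened on /proc/[pid]/ns/<kind>.
Bind-mounting /proc/[pid]/ns/<kind>: pin a namespace so it outlives its last process (the core trick ip netns add uses).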

Every process belongs to exactly one namespace of each kind. Linux has eight kinds: seven long-standing ones and the newer time namespace.

Namespace	`CLONE_NEW*` flag	What it isolates
`mnt`	`CLONE_NEWNS`	The mount table, what is mounted where
`pid`	`CLONE_NEWPID`	The PID number space, who is PID 1
`net`	`CLONE_NEWNET`	Network interfaces, routing tables, firewall rules, sockets
`ipc`	`CLONE_NEWIPC`	System V IPC, POSIX message queues
`uts`	`CLONE_NEWUTS`	Hostname and NIS domain name
`user`	`CLONE_NEWUSER`	User and group IDs, capabilities
`cgroup`	`CLONE_NEWCGROUP`	The view into `/sys/fs/cgroup`
`time` (newer)	`CLONE_NEWTIME`	The monotonic and boot clocks (available but rarely used)

KEY CONCEPT

A container is a process (and its children) that belongs to a set of namespaces different from the host's. That is the entire definition. Everything else a container runtime does, pulling images, setting up overlay mounts, configuring networking, applying seccomp filters, is plumbing around the core idea: clone a process with new namespaces, then make sure the inside looks the way you want.

Seeing Namespaces on a Live System

Every process has a /proc/[pid]/ns/ directory with a symlink per namespace kind. The symlink's target names the kind and a unique inode number; two processes whose links show the same inode are in the same namespace.

# Your shell's namespaces
ls -l /proc/$$/ns
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 ipc    -> 'ipc:[4026531839]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 mnt    -> 'mnt:[4026531840]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 net    -> 'net:[4026531992]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 pid    -> 'pid:[4026531836]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 time   -> 'time:[4026531834]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 user   -> 'user:[4026531837]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 uts    -> 'uts:[4026531838]'

# Compare with a containerized process: find a container PID on the host
PID=$(pgrep -f nginx | head -1)            # pick any running container process
sudo ls -l /proc/$PID/ns                   # root is needed to read another user's links
# Some inodes differ: those are the namespaces the container runtime created

# Group processes by what namespace they live in
lsns                         # list every namespace on the system
# NS         TYPE   NPROCS  PID USER   COMMAND
# 4026531836 pid        245    1 root   /lib/systemd/systemd
# 4026532420 pid          8 12345 65535 nginx: master
# 4026532421 net          8 12345 65535 nginx: master
# 4026532422 mnt          8 12345 65535 nginx: master
# ...

# Just one kind of namespace
lsns --type net

If two processes have the same inode for net:, they share a network namespace (same interfaces, same routing table). The host and a container almost always differ on mnt, pid, net, and uts; they may share ipc and user depending on the runtime.
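The inode comparison is easy to script. The following is a minimal sketch rather than part of the original lesson: the helper names ns_inode and same_ns are hypothetical, and reading another user's /proc/[pid]/ns links generally requires root.

```shell
# ns_inode: pull the inode number out of a link target such as "net:[4026531992]".
ns_inode() {
  local target=$1
  target=${target#*\[}          # drop everything up to and including "["
  printf '%s\n' "${target%\]}"  # drop the trailing "]"
}

# same_ns KIND PID_A PID_B: succeed if both processes are in the same namespace
# of that kind. Returns 2 if either link cannot be read (no such PID, or
# missing privileges).
same_ns() {
  local kind=$1 a b
  a=$(readlink "/proc/$2/ns/$kind") || return 2
  b=$(readlink "/proc/$3/ns/$kind") || return 2
  [ "$a" = "$b" ]
}
```

For example, sudo bash -c '...; same_ns net 1 "$PID"' (with the function defined in that shell) would report whether a process shares the host's network namespace.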

The Seven Namespaces in Detail

`mnt`: Mount Namespace

Isolates the mount table. A process in its own mnt namespace sees a different set of mounts from the host: the same kernel, a different view of the filesystem tree.

This is what lets a container have / be its overlay rootfs while the host has / on nvme0n1p2. The container's mounts do not appear on the host, and the host's mounts do not appear in the container (unless the runtime explicitly bind-mounts them in).

# Inside the container's mount namespace
sudo nsenter -t $CPID -m -- cat /proc/self/mountinfo | head -5
# Different from the host's /proc/self/mountinfo

# Or create one by hand right now
sudo unshare -m bash
# Inside: every mount you make is invisible to the host
mount -t tmpfs tmpfs /tmp
findmnt /tmp
exit
# Back in the host: /tmp is still its original mount

`pid`: PID Namespace

Isolates the PID number space. The first process in a new PID namespace gets PID 1. It is also the namespace's init: if it dies, the kernel kills every other process in the namespace.

# Create a new PID namespace — the shell inside is PID 1
sudo unshare --pid --fork --mount-proc bash
# Inside:
ps aux
# USER   PID ... COMMAND
# root     1 ... bash        <- our shell is PID 1
# root     8 ... ps
# nothing from the host is visible
# Exit the shell and the namespace is cleaned up
exit

Two important properties:

PIDs in a child namespace are different numbers from the host. The same process has a different PID depending on which namespace is asking.
The host can still see every process: it just sees them with their host-side PIDs. /proc/$HOSTPID/status shows the NSpid: field, which lists the PIDs this process has in each namespace it belongs to.

PRO TIP

grep NSpid /proc/$PID/status tells you the PID a process has in each PID namespace it belongs to, outermost first. On the host, the container's "PID 1" process might be PID 18234; inside the container, the same process sees itself as PID 1. The NSpid: 18234 1 field captures both views.
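To translate PIDs in a script, the NSpid line can be parsed directly. This is a small sketch under the assumption that the kernel exposes NSpid in /proc/[pid]/status; the function names innermost_pid and pid_depth are hypothetical.

```shell
# innermost_pid STATUS_FILE: print the PID the process has in its own
# (innermost) PID namespace, which is the last number on the NSpid line.
innermost_pid() {
  awk '/^NSpid:/ { print $NF }' "$1"
}

# pid_depth STATUS_FILE: print how many PID namespaces the process is visible
# in from the reader's point of view (1 for a host process, 2 for a process
# in a PID namespace one level down).
pid_depth() {
  awk '/^NSpid:/ { print NF - 1 }' "$1"
}
```

Called from the host as innermost_pid /proc/18234/status, it would print 1 for the container's init process from the example above.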

`net`: Network Namespace

Isolates networking: interfaces, routing tables, ARP tables, firewall rules (iptables/nftables), sockets. Inside a fresh net namespace you see only the loopback (lo), and it is down. You need to add interfaces yourself, typically by creating a veth pair on the host and moving one end into the namespace.

# Create a network namespace with the iproute2 tool
sudo ip netns add demo
sudo ip netns list
# demo

# Run a command in it
sudo ip netns exec demo ip link
# 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...
# only lo — no eth0, no wireless, nothing

# Bring lo up inside it
sudo ip netns exec demo ip link set lo up
sudo ip netns exec demo ip addr
# 1: lo: <LOOPBACK,UP> ... inet 127.0.0.1/8 ...

# Clean up
sudo ip netns del demo

All container networking (Docker bridges, Kubernetes CNI plugins, Istio sidecars) is built on manipulations of network namespaces: create the namespace, add a virtual interface (veth) with one end inside and one end on the host, set up routing, apply iptables rules, done.
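The create/veth/address sequence described above can be sketched with plain iproute2 commands. This is an illustration, not what any particular runtime executes: the namespace name, the interface names veth-host and veth-demo, and the 10.200.0.0/24 addresses are all hypothetical. By default the function only prints its plan; setting APPLY=1 runs each step through sudo.

```shell
# run: print a command, or execute it with sudo when APPLY=1.
run() {
  if [ "${APPLY:-0}" = 1 ]; then
    sudo "$@"
  else
    printf '%s\n' "$*"
  fi
}

# setup_veth NAMESPACE: create the namespace and connect it to the host
# with a veth pair (one end stays on the host, one end moves inside).
setup_veth() {
  local ns=$1
  run ip netns add "$ns"
  run ip link add veth-host type veth peer name veth-demo
  run ip link set veth-demo netns "$ns"            # move one end inside
  run ip addr add 10.200.0.1/24 dev veth-host      # host side
  run ip link set veth-host up
  run ip netns exec "$ns" ip addr add 10.200.0.2/24 dev veth-demo
  run ip netns exec "$ns" ip link set veth-demo up
  run ip netns exec "$ns" ip link set lo up
}
```

Running setup_veth demo prints the eight steps; APPLY=1 setup_veth demo applies them, after which the host should be able to ping 10.200.0.2. Deleting the namespace with sudo ip netns del demo also removes the veth pair, provided no process is still running inside it.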

`ipc`: IPC Namespace

Isolates System V IPC (shared memory segments, message queues, semaphores) and POSIX message queues. Processes in different IPC namespaces cannot see each other's IPC objects.

This matters for legacy software that uses shmget() / msgget() / semget(): multiple instances in separate containers will not collide. For modern code, which mostly uses Unix sockets or shared memory via shm_open()/mmap(), this namespace rarely affects you.

ipcs                              # see IPC objects in your namespace
# ------ Message Queues --------
# ...
# ------ Shared Memory Segments --------
# ...
# ------ Semaphore Arrays --------
# ...

sudo unshare --ipc bash           # new IPC namespace
ipcs                              # completely empty: the host's IPC objects are not visible
exit

`uts`: UTS Namespace

Isolates hostname and NIS domain. Lets a container sethostname("web-01") without affecting the host.

hostname
# host.example.com

sudo unshare --uts bash
hostname container-1
hostname
# container-1         <- changed inside the namespace
exit

hostname
# host.example.com    <- host unaffected

Small namespace, but every container runtime uses it: when hostname inside a pod reports the pod name (or whatever else the runtime chose), that is the UTS namespace at work.

`user`: User Namespace

The most powerful and dangerous one. Isolates user IDs and group IDs: a process can be UID 0 (root) inside the namespace while being a regular unprivileged user on the host. Combined with capabilities, this is how rootless containers work.

# Create a user namespace and map your host UID 1000 to UID 0 inside
unshare --user --map-root-user bash
# Inside:
id
# uid=0(root) gid=0(root) groups=0(root),...

cat /proc/self/uid_map
# 0       1000     1                <- inside UID 0 = host UID 1000

# But real privileges are still limited: root-owned host files stay off limits
touch /etc/hosts
# touch: cannot touch '/etc/hosts': Permission denied
exit

User namespaces are how Docker-rootless, Podman, and Kubernetes' user-namespace support let non-root host users run containers with "root" inside. It is a big deal for security: a container escape as "root-in-namespace" still lands on the host as UID 1000, limiting blast radius.
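Each line of uid_map has three columns: the first UID inside the namespace, the UID it maps to outside, and the length of the range. A short sketch of applying that mapping by hand follows; the function name host_uid is hypothetical, and "outside" here means whatever the file shows to the process reading it.

```shell
# host_uid MAP_FILE INSIDE_UID: translate a UID seen inside a user namespace
# into the UID it maps to outside, using uid_map's three columns:
#   first-inside-UID  first-outside-UID  range-length
# Exits non-zero (and prints nothing) if no line of the map covers the UID.
host_uid() {
  awk -v uid="$2" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { exit !found }
  ' "$1"
}
```

With the single-line map from the example above, host_uid /proc/self/uid_map 0 prints 1000. Rootless runtimes typically add a second, much larger range taken from /etc/subuid, and the same arithmetic applies to each line.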

`cgroup`: Cgroup Namespace

Isolates the view into /sys/fs/cgroup. Processes in a cgroup namespace see their cgroup hierarchy rooted at their own cgroup, not the host's.

Before cgroup namespaces existed, a container could cat /proc/self/cgroup and see /docker/abc123..., exposing the container runtime's paths. With cgroup namespaces, it sees /, a clean, rooted view.

cat /proc/self/cgroup
# 0::/user.slice/user-1000.slice/session-3.scope

# Inside a container (with a cgroup namespace):
# cat /proc/self/cgroup
# 0::/                <- rooted at the container's own cgroup

`time`: Time Namespace

Newer (Linux 5.6+). Isolates CLOCK_MONOTONIC and CLOCK_BOOTTIME, the clocks that count forward from boot. Lets checkpoint/restore tools preserve monotonic time when a process moves between hosts.

Rarely used directly. Docker and most container runtimes do not use it today.

How Containers Assemble Them

A Docker container typically creates new namespaces for:

mnt, so the overlay rootfs is the container's /.
pid, so the container has its own PID 1.
net, so the container has its own interfaces (unless --network=host).
uts, so the container can set its own hostname.
ipc, for isolation of IPC objects.
Sometimes user, for rootless mode or explicit user namespace mapping.
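Usually cgroup, for a clean cgroup view (private by default on cgroup v2 hosts; on cgroup v1, Docker shares the host's unless --cgroupns=private is set).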

# See what namespaces a specific container uses
docker run -d --name demo alpine sleep 1000
PID=$(docker inspect --format='{{.State.Pid}}' demo)
sudo ls -l /proc/$PID/ns

# Compare to your shell (sudo because the container's process usually belongs to root)
diff <(ls -l /proc/$$/ns | awk '{print $9, $11}') \
     <(sudo ls -l /proc/$PID/ns | awk '{print $9, $11}')

docker rm -f demo

The differences you will see are exactly the namespaces Docker chose to create a new instance of.

Tools That Work With Namespaces

`unshare`: run a command in new namespaces

# Several namespaces at once: new hostname, new PID tree, new mounts, new IPC
sudo unshare --uts --pid --fork --mount --mount-proc --ipc bash
# Now you are effectively in a mini-container:
hostname sandbox
ps aux
mount -t tmpfs tmpfs /mnt
exit
# Everything dissolves when the shell exits

`nsenter`: enter an existing namespace

The essential tool for debugging containers from the host.

# Enter all namespaces of a container's PID
sudo nsenter -t $PID -a

# Enter just the network namespace (super handy for debugging container networking)
sudo nsenter -t $PID -n ip addr
sudo nsenter -t $PID -n ss -tlnp

# Enter mount namespace to see what the container sees in the FS
sudo nsenter -t $PID -m ls /etc

PRO TIP

sudo nsenter -t $CPID -n ip addr is the 30-second debug for "why can the container not reach X?" It runs host-side tools like ip, ss, tcpdump, and ping inside the container's network namespace, so none of them has to be installed in the container image. Learning nsenter removes most of the pain of debugging minimal container images.

`ip netns`: dedicated to network namespaces

sudo ip netns add lab
sudo ip netns exec lab ip addr
sudo ip netns exec lab bash          # drop into a shell in the namespace
sudo ip netns del lab

`lsns`: list every namespace

lsns                                 # all of them
lsns --type net                      # just network namespaces
lsns -p 1                            # what namespaces does PID 1 belong to
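For a quick inventory, the lsns output can be summarized per type. This is a sketch with a hypothetical helper name, count_ns_types; it only assumes one namespace type per input line, which is what lsns prints when asked for the TYPE column alone.

```shell
# count_ns_types: read one namespace TYPE per line on stdin and print
# "type count" pairs, sorted by type. Blank lines are ignored.
count_ns_types() {
  awk 'NF { count[$1]++ } END { for (t in count) print t, count[t] }' | sort
}
```

A typical invocation would be sudo lsns --noheadings --output TYPE | count_ns_types. A host running several containers usually shows many more net, mnt, and pid namespaces than user or time namespaces.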

Namespaces Are Not Security

Namespaces isolate resource views. They do not by themselves sandbox a process. A process with CAP_SYS_ADMIN inside a mount namespace can mount arbitrary things. A process with CAP_NET_ADMIN in a network namespace can manipulate routes, and if it can see the host network (shared namespace), it can manipulate the host's. A process that can see /proc or /sys paths outside its namespace can read or write them.

Containers are secure because namespaces are combined with:

Capabilities: dropping CAP_SYS_ADMIN, CAP_NET_ADMIN, etc. from the container's process.
seccomp: a BPF filter that blocks specific syscalls (like mount, reboot, kexec_load).
LSMs: AppArmor or SELinux policies that restrict what the process can touch.
cgroups: resource limits, preventing one container from starving others.
User namespaces: root-in-namespace mapped to unprivileged on the host.
Read-only rootfs / no-new-privileges / dropped SUID: belt-and-suspenders hardening.

A container escape is not a "namespace escape"; it is usually a matter of finding a syscall or kernel bug that bypasses one of these layers. Namespaces are the enabling primitive, not the security story.
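One way to check the capability layer is to read the CapEff mask from /proc/[pid]/status and test individual bits. The sketch below assumes bash arithmetic; the helper names has_cap and effective_caps are hypothetical, and the capability numbers come from the kernel's linux/capability.h.

```shell
CAP_NET_ADMIN=12   # bit numbers as defined in <linux/capability.h>
CAP_SYS_ADMIN=21

# has_cap HEX_MASK BIT: succeed if capability number BIT is set in a mask
# written the way /proc/[pid]/status prints it (hex digits, no 0x prefix).
has_cap() {
  local mask=$1 bit=$2
  (( (0x$mask >> bit) & 1 ))
}

# effective_caps PID: print the effective capability mask of a process.
effective_caps() {
  awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}
```

Usage would look like has_cap "$(effective_caps "$PID")" "$CAP_SYS_ADMIN" && echo "has CAP_SYS_ADMIN". A container process that passes this check can mount filesystems inside its namespaces, which is a sign that one of the extra layers has been relaxed.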

WARNING

Running a container with --privileged disables most of the extra layers: full capability set, no seccomp, writable /sys and /dev, host devices available. The namespaces are still there, so it still "looks" isolated, but a privileged container can trivially escape to the host. Avoid --privileged unless you genuinely need it, and when you do, treat the container's security as identical to running the process directly on the host.

Debugging With Namespaces

A few techniques you will reach for:

# Is this process in the same netns as the host? (do both links show the same inode?)
sudo readlink /proc/1/ns/net /proc/$PID/ns/net
# net:[4026531992]            <- PID 1, the host's namespace
# net:[4026532421]            <- $PID: a different inode means isolated, the same inode means shared

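# See every container's network connections from the host
# (use each container's own PID: the containerd-shim process itself runs in the host netns)
for pid in $(docker ps -q | xargs -r docker inspect --format '{{.State.Pid}}'); do
  echo "=== PID $pid ==="
  sudo nsenter -t $pid -n ss -tanp 2>/dev/null
done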

# Which container is a mystery process in?
grep -E '^[0-9]+' /proc/$PID/cgroup
# 0::/system.slice/docker-abc123deadbeef.scope

# Compare two processes' namespace memberships
diff <(ls -l /proc/$PID1/ns/ | awk '{print $9, $11}') \
     <(ls -l /proc/$PID2/ns/ | awk '{print $9, $11}')
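The cgroup path from the "mystery process" check can be reduced to a container ID. This sketch assumes one of two Docker path layouts (a docker-<id>.scope unit under the systemd cgroup driver, or docker/<id> under the cgroupfs driver); other runtimes and Kubernetes use different paths, and the function name container_id_from_cgroup is hypothetical.

```shell
# container_id_from_cgroup: read /proc/[pid]/cgroup lines on stdin and print
# the first Docker container ID found in a path, if any. Handles:
#   .../docker-<id>.scope   (systemd cgroup driver)
#   .../docker/<id>         (cgroupfs driver)
container_id_from_cgroup() {
  sed -nE 's#.*/docker[-/]([0-9a-f]+)(\.scope)?$#\1#p' | head -n 1
}
```

Run from the host as container_id_from_cgroup < /proc/$PID/cgroup; the result can be passed to docker inspect. Empty output means the process is not in a cgroup that matches either layout.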

Key Concepts Summary

Namespaces isolate kernel resources. One per kind: mount, PID, network, IPC, UTS, user, cgroup, (time).
Processes belong to one namespace per kind. /proc/[pid]/ns/<kind> is a symlink with a unique inode; same inode = same namespace.
Containers are processes in custom namespace sets. No virtualization, just the kernel lying consistently.
unshare creates; nsenter enters; ip netns manages network namespaces. Learn all three.
mnt isolates the mount table; net isolates interfaces and routing; pid isolates the PID numbering; uts isolates the hostname. The others (ipc, user, cgroup, time) matter in specific situations.
User namespaces map UIDs. Root inside can be unprivileged outside, the basis of rootless containers.
Namespaces are not security alone. They isolate views. Security comes from combining them with capabilities, seccomp, LSMs, cgroups, and rootfs hardening.
lsns lists every namespace; NSpid in /proc/[pid]/status shows nested PIDs. Core inspection tools.

Common Mistakes

Treating namespaces as virtualization. They are view isolation: the kernel is still shared, and a kernel bug reachable from one container affects everyone.
Assuming the PID inside a container is the same as the host-side PID. They are almost always different; use NSpid: in /proc/[pid]/status or docker inspect to translate.
Debugging container networking from the host without nsenter. You look at the wrong interfaces and get the wrong answer.
Running --privileged containers in production. It bypasses the layered security that makes namespaces useful and defeats the whole point.
Confusing user namespaces with UIDs. A user namespace is a mapping: a process can be root inside while being unprivileged outside, and that is a feature, not a hack.
Thinking "a network namespace needs DNS configuration." A network namespace holds interfaces, routes, and firewall rules, and it has nothing but lo until you add interfaces and routing. Resolver configuration is not part of it: /etc/resolv.conf is a file, so what a container resolves against comes from its mount namespace.
Deleting a network namespace with ip netns del while processes are still inside it. The command only removes the name (the bind mount under /var/run/netns); the namespace itself persists until its last process exits, so you are left with a live namespace that ip netns can no longer address. Stop the processes first, or reach them afterwards with nsenter -t <pid> -n.
Believing that namespace isolation means resource isolation. A mount namespace changes which mounts a process sees, not the storage underneath; two containers on the same host can still fight over page cache, I/O bandwidth, and disk space. Namespaces isolate what you see, not what you share.

KNOWLEDGE CHECK

You have two running processes on a host. `readlink /proc/123/ns/net` returns `net:[4026532421]`, and `readlink /proc/456/ns/net` returns `net:[4026531992]`. What does that tell you, and what practical consequence does it have?
