Linux Fundamentals for Engineers

Filesystem Hierarchy and Mounts

A developer ships a Docker image that works perfectly on their Mac, runs fine in staging, and fails in production with permission denied on /var/cache/app. They check the image — the directory exists. They check the user — it has write access. They check SELinux — disabled. Eventually someone notices the production pod mounts a ConfigMap at /var/cache/app as read-only. The directory is there, the user can read it, but nothing inside can be written.

This is a mount problem. The Linux filesystem is not one tree on one disk; it is a single tree assembled from many mounts: block devices, bind mounts, overlay mounts, namespace-local mounts, and read-only mounts. When a path "exists but behaves strangely," the answer is very often in the mount table. This lesson gives you the mental model for the Linux filesystem hierarchy: why directories are organized the way they are, how mounts stitch together real storage, special filesystems, and container overlays, and how mount namespaces make two processes see two different trees on the same machine.


One Tree, Many Sources

On Windows, each disk gets its own letter (C:, D:, E:). On Linux, every disk, every network share, every USB stick, every pseudo-filesystem is grafted into a single tree rooted at /. There are no drive letters; there is just the tree.

A mount is the operation that attaches a source (a block device, a network share, a tmpfs, a bind from elsewhere in the tree) onto a mount point (an existing directory). After the mount, reading that directory reads the attached source. The original contents of the mount point are hidden until the mount is undone.

# See every current mount
mount | head -10
# sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
# proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
# devtmpfs on /dev type devtmpfs (rw,nosuid,size=16384000k,...)
# tmpfs on /run type tmpfs (rw,nosuid,nodev,size=3289136k,...)
# /dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro)
# ...

# Cleaner output with tree structure
findmnt | head -10
# TARGET                         SOURCE         FSTYPE     OPTIONS
# /                              /dev/nvme0n1p2 ext4       rw,relatime
# ├─/sys                         sysfs          sysfs      rw,nosuid,...
# ├─/proc                        proc           proc       rw,nosuid,...
# ├─/dev                         devtmpfs       devtmpfs   rw,nosuid,...
# ├─/run                         tmpfs          tmpfs      rw,nosuid,...
# ├─/boot/efi                    /dev/nvme0n1p1 vfat       rw,relatime
# └─/home                        /dev/nvme1n1   ext4       rw,relatime

# Ask for the tree layout explicitly (it is already the default for the kernel mount table)
findmnt --tree

This is the actual topology. One root, many mounts grafted in at specific points.

KEY CONCEPT

Everything in / is either on the root filesystem or is a mount point of something else. When a path "does not work right," the first question is: is this path its own mount, or is it part of its parent's mount? findmnt --target /path/to/thing answers it in one command by printing the mount that contains the path (plain findmnt /path prints something only when the path is itself a mount point). The answer determines whether your problem is about disk space, permissions, mount flags, or network connectivity: four completely different debug paths.
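To make the "which mount contains this path?" question concrete, here is a minimal sketch of the longest-prefix lookup that findmnt --target performs, written against the mountinfo format. The function name enclosing_mount is hypothetical, and the sketch assumes mount points without spaces or symlinks in the path; prefer findmnt itself for real work.

```shell
# enclosing_mount PATH [MOUNTINFO_FILE]
# Prints "<mount point> <fstype> <options>" for the mount that contains PATH.
# Hypothetical helper: longest-prefix match over a mountinfo-format file.
enclosing_mount() {
    path=$1
    table=${2:-/proc/self/mountinfo}
    awk -v p="$path" '
    {
        mp = $5
        # The filesystem type is the first field after the "-" separator.
        for (i = 7; i <= NF; i++) if ($i == "-") { fstype = $(i + 1); break }
        # A mount contains p if p is the mount point or lies underneath it.
        if (p == mp || mp == "/" || index(p, mp "/") == 1) {
            # ">=" lets a later mount on the same mount point win.
            if (length(mp) >= best_len) {
                best_len = length(mp)
                best = mp " " fstype " " $6
            }
        }
    }
    END { if (best != "") print best; else exit 1 }' "$table"
}

# Usage: enclosing_mount /var/cache/app
```

If the first column of the result is the path you asked about, the path is its own mount; otherwise it belongs to the parent mount shown.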


The FHS: Why Directories Are Named What They Are

The Filesystem Hierarchy Standard (FHS) is the agreement every major Linux distro follows about where things live. It is not a law — it is a convention — but every package manager, every piece of software you install, every tutorial you read assumes it. Knowing the FHS means knowing where to look.

Path | What goes there | Who writes to it
-----|-----------------|-----------------
/ | Root of the tree. Only essential boot-time stuff | Package manager
/bin, /sbin | Essential commands (often symlinked into /usr/bin, /usr/sbin on modern distros) | Package manager
/usr | Distro-managed programs, libraries, docs | Package manager. Typically read-only for humans
/usr/local | Programs you install yourself (not via package manager) | You
/opt | Third-party packages that want their own tree | You / vendor installers
/etc | System-wide configuration | Package manager + you
/var | Variable data: logs, spool, caches, state | Services at runtime
/var/log | System and service logs | journald, syslog, apps
/var/lib | Per-service persistent state (databases, package databases) | Services
/var/cache | Regenerable caches | Services
/var/run, /run | Runtime state (PID files, sockets). /run is usually a tmpfs, and /var/run is usually a symlink to it | Services
/tmp | Scratch space. Cleared on reboot on most distros. Often a tmpfs | Anyone
/home | User home directories | Users
/root | Root user's home | root
/boot | Kernel images, initramfs, bootloader config | Package manager
/dev | Device files. A kernel-managed devtmpfs | Kernel
/proc | Per-process and system information. A procfs | Kernel
/sys | Devices, drivers, subsystem info. A sysfs | Kernel
/srv | Data served by this server (HTTP, FTP, etc.) | You
/media, /mnt | Mount points for removable media (/media) and manual mounts (/mnt) | Auto-mount systems (/media), you (/mnt)

A few themes emerge:

  1. Binaries: /usr/bin for distro stuff, /usr/local/bin for your stuff, /opt/<vendor>/bin for third-party tarballs.
  2. Configs: almost always in /etc. Often /etc/$SERVICE/$SERVICE.conf with a drop-in /etc/$SERVICE/$SERVICE.d/*.conf.
  3. Logs: /var/log. Service-specific logs usually in /var/log/$SERVICE/.
  4. State: /var/lib/$SERVICE/ holds databases and state that must survive reboots. /var/cache/$SERVICE/ holds regenerable caches. /run/$SERVICE/ holds runtime sockets and PIDs (wiped on reboot).
  5. Kernel views: /proc and /sys are not real disks — they are the kernel telling you about itself.

PRO TIP

When you install a new service and wonder "where do I configure it / where are its logs / where is its data?", the answer is almost always: /etc/NAME/, /var/log/NAME/, /var/lib/NAME/. This pattern is the whole reason the FHS exists, and it is the reason you can ssh into a machine running software you have never seen and find your way around in seconds.
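As a small illustration of that pattern, the sketch below prints the conventional locations for a service name and whether each exists on the current machine. The function name fhs_paths is hypothetical, and real packages do not always name their directories after the service, so treat a line marked absent as a hint rather than proof.

```shell
# fhs_paths NAME
# Lists the conventional FHS locations for a service and whether each exists.
# Hypothetical helper; the directory list follows the pattern described above.
fhs_paths() {
    name=$1
    for dir in "/etc/$name" "/var/log/$name" "/var/lib/$name" \
               "/var/cache/$name" "/run/$name"; do
        if [ -e "$dir" ]; then state=present; else state=absent; fi
        printf '%-8s %s\n' "$state" "$dir"
    done
}

# Usage: fhs_paths nginx
```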


/etc/fstab — The Permanent Mounts

When the system boots, systemd reads /etc/fstab and mounts everything listed there (entries marked noauto are skipped). Each line is one mount.

cat /etc/fstab
# # <file system>        <mount point>  <type>  <options>             <dump> <pass>
# UUID=3f8a-01c9-...     /              ext4    defaults,errors=remount-ro  0  1
# UUID=a8f4-2d1e         /boot/efi      vfat    defaults              0       1
# UUID=c112-91bd         /home          ext4    defaults,nodev        0       2
# /dev/mapper/swap       none           swap    sw                    0       0
# tmpfs                  /tmp           tmpfs   defaults,size=2G      0       0
# 10.0.0.5:/exports/nfs  /mnt/nfs       nfs     defaults,_netdev      0       0

The six fields:

  1. Source: usually UUID=... (safest, because it survives device renames), sometimes a device path (/dev/sda1), sometimes a network path (host:/export), sometimes just a filesystem type (tmpfs).
  2. Mount point: where in the tree.
  3. Type: ext4, xfs, vfat, tmpfs, nfs, cifs, etc.
  4. Options: comma-separated flags. Common ones below.
  5. dump: read by the legacy dump backup tool. 0 (never dump) is fine for modern systems.
  6. pass: fsck order at boot: 0 (skip), 1 (root), 2 (everything else).
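The field layout above is regular enough to check mechanically. The following is a minimal sketch of a structural check, not a replacement for mount -a: the function name lint_fstab and its messages are hypothetical, and the device-path pattern (/dev/sd*, /dev/nvme*, /dev/vd*) is an assumption about which names are likely to change between boots.

```shell
# lint_fstab [FILE]
# Structural check of an fstab-format file. Exits non-zero if any entry is
# malformed. Hypothetical helper; it does not try to mount anything.
lint_fstab() {
    awk '
    BEGIN { bad = 0 }
    /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
    NF < 4 {
        printf "line %d: expected at least 4 fields, found %d\n", NR, NF
        bad = 1
        next
    }
    {
        pass = (NF >= 6) ? $6 : 0              # dump and pass default to 0 when omitted
        if (pass !~ /^[0-9]+$/) {
            printf "line %d: pass must be a non-negative integer (got %s)\n", NR, pass
            bad = 1
        }
        # Warning only: raw device paths can change between boots.
        if ($1 ~ /^\/dev\/(sd|nvme|vd)/)
            printf "line %d: %s may be renamed across boots; consider UUID=\n", NR, $1
    }
    END { exit bad }' "${1:-/etc/fstab}"
}

# Usage: lint_fstab /etc/fstab && echo "structure looks fine"
```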

The mount options you will actually use

Option | Meaning
-------|--------
defaults | Shorthand for rw,suid,dev,exec,auto,nouser,async
ro / rw | Read-only / read-write
noexec | Files on this mount cannot be executed directly (good for /tmp on shared systems)
nodev | Device nodes on this mount are not honored, so a planted device file cannot be used to reach real hardware
nosuid | Ignore the setuid and setgid bits, which stops privilege escalation from user-writable mounts
relatime / noatime | relatime (the default) updates access time only when it is older than the modification time or more than a day old; noatime never updates it. Both avoid a write on every read
sync / async | Write synchronously or let the kernel buffer (default is async)
remount | Change options on an existing mount without unmounting
_netdev | This mount needs the network; systemd waits for networking first
nofail | Do not treat a failure to mount as a boot failure (for optional mounts)
x-systemd.automount | Mount on first access instead of at boot

# Mount everything in fstab (happens at boot, but useful manually after edits)
sudo mount -a

# Remount the root filesystem read-only without rebooting (useful during fsck)
sudo mount -o remount,ro /

# Mount something ad-hoc, not in fstab
sudo mount -t nfs 10.0.0.5:/exports/data /mnt/data

# Unmount by mount point or by source; either one works, so pick one
sudo umount /mnt/data
# sudo umount 10.0.0.5:/exports/data

WARNING

A broken line in /etc/fstab can prevent your system from booting. On systemd-based distros, a local fstab entry that fails to mount and is not marked nofail or noauto typically stops the boot in emergency mode; the pass field only controls fsck ordering and does not change this. Always (1) test with mount -a before rebooting, and (2) for optional mounts, use the nofail option so a missing network share does not stall or stop the boot.
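The nofail advice can be checked with a short script. This is a sketch under stated assumptions: the function name missing_nofail is hypothetical, and it treats entries of type nfs, nfs4, or cifs, or entries carrying _netdev, as "optional network mounts", which may not match how you classify your own mounts.

```shell
# missing_nofail [FILE]
# Prints the mount point of every network-style fstab entry that lacks nofail.
# Hypothetical helper; the definition of "network-style" is an assumption.
missing_nofail() {
    awk '
    /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
    NF >= 4 {
        opts = "," $4 ","                      # pad so whole options can be matched
        network = ($3 ~ /^(nfs|nfs4|cifs)$/) || index(opts, ",_netdev,")
        if (network && !index(opts, ",nofail,")) print $2
    }' "${1:-/etc/fstab}"
}

# Usage: missing_nofail /etc/fstab
```

Padding the option string with commas keeps a substring such as nofail from matching inside a longer, unrelated option.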


Special Filesystems: Not On Disk

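Several mount types you see in mount output are not backed by a disk partition of their own. Most are views into the kernel; tmpfs lives in RAM, and overlay and FUSE present data that is stored somewhere else.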

Type | What it is | Typical mount point
-----|------------|--------------------
proc | Process and kernel info | /proc
sysfs | Devices, drivers, kernel subsystems | /sys
tmpfs | RAM-backed filesystem | /tmp, /run, /dev/shm
devtmpfs | Device nodes managed by the kernel | /dev
cgroup2 (or cgroup) | Control groups: cgroup2 is v2, cgroup is the legacy v1 type | /sys/fs/cgroup
overlay | Layered union filesystem | Container rootfs
fuse.* | Userspace filesystems (sshfs, s3fs, etc.) | Anywhere
nsfs | Namespaces exposed as files | /proc/[pid]/ns/*
pipefs, sockfs | Pipes and sockets live here logically | Invisible to users

tmpfs is worth understanding specifically because it is everywhere:

# /tmp is a tmpfs on many modern distros (others keep it on disk)
findmnt /tmp
# TARGET SOURCE FSTYPE OPTIONS
# /tmp   tmpfs  tmpfs  rw,nosuid,nodev,size=2097152k

# tmpfs lives in RAM (and swap). Writing to it is fast, but the data is lost
# on reboot and counts against memory
free -h
# tmpfs contents are counted in the "shared" column

# Why /run is a tmpfs: PID files and sockets should not survive reboots
findmnt /run
# /run tmpfs tmpfs rw,nosuid,nodev,size=3289136k
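To see how much memory tmpfs mounts are holding in total, one option is to sum the Used column of df for tmpfs entries. The sketch below assumes POSIX-format df output (df -Pk) on standard input; the function name tmpfs_used_kib is hypothetical.

```shell
# tmpfs_used_kib
# Reads `df -Pk` output on stdin and prints the total KiB used by tmpfs mounts.
# Hypothetical helper; devtmpfs and other types are deliberately not counted.
tmpfs_used_kib() {
    awk 'NR > 1 && $1 == "tmpfs" { sum += $3 } END { print sum + 0 }'
}

# Usage: df -Pk | tmpfs_used_kib
```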

Bind Mounts: Make One Path Appear in Two Places

A bind mount grafts an existing directory or file elsewhere in the tree. No new storage — the same blocks just show up at a second path.

# Make /var/log/nginx also visible at /srv/nginx-logs
sudo mkdir /srv/nginx-logs
sudo mount --bind /var/log/nginx /srv/nginx-logs

# Confirm
ls /srv/nginx-logs
# access.log  error.log  ...   <- same files

# Undo
sudo umount /srv/nginx-logs

Bind mounts are everywhere in modern Linux:

  • Docker volumes: -v /host/path:/container/path is a bind mount from the host into the container's mount namespace.
  • Kubernetes hostPath and emptyDir volumes: bind mounts from the node filesystem into the pod.
  • systemd BindPaths= / BindReadOnlyPaths=: bind mounts into a service's private filesystem view.
  • chroot / jail setups: bind mount /proc, /dev, /sys into the chroot.

KEY CONCEPT

Bind mounts are how Linux containers get data in and out. When you read "mounted as a volume" in a Kubernetes or Docker doc, what is happening underneath is almost always mount --bind or its overlay cousin. Understanding bind mounts is understanding how your container reads its ConfigMap, writes to a PersistentVolume, or sees the host's /proc.

Bind-mount a single file

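# Bind-mount one file over another (the target file must already exist)
sudo mount --bind /etc/myapp.conf.new /etc/myapp.conf
# A plain bind mount is writable if the source is; remount it read-only
sudo mount -o remount,bind,ro /etc/myapp.conf
# Now myapp sees the new config, read-only, without any copy or symlink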

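This is how a Kubernetes ConfigMap or Secret mounted with subPath appears: a single bind-mounted file inside the container's /etc/... (or wherever). Without subPath, the whole volume directory is bind-mounted instead, and each key shows up as a file (a symlink, in practice) inside it.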
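Bind mounts of a subdirectory or a single file can be spotted in mountinfo: the fourth field is the path inside the source filesystem that was attached, and it is something other than / for such mounts. The sketch below relies on that; the function name list_subtree_mounts is hypothetical. It cannot see a bind mount of a whole filesystem root (field four is / there too), and it also lists btrfs subvolumes and nsfs entries, so read the output as candidates.

```shell
# list_subtree_mounts [MOUNTINFO_FILE]
# Prints "<mount point> <- <path inside the source filesystem>" for every
# mount whose root field is not "/". Hypothetical helper.
list_subtree_mounts() {
    awk '$4 != "/" { print $5 " <- " $4 }' "${1:-/proc/self/mountinfo}"
}

# Usage: list_subtree_mounts
#        list_subtree_mounts /proc/$CONTAINER_PID/mountinfo
```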


Overlay Filesystems: How Containers Actually Store Their Root

Most container runtimes (Docker, containerd, CRI-O) stack filesystem layers with OverlayFS by default. The idea: multiple read-only "lower" layers plus a single writable "upper" layer, unified so the container sees them as one tree. When the container writes to a file, the file is copied to the upper layer on demand (copy-on-write).

# A running container's rootfs is an overlay
mount | grep overlay | head -2
# overlay on /var/lib/docker/overlay2/.../merged type overlay \
#   (rw,relatime,lowerdir=.../l/XYZ:.../l/ABC,upperdir=.../diff,workdir=.../work)

# lowerdir is one or more read-only image layers (from "docker pull")
# upperdir is the container's writable layer
# merged is what the container sees as /

# Inspect a container's storage
docker inspect <container> --format='{{.GraphDriver.Data.MergedDir}}'

When the container writes a file, OverlayFS copies the old content up from the lower layers on first write, then applies the write to the copy in the upper layer. Readers see the upper layer's version if it exists, otherwise fall through to the lower layers.
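The layer stack is encoded in the mount options shown earlier, so it can be pulled apart with a few lines of shell. This is a minimal sketch: the function name overlay_layers is hypothetical, and it assumes paths without commas or colons. In the lowerdir list the leftmost entry is the topmost layer.

```shell
# overlay_layers OPTIONS
# Splits an overlay option string into one line per layer, top layer first.
# Hypothetical helper; assumes no commas or colons inside the paths.
overlay_layers() {
    opts=$1
    # One option per line, then split each option into key and value at "=".
    printf '%s\n' "$opts" | tr ',' '\n' | while IFS='=' read -r key value; do
        case $key in
            lowerdir) printf '%s\n' "$value" | tr ':' '\n' | sed 's/^/lower: /' ;;
            upperdir) printf 'upper: %s\n' "$value" ;;
            workdir)  printf 'work:  %s\n' "$value" ;;
        esac
    done
}

# Usage: overlay_layers "$(findmnt -n -o OPTIONS --target /path/to/merged)"
```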

WAR STORY

A team's CI was failing intermittently with "no space left on device" even though the nodes had 500 GB free. The issue: Docker defaulted to storing images in /var/lib/docker/overlay2 on a 40 GB / partition, not the 500 GB /data mount where everyone assumed it lived. Each CI job left image layers behind; once that partition filled, new containers could not even write their upperdir. The fix was daemon.json with "data-root": "/data/docker", followed by moving the existing images. The lesson: overlay layers live somewhere on a real disk, and that disk is not always where you expect.


Mount Propagation and Shared Subtrees

When you mount a filesystem inside a namespace (e.g. inside a container), does that mount become visible in other namespaces? The answer depends on mount propagation flags.

  • private — mounts and unmounts do not propagate in or out
  • shared — mounts propagate both ways
  • slave — mounts propagate one-way (from master to slave, not back)
  • unbindable — cannot be bind-mounted

Most of the time you do not think about this. But when you do — when your container needs to see hot-plugged disks from the host, or when you are debugging why /dev shows different things in different contexts — the answer is in /proc/self/mountinfo.

# The authoritative mount list for this process — shows propagation flags
cat /proc/self/mountinfo | head -5
# 22 28 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
# 23 28 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw
#                                                    ^^^^^^^^
#                                                    propagation flag

# Compare two processes — they may differ
diff <(cut -d' ' -f5 /proc/$PID1/mountinfo | sort) \
     <(cut -d' ' -f5 /proc/$PID2/mountinfo | sort)
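Reading the propagation state by eye gets tedious on a long mount table. The sketch below summarizes it per mount from the optional fields that sit between the mount options and the "-" separator: shared:N means shared, master:N means slave, unbindable is spelled out, and no tag at all means private. The function name mount_propagation is hypothetical.

```shell
# mount_propagation [MOUNTINFO_FILE]
# Prints "<mount point> <propagation>" for every mount in a mountinfo file.
# Hypothetical helper; a mount can be both shared and slave at once.
mount_propagation() {
    awk '{
        state = ""
        # Optional fields run from field 7 up to the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            tag = ""
            if ($i ~ /^shared:/)         tag = "shared"
            else if ($i ~ /^master:/)    tag = "slave"
            else if ($i == "unbindable") tag = "unbindable"
            if (tag != "") state = state (state == "" ? "" : ",") tag
        }
        if (state == "") state = "private"
        print $5, state
    }' "${1:-/proc/self/mountinfo}"
}

# Usage: mount_propagation | grep -v ' shared$'
```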

Mount Namespaces: Why / Looks Different to Two Processes

Every process belongs to a mount namespace. Processes in different mount namespaces can see completely different filesystem trees — different mounts, different root, different everything — while running on the same kernel.

This is how containers work: the container runtime creates a new mount namespace, sets up its own mounts (overlay rootfs, bind mounts for volumes, tmpfs for /dev/shm), pivots root into it, and starts your process there.

# Every process has a mnt namespace. Two processes in the same container
# link to the same mnt inode; a process on the host links to a different one.
# (Reading another user's /proc/PID/ns links needs root, hence sudo.)
sudo ls -l /proc/1/ns/mnt /proc/$$/ns/mnt
# lrwxrwxrwx 1 root  root  0 Apr 19 08:00 /proc/1/ns/mnt -> 'mnt:[4026531840]'
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 /proc/<shell PID>/ns/mnt -> 'mnt:[4026531840]'
# Same inode -> same mount namespace -> same view

# Enter another process's mount namespace to see what it sees
sudo nsenter -t $CONTAINER_PID -m -- ls /
# (lists the *container's* / not the host's)

# Same tool, different view
sudo nsenter -t $CONTAINER_PID -m -- cat /etc/hosts

We cover namespaces in full in Module 5 — for now, the takeaway is: when you say "the /etc/hosts file" on a machine, you must mean "in which mount namespace?" Otherwise the answer is ambiguous.
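Answering "are these two processes looking at the same tree?" comes down to comparing the namespace links shown above. A minimal sketch, assuming readlink is available and that you have permission to read both processes' /proc entries (run it from a root shell for processes you do not own); the function name same_mnt_ns is hypothetical.

```shell
# same_mnt_ns PID1 PID2
# Succeeds if both processes are in the same mount namespace.
# Returns 2 if either link cannot be read. Hypothetical helper.
same_mnt_ns() {
    a=$(readlink "/proc/$1/ns/mnt") || return 2
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$a" = "$b" ]
}

# Usage:
#   if same_mnt_ns "$$" "$CONTAINER_PID"; then
#       echo "same mount namespace"
#   else
#       echo "different mount namespace (or not readable)"
#   fi
```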


Debugging Mount Problems

# Is this path itself a mount point? (No output means it is not.)
findmnt /var/log

# Which mount does this path live on?
findmnt --target /var/log
# TARGET SOURCE         FSTYPE OPTIONS
# /      /dev/nvme0n1p2 ext4   rw,relatime

# Who has this mount "busy" (preventing unmount)?
sudo lsof +D /mnt/foo     # every process with an open fd under /mnt/foo (slow on big trees)
sudo fuser -vm /mnt/foo   # processes using this mount, with user and access type

# How much space is this mount using?
df -h /var/log
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/nvme0n1p2   50G   23G   25G  49% /

# How many inodes? (running out of inodes looks like full-disk but df shows free space)
df -i /var/log

# Compare mounts between two processes/namespaces
diff <(sudo nsenter -t $PID1 -m -- mount) <(sudo nsenter -t $PID2 -m -- mount)

PRO TIP

df -h tells you about space. df -i tells you about inodes. A filesystem with plenty of free space but 100% of its inodes used will return "No space left on device" on every new file, which is confusing until you run df -i. This is common on systems that create millions of small files (mail spools, session directories, git working copies of huge repos). Always run both.
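Because df -P and df -Pi share the same column layout, one small filter can flag mounts that are close to full on either measure. This is a sketch: the function name over_threshold is hypothetical, and it assumes mount points without spaces. Rows whose percentage is not a number (some pseudo-filesystems print "-" for inodes) are skipped.

```shell
# over_threshold LIMIT
# Reads `df -P` or `df -Pi` output on stdin and prints "<mount point> <pct>%"
# for every row whose use percentage is at or above LIMIT. Hypothetical helper.
over_threshold() {
    limit=$1
    awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)                     # "97%" -> "97"
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, pct "%"
    }'
}

# Usage: check space and inodes with the same threshold
#   df -P  | over_threshold 90
#   df -Pi | over_threshold 90
```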


Key Concepts Summary

  • Linux has one tree. No drive letters. Everything mounts somewhere under /.
  • The FHS is a convention, not a law, but every distro follows it. /etc for config, /var/log for logs, /var/lib for state, /usr/local for your binaries.
  • /proc and /sys are not on disk. They are kernel views exposed as filesystems.
  • /etc/fstab drives boot-time mounts. Use UUID=... so entries survive device renaming and reordering. Use nofail for optional mounts.
  • Mount options matter. ro, noexec, nosuid, nodev, relatime, _netdev, nofail — each has a specific safety or performance role.
  • tmpfs is RAM-backed. /run is a tmpfs and /tmp often is; data there vanishes on reboot.
  • Bind mounts graft one path onto another. Docker volumes, Kubernetes hostPath, systemd BindPaths — all bind mounts underneath.
  • Overlay filesystems stack layers. A typical container rootfs is an overlay: read-only image layers plus one writable upper layer, copy-on-write for modifications.
  • Mount namespaces give each container its own tree. Two processes on the same kernel can see completely different filesystems.
  • findmnt, /proc/*/mountinfo, lsof, and df -i are the core debugging tools.

Common Mistakes

  • Using device paths like /dev/sda1 in fstab when drives can be renamed on boot. Always prefer UUID= or LABEL= for stability.
  • Editing fstab and rebooting before testing with mount -a. A typo can keep the system from booting.
  • Filling /tmp and then wondering why RAM usage spiked — tmpfs contents count as RAM.
  • Ignoring df -i when disk-full errors appear on a mount with plenty of free space.
  • Mounting over a non-empty directory without realizing the original contents are hidden until the mount is undone.
  • Running mount without -a after editing fstab and thinking "it worked." mount alone just shows what is currently mounted.
  • Debugging a container and looking at the host's mount table. Use nsenter -t $PID -m (or cat /proc/$PID/mountinfo) to see what the container actually sees.
  • Using umount -l (lazy unmount) as a shortcut for a busy mount. It hides the mount from new accesses but does not fix the underlying file-descriptor leak that kept it busy — find the process with lsof +D and fix the real problem.
  • Not setting _netdev on NFS/CIFS/iSCSI mounts in fstab. Without it, systemd may try to mount them before networking is up and fail the boot.

KNOWLEDGE CHECK

You have a pod in Kubernetes that mounts a ConfigMap at /etc/app/config.yaml. Inside the pod, `ls -l /etc/app/config.yaml` shows the file exists with the right content, but your app gets `EROFS: read-only filesystem` when it tries to write to `/etc/app/cache/foo.tmp`. The cache directory exists. Why?