Volumes and Data
At 2 AM, a developer runs `docker-compose down -v` to "clean up" before a fresh start. At 2:01 AM they realize the `-v` flag means "remove volumes" — including the one backing Postgres. By 2:02 they are staring at an empty `/var/lib/postgresql/data`, and a week's worth of development data is gone. They start typing "docker volume recover" into Google.
Container storage seems simple: write to disk, the disk keeps your data. It is not simple. Containers have four different places writes can go — the writable layer, bind mounts, named volumes, and tmpfs — and each has different rules about when data persists and who owns it. Getting this wrong manifests as vanished database data, permission-denied errors, silent disk fill-ups, and mysteriously slow I/O. This lesson explains all four, when to use each, and the small number of patterns that keep your data safe.
Where Container Writes Actually Go
A container's root filesystem is an OverlayFS mount, and any write inside it can land in one of four places:
- Writable upper layer (the default). Anything written to a path that is not a mount point lands here. Destroyed when the container is removed.
- Bind mount. A host directory or file grafted into the container at a specific path. Persists on the host, not tied to the container's lifecycle.
- Named volume. A Docker-managed storage location, typically under `/var/lib/docker/volumes/`. Persists until explicitly removed with `docker volume rm`.
- tmpfs. A RAM-backed filesystem mount. Fast, ephemeral, counts against container memory limits.
The rule: any data you want to survive container removal must live in a bind mount or a named volume. The writable layer is convenient for everything else, but it is wiped on `docker rm`. What catches teams out is that one line of config is the difference between "stored properly" and "lost forever"; always check where your app writes data and confirm that path is mounted somewhere persistent.
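The 2 AM scenario from the intro comes down to that single line. A minimal compose sketch (service and volume names are illustrative): with the named volume mounted at Postgres's data directory, `docker-compose down` keeps the data and only `down -v` destroys it. Without that line, the data sits in the writable layer and dies with the container.

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example          # required by the image to start
    volumes:
      - pgdata:/var/lib/postgresql/data   # the one line between "kept" and "gone"
volumes:
  pgdata:                                 # named volume, survives 'down'
```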
The Writable Layer: Fast but Ephemeral
docker run -it --name temp alpine sh
# Inside the container
echo "notes" > /root/note.txt
cat /root/note.txt
exit
# Container is stopped but not removed
docker start -a temp
cat /root/note.txt
# notes ← still here because the container was only stopped
# Now remove
docker rm temp
# The note is gone forever. The overlay diff was deleted.
The writable layer is where uncaptured writes go. Logs that the app writes to /var/log/app.log without a volume. Caches the app creates in /tmp. Python __pycache__ files. All of it lives in the container's overlay diff/ directory and is deleted with docker rm.
This is usually what you want for ephemeral state. It becomes a problem when you accidentally put something important there.
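You can watch the writable layer fill up with `docker diff`, which lists every path a container has added (A), changed (C), or deleted (D) relative to its image; everything it shows disappears on `docker rm`. A sketch, guarded so it degrades gracefully when no Docker daemon is available (the container name is illustrative):

```shell
# Show what a container wrote to its writable layer with 'docker diff'.
# A = added, C = changed, D = deleted -- all relative to the image.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker rm -f diff-demo 2>/dev/null || true   # clean up any earlier run
  docker run -d --name diff-demo alpine sh -c 'echo hi > /root/note.txt; sleep 60'
  docker diff diff-demo                        # expect an entry for /root/note.txt
  docker rm -f diff-demo                       # removing the container deletes the diff
else
  echo "docker unavailable; skipping demo"
fi
STATUS=ok
```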
Bind Mounts: Host Path → Container Path
# Mount the host's /home/user/src into /app inside the container
docker run --rm -v /home/user/src:/app -w /app node:20 npm test
# Read-only
docker run --rm -v /etc/ssl/certs:/etc/ssl/certs:ro alpine ls /etc/ssl/certs
# The more-explicit --mount form
docker run --rm \
--mount type=bind,source=/home/user/src,target=/app,readonly \
-w /app node:20 npm test
A bind mount grafts a specific host directory or file into the container at the target path. The container sees its /app; the host still sees its /home/user/src. Changes are instantly visible on both sides — it is the same inode.
When to use bind mounts
- Local development. Mount your source tree into the container, run `npm run dev` or `python -m flask run`, edit files on the host, watch the container pick up changes immediately.
- Config injection. Mount a single config file into the container (`-v /etc/myapp/config.yaml:/etc/app/config.yaml:ro`).
- Host paths the container needs to see. `/var/run/docker.sock` (for tools that talk to Docker), `/etc/hosts`, `/sys` for system tooling.
- Persistent data in small, simple setups. On a single server, you can bind-mount `/srv/myapp/data` into the container. Backups are just tarballs of the host path.
Common pitfalls
Host path does not exist
docker run --rm -v /does/not/exist:/app alpine ls /app
# Docker creates /does/not/exist on the host (as root) and mounts it empty.
# The container sees an empty directory, not an error.
Docker silently creates missing bind-mount source paths as root. This is why a typo in your compose file leads to "my app's config is empty" instead of a clear error.
Permission mismatches
# Your host user is UID 1000
ls -ld /home/user/src
# drwxr-xr-x 20 user user 4096 Apr 20 10:00 /home/user/src
# The image's USER is node (UID 1000 — happens to match on Node image)
docker run --rm -v /home/user/src:/app node:20 ls -l /app
# Works fine — UIDs match.
# The image's USER is nobody (UID 65534) or root (UID 0)
docker run --rm -v /home/user/src:/app --user nobody alpine touch /app/newfile
# touch: /app/newfile: Permission denied
The container sees host files with their host UIDs, not "translated." If the container's user does not have permission on the host path, writes fail. Fixes:
- Run the container as a UID that matches the host file ownership: `--user $(id -u):$(id -g)`.
- `chown` the host path to a UID the container uses.
- Use a named volume instead (Docker can initialize it with the image's ownership).
- Use user namespaces (the daemon's `userns-remap` setting) to remap UIDs — more advanced.
On macOS and Windows with Docker Desktop, bind mounts are proxied through a VM's shared filesystem. This is slow — often 10-50× slower than native — especially for many small files (npm install, Python site-packages, Git operations). Workarounds: use named volumes for node_modules/venv/vendor dirs (not bind mounts), use Docker Desktop's "VirtioFS" sharing mode (enabled by default in recent versions), or use WSL2 on Windows where Linux paths are native.
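The node_modules workaround can be expressed in compose: bind-mount the source tree for live editing, then shadow the dependency directory with a named volume so it stays inside the VM at native speed. A sketch (service and volume names, and the `npm run dev` command, are illustrative):

```yaml
services:
  web:
    image: node:20
    working_dir: /app
    command: npm run dev
    volumes:
      - ./:/app                           # source code: bind mount, edit on the host
      - node_modules:/app/node_modules    # deps: named volume shadows the bind mount
volumes:
  node_modules:
```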
Overlay a single file
# Mount just one file
docker run --rm -v /etc/myconfig.yaml:/etc/app/config.yaml:ro myapp
Critical detail: the host-side file must exist before the container starts. If `/etc/myconfig.yaml` is missing, Docker silently creates it as a directory on the host, then tries to mount a directory over a file path in the container, and you get confusing errors. `touch` the file first (even if empty).
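A cheap defense is to assert that the host-side file exists, and is a regular file rather than a directory, before starting the container. A sketch in plain shell; the path is illustrative scratch space:

```shell
# Fail loudly on a missing or directory-typed config source instead of letting
# the bind mount silently create a directory.
CONFIG=/tmp/demo-myconfig.yaml
echo "log_level: info" > "$CONFIG"                 # create it for the demo
if [ -f "$CONFIG" ]; then
  echo "ok: $CONFIG is a regular file, safe to bind-mount"
else
  echo "refusing to start: $CONFIG missing or not a file" >&2
  exit 1
fi
```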
Named Volumes: Docker-Managed Storage
# Create explicitly
docker volume create mydata
# Or let Docker create on first use
docker run -d --name db \
-v mydata:/var/lib/postgresql/data \
postgres:16
# List
docker volume ls
# DRIVER VOLUME NAME
# local mydata
# Inspect to see where Docker stores it
docker volume inspect mydata
# [
# {
# "Name": "mydata",
# "Mountpoint": "/var/lib/docker/volumes/mydata/_data",
# "Driver": "local",
# ...
# }
# ]
# Remove (DESTROYS THE DATA)
docker volume rm mydata
Named volumes are stored by Docker at /var/lib/docker/volumes/<name>/_data. You interact with them by name; Docker manages the filesystem layout.
When to use named volumes
- Databases. Postgres, MySQL, Mongo, Redis — the canonical use. The volume survives container restarts, upgrades, and recreations.
- Dependency caches in dev loops. Mount a named volume at `/app/node_modules` so deps are persistent and the host OS cannot interfere (especially important on Docker Desktop).
- Any state you do not want bound to a specific host path. Portable across Docker hosts if you back the volume up and restore it.
Named volumes and first-time population
A cool named-volume behavior: when a volume is empty and mounted at a path in the image that already contains files, Docker copies the image's files into the volume on first use.
# The node:20 image has /app pre-populated with... well, nothing typically
# But the mariadb image has /var/lib/mysql with initialization files
docker run -d --name db -v dbdata:/var/lib/mysql mariadb:10.11
# Docker copies mariadb's initial /var/lib/mysql into the 'dbdata' volume
docker run --rm -v dbdata:/data alpine ls /data
# ibdata1 ib_logfile0 ib_logfile1 mysql performance_schema ...
This is different from bind mounts, which always show only the host path's contents (the image's files at that path are hidden).
Use named volumes for anything the image initializes. Pointing a bind mount at /var/lib/mysql starts the database with an empty directory (because the host path overrides the image's version), which breaks the initialization scripts. A named volume is empty the first time, gets populated from the image, and persists across restarts. Tutorials that use bind mounts for databases tend to include hand-crafted init steps to paper over this.
tmpfs: RAM-Backed Scratch
# The short --tmpfs flag
docker run --rm --tmpfs /tmp:rw,size=100m alpine sh
# With --mount
docker run --rm --mount type=tmpfs,target=/tmp,tmpfs-size=100m alpine sh
Inside the container, /tmp is a tmpfs capped at 100 MB. Writes are in RAM. The tmpfs evaporates when the container stops.
When to use tmpfs
- High-churn caches that the container generates and consumes (build caches, temp files).
- Secrets at runtime — mount a tmpfs, write secrets into it, use them, have them disappear on restart.
- Pair with `--read-only` to give the container a writable `/tmp` while the rest of the root is immutable:
docker run --rm --read-only --tmpfs /tmp --tmpfs /run alpine sh -c 'touch /app/x'
# touch: /app/x: Read-only file system ← expected
# Inside /tmp and /run, writes work because of tmpfs
Trade-off: tmpfs counts against the container's memory limit. A 1 GB tmpfs with a 512 MB memory limit will OOM-kill the container once the tmpfs fills.
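tmpfs is a standard kernel filesystem, not a Docker invention; most Linux hosts already mount several (`/dev/shm`, `/run`). Listing them shows the same size-capped, RAM-backed behavior a container gets. Output varies by host, and on non-Linux systems there may be nothing to show:

```shell
# List tmpfs mounts visible on this host, if any.
grep -w tmpfs /proc/mounts 2>/dev/null || echo "no /proc/mounts here (not Linux?)"
```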
The Four Places Compared
| Feature | Writable layer | Bind mount | Named volume | tmpfs |
|---|---|---|---|---|
| Persistence | Until docker rm | Host's lifecycle | Until docker volume rm | Container stop |
| Storage location | /var/lib/docker/overlay2/<id>/diff | Any host path | /var/lib/docker/volumes/<name> | RAM |
| Portability across hosts | No | Yes (same path) | Via backup/restore | No |
| Host ownership | Docker's | Host user | Docker's | Docker's |
| Best for | Ephemeral writes | Dev loops, configs | Databases, stateful apps | Scratch, secrets |
| Performance | Fast | Slow on macOS/Win | Native speed | RAM speed |
| Can be mounted read-only | N/A | Yes (ro) | Yes (ro) | Yes |
| Shows image's files when empty | N/A | No (hidden) | Yes (populated) | No |
Backing Up and Restoring Volumes
Since volumes are just directories, a portable backup is a tarball:
# Back up 'mydata' to /tmp/mydata.tar.gz on the host
docker run --rm \
-v mydata:/data \
-v $(pwd):/backup \
alpine \
tar czf /backup/mydata.tar.gz -C /data .
# Restore from tarball into a (possibly new) volume 'newdata'
docker run --rm \
-v newdata:/data \
-v $(pwd):/backup \
alpine \
tar xzf /backup/mydata.tar.gz -C /data
For databases specifically, use the database's native backup tool (pg_dump, mysqldump) — it gives consistent snapshots while the DB is running, and the backup is portable to different DB versions. Tarring /var/lib/postgresql/data from a running Postgres risks a torn, unrecoverable backup.
A team "backed up" their production Postgres volume by tarring its data directory every night. When they tried to restore after a hardware failure, the restored database refused to start — "invalid checkpoint record." Turns out tarring live Postgres data produces a snapshot Postgres cannot recognize as valid WAL state. The actual fix: switch to pg_dump/pg_basebackup + WAL archiving. Lesson: a volume is just a directory; "backing up the directory" is not the same as "backing up the database." Always use the app's native backup tooling for stateful services.
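A backup you have never restored is not really a backup. The tar round trip itself can be rehearsed without Docker; this sketch uses throwaway `/tmp` paths in place of the volume mounts:

```shell
# Rehearse the backup/restore round trip: archive, list, restore, compare.
rm -rf /tmp/vol-src /tmp/vol-restore
mkdir -p /tmp/vol-src /tmp/vol-restore
echo "hello" > /tmp/vol-src/a.txt
tar czf /tmp/vol.tar.gz -C /tmp/vol-src .        # same shape as the docker backup above
tar tzf /tmp/vol.tar.gz                          # list contents without extracting
tar xzf /tmp/vol.tar.gz -C /tmp/vol-restore      # restore into a fresh directory
diff -r /tmp/vol-src /tmp/vol-restore && echo "backup verified"
```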
Volume Drivers: Beyond Local Storage
The default volume driver is local (stores in /var/lib/docker/volumes). Other drivers exist for NFS, cloud storage, and distributed systems:
# NFS volume
docker volume create --driver local \
--opt type=nfs \
--opt o=addr=10.0.0.5,rw \
--opt device=:/exports/data \
nfs-data
# Then mount normally
docker run -v nfs-data:/app/data myapp
For production multi-host setups, use an orchestrator's storage abstraction (Kubernetes PersistentVolume + CSI drivers). Docker's local driver is fine for single-host; beyond that, you want a proper storage layer.
Cleaning Up
# Remove stopped containers (their anonymous volumes are left behind as dangling)
docker container prune -f
# Remove unused volumes (recent Docker prunes only anonymous ones by default; add --all for named)
docker volume prune -f
# Remove everything: stopped containers, unused images, unused volumes, build cache
# DANGEROUS on shared machines; read the prompt
docker system prune -a --volumes
# See what docker is using disk-wise
docker system df
# TYPE TOTAL ACTIVE SIZE RECLAIMABLE
# Images 35 8 12.3GB 8.1GB (65%)
# Containers 8 3 234MB 100MB (42%)
# Local Volumes 12 4 890MB 600MB (67%)
# Build Cache 0 0 0B 0B
Volumes are NOT removed by docker rm <container> unless you pass -v:
docker rm -v <container> # removes anonymous volumes it created
docker rm <container> # leaves anonymous volumes behind
Named volumes are never removed by docker rm — they belong to the user, not the container. This is by design (safer defaults), but also why "dangling" volumes accumulate.
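Before pruning, it is worth seeing what would be deleted. `docker volume ls` takes a `dangling=true` filter that shows volumes no container currently references. A guarded sketch:

```shell
# List dangling (unreferenced) volumes before deciding what to prune.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker volume ls --filter dangling=true --format '{{.Name}}'
else
  echo "docker unavailable; skipping"
fi
STATUS=ok
```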
Inspecting a Container's Storage
# See every mount on a container
docker inspect demo --format='{{json .Mounts}}' | jq
# [
# {
# "Type": "volume",
# "Name": "dbdata",
# "Source": "/var/lib/docker/volumes/dbdata/_data",
# "Destination": "/var/lib/postgresql/data",
# "RW": true,
# ...
# },
# {
# "Type": "bind",
# "Source": "/home/user/src",
# "Destination": "/app",
# "RW": true,
# ...
# }
# ]
# See the OverlayFS layout
docker inspect demo --format='{{json .GraphDriver.Data}}' | jq
# {
# "LowerDir": "/var/lib/docker/overlay2/<hash>/l/XXX:...",
# "UpperDir": "/var/lib/docker/overlay2/<hash>/diff",
# "WorkDir": "/var/lib/docker/overlay2/<hash>/work",
# "MergedDir":"/var/lib/docker/overlay2/<hash>/merged"
# }
Key Concepts Summary
- Four storage destinations. Writable layer (ephemeral), bind mount (host path), named volume (Docker-managed), tmpfs (RAM).
- Only bind mounts and named volumes persist across container removal. Everything else is wiped.
- Bind mounts expose host paths. Great for dev and config; watch out for permissions and Docker Desktop slowness.
- Named volumes are Docker-managed. Great for databases and any state the image initializes.
- First-time population. Named volumes are populated from the image on first use; bind mounts hide the image's content at the target path.
- tmpfs for scratch and read-only patterns. RAM-backed, counts against memory limits.
- Always use the database's native backup tool for stateful services — tarring the data directory is not a backup.
- Volumes are not auto-deleted with containers. Use `docker rm -v` for anonymous volumes; explicit `docker volume rm` for named ones.
- `--mount` is the explicit form of `-v`. Use it in production for clarity.
- `docker system df` shows disk usage; `docker system prune --volumes` reclaims unused space (carefully).
Common Mistakes
- Running a database with no volume, losing all data on `docker rm`. `docker run -d postgres` with no `-v` is a test-only pattern.
- Using a bind mount on a database's data dir, breaking the image's initialization. Use a named volume.
- Typo'ing a bind-mount host path. Docker silently creates the directory as root; your app sees empty data.
- Running the container as a user that doesn't have permission on the host-side bind-mount path. Use `--user $(id -u)` or `chown`.
- Using bind mounts for `node_modules`/`.venv` on Docker Desktop macOS/Windows. A named volume is 10-50× faster.
- Calling `docker-compose down -v` without realizing `-v` deletes the named volumes. Back up first!
- Assuming `--rm` removes volumes. It only removes anonymous volumes; named volumes stay.
- Forgetting that tmpfs counts against the container's memory limit. A 2 GB tmpfs with a 1 GB limit crashes quickly.
- Not backing up volumes. Docker doesn't do it for you; tarballs or native DB tools are yours to manage.
- Trusting bind-mount config injection with a missing host-side source file. `-v /host/config.yaml:/etc/app/config.yaml` requires `/host/config.yaml` to exist as a file — otherwise Docker creates a directory there and the mount misbehaves.
Your teammate runs `docker run -d --name pg -p 5432:5432 postgres:16` and loads production-seeding data into it. Two days later, after a `docker rm -f pg` and a `docker run -d --name pg postgres:16`, the data is gone. Where did it go, and what was the correct setup?