Writing a Dockerfile That Doesn't Suck
A developer opens a ticket: "Our CI takes 8 minutes per build because Docker reinstalls all Python dependencies every time." The Dockerfile starts with
FROM python:3.11, then COPY . /app, then RUN pip install -r /app/requirements.txt. The order looks natural — you copy your code, then install its deps. But it is destroying the layer cache. Every code change invalidates the COPY . /app layer, which invalidates the RUN pip install layer, which reinstalls every package from scratch. A two-line reordering drops build time from 8 minutes to 15 seconds.
Writing a good Dockerfile is not mysterious. It is about understanding what each instruction does to the layer cache, and ordering things so that slow, stable layers come first and fast, changing layers come last. This lesson covers the patterns that actually matter: cache-friendly ordering, COPY vs ADD, multi-stage builds, and the handful of flags that collapse image size 10×.
The Rule That Explains Everything
Put instructions that change least often at the top. Put instructions that change often at the bottom.
The cache invalidates at the first changed instruction and cascades down. If your RUN pip install runs after COPY . /app, then any code change re-triggers the install. If your RUN pip install runs before the full copy, then code changes do not touch the install.
Bad: invalidates pip install on every code change
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
Good: install runs only when requirements.txt changes
FROM python:3.11-slim
WORKDIR /app
# 1. Install deps (changes rarely)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 2. Copy code (changes every commit)
COPY . .
CMD ["python", "main.py"]
The only difference: one line moved. The pip install layer now only rebuilds when requirements.txt changes. Code changes only re-run the fast COPY . . at the end.
The universal Dockerfile recipe for interpreted languages: (1) set WORKDIR, (2) copy just the dependency manifest (package.json, requirements.txt, go.mod, Gemfile, pom.xml), (3) install dependencies, (4) then copy the rest of the code. This four-step pattern gives you near-instant rebuilds on code changes across every language.
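The same four-step pattern sketched for a hypothetical Ruby app using Bundler (file names like app.rb are illustrative):

```dockerfile
# 1. Set the working directory
FROM ruby:3.2-slim
WORKDIR /app
# 2. Copy only the dependency manifest
COPY Gemfile Gemfile.lock ./
# 3. Install dependencies — this layer is reused until the Gemfile changes
RUN bundle install
# 4. Copy the rest of the code — only this layer rebuilds on a normal commit
COPY . .
CMD ["ruby", "app.rb"]
```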
COPY vs ADD
Two instructions that look almost identical, with one key difference.
COPY src dst — copies files from the build context to the image. Plain and predictable. ADD src dst — same as COPY, plus two magic behaviors: (1) if src is a URL, it downloads it; (2) if src is a local tarball (.tar, .tar.gz, .tgz, .tar.bz2, .tar.xz), it extracts it.
# These are identical:
COPY app.py /app/
ADD app.py /app/
# This is different — ADD extracts:
ADD something.tar.gz /opt/ # extracts the tarball into /opt/
COPY something.tar.gz /opt/ # copies the tarball file verbatim
# This is different — ADD downloads:
ADD https://example.com/file.zip /tmp/ # downloads, does NOT extract (remote files are never auto-extracted)
# Use RUN curl instead of ADD for URLs — better for caching and error handling
The rule: use COPY unless you specifically need ADD's extraction behavior. URL fetching via ADD is almost always a mistake — it does not cache well, does not verify signatures, and does not give you clean error messages if the download fails.
# GOOD: explicit curl with checksum
RUN curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz && \
echo "abc123... /tmp/tool.tar.gz" | sha256sum -c && \
tar -xzf /tmp/tool.tar.gz -C /usr/local && \
rm /tmp/tool.tar.gz
Multi-Stage Builds: Cut Image Size 10×
The problem: compiling code requires big toolchains (gcc, make, JDK, Go toolchain, webpack, Python build headers). Running the code does not. Multi-stage builds let you use a heavy image to compile and a minimal image to run.
# Stage 1: build (heavy, throwaway)
FROM golang:1.21 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/server
# Stage 2: runtime (minimal)
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
USER nonroot
ENTRYPOINT ["/app"]
The first stage compiles the Go binary with a full toolchain. The second stage starts from distroless/static (~2 MB — just CA certs, tzdata, and a minimal filesystem; no shell, no libc). COPY --from=build copies only the compiled binary.
Final image: ~20 MB. Without multi-stage, the same image would be ~900 MB.
Multi-stage works for every language
# Node.js
FROM node:20 AS build
WORKDIR /src
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /src/dist ./dist
CMD ["node", "dist/index.js"]
# Python with compiled deps (e.g., numpy, cryptography)
FROM python:3.11 AS build
WORKDIR /src
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt
FROM python:3.11-slim
COPY --from=build /install /usr/local
COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
# Java
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /src
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ ./src/
RUN mvn -q package -DskipTests
FROM eclipse-temurin:17-jre-alpine
COPY --from=build /src/target/*.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
Multi-stage builds are free. There is no runtime cost — each stage is its own image, and only the final stage ships. They are the single biggest image-size win available, typically cutting production images by 5-10×. Always use them for compiled-language workloads, and for interpreted-language workloads that need build tools.
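The same two-stage shape works for Rust. A hedged sketch (the crate name myapp and the paths are illustrative):

```dockerfile
# Stage 1: build with the full Rust toolchain
FROM rust:1.75 AS build
WORKDIR /src
COPY . .
# Statically link against musl so the binary runs on a bare base image
RUN rustup target add x86_64-unknown-linux-musl && \
    cargo build --release --target x86_64-unknown-linux-musl

# Stage 2: ship only the binary
FROM scratch
COPY --from=build /src/target/x86_64-unknown-linux-musl/release/myapp /myapp
ENTRYPOINT ["/myapp"]
```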
Choosing the Right Base Image
| Base | Typical size | When to use |
|---|---|---|
| scratch | 0 MB | Pure static binaries (Go, Rust) with no OS deps |
| distroless/static | ~2 MB | Static binaries that need CA certs |
| distroless/base | ~20 MB | Binaries that need glibc + CA certs |
| alpine:3.19 | ~5 MB | Small, musl-based, package manager available |
| python:3.11-alpine | ~50 MB | Python on Alpine (watch out for musl-vs-glibc issues) |
| python:3.11-slim | ~130 MB | Python on Debian slim — fewer gotchas than alpine |
| python:3.11 | ~1 GB | Full Python — only for builders, not final images |
| ubuntu:22.04 | ~80 MB | Full Ubuntu userland — when you need apt's big archive |
| gcr.io/distroless/nodejs20-debian12 | ~110 MB | Node.js with minimal OS |
Alpine tradeoffs
Alpine uses musl libc, not glibc. Most software works fine, but a few places to watch:
- DNS: musl's resolver does not support search + ndots the same way glibc does — you can see subtle DNS issues in Kubernetes.
- Python wheels: many precompiled wheels are built for glibc. On Alpine, pip may compile from source (slow builds, sometimes failures). Newer musllinux wheels are closing this gap.
- Locale: Alpine lacks a default locale; some software (JVM, PostgreSQL client) expects en_US.UTF-8 and complains.
For most services, alpine or python:3.11-alpine works well and the size savings are worth it. For software that struggles with musl, slim or debian:bookworm-slim is the safe default.
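If you do stay on Alpine and pip needs to compile, the usual workaround is a temporary "virtual" package group removed in the same layer, so the compilers never persist in the image. A sketch (which -dev packages you need depends on your wheels):

```dockerfile
FROM python:3.11-alpine
WORKDIR /app
COPY requirements.txt .
# Install build tools, compile the wheels, then delete the tools —
# all in one RUN so no layer ever contains gcc.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev libffi-dev && \
    pip install --no-cache-dir -r requirements.txt && \
    apk del .build-deps
COPY . .
CMD ["python", "main.py"]
```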
Distroless (from Google)
gcr.io/distroless/... images contain only your app and its runtime dependencies — no shell, no package manager, no debugging tools. This is the tightest you can go without scratch.
FROM gcr.io/distroless/java17-debian12
COPY target/app.jar /app.jar
CMD ["/app.jar"]
- Pros: smallest attack surface, smallest image, no CVEs in a shell that your app does not use.
- Cons: no docker exec -it container bash — there is no shell. You debug through logs and docker cp.
Use distroless in production when you value security and size over interactive debuggability. Use slim or alpine when you want to be able to exec in and poke around.
The ENV / ARG / LABEL / EXPOSE Quartet
ARG — build-time variables
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine
ARG BUILD_NUMBER=dev
LABEL org.opencontainers.image.revision=${BUILD_NUMBER}
Set with --build-arg:
docker build --build-arg NODE_VERSION=20 --build-arg BUILD_NUMBER=$(git rev-parse HEAD) -t myapp .
ENV — runtime environment
ENV NODE_ENV=production \
PORT=8080 \
LOG_LEVEL=info
ENV is available at build time to subsequent instructions and to the running container. ARG is build-time only.
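A small sketch that makes the difference concrete, including the common pass-through pattern for when a build arg must survive into the running container (APP_VERSION is an illustrative name):

```dockerfile
FROM python:3.11-slim
ARG APP_VERSION=dev             # build-time only: gone once the image is built
ENV APP_VERSION=${APP_VERSION}  # pass-through: baked in as a runtime env var
RUN echo "building version ${APP_VERSION}"   # both ARG and ENV are visible here
# Without the ENV line above, this would print an empty string at runtime:
CMD ["python", "-c", "import os; print(os.environ.get('APP_VERSION'))"]
```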
LABEL — metadata
LABEL org.opencontainers.image.source="https://github.com/you/myapp"
LABEL org.opencontainers.image.description="My web app"
LABEL org.opencontainers.image.licenses="MIT"
LABEL org.opencontainers.image.revision=${BUILD_NUMBER}
Labels cost nothing. Use the OCI image spec labels so tooling (registries, SBOM scanners, deploy tools) can find your metadata.
EXPOSE — documentation only
EXPOSE 8080
EXPOSE documents which port your container listens on. It does not publish the port — that is what -p 8080:8080 does at runtime. The only effect of EXPOSE is: (1) display in docker ps, (2) hint to docker run -P (uppercase P) which ports to auto-publish.
CMD vs ENTRYPOINT — The Two Forms of Every Instruction
Both control what the container runs. Where they differ: how arguments work.
# Exec form (recommended): no shell, args passed directly
CMD ["python", "main.py"]
ENTRYPOINT ["python", "main.py"]
# Shell form: runs via /bin/sh -c; shell signals not forwarded
CMD python main.py # BAD: signals don't reach python
ENTRYPOINT python main.py # BAD: same problem
Use exec form always. Shell form wraps your command in /bin/sh -c, which means your main process is sh, not your app. Signals sent to the container go to sh, which does not forward them. Your graceful-shutdown handler never runs. (Recap from the Linux course, Module 2 Lesson 2.)
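To see why this matters, here is a minimal Python sketch of a graceful-shutdown handler. It fires only if SIGTERM is actually delivered to the Python process — which is exactly what exec form guarantees, and what shell form breaks (the signal goes to sh instead):

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Runs only if SIGTERM reaches *this* process (i.e. exec-form CMD)."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate what `docker stop` does: deliver SIGTERM to the main process.
signal.raise_signal(signal.SIGTERM)

print("graceful shutdown ran:", shutting_down)  # → graceful shutdown ran: True
```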
When to use ENTRYPOINT vs CMD
- ENTRYPOINT sets the binary; users cannot override without --entrypoint.
- CMD sets the default arguments; users can override by appending args to docker run.
ENTRYPOINT ["python", "main.py"]
CMD ["--config", "/etc/app/default.yaml"]
# docker run myapp → python main.py --config /etc/app/default.yaml
# docker run myapp --config /etc/b.yaml → python main.py --config /etc/b.yaml
# docker run --entrypoint bash myapp → bash (for debugging)
This is the pattern most production Dockerfiles use: ENTRYPOINT for "this container IS my app," CMD for the default flags.
A Complete Real-World Example
Here is a production-quality Dockerfile combining everything:
# syntax=docker/dockerfile:1.7
# (the directive above enables BuildKit features: RUN --mount, secrets, etc.)
ARG PYTHON_VERSION=3.11
ARG BASE=slim-bookworm
# --- build stage ---
FROM python:${PYTHON_VERSION}-${BASE} AS build
WORKDIR /src
# System build deps (won't ship in final image)
RUN apt-get update && \
apt-get install -y --no-install-recommends build-essential && \
rm -rf /var/lib/apt/lists/*
# Install Python deps into an isolated prefix so we can copy them cleanly
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# --- runtime stage ---
FROM python:${PYTHON_VERSION}-${BASE}
# Metadata
LABEL org.opencontainers.image.title="myapp"
LABEL org.opencontainers.image.source="https://github.com/example/myapp"
# Runtime env
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PATH=/usr/local/bin:/usr/local/sbin:$PATH
# Non-root user
RUN groupadd --system --gid 10001 app && \
useradd --system --uid 10001 --gid app --no-create-home --shell /usr/sbin/nologin app
WORKDIR /app
# Bring over only the installed Python packages (no build tools)
COPY --from=build /install /usr/local
# App code — last, so code changes don't invalidate earlier layers
COPY --chown=app:app . .
USER app
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"
ENTRYPOINT ["python", "main.py"]
What each piece buys you:
- # syntax=... — enables the BuildKit frontend, which supports RUN --mount=type=cache, secrets, and SSH forwarding.
- ARGs at the top — easy to parameterize in CI (--build-arg PYTHON_VERSION=3.12).
- Multi-stage — build tools stay in the first stage; only the installed packages ship.
- --no-install-recommends — skip "recommended" Debian packages you don't need.
- rm -rf /var/lib/apt/lists/* in the same RUN — clears the apt cache; saves ~50 MB.
- Non-root user (UID > 10000) — security baseline (full story in Module 5 Lesson 1).
- --chown=app:app on COPY — avoids running chown later, which would create a separate layer with duplicate files.
- ENV PYTHON* — idiomatic Python-in-Docker settings.
- HEALTHCHECK — orchestrators use this to tell if the container is ready.
- Exec-form ENTRYPOINT — signals reach the app.
BuildKit Features Worth Knowing
Modern Docker uses BuildKit as the default builder. It adds a few things the classic builder did not have.
RUN --mount=type=cache — persistent caches across builds
# syntax=docker/dockerfile:1.7
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
The pip cache persists across builds (on the BuildKit builder, not in the image). Subsequent builds with the same requirements hit the cache and skip downloads.
Same pattern for apt:
RUN --mount=type=cache,target=/var/cache/apt \
--mount=type=cache,target=/var/lib/apt \
apt-get update && apt-get install -y build-essential
RUN --mount=type=secret — inject secrets without baking them in
RUN --mount=type=secret,id=github_token,target=/run/secrets/github_token \
curl -H "Authorization: token $(cat /run/secrets/github_token)" https://api.github.com/user
docker build --secret id=github_token,src=$HOME/.github-token -t myapp .
The secret is mounted for that one RUN and is never in any layer. No more "oops, token in image history."
Inspecting Your Build
# Size breakdown
docker image ls myapp
# REPOSITORY TAG IMAGE ID CREATED SIZE
# myapp latest abc123def456 2 minutes ago 145MB
# Layer-by-layer
docker history myapp
# Size per layer using dive
dive myapp
# See what the merged image contains
docker run --rm myapp sh -c 'du -sh /app /usr/local 2>/dev/null'
An engineering team's service images averaged 1.4 GB. A 2-hour audit with dive and a rewritten Dockerfile brought the average to 180 MB. Specific wins: (1) multi-stage to drop build tools (-400 MB); (2) .dockerignore to exclude .git (-900 MB); (3) moving pip install before COPY . . (build time from 6 min → 30 s on code-only changes); (4) distroless base for compiled services (-200 MB each). None of this was novel — it was the standard patterns applied consistently. Most teams underinvest in Dockerfile quality because it does not break anything visible; the cost shows up as slow CI and bloated registry bills.
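The .dockerignore from win (2) never appears in the Dockerfile itself — it is a separate file at the root of the build context, same syntax as .gitignore. A typical one for a Python service might look like this (entries are illustrative; tailor to your repo):

```
.git
.dockerignore
Dockerfile
__pycache__/
*.pyc
.venv/
node_modules/
dist/
*.log
.env
```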
Key Concepts Summary
- Order matters. Put rarely-changing instructions first, frequently-changing ones last, so the cache stays warm.
- Install deps before copying code. COPY <manifest> → RUN <install> → COPY . .
- Use COPY, not ADD. Reserve ADD for its magic behavior (tarball extraction); use explicit RUN curl for URLs.
- Multi-stage builds cut size 5-10×. Build in a heavy stage, copy the artifact into a minimal stage.
- Pick the right base. scratch / distroless/static for static binaries, slim or alpine for services, full distro bases only in build stages.
- Exec form always. CMD ["binary","arg"] — shell form breaks signal handling.
- Run as non-root. RUN useradd + USER app. Covered fully in Module 5.
- Combine RUN + clean up in one layer. apt install ... && rm -rf /var/lib/apt/lists/*.
- HEALTHCHECK tells orchestrators when the container is ready.
- BuildKit caches (RUN --mount=type=cache) speed up CI dramatically for package managers.
- .dockerignore keeps .git, node_modules, and caches out of the build context.
Common Mistakes
- Copying the entire project before running package install. Any code edit re-triggers the install. Copy the manifest, install, then copy everything else.
- Using shell-form CMD/ENTRYPOINT and then wondering why SIGTERM does not reach the app.
- Using ADD to download URLs. It caches poorly and does not verify checksums. Use RUN curl -fsSL ... && sha256sum -c instead.
- Picking alpine without understanding musl vs glibc. When precompiled wheels fail to install or DNS behaves oddly, slim or debian:bookworm-slim is often the less painful default.
- Skipping multi-stage for compiled languages. Go, Rust, Java, C++ images without multi-stage are 5-10× bigger than they should be.
- Using :latest in FROM. Every build might get a different base — reproducibility gone. Pin a version, ideally a digest.
- Running the app as root. "But containers are isolated" is the worst reason to keep root. Always drop privileges; see Module 5.
- Forgetting --no-install-recommends on apt-get install. Debian's "recommends" chain can drag in 200 MB of packages you do not need.
- Creating a layer just to fix file ownership. COPY --chown=app:app does it without an extra layer.
- Using docker commit to "save" a running container as an image. The result is opaque, un-rebuildable, and cached poorly. Always rebuild from a Dockerfile.
A Python service's Dockerfile is FROM python:3.11, COPY . /app, WORKDIR /app, RUN pip install -r requirements.txt, CMD python main.py. Every code commit triggers a full pip install on CI, even though requirements.txt has not changed in weeks. What is the fix and why does it work?