Docker & Container Fundamentals

Image Scanning and Vulnerabilities

A platform team ran Trivy on their production images for the first time. The output was 4,000 CVEs across 30 images. Panic. Three hours of triage later, the picture was more manageable: 12 CVEs were Critical or High with known fixes, 200 were relevant but lower severity, and the remaining 3,800 were duplicates across identical base images, false positives from the scanner misidentifying package versions, or CVEs in dev-only dependencies that never ship. The team learned two things that day: (1) scanning matters, because those 12 fixes genuinely needed attention; (2) the output of a scanner is not a to-do list. It is a raw signal you have to triage.

Image scanning is a cornerstone of supply chain security and also a source of alarm fatigue. This lesson covers what scanners actually do, how to read their output, the three major base-image families (Debian-slim, Alpine, distroless) and their security profiles, and the operational patterns that keep scanning useful rather than a weekly noise generator.

What Scanners Actually Do

An image scanner:

Inventories the image: lists every OS package and every language-level dependency.
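Matches against vulnerability databases: NVD (NIST), OSV, distro security feeds (Alpine, Red Hat, Debian), and language advisory sources such as the Python Packaging Advisory Database and GitHub Security Advisories (the data behind npm audit).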
Reports matches: a CVE, its severity, the affected package, and usually a fixed version if one exists.

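The inventory is straightforward for OS packages: read /var/lib/dpkg/status for Debian, /lib/apk/db/installed for Alpine, or the RPM database for RHEL. It is subtler for language-level deps. Scanners look at several sources: lockfiles and manifests (package-lock.json, requirements.txt, go.mod/go.sum), installed-package metadata (Python site-packages, node_modules), vendored directories, and the build information embedded in compiled Go binaries.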
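To make the inventory step concrete, here is a minimal Python sketch that reads the dpkg status database scanners use on Debian-based images. It assumes the standard stanza format (blank-line-separated `Field: value` records) and keeps only packages whose Status ends in `installed`. Real scanners also track source packages, epochs, and language-level metadata, so treat this as an illustration, not as how Trivy or Syft are implemented.

```python
# Sketch: inventory OS packages from a dpkg status file (Debian/Ubuntu images).
# Assumption: standard /var/lib/dpkg/status stanza format; not a real scanner.
import sys


def parse_dpkg_status(text: str) -> dict[str, str]:
    """Return {package_name: version} for fully installed packages."""
    packages: dict[str, str] = {}
    # Stanzas (one per package) are separated by blank lines.
    for stanza in text.strip().split("\n\n"):
        fields: dict[str, str] = {}
        for line in stanza.splitlines():
            # Skip continuation lines (e.g. multi-line Description) and junk.
            if line.startswith((" ", "\t")) or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        # "install ok installed" means present; "deinstall ok config-files" does not.
        if "Package" in fields and fields.get("Status", "").endswith(" installed"):
            packages[fields["Package"]] = fields.get("Version", "")
    return packages


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dpkg_inventory.py /path/to/extracted/var/lib/dpkg/status
    with open(sys.argv[1], encoding="utf-8") as f:
        for name, version in sorted(parse_dpkg_status(f.read()).items()):
            print(f"{name} {version}")
```

Matching each `(name, version)` pair against an advisory feed is the second step the list above describes.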

Two popular, free, widely used scanners:

Trivy (Aqua): comprehensive, single binary, fast.
Grype (Anchore): pairs with Syft (an SBOM generator); clean output.

Run Trivy on an image

trivy image nginx:1.25-alpine

# Output (abbreviated)
# nginx:1.25-alpine (alpine 3.18)
# ===============================
# Total: 5 (UNKNOWN: 0, LOW: 0, MEDIUM: 2, HIGH: 2, CRITICAL: 1)
#
# ┌─────────────────────┬───────────────┬──────────┬───────────────────┬───────────────┬──────────┐
# │      Library        │ Vulnerability │ Severity │ Installed Version │ Fixed Version │  Title   │
# ├─────────────────────┼───────────────┼──────────┼───────────────────┼───────────────┼──────────┤
# │ libssl3             │ CVE-2024-0727 │ CRITICAL │ 3.1.4-r1          │ 3.1.4-r2      │ openssl: │
# │ libcrypto3          │ CVE-2024-0727 │ CRITICAL │ 3.1.4-r1          │ 3.1.4-r2      │ openssl: │
# │ ...                                                                                            │
# └─────────────────────┴───────────────┴──────────┴───────────────────┴───────────────┴──────────┘

# Filter to severity that matters
trivy image --severity HIGH,CRITICAL nginx:1.25-alpine

# JSON for CI
trivy image --format json --severity HIGH,CRITICAL --output report.json nginx:1.25-alpine

# Vulnerabilities only (skips the secret scanner, which Trivy also runs by default)
trivy image --scanners vuln nginx:1.25-alpine

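# The vuln DB is cached in ~/.cache/trivy by default; point --cache-dir at a
# persistent location (e.g. a CI cache) to avoid re-downloading it on every run
trivy image --cache-dir ~/.cache/trivy ...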
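The JSON report (the `--format json` command above) is where CI tooling usually starts. As a sketch, the script below counts findings per severity and how many of them have a fixed version, which is often the first cut in triage. It assumes Trivy's JSON layout: a top-level `Results` list whose entries may carry a `Vulnerabilities` list with `Severity` and `FixedVersion` fields. Check the schema of the Trivy version you run.

```python
# Sketch: summarize a Trivy JSON report (trivy image --format json ...).
# Assumes Results[].Vulnerabilities[] entries with Severity/FixedVersion keys.
import json
import sys
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN")


def summarize(report: dict) -> tuple[Counter, Counter]:
    """Return (all findings per severity, fixable findings per severity)."""
    total: Counter = Counter()
    fixable: Counter = Counter()
    for result in report.get("Results") or []:
        # Targets with no findings omit "Vulnerabilities" (or set it to null).
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            total[severity] += 1
            if vuln.get("FixedVersion"):  # empty or missing means no fix yet
                fixable[severity] += 1
    return total, fixable


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_trivy.py report.json
    with open(sys.argv[1], encoding="utf-8") as f:
        total, fixable = summarize(json.load(f))
    for sev in SEVERITIES:
        print(f"{sev:<8} total={total[sev]:>4} fixable={fixable[sev]:>4}")
```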

Run Grype

grype nginx:1.25-alpine

# NAME         INSTALLED      FIXED-IN   VULNERABILITY  SEVERITY
# libssl3      3.1.4-r1       3.1.4-r2   CVE-2024-0727  Critical
# libcrypto3   3.1.4-r1       3.1.4-r2   CVE-2024-0727  Critical
# curl         8.5.0-r0       8.5.0-r1   CVE-2024-0852  Medium

Grype pairs well with Syft for SBOMs (Software Bills of Materials):

syft nginx:1.25-alpine -o cyclonedx-json > sbom.json
grype sbom:./sbom.json

Generating and storing SBOMs gives you the ability to ask "which of our images are affected by this new CVE?" without rescanning the whole fleet.
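Here is a sketch of that fleet-wide question. It assumes you store one CycloneDX JSON SBOM per image (for example, generated by the syft command above), named after the image (a hypothetical layout). For simplicity it matches an exact set of affected versions. Real advisories use version ranges with ecosystem-specific comparison rules, which is why in practice you would run grype against each stored SBOM instead.

```python
# Sketch: find images whose stored CycloneDX SBOM contains an affected package.
# Assumptions: one SBOM per image, top-level "components" only (nested
# components are ignored), and exact-version matching instead of ranges.
import json
import sys
from pathlib import Path


def images_with_package(sboms: dict[str, dict], name: str, affected: set[str]) -> list[str]:
    """Return sorted image names whose SBOM lists `name` at an affected version."""
    hits = []
    for image, bom in sorted(sboms.items()):
        for component in bom.get("components") or []:
            if component.get("name") == name and component.get("version") in affected:
                hits.append(image)
                break  # one match is enough for this image
    return hits


def load_sboms(directory: str) -> dict[str, dict]:
    """Load every *.json file in `directory`, keyed by file stem (image name)."""
    return {p.stem: json.loads(p.read_text(encoding="utf-8")) for p in Path(directory).glob("*.json")}


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage (hypothetical layout): python sbom_query.py sboms/ libssl3 3.1.4-r1 [more versions...]
    print(images_with_package(load_sboms(sys.argv[1]), sys.argv[2], set(sys.argv[3:])))
```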

KEY CONCEPT

Scanning catches known, published vulnerabilities in the package versions the scanner can identify. It does not catch unknown vulnerabilities, logic bugs, or misconfigurations. It is one signal among several. Also scan for secrets (gitleaks, trufflehog) and for misconfigurations (hadolint for Dockerfiles, Checkov for IaC), and watch runtime behavior (Falco, Tetragon). No single tool is complete.

What CVE Severity Actually Means

CVSS (Common Vulnerability Scoring System) produces a 0.0-10.0 score based on exploitability and impact. Mapped to text:

CVSS range	Severity
9.0-10.0	Critical
7.0-8.9	High
4.0-6.9	Medium
0.1-3.9	Low
0.0	None
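The mapping is simple enough to express directly. Here is a minimal Python sketch, assuming CVSS v3.x base scores reported with one decimal place:

```python
# Sketch: CVSS v3.x base score -> qualitative severity rating (table above).
def cvss_to_severity(score: float) -> str:
    """Map a 0.0-10.0 CVSS base score to None/Low/Medium/High/Critical."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:   # 0.1-3.9
        return "Low"
    if score < 7.0:   # 4.0-6.9
        return "Medium"
    if score < 9.0:   # 7.0-8.9
        return "High"
    return "Critical"  # 9.0-10.0


if __name__ == "__main__":
    for s in (0.0, 3.9, 5.5, 7.5, 9.8):
        print(s, cvss_to_severity(s))
```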

But severity is context-free. A "Critical" SQL injection in a library your app never imports is zero risk to you. A "Medium" path-traversal in a library you use for user uploads is potentially huge. Pure severity ranking is noise if you take it literally.

Real triage looks at:

Is the vulnerable code actually on the execution path? Many CVEs are in code your app never calls.
Is there a fix? If the fix is "upgrade to X" and you can, do it. If no fix exists, you evaluate workarounds.
What is your exposure model? Public-facing vs internal vs air-gapped changes the calculus.
Is the CVE already mitigated by your other layers? Running as non-root, dropping capabilities, seccomp, and a read-only filesystem all narrow what an attacker can do. Many exploits fail against a hardened setup even if the underlying CVE is "Critical."

EPSS: the 'is this actually being exploited' score

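Some scanners and vulnerability-management tools now report EPSS (Exploit Prediction Scoring System) alongside CVSS. Before ranking by exploit likelihood, apply Trivy's standard noise filters (high severities only, fixable only, no dev dependencies):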

trivy image --severity HIGH,CRITICAL --ignore-unfixed --include-dev-deps=false nginx:1.25-alpine

EPSS estimates the probability a CVE will be exploited in the wild in the next 30 days. A "Critical" CVE with EPSS 0.001 is much less urgent than a "High" with EPSS 0.85. This is one of the best signals for prioritization.
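Ranking by exploit likelihood first and severity second captures that intuition. A minimal sketch, assuming you already have an EPSS probability for each finding (FIRST publishes EPSS scores as a free daily feed, or your scanner may supply them). The CVE IDs below are hypothetical placeholders.

```python
# Sketch: order findings by EPSS (likelihood of exploitation), then CVSS.
from dataclasses import dataclass


@dataclass
class Finding:
    cve: str
    cvss: float  # CVSS base score, 0.0-10.0
    epss: float  # EPSS: probability of exploitation in the next 30 days, 0.0-1.0


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Most likely to be exploited first; CVSS breaks ties."""
    return sorted(findings, key=lambda f: (-f.epss, -f.cvss))


if __name__ == "__main__":
    findings = [  # hypothetical IDs and scores
        Finding("CVE-A", cvss=9.8, epss=0.001),  # "Critical", rarely exploited
        Finding("CVE-B", cvss=7.5, epss=0.85),   # "High", actively exploited
        Finding("CVE-C", cvss=5.3, epss=0.02),
    ]
    for f in prioritize(findings):
        print(f.cve, f.cvss, f.epss)
```

Sorting purely on EPSS is itself a simplification. The exposure and reachability questions from the triage list above still apply.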

Base Image Families

Your base image is the biggest single factor in how many CVEs you carry. Three dominant families:

Debian-slim (`debian:bookworm-slim`, `python:3.11-slim`, etc.)

Size: 60-130 MB base.
Package manager: apt.
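Security: Debian backports security fixes into its stable package versions, and the official images are rebuilt regularly to pick them up. Fixes for high-profile CVEs usually land quickly.
CVE count at any moment: 20-100 on a fresh slim image, mostly Low/Medium in unused system libraries.
Tradeoffs: broad compatibility, a standard userland (bash, apt), and easy debugging, at the cost of a larger image and more CVEs than the alternatives.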

Alpine (`alpine:3.19`, `nginx:alpine`, `python:3.11-alpine`)

Size: 5-50 MB.
Package manager: apk.
libc: musl, not glibc.
Security: Small attack surface (minimal package set). CVE count is typically lower than Debian.
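Tradeoffs: musl-vs-glibc edge cases. These include DNS resolver quirks, prebuilt glibc-linked binaries that will not run, and Python packages without musl-compatible wheels that must be compiled from source. Ships BusyBox tools, which have fewer features than their GNU counterparts.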

Distroless (`gcr.io/distroless/...`)

Size: 2-100 MB depending on language.
Package manager: none.
Shell: none (by default). The :debug tag adds busybox.
Security: Smallest attack surface. With no shell and no package manager, an attacker who gains code execution cannot simply spawn a shell or install tools, which breaks many common exploit chains. A freshly built distroless image often reports zero CVEs.
Tradeoffs: docker exec -it <container> bash does not work because there is no shell. Debugging is through logs or by rebuilding against :debug. There is no apt-get install once the image is built.

Characteristic	debian-slim	alpine	distroless
Typical size	60-130 MB	5-50 MB	2-100 MB
Typical CVE count (OS)	20-100	10-40	0-5
Has a shell	Yes (`/bin/bash`)	Yes (`/bin/sh` busybox)	No (unless `:debug`)
Package manager	apt	apk	None
libc	glibc	musl	glibc or static
Great for	Broad compatibility	Size-sensitive services	Hardened production

Picking one

Default for app services: <language>:<version>-slim (Debian slim). Modest size, broad compatibility, well-understood.
Size-sensitive or cost-sensitive: Alpine, but be ready for musl edge cases.
Security-sensitive or minimal-attack-surface: Distroless. Accept the lack of shell.
Go, Rust, static binaries: gcr.io/distroless/static (about 2 MB) or scratch (0 MB).

PRO TIP

For a Go service, shipping FROM gcr.io/distroless/static-debian12 with a static binary yields a ~20 MB image with often zero CVEs. Teams switching from FROM alpine to distroless static for Go services report CVE counts dropping from 30+ to zero, image size dropping 2-3×, and zero operational pain because Go programs do not need a shell. Same pattern for Rust.

Where CVEs Usually Come From

In practice, the CVE count on most images decomposes roughly as:

Base image OS packages (55-70%). libc, openssl, zlib, libxml2, etc. Fixed by rebuilding the base image.
Language runtime and deps (20-35%). Python's cryptography package, Node.js dependencies, Java libraries. Fixed by upgrading the dep.
Transitive deps (5-15%). Packages pulled in by your direct deps. Harder to see; SBOMs help.
System utilities you do not need (5-10%). Left over from the base image.
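To see how your own images decompose, you can split a Trivy JSON report by the `Class` field on each result. That field distinguishes OS packages (`os-pkgs`) from language packages (`lang-pkgs`). The sketch below works under that assumption. It cannot separate direct from transitive dependencies, which needs dependency-graph data such as an SBOM.

```python
# Sketch: share of vulnerability findings from OS packages vs language packages.
# Assumes Trivy JSON results carry "Class": "os-pkgs" or "lang-pkgs".
import json
import sys
from collections import Counter


def breakdown_by_class(report: dict) -> dict[str, float]:
    """Return {class: percentage of all findings}, rounded to one decimal."""
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        findings = result.get("Vulnerabilities") or []
        if findings:
            counts[result.get("Class", "unknown")] += len(findings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: round(100 * n / total, 1) for cls, n in counts.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python cve_sources.py report.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(breakdown_by_class(json.load(f)))
```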

The easiest wins:

Use a smaller base image (Alpine, distroless): fewer OS packages means fewer OS CVEs.
Pin and update the base image tag: prefer python:3.11.9-slim-bookworm (specific) over python:3.11-slim (floating), and rebuild regularly.
Run apt-get update && apt-get upgrade -y during the build: this pulls in fixes published after the base image was built.
Remove dev dependencies from the final stage: use multi-stage builds and install with --omit=dev / --production.
Regularly update your app's deps: npm audit fix, Dependabot, Renovate.

An image that rebuilds is an image that stays patched

The most important habit is rebuilding regularly:

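on:                                # GitHub Actions: rebuild weekly to pick up base-image fixes
  schedule:
    - cron: '0 3 * * 1'            # Mondays, 03:00 UTC
  workflow_dispatch:
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build --pull --no-cache -t myapp:weekly .   # --pull fetches the newest base; push/scan steps omitted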

A weekly rebuild pulls in any base-image updates (new OpenSSL patch, updated CA certs, new musl bug fixes). Between rebuilds, images accumulate CVEs. The goal is not "zero CVEs forever" but "CVEs never live more than a week."

Integrating Scans into CI

The standard pattern: scan on every build, fail the build if new Critical/High CVEs appear.

# GitHub Actions
- name: Build image
  run: docker build -t myapp:${{ github.sha }} .

- name: Scan with Trivy
  uses: aquasecurity/trivy-action@master   # prefer a release tag or commit SHA over @master
  with:
    image-ref: myapp:${{ github.sha }}
    format: sarif
    output: trivy-results.sarif
    severity: 'HIGH,CRITICAL'
    limit-severities-for-sarif: true   # otherwise SARIF output includes every severity
    exit-code: 1              # fail the job on findings at that severity
    ignore-unfixed: true      # don't fail on CVEs with no fix available yet

- name: Upload SARIF to GitHub Security
  if: always()                # upload results even when the scan step failed the job
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: trivy-results.sarif
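The action's `exit-code` and `ignore-unfixed` inputs already implement this gate. If you need custom logic, such as an accepted-risk list kept outside the repo, a hand-rolled gate over the JSON report is straightforward. The sketch below assumes Trivy's JSON layout and that accepted CVE IDs are passed on the command line. The script name is hypothetical.

```python
# Sketch: fail CI on fixable HIGH/CRITICAL findings that are not explicitly accepted.
# Assumes Trivy JSON (Results[].Vulnerabilities[]); the script name is hypothetical.
import json
import sys

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict, accepted: set[str]) -> list[str]:
    """Return sorted 'CVE (package)' strings that should fail the build."""
    blockers: set[str] = set()
    for result in report.get("Results") or []:
        for v in result.get("Vulnerabilities") or []:
            vuln_id = v.get("VulnerabilityID", "?")
            if (
                v.get("Severity") in BLOCKING_SEVERITIES
                and v.get("FixedVersion")          # same idea as ignore-unfixed
                and vuln_id not in accepted
            ):
                blockers.add(f'{vuln_id} ({v.get("PkgName", "?")})')
    return sorted(blockers)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ci_gate.py report.json [ACCEPTED-CVE-ID ...]
    with open(sys.argv[1], encoding="utf-8") as f:
        found = blocking_findings(json.load(f), accepted=set(sys.argv[2:]))
    for item in found:
        print("BLOCKING:", item)
    sys.exit(1 if found else 0)
```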

Key flags for sanity:

--ignore-unfixed: do not fail on CVEs that have no fix available. These are meaningful information but not actionable at build time.
--severity HIGH,CRITICAL: focus on the severities that matter; treat Medium and Low as warnings.
.trivyignore: a file listing CVE IDs to explicitly accept. Use it sparingly, and document why.

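# .trivyignore: one ID per line; an optional exp:YYYY-MM-DD stops the ignore after that date
# accepted: only affects scenarios we don't use, fix ETA Q2 (expiry below is an example date)
CVE-2023-XXXXX exp:2025-06-30
# accepted: false positive, version detected is patched upstream
CVE-2024-YYYYY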

WARNING

An empty .trivyignore is better than a growing one. Each ignore is a risk you accepted: list it, date it, justify it, and review it. Teams that skip this end up with 200+ entries nobody remembers the context for. Review quarterly; expire old entries.
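That review is easy to automate. The minimal sketch below flags entries with no justification, no expiry, or an expiry in the past. It assumes two things. First, entries use Trivy's optional `exp:YYYY-MM-DD` suffix. Second, your team follows a convention of a `#` justification comment directly above each entry; that convention is hypothetical and not enforced by Trivy.

```python
# Sketch: audit a .trivyignore for undocumented, undated, or expired entries.
# Assumptions: Trivy-style "ID [exp:YYYY-MM-DD]" lines, plus a team convention
# (hypothetical, not enforced by Trivy) of a "#" justification line above each ID.
import sys
from datetime import date


def audit_trivyignore(text: str, today: date) -> list[str]:
    """Return human-readable problems, one per issue found."""
    problems: list[str] = []
    has_comment = False  # was the line directly above a "#" comment?
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            has_comment = False
            continue
        if line.startswith("#"):
            has_comment = True
            continue
        fields = line.split()
        vuln_id = fields[0]
        expiry = next((f[len("exp:"):] for f in fields[1:] if f.startswith("exp:")), None)
        if not has_comment:
            problems.append(f"line {lineno}: {vuln_id} has no justification comment")
        if expiry is None:
            problems.append(f"line {lineno}: {vuln_id} has no exp: date")
        else:
            try:
                if date.fromisoformat(expiry) < today:
                    problems.append(f"line {lineno}: {vuln_id} expired on {expiry}")
            except ValueError:
                problems.append(f"line {lineno}: {vuln_id} has an invalid exp: date {expiry!r}")
        has_comment = False  # each comment justifies only the entry right below it
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python audit_trivyignore.py .trivyignore
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = audit_trivyignore(f.read(), date.today())
    print("\n".join(issues) or "OK")
    sys.exit(1 if issues else 0)
```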

Base Image Update Strategies

Pin the major.minor, update the patch

FROM python:3.11.9-slim-bookworm

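Pros: Stable between bumps, though not byte-for-byte reproducible, because official images republish a tag when its underlying OS packages are updated. Easy to bump when patch versions roll out.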
Cons: You have to actively bump.

Pin by digest (most rigorous)

FROM python:3.11.9-slim-bookworm@sha256:abc123...

Pros: Binary-exact reproducibility.
Cons: Manual update process for every new image.

Floating minor tag

FROM python:3.11-slim

Pros: Auto-picks up patch updates on rebuild.
Cons: Build-to-build non-determinism; reproducibility requires pinning at build time.

For production services, pin by digest and automate the update PR (Renovate and Dependabot both handle this). The bot opens a PR when a new digest is published; CI scans the new image; if clean, merge. Regular, automated, auditable.
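Whichever strategy you pick, it helps to check that production Dockerfiles actually follow it. The rough sketch below lists `FROM` references not pinned by digest, skipping `scratch` and references to earlier build stages. It assumes one instruction per line and no `ARG` substitution in `FROM`. It is a quick lint, not a Dockerfile parser. Once you know where digests are missing, Renovate can add them for you.

```python
# Sketch: list FROM image references that are not pinned by digest.
# Assumptions: one instruction per line; no ARG substitution (FROM ${BASE});
# stage names from "AS <name>" are treated as stage references, not images.
import sys


def unpinned_from_lines(dockerfile: str) -> list[str]:
    """Return image references in FROM lines lacking an @sha256: digest."""
    stages: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        ref = args[0]
        is_stage_ref = ref.lower() in stages  # e.g. "FROM build AS test"
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        if is_stage_ref or ref == "scratch":
            continue  # nothing to pin
        if "@sha256:" not in ref:
            unpinned.append(ref)
    return unpinned


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pins.py Dockerfile
    with open(sys.argv[1], encoding="utf-8") as f:
        refs = unpinned_from_lines(f.read())
    for ref in refs:
        print("not digest-pinned:", ref)
    sys.exit(1 if refs else 0)
```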

Signing and Verifying Images (A Quick Recap)

Scanning tells you about known CVEs. Signing tells you an image came from your CI pipeline and has not been tampered with. Both matter.

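# Sign with Cosign (keyless, via OIDC). In GitHub Actions, give the job
# `permissions: id-token: write` and cosign obtains the OIDC token itself.
# Sign by digest, not tag, so the signature covers exactly what you built.
cosign sign --yes myorg/myapp@sha256:<digest>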

# Verify
cosign verify \
    --certificate-identity-regexp='https://github.com/myorg/.*' \
    --certificate-oidc-issuer='https://token.actions.githubusercontent.com' \
    myorg/myapp:v1.2.3

# In Kubernetes, enforce via admission controller (Kyverno, Connaisseur, Sigstore Policy Controller)

See Module 2 Lesson 3 for full treatment of registries and signing.

A Realistic Scan Output Triage Session

Fresh image, run trivy. Output looks like:

Total: 47 (UNKNOWN: 0, LOW: 23, MEDIUM: 14, HIGH: 8, CRITICAL: 2)

Triage:

Two Critical. CVE-2024-0727 in openssl (libssl3), CVE-2024-XXXX in curl. Both have fixed versions in the base image. Action: rebuild with latest python:3.11.9-slim-bookworm@<new-digest>.
Eight High. Look at each: is the affected code on my path?
- Six are in apt, dpkg, base-files: Debian-system bugs, typically only exploitable during package install. Real but low exposure in a running container.
- One is in libxml2, which my app does use. Fixable by upgrading the package. Action: add apt-get upgrade -y during build, or pin to a newer bookworm point release.
- One is in git, which my image does not actually use at runtime (it was used during build). Action: multi-stage build; strip git from the final stage.
Fourteen Medium, 23 Low. Note them, schedule for next regular update.
Ignored/accepted: zero. Do not add any unless a fix exists and you cannot take it yet.

One iteration of the Dockerfile (multi-stage to drop build tools, apt-get upgrade in the base, pinned digest for the new base) takes the scan from 47 → 5 or so. Most of those 5 are system-base noise that will disappear with the next Debian point release.
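Across a fleet, the same finding repeats in every image built on the same base. That is how the team in the opening story ended up with thousands of duplicates. Grouping findings by CVE, package, and installed version shows how many distinct fixes you actually face. The sketch below assumes a dict of Trivy JSON reports keyed by image name; the image names in the demo are hypothetical.

```python
# Sketch: collapse per-image findings into distinct (CVE, package, version) groups.
# Assumes Trivy JSON reports (Results[].Vulnerabilities[]) keyed by image name.
from collections import defaultdict


def group_findings(reports: dict[str, dict]) -> dict[tuple[str, str, str], list[str]]:
    """Return {(cve, package, installed_version): sorted image names}."""
    groups: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for image, report in reports.items():
        for result in report.get("Results") or []:
            for v in result.get("Vulnerabilities") or []:
                key = (v.get("VulnerabilityID", "?"), v.get("PkgName", "?"), v.get("InstalledVersion", "?"))
                groups[key].add(image)
    return {key: sorted(images) for key, images in groups.items()}


if __name__ == "__main__":
    # Two hypothetical images built on the same base image.
    shared = {"Results": [{"Vulnerabilities": [
        {"VulnerabilityID": "CVE-2024-0727", "PkgName": "libssl3", "InstalledVersion": "3.1.4-r1"}]}]}
    for (cve, pkg, ver), images in group_findings({"api": shared, "worker": shared}).items():
        print(f"{cve} {pkg} {ver}: {len(images)} images -> {images}")
```

One group touching many images usually means one fix: rebuild the shared base.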

WAR STORY

A team's security scoreboard measured "CVEs per image" as a KPI. Teams started to game the metric. They added .trivyignore entries for everything, switched to older base images that Trivy hadn't fully indexed, and even forked base images to rename packages. None of this made them safer; it just made the dashboard green. The fix was to change the metric to "average age of fixable CVEs", which rewards actually patching rather than evading the scanner. Be careful what you measure.
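The replacement metric is straightforward to compute once you record when each fixable finding was first seen. The sketch below assumes you keep a first-seen date per open finding, for example from stored scan reports. The 7-day threshold is an illustrative default matching the weekly-rebuild goal, not a standard, and the data in the demo is hypothetical.

```python
# Sketch: "average age of fixable CVEs" plus a count of overdue ones.
# Assumption: you track, per open finding, when a scan first reported it.
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenFinding:
    cve: str
    first_seen: date  # first scan that reported this finding
    fixable: bool     # scanner currently reports a fixed version


def fixable_age_metrics(findings: list[OpenFinding], today: date, max_days: int = 7) -> tuple[float, int]:
    """Return (average age in days of fixable findings, count older than max_days)."""
    ages = [(today - f.first_seen).days for f in findings if f.fixable]
    average = sum(ages) / len(ages) if ages else 0.0
    overdue = sum(1 for age in ages if age > max_days)
    return average, overdue


if __name__ == "__main__":
    today = date(2025, 1, 31)
    findings = [  # hypothetical data
        OpenFinding("CVE-A", date(2025, 1, 1), fixable=True),
        OpenFinding("CVE-B", date(2025, 1, 29), fixable=True),
        OpenFinding("CVE-C", date(2024, 6, 1), fixable=False),  # no fix: excluded
    ]
    print(fixable_age_metrics(findings, today))
```

Unfixable findings are excluded on purpose, so the metric only moves when the team can act.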

Key Concepts Summary

Scanners inventory package versions and match them against CVE databases. Trivy and Grype are the common free choices.
CVSS severity is context-free. Real triage combines severity with exploitation probability (EPSS), exposure, and reachable code.
Base image choice dominates CVE count. Typical counts rank distroless < Alpine < Debian-slim < full distro.
Rebuild regularly. Weekly rebuilds keep base-image fixes flowing. Automate with Renovate/Dependabot.
Pin for reproducibility; update automatically. Digest pins + bot-driven PRs are the production pattern.
--ignore-unfixed prevents noise. .trivyignore is the escape hatch; keep it small and documented.
CI-integrated scanning fails the build on new Critical/High CVEs that have fixes.
Scanning is not enough alone. Pair with SBOMs, misconfig scanning, runtime security, and signing.

Common Mistakes

Treating raw scanner output as a to-do list. 4,000 CVEs is a signal, not 4,000 tickets.
Ignoring EPSS. "Critical" with 0.001 exploit probability is not an emergency.
Pinning base images by major tag only (python:3-slim). Non-reproducible builds, unpredictable CVE drift.
Not rebuilding regularly. Images rot: an image with zero known CVEs last month can have several open today.
Using .trivyignore as a bypass for CVEs you could actually fix. Fix the CVE if you can.
Adding tools to a distroless image for debugging (copying in curl, or shipping the :debug variant) and forgetting to remove them. Distroless only pays off if you avoid adding packages on top of the base.
Scanning only images, not their dependency manifests. SBOMs and language-level audits catch things OS-package scans miss.
Measuring "zero CVEs" as the goal. You will never hit it; "fixable CVEs older than X days" is a better metric.
Scanning images without verifying provenance. If you cannot prove that the image you scanned is the one running in production (signatures, digest pins), the scan result tells you little about production.
Stopping after one scanner. Different scanners catch different things; running two (e.g., Trivy + Grype) gives better coverage.

KNOWLEDGE CHECK

Your CI scans a new image and Trivy reports 2 Critical and 8 High CVEs. The build fails. Your lead asks if you can just add those CVEs to `.trivyignore` to unblock the deploy. When is that the right call, and when is it not?
