Docker & Container Fundamentals

Compose in Production?

A founding engineer at a growing startup runs every service from one docker compose up -d on a single EC2 instance. Postgres, Redis, API, worker, cron jobs — all in one compose file. It has been running for 18 months, handling 500 req/s, with effectively zero downtime. The team keeps "migrating to Kubernetes" on the roadmap and keeps bumping it down the priority list because — genuinely — they do not need it yet. Eventually the compose stack will hit a ceiling, but the trigger has not come, and the team is shipping features instead of managing a control plane.

"Compose in production" is a contested topic. Some engineers swear it is always wrong. Others run production on it for years. Both are right, depending on the workload. This lesson is the honest tradeoff analysis: what compose gives you in production, what it does not, when the jump to Kubernetes actually pays off, and how the concepts transfer when you do make that jump.


What Compose Actually Is in Production Terms

A single compose file on a single host is a production-grade setup for single-host workloads. It gives you:

  • Declarative service definition. Deploys are git pull && docker compose up -d --build.
  • Restart policies. restart: unless-stopped survives host reboots (if the Docker daemon is enabled at boot).
  • Resource limits. Cgroup-backed memory/CPU caps per service.
  • Network isolation between stacks. Compose projects get their own networks.
  • Volume management. Named volumes backed by local disk or network storage drivers.
  • Dependency gating. depends_on with healthchecks.
  • Secrets. File-based secrets (simple but real).

It does not give you:

  • High availability. One host = one point of failure.
  • Horizontal scaling across hosts. You can --scale web=3, but all three are on the same machine.
  • Self-healing across nodes. If the host dies, everything dies.
  • Rolling updates with health gates. Compose stops and starts; it does not "drain connections, bring up new, check health, shift traffic."
  • Service mesh, ingress, mTLS, policy — the cloud-native control plane.

These are Kubernetes concerns. The question is: do you need them yet?

KEY CONCEPT

The best predictor of "should I use compose or Kubernetes in prod" is a pair of questions: does the workload fit on one machine, and how much downtime can you tolerate? If you can tolerate "30 seconds of downtime per deploy" and your traffic fits on a single beefy VM, compose is viable for years. The moment you need "zero-downtime deploys across a fleet," you want an orchestrator. Pretending otherwise just moves pain around.


When Compose Is Good Enough

Signals that compose is the right tool for now:

  1. One (or two) hosts total. Primary production + a backup/standby.
  2. Low-to-moderate traffic. Fits comfortably on one beefy VM (say < 2000 req/s at typical p99 targets).
  3. Tolerance for brief downtime. Minutes of downtime per deploy is acceptable; you do not need it measured in seconds.
  4. Small team. Nobody has time to operate Kubernetes.
  5. Stateful services work naturally. Postgres + volume + simple compose file; no StatefulSet / CSI headache.

Real-world fits:

  • Internal tools. A company's admin dashboard, Metabase, internal API — no public-facing SLA pressure.
  • Early-stage startups. Pre-product-market-fit, simple deploy model, one compose file per environment.
  • Edge / on-prem deployments. A single compose file on a customer's VM; no cloud-native expectations.
  • Simple stacks that don't need the fleet story. A wiki, a pricing API, a Slack bot — these do not need 30 replicas and a service mesh.

What a production-leaning compose file looks like

# compose.prod.yaml
name: myapp
services:
  api:
    image: ghcr.io/myorg/api:${VERSION}        # NOT :latest
    restart: unless-stopped
    environment:
      NODE_ENV: production
      DATABASE_URL: ${DATABASE_URL}
    secrets:
      - api_key
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "1.0"
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 15s
      timeout: 3s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    depends_on:
      db: { condition: service_healthy }
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp

  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./backups:/backups
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
    secrets:
      - postgres_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]

  proxy:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    depends_on:
      api: { condition: service_healthy }

volumes:
  pgdata:

secrets:
  api_key:
    file: ./secrets/api_key
  postgres_password:
    file: ./secrets/postgres_password

Pair it with:

  • Host-level boot enable for Docker: systemctl enable docker.
  • Volume backups: a cron job that pg_dumps and uploads to S3.
  • Host monitoring: node-exporter + Prometheus somewhere collecting metrics.
  • A reverse proxy for TLS: nginx or Caddy in a compose service, or Traefik with its built-in Let's Encrypt support.
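The backup bullet is the one teams most often leave as a TODO. A minimal sketch of the cron job, assuming a db service, a database named app, and an S3 bucket you control (all hypothetical names):

```shell
#!/bin/sh
# /srv/myapp/backup.sh — nightly logical backup of the compose Postgres (sketch)
# crontab entry: 0 3 * * * /srv/myapp/backup.sh >> /var/log/myapp-backup.log 2>&1
set -eu
STAMP=$(date +%Y%m%d-%H%M%S)
cd /srv/myapp
# dump through the running container so client and server versions always match
docker compose -f compose.prod.yaml exec -T db \
  pg_dump -U postgres app | gzip > "/tmp/app-${STAMP}.sql.gz"
# ship it off-host; a backup on the same disk is not a backup
aws s3 cp "/tmp/app-${STAMP}.sql.gz" "s3://myorg-backups/pg/app-${STAMP}.sql.gz"
rm -f "/tmp/app-${STAMP}.sql.gz"
```

Test the restore path occasionally; an unverified backup is a hope, not a backup.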

This stack is "production" in a meaningful sense — for its target tier.
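If you want the stack itself, not just individual containers, managed at boot, one common addition is a oneshot systemd unit (a sketch; unit name and paths are assumptions):

```ini
# /etc/systemd/system/myapp.service — bring the compose stack up at boot (sketch)
[Unit]
Description=myapp compose stack
Requires=docker.service
After=docker.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/docker compose -f compose.prod.yaml up -d
ExecStop=/usr/bin/docker compose -f compose.prod.yaml down

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable myapp. restart: unless-stopped still handles per-container crashes; the unit handles the whole-stack lifecycle.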


Deploying With Compose

# On the production host (over SSH or via CI)
cd /srv/myapp
git pull
docker compose -f compose.prod.yaml pull         # pull new image tags
docker compose -f compose.prod.yaml up -d        # recreate changed services

This is the "git-based" deploy: the compose file is in git; the prod host pulls the repo, pulls new images, and runs up -d. Simple, auditable, rollback-by-git-revert.

Alternatives:

  • CI builds + pushes image + SSHes to prod to run docker compose pull && up. Cleaner; removes the "prod needs source checkout" dependency.
  • watchtower — a compose service that auto-pulls and restarts when new image tags are available. Works but makes deploys less deterministic.
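The CI variant can be sketched as a workflow like the following. The action names, secret names, and registry are assumptions, not a prescribed setup; note that VERSION matches the ${VERSION} interpolation in the compose file above:

```yaml
# .github/workflows/deploy.yml — build, push, recreate over SSH (sketch)
name: deploy
on:
  push:
    tags: ["v*"]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u myorg --password-stdin
          docker build -t ghcr.io/myorg/api:${GITHUB_REF_NAME} .
          docker push ghcr.io/myorg/api:${GITHUB_REF_NAME}
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1   # hypothetical pin; any SSH step works
        with:
          host: ${{ secrets.PROD_HOST }}
          username: deploy
          key: ${{ secrets.SSH_KEY }}
          script: |
            cd /srv/myapp
            VERSION=${{ github.ref_name }} docker compose -f compose.prod.yaml pull
            VERSION=${{ github.ref_name }} docker compose -f compose.prod.yaml up -d
```

This removes the "prod needs a source checkout with build context" dependency: prod only ever pulls finished images.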

Zero-downtime (ish) with compose

services:
  api:
    deploy:
      update_config:
        order: start-first            # start new before stopping old

With order: start-first, the intent is: start the new container, wait for its healthcheck to pass, then stop the old one. Two caveats. First, deploy.update_config comes from the Compose spec's Swarm heritage; plain docker compose up -d may ignore it depending on your Compose version, so verify the behavior before relying on it. Second, two containers cannot bind the same host port, so the pattern only works when the reverse proxy reaches the service over the compose network rather than a published port. Handled that way, with nginx picking up whichever container is healthy, you get close to zero downtime.

For real zero-downtime you generally want an orchestrator.
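The proxy-fronted pattern can be sketched in nginx config. Assumptions: the api service publishes no host port and is scaled to two or more replicas; open-source nginx has no active healthchecks, so this relies on passive failure detection and per-request retry, and it resolves the upstream name at startup, so reload nginx after scaling:

```nginx
# nginx.conf fragment (sketch): front a scaled, unpublished api service
upstream api_backend {
    # Docker's DNS returns one A record per replica at nginx startup;
    # run `nginx -s reload` after `docker compose up -d --scale api=2`
    server api:8080 max_fails=2 fail_timeout=10s;
}
server {
    listen 80;
    location / {
        proxy_pass http://api_backend;
        # if a replica is mid-restart, retry the request on another one
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```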


Where Compose Falls Short

No rolling updates

docker compose up -d stops all changed containers and starts new ones. For services with multiple replicas on one host (--scale web=3), this is close to coordinated recreation, but it is not a rolling update with per-instance health gates.

No horizontal scaling across hosts

docker compose up -d --scale web=5 runs 5 containers on the same host. If you need to spread across hosts, compose does not help. Docker Swarm was the "compose on a cluster" answer, but it has fallen out of favor — most teams going multi-host jump to Kubernetes instead.

Limited secrets story

secrets: in compose loads from files on the host. Fine at small scale; it does not integrate with Vault, AWS Secrets Manager, or GCP Secret Manager the way Kubernetes External Secrets or CSI secret drivers do.
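A common bridge before adopting K8s-style secret tooling is to materialize managed secrets into the files compose expects, as a step in the deploy. A sketch, assuming the AWS CLI and hypothetical secret IDs and paths:

```shell
# deploy-time secret materialization (sketch; IDs and paths are assumptions)
set -eu
mkdir -p /srv/myapp/secrets && chmod 700 /srv/myapp/secrets
aws secretsmanager get-secret-value --secret-id myapp/api_key \
  --query SecretString --output text > /srv/myapp/secrets/api_key
aws secretsmanager get-secret-value --secret-id myapp/postgres_password \
  --query SecretString --output text > /srv/myapp/secrets/postgres_password
chmod 600 /srv/myapp/secrets/*
```

Rotation then becomes "re-run this and docker compose up -d", which is crude but auditable.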

No built-in observability

Kubernetes gives you events, audit logs, and a declared-state reconciliation loop out of the box (and metrics via metrics-server). Compose gives you docker logs and docker stats. You bolt on observability externally (Prometheus + cAdvisor + node-exporter + a dashboard).

Host failure = full outage

One host dying takes everything with it. A standby host + some DNS failover helps but it is manual. No node auto-healing.


When to Move to Kubernetes

You are likely ready for Kubernetes when:

  • You have multiple hosts to manage. Running the same compose file on 5 hosts and not using an orchestrator is an accident waiting to happen.
  • You need zero-downtime deploys at meaningful traffic. Rolling updates across replicas are Kubernetes' job.
  • You have more than one team deploying to shared infrastructure. K8s' namespaces, RBAC, and resource quotas are designed for this.
  • Your workload needs self-healing across nodes. A host dies → pods reschedule. Compose cannot do this.
  • You want a platform. Kubernetes gives your team a common substrate; different services, same deploy patterns.

You are not ready when:

  • You have one host and no strong SLA.
  • Your team does not have the bandwidth to operate a control plane.
  • Most of your workload is stateful and poorly-suited to orchestration (big monolithic DBs).

Middle ground: managed Kubernetes (GKE, EKS, AKS, DigitalOcean K8s) dramatically lowers the operational cost. The control plane is managed; you operate workloads. For a team of 5-10 running 10+ services, a managed K8s cluster is usually worth it. For a team of 2 with 3 services, compose on a VM is probably fine.


How Compose Concepts Map to Kubernetes

When you do make the jump, the mapping is surprisingly direct:

Compose concept → Kubernetes equivalent, with notes:

  • services: entry → Deployment. Long-running container(s); multiple replicas across nodes.
  • services: for a DB → StatefulSet. Stable network identity, ordered start/stop.
  • networks: default → cluster-wide SDN (CNI). Every pod can reach every pod by default.
  • networks: internal: true → NetworkPolicy. Explicit allow/deny between pods.
  • volumes: named → PersistentVolumeClaim + PersistentVolume. Declarative storage, often backed by CSI.
  • ports: "80:8080" → Service (type ClusterIP / NodePort / LoadBalancer). Virtual IP + label selector to pods.
  • Public HTTP entry → Ingress + controller (nginx, Traefik, ALB). Compose's nginx is the Kubernetes Ingress pattern.
  • depends_on: db: condition: service_healthy → initContainers + readiness probes. K8s does not gate pod start on another pod's health; failing readiness probes drain traffic instead.
  • environment: → ConfigMap + env refs, Secret + env refs. Separates values from manifests.
  • secrets: → Secret (base64 in etcd, or external). K8s Secrets are not encrypted by default; use External Secrets for Vault/AWS SM.
  • deploy.resources → resources.requests + resources.limits. Same cgroup underpinnings.
  • healthcheck: → livenessProbe, readinessProbe, startupProbe. More granular than compose.
  • restart: unless-stopped → default pod behavior. K8s restarts failed pods automatically.

A near-literal translation

# Compose
services:
  api:
    image: ghcr.io/myorg/api:v1.2.3
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://db:5432/app
    depends_on:
      db: { condition: service_healthy }

  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

# Kubernetes (abbreviated)
apiVersion: apps/v1
kind: Deployment
metadata: { name: api }
spec:
  replicas: 3
  selector: { matchLabels: { app: api } }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
      - name: api
        image: ghcr.io/myorg/api:v1.2.3
        env:
        - name: DATABASE_URL
          value: postgres://db:5432/app
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata: { name: api }
spec:
  selector: { app: api }
  ports: [{ port: 8080, targetPort: 8080 }]
---
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: db }
spec:
  serviceName: db
  replicas: 1
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      containers:
      - name: db
        image: postgres:16
        volumeMounts:
        - { name: data, mountPath: /var/lib/postgresql/data }
  volumeClaimTemplates:
  - metadata: { name: data }
    spec:
      accessModes: [ReadWriteOnce]
      resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: Service
metadata: { name: db }
spec:
  selector: { app: db }
  clusterIP: None  # headless, for StatefulSet DNS
  ports: [{ port: 5432 }]

More YAML, but each piece corresponds directly to a compose concept. The skills you built with compose transfer.

PRO TIP

If you are moving compose → Kubernetes, kompose is a real tool: kompose convert -f compose.yaml generates Kubernetes manifests from your compose file. It is not the right answer for long-term maintenance (generated YAML is ugly) but it is a great way to see the mapping and get started.


Swarm: The "Compose on a Cluster" Answer Nobody Uses

Docker Swarm lets you run compose-like stacks across multiple nodes. It supports rolling updates, per-service secrets, cluster networking, and works with your existing compose files plus a few deploy: fields.

It was popular ~2016-2018 and has since lost mindshare to Kubernetes. Most production fleets today are Kubernetes; Swarm persists in specific niches (on-prem simplicity, IoT fleets) but the ecosystem has shrunk.

If you are considering Swarm today, you are probably better off with either (a) compose on one host, or (b) Kubernetes (managed). Swarm's middle ground is a lonely place.


Real-World Examples

  • Production compose is fine for: WordPress blogs, internal admin panels, small SaaS pilots, side projects with real users, edge deployments, pre-Series-A startups.
  • Production compose breaks down for: consumer products with real SLA pressure, multi-tenant platforms, workloads that need horizontal scale, teams where operations and development are separate roles.

WAR STORY

A team ran a "temporary" compose stack on a single EC2 instance for two years. It grew to handle 40 million requests per month with one unplanned outage (a full-disk event that took about an hour to fix). They eventually migrated to EKS — not because compose stopped working, but because their team tripled and the new engineers expected a Kubernetes workflow. The migration took 3 weeks. Starting it during the prior 18 months of "we should migrate to K8s" would have been pure opportunity cost. Rule of thumb: migrate when the cost of not migrating exceeds the cost of migrating, not before.


Key Concepts Summary

  • Compose is single-host. Production-viable for the right workloads, obviously wrong for multi-host.
  • Production compose signals: one host fits, minutes of downtime per deploy is OK, small team, simple architecture.
  • Kubernetes signals: multiple hosts, zero-downtime deploys needed, self-healing required, platform for multiple teams.
  • The translation is direct: compose services → Deployments, networks → CNI, volumes → PVCs, depends_on → probes, secrets: → Secrets + External Secrets.
  • Swarm is rarely the right answer today. Either compose or Kubernetes.
  • Managed Kubernetes (EKS, GKE, AKS) lowers operational cost dramatically; it is the usual destination when compose outgrows its role.
  • Migration does not have to be all-at-once. Stateless services move first; stateful services last.

Common Mistakes

  • "Compose is never production-ready." It is, for the right scale. The argument becomes real as you scale up.
  • Running multi-host without an orchestrator. "Five nodes all with the same compose file" is a recipe for drift and manual fixes.
  • Migrating to Kubernetes too early. A team without K8s experience running 3 services on K8s is paying heavy overhead for no benefit.
  • Migrating to Kubernetes too late. A team of 30 engineers running 20 services on a single compose host is about to have a bad time when that host dies.
  • Using :latest tags in production compose files. Same mistake as with Kubernetes; deploys become unpredictable.
  • Forgetting the backup story. Compose is easy to deploy; nothing about it backs up your data. Volume backups are your responsibility.
  • Running compose on a laptop and claiming "it works in production." Production has SLAs, monitoring, logs, disaster recovery — compose is one piece of that, not the whole thing.
  • Pretending Swarm fills the gap between compose and Kubernetes. It does not anymore; the ecosystem moved on.
  • Skipping healthchecks in production compose. Without them, restart: unless-stopped only catches hard crashes, not hangs. Even with them, Docker only marks a container unhealthy; automatic restart-on-unhealthy needs an extra piece (an autoheal-style sidecar) or a proxy that stops routing to the unhealthy container.
  • Bind-mounting config files that the prod team edits directly. Commit the configs to git and mount from the checkout, or use a proper config service — never "ssh in and edit."

KNOWLEDGE CHECK

Your team has grown from 3 to 15 engineers. Production is 4 services on a single compose file on one beefy VM; you deploy 3x/week with ~60s of downtime per deploy. Traffic is growing. Several teammates are pushing to migrate to Kubernetes because they think you are too big for compose. What is the right framing to decide?