
kubectl drain Has Been Running for 4 Hours. Your PodDisruptionBudget Is Why.

Drains hang forever when a PodDisruptionBudget can never be satisfied. The four trap configurations, how to diagnose which one is biting, and the right PDB design that does not break node maintenance.

By Sharon Sahadevan · 9 min read

You need to upgrade a node. You run:

kubectl drain node-7 --ignore-daemonsets --delete-emptydir-data

Output starts looking healthy:

node/node-7 cordoned
evicting pod prod/payments-7d8f-xyz
evicting pod prod/orders-9c2a-abc

Then this:

error when evicting pods/"payments-7d8f-xyz" -n "prod"
  (will retry after 5s):
  Cannot evict pod as it would violate the pod's disruption budget.

Four hours later, the drain is still retrying. Every 5 seconds. The pod is never evicted because the PodDisruptionBudget refuses. Your maintenance window is gone, the next on-call inherits the same problem, and the cluster is half-upgraded.

This is the PDB-drain-hang. It happens for one of four reasons, and the fix is fast once you know which one. This post is the four traps, the diagnostic that names them, and the right PDB design.

What a PDB actually does#

The eviction API (POST /api/v1/namespaces/.../pods/.../eviction) is the polite way to tell a pod to terminate. kubectl drain, the cluster autoscaler, and node-upgrade tools all use it. Unlike kubectl delete, the eviction API consults PodDisruptionBudgets first. If evicting this pod would violate any matching PDB, the eviction is rejected.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-pdb
  namespace: prod
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments

This says: "for any voluntary disruption involving pods labeled app: payments, at least 2 such pods must remain in the Ready state."

A PDB checks at the time of the eviction request. If currently 3 payments pods are Ready and you ask to evict one, leaving 2 Ready, allowed. If currently 2 are Ready, evicting one would leave 1, denied.

The eviction API rejection is what shows up in kubectl drain as "violate the pod's disruption budget." The drain keeps retrying because the PDB might become satisfiable later (a pod might come back to Ready). When that never happens, the drain hangs indefinitely.
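
For reference, kubectl drain does not delete pods directly; it POSTs an Eviction object to that subresource. A minimal sketch of doing the same thing by hand, reusing the pod from the example above (useful for confirming that the PDB, and not something else, is rejecting the eviction):

# The Eviction object kubectl drain submits on your behalf
cat > /tmp/eviction.json <<'EOF'
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": { "name": "payments-7d8f-xyz", "namespace": "prod" }
}
EOF

# POST it to the pod's eviction subresource. If a matching PDB has no budget left,
# the API rejects the request with the same "Cannot evict pod as it would violate
# the pod's disruption budget" message that kubectl drain keeps printing.
kubectl create --raw /api/v1/namespaces/prod/pods/payments-7d8f-xyz/eviction -f /tmp/eviction.json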

Trap 1: maxUnavailable: 0#

spec:
  maxUnavailable: 0

"Allow zero unavailable pods." Sounds like a strong availability guarantee. In practice: no eviction can ever proceed, because every eviction makes at least one pod unavailable while its replacement starts on another node. Drain hangs forever, every time.

This is the most common cause and the easiest to fix. Replace with maxUnavailable: 1 (or a percentage):

spec:
  maxUnavailable: 1

If the workload genuinely cannot tolerate any unavailable pods, the workload needs more replicas, not a maxUnavailable: 0 PDB. A 2-replica workload with maxUnavailable: 1 is the right target if you can briefly run on 1 replica during maintenance.
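
A quick way to find this trap across the whole cluster before it blocks a drain. A sketch that assumes jq is installed; maxUnavailable may be stored as a number or a string, so the filter checks both:

# Every PDB whose maxUnavailable is 0, i.e. every PDB that can never allow an eviction
kubectl get pdb --all-namespaces -o json \
  | jq -r '.items[]
      | select(.spec.maxUnavailable == 0 or .spec.maxUnavailable == "0")
      | "\(.metadata.namespace)/\(.metadata.name)"'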

Trap 2: minAvailable equal to (or greater than) replicas#

spec:
  minAvailable: 3
---
# In the Deployment:
spec:
  replicas: 3

"Always keep at least 3 of 3 available." Mathematically equivalent to maxUnavailable: 0. Drain hangs.

The bug: the PDB and the Deployment were designed independently. The Deployment has 3 replicas; the PDB requires 3 always; nobody noticed they cannot coexist with eviction.

Fix: keep PDB strictly less than replicas. For 3-replica workloads with PDB allowing 1 disruption:

spec:
  minAvailable: 2          # 3 replicas - 1 disruption budget

For workloads that scale, prefer maxUnavailable so the PDB tracks replica count automatically:

spec:
  maxUnavailable: 25%      # always allow 25% of replicas to be disrupted

maxUnavailable: 25% means: if 4 replicas, 1 can be disrupted; if 8 replicas, 2 can; if 12 replicas, 3 can. Scales with the deployment.
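
If you do stick with minAvailable, compare it against the replica count explicitly before committing it. A sketch using the PDB from the running example (the Deployment name payments is an assumption; swap in your own):

# The PDB's floor must be strictly below the Deployment's replica count
kubectl get deploy payments -n prod -o jsonpath='{.spec.replicas}{"\n"}'
kubectl get pdb payments-pdb -n prod -o jsonpath='{.spec.minAvailable}{"\n"}'
# Equal numbers (or a floor above the replica count) means drains will hang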

Trap 3: pods are not Ready in the first place#

The PDB counts Ready pods. A pod sitting in Pending, CrashLoopBackOff, ImagePullBackOff, or any other not-Ready state already counts as unavailable against the PDB's budget.

Scenario: 3-replica workload, minAvailable: 2. One pod is in CrashLoopBackOff (image bug from yesterday's deploy). Two are Ready. Drain tries to evict one of the Ready pods; PDB sees that would leave 1 Ready, less than minAvailable: 2, denies.

The drain hangs because of an unrelated bug (the crashlooping pod). Not until you fix the crashlooping pod can the drain proceed.

This is hostile because the error message hides the real cause. The on-call sees "drain stuck on PDB" and looks at the PDB config (which is fine), not at the unrelated crashlooping pod that is actually consuming the budget.

How to diagnose:

# How many pods does the PDB think are Ready?
kubectl get pdb payments-pdb -n prod
# NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# payments-pdb   2               N/A               0                     5d

# ALLOWED DISRUPTIONS = 0 means: the PDB has zero budget right now.
# Either the PDB itself is too strict, or some pods are not Ready.

# Check which pods match the PDB selector and their status
kubectl get pods -n prod -l app=payments -o wide
# Look for any not-Ready, Pending, CrashLoopBackOff
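
The same numbers are available as structured status fields, which makes it easy to see exactly what the PDB is counting. A sketch that assumes jq; the output shown is illustrative for the 3-replica scenario above:

kubectl get pdb payments-pdb -n prod -o json \
  | jq '.status | {currentHealthy, desiredHealthy, disruptionsAllowed, expectedPods}'
# {
#   "currentHealthy": 2,
#   "desiredHealthy": 2,
#   "disruptionsAllowed": 0,
#   "expectedPods": 3
# }
# currentHealthy below expectedPods while disruptionsAllowed is 0 is the signature of
# this trap: the budget is being consumed by a pod that is already unhealthy.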

Fix: address the not-Ready pods first. After they recover, ALLOWED DISRUPTIONS becomes positive and the drain succeeds.

Trap 4: multiple PDBs cover the same pods#

Two PDBs with overlapping selectors:

# PDB 1
spec:
  selector:
    matchLabels:
      app: payments
  maxUnavailable: 1
---
# PDB 2
spec:
  selector:
    matchLabels:
      tier: critical
  maxUnavailable: 0

Pods labeled both app: payments and tier: critical match both PDBs. The eviction API does not support a pod being covered by more than one PDB: when several match, the eviction is rejected outright, regardless of how permissive either budget is. The drain is blocked even though PDB 1 on its own would have allowed it.

This is hostile because the second PDB is often owned by a different team or installed by a Helm chart or operator nobody looks at. PDBs only select pods in their own namespace, so both blockers live alongside the pod, but nothing surfaces the overlap: the drain operator finds PDB 1 (looks fine) and never thinks to look for PDB 2.

How to diagnose:

# List all PDBs in the namespace
kubectl get pdb -n prod

# For each PDB, see which pods it covers
for pdb in $(kubectl get pdb -n prod -o name); do
  echo "=== $pdb ==="
  kubectl describe $pdb -n prod | grep -A 5 "Selector\|Allowed"
done

# Reverse lookup: for a specific pod, which PDBs cover it?
POD=payments-7d8f-xyz
LABELS=$(kubectl get pod $POD -n prod -o jsonpath='{.metadata.labels}')
# Manually compare labels against each PDB's selector
# (no built-in kubectl command for this, sadly)
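
If jq is available, the reverse lookup can be scripted. A rough sketch that only understands matchLabels (it ignores matchExpressions, so PDBs using those still need a manual check):

POD=payments-7d8f-xyz
POD_LABELS=$(kubectl get pod "$POD" -n prod -o json | jq -c '.metadata.labels')

# Print every PDB in the namespace whose matchLabels are a subset of the pod's labels
kubectl get pdb -n prod -o json \
  | jq -r --argjson labels "$POD_LABELS" '
      .items[]
      | select((.spec.selector.matchLabels // {}) | to_entries | all(.value == $labels[.key]))
      | .metadata.name'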

Fix: ensure no pod is covered by more than one PDB. Either remove the redundant PDB or refine the selectors so they don't overlap.

How to break out of a hung drain#

Sometimes you need to make progress now and fix the underlying issue later. Three escalating options.

Option 1: temporarily relax the PDB.

kubectl edit pdb payments-pdb -n prod
# Change maxUnavailable from 0 to 1, or minAvailable from 3 to 2

The drain proceeds within seconds of the change.
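
If you would rather not open an editor mid-incident, the same change can be made non-interactively with kubectl patch. A sketch; apply only the field your PDB actually sets, since a PDB cannot carry both:

# Relax maxUnavailable from 0 to 1 ...
kubectl patch pdb payments-pdb -n prod --type merge -p '{"spec":{"maxUnavailable":1}}'
# ... or lower minAvailable from 3 to 2
kubectl patch pdb payments-pdb -n prod --type merge -p '{"spec":{"minAvailable":2}}'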

Option 2: delete the PDB temporarily.

kubectl get pdb payments-pdb -n prod -o yaml > /tmp/pdb-backup.yaml
kubectl delete pdb payments-pdb -n prod
# Drain succeeds. Then:
kubectl apply -f /tmp/pdb-backup.yaml

Risky: while the PDB is gone, the workload has no protection against other voluntary disruptions (another drain, the cluster autoscaler) landing at the same time.

Option 3: bypass the eviction API (last resort).

# Force-delete the pod directly. Bypasses PDB entirely.
kubectl delete pod payments-7d8f-xyz -n prod --force --grace-period=0

This is a sledgehammer. It does not respect the workload's graceful shutdown. Use only when the alternative is worse (cluster maintenance window expiring, security patch pending, etc.) and the workload can tolerate a hard kill.

How to design PDBs that work with maintenance#

Three rules for PDBs that do their job without breaking drains:

Rule 1: use maxUnavailable with a percentage, not minAvailable.

spec:
  maxUnavailable: 25%

Scales with replica count automatically. Cannot become "stricter than the deployment can support" by accident.

Rule 2: pin replica counts higher than your minimum availability.

If your workload genuinely needs 2 replicas always available, set replicas: 3 and maxUnavailable: 1. A 2-replica deployment with maxUnavailable: 0 cannot be drained. A 3-replica deployment with maxUnavailable: 1 can.
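
In manifest form, the pairing looks like this (abbreviated, names illustrative):

# Deployment: one replica more than the availability floor
spec:
  replicas: 3
---
# PDB: budget sized to exactly the spare replica
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: payments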

Rule 3: one PDB per workload. No overlapping selectors.

If multiple PDBs need to cover related workloads, give each its own PDB with a unique selector. A "blanket PDB" covering everything in a namespace is convenient and dangerous: a single workload's misconfiguration can deadlock everything.

# Good: per-workload PDB
spec:
  selector:
    matchLabels:
      app: payments

# Bad: namespace-wide PDB
spec:
  selector: {}    # matches every pod in the namespace

What to monitor#

Two metrics catch PDB-related issues before they bite:

# PDBs with zero allowed disruptions (cannot tolerate a drain)
kube_poddisruptionbudget_status_pod_disruptions_allowed == 0

# PDBs with current pods less than expected (hidden block)
kube_poddisruptionbudget_status_current_healthy
  < kube_poddisruptionbudget_status_desired_healthy

The first one is your "drain will hang on this PDB" alert. The second is "this PDB has unhealthy pods, drains will fail."

Add both to a monitoring dashboard. Pre-drain, glance at the dashboard. Any PDB at zero allowed disruptions: fix before draining.
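
If Prometheus scrapes kube-state-metrics, the first signal translates into an alert rule along these lines. A sketch: the metric and its namespace/poddisruptionbudget labels come from kube-state-metrics, and the duration and severity are up to you:

groups:
  - name: pdb-drain-safety
    rules:
      - alert: PDBBlocksAllEvictions
        expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} allows zero disruptions; node drains will hang on it"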

Quick reference: the hung-drain checklist#

1. Identify which pod is blocked:
   kubectl drain ...   # last "evicting pod" message tells you

2. Find the PDB(s) covering that pod:
   POD=...
   kubectl get pdb -n $NAMESPACE -o yaml \
     | yq '.items[] | select(.spec.selector.matchLabels...)'

3. For each matching PDB, check the disruption budget:
   kubectl get pdb $PDB -n $NAMESPACE
   - ALLOWED DISRUPTIONS = 0?
     - Why? maxUnavailable: 0? minAvailable >= replicas? unhealthy pods?
   - Fix the cause:
     - Edit PDB to relax (Trap 1, 2)
     - Fix unhealthy pods (Trap 3)
     - Resolve overlapping PDBs (Trap 4)

4. If stuck and need to make progress now:
   - Edit PDB to relax (least bad)
   - Force-delete pod with --grace-period=0 (last resort)

5. Pre-drain checklist for next time:
   - kube_poddisruptionbudget_status_pod_disruptions_allowed > 0 for all PDBs
   - All pods covered by PDBs are Ready
   - Replica counts > minAvailable

The mental model#

A PodDisruptionBudget is a contract: "do not voluntarily disrupt my pods unless this many remain Ready." It is enforced at the eviction API, not at delete time. Drains use eviction. Force-delete bypasses.

The PDB's correctness depends on three things being true: the budget is reasonable (not zero), the deployment has more replicas than the budget requires, and the pods are healthy. Break any one and the budget becomes "no eviction ever," which means drains hang.

Most PDB-related production pain is caused by PDBs that were written once during a "we need availability" panic and never reviewed. Audit them when you set up the cluster, again when workloads scale, and pre-drain when something feels wrong. The 5 minutes you spend looking at kubectl get pdb is the difference between a fast drain and a 4-hour incident.


Node lifecycle, drain semantics, and the maintenance-window patterns that keep clusters operational are part of the Production Kubernetes Operations course. The cluster-upgrade flow specifically (where drains run as part of the upgrade) is covered in the Kubernetes Cluster Upgrades course.
