Kubernetes Performance Optimization

The Performance Mindset: Measure, Don't Guess

Your team says "the cluster is slow." Deployments take forever, pods take 90 seconds to start, and API calls are timing out. Everyone has a different theory. Nobody has data.

This is the opening of every cluster performance investigation I have ever been pulled into. "The cluster is slow" is a statement of feeling, not a statement of fact, and the difference between a junior engineer and a senior one shows up in the next five minutes. The junior engineer reaches for kubectl top nodes and starts adjusting things. The senior engineer asks: slow how, slow when, slow for whom, and against what baseline.

This lesson is the mindset that turns vague performance complaints into bounded engineering problems with measurable answers. Every other lesson in this course assumes this one.

The problem

"Slow" is the wrong word for production work. It collapses several distinct failure modes into a single complaint:

  • The API server is slow to respond to kubectl and to controller reconciliation
  • The scheduler is slow to bind pods to nodes
  • Pods are slow to go from Pending to Running
  • Workloads inside pods are slow to serve requests
  • Cluster autoscaling is slow to react to traffic
  • The whole cluster feels slow only at certain times of day

Each of these has a different cause, a different fix, and a different team that owns it. Treating them as one problem is the most common reason performance investigations stall. The first job is to figure out which kind of slow you have, and the only way to do that is to measure.

The other half of the problem is the absence of a baseline. If you do not know how long deployments "should" take or what API server p99 latency was last week, you cannot tell whether anything is actually worse. Half of "the cluster is slow" tickets I have investigated turned out to be "the cluster has always been this slow and someone finally noticed." That is a different problem from a regression and warrants a different conversation.

KEY CONCEPT

Performance work needs three things to even start: a specific failure mode (not "slow"), a baseline (what is the historical normal?), and a target (what number are we trying to hit?). Without all three, you are tuning blind. The discipline to insist on all three before touching configuration is half the value a senior platform engineer brings.
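One way to hold yourself to this is to make the three prerequisites required fields on whatever record opens an investigation. The following is a minimal Python sketch of that idea; the class and field names are hypothetical and not taken from any real tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PerfComplaint:
    """Hypothetical intake record for a performance ticket."""
    failure_mode: Optional[str] = None  # e.g. "API server p99 latency", never just "slow"
    baseline: Optional[float] = None    # historical normal, in the same unit as target
    target: Optional[float] = None      # the number we are trying to hit

    def missing(self) -> List[str]:
        """Return the prerequisites that have not been answered yet."""
        fields = {
            "failure_mode": self.failure_mode,
            "baseline": self.baseline,
            "target": self.target,
        }
        return [name for name, value in fields.items() if value is None]

    def ready_to_tune(self) -> bool:
        """True only when all three prerequisites are present."""
        return not self.missing()
```

A ticket that arrives as "the cluster is slow" starts with all three fields empty, and `missing()` gives you the list of questions to send back.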

How it works under the hood

Kubernetes performance is not one number. It is the composite of four largely independent concerns, each with its own failure mode, its own metrics, and its own owner.

The four pillars of Kubernetes performance

Workload throughput (apps inside pods)

Requests per second, p99 latency, error rate. Owned by the application team. Affected by CPU throttling, memory pressure, kernel tuning, network performance. The pillar users actually feel.

Pod startup time (Scheduled to Ready)

Time from a pod being scheduled to being ready. Driven by image pull, container creation, app startup, and readiness probes. Critical for autoscaling and rolling deploys.

Scheduling speed (Pending to Scheduled)

How quickly the scheduler binds pending pods to nodes. Driven by scheduler scoring plugins, queue depth, and the size of the node pool being scored.

Control plane latency (apiserver, etcd, controllers)

How quickly the apiserver responds, how quickly etcd persists state, how quickly controllers reconcile. Affects every operation the cluster performs.

Read this top to bottom and you have the order in which a single user-facing request depends on each pillar. A user sends an HTTP request, and the workload pod handles it (workload throughput). That pod was created earlier (pod startup time), placed by the scheduler (scheduling speed), through API calls that hit the control plane (control plane latency). When users say "the cluster is slow," they are seeing a symptom in one or more of these layers; the diagnostic skill is identifying which.

Each pillar has a small set of canonical metrics. We cover them in detail in the next lesson; for the mindset lesson, the load-bearing point is that every "slow" complaint must be traced to one of these four, and you cannot fix anything until you know which.

Diagnosis and measurement

Before any tuning, three questions need answers. Refuse to start work until you have them.

Question 1: What specifically is slow?

Push back on "the cluster is slow" with concrete failure modes. Useful follow-ups:

  • Are kubectl get pods or kubectl apply slow? (control plane latency)
  • Are pods sitting in Pending for a long time? (scheduling speed or capacity)
  • Are pods sitting in ContainerCreating or PodInitializing? (pod startup time)
  • Are application requests slow once pods are serving? (workload throughput)
  • Is autoscaling failing to keep up with traffic? (could be any of the four)

Ten minutes of asking these questions saves ten hours of guessing.
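The follow-up questions above amount to a lookup from symptom to pillar. Here is a minimal Python sketch of that lookup; the symptom keys are hypothetical labels I made up for the five questions, and anything that is not one of them is rejected on purpose.

```python
from enum import Enum


class Pillar(Enum):
    WORKLOAD = "workload throughput"
    POD_STARTUP = "pod startup time"
    SCHEDULING = "scheduling speed"
    CONTROL_PLANE = "control plane latency"


# Hypothetical symptom labels, one per follow-up question.
TRIAGE = {
    "kubectl_slow": [Pillar.CONTROL_PLANE],
    "pods_pending": [Pillar.SCHEDULING],            # or a capacity shortfall
    "pods_container_creating": [Pillar.POD_STARTUP],
    "requests_slow": [Pillar.WORKLOAD],
    "autoscaling_lagging": list(Pillar),            # could be any of the four
}


def triage(symptoms):
    """Return the pillars to investigate, de-duplicated, in first-seen order."""
    pillars = []
    for symptom in symptoms:
        if symptom not in TRIAGE:
            # "slow" on its own is not a failure mode; push back instead of guessing.
            raise ValueError(f"not a specific failure mode: {symptom!r}")
        for pillar in TRIAGE[symptom]:
            if pillar not in pillars:
                pillars.append(pillar)
    return pillars
```

The `ValueError` is the point of the design: a complaint that does not map to a pillar goes back to the reporter before anyone opens a config file.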

Question 2: How slow, and against what baseline?

A specific number turns a vague feeling into a measurable target. Pull the relevant metric with a PromQL query:

# API server p99 request duration per verb (5m rate window; graph it over the last 24 hours)
histogram_quantile(0.99,
  sum by (le, verb) (
    rate(apiserver_request_duration_seconds_bucket[5m])
  )
)

# Pod startup time p95
histogram_quantile(0.95,
  sum by (le) (
    rate(kubelet_pod_start_duration_seconds_bucket[5m])
  )
)

# Scheduler attempt duration p99
histogram_quantile(0.99,
  sum by (le) (
    rate(scheduler_pod_scheduling_attempt_duration_seconds_bucket[5m])
  )
)

Without these numbers, "slow" is unfalsifiable. With them, you have a target.
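To get the number into a ticket or a script rather than a graph, you can send the same PromQL to the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). The sketch below uses only the Python standard library. It assumes the query returns exactly one series, as the pod startup and scheduler queries above do, and the base URL you pass in is whatever your own Prometheus is reachable at.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_single_value(body: str) -> float:
    """Extract the one sample value from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one series, got {len(result)}")
    _timestamp, value = result[0]["value"]  # value arrives as a string
    return float(value)


def query_value(base_url: str, promql: str, timeout: float = 10.0) -> float:
    """Run the query against a live Prometheus and return the single value."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_single_value(resp.read().decode("utf-8"))
```

The per-verb API server query returns one series per verb, so it would need a loop over `result` instead of the single-series check.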

Question 3: When did it start, and what changed?

Performance regressions almost always trace back to a recent change: a node pool resize, a Kubernetes upgrade, a new admission webhook, an extra DaemonSet, a workload that 10x'd traffic. Graph the metric over the last 7 days and find the point where it departs from its earlier level. That "before" date is where the investigation starts.

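# Ratio of the current API server p99 to the p99 at the same time 7 days ago.
# WATCH and CONNECT are long-lived requests and would skew the quantile.
histogram_quantile(0.99,
  sum by (le) (
    rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m])
  )
)
/
histogram_quantile(0.99,
  sum by (le) (
    rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m] offset 7d)
  )
)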

A ratio of 1.0 means no change. 1.5 means 50% slower than last week. 3.0 means a real regression and a clear "look here first" pointer.
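If you want that reading of the ratio written down rather than eyeballed, a small helper will do. This is a sketch under one assumption of mine: that 3.0 is used as a hard cut-off for "real regression". The text above gives it as a rule of thumb, not a threshold, so treat `regression_at` as tunable.

```python
def describe_ratio(current: float, week_ago: float, regression_at: float = 3.0) -> str:
    """Turn a current/baseline pair into a one-line verdict.

    regression_at is an assumed cut-off, not a standard value.
    """
    if week_ago <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / week_ago
    change_pct = (ratio - 1.0) * 100.0
    if ratio >= regression_at:
        label = "real regression, look here first"
    elif ratio > 1.0:
        label = "slower than last week"
    elif ratio < 1.0:
        label = "faster than last week"
    else:
        label = "no change"
    return f"ratio {ratio:.2f} ({change_pct:+.0f}%): {label}"
```

In practice you would also want some tolerance around 1.0, since p99 values rarely match to the millisecond from one week to the next.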

The fix

The "fix" for a performance complaint is rarely the first configuration change. It is the diagnostic loop you run on every complaint:

  1. Translate the vague complaint into one of the four pillars
  2. Pull the metric for that pillar with a specific number
  3. Compare to the baseline (last week, last month, the SLO)
  4. If there is a real regression, identify what changed in that window
  5. Apply the smallest possible fix
  6. Measure again; confirm the metric moved

The discipline is in step 6. Half of all "performance fixes" I have seen shipped in panic mode never got a follow-up measurement. Sometimes the fix did nothing. Sometimes it made things worse. Without a post-fix measurement, you will never know.

A useful structural change for any platform team that handles performance complaints regularly: build a performance baseline dashboard. One Grafana board, four panels (one per pillar), each showing the canonical p95/p99 metric over the last 7 days with an overlay of the prior 7 days. When a "the cluster is slow" ticket comes in, the first thing you do is open that dashboard. If nothing is regressing, the problem is somewhere else (the user's local network, a specific workload, a perception issue).
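The "current week plus prior week overlay" pattern is the same query twice, once with `offset 7d`. A minimal Python sketch of generating both expressions per panel follows. The three cluster metrics are the ones used earlier in this lesson; the workload metric name `http_request_duration_seconds_bucket` is an assumption, since that panel depends entirely on what your applications export. How you load the resulting strings into Grafana is left out.

```python
OFFSET_SLOT = "$OFFSET"  # replaced with "" (current) or " offset 7d" (overlay)

PILLAR_TEMPLATES = {
    # Assumption: the application exposes a conventional HTTP latency histogram.
    "Workload latency p99": (
        "histogram_quantile(0.99, sum by (le) ("
        "rate(http_request_duration_seconds_bucket[5m]$OFFSET)))"
    ),
    "Pod startup p95": (
        "histogram_quantile(0.95, sum by (le) ("
        "rate(kubelet_pod_start_duration_seconds_bucket[5m]$OFFSET)))"
    ),
    "Scheduling p99": (
        "histogram_quantile(0.99, sum by (le) ("
        "rate(scheduler_pod_scheduling_attempt_duration_seconds_bucket[5m]$OFFSET)))"
    ),
    # Long-lived WATCH and CONNECT requests are excluded from the quantile.
    "Control plane p99": (
        "histogram_quantile(0.99, sum by (le) ("
        'rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m]$OFFSET)))'
    ),
}


def panel_queries(template: str, overlay: str = "7d") -> dict:
    """Return the current expression and its prior-period overlay."""
    return {
        "current": template.replace(OFFSET_SLOT, ""),
        "prior": template.replace(OFFSET_SLOT, f" offset {overlay}"),
    }
```

Keeping one template per pillar means the two lines on each panel cannot drift apart when someone edits the query.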

WAR STORY

A team I worked with kept getting "Kubernetes is slow" complaints from their developer platform. Every investigation took two days and found nothing. We built the four-pillar dashboard and instrumented kubectl calls to log latency client-side. Within a week we found the actual problem: the corporate VPN was adding 600ms to every kubectl call, and the cluster had been fine the whole time. The cluster team had been chasing ghost performance issues for six months. Lesson: always ask whether the problem is in the cluster or between the user and the cluster. The four-pillar dashboard plus client-side latency was the difference.
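Client-side latency does not need special tooling; timing the command from the user's machine is enough to start. Here is a minimal Python sketch of that measurement. It is not the instrumentation that team used, just the simplest version of the same idea, and the function names are my own.

```python
import statistics
import subprocess
import time


def time_command(argv, runs: int = 5):
    """Run argv `runs` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce raw timings to the three numbers worth putting in a ticket."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Run `summarize(time_command(["kubectl", "get", "pods"]))` once on the VPN and once from a machine next to the cluster. If the medians differ by hundreds of milliseconds while the API server's own p99 is flat, the problem is between the user and the cluster.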

Before and after

A real performance investigation produces a before/after table. For the dashboard-driven workflow above, a typical result table for one investigation looks like this:

Metric                          Before (week of complaint)   After (one week post-fix)   Change
API server p99 latency          2,400 ms                     180 ms                      13x improvement
Pod start p95                   42 s                         9 s                         4.7x improvement
Scheduling p99                  3.2 s                        280 ms                      11x improvement
Cluster autoscaler ready time   5 min                        90 s                        3.3x improvement

The point of capturing this is not vanity. It is the contract with whoever was complaining. "The cluster is no longer slow" is unverifiable. "P99 API latency dropped from 2.4s to 180ms" is what you put in the postmortem and on the slide.
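The Change column is just before divided by after, once both are in the same unit. A short Python sketch that reproduces the column from the table's numbers is below; the rounding rule (whole numbers at 10x and above, one decimal below) is inferred from the table, and the helper name is hypothetical.

```python
def change(before: float, after: float) -> str:
    """Format the before/after ratio; both values must share a unit."""
    if before <= 0 or after <= 0:
        raise ValueError("measurements must be positive")
    if after > before:
        # The metric got worse; say so instead of reporting a fractional "improvement".
        factor, word = after / before, "regression"
    else:
        factor, word = before / after, "improvement"
    digits = 0 if factor >= 10 else 1
    return f"{factor:.{digits}f}x {word}"


# Values from the table above, all converted to milliseconds.
ROWS = [
    ("API server p99 latency", 2_400, 180),
    ("Pod start p95", 42_000, 9_000),
    ("Scheduling p99", 3_200, 280),
    ("Cluster autoscaler ready time", 300_000, 90_000),
]

for name, before_ms, after_ms in ROWS:
    print(f"{name}: {change(before_ms, after_ms)}")
```

Converting everything to one unit first matters: the scheduling row mixes seconds and milliseconds, and dividing 3.2 by 280 would report a regression.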

Common mistakes

  • Starting with kubectl top nodes. Node CPU and memory utilization tell you almost nothing about whether the cluster is performing well. Jumping there first is the most common diagnostic mistake.
  • Tuning before measuring. Every config change carries risk. Without a baseline measurement, you cannot tell if your change helped, hurt, or did nothing.
  • Treating "slow" as a single problem. Slow API and slow pod startup have different causes and different fixes. Conflating them wastes hours.
  • No baseline dashboard. Every team that handles performance complaints regularly should have one. Without it, every investigation is from scratch.
  • No follow-up measurement. Shipping a fix without confirming it worked is shipping a fix on faith.
  • Confusing "high CPU utilization" with "performance problem." A node at 90% CPU running fine is healthy capacity utilization. A node at 30% CPU with throttled containers is a performance problem. Utilization is not the same as performance.
  • Letting users dictate which knob to turn. "We need bigger nodes" is a hypothesis, not a diagnosis. Treat it as one input among many; verify with metrics before committing.

INTERVIEW QUESTION

An engineer says "Kubernetes is slow." Walk me through how you'd diagnose and quantify the actual performance problem.