The Performance Mindset: Measure, Don't Guess
Your team says "the cluster is slow." Deployments take forever, pods take 90 seconds to start, and API calls are timing out. Everyone has a different theory. Nobody has data.
This is the opening of every cluster performance investigation I have ever been pulled into. "The cluster is slow" is a statement of feeling, not a statement of fact, and the difference between a junior engineer and a senior one shows up in the next five minutes. The junior engineer reaches for kubectl top nodes and starts adjusting things. The senior engineer asks: slow how, slow when, slow for whom, and against what baseline.
This lesson is the mindset that turns vague performance complaints into bounded engineering problems with measurable answers. Every other lesson in this course assumes this one.
The problem
"Slow" is the wrong word for production work. It collapses several distinct failure modes into a single complaint:
- The API server is slow to respond to `kubectl` and to controller reconciliation
- The scheduler is slow to bind pods to nodes
- Pods are slow to go from `Pending` to `Running`
- Workloads inside pods are slow to serve requests
- Cluster autoscaling is slow to react to traffic
- The whole cluster feels slow only at certain times of day
Each of these has a different cause, a different fix, and a different team that owns it. Treating them as one problem is the most common reason performance investigations stall. The first job is to figure out which kind of slow you have, and the only way to do that is to measure.
The other half of the problem is the absence of a baseline. If you do not know how long deployments "should" take or what API server p99 latency was last week, you cannot tell whether anything is actually worse. Half of "the cluster is slow" tickets I have investigated turned out to be "the cluster has always been this slow and someone finally noticed." That is a different problem from a regression and warrants a different conversation.
Performance work needs three things to even start: a specific failure mode (not "slow"), a baseline (what is the historical normal?), and a target (what number are we trying to hit?). Without all three, you are tuning blind. The discipline to insist on all three before touching configuration is half the value a senior platform engineer brings.
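One way to make that insistence concrete is a small intake record that refuses to call a complaint ready until all three pieces are filled in. This is a minimal sketch, not a prescribed tool: the class name, field names, and the choice of seconds as the unit are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PerformanceComplaint:
    """One complaint, reduced to the three things needed before tuning."""
    failure_mode: Optional[str] = None  # e.g. "pod startup p95", never just "slow"
    baseline: Optional[float] = None    # historical normal, in seconds (assumed unit)
    target: Optional[float] = None      # the number we are trying to hit, in seconds

    def missing(self) -> List[str]:
        """Return the names of the prerequisites that are still unknown."""
        checks = {
            "failure_mode": self.failure_mode,
            "baseline": self.baseline,
            "target": self.target,
        }
        return [name for name, value in checks.items() if value is None]

    def ready_to_tune(self) -> bool:
        """True only when a failure mode, a baseline, and a target all exist."""
        return not self.missing()
```

A ticket that only says "pod startup feels slow" would come back from `missing()` with `baseline` and `target` still open, which is exactly the conversation to have before anyone edits a config.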
How it works under the hood
Kubernetes performance is not one number. It is the composite of four largely independent concerns, each with its own failure mode, its own metrics, and its own owner.
The four pillars of Kubernetes performance
- Workload throughput. Requests per second, p99 latency, error rate. Owned by the application team. Affected by CPU throttling, memory pressure, kernel tuning, network performance. The pillar users actually feel.
- Pod startup time. Time from a pod being scheduled to being ready. Driven by image pull, container create, app startup, and readiness probes. Critical for autoscaling and rolling deploys.
- Scheduling speed. How quickly the scheduler binds pending pods to nodes. Driven by scheduler scoring plugins, queue depth, and the size of the node pool being scored.
- Control plane latency. How quickly the apiserver responds, how quickly etcd persists state, how quickly controllers reconcile. Affects every operation the cluster performs.
Read this top to bottom and you have the order in which a single user-facing request hits each pillar. A user sends an HTTP request → the workload pod handles it (workload throughput). The pod was created earlier (pod startup time), placed by the scheduler (scheduling speed), through API calls that hit the control plane (control plane latency). When users say "the cluster is slow," they are seeing one or more of these layers; the diagnostic skill is identifying which.
Each pillar has a small set of canonical metrics. We cover them in detail in the next lesson; for the mindset lesson, the load-bearing point is that every "slow" complaint must be traced to one of these four, and you cannot fix anything until you know which.
Diagnosis and measurement
Before any tuning, three questions need answers. Refuse to start work until you have them.
Question 1: What specifically is slow?
Push back on "the cluster is slow" with concrete failure modes. Useful follow-ups:
- Are `kubectl get pods` or `kubectl apply` slow? (control plane latency)
- Are pods sitting in `Pending` for a long time? (scheduling speed or capacity)
- Are pods sitting in `ContainerCreating` or `PodInitializing`? (pod startup time)
- Are application requests slow once pods are serving? (workload throughput)
- Is autoscaling failing to keep up with traffic? (could be any of the four)
Ten minutes of asking these questions saves ten hours of guessing.
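The follow-up questions above amount to a lookup from symptom to pillar. Here is a rough sketch of that lookup; the symptom keys are hypothetical labels I made up for the questions, not anything Kubernetes reports.

```python
from enum import Enum


class Pillar(Enum):
    WORKLOAD_THROUGHPUT = "workload throughput"
    POD_STARTUP = "pod startup time"
    SCHEDULING = "scheduling speed"
    CONTROL_PLANE = "control plane latency"


# Symptom keys are hypothetical labels for the follow-up questions.
SYMPTOM_TO_PILLARS = {
    "kubectl_slow": [Pillar.CONTROL_PLANE],
    "pods_pending": [Pillar.SCHEDULING],  # or a plain lack of capacity
    "pods_container_creating": [Pillar.POD_STARTUP],
    "requests_slow": [Pillar.WORKLOAD_THROUGHPUT],
    "autoscaling_lagging": list(Pillar),  # could be any of the four
}


def triage(symptoms):
    """Return the pillars worth measuring first, in order, without duplicates."""
    ordered = []
    for symptom in symptoms:
        for pillar in SYMPTOM_TO_PILLARS.get(symptom, []):
            if pillar not in ordered:
                ordered.append(pillar)
    return ordered
```

The useful property is the narrowing: `triage(["kubectl_slow", "pods_pending"])` leaves two pillars to measure instead of four, and an unrecognized symptom returns nothing rather than a guess.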
Question 2: How slow, and against what baseline?
A specific number turns a vague feeling into a measurable target. Pull the relevant metric with concrete numbers:
```promql
# API server p99 request duration per verb (graph it over the last 24 hours)
histogram_quantile(0.99,
  sum by (le, verb) (
    rate(apiserver_request_duration_seconds_bucket[5m])
  )
)

# Pod startup time p95
histogram_quantile(0.95,
  sum by (le) (
    rate(kubelet_pod_start_duration_seconds_bucket[5m])
  )
)

# Scheduler attempt duration p99
histogram_quantile(0.99,
  sum by (le) (
    rate(scheduler_scheduling_attempt_duration_seconds_bucket[5m])
  )
)
```
Without these numbers, "slow" is unfalsifiable. With them, you have a target.
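It helps to know what kind of number these queries return. A histogram quantile is an estimate interpolated from bucket counts, so its precision is limited by the bucket boundaries. The sketch below mirrors, as I understand it, the linear interpolation PromQL applies to classic histograms; it simplifies the edge cases and the bucket values in the usage note are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    A simplified version of the interpolation used for classic Prometheus
    histograms: find the bucket containing the target rank, then assume
    observations are spread evenly inside that bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With made-up buckets `[(0.1, 900), (0.5, 980), (1.0, 995), (float("inf"), 1000)]`, the p99 lands inside the 0.5 s to 1.0 s bucket and is interpolated between those two bounds. The practical consequence: a reported p99 can never be more precise than the bucket it falls in, which matters when a baseline and a current value sit in the same bucket.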
Question 3: When did it start, and what changed?
Performance regressions almost always trace back to a recent change: a node pool resize, a Kubernetes upgrade, a new admission webhook, an extra DaemonSet, a workload that 10x'd traffic. Compare the metric over the last 7 days. The "before" date is your investigation starting point.
```promql
# Compare today vs last week
histogram_quantile(0.99,
  sum by (le) (
    rate(apiserver_request_duration_seconds_bucket[5m])
  )
)
/ ignoring(__name__)
(histogram_quantile(0.99,
  sum by (le) (
    rate(apiserver_request_duration_seconds_bucket[5m] offset 7d)
  )
))
```
A ratio of 1.0 means no change. 1.5 means 50% slower than last week. 3.0 means a real regression and a clear "look here first" pointer.
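The same reading of the ratio can be written down as a small helper, which is handy when the comparison runs in a script rather than in someone's head. A sketch under stated assumptions: the 3.0 cutoff comes from the text above, while the 1.2 noise band is my own placeholder and should be set from how much your metric normally wobbles week to week.

```python
def classify_ratio(current, baseline, noise=1.2, regression=3.0):
    """Turn current/baseline latency into (ratio, label).

    Both thresholds are assumptions: `noise` is the band treated as normal
    week-to-week variation, `regression` is the "look here first" cutoff.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    if ratio >= regression:
        label = "regression"
    elif ratio > noise:
        label = "slower"
    else:
        label = "no meaningful change"
    return ratio, label
```

Feeding it this week's p99 and last week's p99 in the same unit gives back both the raw ratio and a label you can paste into the ticket.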
The fix
The "fix" for a performance complaint is rarely the first configuration change. It is the diagnostic loop you run on every complaint:
1. Translate the vague complaint into one of the four pillars
2. Pull the metric for that pillar with a specific number
3. Compare to the baseline (last week, last month, the SLO)
4. If there is a real regression, identify what changed in that window
5. Apply the smallest possible fix
6. Measure again; confirm the metric moved
The discipline is in step 6. Half of all "performance fixes" I have seen shipped in panic mode never got a follow-up measurement. Sometimes the fix did nothing. Sometimes it made things worse. Without a post-fix measurement, you will never know.
A useful structural change for any platform team that handles performance complaints regularly: build a performance baseline dashboard. One Grafana board, four panels (one per pillar), each showing the canonical p95/p99 metric over the last 7 days with an overlay of the prior 7 days. When a "the cluster is slow" ticket comes in, the first thing you do is open that dashboard. If nothing is regressing, the problem is somewhere else (the user's local network, a specific workload, a perception issue).
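The panel queries for such a board are repetitive enough to generate. The sketch below builds a current and a week-ago query per pillar. Treat the metric names as assumptions to verify against your own cluster's metrics endpoints, and note that the workload metric in particular is a placeholder, since every application exports its own latency histogram.

```python
def quantile_query(metric, quantile, window="5m", offset=None):
    """Build a histogram_quantile() expression, optionally shifted back in time."""
    selector = f"{metric}[{window}]"
    if offset:
        selector += f" offset {offset}"
    return f"histogram_quantile({quantile}, sum by (le) (rate({selector})))"


# Pillar -> (histogram metric, quantile). The workload entry is a placeholder:
# substitute whatever latency histogram your applications actually export.
PANELS = {
    "control plane latency": ("apiserver_request_duration_seconds_bucket", 0.99),
    "pod startup time": ("kubelet_pod_start_duration_seconds_bucket", 0.95),
    "scheduling speed": ("scheduler_scheduling_attempt_duration_seconds_bucket", 0.99),
    "workload throughput": ("http_request_duration_seconds_bucket", 0.99),
}


def dashboard_queries(baseline_offset="7d"):
    """Return {pillar: (current_query, baseline_query)} for the four panels."""
    return {
        pillar: (
            quantile_query(metric, q),
            quantile_query(metric, q, offset=baseline_offset),
        )
        for pillar, (metric, q) in PANELS.items()
    }
```

Each pair maps to one panel: the first query is the live series, the second is the overlay from the prior week.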
A team I worked with kept getting "Kubernetes is slow" complaints from their developer platform. Every investigation took two days and found nothing. We built the four-pillar dashboard and instrumented kubectl calls to log latency client-side. Within a week we found the actual problem: the corporate VPN was adding 600ms to every kubectl call, and the cluster had been fine the whole time. The cluster team had been chasing ghost performance issues for six months. Lesson: always ask whether the problem is in the cluster or between the user and the cluster. The four-pillar dashboard plus client-side latency measurement made the difference.
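Client-side timing does not need much machinery. This is a minimal sketch of the idea, not the instrumentation that team used: it times any command from the caller's point of view, so network hops between the user and the cluster are included. The function names are illustrative.

```python
import subprocess
import time


def time_command(argv, timeout=60):
    """Run a command and return (elapsed_seconds, exit_code) as seen by the client."""
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.perf_counter() - start, result.returncode


def median_latency(argv, runs=5):
    """Median wall-clock latency over several runs, to smooth out one-off spikes."""
    samples = sorted(time_command(argv)[0] for _ in range(runs))
    return samples[len(samples) // 2]
```

Running `median_latency(["kubectl", "get", "pods"])` once from a developer laptop and once from a machine close to the cluster gives two numbers whose difference is the path, not the cluster. If the apiserver's own p99 looks healthy while the laptop number is high, the investigation belongs to the network.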
Before and after
A real performance investigation produces a before/after table. For the dashboard-driven workflow above, a typical result table for one investigation looks like:
| Metric | Before (week of complaint) | After (one week post-fix) | Change |
|---|---|---|---|
| API server p99 latency | 2,400 ms | 180 ms | 13x improvement |
| Pod start p95 | 42 s | 9 s | 4.7x improvement |
| Scheduling p99 | 3.2 s | 280 ms | 11x improvement |
| Cluster autoscaler ready time | 5 min | 90 s | 3.3x improvement |
The point of capturing this is not vanity. It is the contract with whoever was complaining. "The cluster is no longer slow" is unverifiable. "P99 API latency dropped from 2.4s to 180ms" is what you put in the postmortem and on the slide.
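The "Change" column is simple arithmetic, but mixed units make it easy to get wrong by hand. A small sketch that normalizes to seconds before dividing; the unit table and function names are my own.

```python
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "min": 60.0}


def to_seconds(value, unit):
    """Normalize a duration to seconds so before/after values are comparable."""
    return value * UNIT_SECONDS[unit]


def improvement(before, after):
    """Return how many times faster 'after' is; each argument is (value, unit)."""
    return to_seconds(*before) / to_seconds(*after)
```

For the scheduling row, `improvement((3.2, "s"), (280, "ms"))` comes out a little above 11, matching the table. A value below 1.0 means the "fix" made things slower, which is worth knowing before the slide goes out.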
Common mistakes
- Starting with `kubectl top nodes`. Node CPU and memory utilization tell you almost nothing about whether the cluster is performing well. Jumping there first is the most common diagnostic mistake.
- Tuning before measuring. Every config change carries risk. Without a baseline measurement, you cannot tell if your change helped, hurt, or did nothing.
- Treating "slow" as a single problem. Slow API and slow pod startup have different causes and different fixes. Conflating them wastes hours.
- No baseline dashboard. Every team that handles performance complaints regularly should have one. Without it, every investigation is from scratch.
- No follow-up measurement. Shipping a fix without confirming it worked is shipping a fix on faith.
- Confusing "high CPU utilization" with "performance problem." A node at 90% CPU running fine is healthy capacity utilization. A node at 30% CPU with throttled containers is a performance problem. Utilization is not the same as performance.
- Letting users dictate which knob to turn. "We need bigger nodes" is a hypothesis, not a diagnosis. Treat it as one input among many; verify with metrics before committing.
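The utilization-versus-performance mistake in the list above is easier to avoid with a throttling number in hand. A rough sketch: on clusters where cAdvisor exposes the CFS counters `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total`, the ratio of their increases over a window is the share of scheduling periods in which a container was held back. The 25% threshold below is an assumption for illustration, not a standard.

```python
def throttle_ratio(throttled_periods, total_periods):
    """Fraction of CFS scheduling periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0
    return throttled_periods / total_periods


def looks_unhealthy(cpu_utilization, throttled_periods, total_periods,
                    throttle_threshold=0.25):
    """Flag on throttling, not utilization. The threshold is an assumption."""
    # cpu_utilization is accepted but deliberately ignored: a busy node that
    # is not throttling anything is healthy capacity use.
    return throttle_ratio(throttled_periods, total_periods) > throttle_threshold
```

The two cases from the list fall out directly: a node at 90% CPU with almost no throttled periods is not flagged, while a node at 30% CPU with 400 of 1,000 periods throttled is.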
An engineer says "Kubernetes is slow." Walk me through how you'd diagnose and quantify the actual performance problem.