Liveness probes that fire before your app is ready. Readiness probes that check the database. Exec probes leaking zombie processes by the thousands. The six mistakes that turn health checks into the cause of the outage they were supposed to prevent.
A node runs out of memory. The kernel and the kubelet both pick which pod to kill. Neither is guaranteed to pick the leaky one. Often they pick the well-behaved BestEffort pod next door. The QoS, oom_score_adj, and eviction-priority story most engineers never learn.
Kubernetes certificates expire silently. No warning, no alert, no graceful degradation, just a dead cluster. Here is how to fix it in five minutes and how to make sure it never happens again.