Kubernetes Debugging··11 min read
How to Debug Kubernetes OOMKilled (Exit Code 137): The Complete Guide
Three completely different problems hide behind exit code 137. Most engineers fix the wrong one and the pod keeps crashing.
The DevOpsBeast Blog
Field notes on Kubernetes, GPUs, Linux, and the rest of the production stack, from engineers who run real infrastructure.
Your pod's memory limit isn't measuring what you think it is. A tour of cgroup v2 accounting and the surprises hiding inside memory.current.
The default 0.9 is wrong for almost every production deployment. Here's how to pick the right number for your model, GPU, and traffic shape.
Every Kubernetes upgrade I've watched fail in production failed for a reason that was visible an hour earlier. Here's the checklist.