The DevOpsBeast Blog

Production engineering notes.

Field notes on Kubernetes, GPUs, Linux, and the rest of the production stack, from engineers who run real infrastructure.

GPU Infrastructure · May 8, 2026 · 11 min read

Your 8B Model Won't Fit on an A100 With 50GB Free. Welcome to GPU Memory Fragmentation.

The model weights are 16GB. The KV cache is 20GB. The A100 has 80GB. nvidia-smi shows 50GB free. The next request still OOMs. This is the CUDA memory allocator fragmentation story that most ML engineers never learn.
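To make the "free memory that cannot be used" idea concrete, here is a minimal toy sketch in pure Python. It is not the CUDA allocator or any framework's caching allocator. It assumes a hypothetical 80-slot arena (1 slot = 1 GB), a simple first-fit policy, and made-up block sizes. It only illustrates how total free space can be large while the largest contiguous hole stays small.

```python
def free_runs(arena):
    """Return the lengths of all contiguous free runs (None marks a free slot)."""
    runs, current = [], 0
    for slot in arena:
        if slot is None:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def first_fit(arena, size, tag):
    """Claim `size` contiguous free slots at the first hole that fits.

    Returns the start index, or None if no single hole is large enough.
    """
    run_start, run_len = None, 0
    for i, slot in enumerate(arena):
        if slot is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                arena[run_start:run_start + size] = [tag] * size
                return run_start
        else:
            run_len = 0
    return None


def release(arena, tag):
    """Free every slot owned by `tag`."""
    for i, slot in enumerate(arena):
        if slot == tag:
            arena[i] = None


# Hypothetical 80 GB device, 1 slot = 1 GB (an assumption for illustration).
arena = [None] * 80
first_fit(arena, 16, "weights")       # long-lived allocation

# Fill the remaining 64 GB with sixteen 4 GB blocks (made-up sizes).
for n in range(16):
    first_fit(arena, 4, f"kv{n}")

# Release every other block: 32 GB becomes free, but only as 4 GB holes.
for n in range(0, 16, 2):
    release(arena, f"kv{n}")

runs = free_runs(arena)
total_free = sum(runs)
largest_hole = max(runs)
request_ok = first_fit(arena, 8, "new_request") is not None

print(f"total free: {total_free} GB, largest hole: {largest_hole} GB, "
      f"8 GB request fits: {request_ok}")
```

In this toy model the arena reports 32 GB free, yet an 8 GB request fails because no single hole is larger than 4 GB. Real GPU allocators are more sophisticated, so treat this as an intuition aid rather than a description of their behavior.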
