GPU · 8 min read
Tuning vLLM gpu_memory_utilization Without Breaking Production
The default 0.9 is wrong for almost every production deployment. Here's how to pick the right number for your model, GPU, and traffic shape.
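As a starting point for that sizing exercise, here is a hypothetical back-of-the-envelope helper. Nothing in it comes from vLLM itself: the function name, the 2 GiB headroom default, and the example sizes are all illustrative assumptions for reasoning about what to pass as `gpu_memory_utilization` instead of the 0.9 default.

```python
# Hypothetical sketch, not part of vLLM: all names, defaults, and example
# numbers below are illustrative assumptions.

def suggest_gpu_memory_utilization(total_gib: float,
                                   weights_gib: float,
                                   kv_cache_gib: float,
                                   headroom_gib: float = 2.0) -> float:
    """Fraction of GPU memory to hand vLLM: model weights plus a KV-cache
    budget, capped so fixed headroom stays free for the CUDA context,
    comms buffers, and allocator fragmentation."""
    needed = weights_gib + kv_cache_gib
    usable = total_gib - headroom_gib
    return min(needed, usable) / total_gib

# e.g. ~13 GiB of fp16 8B weights plus a 20 GiB KV-cache budget on an 80 GiB card
print(suggest_gpu_memory_utilization(80.0, 13.0, 20.0))  # → 0.4125
```

The resulting fraction would then be passed as `gpu_memory_utilization` when constructing the engine (or via `--gpu-memory-utilization` on the CLI); the point of the sketch is only that the number should fall out of your model size, GPU size, and KV-cache needs rather than being left at 0.9.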
The DevOpsBeast Blog
Field notes on Kubernetes, GPUs, Linux, and the rest of the production stack, from engineers who run real infrastructure.