Linux Fundamentals for Engineers

Performance Triage (The USE Method)

A production API's p99 latency just doubled. Ten engineers are in a war-room chat. Someone shares a CPU graph — "CPU looks fine." Someone else shares a memory graph — "memory looks fine." The APM says some requests are slow, others are fast. Nobody knows where to look next. An hour of staring at dashboards produces no hypothesis. Eventually a senior engineer SSHes into one affected node, runs four commands — uptime, vmstat 1 5, iostat -x 1 3, sar -n DEV 1 3 — and says "the disk is saturated: see that 12ms await and 100% util on nvme1n1? That is where to look." Two minutes of terminal work outperformed an hour of dashboards.

Linux gives you better performance diagnostics than any monitoring tool — if you know which command to run for which resource. Brendan Gregg's USE method is the checklist: for every resource, measure Utilization, Saturation, and Errors. Work through the list in under five minutes and you know where the bottleneck is. This lesson is the USE method as a production triage recipe, with the specific commands for each resource on Linux.


What USE Is

USE: Utilization, Saturation, Errors. For every resource in the system, check these three:

  • Utilization — what percentage of time the resource was busy doing work.
  • Saturation — how much extra demand is queued up for the resource (work waiting to be done).
  • Errors — how many errors the resource has reported.

The insight: a resource is a bottleneck when its utilization is high and there is saturation. Utilization alone is not enough — a CPU at 100% with no queue is just doing its job. A CPU at 50% with a queue of waiting threads is a scheduling problem. Saturation is the signal that work is waiting.

The method is exhaustive by design: you list every resource, tick each box, and move on. It stops you chasing the wrong thing and hands you the shortest path to the right one.
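That rule of thumb can be mechanized: compare the run queue to the CPU count. A minimal sketch, assuming vmstat's default layout where the run queue is column r:

```shell
# Flag CPU saturation by comparing the run queue to the CPU count.
# Assumption: vmstat's 'r' column counts runnable threads; r consistently
# above the CPU count means work is waiting to run.
cpu_saturated() {
  r=$1      # runnable threads (vmstat column 'r')
  ncpu=$2   # CPU count (from nproc)
  [ "$r" -gt "$ncpu" ] && echo "saturated" || echo "ok"
}

cpu_saturated 4 16    # prints "ok" -- busy, not queued
cpu_saturated 20 16   # prints "saturated" -- threads always waiting
```

On a live box you would feed it the later rows of vmstat 1 5 and the output of nproc.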

KEY CONCEPT

USE is not a tool — it is a checklist. Apply it to every resource your system has: CPUs, memory, disks, network interfaces, interconnects. The engineer who memorizes "utilization and saturation for every resource" and knows the Linux command for each has a 5-minute diagnostic that beats any dashboard on unfamiliar systems.


The Resources on a Linux Box

For a typical production server, the resources you check are:

Resource               | Utilization                      | Saturation                             | Errors
-----------------------|----------------------------------|----------------------------------------|----------------------------------
CPU                    | %CPU in top, vmstat us/sy/wa/id  | run queue length (vmstat r), load avg  | perf stat — rare
Memory                 | used/total in free               | swap used, vmstat si/so, PSI memory    | OOM kills in dmesg
Disks                  | %util per device (with caveats)  | await, aqu-sz                          | read/write errors in dmesg, SMART
Network                | %rxbw / %txbw vs NIC max         | backlog queues, retransmits            | netstat -s, interface errors
File descriptors       | /proc/sys/fs/file-nr             | n/a (FDs do not queue)                 | EMFILE errors
Kernel socket buffers  | depends                          | netstat -s overruns                    | Tcp drop counters in netstat -s

Everything below is the specific command per cell.


The 60-Second Triage

Brendan Gregg's own recommendation for the first minute on a slow Linux box:

uptime
dmesg -T | tail
vmstat -SM 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top

Let us walk through the ones that matter most for USE.


CPU

Utilization

# Overall — user/system/iowait/idle per CPU, updated every second
mpstat -P ALL 1 3
# 10:00:01  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %idle
# 10:00:02  all   25.5   0.0    3.5    12.0    0.0    0.5   58.5
# 10:00:02    0   40.0   0.0    5.0    20.0    0.0    1.0   34.0
# 10:00:02    1   10.0   0.0    2.0     5.0    0.0    0.0   83.0
# ...
# One CPU at 90%+ while others idle = single-threaded bottleneck

# Quicker, classic
top                  # then press '1' to spread out per-CPU
htop                 # prettier, same info; press 'F2' to customize columns

# Per-process
pidstat -u 1 3
# 10:00:01  UID   PID  %usr %system %guest %CPU   CPU  Command
# 10:00:02  1000 1234  25.0    3.0    0.0  28.0    2   myapp

Saturation

CPU saturation = threads waiting to run. Two ways to measure:

# The run queue length from vmstat — column 'r'
vmstat -SM 1 5
# procs  ---memory--- ---swap-- -----io---- --system-- ----cpu----
#  r  b   swpd   free  si   so  bi    bo  in  cs  us sy id wa st
#  4  0      0   4096   0    0   3    12 100 230  20  5 70  5  0
#  ^ runnable threads
# r > number of CPUs consistently = CPU saturation

# Pressure Stall Information — the modern answer
cat /proc/pressure/cpu
# some avg10=12.34 avg60=8.90 avg300=5.67 total=123456789
# CPU PSI measures the % of time *something* was stalled waiting on CPU
# "some avg10=12.34" = 12% of the last 10s had threads waiting for CPU

PSI (Pressure Stall Information) is the single best saturation signal Linux exposes — it is per-resource, normalized, and available for CPU, memory, and I/O. Monitor it.
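Because the PSI line format is stable, it is easy to parse in an alert script. A sketch, where the 10% avg10 threshold is purely an illustrative assumption:

```shell
# Parse a PSI line and flag high pressure.
# The limit of 10 (percent) is an example threshold -- tune per workload.
psi_check() {
  echo "$1" | awk -v limit=10 '{
    split($2, kv, "=")   # $2 is "avg10=12.34"
    if (kv[2] + 0 > limit) print "PRESSURE HIGH: avg10=" kv[2]
    else                   print "ok: avg10=" kv[2]
  }'
}

psi_check "some avg10=12.34 avg60=8.90 avg300=5.67 total=123"
# prints: PRESSURE HIGH: avg10=12.34
# Live: psi_check "$(head -1 /proc/pressure/cpu)"
```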

Errors

CPU errors are rare but serious:

# Hardware errors
dmesg -T | grep -i 'mce\|machine check\|cpu thermal'

# MCE daemon logs (if installed)
sudo cat /var/log/mcelog 2>/dev/null

If your CPU is reporting MCEs, replace the hardware.


Memory

Utilization

# The easy one
free -h
#                total   used   free   shared  buff/cache  available
# Mem:            32Gi  14Gi   2Gi      1Gi      16Gi         17Gi
# Swap:          4.0Gi   0B   4.0Gi
# "used" excludes reclaimable cache. "available" is the number that matters.

cat /proc/meminfo | head
# MemTotal:       32893400 kB
# MemFree:         1823440 kB
# MemAvailable:   18435212 kB    <- this is what you monitor
# Buffers:          120456 kB
# Cached:         14280432 kB
# ...
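Since MemAvailable is the number to watch, it helps to express it as a percentage for alerting. A small sketch reading /proc/meminfo directly:

```shell
# Report MemAvailable as a percentage of MemTotal.
awk '
  /^MemTotal:/     { total = $2 }
  /^MemAvailable:/ { avail = $2 }
  END { printf "available: %.1f%% of RAM\n", 100 * avail / total }
' /proc/meminfo
```

With the sample values above (MemAvailable 18435212 of MemTotal 32893400 kB) this prints available: 56.0% of RAM.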

Saturation

Memory saturation means the system is straining to free memory — swapping, reclaiming, or OOM-killing.

# Are we swapping? si/so in vmstat (in KiB/s)
vmstat -SM 1 5
# si  so
#  0   0                    <- no swap activity = good
# 12  48                     <- swapping in and out = bad

# Per-process swap usage — more detail
# (the field in /proc/PID/status is "VmSwap:", not "Swap:")
for pid in $(ps -eo pid --no-headers); do
  swap=$(awk '/^VmSwap:/ {print $2}' /proc/$pid/status 2>/dev/null)
  [ -n "$swap" ] && [ "$swap" -gt 0 ] && echo "$swap kB  $pid $(ps -o comm= -p $pid)"
done | sort -n | tail

# Memory PSI
cat /proc/pressure/memory
# some avg10=0.12 avg60=0.34 avg300=0.56 total=12345
# full avg10=0.00 avg60=0.01 avg300=0.03 total=123
# "full" > 0 = every process was stalled; very bad sign

Errors

# OOM-killer has been busy?
dmesg -T | grep -i 'killed process\|oom-killer\|out of memory' | tail
# [Fri Apr 19 09:00:12 2026] Out of memory: Kill process 12345 (myapp) score 890 or ...
# [Fri Apr 19 09:00:12 2026] Killed process 12345 (myapp), UID 1000, total-vm:16000000kB, anon-rss:8000000kB

# Per-cgroup OOM count
grep -H '^oom_kill' /sys/fs/cgroup/*/memory.events 2>/dev/null
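To get a single number out of those per-cgroup files, sum the oom_kill lines. A sketch, assuming the cgroup v2 layout used above:

```shell
# Total OOM kills across top-level cgroups (cgroup v2).
# memory.events contains lines like "oom 5" and "oom_kill 3";
# only oom_kill counts actual kills, so match that key exactly.
total_oom() {
  awk '/^oom_kill / { sum += $2 } END { print sum + 0 }'
}

cat /sys/fs/cgroup/*/memory.events 2>/dev/null | total_oom
```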

Disks

Utilization

# The production workhorse
iostat -xz 1 3
# Device     r/s     w/s    rkB/s    wkB/s  ...  r_await  w_await  aqu-sz  %util
# nvme0n1   241.0   38.0   12032    4864        0.28     0.75     0.09    16.4
# nvme1n1  1203.0  421.0   90240   16384        4.21    12.34     5.80    94.7
#                                              ^^^^^^^^^^^^^^^^^  ^^^^^   ^^^^
#                                              saturation         queue   util

# Watch %util and await:
# - %util high + await high = saturation
# - %util high + await low = busy but fine
# - %util high on NVMe: misleading (see Module 3 Lesson 3)

Saturation

aqu-sz (average queue size, renamed from avgqu-sz) is the direct saturation metric for disks. await (average latency) combines service time and queue time, so rising await while throughput (r/s + w/s) stays flat or falls means requests are queuing.
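That rule can be turned into a filter over iostat output. A sketch: the field positions match the header shown above but vary across sysstat versions, and the 10 ms and 1.0 thresholds are illustrative assumptions:

```shell
# Print only devices where await and queue depth both look saturated.
# Columns assumed: Device r/s w/s rkB/s wkB/s r_await w_await aqu-sz %util
# (verify against your sysstat version's header before relying on this).
flag_saturated() {
  awk 'NR > 1 && ($6 > 10 || $7 > 10) && $8 > 1 {
    print $1 ": await r=" $6 "ms w=" $7 "ms queue=" $8
  }'
}

flag_saturated <<'EOF'
Device r/s w/s rkB/s wkB/s r_await w_await aqu-sz %util
nvme0n1 241.0 38.0 12032 4864 0.28 0.75 0.09 16.4
nvme1n1 1203.0 421.0 90240 16384 4.21 12.34 5.80 94.7
EOF
# prints: nvme1n1: await r=4.21ms w=12.34ms queue=5.80
```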

# I/O Pressure Stall Info
cat /proc/pressure/io
# some avg10=8.23 avg60=4.56 avg300=2.34 total=1234567
# full avg10=1.23 avg60=0.56 avg300=0.23 total=12345

# Per-process I/O stats
pidstat -d 1 3
# 10:00:01  UID   PID   kB_rd/s   kB_wr/s   kB_ccwr/s  iodelay  Command
# 10:00:02  1000  1234  11200.0    8300.0        0.00       12  postgres

sudo iotop -oPa     # interactive

Errors

# I/O errors on block devices
dmesg -T | grep -Ei 'i/o error|blk_update_request|ata.*error|nvme.*error'

# SMART status — predicts drive failure
sudo smartctl -H /dev/nvme0      # SMART overall-health self-assessment test result: PASSED
sudo smartctl -a /dev/nvme0 | head -20

Network

Utilization

# Per-interface bandwidth
sar -n DEV 1 5
# Time   IFACE   rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
# 10:00   eth0   12000     8500   12500    8200      0        0        1        6.40
# 10:00   lo       400       400     120     120      0        0        0        0.00

# Also works: nload, iftop, bmon for interactive

%ifutil is sar computing utilization relative to the link's declared speed (which ethtool eth0 reports). For high-speed NICs or virtualized interfaces the declared speed may be wrong or missing entirely, so double-check with throughput benchmarks.
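When the declared speed is suspect, you can do the utilization arithmetic yourself from interface byte counters. A sketch; the function takes raw numbers so the math is explicit, and eth0 and the 1000 Mbit/s link speed in the examples are placeholder assumptions:

```shell
# Link utilization from two byte-counter samples.
nic_util() {
  b0=$1; b1=$2; secs=$3; link_mbit=$4
  awk -v b0="$b0" -v b1="$b1" -v s="$secs" -v mbit="$link_mbit" 'BEGIN {
    rate = (b1 - b0) * 8 / s / 1000000        # bytes/interval -> Mbit/s
    printf "%.1f Mbit/s (%.1f%% of link)\n", rate, 100 * rate / mbit
  }'
}

# Live sampling (eth0 is a placeholder interface name):
#   b0=$(cat /sys/class/net/eth0/statistics/rx_bytes); sleep 1
#   b1=$(cat /sys/class/net/eth0/statistics/rx_bytes)
#   nic_util "$b0" "$b1" 1 1000
nic_util 0 125000000 1 1000   # prints "1000.0 Mbit/s (100.0% of link)"
```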

Saturation

Network saturation shows as backlog queues, retransmits, and dropped packets:

# TCP-level saturation
sar -n TCP,ETCP 1 5
# TCP  active/s passive/s iseg/s  oseg/s
#        1.0      0.5     120     150
# ETCP  atmptf/s estres/s retrans/s isegerr/s  orsts/s
#        0.0      0.0      5.0       0.0       0.5
#                          ^^^ retransmits = network or receiver problem

# Detailed counters
netstat -s | head -30
# Tcp:
#   1234 active connections openings
#   5678 passive connection openings
#   12 failed connection attempts
#   3 connection resets received                  <- resets = trouble
#   150 connections established
#   42000 segments received
#   38500 segments send out
#   123 segments retransmitted                     <- retransmits
#   0 bad segments received
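The retransmit counters are most useful as a rate. A sketch that computes retransmits as a percentage of segments sent from netstat -s output (the thresholds people quote, roughly under 0.1% normal and over 1% worth investigating, are rules of thumb, not hard limits):

```shell
# Retransmit percentage from netstat -s counters.
# "segments send out" is the kernel's literal (ungrammatical) string.
retrans_pct() {
  awk '
    /segments send out/      { sent = $1 }
    /segments retransmitted/ { re = $1 }
    END { if (sent > 0) printf "%.2f%% retransmitted\n", 100 * re / sent }
  '
}

retrans_pct <<'EOF'
    38500 segments send out
    123 segments retransmitted
EOF
# prints: 0.32% retransmitted
# Live: netstat -s | retrans_pct
```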

# Dropped packets at the NIC
ip -s link show eth0 | head
# 2: eth0: <BROADCAST,...> mtu 1500 ...
#     link/ether ... brd ff:ff:ff:ff:ff:ff
#     RX:  bytes  packets  errors  dropped  overrun  mcast
#      32409283...  450000      0       12        0      1
#                                        ^^^ dropped packets = saturation or wrong size

Errors

# Interface errors
ip -s link show eth0
ethtool -S eth0 | grep -i error
# rx_errors: 0
# tx_errors: 0
# rx_length_errors: 0
# rx_crc_errors: 0
# ...

# Connection-level reset rate
ss -s
# Total: 200
# TCP:   128 (estab 100, closed 10, orphaned 3, timewait 15)
# ...

# Ring buffer / NIC saturation
ethtool -S eth0 | grep -E 'discard|drop|missed'

File Descriptors

Utilization and errors

# System-wide
cat /proc/sys/fs/file-nr
# 12304    0    1048576
#   ^       ^      ^
#   allocated  unused   max
# If allocated approaches max, the system will refuse opens

# Per-process
ls /proc/$PID/fd | wc -l
grep 'Max open files' /proc/$PID/limits
# Max open files            1024                 4096
#                           ^soft                ^hard
# If count approaches soft, process will fail opens with EMFILE

Errors show up as EMFILE: Too many open files in application logs.
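The two per-process checks combine into a small headroom helper, a sketch reading /proc directly (the awk field position follows the limits format shown above):

```shell
# How close is a process to its soft file-descriptor limit?
fd_headroom() {
  pid=$1
  used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # "Max open files" line: field 4 is the soft limit, field 5 the hard limit
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "pid $pid: $used of $soft fds used"
}

fd_headroom $$    # check the current shell
```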


The Lesson-Length Triage Script

Here is a single shell script that runs the USE highlights. Save it, test it, commit it to your runbooks.

#!/bin/bash
# /usr/local/bin/quick-triage
set -eu

echo "=== Load, PSI ==="
uptime
echo
echo -n "cpu pressure: "; cat /proc/pressure/cpu | head -1
echo -n "mem pressure: "; cat /proc/pressure/memory | head -1
echo -n "io  pressure: "; cat /proc/pressure/io | head -1

echo
echo "=== CPU (per-cpu, one sample) ==="
mpstat -P ALL 1 1 | tail -n +4

echo
echo "=== Memory ==="
free -h
echo
awk '/^MemAvailable:/ {print "MemAvailable: " $2 " " $3}' /proc/meminfo
oom=$(dmesg -T | tail -50 | grep -i 'oom\|killed process' | tail -5 || true)
[ -n "$oom" ] && echo "$oom" || echo "No recent OOMs in dmesg"

echo
echo "=== Disks ==="
iostat -xz 1 2 | awk '/^Device/,/^$/'

echo
echo "=== Network ==="
sar -n DEV 1 1 | grep -v 'Average\|IFACE\|^$' | head
echo
sar -n ETCP 1 1 | grep -v 'Average\|^$' | head

echo
echo "=== Top 5 CPU hogs ==="
ps -eo pid,pcpu,comm --sort=-pcpu | head -6

echo
echo "=== Top 5 RSS consumers ==="
ps -eo pid,rss,comm --sort=-rss | head -6

echo
echo "=== File descriptors (system) ==="
cat /proc/sys/fs/file-nr

Running this gives you every USE signal in one screen.

PRO TIP

Put your version of this script on every production server and in every image. When the on-call starts, the first command is always quick-triage. Sixty seconds later you have Utilization, Saturation, and Errors for every resource. No dashboards, no delay, no ambiguity about what you are looking at.


Interpreting the Signals: Which Resource Is Guilty?

Symptom                              | Likely culprit                             | Confirming metric
-------------------------------------|--------------------------------------------|--------------------------------------
High r in vmstat, low disk I/O       | CPU saturation                             | mpstat per-CPU, PSI cpu
%iowait high, %util high on a disk   | Disk saturation                            | iostat await + aqu-sz, PSI io
si/so > 0 in vmstat                  | Memory saturation (swapping)               | /proc/pressure/memory, MemAvailable
"Out of memory" in dmesg             | Memory errors (OOM)                        | cgroup memory.events
Retransmits rising in ETCP           | Network saturation (peer) or packet loss   | netstat -s, ss -s
Interface %ifutil > 80%              | Network saturation (link)                  | sar -n DEV
EMFILE in application logs           | FD exhaustion                              | /proc/sys/fs/file-nr, /proc/*/limits
CPU PSI "full > 0"                   | System-wide CPU starvation                 | mpstat -P ALL
IO PSI "full > 0"                    | All processes stuck on I/O                 | iotop, iostat per device

Beyond USE: When You Need More Detail

When USE shows CPU saturation but not which code:

# Snapshot — what functions the CPUs are running right now
sudo perf top
# Samples: 12K of event 'cycles'
# Overhead  Symbol
#   12.35%  [k] finish_task_switch
#    8.90%  [k] native_queued_spin_lock_slowpath
#    5.40%  [.] malloc
# ...

# 30-second profile + flame graph (requires FlameGraph scripts from brendangregg)
sudo perf record -F 99 -a -g -- sleep 30
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg

When USE shows memory saturation but not which allocations:

# Per-process RSS/PSS over time
smem -tk -s pss
# Or from /proc
watch 'awk "/Pss:/ {sum+=\$2} END {print sum\" kB\"}" /proc/$PID/smaps'

# With BCC or bpftrace you can trace allocations (requires root and bcc-tools):
# sudo /usr/share/bcc/tools/memleak -p $PID -a

When USE shows I/O saturation but not which file:

# biolatency, biotop, biosnoop from BCC or bpftrace
sudo /usr/share/bcc/tools/biotop 5
# Tracing... Output every 5 secs. Hit Ctrl-C to end
# PID    COMM    D MAJ MIN DISK    I/O    Kbytes   AVGms
# 1234   postgres W 259 0 nvme0n1  500   64000    3.21

These advanced tools (perf, eBPF via BCC or bpftrace) are the second layer. USE tells you which resource is the bottleneck; these tools tell you which function or file in the hot code path.


Key Concepts Summary

  • USE = Utilization, Saturation, Errors. Apply it to every resource: CPU, memory, disk, network, file descriptors.
  • Saturation is the key signal. High utilization without saturation is just "doing work." High utilization with saturation is a bottleneck.
  • PSI is the modern saturation number. /proc/pressure/{cpu,memory,io} gives you normalized saturation per resource.
  • Every resource has specific commands: mpstat/vmstat for CPU, free + /proc/meminfo for memory, iostat -xz for disks, sar -n DEV / sar -n TCP for network, /proc/sys/fs/file-nr for file descriptors.
  • dmesg is the universal error log for kernel-level issues: OOM kills, disk errors, NIC errors, hardware faults.
  • A one-minute triage runs 6–8 commands and tells you which resource is the bottleneck. Build a script.
  • USE gets you to the resource; perf/eBPF get you to the code. Start with USE, escalate to deeper tools.

Common Mistakes

  • Looking at CPU graphs alone and declaring "CPU is fine." If saturation (run queue, PSI cpu) is high but per-CPU isn't pegged, the scheduler is the bottleneck.
  • Trusting %util on NVMe drives. On multi-queue devices it can read 100% with plenty of headroom. Use await and aqu-sz instead.
  • Interpreting %iowait as "disk load." It's CPU time idle while waiting on I/O — it depends on how much else the CPU has to do, so it is a very noisy signal.
  • Ignoring dmesg. It is where every kernel-level error surfaces — bad disks, bad NICs, bad memory, OOM kills, kernel panics. Make dmesg -T | tail part of every triage.
  • Monitoring free memory and alerting when it is low. Linux aggressively caches disk data in "free" memory — MemAvailable is the real pressure signal.
  • Not alerting on PSI metrics. They are the single best "is this machine healthy?" signals available.
  • Stopping at "high CPU usage" without per-process or per-thread attribution. pidstat 1 and top -H tell you which process or thread is the consumer.
  • Using one data point. USE needs two-sample minimum (vmstat's first output is averages since boot; use vmstat 1 5 and look at the later rows).
  • Forgetting to check file descriptor limits. "Mystery" request failures at moderate load are often EMFILE from a process hitting its soft rlimit.

KNOWLEDGE CHECK

Application p99 latency just doubled. On an affected host: `vmstat 1 5` shows `r` consistently at 20 (the host has 16 CPUs), CPU is ~80% busy, iostat shows `%util` low and disk `await` < 1ms, network is unsaturated, memory is 40% used with no swap. PSI CPU `some avg10=35`. Which resource is the bottleneck, and what is your evidence?