Linux Fundamentals for Engineers

Reading /proc for Debugging

It is 2 AM. A payment service in production is "stuck." CPU is flat, memory is flat, no errors in the logs, latency has climbed to 30 seconds per request, and the on-call engineer has already tried kubectl rollout restart. You are the next escalation. There is no APM. There is no strace installed in the container. There is no gdb. There is only a bash shell, the coreutils from busybox, and the running process.

This is the situation /proc was made for. Every running process on a Linux box has a directory under /proc containing its complete live state — what files it has open, what memory it has mapped, what syscall it is blocked in, what its environment variables are, what cgroups it belongs to, how many context switches it has done since it started. All of it is cat-able from a minimal shell. A senior engineer who knows /proc can diagnose a hung process in under five minutes with no special tools installed. This lesson is how.


The Shape of /proc

/proc is a pseudo-filesystem: its contents are generated on the fly by the kernel when you read them. It has two kinds of entries:

  • Per-process directories (/proc/[pid] and /proc/self) — one per running process.
  • System-wide files (/proc/cpuinfo, /proc/meminfo, /proc/mounts, and so on).

# Top-level structure
ls /proc | head -20
# 1          <- PID 1 (systemd)
# 2          <- PID 2 (kthreadd)
# 100        <- some other process
# cpuinfo
# meminfo
# mounts
# loadavg
# version
# sys        <- /proc/sys (kernel tunables)
# ...

# /proc/self is a magic symlink that points to your own pid
ls -l /proc/self
# lrwxrwxrwx 1 admin admin 0 Apr 19 10:00 /proc/self -> 45678

# So you can inspect yourself
cat /proc/self/status | head -5
# Name:   cat
# State:  R (running)
# Tgid:   45678
# Ngid:   0
# Pid:    45678

Everything under /proc/[pid] is either a regular-looking file (readable with cat) or a symlink to something real (an open file, a CWD, an executable).

KEY CONCEPT

Reading from /proc/[pid]/* is the fastest way to answer "what is this process actually doing right now?" — faster than restarting it, faster than attaching a debugger, faster than adding logs and redeploying. However stripped-down the environment, /proc is always there, and most of the time it has everything you need.


The Map of /proc/[pid]

Here is the full per-process directory, with the most useful entries annotated. The ones marked ⭐ are the ones you will reach for constantly.

ls /proc/$PID/
# attr         ⭐ cmdline   ⭐ exe    ⭐ limits   mountinfo   numa_maps   personality   ⭐ smaps    ⭐ syscall
# autogroup    ⭐ comm      ⭐ fd     loginuid    mounts      oom_adj     root          stack       task
# ⭐ cgroup    cwd          fdinfo    ⭐ maps     net         oom_score   ⭐ sched      stat        timers
# clear_refs   environ      io        mem         ⭐ ns       pagemap     schedstat     ⭐ status   ⭐ wchan
File        What it tells you                                   Typical use
---------   -------------------------------------------------   --------------------------------------
cmdline     Command-line args, NUL-separated                    "What was this process started with?"
comm        Short name (15 chars)                               Quick identification
status      Name:, State:, Pid:, Uid:, Vm*:, Sig*:, etc.        The all-in-one human-readable overview
stat        Machine-parseable, space-separated counterpart      Scripts
cwd         Symlink to current working directory                "What dir is this cd'd into?"
exe         Symlink to the binary                               "What is actually running — even if the binary was deleted?"
environ     Environment variables, NUL-separated                "What env did it inherit?"
fd/         Directory of open file descriptors (symlinks)       Open files, sockets, pipes
fdinfo/     One file per fd with position, flags, etc.          Fine-grained fd state
maps        Memory regions (virtual address ranges)             "What's in this process's address space?"
smaps       Per-region Rss/Pss/Swap accounting                  Real memory accounting
limits      rlimits (ulimit) currently in effect                "Is this process limited to 1024 fds?"
cgroup      Which cgroups the process is in                     Container and systemd membership
ns/         Symlinks to namespace inodes                        Which namespaces the process is in
wchan       Kernel function the process is blocked in           "Why is it not running?"
syscall     Current syscall number and args                     "What syscall is it stuck in?"
stack       Kernel stack trace (root only)                      "Where exactly in the kernel?"
sched       Scheduler stats                                     Context switches, wait time
io          Cumulative read/write bytes                         "Is this process hammering the disk?"
task/       One subdirectory per thread                         Per-thread state of a multi-threaded process
mem         Live memory of the process (root only)              Dump memory with gdb or custom tools
mountinfo   Every mount visible to this process                 Differs per mount namespace — important for containers

Most of these are readable by the owning user; a few require root. You do not need to memorize the list — just remember that ls /proc/[pid] is your starting point and every file there is cat-able.
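As a starting point you can paste into any shell, here is a small helper (the procpeek name is my own, not a standard tool) that dumps the highest-value files for one PID:

```shell
# procpeek: dump the most useful /proc files for one PID.
# (Illustrative helper, not a standard tool.)
procpeek() {
  pid=$1
  for f in comm cmdline status wchan cgroup limits; do
    [ -r "/proc/$pid/$f" ] || continue       # skip anything unreadable
    printf '== %s ==\n' "$f"
    tr '\0' ' ' < "/proc/$pid/$f"; echo      # cmdline is NUL-separated
  done
  printf '== open fds: %s ==\n' "$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
}

procpeek $$    # inspect the current shell
```

Run it against a hung PID and you have steps 1 through 3 of the playbook below in one screen.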


The Hung Process Playbook

Here is the exact sequence to run when a process is hung. Five minutes, no extra tools.

1. What is it, and when did it start?

# What is the process?
cat /proc/$PID/comm
# python3                                <- same value as Name: in status, truncated to 15 chars

# Full command line
cat /proc/$PID/cmdline | tr '\0' ' '; echo
# /usr/bin/python3 /app/payment_service.py --config /etc/app/prod.yaml

# When did it start?
ls -ld /proc/$PID
# dr-xr-xr-x 9 app app 0 Apr 19 02:14 /proc/45678    <- started at 02:14
# (note: "0" size is normal for /proc)

# Or read it directly from stat (field 22 is start time in jiffies since boot)
awk '{print $22}' /proc/$PID/stat
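That value is in clock ticks since boot, so turning it into a wall-clock date takes two more ingredients: the tick rate from getconf CLK_TCK and the boot time from the btime line of /proc/stat. A sketch (assumes GNU date for the -d @epoch form):

```shell
PID=${PID:-$$}
# comm (field 2) may contain spaces, so strip everything through the
# closing ')': after the strip, overall field 22 becomes field 20.
ticks=$(sed 's/.*) //' "/proc/$PID/stat" | awk '{print $20}')
hz=$(getconf CLK_TCK)                          # ticks per second, usually 100
btime=$(awk '/^btime/{print $2}' /proc/stat)   # boot time, seconds since epoch
start=$((btime + ticks / hz))
date -d "@$start"                              # the process's start time
```

The sed strip matters in real scripts: a comm like "Web Content" would shift the naive field count.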

2. What state is it in?

cat /proc/$PID/status | head -20
# Name:   python3
# State:  S (sleeping)                        <- not running — blocked somewhere
# Tgid:   45678
# Pid:    45678
# PPid:   1
# Uid:    1000    1000    1000    1000
# Gid:    1000    1000    1000    1000
# FDSize: 128
# Groups: 1000
# VmPeak:   1258432 kB
# VmSize:   1258432 kB
# VmRSS:     483128 kB
# Threads:      34                            <- multi-threaded
# SigQ:   0/31389
# SigPnd: 0000000000000000
# SigBlk: 0000000000000000
# SigIgn: 0000000000001000
# SigCgt: 0000000180004a07                    <- has handlers installed

States and what they mean:

  • R (running) — on a CPU or in the runqueue
  • S (sleeping) — waiting on something, wakeable by signals
  • D (disk sleep) — uninterruptible sleep, almost always I/O; cannot be killed, not even with SIGKILL
  • T (stopped) — received SIGSTOP or SIGTSTP
  • Z (zombie) — exited, parent has not reaped
  • I (idle) — kernel thread idling
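Those single letters also give the fastest temperature read of a whole host; a pile-up of D-state processes, for instance, points at a storage problem. A sketch that histograms every process's state:

```shell
# Count processes by state letter, system-wide.
# 2>/dev/null hides processes that exit mid-scan (normal for /proc).
awk '/^State:/{print $2}' /proc/[0-9]*/status 2>/dev/null | sort | uniq -c | sort -rn
```

On a healthy box almost everything is S; more than a handful of D entries during an incident is a strong signal.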

3. If it is blocked, what on?

# What kernel function is it blocked in?
cat /proc/$PID/wchan
# futex_wait_queue_me                    <- waiting on a mutex/condvar
# do_epoll_wait                           <- idle event loop
# sk_wait_data                            <- waiting on socket data
# do_nanosleep                            <- sleeping

# What syscall is currently in progress?
cat /proc/$PID/syscall
# 202 0x7f1b... 0x80 0x0 0x0 0x0 ...
# ^^ syscall number. Look it up with ausyscall (from the audit package) or your arch's syscall table
ausyscall 202
# futex

# So this process is stuck inside a futex call — almost always a lock

PRO TIP

wchan + syscall together tell you exactly why a process is not running. futex is a lock. epoll_wait is an event loop idling (fine!). read or recvfrom is waiting on I/O. nanosleep is voluntarily sleeping. For a "stuck" process, a wchan of futex_wait_queue_me means some other thread holds a lock and never released it — now you need to find that thread.

4. Look at every thread

# Show each thread's state and wchan
for tid in $(ls /proc/$PID/task); do
  state=$(awk '/^State:/{print $2$3}' /proc/$PID/task/$tid/status)
  wchan=$(cat /proc/$PID/task/$tid/wchan)
  echo "$tid  $state  $wchan"
done
# 45678  S(sleeping)     futex_wait_queue_me
# 45679  S(sleeping)     futex_wait_queue_me
# 45680  R(running)      0
# 45681  S(sleeping)     futex_wait_queue_me
# ...

# htop shows the same with 'H' to toggle thread view

If 33 threads are all stuck in futex and 1 thread is running hot, that one thread is holding the lock everyone else wants. That is the thread to profile.
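Which thread is the hot one? Per-thread CPU time lives in fields 14 (utime) and 15 (stime) of each task's stat file, per proc(5). A sketch that ranks threads by ticks consumed (the thread_cpu name is mine):

```shell
# thread_cpu: rank the threads of a PID by CPU ticks consumed (utime+stime).
thread_cpu() {
  pid=$1
  for tid in $(ls "/proc/$pid/task" 2>/dev/null); do
    # comm (field 2) may contain spaces; strip through the closing ')'.
    # After the strip, $12 = utime and $13 = stime (overall fields 14, 15).
    sed 's/.*) //' "/proc/$pid/task/$tid/stat" 2>/dev/null |
      awk -v t="$tid" '{print $12 + $13, t}'
  done | sort -rn
}

thread_cpu $$ | head -5    # top 5 CPU-consuming threads: ticks, then tid
```

Run it twice a few seconds apart: the tid whose tick count is climbing is the one doing the work.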

5. What files and sockets does it have open?

# Every open file descriptor
ls -l /proc/$PID/fd | head -15
# lr-x------ 1 app app 64 Apr 19 02:14 0 -> /dev/null
# l-wx------ 1 app app 64 Apr 19 02:14 1 -> pipe:[91283]
# l-wx------ 1 app app 64 Apr 19 02:14 2 -> pipe:[91284]
# lrwx------ 1 app app 64 Apr 19 02:14 3 -> socket:[123456]
# lrwx------ 1 app app 64 Apr 19 02:14 4 -> anon_inode:[eventpoll]
# lr-x------ 1 app app 64 Apr 19 02:14 5 -> /etc/app/prod.yaml
# lrwx------ 1 app app 64 Apr 19 02:14 6 -> socket:[123460]
# ...

# How many fds open?
ls /proc/$PID/fd | wc -l
# 217

# Compare to the limit
grep 'Max open files' /proc/$PID/limits
# Max open files            1024                 4096                 files
#                           ^ soft limit         ^ hard limit

# Which socket is fd 3?
sudo ss -p | grep "$PID,fd=3"
# tcp  ESTAB  0  0  10.0.1.5:50123  10.0.99.1:5432  users:(("python3",pid=45678,fd=3))
# So fd 3 is a TCP connection to 10.0.99.1:5432 — the database

WAR STORY

A service mysteriously hung after running for exactly 7 days. /proc/[pid]/fd showed 1024 open file descriptors — exactly the soft rlimit. Every fd pointed at a socket to the same external API. A retry loop was leaking one TCP connection per network hiccup and never closing them. No error message, no log line — the process just silently stopped accepting work once accept() started returning EMFILE. ls /proc/[pid]/fd | wc -l and grep 'Max open files' /proc/[pid]/limits took 10 seconds to find. The fix was a with block around the HTTP client. Since then, every production service gets an alert when its open fd count crosses 50% of its limit.
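That alert is a few lines of shell. A sketch of the check (the 50% threshold is just the number from the story):

```shell
# Warn when a process's open-fd count crosses half of its soft limit.
PID=${PID:-$$}
used=$(ls "/proc/$PID/fd" 2>/dev/null | wc -l)
soft=$(awk '/^Max open files/{print $4}' "/proc/$PID/limits")
pct=$((used * 100 / soft))
if [ "$pct" -ge 50 ]; then
  echo "WARN: pid $PID at $pct% of fd limit ($used/$soft)"
else
  echo "OK: pid $PID at $pct% of fd limit ($used/$soft)"
fi
```

In practice you would feed the percentage into whatever metrics pipeline you already have rather than echoing it.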

6. What is in memory?

# High-level layout
cat /proc/$PID/maps | head -10
# 556a... r--p ... /usr/bin/python3.11         <- text
# 556a... r-xp ... /usr/bin/python3.11
# 556a... rw-p ... [heap]                      <- where malloc grows
# 7f... rw-p ... [stack:45679]                 <- thread 45679's stack
# 7f... r-xp ... /usr/lib/.../libssl.so.3
# 7f... rw-p ...                               <- anonymous, probably malloc/mmap
# ...

# Detailed per-region memory accounting
head -20 /proc/$PID/smaps
# 556a... r--p ... /usr/bin/python3.11
# Size:                  4 kB
# Rss:                   4 kB                  <- pages in RAM
# Pss:                   1 kB                  <- proportional share (Rss split across the 4 processes mapping it)
# Shared_Clean:          4 kB
# Shared_Dirty:          0 kB
# Private_Clean:         0 kB
# Private_Dirty:         0 kB
# Referenced:            4 kB
# Anonymous:             0 kB
# ...

# Total PSS across the whole process
awk '/^Pss:/ {sum+=$2} END {print sum" kB"}' /proc/$PID/smaps
# 512340 kB                                    <- actual memory cost

# Detecting swap usage
awk '/^Swap:/ {sum+=$2} END {print sum" kB"}' /proc/$PID/smaps
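On kernels 4.14 and newer, /proc/[pid]/smaps_rollup pre-aggregates these per-region totals so you can skip the awk. A sketch with a fallback for older kernels (the mem_totals name is mine):

```shell
# mem_totals: real memory cost of a PID (Rss/Pss/Swap), using smaps_rollup
# when the kernel (>= 4.14) provides it.
mem_totals() {
  pid=$1
  if [ -r "/proc/$pid/smaps_rollup" ]; then
    # One pre-summed record for the whole process
    grep -E '^(Rss|Pss|Swap):' "/proc/$pid/smaps_rollup"
  else
    # Older kernels: sum the per-region records by hand
    awk '/^Rss:/{r+=$2} /^Pss:/{p+=$2} /^Swap:/{s+=$2}
         END{printf "Rss: %d kB\nPss: %d kB\nSwap: %d kB\n", r, p, s}' "/proc/$pid/smaps"
  fi
}

mem_totals $$
```

The rollup file is also much cheaper to read, which matters if a monitoring agent polls it every few seconds.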

7. I/O and scheduler stats

# Cumulative I/O
cat /proc/$PID/io
# rchar: 458092340              <- bytes read (including from page cache)
# wchar: 891234                 <- bytes written
# syscr: 5821                   <- read/write syscall counts
# syscw: 203
# read_bytes: 16384             <- actual bytes from disk
# write_bytes: 0
# cancelled_write_bytes: 0

# Scheduler stats
head -20 /proc/$PID/sched
# python3 (45678, #threads: 34)
# ...
# se.sum_exec_runtime              :   1842.193421
# se.nr_migrations                 :          2
# nr_voluntary_switches            :       2847  <- yielded voluntarily (I/O)
# nr_involuntary_switches          :        193  <- preempted by scheduler
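The same counters surface at the bottom of status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches, and totals only become meaningful as rates; sample twice and diff. A sketch:

```shell
# Sample the context-switch counters twice to turn totals into a rate.
PID=${PID:-$$}
snap() { awk '/ctxt_switches/{print $1, $2}' "/proc/$PID/status"; }
echo "t=0s:"; snap
sleep 1
echo "t=1s:"; snap    # the deltas are switches per second
```

A hung process shows near-zero deltas; a thrashing one shows thousands of involuntary switches per second.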

8. Which cgroup, which namespaces?

# cgroup membership — on cgroup v2 it's one line
cat /proc/$PID/cgroup
# 0::/system.slice/docker-abc123.scope
# (on v1 you get one line per subsystem: memory, cpu, etc.)

# Namespace membership — each is a symlink with a unique inode
ls -l /proc/$PID/ns/
# lrwxrwxrwx 1 app app 0 ... cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 app app 0 ... ipc    -> 'ipc:[4026532152]'
# lrwxrwxrwx 1 app app 0 ... mnt    -> 'mnt:[4026532150]'
# lrwxrwxrwx 1 app app 0 ... net    -> 'net:[4026532155]'
# lrwxrwxrwx 1 app app 0 ... pid    -> 'pid:[4026532153]'
# lrwxrwxrwx 1 app app 0 ... user   -> 'user:[4026531837]'
# lrwxrwxrwx 1 app app 0 ... uts    -> 'uts:[4026532151]'

# Two processes with the same inode for pid:[...] are in the same PID namespace
# (meaning, usually, the same container)

This is a huge deal: if two processes show the same inode in their pid:[...] links, they are in the same PID namespace — usually the same container. A different inode means a different namespace, which usually means a different container (or the host).

We cover namespaces in depth in Module 5; for now, just know /proc/[pid]/ns/ is where you check.
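Checking whether two processes share a namespace is just comparing symlink targets. A sketch (the same_ns helper is mine; readlink needs permission to inspect both processes):

```shell
# same_ns: do two PIDs share the given namespace type?
same_ns() {   # usage: same_ns <type> <pid1> <pid2>
  a=$(readlink "/proc/$2/ns/$1") && b=$(readlink "/proc/$3/ns/$1") &&
    [ -n "$a" ] && [ "$a" = "$b" ]
}

sleep 5 & child=$!
same_ns pid $$ "$child" && echo "same pid namespace"   # a forked child always is
kill "$child"
```

Swap "pid" for "net" or "mnt" to check network or mount namespace membership the same way.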


System-Wide /proc Files You Will Use

Not under a PID, but equally useful for production debugging.

# CPU info — useful for sizing, NUMA, and feature detection
cat /proc/cpuinfo | grep -E '^(processor|model name|cache size|cpu MHz)' | head

# Memory — used/free/cached/buffers at a glance
cat /proc/meminfo | head -10
# MemTotal:       32893400 kB
# MemFree:         1823440 kB
# MemAvailable:   18435212 kB
# Buffers:          120456 kB
# Cached:         14280432 kB
# ...

# Load average and process counts
cat /proc/loadavg
# 0.52 0.68 0.71 2/485 48923
#                ^ ^   ^
#                | |   +-- last PID created
#                | +-- total processes/threads
#                +-- currently runnable

# All currently mounted filesystems (from the kernel's point of view)
cat /proc/mounts | head -5

# Kernel version and build info
cat /proc/version

# Interrupts per CPU
cat /proc/interrupts | head

# Raw TCP socket table — addresses, ports, and states in hex (ss/netstat decode this)
cat /proc/net/tcp | head

PRO TIP

/proc/sys/ is the control surface for kernel tunables — it is both readable (current value) and writable (change it). echo 1 > /proc/sys/net/ipv4/ip_forward turns on routing. cat /proc/sys/kernel/pid_max tells you the max PID. Almost every "tune the kernel" recipe you will find on the internet is really "write a value into a file under /proc/sys". The sysctl command is a nicer frontend for the same files.
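A concrete read-through, using kernel.pid_max (the write and persistence steps are shown as comments because they need root and change host state):

```shell
# The same tunable through the file and through sysctl
cat /proc/sys/kernel/pid_max
# sysctl -n kernel.pid_max       # identical value, nicer syntax

# To change it now (root; lost on reboot):
#   echo 4194304 > /proc/sys/kernel/pid_max
#   sysctl -w kernel.pid_max=4194304
# To persist across reboots, drop a file under /etc/sysctl.d/ and reload:
#   echo 'kernel.pid_max = 4194304' > /etc/sysctl.d/99-pidmax.conf
#   sysctl --system
```

The dotted sysctl name is just the file path under /proc/sys/ with slashes replaced by dots.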


Recovering a Deleted Binary or Log

A classic /proc trick: if a process has a file open and someone deletes the file on disk, you can still recover the content through /proc/[pid]/fd.

# Imagine: nginx is running, someone ran "rm /var/log/nginx/access.log"
# Nginx still holds fd 5 open pointing at the deleted file
ls -l /proc/$(pgrep -f 'nginx: master')/fd | grep deleted
# l-wx------ 1 root root 64 Apr 19 10:00 5 -> /var/log/nginx/access.log (deleted)

# Recover the content while nginx is still running
cp /proc/$(pgrep -f 'nginx: master')/fd/5 /tmp/recovered-access.log
# Now /tmp/recovered-access.log has everything nginx has written so far

# Same trick for a deleted binary
cp /proc/$PID/exe /tmp/recovered-binary

This is the single most important /proc trick nobody mentions. Worth filing away.
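To find every such deleted-but-open file on a host in one pass, scan the fd symlinks for the (deleted) suffix. A sketch (the find_deleted name is mine):

```shell
# find_deleted: list deleted files that some process still holds open.
# Permission errors (other users' fds) are silently skipped.
find_deleted() {
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      *' (deleted)') echo "$fd -> $target" ;;
    esac
  done
}

find_deleted
```

This is also how you find "disk full but du disagrees" situations: a deleted-but-open log file still consumes its blocks until the last fd closes.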


Key Concepts Summary

  • /proc is the kernel's live UI. Reading a file under /proc runs kernel code that generates fresh state — nothing is cached on disk.
  • Every running process has a /proc/[pid] directory. /proc/self is a magic symlink to your own.
  • The "debug a hung process" checklist is cat-only: cmdline → status → wchan → syscall → fd/ → limits → maps/smaps.
  • Process state decodes the behavior. R/S/D/T/Z each mean something specific; D is the only one that cannot be killed.
  • /proc/[pid]/task/ lists every thread. Per-thread wchan and status tell you which thread is holding the lock.
  • /proc/[pid]/fd plus ss -p tells you every network connection. lsof is a nicer interface over the same data.
  • /proc/[pid]/cgroup and /proc/[pid]/ns/ tell you which container and which namespaces the process belongs to.
  • /proc/[pid]/fd/[N] can recover deleted files as long as the process is still holding the fd open.
  • System-wide files (cpuinfo, meminfo, loadavg, mounts, sys/) cover the rest of the host.

Common Mistakes

  • Looking at VmSize and panicking — that is virtual address space, most of which may never be touched. VmRSS is what is in RAM.
  • Reading /proc/[pid]/status as "the process's total state" and ignoring /proc/[pid]/task/*/status. A multi-threaded process's interesting state is usually per-thread.
  • Treating /proc/[pid]/stat as user-friendly. It is not — it is machine-parseable. For humans, /proc/[pid]/status has labels.
  • Writing to /proc/sys/ files without knowing whether the change persists. It does not — reboot loses it. Use /etc/sysctl.conf or /etc/sysctl.d/*.conf for persistence.
  • Forgetting that /proc/[pid] disappears instantly when the process exits — scripts reading it must handle "directory vanished" as a normal case.
  • Using ps (which reads /proc under the hood) and then ignoring /proc itself when ps does not show what you need. Half the columns ps could show are just there in status waiting for you.
  • Assuming /proc/[pid]/cmdline is always populated. For kernel threads and some short-lived helpers it is empty; fall back to comm.
  • Reading /proc/mounts to see "my mounts" from inside a container — you will see the container's mount namespace view, which is often different from the host's. That is a feature, not a bug.
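The "directory vanished" bullet deserves code: a /proc-reading script should treat a missing PID as data, not as an error. A minimal sketch (proc_read is my name):

```shell
# proc_read: read a /proc file, treating "process vanished" as a normal,
# empty result rather than an error.
proc_read() {   # usage: proc_read <pid> <file>
  cat "/proc/$1/$2" 2>/dev/null || true
}

state=$(proc_read $$ status | awk '/^State:/{print $2}')
gone=$(proc_read 999999999 status)   # above PID_MAX_LIMIT, so it can never exist
echo "live pid state: $state"
echo "dead pid output: '${gone}'"    # empty, and the script keeps going
```

Monitoring loops that iterate over /proc/[0-9]* should wrap every read this way.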

KNOWLEDGE CHECK

A Python service is hung. You cat /proc/$PID/wchan and see `futex_wait_queue_me`. You check each thread under /proc/$PID/task/*/wchan and find 33 threads in `futex_wait_queue_me` and exactly one thread with `wchan` value `0` (running). What is the most likely situation and what is your next step?