Network Debugging
A new Kubernetes service works in staging and fails in production with intermittent connection timeouts — about 2% of requests. APM traces show the service is the client: some outbound HTTP calls complete in 50 ms, others hang for 30 seconds before timing out. Nothing in the application logs. Nothing in the service mesh metrics. The infra team says "network is fine." You ssh into a pod, run `ss -s`, and see `timewait 62000`. The socket pool is exhausted; new outbound connections are waiting for TIME-WAIT sockets to drain. Ten minutes of network-level debugging found what ten hours of application-level debugging never would.
"The network" as a debugging surface is huge. Linux gives you more than a dozen tools for it, and a senior engineer knows which one to reach for in which situation. This lesson is the actual toolkit — `ip`, `ss`, `nstat`, `tcpdump`, `conntrack`, `mtr`, and a few others — organized by the class of problem each one diagnoses. If you made it through Module 3 of the Networking Fundamentals course, this is the implementation side: the commands you run, the output you look at, and how to connect them to the OSI-layer debugging mindset.
The Layered Approach (30-Second Recap)
From Networking Fundamentals, the bottom-up approach: check Layer 1/2 → Layer 3 → Layer 4 → Layer 7, stop at the first layer that is broken. On Linux, each layer has its own command set:
| Layer | Question | Commands |
|---|---|---|
| 1/2 | Is the interface up? | `ip link`, `ethtool`, `ip -s link` |
| 3 | Can I reach the IP? | `ping`, `ip route`, `traceroute`, `mtr` |
| 4 | Can I connect to the port? | `ss -tlnp`, `nc -zv`, `curl -v` |
| 7 | Does the service respond correctly? | `curl`, `dig`, `openssl s_client`, `tcpdump` |
This lesson focuses on the commands and their less-obvious uses.
ip — Interfaces and Routing
ip is the modern replacement for ifconfig and route. One command, many sub-commands.
# Interfaces and their state
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
# 3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
# ^^^^^^^^^^^ no carrier means no cable/peer
# Addresses (IPv4 and IPv6)
ip addr # or `ip a`
ip -4 addr # IPv4 only
ip -br addr # one-line per interface (great for scripting)
# eth0 UP 10.0.1.23/24 fe80::...
# docker0 DOWN 172.17.0.1/16
# Routes
ip route # or `ip r`
# default via 10.0.1.1 dev eth0 proto dhcp src 10.0.1.23 metric 100
# 10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.23 metric 100
# 172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
# Which route would be used to reach a specific address?
ip route get 1.1.1.1
# 1.1.1.1 via 10.0.1.1 dev eth0 src 10.0.1.23 uid 1000
# ARP table — IP-to-MAC on the local segment
ip neigh
# 10.0.1.1 dev eth0 lladdr aa:bb:cc:dd:ee:ff REACHABLE
# 10.0.1.52 dev eth0 lladdr ... STALE
# ^^^^^ if many are STALE/INCOMPLETE, ARP trouble
# Statistics including errors and drops
ip -s link show eth0
# 2: eth0: ...
# RX: bytes packets errors dropped overrun mcast
# 5031234567 123456 0 0 0 234
# TX: bytes packets errors dropped carrier collsns
# 8910234567 234567 0 0 0 0
`ip route get DEST` is the shortest answer to "which interface and gateway will Linux use to reach X?" It is deterministic — it runs the same lookup the kernel runs on every packet — so when routing looks wrong, this tells you what the kernel actually decided.
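In scripts it is often handy to pull just the egress interface out of that answer. A minimal sketch, assuming iproute2's plain-text output format; the `route_dev` helper name is mine, not a standard tool:

```shell
# Extract the egress interface from `ip route get` output (reads stdin).
# Usage: ip route get 1.1.1.1 | route_dev
route_dev() {
  # the field right after the "dev" keyword is the interface name
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

For example, `ip route get 1.1.1.1 | route_dev` would print something like `eth0`.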
When a link is down
# Everything about an interface
ethtool eth0
# Settings for eth0:
# Supported ports: [ FIBRE ]
# Speed: 10000Mb/s
# Duplex: Full
# Link detected: yes
# ^^^ Layer 1 state
# Driver info
ethtool -i eth0
# driver: virtio_net
# version: 1.0.0
# firmware-version:
# NIC-level counters (the real story)
ethtool -S eth0 | grep -iE 'drop|err|overrun|miss' | grep -v ': 0$'
# rx_dropped: 12
# rx_missed_errors: 48
# tx_carrier_errors: 0
ss — The Socket Statistics Swiss Army Knife
ss replaces the old netstat. It reads sockets directly from the kernel via netlink and is much faster on busy hosts.
# Every TCP socket, with listening ports and the process owning each
ss -tanp   # -a already includes listening sockets; adding -l would restrict to listening only
# State Recv-Q Send-Q Local:Port Peer:Port Users
# LISTEN 0 128 *:22 *:* users:(("sshd",pid=1234,fd=3))
# ESTAB 0 0 10.0.1.23:50124 10.0.1.50:443 users:(("curl",pid=5678,fd=3))
# TIME-WAIT 0 0 10.0.1.23:50125 10.0.1.50:443
# ...
# Only listening sockets (what am I serving?)
ss -tlnp
# Only established (who am I currently talking to?)
ss -tnp state established
# Only TIME-WAIT (any socket exhaustion?)
ss -tnp state time-wait
# UDP (-u instead of -t)
ss -unlp
# Unix sockets
ss -xp
# Aggregated summary
ss -s
# Total: 62182
# TCP:   62145 (estab 32, closed 62098, orphaned 0, timewait 62000)
#                                                   ^^^^^^^^^^^^^^ 62000 TIME-WAIT = trouble
# Transport Total IP IPv6
# RAW 0 0 0
# UDP 5 2 3
# TCP 48 42 6
# INET 53 44 9
# FRAG 0 0 0
Common filtering
# Who is connected to port 5432 (postgres)?
ss -tnp '( dport = :5432 or sport = :5432 )'
# Any connections in problematic states?
ss -tn state close-wait # peer closed, we didn't — application bug
ss -tn state last-ack # we closed, waiting for final ack
ss -tn state syn-sent # stuck SYN = firewall or unreachable
# Very useful: all sockets held by a specific PID
ss -tnp | grep "pid=$PID"
# Top 10 hosts by number of connections from this box
ss -tn | awk 'NR>1 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
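For a quick state histogram without parsing `ss -s` by eye, you can tally the states yourself. A sketch reading `ss -tan` output on stdin; the `count_states` helper name is mine:

```shell
# Tally TCP connection states from `ss -tan` output (reads stdin).
# Usage: ss -tan | count_states
count_states() {
  # skip the header line, count occurrences of column 1 (the state)
  awk 'NR > 1 { c[$1]++ } END { for (s in c) print c[s], s }' | sort -rn
}
```

A sudden jump in the `CLOSE-WAIT` or `TIME-WAIT` line of this output is the thing to chase first.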
Connection states, what they mean, and what to do
| State | Meaning | Action |
|---|---|---|
| LISTEN | Listening for new connections | Normal for servers |
| ESTAB | Active connection, both sides sending | Normal |
| SYN-SENT | Outbound SYN sent, no reply yet | If many: firewall or peer down |
| SYN-RECV | Inbound SYN received, SYN-ACK sent, no ACK | If many: SYN flood or slow client |
| FIN-WAIT-1, FIN-WAIT-2 | Our side closing | Large counts: peer slow to close |
| CLOSE-WAIT | Peer closed, we haven't | Almost always an application bug |
| LAST-ACK | We closed, waiting for final ACK | Normal, brief |
| TIME-WAIT | Closed, keeping 4-tuple reserved to catch duplicate packets | Many expected; trouble if exhausting ports |
A growing number of CLOSE-WAIT connections is always a bug in your application, not a network problem. The remote peer closed the socket; your app has not called close() on it. The socket will never go away on its own. Find the process with ss -tp state close-wait, then fix the code to close sockets after use (defer, with-block, try/finally — every language has its version).
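Finding the offending process can be scripted. A sketch that pulls the PIDs out of `ss -tp state close-wait` output; the `leaking_pids` helper name is mine:

```shell
# List PIDs holding CLOSE-WAIT sockets (reads `ss -tp state close-wait` on stdin).
# Usage: ss -tp state close-wait | leaking_pids
leaking_pids() {
  # a line may name several users:(("proc",pid=N,fd=M),...) entries;
  # grep -o emits each pid= match on its own line, sort -un deduplicates
  grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -un
}
```

Then `ps -p <pid>` tells you which service's socket handling to fix.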
Debugging Connectivity End to End
Layer 3 tools
# Basic reachability
ping -c 3 -W 2 10.0.1.50
# Path trace
traceroute 10.0.1.50
# or the much better variant
mtr -rn 10.0.1.50 # report mode: a fixed number of probes per hop with loss %; omit -r for the live view
# 1. 10.0.1.1 0.0% loss 0.3 ms
# 2. 10.255.0.1 0.0% loss 1.2 ms
# 3. 10.1.2.3 30.0% loss 15.4 ms <- packet loss starts here
# 4. ...
# Is the gateway even reachable at L2?
arping -c 3 -I eth0 10.0.1.1
Layer 4 tools
# Can I open a TCP connection?
nc -zv 10.0.1.50 443
# Connection to 10.0.1.50 443 port [tcp/https] succeeded!
# With timeout
nc -zv -w 2 10.0.1.50 443
# Actually do something on the port — test a protocol
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' | nc -w 5 example.com 80
# Alternative: bash's /dev/tcp (covered in Module 1 Lesson 2)
exec 3<>/dev/tcp/example.com/80
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&3
cat <&3
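The `/dev/tcp` trick hangs for a long time on filtered ports, so for scripted checks it is worth wrapping in `timeout`. A sketch assuming bash and coreutils `timeout`; the `tcp_check` helper name is mine:

```shell
# Pure-bash TCP connect check with a timeout; no nc required.
# Usage: tcp_check HOST PORT [TIMEOUT_SECONDS]
tcp_check() {
  # /dev/tcp is a bash feature; run the connect in a child bash under
  # `timeout` so a filtered (silently dropped) port fails fast.
  timeout "${3:-2}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

`tcp_check 10.0.1.50 443 && echo open` gives you the same answer as `nc -zv -w 2` on boxes where nc is not installed.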
Layer 7 tools
# HTTP
curl -v --max-time 10 https://api.example.com/health
# Shows: DNS resolution, TCP connect, TLS handshake, HTTP request, HTTP response
# If anything goes wrong, the line right before failure tells you which layer
# Useful curl flags for debugging
curl -v -o /dev/null -s -w '%{http_code} %{time_total}s %{time_connect}s %{time_starttransfer}s\n' https://api.example.com/health
# 200 0.245s 0.020s 0.180s
# (total) (tcp) (first byte)
# DNS
dig example.com
dig +short @8.8.8.8 example.com
dig +trace example.com # full recursive resolution
dig -x 10.0.1.1 # reverse DNS
# TLS
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
| openssl x509 -noout -dates -subject -issuer
`curl -w` is magical and underused. `-w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n'` prints the elapsed seconds for each phase of the request. If you see `time_namelookup: 5.0` in a 5-second request, DNS is your problem; if `time_appconnect` (TLS handshake) dominates, it's TLS; and so on. One invocation, whole-request breakdown.
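Since the `-w` values are cumulative timestamps, attributing latency means looking at the deltas between them. A sketch that names the slowest phase from those five numbers, assuming an HTTPS request (for plain HTTP, `time_appconnect` is 0 and the TLS delta is meaningless); the `slowest_phase` helper name is mine:

```shell
# Name the slowest phase of a request, given the five curl -w timings:
#   curl -o /dev/null -s -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' URL | slowest_phase
slowest_phase() {
  awk '{
    # curl timings are cumulative, so subtract to get per-phase durations
    d["dns"]      = $1
    d["tcp"]      = $2 - $1
    d["tls"]      = $3 - $2
    d["server"]   = $4 - $3
    d["transfer"] = $5 - $4
    best = "dns"
    for (p in d) if (d[p] > d[best]) best = p
    printf "%s (%.3fs)\n", best, d[best]
  }'
}
```

Feeding it `0.020 0.040 0.180 0.900 0.950` points at the server (time to first byte), not the network.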
tcpdump — Packet Capture
When you need to see the bytes on the wire.
# Capture 10 packets on eth0 to/from 10.0.1.50
sudo tcpdump -n -c 10 -i eth0 host 10.0.1.50
# Only TCP traffic on port 443
sudo tcpdump -n -i eth0 'tcp and port 443'
# Host and port combined
sudo tcpdump -n -i eth0 'host 10.0.1.50 and port 5432'
# Only SYN packets (helpful for debugging connection problems)
sudo tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0'
# Traffic on a non-default port, printing payload (useful for plain-text protocols)
sudo tcpdump -n -i eth0 -A 'port 8080'
# Write to a pcap file for later analysis (open in Wireshark)
sudo tcpdump -n -i eth0 -w /tmp/trace.pcap 'port 5432'
# Then: wireshark /tmp/trace.pcap on your workstation
# Read a pcap back
sudo tcpdump -r /tmp/trace.pcap -n
Common BPF filter syntax:
- `host 1.2.3.4`, or `src host` / `dst host`
- `net 10.0.0.0/8`
- `port 80`, `src port`, `dst port`
- `tcp`, `udp`, `icmp`
- Boolean combinations: `and`, `or`, `not`, parentheses
When not to use tcpdump
If you just want to see which connections exist, use ss. tcpdump is for seeing the actual protocol interaction — SYN/ACK timing, retransmits, payload content, what a middlebox rewrote. Do not use it as a connection list — it is overkill and misses idle connections entirely.
A Kubernetes service mesh installation started dropping ~10% of requests between pods. The mesh dashboard showed elevated 5xx but no root cause. A 30-second tcpdump on a node with -w to a pcap, opened in Wireshark, showed the pattern: the client sent HTTP/2 SETTINGS frames that never got ACKed by the sidecar, leading to connection resets after 15 seconds. The sidecar binary was the wrong minor version and did not handle a specific SETTINGS option. Without packet capture, the root cause would have been invisible — application-layer metrics cannot see what happens during the TCP/HTTP handshake. Packet capture is slow but tells the truth.
nstat and /proc/net/snmp — Kernel Networking Counters
Every major kernel networking event is counted. netstat -s shows them, but nstat is more modern and compact.
# nstat prints the delta since its previous invocation (history kept in /tmp)
nstat
# Run it repeatedly, or wrap it in `watch -n1 nstat`, to see what is spiking right now
# Look for specific counters
nstat -az 'Tcp*Retrans*'   # retransmit-related counters (quote the glob)
nstat -az TcpRetransSegs
# Classic summary
netstat -s | head -30
# Tcp:
# 1234 active connections openings
# 5678 passive connection openings
# 12 failed connection attempts
# 3 connection resets received
# 150 connections established
# 42000 segments received
# 38500 segments sent out
# 123 segments retransmitted
# 4 bad segments received
# 45 resets sent
Key counters to watch:
- `Tcp.RetransSegs` — TCP retransmits. Climbing = packet loss somewhere.
- `TcpExt.ListenOverflows` / `TcpExt.ListenDrops` — accept queue full. Raise `net.core.somaxconn` / backlog.
- `TcpExt.TCPTimeWaitOverflow` — ran out of TIME-WAIT slots. Tune `tcp_max_tw_buckets`.
- `UdpSndbufErrors` / `UdpRcvbufErrors` — socket buffer overrun. Raise `net.core.rmem_max` / `wmem_max`.
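The retransmit counters are cumulative since boot, so the useful signal is the rate relative to segments sent. A sketch that turns two counter readings into a percentage; the `retrans_pct` helper name is mine:

```shell
# Retransmit percentage: retransmitted segments as a share of segments sent.
# Usage: retrans_pct OUT_SEGS RETRANS_SEGS
# (values from `netstat -s` or `nstat -az TcpOutSegs TcpRetransSegs`)
retrans_pct() {
  awk -v out="$1" -v re="$2" 'BEGIN { printf "%.2f%%\n", 100 * re / out }'
}
```

With the sample `netstat -s` numbers above (38500 sent, 123 retransmitted) that is about 0.32%; a sustained rate much higher than that is worth chasing as real packet loss.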
conntrack — Stateful Firewall Table
Linux's NAT/stateful firewall keeps a connection tracking table. When it fills, new connections get dropped silently — a catastrophic failure mode on busy NAT gateways.
# Size and usage
sudo conntrack -C
# 85432 <- current count
cat /proc/sys/net/nf_conntrack_max
# 262144 <- max
# If current approaches max, you are about to drop connections
# Current connections by protocol
sudo conntrack -L | awk '{print $1}' | sort | uniq -c
# Look at specific connections
sudo conntrack -L -p tcp --dport 443 | head
# Raise the limit (not for the faint of heart — also raises table memory)
echo 524288 | sudo tee /proc/sys/net/nf_conntrack_max
# Reduce the TIME-WAIT timeout if short-lived connections are exhausting conntrack
echo 30 | sudo tee /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait
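What actually matters is headroom: the current count as a fraction of the max. A sketch that combines the two values from the commands above into a usage check; the `conntrack_usage` helper name and the 80% threshold are my own choices:

```shell
# Warn when the conntrack table is close to full.
# Usage: conntrack_usage CURRENT MAX [THRESHOLD_PCT]
#   e.g. conntrack_usage "$(sudo conntrack -C)" "$(cat /proc/sys/net/nf_conntrack_max)"
conntrack_usage() {
  awk -v cur="$1" -v max="$2" -v thr="${3:-80}" 'BEGIN {
    pct = 100 * cur / max
    printf "%.1f%% used%s\n", pct, (pct >= thr ? " -- WARNING: near nf_conntrack_max" : "")
  }'
}
```

A check like this belongs in node monitoring, because by the time connections are dropping the kernel log line (`nf_conntrack: table full, dropping packet`) is the only other clue.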
On systems doing a lot of short-lived connections behind NAT (load balancers, NAT gateways, k8s nodes with many Services), conntrack is often the silent culprit.
DNS-Specific Debugging
DNS is one of the top sources of "flaky network" reports. Always check it explicitly.
# Resolve using system resolver
getent hosts example.com
# 93.184.216.34 example.com
# Check /etc/resolv.conf in this mount namespace (differs in containers!)
cat /etc/resolv.conf
# nameserver 8.8.8.8
# options ndots:5
# Raw DNS with a specific server
dig @1.1.1.1 example.com A
# Recursive trace — see every step
dig +trace example.com
# Measure resolution time
dig +stats example.com | grep 'Query time'
# ;; Query time: 23 msec
# On musl-based systems (Alpine), resolver behavior differs subtly from glibc:
# musl queries all nameservers in parallel and ignores some resolv.conf options.
# `getent` vs `dig` results diverging is sometimes a glibc-vs-musl (NSS)
# difference, not DNS itself.
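Since resolv.conf drives so much of this behavior, it is worth being able to summarize it quickly, especially inside containers. A sketch that pulls out the nameservers and the effective `ndots` (which defaults to 1 when unset, per resolv.conf(5)); the `resolv_summary` helper name is mine:

```shell
# Summarize a resolv.conf: nameservers and the effective ndots (reads stdin).
# Usage: resolv_summary < /etc/resolv.conf
resolv_summary() {
  awk '
    $1 == "nameserver" { ns = ns ? ns " " $2 : $2 }
    $1 == "options" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) { sub(/^ndots:/, "", $i); ndots = $i }
    }
    END { printf "nameservers: %s\nndots: %s\n", ns, (ndots ? ndots : 1) }
  '
}
```

Run it in the pod, not on the node: the whole point is that the two mount namespaces can disagree.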
In Kubernetes, DNS problems usually come from:
- `ndots:5` (the default) causing many redundant lookups for external domains.
- CoreDNS pod count or caching set too low.
- A NetworkPolicy blocking pod-to-CoreDNS traffic.
A Real Triage Flow
Here is how you apply this toolkit to "something is broken."
- Who is the caller? Who is the callee? (Often the caller is yours, the callee is somewhere you do not control.)
- Same pod / same node / same DC / different DC? Scope matters.
- Layer check, bottom-up (from Networking Fundamentals):
  - `ip link` / `ethtool` — Layer 1/2
  - `ping` / `ip route` — Layer 3
  - `nc -zv` / `ss -tan` — Layer 4
  - `curl -v` / `dig` — Layer 7
- If Layer 4 is flaky, check:
  - `ss -s` for TIME-WAIT / CLOSE-WAIT counts
  - `nstat 'Tcp*'` for retransmits, overflows
  - `conntrack -C` vs `nf_conntrack_max`
- If Layer 7 is flaky, run `curl -v -w <timing>` and `tcpdump -w trace.pcap` while reproducing.
- Always sanity-check DNS: `getent hosts`, `dig`, and look at `/etc/resolv.conf` in the right mount namespace.
Key Concepts Summary
- `ip` replaces `ifconfig` and `route`. `ip link`, `ip addr`, `ip route`, `ip neigh` are the everyday sub-commands.
- `ss -tanp` is the modern netstat. State, ports, processes, all from the kernel via netlink.
- `ss -s` is the connection-state summary. TIME-WAIT or CLOSE-WAIT counts reveal socket exhaustion or app bugs.
- Connection states are diagnostic. CLOSE-WAIT = your app bug. Many SYN-SENT = firewall / peer down. Many TIME-WAIT = healthy but watch port exhaustion on the source side.
- `tcpdump` for the bytes on the wire. Use BPF filters to focus on one conversation; write to `.pcap` for Wireshark.
- `nstat` and `netstat -s` expose kernel counters. Retransmits, listen overflows, buffer overruns — the invisible failures.
- `conntrack -C` vs `nf_conntrack_max`. The silent killer on NAT-heavy hosts.
- `ethtool -S` for NIC counters. Drops, errors, overruns, CRC errors — the hardware story.
- `mtr` for path debugging. Continuous traceroute with loss percentages beats single-shot `traceroute`.
- Always check DNS explicitly. `getent`, `dig`, and `/etc/resolv.conf` in the right namespace.
Common Mistakes
- Running `ping` and declaring the network "fine" or "broken." Ping uses ICMP; many firewalls block it while TCP is fine. Always test with the actual protocol.
- Using `netstat` in a tight loop on a busy server. `ss` is 10–100× faster because it reads via netlink instead of scanning `/proc/net`.
- Ignoring the `Send-Q` / `Recv-Q` columns in `ss`. Non-zero `Send-Q` means data is queued for transmission — often a slow receiver. Non-zero `Recv-Q` on a listening socket means the accept queue is backing up.
- Debugging container networking from the host. You are in the wrong network namespace. Use `nsenter -t $PID -n TOOL`.
- Assuming conntrack does not apply to your system. If iptables/nftables has any stateful rules, or your host does NAT, conntrack matters and can silently drop connections when full.
- Confusing `tcpdump`'s "packets captured" with "packets received." The kernel can drop packets before tcpdump sees them if you are too slow; look at the `dropped by kernel` number.
- Running `tcpdump` on a busy interface without a filter. You will miss packets and saturate the CPU.
- Ignoring PMTU (path MTU) issues. A PMTU mismatch causes hangs on large responses (small requests work). Symptom: TLS handshake completes, first HTTP header works, response body never arrives.
- Not checking `/etc/resolv.conf` inside the container. Mount namespaces mean the container may see a different resolver config than the host — many "DNS is broken in prod" incidents are this.
A Kubernetes pod running a Python microservice shows a steadily growing number of connections in `ss -s` output — specifically, `close-wait` is now at 4000 and climbing. Memory usage is also climbing. What is happening, and where do you fix it?