Linux Fundamentals for Engineers

Network Debugging

A new Kubernetes service works in staging but fails in production with intermittent connection timeouts — about 2% of requests. APM traces show the service on the client side: some outbound HTTP calls complete in 50 ms, others hang for 30 seconds before timing out. Nothing in the application logs. Nothing in the service mesh metrics. The infra team says "network is fine." You ssh into a pod, run ss -s, and see timewait 62000. The socket pool is exhausted; new outbound connections are waiting for TIME-WAIT entries to drain. Ten minutes of network-level debugging found what ten hours of application-level debugging never would.

"The network" as a debugging surface is huge. Linux gives you more than a dozen tools for it, and a senior engineer knows which one to reach for in which situation. This lesson is the actual toolkit — ip, ss, nstat, tcpdump, conntrack, mtr, and a few others — organized by the class of problem each one diagnoses. If you made it through Module 3 of the Networking Fundamentals course, this is the implementation side: the commands you run, the output you look at, and how to connect them to the OSI-layer debugging mindset.


The Layered Approach (30-Second Recap)

From Networking Fundamentals, the bottom-up approach: check Layer 1/2 → Layer 3 → Layer 4 → Layer 7, stop at the first layer that is broken. On Linux, each layer has its own command set:

Layer   Question                              Commands
1/2     Is the interface up?                  ip link, ethtool, ip -s link
3       Can I reach the IP?                   ping, ip route, traceroute, mtr
4       Can I connect to the port?            ss -tlnp, nc -zv, curl -v
7       Does the service respond correctly?   curl, dig, openssl s_client, tcpdump

This lesson focuses on the commands and their less-obvious uses.
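
The bottom-up walk is mechanical enough to script as a first-pass health check. A minimal sketch — the interface, host, and port in the comments are placeholders, and the FAIL path stops the script at the first broken layer:

```shell
#!/bin/sh
# Stop-at-first-failure helper: run a check, print OK/FAIL, bail out on FAIL.
layer() {
  printf 'L%s %-20s ' "$1" "$2"; shift 2
  if "$@" >/dev/null 2>&1; then echo OK; else echo FAIL; exit 1; fi
}

# Bottom-up walk (eth0, 10.0.1.50, 443 are placeholders — adjust for your host):
#   layer 2 "interface up"      sh -c 'ip link show eth0 | grep -q "state UP"'
#   layer 3 "ICMP reachability" ping -c 1 -W 2 10.0.1.50
#   layer 4 "TCP connect"       nc -zv -w 2 10.0.1.50 443
#   layer 7 "HTTP response"     curl -sf --max-time 5 https://10.0.1.50/health
```

Because each check is just an exit code, the same helper works for whatever layer-specific probe you prefer.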


ip — Interfaces and Routing

ip is the modern replacement for ifconfig and route. One command, many sub-commands.

# Interfaces and their state
ip link
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
# 3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
#                                     ^^^^^^^^^^^ no carrier means no cable/peer

# Addresses (IPv4 and IPv6)
ip addr                            # or `ip a`
ip -4 addr                         # IPv4 only
ip -br addr                        # one-line per interface (great for scripting)
# eth0    UP     10.0.1.23/24 fe80::...
# docker0 DOWN   172.17.0.1/16

# Routes
ip route                           # or `ip r`
# default via 10.0.1.1 dev eth0 proto dhcp src 10.0.1.23 metric 100
# 10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.23 metric 100
# 172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

# Which route would be used to reach a specific address?
ip route get 1.1.1.1
# 1.1.1.1 via 10.0.1.1 dev eth0 src 10.0.1.23 uid 1000

# ARP table — IP-to-MAC on the local segment
ip neigh
# 10.0.1.1 dev eth0 lladdr aa:bb:cc:dd:ee:ff REACHABLE
# 10.0.1.52 dev eth0 lladdr ... STALE
#                              ^^^^^ if many are STALE/INCOMPLETE, ARP trouble

# Statistics including errors and drops
ip -s link show eth0
# 2: eth0: ...
#     RX: bytes  packets  errors  dropped  overrun mcast
#      5031234567 123456     0      0        0    234
#     TX: bytes  packets  errors  dropped  carrier collsns
#      8910234567 234567     0      0        0      0
PRO TIP

ip route get DEST is the shortest answer to "which interface and gateway will Linux use to reach X?" It is deterministic — it runs the same lookup the kernel runs on every packet — so when routing looks wrong, this tells you what the kernel actually decided.
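
Because the output format is stable, ip route get also scripts well. A sketch that pulls the egress device and gateway out of the route line — it reads stdin so canned output can be fed in, and direct routes (no "via") come back with an empty gateway:

```shell
# Print "dev=<iface> gw=<gateway>" from an `ip route get` line on stdin.
route_fields() {
  awk 'NR==1 {
    for (i = 1; i < NF; i++) {
      if ($i == "dev") d = $(i+1)     # egress interface
      if ($i == "via") g = $(i+1)     # next-hop gateway (absent for direct routes)
    }
    printf "dev=%s gw=%s\n", d, g
  }'
}

# Usage:
#   ip route get 1.1.1.1 | route_fields
```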

ethtool — Link State and NIC Counters

# Everything about an interface
ethtool eth0
# Settings for eth0:
#         Supported ports: [ FIBRE ]
#         Speed: 10000Mb/s
#         Duplex: Full
#         Link detected: yes
#                        ^^^ Layer 1 state

# Driver info
ethtool -i eth0
# driver: virtio_net
# version: 1.0.0
# firmware-version:

# NIC-level counters (the real story)
ethtool -S eth0 | grep -iE 'drop|err|overrun|miss' | grep -v ': 0$'
# rx_dropped: 12
# rx_missed_errors: 48
# tx_carrier_errors: 0

ss — The Socket Statistics Swiss Army Knife

ss replaces the old netstat. It reads sockets directly from the kernel via netlink and is much faster on busy hosts.

# Every TCP socket (listening and connected), with the process owning each
ss -tanp
# State  Recv-Q Send-Q  Local:Port           Peer:Port              Users
# LISTEN 0      128     *:22                 *:*                     users:(("sshd",pid=1234,fd=3))
# ESTAB  0      0       10.0.1.23:50124      10.0.1.50:443           users:(("curl",pid=5678,fd=3))
# TIME-WAIT 0  0        10.0.1.23:50125      10.0.1.50:443
# ...

# Only listening sockets (what am I serving?)
ss -tlnp
# Only established (who am I currently talking to?)
ss -tnp state established
# Only TIME-WAIT (any socket exhaustion?)
ss -tnp state time-wait

# UDP (-u instead of -t)
ss -unlp

# Unix sockets
ss -xp

# Aggregated summary
ss -s
# Total: 182
# TCP:   62150 (estab 32, closed 62100, orphaned 0, timewait 62000)
#                                                   ^^^^^^^^^^^^^^ 62000 TIME-WAIT = trouble
# Transport Total     IP        IPv6
# RAW       0         0         0
# UDP       5         2         3
# TCP       48        42        6
# INET      53        44        9
# FRAG      0         0         0
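
The timewait figure from the opening story is easy to extract and alert on. A sketch — the line format is assumed from iproute2's ss, and the 20000 threshold is arbitrary:

```shell
# Pull the timewait count out of `ss -s`; warn past a (tunable) threshold.
tw=$(ss -s | sed -n 's/.*timewait \([0-9]*\).*/\1/p')
if [ "${tw:-0}" -gt 20000 ]; then
  echo "WARN: $tw sockets in TIME-WAIT — check for missing connection reuse" >&2
fi
```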

Common filtering

# Who is connected to port 5432 (postgres)?
ss -tnp '( dport = :5432 or sport = :5432 )'

# Any connections in problematic states?
ss -tn state close-wait      # peer closed, we didn't — application bug
ss -tn state last-ack         # we closed, waiting for final ack
ss -tn state syn-sent         # stuck SYN = firewall or unreachable

# Very useful: all sockets held by a specific PID
ss -tnp | grep "pid=$PID"

# Top 10 hosts by number of connections from this box
ss -tn | awk 'NR>1 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

Connection states, what they mean, and what to do

State          Meaning                                        Action
LISTEN         Listening for new connections                  Normal for servers
ESTAB          Active connection, both sides sending          Normal
SYN-SENT       Outbound SYN sent, no reply yet                If many: firewall or peer down
SYN-RECV       Inbound SYN received, SYN-ACK sent, no ACK     If many: SYN flood or slow client
FIN-WAIT-1/2   Our side closing                               Large counts: peer slow to close
CLOSE-WAIT     Peer closed, we haven't                        Almost always an application bug
LAST-ACK       We closed, waiting for final ACK               Normal, brief
TIME-WAIT      Closed, 4-tuple reserved to catch duplicates   Many expected; trouble if exhausting ports
WARNING

A growing number of CLOSE-WAIT connections is always a bug in your application, not a network problem. The remote peer closed the socket; your app has not called close() on it. The socket will never go away on its own. Find the process with ss -tp state close-wait, then fix the code to close sockets after use (defer, with-block, try/finally — every language has its version).
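
To find the leaker, group the CLOSE-WAIT sockets by owning process. A sketch — the users:(("name",pid=N,fd=M)) field format is assumed from ss -p output, and -H (suppress the header) needs a reasonably recent iproute2:

```shell
# Count CLOSE-WAIT sockets per owning process ("name",pid=N), worst first.
# Needs root (or socket ownership) for ss to show other users' processes.
ss -Htnp state close-wait \
  | grep -o 'users:(("[^"]*",pid=[0-9]*' \
  | sed 's/users:((//' \
  | sort | uniq -c | sort -rn | head
```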


Debugging Connectivity End to End

Layer 3 tools

# Basic reachability
ping -c 3 -W 2 10.0.1.50

# Path trace
traceroute 10.0.1.50
# or the much better variant
mtr -rn 10.0.1.50        # report mode: 10 cycles, then loss % per hop (omit -r for a live view)
# 1. 10.0.1.1       0.0% loss  0.3 ms
# 2. 10.255.0.1     0.0% loss  1.2 ms
# 3. 10.1.2.3      30.0% loss  15.4 ms   <- packet loss starts here
# 4. ...

# Is the gateway even reachable at L2?
arping -c 3 -I eth0 10.0.1.1

Layer 4 tools

# Can I open a TCP connection?
nc -zv 10.0.1.50 443
# Connection to 10.0.1.50 443 port [tcp/https] succeeded!

# With timeout
nc -zv -w 2 10.0.1.50 443

# Actually do something on the port — test a protocol
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' | nc -w 5 example.com 80

# Alternative: bash's /dev/tcp (covered in Module 1 Lesson 2)
exec 3<>/dev/tcp/example.com/80
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&3
cat <&3
exec 3<&-                          # close the descriptor when done

Layer 7 tools

# HTTP
curl -v --max-time 10 https://api.example.com/health
# Shows: DNS resolution, TCP connect, TLS handshake, HTTP request, HTTP response
# If anything goes wrong, the line right before failure tells you which layer

# Useful curl flags for debugging
curl -v -o /dev/null -s -w '%{http_code}  %{time_total}s  %{time_connect}s  %{time_starttransfer}s\n' https://api.example.com/health
# 200  0.245s  0.020s  0.180s
# (total)     (tcp)    (first byte)

# DNS
dig example.com
dig +short @8.8.8.8 example.com
dig +trace example.com        # full recursive resolution
dig -x 10.0.1.1               # reverse DNS

# TLS
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -dates -subject -issuer
PRO TIP

curl -w is magical and underused. -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' prints the elapsed seconds for each phase of the request. If you see time_namelookup: 5.0 in a 5-second request, DNS is your problem; if time_appconnect (TLS handshake) dominates, it's TLS; and so on. One invocation, whole-request breakdown.
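
One caveat: each -w timer is cumulative from the start of the request, so the per-phase cost is the difference between adjacent timers. A sketch that does the subtraction (the URL is a placeholder):

```shell
# curl's -w timers are cumulative; subtract neighbours to get per-phase cost.
url='https://api.example.com/health'     # placeholder
curl -s -o /dev/null --max-time 10 \
  -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' \
  "$url" \
| awk '{ printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n",
         $1, $2 - $1, $3 - $2, $4 - $3, $5 - $4 }'
```

Here "server" is time-to-first-byte minus the TLS handshake — roughly the backend's own latency.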


tcpdump — Packet Capture

When you need to see the bytes on the wire.

# Capture 10 packets on eth0 to/from 10.0.1.50
sudo tcpdump -n -c 10 -i eth0 host 10.0.1.50

# Only TCP traffic on port 443
sudo tcpdump -n -i eth0 'tcp and port 443'

# Host and port combined
sudo tcpdump -n -i eth0 'host 10.0.1.50 and port 5432'

# Only SYN packets (helpful for debugging connection problems)
sudo tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0'

# Traffic on a non-default port, printing payload (useful for plain-text protocols)
sudo tcpdump -n -i eth0 -A 'port 8080'

# Write to a pcap file for later analysis (open in Wireshark)
sudo tcpdump -n -i eth0 -w /tmp/trace.pcap 'port 5432'
# Then: wireshark /tmp/trace.pcap  on your workstation

# Read a pcap back
sudo tcpdump -r /tmp/trace.pcap -n

Common BPF filter syntax:

  • host 1.2.3.4 or src host, dst host
  • net 10.0.0.0/8
  • port 80, src port, dst port
  • tcp, udp, icmp
  • Boolean combinations: and, or, not, parentheses

When not to use tcpdump

If you just want to see which connections exist, use ss. tcpdump is for seeing the actual protocol interaction — SYN/ACK timing, retransmits, payload content, what a middlebox rewrote. Do not use it as a connection list — it is overkill and misses idle connections entirely.

WAR STORY

A Kubernetes service mesh installation started dropping ~10% of requests between pods. The mesh dashboard showed elevated 5xx but no root cause. A 30-second tcpdump on a node with -w to a pcap, opened in Wireshark, showed the pattern: the client sent HTTP/2 SETTINGS frames that never got ACKed by the sidecar, leading to connection resets after 15 seconds. The sidecar binary was the wrong minor version and did not handle a specific SETTINGS option. Without packet capture, the root cause would have been invisible — application-layer metrics cannot see what happens during the TCP/HTTP handshake. Packet capture is slow but tells the truth.


nstat and /proc/net/snmp — Kernel Networking Counters

Every major kernel networking event is counted. netstat -s shows them, but nstat is more modern and compact.

# nstat prints the delta since its previous run (it keeps a per-user history
# file), so calling it repeatedly answers "what is spiking right now"
nstat
watch -n 1 nstat

# Look for specific counters (quote the glob so the shell does not expand it)
nstat -z 'TcpExtTCPRetrans*'    # retransmit counters
nstat -z TcpRetransSegs

# Classic summary
netstat -s | head -30
# Tcp:
#   1234 active connections openings
#   5678 passive connection openings
#   12 failed connection attempts
#   3 connection resets received
#   150 connections established
#   42000 segments received
#   38500 segments sent out
#   123 segments retransmitted
#   4 bad segments received
#   45 resets sent

Key counters to watch:

  • Tcp.RetransSegs — TCP retransmits. Climbing = packet loss somewhere.
  • TcpExt.ListenOverflows / TcpExt.ListenDrops — accept queue full. Raise net.core.somaxconn / backlog.
  • TcpExt.TCPTimeWaitOverflow — ran out of TIME-WAIT slots. Tune tcp_max_tw_buckets.
  • UdpSndbufErrors / UdpRcvbufErrors — socket buffer overrun. Raise net.core.rmem_max / wmem_max.
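
Raw counts mean little without a baseline; retransmits as a fraction of segments sent is the more telling number. A sketch that computes it from /proc/net/snmp, which stores each protocol as a header row plus a values row:

```shell
# /proc/net/snmp: a "Tcp:" header row names the columns; the second "Tcp:"
# row holds the values. Map names to positions, then divide.
awk '/^Tcp:/ {
  if (!seen) { for (i = 1; i <= NF; i++) col[$i] = i; seen = 1 }
  else printf "retransmit ratio: %.3f%%\n", 100 * $col["RetransSegs"] / $col["OutSegs"]
}' /proc/net/snmp
```

Anything persistently above a fraction of a percent on a LAN is worth chasing with mtr or tcpdump.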

conntrack — Stateful Firewall Table

Linux's NAT/stateful firewall keeps a connection tracking table. When it fills, new connections get dropped silently — a catastrophic failure mode on busy NAT gateways.

# Size and usage
sudo conntrack -C
# 85432                                     <- current count

cat /proc/sys/net/netfilter/nf_conntrack_max
# 262144                                    <- max

# If current approaches max, you are about to drop connections
# Current connections by protocol
sudo conntrack -L | awk '{print $1}' | sort | uniq -c | sort -rn

# Look at specific connections
sudo conntrack -L -p tcp --dport 443 | head

# Raise the limit (not for the faint of heart — a bigger table uses more kernel memory)
echo 524288 | sudo tee /proc/sys/net/netfilter/nf_conntrack_max

# Reduce the TIME-WAIT timeout if short-lived connections are exhausting conntrack
echo 30 | sudo tee /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait

On systems doing a lot of short-lived connections behind NAT (load balancers, NAT gateways, k8s nodes with many Services), conntrack is often the silent culprit.
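
A fill-ratio check is worth wiring into monitoring. A minimal sketch — the /proc paths require the nf_conntrack module to be loaded, and the 90% alert threshold is arbitrary:

```shell
# Conntrack table fill ratio; exits non-zero above 90% so it can gate an alert.
cnt=/proc/sys/net/netfilter/nf_conntrack_count
max=/proc/sys/net/netfilter/nf_conntrack_max
if [ -r "$cnt" ] && [ -r "$max" ]; then
  awk -v c="$(cat "$cnt")" -v m="$(cat "$max")" 'BEGIN {
    printf "conntrack: %d/%d (%.1f%%)\n", c, m, 100 * c / m
    exit (c / m > 0.9) ? 1 : 0
  }'
fi
```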


DNS-Specific Debugging

DNS is one of the top sources of "flaky network" reports. Always check it explicitly.

# Resolve using system resolver
getent hosts example.com
# 93.184.216.34  example.com

# Check /etc/resolv.conf in this mount namespace (differs in containers!)
cat /etc/resolv.conf
# nameserver 8.8.8.8
# options ndots:5

# Raw DNS with a specific server
dig @1.1.1.1 example.com A

# Recursive trace — see every step
dig +trace example.com

# Measure resolution time
dig +stats example.com | grep 'Query time'
# ;; Query time: 23 msec

# musl-based systems (Alpine) ignore /etc/nsswitch.conf and have no NSS;
# resolver behavior differs subtly from glibc (musl queries all nameservers
# in parallel). If `getent` and `dig` disagree, suspect glibc-vs-musl
# differences, not DNS itself.

In Kubernetes, DNS problems usually come from:

  • ndots:5 (default) causing many redundant lookups for external domains.
  • CoreDNS pod count or caching too low.
  • Policy blocking pod-to-CoreDNS traffic.
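
The ndots rule itself is simple: a name with fewer dots than ndots (and no trailing dot) is tried against every search domain before being looked up as-is. A hypothetical helper that encodes the standard resolv.conf rule, useful for reasoning about which names pay the expansion tax:

```shell
# Does the resolver try the search list first for NAME at this ndots value?
# (Standard resolv.conf ndots rule; a trailing dot makes a name absolute.)
needs_search() {  # usage: needs_search NAME NDOTS
  name=$1 ndots=$2
  case $name in *.) return 1 ;; esac                 # absolute: no expansion
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  [ "$dots" -lt "$ndots" ]
}
```

With ndots:5, `needs_search example.com 5` is true — which is why adding a trailing dot (dig example.com.) skips the redundant cluster-suffix lookups.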

A Real Triage Flow

Here is how you apply this toolkit to "something is broken."

  1. Who is the caller? Who is the callee? (Often the caller is yours, the callee is somewhere you do not control.)
  2. Same pod / same node / same DC / different DC? Scope matters.
  3. Layer check, bottom-up (from Networking Fundamentals):
    • ip link / ethtool — Layer 1/2
    • ping / ip route — Layer 3
    • nc -zv / ss -tan — Layer 4
    • curl -v / dig — Layer 7
  4. If Layer 4 is flaky, check:
    • ss -s for TIME-WAIT / CLOSE-WAIT counts
    • nstat Tcp* for retransmits, overflows
    • conntrack -C vs nf_conntrack_max
  5. If Layer 7 is flaky, curl -v -w <timing> and tcpdump -w trace.pcap while reproducing.
  6. Always sanity-check DNS: getent hosts, dig, and look at /etc/resolv.conf in the right mount namespace.

Key Concepts Summary

  • ip replaces ifconfig and route. ip link, ip addr, ip route, ip neigh are the everyday sub-commands.
  • ss -tanp is the modern netstat. State, ports, processes, all from the kernel via netlink.
  • ss -s is the connection-state summary. TIME-WAIT or CLOSE-WAIT counts reveal socket exhaustion or app bugs.
  • Connection states are diagnostic. CLOSE-WAIT = your app bug. Many SYN-SENT = firewall / peer down. Many TIME-WAIT = healthy but watch port exhaustion on the source side.
  • tcpdump for the bytes on the wire. Use BPF filters to focus on one conversation; write to .pcap for Wireshark.
  • nstat and netstat -s expose kernel counters. Retransmits, listen overflows, buffer overruns — the invisible failures.
  • conntrack -C vs nf_conntrack_max. The silent killer on NAT-heavy hosts.
  • ethtool -S for NIC counters. Drops, errors, overruns, CRC errors — the hardware story.
  • mtr for path debugging. Continuous traceroute with loss percentages beats single-shot traceroute.
  • Always check DNS explicitly. getent, dig, and /etc/resolv.conf in the right namespace.

Common Mistakes

  • Running ping and declaring the network "fine" or "broken." Ping uses ICMP; many firewalls block it while TCP is fine. Always test with the actual protocol.
  • Using netstat in a tight loop on a busy server. ss is 10–100× faster because it reads via netlink instead of scanning /proc/net.
  • Ignoring Send-Q / Recv-Q columns in ss. Non-zero Send-Q means data is queued for transmission — often indicates a slow receiver. Non-zero Recv-Q on a listening socket means the accept queue is backing up.
  • Debugging container networking on the host. You are in the wrong network namespace. Use nsenter -t $PID -n TOOL.
  • Assuming conntrack does not apply to your system. If iptables/nftables has any stateful rules, or your host does NAT, conntrack matters and can silently drop connections when full.
  • Confusing tcpdump's "packets captured" with "packets received." The kernel can drop packets before tcpdump sees them if you are too slow; look at the dropped by kernel number.
  • Running tcpdump on a busy interface without a filter. You will miss packets and saturate the CPU.
  • Ignoring PMTU (path MTU) issues. A PMTU mismatch causes hangs on large responses (small requests work). Symptom: TLS handshake completes, first HTTP header works, response body never arrives.
  • Not checking /etc/resolv.conf inside the container. Mount namespace means the container may see a different resolver config than the host — many "DNS is broken in prod" incidents are this.
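
The PMTU hangs described above can be probed directly: ping with the don't-fragment flag at full frame size (-M do on Linux iputils ping). A sketch — 10.0.1.50 is a placeholder, and the 28 bytes are the IPv4 + ICMP headers:

```shell
# ICMP payload size that yields a given on-the-wire packet size:
# subtract 20 bytes (IPv4 header) + 8 bytes (ICMP header).
icmp_payload() { echo $(( $1 - 28 )); }

# Probe with DF set. "Message too long" or silence at a size that should fit
# means a smaller MTU — or a PMTUD blackhole — somewhere on the path:
#   ping -c 2 -M do -s "$(icmp_payload 1500)" 10.0.1.50   # full Ethernet frame
#   ping -c 2 -M do -s "$(icmp_payload 1400)" 10.0.1.50   # bracket downward
```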

KNOWLEDGE CHECK

A Kubernetes pod running a Python microservice shows a steadily growing number of connections in `ss -s` output — specifically, `close-wait` is now at 4000 and climbing. Memory usage is also climbing. What is happening, and where do you fix it?