Networking Fundamentals for DevOps Engineers

tcpdump & Packet Analysis

Two pods run the same application image, same configuration, same environment variables. Pod A can reach the database. Pod B cannot. curl from Pod B to the database shows a timeout. No error message. No logs. No clue.

You run tcpdump on the node, filter by Pod B's IP, and immediately see the answer: the TCP SYN packet leaves Pod B, but no SYN-ACK ever comes back. The packet is being silently dropped. You check NetworkPolicies — sure enough, a recently applied policy allows database access only from pods with label tier: backend. Pod B was deployed without that label.

tcpdump turned a mystery into a one-line fix. It is the command you reach for when everything else has failed, when logs say nothing, and when the network itself is lying to you.


Part 1: tcpdump Fundamentals

tcpdump captures packets at the network interface level. It sees every packet entering and leaving a host — before iptables, before any application, before anything else processes the traffic.

The Basic Syntax

# The most basic capture: everything on all interfaces
tcpdump -i any
# 15:42:01.123456 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
# 15:42:01.123789 IP 10.244.2.8.8080 > 10.244.1.5.45678: Flags [S.], seq 67890, ack 12346
# 15:42:01.123890 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [.], ack 67891

# Key flags:
# -i any     : Capture on all interfaces (eth0, lo, veth*, etc.)
# -i eth0    : Capture only on eth0
# -n         : Do not resolve hostnames (MUCH faster output)
# -nn        : Do not resolve hostnames OR port names (8080 instead of "http-alt")
# -v/-vv/-vvv: Increasing verbosity (show TTL, IP options, etc.)
# -c 100     : Capture only 100 packets, then stop
# -w file.pcap : Write raw packets to file (for Wireshark analysis)
# -r file.pcap : Read packets from a saved file
KEY CONCEPT

Always use -n (or -nn) with tcpdump. Without it, tcpdump performs reverse DNS lookups for every IP address it sees, which slows the output dramatically and can even cause missed packets on busy interfaces. The first thing you type should always be tcpdump -i any -nn.

Reading tcpdump Output

# A typical TCP packet:
# 15:42:01.123456 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345, win 65535, length 0
#
# Breakdown:
# 15:42:01.123456     → Timestamp (HH:MM:SS.microseconds)
# IP                  → IPv4 packet
# 10.244.1.5.45678    → Source IP . Source Port
# >                   → Direction (source → destination)
# 10.244.2.8.8080     → Destination IP . Destination Port
# Flags [S]           → TCP flags (see below)
# seq 12345           → Sequence number
# win 65535           → TCP window size
# length 0            → Payload length (0 for SYN/ACK)
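Because the layout is fixed, saved text output is easy to slice with standard tools. A sketch that pulls the interesting fields out of one sample line with awk (the IPs and ports are from the example above):

```shell
# Split a tcpdump text line into awk fields: $1 is the timestamp, $3 the
# source, $5 the destination (with a trailing colon), $7 the flags (with a
# trailing comma).
line='15:42:01.123456 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345, win 65535, length 0'
echo "$line" | awk '{
  gsub(/:$/, "", $5)   # strip trailing colon from the destination
  gsub(/,$/, "", $7)   # strip trailing comma from the flags
  print "time=" $1, "src=" $3, "dst=" $5, "flags=" $7
}'
# → time=15:42:01.123456 src=10.244.1.5.45678 dst=10.244.2.8.8080 flags=[S]
```

The same awk program works unchanged on a whole file of saved output (`awk '...' capture.txt`), which is handy when a capture has thousands of lines.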

TCP Flags — The Rosetta Stone

Every TCP packet has flags that indicate its purpose. Learning these flags is essential for reading tcpdump output.

# TCP Flags in tcpdump:
# [S]     → SYN        : "I want to start a connection"
# [S.]    → SYN-ACK    : "OK, I accept the connection"
# [.]     → ACK        : "I received your data"
# [P.]    → PSH-ACK    : "Here is data, process it immediately"
# [F.]    → FIN-ACK    : "I am done sending, let us close"
# [R]     → RST        : "Connection rejected / aborted"
# [R.]    → RST-ACK    : "Connection reset with acknowledgment"

# A healthy TCP connection (3-way handshake + data + close):
# Client → Server: [S]    SYN        (start connection)
# Server → Client: [S.]   SYN-ACK    (accept connection)
# Client → Server: [.]    ACK        (handshake complete)
# Client → Server: [P.]   PSH-ACK    (send HTTP request)
# Server → Client: [P.]   PSH-ACK    (send HTTP response)
# Client → Server: [F.]   FIN-ACK    (close connection)
# Server → Client: [F.]   FIN-ACK    (confirm close)
# Client → Server: [.]    ACK        (final acknowledgment)
PRO TIP

Memorize what RST (Reset) means. When you see [R] in tcpdump output, it means one side is forcefully aborting the connection. Common causes: nothing listening on the port, firewall actively rejecting (REJECT vs DROP), application crash during connection, connection to a closed port. RST is your friend during debugging — it is the network telling you exactly what went wrong. A timeout (no response at all) is harder to debug than a reset.
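You can feel the REJECT-vs-DROP difference from the client side without tcpdump: connecting to a closed port fails instantly because the kernel answers the SYN with an RST, while a silently dropped SYN would hang until the timeout. A bash sketch, assuming port 1 on loopback is closed (it almost always is):

```shell
# Closed port: the SYN is answered with RST, so the connect attempt fails
# immediately with "Connection refused" instead of hanging for seconds.
# Uses bash's /dev/tcp pseudo-device; port 1 is assumed to be closed.
if ! (exec 3<>/dev/tcp/127.0.0.1/1) 2>/dev/null; then
  echo "refused immediately (RST received)"
fi
```

Run the same connect against a firewall rule that DROPs instead of REJECTs and the command sits there until the TCP connect timeout expires, which is exactly the difference you see in tcpdump as "RST came back" versus "nothing came back".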


Part 2: Useful tcpdump Filters

The power of tcpdump is in its filters. On a busy production node, unfiltered tcpdump produces thousands of packets per second. Filters let you isolate exactly the traffic you care about.

Filter by Host

# Capture all traffic to/from a specific IP
tcpdump -i any -nn host 10.244.2.8

# Capture traffic FROM a specific source
tcpdump -i any -nn src host 10.244.1.5

# Capture traffic TO a specific destination
tcpdump -i any -nn dst host 10.244.2.8

# Capture traffic between two specific hosts
tcpdump -i any -nn host 10.244.1.5 and host 10.244.2.8

Filter by Port

# Capture all traffic on port 8080
tcpdump -i any -nn port 8080

# Capture traffic on multiple ports
tcpdump -i any -nn port 8080 or port 443

# Capture traffic on a port range
tcpdump -i any -nn portrange 8080-8090

# Combine host and port filters
tcpdump -i any -nn host 10.244.2.8 and port 8080

Filter by Protocol

# Capture only TCP traffic
tcpdump -i any -nn tcp

# Capture only UDP traffic
tcpdump -i any -nn udp

# Capture only ICMP (ping) traffic
tcpdump -i any -nn icmp

# Capture only DNS traffic (UDP port 53)
tcpdump -i any -nn port 53

# Capture only SYN packets (new connections)
tcpdump -i any -nn 'tcp[tcpflags] & tcp-syn != 0'

# Capture only RST packets (connection resets)
tcpdump -i any -nn 'tcp[tcpflags] & tcp-rst != 0'

# Capture only pure SYN packets, excluding SYN-ACKs (new connection attempts only)
tcpdump -i any -nn 'tcp[tcpflags] & (tcp-syn|tcp-ack) = tcp-syn'
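Filtered captures saved as text also lend themselves to quick counting. For example, to see which sources are opening the most new connections, count plain SYNs per source IP; the sample lines below stand in for a real capture file:

```shell
# Count connection attempts (pure SYN, not SYN-ACK) per source IP from
# saved tcpdump text output. The sample lines are illustrative.
cat <<'EOF' > /tmp/capture.txt
15:42:01.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 1
15:42:01.002 IP 10.244.2.8.8080 > 10.244.1.5.45678: Flags [S.], seq 9, ack 2
15:42:02.001 IP 10.244.1.6.33000 > 10.244.2.8.8080: Flags [S], seq 5
15:42:03.001 IP 10.244.1.5.45680 > 10.244.2.8.8080: Flags [S], seq 7
EOF
grep -F 'Flags [S],' /tmp/capture.txt |         # '[S],' excludes SYN-ACK '[S.],'
  awk '{sub(/\.[0-9]+$/, "", $3); print $3}' |  # strip the source port
  sort | uniq -c | sort -rn                     # count per source IP
```

The `grep -F 'Flags [S],'` trick works because tcpdump prints SYN-ACK as `[S.],` so the literal match naturally excludes it.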


WARNING

When capturing on production systems, always use filters and always use -c (packet count limit) or -w (write to file with a rotation). An unfiltered tcpdump on a busy node can consume significant CPU and, if writing to disk, can fill the filesystem. Never leave a tcpdump running unattended in production.


Part 3: Capturing Specific Traffic Types

Capturing DNS Traffic

# Capture all DNS queries and responses
tcpdump -i any -nn port 53
# 15:42:01.001 IP 10.244.1.5.34567 > 10.96.0.10.53: 12345+ A? api-service.production.svc.cluster.local. (58)
# 15:42:01.003 IP 10.96.0.10.53 > 10.244.1.5.34567: 12345 1/0/0 A 10.96.45.123 (74)
#
# Reading this:
# 12345+         → Query ID (matches request to response)
# A?             → Query type A (IPv4 address)
# api-service... → The hostname being queried
# 1/0/0          → 1 answer, 0 authority, 0 additional records
# A 10.96.45.123 → The answer: A record pointing to 10.96.45.123

# Look for DNS queries without responses:
# 15:42:01.001 IP 10.244.1.5.34567 > 10.96.0.10.53: 12345+ A? api-service... (58)
# 15:42:06.001 IP 10.244.1.5.34568 > 10.96.0.10.53: 12346+ A? api-service... (58)
# ← No response between these queries = DNS timeout
# The 5-second gap is the DNS retry interval
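Spotting the "query without a response" pattern by eye is error-prone on a busy capture. Since the query ID appears in both the query and its response, an awk pass can pair them up and print only the orphans; a sketch over sample lines (in a real session, feed it saved output from `tcpdump -i any -nn port 53`):

```shell
# Pair DNS queries with responses by query ID and print unanswered queries.
# Queries go TO port 53 ($5 ends in ".53:"), responses come FROM it
# ($3 ends in ".53"). Sample lines stand in for a real capture.
cat <<'EOF' > /tmp/dns.txt
15:42:01.001 IP 10.244.1.5.34567 > 10.96.0.10.53: 12345+ A? api-service.production.svc.cluster.local. (58)
15:42:01.003 IP 10.96.0.10.53 > 10.244.1.5.34567: 12345 1/0/0 A 10.96.45.123 (74)
15:42:06.001 IP 10.244.1.5.34568 > 10.96.0.10.53: 12346+ A? api-service.production.svc.cluster.local. (58)
EOF
awk '
  $5 ~ /\.53:$/ { id = $6; sub(/\+$/, "", id); q[id] = $0; next }  # query: remember it
  $3 ~ /\.53$/  { delete q[$6] }                                   # response: mark answered
  END { for (id in q) print "UNANSWERED: " q[id] }
' /tmp/dns.txt
```

On the sample data this prints only the 12346 query, the one that never got a reply. Note that DNS query IDs can be reused over long captures, so treat the output as a starting point, not proof.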
WAR STORY

A cluster experienced intermittent 5-second delays on all HTTP requests. Not all requests — roughly 1 in 20. tcpdump on port 53 revealed the pattern: DNS queries for external domains were being sent to CoreDNS, which then forwarded them upstream. Some upstream queries timed out after 5 seconds. The cause was the ndots:5 setting in /etc/resolv.conf (Kubernetes default). Pods were querying api.external.com.production.svc.cluster.local, api.external.com.svc.cluster.local, and three other search domains before finally trying api.external.com — and one of those intermediate queries would occasionally time out. The fix: add a trailing dot to external hostnames (api.external.com.) or reduce ndots to 1.
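The ndots behavior from the war story is easy to reproduce on paper: a name with fewer dots than the ndots threshold is tried against every search domain before being tried as-is. A sketch using the default Kubernetes search path for a pod in the production namespace (the search list is illustrative; real pods may have additional node-level domains):

```shell
# With ndots:5, "api.external.com" has only 2 dots, so the resolver appends
# each search domain first and tries the bare name last. A trailing dot
# makes the name absolute and skips the search list entirely.
name="api.external.com"
for domain in production.svc.cluster.local svc.cluster.local cluster.local; do
  echo "$name.$domain"    # each of these lookups happens (and fails) first
done
echo "$name."             # only this final absolute lookup can succeed
# → api.external.com.production.svc.cluster.local
# → api.external.com.svc.cluster.local
# → api.external.com.cluster.local
# → api.external.com.
```

Each of those failed intermediate lookups is one more round trip through CoreDNS, which is why a single flaky upstream can turn one HTTP request into a 5-second stall.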

Capturing HTTP Traffic

# Capture HTTP traffic and show ASCII content
tcpdump -i any -nn port 80 -A
# 15:42:01.123 IP 10.244.1.5.45678 > 10.244.2.8.80: Flags [P.], length 86
# GET /health HTTP/1.1
# Host: api-service
# User-Agent: curl/7.88.1
# Accept: */*
#
# 15:42:01.125 IP 10.244.2.8.80 > 10.244.1.5.45678: Flags [P.], length 142
# HTTP/1.1 200 OK
# Content-Type: application/json
# Content-Length: 20
#
# {"status":"healthy"}

# -A: Show packet content as ASCII (readable for HTTP)
# -X: Show packet content as hex + ASCII (useful for binary protocols)
PRO TIP

The -A flag only works for unencrypted HTTP traffic (port 80). For HTTPS (port 443), the payload is encrypted and -A shows gibberish. To analyze HTTPS traffic, capture with -w file.pcap and analyze in Wireshark — you can still see the TLS handshake (certificate exchange, cipher negotiation) even though the application data is encrypted.

Capturing the TLS Handshake

# Capture TLS handshake (you cannot read the encrypted data,
# but you CAN see the handshake and certificate exchange)
tcpdump -i any -nn port 443 -w tls-capture.pcap

# What the TLS handshake looks like in tcpdump (verbose mode):
tcpdump -i any -nn port 443 -v
# Client → Server: [S]      TCP SYN
# Server → Client: [S.]     TCP SYN-ACK
# Client → Server: [.]      TCP ACK (handshake complete)
# Client → Server: [P.]     ClientHello (TLS version, cipher suites, SNI)
# Server → Client: [P.]     ServerHello (chosen cipher, certificate)
# Client → Server: [P.]     Key exchange, ChangeCipherSpec
# Server → Client: [P.]     ChangeCipherSpec
# ...encrypted application data...

# To see TLS details, save to pcap and open in Wireshark:
# Wireshark → filter: tls.handshake → shows full certificate chain,
# cipher negotiation, SNI, and TLS version

Part 4: What Packet Patterns Tell You

The packets do not lie. Here are the patterns you will see most often and what they mean.

Pattern: SYN Without SYN-ACK (Timeout)

tcpdump -i any -nn host 10.244.2.8 and port 8080
# 15:42:01.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
# 15:42:02.002 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
# 15:42:04.004 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
# 15:42:08.008 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
#
# SYN sent, no response. Retransmissions at 1s, 2s, 4s (exponential backoff).
# The packets are being SILENTLY DROPPED.
# Causes: NetworkPolicy (DROP), firewall rule, routing black hole, dead host

Pattern: Immediate RST (Connection Refused)

tcpdump -i any -nn host 10.244.2.8 and port 8080
# 15:42:01.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [S], seq 12345
# 15:42:01.002 IP 10.244.2.8.8080 > 10.244.1.5.45678: Flags [R.], seq 0, ack 12346
#
# SYN sent, RST received immediately (1ms later).
# The destination host is REACHABLE but NOTHING IS LISTENING on port 8080.
# Causes: app not running, app listening on different port, app on 127.0.0.1

Pattern: TCP Retransmissions (Packet Loss)

tcpdump -i any -nn host 10.244.2.8 and port 8080
# 15:42:01.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
# 15:42:01.250 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
# 15:42:01.750 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
#
# Same data sent 3 times. The peer is not acknowledging.
# Causes: network congestion, packet loss, overloaded receiver, MTU issues
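Retransmitted segments show up in text output as repeated identical seq ranges, which makes them countable with sort and uniq. A sketch over sample lines (in practice, feed it saved tcpdump output):

```shell
# Count how many times each (source, seq range) pair was sent; any count
# above 1 is a retransmission. Sample lines stand in for a real capture.
cat <<'EOF' > /tmp/tcp.txt
15:42:01.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
15:42:01.250 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
15:42:01.750 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 1:500, length 499
15:42:02.001 IP 10.244.1.5.45678 > 10.244.2.8.8080: Flags [P.], seq 500:900, length 400
EOF
awk '{gsub(/,$/, "", $9); print $3, $9}' /tmp/tcp.txt |   # source + seq range
  sort | uniq -c |                                         # count duplicates
  awk '$1 > 1 {print $1 - 1, "retransmissions of", $2, $3}'
# → 2 retransmissions of 10.244.1.5.45678 1:500
```

Wireshark does this properly (its `tcp.analysis.retransmission` filter tracks sequence numbers per connection), but the shell version is often enough to confirm "yes, we are losing packets" from a node you are already SSHed into.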

Pattern: RST After Data Exchange (App Crash)

tcpdump -i any -nn host 10.244.2.8 and port 8080
# 15:42:01.001 [S]     → SYN
# 15:42:01.002 [S.]    → SYN-ACK
# 15:42:01.002 [.]     → ACK (connection established)
# 15:42:01.003 [P.]    → HTTP request sent
# 15:42:01.050 [R]     → RST from server
#
# Connection established successfully, request sent, then server RSTs.
# The application accepted the connection but crashed while processing.
# Check application logs on the destination pod.

Healthy vs Unhealthy TCP Patterns

Healthy Connection

Complete 3-way handshake + data exchange

Step 1: SYN →        (client initiates)
Step 2: ← SYN-ACK    (server accepts)
Step 3: ACK →        (handshake complete)
Step 4: PSH-ACK →    (request data)
Step 5: ← PSH-ACK    (response data)
Step 6: FIN-ACK →    (close)
Step 7: ← FIN-ACK    (confirm close)
Unhealthy Patterns

What failure looks like in tcpdump

SYN → (nothing)        Timeout: packets silently dropped (firewall/NetworkPolicy)
SYN → ← RST            Connection refused: nothing listening on the port
Retransmissions        Same packet sent multiple times: packet loss or congestion
RST after handshake    App crash: connection established, then server aborted
SYN-ACK then RST       Firewall closing the connection after inspection (stateful firewall)
Delayed ACKs           Slow processing: receiver overwhelmed or CPU-starved
Window size 0          Receiver buffer full: backpressure from a slow consumer

Part 5: tcpdump in Kubernetes

In Kubernetes, you typically run tcpdump on the node, not inside pods (most pods do not have tcpdump installed).

Capture Traffic for a Specific Pod

# Step 1: Find the pod IP
kubectl get pod my-pod -n production -o wide
# NAME     READY   STATUS    IP           NODE
# my-pod   1/1     Running   10.244.2.8   node-2

# Step 2: SSH to the node where the pod runs
ssh node-2

# Step 3: Capture traffic filtered by pod IP
tcpdump -i any -nn host 10.244.2.8
# This captures ALL traffic to/from the pod — inbound and outbound

# Step 4: Narrow down with port filter
tcpdump -i any -nn host 10.244.2.8 and port 5432
# Only traffic between the pod and PostgreSQL

Using nsenter — Capture From the Pod Network Namespace

nsenter lets you run commands in a pod's network namespace without entering the pod. This is more precise than filtering by IP on the node.

# Step 1: Find the container ID
crictl ps | grep my-pod
# abc123def456   my-app   Running

# Step 2: Find the PID of the container
crictl inspect abc123def456 | grep -m1 pid
# "pid": 12345

# Step 3: Run tcpdump in the pod network namespace
nsenter -t 12345 -n -- tcpdump -i any -nn port 8080

# This captures ONLY traffic visible to the pod.
# No other pod traffic on the same node will appear.
# Much cleaner output than filtering by IP on the host interface.
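The three steps above chain naturally into a small helper. Since crictl and nsenter only work on a real node, this sketch just assembles and prints the command it would run; the PID and filter are illustrative:

```shell
# Hypothetical helper: given a container PID (obtained via crictl inspect),
# build the nsenter+tcpdump command. Dry run: it prints the command rather
# than executing it, so it can be reviewed before running as root.
pod_capture_cmd() {
  local pid="$1"; shift
  echo nsenter -t "$pid" -n -- tcpdump -i any -nn "$@"
}

pod_capture_cmd 12345 port 8080
# → nsenter -t 12345 -n -- tcpdump -i any -nn port 8080
```

On a real node you would replace the final `echo` with the command itself (or `eval` the printed line) once you are satisfied the PID and filter are right.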
KEY CONCEPT

Using nsenter with tcpdump is the cleanest way to capture pod traffic. When you filter by pod IP on the node, you might see encapsulated traffic (VXLAN, Geneve) depending on your CNI. With nsenter, you see traffic exactly as the pod sees it — unencapsulated, simple, and easy to read. This is especially important with overlay networks like Flannel where node-level captures show encapsulated packets.

Saving Captures for Wireshark

# Capture to a file (pcap format)
tcpdump -i any -nn host 10.244.2.8 -w /tmp/pod-traffic.pcap -c 1000
# Captures 1000 packets and writes to file

# Rotating capture (prevents filling the disk):
# -G 60 rotates to a new file every 60 seconds,
# -W 10 keeps at most 10 files (10 minutes of data)
tcpdump -i any -nn host 10.244.2.8 \
  -w /tmp/capture-%Y%m%d-%H%M%S.pcap \
  -G 60 -W 10

# Copy the pcap file from the node to your local machine
scp node-2:/tmp/pod-traffic.pcap ./pod-traffic.pcap
# Or, if you captured inside the pod (e.g. from a debug sidecar):
kubectl cp production/my-pod:/tmp/capture.pcap ./capture.pcap

# Open in Wireshark for visual analysis
# Wireshark filters:
# tcp.analysis.retransmission → Show retransmissions
# tcp.flags.reset == 1        → Show RST packets
# dns                         → Show DNS traffic
# tls.handshake               → Show TLS handshakes
# http.response.code == 500   → Show HTTP 500 errors
PRO TIP

When you need to capture traffic from a pod but cannot SSH to the node, use a sidecar container with tcpdump. Add a container with the nicolaka/netshoot image to the pod spec, sharing the network namespace (which all containers in a pod share by default). Run tcpdump inside the sidecar. This also works with ephemeral containers: kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-app -- tcpdump -i any -nn port 8080.

WAR STORY

The most memorable tcpdump debug session I had involved a service that worked perfectly from 10 AM to 2 PM, then started timing out. Every day. Same time window. tcpdump revealed the pattern: at 2 PM, a batch job kicked off that created thousands of short-lived connections, exhausting the conntrack table. New connections for the production service were being dropped because there was no room in conntrack. The fix was a two-line sysctl change to increase nf_conntrack_max and decrease nf_conntrack_tcp_timeout_time_wait. tcpdump showed us the SYN packets leaving but no SYN-ACK returning — the node itself was dropping the outbound packets due to conntrack exhaustion.


Course Conclusion

You have completed the Networking Fundamentals for DevOps Engineers course. Let us recap the journey from OSI layers to packet captures.

What You Learned

Module 1 — The OSI Model: You built a mental framework for understanding how data flows through networks, layer by layer. Every troubleshooting session starts with "which layer is the problem at?"

Module 2 — DNS: You learned how hostnames become IP addresses, how CoreDNS works in Kubernetes, and why DNS is the first thing that breaks in every cluster.

Module 3 — TCP, IP & UDP: You understood the transport protocols that carry all network traffic — the three-way handshake, flow control, windowing, and why UDP exists alongside TCP.

Module 4 — HTTP & HTTPS: You saw how application-level communication works, how TLS secures it, and how to debug HTTP issues methodically.

Module 5 — Load Balancing & Proxies: You learned L4 vs L7 load balancing, how reverse proxies (NGINX, Envoy, Traefik) route traffic, and what every Kubernetes Service type creates under the hood.

Module 6 — Network Troubleshooting: You gained a systematic methodology for debugging network issues, mastered the 7 essential commands, and learned to read packet captures with tcpdump.

Where to Go Next

The networking fundamentals you learned here are the foundation for advanced infrastructure topics:

  • TLS deep dive: Certificate chains, mTLS, certificate rotation in Kubernetes — understanding TCP and the TLS handshake from this course makes TLS debugging straightforward
  • Kubernetes networking at scale: CNI internals (Cilium eBPF, Calico BGP), NetworkPolicy design, multi-cluster networking — the Service types and kube-proxy modes from Module 5 are prerequisite knowledge
  • Service mesh architecture: Istio, Linkerd, Envoy data plane — understanding reverse proxies and L7 routing from this course is essential before adding sidecar proxies to every pod

Keep Practicing

Networking is a skill that deepens with practice, not just reading. Here are concrete next steps:

  1. Set up a lab cluster (kind, minikube, or k3s) and deliberately break networking — then use the 5-step methodology to fix it
  2. Run tcpdump during your next incident — even if you do not need it, the practice of reading live packet captures builds muscle memory
  3. Create a NetworkPolicy that blocks specific traffic, then debug it from a pod that is affected — this teaches you how silent drops look from the application perspective

The engineers who are best at networking are not the ones who memorized RFCs. They are the ones who have run tcpdump a hundred times and can read packet patterns like a language. That fluency comes from practice.

Thank you for completing this course. Go build something, break something, and debug the network when it all falls apart.


Key Concepts Summary

  • tcpdump -i any -nn is the starting point for all packet captures — always use -nn to disable DNS lookups
  • TCP flags tell the story: [S] = new connection, [R] = rejected, [F.] = closing, retransmissions = packet loss
  • SYN without SYN-ACK = packets silently dropped (firewall/NetworkPolicy)
  • Immediate RST = destination reachable but nothing listening on the port
  • Retransmissions = packet loss, congestion, or overloaded receiver
  • nsenter lets you capture from a pod's network namespace without entering the pod — cleaner than filtering by IP on the node
  • Save captures with -w for Wireshark analysis — especially for TLS traffic that cannot be read in the terminal
  • Always use filters and packet limits in production — unfiltered captures can impact system performance and fill disks

Common Mistakes

  • Running tcpdump without -n — reverse DNS lookups slow output dramatically and can cause packet drops on busy interfaces
  • Running unfiltered captures on production nodes — captures everything, overwhelms the terminal, wastes CPU
  • Leaving tcpdump running without -c or file rotation — fills the disk, causing a second incident while debugging the first
  • Interpreting intermediate router packet loss in traceroute/mtr as real loss — only final-hop loss counts
  • Trying to read HTTPS content with -A — TLS-encrypted traffic is unreadable in tcpdump, save to pcap and use Wireshark
  • Forgetting that tcpdump on the node sees encapsulated traffic (VXLAN/Geneve) with overlay CNIs — use nsenter for clean pod-level captures
  • Not capturing bidirectional traffic — always filter by host (both directions) not just src or dst, unless you specifically need one direction

KNOWLEDGE CHECK

You capture packets with tcpdump and see: SYN sent from Pod A to Pod B, then the same SYN retransmitted 3 times with exponential backoff, and no SYN-ACK ever received. What is the most likely cause?