OSI Meets Kubernetes Networking
Pod A cannot reach Pod B. Your developer files a ticket: "Networking is broken." You SSH into the node and start debugging. Is it DNS? Is it the CNI plugin? Is it a NetworkPolicy? Is it the application returning errors?
Without a framework, you are guessing. With the OSI model mapped to Kubernetes, you know exactly which layer to check. CNI problems are L2/L3. Service routing issues are L4. Ingress problems are L7. Each layer has different tools, different symptoms, and different fixes.
The Three Networks in Kubernetes
Before we map OSI layers, you need to understand that Kubernetes has three separate networks operating simultaneously. This is the single most important concept for understanding K8s networking.
1. Node Network — The physical (or cloud VPC) network connecting your Kubernetes nodes. These are the IPs your cloud provider assigns. Nodes communicate with each other over this network. This is a real, routable network.
2. Pod Network — A virtual network where every pod gets its own unique IP address. This is managed by the CNI plugin (Calico, Cilium, Flannel). Pods on the same node communicate directly. Pods on different nodes communicate through overlay or routing.
3. Service Network — A completely virtual network that exists only in iptables/IPVS rules. ClusterIP addresses are not assigned to any interface. They are translated to pod IPs by kube-proxy.
# See the three networks in action:
# Node network
kubectl get nodes -o wide
# NAME STATUS INTERNAL-IP EXTERNAL-IP
# node-1 Ready 10.0.0.10 203.0.113.10
# node-2 Ready 10.0.0.11 203.0.113.11
# Pod network
kubectl get pods -o wide
# NAME READY IP NODE
# pod-a 1/1 10.244.0.5 node-1
# pod-b 1/1 10.244.1.8 node-2
# Service network
kubectl get svc
# NAME TYPE CLUSTER-IP PORT(S)
# kubernetes ClusterIP 10.96.0.1 443/TCP
# my-service ClusterIP 10.96.0.42 8080/TCP
The Service network is the most confusing because it is not real. There is no network interface with IP 10.96.0.42. There is no routing table entry for 10.96.0.0/12. When a pod sends a packet to 10.96.0.42, kube-proxy intercepts it (via iptables or IPVS rules) and rewrites the destination to the actual pod IP. The ClusterIP is a load-balancing virtual IP, nothing more.
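The translation can be modeled as a lookup plus rewrite. Here is a minimal Python sketch of the idea, using the example IPs from this section; the real work happens in kernel iptables/IPVS rules programmed by kube-proxy, not in userspace code like this:

```python
import random

# Hypothetical service table mirroring the example above: the ClusterIP
# exists only as a key in a translation table, never on an interface.
service_table = {
    ("10.96.0.42", 8080): [("10.244.0.5", 8080), ("10.244.1.8", 8080)],
}

def dnat(dst_ip, dst_port):
    """Model the DNAT step: rewrite a ClusterIP destination to a pod IP."""
    backends = service_table.get((dst_ip, dst_port))
    if backends is None:
        return (dst_ip, dst_port)   # not a ClusterIP: route normally
    return random.choice(backends)  # kube-proxy picks a real backend

# A packet addressed to the ClusterIP leaves the node addressed to a pod IP.
print(dnat("10.96.0.42", 8080))
```

If the destination is not in the table, the packet is routed unchanged, which is why only Service traffic is affected by kube-proxy.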
Mapping OSI Layers to Kubernetes
Now let us map each OSI layer to the Kubernetes component that operates there.
Layer 2/3: The CNI Plugin — Pod-to-Pod Connectivity
The CNI (Container Network Interface) plugin is responsible for:
- Assigning IP addresses to pods (L3)
- Creating virtual network interfaces (veth pairs) for pods (L2)
- Enabling pod-to-pod communication, both on the same node and across nodes
Different CNI plugins work at different layers:
| CNI Plugin | L2/L3 Approach | Overlay | eBPF | NetworkPolicy |
|---|---|---|---|---|
| Flannel | VXLAN overlay (L2 over L3) | Yes | No | No (needs Calico) |
| Calico | BGP routing (L3) or VXLAN | Optional | Yes (eBPF mode) | Yes |
| Cilium | eBPF (L3/L4) | Optional | Yes (native) | Yes (L3-L7) |
| AWS VPC CNI | Native VPC IPs (L3) | No | No | Partial |
| Weave | VXLAN overlay + mesh | Yes | No | Yes |
# Check which CNI is running
ls /etc/cni/net.d/
# 10-calico.conflist
# Check pod CIDR allocation per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# node-1 10.244.0.0/24
# node-2 10.244.1.0/24
# node-3 10.244.2.0/24
When choosing a CNI, the decision usually comes down to: (1) Do you need NetworkPolicy enforcement? If yes, skip Flannel. (2) Do you need L7 policies or service mesh features? If yes, choose Cilium. (3) Are you on AWS and want native VPC integration? Use AWS VPC CNI. (4) Do you want the simplest option that works? Calico with VXLAN.
How Pod-to-Pod Communication Works — Same Node
When Pod A (10.244.0.5) sends a packet to Pod B (10.244.0.8) on the same node, the packet never leaves the node:
1. Pod A sends the packet through its `eth0` interface (which is actually one end of a veth pair — a virtual Ethernet cable connecting the pod's network namespace to the host)
2. The packet arrives on the host side of the veth pair (e.g., `cali1234abcd` in Calico)
3. The host Linux kernel routes the packet to Pod B's veth pair based on the routing table
4. Pod B receives the packet on its `eth0` interface
# See the veth pairs on a node
ip link show | grep cali
# 5: cali1234abcd@if4: <BROADCAST,MULTICAST,UP,LOWER_UP>
# 7: cali5678efgh@if6: <BROADCAST,MULTICAST,UP,LOWER_UP>
# See the routes to pod IPs on this node
ip route | grep cali
# 10.244.0.5 dev cali1234abcd scope link
# 10.244.0.8 dev cali5678efgh scope link
How Pod-to-Pod Communication Works — Across Nodes
When Pod A (10.244.0.5 on node-1) sends a packet to Pod C (10.244.1.8 on node-2), the packet must cross the node network. How this works depends on the CNI:
VXLAN Overlay (Flannel, Calico in VXLAN mode): The pod packet is encapsulated inside a UDP packet between the nodes. This is L2-over-L3 tunneling — the entire Ethernet frame (with pod IPs) is wrapped inside a new IP packet with node IPs.
BGP Routing (Calico in BGP mode): Each node advertises its pod CIDR to other nodes via BGP. The underlying network learns that 10.244.0.0/24 is reachable via node-1 and 10.244.1.0/24 via node-2. No encapsulation needed.
VXLAN adds 50 bytes of overhead to every packet (outer Ethernet 14 + outer IP 20 + outer UDP 8 + VXLAN header 8). This reduces the effective MTU from 1500 to 1450. If your pods send 1500-byte packets, they will be fragmented or dropped. Most CNI plugins handle this automatically by setting the pod interface MTU to 1450, but misconfiguration here is a common source of mysterious connectivity failures.
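The overhead arithmetic is easy to verify, a quick sketch using the header sizes stated above:

```python
# VXLAN encapsulation overhead, byte by byte (values from the text above)
outer_ethernet = 14
outer_ip = 20
outer_udp = 8
vxlan_header = 8

overhead = outer_ethernet + outer_ip + outer_udp + vxlan_header
print(overhead)   # 50 bytes added to every packet

physical_mtu = 1500
pod_mtu = physical_mtu - overhead
print(pod_mtu)    # 1450: what the CNI should set on pod interfaces
```

If the pod interface MTU is left at 1500 while the node MTU is also 1500, the encapsulated packet exceeds the physical MTU, and the result is fragmentation or silent drops.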
Layer 4: kube-proxy — Service Load Balancing
kube-proxy operates at Layer 4. It watches the Kubernetes API for Services and Endpoints, then programs the node to redirect traffic destined for ClusterIPs to actual pod IPs.
kube-proxy has three modes:
| Mode | How It Works | Performance | Scalability |
|---|---|---|---|
| iptables | NAT rules in iptables chains | Good for small clusters | Degrades with thousands of services |
| IPVS | Linux Virtual Server (kernel L4 LB) | Better than iptables | Handles thousands of services |
| eBPF (Cilium) | Replaces kube-proxy entirely | Best | Excellent |
# See how kube-proxy translates ClusterIP to pod IPs (iptables mode)
iptables -t nat -S KUBE-SERVICES | grep my-service
# -A KUBE-SERVICES -d 10.96.0.42/32 -p tcp --dport 8080 -j KUBE-SVC-XXXX
# Follow the chain to see the actual pod endpoints
iptables -t nat -S KUBE-SVC-XXXX
# -A KUBE-SVC-XXXX -m statistic --mode random --probability 0.333 -j KUBE-SEP-AAA
# -A KUBE-SVC-XXXX -m statistic --mode random --probability 0.500 -j KUBE-SEP-BBB
# -A KUBE-SVC-XXXX -j KUBE-SEP-CCC
# Each KUBE-SEP chain DNATs to a pod IP
iptables -t nat -S KUBE-SEP-AAA
# -A KUBE-SEP-AAA -p tcp -j DNAT --to-destination 10.244.0.5:8080
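The descending probabilities (0.333, then 0.500, then unconditional) are what make the split uniform: each rule fires with probability 1/(endpoints remaining). A quick simulation (a sketch, not kube-proxy code) shows each KUBE-SEP chain receives about a third of the traffic:

```python
import random
from collections import Counter

def pick_endpoint():
    """Walk the KUBE-SVC chain the way iptables does: each rule fires
    with probability 1/(number of endpoints still remaining)."""
    if random.random() < 1 / 3:   # --probability 0.333 -> KUBE-SEP-AAA
        return "KUBE-SEP-AAA"
    if random.random() < 1 / 2:   # --probability 0.500 -> KUBE-SEP-BBB
        return "KUBE-SEP-BBB"
    return "KUBE-SEP-CCC"         # unconditional final rule

counts = Counter(pick_endpoint() for _ in range(30_000))
for sep, n in sorted(counts.items()):
    print(sep, round(n / 30_000, 2))  # each fraction close to 0.33
```

This also explains the failure mode in the anecdote below: iptables has no health awareness, so a dead endpoint keeps receiving its share of traffic until the Endpoints object changes and kube-proxy reprograms the rules.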
A team reported that their service was "randomly dropping requests." One in three requests failed. Investigation revealed one of three backend pods was in CrashLoopBackOff, but kube-proxy was still sending traffic to it because the Endpoints object had not been updated yet (readiness probe was misconfigured with too long an initial delay). The iptables rules were load-balancing to three pods, but one was dead. Fix: set proper readiness probes with short initial delays so unhealthy pods are removed from the Endpoints list quickly.
Layer 3/4: NetworkPolicy — Traffic Filtering
NetworkPolicy resources operate at Layer 3 and Layer 4. They allow or deny traffic based on:
- Source/destination pod labels (L3 — via pod IP)
- Source/destination namespaces (L3 — via pod IP)
- Ports and protocols (L4 — TCP/UDP port numbers)
# Allow only frontend pods to reach backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
NetworkPolicies are additive for allow rules but default-deny requires an explicit policy. If you create a NetworkPolicy that selects a pod, all traffic not explicitly allowed is denied for that pod. But if no NetworkPolicy selects a pod, all traffic is allowed. This asymmetry confuses many engineers. To lock down a namespace, start with a default-deny policy and then add explicit allow rules.
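A default-deny policy of the kind described above is a common pattern; a sketch (the empty podSelector selects every pod in the namespace):

```yaml
# Deny all ingress to every pod in the namespace; layer allow rules on top
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # empty selector = all pods in this namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so nothing is allowed
```

Once this exists, every pod in the namespace is isolated, and each allow policy (like the frontend-to-backend example above) punches an explicit hole.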
Layer 7: Ingress Controller — HTTP Routing
The Ingress controller operates at Layer 7. It understands HTTP and can make routing decisions based on:
- Hostname (`Host` header)
- URL path (`/api/v1` goes to one service, `/web` goes to another)
- TLS termination (decrypts HTTPS and forwards HTTP to pods)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - devopsbeast.com
      secretName: tls-secret
  rules:
    - host: devopsbeast.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 3000
Common Ingress controllers: NGINX Ingress, Traefik, HAProxy, AWS ALB Ingress Controller, Istio Gateway.
The newer Gateway API is replacing Ingress for advanced use cases. Gateway API provides more expressive routing (header-based routing, traffic splitting, request mirroring) and a cleaner separation between infrastructure (Gateway) and application (HTTPRoute) configuration. If you are starting a new cluster, evaluate Gateway API before defaulting to Ingress.
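For comparison, the same path-based split as the Ingress above, expressed as a Gateway API HTTPRoute. This is a sketch: it assumes a Gateway named `my-gateway` has already been provisioned by the infrastructure team.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-routes
spec:
  parentRefs:
    - name: my-gateway        # assumed Gateway, owned by the infra team
  hostnames:
    - devopsbeast.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: frontend-service
          port: 3000
```

Note the separation the text describes: TLS and listener configuration live on the Gateway, while application teams own only the HTTPRoute.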
Layer 7: Service Mesh — mTLS, Retries, Observability
Service meshes like Istio and Linkerd inject sidecar proxies into every pod. These proxies operate at Layer 7 and provide:
- mTLS: automatic encryption between all pods (zero-trust networking)
- Retries and timeouts: configurable per-route retry policies
- Traffic routing: canary deployments, A/B testing, blue-green deployments
- Observability: request-level metrics, distributed tracing
# Check if Istio sidecar is injected
kubectl get pod my-pod -o jsonpath='{.spec.containers[*].name}'
# my-app istio-proxy
# ^^^^^^^^^^^ The sidecar proxy — intercepts all traffic
# Check mTLS status between services
istioctl authn tls-check my-pod.production
Troubleshooting by Layer in Kubernetes
Here is the systematic approach when a pod cannot reach a service:
# The complete K8s network debugging sequence:
# Step 1: L3 — Pod-to-Pod IP connectivity
kubectl exec pod-a -- ping -c 3 10.244.1.8
# If fails: CNI issue
# Step 2: L4 — Port connectivity
kubectl exec pod-a -- nc -zv 10.244.1.8 8080
# If fails: NetworkPolicy or service not listening
# Step 3: L4 — Service discovery
kubectl exec pod-a -- nc -zv my-service.default.svc.cluster.local 8080
# If fails but Step 2 works: DNS or kube-proxy issue
# Step 4: L7 — Application response
kubectl exec pod-a -- curl -s http://my-service.default:8080/health
# If fails but Step 3 works: application issue
# Step 5: Check DNS specifically
kubectl exec pod-a -- nslookup my-service.default.svc.cluster.local
# If NXDOMAIN: service does not exist or wrong namespace
# If SERVFAIL: CoreDNS is broken
# Step 6: Check endpoints
kubectl get endpoints my-service
# If empty: no pods match the service selector, or pods are not ready
The most common Kubernetes networking "issue" is not a networking issue at all. It is one of: (1) DNS typo in the service name, (2) pods not ready so endpoints are empty, (3) wrong port number in the service spec, (4) missing or wrong label selector on the service. Always check kubectl get endpoints before assuming the network is broken.
How ClusterIP Actually Works
Let us demystify ClusterIP by tracing what happens when Pod A sends a request to my-service (ClusterIP 10.96.0.42, port 8080):
1. Pod A's application resolves `my-service.default.svc.cluster.local` via CoreDNS, which returns `10.96.0.42`
2. Pod A sends a TCP SYN to `10.96.0.42:8080`
3. The packet enters the host network namespace via the pod's veth pair
4. iptables/IPVS on the host matches the destination (`10.96.0.42:8080`) against the KUBE-SERVICES chain
5. The matching rule performs DNAT (Destination NAT): it rewrites the destination IP from `10.96.0.42` to a pod IP (e.g., `10.244.1.8`)
6. If the target pod is on another node, the packet is routed via the CNI (VXLAN or BGP)
7. The target pod receives a packet that appears to come from Pod A's IP (`10.244.0.5`) with destination `10.244.1.8:8080`
8. The response goes back to Pod A, and conntrack ensures the reverse translation happens correctly
# Trace this with conntrack
conntrack -L -d 10.96.0.42
# tcp 6 117 TIME_WAIT src=10.244.0.5 dst=10.96.0.42 sport=43210 dport=8080
# src=10.244.1.8 dst=10.244.0.5 sport=8080 dport=43210 [ASSURED]
# ^^^^^^^^^^^^^
# The actual destination after DNAT
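The last step is the subtle one: the backend pod replies to Pod A directly, so conntrack must rewrite the reply's source back to the ClusterIP, or Pod A would see a response from an IP it never contacted and reset the connection. A minimal model of that bookkeeping (a sketch, not kernel code):

```python
# Conntrack modeled as a dict keyed by the reply direction's 4-tuple.
# The forward entry stores the original (pre-DNAT) destination so the
# reply can be rewritten back to the ClusterIP Pod A expects.
conntrack = {}

def dnat_outbound(src, sport, cluster_ip, dport, pod_ip):
    """Record the translation when the SYN passes through DNAT."""
    conntrack[(pod_ip, dport, src, sport)] = (cluster_ip, dport)
    return (src, sport, pod_ip, dport)   # packet now targets the pod

def un_dnat_reply(src, sport, dst, dport):
    """Rewrite the reply's source back to the ClusterIP."""
    orig_ip, orig_port = conntrack[(src, sport, dst, dport)]
    return (orig_ip, orig_port, dst, dport)

# Pod A -> ClusterIP 10.96.0.42:8080, DNATed to pod 10.244.1.8:8080
fwd = dnat_outbound("10.244.0.5", 43210, "10.96.0.42", 8080, "10.244.1.8")
reply = un_dnat_reply("10.244.1.8", 8080, "10.244.0.5", 43210)
print(reply)  # ('10.96.0.42', 8080, '10.244.0.5', 43210)
```

This is also why conntrack table state matters for Services: every ClusterIP connection needs an entry for as long as the connection (and its TIME_WAIT tail) lives.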
A team had intermittent connection resets to a ClusterIP service. The issue was conntrack table exhaustion. Their cluster was handling 50,000 connections per second, and the default nf_conntrack_max was 65536. When the conntrack table filled up, new connections were dropped silently. Fix: increase nf_conntrack_max via sysctl, or migrate to Cilium eBPF mode which does not use conntrack for service routing.
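The arithmetic behind that failure is worth doing. A sketch, assuming each closed connection lingers for 120 seconds (the default nf_conntrack_tcp_timeout_time_wait; an assumption, the story above does not state it):

```python
# How fast 50k connections/second exhausts the default conntrack table.
conn_per_sec = 50_000
time_wait_secs = 120          # ASSUMED default TIME_WAIT conntrack timeout
table_max = 65_536            # the nf_conntrack_max from the story above

# Entries needed at steady state far exceed the table size:
steady_state_entries = conn_per_sec * time_wait_secs
print(steady_state_entries)   # 6,000,000 entries needed

# Starting from an empty table, it fills almost instantly:
seconds_to_fill = table_max / conn_per_sec
print(round(seconds_to_fill, 2))  # full in about 1.31 seconds
```

The table demand is roughly connection rate times entry lifetime, which is why raising nf_conntrack_max alone may not be enough, shortening TCP timeouts (or removing conntrack from the path entirely, as eBPF service routing does) attacks the other factor.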
Pod-to-Pod Across Nodes: VXLAN vs BGP
When pods on different nodes need to communicate, the CNI plugin must get the packet from one node to another. The two main approaches are fundamentally different:
| Approach | Mechanism |
|---|---|
| VXLAN Overlay | Encapsulation (L2 over L3) |
| BGP Routing | Native routing (L3) |
Key Concepts Summary
- Kubernetes has three networks: node network (physical/VPC), pod network (CNI-managed), and service network (virtual, iptables/IPVS)
- CNI plugins operate at L2/L3: they assign pod IPs, create veth pairs, and handle cross-node routing via VXLAN or BGP
- kube-proxy operates at L4: it translates ClusterIPs to pod IPs using iptables, IPVS, or eBPF
- NetworkPolicy operates at L3/L4: allows or denies traffic based on pod labels, namespaces, and ports
- Ingress controllers operate at L7: HTTP routing based on hostname and URL path, plus TLS termination
- Service meshes operate at L7: mTLS, retries, traffic routing, and observability via sidecar proxies
- ClusterIP is a virtual IP that only exists in iptables rules — no interface holds this IP
- VXLAN reduces MTU by 50 bytes — misconfigured MTU is a common source of cross-node pod connectivity issues
- Most K8s networking issues are not network issues — they are DNS typos, empty endpoints, wrong ports, or missing labels
Common Mistakes
- Assuming ClusterIP is a real IP on a real interface — it is a DNAT rule, nothing more
- Forgetting that NetworkPolicies are namespace-scoped and only enforced if the CNI supports them (Flannel does not)
- Not checking `kubectl get endpoints` when a service is unreachable — empty endpoints means no pods match the selector
- Ignoring MTU when using overlay networks — VXLAN reduces MTU by 50 bytes, causing fragmentation or dropped packets
- Using `ping` to test service connectivity — ClusterIP does not respond to ICMP by default; use `nc` or `curl` instead
- Confusing Pod DNS names with Service DNS names — `pod-ip.namespace.pod.cluster.local` exists but is rarely useful
A pod can reach another pod directly by IP (10.244.1.8:8080), but cannot reach it via the Service name (my-service:8080). What is the most likely cause?