Networking Fundamentals for DevOps Engineers

Kubernetes Service Types Deep Dive

You create a Service of type LoadBalancer in your Kubernetes cluster. You run kubectl get svc and wait. The EXTERNAL-IP column says <pending>. Five minutes pass. Still pending. Ten minutes. Still pending.

You check the pod — it is Running. The Endpoints are populated. The Service has a ClusterIP. Everything looks correct from inside the cluster. But no external IP appears.

The problem? Your cluster does not have a cloud controller manager configured. There is nothing in the cluster that knows how to talk to your cloud provider and provision a load balancer. The Service is waiting for something that will never come.

To debug this kind of issue, you need to understand what each Service type actually creates under the hood — not just what kubectl get svc shows you.


Part 1: ClusterIP — The Foundation of All Services

Every Kubernetes Service starts as a ClusterIP. Even NodePort and LoadBalancer services have a ClusterIP underneath. Understanding ClusterIP means understanding how all Services work.

What ClusterIP Creates

When you create a ClusterIP Service, Kubernetes assigns a virtual IP address from the Service CIDR range (typically 10.96.0.0/12). This IP does not belong to any network interface. No pod, no node, no device has this IP. It exists only in iptables/IPVS rules.
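As a quick sanity check, Python's ipaddress module can show what "assigned from the Service CIDR" means in practice (the CIDR and ClusterIP below are the example values used throughout this article):

```python
import ipaddress

# A common default Service CIDR, as mentioned above.
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# The ClusterIP assigned to api-service in this article's examples.
cluster_ip = ipaddress.ip_address("10.96.45.123")

print(cluster_ip in service_cidr)   # True: allocated from the Service CIDR
print(service_cidr.num_addresses)   # 1048576 possible Service IPs in a /12
```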

apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: ClusterIP          # default
  selector:
    app: api
  ports:
  - port: 80               # Service port (what clients connect to)
    targetPort: 8080        # Pod port (where the app listens)
    protocol: TCP

$ kubectl get svc api-service
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
api-service   ClusterIP   10.96.45.123   <none>        80/TCP    5m

$ kubectl get endpoints api-service
NAME          ENDPOINTS                                      AGE
api-service   10.244.1.5:8080,10.244.2.8:8080,10.244.3.2:8080   5m

The Service maps 10.96.45.123:80 to the three pod IPs on port 8080. But how does the traffic actually get from the ClusterIP to the pod? That depends on which kube-proxy mode your cluster uses.

KEY CONCEPT

A ClusterIP is a virtual IP that only exists in the data plane rules (iptables, IPVS, or eBPF maps). You cannot ping it. You cannot traceroute to it. No device has this IP on any network interface. It only works because every node in the cluster has rules that intercept traffic to this IP and redirect it to a real pod IP. If you are debugging and try to ping a ClusterIP and it does not respond — that is normal.

kube-proxy Mode: iptables (Default)

In iptables mode, kube-proxy writes NAT rules that intercept traffic destined for the ClusterIP and DNAT (destination NAT) it to a randomly chosen pod IP.

# Simplified view of what kube-proxy creates for a 3-pod Service:

# Step 1: Match traffic to the Service ClusterIP
iptables -t nat -A KUBE-SERVICES \
  -d 10.96.45.123/32 -p tcp --dport 80 \
  -j KUBE-SVC-ABCDEF

# Step 2: Probabilistic load balancing
# Rule 1: 33.3% chance → Pod A
iptables -t nat -A KUBE-SVC-ABCDEF \
  -m statistic --mode random --probability 0.33333 \
  -j KUBE-SEP-POD-A

# Rule 2: 50% of remaining (= 33.3% total) → Pod B
iptables -t nat -A KUBE-SVC-ABCDEF \
  -m statistic --mode random --probability 0.50000 \
  -j KUBE-SEP-POD-B

# Rule 3: Everything else (33.3%) → Pod C
iptables -t nat -A KUBE-SVC-ABCDEF \
  -j KUBE-SEP-POD-C

# Step 3: DNAT to the actual pod IP
iptables -t nat -A KUBE-SEP-POD-A \
  -p tcp -j DNAT --to-destination 10.244.1.5:8080

WARNING

iptables mode does not do real load balancing. It uses random selection via the statistic module. There is no "least connections" or "round robin" — just random. For most workloads this is fine. But if you have long-lived connections (WebSocket, gRPC streams), the randomness can lead to uneven distribution. One pod might get 5 long-lived connections while another gets 1.
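The cascade above generalizes: with n endpoints, rule i (zero-indexed) matches with probability 1/(n-i), which works out to a uniform 1/n overall. A quick simulation of the selection logic (a sketch, not real iptables) confirms it:

```python
import random
from collections import Counter

def pick_endpoint(endpoints, rng):
    """Mimic the KUBE-SVC chain: rule i matches with probability 1/(n-i)."""
    n = len(endpoints)
    for i, ep in enumerate(endpoints):
        # Corresponds to: -m statistic --mode random --probability 1/(n-i)
        if rng.random() < 1.0 / (n - i):
            return ep
    return endpoints[-1]   # never reached: the last rule has probability 1

rng = random.Random(42)
hits = Counter(pick_endpoint(["pod-a", "pod-b", "pod-c"], rng)
               for _ in range(30_000))
for pod in sorted(hits):
    print(pod, round(hits[pod] / 30_000, 3))   # each lands close to 0.333
```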

kube-proxy Mode: IPVS

IPVS (IP Virtual Server) is a Linux kernel module designed specifically for load balancing. When kube-proxy runs in IPVS mode, it creates IPVS virtual servers instead of iptables rules.

# Check if your cluster uses IPVS
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep "Using ipvs"

# View IPVS rules
ipvsadm -Ln
# IP Virtual Server version 1.2.1 (size=4096)
# Prot LocalAddress:Port Scheduler Flags
#   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
# TCP  10.96.45.123:80 rr
#   -> 10.244.1.5:8080              Masq    1      12         45
#   -> 10.244.2.8:8080              Masq    1      11         42
#   -> 10.244.3.2:8080              Masq    1      13         48

IPVS provides real load balancing algorithms:

Algorithm                 Flag   Description
Round Robin               rr     Sequential rotation (default)
Least Connections         lc     Fewest active connections
Destination Hashing       dh     Hash destination IP — consistent routing
Source Hashing            sh     Hash source IP — session affinity
Shortest Expected Delay   sed    Factors in connection count AND weight
Never Queue               nq     Sends to idle server if available, else SED

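To make the rr and lc schedulers concrete, here is a toy sketch of the two selection strategies (illustrative logic only, not IPVS internals), using the pod IPs and ActiveConn counts from the ipvsadm output above:

```python
from itertools import cycle

backends = ["10.244.1.5", "10.244.2.8", "10.244.3.2"]

# Round robin (rr): strict rotation, blind to how busy each backend is.
rr = cycle(backends)
print([next(rr) for _ in range(4)])
# ['10.244.1.5', '10.244.2.8', '10.244.3.2', '10.244.1.5']

# Least connections (lc): pick the backend with the fewest active connections
# (counts taken from the ActiveConn column of the ipvsadm output).
active = {"10.244.1.5": 12, "10.244.2.8": 11, "10.244.3.2": 13}

def least_connections(conns):
    return min(conns, key=conns.get)

print(least_connections(active))   # 10.244.2.8
```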
# Switch kube-proxy to IPVS mode
kubectl edit configmap kube-proxy -n kube-system
# Change: mode: "ipvs"
# Change: ipvs.scheduler: "lc"   # least connections

# Restart kube-proxy pods to apply
kubectl rollout restart daemonset kube-proxy -n kube-system

PRO TIP

IPVS mode is significantly better than iptables at scale. With iptables, adding a Service with 100 endpoints means adding 100+ iptables rules. Rule evaluation is O(n) — the kernel walks through rules linearly. IPVS uses hash tables for O(1) lookups. If your cluster has more than 1,000 Services or more than 5,000 Endpoints, switch to IPVS mode. The performance difference is dramatic.

kube-proxy Replacement: eBPF (Cilium)

Cilium can replace kube-proxy entirely using eBPF programs attached to the kernel networking stack. This is the most performant option.

# Cilium kube-proxy replacement:
# - No iptables rules for Services
# - No IPVS virtual servers
# - eBPF programs attached to socket and TC hooks
# - Socket-level redirection: traffic is redirected BEFORE
#   it even enters the kernel networking stack

# Check if Cilium is replacing kube-proxy
cilium status | grep KubeProxyReplacement
# KubeProxyReplacement:   True

kube-proxy Modes: iptables vs IPVS vs eBPF

                         iptables (default)               IPVS / eBPF
Summary                  Simple, works everywhere         Scalable, real load balancing
Load balancing           Random (statistic module)        Real algorithms (rr, lc, sh, etc.)
Algorithms               Random only                      6+ options (rr, lc, dh, sh, sed, nq)
Lookup performance       O(n) — linear rule walk          O(1) — hash table (IPVS) / eBPF map
Rule count per Service   ~5 rules per endpoint            1 virtual server per Service
Scalability              Degrades above 5,000 Services    Handles 10,000+ Services easily
Connection tracking      conntrack (kernel)               conntrack (IPVS) / eBPF CT map
When to use              Small clusters, default setup    Large clusters, performance-critical

Part 2: NodePort — Exposing Services Outside the Cluster

NodePort builds on top of ClusterIP. It takes a ClusterIP Service and adds one thing: it opens a specific port on every node in the cluster.

apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  selector:
    app: api
  ports:
  - port: 80              # ClusterIP port (internal)
    targetPort: 8080       # Pod port
    nodePort: 30080        # Port opened on every node (30000-32767)
    protocol: TCP

$ kubectl get svc api-service
NAME          TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
api-service   NodePort   10.96.45.123   <none>        80:30080/TCP   5m

# Now reachable at ANY node IP on port 30080:
curl http://node-1-ip:30080    # works
curl http://node-2-ip:30080    # works (even if no api pods run on node-2)
curl http://node-3-ip:30080    # works

The traffic flow: Client → NodeIP:30080 → iptables DNAT → PodIP:8080

KEY CONCEPT

NodePort opens the port on ALL nodes, not just nodes running the target pods. If you have 100 nodes and only 3 run your pods, all 100 nodes accept traffic on port 30080 and forward it to one of the 3 pods. This is useful because external load balancers can point at all nodes without knowing which ones run the pods.

externalTrafficPolicy: Cluster vs Local

This is one of the most important and least understood Service settings.

apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  externalTrafficPolicy: Local   # or Cluster (default)

Cluster (default): Traffic arriving at any node is distributed across ALL pods in the Service, regardless of which node they run on. A request might arrive at Node A and be forwarded to a pod on Node C. The downside: an extra network hop, and the client source IP is lost (because the node SNATs the packet).

Local: Traffic arriving at a node is only sent to pods running ON THAT NODE. If no pods run on that node, the traffic is dropped; cloud load balancers rely on health checks to detect and avoid nodes without local pods. The upside: no extra hop, and the source IP is preserved. The downside: uneven distribution if pods are not evenly spread across nodes.

# externalTrafficPolicy: Cluster (default)
# Client (1.2.3.4) → Node A:30080
#   → iptables SNATs source to Node A IP
#   → Forwards to Pod on Node C
#   → Pod sees source IP = Node A (NOT the client)

# externalTrafficPolicy: Local
# Client (1.2.3.4) → Node A:30080
#   → iptables forwards to Pod on Node A (local only)
#   → Pod sees source IP = 1.2.3.4 (the real client IP!)

WAR STORY

Our security team needed client source IPs for audit logging. All our Services were using the default externalTrafficPolicy: Cluster, which meant every request appeared to come from an internal node IP. We switched to Local and got the source IPs, but immediately saw uneven per-pod load: two nodes ran 3 pods each and three nodes ran 1 pod each, and since the load balancer split traffic evenly across nodes, each solo pod handled three times the traffic of a pod on a three-pod node. The fix was a combination of Local policy and a PodAntiAffinity rule to spread pods evenly across nodes.
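The skew is easy to quantify. Assuming the external load balancer splits traffic evenly across nodes (typical behavior with equal-weight node targets), per-pod load under Local is inversely proportional to how many pods share a node:

```python
# Pod placement from the story: two nodes with three pods, three with one.
pods_per_node = {"node-1": 3, "node-2": 3, "node-3": 1, "node-4": 1, "node-5": 1}

# With externalTrafficPolicy: Local, each node's share stays on that node,
# split among its local pods only.
node_share = 1 / len(pods_per_node)            # 20% of traffic per node
for node, pods in sorted(pods_per_node.items()):
    print(f"{node}: {pods} pod(s) -> {node_share / pods:.1%} of traffic per pod")
# Solo pods carry 20.0% each; pods sharing a node carry only 6.7% each.
```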



Part 3: LoadBalancer — Cloud Integration

LoadBalancer type builds on NodePort. It creates a NodePort Service AND asks the cloud provider to create an external load balancer pointing at the NodePort on all nodes.

apiVersion: v1
kind: Service
metadata:
  name: api-service
  annotations:
    # AWS-specific: use NLB instead of Classic LB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # AWS: make it internal (VPC only)
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

$ kubectl get svc api-service
NAME          TYPE           CLUSTER-IP     EXTERNAL-IP                          PORT(S)        AGE
api-service   LoadBalancer   10.96.45.123   a1b2c3-1234567890.elb.amazonaws.com  80:31245/TCP   2m

How It Works Under the Hood

The cloud controller manager (running in the cluster or as a cloud-hosted component) watches for Services of type LoadBalancer. When it sees one:

  1. Creates a cloud load balancer (AWS NLB/CLB, GCP Network LB, Azure LB)
  2. Configures health checks pointing at the NodePort on all nodes
  3. Registers all nodes as targets
  4. Updates the Service with the external IP/hostname in the status.loadBalancer.ingress field

KEY CONCEPT

The "EXTERNAL-IP pending" problem has exactly three causes: (1) The cloud controller manager is not running or not configured with correct credentials. (2) You hit a cloud quota limit (e.g., max Elastic IPs, max load balancers). (3) You are running on bare metal with no cloud integration — use MetalLB instead. Check cloud controller manager logs first: kubectl logs -n kube-system -l component=cloud-controller-manager.

Cloud Provider Specifics

# AWS: the legacy in-tree provider creates a Classic LB by default;
# annotations select and configure an NLB instead:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..."

# GCP: Creates a Network Load Balancer (L4)
# For L7, use GKE Ingress with BackendConfig instead

# Azure: Creates an Azure Load Balancer
# Annotations control SKU and internal/external
service.beta.kubernetes.io/azure-load-balancer-internal: "true"

WARNING

Each LoadBalancer Service creates a separate cloud load balancer. If you have 20 services of type LoadBalancer, you have 20 cloud load balancers — each with its own IP, its own cost, and its own management overhead. This gets expensive fast. Use an Ingress Controller with a single LoadBalancer Service instead, and route multiple services through path-based or host-based rules.


Part 4: ExternalName and Headless Services

ExternalName — DNS Alias

ExternalName is the simplest Service type. It creates a DNS CNAME record. No proxying, no load balancing, no iptables rules.

apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  type: ExternalName
  externalName: mydb.abc123.us-east-1.rds.amazonaws.com

# Inside any pod:
dig database.default.svc.cluster.local
# Returns: CNAME mydb.abc123.us-east-1.rds.amazonaws.com

# Your app connects to "database:5432" and it resolves to the RDS endpoint.
# If you migrate databases, change the ExternalName — no app changes needed.

PRO TIP

ExternalName is useful for giving cluster-internal DNS names to external services (RDS databases, SaaS APIs, services in other clusters). It lets your application code reference database.default.svc.cluster.local instead of a cloud-specific hostname. If you migrate from AWS RDS to a self-hosted database inside the cluster, you just change the Service type from ExternalName to ClusterIP — no application changes.

Headless Service (clusterIP: None)

A headless Service has no ClusterIP. Instead of returning a virtual IP, DNS returns the IP addresses of all pods directly.

apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  clusterIP: None          # Headless — no virtual IP
  selector:
    app: database
  ports:
  - port: 5432

# Normal Service DNS:
dig api-service.default.svc.cluster.local
# Returns: 10.96.45.123 (ClusterIP — virtual IP)

# Headless Service DNS:
dig database.default.svc.cluster.local
# Returns: 10.244.1.5, 10.244.2.8, 10.244.3.2 (actual pod IPs!)

When to use headless Services:

  • StatefulSets — each pod needs a stable DNS name (database-0.database.default.svc.cluster.local)
  • Client-side load balancing — the client (or client library) picks which pod to connect to
  • Service discovery — the client needs to know all pod IPs (e.g., for gossip protocols, Elasticsearch cluster formation)
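A minimal sketch of client-side load balancing against a headless Service. The resolver is stubbed with the pod IPs from the dig output above, since real resolution needs cluster DNS (inside a pod you would call socket.getaddrinfo instead):

```python
import itertools

def resolve_headless(name):
    """Stub for DNS resolution of a headless Service.

    Inside a real pod you would call socket.getaddrinfo(name, 5432);
    because there is no ClusterIP, DNS returns every ready pod IP.
    """
    return ["10.244.1.5", "10.244.2.8", "10.244.3.2"]

# Client-side round robin: the client, not kube-proxy, picks the pod.
pod_ips = itertools.cycle(resolve_headless("database.default.svc.cluster.local"))
for _ in range(4):
    print(next(pod_ips))   # rotates through the three pod IPs, then wraps
```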

Kubernetes Service Types — Layered

LoadBalancer

Everything below, PLUS creates a cloud load balancer with an external IP/hostname. Cloud controller manager provisions the LB and registers nodes as targets. Each LoadBalancer Service creates a separate cloud LB.

NodePort

Everything below, PLUS opens a port (30000-32767) on every node in the cluster. External traffic reaches NodeIP:NodePort and gets forwarded to a pod via iptables/IPVS. externalTrafficPolicy controls routing and source IP preservation.

ClusterIP

The foundation. Assigns a virtual IP from the Service CIDR. kube-proxy programs iptables/IPVS/eBPF rules on every node to DNAT traffic from ClusterIP:port to a randomly selected PodIP:targetPort. Only reachable from inside the cluster.

Endpoints

The Endpoints controller watches pods matching the Service selector and populates the Endpoints object with their IPs and ports. This is the source of truth for which pods receive traffic. No matching pods = empty Endpoints = Service routes nowhere.

Pods

The actual application containers. Each pod has a unique IP assigned by the CNI. Pods are ephemeral — their IPs change on restart. This is why Services exist: to provide a stable IP in front of changing pod IPs.



Part 5: Debugging Service Connectivity

When a Service is not routing traffic, work through this checklist:

# Step 1: Does the Service exist?
kubectl get svc api-service -n production
# Check: TYPE, CLUSTER-IP, PORT(S), EXTERNAL-IP

# Step 2: Do Endpoints exist?
kubectl get endpoints api-service -n production
# If empty: no pods match the selector, or pods are not Ready
# CRITICAL: selector labels must EXACTLY match pod labels

# Step 3: Does DNS resolve?
kubectl run debug --image=nicolaka/netshoot -it --rm -- \
  dig api-service.production.svc.cluster.local
# Should return the ClusterIP

# Step 4: Can you reach the ClusterIP from inside the cluster?
kubectl run debug --image=nicolaka/netshoot -it --rm -- \
  curl -v http://api-service.production.svc.cluster.local:80
# If timeout: check NetworkPolicy, kube-proxy logs, iptables rules

# Step 5: Can you reach the pod directly (bypassing the Service)?
kubectl run debug --image=nicolaka/netshoot -it --rm -- \
  curl -v http://10.244.1.5:8080
# If this works but Step 4 fails: the issue is in kube-proxy/iptables

# Step 6: Check kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy | tail -50

# Step 7: Check iptables rules on the node (if iptables mode)
# SSH to the node, then:
iptables -t nat -L KUBE-SERVICES | grep api-service

WAR STORY

A developer created a Service with selector: {app: api} but their pods had label app: api-server. The Service had zero Endpoints. Traffic went nowhere. They spent 2 hours checking NetworkPolicies, DNS, and firewall rules. The fix was changing one label. Always start debugging with kubectl get endpoints — if it is empty, the problem is in your selector, not the network.
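Selector matching is plain subset logic: every key/value pair in the Service selector must appear in the pod's labels. A few lines of Python reproduce the failure mode from this war story:

```python
def selector_matches(selector, pod_labels):
    """A Service selects a pod iff every selector pair appears in its labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

selector = {"app": "api"}

print(selector_matches(selector, {"app": "api-server"}))          # False
print(selector_matches(selector, {"app": "api", "tier": "web"}))  # True
```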

PRO TIP

EndpointSlices (beta since Kubernetes 1.17, stable in 1.21) are a more scalable replacement for Endpoints. If your Service has hundreds of pods, check kubectl get endpointslices instead of kubectl get endpoints. EndpointSlices break the endpoint list into smaller chunks (default 100 per slice) to reduce API server load when pods change frequently.
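The chunking is easy to picture. With the controller default of 100 endpoints per slice (tunable via the kube-controller-manager flag --max-endpoints-per-slice), a hypothetical 250-pod Service yields three slices:

```python
def slice_endpoints(endpoints, max_per_slice=100):
    """Mimic how the EndpointSlice controller chunks a large endpoint list."""
    return [endpoints[i:i + max_per_slice]
            for i in range(0, len(endpoints), max_per_slice)]

# Hypothetical Service backed by 250 pods.
endpoints = [f"10.244.0.{i}" for i in range(250)]
slices = slice_endpoints(endpoints)
print([len(s) for s in slices])   # [100, 100, 50]
```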


Key Concepts Summary

  • ClusterIP is the foundation — a virtual IP that exists only in iptables/IPVS/eBPF rules, reachable only from inside the cluster
  • kube-proxy iptables mode uses random selection, O(n) rule evaluation — fine for small clusters, degrades at scale
  • kube-proxy IPVS mode provides real load balancing algorithms and O(1) lookups — switch to it for clusters with more than 1,000 Services
  • eBPF (Cilium) replaces kube-proxy entirely with socket-level redirection — most performant option
  • NodePort opens a port on every node (30000-32767) — all nodes route traffic, not just nodes with pods
  • externalTrafficPolicy: Local preserves client source IPs but requires even pod distribution across nodes
  • LoadBalancer creates a cloud load balancer automatically — each Service creates a separate LB, which gets expensive
  • ExternalName is a DNS CNAME alias — no proxying, useful for referencing external services with cluster-internal DNS
  • Headless Services (clusterIP: None) return pod IPs directly in DNS — required for StatefulSets and client-side load balancing
  • Empty Endpoints is the number one cause of "Service not routing" — always check selector labels match pod labels

Common Mistakes

  • Assuming you can ping a ClusterIP — it is a virtual IP in iptables rules, not a real network interface, and ping (ICMP) is not translated by DNAT rules
  • Using externalTrafficPolicy: Local without pod anti-affinity — leads to severely uneven load distribution
  • Creating many LoadBalancer Services instead of one Ingress Controller — wastes cloud load balancers and money
  • Forgetting that NodePort range is 30000-32767 — specifying a port outside this range will be rejected
  • Mismatched selector labels between Service and pods — the most common cause of empty Endpoints
  • Not checking EndpointSlices in large clusters — the legacy Endpoints object is truncated at 1,000 addresses, so kubectl get endpoints can under-report the real endpoint set
  • Leaving externalTrafficPolicy: Cluster when source IP preservation is needed — all requests appear to come from internal node IPs

KNOWLEDGE CHECK

You have a NodePort Service with externalTrafficPolicy: Local. Node A has 2 pods, Node B has 0 pods. What happens when traffic arrives at Node B on the NodePort?