L4 vs L7 Load Balancing
You have an AWS NLB routing traffic to 10 pods behind a Kubernetes Service. Monitoring shows all 10 pods as healthy — every one passes its TCP health check every 10 seconds. But your users are reporting intermittent 500 errors.
You check the pod logs. Sure enough, one pod is returning HTTP 500 for every single request. The application crashed internally, but the process is still running and accepting TCP connections.
The load balancer keeps sending traffic to it. Why? Because your health check operates at Layer 4. It can establish a TCP connection. It has no idea what HTTP status code the application returns. This is the fundamental difference between L4 and L7 load balancing — and understanding it will save you hours of debugging.
Part 1: Layer 4 Load Balancing — Fast, Simple, Blind
Layer 4 load balancing operates at the transport layer of the OSI model. It sees exactly four things about every connection:
- Source IP address
- Destination IP address
- Source port
- Destination port
That is it. An L4 load balancer never opens the TCP payload. It never reads HTTP headers, URL paths, cookies, or request bodies. It makes routing decisions based purely on IP addresses and port numbers, then forwards raw TCP (or UDP) packets to a backend.
# What an L4 load balancer "sees" for each connection:
# Source: 192.168.1.50:49832
# Destination: 10.0.0.100:443
# Protocol: TCP
#
# That is ALL the information it has to make a routing decision.
# It cannot see: GET /api/users HTTP/1.1
# It cannot see: Host: api.example.com
# It cannot see: Cookie: session=abc123
An L4 load balancer is a packet forwarder. It receives a TCP SYN, picks a backend using its algorithm, rewrites the destination IP/port, and forwards the packet. It never terminates the TCP connection — the client talks directly to the backend through the load balancer. This is why L4 is fast: there is no protocol parsing overhead.
L4 Load Balancing Algorithms
Since L4 load balancers cannot see application-level data, their routing algorithms are simple:
Round Robin — send each new connection to the next backend in sequence. Backend 1, then 2, then 3, then back to 1. Simple and fair when all backends are equal.
Least Connections — send each new connection to the backend with the fewest active connections. Better than round robin when request durations vary (one slow request does not cause pile-up on a single backend).
Source IP Hash — hash the client IP and always send that client to the same backend. Provides primitive session affinity without cookies. Breaks when clients share an IP (corporate NAT, mobile carriers).
Weighted Round Robin — same as round robin but some backends get more connections. Useful when backends have different capacity (e.g., 8-core and 16-core nodes).
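The first three algorithms can be modeled in a few lines. A minimal Python sketch (a toy model using the pod IPs from the example below — not any real load balancer's code):

```python
import hashlib
from itertools import cycle

backends = ["10.244.1.5", "10.244.2.8", "10.244.3.2"]

# Round robin: each new connection goes to the next backend in sequence.
rr = cycle(backends)
def round_robin():
    return next(rr)

# Least connections: pick the backend with the fewest active connections.
# A real balancer also decrements the count when a connection closes.
active = {b: 0 for b in backends}
def least_connections():
    choice = min(active, key=active.get)
    active[choice] += 1
    return choice

# Source IP hash: the same client IP always maps to the same backend,
# giving primitive session affinity without cookies.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

Note that least connections behaves exactly like round robin when every request finishes instantly; the difference only shows up when one backend holds connections longer than the others.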
# Example: iptables-based L4 load balancing (what kube-proxy does)
# Three pods behind a ClusterIP Service:
# Pod A: 10.244.1.5:8080
# Pod B: 10.244.2.8:8080
# Pod C: 10.244.3.2:8080
# kube-proxy creates these iptables rules (simplified):
iptables -t nat -A KUBE-SERVICES \
-d 10.96.0.100/32 -p tcp --dport 80 \
-j KUBE-SVC-XXXX
# Each pod gets a probability-based chain:
iptables -t nat -A KUBE-SVC-XXXX \
-m statistic --mode random --probability 0.333 \
-j KUBE-SEP-POD-A # DNAT to 10.244.1.5:8080
iptables -t nat -A KUBE-SVC-XXXX \
-m statistic --mode random --probability 0.500 \
-j KUBE-SEP-POD-B # DNAT to 10.244.2.8:8080
iptables -t nat -A KUBE-SVC-XXXX \
-j KUBE-SEP-POD-C # DNAT to 10.244.3.2:8080
The iptables probability math looks wrong but it is correct. The first rule fires 33.3% of the time. Of the remaining 66.7%, the second rule fires 50% of that (which is 33.3% of total). The third rule catches everything else (33.3%). This gives equal distribution across three pods. Add a fourth pod and the probabilities become 0.25, 0.333, 0.5, and fallthrough.
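You can check this arithmetic with a quick simulation — a sketch of the sequential rule evaluation in plain Python, not kube-proxy itself:

```python
import random

def pick_pod(rng):
    # Mirror the sequential iptables rules: fire at 0.333, then at 0.5
    # of the remainder, then fall through to the last pod.
    if rng.random() < 1 / 3:
        return "pod-a"
    if rng.random() < 1 / 2:
        return "pod-b"
    return "pod-c"

rng = random.Random(0)
trials = 100_000
counts = {"pod-a": 0, "pod-b": 0, "pod-c": 0}
for _ in range(trials):
    counts[pick_pod(rng)] += 1

for pod, n in counts.items():
    print(pod, round(n / trials, 3))  # each pod lands near 0.333
```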
Real-World L4 Load Balancers
| Load Balancer | Where You See It | Key Characteristics |
|---|---|---|
| AWS NLB | Cloud infrastructure | Millions of requests/sec, static IP, TLS passthrough |
| kube-proxy (iptables) | Every K8s cluster | Default Service implementation, random selection |
| kube-proxy (IPVS) | Large K8s clusters | Real algorithms (rr, lc, dh, sh), better at scale |
| MetalLB | Bare-metal K8s | Announces Service IPs via ARP/BGP |
| HAProxy (TCP mode) | Traditional infrastructure | Feature-rich, health checks, connection draining |
Part 2: Layer 7 Load Balancing — Slower, Smarter, Content-Aware
Layer 7 load balancing operates at the application layer. It terminates the client TCP connection, parses the HTTP request (or gRPC, WebSocket, etc.), and makes routing decisions based on the full request content.
An L7 load balancer sees everything:
# What an L7 load balancer "sees":
# Everything L4 sees, PLUS:
#
# HTTP Method: GET
# URL Path: /api/v2/users/12345
# Host Header: api.example.com
# Headers: Authorization: Bearer eyJhbG...
# Content-Type: application/json
# X-Request-ID: abc-123
# Cookies: session=xyz789; region=us-east
# Query Params: ?include=profile&limit=50
# Request Body: {"name": "updated-name"}
This visibility enables powerful routing strategies that L4 simply cannot do.
In short: Layer 4 (Transport) is fast, simple, and protocol-agnostic; Layer 7 (Application) is smart, content-aware, and protocol-specific.
L7 Routing Strategies
Path-based routing — route by URL path. /api/* goes to the API service, /static/* goes to the CDN origin, /admin/* goes to the admin service.
# Kubernetes Ingress with path-based routing
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
Header-based routing — route by HTTP header. Useful for canary deployments: if X-Canary: true, route to the canary backend.
Weighted routing — split traffic by percentage. Send 95% to v1 and 5% to v2. Gradually shift weight during rollouts.
Cookie-based sticky sessions — always route a user to the same backend based on a session cookie. Necessary for stateful applications that store session data in memory (though you should fix the app instead).
Sticky sessions are a code smell. They mean your application stores state in local memory instead of an external store (Redis, database). When the sticky backend crashes, the user loses their session. Design stateless services and store session data externally. Use sticky sessions only as a temporary workaround while you fix the architecture.
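The weighted split described above can be sketched in a few lines of Python (the `v1`/`v2` names are illustrative, not tied to any real load balancer config):

```python
import random

# Weighted routing: ~95% of requests to v1, ~5% to the v2 canary.
# The split applies per request, not per user.
weights = {"v1": 95, "v2": 5}
versions = list(weights)

def pick_version(rng):
    return rng.choices(versions, weights=list(weights.values()))[0]

rng = random.Random(7)
trials = 10_000
canary_hits = sum(pick_version(rng) == "v2" for _ in range(trials))
print(f"v2 share: {canary_hits / trials:.1%}")  # close to 5%
```

Shifting the weights (95/5 → 80/20 → 50/50 → 0/100) is exactly how a gradual rollout progresses.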
Real-World L7 Load Balancers
| Load Balancer | Where You See It | Key Characteristics |
|---|---|---|
| AWS ALB | Cloud infrastructure | Path/header routing, WebSocket support, WAF integration |
| NGINX Ingress | K8s clusters | Most popular Ingress controller, annotation-driven config |
| Envoy | Service meshes (Istio) | xDS dynamic config, gRPC-native, advanced observability |
| Traefik | K8s clusters | Auto-discovery, built-in Let's Encrypt, simpler config |
| HAProxy (HTTP mode) | Traditional infrastructure | Very fast, rich ACL system, battle-tested |
Part 3: Health Checks — Where L4 and L7 Diverge Most
This is where the lesson from the opening scenario becomes concrete. Health checks determine whether a backend receives traffic. The layer at which you check determines what failures you can detect.
L4 Health Checks (TCP)
An L4 health check sends a TCP SYN packet to the backend port. If it gets a SYN-ACK back (meaning the port is open and accepting connections), the backend is "healthy."
# What an L4 health check does (equivalent):
nc -zv 10.244.1.5 8080
# Connection to 10.244.1.5 8080 port [tcp/*] succeeded!
# Result: HEALTHY
# But the app behind that port might be:
# - Returning HTTP 500 for every request
# - Deadlocked and not processing any requests
# - Connected to a dead database and timing out
# - Serving stale data from a broken cache
#
# The L4 health check cannot detect ANY of these conditions.
We had a Java service that hit a deadlock in its connection pool. The JVM was still running, the port was still open, TCP health checks passed perfectly. But every HTTP request hung for 30 seconds and then timed out. The L4 load balancer kept sending traffic to it for 45 minutes before someone noticed. Switching to an HTTP health check on /health that verified database connectivity would have caught it in seconds.
L7 Health Checks (HTTP)
An L7 health check sends a real HTTP request (usually GET /health or GET /healthz) and checks the response:
# What an L7 health check does (equivalent):
curl -s -o /dev/null -w "%{http_code}" http://10.244.1.5:8080/health
# 200
# Result: HEALTHY
curl -s -o /dev/null -w "%{http_code}" http://10.244.1.5:8080/health
# 500
# Result: UNHEALTHY — remove from rotation
# A well-designed /health endpoint checks:
# - Can the app connect to its database?
# - Can it reach required downstream services?
# - Is it ready to serve traffic (not still warming up)?
Always use L7 (HTTP) health checks for HTTP services. L4 (TCP) health checks only verify that the process is running and the port is open. L7 health checks verify that the application is actually functional. The small overhead of an HTTP health check (typically 1-5ms) is negligible compared to the cost of routing traffic to a broken backend.
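The difference is easy to reproduce locally. This sketch (Python standard library only; the port is ephemeral and the `/health` path is arbitrary) stands up a deliberately broken HTTP app and runs both styles of check against it:

```python
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A "crashed" app: the process runs, the port accepts connections,
# but every request gets HTTP 500.
class BrokenApp(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(500)
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), BrokenApp)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# L4-style check: can we complete a TCP handshake? Yes -> "healthy".
with socket.create_connection(("127.0.0.1", port), timeout=2):
    l4_healthy = True

# L7-style check: does GET /health return a 2xx? No -> unhealthy.
try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=2)
    l7_healthy = True
except urllib.error.HTTPError:
    l7_healthy = False

print("L4:", l4_healthy, "L7:", l7_healthy)  # L4: True L7: False
server.shutdown()
```

This is the opening scenario in miniature: the TCP check never sees the 500s.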
Health Check Configuration in Kubernetes
Kubernetes has three probe types that operate at the application level:
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    ports:
    - containerPort: 8080
    # Liveness: is the container still running?
    # Failure → restart the container
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      failureThreshold: 3
    # Readiness: is the container ready for traffic?
    # Failure → remove from Service endpoints (stop sending traffic)
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
    # Startup: is the container still starting up?
    # Failure → restart the container (gives slow apps time to start)
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 30 # 30 * 5s = 150s to start
Separate your liveness and readiness endpoints. The liveness probe answers "is this process broken beyond repair?" (restart it). The readiness probe answers "can this instance handle requests right now?" (stop sending traffic). A pod that is overloaded should fail readiness (remove from rotation) but NOT fail liveness (do not restart it — that makes things worse).
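A minimal probe server illustrating that separation — a sketch with a hypothetical `overloaded` flag; a real app would derive readiness from queue depth, connection pool saturation, or dependency checks:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"overloaded": False}  # hypothetical in-process load signal

class Probes(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process can still serve HTTP at all.
            self.send_response(200)
        elif self.path == "/ready":
            # Readiness: shed traffic while overloaded, without
            # triggering a restart.
            self.send_response(503 if state["overloaded"] else 200)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Probes)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def check(path):
    try:
        return urllib.request.urlopen(
            f"http://127.0.0.1:{port}{path}", timeout=2).status
    except urllib.error.HTTPError as e:
        return e.code

state["overloaded"] = True
alive, ready = check("/healthz"), check("/ready")
print(alive, ready)  # 200 503 -> keep the pod, stop sending it traffic
server.shutdown()
```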
Part 4: Connection Draining — The Graceful Shutdown Problem
When a pod is being terminated (scaling down, rolling update, node drain), what happens to active connections? If the load balancer immediately stops sending traffic AND drops existing connections, users see errors.
Connection draining (also called "graceful shutdown" or "deregistration delay") solves this:
- Mark the backend as "draining" — stop sending NEW connections
- Allow EXISTING connections to complete (up to a timeout)
- After the timeout, forcefully close remaining connections
- Remove the backend entirely
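Those four steps amount to a small state machine. A toy Python model (class name and timings are illustrative, not any real load balancer's implementation):

```python
import time

class Backend:
    """Toy model of a load balancer backend being drained."""

    def __init__(self, in_flight=0):
        self.state = "active"
        self.in_flight = in_flight

    def accept(self):
        # Step 1: a draining backend refuses NEW connections.
        if self.state != "active":
            raise ConnectionRefusedError("backend is draining")
        self.in_flight += 1

    def finish_one(self):
        self.in_flight -= 1

    def begin_drain(self, timeout_s, poll_s=0.01):
        self.state = "draining"
        # Step 2: let existing connections complete, up to the timeout.
        deadline = time.monotonic() + timeout_s
        while self.in_flight > 0 and time.monotonic() < deadline:
            time.sleep(poll_s)
        # Step 3: forcefully close whatever is left at the deadline.
        self.in_flight = 0
        # Step 4: remove the backend entirely.
        self.state = "removed"
```

With no in-flight requests the drain completes immediately; with requests still open it waits up to `timeout_s` before force-closing, which is exactly what `deregistration_delay` bounds on an ALB.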
# AWS ALB deregistration delay (Terraform)
resource "aws_lb_target_group" "app" {
  deregistration_delay = 30 # seconds to wait for in-flight requests
}

# Kubernetes: Pod termination grace period
apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 30 # default, matches ALB
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          # Give kube-proxy time to update iptables rules
          # BEFORE the app starts shutting down
          command: ["sh", "-c", "sleep 5"]
There is a race condition in Kubernetes pod termination. When a pod is deleted, two things happen simultaneously: (1) the kubelet sends SIGTERM to the container, and (2) the Endpoints controller removes the pod from the Service. If the app shuts down before iptables rules are updated, in-flight requests from other pods hit a closed port. The preStop sleep (5-10 seconds) gives kube-proxy time to propagate the change before the app exits.
Part 5: When to Use L4 vs L7
The decision is not always obvious. Here is a practical framework:
Use L4 load balancing when:
- You need maximum throughput with minimum latency (every microsecond counts)
- The traffic is not HTTP (databases, message queues, custom TCP protocols, gRPC without path routing)
- You want TLS passthrough (the backend handles TLS, not the load balancer)
- You are load balancing across Availability Zones at the network level
Use L7 load balancing when:
- You need path-based or header-based routing (microservices behind one domain)
- You want the load balancer to terminate TLS (centralized certificate management)
- You need HTTP-level health checks (verify application health, not just port)
- You want advanced traffic management (canary deployments, A/B testing, rate limiting)
- You need request-level observability (HTTP status codes, latency per path)
In Kubernetes, you typically use both:
L4 + L7 in a Typical Kubernetes Setup
The NLB (L4) handles raw TCP traffic from the internet, distributes it across Ingress Controller pods, and provides a static IP. The Ingress Controller (L7) terminates TLS, inspects HTTP requests, and routes /api/* to the API service and / to the web service. Inside the cluster, kube-proxy (L4) distributes traffic from the ClusterIP to individual pods.
If you only have one backend service (no path routing needed), you can skip the Ingress Controller entirely and use a Service type LoadBalancer with an NLB. This removes a hop, reduces latency, and simplifies your stack. Only add an Ingress Controller when you actually need L7 routing.
Key Concepts Summary
- L4 load balancing sees only IPs and ports — it forwards TCP/UDP packets without inspecting content, adding microseconds of latency
- L7 load balancing terminates connections and inspects HTTP — it can route by path, header, and cookie, adding milliseconds of latency
- L4 health checks (TCP SYN) only verify the port is open — a crashed application that still accepts TCP connections will pass
- L7 health checks (HTTP GET) verify the application responds correctly — always use these for HTTP services
- Connection draining prevents dropped requests during pod termination — configure a `preStop` sleep and `terminationGracePeriodSeconds`
- Kubernetes uses both layers: NLB (L4) at the edge, Ingress Controller (L7) for routing, kube-proxy (L4) inside the cluster
- The preStop race condition is real — without a sleep, pods can receive traffic after they start shutting down
Common Mistakes
- Using L4 (TCP) health checks for HTTP services — the load balancer cannot detect application-level failures like 500 errors or deadlocks
- Forgetting the `preStop` sleep hook — leads to dropped connections during rolling updates because iptables rules update after the app starts shutting down
- Setting `terminationGracePeriodSeconds` too low — long-running requests get SIGKILL before they complete
- Using sticky sessions as a permanent solution instead of fixing stateful application design
- Assuming round robin is "fair" — if one backend is slower, it accumulates connections and becomes a bottleneck (use least connections instead)
- Skipping the Ingress Controller when you need path-based routing — trying to do L7 routing with multiple LoadBalancer Services wastes cloud load balancers and IPs
Your HTTP service passes its TCP health check but returns HTTP 503 for every request. What type of load balancer health check would catch this failure?