HTTP/1.1 vs HTTP/2 vs HTTP/3
You migrate your Kubernetes ingress from HTTP/1.1 to HTTP/2 for backend connections. Page load times drop by 40%. No code changes. No infrastructure changes. Just a protocol upgrade.
How can changing the wire protocol make that much difference? The answer lies in how each HTTP version manages TCP connections, handles concurrent requests, and deals with packet loss. This lesson explains the evolution from HTTP/1.1 through HTTP/3 and gives you practical guidance on configuring each in Kubernetes.
Part 1: HTTP/1.1 — The Workhorse (and Its Limitations)
HTTP/1.1 (originally RFC 2068, then RFC 2616, and later refined in RFC 7230-7235) has been the backbone of the web since 1997. It is text-based, human-readable, and well understood. It is also fundamentally limited by how it uses TCP connections.
One Request Per Connection (Effectively)
In HTTP/1.1, each TCP connection handles requests sequentially. The client sends a request, waits for the complete response, then sends the next request. This is called head-of-line (HOL) blocking at the application layer.
HTTP/1.1 — Sequential requests on one connection:
Connection 1: [GET /index.html] → [response] → [GET /style.css] → [response] → [GET /app.js] → [response]
Total time: 3 × round trip
HTTP/1.1 technically supports pipelining — sending multiple requests without waiting for responses. In practice, pipelining is broken:
- Responses must come back in order (HOL blocking still applies)
- Many proxies and servers do not support it
- Browsers disabled it years ago
The Browser Workaround: Multiple Connections
Browsers work around HTTP/1.1's limitations by opening up to 6 TCP connections per domain (the de facto browser limit). This creates some concurrency but has costs:
HTTP/1.1 — 6 parallel connections to same server:
Connection 1: [GET /index.html] → [response]
Connection 2: [GET /style.css] → [response]
Connection 3: [GET /app.js] → [response]
Connection 4: [GET /logo.png] → [response]
Connection 5: [GET /font.woff] → [response]
Connection 6: [GET /data.json] → [response]
All 6 complete in ~1 round trip
But 6 connections means:
- 6 TCP handshakes (run in parallel, but each still costs a round trip before it can carry a request)
- 6 TLS handshakes for HTTPS (more round trips and 6x the key-exchange CPU)
- 6x the memory for TCP buffers and connection state
- 6 slow-start ramp-ups for congestion control (each connection starts with a small congestion window)
If you need more than 6 concurrent requests, requests 7+ queue and wait.
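The queueing effect is easy to model. A back-of-envelope sketch (assumptions: every request costs roughly one round trip, and the browser caps in-flight requests at 6):

```python
import math

def http1_rounds(num_requests: int, max_connections: int = 6) -> int:
    """Sequential 'rounds' needed when only max_connections requests
    can be in flight at once (the HTTP/1.1 browser behavior)."""
    return math.ceil(num_requests / max_connections)

def page_load_estimate(num_requests: int, rtt_ms: float,
                       max_connections: int = 6) -> float:
    """Rough lower bound on load time: each round costs one RTT."""
    return http1_rounds(num_requests, max_connections) * rtt_ms

# 80 API calls over 6 connections take 14 rounds; with full
# multiplexing (HTTP/2), they fit in roughly one.
print(http1_rounds(80))                      # → 14
print(page_load_estimate(80, 100))           # → 1400.0 (ms)
print(http1_rounds(80, max_connections=80))  # → 1
```

This is the same arithmetic behind the dashboard example later in this lesson: 80 requests through a 6-connection cap is 14 waves of round trips.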
HTTP/1.1's fundamental limitation is one-request-at-a-time per TCP connection. Every optimization technique from the HTTP/1.1 era — domain sharding, CSS sprites, JS bundling, inline images — exists to work around this limitation. HTTP/2 eliminates the problem entirely by multiplexing many requests over a single connection.
Keep-Alive — Reusing Connections
HTTP/1.1 defaults to persistent connections (keep-alive). After a request/response completes, the TCP connection stays open for the next request. This avoids the cost of a new TCP+TLS handshake for each request.
# Check if keep-alive is working
curl -v https://api.example.com/users
# < Connection: keep-alive
# < Keep-Alive: timeout=60, max=100
# The connection stays open for 60 seconds or 100 requests
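You can observe connection reuse directly with Python's stdlib http.client, which keeps the TCP socket open across requests when the server allows it. A self-contained sketch (the handler, port, and response body are illustrative, not from any real service):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 → keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Throwaway local server on a random free port.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
first = conn.getresponse()
first.read()               # must drain the body before reusing
sock_before = conn.sock

conn.request("GET", "/")
second = conn.getresponse()
second.read()

# Same socket object → the TCP connection was reused (keep-alive).
print(conn.sock is sock_before)  # → True
server.shutdown()
```

Flip protocol_version to "HTTP/1.0" in the handler and the second request forces a fresh connection — the same per-request handshake cost an ingress pays without upstream keep-alive.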
# NGINX Ingress upstream keepalive configuration
# Note: upstream-keepalive-* are global options set in the controller
# ConfigMap, not per-Ingress annotations
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Keep upstream connections alive to backend pods
  upstream-keepalive-connections: "32"
  upstream-keepalive-timeout: "60"
If your NGINX Ingress is not configured with upstream keep-alive, it creates a new TCP connection to the backend for every single request. This wastes time on handshakes and causes TIME_WAIT accumulation on the ingress controller. Set upstream-keepalive-connections to at least 32 for busy services. The performance impact can be dramatic — we have seen latency drop 30% just from enabling upstream keep-alive.
Part 2: HTTP/2 — Multiplexing Changes Everything
HTTP/2 (RFC 7540, updated by RFC 9113) was published in 2015. Its headline feature is multiplexing: many requests and responses flowing simultaneously over a single TCP connection.
How Multiplexing Works
HTTP/2 introduces the concept of streams. Each request/response pair is a stream, identified by a numeric stream ID. Multiple streams share one TCP connection, and frames from different streams are interleaved:
HTTP/2 — Multiplexed requests on ONE connection:
Connection 1:
Stream 1: [GET /index.html headers] →
Stream 3: [GET /style.css headers] →
Stream 5: [GET /app.js headers] →
Stream 1: ← [response headers] ← [response body chunk 1]
Stream 3: ← [response headers]
Stream 5: ← [response headers] ← [response body chunk 1]
Stream 1: ← [response body chunk 2]
Stream 3: ← [response body chunk 1] ← [response body chunk 2]
...
No head-of-line blocking at the HTTP layer. If the response for stream 3 is slow (database query), streams 1 and 5 continue receiving data without waiting.
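The demultiplexing step can be sketched in a few lines (stream IDs and payloads here are made up; real HTTP/2 frames carry a 9-byte binary header with type and flags):

```python
from collections import defaultdict

# Frames from three streams interleaved on one connection,
# as (stream_id, payload) pairs in wire order.
wire = [
    (1, b"<html>"), (3, b"body {"), (5, b"function "),
    (1, b"</html>"), (5, b"main() {}"), (3, b"}"),
]

# The receiver sorts frames into per-stream buffers by stream ID;
# ordering is only guaranteed within each stream.
streams = defaultdict(bytes)
for stream_id, payload in wire:
    streams[stream_id] += payload

print(streams[1])  # → b'<html></html>'
print(streams[3])  # → b'body {}'
print(streams[5])  # → b'function main() {}'
```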
HTTP/1.1 vs HTTP/2 Connection Model
HTTP/1.1: 6 TCP connections, 1 in-flight request each
HTTP/2: 1 TCP connection, many concurrent streams (the server advertises a limit, commonly 100 or more)
HPACK Header Compression
HTTP/1.1 sends headers as plain text on every request. A typical request has 500-800 bytes of headers. If you make 100 requests, that is 50-80 KB of repeated header text.
HTTP/2 uses HPACK compression:
- A shared header table between client and server
- Previously sent headers are referenced by index (1 byte instead of 50+)
- New headers are Huffman-encoded
- Result: 85-95% reduction in header size after the first request
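The savings can be approximated with a crude dynamic-table model (byte counts are illustrative; real HPACK also has a 61-entry static table, Huffman coding, and table eviction):

```python
def hpack_cost(requests, table=None):
    """Approximate wire bytes: full 'name: value' text the first time
    a header is seen, a 1-byte table index on every repeat."""
    table = set() if table is None else table
    total = 0
    for headers in requests:
        for name, value in headers.items():
            if (name, value) in table:
                total += 1                           # indexed reference
            else:
                total += len(name) + len(value) + 2  # literal + ': '
                table.add((name, value))
    return total

headers = {
    "authorization": "Bearer " + "x" * 800,  # stand-in for a large JWT
    "accept": "application/json",
}
plain = 50 * sum(len(k) + len(v) + 2 for k, v in headers.items())
compressed = hpack_cost([headers] * 50)
print(plain, compressed)  # → 42300 944 — ~98% smaller after request 1
```

The exact numbers depend on header sizes, but the shape is the point: cost is dominated by the first request, and repeats are nearly free.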
# HTTP/1.1 headers (repeated on every request):
# GET /api/users HTTP/1.1
# Host: api.example.com
# Authorization: Bearer eyJhbGciOiJSUzI1NiIs....(800 bytes)
# Accept: application/json
# User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)...
# Cookie: session=abc123; preferences=dark-mode;....
# Accept-Encoding: gzip, deflate, br
# Accept-Language: en-US,en;q=0.9
# ^^^ ~1200 bytes, sent identically on EVERY request
# HTTP/2: first request sends full headers, subsequent requests
# reference them by index → ~20-50 bytes for the same headers
HPACK compression is especially impactful for API-heavy applications where every request carries a large Authorization header (JWTs can be 800+ bytes). With HTTP/1.1, a page that makes 50 API calls sends 40 KB of redundant header data. With HTTP/2, those headers are sent once and referenced by index for the remaining 49 requests.
HTTP/2 in Kubernetes
Most Kubernetes ingress controllers support HTTP/2 on the client-facing side by default (when TLS is configured). The tricky part is HTTP/2 between the ingress and your backend pods.
# NGINX Ingress: enable HTTP/2 to backend
# By default, NGINX Ingress uses HTTP/1.1 to backends
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    # For gRPC backends (which require HTTP/2)
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # or for plain HTTP/2 backends:
    # nginx.ingress.kubernetes.io/backend-protocol: "H2C"
NGINX Ingress uses HTTP/1.1 to communicate with backend pods by default, even if the client connects via HTTP/2. This means you lose multiplexing on the backend leg. For gRPC services (which require HTTP/2), you MUST set backend-protocol: GRPC or H2C (HTTP/2 Cleartext). Without this, gRPC traffic will fail with mysterious errors because HTTP/1.1 cannot carry HTTP/2 frames.
Part 3: The TCP Head-of-Line Blocking Problem
HTTP/2 solved head-of-line blocking at the HTTP layer. But it introduced a new problem: TCP-level head-of-line blocking.
All HTTP/2 streams share one TCP connection. TCP guarantees ordered delivery. If one TCP packet is lost (even if it belongs to stream 3), TCP holds up ALL packets behind it until the lost packet is retransmitted — including packets for streams 1, 5, 7, and 9 that have nothing to do with stream 3.
HTTP/2 TCP-level HOL blocking:
TCP packet sequence: [S1][S3][S5][S1][S3][S5][S1]
↑
This packet lost
TCP stalls ALL streams while retransmitting:
Stream 1: has data ready → WAITING for TCP retransmit
Stream 3: waiting for its own retransmit → BLOCKED
Stream 5: has data ready → WAITING for TCP retransmit
On a reliable network (wired datacenter), this rarely matters — packet loss is below 0.01%. On a lossy network (mobile, WiFi), this can make HTTP/2 slower than HTTP/1.1. With 6 separate HTTP/1.1 connections, a lost packet on connection 3 only blocks connection 3. The other 5 continue happily.
HTTP/2's TCP-level head-of-line blocking is the entire motivation for HTTP/3 and QUIC. If your users are on reliable networks (internal services, datacenter-to-datacenter), HTTP/2 is excellent. If your users are on lossy networks (mobile apps, global users on WiFi), HTTP/3 provides a measurable improvement because it eliminates this last form of HOL blocking.
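The difference between the two transports can be shown as two delivery policies over the same lossy packet trace (a toy simulation; packet 2, carrying stream s5, is the lost one):

```python
# Packets as (tcp_sequence_number, stream) in send order.
packets = [(0, "s1"), (1, "s3"), (2, "s5"), (3, "s1"), (4, "s3"), (5, "s5")]
lost_seq = 2

# What the receiver holds before the retransmission arrives:
received = [(seq, s) for seq, s in packets if seq != lost_seq]

# TCP view: bytes are deliverable only up to the first sequence gap,
# regardless of which stream they belong to.
tcp_deliverable = [(seq, s) for seq, s in received if seq < lost_seq]

# QUIC view: ordering is per stream, so only s5 (the stream with the
# lost packet) has to wait; s1 and s3 are fully deliverable.
quic_deliverable = [(seq, s) for seq, s in received
                    if s != "s5" or seq < lost_seq]

print(len(tcp_deliverable))   # → 2 (s1 and s3 stalled behind the gap)
print(len(quic_deliverable))  # → 4 (only s5's own data waits)
```

One lost packet out of six stalls two-thirds of the received data under TCP, and only one stream's worth under QUIC — which is why the gap widens as loss rates rise.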
Part 4: HTTP/3 — QUIC and the End of HOL Blocking
HTTP/3 (RFC 9114) replaces TCP with QUIC (RFC 9000) as the transport layer. QUIC runs over UDP but is not "raw UDP" — it implements reliability, ordering, congestion control, and encryption natively.
What QUIC Changes
| Feature | TCP (HTTP/2) | QUIC (HTTP/3) |
|---|---|---|
| Transport | Kernel TCP stack | Userspace QUIC over UDP |
| Connection setup | TCP + TLS = 2-3 RTT | Combined = 1 RTT (0-RTT on resume) |
| Stream independence | No (TCP sees one byte stream) | Yes (each stream has independent loss recovery) |
| HOL blocking | TCP-level blocking on packet loss | Only the affected stream is blocked |
| Connection migration | Impossible (tied to 4-tuple) | Survives IP changes (uses Connection ID) |
| Encryption | Optional (TLS layered on top) | Mandatory (TLS 1.3 built-in) |
0-RTT Connection Resumption
QUIC supports resuming a previous connection with zero round trips:
First connection:
Client → Server: QUIC Initial (includes TLS ClientHello)
Server → Client: QUIC Handshake (includes TLS ServerHello + cert)
Client → Server: QUIC Handshake Complete + first HTTP request
^^^ 1 RTT total (vs 2-3 RTT for TCP + TLS 1.2)
Resumed connection:
Client → Server: QUIC 0-RTT (includes TLS resumption + first HTTP request)
Server → Client: QUIC response
^^^ 0 RTT — data sent immediately
0-RTT is incredibly powerful for mobile applications and global services. A user in Tokyo connecting to a server in Virginia (150ms RTT) saves 300-450ms on connection setup with QUIC compared to TCP + TLS 1.2. For a page that loads 20 resources, this adds up quickly. However, 0-RTT has a replay attack risk — only safe for idempotent requests (GET). Non-idempotent requests should always use 1-RTT.
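The latency claim is just round-trip arithmetic (a back-of-envelope model; real numbers vary with TLS version, TCP Fast Open, and resumption details):

```python
def setup_latency_ms(rtt_ms: float, protocol: str) -> float:
    """Round trips spent before the first HTTP request can be sent."""
    round_trips = {
        "tcp+tls1.2": 3,  # 1 RTT TCP + 2 RTT TLS 1.2
        "tcp+tls1.3": 2,  # 1 RTT TCP + 1 RTT TLS 1.3
        "quic":       1,  # combined transport + crypto handshake
        "quic-0rtt":  0,  # resumed connection, request in first flight
    }
    return round_trips[protocol] * rtt_ms

rtt = 150  # roughly Tokyo ↔ Virginia
for p in ("tcp+tls1.2", "tcp+tls1.3", "quic", "quic-0rtt"):
    print(p, setup_latency_ms(rtt, p), "ms")
# vs TCP + TLS 1.2: QUIC saves 300 ms, and 450 ms on a 0-RTT resume —
# the 300-450 ms range quoted above
```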
Connection Migration
TCP connections are identified by the 4-tuple: source IP, source port, destination IP, destination port. If any of these change (phone switches from WiFi to cellular), the TCP connection breaks. The client must establish a new connection with a new handshake.
QUIC uses a Connection ID instead. When the client's IP changes, it sends packets with the same Connection ID from the new IP. The server recognizes the connection and continues without interruption.
TCP: Phone switches WiFi → 4G
Old: 192.168.1.50:52431 → 203.0.113.50:443 (connection dies)
New: 10.0.0.1:48222 → 203.0.113.50:443 (new connection, new handshake)
QUIC: Phone switches WiFi → 4G
Old: 192.168.1.50:52431 → 203.0.113.50:443 [ConnID: abc123]
New: 10.0.0.1:48222 → 203.0.113.50:443 [ConnID: abc123] (same connection!)
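On the server side, the difference is simply the lookup key (a schematic sketch, not a real QUIC stack; addresses and IDs are the illustrative values from the diagram above):

```python
# TCP-style lookup: sessions keyed by the 4-tuple. A client IP change
# produces a key miss — the old connection is unreachable.
tcp_conns = {("192.168.1.50", 52431, "203.0.113.50", 443): "session-abc"}
new_key = ("10.0.0.1", 48222, "203.0.113.50", 443)
print(tcp_conns.get(new_key))  # → None — new handshake required

# QUIC-style lookup: sessions keyed by the Connection ID carried in
# every packet, so the source address is irrelevant to the match.
quic_conns = {"abc123": "session-abc"}
print(quic_conns.get("abc123"))  # → session-abc — connection survives
```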
A mobile banking app had a persistent complaint: users on trains would get logged out constantly. The train passed through areas where the phone switched between cell towers, changing IP addresses. Each IP change killed the TCP connection, expired the session cookie (the server-side session was tied to the client IP for security), and forced re-authentication. Migrating the API to HTTP/3 with QUIC eliminated the connection drops. They also had to remove the IP-binding from their session logic — a security vs usability trade-off they debated for weeks.
Part 5: gRPC — Built on HTTP/2
gRPC deserves special mention because it is the dominant RPC framework in Kubernetes microservice architectures, and it is built directly on HTTP/2.
Why gRPC Chose HTTP/2
gRPC needs:
- Multiplexing — many RPCs over one connection (HTTP/2 streams)
- Bidirectional streaming — both client and server send data simultaneously (HTTP/2 supports this)
- Header compression — metadata on every RPC (HPACK reduces overhead)
- Binary framing — protobuf messages are binary, HTTP/2 frames are binary (no base64 encoding needed)
# gRPC over HTTP/2 — the wire format
# Each gRPC call maps to one HTTP/2 stream:
#
# Request:
# :method: POST
# :path: /mypackage.MyService/MyMethod
# content-type: application/grpc
# grpc-encoding: gzip
# [binary protobuf body]
#
# Response:
# :status: 200
# content-type: application/grpc
# [binary protobuf body]
# grpc-status: 0
# grpc-message: OK
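Inside those HTTP/2 data frames, each protobuf message is wrapped in gRPC's length-prefixed framing: a 1-byte compressed flag plus a 4-byte big-endian length. A minimal encoder/decoder sketch (the payload bytes are a made-up stand-in for a serialized protobuf):

```python
import struct

def grpc_frame(message: bytes, compressed: bool = False) -> bytes:
    """Wrap a serialized message in gRPC's 5-byte length prefix."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message

def grpc_unframe(data: bytes) -> tuple:
    """Return (compressed, message) from a framed buffer."""
    compressed, length = struct.unpack(">BI", data[:5])
    return bool(compressed), data[5:5 + length]

payload = b"\x0a\x05hello"          # illustrative protobuf-like bytes
framed = grpc_frame(payload)
print(framed[:5].hex())             # → 0000000007 — flag 0, length 7
print(grpc_unframe(framed))         # → (False, b'\n\x05hello')
```

This framing is why gRPC needs a binary-clean transport: the prefix and body are raw bytes, which HTTP/2 frames carry natively but HTTP/1.1 text semantics cannot.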
Configuring gRPC in Kubernetes
# NGINX Ingress for gRPC
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # Increase timeout for streaming RPCs
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  tls:
    - hosts:
        - grpc.example.com
      secretName: grpc-tls
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-service
                port:
                  number: 50051
gRPC in Kubernetes requires three things: (1) HTTP/2 between the ingress and backend (backend-protocol: GRPC), (2) TLS on the ingress (most gRPC clients expect TLS), and (3) L7 load balancing for request-level distribution. Without L7 balancing, all gRPC requests on a single connection go to one pod — negating any horizontal scaling benefits.
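The L4-vs-L7 distinction is worth making concrete (a toy model; pod names are illustrative):

```python
import itertools

pods = ["pod-a", "pod-b", "pod-c"]

def l4_route(num_requests: int, chosen_pod: str = "pod-a") -> list:
    """Connection-level (L4) balancing: the backend is picked once per
    connection, and gRPC reuses one connection — every RPC hits it."""
    return [chosen_pod] * num_requests

def l7_route(num_requests: int) -> list:
    """Request-level (L7) balancing: the proxy picks a backend per
    HTTP/2 stream, so RPCs spread across pods."""
    rr = itertools.cycle(pods)
    return [next(rr) for _ in range(num_requests)]

print(set(l4_route(9)))  # → {'pod-a'} — one pod takes all the load
print(set(l7_route(9)))  # all three pods share the load evenly
```

This is exactly why a plain ClusterIP Service (which balances at connection time) pins all gRPC traffic to one pod, while an L7-aware proxy distributes it.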
Part 6: Choosing the Right HTTP Version
Decision Matrix
| Scenario | Recommended | Why |
|---|---|---|
| Internal service-to-service (K8s) | HTTP/2 (h2c) | Multiplexing, header compression, gRPC support |
| Public web traffic (browsers) | HTTP/2 + HTTP/3 | Negotiate the best available, fallback gracefully |
| gRPC services | HTTP/2 (required) | gRPC is built on HTTP/2, no alternative |
| Legacy systems | HTTP/1.1 | Some backends cannot speak HTTP/2 |
| Mobile apps (global users) | HTTP/3 (QUIC) | Connection migration, 0-RTT, no HOL blocking on lossy networks |
Configuring HTTP/2 on NGINX Ingress
# Enable HTTP/2 to clients (usually enabled by default with TLS)
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-http2: "true"                     # Client-facing HTTP/2 (default: true)
  upstream-keepalive-connections: "32"  # Keep HTTP/1.1 connections to backends alive
Verifying HTTP Version
# Check what HTTP version your server negotiated
curl -v --http2 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/2
# Force HTTP/1.1 to compare
curl -v --http1.1 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/1.1
# Check HTTP/3 support (requires curl with HTTP/3 built in)
curl -v --http3 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/3
# See ALPN negotiation in TLS handshake
curl -v https://api.example.com/ 2>&1 | grep ALPN
# * ALPN: offers h2,http/1.1
# * ALPN: server accepted h2
For most Kubernetes deployments, enabling HTTP/2 between ingress and backends is the highest-impact change you can make. It requires minimal configuration (one annotation), no code changes, and the benefits — multiplexing, header compression, reduced connection overhead — are immediate. HTTP/3 is the future but requires more infrastructure support (UDP load balancers, QUIC-capable ingress controllers). Start with HTTP/2 today.
Part 7: Performance Comparison — Real Numbers
To make this concrete, here is what the protocol differences look like in practice for a page that loads 50 resources:
| Metric | HTTP/1.1 (6 conns) | HTTP/2 (1 conn) | HTTP/3 (QUIC) |
|---|---|---|---|
| TCP handshakes | 6 | 1 | 0 (UDP) |
| TLS handshakes | 6 | 1 | 1 (combined, 0 on resume) |
| Setup cost at 100ms RTT (summed across connections) | 600ms | 200ms | 100ms (0ms resume) |
| Concurrent requests | 6 | 100+ | 100+ |
| Header overhead (50 reqs) | ~40 KB | ~4 KB | ~4 KB |
| Packet loss impact (2%) | 1 of 6 connections affected | ALL streams blocked | Only affected stream blocked |
A team running a dashboard application with 80+ API calls per page load saw 3.2 second load times on HTTP/1.1. The browser was queuing requests — only 6 could run at a time, so 80 requests took 14 rounds. Switching the ingress to HTTP/2 (one annotation change) dropped load time to 1.8 seconds. The same 80 requests now ran concurrently over a single connection. Adding HTTP/2 between ingress and backend (h2c) dropped it further to 1.4 seconds by eliminating the connection overhead on the backend leg. No code changes. No architecture changes. Just protocol configuration.
Key Concepts Summary
- HTTP/1.1 is limited to one request at a time per TCP connection — browsers work around this with 6 parallel connections, but this wastes resources
- HTTP/2 multiplexes many streams over one TCP connection — eliminates HTTP-level head-of-line blocking and reduces connection overhead
- HPACK header compression reduces header size by 85-95% after the first request — especially impactful for APIs with large auth tokens
- HTTP/2 still suffers from TCP-level HOL blocking — one lost packet stalls all streams because TCP does not know about HTTP streams
- HTTP/3 uses QUIC over UDP to eliminate TCP-level HOL blocking — each stream has independent loss recovery
- QUIC provides 1-RTT setup (0-RTT on resume) and connection migration — critical for mobile and global users
- gRPC requires HTTP/2 — set backend-protocol: GRPC on NGINX Ingress for gRPC backends
- NGINX Ingress defaults to HTTP/1.1 for backend connections — enable upstream keep-alive and consider h2c for performance
- HTTP/2 is the most impactful protocol upgrade you can make in Kubernetes today with minimal effort
Common Mistakes
- Not enabling HTTP/2 between ingress and backend — you get HTTP/2 from client to ingress but HTTP/1.1 from ingress to pod, losing multiplexing benefits on the backend leg
- Using a ClusterIP Service for gRPC without L7 load balancing — all requests go to one pod because gRPC reuses a single HTTP/2 connection
- Assuming HTTP/3 is always better — on reliable networks with low packet loss, HTTP/2 and HTTP/3 perform similarly
- Not setting upstream keep-alive on NGINX Ingress — creates a new TCP connection to the backend for every request, accumulating TIME_WAIT
- Blocking UDP port 443 in firewalls — this prevents QUIC/HTTP/3 from working, forcing fallback to HTTP/2 over TCP
- Configuring long proxy timeouts for gRPC streaming without understanding that idle stream detection still applies — streams can be killed by intermediate proxies that do not see traffic
Why does HTTP/2 sometimes perform worse than HTTP/1.1 on networks with high packet loss?