HTTP/1.1 vs HTTP/2 vs HTTP/3
You migrate your Kubernetes ingress from HTTP/1.1 to HTTP/2 for backend connections. Page load times drop by 40%. No code changes. No infrastructure changes. Just a protocol upgrade.
How can changing the wire protocol make that much difference? The answer lies in how each HTTP version manages TCP connections, handles concurrent requests, and deals with packet loss. This lesson explains the evolution from HTTP/1.1 through HTTP/3 and gives you practical guidance on configuring each in Kubernetes.
Part 1: HTTP/1.1 — The Workhorse (and Its Limitations)
HTTP/1.1 (originally RFC 2068, then RFC 2616, and later refined in RFC 7230-7235) has been the backbone of the web since 1997. It is text-based, human-readable, and well understood. It is also fundamentally limited by how it uses TCP connections.
One Request Per Connection (Effectively)
In HTTP/1.1, each TCP connection handles requests sequentially. The client sends a request, waits for the complete response, then sends the next request. This is called head-of-line (HOL) blocking at the application layer.
HTTP/1.1 — Sequential requests on one connection:
Connection 1: [GET /index.html] → [response] → [GET /style.css] → [response] → [GET /app.js] → [response]
Total time: 3 × round trip
HTTP/1.1 technically supports pipelining — sending multiple requests without waiting for responses. In practice, pipelining is broken:
- Responses must come back in order (HOL blocking still applies)
- Many proxies and servers do not support it
- Browsers disabled it years ago
The Browser Workaround: Multiple Connections
Browsers work around HTTP/1.1's limitations by opening up to 6 TCP connections per domain (the de facto browser limit). This creates some concurrency but has costs:
HTTP/1.1 — 6 parallel connections to same server:
Connection 1: [GET /index.html] → [response]
Connection 2: [GET /style.css] → [response]
Connection 3: [GET /app.js] → [response]
Connection 4: [GET /logo.png] → [response]
Connection 5: [GET /font.woff] → [response]
Connection 6: [GET /data.json] → [response]
All 6 complete in ~1 round trip
But 6 connections means:
- 6 TCP handshakes (run in parallel, but each still costs a round trip before it can carry a request)
- 6 TLS handshakes for HTTPS (more round trips and 6x the key-exchange CPU)
- 6x the memory for TCP buffers and connection state
- 6 slow-start ramp-ups for congestion control (each connection starts with a small congestion window)
If you need more than 6 concurrent requests, requests 7+ queue and wait.
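The queueing effect is easy to model. A back-of-envelope sketch (assumptions: every request costs roughly one round trip, and the browser caps in-flight requests at 6):

```python
import math

def http1_rounds(num_requests: int, max_connections: int = 6) -> int:
    """Sequential 'rounds' needed when only max_connections requests
    can be in flight at once (the HTTP/1.1 browser behavior)."""
    return math.ceil(num_requests / max_connections)

def page_load_estimate(num_requests: int, rtt_ms: float,
                       max_connections: int = 6) -> float:
    """Rough lower bound on load time: each round costs one RTT."""
    return http1_rounds(num_requests, max_connections) * rtt_ms

# 80 API calls over 6 connections take 14 rounds; with full
# multiplexing (HTTP/2), they fit in roughly one.
print(http1_rounds(80))                      # → 14
print(page_load_estimate(80, 100))           # → 1400.0 (ms)
print(http1_rounds(80, max_connections=80))  # → 1
```

This is the same arithmetic behind the dashboard example later in this lesson: 80 requests through a 6-connection cap is 14 waves of round trips.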
HTTP/1.1's fundamental limitation is one-request-at-a-time per TCP connection. Every optimization technique from the HTTP/1.1 era — domain sharding, CSS sprites, JS bundling, inline images — exists to work around this limitation. HTTP/2 eliminates the problem entirely by multiplexing many requests over a single connection.
Keep-Alive — Reusing Connections
HTTP/1.1 defaults to persistent connections (keep-alive). After a request/response completes, the TCP connection stays open for the next request. This avoids the cost of a new TCP+TLS handshake for each request.
# Check if keep-alive is working
curl -v https://api.example.com/users
# < Connection: keep-alive
# < Keep-Alive: timeout=60, max=100
# The connection stays open for 60 seconds or 100 requests
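You can observe connection reuse directly with Python's stdlib http.client, which keeps the TCP socket open across requests when the server allows it. A self-contained sketch (the handler, port, and response body are illustrative, not from any real service):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 → keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Throwaway local server on a random free port.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
first = conn.getresponse()
first.read()               # must drain the body before reusing
sock_before = conn.sock

conn.request("GET", "/")
second = conn.getresponse()
second.read()

# Same socket object → the TCP connection was reused (keep-alive).
print(conn.sock is sock_before)  # → True
server.shutdown()
```

Flip protocol_version to "HTTP/1.0" in the handler and the second request forces a fresh connection — the same per-request handshake cost an ingress pays without upstream keep-alive.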
# NGINX Ingress upstream keepalive configuration
# Note: upstream-keepalive-* are global options set in the controller
# ConfigMap, not per-Ingress annotations
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Keep upstream connections alive to backend pods
  upstream-keepalive-connections: "32"
  upstream-keepalive-timeout: "60"
If your NGINX Ingress is not configured with upstream keep-alive, it creates a new TCP connection to the backend for every single request. This wastes time on handshakes and causes TIME_WAIT accumulation on the ingress controller. Set upstream-keepalive-connections to at least 32 for busy services. The performance impact can be dramatic — we have seen latency drop 30% just from enabling upstream keep-alive.
Part 2: HTTP/2 — Multiplexing Changes Everything
HTTP/2 (RFC 7540, updated by RFC 9113) was published in 2015. Its headline feature is multiplexing: many requests and responses flowing simultaneously over a single TCP connection.
How Multiplexing Works
HTTP/2 introduces the concept of streams. Each request/response pair is a stream, identified by a numeric stream ID. Multiple streams share one TCP connection, and frames from different streams are interleaved:
HTTP/2 — Multiplexed requests on ONE connection:
Connection 1:
Stream 1: [GET /index.html headers] →
Stream 3: [GET /style.css headers] →
Stream 5: [GET /app.js headers] →
Stream 1: ← [response headers] ← [response body chunk 1]
Stream 3: ← [response headers]
Stream 5: ← [response headers] ← [response body chunk 1]
Stream 1: ← [response body chunk 2]
Stream 3: ← [response body chunk 1] ← [response body chunk 2]
...
No head-of-line blocking at the HTTP layer. If the response for stream 3 is slow (database query), streams 1 and 5 continue receiving data without waiting.
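The demultiplexing step can be sketched in a few lines (stream IDs and payloads here are made up; real HTTP/2 frames carry a 9-byte binary header with type and flags):

```python
from collections import defaultdict

# Frames from three streams interleaved on one connection,
# as (stream_id, payload) pairs in wire order.
wire = [
    (1, b"<html>"), (3, b"body {"), (5, b"function "),
    (1, b"</html>"), (5, b"main() {}"), (3, b"}"),
]

# The receiver sorts frames into per-stream buffers by stream ID;
# ordering is only guaranteed within each stream.
streams = defaultdict(bytes)
for stream_id, payload in wire:
    streams[stream_id] += payload

print(streams[1])  # → b'<html></html>'
print(streams[3])  # → b'body {}'
print(streams[5])  # → b'function main() {}'
```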
HTTP/1.1 vs HTTP/2 Connection Model
HTTP/1.1: 6 TCP connections, 1 in-flight request each
HTTP/2: 1 TCP connection, many concurrent streams (the server advertises a limit, commonly 100 or more)
HPACK Header Compression
HTTP/1.1 sends headers as plain text on every request. A typical request has 500-800 bytes of headers. If you make 100 requests, that is 50-80 KB of repeated header text.
HTTP/2 uses HPACK compression:
- A shared header table between client and server
- Previously sent headers are referenced by index (1 byte instead of 50+)
- New headers are Huffman-encoded
- Result: 85-95% reduction in header size after the first request
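The savings can be approximated with a crude dynamic-table model (byte counts are illustrative; real HPACK also has a 61-entry static table, Huffman coding, and table eviction):

```python
def hpack_cost(requests, table=None):
    """Approximate wire bytes: full 'name: value' text the first time
    a header is seen, a 1-byte table index on every repeat."""
    table = set() if table is None else table
    total = 0
    for headers in requests:
        for name, value in headers.items():
            if (name, value) in table:
                total += 1                           # indexed reference
            else:
                total += len(name) + len(value) + 2  # literal + ': '
                table.add((name, value))
    return total

headers = {
    "authorization": "Bearer " + "x" * 800,  # stand-in for a large JWT
    "accept": "application/json",
}
plain = 50 * sum(len(k) + len(v) + 2 for k, v in headers.items())
compressed = hpack_cost([headers] * 50)
print(plain, compressed)  # → 42300 944 — ~98% smaller after request 1
```

The exact numbers depend on header sizes, but the shape is the point: cost is dominated by the first request, and repeats are nearly free.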
# HTTP/1.1 headers (repeated on every request):
# GET /api/users HTTP/1.1
# Host: api.example.com
# Authorization: Bearer eyJhbGciOiJSUzI1NiIs....(800 bytes)
# Accept: application/json
# User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)...
# Cookie: session=abc123; preferences=dark-mode;....
# Accept-Encoding: gzip, deflate, br
# Accept-Language: en-US,en;q=0.9
# ^^^ ~1200 bytes, sent identically on EVERY request
# HTTP/2: first request sends full headers, subsequent requests
# reference them by index → ~20-50 bytes for the same headers
HPACK compression is especially impactful for API-heavy applications where every request carries a large Authorization header (JWTs can be 800+ bytes). With HTTP/1.1, a page that makes 50 API calls sends 40 KB of redundant header data. With HTTP/2, those headers are sent once and referenced by index for the remaining 49 requests.
HTTP/2 in Kubernetes
Most Kubernetes ingress controllers support HTTP/2 on the client-facing side by default (when TLS is configured). The tricky part is HTTP/2 between the ingress and your backend pods.
# NGINX Ingress: enable HTTP/2 to backend
# By default, NGINX Ingress uses HTTP/1.1 to backends
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    # For gRPC backends (which require HTTP/2)
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # or for plain HTTP/2 backends:
    # nginx.ingress.kubernetes.io/backend-protocol: "H2C"
NGINX Ingress uses HTTP/1.1 to communicate with backend pods by default, even if the client connects via HTTP/2. This means you lose multiplexing on the backend leg. For gRPC services (which require HTTP/2), you MUST set backend-protocol: GRPC or H2C (HTTP/2 Cleartext). Without this, gRPC traffic will fail with mysterious errors because HTTP/1.1 cannot carry HTTP/2 frames.
Part 3: The TCP Head-of-Line Blocking Problem
HTTP/2 solved head-of-line blocking at the HTTP layer. But it introduced a new problem: TCP-level head-of-line blocking.
All HTTP/2 streams share one TCP connection. TCP guarantees ordered delivery. If one TCP packet is lost (even if it belongs to stream 3), TCP holds up ALL packets behind it until the lost packet is retransmitted — including packets for streams 1, 5, 7, and 9 that have nothing to do with stream 3.
HTTP/2 TCP-level HOL blocking:
TCP packet sequence: [S1][S3][S5][S1][S3][S5][S1]
↑
This packet lost
TCP stalls ALL streams while retransmitting:
Stream 1: has data ready → WAITING for TCP retransmit
Stream 3: waiting for its own retransmit → BLOCKED
Stream 5: has data ready → WAITING for TCP retransmit
On a reliable network (wired datacenter), this rarely matters — packet loss is below 0.01%. On a lossy network (mobile, WiFi), this can make HTTP/2 slower than HTTP/1.1. With 6 separate HTTP/1.1 connections, a lost packet on connection 3 only blocks connection 3. The other 5 continue happily.
HTTP/2's TCP-level head-of-line blocking is the entire motivation for HTTP/3 and QUIC. If your users are on reliable networks (internal services, datacenter-to-datacenter), HTTP/2 is excellent. If your users are on lossy networks (mobile apps, global users on WiFi), HTTP/3 provides a measurable improvement because it eliminates this last form of HOL blocking.
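The difference between the two transports can be shown as two delivery policies over the same lossy packet trace (a toy simulation; packet 2, carrying stream s5, is the lost one):

```python
# Packets as (tcp_sequence_number, stream) in send order.
packets = [(0, "s1"), (1, "s3"), (2, "s5"), (3, "s1"), (4, "s3"), (5, "s5")]
lost_seq = 2

# What the receiver holds before the retransmission arrives:
received = [(seq, s) for seq, s in packets if seq != lost_seq]

# TCP view: bytes are deliverable only up to the first sequence gap,
# regardless of which stream they belong to.
tcp_deliverable = [(seq, s) for seq, s in received if seq < lost_seq]

# QUIC view: ordering is per stream, so only s5 (the stream with the
# lost packet) has to wait; s1 and s3 are fully deliverable.
quic_deliverable = [(seq, s) for seq, s in received
                    if s != "s5" or seq < lost_seq]

print(len(tcp_deliverable))   # → 2 (s1 and s3 stalled behind the gap)
print(len(quic_deliverable))  # → 4 (only s5's own data waits)
```

One lost packet out of six stalls two-thirds of the received data under TCP, and only one stream's worth under QUIC — which is why the gap widens as loss rates rise.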
Part 4: HTTP/3 — QUIC and the End of HOL Blocking
HTTP/3 (RFC 9114) replaces TCP with QUIC (RFC 9000) as the transport layer. QUIC runs over UDP but is not "raw UDP" — it implements reliability, ordering, congestion control, and encryption natively.
What QUIC Changes
| Feature | TCP (HTTP/2) | QUIC (HTTP/3) |
|---|---|---|
| Transport | Kernel TCP stack | Userspace QUIC over UDP |
| Connection setup | TCP + TLS = 2-3 RTT | Combined = 1 RTT (0-RTT on resume) |
| Stream independence | No (TCP sees one byte stream) | Yes (each stream has independent loss recovery) |
| HOL blocking | TCP-level blocking on packet loss | Only the affected stream is blocked |
| Connection migration | Impossible (tied to 4-tuple) | Survives IP changes (uses Connection ID) |
| Encryption | Optional (TLS layered on top) | Mandatory (TLS 1.3 built-in) |
0-RTT Connection Resumption
QUIC supports resuming a previous connection with zero round trips:
First connection:
Client → Server: QUIC Initial (includes TLS ClientHello)
Server → Client: QUIC Handshake (includes TLS ServerHello + cert)
Client → Server: QUIC Handshake Complete + first HTTP request
^^^ 1 RTT total (vs 2-3 RTT for TCP + TLS 1.2)
Resumed connection:
Client → Server: QUIC 0-RTT (includes TLS resumption + first HTTP request)
Server → Client: QUIC response
^^^ 0 RTT — data sent immediately
0-RTT is incredibly powerful for mobile applications and global services. A user in Tokyo connecting to a server in Virginia (150ms RTT) saves 300-450ms on connection setup with QUIC compared to TCP + TLS 1.2. For a page that loads 20 resources, this adds up quickly. However, 0-RTT has a replay attack risk — only safe for idempotent requests (GET). Non-idempotent requests should always use 1-RTT.
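The latency claim is just round-trip arithmetic (a back-of-envelope model; real numbers vary with TLS version, TCP Fast Open, and resumption details):

```python
def setup_latency_ms(rtt_ms: float, protocol: str) -> float:
    """Round trips spent before the first HTTP request can be sent."""
    round_trips = {
        "tcp+tls1.2": 3,  # 1 RTT TCP + 2 RTT TLS 1.2
        "tcp+tls1.3": 2,  # 1 RTT TCP + 1 RTT TLS 1.3
        "quic":       1,  # combined transport + crypto handshake
        "quic-0rtt":  0,  # resumed connection, request in first flight
    }
    return round_trips[protocol] * rtt_ms

rtt = 150  # roughly Tokyo ↔ Virginia
for p in ("tcp+tls1.2", "tcp+tls1.3", "quic", "quic-0rtt"):
    print(p, setup_latency_ms(rtt, p), "ms")
# vs TCP + TLS 1.2: QUIC saves 300 ms, and 450 ms on a 0-RTT resume —
# the 300-450 ms range quoted above
```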
Connection Migration
TCP connections are identified by the 4-tuple: source IP, source port, destination IP, destination port. If any of these change (phone switches from WiFi to cellular), the TCP connection breaks. The client must establish a new connection with a new handshake.
QUIC uses a Connection ID instead. When the client's IP changes, it sends packets with the same Connection ID from the new IP. The server recognizes the connection and continues without interruption.
TCP: Phone switches WiFi → 4G
Old: 192.168.1.50:52431 → 203.0.113.50:443 (connection dies)
New: 10.0.0.1:48222 → 203.0.113.50:443 (new connection, new handshake)
QUIC: Phone switches WiFi → 4G
Old: 192.168.1.50:52431 → 203.0.113.50:443 [ConnID: abc123]
New: 10.0.0.1:48222 → 203.0.113.50:443 [ConnID: abc123] (same connection!)
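On the server side, the difference is simply the lookup key (a schematic sketch, not a real QUIC stack; addresses and IDs are the illustrative values from the diagram above):

```python
# TCP-style lookup: sessions keyed by the 4-tuple. A client IP change
# produces a key miss — the old connection is unreachable.
tcp_conns = {("192.168.1.50", 52431, "203.0.113.50", 443): "session-abc"}
new_key = ("10.0.0.1", 48222, "203.0.113.50", 443)
print(tcp_conns.get(new_key))  # → None — new handshake required

# QUIC-style lookup: sessions keyed by the Connection ID carried in
# every packet, so the source address is irrelevant to the match.
quic_conns = {"abc123": "session-abc"}
print(quic_conns.get("abc123"))  # → session-abc — connection survives
```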
A mobile banking app had a persistent complaint: users on trains would get logged out constantly. The train passed through areas where the phone switched between cell towers, changing IP addresses. Each IP change killed the TCP connection, expired the session cookie (the server-side session was tied to the client IP for security), and forced re-authentication. Migrating the API to HTTP/3 with QUIC eliminated the connection drops. They also had to remove the IP-binding from their session logic — a security vs usability trade-off they debated for weeks.
Part 5: gRPC — Built on HTTP/2
gRPC deserves special mention because it is the dominant RPC framework in Kubernetes microservice architectures, and it is built directly on HTTP/2.
Why gRPC Chose HTTP/2
gRPC needs:
- Multiplexing — many RPCs over one connection (HTTP/2 streams)
- Bidirectional streaming — both client and server send data simultaneously (HTTP/2 supports this)
- Header compression — metadata on every RPC (HPACK reduces overhead)
- Binary framing — protobuf messages are binary, HTTP/2 frames are binary (no base64 encoding needed)
# gRPC over HTTP/2 — the wire format
# Each gRPC call maps to one HTTP/2 stream:
#
# Request:
# :method: POST
# :path: /mypackage.MyService/MyMethod
# content-type: application/grpc
# grpc-encoding: gzip
# [binary protobuf body]
#
# Response:
# :status: 200
# content-type: application/grpc
# [binary protobuf body]
# grpc-status: 0
# grpc-message: OK
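Inside those HTTP/2 data frames, each protobuf message is wrapped in gRPC's length-prefixed framing: a 1-byte compressed flag plus a 4-byte big-endian length. A minimal encoder/decoder sketch (the payload bytes are a made-up stand-in for a serialized protobuf):

```python
import struct

def grpc_frame(message: bytes, compressed: bool = False) -> bytes:
    """Wrap a serialized message in gRPC's 5-byte length prefix."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message

def grpc_unframe(data: bytes) -> tuple:
    """Return (compressed, message) from a framed buffer."""
    compressed, length = struct.unpack(">BI", data[:5])
    return bool(compressed), data[5:5 + length]

payload = b"\x0a\x05hello"          # illustrative protobuf-like bytes
framed = grpc_frame(payload)
print(framed[:5].hex())             # → 0000000007 — flag 0, length 7
print(grpc_unframe(framed))         # → (False, b'\n\x05hello')
```

This framing is why gRPC needs a binary-clean transport: the prefix and body are raw bytes, which HTTP/2 frames carry natively but HTTP/1.1 text semantics cannot.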
Configuring gRPC in Kubernetes
# NGINX Ingress for gRPC
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # Increase timeout for streaming RPCs
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  tls:
    - hosts:
        - grpc.example.com
      secretName: grpc-tls
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-service
                port:
                  number: 50051
gRPC in Kubernetes requires three things: (1) HTTP/2 between the ingress and backend (backend-protocol: GRPC), (2) TLS on the ingress (most gRPC clients expect TLS), and (3) L7 load balancing for request-level distribution. Without L7 balancing, all gRPC requests on a single connection go to one pod — negating any horizontal scaling benefits.
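The L4-vs-L7 distinction is worth making concrete (a toy model; pod names are illustrative):

```python
import itertools

pods = ["pod-a", "pod-b", "pod-c"]

def l4_route(num_requests: int, chosen_pod: str = "pod-a") -> list:
    """Connection-level (L4) balancing: the backend is picked once per
    connection, and gRPC reuses one connection — every RPC hits it."""
    return [chosen_pod] * num_requests

def l7_route(num_requests: int) -> list:
    """Request-level (L7) balancing: the proxy picks a backend per
    HTTP/2 stream, so RPCs spread across pods."""
    rr = itertools.cycle(pods)
    return [next(rr) for _ in range(num_requests)]

print(set(l4_route(9)))  # → {'pod-a'} — one pod takes all the load
print(set(l7_route(9)))  # all three pods share the load evenly
```

This is exactly why a plain ClusterIP Service (which balances at connection time) pins all gRPC traffic to one pod, while an L7-aware proxy distributes it.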
Part 6: Choosing the Right HTTP Version
Decision Matrix
| Scenario | Recommended | Why |
|---|---|---|
| Internal service-to-service (K8s) | HTTP/2 (h2c) | Multiplexing, header compression, gRPC support |
| Public web traffic (browsers) | HTTP/2 + HTTP/3 | Negotiate the best available, fallback gracefully |
| gRPC services | HTTP/2 (required) | gRPC is built on HTTP/2, no alternative |
| Legacy systems | HTTP/1.1 | Some backends cannot speak HTTP/2 |
| Mobile apps (global users) | HTTP/3 (QUIC) | Connection migration, 0-RTT, no HOL blocking on lossy networks |
Configuring HTTP/2 on NGINX Ingress
# Enable HTTP/2 to clients (usually enabled by default with TLS)
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-http2: "true"                     # Client-facing HTTP/2 (default: true)
  upstream-keepalive-connections: "32"  # Keep HTTP/1.1 connections to backends alive
Verifying HTTP Version
# Check what HTTP version your server negotiated
curl -v --http2 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/2
# Force HTTP/1.1 to compare
curl -v --http1.1 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/1.1
# Check HTTP/3 support (requires curl with HTTP/3 built in)
curl -v --http3 https://api.example.com/users 2>&1 | grep "using HTTP"
# * using HTTP/3
# See ALPN negotiation in TLS handshake
curl -v https://api.example.com/ 2>&1 | grep ALPN
# * ALPN: offers h2,http/1.1
# * ALPN: server accepted h2
For most Kubernetes deployments, enabling HTTP/2 between ingress and backends is the highest-impact change you can make. It requires minimal configuration (one annotation), no code changes, and the benefits — multiplexing, header compression, reduced connection overhead — are immediate. HTTP/3 is the future but requires more infrastructure support (UDP load balancers, QUIC-capable ingress controllers). Start with HTTP/2 today.
Part 7: Performance Comparison — Real Numbers
To make this concrete, here is what the protocol differences look like in practice for a page that loads 50 resources:
| Metric | HTTP/1.1 (6 conns) | HTTP/2 (1 conn) | HTTP/3 (QUIC) |
|---|---|---|---|
| TCP handshakes | 6 | 1 | 0 (UDP) |
| TLS handshakes | 6 | 1 | 1 (combined, 0 on resume) |
| Setup cost at 100ms RTT (summed across connections) | 600ms | 200ms | 100ms (0ms resume) |
| Concurrent requests | 6 | 100+ | 100+ |
| Header overhead (50 reqs) | ~40 KB | ~4 KB | ~4 KB |
| Packet loss impact (2%) | 1 of 6 connections affected | ALL streams blocked | Only affected stream blocked |
A team running a dashboard application with 80+ API calls per page load saw 3.2 second load times on HTTP/1.1. The browser was queuing requests — only 6 could run at a time, so 80 requests took 14 rounds. Switching the ingress to HTTP/2 (one annotation change) dropped load time to 1.8 seconds. The same 80 requests now ran concurrently over a single connection. Adding HTTP/2 between ingress and backend (h2c) dropped it further to 1.4 seconds by eliminating the connection overhead on the backend leg. No code changes. No architecture changes. Just protocol configuration.
Key Concepts Summary
- HTTP/1.1 is limited to one request at a time per TCP connection — browsers work around this with 6 parallel connections, but this wastes resources
- HTTP/2 multiplexes many streams over one TCP connection — eliminates HTTP-level head-of-line blocking and reduces connection overhead
- HPACK header compression reduces header size by 85-95% after the first request — especially impactful for APIs with large auth tokens
- HTTP/2 still suffers from TCP-level HOL blocking — one lost packet stalls all streams because TCP does not know about HTTP streams
- HTTP/3 uses QUIC over UDP to eliminate TCP-level HOL blocking — each stream has independent loss recovery
- QUIC provides 1-RTT setup (0-RTT on resume) and connection migration — critical for mobile and global users
- gRPC requires HTTP/2 — set backend-protocol: GRPC on NGINX Ingress for gRPC backends
- NGINX Ingress defaults to HTTP/1.1 for backend connections — enable upstream keep-alive and consider h2c for performance
- HTTP/2 is the most impactful protocol upgrade you can make in Kubernetes today with minimal effort
Common Mistakes
- Not enabling HTTP/2 between ingress and backend — you get HTTP/2 from client to ingress but HTTP/1.1 from ingress to pod, losing multiplexing benefits on the backend leg
- Using a ClusterIP Service for gRPC without L7 load balancing — all requests go to one pod because gRPC reuses a single HTTP/2 connection
- Assuming HTTP/3 is always better — on reliable networks with low packet loss, HTTP/2 and HTTP/3 perform similarly
- Not setting upstream keep-alive on NGINX Ingress — creates a new TCP connection to the backend for every request, accumulating TIME_WAIT
- Blocking UDP port 443 in firewalls — this prevents QUIC/HTTP/3 from working, forcing fallback to HTTP/2 over TCP
- Configuring long proxy timeouts for gRPC streaming without understanding that idle stream detection still applies — streams can be killed by intermediate proxies that do not see traffic
Why does HTTP/2 sometimes perform worse than HTTP/1.1 on networks with high packet loss?