TCP vs UDP — When to Use Which
Your gRPC service works perfectly in local development. Requests are fast, streaming works, everything is smooth. Then you deploy to Kubernetes and route traffic through a ClusterIP Service. Suddenly, connections drop, latency spikes, and a well-meaning teammate suggests: "gRPC is slow over TCP in K8s. Let's try UDP."
That suggestion is fundamentally wrong, and understanding why it is wrong requires knowing what TCP and UDP actually are, what guarantees they provide, and which protocols depend on those guarantees.
This lesson gives you the mental model to immediately know whether a protocol runs on TCP or UDP — and why.
Part 1: TCP — Reliable, Ordered, Connection-Oriented
TCP (Transmission Control Protocol) was originally defined in RFC 793 (since superseded by RFC 9293). It provides three guarantees that UDP does not:
- Reliability — every byte sent will arrive at the destination, or the sender will be notified of failure
- Ordering — bytes arrive in the same order they were sent
- Flow control — the sender will not overwhelm the receiver
These guarantees come at a cost:
- Connection setup — the three-way handshake adds one RTT of latency before any data flows
- Head-of-line blocking — if one packet is lost, all subsequent packets must wait for the retransmission
- Overhead — TCP headers are 20-60 bytes (vs 8 bytes for UDP), plus ACK traffic in both directions
- State — both sides must maintain connection state (sequence numbers, windows, timers)
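These tradeoffs are visible directly in the socket API. Below is a minimal sketch using Python's standard `socket` module (loopback echo; the port is chosen by the OS and the message is arbitrary): `connect()` triggers the three-way handshake, and the kernel, not the application, handles ordering and retransmission of the byte stream.

```python
import socket
import threading

# Minimal TCP echo over loopback: connect() performs the three-way
# handshake; after that, the kernel guarantees the bytes arrive
# complete and in order.

def serve(server_sock: socket.socket) -> None:
    conn, _ = server_sock.accept()            # handshake completes here
    with conn:
        data = b""
        while len(data) < 8:                  # TCP is a byte stream: read
            data += conn.recv(8 - len(data))  # until the full message is in
        conn.sendall(data[::-1])              # echo back, reversed

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))                 # port 0 = any free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))           # 1 RTT: SYN, SYN-ACK, ACK
client.sendall(b"ABCDEFGH")                   # reliability handled by TCP
reply = client.recv(8)
client.close()
print(reply)                                  # b'HGFEDCBA'
```

Note that the application code never mentions sequence numbers, ACKs, or retransmission; that is the entire point of TCP.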
# TCP header (20 bytes minimum)
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |          Source Port          |       Destination Port        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                        Sequence Number                        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                     Acknowledgment Number                     |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# | Offset|  Res. |C|E|U|A|P|R|S|F|          Window Size          |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |           Checksum            |        Urgent Pointer         |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                       Options (if any)                        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP's reliability guarantee means the protocol handles retransmission, ordering, and error detection for you. The application never has to worry about missing or out-of-order data. This is why virtually every request/response protocol (HTTP, gRPC, database protocols, SSH) is built on TCP — because the application developers do not want to implement their own reliability layer.
What TCP Gives You (and What It Costs)
| Feature | Benefit | Cost |
|---|---|---|
| Three-way handshake | Both sides confirmed reachable | 1 RTT before data flows |
| Sequence numbers | Data arrives in order | 4 bytes of header, state tracking |
| Acknowledgments | Sender knows what arrived | Return traffic, latency |
| Retransmission | Lost packets recovered | Delay on loss (head-of-line blocking) |
| Flow control | Receiver not overwhelmed | May throttle sender |
| Congestion control | Network not overwhelmed | Slow start, reduced throughput initially |
Part 2: UDP — Unreliable, Unordered, Connectionless
UDP (User Datagram Protocol) is defined in RFC 768. It provides almost nothing:
- No connection setup — just send packets
- No reliability — if a packet is lost, it is gone
- No ordering — packets may arrive in any order
- No flow control — the sender can blast as fast as it wants
- No congestion control — UDP will happily saturate the network
The entire UDP header is 8 bytes:
# UDP header (8 bytes — that's it)
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |          Source Port          |       Destination Port        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |            Length             |           Checksum            |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
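The contrast with TCP shows up directly in the socket API. A minimal sketch with Python's stdlib (loopback addresses; the payload is arbitrary): no `connect()`, no handshake, no connection state. Each `sendto()` is an independent datagram; on loopback it arrives, but nothing in the protocol promises that on a real network.

```python
import socket

# UDP: fire-and-forget datagrams, no handshake, no delivery guarantee.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))               # port 0 = any free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"fire-and-forget", addr)       # no handshake, no state

data, peer = receiver.recvfrom(1500)          # one whole datagram, or nothing
print(data)                                   # b'fire-and-forget'
sender.close()
receiver.close()
```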
TCP vs UDP — Protocol Comparison

| Protocol | Model |
|---|---|
| TCP | Reliable, ordered, connection-oriented |
| UDP | Unreliable, unordered, connectionless |
Why UDP Exists
If UDP provides no guarantees, why use it? Because sometimes the guarantees are worse than the problem:
1. The data is time-sensitive. In a video call, a packet that arrives 500ms late (after TCP retransmission) is useless — the moment has passed. It is better to skip it and show the next frame. UDP lets the application decide what to do with missing data instead of forcing a wait.
2. The interaction is a single request/response. DNS queries are typically one packet out, one packet back. A three-way handshake to set up a connection would triple the latency for a 50-byte question. UDP skips the handshake entirely.
3. The application implements its own reliability. QUIC (the protocol under HTTP/3) runs on UDP but implements its own reliability, ordering, and congestion control in userspace. This lets it fix TCP's head-of-line blocking problem while still being reliable.
A useful mental model: TCP says "the network will handle reliability for you." UDP says "you handle reliability yourself (or don't)." Modern protocols like QUIC choose UDP not because they want unreliable delivery, but because they want to implement reliability differently than TCP does. Raw UDP with no application-level reliability is only appropriate for truly fire-and-forget use cases like metrics collection or logging.
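As a toy illustration of "you handle reliability yourself," here is a stop-and-wait retransmission loop over UDP in Python. Everything here is made up for the sketch (loopback only, the "server" is inlined in the same process, and the message format and timeout are arbitrary); real protocols like QUIC do this far more elaborately.

```python
import socket

# Toy stop-and-wait reliability on top of UDP: the application keeps
# retransmitting until it sees an ACK. QUIC implements this idea (and
# much more) in userspace.

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(0.2)                        # retransmit timer

acked = False
for attempt in range(5):                      # bounded retries
    client.sendto(b"seq=1 hello", server_addr)
    # "Server" side, inlined for the sketch: receive one datagram, ACK it.
    msg, peer = server.recvfrom(1500)         # blocks; fine on loopback
    server.sendto(b"ack=1", peer)
    try:
        if client.recvfrom(1500)[0] == b"ack=1":
            acked = True
            break
    except socket.timeout:
        continue                              # ACK lost? send again

client.close()
server.close()
print(acked)
```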
Part 3: Where Each Protocol Is Used
Protocols That Use TCP
| Protocol | Port | Why TCP |
|---|---|---|
| HTTP/1.1, HTTP/2 | 80, 443 | Every byte of a web page must arrive in order |
| gRPC | 443 (over HTTP/2) | Streaming RPCs need reliable ordered delivery |
| SSH | 22 | You cannot afford to lose keystrokes |
| PostgreSQL | 5432 | Database queries and results must be complete and ordered |
| MySQL | 3306 | Same as PostgreSQL |
| Redis | 6379 | Commands and responses must arrive reliably |
| SMTP | 25, 587 | Email must not lose content |
| FTP | 21 | File transfers need completeness |
Protocols That Use UDP
| Protocol | Port | Why UDP |
|---|---|---|
| DNS (queries) | 53 | Single packet request/response, speed matters |
| DHCP | 67, 68 | Broadcast-based, no connection possible |
| NTP | 123 | Time sync packets — late data is useless |
| SNMP | 161, 162 | Monitoring polls — missing one is fine |
| syslog | 514 | Log shipping — some loss acceptable |
| QUIC/HTTP/3 | 443 | Implements own reliability on top of UDP |
| Video streaming (RTP) | Various | Late frames are useless, skip and move on |
| Gaming protocols | Various | Stale position updates are worse than no update |
DNS is the most important UDP protocol you will encounter in Kubernetes. Every service discovery lookup, every external API call, every domain resolution starts with a UDP packet to port 53. When DNS is slow or broken in your cluster, everything is slow or broken. CoreDNS performance directly impacts every microservice.
DNS: The Protocol That Uses Both
DNS primarily uses UDP, but switches to TCP in specific cases:
- Response exceeds 512 bytes (or 4096 with EDNS0) — the server responds with the TC (truncated) flag, and the client retries over TCP
- Zone transfers (AXFR/IXFR) — when a secondary DNS server copies the full zone from a primary, it always uses TCP because the data is large
- DNS over TLS (DoT) — port 853, always TCP
- DNS over HTTPS (DoH) — port 443, TCP when carried over HTTP/1.1 or HTTP/2 (DoH over HTTP/3 instead rides QUIC on UDP)
# DNS over UDP (default)
dig google.com
# ;; MSG SIZE rcvd: 55 (fits in one UDP packet)
# Force DNS over TCP
dig +tcp google.com
# ;; MSG SIZE rcvd: 55 (same answer, but used TCP — slower)
# See if responses are being truncated
dig large-record.example.com | grep flags
# ;; flags: qr rd ra tc; (tc = truncated, will retry over TCP)
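The TC flag lives in the 16-bit flags word at bytes 2-3 of the DNS header, as bit 0x0200. A short Python sketch checks it on hand-crafted header bytes (the query ID and section counts below are made up for illustration):

```python
import struct

# The DNS header is 12 bytes; the 16-bit flags word is bytes 2-3.
# TC (truncated) is bit 0x0200 — when set, the client retries over TCP.

def is_truncated(dns_response: bytes) -> bool:
    (flags,) = struct.unpack("!H", dns_response[2:4])
    return bool(flags & 0x0200)

# Hand-crafted 12-byte headers: ID=0x1234, QR|RD|RA set (0x8180),
# one question, one answer. The second also sets TC (0x0200).
normal    = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 1, 0, 0)

print(is_truncated(normal), is_truncated(truncated))  # False True
```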
If your Kubernetes NetworkPolicies allow DNS on UDP port 53 but block TCP port 53, you will have intermittent DNS failures. Any DNS response that exceeds the UDP size limit will fail because the TCP fallback is blocked. Always allow both UDP and TCP on port 53 in your DNS policies. This is one of the most common networking misconfigurations in K8s.
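A sketch of what such a policy might look like, assuming CoreDNS runs in the kube-system namespace; the policy name and selectors are placeholders to adapt to your cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns            # placeholder name
spec:
  podSelector: {}            # all pods in this namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP      # required for truncated-response fallback
          port: 53
```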
Part 4: QUIC and HTTP/3 — Reliable Protocol on UDP
QUIC deserves special attention because it confuses people. "HTTP/3 uses UDP" sounds like it is unreliable. It is not.
QUIC is a reliable, multiplexed transport protocol built on top of UDP datagrams. It reimplements everything TCP provides — reliability, ordering, congestion control — but does it in userspace rather than in the kernel.
Why Not Just Use TCP?
TCP has a fundamental problem: head-of-line blocking at the transport layer. If you multiplex 10 HTTP/2 streams over a single TCP connection, and one packet from stream 3 is lost, TCP holds up ALL 10 streams while it retransmits that packet. The other 9 streams are fine — their data has arrived — but TCP does not know about streams. It only knows about one ordered byte stream.
QUIC solves this because it knows about streams. If a packet from stream 3 is lost, only stream 3 is blocked. The other 9 streams continue unaffected.
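A toy simulation (not real protocol code) makes the difference concrete. Five packets for streams A, B, and C arrive; the packet with global sequence number 2, carrying stream B's data, is lost and only shows up last as a retransmission:

```python
# Toy model of transport-layer head-of-line blocking.
# Each packet is (stream, global_seq); seq 2 arrives last (retransmitted).
arrivals = [("A", 1), ("C", 3), ("A", 4), ("C", 5), ("B", 2)]

def tcp_like(arrivals):
    """One ordered byte stream: a gap stalls every stream behind it."""
    out, buffered, expected = [], {}, 1
    for step, (stream, seq) in enumerate(arrivals):
        buffered[seq] = stream
        while expected in buffered:           # deliver only in global order
            out.append((step, buffered.pop(expected)))
            expected += 1
    return out

def quic_like(arrivals):
    """Per-stream ordering: a loss on stream B never blocks A or C.

    In this trace each stream's own packets already arrive in
    per-stream order, so per-stream reassembly is trivial.
    """
    return [(step, stream) for step, (stream, _) in enumerate(arrivals)]

print(tcp_like(arrivals))   # A's and C's data stuck until step 4
print(quic_like(arrivals))  # A and C delivered as they arrive
```

Under the TCP-like model, everything after the gap is delivered only at step 4, when the retransmission finally lands; under the QUIC-like model, only stream B waits.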
HTTP/2 over TCP vs HTTP/3 over QUIC

| Stack | Stream behavior |
|---|---|
| HTTP/2 over TCP | All streams share one TCP connection |
| HTTP/3 over QUIC | Each stream is independent |
In Kubernetes, QUIC/HTTP/3 support depends on your ingress controller. NGINX Ingress has experimental HTTP/3 support. Envoy (used by Istio and Gateway API) has production QUIC support. If your clients are browsers, enabling HTTP/3 on your ingress can noticeably improve latency for users on lossy networks (mobile, WiFi). For internal service-to-service communication, HTTP/2 over TCP is still the norm.
Part 5: TCP and UDP in Kubernetes
Service Protocol Configuration
Kubernetes Services default to TCP. You can explicitly set the protocol:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP        # default if omitted
    - name: dns
      port: 53
      targetPort: 5353
      protocol: UDP        # must be explicit for UDP
You cannot always mix TCP and UDP ports in a single LoadBalancer Service: Kubernetes itself has allowed mixed-protocol LoadBalancers since the MixedProtocolLBService feature went GA (v1.26), but the cloud load balancer behind the Service must also support it, and many do not. If you need both (like CoreDNS, which serves DNS on both TCP and UDP port 53), you may need two separate Services or a cloud-specific annotation. This limitation catches people when they try to expose game servers or media services that use both protocols.
How kube-proxy Handles TCP vs UDP
kube-proxy (or its IPVS/eBPF replacement) handles TCP and UDP differently:
TCP with iptables mode:
- DNAT rule rewrites destination IP from ClusterIP to a pod IP
- Connection tracking (conntrack) ensures all packets in the same TCP connection go to the same pod
- If the backend pod dies, existing connections hang until TCP timeout
UDP with iptables mode:
- Same DNAT rule, but no connection concept
- conntrack tracks UDP "connections" based on the 5-tuple with a default timeout of 30 seconds
- If a DNS response is slow, a second query might get load-balanced to a different pod — and the response from the first pod gets dropped by conntrack
# Check conntrack entries for TCP and UDP
conntrack -L -p tcp | head -5
conntrack -L -p udp | head -5
# Count all conntrack entries (prints only the counter)
conntrack -C
# 15432
# conntrack table full? Packets get dropped silently
dmesg | grep conntrack
# nf_conntrack: table full, dropping packet
We had a production cluster where DNS resolution would randomly fail about 1% of the time. The issue was a known Linux kernel bug (now fixed) where UDP packets from pods using SNAT would occasionally get both the source port rewritten AND the conntrack entry confused due to a race condition. The symptom was that DNS responses were being delivered to the wrong pod. The fix was upgrading the kernel. The workaround was setting net.netfilter.nf_conntrack_udp_timeout and nf_conntrack_udp_timeout_stream to more aggressive values. Always check your kernel version if you see intermittent DNS failures in K8s.
Part 6: Why gRPC Needs TCP (Answering the Opening Scenario)
Back to the opening scenario: your gRPC service works locally but fails through a Kubernetes Service. The suggestion to "try UDP" is wrong for a fundamental reason.
gRPC is built on HTTP/2. HTTP/2 is built on TCP. gRPC requires:
- Reliable delivery — RPC request and response bytes must all arrive
- Ordering — protobuf messages must be reassembled in order
- Bidirectional streaming — both sides send data simultaneously over one connection
- Multiplexing — many RPCs share one TCP connection
UDP provides none of these. Using "raw UDP" for gRPC is like suggesting you drive your car without wheels.
The actual problem with gRPC in Kubernetes is usually load balancing. gRPC uses long-lived HTTP/2 connections. A Kubernetes ClusterIP Service load-balances at the connection level, not the request level. If a client opens one gRPC connection, all requests go to one pod — no load distribution.
# The fix: use a gRPC-aware load balancer
# Option 1: Client-side load balancing (resolve all pod IPs)
# Option 2: Envoy/Istio sidecar proxy (L7 load balancing)
# Option 3: Use a headless Service + gRPC client-side balancing

# Headless Service (no ClusterIP, returns all pod IPs)
apiVersion: v1
kind: Service
metadata:
  name: grpc-service
spec:
  clusterIP: None          # headless
  selector:
    app: my-grpc-app
  ports:
    - port: 50051
      protocol: TCP
The gRPC-in-Kubernetes problem is never about TCP vs UDP. It is about L4 vs L7 load balancing. TCP load balancers (kube-proxy, ClusterIP) distribute connections. gRPC multiplexes many requests over one connection. You need an L7 load balancer (Envoy, Linkerd, Istio) that understands HTTP/2 framing and distributes individual requests across backend pods.
Key Concepts Summary
- TCP provides reliability, ordering, and flow control at the cost of connection setup latency and head-of-line blocking
- UDP provides nothing — it is a thin wrapper around IP that adds port numbers and a checksum
- Most protocols use TCP because application developers do not want to implement their own reliability
- UDP is used when speed matters more than completeness — DNS queries, video streaming, gaming, time-sync
- DNS uses both protocols — UDP for queries, TCP for large responses and zone transfers. Always allow both in firewall rules.
- QUIC/HTTP/3 runs on UDP but IS reliable — it implements TCP-like guarantees in userspace to fix head-of-line blocking
- gRPC requires TCP (via HTTP/2) — the K8s problem is L7 load balancing, not protocol choice
- In Kubernetes, Services default to TCP — UDP must be explicitly specified and has different conntrack behavior
Common Mistakes
- Suggesting UDP for protocols that require reliability (gRPC, HTTP, database connections) — this reveals a fundamental misunderstanding of the transport layer
- Blocking TCP port 53 in NetworkPolicies while allowing UDP port 53 — large DNS responses will fail
- Assuming QUIC/HTTP/3 is "unreliable" because it runs on UDP — QUIC implements its own reliability
- Not specifying `protocol: UDP` in K8s Service manifests — everything defaults to TCP
- Mixing TCP and UDP ports in a single LoadBalancer Service without checking cloud provider support
- Ignoring conntrack table limits — when the table fills up, both TCP and UDP packets are silently dropped
- Diagnosing gRPC performance issues as a TCP/UDP problem when it is actually an L4/L7 load balancing problem
Why does DNS primarily use UDP instead of TCP for standard queries?