Networking Fundamentals for DevOps Engineers

TCP vs UDP — When to Use Which

Your gRPC service works perfectly in local development. Requests are fast, streaming works, everything is smooth. Then you deploy to Kubernetes and route traffic through a ClusterIP Service. Suddenly, connections drop, latency spikes, and a well-meaning teammate suggests: "gRPC is slow over TCP in K8s. Let's try UDP."

That suggestion is fundamentally wrong, and understanding why requires knowing what TCP and UDP actually are, what guarantees they provide, and which protocols depend on those guarantees.

This lesson gives you the mental model to immediately know whether a protocol runs on TCP or UDP — and why.


Part 1: TCP — Reliable, Ordered, Connection-Oriented

TCP (Transmission Control Protocol) was defined in RFC 793 (now superseded by RFC 9293). It provides three guarantees that UDP does not:

  1. Reliability — every byte sent will arrive at the destination, or the sender will be notified of failure
  2. Ordering — bytes arrive in the same order they were sent
  3. Flow control — the sender will not overwhelm the receiver

These guarantees come at a cost:

  • Connection setup — the three-way handshake adds one RTT of latency before any data flows
  • Head-of-line blocking — if one packet is lost, all subsequent packets must wait for the retransmission
  • Overhead — TCP headers are 20-60 bytes (vs 8 bytes for UDP), plus ACK traffic in both directions
  • State — both sides must maintain connection state (sequence numbers, windows, timers)

# TCP header (20 bytes minimum)
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |          Source Port          |       Destination Port        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                        Sequence Number                        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                     Acknowledgment Number                     |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# | Offset|  Res  |C|E|U|A|P|R|S|F|          Window Size          |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |           Checksum            |         Urgent Pointer        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |                        Options (if any)                       |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# Flag bits: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN

KEY CONCEPT

TCP's reliability guarantee means the protocol handles retransmission, ordering, and error detection for you. The application never has to worry about missing or out-of-order data. This is why virtually every request/response protocol (HTTP, gRPC, database protocols, SSH) is built on TCP — because the application developers do not want to implement their own reliability layer.
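
This guarantee is easy to demonstrate with two sockets on localhost. The sketch below is illustrative only — the echo server, port selection, and payload are invented for the demo — but it shows the contract TCP gives the application: 5,000 bytes go in, and exactly those bytes come out, in order, with no application-level retry logic:

```python
# Illustrative localhost demo: TCP delivers every byte, in order.
# The server, port choice, and payload are made up for this sketch.
import socket
import threading

def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one connection and echo everything back."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:          # peer closed its write side
                break
            conn.sendall(data)    # sendall blocks until every byte is queued

def tcp_round_trip() -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    threading.Thread(target=run_echo_server, args=(server,), daemon=True).start()

    payload = b"".join(bytes([i % 256]) * 100 for i in range(50))  # 5,000 bytes
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)    # tell the server we are done sending
        received = b""
        while chunk := client.recv(4096):  # TCP may split the stream arbitrarily
            received += chunk
    assert received == payload             # reliable AND ordered
    return received

if __name__ == "__main__":
    tcp_round_trip()
```

Note what the application does not check: chunk sizes. TCP preserves byte order, not message boundaries — the stream may arrive in any number of pieces.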

What TCP Gives You (and What It Costs)

  Feature             | Benefit                        | Cost
  --------------------|--------------------------------|------------------------------------------
  Three-way handshake | Both sides confirmed reachable | 1 RTT before data flows
  Sequence numbers    | Data arrives in order          | 8 bytes of header, state tracking
  Acknowledgments     | Sender knows what arrived      | Return traffic, latency
  Retransmission      | Lost packets recovered         | Delay on loss (head-of-line blocking)
  Flow control        | Receiver not overwhelmed       | May throttle sender
  Congestion control  | Network not overwhelmed        | Slow start, reduced throughput initially

Part 2: UDP — Unreliable, Unordered, Connectionless

UDP (User Datagram Protocol) is defined in RFC 768. It provides almost nothing:

  • No connection setup — just send packets
  • No reliability — if a packet is lost, it is gone
  • No ordering — packets may arrive in any order
  • No flow control — the sender can blast as fast as it wants
  • No congestion control — UDP will happily saturate the network

The entire UDP header is 8 bytes:

# UDP header (8 bytes — that's it)
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |          Source Port          |       Destination Port        |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# |            Length             |           Checksum            |
# +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
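
A minimal sketch of what connectionless means in practice — no listen(), no accept(), no handshake, just a datagram. The addresses and names here are invented for the demo; it only looks reliable because loopback rarely drops packets, and on a real network the same sendto() carries no delivery guarantee:

```python
# Illustrative localhost demo: UDP is fire-and-forget datagrams.
# Looks reliable here only because loopback rarely drops packets.
import socket

def udp_round_trip(message: bytes) -> bytes:
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))        # no listen()/accept() — no connection
    receiver.settimeout(5.0)
    addr = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(message, addr)           # one datagram out: no setup, no ACK
    data, _src = receiver.recvfrom(65535)  # datagrams keep message boundaries
    sender.close()
    receiver.close()
    return data

if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

There is no connection to set up or tear down — but also nothing that tells the sender whether the datagram arrived.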

TCP vs UDP — Protocol Comparison

TCP — reliable, ordered, connection-oriented

  Header size:         20-60 bytes
  Connection setup:    Three-way handshake (1 RTT)
  Reliability:         Guaranteed delivery with retransmission
  Ordering:            Bytes arrive in the order sent
  Flow control:        Receiver window prevents overload
  Congestion control:  Slow start, AIMD, CUBIC
  State:               Both sides track the connection
  Use when:            Correctness matters more than speed

UDP — unreliable, unordered, connectionless

  Header size:         8 bytes
  Connection setup:    None — just send
  Reliability:         None — packets may be lost
  Ordering:            None — packets may arrive out of order
  Flow control:        None — the sender blasts freely
  Congestion control:  None — the application must self-regulate
  State:               Stateless — no connection to track
  Use when:            Speed matters more than perfection

Why UDP Exists

If UDP provides no guarantees, why use it? Because sometimes the guarantees are worse than the problem:

1. The data is time-sensitive. In a video call, a packet that arrives 500ms late (after TCP retransmission) is useless — the moment has passed. It is better to skip it and show the next frame. UDP lets the application decide what to do with missing data instead of forcing a wait.

2. The interaction is a single request/response. DNS queries are typically one packet out, one packet back. A three-way handshake to set up a connection would triple the latency for a 50-byte question. UDP skips the handshake entirely.

3. The application implements its own reliability. QUIC (the protocol under HTTP/3) runs on UDP but implements its own reliability, ordering, and congestion control in userspace. This lets it fix TCP's head-of-line blocking problem while still being reliable.
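
Point 2 above is easy to appreciate by counting bytes. Below is an illustrative sketch (field layout per RFC 1035; the helper name is our own) that hand-builds the wire form of a DNS A-record query — the whole question fits in a 28-byte datagram, so a connection handshake would cost more than the query itself:

```python
# Hand-build a DNS query to show how small it is: one UDP datagram.
# Field layout per RFC 1035; the function name is invented for this sketch.
import struct

def build_dns_query(hostname: str, txn_id: int = 0x1234) -> bytes:
    flags = 0x0100                      # standard query, recursion desired (RD)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txn_id, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

query = build_dns_query("google.com")
# 12-byte header + 12-byte qname ("\x06google\x03com\x00") + 4 bytes = 28 bytes
print(len(query))
```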

PRO TIP

A useful mental model: TCP says "the network will handle reliability for you." UDP says "you handle reliability yourself (or don't)." Modern protocols like QUIC choose UDP not because they want unreliable delivery, but because they want to implement reliability differently than TCP does. Raw UDP with no application-level reliability is only appropriate for truly fire-and-forget use cases like metrics collection or logging.


Part 3: Where Each Protocol Is Used

Protocols That Use TCP

  Protocol         | Port              | Why TCP
  -----------------|-------------------|---------------------------------------------------
  HTTP/1.1, HTTP/2 | 80, 443           | Every byte of a web page must arrive in order
  gRPC             | 443 (over HTTP/2) | Streaming RPCs need reliable, ordered delivery
  SSH              | 22                | You cannot afford to lose keystrokes
  PostgreSQL       | 5432              | Queries and results must be complete and ordered
  MySQL            | 3306              | Same as PostgreSQL
  Redis            | 6379              | Commands and responses must arrive reliably
  SMTP             | 25, 587           | Email must not lose content
  FTP              | 21                | File transfers need completeness

Protocols That Use UDP

  Protocol              | Port     | Why UDP
  ----------------------|----------|------------------------------------------------
  DNS (queries)         | 53       | Single-packet request/response, speed matters
  DHCP                  | 67, 68   | Broadcast-based, no connection possible
  NTP                   | 123      | Time-sync packets — late data is useless
  SNMP                  | 161, 162 | Monitoring polls — missing one is fine
  syslog                | 514      | Log shipping — some loss acceptable
  QUIC / HTTP/3         | 443      | Implements its own reliability on top of UDP
  Video streaming (RTP) | Various  | Late frames are useless — skip and move on
  Gaming protocols      | Various  | Stale position updates are worse than no update

KEY CONCEPT

DNS is the most important UDP protocol you will encounter in Kubernetes. Every service discovery lookup, every external API call, every domain resolution starts with a UDP packet to port 53. When DNS is slow or broken in your cluster, everything is slow or broken. CoreDNS performance directly impacts every microservice.

DNS: The Protocol That Uses Both

DNS primarily uses UDP, but switches to TCP in specific cases:

  1. Response exceeds 512 bytes (or 4096 with EDNS0) — the server responds with the TC (truncated) flag, and the client retries over TCP
  2. Zone transfers (AXFR/IXFR) — when a secondary DNS server copies the full zone from a primary, it always uses TCP because the data is large
  3. DNS over TLS (DoT) — port 853, always TCP
  4. DNS over HTTPS (DoH) — port 443, always TCP (it is HTTP)

# DNS over UDP (default)
dig google.com
# ;; MSG SIZE  rcvd: 55    (fits in one UDP packet)

# Force DNS over TCP
dig +tcp google.com
# ;; MSG SIZE  rcvd: 55    (same answer, but used TCP — slower)

# See if responses are being truncated
dig large-record.example.com | grep flags
# ;; flags: qr rd ra tc;    (tc = truncated, will retry over TCP)
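
The client-side half of case 1 reduces to a few lines: inspect the flags word of the DNS response header and retry over TCP when the TC bit is set. A sketch with hand-crafted headers (not captured traffic):

```python
# Sketch of the UDP-to-TCP fallback rule: check the TC (truncated) bit
# in a DNS response header. Sample headers below are hand-crafted.
import struct

TC_BIT = 0x0200  # truncated-response flag in the DNS header flags word

def should_retry_over_tcp(response: bytes) -> bool:
    if len(response) < 12:              # the DNS header is always 12 bytes
        raise ValueError("short DNS response")
    _txn_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_BIT)

# QR=1 (response), RD, RA set — one header with TC, one without
normal_hdr    = struct.pack("!HHHHHH", 1, 0x8180, 1, 1, 0, 0)
truncated_hdr = struct.pack("!HHHHHH", 1, 0x8180 | TC_BIT, 1, 0, 0, 0)

print(should_retry_over_tcp(normal_hdr))     # UDP answer is complete
print(should_retry_over_tcp(truncated_hdr))  # fall back to TCP
```
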
WARNING

If your Kubernetes NetworkPolicies allow DNS on UDP port 53 but block TCP port 53, you will have intermittent DNS failures. Any DNS response that exceeds the UDP size limit will fail because the TCP fallback is blocked. Always allow both UDP and TCP on port 53 in your DNS policies. This is one of the most common networking misconfigurations in K8s.


Part 4: QUIC and HTTP/3 — Reliable Protocol on UDP

QUIC deserves special attention because it confuses people. "HTTP/3 uses UDP" sounds like it is unreliable. It is not.

QUIC is a reliable, multiplexed transport protocol built on top of UDP datagrams. It reimplements everything TCP provides — reliability, ordering, congestion control — but does it in userspace rather than in the kernel.

Why Not Just Use TCP?

TCP has a fundamental problem: head-of-line blocking at the transport layer. If you multiplex 10 HTTP/2 streams over a single TCP connection, and one packet from stream 3 is lost, TCP holds up ALL 10 streams while it retransmits that packet. The other 9 streams are fine — their data has arrived — but TCP does not know about streams. It only knows about one ordered byte stream.

QUIC solves this because it knows about streams. If a packet from stream 3 is lost, only stream 3 is blocked. The other 9 streams continue unaffected.
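
A toy delivery model makes the difference concrete (deliberately simplified — real TCP and QUIC loss recovery involve retransmission and reordering buffers): 10 streams, 2 packets each, packet (3, 0) lost. With one shared byte stream, nothing after the loss is readable; with per-stream ordering, only stream 3 stalls:

```python
# Toy simulation (not real TCP/QUIC): which packets can the application
# read immediately after one packet is lost, under two ordering models?
def deliverable(packets, lost, per_stream_ordering):
    """packets: list of (stream_id, seq); returns packets readable now."""
    delivered = []
    blocked_streams = set()
    blocked_all = False
    for pkt in packets:
        stream_id, _seq = pkt
        if pkt == lost:
            if per_stream_ordering:
                blocked_streams.add(stream_id)  # QUIC-like: this stream waits
            else:
                blocked_all = True              # TCP-like: everything waits
            continue
        if blocked_all or stream_id in blocked_streams:
            continue
        delivered.append(pkt)
    return delivered

packets = [(s, 0) for s in range(10)] + [(s, 1) for s in range(10)]
lost = (3, 0)
print(len(deliverable(packets, lost, per_stream_ordering=False)))  # 3
print(len(deliverable(packets, lost, per_stream_ordering=True)))   # 18
```

With the shared-stream model only 3 of 20 packets are readable (those sent before the loss); with per-stream ordering, 18 are — everything except the lost packet and its successor on stream 3.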

HTTP/2 over TCP vs HTTP/3 over QUIC

HTTP/2 over TCP — all streams share one TCP connection

  Transport:              TCP (kernel)
  Streams:                Multiplexed, but TCP sees one byte stream
  Packet loss:            One lost packet blocks ALL streams
  Connection setup:       TCP handshake + TLS handshake = 2-3 RTT
  Connection migration:   Not possible — tied to the source/destination IP:port 4-tuple
  Head-of-line blocking:  At the TCP level — all streams blocked

HTTP/3 over QUIC — each stream is independent

  Transport:              QUIC over UDP (userspace)
  Streams:                Multiplexed with independent loss recovery
  Packet loss:            Only the affected stream is blocked
  Connection setup:       Combined transport + TLS = 1 RTT (0-RTT on resumption)
  Connection migration:   Survives IP changes (uses a connection ID)
  Head-of-line blocking:  Eliminated — streams are independent

PRO TIP

In Kubernetes, QUIC/HTTP/3 support depends on your ingress controller. NGINX Ingress has experimental HTTP/3 support. Envoy (used by Istio and Gateway API) has production QUIC support. If your clients are browsers, enabling HTTP/3 on your ingress can noticeably improve latency for users on lossy networks (mobile, WiFi). For internal service-to-service communication, HTTP/2 over TCP is still the norm.


Part 5: TCP and UDP in Kubernetes

Service Protocol Configuration

Kubernetes Services default to TCP. You can explicitly set the protocol:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP      # default if omitted

    - name: dns
      port: 53
      targetPort: 5353
      protocol: UDP      # must be explicit for UDP
WARNING

On many cloud providers you cannot mix TCP and UDP ports in a single LoadBalancer Service — Kubernetes itself only permits it where the MixedProtocolLBService feature is available, and the cloud load balancer must support it too. If you need both protocols on one address (like CoreDNS, which serves DNS on both TCP and UDP port 53), you may need two separate Services or a cloud-specific annotation. This limitation catches people when they try to expose game servers or media services that use both protocols.

How kube-proxy Handles TCP vs UDP

kube-proxy (or its IPVS/eBPF replacement) handles TCP and UDP differently:

TCP with iptables mode:

  • DNAT rule rewrites destination IP from ClusterIP to a pod IP
  • Connection tracking (conntrack) ensures all packets in the same TCP connection go to the same pod
  • If the backend pod dies, existing connections hang until TCP timeout

UDP with iptables mode:

  • Same DNAT rule, but no connection concept
  • conntrack tracks UDP "connections" based on the 5-tuple with a default timeout of 30 seconds
  • If a DNS response is slow, a second query might get load-balanced to a different pod — and the response from the first pod gets dropped by conntrack
# Check conntrack entries for TCP and UDP
conntrack -L -p tcp | head -5
conntrack -L -p udp | head -5

# Count conntrack entries by protocol
conntrack -C
# Total: 15432

# conntrack table full? Packets get dropped silently
dmesg | grep conntrack
# nf_conntrack: table full, dropping packet
WAR STORY

We had a production cluster where DNS resolution would randomly fail about 1% of the time. The cause was a known Linux kernel bug (since fixed): a race condition in conntrack meant that SNAT-ed UDP packets from pods could get inconsistent source-port rewrites and confused conntrack entries. The symptom was DNS responses being delivered to the wrong pod. The fix was upgrading the kernel; the workaround was setting the net.netfilter.nf_conntrack_udp_timeout and net.netfilter.nf_conntrack_udp_timeout_stream sysctls to more aggressive values. Always check your kernel version if you see intermittent DNS failures in K8s.


Part 6: Why gRPC Needs TCP (Answering the Opening Scenario)

Back to the opening scenario: your gRPC service works locally but fails through a Kubernetes Service. The suggestion to "try UDP" is wrong for a fundamental reason.

gRPC is built on HTTP/2. HTTP/2 is built on TCP. gRPC requires:

  1. Reliable delivery — RPC request and response bytes must all arrive
  2. Ordering — protobuf messages must be reassembled in order
  3. Bidirectional streaming — both sides send data simultaneously over one connection
  4. Multiplexing — many RPCs share one TCP connection

UDP provides none of these. Running gRPC over raw UDP is like driving a car without wheels — the layer you removed is the one everything else rests on.

The actual problem with gRPC in Kubernetes is usually load balancing. gRPC uses long-lived HTTP/2 connections. A Kubernetes ClusterIP Service load-balances at the connection level, not the request level. If a client opens one gRPC connection, all requests go to one pod — no load distribution.

# The fix: use a gRPC-aware load balancer
# Option 1: Client-side load balancing (resolve all pod IPs)
# Option 2: Envoy/Istio sidecar proxy (L7 load balancing)
# Option 3: Use a headless Service + gRPC client-side balancing

# Headless Service (no ClusterIP, returns all pod IPs)
apiVersion: v1
kind: Service
metadata:
  name: grpc-service
spec:
  clusterIP: None     # headless
  selector:
    app: my-grpc-app
  ports:
    - port: 50051
      protocol: TCP
KEY CONCEPT

The gRPC-in-Kubernetes problem is never about TCP vs UDP. It is about L4 vs L7 load balancing. TCP load balancers (kube-proxy, ClusterIP) distribute connections. gRPC multiplexes many requests over one connection. You need an L7 load balancer (Envoy, Linkerd, Istio) that understands HTTP/2 framing and distributes individual requests across backend pods.
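
The client-side approach (options 1 and 3 in the fix list above) boils down to resolving every pod IP and rotating requests across them. Here is a sketch with a stubbed resolver — a real client would look up the headless Service name (e.g. grpc-service.default.svc.cluster.local) over DNS and re-resolve as pods come and go; the IPs below are invented:

```python
# Sketch of client-side, per-request balancing across pod IPs.
# The resolver is a stub; real code would query DNS for the headless
# Service, which returns one A record per ready pod.
import itertools

def resolve_pod_ips(service_name: str) -> list[str]:
    # Stub standing in for a DNS lookup against a headless Service.
    return ["10.244.1.5", "10.244.2.7", "10.244.3.9"]

class RoundRobinPicker:
    """Rotate through backends one REQUEST at a time — per-request (L7-style)
    distribution, unlike kube-proxy's per-connection (L4) balancing."""
    def __init__(self, service_name: str):
        self._cycle = itertools.cycle(resolve_pod_ips(service_name))

    def next_backend(self) -> str:
        return next(self._cycle)

picker = RoundRobinPicker("grpc-service.default.svc.cluster.local")
print([picker.next_backend() for _ in range(4)])
# ['10.244.1.5', '10.244.2.7', '10.244.3.9', '10.244.1.5']
```

A production client would also re-resolve periodically and drain connections to removed pods — that bookkeeping is exactly what Envoy-style sidecars do for you.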


Key Concepts Summary

  • TCP provides reliability, ordering, and flow control at the cost of connection setup latency and head-of-line blocking
  • UDP provides nothing — it is a thin wrapper around IP that adds port numbers, a length field, and a checksum
  • Most protocols use TCP because application developers do not want to implement their own reliability
  • UDP is used when speed matters more than completeness — DNS queries, video streaming, gaming, time-sync
  • DNS uses both protocols — UDP for queries, TCP for large responses and zone transfers. Always allow both in firewall rules.
  • QUIC/HTTP/3 runs on UDP but IS reliable — it implements TCP-like guarantees in userspace to fix head-of-line blocking
  • gRPC requires TCP (via HTTP/2) — the K8s problem is L7 load balancing, not protocol choice
  • In Kubernetes, Services default to TCP — UDP must be explicitly specified and has different conntrack behavior

Common Mistakes

  • Suggesting UDP for protocols that require reliability (gRPC, HTTP, database connections) — this reveals a fundamental misunderstanding of the transport layer
  • Blocking TCP port 53 in NetworkPolicies while allowing UDP port 53 — large DNS responses will fail
  • Assuming QUIC/HTTP/3 is "unreliable" because it runs on UDP — QUIC implements its own reliability
  • Not specifying protocol: UDP in K8s Service manifests — everything defaults to TCP
  • Mixing TCP and UDP ports in a single LoadBalancer Service without checking cloud provider support
  • Ignoring conntrack table limits — when the table fills up, both TCP and UDP packets are silently dropped
  • Diagnosing gRPC performance issues as a TCP/UDP problem when it is actually an L4/L7 load balancing problem

KNOWLEDGE CHECK

Why does DNS primarily use UDP instead of TCP for standard queries?