Ports, Sockets & Kubernetes Services
You deploy a new version of your API. It crashes on startup with:
Error: listen EADDRINUSE: address already in use :::8080. You check — the old pod is terminated. Nothing else is on port 8080. So why is it "already in use"?

The answer involves understanding what a port really is, what a socket really is, and why the kernel treats them as it does. Once you understand these fundamentals, every Kubernetes Service configuration — ClusterIP, NodePort, LoadBalancer — will click into place, because they all map to real TCP connections and real port allocations underneath.
Part 1: Ports — The 16-Bit Address Space
An IP address gets you to a machine. A port gets you to a specific process on that machine. Ports are 16-bit unsigned integers, giving a range of 0 to 65,535.
Port Ranges
| Range | Name | Description | Examples |
|---|---|---|---|
| 0-1023 | Well-known | Reserved for standard services, require root to bind | 22 (SSH), 53 (DNS), 80 (HTTP), 443 (HTTPS) |
| 1024-49151 | Registered | Assigned by IANA but do not require root | 3306 (MySQL), 5432 (PostgreSQL), 6379 (Redis), 8080 (alt HTTP) |
| 49152-65535 | Ephemeral (dynamic) | Used by the OS for outgoing connections | Client-side source ports |
# See the configured ephemeral port range on Linux
sysctl net.ipv4.ip_local_port_range
# net.ipv4.ip_local_port_range = 32768 60999
# (Linux default: 32768-60999, giving 28,232 ephemeral ports)
# Some systems use a wider range
# net.ipv4.ip_local_port_range = 1024 65535
# (gives 64,512 ephemeral ports — better for high-connection systems)
The ephemeral port range determines how many simultaneous outgoing connections a single IP address can make to the same destination IP and port. With the default range (32768-60999), a pod can have at most 28,232 concurrent connections to a single destination. In practice, the limit is lower because TIME_WAIT connections hold ports for 60 seconds after closing.
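The arithmetic above is easy to check directly. A small Python sketch that reads the live range from the standard procfs location on Linux and falls back to the documented defaults elsewhere:

```python
# Sketch: compute how many ephemeral ports a given range provides.
# Falls back to the Linux defaults quoted above when /proc is unavailable.
low, high = 32768, 60999
try:
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = map(int, f.read().split())
except OSError:
    pass  # non-Linux system: keep the documented defaults
print(f"ephemeral range {low}-{high}: {high - low + 1} ports")
```

With the defaults this prints 28,232 ports, matching the figure above.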
What "Port Already in Use" Actually Means
When a process calls bind() on a port, the kernel marks that port as taken for the specified IP address and protocol. The error EADDRINUSE means another socket is already bound to the same combination of:
- IP address (or 0.0.0.0 for "all interfaces")
- Port number
- Protocol (TCP or UDP)
The common causes in Kubernetes:
# Find what is using port 8080
ss -tlnp | grep 8080
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("node",pid=1,fd=3))
# If nothing shows up, check for TIME_WAIT
ss -tan | grep 8080
# TIME-WAIT 0 0 10.0.1.5:8080 10.0.2.10:52431
# The socket is in TIME_WAIT — the old connection has not fully closed
# Fix: set SO_REUSEADDR on the server socket
Every server application should set the SO_REUSEADDR socket option before calling bind(). This allows the server to bind to a port even if there are TIME_WAIT connections from a previous instance. Without it, restarting a server can fail for up to 60 seconds while old TIME_WAIT connections expire. In Node.js, Go, and Python, this is usually set by default in modern frameworks — but always verify when debugging EADDRINUSE errors.
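As a minimal sketch, here is what setting SO_REUSEADDR before bind() looks like at the socket level in Python (this demo binds port 0 so it always finds a free port; a real server would bind 8080):

```python
import socket

# Set SO_REUSEADDR before bind() so a restarted server can reclaim its
# port while connections from the previous instance sit in TIME_WAIT.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 0))   # port 0 for the demo; a real server binds 8080
srv.listen(128)
print("listening on", srv.getsockname(), "SO_REUSEADDR =",
      srv.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))
srv.close()
```

The order matters: the option must be set before bind(), because it changes how the kernel evaluates the bind itself.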
Part 2: Sockets — The Full Connection Identity
A socket is not just a port. A socket is the complete identity of one endpoint of a network connection.
The 5-Tuple
Every TCP connection is uniquely identified by a 5-tuple:
- Source IP — the originating machine
- Source port — the ephemeral port chosen by the client OS
- Destination IP — the target machine
- Destination port — the service port (e.g., 8080)
- Protocol — TCP or UDP
Connection 1: 10.0.1.5:52431 → 10.0.2.10:8080 TCP
Connection 2: 10.0.1.5:52432 → 10.0.2.10:8080 TCP
Connection 3: 10.0.1.5:52431 → 10.0.2.10:5432 TCP (same source port, different dest)
All three connections above are distinct because their 5-tuples are different. The kernel uses this 5-tuple to route incoming packets to the correct socket.
A single server listening on port 8080 can handle millions of concurrent connections because each connection has a unique 5-tuple. The server has one listening socket (bound to 0.0.0.0:8080) and creates a new connected socket for each accepted connection. The number of connections is limited by memory, file descriptors, and ephemeral ports on the client side — not by the server port number.
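This is easy to observe. A Python sketch using a throwaway local listener instead of the pod IPs above: two connections to the same destination port are kept distinct purely by their OS-assigned ephemeral source ports.

```python
import socket
import threading

# Sketch: two clients connect to the same listening socket; each
# connection gets its own ephemeral source port, so the 5-tuples differ.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(8)
dest = srv.getsockname()

def acceptor(n):
    # Accept in the background so the client connect() calls complete
    for _ in range(n):
        conn, _ = srv.accept()
        conn.close()

t = threading.Thread(target=acceptor, args=(2,))
t.start()
c1 = socket.create_connection(dest)
c2 = socket.create_connection(dest)
t.join()

src1, src2 = c1.getsockname(), c2.getsockname()
print("conn1 5-tuple:", src1, "→", dest, "TCP")
print("conn2 5-tuple:", src2, "→", dest, "TCP")
c1.close(); c2.close(); srv.close()
```

The two printed tuples share the destination IP and port but differ in source port, which is all the kernel needs to demultiplex them.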
Listening vs Connected Sockets
A server creates two types of sockets:
Listening socket — Created by bind() + listen(). This socket waits for incoming connections on a specific port. There is exactly one per port per IP.
Connected socket — Created by accept() when a client connects. Each connected socket represents one active TCP connection with a unique 5-tuple. A busy server may have thousands of these.
# See both types
ss -tlnp # listening sockets only
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("nginx",pid=1,fd=6))
ss -tnp # connected sockets
# ESTAB 0 0 10.0.1.5:8080 10.0.2.10:52431 users:(("nginx",pid=2,fd=10))
# ESTAB 0 0 10.0.1.5:8080 10.0.2.10:52432 users:(("nginx",pid=3,fd=11))
# ESTAB 0 0 10.0.1.5:8080 10.0.3.20:48891 users:(("nginx",pid=2,fd=12))
Listening Socket vs Connected Sockets
Part 3: Ephemeral Ports and Port Exhaustion
Every outgoing TCP connection needs a source port. The OS picks one from the ephemeral range. When you run out of ephemeral ports, new connections fail.
How Port Exhaustion Happens
Consider a pod making HTTP requests to an external API:
1. Pod opens a connection: 10.0.1.5:32768 → 203.0.113.50:443
2. The request completes and the connection closes
3. The source port 32768 enters TIME_WAIT for 60 seconds
4. Pod opens another connection: 10.0.1.5:32769 → 203.0.113.50:443
5. That port also enters TIME_WAIT
6. Repeat 28,232 times and you have no ephemeral ports left
# Check current ephemeral port usage
ss -tan | awk '{print $4}' | cut -d: -f2 | sort -n | tail -20
# Count TIME_WAIT connections per destination
ss -tan state time-wait | awk '{print $4}' | sort | uniq -c | sort -rn | head -10
# 8432 203.0.113.50:443
# 3201 10.0.2.10:5432
# 1102 10.0.3.30:6379
# If you see thousands of TIME_WAIT to one destination, you have a port exhaustion risk
# Check for TIME_WAIT pressure: this counter increments when the kernel
# cannot allocate a new TIME_WAIT socket (tcp_max_tw_buckets exceeded)
nstat -az TcpExtTCPTimeWaitOverflow
# True ephemeral port exhaustion surfaces in the application as
# "Cannot assign requested address" (EADDRNOTAVAIL) on connect()
Port exhaustion is one of the most common hidden failures in Kubernetes microservices. It does not produce a clear error message — you get vague "connection timed out" or "cannot assign requested address" errors. The root cause is almost always applications creating a new TCP connection for every request instead of using HTTP keep-alive or connection pooling. A single pod making 500 requests per second with 60-second TIME_WAIT needs 30,000 ephemeral ports — more than the default range provides.
Fixing Port Exhaustion
# Option 1: Expand the ephemeral port range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Now 64,512 ports available instead of 28,232
# Option 2: Enable tcp_tw_reuse (allows reuse of TIME_WAIT sockets for new outgoing connections)
sysctl -w net.ipv4.tcp_tw_reuse=1
# Safe when both sides support TCP timestamps (almost always true today)
# Option 3 (BEST): Use connection pooling in your application
# Most HTTP clients support this — enable keep-alive and set pool size
The real fix for port exhaustion is always connection pooling, not kernel tuning. HTTP keep-alive reuses TCP connections across multiple requests, meaning one connection handles hundreds or thousands of requests instead of one. Configure your HTTP client library with keep-alive enabled (usually the default) and a reasonable pool size (10-50 connections per destination). This eliminates TIME_WAIT accumulation entirely for that destination.
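The effect of pooling is visible at the socket level. A Python sketch against a throwaway local echo server (the server and its protocol are illustrative): without reuse, every request burns a fresh ephemeral source port; with one pooled connection, all requests share a single port.

```python
import socket
import threading

# Throwaway echo server: echoes each newline-terminated request back
# on the same connection, so a client can reuse the connection.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(16)
port = srv.getsockname()[1]

def serve():
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return  # listener closed
        def handle(c):
            with c, c.makefile("rb") as f:
                for line in f:
                    c.sendall(line)
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

threading.Thread(target=serve, daemon=True).start()

# No pooling: a fresh socket (and fresh ephemeral source port) per request
no_pool_ports = set()
for _ in range(5):
    with socket.create_connection(("127.0.0.1", port)) as c:
        no_pool_ports.add(c.getsockname()[1])
        c.sendall(b"ping\n")
        c.recv(64)

# "Pooling": one long-lived connection reused for every request
with socket.create_connection(("127.0.0.1", port)) as c:
    pooled_port = c.getsockname()[1]
    for _ in range(5):
        c.sendall(b"ping\n")
        c.recv(64)

srv.close()
print(f"{len(no_pool_ports)} source ports consumed without reuse, 1 with reuse")
```

Each closed no-pool connection also leaves a TIME_WAIT entry behind, which is exactly the accumulation the article describes.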
Part 4: Kubernetes Service Types — TCP Connections All the Way Down
Every Kubernetes Service type is ultimately just a mechanism for routing TCP (or UDP) connections from one IP:port to another. Understanding which IP:port pairs are involved at each layer demystifies Service networking.
ClusterIP — Virtual IP with DNAT
A ClusterIP Service creates a virtual IP address that exists only in iptables/IPVS rules. No network interface is bound to this IP. When a pod connects to the ClusterIP, the kernel rewrites the destination:
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - port: 80          # The port clients connect to (on the ClusterIP)
      targetPort: 8080  # The port the pod is actually listening on
What happens at the TCP level:
1. Pod A sends SYN to 10.96.0.100:80 (ClusterIP)
2. iptables/IPVS intercepts the packet BEFORE it leaves the node
3. DNAT rewrites destination to 10.244.1.5:8080 (Pod B's IP and targetPort)
4. SYN arrives at Pod B on port 8080
5. Pod B sends SYN-ACK back to Pod A's real IP (SNAT is not needed for in-cluster)
6. Connection is ESTABLISHED between Pod A and Pod B
7. All subsequent packets for this connection go to the same Pod B (conntrack)
The ClusterIP does not receive packets. No socket is listening on 10.96.0.100:80. The iptables DNAT rule rewrites the destination address before the packet enters the network stack. This means you cannot ping a ClusterIP — ICMP is not handled by iptables DNAT rules. You can only connect to the specific ports defined in the Service.
NodePort — Port on Every Node
A NodePort Service opens a port in the range 30000-32767 on every node in the cluster. Any traffic arriving at that port on any node is forwarded to the backing pods.
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - port: 80          # ClusterIP port (still created)
      targetPort: 8080  # Pod port
      nodePort: 30080   # Port opened on every node
At the TCP level:
1. External client sends SYN to Node1:30080
2. kube-proxy's iptables rules on Node1 intercept the SYN
3. DNAT rewrites destination to Pod B (maybe on Node2): 10.244.2.5:8080
4. If Pod B is on a different node, the packet is forwarded across the node network
5. SNAT rewrites the source IP to Node1's IP (so the return path works)
6. Connection is ESTABLISHED
NodePort uses SNAT (Source NAT) when forwarding to pods on other nodes. This means the pod sees the node's IP as the source, not the original client IP. If your application needs the real client IP (for rate limiting, geolocation, or logging), you must set externalTrafficPolicy: Local on the Service. This skips SNAT but only routes to pods on the node that received the traffic — so on a node with no local pod, the connection fails and that node is taken out of the load balancer's rotation via health checks.
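As a sketch, the NodePort Service shown earlier with client-IP preservation enabled (externalTrafficPolicy is the only added field):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  externalTrafficPolicy: Local  # skip SNAT, preserve client IP; local pods only
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080
```

Pair this with a pod topology spread or a DaemonSet so every node receiving traffic has a local pod.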
LoadBalancer — Cloud LB in Front of NodePort
A LoadBalancer Service creates a cloud load balancer (AWS NLB/ALB, GCP LB, etc.) that forwards to the NodePort on your cluster nodes.
Kubernetes Service Types — TCP Connection Path
The full path for a LoadBalancer Service:
Client → Cloud LB → Node:NodePort → iptables DNAT → Pod:targetPort
Each arrow is a real TCP connection (or at minimum a packet rewrite). Understanding this chain tells you where to look when things break:
| Symptom | Likely layer | How to check |
|---|---|---|
| Cannot reach the service at all | Cloud LB or security group | Check LB health checks, security group rules |
| Intermittent connection refused | NodePort not open or pod not ready | Check ss -tlnp on the node, check pod readiness |
| Connections succeed but responses slow | Pod itself or inter-node network | Check pod CPU/memory, check node-to-node latency |
| Random 5xx errors | Backend pod crashing or misconfigured | Check pod logs, check targetPort matches actual listen port |
Part 5: The Port Naming Confusion — Resolved
Kubernetes has four different port-related fields, and they are a constant source of confusion:
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  ports:
    - port: 80          # (1) Service port — what clients inside the cluster connect to
      targetPort: 8080  # (2) Pod port — what the container is actually listening on
      nodePort: 30080   # (3) Node port — external access port on every node
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          ports:
            - containerPort: 8080  # (4) Documentation only — does NOT open the port
The containerPort field in a pod spec is purely informational. It does NOT open or close any port. Your container listens on whatever port the application binds to, regardless of what containerPort says. You could set containerPort: 9999 while the app listens on 8080, and everything would work fine (though it would confuse everyone). The targetPort in the Service is what actually matters — it must match the port your application is listening on.
| Field | Where | Purpose | Required to match |
|---|---|---|---|
| containerPort | Pod spec | Documentation for humans and tooling | Nothing (informational) |
| targetPort | Service spec | Actual port the Service sends traffic to on the pod | Must match what the app listens on |
| port | Service spec | Port the Service is available on (ClusterIP:port) | What in-cluster clients connect to |
| nodePort | Service spec | Port opened on every node for external access | What external clients connect to |
# Verify the chain is correct:
# 1. What port is the app listening on?
kubectl exec my-pod -- ss -tlnp
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:*
# 2. What targetPort does the Service point to?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].targetPort}'
# 8080 ← must match step 1
# 3. What port is the Service available on?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].port}'
# 80 ← what clients use: api-service:80
# 4. What nodePort is exposed?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].nodePort}'
# 30080 ← external access: <node-ip>:30080
Part 6: Health Checks — TCP vs HTTP
Kubernetes supports two types of network health checks, and they test very different things:
TCP Health Check (tcpSocket)
A TCP health check simply attempts the three-way handshake. If SYN-ACK comes back, the check passes. No data is exchanged.
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
This checks: "Is something listening on port 8080?" It does NOT check whether the application is actually functional. A process could be deadlocked, returning errors on every request, or serving stale data — the TCP check will still pass as long as the socket is open.
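This failure mode is easy to demonstrate in Python (a sketch with a throwaway local listener): the probe's connect() succeeds even though the "application" never accepts a single connection, because the kernel completes the three-way handshake from the listen backlog.

```python
import socket

# Sketch: a tcpSocket-style probe passes against a server that never
# calls accept(). The kernel SYN-ACKs from the listen backlog, so the
# port looks "healthy" while the application does nothing at all.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(8)                # kernel will complete handshakes up to the backlog
addr = srv.getsockname()

probe = socket.create_connection(addr, timeout=2)   # the "health check"
probe_peer = probe.getpeername()
print("probe connected to", probe_peer, "though the app never accepted")
probe.close()
srv.close()
```

A deadlocked real application is in exactly this state: the listening socket is open, the kernel answers SYNs, and the TCP probe keeps passing.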
HTTP Health Check (httpGet)
An HTTP health check performs a full HTTP GET request and checks the response status code.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
This checks: "Does the application respond with a 200-399 status code on /healthz?" It validates that the application is actually running, processing requests, and returning meaningful responses.
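A minimal Python sketch of the kind of endpoint such a probe hits, and of the probe itself as a plain GET whose status code is checked (handler and path are illustrative, not from a real cluster):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: a tiny /healthz endpoint plus an httpGet-style probe against it.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = 200 if self.path == "/healthz" else 404
        self.send_response(code)
        self.end_headers()
        self.wfile.write(b"ok" if code == 200 else b"not found")

    def log_message(self, *args):  # keep the demo quiet
        pass

srv = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=srv.serve_forever, daemon=True).start()

# The "probe": a full GET whose status code is checked, like kubelet's httpGet
url = f"http://127.0.0.1:{srv.server_port}/healthz"
with urllib.request.urlopen(url, timeout=2) as resp:
    status = resp.status
print("healthz status:", status)
srv.shutdown()
```

Unlike the TCP case, this request exercises the full application stack: the handler must actually run and produce a response.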
Always prefer HTTP health checks over TCP health checks for HTTP-based services. A TCP check only confirms the port is open — your application could be in a broken state and still accept TCP connections. An HTTP check validates the full application stack. The only time TCP checks make sense is for non-HTTP protocols (databases, Redis, custom TCP services) where you cannot perform an HTTP request.
TCP vs HTTP Health Checks: a tcpSocket probe only tests that the port is open; an httpGet probe tests that the application responds correctly.
Key Concepts Summary
- Ports are 16-bit numbers (0-65535) divided into well-known (0-1023), registered (1024-49151), and ephemeral (49152-65535)
- A socket is identified by the 5-tuple: source IP, source port, destination IP, destination port, and protocol
- A server can handle millions of connections on one port because each connection has a unique 5-tuple
- Ephemeral ports are consumed by every outgoing connection and held for 60 seconds in TIME_WAIT after closing
- Port exhaustion happens when TIME_WAIT accumulates faster than ports are released — fix with connection pooling, not kernel tuning
- ClusterIP is a virtual IP that exists only in iptables rules — packets are DNAT-rewritten to pod IPs before entering the network
- NodePort opens a port on every node and forwards to pods, but uses SNAT that hides the client IP
- LoadBalancer chains through all layers: Cloud LB to NodePort to ClusterIP to Pod
- containerPort is purely informational — targetPort in the Service must match the actual application listen port
- HTTP health checks are better than TCP health checks for validating application health
Common Mistakes
- Setting containerPort in the pod spec and assuming it opens or restricts the port — it is purely documentation
- Mismatching targetPort in the Service with the actual port the application listens on — traffic silently goes to the wrong port and gets RST
- Not configuring connection pooling and hitting ephemeral port exhaustion under load
- Using TCP health checks for HTTP services — they pass even when the application is broken
- Forgetting that ClusterIP cannot be pinged (ICMP) — only TCP/UDP connections to defined ports work
- Setting externalTrafficPolicy: Local without ensuring pods are spread across all nodes receiving traffic
- Not realizing that NodePort SNAT hides client IPs — critical for rate limiting and access logging