Ports, Sockets & Kubernetes Services
You deploy a new version of your API. It crashes on startup with:
Error: listen EADDRINUSE: address already in use :::8080. You check — the old pod is terminated. Nothing else is on port 8080. So why is it "already in use"?

The answer involves understanding what a port really is, what a socket really is, and why the kernel treats them as it does. Once you understand these fundamentals, every Kubernetes Service configuration — ClusterIP, NodePort, LoadBalancer — will click into place, because they all map to real TCP connections and real port allocations underneath.
Part 1: Ports — The 16-Bit Address Space
An IP address gets you to a machine. A port gets you to a specific process on that machine. Ports are 16-bit unsigned integers, giving a range of 0 to 65,535.
Port Ranges
| Range | Name | Description | Examples |
|---|---|---|---|
| 0-1023 | Well-known | Reserved for standard services, require root to bind | 22 (SSH), 53 (DNS), 80 (HTTP), 443 (HTTPS) |
| 1024-49151 | Registered | Assigned by IANA but do not require root | 3306 (MySQL), 5432 (PostgreSQL), 6379 (Redis), 8080 (alt HTTP) |
| 49152-65535 | Ephemeral (dynamic) | Used by the OS for outgoing connections | Client-side source ports |
# See the configured ephemeral port range on Linux
sysctl net.ipv4.ip_local_port_range
# net.ipv4.ip_local_port_range = 32768 60999
# (Linux default: 32768-60999, giving 28,232 ephemeral ports)
# Some systems use a wider range
# net.ipv4.ip_local_port_range = 1024 65535
# (gives 64,512 ephemeral ports — better for high-connection systems)
The ephemeral port range determines how many simultaneous outgoing connections a single IP address can make to the same destination IP and port. With the default range (32768-60999), a pod can have at most 28,232 concurrent connections to a single destination. In practice, the limit is lower because TIME_WAIT connections hold ports for 60 seconds after closing.
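The arithmetic above is easy to check directly. A small Python sketch that reads the live range from the standard procfs location on Linux and falls back to the documented defaults elsewhere:

```python
# Sketch: compute how many ephemeral ports a given range provides.
# Falls back to the Linux defaults quoted above when /proc is unavailable.
low, high = 32768, 60999
try:
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = map(int, f.read().split())
except OSError:
    pass  # non-Linux system: keep the documented defaults
print(f"ephemeral range {low}-{high}: {high - low + 1} ports")
```

With the defaults this prints 28,232 ports, matching the figure above.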
What "Port Already in Use" Actually Means
When a process calls bind() on a port, the kernel marks that port as taken for the specified IP address and protocol. The error EADDRINUSE means another socket is already bound to the same combination of:
- IP address (or 0.0.0.0 for "all interfaces")
- Port number
- Protocol (TCP or UDP)
The common causes in Kubernetes:
# Find what is using port 8080
ss -tlnp | grep 8080
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("node",pid=1,fd=3))
# If nothing shows up, check for TIME_WAIT
ss -tan | grep 8080
# TIME-WAIT 0 0 10.0.1.5:8080 10.0.2.10:52431
# The socket is in TIME_WAIT — the old connection has not fully closed
# Fix: set SO_REUSEADDR on the server socket
Every server application should set the SO_REUSEADDR socket option before calling bind(). This allows the server to bind to a port even if there are TIME_WAIT connections from a previous instance. Without it, restarting a server can fail for up to 60 seconds while old TIME_WAIT connections expire. In Node.js, Go, and Python, this is usually set by default in modern frameworks — but always verify when debugging EADDRINUSE errors.
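As a minimal sketch, here is what setting SO_REUSEADDR before bind() looks like at the socket level in Python (this demo binds port 0 so it always finds a free port; a real server would bind 8080):

```python
import socket

# Set SO_REUSEADDR before bind() so a restarted server can reclaim its
# port while connections from the previous instance sit in TIME_WAIT.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 0))   # port 0 for the demo; a real server binds 8080
srv.listen(128)
print("listening on", srv.getsockname(), "SO_REUSEADDR =",
      srv.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))
srv.close()
```

The order matters: the option must be set before bind(), because it changes how the kernel evaluates the bind itself.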
Part 2: Sockets — The Full Connection Identity
A socket is not just a port. A socket is the complete identity of one endpoint of a network connection.
The 5-Tuple
Every TCP connection is uniquely identified by a 5-tuple:
- Source IP — the originating machine
- Source port — the ephemeral port chosen by the client OS
- Destination IP — the target machine
- Destination port — the service port (e.g., 8080)
- Protocol — TCP or UDP
Connection 1: 10.0.1.5:52431 → 10.0.2.10:8080 TCP
Connection 2: 10.0.1.5:52432 → 10.0.2.10:8080 TCP
Connection 3: 10.0.1.5:52431 → 10.0.2.10:5432 TCP (same source port, different dest)
All three connections above are distinct because their 5-tuples are different. The kernel uses this 5-tuple to route incoming packets to the correct socket.
A single server listening on port 8080 can handle millions of concurrent connections because each connection has a unique 5-tuple. The server has one listening socket (bound to 0.0.0.0:8080) and creates a new connected socket for each accepted connection. The number of connections is limited by memory, file descriptors, and ephemeral ports on the client side — not by the server port number.
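This is easy to observe. A Python sketch using a throwaway local listener instead of the pod IPs above: two connections to the same destination port are kept distinct purely by their OS-assigned ephemeral source ports.

```python
import socket
import threading

# Sketch: two clients connect to the same listening socket; each
# connection gets its own ephemeral source port, so the 5-tuples differ.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(8)
dest = srv.getsockname()

def acceptor(n):
    # Accept in the background so the client connect() calls complete
    for _ in range(n):
        conn, _ = srv.accept()
        conn.close()

t = threading.Thread(target=acceptor, args=(2,))
t.start()
c1 = socket.create_connection(dest)
c2 = socket.create_connection(dest)
t.join()

src1, src2 = c1.getsockname(), c2.getsockname()
print("conn1 5-tuple:", src1, "→", dest, "TCP")
print("conn2 5-tuple:", src2, "→", dest, "TCP")
c1.close(); c2.close(); srv.close()
```

The two printed tuples share the destination IP and port but differ in source port, which is all the kernel needs to demultiplex them.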
Listening vs Connected Sockets
A server creates two types of sockets:
Listening socket — Created by bind() + listen(). This socket waits for incoming connections on a specific port. There is exactly one per port per IP.
Connected socket — Created by accept() when a client connects. Each connected socket represents one active TCP connection with a unique 5-tuple. A busy server may have thousands of these.
# See both types
ss -tlnp # listening sockets only
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("nginx",pid=1,fd=6))
ss -tnp # connected sockets
# ESTAB 0 0 10.0.1.5:8080 10.0.2.10:52431 users:(("nginx",pid=2,fd=10))
# ESTAB 0 0 10.0.1.5:8080 10.0.2.10:52432 users:(("nginx",pid=3,fd=11))
# ESTAB 0 0 10.0.1.5:8080 10.0.3.20:48891 users:(("nginx",pid=2,fd=12))
Listening Socket vs Connected Sockets
Part 3: Ephemeral Ports and Port Exhaustion
Every outgoing TCP connection needs a source port. The OS picks one from the ephemeral range. When you run out of ephemeral ports, new connections fail.
How Port Exhaustion Happens
Consider a pod making HTTP requests to an external API:
1. Pod opens a connection: 10.0.1.5:32768 → 203.0.113.50:443
2. The request completes and the connection closes
3. The source port 32768 enters TIME_WAIT for 60 seconds
4. Pod opens another connection: 10.0.1.5:32769 → 203.0.113.50:443
5. That port also enters TIME_WAIT
6. Repeat 28,232 times and you have no ephemeral ports left
# Check current ephemeral port usage
ss -tan | awk '{print $4}' | cut -d: -f2 | sort -n | tail -20
# Count TIME_WAIT connections per destination
ss -tan state time-wait | awk '{print $4}' | sort | uniq -c | sort -rn | head -10
# 8432 203.0.113.50:443
# 3201 10.0.2.10:5432
# 1102 10.0.3.30:6379
# If you see thousands of TIME_WAIT to one destination, you have a port exhaustion risk
# Check for TIME_WAIT pressure: this counter increments when the kernel
# cannot allocate a new TIME_WAIT socket (tcp_max_tw_buckets exceeded)
nstat -az TcpExtTCPTimeWaitOverflow
# True ephemeral port exhaustion surfaces in the application as
# "Cannot assign requested address" (EADDRNOTAVAIL) on connect()
Port exhaustion is one of the most common hidden failures in Kubernetes microservices. It does not produce a clear error message — you get vague "connection timed out" or "cannot assign requested address" errors. The root cause is almost always applications creating a new TCP connection for every request instead of using HTTP keep-alive or connection pooling. A single pod making 500 requests per second with 60-second TIME_WAIT needs 30,000 ephemeral ports — more than the default range provides.
Fixing Port Exhaustion
# Option 1: Expand the ephemeral port range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Now 64,512 ports available instead of 28,232
# Option 2: Enable tcp_tw_reuse (allows reuse of TIME_WAIT sockets for new outgoing connections)
sysctl -w net.ipv4.tcp_tw_reuse=1
# Safe when both sides support TCP timestamps (almost always true today)
# Option 3 (BEST): Use connection pooling in your application
# Most HTTP clients support this — enable keep-alive and set pool size
The real fix for port exhaustion is always connection pooling, not kernel tuning. HTTP keep-alive reuses TCP connections across multiple requests, meaning one connection handles hundreds or thousands of requests instead of one. Configure your HTTP client library with keep-alive enabled (usually the default) and a reasonable pool size (10-50 connections per destination). This eliminates TIME_WAIT accumulation entirely for that destination.
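The effect of pooling is visible at the socket level. A Python sketch against a throwaway local echo server (the server and its protocol are illustrative): without reuse, every request burns a fresh ephemeral source port; with one pooled connection, all requests share a single port.

```python
import socket
import threading

# Throwaway echo server: echoes each newline-terminated request back
# on the same connection, so a client can reuse the connection.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(16)
port = srv.getsockname()[1]

def serve():
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return  # listener closed
        def handle(c):
            with c, c.makefile("rb") as f:
                for line in f:
                    c.sendall(line)
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

threading.Thread(target=serve, daemon=True).start()

# No pooling: a fresh socket (and fresh ephemeral source port) per request
no_pool_ports = set()
for _ in range(5):
    with socket.create_connection(("127.0.0.1", port)) as c:
        no_pool_ports.add(c.getsockname()[1])
        c.sendall(b"ping\n")
        c.recv(64)

# "Pooling": one long-lived connection reused for every request
with socket.create_connection(("127.0.0.1", port)) as c:
    pooled_port = c.getsockname()[1]
    for _ in range(5):
        c.sendall(b"ping\n")
        c.recv(64)

srv.close()
print(f"{len(no_pool_ports)} source ports consumed without reuse, 1 with reuse")
```

Each closed no-pool connection also leaves a TIME_WAIT entry behind, which is exactly the accumulation the article describes.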
Part 4: Kubernetes Service Types — TCP Connections All the Way Down
Every Kubernetes Service type is ultimately just a mechanism for routing TCP (or UDP) connections from one IP:port to another. Understanding which IP:port pairs are involved at each layer demystifies Service networking.
ClusterIP — Virtual IP with DNAT
A ClusterIP Service creates a virtual IP address that exists only in iptables/IPVS rules. No network interface is bound to this IP. When a pod connects to the ClusterIP, the kernel rewrites the destination:
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - port: 80          # The port clients connect to (on the ClusterIP)
      targetPort: 8080  # The port the pod is actually listening on
What happens at the TCP level:
1. Pod A sends SYN to 10.96.0.100:80 (ClusterIP)
2. iptables/IPVS intercepts the packet BEFORE it leaves the node
3. DNAT rewrites destination to 10.244.1.5:8080 (Pod B's IP and targetPort)
4. SYN arrives at Pod B on port 8080
5. Pod B sends SYN-ACK back to Pod A's real IP (SNAT is not needed for in-cluster)
6. Connection is ESTABLISHED between Pod A and Pod B
7. All subsequent packets for this connection go to the same Pod B (conntrack)
The ClusterIP does not receive packets. No socket is listening on 10.96.0.100:80. The iptables DNAT rule rewrites the destination address before the packet enters the network stack. This means you cannot ping a ClusterIP — ICMP is not handled by iptables DNAT rules. You can only connect to the specific ports defined in the Service.
NodePort — Port on Every Node
A NodePort Service opens a port in the range 30000-32767 on every node in the cluster. Any traffic arriving at that port on any node is forwarded to the backing pods.
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - port: 80          # ClusterIP port (still created)
      targetPort: 8080  # Pod port
      nodePort: 30080   # Port opened on every node
At the TCP level:
1. External client sends SYN to Node1:30080
2. kube-proxy's iptables rules on Node1 intercept the SYN
3. DNAT rewrites destination to Pod B (maybe on Node2): 10.244.2.5:8080
4. If Pod B is on a different node, the packet is forwarded across the node network
5. SNAT rewrites the source IP to Node1's IP (so the return path works)
6. Connection is ESTABLISHED
NodePort uses SNAT (Source NAT) when forwarding to pods on other nodes. This means the pod sees the node's IP as the source, not the original client IP. If your application needs the real client IP (for rate limiting, geolocation, or logging), you must set externalTrafficPolicy: Local on the Service. This skips SNAT but only routes to pods on the node that received the traffic — so on a node with no local pod, the connection fails and that node is taken out of the load balancer's rotation via health checks.
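As a sketch, the NodePort Service shown earlier with client-IP preservation enabled (externalTrafficPolicy is the only added field):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  externalTrafficPolicy: Local  # skip SNAT, preserve client IP; local pods only
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080
```

Pair this with a pod topology spread or a DaemonSet so every node receiving traffic has a local pod.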
LoadBalancer — Cloud LB in Front of NodePort
A LoadBalancer Service creates a cloud load balancer (AWS NLB/ALB, GCP LB, etc.) that forwards to the NodePort on your cluster nodes.
Kubernetes Service Types — TCP Connection Path
The full path for a LoadBalancer Service:
Client → Cloud LB → Node:NodePort → iptables DNAT → Pod:targetPort
Each arrow is a real TCP connection (or at minimum a packet rewrite). Understanding this chain tells you where to look when things break:
| Symptom | Likely layer | How to check |
|---|---|---|
| Cannot reach the service at all | Cloud LB or security group | Check LB health checks, security group rules |
| Intermittent connection refused | NodePort not open or pod not ready | Check ss -tlnp on the node, check pod readiness |
| Connections succeed but responses slow | Pod itself or inter-node network | Check pod CPU/memory, check node-to-node latency |
| Random 5xx errors | Backend pod crashing or misconfigured | Check pod logs, check targetPort matches actual listen port |
Part 5: The Port Naming Confusion — Resolved
Kubernetes has four different port-related fields, and they are a constant source of confusion:
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: NodePort
  ports:
    - port: 80          # (1) Service port — what clients inside the cluster connect to
      targetPort: 8080  # (2) Pod port — what the container is actually listening on
      nodePort: 30080   # (3) Node port — external access port on every node
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          ports:
            - containerPort: 8080  # (4) Documentation only — does NOT open the port
The containerPort field in a pod spec is purely informational. It does NOT open or close any port. Your container listens on whatever port the application binds to, regardless of what containerPort says. You could set containerPort: 9999 while the app listens on 8080, and everything would work fine (though it would confuse everyone). The targetPort in the Service is what actually matters — it must match the port your application is listening on.
| Field | Where | Purpose | Required to match |
|---|---|---|---|
| containerPort | Pod spec | Documentation for humans and tooling | Nothing (informational) |
| targetPort | Service spec | Actual port the Service sends traffic to on the pod | Must match what the app listens on |
| port | Service spec | Port the Service is available on (ClusterIP:port) | What in-cluster clients connect to |
| nodePort | Service spec | Port opened on every node for external access | What external clients connect to |
# Verify the chain is correct:
# 1. What port is the app listening on?
kubectl exec my-pod -- ss -tlnp
# LISTEN 0 128 0.0.0.0:8080 0.0.0.0:*
# 2. What targetPort does the Service point to?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].targetPort}'
# 8080 ← must match step 1
# 3. What port is the Service available on?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].port}'
# 80 ← what clients use: api-service:80
# 4. What nodePort is exposed?
kubectl get svc api-service -o jsonpath='{.spec.ports[0].nodePort}'
# 30080 ← external access: <node-ip>:30080
Part 6: Health Checks — TCP vs HTTP
Kubernetes supports two types of network health checks, and they test very different things:
TCP Health Check (tcpSocket)
A TCP health check simply attempts the three-way handshake. If SYN-ACK comes back, the check passes. No data is exchanged.
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
This checks: "Is something listening on port 8080?" It does NOT check whether the application is actually functional. A process could be deadlocked, returning errors on every request, or serving stale data — the TCP check will still pass as long as the socket is open.
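This failure mode is easy to demonstrate in Python (a sketch with a throwaway local listener): the probe's connect() succeeds even though the "application" never accepts a single connection, because the kernel completes the three-way handshake from the listen backlog.

```python
import socket

# Sketch: a tcpSocket-style probe passes against a server that never
# calls accept(). The kernel SYN-ACKs from the listen backlog, so the
# port looks "healthy" while the application does nothing at all.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(8)                # kernel will complete handshakes up to the backlog
addr = srv.getsockname()

probe = socket.create_connection(addr, timeout=2)   # the "health check"
probe_peer = probe.getpeername()
print("probe connected to", probe_peer, "though the app never accepted")
probe.close()
srv.close()
```

A deadlocked real application is in exactly this state: the listening socket is open, the kernel answers SYNs, and the TCP probe keeps passing.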
HTTP Health Check (httpGet)
An HTTP health check performs a full HTTP GET request and checks the response status code.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
This checks: "Does the application respond with a 200-399 status code on /healthz?" It validates that the application is actually running, processing requests, and returning meaningful responses.
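A minimal Python sketch of the kind of endpoint such a probe hits, and of the probe itself as a plain GET whose status code is checked (handler and path are illustrative, not from a real cluster):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: a tiny /healthz endpoint plus an httpGet-style probe against it.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = 200 if self.path == "/healthz" else 404
        self.send_response(code)
        self.end_headers()
        self.wfile.write(b"ok" if code == 200 else b"not found")

    def log_message(self, *args):  # keep the demo quiet
        pass

srv = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=srv.serve_forever, daemon=True).start()

# The "probe": a full GET whose status code is checked, like kubelet's httpGet
url = f"http://127.0.0.1:{srv.server_port}/healthz"
with urllib.request.urlopen(url, timeout=2) as resp:
    status = resp.status
print("healthz status:", status)
srv.shutdown()
```

Unlike the TCP case, this request exercises the full application stack: the handler must actually run and produce a response.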
Always prefer HTTP health checks over TCP health checks for HTTP-based services. A TCP check only confirms the port is open — your application could be in a broken state and still accept TCP connections. An HTTP check validates the full application stack. The only time TCP checks make sense is for non-HTTP protocols (databases, Redis, custom TCP services) where you cannot perform an HTTP request.
TCP vs HTTP Health Checks: a tcpSocket probe only tests that the port is open; an httpGet probe tests that the application responds correctly.
Key Concepts Summary
- Ports are 16-bit numbers (0-65535) divided into well-known (0-1023), registered (1024-49151), and ephemeral (49152-65535)
- A socket is identified by the 5-tuple: source IP, source port, destination IP, destination port, and protocol
- A server can handle millions of connections on one port because each connection has a unique 5-tuple
- Ephemeral ports are consumed by every outgoing connection and held for 60 seconds in TIME_WAIT after closing
- Port exhaustion happens when TIME_WAIT accumulates faster than ports are released — fix with connection pooling, not kernel tuning
- ClusterIP is a virtual IP that exists only in iptables rules — packets are DNAT-rewritten to pod IPs before entering the network
- NodePort opens a port on every node and forwards to pods, but uses SNAT that hides the client IP
- LoadBalancer chains through all layers: Cloud LB to NodePort to ClusterIP to Pod
- containerPort is purely informational — targetPort in the Service must match the actual application listen port
- HTTP health checks are better than TCP health checks for validating application health
Common Mistakes
- Setting containerPort in the pod spec and assuming it opens or restricts the port — it is purely documentation
- Mismatching targetPort in the Service with the actual port the application listens on — traffic silently goes to the wrong port and gets RST
- Not configuring connection pooling and hitting ephemeral port exhaustion under load
- Using TCP health checks for HTTP services — they pass even when the application is broken
- Forgetting that ClusterIP cannot be pinged (ICMP) — only TCP/UDP connections to defined ports work
- Setting externalTrafficPolicy: Local without ensuring pods are spread across all nodes receiving traffic
- Not realizing that NodePort SNAT hides client IPs — critical for rate limiting and access logging