Networking Fundamentals for DevOps Engineers

Status Codes That Matter in Production

Your monitoring dashboard lights up. The 502 error rate on the API gateway just jumped from 0.01% to 15%. PagerDuty fires. Three teams jump on a call. The frontend team says it is a backend issue. The backend team says their pods are healthy. The platform team checks the ingress controller.

Everyone is guessing. But the status code itself — 502 — tells you exactly where the problem is. A 502 means the reverse proxy got a bad response from the upstream. The backend is reachable but responding with garbage, or crashing mid-response. This is fundamentally different from a 503 (overloaded) or a 504 (timeout).

This lesson covers every status code you will encounter in production Kubernetes environments, what each one actually means at the protocol level, and what to do when you see it.


Part 1: 2xx — Success

The 2xx range means the request was received, understood, and accepted. But which specific 2xx code you return matters.

The Codes

Code | Name       | Meaning                                           | When to use
200  | OK         | Request succeeded, response body has the result   | GET requests, search results
201  | Created    | A new resource was created                        | POST requests that create entities
202  | Accepted   | Request accepted for processing, not yet complete | Async operations (job started, will finish later)
204  | No Content | Request succeeded, no body to return              | DELETE requests, PUT that does not return the entity
# 200 — standard success
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/users
# 200

# 201 — resource created (check the Location header for the new URL)
curl -v -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice"}'
# < HTTP/2 201
# < Location: /users/456

# 204 — success, no body
curl -v -X DELETE https://api.example.com/users/456
# < HTTP/2 204
# (empty body)
PRO TIP

When designing APIs, use 201 (not 200) for resource creation and include a Location header pointing to the new resource. Use 204 (not 200 with empty body) when there is nothing to return. These distinctions matter for client libraries, API documentation generators, and any middleware that behaves differently based on status codes. Many monitoring tools track 201s separately from 200s to measure creation rates.
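The tip above can be sketched as a pair of response helpers. This is purely illustrative Python, not tied to any real framework; the function names and the (status, headers, body) tuple shape are assumptions for the sketch.

```python
# Hypothetical helpers showing the tip: use the specific 2xx code instead of
# returning 200 for everything. Names and return shape are illustrative.

def created_response(resource_path):
    """Resource creation: 201 plus a Location header pointing at the new resource."""
    return 201, {"Location": resource_path}, None

def deleted_response():
    """Successful DELETE with nothing to return: 204 and an empty body."""
    return 204, {}, None
```

A handler for `POST /users` would return `created_response("/users/456")` rather than a bare 200, so clients and middleware can react to the creation specifically.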


Part 2: 3xx — Redirects

The 3xx range means the client needs to take additional action to complete the request — usually following a different URL.

The Codes

Code | Name               | Cacheable? | Method preserved?      | Use case
301  | Moved Permanently  | Yes        | No (may change to GET) | Domain migration, HTTP to HTTPS
302  | Found              | No         | No (may change to GET) | Temporary redirect (legacy, ambiguous)
307  | Temporary Redirect | No         | Yes (must keep method) | Temporary redirect (correct behavior)
308  | Permanent Redirect | Yes        | Yes (must keep method) | Permanent redirect (preserves method)
KEY CONCEPT

The critical difference between 301/302 and 307/308 is method preservation. A 301 redirect of a POST request may be changed to a GET by the browser (and most clients do this). A 308 redirect of a POST keeps it as a POST. If your API redirects are changing POST requests to GET and losing the body, you need 307/308 instead of 301/302.
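The method-preservation rules reduce to a small decision. Here is a minimal Python sketch of what a well-behaved client does when it re-issues a request after each redirect code; the function name is illustrative.

```python
# Given a redirect status code and the original method, return the method a
# typical client uses for the follow-up request. 301/302 POSTs are commonly
# rewritten to GET; 307/308 must preserve the original method.

def follow_up_method(status, method):
    if status in (301, 302) and method == "POST":
        return "GET"      # most browsers and clients rewrite POST to GET here
    return method         # 307/308 (and all GET redirects) keep the method
```

If your API loses request bodies on redirect, this is the behavior to check: a 301 turns the retried POST into a body-less GET.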

# See the redirect chain
curl -v -L https://example.com/old-path
# < HTTP/2 301
# < location: https://example.com/new-path
# > GET /new-path HTTP/2
# < HTTP/2 200

# Common in Kubernetes: HTTP to HTTPS redirect
curl -v http://api.example.com/users
# < HTTP/1.1 308 Permanent Redirect
# < Location: https://api.example.com/users
# NGINX Ingress annotation for HTTP to HTTPS redirect
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
WARNING

Redirect chains (A redirects to B, B redirects to C, C redirects to D) multiply latency and confuse debugging. Each redirect is a full HTTP round trip. Most clients follow a maximum of 5-10 redirects before giving up with "too many redirects." In Kubernetes, the most common redirect loop is when the ingress redirects HTTP to HTTPS, and a downstream proxy redirects HTTPS back to HTTP. Check your ingress annotations and backend configuration to ensure they agree on the protocol.
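The chain-and-loop behavior described above can be modeled with a small walk over observed Location headers. This is a sketch, not a real HTTP client; the redirect map stands in for the `Location` headers you would see with `curl -v`, and the hop limit mirrors the 5-10 hop cutoff most clients use.

```python
# Walk a map of {url: location_header} until we reach a URL that does not
# redirect, failing the same way real clients do on loops or long chains.

def follow_redirects(start, redirects, max_hops=10):
    seen, url = [], start
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise RuntimeError(f"redirect loop or too many redirects at {url}")
        seen.append(url)
        url = redirects[url]
    return url, len(seen)   # final URL and number of extra round trips paid
```

Each hop counted here is a full HTTP round trip in production, which is why collapsing A→B→C→D into a single A→D redirect is worth the config change.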


Part 3: 4xx — Client Errors

The 4xx range means the request was wrong — bad syntax, missing authentication, forbidden resource, or nonexistent URL. The client needs to fix something before retrying.

The Codes

Code | Name                   | What it really means                           | Common cause in K8s
400  | Bad Request            | Request syntax is invalid                      | Malformed JSON, missing required field, wrong Content-Type
401  | Unauthorized           | Authentication missing or invalid              | Expired token, missing Authorization header
403  | Forbidden              | Authenticated but not authorized               | RBAC denial, missing permissions
404  | Not Found              | The URL does not match any route               | Wrong path, missing Ingress rule, backend route not defined
405  | Method Not Allowed     | The HTTP method is not supported for this path | POST to a GET-only endpoint
408  | Request Timeout        | Client took too long to send the request       | Slow client, large upload on slow connection
413  | Content Too Large      | Request body exceeds server limits             | File upload exceeds NGINX client_max_body_size
415  | Unsupported Media Type | Content-Type header not accepted               | Sending JSON without application/json header
429  | Too Many Requests      | Rate limit exceeded                            | API throttling, too many requests from one client
KEY CONCEPT

The HTTP spec named 401 "Unauthorized" but it actually means "Unauthenticated" — the server does not know who you are. Code 403 "Forbidden" means "Authenticated but not Authorized" — the server knows who you are but you do not have permission. This naming confusion causes bugs in API design everywhere. When building or debugging APIs: 401 means "show the login page," 403 means "show the access denied page."
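The 401-vs-403 decision looks like this in a request handler. A minimal Python sketch, assuming `user` is `None` when no valid credentials were presented and that permissions live in a set; both are stand-ins for whatever your framework provides.

```python
# Illustrative 401-vs-403 decision: 401 when we do not know who the caller
# is, 403 when we know who they are but they lack the permission.

def authz_status(user, permission):
    if user is None:
        return 401   # unauthenticated: show the login page
    if permission not in user["permissions"]:
        return 403   # authenticated but not authorized: show access denied
    return 200
```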

429 Too Many Requests — Rate Limiting

Rate limiting is essential in production. When you hit a 429, check the response headers:

curl -v https://api.example.com/users
# < HTTP/2 429
# < Retry-After: 30
# < X-RateLimit-Limit: 100
# < X-RateLimit-Remaining: 0
# < X-RateLimit-Reset: 1699900800
Header                | Meaning
Retry-After           | Seconds to wait before retrying
X-RateLimit-Limit     | Total requests allowed per window
X-RateLimit-Remaining | Requests left in the current window
X-RateLimit-Reset     | Unix timestamp when the window resets
PRO TIP

When your service returns 429, always include a Retry-After header. Well-behaved clients use it to back off. Without it, clients will retry immediately and make the overload worse. On the client side, implement exponential backoff with jitter: wait 1s, then 2s, then 4s (plus random jitter) — never retry in a tight loop.
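The client-side policy above can be sketched in a few lines of Python: honor Retry-After when the server sends it, otherwise back off exponentially with full jitter. The function name and defaults are illustrative.

```python
import random

# Delay (in seconds) before retry number `attempt` (0-based). Retry-After
# from a 429 wins; otherwise exponential backoff capped at `cap`, with full
# jitter so a fleet of clients does not retry in lockstep.

def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    if retry_after is not None:
        return retry_after                  # the server told us when to come back
    exp = min(cap, base * (2 ** attempt))   # 1s, 2s, 4s, ... capped
    return random.uniform(0, exp)           # full jitter, never a tight loop
```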

413 Content Too Large — The NGINX Gotcha

# Your file upload fails with 413
curl -v -X POST https://api.example.com/upload \
  -F "file=@large-report.pdf"
# < HTTP/1.1 413 Request Entity Too Large

In Kubernetes with NGINX Ingress, the default client_max_body_size is 1MB. Any request body larger than that gets rejected before it reaches your backend:

# Fix: increase the limit in the Ingress annotation
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"   # Allow up to 50MB
WAR STORY

A team deployed a new document upload feature that worked perfectly in development (no ingress, direct port-forward). In staging, uploads over 1MB silently failed. The frontend showed a generic "upload failed" error. The backend logs showed nothing — the request never reached the backend. It took four hours to realize the NGINX Ingress was rejecting the request before it was proxied. The 413 response was only visible in the browser network tab. Always check ingress-level limits when debugging request failures that do not appear in backend logs.


Part 4: 5xx — Server Errors

The 5xx range means the server failed to fulfill a valid request. The client did nothing wrong — the server (or the infrastructure in front of it) has a problem.

The Codes

Code | Name                  | What broke                                                 | Where to look
500  | Internal Server Error | The application crashed or threw an unhandled exception    | Application logs
502  | Bad Gateway           | The reverse proxy got an invalid response from the backend | Ingress logs, backend health
503  | Service Unavailable   | The server is overloaded or explicitly refusing requests   | Pod readiness, circuit breakers
504  | Gateway Timeout       | The reverse proxy timed out waiting for the backend        | Backend latency, proxy timeout config

5xx Error Decision Tree — 502 vs 503 vs 504

Where the error originates:

Code | Source
500  | Inside your application code (unhandled exception)
502  | At the reverse proxy — backend gave bad response
503  | Backend explicitly says it cannot handle the request
504  | At the reverse proxy — backend did not respond in time

What to check first:

Code | Check
500  | Application logs, stack traces, error handlers
502  | Is the backend pod running? Is targetPort correct? Is the protocol correct?
503  | Are pods in CrashLoopBackOff? Is the readiness probe failing? Is a circuit breaker open?
504  | Is the backend slow? Is proxy-read-timeout too short? Is the database slow?
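The decision tree condenses to a lookup from status code to first place to look. A small Python sketch; the runbook strings are illustrative and worth replacing with your own.

```python
# The 5xx decision tree as a triage table: given the status code, where to
# start debugging. Purely illustrative strings.

FIRST_CHECK = {
    500: "application logs, stack traces, error handlers",
    502: "backend pod running? targetPort correct? protocol correct?",
    503: "CrashLoopBackOff? readiness probe failing? circuit breaker open?",
    504: "backend latency, proxy-read-timeout, database performance",
}

def triage(status):
    return FIRST_CHECK.get(status, "not a 5xx covered by this tree")
```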

502 Bad Gateway — The Proxy Got Garbage

A 502 means the reverse proxy (NGINX Ingress, AWS ALB, Envoy) successfully connected to the backend but received an invalid response. Common causes:

# Cause 1: Backend pod crashed mid-response
# The proxy opened a TCP connection, sent the request, but the backend died
# before completing the response

# Cause 2: Protocol mismatch
# Ingress is sending HTTP/1.1 but the backend expects HTTP/2 (or vice versa)
# Fix for NGINX Ingress:
# nginx.ingress.kubernetes.io/backend-protocol: "GRPC"  (for gRPC backends)
# nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"  (for HTTPS backends)

# Cause 3: targetPort mismatch
# Service points to port 80 but the pod listens on 8080
# The proxy connects to port 80, nothing responds properly → 502

# Cause 4: Pod terminated during rolling update
# Ingress still has the old pod IP in its upstream list
# Fix: proper readiness probes + preStop hooks

# Debug: check the NGINX Ingress error log
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep "502"
# upstream prematurely closed connection while reading response header
KEY CONCEPT

"502 Bad Gateway" almost always means the problem is between the reverse proxy and the backend, not between the client and the reverse proxy. The client's request was fine. The proxy forwarded it to the backend, and the backend either crashed, returned nonsense, or closed the connection prematurely. Start debugging at the backend pod — check logs, check if it is running, check if it is listening on the correct port and protocol.

503 Service Unavailable — Explicitly Overloaded

A 503 means the backend is explicitly saying "I cannot handle this request right now." This is different from a 502 (where the backend broke) — a 503 is an intentional response.

# Common sources of 503 in Kubernetes:

# 1. No ready pods — all pods are failing readiness probes
kubectl get pods -l app=api
# NAME        READY   STATUS    RESTARTS   AGE
# api-abc123  0/1     Running   0          5m    ← not ready

# 2. All endpoints removed from the Service
kubectl get endpoints api-service
# NAME          ENDPOINTS   AGE
# api-service   <none>      5m    ← no backends

# 3. Circuit breaker open (Istio/Envoy)
# The service mesh detected too many errors and is short-circuiting requests

# 4. Application-level rate limiting
# The app itself returns 503 when it is at capacity

504 Gateway Timeout — Backend Too Slow

A 504 means the reverse proxy waited for the backend to respond, and the backend did not respond within the timeout period. The request may still be processing on the backend.

# NGINX Ingress default timeouts
# proxy-connect-timeout: 5s    (time to establish TCP connection to backend)
# proxy-send-timeout: 60s      (time to send the request to backend)
# proxy-read-timeout: 60s      (time to read the response from backend)

# If your API endpoint takes 90 seconds to process a report:
# → NGINX gives up after 60 seconds → returns 504
# → But the backend is still processing the request!

# Fix: increase timeout for slow endpoints
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
WARNING

Increasing timeouts is a band-aid, not a fix. If your backend regularly takes more than 60 seconds to respond, you have an architectural problem. Convert long-running operations to async: accept the request immediately (return 202 Accepted), process it in the background, and let the client poll for completion or receive a webhook. Never make a user's browser wait 2 minutes for an HTTP response.
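The async pattern described above can be sketched as three operations: accept and return 202 with a polling URL, complete the work in the background, and answer polls until the result is ready. This is a minimal in-memory Python sketch; a real service would back it with a queue and durable storage, and every name here is illustrative.

```python
import uuid

# In-memory job store standing in for a queue plus durable storage.
JOBS = {}

def submit_report():
    """Accept the request immediately: 202 Accepted plus a polling URL."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"state": "processing", "result": None}
    return 202, {"Location": f"/jobs/{job_id}"}

def complete(job_id, result):
    """Called by the background worker when the slow operation finishes."""
    JOBS[job_id] = {"state": "done", "result": result}

def poll(job_id):
    """Answer the client's poll of the job URL."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {}
    if job["state"] != "done":
        return 200, {"state": "processing"}   # still running; retry later
    return 200, {"state": "done", "result": job["result"]}
```

The client's browser gets its response in milliseconds, the 90-second report runs in the background, and no proxy timeout is ever in play.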


Part 5: Debugging Status Codes in Kubernetes

When you see an error status code, the first question is: who generated it? Was it the application, the ingress controller, the cloud load balancer, or the service mesh sidecar?

The Debugging Chain

# Step 1: Is the error from the application or the ingress?
# Check the ingress controller logs:
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=50 | grep "api.example.com"

# If the error is in the ingress log with "upstream" messages → problem is behind the ingress
# If the error is NOT in the ingress log → problem is in front of the ingress (LB, DNS)

# Step 2: Can the ingress reach the backend?
# Port-forward directly to the pod, bypassing the ingress:
kubectl port-forward pod/api-abc123 8080:8080
curl -v http://localhost:8080/users
# If this works → problem is in the ingress configuration
# If this fails → problem is in the application

# Step 3: Is the Service routing correctly?
kubectl get endpoints api-service
# ENDPOINTS
# 10.244.1.5:8080,10.244.2.10:8080    ← healthy endpoints exist

# Step 4: Check for recent pod restarts
kubectl get pods -l app=api --sort-by=.status.containerStatuses[0].restartCount
# If restarts are happening, 502s coincide with pods restarting

# Step 5: Check backend response time
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  curl -s -o /dev/null -w "time_total: %{time_total}s\n" \
  http://api-service.default.svc:80/users
# If time_total exceeds proxy-read-timeout → 504s expected


PRO TIP

Build a mental model of the request path: Client to Cloud LB to Ingress Controller to K8s Service to Pod. Each layer can generate different error codes. When debugging, bisect the path: port-forward directly to the pod to determine if the problem is in the application or in the infrastructure. This single technique eliminates half the search space immediately.


Key Concepts Summary

  • 2xx means success: 200 (OK), 201 (Created), 204 (No Content) — use the specific code, not just 200 for everything
  • 3xx means redirect: 301/308 are permanent, 302/307 are temporary. 307/308 preserve the HTTP method; 301/302 may change POST to GET
  • 4xx means the client is wrong: 400 (bad syntax), 401 (not authenticated), 403 (not authorized), 404 (not found), 429 (rate limited)
  • 401 means "unauthenticated" not "unauthorized" despite the confusing name — 403 is the real "unauthorized"
  • 5xx means the server is broken: the critical trio is 502/503/504, each pointing to a different failure
  • 502 Bad Gateway: proxy connected to backend but got an invalid response — check targetPort, protocol, pod health
  • 503 Service Unavailable: backend explicitly refuses — check readiness probes, endpoints, circuit breakers
  • 504 Gateway Timeout: proxy waited too long for backend — check backend latency, proxy-read-timeout
  • The ingress controller is the most common source of 5xx in K8s — always check its logs first

Common Mistakes

  • Returning 200 for everything and putting the real status in the JSON body — breaks HTTP caching, monitoring, and middleware
  • Confusing 401 (not authenticated) with 403 (not authorized) — these have different UI and retry implications
  • Assuming 502 means "the backend is down" — it means the backend responded badly, which could be a protocol mismatch or crash mid-response
  • Increasing proxy-read-timeout to 5 minutes instead of making the operation async — you are masking a design problem
  • Not setting proxy-body-size in the ingress annotation — file uploads fail silently at 1MB default
  • Ignoring Retry-After headers from 429 responses and retrying immediately — this makes rate limiting worse
  • Not checking ingress controller logs when debugging 5xx — the application logs may show nothing if the request never reached the app

KNOWLEDGE CHECK

Your monitoring shows a spike in 502 errors on an NGINX Ingress. The backend pods show no errors in their logs. What is the most likely cause?