Container Networking Issues
A team's staging environment is identical to prod: same compose file, same images, same container count. In staging, `api` can reach `db` by name. In prod, it cannot. Nothing in the Dockerfile changed. The environment variables are the same. The hosts are both Ubuntu 22.04. One engineer thinks it's a DNS issue; another thinks it's iptables; a third is sure it's corporate network policy. Two hours of debugging later, the actual cause: in staging the compose file declares `networks: { default: { name: myapp } }`, and prod's compose file was edited to remove that line during a "cleanup" two weeks ago. The default network in prod was now `docker_default`, and the staging name aliases never carried over. Five minutes of `docker network inspect` would have found it.
Container networking failures tend to look like application bugs: connection refused, DNS failures, intermittent timeouts. The fix almost always lives in one of four places: DNS resolution, network membership, iptables rules, or MTU. This lesson is the flowchart: the specific commands to isolate each, the common patterns that cause them, and the tricks (`nsenter`, `docker network inspect`, `docker exec curl`) that collapse debug sessions from hours to minutes.
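The prod breakage above comes down to one missing compose stanza. A sketch of the setting that pins the default network's name so every environment creates (or joins) the same network instead of `<project>_default`; the service and image names here are illustrative, not from the story:

```yaml
# compose.yaml (illustrative)
# Pin the default network's name so staging and prod agree.
networks:
  default:
    name: myapp

services:
  api:
    image: example/api:latest   # placeholder image
  db:
    image: postgres:16
```

With this in place, `docker network inspect myapp` shows the same membership on every host that runs the project.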
The Four Failure Modes
Every "container network is broken" resolves to one of:
- DNS resolution fails. Container can't resolve `db` or `api.internal.example.com`.
- Wrong network membership. Two containers are on different networks; no route between them.
- iptables / NAT rules wrong. External traffic doesn't reach a published port, or outbound traffic is dropped.
- MTU / fragmentation. TCP handshake succeeds, first response fragment never arrives.
The debug flow:
Container networking problem.
1. What are we trying to do?
• container → container on same host
• container → host
• container → internet
• external → published port on host
2. From inside the container, can we resolve the target?
docker exec myapp getent hosts <target>
3. Can we TCP-connect?
docker exec myapp nc -zv <target> <port>
4. From the host, what does the network look like?
docker network ls
docker network inspect <name>
sudo iptables -t nat -L | head
5. Worst case: tcpdump
sudo nsenter -t $PID -n tcpdump -i eth0 -n
Step 1: Classify by Direction
Know which direction is failing before doing anything:
| From → to | Typical issue |
|---|---|
| Container → container (same network) | DNS, or app-level bug (wrong port, slow response) |
| Container → container (different networks) | Network membership: they cannot reach each other |
| Container → host | host.docker.internal + host-gateway, firewall rules |
| Container → internet | NAT masquerading, DNS upstream, egress filtering |
| External → host's published port | -p publishing, iptables NAT, host firewall interaction |
| Host → container internal IP | Almost always works; when it doesn't, the container is up but listening on the wrong interface (e.g. 127.0.0.1 instead of 0.0.0.0) |
Step 2: DNS First
getent hosts is the canonical "resolve this name the way my system would" command:
# Does the container's resolver see 'db'?
docker exec myapp getent hosts db
# 172.20.0.2 db
# If nothing prints, DNS is failing. Check the resolver config:
docker exec myapp cat /etc/resolv.conf
# nameserver 127.0.0.11 ← Docker embedded DNS
# options ndots:0
# Specific DNS query
docker exec myapp nslookup db
# Server: 127.0.0.11
# Non-authoritative answer:
# Name: db
# Address: 172.20.0.2
Common DNS failures
- On the default bridge, container names do not resolve. Move containers to a user-defined network:
docker network create mynet
docker network connect mynet api
docker network connect mynet db
- `/etc/resolv.conf` doesn't point to 127.0.0.11. Rare; usually caused by the image overwriting it, or a custom `--dns=` flag.
- Docker embedded DNS is down. The daemon needs to be healthy; restart Docker if 127.0.0.11 is completely unresponsive.
- Upstream (external) DNS is broken. The container's resolver forwards unknown names to the host's `/etc/resolv.conf` upstream. If the host can resolve `github.com` and the container can't, Docker's embedded DNS has a forwarding issue; check the daemon log.
- Alpine DNS quirks. musl's resolver does not handle `search` and `ndots` the same way glibc does. In Kubernetes, pods using `/etc/resolv.conf` with `search cluster.local svc.cluster.local ... ndots:5` can get unexpected behavior on Alpine. Workaround: use `ndots:2` or absolute names.
A fast DNS sanity check from inside a container: `nslookup <target>`, `getent hosts <target>`, `dig <target>` if available, and `cat /etc/resolv.conf`. If one works and another does not, you have a name-resolution-order bug (nsswitch / resolv.conf search / glibc-vs-musl mismatch). If all fail, the network itself or the resolver config is wrong.
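That sanity check can be bundled into one host-side helper so the resolver mismatches stand out immediately. A sketch, assuming you have `docker exec` access to the container; the function name `dns_sanity` is mine, not a standard tool:

```shell
# dns_sanity <container> <name> -- run this section's resolver checks
# in order; a tool that fails while the others succeed points at a
# name-resolution-order bug rather than a network problem.
dns_sanity() {
  local ctr=$1 name=$2

  echo "== resolv.conf =="
  docker exec "$ctr" cat /etc/resolv.conf

  echo "== getent hosts (nsswitch path) =="
  docker exec "$ctr" getent hosts "$name" || echo "getent: FAILED"

  echo "== nslookup (direct DNS query) =="
  docker exec "$ctr" nslookup "$name" || echo "nslookup: FAILED"

  # dig may not exist in minimal images; report rather than abort
  echo "== dig (if available) =="
  docker exec "$ctr" dig +short "$name" 2>/dev/null || echo "dig: unavailable or FAILED"
}
# usage: dns_sanity myapp db
```

If `getent` fails while `nslookup` succeeds, suspect nsswitch or musl; if all four fail, go back to network membership.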
Step 3: TCP-Level Connectivity
DNS can resolve, but the port might be closed or blocked:
# nc (netcat) as a TCP port prober — fast
docker exec myapp nc -zv db 5432
# db (172.20.0.2:5432) open ← TCP handshake worked
# or
# db (172.20.0.2:5432) failed: Connection refused ← something listening? different port?
# or
# nc: connect to db port 5432 (tcp) timed out ← firewall or host unreachable
# curl as an HTTP prober — gives you the HTTP layer too
docker exec myapp curl -vI http://db:8080
# * Trying 172.20.0.2:8080...
# * Connected to db (172.20.0.2) port 8080 (#0)
# > HEAD / HTTP/1.1
# Without nc / curl in the image (distroless): wget (if that's present) or bash's /dev/tcp
docker exec myapp bash -c '</dev/tcp/db/5432'
# returns 0 = connected, non-zero = failed
Interpreting the results
- "Connection refused" → the destination is reachable (routing/firewall fine) but nothing is listening on that port. Check the target container's logs; confirm it's up and bound correctly.
- "Connection timed out" → the destination is either not reachable at all (routing/firewall) or is silently dropping SYN packets. Bigger problem than refused.
- "No route to host" → the routing layer has no idea how to reach the destination. Usually wrong network or missing veth.
- "Name or service not known" → DNS, not TCP. Go back to step 2.
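The mapping above is mechanical enough to script. A sketch; the `classify_probe` helper is my own invention, and it greps `nc`'s message text because `nc`'s exit code alone does not distinguish refused from timed out:

```shell
# classify_probe <container> <host> <port>
# Run nc inside the container, capture its message (nc writes to
# stderr), and print which debugging step the result points to.
classify_probe() {
  local ctr=$1 host=$2 port=$3 out
  out=$(docker exec "$ctr" nc -zv -w 3 "$host" "$port" 2>&1)
  case "$out" in
    *open*|*succeeded*)   echo "OK: TCP handshake worked" ;;
    *refused*)            echo "REFUSED: reachable, but nothing listening -> check the target service" ;;
    *"timed out"*)        echo "TIMEOUT: routing/firewall, or SYNs silently dropped" ;;
    *"No route to host"*) echo "NO ROUTE: wrong network or missing veth -> check membership" ;;
    *"not known"*|*"Temporary failure"*) echo "DNS: go back to step 2" ;;
    *)                    echo "UNCLASSIFIED: $out" ;;
  esac
}
# usage: classify_probe myapp db 5432
```

The exact wording varies between netcat variants (GNU, OpenBSD, BusyBox), so treat the patterns as a starting point.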
Step 4: Network Membership
docker network inspect is the ground truth for which containers are on which network, with which IPs:
docker network ls
# NETWORK ID NAME DRIVER SCOPE
# abc123... bridge bridge local
# def456... host host local
# 111222... none null local
# mynet-id myapp bridge local
docker network inspect myapp
# [
# {
# "Name": "myapp",
# "Id": "...",
# "Scope": "local",
# "Driver": "bridge",
# "IPAM": {
# "Config": [{"Subnet": "172.20.0.0/16", "Gateway": "172.20.0.1"}]
# },
# "Containers": {
# "<api-id>": {
# "Name": "api",
# "IPv4Address": "172.20.0.3/16",
# "IPv6Address": ""
# },
# "<db-id>": {
# "Name": "db",
# "IPv4Address": "172.20.0.2/16",
# "IPv6Address": ""
# }
# },
# ...
# }
# ]
Checks:
- Are both containers listed? If not, one is not attached.
- Do they have valid IPs on the same subnet?
- Is "Driver" `bridge` (user-defined)? A container on the default `bridge` does not get DNS-based discovery.
Attach a running container to a network
docker network connect myapp api
docker network disconnect default api # if you also want to remove from old network
See which networks a container is on
docker inspect api --format='{{range $k, $v := .NetworkSettings.Networks}}{{$k}}: {{$v.IPAddress}}{{printf "\n"}}{{end}}'
# myapp: 172.20.0.3
# bridge: 172.17.0.2
If a container appears on the wrong network (or on none), that is your problem.
Step 5: iptables Rules (For Published Ports or Inter-Container Policy)
# Docker's NAT rules (for -p port publishing)
sudo iptables -t nat -L DOCKER -n -v
# Chain DOCKER (2 references)
# pkts bytes target prot opt in out source destination
# 0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
# 12 720 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
# SNAT (outbound from container to internet)
sudo iptables -t nat -L POSTROUTING -n -v | grep -i masq
# MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
# Filter rules (allow / drop between containers)
sudo iptables -L DOCKER-USER -n -v
sudo iptables -L FORWARD -n -v
If Docker's iptables rules are missing or wrong, port publishing fails silently:
- "My `-p 8080:80` doesn't reach the container" → check that the DNAT rule exists; restart the container to re-insert it.
- "UFW is blocking port 8080" → it isn't, because Docker's rules run before UFW's filter rules (covered in Module 3 Lesson 1). Publish as `-p 127.0.0.1:8080:80` to keep the port private.
- "Nothing can reach anything" → run `iptables-save` and look for rules dropping all forwarded traffic (some security policies do this). Docker needs the FORWARD chain to accept its traffic.
The DOCKER-USER chain is yours
Docker leaves DOCKER-USER available for your custom rules. Any rules you add there run before Docker's automatic ones:
# Example: block all external access to docker containers
sudo iptables -I DOCKER-USER -i eth0 -j DROP
# ...except SSH
sudo iptables -I DOCKER-USER -i eth0 -p tcp --dport 22 -j ACCEPT
Use this for fleet-wide isolation policies.
Step 6: Packet Capture (The Last Resort)
When nothing else finds it, capture:
# From the host, into the container's network namespace
PID=$(docker inspect myapp --format='{{.State.Pid}}')
sudo nsenter -t $PID -n tcpdump -i eth0 -n 'port 5432'
# Watch packets come in/out; do the SYNs get responses?
# Alternative: tcpdump on the host's veth interface (matches docker0 on default bridge)
# Find the veth connected to our container
sudo bridge link | grep "veth" | head
# 5: vethabcdef@if4: ... master docker0 state forwarding priority 32 cost 100
sudo tcpdump -i vethabcdef -n 'port 5432'
# Or on docker0 / user-defined bridge
sudo ip link show | grep -E 'docker|br-'
# docker0 (default bridge)
# br-11223344 (your user-defined bridge)
sudo tcpdump -i br-11223344 -n
# Write to pcap for Wireshark analysis
sudo tcpdump -i br-11223344 -w /tmp/trace.pcap 'port 5432'
Packet capture reveals:
- TCP SYN-and-no-response → routing or filter issue.
- TCP RST → the destination closed; check the app's logs on the other side.
- Successful handshake, but full-size packets never arrive (only small ones get through) → MTU issue.
- DNS queries that never get replies → DNS server unreachable.
nsenter -t <pid> -n tcpdump is the single move that solves the most networking mysteries. You get host-side tools with the container's network namespace view. No matter how minimal the image is, you have full packet capture and analysis. Add this to muscle memory; it pays back tenfold.
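That move is worth wrapping in a tiny helper so it becomes one command instead of two. A sketch; the name `cns` is mine, and it assumes `docker` and `nsenter` are available on the host:

```shell
# cns <container> <command...> -- run a host binary inside the
# container's network namespace. The tool comes from the host,
# so this works even against distroless/minimal images.
cns() {
  local ctr=$1; shift
  local pid
  pid=$(docker inspect "$ctr" --format '{{.State.Pid}}') || return 1
  if [ -z "$pid" ] || [ "$pid" = "0" ]; then
    echo "cns: container '$ctr' is not running" >&2
    return 1
  fi
  sudo nsenter -t "$pid" -n "$@"
}
# usage:
#   cns myapp tcpdump -i eth0 -n 'port 5432'
#   cns myapp ss -tlnp
```

The `Pid` check matters: a stopped container reports PID 0, and `nsenter -t 0` would fail with a confusing error.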
MTU Issues: The Silent Weird Case
MTU (Maximum Transmission Unit) defines the largest packet size an interface can send. If the path between two endpoints has a smaller MTU than either endpoint's interface, packets are fragmented — or dropped, if "don't fragment" is set (which it is on modern TCP).
Classic symptoms:
- TCP handshake succeeds (small packets).
- First HTTP request header fits in one packet — gets through.
- The response body fills full-size packets; those are dropped in transit.
- Request appears to hang; eventually times out.
- Only breaks for larger payloads; small GETs work fine.
MTU mismatches happen often in:
- VPN overlays (Docker's overlay networks, Kubernetes CNIs like Calico, VPN tunnels) — they encapsulate traffic, adding headers that may exceed the base MTU.
- Cloud provider networking (AWS jumbo frames, 9001 MTU on EC2 vs 1500 default on most paths).
- Environments with corporate VPNs.
Diagnose
# Current MTU on each interface
docker exec myapp ip link
# 2: eth0@if5: ... mtu 1500 ...
# The host side
ip link show docker0
# docker0: ... mtu 1500 ...
# Test with ping and DF bit
ping -M do -s 1472 <target> # 1500 - 28 (IP + ICMP headers) = 1472
# If this fails with "Message too long" or times out, path MTU < 1500
# Try smaller: -s 1300, -s 1200, etc., to find the working size
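Instead of hand-trying sizes, binary-search for the largest payload that survives. A sketch; the function names are mine, `mtu_probe` wraps the same `ping -M do` trick, and the final `+ 28` adds the IP and ICMP headers back to report the path MTU:

```shell
# mtu_probe <target> <payload> -- true if a DF-marked ping of this
# payload size gets through the path.
mtu_probe() { ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1; }

# find_pmtu <target> [lo] [hi] -- binary-search payload sizes and
# print the discovered path MTU (largest working payload + 28).
find_pmtu() {
  local target=$1 lo=${2:-1200} hi=${3:-1472} mid best=0
  while [ "$lo" -le "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if mtu_probe "$target" "$mid"; then
      best=$mid; lo=$(( mid + 1 ))   # fits: try bigger
    else
      hi=$(( mid - 1 ))              # dropped: try smaller
    fi
  done
  [ "$best" -gt 0 ] && echo $(( best + 28 )) || return 1
}
# usage: find_pmtu db   # e.g. ~1450 behind a VXLAN overlay
```

Run it from inside the container (or via `nsenter`) so you measure the container's path, not the host's.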
Fix
# Lower the container's interface MTU
docker network create --opt com.docker.network.driver.mtu=1450 mynet
# Or for the default bridge, in /etc/docker/daemon.json:
# {
# "mtu": 1450
# }
# Or via the compose file
# networks:
# mynet:
# driver_opts:
# com.docker.network.driver.mtu: 1450
1450 is a common safe value that leaves room for typical overlay overhead. 1400 is even safer; 1280 is the IPv6 minimum.
Case Studies
"Container can reach the internet but not the DB"
Classic: DB is on a different Docker network, or no --network was specified for one of them.
docker network inspect myapp | jq '.[0].Containers'
# Only 'api' is listed. Where's 'db'?
docker inspect db --format='{{range $k,$v := .NetworkSettings.Networks}}{{$k}} {{end}}'
# bridge ← oops, not on myapp
# Fix:
docker network connect myapp db
"Container can reach the internet but DNS is slow"
Container inherits host's /etc/resolv.conf for upstream DNS. If the host's resolver is slow or misconfigured, every non-Docker lookup is slow.
docker exec myapp cat /etc/resolv.conf
# nameserver 127.0.0.11
# options ndots:0
# search <nothing relevant>
# (Docker's embedded DNS forwards to whatever the host has)
# On the host
cat /etc/resolv.conf
# nameserver 192.168.1.1 ← is this DNS server fast?
# nameserver 8.8.8.8 ← fallback
time dig google.com # from the host
# ;; Query time: 4000 msec ← the host itself is slow; the container inherits this
Fix the host's DNS or configure the daemon with explicit DNS servers:
// /etc/docker/daemon.json
{
"dns": ["1.1.1.1", "8.8.8.8"]
}
"Published port is reachable from the host but not externally"
Host bound the port to a specific interface:
docker port myapp
# 8080/tcp -> 127.0.0.1:8080 ← only localhost
# Fix: change the publish
docker rm -f myapp
docker run -p 0.0.0.0:8080:8080 myapp # or just -p 8080:8080
Or: firewall, cloud security group, or corporate network blocks inbound 8080. Check with curl from an external host.
"Two compose projects can't talk to each other"
Each docker compose project creates its own default network, named <project>_default. Containers in project A are on projA_default; containers in project B are on projB_default. They cannot talk to each other unless you explicitly create a shared external network:
# Project A: compose.yaml
networks:
shared:
external: true
services:
api:
networks: [shared]
# Project B: compose.yaml
networks:
shared:
external: true
services:
db:
networks: [shared]
Create the network once manually:
docker network create shared
Now both projects' containers share the shared network and can see each other by name.
"Everything looks right but requests hang at the TLS handshake"
Often MTU. TLS handshakes move large records: certificate chains routinely fill one or more full-size (1500-byte) packets. If the path MTU is below 1500 and DF is set, those packets drop silently. Symptom: `curl -v` shows "Connected" and then hangs during the handshake.
Try lowering the MTU on the network.
A team running Kubernetes with the Cilium CNI hit mysterious intermittent slowness between pods on different nodes. The base MTU was 1500, Cilium's VXLAN encapsulation added ~50 bytes of overhead, and the effective path MTU was ~1450. Pods with MTU 1500 emitted 1500-byte packets with DF set; the encapsulation path dropped them. Small requests (under ~1450 bytes) worked; large responses hung. The fix was to lower the pod interface MTU to 1400. That one configuration change resolved weeks of on-and-off latency reports. Any overlay networking (Docker overlay, VXLAN-based Kubernetes CNIs, VPNs) is an MTU minefield.
The Quick Checklist
When "container networking is broken," run this in order:
# 1. DNS
docker exec myapp getent hosts <target>
# 2. TCP
docker exec myapp nc -zv <target> <port> # or bash /dev/tcp/<target>/<port>
# 3. Networks
docker network inspect $(docker inspect myapp --format='{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}')
# 4. Routes inside container
docker exec myapp ip route
# 5. Host firewall / NAT
sudo iptables -t nat -L DOCKER -n -v
sudo iptables -L DOCKER-USER -n -v
sudo iptables -L FORWARD -n -v | head
# 6. MTU sanity check
docker exec myapp ping -M do -s 1472 <target>
# 7. Packet capture if still stuck
PID=$(docker inspect myapp --format='{{.State.Pid}}')
sudo nsenter -t $PID -n tcpdump -i eth0 -n host <target>
95% of container networking issues show up in the first three steps.
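Those first steps are scriptable. A sketch of a one-shot triage helper; the name `netcheck` is mine, and it deliberately stops after the first three checks, which catch most cases:

```shell
# netcheck <container> <target> [port] -- run checklist steps 1-3.
netcheck() {
  local ctr=$1 target=$2 port=${3:-80}

  echo "== 1. DNS =="
  docker exec "$ctr" getent hosts "$target" || echo "DNS FAILED"

  echo "== 2. TCP ($port) =="
  docker exec "$ctr" nc -zv -w 3 "$target" "$port" || echo "TCP FAILED"

  echo "== 3. Network membership =="
  local nets
  nets=$(docker inspect "$ctr" \
    --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}')
  echo "networks: $nets"
  # $nets intentionally unquoted: one argument per network name
  docker network inspect $nets \
    --format '{{.Name}}: {{range .Containers}}{{.Name}} {{end}}'
}
# usage: netcheck myapp db 5432
```

If DNS fails, stop and work step 2 of the lesson; if TCP fails with DNS fine, the membership listing usually shows why.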
Key Concepts Summary
- Classify by direction first. Container → container, container → host, container → internet, external → host — each has a different diagnostic path.
- DNS is the #1 failure mode. `getent hosts`, `/etc/resolv.conf`, user-defined networks vs the default `bridge`.
- TCP probes with `nc -zv` or bash `/dev/tcp` tell you: reachable? port open? timed out or refused?
- `docker network inspect` shows which containers are on which network, with which IPs. Ground truth.
- The iptables NAT table handles port publishing; FORWARD / DOCKER-USER hold the filter rules. UFW does not block Docker's published ports.
- MTU issues are the silent case. Small packets work, large packets hang. Check with `ping -M do -s 1472`.
- `nsenter -t <pid> -n <tool>` is the master key: host-side tools with the container's namespace view.
- tcpdump on the veth or bridge reveals what packets are actually flowing.
- The default `bridge` has no name-based DNS. Always use a user-defined network for multi-container apps.
- Two compose projects isolate by default. Use an external network to connect them intentionally.
Common Mistakes
- Debugging DNS with `ping` when `getent hosts` is the correct tool (it respects nsswitch / musl differences).
- Assuming UFW blocks Docker's published ports. It does not; Docker's NAT runs before UFW's filter.
- Using the default `bridge` network and expecting container-name DNS to work.
- Forgetting that `nc` / `curl` may not be in a minimal image. Use bash `</dev/tcp/...` or `nsenter` from the host.
- Packet-capturing on the wrong interface. For a container on a user-defined network, capture on `br-<something>`, not `docker0`.
- Blaming "the network" for connection-refused; it actually means nothing is listening on that port: check the destination service.
- Setting `--network=host` "for simplicity" in production. You lose network isolation entirely; port conflicts and security surface both increase.
- Not pinning MTU on overlay networks. Encapsulation overhead eats the MTU budget; silent packet drops ensue.
- Modifying iptables directly instead of using `DOCKER-USER`. Docker's automatic rules re-add themselves; your edits get wiped.
- Giving up before running tcpdump. Five minutes of packet capture usually reveals the problem when everything else is ambiguous.
Two containers on the same host, both started with `docker run -d --network host`. Container A listens on port 8080 inside. Container B tries to curl `http://A:8080` and fails with 'name or service not known.' What went wrong, and how do you fix it?