DNS Debugging with dig, nslookup & tcpdump
It is 2 AM. PagerDuty fires. Your monitoring system reports that 30% of API requests are failing with "no such host" errors. Some pods can resolve DNS, others cannot. The failures are intermittent — a pod fails one query and succeeds the next. CoreDNS is running. The upstream resolver is healthy.
This is when you need more than nslookup. You need dig with its full arsenal of flags, tcpdump to capture what is actually on the wire, and a systematic approach to isolating where in the DNS chain the failure occurs.
dig: The Power Tool
dig (Domain Information Groper) is the most powerful DNS debugging tool. Unlike nslookup, dig gives you complete control over every aspect of the query and shows you the full response with all sections.
Basic Usage
# Simple A record lookup
dig devopsbeast.com A
# Output sections:
# ;; QUESTION SECTION: <-- What you asked
# ;devopsbeast.com. IN A
#
# ;; ANSWER SECTION: <-- The answer
# devopsbeast.com. 300 IN A 104.21.45.67
#
# ;; AUTHORITY SECTION: <-- Who is authoritative
# devopsbeast.com. 1800 IN NS ns1.cloudflare.com.
#
# ;; ADDITIONAL SECTION: <-- Extra info (glue records)
# ns1.cloudflare.com. 300 IN A 173.245.58.51
#
# ;; Query time: 12 msec
# ;; SERVER: 1.1.1.1#53
# ;; MSG SIZE rcvd: 123
Essential dig Flags
# Query a specific DNS server
dig @8.8.8.8 devopsbeast.com A
dig @10.96.0.10 my-service.default.svc.cluster.local A # CoreDNS
# Short output (just the answer)
dig devopsbeast.com A +short
# 104.21.45.67
# Show only the answer section
dig devopsbeast.com A +noall +answer
# devopsbeast.com. 300 IN A 104.21.45.67
# Trace the full resolution chain
dig devopsbeast.com A +trace
# Shows: root → TLD → authoritative → answer
# Check specific record types
dig devopsbeast.com MX +short
dig devopsbeast.com TXT +short
dig devopsbeast.com NS +short
dig devopsbeast.com SOA +short
dig devopsbeast.com ANY +short # All records (many servers block this)
# Reverse DNS lookup
dig -x 8.8.8.8 +short
# dns.google.
# Query with TCP instead of UDP
dig devopsbeast.com A +tcp
# Set a custom timeout (in seconds)
dig devopsbeast.com A +time=2 +tries=1
The +trace flag is your most powerful debugging tool. It shows every step of the resolution chain: root, TLD, authoritative. When DNS is broken, +trace tells you exactly where the chain breaks. If root and TLD respond but the authoritative server does not, the problem is at your DNS provider. If the TLD returns wrong NS records, your domain registration is misconfigured.
dig Inside Kubernetes Pods
Most minimal container images do not include dig. You have several options:
# Option 1: Use a debug container with dig installed
kubectl run debug --image=nicolaka/netshoot --rm -it -- bash
dig my-service.default.svc.cluster.local A
# Option 2: Use kubectl debug (ephemeral containers)
kubectl debug -it my-pod --image=nicolaka/netshoot -- dig google.com
# Option 3: Install dig in a running pod (if you have access)
# Note: wrap in sh -c, or everything after && runs locally, not in the pod
kubectl exec my-pod -- sh -c 'apt-get update && apt-get install -y dnsutils'
kubectl exec my-pod -- dig my-service.default.svc.cluster.local A
The nicolaka/netshoot image is the gold standard for network debugging in Kubernetes. It includes dig, nslookup, curl, tcpdump, ping, traceroute, netstat, ss, iperf, and dozens of other networking tools. Keep it bookmarked. When you need to debug networking issues, kubectl run debug --image=nicolaka/netshoot --rm -it -- bash is your starting command.
nslookup: Quick and Simple
nslookup is simpler than dig but is available in more container images. It is good for quick checks but lacks the detailed output dig provides.
# Basic lookup
nslookup devopsbeast.com
# Server: 1.1.1.1
# Address: 1.1.1.1#53
# Non-authoritative answer:
# Name: devopsbeast.com
# Address: 104.21.45.67
# Query a specific server
nslookup devopsbeast.com 8.8.8.8
# Query specific record type
nslookup -type=MX devopsbeast.com
nslookup -type=TXT devopsbeast.com
nslookup -type=SRV _http._tcp.my-headless.default.svc.cluster.local
# From inside a K8s pod
kubectl exec my-pod -- nslookup my-service.default.svc.cluster.local
kubectl exec my-pod -- nslookup google.com
dig vs nslookup: When to Use Which
- dig: the power tool for deep debugging
- nslookup: quick checks and basic troubleshooting
nslookup and dig may give different results because they use different resolvers. nslookup uses the system resolver (respects /etc/resolv.conf, including search domains and ndots). dig queries the DNS server directly by default and does NOT apply search domains unless you explicitly add +search. When debugging K8s DNS, always specify the full domain name to avoid confusion.
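One way to internalize the search-domain behavior is to simulate it. The sketch below is a plain-shell model of the glibc-style rule (a name with fewer dots than ndots gets search domains appended first); the function name and the example search list are illustrative, not a real resolver API:

```shell
#!/bin/sh
# expansion_order NAME NDOTS SEARCH_DOMAIN...
# Print the order in which a resolver would try candidate names.
expansion_order() {
  name=$1; ndots=$2; shift 2
  # A trailing dot marks the name fully qualified: no search expansion.
  case $name in
    *.) echo "${name%?}"; return ;;
  esac
  dots=$(printf '%s' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -lt "$ndots" ]; then
    # Fewer dots than ndots: try search domains first, the bare name last.
    for d in "$@"; do echo "$name.$d"; done
    echo "$name"
  else
    # Enough dots: try the name as-is first.
    echo "$name"
    for d in "$@"; do echo "$name.$d"; done
  fi
}

# Typical pod settings in Kubernetes: ndots:5 and the cluster suffixes.
expansion_order google.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
# google.com.default.svc.cluster.local
# google.com.svc.cluster.local
# google.com.cluster.local
# google.com
```

This is exactly why a pod resolving google.com can generate four queries, and why a trailing dot (google.com.) skips the churn entirely.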
tcpdump for DNS: Seeing What Is on the Wire
When dig and nslookup are not enough — when DNS works sometimes but not others, or when you suspect packets are being dropped — you need tcpdump to see the actual DNS packets on the network.
Capturing DNS Traffic
# Capture all DNS traffic on a node
sudo tcpdump -i any port 53 -nn
# Output:
# 10:00:00.001 IP 10.244.0.5.43210 > 10.96.0.10.53: 12345+ A? google.com.default.svc.cluster.local. (54)
# 10:00:00.002 IP 10.96.0.10.53 > 10.244.0.5.43210: 12345 NXDomain 0/1/0 (107)
# 10:00:00.003 IP 10.244.0.5.43210 > 10.96.0.10.53: 12346+ A? google.com.svc.cluster.local. (46)
# ...
# Decode:
# 10.244.0.5.43210 = Source pod IP, ephemeral port
# 10.96.0.10.53 = CoreDNS service IP, port 53
# 12345+ = DNS query ID, + means recursion desired
# A? = Query type (A record)
# NXDomain = Response: domain does not exist
Useful tcpdump Filters for DNS
# DNS traffic from a specific pod IP
sudo tcpdump -i any host 10.244.0.5 and port 53 -nn
# Only DNS queries (not responses): packets going to port 53
sudo tcpdump -i any dst port 53 -nn
# Only DNS responses
sudo tcpdump -i any src port 53 -nn
# Capture to a file for later analysis with Wireshark
sudo tcpdump -i any port 53 -w /tmp/dns-capture.pcap -nn
# Capture DNS traffic on a specific interface with verbose decoding
sudo tcpdump -i eth0 port 53 -vv -nn
Inside Kubernetes Pods
# Run tcpdump in a debug pod on the same node
kubectl debug node/my-node -it --image=nicolaka/netshoot -- \
tcpdump -i any port 53 -nn
# Or attach to the CoreDNS pod's network namespace with an ephemeral
# debug container (the coredns image itself has no shell or tcpdump)
kubectl debug -it -n kube-system coredns-abc123 \
  --image=nicolaka/netshoot --target=coredns -- \
  tcpdump -i any port 53 -nn -c 20
tcpdump shows you the ground truth. When dig says the query timed out, tcpdump tells you whether the query packet was actually sent, whether a response was received, or whether the packet was silently dropped. This is the difference between "DNS is broken" and "a firewall is dropping UDP packets on port 53."
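That claim can be made concrete. The helper below is a sketch (find_unanswered is an invented name, and it assumes the default tcpdump text format shown earlier): it pairs queries to responses by DNS query ID and prints queries that were never answered, the signature of a silent drop.

```shell
#!/bin/sh
# find_unanswered: read tcpdump's text output for `port 53 -nn` on stdin
# and print query IDs that never got a matching response.
find_unanswered() {
  awk '
    {
      # Locate the "IP" token so the format with or without an
      # interface name column (tcpdump -i any) both work.
      for (i = 1; i <= NF && $i != "IP"; i++) ;
      if (i > NF - 4) next                 # not a line we understand
      src = $(i+1); dst = $(i+3); id = $(i+4)
      sub(/[^0-9]+$/, "", id)              # strip "+" etc. from the ID
      if (dst ~ /\.53:$/)      sent[id] = $0   # packet to port 53 = query
      else if (src ~ /\.53$/)  delete sent[id] # from port 53 = response
    }
    END { for (id in sent) print "no response for query " id ": " sent[id] }
  '
}

# Example: query 12345 was answered, query 99999 was not.
printf '%s\n' \
  '10:00:00.001 IP 10.244.0.5.43210 > 10.96.0.10.53: 12345+ A? example.com. (40)' \
  '10:00:00.002 IP 10.96.0.10.53 > 10.244.0.5.43210: 12345 1/0/0 A 93.184.216.34 (56)' \
  '10:00:00.003 IP 10.244.0.5.43210 > 10.96.0.10.53: 99999+ A? example.org. (40)' \
  | find_unanswered
# only query 99999 is reported as unanswered
```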
The DNS Debugging Checklist
When a pod cannot resolve a hostname, follow this systematic approach:
DNS Debugging Decision Tree
1. Check the pod's /etc/resolv.conf: right nameserver, search domains, ndots?
2. Query CoreDNS directly: dig @10.96.0.10 <name>
3. Check CoreDNS health and logs: kubectl logs -n kube-system -l k8s-app=kube-dns
4. Test the upstream resolver from inside the CoreDNS pod
5. If still unclear, capture packets with tcpdump
Debugging Scenario 1: NXDOMAIN for a Service
# Step 1: Query from the pod
kubectl exec app-pod -- nslookup my-service.production.svc.cluster.local
# ** server cannot find my-service.production.svc.cluster.local: NXDOMAIN
# Step 2: Verify the service exists
kubectl get svc my-service -n production
# Error: services "my-service" not found
# CAUSE: The service does not exist in that namespace!
# Or:
kubectl get svc my-service -n production
# NAME TYPE CLUSTER-IP PORT(S)
# my-service ClusterIP 10.96.0.42 8080/TCP
kubectl get endpoints my-service -n production
# NAME ENDPOINTS
# my-service <none>
# CAUSE: Service exists but has no endpoints — no pods match the selector!
kubectl describe svc my-service -n production | grep Selector
# Selector: app=my-svc <-- Does any pod have this label?
kubectl get pods -n production -l app=my-svc
# No resources found. <-- No pods match!
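The checks in this scenario chain together naturally. Below is a hedged sketch of a wrapper, assuming kubectl access; the function name and its arguments are invented for illustration:

```shell
#!/bin/sh
# triage_service_dns SERVICE NAMESPACE
# Walk the NXDOMAIN checklist: does the service exist? does it have endpoints?
triage_service_dns() {
  svc=$1; ns=$2
  if ! kubectl get svc "$svc" -n "$ns" >/dev/null 2>&1; then
    echo "CAUSE: service $svc does not exist in namespace $ns"
    return 1
  fi
  # Collect the endpoint IPs; an empty result means no pods match the selector.
  eps=$(kubectl get endpoints "$svc" -n "$ns" \
        -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  if [ -z "$eps" ]; then
    echo "CAUSE: service exists but has no endpoints - check the selector:"
    kubectl describe svc "$svc" -n "$ns" | grep Selector
    return 1
  fi
  echo "OK: $svc.$ns has endpoints: $eps"
}

# Usage:
# triage_service_dns my-service production
```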
Debugging Scenario 2: SERVFAIL for External Domains
# Step 1: Query external domain from the pod
kubectl exec debug -- dig google.com A
# ;; ->>HEADER<<- status: SERVFAIL
# Step 2: Query CoreDNS directly
kubectl exec debug -- dig @10.96.0.10 google.com A
# ;; ->>HEADER<<- status: SERVFAIL
# Step 3: Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20
# [ERROR] plugin/forward: no nameservers found
# OR
# [ERROR] plugin/forward: unreachable backend
# Step 4: Check what CoreDNS is forwarding to
kubectl get configmap coredns -n kube-system -o yaml | grep forward
# forward . /etc/resolv.conf
# Step 5: Check the resolv.conf inside CoreDNS pod
kubectl exec -n kube-system coredns-abc123 -- cat /etc/resolv.conf
# nameserver 169.254.169.253 <-- VPC resolver (AWS)
# Step 6: Test upstream from CoreDNS pod
kubectl exec -n kube-system coredns-abc123 -- nslookup google.com 169.254.169.253
# ;; connection timed out
# CAUSE: CoreDNS cannot reach the upstream resolver!
# Check: NetworkPolicy blocking kube-system egress? VPC resolver down?
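The hop-by-hop pattern in this scenario can be scripted. This is a sketch, not a standard tool: probe() and first_broken_hop() are invented names, and the resolver IPs in the commented example are the ones from this scenario.

```shell
#!/bin/sh
# probe RESOLVER NAME: succeed if the resolver returns any answer at all.
probe() {
  dig @"$1" "$2" A +time=2 +tries=1 +short 2>/dev/null | grep -q .
}

# first_broken_hop NAME RESOLVER...: walk the chain and stop at the first
# resolver that fails to answer.
first_broken_hop() {
  name=$1; shift
  for r in "$@"; do
    if probe "$r" "$name"; then
      echo "ok:     $r"
    else
      echo "BROKEN: $r"
      return 1
    fi
  done
}

# Example chain: CoreDNS service IP, then the upstream it forwards to.
# first_broken_hop google.com 10.96.0.10 169.254.169.253
```

If CoreDNS answers but the upstream does not, the fault is outside the cluster; if CoreDNS itself fails, look at its logs and config first.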
Debugging Scenario 3: Intermittent DNS Timeouts
# Step 1: Check CoreDNS resource usage
kubectl top pods -n kube-system -l k8s-app=kube-dns
# NAME CPU MEMORY
# coredns-abc123 450m 96Mi
# coredns-def456 490m 98Mi
# Resource limits might be too low
# Step 2: Check CoreDNS metrics
# Metrics port 9153 is exposed by the CoreDNS pods (not always the Service)
kubectl port-forward -n kube-system deploy/coredns 9153:9153
curl -s localhost:9153/metrics | grep coredns_dns_request_duration_seconds
# Look for high latency percentiles
# Step 3: Check conntrack on nodes
ssh node-1 'cat /proc/sys/net/netfilter/nf_conntrack_count'
# 62000
ssh node-1 'cat /proc/sys/net/netfilter/nf_conntrack_max'
# 65536
# CAUSE: conntrack table almost full — DNS packets being dropped
# Step 4: Check for packet drops
ssh node-1 'netstat -s | grep -i drop'
# InErrors: 0
# NoPorts: 0
ssh node-1 'conntrack -S | grep drop'
# drop=1523 <-- Packets dropped due to conntrack full!
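A quick way to turn step 3's raw numbers into a verdict (conntrack_usage is an invented helper, and the 90% threshold is a rule of thumb, not an official limit):

```shell
#!/bin/sh
# conntrack_usage COUNT MAX: print table utilization and warn when it is
# close to full, which is when DNS packets start getting dropped.
conntrack_usage() {
  awk -v c="$1" -v m="$2" 'BEGIN {
    pct = 100 * c / m
    printf "conntrack: %d/%d (%.1f%%)\n", c, m, pct
    if (pct > 90) print "WARNING: near capacity - expect dropped DNS packets"
  }'
}

# On a node, feed it the live numbers:
# conntrack_usage "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                 "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
conntrack_usage 62000 65536
# conntrack: 62000/65536 (94.6%)
# WARNING: near capacity - expect dropped DNS packets
```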
The most insidious DNS bug I have encountered: a cluster where DNS worked for 99% of queries but failed for exactly one specific external domain. The domain had a DNSSEC-signed zone with an invalid signature. CoreDNS was performing DNSSEC validation (enabled by default in some configurations), and the validation failure returned SERVFAIL. Every other domain resolved fine. The fix: either disable DNSSEC validation in CoreDNS (not ideal) or contact the domain owner to fix their DNSSEC configuration.
Advanced dig Techniques
Querying Authoritative Servers Directly
When you suspect caching is the issue, bypass all caches and query the authoritative server directly:
# Find the authoritative nameservers
dig devopsbeast.com NS +short
# ns1.cloudflare.com.
# ns2.cloudflare.com.
# Query the authoritative server directly
dig @ns1.cloudflare.com devopsbeast.com A +short
# 104.21.45.67
# This answer is fresh from the source — no caching involved
Checking DNSSEC
# Check if a domain has DNSSEC
dig devopsbeast.com A +dnssec +short
# If RRSIG records appear, DNSSEC is enabled
# Validate DNSSEC chain
# Validate the chain (dig's +sigchase was removed in newer BIND; use delv)
delv devopsbeast.com A
# Check DS records at the parent zone
dig devopsbeast.com DS +short
# 12345 13 2 abc123...
Measuring DNS Performance
# Time a single query
dig devopsbeast.com A +stats | grep "Query time"
# ;; Query time: 3 msec
# Batch timing with multiple queries
for i in $(seq 1 100); do
dig devopsbeast.com A +stats +tries=1 +time=2 2>/dev/null | grep "Query time"
done | sort -t: -k2 -n | tail -5
# Shows the 5 slowest queries
# Test from inside a K8s pod
kubectl exec debug -- sh -c 'for i in $(seq 1 20); do dig my-service.default.svc.cluster.local A +tries=1 2>/dev/null | grep "Query time"; done'
When measuring DNS performance in Kubernetes, always test both cluster-internal names (my-service.default.svc.cluster.local) and external names (google.com) separately. Slow internal resolution points to CoreDNS issues. Slow external resolution points to upstream resolver issues or ndots overhead. Different root causes require different fixes.
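Building on the batch loop above, percentiles tell you more than the five slowest samples. A sketch using awk (dns_percentiles is an invented helper; the timings piped in below are fabricated):

```shell
#!/bin/sh
# dns_percentiles: read dig output (";; Query time: N msec" lines) on stdin
# and print p50/p95/p99 latency in milliseconds.
dns_percentiles() {
  grep "Query time" | awk '{ print $4 }' | sort -n | awk '
    function idx(p,  i) { i = int(NR * p); return i < 1 ? 1 : i }
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      printf "p50=%dms p95=%dms p99=%dms (n=%d)\n",
             v[idx(0.50)], v[idx(0.95)], v[idx(0.99)], NR
    }'
}

# Example with fabricated timings:
for t in 2 3 3 4 5 5 6 8 12 90; do
  echo ";; Query time: $t msec"
done | dns_percentiles
# p50=5ms p95=12ms p99=12ms (n=10)
```

Feed it the output of the batch loop from the previous block to get real numbers from your cluster.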
Building a DNS Debug Container
For teams that debug DNS frequently, create a purpose-built debug container:
FROM alpine:3.19
RUN apk add --no-cache \
bind-tools \
curl \
tcpdump \
netcat-openbsd \
busybox-extras \
jq
# bind-tools gives us dig and nslookup
# tcpdump for packet capture
# netcat for port testing
# jq for parsing DNS-over-HTTPS JSON responses
CMD ["sleep", "infinity"]
# Deploy it
kubectl apply -f - <<ENDOFFILE
apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
  namespace: default
spec:
  containers:
  - name: debug
    image: your-registry/dns-debug:latest
    command: ["sleep", "infinity"]
ENDOFFILE
# Use it
kubectl exec -it dns-debug -- dig my-service.default.svc.cluster.local A
kubectl exec -it dns-debug -- tcpdump -i eth0 port 53 -nn -c 50
Key Concepts Summary
- dig is the primary DNS debugging tool — use +trace to see the full resolution chain, +short for concise output, @server to query specific resolvers
- nslookup is simpler but limited — good for quick checks, but lacks trace, custom timeouts, and detailed output
- tcpdump reveals ground truth — when dig says "timeout," tcpdump shows whether the packet was sent and whether a response came back
- Always debug systematically: pod resolv.conf, CoreDNS direct query, CoreDNS health, upstream resolver, packet capture
- The nicolaka/netshoot image has every network debugging tool you need — keep it bookmarked
- Query authoritative servers directly to bypass caching issues — dig @ns1.provider.com domain.com A
- NXDOMAIN usually means: typo, wrong namespace, or service does not exist — check with kubectl get svc and kubectl get endpoints
- SERVFAIL usually means: CoreDNS cannot reach upstream, DNSSEC validation failure, or the authoritative server is broken
- Timeouts usually mean: firewall blocking UDP 53, CoreDNS overloaded, or conntrack table full
Common Mistakes
- Using nslookup inside a pod and forgetting it applies search domains — a query for google.com might resolve as google.com.default.svc.cluster.local first
- Running dig without specifying a server (@10.96.0.10) and getting results from a different resolver than the pod uses
- Not checking the pod's /etc/resolv.conf — the pod might have dnsPolicy: None with no DNS configuration
- Forgetting that dig does not apply search domains by default — use the +search flag or provide the FQDN with a trailing dot
- Debugging DNS from outside the cluster and assuming results match in-cluster behavior — always debug from inside a pod
- Not checking CoreDNS logs — they often contain the exact error message explaining the failure
You run dig google.com from inside a Kubernetes pod and it returns SERVFAIL. You run dig @8.8.8.8 google.com from the same pod and it succeeds. What does this tell you?