Your NetworkPolicy Controls the Front Door and Leaves the Building Through the Back.

Most Kubernetes NetworkPolicies restrict who can reach a service and stop there: ingress locked, egress wide open. But the blast radius of a compromised pod is almost entirely an egress story: lateral movement, the cloud metadata endpoint, data exfiltration, calling home. This post covers default-deny in both directions, the DNS gotcha that breaks everything the moment you turn egress on, blocking 169.254.169.254 with an ipBlock except, the CNI that has to actually enforce it, and the audit-then-enforce rollout that doesn't take production down.

By Sharon Sahadevan · 13 min read

You wrote NetworkPolicies for your cluster. The database only accepts connections from the API tier, the API only accepts from the gateway, the admin service only from the ops namespace. You reviewed them, they work, and you feel covered. You are not — because every one of those policies controls ingress, and the breach you are actually worried about travels outbound.

This is the most common network-security mistake in Kubernetes, and it is so common because it feels complete. Restricting who can reach a service is the intuitive direction to think about access control: it is how firewalls, security groups, and load balancers trained you. So teams write ingress rules, lock down the front door of every service, and consider network security done. Meanwhile the back door — what each pod is allowed to send, and where — is wide open, and that is the door an attacker leaves through. As the pod-compromise blast-radius walkthrough lays out, almost the entire propagation path of a compromised container is egress: reaching other pods, reaching the cloud metadata endpoint, exfiltrating data, and calling home. A NetworkPolicy posture that controls ingress and ignores egress secures the front door and leaves the building through the back.

This post is the egress half nobody writes: why the default is so permissive, what default-deny in both directions actually looks like, the DNS gotcha that breaks your whole cluster the moment you enable it, how to block the metadata endpoint when policies are allow-lists not deny-lists, the CNI caveat that makes half of these policies decorative, and the rollout that gets you there without an outage.

The default is "everyone talks to everyone"

Start with the posture you are actually securing against. Kubernetes pod networking is flat and fully open by default. Every pod can open a connection to every other pod in every namespace, to the API server, and to the node's link-local addresses, with nothing in the way. There is no implicit isolation between namespaces, no default segmentation between tiers — the network model's baseline assumption is mutual reachability, and isolation is entirely opt-in.

It is opt-in in a specific way that trips people up: NetworkPolicies are additive allow-lists, not deny rules. You cannot write a policy that says "deny X." You write policies that select pods and permit listed traffic; selecting a pod with a policy flips it from "allow all" to "allow only what some policy permits," but only in the directions that policy names in policyTypes. Until a pod is selected by at least one policy for a given direction, all its traffic in that direction is allowed. And critically, a policy that lists only Ingress in policyTypes (or omits policyTypes and contains only ingress rules) leaves egress completely unrestricted for the pods it selects. That is the trap: writing ingress rules feels like you locked the pod down, but you only constrained one direction. The other direction is still wide open because you never named it.

KEY CONCEPT

NetworkPolicy is opt-in, additive, and per-direction. A pod is unrestricted until a policy selects it, and a policy only restricts the directions it names. So an ingress-only policy is not "most of the way to secure" — it is half a control, and it is the wrong half for containing a breach. Egress is where blast radius lives, because a compromised pod's damage is almost all about what it can reach, not what can reach it. If you write one kind of policy in your cluster, write default-deny egress.
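
To make the trap concrete, here is a minimal sketch of the kind of ingress-only policy described above. The namespace names (data, app) and the app: postgres and app: api labels are assumptions chosen for illustration, not taken from any real cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-from-api        # hypothetical name
  namespace: data                # hypothetical namespace for the database
spec:
  podSelector:
    matchLabels:
      app: postgres              # hypothetical label on the database pods
  policyTypes:
    - Ingress                    # only ingress is named, so only ingress is restricted
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app
          podSelector:
            matchLabels:
              app: api           # hypothetical label on the API tier
      ports:
        - protocol: TCP
          port: 5432
  # Egress is not listed in policyTypes, so the selected pods can still
  # open connections to any destination.
```

Applied on its own, this policy limits who can connect to the database pods, yet a compromised database pod can still connect out anywhere, because nothing restricts its egress.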

Default-deny, both directions

The foundation is a default-deny policy in every namespace that runs workloads — NetworkPolicies are namespaced, so this is per-namespace, not cluster-wide. An empty podSelector selects all pods, and naming a policyType with no matching rules denies all traffic in that direction:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: app
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes:
    - Ingress
    - Egress               # the line most clusters never write
  # no ingress/egress rules => deny all in both directions

This is the zero-trust baseline: nothing in, nothing out, until you explicitly allow it. From here you add narrow allow policies for the flows each workload genuinely needs — the API tier may egress to the database on 5432, the gateway may ingress from the ingress controller, and so on. The posture inverts from "everything is allowed unless I blocked it" (unmaintainable, and wrong by default) to "nothing is allowed unless I permitted it" (auditable, and safe by default).

And the moment you apply that policy, your cluster breaks. Every pod loses DNS.

The DNS gotcha that breaks everything

Here is the lesson everyone learns the hard way the first time they enable default-deny egress: it blocks DNS, and without DNS, nothing resolves and every outbound connection fails. Pods resolve service names through CoreDNS (kube-dns) in the kube-system namespace over port 53. That is egress traffic. A default-deny egress policy denies it along with everything else, so suddenly your application cannot resolve postgres.data.svc.cluster.local, the calls time out, and it looks like a total outage rather than a DNS problem.

The fix is the very first allow policy you write — permit egress to CoreDNS on UDP and TCP 53:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns      # CoreDNS pods carry this label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Pair the default-deny with this DNS allow as a single bundle you apply to every namespace together. Skipping it is the number-one reason teams try default-deny egress once, watch the cluster fall over, and revert to ingress-only — concluding that egress policy "is too hard" when the real problem was one missing allow rule.
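
With DNS permitted, each remaining flow gets its own narrow allow. As a sketch of the API-tier-to-database flow mentioned earlier, assuming the database runs in a namespace named data with pods labeled app: postgres and the API pods are labeled app: api (all three are illustrative assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-to-db         # hypothetical name
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api                   # hypothetical label on the API tier
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: data   # hypothetical database namespace
          podSelector:
            matchLabels:
              app: postgres      # hypothetical label on the database pods
      ports:
        - protocol: TCP
          port: 5432
```

Because policies are additive, this combines with the DNS allow: the API pods can resolve names and reach the database, and nothing else. If the database's namespace is also default-deny, it needs a matching ingress allow from the API pods, since an egress allow on one side does not open ingress on the other.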

Blocking the metadata endpoint when you can't write a deny

The sharpest egress target is the cloud metadata endpoint at 169.254.169.254. From almost any pod it is reachable, and on AWS without IMDSv2 hardening it returns the node's instance IAM role credentials — usually far broader than the pod should have, and a breach that has now left Kubernetes for your cloud account. (It is the same instance-role plumbing the GitHub Actions OIDC to AWS post describes, harvested from the inside.)

A pure default-deny posture blocks it for free — the endpoint is just one more denied destination. The problem appears when a workload legitimately needs to reach the external internet and you write an allow rule for it. Because NetworkPolicy is an allow-list with no deny verb, you cannot say "allow everything except the metadata IP." You express it with an ipBlock and an except:

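apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-https       # illustrative name
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: api                     # illustrative label: select only workloads that need internet egress
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32     # cloud metadata endpoint (already inside the /16 below, listed for clarity)
              - 169.254.0.0/16         # link-local range generally
              - 10.0.0.0/8             # (optional) keep internal RFC1918 off the "internet" rule
      ports:
        - protocol: TCP
          port: 443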

The except carves the metadata IP and link-local range out of the otherwise-broad egress allow, so a workload can call external APIs without being able to reach the node's credentials. Belt-and-suspenders: enforce IMDSv2 with a hop limit of 1 at the instance level too, so even a policy gap cannot reach the metadata creds. Defense in depth means the network policy and the instance config both have to fail before the attacker wins.

The caveat that makes policies decorative: the CNI has to enforce them

A NetworkPolicy is a request for enforcement, not enforcement itself. The Kubernetes API will happily accept and store every policy you apply — and if your CNI plugin does not implement NetworkPolicy, none of them do anything. Plain flannel, for instance, does not enforce NetworkPolicy; you can apply a perfect default-deny posture on a flannel cluster and every packet still flows, because nothing is reading the policy. Calico, Cilium, and Weave Net (among others) do enforce it.

This is genuinely dangerous because it fails open and silent: kubectl get networkpolicy shows your policies, they look applied, and an audit checkbox gets ticked — while the cluster has zero network isolation. Verify enforcement empirically, not by the presence of policy objects: apply a default-deny, then kubectl exec into a pod and confirm a connection that should now be blocked actually times out. If it still connects, your CNI is not enforcing, and every NetworkPolicy in the cluster is decoration.
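
One way to make that check repeatable is a throwaway probe pod. This is a sketch under assumptions: the pod name and the target URL are placeholders, and the target has to be an HTTP endpoint that pods in this namespace could reach before the default-deny was applied.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe                 # hypothetical name
  namespace: app                     # a namespace where default-deny is applied
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: busybox:1.36
      env:
        - name: TARGET_URL
          # Placeholder: replace with an HTTP endpoint this namespace
          # could reach before the default-deny was applied.
          value: "http://example-service.other-namespace.svc.cluster.local"
      command:
        - sh
        - -c
        # A timeout or a failed name lookup is consistent with enforcement.
        # Any HTTP response, even an error status, means traffic got through.
        - 'wget --spider --timeout=5 "$TARGET_URL"; echo "wget exit code: $?"'
```

Read the result with kubectl logs netpol-probe -n app. If the output shows an HTTP response of any kind, the packet left the pod and the policy is not being enforced.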

WARNING

Never assume a NetworkPolicy is in effect just because it applied cleanly. The API server validates and stores policy regardless of whether anything enforces it, so a cluster with the wrong CNI gives you the paperwork of network security with none of the substance — the most dangerous state, because it reads as "done." Test enforcement with an actual blocked connection from inside a pod before you trust a single policy. Security controls that fail silently are worse than no control, because they stop you from looking.

What NetworkPolicy does not cover

Set expectations so you don't think you bought more than you did. Standard NetworkPolicy operates at L3/L4 — it controls traffic by pod selector, namespace, IP block, and port. It does not understand L7: it cannot allow GET /public while blocking POST /admin on the same port, cannot filter by HTTP header or method, and cannot enforce mTLS identity. For L7 controls you need a service mesh (Istio, Linkerd) or Cilium's L7 policy layer, which is a separate decision with its own cost. NetworkPolicy is also not a replacement for authentication between services — it restricts reachability, not identity; a permitted caller is still unauthenticated unless something above the network layer checks it. Treat default-deny L3/L4 as the floor of a defense-in-depth stack, not the whole thing.
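
For contrast, this is roughly what an L7 rule looks like in Cilium's own CiliumNetworkPolicy resource, which is a separate CRD and not part of the standard NetworkPolicy API. It is a sketch that assumes Cilium is the CNI with its L7 proxy available; the policy name, the labels, and port 8080 are hypothetical.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-get-public     # hypothetical name
  namespace: app
spec:
  endpointSelector:
    matchLabels:
      app: api                   # hypothetical label on the API pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: gateway         # hypothetical label; matches pods in the same namespace
      toPorts:
        - ports:
            - port: "8080"       # hypothetical application port
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /public    # Cilium treats path as a regular expression
```

The method and path fields are exactly what standard NetworkPolicy has no place for: it stops at the port.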

The rollout that doesn't break production

The reason most clusters never get to default-deny is not that the YAML is hard — it is the fear, justified, that flipping it on will sever flows nobody documented and cause an outage. The answer is the same audit-then-enforce discipline that makes any risky lockdown safe, applied to the network:

  1. Discover the real flows first. Before denying anything, learn what actually talks to what. Flow-visibility tooling — Cilium Hubble, Calico's flow logs, or a service mesh's telemetry — maps the live connection graph so you write allow rules from reality, not from the architecture diagram (which is always missing the cron job, the legacy sidecar, and the one service that calls an external API nobody remembered).
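  2. Write allow policies for the observed flows, including the DNS allow and any legitimate external egress, before any default-deny exists. Writing is not applying: an allow policy isolates the pods it selects, in the directions it names, as soon as it is applied, so treat applying these policies as part of the enforcement step and not as harmless preparation.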
  3. Apply default-deny to one low-risk namespace and watch. Confirm the allow rules cover the real traffic; catch the flow you missed here, where it affects one non-critical workload, not the whole cluster.
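  4. Roll namespace by namespace, lowest-risk to highest, with the metadata-endpoint carve-out and DNS allow in the bundle each time. Keep the rollback trivial: deleting the whole bundle (the default-deny policy and the allow policies applied with it) restores open networking for that namespace if something breaks, because a pod that no policy selects is unrestricted again. Deleting the default-deny alone is not enough, since any allow policy that still selects a pod keeps that pod restricted to what the policy permits.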
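
One way to keep the bundle and its rollback together is a per-namespace Kustomize overlay. This is a sketch, not the only way to package it; the file names are hypothetical and are assumed to contain the policies shown earlier in this post.

```yaml
# kustomization.yaml (hypothetical file layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: app                   # sets metadata.namespace on every resource below
resources:
  - default-deny-all.yaml        # the default-deny policy for both directions
  - allow-dns-egress.yaml        # the CoreDNS egress allow
  - allow-external-https.yaml    # hypothetical: external egress with the metadata carve-out
```

Applying the directory with kubectl apply -k and removing it with kubectl delete -k treats the deny and its allows as one unit, so each namespace gets the same bundle and the same one-command rollback. Copy the overlay per namespace and change the namespace field.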

The sequencing is the senior move. The difference between "we have network policies" (a knowledge-level claim) and "we rolled default-deny across the fleet without an incident" (the reasoning-level one the senior-interview gap post is about) is entirely in the rollout, because the controls themselves are a few dozen lines of YAML.

Common mistakes

Ingress-only policies. The headline mistake. Restricting who can reach a service while leaving egress open secures the wrong direction for containing a breach. Default-deny must name Egress.

Default-deny egress without the DNS allow. Breaks all name resolution instantly and looks like a full outage. Always bundle the CoreDNS egress allow with the deny.

Assuming the CNI enforces policy. Flannel and some managed configs do not. Verify with a blocked connection from inside a pod; do not trust the presence of policy objects.

Trying to "deny" the metadata IP directly. Policies are allow-lists. Block it with an ipBlock except on your broad egress rule, and enforce IMDSv2 at the instance level as backup.

Forgetting policies are namespaced. A default-deny in one namespace does nothing for the others. The posture has to be applied to every workload namespace, ideally by a policy that ships with the namespace.

Expecting L7 from an L3/L4 tool. NetworkPolicy can't filter by path, method, or header, and isn't authentication. Reach for a mesh or Cilium L7 when you need that, and don't assume default-deny gave it to you.

Big-bang rollout. Flipping default-deny across the whole cluster at once turns one undocumented flow into a cluster-wide incident. Discover flows, then roll namespace by namespace with a trivial rollback.

The mental model

A Kubernetes cluster ships with a flat, fully trusting network — every pod can reach every other pod and the node's credentials endpoint, and isolation only exists where you opted in. The intuitive opt-in, ingress rules, secures the direction that feels like access control but is the wrong direction for a breach, because a compromised pod's damage is measured by what it can reach, not what can reach it. Default-deny egress inverts the network from allow-by-default to deny-by-default in the direction that actually contains the blast radius — and once you commit to it, everything else is consequence: allow DNS or nothing resolves, carve out the metadata endpoint or it leaks cloud creds, confirm the CNI enforces or it is theater, and roll it out by discovering real flows rather than guessing.

The test of whether your network is secured is not whether NetworkPolicies exist. It is whether a pod that has no business reaching the database, the admin service, the internet, or 169.254.169.254 is actually stopped when it tries — in both directions. Write the egress half. It is the half that decides how far a breach travels.


Default-deny network policy, the CNI enforcement models, L3/L4 vs L7 segmentation, egress control, and the flow-discovery-then-enforce rollout are part of the Kubernetes Security course. The networking foundations underneath (the OSI layers, how pod traffic is actually routed, and where policy is enforced in the path) are the Networking Fundamentals course, the security-system-design walkthrough is Kubernetes System Design Interview Prep, and the rollout-without-an-outage patterns are Production Kubernetes Operations. Related reading: A Pod in Your Cluster Just Got Compromised. Walk Me Through the Blast Radius. for why egress is the control that contains the breach, How GitHub Actions OIDC to AWS Actually Works for the metadata-endpoint credentials this policy blocks, and Your Cluster Has 5,000 Services and kube-proxy Is the Bottleneck for how Service traffic is routed beneath the policy layer.
