How Kubernetes Stores Data in etcd
Every kubectl get pods command ultimately reads etcd. Every kubectl apply ultimately writes etcd. But Kubernetes hides etcd behind the API server so completely that most engineers never see the layer below. This lesson strips that abstraction away.
By the end, you'll know exactly where a ConfigMap lives in etcd, what encoding is on the wire, how to read it with etcdctl, and why this knowledge matters during outages when the API server is unavailable.
Every Kubernetes object lives at a predictable path in etcd — /registry/<kind>/<namespace>/<name> — serialized as protobuf. Knowing this lets you inspect state directly during incidents, verify backups, migrate clusters, and understand exactly what's using your etcd capacity.
The /registry/ hierarchy
Every Kubernetes object in etcd lives under the /registry/ prefix. The structure is straightforward:
/registry/<resource_type>/<namespace>/<name>
Examples:
/registry/pods/default/my-pod
/registry/configmaps/kube-system/coredns
/registry/secrets/default/db-credentials
/registry/deployments/apps/api-server
/registry/services/default/kubernetes
/registry/namespaces/default
Cluster-scoped resources (no namespace) skip the namespace component:
/registry/nodes/worker-1
/registry/clusterroles/cluster-admin
/registry/persistentvolumes/pv-100g-1
Custom resources follow the same pattern:
/registry/cert-manager.io/certificates/default/my-tls-cert
/registry/argoproj.io/applications/argocd/my-app
This structure makes prefix scans the natural query operation: want all pods in namespace default? Scan /registry/pods/default/. Want every Deployment cluster-wide? Scan /registry/deployments/.
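The path convention is simple enough to capture in a few lines of shell. The helper below is purely illustrative (registry_key is an invented name, not a real kubectl or etcdctl feature); it builds the key for both namespaced and cluster-scoped resources:

```shell
# Illustrative only: builds the etcd key for a Kubernetes object.
# registry_key is a hypothetical helper, not part of any real tool.
registry_key() {
  local kind=$1 namespace=$2 name=$3
  if [ -n "$namespace" ]; then
    echo "/registry/${kind}/${namespace}/${name}"
  else
    # Cluster-scoped resources have no namespace segment
    echo "/registry/${kind}/${name}"
  fi
}

registry_key pods default my-pod   # /registry/pods/default/my-pod
registry_key nodes "" worker-1     # /registry/nodes/worker-1
```

Passing an empty namespace gives the cluster-scoped form, mirroring how the API server's storage layer distinguishes the two cases.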
Object serialization — protobuf
The value at each /registry/... key isn't JSON. It's protobuf — a compact binary serialization format.
Why protobuf over JSON:
- Smaller: roughly half the size of equivalent JSON.
- Faster to parse: binary fields, no text scanning.
- Strongly typed: every field has a schema, no ambiguity.
The trade-off: you can't just cat an etcd value and read it. You need a protobuf deserializer that understands Kubernetes' schemas.
The k8s\x00 magic prefix
Every Kubernetes protobuf value starts with a magic prefix:
k8s\x00<protobuf-bytes>
The first four bytes are literally k, 8, s, \x00. This tells the API server "this is a Kubernetes protobuf, here's the payload." It also distinguishes Kubernetes-encoded values from other formats the API server might encounter (e.g., if encryption-at-rest is enabled, it's prefixed differently).
If you ever see data in etcd that starts with k8s\x00, you're looking at Kubernetes-encoded protobuf. Anything else (like {...} JSON, or <encrypted>...) means something else is happening — often legacy JSON objects or encryption at rest.
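You can rehearse the prefix check without a cluster. In this sketch the value bytes are fabricated; only the four-byte k8s\x00 header is the real convention:

```shell
# Fabricate a value shaped like a Kubernetes-encoded one (payload bytes are fake)
printf 'k8s\x00\x0a\x04fake' > /tmp/sample-value

# The first four bytes should be 6b 38 73 00 ("k8s" plus a NUL byte)
prefix=$(head -c 4 /tmp/sample-value | od -An -tx1 | tr -d ' \n')
if [ "$prefix" = "6b387300" ]; then
  echo "Kubernetes protobuf value"
fi
```

The same four-byte check against a real value from etcdctl tells you whether you are looking at plain Kubernetes protobuf or something else (JSON, or an encrypted payload).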
Using etcdctl to read K8s data
On a control-plane node, you have access to the etcd endpoints and client certificates. Here's how to list resources directly:
# Set up environment (adjust paths to your install)
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
Now you can query:
# List all pod keys (not values, just keys)
etcdctl get --prefix --keys-only /registry/pods/
# List all namespaces
etcdctl get --prefix --keys-only /registry/namespaces/
# Count ConfigMaps in kube-system
etcdctl get --prefix --keys-only /registry/configmaps/kube-system/ | grep -c /registry
# Get a specific pod's raw protobuf value
etcdctl get /registry/pods/default/my-pod
The last command returns binary output. To make it readable, pipe through a decoder.
Decoding protobuf values
The auger tool (maintained by Kubernetes community) converts the protobuf back to YAML:
# Install auger
go install github.com/jpbetz/auger@latest
# Decode a pod
etcdctl get /registry/pods/default/my-pod --print-value-only \
| auger decode
Output: the full Pod spec as YAML, identical to kubectl get pod my-pod -o yaml (plus some internal fields).
A quicker check, using plain shell tools to show just the magic prefix:
etcdctl get /registry/pods/default/my-pod --print-value-only \
| head -c 4 # Shows "k8s\0"
So the magic prefix is the first 4 bytes. The rest is protobuf you can decode with the k8s.io/apimachinery protobuf definitions — but that's usually more trouble than using auger.
Storage format for special resources
A few resources have quirks worth knowing about.
Events
Events live at:
/registry/events/<namespace>/<event-name>
Events have a short TTL (~1 hour default) via etcd leases — they auto-delete. This is why kubectl get events only shows recent events.
Events are high-volume and can dominate etcd write load. A misbehaving controller generating events at 10/sec will burn through capacity.
Leases
Lease objects themselves are stored at /registry/leases/<namespace>/<name>. Confusingly, this is different from etcd's native lease primitive (the one covered in lesson 1.2).
- etcd lease (primitive): a TTL on a key, internal to etcd.
- Kubernetes Lease object: a user-facing resource used for coordination and heartbeats.
The Kubernetes Lease object does not use etcd's lease primitive directly. Instead, it's a regular etcd key with a timestamp field, and kubelets/controllers update the timestamp to "renew." The node controller checks timestamps to decide if a node is alive.
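The timestamp check can be mimicked with plain date arithmetic. This sketch hard-codes a stale renew time (a real value would come from the Lease object's spec.renewTime field) and assumes GNU date:

```shell
# Hypothetical stale renew time; a real one comes from the Lease's spec.renewTime
renew_time='2024-01-01T00:00:00Z'

now=$(date -u +%s)
renewed=$(date -u -d "$renew_time" +%s)   # GNU date syntax
age=$((now - renewed))

# The node controller treats a node as unhealthy after ~40s without a renewal
if [ "$age" -gt 40 ]; then
  echo "lease is stale (${age}s since last renewal)"
fi
```

This is the whole trick: no etcd TTL involved, just a controller comparing wall-clock timestamps on an ordinary key.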
Secrets
Secrets are just keys under /registry/secrets/. By default, the value is plain protobuf containing the secret data as raw bytes. (The base64 you see in kubectl output belongs to the JSON/YAML representation, and either way base64 is encoding, not encryption.)
This is the #1 reason to enable encryption at rest in Kubernetes: without it, anyone with read access to the etcd database file (e.g., a stolen backup) can read every Secret in the cluster.
With encryption at rest enabled, the value format becomes:
k8s:enc:<provider>:<version>:<encrypted-payload>
More on this in lesson 6.3.
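Because the encrypted format is colon-delimited, the provider and version can be pulled out with a plain read. The value below is made up (the ciphertext is a stand-in), following the k8s:enc:<provider>:<version>:<payload> shape described above:

```shell
# Made-up value following the k8s:enc:<provider>:<version>:<payload> shape
value='k8s:enc:aescbc:v1:FAKE-CIPHERTEXT-BYTES'

IFS=: read -r magic enc provider version payload <<EOF
$value
EOF

echo "provider=$provider version=$version"   # provider=aescbc version=v1
```

Spotting the provider and version at a glance is handy when rotating encryption keys: old values keep the prefix they were written with until they are rewritten.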
ConfigMaps
Same as Secrets but without the implicit expectation of secrecy. Plain protobuf, no encryption unless you explicitly enable it for ConfigMaps (less common).
Custom resources (CRDs)
CRDs themselves live at:
/registry/apiextensions.k8s.io/customresourcedefinitions/<name>
Instances of custom resources live under the CRD's group:
/registry/<group>/<resource>/<namespace>/<name>
# e.g. for cert-manager's Certificate CRD (group: cert-manager.io):
/registry/cert-manager.io/certificates/default/my-tls-cert
One implication: every CRD you create adds to the etcd key space. Teams that spray CRDs across their cluster (Argo, Flux, cert-manager, external-dns, ingress-nginx admission, OPA Gatekeeper, etc.) can have tens of thousands of custom resources, each a key in etcd.
A worked example: finding a ConfigMap in etcd
Say you want to verify that the coredns ConfigMap in kube-system is actually what kubectl says it is. Full walkthrough:
# 1. Check with kubectl
kubectl get configmap -n kube-system coredns -o yaml
# ... shows the ConfigMap contents ...
# 2. Find the etcd key
etcdctl get --keys-only --prefix /registry/configmaps/kube-system/coredns
# /registry/configmaps/kube-system/coredns
# 3. Fetch the raw value
etcdctl get /registry/configmaps/kube-system/coredns --print-value-only | auger decode
# ... the same ConfigMap contents as kubectl showed ...
# 4. Check metadata
etcdctl get /registry/configmaps/kube-system/coredns -w json | jq '.kvs[0] | {create_revision, mod_revision, version}'
# {
# "create_revision": 1023,
# "mod_revision": 1023,
# "version": 1
# }
The version field counts writes to the key: 1 means it was created and never modified since, which is also why mod_revision equals create_revision here.
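The revision bookkeeping is easier to read once you have seen it on a sample. The JSON below imitates the shape etcdctl -w json returns (the numbers are invented): mod_revision above create_revision, and a version above 1, both mean the key has been updated since creation:

```shell
# Imitation of etcdctl -w json output; the numbers are invented for illustration
cat > /tmp/kv.json <<'EOF'
{"kvs":[{"create_revision":1023,"mod_revision":1051,"version":3}]}
EOF

# version 3 = created once, then modified twice
jq -r '.kvs[0] | "created@\(.create_revision) modified@\(.mod_revision) writes=\(.version)"' /tmp/kv.json
# created@1023 modified@1051 writes=3
```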
If an object in kubectl doesn't match what's in etcd, you have a serious bug (admission webhook misbehaving, mutating controller gone wrong, corrupted data). Being able to check this directly is the escape hatch.
How much space is each resource using?
Useful admin query: which resources are hogging etcd?
# Get key counts by resource type
for kind in pods configmaps secrets services deployments daemonsets events; do
count=$(etcdctl get --prefix --keys-only /registry/$kind/ | grep -c /registry)
echo "$kind: $count"
done
Output:
pods: 512
configmaps: 147
secrets: 89
services: 42
deployments: 38
daemonsets: 8
events: 1203
If events dominate and the DB is big, your retention is off (or a controller is event-spamming).
Approximate value-size query
A more advanced query uses etcd-analyze or custom scripts. For a quick estimate:
etcdctl get --prefix /registry/pods/ | wc -c
# Total bytes of all pod keys + values
Across a cluster, this helps you understand where the bytes are going.
Watch with etcdctl
You can watch etcd keys directly, using the same mechanism the API server itself relies on:
# Watch every change to pods in the default namespace
etcdctl watch --prefix /registry/pods/default/
Modify a pod in another terminal (kubectl scale deployment/foo --replicas=3) and you'll see the PUT events stream by in real time. This is the raw stream the API server consumes to power the watches that the scheduler and kubelets subscribe to.
Fascinating for learning; invaluable for debugging "is this pod actually getting created in etcd?" type issues.
The cluster-resource inventory
A useful one-liner to see what types of resources exist in your cluster:
etcdctl get --prefix --keys-only / | awk -F/ 'NF {print $3}' | sort -u
# (the NF guard skips the blank lines --keys-only prints between keys)
Output for a typical cluster:
apiregistration.k8s.io
apps
batch
certificates.k8s.io
configmaps
controllerrevisions
coordination.k8s.io
daemonsets
deployments
endpoints
events
leases
namespaces
nodes
persistentvolumes
pods
replicasets
secrets
serviceaccounts
services
statefulsets
... plus any CRD groups you've installed
Each of those represents a type of object stored under /registry/<type>/....
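The awk stage of that one-liner can be tried offline by feeding it a few hand-written keys, including a blank line like the ones etcdctl's --keys-only output emits between keys (the NF guard skips them):

```shell
# Hand-written sample keys, including a blank line like etcdctl emits
printf '%s\n' \
  /registry/pods/default/web-1 \
  '' \
  /registry/configmaps/default/app-config \
  /registry/pods/kube-system/coredns-abc \
| awk -F/ 'NF {print $3}' | sort -u
```

which prints configmaps and pods, one per line: the third /-separated field of each key is the resource type.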
What the API server does with all this
The API server is essentially a typed, validating, caching proxy over etcd's key-value interface. Its layers:
- Admission (validation, mutation).
- Serialization (JSON/YAML ↔ protobuf).
- Storage (etcd get/put/watch/transaction).
- Cache (an in-memory watchCache for hot resources).
When kubectl says GET /api/v1/namespaces/default/pods/my-pod, the API server:
- Authenticates.
- Checks RBAC.
- Turns the REST URL into the etcd key /registry/pods/default/my-pod.
- Does the equivalent of an etcdctl get.
- Decodes protobuf.
- Converts to JSON/YAML for the response.
(Admission, i.e. validation and mutation, runs only on writes, so it is absent from this read path.)
This is why the API server is CPU-hungry — every request is schema validation, serialization, and etcd roundtrip.
The API server can cache reads (for popular resources, it serves from the watchCache). Writes always go through to etcd. Watches deliver events from etcd to the API server to the client in near real time.
Why this knowledge matters during incidents
Why go to the trouble of learning etcd's internals? Because during a real incident, you may need to:
1. Verify backup contents
Before restoring, open the snapshot and confirm the data you think is there actually is:
etcdctl --write-out=table snapshot status snapshot.db
# hash / revision / total keys
etcdctl get --prefix --keys-only / | grep -c /registry
# how many keys total? (grep also filters the blank lines --keys-only emits)
If a backup claims to have your production state but has 1/10th the keys you expect, something's wrong.
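That sanity check is just arithmetic once you have the two counts. A sketch with invented numbers:

```shell
expected_keys=20000   # what a healthy cluster of this size holds (invented)
snapshot_keys=1800    # what snapshot status reported (invented)

# Flag the snapshot if it holds well under the expected key count
if [ "$snapshot_keys" -lt $((expected_keys / 10)) ]; then
  echo "suspicious snapshot: $snapshot_keys keys, expected ~$expected_keys"
fi
```

In a real restore runbook, expected_keys would come from a recent count against the live cluster, recorded alongside the backup schedule.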
2. Debug API server problems
If kubectl is throwing weird errors, you can check if the data even exists in etcd:
etcdctl get /registry/pods/default/problematic-pod
If it's there but kubectl can't read it, the API server has a problem (RBAC, webhook, cache). If it's not there, something deleted it.
3. Clean up corrupted resources
Rare but real: an API server bug or malfunctioning webhook creates a Kubernetes object that can't be deleted normally (the API server errors on every attempt). With etcd access, you can surgically remove it:
etcdctl del /registry/stuck-resources/default/bad-object
Be careful — any direct etcd write bypasses validation. Only do this when you understand exactly what you're removing and why.
4. Understand capacity
When etcd size is approaching quota, the questions "which resources are eating the space" and "are old revisions accumulating" are answered directly from etcd.
Quiz
You run 'etcdctl get /registry/secrets/default/my-secret --print-value-only' on a cluster that has encryption at rest enabled. What does the output look like?
What to take away
- Every Kubernetes object lives at a predictable path: /registry/<resource>/<namespace>/<name>.
- Cluster-scoped resources skip the namespace segment.
- Values are protobuf with a k8s\0 magic prefix; decode with auger or similar.
- Encryption at rest changes the value format to k8s:enc:... — essential for Secrets in production.
- etcdctl with the right certs lets you read, write, watch, and count directly.
- Prefix scans are the natural query: --prefix /registry/pods/default/ returns all pods in a namespace.
- Events use auto-expiring leases to keep etcd from filling up with them; still a common capacity offender.
- CRDs live at /registry/<group>/<resource>/... and add to the key-space.
- During incidents, direct etcd access is the escape hatch for backup verification, API server bypass, and surgical fixes.
Next module: sizing etcd for your actual cluster — the 8GB limit, when to raise it, and the disk requirements that determine whether etcd will stay healthy.