Cardinality and Why It Matters
A team adds user_id as a label to their HTTP request counter. It works beautifully — they can query "how many requests did user 42 make last hour?" directly from Prometheus. A week later, the Prometheus server is consuming 80 GB of RAM, queries are timing out, and the on-call engineer is paged at 3 AM because scraping is failing. They had 2 million active users, and each user generated a new unique time series: the metric http_requests_total{user_id="..."} expanded into 2 million separate series — each stored, each indexed, each queried. The fix: remove user_id. Prometheus recovers, the 2 million stale series age out of retention, and the team adopts a rule: user_id never goes in metric labels.
Cardinality is the hidden cost center of observability. It is also the number-one way teams accidentally destroy their Prometheus/Mimir/VictoriaMetrics infrastructure. This lesson explains what cardinality is, why high-cardinality labels multiply costs rather than merely add to them, and the specific label-design practices that keep costs sane.
What Cardinality Is
The cardinality of a metric is the number of unique combinations of label values it can produce. Each unique combination is a separate time series.
Example:
http_requests_total{method="GET", endpoint="/api/users", status="200"}
http_requests_total{method="GET", endpoint="/api/users", status="500"}
http_requests_total{method="POST", endpoint="/api/users", status="201"}
http_requests_total{method="GET", endpoint="/api/orders", status="200"}
Four distinct time series, because the (method, endpoint, status) combinations differ.
Formula:
cardinality = product of (distinct values per label)
If you have:
- 10 endpoints
- 4 methods
- 20 status codes
Cardinality = 10 x 4 x 20 = 800 time series.
Add user_id with 2 million unique values:
Cardinality = 10 x 4 x 20 x 2,000,000 = 1.6 billion time series.
Every new label multiplies your total series count by its number of distinct values. Prometheus was designed assuming cardinality in the tens of thousands per metric, not billions. High-cardinality labels do not just add cost — they multiply it, because their distinct values combine with every existing label combination.
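The multiplication is easy to sketch. A minimal calculation mirroring the example above (label names and counts are illustrative):

```python
from math import prod

# Distinct values per label, from the example above.
labels = {"endpoint": 10, "method": 4, "status": 20}
base = prod(labels.values())
print(base)  # 800 series

# Adding one high-cardinality label multiplies everything.
with_user_id = base * 2_000_000
print(with_user_id)  # 1,600,000,000 series
```

Note that the damage scales with the product, not the sum: the new label does not add 2 million series, it multiplies every existing combination by 2 million.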
Why Cardinality Is Expensive
Prometheus (and every metric backend: VictoriaMetrics, Mimir, Cortex, Thanos) indexes by label set. Each unique label combination needs:
- A row in the series index.
- Memory for the series metadata.
- Storage for the time-series samples.
- CPU for queries that touch it.
A back-of-envelope calculation:
- Each series: ~3 KB of memory (label strings, index entries, sample cache).
- Each series: ~1-2 bytes per sample stored (highly compressed), at default 15s scrape = ~240 samples/hour = ~500 bytes/hour retained.
At 1 million series:
- Memory: ~3 GB for active series index alone.
- Storage: ~500 MB per hour retained = 12 GB/day = 360 GB/month.
At 10 million series:
- Memory: ~30 GB.
- Storage: ~3.6 TB per month.
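The figures above follow directly from the per-series estimates. A sketch of the arithmetic (the 3 KB per series and 500 bytes/hour figures are the rough estimates from the text, not measured values):

```python
def prometheus_cost(series, mem_per_series_kb=3.0, bytes_per_hour=500.0):
    """Rough active-series memory (GB) and monthly storage (GB)."""
    mem_gb = series * mem_per_series_kb / 1e6
    storage_gb_month = series * bytes_per_hour * 24 * 30 / 1e9
    return mem_gb, storage_gb_month

print(prometheus_cost(1_000_000))   # ~3 GB RAM, ~360 GB/month
print(prometheus_cost(10_000_000))  # ~30 GB RAM, ~3600 GB/month
```

Plugging in your own series counts and retention makes the cost of a proposed label concrete before it ships.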
And query performance degrades non-linearly. A query that scans across millions of series takes seconds instead of milliseconds; during a dashboard refresh, you time out.
Modern long-term stores (VictoriaMetrics, Mimir) handle higher cardinality with better compression, but the economics still favor keeping cardinality bounded.
The High-Cardinality Hall of Shame
Labels you almost never want:
| Label | Typical cardinality | Impact |
|---|---|---|
| user_id | 10,000 to billions | Catastrophic |
| trace_id / request_id | Every request (millions/hr) | Catastrophic |
| session_id | Active sessions (thousands to millions) | Catastrophic |
| email | Unique users | Catastrophic |
| ip_address | IPv4 space is ~4 billion addresses | Very high |
| Raw URL path (with IDs baked in) | Unbounded | High |
| Timestamp in label | Unbounded | Catastrophic |
| Error message text | Unbounded | Very high |
| Customer / tenant name | One per tenant | Can be OK at low tenant counts |
| Version SHA | Low (deploys per day) | OK |
Anything that scales with users, requests, or time is dangerous. Anything with a bounded list (regions, environments, services) is fine.
Good vs Bad Label Design
BAD: URL path with IDs embedded
http_requests_total{path="/api/users/12345"}
http_requests_total{path="/api/users/12346"}
http_requests_total{path="/api/users/12347"}
Every user generates a new series. Cardinality explodes.
GOOD: Route pattern
http_requests_total{route="/api/users/:id"}
One series per route regardless of user. Your HTTP framework (Express, Gin, FastAPI, etc.) exposes the route pattern — use it.
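If your framework does not expose the route template, you can normalize paths yourself before using them as label values. A minimal sketch using regexes (the patterns are illustrative; prefer the router's own template when available):

```python
import re

def normalize_route(path: str) -> str:
    """Collapse ID-like path segments so the label value set stays bounded."""
    path = re.sub(r"/\d+", "/:id", path)  # numeric IDs
    path = re.sub(r"/[0-9a-f]{8}-[0-9a-f-]{27}", "/:uuid", path)  # UUID-shaped segments
    return path

print(normalize_route("/api/users/12345"))         # /api/users/:id
print(normalize_route("/api/users/12345/orders"))  # /api/users/:id/orders
```

The key property is that the output set is bounded by your route table, not by your user base.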
BAD: Error message as label
request_errors_total{error="Invalid email format: foo@bar.invalid"}
request_errors_total{error="Invalid email format: test@nowhere"}
Each distinct error message becomes a new series.
GOOD: Error class
request_errors_total{error_class="invalid_email"}
Bounded set of error classes; unbounded details go to logs.
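One way to enforce the bounded set is to classify free-text errors into an enum before incrementing the counter. A minimal sketch (the classes and matching rules are illustrative):

```python
def classify_error(message: str) -> str:
    """Map free-text error messages to a bounded error_class label value."""
    rules = [
        ("invalid email", "invalid_email"),
        ("timeout", "timeout"),
        ("connection refused", "upstream_unavailable"),
    ]
    lowered = message.lower()
    for needle, error_class in rules:
        if needle in lowered:
            return error_class
    return "other"  # bounded fallback; full details belong in logs

print(classify_error("Invalid email format: foo@bar.invalid"))  # invalid_email
print(classify_error("read tcp 10.0.0.1: i/o timeout"))         # timeout
```

The "other" fallback is what keeps the label bounded even when a new, unclassified error appears.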
BAD: Response time as a label
slow_requests_total{duration_ms="1247"}
Every distinct duration is a new series. Use a histogram instead:
GOOD: Histogram buckets
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="0.5"}
http_request_duration_seconds_bucket{le="1"}
http_request_duration_seconds_bucket{le="5"}
Fixed number of buckets; answers all "how many requests were faster than X?" questions.
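Under the hood, a Prometheus histogram is just a set of cumulative counters, one per le boundary. A sketch of how observations map onto the buckets above (boundaries copied from the example; this is a simplification of real client libraries):

```python
import bisect

buckets = [0.1, 0.5, 1.0, 5.0]     # le boundaries from the example
counts = [0] * (len(buckets) + 1)  # one extra slot for le="+Inf"

def observe(duration_s: float) -> None:
    # Cumulative semantics: increment every bucket with le >= duration.
    i = bisect.bisect_left(buckets, duration_s)
    for j in range(i, len(counts)):
        counts[j] += 1

for d in [0.05, 0.3, 0.7, 2.0, 9.0]:
    observe(d)

# counts[k] answers "how many observations were <= buckets[k]?"
print(dict(zip([*map(str, buckets), "+Inf"], counts)))
```

Five observations, five series total, no matter how many distinct durations occur — that is the whole point.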
How to Monitor Cardinality
See your top metrics by cardinality
# Number of series per metric name (Prometheus)
topk(10, count by (__name__) ({__name__=~".+"}))
This prints the 10 metrics contributing the most series. If one metric has millions, you have a cardinality problem.
Find which labels are exploding
# Cardinality of a specific metric by label
count(count by (user_id) (http_requests_total))
If this returns 1,847,293 — your user_id label has that many distinct values. Remove it.
Per-tenant cardinality (Mimir / Cortex)
# Mimir/Cortex tenant metrics
curl http://mimir:8080/api/v1/user_stats
# Or query from Prometheus
cortex_ingester_memory_series{job="mimir-ingester"}
For managed services (Grafana Cloud, Datadog, etc.), cardinality is a billable dimension — they surface it in usage dashboards.
The Cardinality Budget
Treat cardinality as a finite budget per metric. Document it:
# metric-cardinality-budgets.md
http_requests_total:
  expected labels: method (5), route (~30), status_class (5)
  expected cardinality: 5 * 30 * 5 = 750
  max allowed: 5,000
database_query_duration_seconds:
  expected labels: operation (10), table (50)
  expected cardinality: 10 * 50 = 500
  max allowed: 2,000
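The budget document can be enforced mechanically. A sketch that recomputes expected cardinality from the documented label counts and flags metrics over budget (the budget data mirrors the example above; how you fetch observed counts — e.g. via the Prometheus HTTP API — is up to you):

```python
from math import prod

budgets = {
    "http_requests_total": {
        "labels": {"method": 5, "route": 30, "status_class": 5},
        "max_allowed": 5_000,
    },
    "database_query_duration_seconds": {
        "labels": {"operation": 10, "table": 50},
        "max_allowed": 2_000,
    },
}

def check_budget(metric: str, observed_series: int) -> bool:
    """True if the observed series count is within the documented budget."""
    budget = budgets[metric]
    expected = prod(budget["labels"].values())
    assert expected <= budget["max_allowed"], f"{metric}: budget itself too tight"
    return observed_series <= budget["max_allowed"]

print(check_budget("http_requests_total", 750))        # within budget
print(check_budget("http_requests_total", 1_600_000))  # over budget, alert
```

Running this in CI or a cron job turns label-design rules from tribal knowledge into an automated check.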
Alert if cardinality exceeds the budget:
count(count by (__name__) ({__name__="http_requests_total"})) > 5000
This catches accidental label explosions before they crash the server.
Set up a cardinality-ceiling alert for each high-volume metric. When someone accidentally ships http_requests_total{user_id=...}, the alert fires within minutes, they roll back, and the disaster is averted. Most teams learn this lesson the hard way — you can learn it cheaply.
When You Actually Need High Cardinality
Sometimes the question is genuinely per-user, per-request, per-session:
- What is user X experiencing right now?
- Trace this specific request end-to-end.
- What is the distribution of latencies by customer?
The answer is NOT to add high-cardinality labels to Prometheus metrics. The answer is to use the right tool for the job:
| Question | Wrong tool | Right tool |
|---|---|---|
| Aggregate behavior | Logs, traces | Metrics |
| Per-request investigation | Metrics (per-request labels) | Traces |
| Per-user behavior | Metrics (user_id label) | Logs (filterable by user_id) / Events / Wide events |
| Heat-map of latency distribution | Metrics with high-cardinality labels | Histogram metrics + tracing |
Modern observability split: Prometheus for aggregate patterns, Loki/Elasticsearch for log-level detail, Tempo/Jaeger for per-request traces. The tools differ because the cost structures differ. Event-centric tools (Honeycomb, Datadog APM) handle high cardinality natively but cost more per event.
Wide Events — the Modern Alternative
Observability tools like Honeycomb (and increasingly, OpenTelemetry's philosophy) emphasize wide events: structured per-request records with 50-200 attributes. Queries are "show me p99 latency grouped by customer, region, version, feature-flag."
This is powerful — cardinality is no longer a budget constraint — but requires a different backend designed for it. Prometheus cannot do this. ClickHouse-backed tools, Honeycomb, Datadog APM, and similar CAN.
Many teams run:
- Prometheus for aggregate metrics with bounded cardinality (the golden signals, dashboards, alerts).
- Wide-event store for investigation (traces + spans + attributes).
- Loki / Elasticsearch for raw logs.
This gets you cardinality where you need it (per-request/per-user investigation) without blowing up Prometheus.
Migration Paths When Cardinality Explodes
When you discover a high-cardinality metric in production:
1. Stop the bleeding
Remove the offending label from the metric (change the instrumentation, redeploy).
2. Drop the old series
# Prometheus relabel config — drop a specific metric at scrape time
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'http_requests_total'
    action: drop
  # or better, drop only the offending label:
  - action: labeldrop
    regex: user_id
Or, in Prometheus config, remote_write with write_relabel_configs to drop before shipping to long-term storage.
3. Clean up old TSDB data
Old series persist until retention expires (default 15 days). Either wait it out, or delete them explicitly via the TSDB admin API (start Prometheus with --web.enable-admin-api, then POST to /api/v1/admin/tsdb/delete_series with a match[] selector). Deleting the entire tsdb directory and restarting also works, but destroys all history; use it only if the series are crashing the server.
4. Post-mortem
Add a cardinality-ceiling alert so this does not recur. Document the label-design rules for your team.
Common Cardinality Traps
Dynamically generated labels from user input
# User input becomes a label value — unbounded
http_request_errors_total{message=$userInput}
# Fix: classify into a bounded error enum first
Changing label values on deploy
# version label changes on every deploy
http_requests_total{version="abc123"}
http_requests_total{version="def456"}
OK if you deploy a few times a day. Dangerous if you deploy many times an hour or run many variants. Set a cardinality ceiling; consider dropping the label for long-term storage.
Kubernetes pod/instance labels
# pod names in k8s are dynamic (pod-abc-xyz123)
up{pod="my-app-abc-def12"}
This is mostly OK because pods have bounded lifetime, but if you never GC old series, it accumulates. Ensure Prometheus retention drops them.
Histogram buckets
Histograms with too many buckets multiply cardinality: every label combination pays for every bucket. Typical Prometheus defaults use 10-15 buckets. Custom histograms sometimes ship with 50+, which multiplies the series count of every label combination by 50.
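The multiplication is easy to quantify: per label combination, a Prometheus histogram emits one _bucket series per le boundary (plus the implicit +Inf bucket) along with _sum and _count. A quick arithmetic sketch (the label-combination counts are illustrative):

```python
def histogram_series(label_combos: int, buckets: int) -> int:
    # Per combination: buckets + 1 (+Inf) _bucket series, plus _sum and _count.
    return label_combos * (buckets + 1 + 2)

print(histogram_series(150, 12))  # 150 combos x 15 series each = 2250
print(histogram_series(150, 50))  # 150 combos x 53 series each = 7950
```

Trimming buckets you never query is one of the cheapest cardinality wins available.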
Service mesh / Envoy metrics
Istio, Linkerd, Envoy emit rich per-request metrics that can explode. Review their default metric config; filter to keep only what you need.
A team enabled Istio mesh monitoring and their Prometheus server RAM doubled overnight. Istio's default metric set has dozens of labels (request method, response code, source service, source version, destination service, destination version, destination subset, connection security, etc.) and multiplies enormously in a cluster with many services. The fix was metric_relabel_configs to keep only the 5-6 labels they actually queried. Prometheus RAM dropped back to normal. Before deploying a mesh or any high-cardinality system, review and cap its metric output.
Summary Rules
- Never put user IDs, session IDs, trace IDs, email addresses, or timestamps in metric labels.
- Route patterns, not raw URLs. /api/users/:id, not /api/users/12345.
- Error classes, not error messages. Bounded enum.
- Histograms for distributions. Fixed bucket count.
- Monitor cardinality. Top-N-by-metric query on a regular dashboard.
- Cardinality ceilings as alerts. Catch accidents within minutes.
- Different tools for different questions. Metrics for aggregates, logs/traces/events for per-request.
- Review dependency-provided metrics. Istio, OpenTelemetry auto-instrumentation can add labels you did not plan for.
Key Concepts Summary
- Cardinality = number of unique label combinations. Each combination is a time series.
- Cardinality is the product of distinct values per label. Adding one high-cardinality label multiplies total series.
- Prometheus (and most metrics backends) cost scales with series count. Memory, storage, and query time all suffer.
- High-cardinality labels to avoid: user_id, trace_id, session_id, email, raw URL, timestamps, error messages.
- Low-cardinality labels to prefer: route pattern, method, status class, region, version, error class.
- Metrics answer aggregates; logs/traces answer per-request. Do not try to make metrics do both.
- Monitor cardinality routinely. topk(10, count by (__name__) ({__name__=~".+"})) is your friend.
- Set cardinality ceilings as alerts. The cheapest way to catch accidental explosions.
- Wide-event tools (Honeycomb, Datadog APM) handle high cardinality natively for investigation use cases.
- Histogram, not per-value label. For latency, size, or any distribution.
Common Mistakes
- Adding user_id (or any per-entity ID) to metric labels. Near-guaranteed incident.
- Putting full URLs with path parameters into labels. Use the framework's route pattern.
- Free-text error messages as labels. Use a bounded error-class enum.
- Enabling Istio/Envoy default metrics without reviewing. Default output is rich; trim to what you query.
- No cardinality alert. Accidents snowball for days before anyone notices the Prometheus RAM climbing.
- Treating Grafana Cloud/Datadog metrics as unlimited. They are not — high cardinality is billable.
- Thinking "small team, small scale, cardinality does not matter yet." It compounds; start with discipline.
- Confusing per-pod labels (bounded by pod count) with per-user labels (unbounded by user count).
- Using percentile labels (p="99") instead of histogram buckets. Histograms are designed for this; percentile-as-label is not.
- Removing a high-cardinality label but forgetting to drop the old series — you can still hit OOM from stale data until retention expires.
Your Prometheus RAM has grown from 8 GB to 60 GB over a month. You check top metrics by cardinality and see `api_request_duration_seconds` has 4.2 million time series. One recent change: a dev added a `user_id` label to help debug a support ticket. What is the immediate remediation and the long-term fix?