# Observability Pipeline — Cardinality Management

## What This Use Case Achieves

Cardinality Management prevents high-cardinality attributes from silently inflating your backend costs and degrading dashboard performance. SemConv Proxy tracks the number of unique values for every discovered attribute, using exact counting below the per-attribute cap, HyperLogLog approximation above it, and Count-Min Sketch frequency estimation for top-K value analysis. It exposes budget utilization and per-attribute cardinality through a dedicated API endpoint and Prometheus metrics, so you can catch cardinality explosions before they cause incidents.

## The Problem

High-cardinality attributes cause expensive backend queries, inflated storage costs, and slow dashboards. A single attribute like `user.id` or `k8s.pod.name` with thousands of unique values can blow up your backend bill, and you often don't know which attributes are the worst offenders until it's too late.
## Cardinality Management Flow

```mermaid
sequenceDiagram
    participant Coll as OTel Collector
    participant Proxy as SemConv Proxy
    participant BE as Backend
    participant Prom as Prometheus
    participant Eng as SRE / On-Call
    Coll->>Proxy: OTLP signals
    par Forwarding
        Proxy->>BE: Forward all signals
    and Cardinality Tracking
        Proxy->>Proxy: Count unique values per attribute
        Proxy->>Proxy: Exact count (< cap) or HLL (> cap)
        Proxy->>Proxy: Check global budget
        alt Budget exceeded
            Proxy->>Proxy: Evict least-recently-used entries
        end
    end
    Proxy->>Prom: Expose cardinality metrics
    Prom->>Eng: Alert: HighCardinalityDetected
    Eng->>Proxy: GET /api/v1/cardinality
    Proxy-->>Eng: Budget utilization + high-card attrs
    Eng->>Proxy: GET /api/v1/dictionary/{attr}
    Proxy-->>Eng: Top-K values for diagnosis
    Note over Eng: Take action: drop, aggregate,<br/>or normalize the attribute
```
## The Solution

SemConv Proxy tracks cardinality for every discovered attribute and exposes it through a dedicated API endpoint and Prometheus metrics.

## Step-by-Step Implementation Guide

### Step 1: Deploy with Cardinality Limits Configured

```bash
helm install semconv-proxy ./deployments/helm/semconv-proxy-chart/ \
  --set config.backendEndpoint=otel-collector.observability:4317 \
  --set config.globalBudget=10000 \
  --set config.perAttrCap=1000
```
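The same settings can also live in a `values.yaml` (key names taken from the `--set` flags above):

```yaml
# values.yaml, equivalent to the --set flags above
config:
  backendEndpoint: otel-collector.observability:4317
  globalBudget: 10000  # total unique values tracked across all attributes
  perAttrCap: 1000     # per-attribute limit before counting goes approximate
```

Install with `-f values.yaml` instead of the `--set` flags.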
### Step 2: Configure Prometheus Alerts for Cardinality

Add these Prometheus alert rules to your monitoring stack:
```yaml
groups:
  - name: semconv-proxy-cardinality
    rules:
      - alert: HighCardinalityDetected
        expr: semconv_proxy_cardinality_high_attributes > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High-cardinality attributes detected"
      - alert: CardinalityBudgetApproaching
        expr: semconv_proxy_cardinality_budget_utilization > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cardinality budget is above 80% utilization"
```
### Step 3: Investigate High-Cardinality Attributes via API

When an alert fires, start by querying the proxy's cardinality summary.
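The endpoint path comes from the flow diagram above; the host and port depend on how you exposed the proxy's API, and `semconv-proxy:8080` is assumed here:

```bash
curl -s http://semconv-proxy:8080/api/v1/cardinality
```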
Response:
```json
{
  "global_budget": {
    "used": 342,
    "limit": 10000,
    "utilization_pct": 3.42
  },
  "attributes": [
    {
      "name": "k8s.pod.name",
      "cardinality": 847,
      "cap": 1000,
      "utilization_pct": 84.7
    },
    {
      "name": "user.id",
      "cardinality": 523,
      "cap": 1000,
      "utilization_pct": 52.3
    }
  ]
}
```
Filter by threshold to focus on the worst offenders:
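One way is client-side, with `jq` over the response shown above (the API may also support server-side filter parameters, which are not shown here):

```bash
# List only attributes above 50% of their per-attribute cap
curl -s http://semconv-proxy:8080/api/v1/cardinality \
  | jq '.attributes[] | select(.utilization_pct > 50)'
```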
### Step 4: Diagnose with Top-K Value Analysis

Get the top values for a specific high-cardinality attribute:
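For example, for `k8s.pod.name`, using the `/api/v1/dictionary/{attr}` endpoint from the flow diagram (same host/port assumption as in Step 3):

```bash
curl -s http://semconv-proxy:8080/api/v1/dictionary/k8s.pod.name
```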
Response:
```json
[
  {"value": "api-gateway-7d4f8b-x2k9", "approximate_count": 45230},
  {"value": "api-gateway-7d4f8b-m3p1", "approximate_count": 12340},
  {"value": "payment-svc-9c2e1-a5n7", "approximate_count": 8900}
]
```
### Step 5: Remediate Based on Findings

Typical remediations, based on the cardinality data:

| Finding | Action |
|---|---|
| `k8s.pod.name` has 847 unique values | Add a Collector processor to drop or aggregate pod-level attributes (see the sketch after this table) |
| `user.id` has 523 unique values | Stop propagating user ID as a metric attribute; move it to trace-only |
| `request.path` has 200 unique values | Normalize paths (e.g., `/users/123` → `/users/{id}`) |
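For the first and third rows, a sketch of the corresponding OpenTelemetry Collector processors. The processor names and the regex are illustrative, and exact OTTL syntax varies by Collector version, so treat this as a starting point rather than a drop-in config:

```yaml
processors:
  # Drop the pod-level attribute before it reaches the backend
  attributes/drop-pod-name:
    actions:
      - key: k8s.pod.name
        action: delete

  # Normalize ID-bearing paths, e.g. /users/123 -> /users/{id}
  transform/normalize-paths:
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["request.path"], "/users/[0-9]+", "/users/{id}")
```

Wire the processors into the relevant pipelines in your Collector's `service` section.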
### Step 6: Track Cardinality Reduction Over Time

After making changes, confirm that cardinality is actually trending down:
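For example, graph the metrics already used by the alert rules in Step 2:

```promql
# Global budget utilization (%); should trend down after remediation
semconv_proxy_cardinality_budget_utilization

# Number of attributes currently flagged as high-cardinality
semconv_proxy_cardinality_high_attributes
```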
## Built-in Safeguards
The proxy enforces cardinality limits automatically:
| Mechanism | Default | Behavior |
|---|---|---|
| Per-attribute cap | 1,000 | Switches to approximate counting above this limit |
| Global budget | 10,000 | Evicts least-recently-used entries when exceeded |
| TTL stale | 24 hours | Marks entries stale if not seen for 24h |
| TTL purge | 7 days | Removes stale entries after 7 days |
```mermaid
graph TD
    Signal["Incoming Signal"] --> Track["Track unique values"]
    Track --> Below{"Below<br/>per-attr cap?"}
    Below -->|Yes| Exact["Exact counting"]
    Below -->|No| Approx["HyperLogLog<br/>approximate"]
    Exact --> Budget{"Global budget<br/>OK?"}
    Approx --> Budget
    Budget -->|Yes| Store["Store in<br/>dictionary"]
    Budget -->|No| Evict["Evict least<br/>recently used"]
    Evict --> Store
```
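To make the cap-switch concrete, here is a simplified, hypothetical sketch in Go of exact counting that falls back to HyperLogLog above the cap. It is illustrative only (no LRU eviction, TTLs, or small-range bias correction) and is not the proxy's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"math/bits"
)

// hll is a minimal HyperLogLog with 2^p registers (no small-range correction).
type hll struct {
	p         uint8
	registers []uint8
}

func newHLL(p uint8) *hll { return &hll{p: p, registers: make([]uint8, 1<<p)} }

func (h *hll) add(v string) {
	f := fnv.New64a()
	f.Write([]byte(v))
	x := f.Sum64()
	idx := x >> (64 - h.p)                           // first p bits select a register
	rank := uint8(bits.LeadingZeros64(x<<h.p|1)) + 1 // leading zeros of the rest, plus one
	if rank > h.registers[idx] {
		h.registers[idx] = rank
	}
}

func (h *hll) estimate() float64 {
	m := float64(len(h.registers))
	sum := 0.0
	for _, r := range h.registers {
		sum += math.Pow(2, -float64(r))
	}
	alpha := 0.7213 / (1 + 1.079/m) // standard bias constant for m >= 128
	return alpha * m * m / sum
}

// tracker counts unique values exactly until the cap, then switches to HLL.
type tracker struct {
	limit  int
	exact  map[string]struct{}
	approx *hll
}

func newTracker(limit int) *tracker {
	return &tracker{limit: limit, exact: make(map[string]struct{})}
}

func (t *tracker) observe(v string) {
	if t.approx != nil {
		t.approx.add(v)
		return
	}
	t.exact[v] = struct{}{}
	if len(t.exact) > t.limit { // cap exceeded: re-insert known values into the sketch
		t.approx = newHLL(12)
		for u := range t.exact {
			t.approx.add(u)
		}
		t.exact = nil // free the exact set
	}
}

func (t *tracker) cardinality() float64 {
	if t.approx != nil {
		return t.approx.estimate()
	}
	return float64(len(t.exact))
}

func main() {
	tr := newTracker(1000) // mirrors the default per-attribute cap
	for i := 0; i < 50000; i++ {
		tr.observe(fmt.Sprintf("pod-%d", i))
	}
	fmt.Printf("estimated cardinality: %.0f\n", tr.cardinality())
}
```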
## Result
You've identified the root cause of cardinality explosions in minutes instead of hours, using real data from the proxy's live dictionary.