# Data Flow
Every OTLP signal that enters the proxy follows two parallel paths: a forwarding path that guarantees zero-loss delivery to the backend, and an analysis path that extracts semantic conventions.
## Signal Lifecycle

```mermaid
sequenceDiagram
    participant App as OTel Collector
    participant Recv as OTLP Receiver
    participant Fwd as Forwarder
    participant RB as Ring Buffer
    participant WP as Worker Pool
    participant Dict as Dictionary
    participant Peb as Pebble
    participant API as REST API
    participant BE as Backend
    App->>Recv: OTLP Export Request
    par Forwarding (synchronous)
        Recv->>Fwd: Forward signal
        Fwd->>BE: OTLP Export
        BE-->>Fwd: ACK
    and Analysis (async)
        Recv->>RB: Enqueue AnalysisTask
        RB->>WP: Worker picks up task
        WP->>WP: Extract attributes
        WP->>Dict: Upsert entries
        Dict->>Peb: Async write-behind
    end
    Note over API: Dict changes visible via API immediately
```
## Step-by-Step Breakdown
### 1. Signal Reception
The proxy listens on standard OTLP ports:
- OTLP/HTTP on port `4318` — accepts `POST /v1/metrics`, `/v1/traces`, and `/v1/logs`
- OTLP/gRPC on port `4317` — standard gRPC OTLP export service
The receiver decodes the incoming protobuf payload and creates two references:
- A forwarding reference — passed directly to the forwarder
- An analysis reference — wrapped in an `AnalysisTask` and enqueued to the ring buffer
Both references use zero-copy semantics. No data is cloned.
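The zero-copy split can be sketched as follows. This is an illustrative assumption, not the proxy's actual code: the `Signal`, `AnalysisTask`, and `split` names are hypothetical stand-ins showing that both paths receive pointers to the same decoded payload.

```go
package main

import "fmt"

// Signal stands in for a decoded OTLP payload (hypothetical type).
type Signal struct{ Body []byte }

// AnalysisTask wraps a pointer to the decoded signal; no payload copy is made.
type AnalysisTask struct{ Signal *Signal }

// split hands the same pointer to both paths: the forwarder receives it
// directly, and the analysis path wraps it in an AnalysisTask.
func split(s *Signal) (*Signal, AnalysisTask) {
	return s, AnalysisTask{Signal: s}
}

func main() {
	sig := &Signal{Body: []byte("otlp payload")}
	fwd, task := split(sig)
	// Both references point at the same underlying signal: zero-copy.
	fmt.Println(fwd == task.Signal) // true
}
```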
### 2. Signal Forwarding (Hot Path)
The forwarder sends the original OTLP request to the configured backend endpoint. This path is:
- Synchronous with the receiver — the export response waits for backend acknowledgment
- Never blocked by analysis or storage — completely independent goroutine path
- Retried on transient failures with exponential backoff (up to 3 retries)
- Monitored via `semconv_proxy_signals_forwarded_total` and `semconv_proxy_signals_dropped_total`
If the backend is unreachable, the proxy continues analyzing signals and serving API requests. Forwarding retries in the background.
### 3. Ring Buffer (Decoupling Layer)
The ring buffer sits between signal reception and analysis processing:
- Fixed capacity (default: 10,000 tasks) — bounded memory usage
- Drop-oldest overflow — when full, the oldest analysis task is discarded
- Never blocks the sender — the write to the ring buffer is non-blocking
- Monitored via `semconv_proxy_pipeline_lag` and `semconv_proxy_pipeline_drops_total`
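A minimal sketch of the drop-oldest, non-blocking enqueue, built on a buffered channel. The `RingBuffer` type and field names are assumptions for illustration (a single-producer version; the real buffer may differ), and `int` stands in for an `AnalysisTask`:

```go
package main

import "fmt"

// RingBuffer is a bounded queue with drop-oldest overflow.
// Enqueue never blocks the caller.
type RingBuffer struct {
	ch    chan int // int stands in for an AnalysisTask
	drops int      // feeds semconv_proxy_pipeline_drops_total
}

func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{ch: make(chan int, capacity)}
}

// Enqueue adds a task; when the buffer is full it discards the oldest
// task, counts a drop, and retries the send.
func (r *RingBuffer) Enqueue(task int) {
	for {
		select {
		case r.ch <- task:
			return
		default:
			select {
			case <-r.ch: // evict oldest
				r.drops++
			default:
			}
		}
	}
}

func main() {
	rb := NewRingBuffer(2)
	for i := 1; i <= 4; i++ {
		rb.Enqueue(i)
	}
	fmt.Println(<-rb.ch, <-rb.ch, rb.drops) // 3 4 2
}
```

The outer `select` with a `default` branch is what makes the write non-blocking: a full buffer falls through to eviction instead of stalling the receiver.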
### 4. Worker Pool (Analysis)
A configurable pool of goroutines reads analysis tasks from the ring buffer:
- Default workers = number of CPU cores
- Each worker extracts attribute keys, value types, and signal metadata
- Extraction covers all three signal types: metrics, traces, and logs
For each signal type, the extractor pulls:
| Signal Type | Extracted Data |
|---|---|
| Metrics | Name, type (counter, gauge, histogram, etc.), unit, temporality, attributes |
| Traces | Span name, attributes, status code, parent-child relationships |
| Logs | Attributes, severity level, body field patterns |
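The pool pattern above can be sketched as follows. This is an assumption-laden stand-in, not the proxy's extractor: `runPool` is hypothetical, and extraction is stubbed out as counting attribute keys, where the real workers also record value types and signal metadata.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runPool fans a batch of analysis tasks out to a fixed pool of worker
// goroutines and returns the total number of extracted attribute keys.
func runPool(batch []map[string]string, workers int) int {
	tasks := make(chan map[string]string, len(batch))
	results := make(chan int, len(batch))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				results <- len(t) // stand-in for attribute extraction
			}
		}()
	}

	for _, t := range batch {
		tasks <- t
	}
	close(tasks)
	wg.Wait()
	close(results)

	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	batch := []map[string]string{
		{"http.request.method": "GET"},
		{"service.name": "api", "host.name": "a1"},
	}
	// Default pool size mirrors the doc: one worker per CPU core.
	fmt.Println(runPool(batch, runtime.NumCPU())) // 3
}
```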
### 5. Dictionary Update
Extracted attributes are upserted into the sharded dictionary:
- Upsert semantics — new entries are created, existing entries are updated
- Change detection — tracks whether an attribute is new, changed, or unchanged
- Timestamps — `first_seen` and `last_seen` updated on every observation
- Signal type tracking — each attribute records which signal types it appeared in
- Cardinality increment — unique value count updated per attribute
Dictionary mutations are immediately visible through the REST API.
### 6. Persistence (Write-Behind)
Dictionary mutations are asynchronously persisted to Pebble:
- Batch writes — up to 1,000 entries per batch, flushed every 100ms
- Key scheme — `{signal_type}:{attribute_name}` (e.g., `metric:http.request.method`)
- Serialization — MessagePack for compact storage
- Non-blocking — persistence runs on a separate goroutine pool
### 7. API Access
The REST API reads from the in-memory dictionary:
- Concurrent reads — 64 shards, each guarded by a `sync.RWMutex`, so reads on different shards never contend and readers of the same shard proceed in parallel
- No caching — API queries read live dictionary state directly
- Target latency — <10ms at p95 for queries returning up to 1,000 entries
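How a key is routed to one of the 64 shards is not specified above; a common approach, shown here purely as an assumption, is to hash the key and take it modulo the shard count so reads on different keys usually hit different locks:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numShards = 64

// shardFor maps an attribute key to one of 64 shards via FNV-1a.
// The hash choice is illustrative; any stable hash works.
func shardFor(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numShards
}

func main() {
	fmt.Println(shardFor("metric:http.request.method") < numShards) // true
}
```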
## Data Guarantees
| Guarantee | Mechanism |
|---|---|
| Zero signal loss on forwarding | Forwarding is synchronous, independent of analysis |
| No forwarding latency impact | Ring buffer decouples analysis from forwarding |
| Bounded memory | Cardinality caps + global budget + ring buffer limit |
| Crash recovery | Pebble persistence + recovery on startup |
| Immediate API consistency | In-memory dictionary reads, no cache staleness |