Data Flow

Every OTLP signal that enters the proxy follows two parallel paths: a forwarding path that guarantees zero-loss delivery to the backend, and an analysis path that extracts semantic conventions.

Signal Lifecycle

sequenceDiagram
    participant App as OTel Collector
    participant Recv as OTLP Receiver
    participant Fwd as Forwarder
    participant RB as Ring Buffer
    participant WP as Worker Pool
    participant Dict as Dictionary
    participant Peb as Pebble
    participant API as REST API
    participant BE as Backend

    App->>Recv: OTLP Export Request
    par Forwarding (synchronous)
        Recv->>Fwd: Forward signal
        Fwd->>BE: OTLP Export
        BE-->>Fwd: ACK
    and Analysis (async)
        Recv->>RB: Enqueue AnalysisTask
        RB->>WP: Worker picks up task
        WP->>WP: Extract attributes
        WP->>Dict: Upsert entries
        Dict->>Peb: Async write-behind
    end
    Note over API: Dict changes visible via API immediately

Step-by-Step Breakdown

1. Signal Reception

The proxy listens on standard OTLP ports:

  • OTLP/HTTP on port 4318 — accepts POST /v1/metrics, /v1/traces, /v1/logs
  • OTLP/gRPC on port 4317 — standard gRPC OTLP export service

The receiver decodes the incoming protobuf payload and creates two references:

  • A forwarding reference — passed directly to the forwarder
  • An analysis reference — wrapped in an AnalysisTask and enqueued to the ring buffer

Both references use zero-copy semantics. No data is cloned.
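The two-reference pattern can be sketched as follows. Signal and AnalysisTask are illustrative stand-ins for the proxy's real decoded-payload types; the point is that both paths share one allocation.

```go
package main

import "fmt"

// Signal stands in for a decoded OTLP export payload (hypothetical type).
type Signal struct {
	Kind  string            // "metric", "trace", or "log"
	Attrs map[string]string // decoded attributes
}

// AnalysisTask wraps a reference to the same decoded payload; nothing is cloned.
type AnalysisTask struct {
	Signal *Signal
}

func main() {
	sig := &Signal{Kind: "metric", Attrs: map[string]string{"http.request.method": "GET"}}

	// Forwarding reference: the pointer handed to the forwarder.
	fwdRef := sig
	// Analysis reference: the same pointer, wrapped for the ring buffer.
	task := AnalysisTask{Signal: sig}

	// Both references alias one underlying payload.
	fmt.Println(fwdRef == task.Signal) // true
}
```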

2. Signal Forwarding (Hot Path)

The forwarder sends the original OTLP request to the configured backend endpoint. This path is:

  • Synchronous with the receiver — the export response waits for backend acknowledgment
  • Never blocked by analysis or storage — completely independent goroutine path
  • Retried on transient failures with exponential backoff (up to 3 retries)
  • Monitored via semconv_proxy_signals_forwarded_total and semconv_proxy_signals_dropped_total

If the backend is unreachable, the proxy continues analyzing signals and serving API requests. Forwarding retries in the background.

3. Ring Buffer (Decoupling Layer)

The ring buffer sits between signal reception and analysis processing:

  • Fixed capacity (default: 10,000 tasks) — bounded memory usage
  • Drop-oldest overflow — when full, the oldest analysis task is discarded
  • Never blocks the sender — the write to the ring buffer is non-blocking
  • Monitored via semconv_proxy_pipeline_lag and semconv_proxy_pipeline_drops_total
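The drop-oldest overflow policy can be sketched as a single-threaded ring buffer. The real buffer is concurrent and holds analysis tasks; here int stands in for AnalysisTask to keep the sketch short.

```go
package main

import "fmt"

// RingBuffer is a minimal drop-oldest queue over a fixed-size slice.
type RingBuffer struct {
	buf   []int
	head  int
	size  int
	drops int
}

func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{buf: make([]int, capacity)}
}

// Enqueue never blocks: when the buffer is full, the oldest task is discarded.
func (r *RingBuffer) Enqueue(task int) {
	if r.size == len(r.buf) {
		r.head = (r.head + 1) % len(r.buf) // advance past the oldest entry
		r.size--
		r.drops++
	}
	r.buf[(r.head+r.size)%len(r.buf)] = task
	r.size++
}

// Dequeue pops the oldest remaining task.
func (r *RingBuffer) Dequeue() (int, bool) {
	if r.size == 0 {
		return 0, false
	}
	task := r.buf[r.head]
	r.head = (r.head + 1) % len(r.buf)
	r.size--
	return task, true
}

func main() {
	rb := NewRingBuffer(3)
	for task := 1; task <= 5; task++ {
		rb.Enqueue(task) // tasks 1 and 2 are dropped once 4 and 5 arrive
	}
	for {
		task, ok := rb.Dequeue()
		if !ok {
			break
		}
		fmt.Println(task) // 3, then 4, then 5
	}
	fmt.Println("drops:", rb.drops) // drops: 2
}
```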

4. Worker Pool (Analysis)

A configurable pool of goroutines reads analysis tasks from the ring buffer:

  • Default workers = number of CPU cores
  • Each worker extracts attribute keys, value types, and signal metadata
  • Extraction covers all three signal types: metrics, traces, and logs

For each signal type, the extractor pulls:

  • Metrics — name, type (counter, gauge, histogram, etc.), unit, temporality, attributes
  • Traces — span name, attributes, status code, parent-child relationships
  • Logs — attributes, severity level, body field patterns
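One piece of this extraction, classifying each attribute's value type, can be sketched like so. The type names and the extract helper are illustrative, not the proxy's actual classifier.

```go
package main

import "fmt"

// valueType classifies an attribute value into a coarse type label
// (illustrative labels, not a fixed taxonomy from the proxy).
func valueType(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case bool:
		return "bool"
	case int, int64:
		return "int"
	case float64:
		return "double"
	default:
		return "unknown"
	}
}

// extract maps each attribute key to its value type, the shape of data
// the workers hand to the dictionary.
func extract(attrs map[string]any) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[k] = valueType(v)
	}
	return out
}

func main() {
	span := map[string]any{
		"http.request.method":       "GET",
		"http.response.status_code": int64(200),
		"error":                     false,
	}
	fmt.Println(extract(span))
}
```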

5. Dictionary Update

Extracted attributes are upserted into the sharded dictionary:

  • Upsert semantics — new entries are created, existing entries are updated
  • Change detection — tracks whether an attribute is new, changed, or unchanged
  • Timestamps — first_seen and last_seen updated on every observation
  • Signal type tracking — each attribute records which signal types it appeared in
  • Cardinality increment — unique value count updated per attribute

Dictionary mutations are immediately visible through the REST API.

6. Persistence (Write-Behind)

Dictionary mutations are asynchronously persisted to Pebble:

  • Batch writes — up to 1,000 entries per batch, flushed every 100ms
  • Key scheme — {signal_type}:{attribute_name} (e.g., metric:http.request.method)
  • Serialization — MessagePack for compact storage
  • Non-blocking — persistence runs on a separate goroutine pool

7. API Access

The REST API reads from the in-memory dictionary:

  • Concurrent reads — 64 shards, each guarded by its own sync.RWMutex, so readers on different shards never contend and readers on the same shard proceed in parallel
  • No caching — API queries read live dictionary state directly
  • Target latency — <10ms at p95 for queries returning up to 1,000 entries
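The sharded layout can be sketched as follows, assuming FNV-1a for shard selection (the actual hash function is not specified here) and a simplified string payload per entry.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64

type shard struct {
	mu sync.RWMutex
	m  map[string]string // simplified entry payload
}

// ShardedDict spreads keys across 64 independently locked shards, so
// operations on different shards never contend.
type ShardedDict [numShards]*shard

func NewShardedDict() *ShardedDict {
	var d ShardedDict
	for i := range d {
		d[i] = &shard{m: map[string]string{}}
	}
	return &d
}

func (d *ShardedDict) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return d[h.Sum32()%numShards]
}

func (d *ShardedDict) Put(key, val string) {
	s := d.shardFor(key)
	s.mu.Lock()
	s.m[key] = val
	s.mu.Unlock()
}

// Get takes only a read lock; concurrent Gets on one shard run in parallel.
func (d *ShardedDict) Get(key string) (string, bool) {
	s := d.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	d := NewShardedDict()
	d.Put("metric:http.request.method", "string")
	v, ok := d.Get("metric:http.request.method")
	fmt.Println(v, ok) // string true
}
```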

Data Guarantees

  • Zero signal loss on forwarding — forwarding is synchronous and independent of analysis
  • No forwarding latency impact — ring buffer decouples analysis from forwarding
  • Bounded memory — cardinality caps + global budget + ring buffer limit
  • Crash recovery — Pebble persistence + recovery on startup
  • Immediate API consistency — in-memory dictionary reads, no cache staleness