Token Usage Attributes

Understanding the GenAI token usage attributes in OpenClaw observability.

Token Attributes Overview

Attribute	Description	Cost Impact
`gen_ai.usage.input_tokens`	Tokens in the prompt sent to the model	Standard input rate
`gen_ai.usage.output_tokens`	Tokens in the model's response	Higher rate (typically 3-5x input)
`gen_ai.usage.cache_read.input_tokens`	Tokens read from prompt cache	90% cheaper than input
`gen_ai.usage.cache_creation.input_tokens`	Tokens written to prompt cache	25% more expensive than input

Schema 1.3.0 (ISI-1004) removed the legacy `gen_ai.usage.cache_read_tokens`, `gen_ai.usage.cache_write_tokens`, and `gen_ai.usage.total_tokens` keys. Total tokens are now computed as `gen_ai.usage.input_tokens` + `gen_ai.usage.output_tokens`; cache reads and writes use the stable `gen_ai.usage.cache_read.input_tokens` and `gen_ai.usage.cache_creation.input_tokens` keys.
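For consumers that still hold spans recorded before schema 1.3.0, the key change can be sketched as a small renaming step. This is only an illustration: the helper name `migrate_usage_attributes`, the plain-dict span representation, and the one-to-one mapping from each legacy cache key to its replacement are assumptions, not part of OpenClaw.

```python
# Hypothetical helper: rewrites pre-1.3.0 usage keys to the 1.3.0 names.
# The legacy -> new mapping is inferred from the schema note, not from OpenClaw code.
LEGACY_KEY_MAP = {
    "gen_ai.usage.cache_read_tokens": "gen_ai.usage.cache_read.input_tokens",
    "gen_ai.usage.cache_write_tokens": "gen_ai.usage.cache_creation.input_tokens",
}
DROPPED_KEYS = {"gen_ai.usage.total_tokens"}


def migrate_usage_attributes(attrs: dict) -> dict:
    """Return a copy of attrs that only uses schema 1.3.0 key names."""
    migrated = {}
    for key, value in attrs.items():
        if key in DROPPED_KEYS:
            continue  # the total is recomputed by the consumer
        new_key = LEGACY_KEY_MAP.get(key)
        if new_key is None:
            migrated[key] = value       # already a current key
        elif new_key not in attrs:
            migrated[new_key] = value   # rename the legacy key
        # else: the span already carries the new key, so the legacy copy is dropped
    return migrated
```

Spans that already use the 1.3.0 keys pass through unchanged, so the helper can be applied to mixed data.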

How Tokens Are Calculated

Input Tokens

What counts: Everything you send to the model:

- System prompt (AGENTS.md, SOUL.md, TOOLS.md, etc.)
- Conversation history (all previous messages)
- Current user message
- Tool definitions and descriptions
- Any injected context

Example breakdown:

System prompt:     ~8,000 tokens (workspace files + tool list)
History:           ~5,000 tokens (previous messages)
Current message:      ~50 tokens (user's question)
─────────────────────────────────
Total input:      ~13,050 tokens

Output Tokens

What counts: Everything the model generates:

- The assistant's response text
- Tool calls (function names + arguments)
- Thinking/reasoning (if using extended thinking)

Note: Output tokens are typically billed at 3-5x the input rate; for the models in the pricing table below, the ratio is 5x.

Cache Read Tokens

What it is: Anthropic's prompt caching feature. When you send the same prefix (system prompt + early conversation), Claude can reuse a cached version instead of reprocessing it.

When it happens:

- Same system prompt across requests
- Stable conversation history prefix
- Request made within cache TTL (typically 5 minutes)

Cost benefit: Cache reads cost ~90% less than regular input tokens.

Your example: 918,174 cache read tokens means the model reused ~918K tokens from cache instead of reprocessing them, so those tokens were billed at roughly a tenth of the standard input rate.

Cache Write Tokens

What it is: Tokens newly written to the cache so that future requests can reuse them.

When it happens:

- First request with a new system prompt
- Conversation grows beyond previously cached content
- Cache TTL expired and content needs re-caching

Cost impact: Cache writes cost ~25% more than regular input tokens, but enable cheaper cache reads on subsequent requests.

Your example: 62,437 cache write tokens means new content was cached for future use.
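To see why the write premium usually pays off, the sketch below compares the relative cost of sending the same prefix with and without caching. It uses only the ~25% and ~90% figures quoted above, expresses cost in units of "one uncached send of the prefix", and assumes every reuse lands inside the cache TTL. The function names are illustrative.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache write: ~25% more than standard input
CACHE_READ_MULTIPLIER = 0.10   # cache read: ~90% less than standard input


def relative_cost_with_cache(reuses: int) -> float:
    """One cache write followed by `reuses` cache reads of the same prefix."""
    return CACHE_WRITE_MULTIPLIER + reuses * CACHE_READ_MULTIPLIER


def relative_cost_without_cache(reuses: int) -> float:
    """The prefix is sent at the standard input rate every time."""
    return 1.0 + reuses * 1.0


if __name__ == "__main__":
    for reuses in range(4):
        cached = relative_cost_with_cache(reuses)
        uncached = relative_cost_without_cache(reuses)
        print(f"reuses={reuses}  cached={cached:.2f}  uncached={uncached:.2f}")
```

Under these assumptions caching is more expensive only when the prefix is never reused; a single reuse within the TTL already makes it cheaper.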

Real-World Example

Your span shows:

cache_read:   918,174 tokens  (reused from cache — very cheap!)
cache_write:   62,437 tokens  (newly cached — slight premium)
input:            156 tokens  (new content not in cache)
output:        17,831 tokens  (model's response)
─────────────────────────────────
total:        998,598 tokens

What this means:

Large context reuse — 918K tokens were already cached (system prompt + conversation history). You paid ~10% of normal input cost for these.
Incremental caching — 62K new tokens were added to cache. Slightly more expensive now, but future requests can read them cheaply.
Minimal new input — Only 156 tokens were truly "new" input (probably just the latest message).
Reasonable output — 17,831 tokens is a substantial response (maybe code generation or detailed explanation).

Cost Calculation

Current Anthropic API Pricing (2026)

Model	Input	Output	Cache Read	Cache Write
Opus 4.5	$5/MTok	$25/MTok	$0.50/MTok	$6.25/MTok
Sonnet 4.5	$3/MTok	$15/MTok	$0.30/MTok	$3.75/MTok
Haiku 4.5	$1/MTok	$5/MTok	$0.10/MTok	$1.25/MTok

(MTok = Million Tokens)

Example Cost Breakdown (Opus 4.5)

Token Type	Count	Rate (per 1M)	Cost
Cache read	918,174	$0.50	$0.46
Cache write	62,437	$6.25	$0.39
Input	156	$5.00	$0.001
Output	17,831	$25.00	$0.45
Total			~$1.30

Without caching, the same request would cost:

- All input at standard rate: (918,174 + 62,437 + 156) × $5/1M = $4.90
- Output: $0.45
- Total without cache: $5.35

Savings from caching: ~76%
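The arithmetic above can be reproduced with a short script. This is a minimal sketch: the `PRICING` dictionary copies the rates from the pricing table, while the model keys (`opus-4.5` and so on) and the function names are hypothetical and not part of OpenClaw or the Anthropic API.

```python
# USD per million tokens, copied from the pricing table.
# The dictionary keys are illustrative labels, not official model IDs.
PRICING = {
    "opus-4.5":   {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
    "sonnet-4.5": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "haiku-4.5":  {"input": 1.00, "output": 5.00,  "cache_read": 0.10, "cache_write": 1.25},
}


def request_cost(model, input_tokens, output_tokens, cache_read=0, cache_write=0):
    """Cost in USD of one request, billing each token category at its own rate."""
    rates = PRICING[model]
    micro_usd = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cache_read * rates["cache_read"]
        + cache_write * rates["cache_write"]
    )
    return micro_usd / 1_000_000


def cost_without_cache(model, input_tokens, output_tokens, cache_read=0, cache_write=0):
    """Same request with every cached token billed as standard input."""
    return request_cost(model, input_tokens + cache_read + cache_write, output_tokens)


usage = dict(input_tokens=156, output_tokens=17_831,
             cache_read=918_174, cache_write=62_437)
with_cache = request_cost("opus-4.5", **usage)
without_cache = cost_without_cache("opus-4.5", **usage)
savings = 1 - with_cache / without_cache

if __name__ == "__main__":
    print(f"with cache:    ${with_cache:.2f}")
    print(f"without cache: ${without_cache:.2f}")
    print(f"savings:       {savings:.0%}")
```

Swapping the model key shows how the same span would be priced on the other rows of the table.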

Important: Subscriptions vs API

Claude Subscription	API (OpenClaw uses this)
Free: $0	Pay per token
Pro: $20/month	No monthly fee
Max: $100-200/month	Billed to API account

Your Claude Pro/Max subscription does NOT cover OpenClaw usage! OpenClaw uses the API, which is billed separately.

Why Total Doesn't Equal Sum

You might notice:

cache_read + cache_write + input + output ≠ a backend's reported total

Backends count overhead tokens differently, so the sum of `gen_ai.usage.cache_read.input_tokens`, `gen_ai.usage.cache_creation.input_tokens`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens` may not match the total a given backend reports. As of schema 1.3.0, OpenClaw no longer emits `gen_ai.usage.total_tokens`; compute it as `gen_ai.usage.input_tokens` + `gen_ai.usage.output_tokens` on the consumer side.
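A consumer-side computation might look like the sketch below. It returns both the schema-defined total (input + output) and the sum over all four categories so the two can be compared against whatever a backend reports. The function name and the dict-based span are assumptions, and the sketch treats the four counts as reported separately, as in the example span earlier on this page.

```python
PREFIX = "gen_ai.usage."


def usage_totals(attrs: dict) -> dict:
    """Compute token totals from a span's gen_ai.usage.* attributes.

    Missing attributes are treated as 0.
    """
    def count(name: str) -> int:
        return int(attrs.get(PREFIX + name, 0))

    # Total as defined by schema 1.3.0: input + output only.
    schema_total = count("input_tokens") + count("output_tokens")
    # Sum over every category, including cache reads and writes.
    all_categories = (
        schema_total
        + count("cache_read.input_tokens")
        + count("cache_creation.input_tokens")
    )
    return {"schema_total": schema_total, "all_categories": all_categories}


example_span = {
    "gen_ai.usage.input_tokens": 156,
    "gen_ai.usage.output_tokens": 17_831,
    "gen_ai.usage.cache_read.input_tokens": 918_174,
    "gen_ai.usage.cache_creation.input_tokens": 62_437,
}
totals = usage_totals(example_span)
```

For the example span the two figures differ by the 980,611 cached tokens, which is why it matters to know which definition a dashboard uses.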

Optimizing Token Usage

Reduce Input Tokens

Trim workspace files — Keep AGENTS.md, SOUL.md concise
Use /compact — Summarize long conversations
Prune tool list — Disable unused tools/skills

Maximize Cache Hits

Stable system prompts — Don't change workspace files frequently
Consistent conversation prefix — Same session = better caching
Heartbeat within TTL — Keep cache warm with periodic requests
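The heartbeat idea can be sketched as a simple timing check. Everything here is illustrative: the 5-minute TTL is the typical value mentioned earlier and should be checked against your provider's documentation, and `needs_heartbeat` and its `safety_margin` parameter are hypothetical names, not OpenClaw settings.

```python
CACHE_TTL_SECONDS = 5 * 60  # typical TTL quoted earlier; an assumption, not a guarantee


def seconds_until_expiry(last_request_at: float, now: float,
                         ttl: float = CACHE_TTL_SECONDS) -> float:
    """Seconds left before the cached prefix is assumed to expire (never negative)."""
    return max(0.0, ttl - (now - last_request_at))


def needs_heartbeat(last_request_at: float, now: float,
                    ttl: float = CACHE_TTL_SECONDS,
                    safety_margin: float = 30.0) -> bool:
    """True when the cache is within `safety_margin` seconds of expiring.

    Also True once the TTL has already passed; in that case the next
    request pays for a cache write rather than a cache read.
    """
    return seconds_until_expiry(last_request_at, now, ttl) <= safety_margin
```

A heartbeat is itself a billed request (mostly cache reads), so it is only worth sending when another real request is expected soon after.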

Control Output

Be specific — Vague prompts generate verbose responses
Request concise answers — "Brief answer:" prefix helps
Use appropriate models — Smaller models for simple tasks

Monitoring Token Usage

Key Metrics to Watch

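# Token throughput per model (tokens/second over a 5-minute window)
sum(rate(openclaw_tokens_total[5m])) by (model)

# Cache hit ratio (share of cache reads among cache-read and regular input tokens)
sum(rate(openclaw_tokens_total{type="cache_read"}[5m])) /
sum(rate(openclaw_tokens_total{type=~"cache_read|input"}[5m]))

# Output-to-input ratio (efficiency)
sum(rate(openclaw_tokens_total{type="output"}[5m])) /
sum(rate(openclaw_tokens_total{type="input"}[5m]))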
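The two ratio queries above can also be computed directly from a single span's token counts, which is handy for spot-checking a dashboard. The sketch below mirrors the same definitions; the function names are illustrative.

```python
def cache_hit_ratio(cache_read: int, input_tokens: int) -> float:
    """Share of cache reads among cache-read and regular input tokens."""
    denominator = cache_read + input_tokens
    return cache_read / denominator if denominator else 0.0


def output_input_ratio(output_tokens: int, input_tokens: int) -> float:
    """Output tokens per regular (uncached) input token."""
    if input_tokens == 0:
        return float("inf")  # no uncached input at all
    return output_tokens / input_tokens


# Counts from the example span earlier on this page.
hit_ratio = cache_hit_ratio(cache_read=918_174, input_tokens=156)
out_in = output_input_ratio(output_tokens=17_831, input_tokens=156)
```

Note that the output-to-input ratio becomes very large when most of the prompt is served from cache, because cached tokens are not counted in the denominator.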

Alerting Thresholds

Consider alerts for:

- Single request > 100K output tokens
- Cache hit ratio < 50%
- Hourly cost > $X threshold
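As a rough illustration, the three thresholds can be expressed as a small check that runs over already-aggregated values. The `Thresholds` class and `evaluate_alerts` function are hypothetical; the hourly cost limit has no default because the "$X" value is left to each deployment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    max_hourly_cost_usd: float          # the "$X" threshold; must be chosen per deployment
    max_output_tokens: int = 100_000    # single-request output limit
    min_cache_hit_ratio: float = 0.50   # alert below this ratio


def evaluate_alerts(output_tokens: int, cache_hit_ratio: float,
                    hourly_cost_usd: float, limits: Thresholds) -> List[str]:
    """Return one message per threshold that is violated."""
    alerts = []
    if output_tokens > limits.max_output_tokens:
        alerts.append(f"output tokens {output_tokens} > {limits.max_output_tokens}")
    if cache_hit_ratio < limits.min_cache_hit_ratio:
        alerts.append(
            f"cache hit ratio {cache_hit_ratio:.0%} < {limits.min_cache_hit_ratio:.0%}"
        )
    if hourly_cost_usd > limits.max_hourly_cost_usd:
        alerts.append(
            f"hourly cost ${hourly_cost_usd:.2f} > ${limits.max_hourly_cost_usd:.2f}"
        )
    return alerts
```

In practice these rules would more likely live in the alerting layer of the metrics backend; the Python form is only meant to make the conditions explicit.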