Token Usage Attributes¶
Understanding the GenAI token usage attributes in OpenClaw observability.
Token Attributes Overview¶
| Attribute | Description | Cost Impact |
|---|---|---|
| gen_ai.usage.input_tokens | Tokens in the prompt sent to the model | Standard input rate |
| gen_ai.usage.output_tokens | Tokens in the model's response | Higher rate (typically 3-5x input) |
| gen_ai.usage.cache_read_tokens | Tokens read from prompt cache | 90% cheaper than input |
| gen_ai.usage.cache_write_tokens | Tokens written to prompt cache | 25% more expensive than input |
| gen_ai.usage.total_tokens | Sum of all token types | N/A |
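As a rough illustration of how these attributes fit together, the sketch below reads them from a span's attribute mapping and derives the total. The `TokenUsage` class and `usage_from_attributes` helper are hypothetical names used only for this example, not part of OpenClaw; the sample values are the ones from the example span discussed on this page.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TokenUsage:
    """Hypothetical container for the four per-type token counts."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        # Sum of all token types, matching the table's definition.
        return (self.input_tokens + self.output_tokens
                + self.cache_read_tokens + self.cache_write_tokens)


def usage_from_attributes(attrs: Mapping[str, object]) -> TokenUsage:
    """Build a TokenUsage from span attributes; missing keys count as 0."""
    def read(name: str) -> int:
        return int(attrs.get(f"gen_ai.usage.{name}", 0) or 0)

    return TokenUsage(
        input_tokens=read("input_tokens"),
        output_tokens=read("output_tokens"),
        cache_read_tokens=read("cache_read_tokens"),
        cache_write_tokens=read("cache_write_tokens"),
    )


span_attrs = {
    "gen_ai.usage.input_tokens": 156,
    "gen_ai.usage.output_tokens": 17831,
    "gen_ai.usage.cache_read_tokens": 918174,
    "gen_ai.usage.cache_write_tokens": 62437,
}
usage = usage_from_attributes(span_attrs)
print(usage.total_tokens)  # 998598
```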
How Tokens Are Calculated¶
Input Tokens¶
What counts: Everything you send to the model:
- System prompt (AGENTS.md, SOUL.md, TOOLS.md, etc.)
- Conversation history (all previous messages)
- Current user message
- Tool definitions and descriptions
- Any injected context
Example breakdown:
System prompt: ~8,000 tokens (workspace files + tool list)
History: ~5,000 tokens (previous messages)
Current message: ~50 tokens (user's question)
─────────────────────────────────
Total input: ~13,050 tokens
Output Tokens¶
What counts: Everything the model generates:
- The assistant's response text
- Tool calls (function names + arguments)
- Thinking/reasoning (if using extended thinking)
Note: Output tokens are typically 3-5x more expensive than input tokens.
Cache Read Tokens¶
What it is: Anthropic's prompt caching feature. When you send the same prefix (system prompt + early conversation), Claude can reuse a cached version instead of reprocessing it.
When it happens:
- Same system prompt across requests
- Stable conversation history prefix
- Request made within cache TTL (typically 5 minutes)
Cost benefit: Cache reads cost ~90% less than regular input tokens.
Your example: 918,174 cache read tokens means the model reused ~918K tokens from cache instead of reprocessing them. At ~10% of the standard input rate, that is roughly a 90% saving on those tokens.
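To put a number on that saving, here is a minimal sketch. It assumes the Opus 4.5 standard input rate of $5/MTok from the pricing table on this page and a flat 90% discount for cache reads; `cache_read_savings` is an illustrative helper, not an OpenClaw function.

```python
INPUT_RATE_PER_MTOK = 5.00   # assumption: Opus 4.5 standard input rate, USD
CACHE_READ_DISCOUNT = 0.90   # cache reads cost ~90% less than regular input


def cache_read_savings(cache_read_tokens: int,
                       input_rate_per_mtok: float = INPUT_RATE_PER_MTOK,
                       discount: float = CACHE_READ_DISCOUNT) -> float:
    """USD saved by reading tokens from cache instead of paying the input rate."""
    full_cost = cache_read_tokens / 1_000_000 * input_rate_per_mtok
    cached_cost = full_cost * (1 - discount)
    return full_cost - cached_cost


saved = cache_read_savings(918_174)
print(f"${saved:.2f}")  # $4.13
```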
Cache Write Tokens¶
What it is: When new content is added to the cache for future requests.
When it happens:
- First request with a new system prompt
- Conversation grows beyond previously cached content
- Cache TTL expired and content needs re-caching
Cost impact: Cache writes cost ~25% more than regular input tokens, but enable cheaper cache reads on subsequent requests.
Your example: 62,437 cache write tokens means new content was cached for future use.
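The write premium pays for itself quickly. The sketch below compares sending the same prefix repeatedly with and without caching, measured in units of "one regular input token". It assumes the 1.25x write and 0.10x read multipliers stated above and that every follow-up request lands within the cache TTL; `prefix_cost` is a hypothetical helper.

```python
def prefix_cost(prefix_tokens: int, requests: int, cached: bool,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Cost of sending one prefix `requests` times, in input-token units."""
    if requests <= 0:
        return 0.0
    if not cached:
        # Every request pays the full input rate for the prefix.
        return float(prefix_tokens * requests)
    # First request writes the cache; each later request reads it.
    return prefix_tokens * (write_mult + read_mult * (requests - 1))


for n in (1, 2, 5):
    with_cache = prefix_cost(10_000, n, cached=True)
    without_cache = prefix_cost(10_000, n, cached=False)
    print(n, round(with_cache), round(without_cache))
```

Under these assumptions a single request is 25% more expensive with caching, but the second request already brings the cached total below the uncached one.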
Real-World Example¶
Your span shows:
cache_read: 918,174 tokens (reused from cache — very cheap!)
cache_write: 62,437 tokens (newly cached — slight premium)
input: 156 tokens (new content not in cache)
output: 17,831 tokens (model's response)
─────────────────────────────────
total: 998,598 tokens
What this means:
- Large context reuse: 918K tokens were already cached (system prompt + conversation history). You paid ~10% of the normal input cost for these.
- Incremental caching: 62K new tokens were added to the cache. Slightly more expensive now, but future requests can read them cheaply.
- Minimal new input: Only 156 tokens were truly "new" input (probably just the latest message).
- Reasonable output: 17,831 tokens is a substantial response (maybe code generation or a detailed explanation).
Cost Calculation¶
Current Anthropic API Pricing (2026)¶
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| Opus 4.5 | $5/MTok | $25/MTok | $0.50/MTok | $6.25/MTok |
| Sonnet 4.5 | $3/MTok | $15/MTok | $0.30/MTok | $3.75/MTok |
| Haiku 4.5 | $1/MTok | $5/MTok | $0.10/MTok | $1.25/MTok |
(MTok = Million Tokens)
Example Cost Breakdown (Opus 4.5)¶
| Token Type | Count | Rate (per 1M) | Cost |
|---|---|---|---|
| Cache read | 918,174 | $0.50 | $0.46 |
| Cache write | 62,437 | $6.25 | $0.39 |
| Input | 156 | $5.00 | $0.001 |
| Output | 17,831 | $25.00 | $0.45 |
| Total | | | ~$1.30 |
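The table's arithmetic can be reproduced with a few lines. This is a sketch using the Opus 4.5 rates listed above; `request_cost` and the dictionary keys are names chosen for the example, not OpenClaw identifiers.

```python
from typing import Dict

# USD per million tokens, Opus 4.5 (from the pricing table above)
OPUS_RATES: Dict[str, float] = {
    "cache_read": 0.50,
    "cache_write": 6.25,
    "input": 5.00,
    "output": 25.00,
}


def request_cost(tokens: Dict[str, int],
                 rates: Dict[str, float] = OPUS_RATES) -> Dict[str, float]:
    """Per-type cost in USD, plus a 'total' entry."""
    costs = {kind: count / 1_000_000 * rates[kind]
             for kind, count in tokens.items()}
    costs["total"] = sum(costs.values())
    return costs


costs = request_cost({
    "cache_read": 918_174,
    "cache_write": 62_437,
    "input": 156,
    "output": 17_831,
})
print(f"${costs['total']:.2f}")  # $1.30
```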
Without caching, the same request would cost:
- All input at standard rate: (918,174 + 62,437 + 156) × $5/1M = $4.90
- Output: $0.45
- Total without cache: $5.35
Savings from caching: ~76%
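The savings figure follows from comparing the two totals. A minimal sketch, again assuming Opus 4.5 rates and treating every cached token as regular input in the no-cache scenario (variable names are illustrative):

```python
MTOK = 1_000_000

cache_read, cache_write, new_input, output = 918_174, 62_437, 156, 17_831

# With caching: each token type is billed at its own rate (USD per MTok).
with_cache = (cache_read * 0.50 + cache_write * 6.25
              + new_input * 5.00 + output * 25.00) / MTOK

# Without caching: all prompt tokens are billed at the standard input rate.
without_cache = ((cache_read + cache_write + new_input) * 5.00
                 + output * 25.00) / MTOK

savings = 1 - with_cache / without_cache
print(f"{savings:.0%}")  # 76%
```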
Important: Subscriptions vs API¶
| Claude Subscription | API (OpenClaw uses this) |
|---|---|
| Free: $0 | Pay per token |
| Pro: $20/month | No monthly fee |
| Max: $100-200/month | Billed to API account |
Your Claude Pro/Max subscription does NOT cover OpenClaw usage! OpenClaw uses the API, which is billed separately.
Why Total Doesn't Equal Sum¶
You might notice that total_tokens on a span does not always match the sum of the four attributes above. It is intended to be the sum of all token types, but some backends may calculate it differently or include additional overhead tokens.
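One way to spot such a mismatch is to recompute the sum and compare it with the reported total. The sketch below assumes the span attributes are available as a plain mapping; `total_discrepancy` is a hypothetical helper.

```python
from typing import Mapping

PARTS = ("input_tokens", "output_tokens",
         "cache_read_tokens", "cache_write_tokens")


def total_discrepancy(attrs: Mapping[str, int]) -> int:
    """Reported total minus the sum of the per-type counts (0 means they agree)."""
    expected = sum(int(attrs.get(f"gen_ai.usage.{part}", 0)) for part in PARTS)
    # If no total was reported, treat it as matching the computed sum.
    reported = int(attrs.get("gen_ai.usage.total_tokens", expected))
    return reported - expected


example = {
    "gen_ai.usage.input_tokens": 156,
    "gen_ai.usage.output_tokens": 17831,
    "gen_ai.usage.cache_read_tokens": 918174,
    "gen_ai.usage.cache_write_tokens": 62437,
    "gen_ai.usage.total_tokens": 998598,
}
print(total_discrepancy(example))  # 0
```

A non-zero result does not necessarily indicate a bug; it may simply reflect overhead tokens counted by the backend.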
Optimizing Token Usage¶
Reduce Input Tokens¶
- Trim workspace files: Keep AGENTS.md, SOUL.md concise
- Use /compact: Summarize long conversations
- Prune tool list: Disable unused tools/skills
Maximize Cache Hits¶
- Stable system prompts — Don't change workspace files frequently
- Consistent conversation prefix — Same session = better caching
- Heartbeat within TTL — Keep cache warm with periodic requests
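For the heartbeat idea, the timing logic might look like the sketch below. It assumes a 5-minute TTL that restarts on every request and an arbitrary 30-second safety margin; `seconds_until_heartbeat` is a hypothetical helper, and whether a keep-warm request is worth its own cache-read cost depends on your workload.

```python
def seconds_until_heartbeat(last_request_at: float, now: float,
                            ttl_seconds: float = 300.0,
                            safety_margin: float = 30.0) -> float:
    """Seconds to wait before a keep-warm request is due (0.0 means send now).

    Timestamps are in seconds, e.g. from time.time().
    """
    deadline = last_request_at + ttl_seconds - safety_margin
    return max(0.0, deadline - now)


# Last request at t=0: at t=100 there are 170 seconds left before the
# heartbeat is due; at t=280 it is already overdue.
print(seconds_until_heartbeat(0.0, 100.0))  # 170.0
print(seconds_until_heartbeat(0.0, 280.0))  # 0.0
```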
Control Output¶
- Be specific — Vague prompts generate verbose responses
- Request concise answers — "Brief answer:" prefix helps
- Use appropriate models — Smaller models for simple tasks
Monitoring Token Usage¶
Key Metrics to Watch¶
# Token throughput over time, per model (tokens per second, not dollars)
sum(rate(openclaw_tokens_total[5m])) by (model)
# Cache hit ratio
sum(openclaw_tokens{type="cache_read"}) /
sum(openclaw_tokens{type=~"cache_read|input"})
# Output to input ratio (efficiency)
sum(openclaw_tokens{type="output"}) /
sum(openclaw_tokens{type="input"})
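To sanity-check what the two ratio queries should return, the same calculations can be done by hand on a single span. This sketch mirrors the queries above (the cache hit ratio ignores cache writes, as the query does); the function names are illustrative.

```python
def cache_hit_ratio(cache_read: int, uncached_input: int) -> float:
    """cache_read / (cache_read + input); 0.0 when there is no input at all."""
    denominator = cache_read + uncached_input
    return cache_read / denominator if denominator else 0.0


def output_input_ratio(output: int, uncached_input: int) -> float:
    """output / input; infinity when no uncached input tokens were sent."""
    return output / uncached_input if uncached_input else float("inf")


hit = cache_hit_ratio(918_174, 156)
ratio = output_input_ratio(17_831, 156)
print(f"{hit:.4f}")    # 0.9998
print(f"{ratio:.1f}")  # 114.3
```

Note that with heavy caching the output-to-input ratio becomes very large, because most of the prompt is counted as cache reads rather than input.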
Alerting Thresholds¶
Consider alerts for:
- Single request > 100K output tokens
- Cache hit ratio < 50%
- Hourly cost > $X threshold
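These checks could be expressed as a small function like the one sketched here. The output-token and cache-hit limits come from the list above; the hourly cost limit has no default because the threshold is left for you to choose. `check_alerts` is a hypothetical helper, not an OpenClaw API.

```python
from typing import List


def check_alerts(output_tokens: int, cache_hit_ratio: float,
                 hourly_cost_usd: float, max_hourly_cost_usd: float,
                 max_output_tokens: int = 100_000,
                 min_cache_hit_ratio: float = 0.50) -> List[str]:
    """Return a human-readable message for every threshold that is breached."""
    alerts: List[str] = []
    if output_tokens > max_output_tokens:
        alerts.append(f"output tokens {output_tokens} > {max_output_tokens}")
    if cache_hit_ratio < min_cache_hit_ratio:
        alerts.append(f"cache hit ratio {cache_hit_ratio:.0%} < {min_cache_hit_ratio:.0%}")
    if hourly_cost_usd > max_hourly_cost_usd:
        alerts.append(f"hourly cost ${hourly_cost_usd:.2f} > ${max_hourly_cost_usd:.2f}")
    return alerts


# The example span on this page breaches none of the thresholds
# (the $10 hourly limit is an arbitrary value for the demo).
print(check_alerts(17_831, 0.9998, hourly_cost_usd=1.30, max_hourly_cost_usd=10.0))  # []
```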