Tokens and budgets
Every proxied call writes a token-usage row. Policies can attach caps; the proxy enforces them before forwarding upstream.
Token usage rows
After a call completes (stream end or block), the proxy writes one row to token_usage:
| Field | Notes |
|---|---|
| run_id | Joins to the audit run |
| class_id | Which class made the call |
| provider | anthropic, mock, etc. |
| model | The model string from the request |
| input_tokens | Parsed from message_start.usage.input_tokens in the SSE stream |
| output_tokens | Parsed from message_delta.usage.output_tokens |
| cost_usd | Currently always 0; provider-pricing wire-up is a roadmap item |
| recorded_at | When the row was written |
These rows feed:
- per-class spend roll-ups
- budget enforcement
- whatever reporting you build on top
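As a rough illustration of how the two token fields above might be pulled out of a stream, here is a minimal sketch. It is not the proxy's actual parser: `extract_usage` is a hypothetical helper, and it assumes the stream arrives as text lines with JSON `data:` payloads. Because the usage object may sit at the top level of the message_start event or nested under its `message` key, the sketch checks both places.

```python
import json
from typing import Iterable


def extract_usage(sse_lines: Iterable[str]) -> dict:
    """Collect token counts from the `data:` lines of an SSE stream (sketch)."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "message_start":
            # Assumption: usage is either top-level or nested under `message`.
            found = event.get("usage") or event.get("message", {}).get("usage", {})
            usage["input_tokens"] = found.get("input_tokens", 0)
        elif kind == "message_delta":
            # Assumption: the count is cumulative, so the last value seen wins.
            found = event.get("usage", {})
            usage["output_tokens"] = found.get("output_tokens", usage["output_tokens"])
    return usage
```

Feeding it the lines of a recorded stream returns a dict with the two counts, which map onto the input_tokens and output_tokens columns.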
Budgets
A policy may include a budgets block:
```yaml
budgets:
  cost_usd_per_day: 100.00
  cost_usd_per_month: 2500.00
  on_exceeded:
    action: block   # block | flag | throttle
```

The proxy checks both caps before running detectors:
- Daily: sums cost_usd for the class over the last 24 hours (rolling window)
- Monthly: sums cost_usd for the class over the last 30 days (rolling window)
If either cap is hit, the proxy applies on_exceeded.action:
| Action | What happens |
|---|---|
| block | The proxy writes a detector=budget step with effect Block, closes the run with final_effect=Block, and raises PolicyBlocked → HTTP 403 |
| flag | A step with effect Flag is recorded; the cascade continues; the call proceeds |
| throttle | Currently degrades to block with reason "throttle action not yet implemented"; rate-limit machinery is on the roadmap |
Rolling windows are used instead of calendar-aligned windows for v1 to avoid timezone edge cases. Calendar-aligned reporting can land later if a customer asks.
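The check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the proxy's code: `UsageRow`, `Budgets`, `rolling_spend`, and `check_budget` are hypothetical names, the "Allow" effect is invented for the no-cap-hit case, and a cap is treated as hit when spend is greater than or equal to it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class UsageRow:
    class_id: str
    cost_usd: float
    recorded_at: datetime


@dataclass
class Budgets:
    cost_usd_per_day: Optional[float] = None
    cost_usd_per_month: Optional[float] = None
    action: str = "block"  # block | flag | throttle


def rolling_spend(rows: List[UsageRow], class_id: str,
                  now: datetime, window: timedelta) -> float:
    """Sum cost_usd for one class over a window ending at `now`."""
    cutoff = now - window
    return sum(r.cost_usd for r in rows
               if r.class_id == class_id and r.recorded_at > cutoff)


def check_budget(rows: List[UsageRow], class_id: str,
                 budgets: Budgets, now: datetime) -> Tuple[str, str]:
    """Return (effect, reason); effect is "Allow", "Flag", or "Block"."""
    windows = [
        ("daily", budgets.cost_usd_per_day, timedelta(hours=24)),
        ("monthly", budgets.cost_usd_per_month, timedelta(days=30)),
    ]
    for name, cap, window in windows:
        if cap is None:
            continue  # this cap is not configured
        if rolling_spend(rows, class_id, now, window) >= cap:  # assumed comparison
            if budgets.action == "flag":
                return "Flag", f"{name} cap reached"
            if budgets.action == "throttle":
                # Mirrors the documented fallback: throttle degrades to block.
                return "Block", "throttle action not yet implemented"
            return "Block", f"{name} cap reached"
    return "Allow", ""
```

Both windows are measured back from the moment of the check, so the result does not depend on the timezone in which a calendar day or month would start.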
Querying spend
HTTP
There’s no direct spend endpoint yet — query audit and token rows directly.
Postgres
```sql
-- Spend per class over the last 24h
SELECT c.slug,
       SUM(t.cost_usd) AS usd,
       SUM(t.input_tokens + t.output_tokens) AS tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '24 hours'
GROUP BY c.slug
ORDER BY usd DESC;

-- Calls that contributed the most to spend in the last hour
SELECT t.run_id, c.slug, t.cost_usd, t.input_tokens, t.output_tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '1 hour'
ORDER BY t.cost_usd DESC
LIMIT 50;
```

Cost calculation
For v1, cost_usd is 0 for every row: token counts are parsed and stored, but per-provider pricing isn’t wired up. In practice:
- Customers can compute cost in their reporting layer by applying their own per-model rates to the recorded token counts.
- When provider-aware pricing lands, existing rows will stay unchanged and new rows will carry real cost_usd values.
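A reporting-layer calculation along these lines could look like the sketch below. The rate table, the model name `example-model`, and the helper `estimate_cost_usd` are all hypothetical; substitute the per-model rates from your own provider contract. Rates are assumed to be quoted in USD per million tokens, with separate input and output prices.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical rates in USD per million tokens; replace with your own.
RATES_PER_MTOK = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}


def estimate_cost_usd(model: str, input_tokens: int,
                      output_tokens: int) -> Optional[Decimal]:
    """Estimate the cost of one token_usage row from its token counts."""
    rate = RATES_PER_MTOK.get(model)
    if rate is None:
        return None  # unknown model: leave the cost unset rather than guess
    million = Decimal(1_000_000)
    return (rate["input"] * input_tokens + rate["output"] * output_tokens) / million
```

Using Decimal rather than float keeps the arithmetic exact when many small per-call amounts are summed into a daily or monthly total.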
Caps as guardrails, not budgets
Caps are guardrails: use them to fail closed when you don’t trust the agent yet, or as a safety net while a class is in early production. They are not a substitute for upstream provider quotas or contract limits.
For per-key quotas at the provider level (e.g. Anthropic monthly limit), use the provider’s own controls. For per-class quotas inside your org, use Quayside.