Tokens and budgets

Every proxied call writes a token-usage row. Policies can attach caps; the proxy enforces them before forwarding upstream.

Token usage rows

After a call completes (stream end or block), the proxy writes one row to token_usage:

Field	Notes
`run_id`	Joins to the audit run
`class_id`	Which class made the call
`provider`	`anthropic`, `mock`, etc.
`model`	The model string from the request
`input_tokens`	Parsed from the `message_start` event's `message.usage.input_tokens` in the SSE stream
`output_tokens`	Parsed from the `message_delta` event's `usage.output_tokens`
`cost_usd`	Currently always `0` — provider-pricing wire-up is a roadmap item
`recorded_at`	When the row was written
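A minimal sketch of the token parsing, assuming Anthropic's Messages streaming format: `message_start` nests `usage` under `message`, and `message_delta` carries a top-level `usage` whose `output_tokens` is cumulative. `parse_usage` is a hypothetical helper, not the proxy's actual code, and it expects the stream already split into lines.

```python
import json
from typing import Iterable, Tuple


def parse_usage(sse_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from raw SSE lines (hypothetical helper)."""
    input_tokens = 0
    output_tokens = 0
    for line in sse_lines:
        # Only "data:" lines carry JSON; skip "event:" lines and blank separators.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        etype = event.get("type")
        if etype == "message_start":
            # Input usage is nested under the message object.
            usage = (event.get("message") or {}).get("usage") or {}
            input_tokens = usage.get("input_tokens", input_tokens)
        elif etype == "message_delta":
            # output_tokens here is cumulative, so the latest value is the total.
            usage = event.get("usage") or {}
            output_tokens = usage.get("output_tokens", output_tokens)
    return input_tokens, output_tokens
```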

These rows feed:

per-class spend roll-ups
budget enforcement
whatever reporting you build on top

Budgets

A policy may include a budgets block:

budgets:
  cost_usd_per_day: 100.00
  cost_usd_per_month: 2500.00
  on_exceeded:
    action: block             # block | flag | throttle
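To make the shape of this block concrete, here is a minimal Python sketch that validates it once a YAML loader (for example PyYAML's `yaml.safe_load`) has turned it into a dict. `Budgets` and `parse_budgets` are hypothetical names, and treating a missing cap as "no cap" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

# The three actions listed in the budgets block above.
VALID_ACTIONS = ("block", "flag", "throttle")


@dataclass(frozen=True)
class Budgets:
    cost_usd_per_day: Optional[float]    # None = no daily cap (assumption)
    cost_usd_per_month: Optional[float]  # None = no monthly cap (assumption)
    action: str


def _cap(block: dict, key: str) -> Optional[float]:
    """Read one cap, rejecting negative values."""
    value = block.get(key)
    if value is None:
        return None
    value = float(value)
    if value < 0:
        raise ValueError(f"{key} must be non-negative, got {value}")
    return value


def parse_budgets(block: dict) -> Budgets:
    """Validate a parsed `budgets:` mapping (hypothetical helper)."""
    on_exceeded = block.get("on_exceeded") or {}
    action = on_exceeded.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(
            f"on_exceeded.action must be one of {VALID_ACTIONS}, got {action!r}"
        )
    return Budgets(
        cost_usd_per_day=_cap(block, "cost_usd_per_day"),
        cost_usd_per_month=_cap(block, "cost_usd_per_month"),
        action=action,
    )
```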

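The proxy checks both caps before running detectors:

Daily: sums cost_usd for the class over the last 24 hours (rolling window)
Monthly: sums cost_usd for the class over the last 30 days (rolling window)

Because cost_usd is currently always 0 (see Cost calculation), both sums stay at 0 in v1, so a cap set above 0 will not trip until provider pricing is wired up.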
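The rolling-window check can be sketched in Python as follows, assuming one class's rows have already been fetched as `(recorded_at, cost_usd)` pairs. `window_sum` and `caps_hit` are hypothetical names. The lower bound mirrors the `recorded_at > now() - INTERVAL ...` filter used in the queries under Querying spend, and treating a cap as hit once spend reaches it (`>=`) is an assumption; the proxy may compare strictly.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

DAILY_WINDOW = timedelta(hours=24)
MONTHLY_WINDOW = timedelta(days=30)

# (recorded_at, cost_usd) for a single class
Row = Tuple[datetime, float]


def window_sum(rows: Iterable[Row], now: datetime, window: timedelta) -> float:
    """Sum cost_usd over rows recorded strictly after `now - window`."""
    start = now - window
    return sum(cost for recorded_at, cost in rows if recorded_at > start)


def caps_hit(rows: Iterable[Row], now: datetime,
             per_day: float, per_month: float) -> bool:
    """True if either rolling-window cap is reached (>= is an assumption)."""
    rows = list(rows)  # materialize so both windows can scan the rows
    daily = window_sum(rows, now, DAILY_WINDOW)
    monthly = window_sum(rows, now, MONTHLY_WINDOW)
    return daily >= per_day or monthly >= per_month
```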

If either cap is hit, the proxy applies on_exceeded.action:

Action	What happens
`block`	The proxy writes a `detector=budget` step with effect `Block`, closes the run with `final_effect=Block`, and raises `PolicyBlocked` → HTTP 403
`flag`	A step with effect `Flag` is recorded; the cascade continues; the call proceeds
`throttle`	Currently degrades to `block` with reason `"throttle action not yet implemented"` — rate-limit machinery is roadmap
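A Python sketch of how the three actions map onto run state, using hypothetical `Run` and `Step` types. `PolicyBlocked` here is a local stand-in that mirrors the name in the table, not an import of the proxy's real exception, and the `"budget cap exceeded"` reason string is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PolicyBlocked(Exception):
    """Local stand-in for the proxy's PolicyBlocked (surfaced as HTTP 403)."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass
class Step:
    detector: str
    effect: str  # "Block" or "Flag"
    reason: str


@dataclass
class Run:
    steps: List[Step] = field(default_factory=list)
    final_effect: Optional[str] = None


def apply_budget_action(action: str, run: Run) -> None:
    """Record a budget step on the run; raise PolicyBlocked if the call must stop."""
    reason = "budget cap exceeded"  # hypothetical wording
    if action == "throttle":
        # Not implemented yet: degrade to block with an explanatory reason.
        action, reason = "block", "throttle action not yet implemented"
    if action == "flag":
        # Record the flag; the cascade and the upstream call continue.
        run.steps.append(Step("budget", "Flag", reason))
        return
    if action == "block":
        run.steps.append(Step("budget", "Block", reason))
        run.final_effect = "Block"  # close the run before refusing the call
        raise PolicyBlocked(reason)
    raise ValueError(f"unknown on_exceeded.action: {action!r}")
```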

Rolling windows are used instead of calendar-aligned windows for v1 to avoid timezone edge cases. Calendar-aligned reporting can land later if a customer asks.
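To illustrate the timezone edge case: the same call lands on different calendar days depending on which timezone defines "today", while membership in a rolling 24-hour window depends only on elapsed time. The fixed UTC-8 offset below is an arbitrary choice (no DST) so the sketch needs no timezone database; the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed UTC-8 offset, ignoring DST, so no tz database is required.
UTC_MINUS_8 = timezone(timedelta(hours=-8))


def calendar_day(ts: datetime, tz: timezone) -> date:
    """The calendar date a timestamp falls on, as seen from `tz`."""
    return ts.astimezone(tz).date()


def in_rolling_day(ts: datetime, now: datetime) -> bool:
    """True if `ts` is within the 24 hours ending at `now`; aware datetimes
    compare as absolute instants, so the result ignores their timezone."""
    return now - timedelta(hours=24) < ts <= now


call = datetime(2024, 3, 1, 6, 0, tzinfo=timezone.utc)
now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)

print(calendar_day(call, timezone.utc))  # 2024-03-01
print(calendar_day(call, UTC_MINUS_8))   # 2024-02-29
print(in_rolling_day(call, now))         # True
print(in_rolling_day(call.astimezone(UTC_MINUS_8),
                     now.astimezone(UTC_MINUS_8)))  # True
```

A calendar-aligned daily cap would count this call toward March 1 or February 29 depending on configuration; the rolling check gives the same answer either way.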

Querying spend

There's no HTTP endpoint for spend yet. Query the token_usage table directly in Postgres, joining to classes for the class slug and to the audit run via run_id as needed:

-- Spend per class over the last 24h
-- (cost_usd is 0 until provider pricing lands, so token volume breaks ties)
SELECT c.slug, SUM(t.cost_usd) AS usd, SUM(t.input_tokens + t.output_tokens) AS tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '24 hours'
GROUP BY c.slug
ORDER BY usd DESC, tokens DESC;

-- Calls that contributed the most to spend in the last hour
-- (same tie-break on token volume while cost_usd is 0)
SELECT t.run_id, c.slug, t.cost_usd, t.input_tokens, t.output_tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '1 hour'
ORDER BY t.cost_usd DESC, (t.input_tokens + t.output_tokens) DESC
LIMIT 50;

Cost calculation

For v1, cost_usd is 0 for every row: token counts are parsed and stored, but per-provider pricing isn't wired up. This means:

you can compute cost in your reporting layer by applying your own per-model rates to the recorded token counts
when provider-aware pricing lands, existing rows stay as they are (cost_usd = 0) and only new rows carry real cost_usd values, so reports spanning the cutover still need token-based cost for older rows
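Computing cost in the reporting layer might look like the following Python sketch. The rate table, its model name, and the per-million-token pricing unit are placeholders, not any provider's actual prices; the row dicts reuse the token_usage column names, and `row_cost`/`total_cost` are hypothetical helpers.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Mapping

TOKENS_PER_MTOK = Decimal(1_000_000)


@dataclass(frozen=True)
class Rate:
    input_per_mtok: Decimal   # USD per 1M input tokens
    output_per_mtok: Decimal  # USD per 1M output tokens


# Placeholder rates for a hypothetical model; substitute your own per-model prices.
RATES: Dict[str, Rate] = {
    "example-model": Rate(Decimal("2.00"), Decimal("10.00")),
}


def row_cost(row: Mapping, rates: Mapping[str, Rate] = RATES) -> Decimal:
    """Cost of one token_usage row; raises KeyError for a model with no rate."""
    rate = rates[row["model"]]
    return (
        row["input_tokens"] * rate.input_per_mtok
        + row["output_tokens"] * rate.output_per_mtok
    ) / TOKENS_PER_MTOK


def total_cost(rows: Iterable[Mapping], rates: Mapping[str, Rate] = RATES) -> Decimal:
    """Sum of row_cost over rows, e.g. one class's rows for a day."""
    return sum((row_cost(r, rates) for r in rows), Decimal(0))
```

Decimal avoids float rounding drift when summing many small per-call costs; a KeyError on an unpriced model surfaces gaps in the rate table instead of silently reporting 0.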

Caps as guardrails, not budgets

Caps are guardrails: with action: block they fail closed, either while you don't yet trust an agent or as a safety net while a class is in early production. They are not a substitute for upstream provider quotas or contract limits.

For per-key quotas at the provider level (e.g. an Anthropic monthly limit), use the provider's own controls. For per-class quotas inside your org, use Quayside.