
Tokens and budgets

Every proxied call writes a token-usage row. Policies can attach caps; the proxy enforces them before forwarding upstream.

Token usage rows

After a call completes (stream end or block), the proxy writes one row to token_usage:

Field          Notes
run_id         Joins to the audit run
class_id       Which class made the call
provider       anthropic, mock, etc.
model          The model string from the request
input_tokens   Parsed from the message_start SSE event (message.usage.input_tokens)
output_tokens  Parsed from the message_delta SSE event (usage.output_tokens)
cost_usd       Currently always 0; provider-pricing wire-up is a roadmap item
recorded_at    When the row was written
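A minimal sketch of the token parsing, assuming Anthropic's streaming event shapes. The message_start event nests usage under its message object. The message_delta event carries a top-level usage.output_tokens that is cumulative, so the last value seen wins. Usage and parse_usage are hypothetical names used for illustration, not Quayside's code. The sketch also assumes each event's JSON arrives on a single data: line.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    input_tokens: Optional[int] = None   # stays None if no message_start was seen
    output_tokens: Optional[int] = None  # stays None if no message_delta carried usage


def parse_usage(sse_lines: Iterable[str]) -> Usage:
    """Pull token counts out of an Anthropic-style SSE stream, line by line."""
    usage = Usage()
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank event separators
        payload = json.loads(line[len("data:"):].strip())
        kind = payload.get("type")
        if kind == "message_start":
            # message_start nests usage under the message object
            usage.input_tokens = payload["message"]["usage"]["input_tokens"]
        elif kind == "message_delta" and "usage" in payload:
            # message_delta's output_tokens is cumulative: keep the latest value
            usage.output_tokens = payload["usage"]["output_tokens"]
    return usage


stream = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 15}}',
    "",
]
print(parse_usage(stream))  # Usage(input_tokens=25, output_tokens=15)
```

If a call is blocked before it reaches the provider, no stream exists and both counts stay None in this sketch.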

These rows feed:

  • per-class spend roll-ups
  • budget enforcement
  • whatever reporting you build on top

Budgets

A policy may include a budgets block:

budgets:
  cost_usd_per_day: 100.00
  cost_usd_per_month: 2500.00
  on_exceeded:
    action: block   # block | flag | throttle
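As a sketch, here is one way to validate the block after a YAML loader has turned it into a dict. Budgets and parse_budgets are hypothetical names. Two behaviors are assumptions rather than documented: a missing cap means "no cap", and negative caps are rejected. Caps are held as Decimal so that money comparisons don't pick up binary-float rounding.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Mapping, Optional

VALID_ACTIONS = {"block", "flag", "throttle"}


@dataclass(frozen=True)
class Budgets:
    cost_usd_per_day: Optional[Decimal]    # None means "no daily cap" (assumption)
    cost_usd_per_month: Optional[Decimal]  # None means "no monthly cap" (assumption)
    action: str                            # on_exceeded.action


def _cap(block: Mapping[str, Any], key: str) -> Optional[Decimal]:
    raw = block.get(key)
    if raw is None:
        return None
    value = Decimal(str(raw))  # str() avoids carrying binary-float noise into Decimal
    if value < 0:
        raise ValueError(f"{key} must be non-negative, got {raw!r}")
    return value


def parse_budgets(block: Mapping[str, Any]) -> Budgets:
    """Validate a budgets mapping that a YAML loader has already parsed."""
    action = (block.get("on_exceeded") or {}).get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(
            f"on_exceeded.action must be one of {sorted(VALID_ACTIONS)}, got {action!r}"
        )
    return Budgets(_cap(block, "cost_usd_per_day"), _cap(block, "cost_usd_per_month"), action)


policy_budgets = {
    "cost_usd_per_day": 100.00,
    "cost_usd_per_month": 2500.00,
    "on_exceeded": {"action": "block"},
}
print(parse_budgets(policy_budgets))
```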

The proxy checks both caps before running detectors:

  • Daily: sums cost_usd for the class over the last 24 hours (rolling window)
  • Monthly: sums cost_usd for the class over the last 30 days (rolling window)
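The two rolling-window checks can be sketched in memory as follows. UsageRow, rolling_spend and cap_hit are hypothetical names. The window lower bound is strict, mirroring the recorded_at > now() - INTERVAL filter in the spend queries. Counting spend that equals the cap as "hit" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Iterable, List, Optional


@dataclass
class UsageRow:
    class_id: str
    cost_usd: Decimal
    recorded_at: datetime  # timezone-aware, e.g. UTC


def rolling_spend(rows: Iterable[UsageRow], class_id: str,
                  now: datetime, window: timedelta) -> Decimal:
    """Sum cost_usd for one class over the rolling window (now - window, now]."""
    start = now - window
    return sum(
        (r.cost_usd for r in rows
         if r.class_id == class_id and start < r.recorded_at <= now),
        Decimal("0"),
    )


def cap_hit(rows: Iterable[UsageRow], class_id: str, now: datetime,
            per_day: Optional[Decimal],
            per_month: Optional[Decimal]) -> Optional[str]:
    """Return "daily" or "monthly" for the first cap reached, else None."""
    snapshot: List[UsageRow] = list(rows)  # the rows are scanned twice
    if per_day is not None and rolling_spend(snapshot, class_id, now, timedelta(hours=24)) >= per_day:
        return "daily"
    if per_month is not None and rolling_spend(snapshot, class_id, now, timedelta(days=30)) >= per_month:
        return "monthly"
    return None


now = datetime(2024, 1, 31, 12, tzinfo=timezone.utc)
rows = [
    UsageRow("support-bot", Decimal("60"), now - timedelta(hours=1)),
    UsageRow("support-bot", Decimal("50"), now - timedelta(hours=2)),
    UsageRow("support-bot", Decimal("2000"), now - timedelta(days=10)),
]
print(cap_hit(rows, "support-bot", now, Decimal("100"), Decimal("2500")))  # daily
```

Because both windows slide with now, there is no midnight or month-start reset to reason about, which is the timezone simplification the rolling design is after.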

If either cap is hit, the proxy applies on_exceeded.action:

Action     What happens
block      The proxy writes a detector=budget step with effect Block, closes the run with final_effect=Block, and raises PolicyBlocked → HTTP 403
flag       A step with effect Flag is recorded; the cascade continues and the call proceeds
throttle   Currently degrades to block with reason "throttle action not yet implemented"; rate-limit machinery is on the roadmap
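The action table maps onto a small dispatch. This is a sketch, not the proxy's implementation. PolicyBlocked is the name the table uses, but its shape here is assumed. Run and apply_budget_action are hypothetical stand-ins for the audit run and its step records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PolicyBlocked(Exception):
    """Raised to stop the call; per the table, the proxy turns this into HTTP 403."""
    status_code = 403


@dataclass
class Run:
    """Hypothetical stand-in for an audit run and its recorded steps."""
    steps: List[dict] = field(default_factory=list)
    final_effect: Optional[str] = None


def apply_budget_action(run: Run, action: str, reason: str) -> None:
    """Apply on_exceeded.action after a budget cap has been hit."""
    if action == "throttle":
        # No rate-limit machinery yet, so throttle degrades to block.
        action, reason = "block", "throttle action not yet implemented"
    if action == "flag":
        run.steps.append({"detector": "budget", "effect": "Flag", "reason": reason})
        return  # the cascade continues and the call proceeds
    if action == "block":
        run.steps.append({"detector": "budget", "effect": "Block", "reason": reason})
        run.final_effect = "Block"  # close the run as blocked
        raise PolicyBlocked(reason)
    raise ValueError(f"unknown on_exceeded.action: {action!r}")


run = Run()
try:
    apply_budget_action(run, "throttle", "daily cap reached")
except PolicyBlocked as exc:
    print(exc.status_code, exc)  # 403 throttle action not yet implemented
```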

Rolling windows are used instead of calendar-aligned windows for v1 to avoid timezone edge cases. Calendar-aligned reporting can land later if a customer asks.

Querying spend

HTTP

There’s no spend endpoint yet; query the audit and token_usage rows in Postgres directly.

Postgres

-- Spend per class over the last 24h
SELECT c.slug,
       SUM(t.cost_usd) AS usd,
       SUM(t.input_tokens + t.output_tokens) AS tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '24 hours'
GROUP BY c.slug
ORDER BY usd DESC, tokens DESC;

-- Calls that contributed the most to spend in the last hour
-- (cost_usd is 0 in v1, so token count breaks the ties)
SELECT t.run_id, c.slug, t.cost_usd, t.input_tokens, t.output_tokens
FROM token_usage t
JOIN classes c ON t.class_id = c.id
WHERE t.recorded_at > now() - INTERVAL '1 hour'
ORDER BY t.cost_usd DESC, t.input_tokens + t.output_tokens DESC
LIMIT 50;

Cost calculation

For v1, cost_usd is 0 for every row: token counts are parsed and stored, but per-provider pricing isn’t wired up. Because the budget checks sum cost_usd, a positive cap can’t be reached until pricing lands. In the meantime:

  • compute cost in your reporting layer by applying your own per-model rates to the recorded token counts
  • plan for the switchover: when provider-aware pricing lands, existing rows stay unchanged and only new rows will carry real cost_usd values
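A reporting-layer cost calculation might look like the sketch below. The rates are placeholders, not real provider prices. estimate_cost_usd and RATES_PER_MTOK are hypothetical names. Expressing rates in USD per million tokens is an assumption about how you store your pricing.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical placeholder rates in USD per million tokens; NOT real provider prices.
RATES_PER_MTOK = {
    "example-model": {"input": Decimal("1.00"), "output": Decimal("4.00")},
}
MILLION = Decimal(1_000_000)


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate one call's cost from its recorded token counts and your own rates."""
    rates = RATES_PER_MTOK[model]  # a KeyError flags a model you haven't priced yet
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / MILLION
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


print(estimate_cost_usd("example-model", 1000, 500))  # 0.003000
```

Feeding it the model, input_tokens and output_tokens columns from token_usage gives a per-row estimate that you can sum per class, just like the spend queries do with cost_usd.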

Caps as guardrails, not budgets

Caps are guardrails: use them to fail closed while you don’t yet trust an agent, or as a safety net while a class is in early production. They are not a substitute for upstream provider quotas or contract limits.

For per-key quotas at the provider level (e.g. an Anthropic monthly limit), use the provider’s own controls. For per-class quotas inside your org, use Quayside.