Plan tiers
| Plan | Sustained | Burst |
|---|---|---|
| Free | 2 req/sec | 10 req/sec |
| Solo | 100 req/sec | 500 req/sec |
| Pro | 1,000 req/sec | 5,000 req/sec |
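
To make the two columns concrete, here is a minimal client-side sketch of a token bucket. It assumes the Sustained column is the refill rate and the Burst column is the bucket capacity. The table does not spell out that reading, and the `TokenBucket` class is hypothetical, not part of the API.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills at `sustained` tokens per second and holds at most `burst` tokens."""

    def __init__(self, sustained: float, burst: float, now: Optional[float] = None):
        self.sustained = sustained
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token and return True if a request may proceed right now."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never above the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.sustained)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Free tier from the table, under the assumption stated above.
free = TokenBucket(sustained=2, burst=10)
```

Pacing requests through a local bucket like this is one way to stay under the limit instead of reacting to rejections after the fact.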
Headers on every response
X-RateLimit-Reset is a Unix timestamp marking when the bucket will next be full.
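
A client can turn that timestamp into a wait time. The sketch below assumes the value is in seconds rather than milliseconds and that the headers arrive as a plain mapping with this exact capitalization. The helper name `seconds_until_reset` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds until the time in X-RateLimit-Reset, or 0 if it has already passed."""
    now = time.time() if now is None else now
    reset_at = float(headers["X-RateLimit-Reset"])  # assumption: Unix time in seconds
    return max(0.0, reset_at - now)
```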
When you hit the cap
The Retry-After header gives the wait in whole seconds.
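
A minimal retry sketch that honors Retry-After follows. It assumes a rate-limited response carries status 429, which this page does not state, and it falls back to a 1-second wait if the header is absent. The `send` callable and the `call_with_retry` helper are hypothetical stand-ins for whatever HTTP client you use.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, waiting Retry-After seconds after each rate-limited response."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:  # assumption: 429 signals that the rate limit was hit
            break
        # Retry-After is whole seconds, so int() is enough to parse it.
        sleep(int(headers.get("Retry-After", "1")))
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.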
Hard caps
The token bucket above is about pacing. The hard cap is about cost: a customer-set ceiling on how many API calls an agent can consume per billing period. The effective cap on agent traffic is the lesser of the plan cap and the hard cap.

Plan caps come from the Billing page (free=500, solo=100_000, pro=1_000_000). workspaces.hard_cap_api_calls sits on top of the plan cap. It is settable via PATCH /workspace and defaults to null (no extra limit beyond the plan).

The counter increments atomically per request via the consume_quota(workspace_id) PL/pgSQL function, which takes SELECT FOR UPDATE on the workspace row, then performs a lazy month rollover, the cap check, and the increment. The error message in the response names which kind of cap was hit (plan or hard).
The cap applies only to agent traffic: requests authenticated with sk_live_… API keys or salty_oat_… OAuth tokens. JWT-authed requests from the admin UI (billing, raising the cap, browsing /records) stay reachable when the cap is exhausted, so a customer is never trapped. See Concepts → Billing for the plan limits.
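
The cap logic described in this section can be modeled in a few lines. This is an in-memory Python sketch, not the PL/pgSQL function: it has no row locking, the `used` and `period` fields are hypothetical names, and reporting "plan" when both caps are equal is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

PLAN_CAPS = {"free": 500, "solo": 100_000, "pro": 1_000_000}


@dataclass
class Workspace:
    plan: str
    hard_cap_api_calls: Optional[int] = None  # None: no extra limit beyond the plan
    used: int = 0                             # hypothetical counter field
    period: str = ""                          # hypothetical billing-month tag, e.g. "2024-05"


def effective_cap(ws: Workspace) -> int:
    """Lesser of the plan cap and the hard cap; a null hard cap adds no limit."""
    plan_cap = PLAN_CAPS[ws.plan]
    if ws.hard_cap_api_calls is None:
        return plan_cap
    return min(plan_cap, ws.hard_cap_api_calls)


def consume_quota(ws: Workspace, current_period: str) -> Optional[str]:
    """Return None if the call was counted, else "plan" or "hard" for the cap hit."""
    if ws.period != current_period:   # lazy month rollover
        ws.period, ws.used = current_period, 0
    if ws.used >= effective_cap(ws):  # cap check
        hard = ws.hard_cap_api_calls
        return "hard" if hard is not None and hard < PLAN_CAPS[ws.plan] else "plan"
    ws.used += 1                      # increment
    return None
```

In the real function, the SELECT FOR UPDATE row lock is what makes the check and the increment atomic across concurrent requests. This sketch is only safe for a single thread.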