Rate limiting is applied per workspace (not per API key) and is implemented as a token bucket in Postgres. Every request consumes 1 token. Tokens refill at the plan’s sustained rate (tokens per second), and the bucket never holds more than the burst value.
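
As a rough illustration of the refill-and-consume behaviour described above, here is a minimal in-memory sketch. It is not the Postgres implementation: the class name, field names, and the explicit `now` argument are assumptions made for the example, and the figures are the Free plan's (2 req/sec sustained, burst of 10).

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative in-memory model; the real bucket lives in Postgres."""
    sustained: float   # tokens added per second
    burst: float       # maximum tokens the bucket can hold
    tokens: float      # tokens currently available
    updated_at: float  # time of the last refill, in seconds

    def try_consume(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the burst capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.sustained)
        self.updated_at = now
        # Every request costs exactly one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Free plan figures: 2 req/sec sustained, burst of 10.
bucket = TokenBucket(sustained=2.0, burst=10.0, tokens=10.0, updated_at=0.0)
allowed = sum(bucket.try_consume(now=0.0) for _ in range(12))
print(allowed)                      # 10 requests pass, 2 are rejected
print(bucket.try_consume(now=0.5))  # True: half a second refills one token
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read the clock itself.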

Plan tiers

| Plan | Sustained | Burst |
| --- | --- | --- |
| Free | 2 req/sec | 10 req/sec |
| Solo | 100 req/sec | 500 req/sec |
| Pro | 1,000 req/sec | 5,000 req/sec |

Headers on every response

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 8
X-RateLimit-Reset: 1779587687
X-RateLimit-Reset is a Unix timestamp: the time at which the bucket will next be full.
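
A client could use these headers to pace itself before it ever receives a 429. The sketch below is one possible approach, not an official client: the function names and the `reserve` threshold are hypothetical, and it takes headers as a plain dict with the exact casing shown above (most HTTP libraries expose a case-insensitive mapping instead).

```python
import time
from typing import Mapping, Optional


def seconds_until_full(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long until the bucket is full again, per X-RateLimit-Reset."""
    current = time.time() if now is None else now
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix timestamp
    return max(0.0, reset_at - current)


def should_pause(headers: Mapping[str, str], reserve: int = 1) -> bool:
    """True when the remaining tokens are at or below a client-chosen reserve."""
    return int(headers["X-RateLimit-Remaining"]) <= reserve


headers = {
    "X-RateLimit-Limit": "10",
    "X-RateLimit-Remaining": "8",
    "X-RateLimit-Reset": "1779587687",
}
print(seconds_until_full(headers, now=1779587686.0))  # 1.0
print(should_pause(headers))                          # False
```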

When you hit the cap

HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1779587690
Content-Type: application/json

{
  "error": {
    "type": "rate_limit",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded for plan \"free\". Retry in 3s."
  }
}
Respect Retry-After: its value is a whole number of seconds to wait before retrying.
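
One way to honour Retry-After is a small retry wrapper. This is a sketch under assumptions: `send` is a hypothetical callable that performs the request and returns `(status, headers)`, the attempt limit is arbitrary, and the 1-second fallback for a missing header is a choice made for the example rather than documented behaviour. It does not inspect the response body.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting out Retry-After on each 429, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            return status, headers
        # Retry-After is whole seconds; fall back to 1 s if it is absent.
        sleep(int(headers.get("Retry-After", "1")))
    raise ValueError("max_attempts must be at least 1")


# Simulated exchange: one 429 with Retry-After: 3, then a success.
responses = iter([(429, {"Retry-After": "3"}), (200, {})])
waits = []
status, _ = call_with_retry(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [3]
```

Injecting `sleep` keeps the example testable without real delays.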

Hard caps

The token bucket above is about pacing. The hard cap is about cost: a customer-set ceiling on how many API calls an agent can consume per billing period. The effective cap on agent traffic is the lesser of the plan cap and the hard cap.

Plan caps come from the Billing page (free=500, solo=100_000, pro=1_000_000 calls per billing period). workspaces.hard_cap_api_calls is the customer-set ceiling on top of the plan cap. It is settable via PATCH /workspace and defaults to null (no extra limit beyond the plan).

The counter increments atomically per request via the consume_quota(workspace_id) PL/pgSQL function (SELECT FOR UPDATE on the workspace row → lazy month rollover → cap check → increment). The response’s error message names which kind of cap was hit (plan vs hard):
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": {
    "type": "rate_limit",
    "code": "cap_exceeded",
    "message": "Hard cap of 100000 calls exhausted this period. Raise the cap via PATCH /workspace or wait for the next calendar month."
  }
}
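
Both 429 variants share the type rate_limit, so a client that wants to react differently has to look at error.code. The following sketch shows one way to do that. The returned action names are hypothetical labels invented for the example, not part of the API.

```python
import json


def classify_429(body: str) -> str:
    """Map a 429 response body to a client action, based on error.code."""
    code = json.loads(body).get("error", {}).get("code")
    if code == "rate_limit_exceeded":
        # Pacing limit: wait for Retry-After, then retry.
        return "retry_after_delay"
    if code == "cap_exceeded":
        # Quota limit: an immediate retry will not help.
        return "stop_until_cap_raised_or_rollover"
    return "unknown"


pacing_body = '{"error": {"type": "rate_limit", "code": "rate_limit_exceeded"}}'
quota_body = '{"error": {"type": "rate_limit", "code": "cap_exceeded"}}'
print(classify_429(pacing_body))  # retry_after_delay
print(classify_429(quota_body))   # stop_until_cap_raised_or_rollover
```
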
Period rollover is lazy: it is checked on every call, so no cron job is required. The first request of a new calendar month resets the counter automatically.

Admin actions are exempt. The cap meters agent traffic only, meaning requests authed with sk_live_… API keys or salty_oat_… OAuth tokens. JWT-authed requests from the admin UI (billing, raising the cap, browsing /records) stay reachable when the cap is exhausted, so a customer is never trapped. See Concepts → Billing for the plan limits.
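
The rollover, cap check, and increment steps can be modelled in a few lines. This is a simplified single-process sketch of the logic as described above, not the PL/pgSQL function itself: it has no row locking, it does not model the admin exemption, and the field names `calls_this_period` and `period_start` are assumed for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

PLAN_CAPS = {"free": 500, "solo": 100_000, "pro": 1_000_000}


@dataclass
class Workspace:
    plan: str
    hard_cap_api_calls: Optional[int] = None  # None means no extra limit
    calls_this_period: int = 0                # hypothetical field name
    period_start: date = date(1970, 1, 1)     # hypothetical field name


def consume_quota(ws: Workspace, today: date) -> bool:
    """Model of rollover -> cap check -> increment; returns False when capped."""
    # Lazy rollover: reset the counter on the first call of a new calendar month.
    month_start = today.replace(day=1)
    if ws.period_start < month_start:
        ws.calls_this_period = 0
        ws.period_start = month_start
    # The effective cap is the lesser of the plan cap and the optional hard cap.
    cap = PLAN_CAPS[ws.plan]
    if ws.hard_cap_api_calls is not None:
        cap = min(cap, ws.hard_cap_api_calls)
    if ws.calls_this_period >= cap:
        return False
    ws.calls_this_period += 1
    return True


ws = Workspace(plan="free", hard_cap_api_calls=2)
print([consume_quota(ws, date(2026, 5, 31)) for _ in range(3)])  # [True, True, False]
print(consume_quota(ws, date(2026, 6, 1)))  # True: the new month resets the counter
```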

Idempotency + rate limits

Cached idempotency replays do count against the rate limit. The cache speeds up your application; it doesn’t grant immunity from pacing.