Buckets
| Bucket | Limit | Endpoints |
|---|---|---|
| criteria_ai | 2 req/s + 4 concurrent in-flight | POST /v1/jobs/{jobId}/question-sets, POST /v1/jobs/{jobId}/criteria/generate |
| scoring_intake_batch | 1 req/s | POST /v1/jobs/{jobId}/scoring-batches |
| scoring_intake_single | 10 req/s | POST /v1/jobs/{jobId}/applications/{applicationId}/scoring-jobs |
| read_and_ops | 20 req/s | All other /v1/* endpoints (except analytics and rate-limit status) |
| analytics | 20 req/s | GET /v1/analytics/* |
| rate_limit_status | 2 req/s | GET /v1/rate-limit-status |
The criteria_ai bucket also has a concurrency cap: at most 4 requests can be in flight at once, and any additional concurrent request receives a 429 even if per-second tokens are still available.
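To stay under the criteria_ai limits from the client side, you can pace calls with a token bucket and cap concurrency with a semaphore. The following is a minimal sketch, assuming a threaded Python client; `ClientThrottle` and its parameters are hypothetical names, not part of this API, and the server's own limiter may count tokens differently.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class ClientThrottle:
    """Hypothetical client-side pacer for one bucket.

    A token bucket refilled at `rate` tokens per second (burst capacity equal
    to `rate`), plus an optional cap on requests in flight at the same time.
    """

    def __init__(
        self,
        rate: float,
        max_in_flight: Optional[int] = None,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = rate
        self.tokens = rate  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()
        self.slots = (
            threading.BoundedSemaphore(max_in_flight) if max_in_flight else None
        )

    def _take_token(self) -> None:
        while True:
            with self.lock:
                now = self.clock()
                # Refill in proportion to elapsed time, capped at capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked meanwhile.
            self.sleep(wait)

    @contextmanager
    def request(self) -> Iterator[None]:
        """Hold an in-flight slot for the whole request; spend one token before sending."""
        if self.slots is not None:
            self.slots.acquire()
        try:
            self._take_token()
            yield
        finally:
            if self.slots is not None:
                self.slots.release()
```

A single shared `ClientThrottle(rate=2, max_in_flight=4)` would wrap each criteria_ai call in `with throttle.request():`. Client-side pacing only lowers the chance of a 429; the server stays authoritative, so still handle 429 responses as described under Handling 429.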
If you need higher limits, contact us through the Embed Portal.
Response headers
All /v1/* responses include the following headers:
| Header | Description |
|---|---|
| X-RateLimit-Bucket | Which bucket the request was classified into |
| X-RateLimit-Limit | Max requests per second for this bucket |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| X-RateLimit-Degraded | "true" when rate limiting is in degraded mode |
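One way to consume these headers is to read them into a small typed record after each response. This is a sketch only: `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, and it assumes Limit and Remaining are sent as integers and Reset as Unix seconds.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass
class RateLimitInfo:
    bucket: Optional[str]
    limit: Optional[int]        # X-RateLimit-Limit (assumed integer), requests per second
    remaining: Optional[int]    # X-RateLimit-Remaining (assumed integer)
    reset_at: Optional[float]   # X-RateLimit-Reset, Unix timestamp in seconds
    degraded: bool              # True when X-RateLimit-Degraded is "true"


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Hypothetical helper: read the X-RateLimit-* headers from a /v1/* response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def number(name: str, cast: Callable[[str], T]) -> Optional[T]:
        raw = h.get(name)
        return cast(raw) if raw is not None else None

    return RateLimitInfo(
        bucket=h.get("x-ratelimit-bucket"),
        limit=number("x-ratelimit-limit", int),
        remaining=number("x-ratelimit-remaining", int),
        reset_at=number("x-ratelimit-reset", float),
        degraded=h.get("x-ratelimit-degraded", "").lower() == "true",
    )
```

When `degraded` is true, treat the numeric fields as estimates rather than exact counts.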
Checking status
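GET /v1/rate-limit-status returns the state of every bucket in a single call and does not consume tokens from any other bucket; it counts only against its own rate_limit_status bucket (2 req/s). It requires only the Authorization header, not X-Tenant-Id.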
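A minimal sketch of the status call using only the Python standard library. The base URL is a placeholder, the function names are hypothetical, and the JSON decoding assumes the body is JSON, since its exact shape is not documented here. The caller passes the same Authorization value it sends on other /v1/* calls, so no particular auth scheme is assumed.

```python
import json
import urllib.request


def build_status_request(base_url: str, authorization: str) -> urllib.request.Request:
    """Build GET /v1/rate-limit-status; only Authorization is sent, no X-Tenant-Id."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/rate-limit-status",
        headers={"Authorization": authorization},
        method="GET",
    )


def fetch_rate_limit_status(base_url: str, authorization: str) -> object:
    # Assumes a JSON response body; its structure is not specified here.
    with urllib.request.urlopen(build_status_request(base_url, authorization)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because this endpoint has its own 2 req/s bucket, poll it sparingly rather than before every request.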
Handling 429
When a request is rate-limited, the API returns 429 with a Retry-After header giving the number of seconds to wait before retrying.
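A retry loop that honors Retry-After might look like the sketch below. The transport is abstracted as a hypothetical `send` callable returning a simplified `(status, headers)` pair, and `max_attempts` and `default_wait` are illustrative defaults, not documented values. If an Idempotency-Key is used, it is built once outside the loop so every attempt carries the same key.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

# Simplified response for illustration: (status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def send_with_retry(
    send: Callable[[Dict[str, str]], Response],
    idempotency_key: Optional[str] = None,
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Hypothetical retry wrapper: on 429, wait Retry-After seconds and resend."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers: Dict[str, str] = {}
    if idempotency_key is not None:
        # Built once, so every attempt reuses the same Idempotency-Key.
        headers["Idempotency-Key"] = idempotency_key
    attempt = 1
    while True:
        status, resp_headers = send(headers)
        if status != 429 or attempt >= max_attempts:
            return status, resp_headers
        lowered = {k.lower(): v for k, v in resp_headers.items()}
        try:
            wait = float(lowered["retry-after"])
        except (KeyError, ValueError):
            wait = default_wait  # header missing or not a number of seconds
        sleep(wait)
        attempt += 1
```

For scoring submissions, call it without `idempotency_key` and simply resend the same request.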
If you use Idempotency-Key on criteria or library mutations, reuse the same key when retrying after a 429. Rate limiting runs before the idempotency replay layer, so a request rejected with 429 is never recorded under its key, and retrying with that key is safe. Scoring submissions use built-in scoring idempotency instead, so retry the same scoring request directly.
Degraded mode
If Redis is temporarily unavailable, the system enters degraded mode: requests are allowed through (fail-open), X-RateLimit-Degraded: true is set, and header values are best-effort estimates. Normal limiting resumes automatically when Redis recovers.
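Since header values are only estimates in degraded mode, a client that paces itself from the headers may want a fallback. The sketch below is a hypothetical pacing policy, not behavior defined by the API: in degraded mode it spaces requests evenly at the bucket's nominal rate; otherwise it waits only when no requests remain, until X-RateLimit-Reset.

```python
import time
from typing import Mapping, Optional


def suggested_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Hypothetical pacing hint (seconds) from the X-RateLimit-* headers of the last response."""
    h = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    if h.get("x-ratelimit-degraded", "").lower() == "true":
        # Fail-open: the other headers are only estimates, so pace evenly
        # at the bucket's nominal per-second limit instead of trusting them.
        limit = float(h.get("x-ratelimit-limit", "1"))
        return 1.0 / limit if limit > 0 else 0.0
    if int(h.get("x-ratelimit-remaining", "1")) > 0:
        return 0.0
    # Nothing left in this window: wait until X-RateLimit-Reset (Unix seconds).
    return max(0.0, float(h.get("x-ratelimit-reset", now)) - now)
```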