Rate limits protect the API from traffic spikes while keeping throughput predictable. Every authenticated /v1/* request is rate-limited.
How it works
Rate limiting is partner-scoped: limits are shared across all tenants under the same API key and environment. Each request is classified into one of five buckets based on its HTTP method and path. Each bucket has its own per-second limit.
Buckets
| Bucket | Limit | Endpoints |
|---|---|---|
| criteria_ai | 2 req/s + 4 concurrent in-flight | POST /v1/jobs/{jobId}/question-sets, POST /v1/jobs/{jobId}/criteria/generate |
| scoring_intake_batch | 1 req/s | POST /v1/jobs/{jobId}/scoring-batches |
| scoring_intake_single | 10 req/s | POST /v1/jobs/{jobId}/applications/{applicationId}/scoring-jobs |
| read_and_ops | 20 req/s | All other /v1/* endpoints (reads, criteria CRUD, library, etc.) |
| rate_limit_status | 2 req/s | GET /v1/rate-limit-status |
Buckets are isolated: exhausting one bucket does not affect others.
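Because each bucket has its own independent per-second limit, a polite client can pace its own traffic per bucket before the server ever has to reject a request. The sketch below is a minimal client-side pacer using the limits from the table above; the class and method names are our own, and the server's actual algorithm may differ (this is pacing, not a full token bucket).

```python
import time

# Per-second limits from the bucket table above (client-side mirror).
BUCKET_LIMITS = {
    "criteria_ai": 2,
    "scoring_intake_batch": 1,
    "scoring_intake_single": 10,
    "read_and_ops": 20,
    "rate_limit_status": 2,
}

class BucketPacer:
    """Space requests so each bucket stays at or under its req/s limit.

    Buckets are tracked independently, mirroring the server-side
    isolation: waiting on one bucket never delays another.
    """

    def __init__(self, limits=BUCKET_LIMITS, clock=time.monotonic, sleep=time.sleep):
        self._interval = {bucket: 1.0 / rps for bucket, rps in limits.items()}
        self._next_ok = {}          # bucket -> earliest time the next call may go out
        self._clock = clock
        self._sleep = sleep

    def wait(self, bucket):
        """Block until the given bucket is allowed another request."""
        now = self._clock()
        ready = self._next_ok.get(bucket, now)
        if ready > now:
            self._sleep(ready - now)
            now = ready
        self._next_ok[bucket] = now + self._interval[bucket]
```

Call `pacer.wait("scoring_intake_batch")` immediately before each request in that bucket; injected `clock`/`sleep` keep the helper testable.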
In-flight concurrency (criteria_ai)
The criteria_ai bucket has an additional concurrency cap of 4 in-flight requests: at most 4 criteria-generation or question-set requests can be processing at the same time. If the in-flight limit is reached, the API returns 429 even when RPS tokens are still available.
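A client can mirror this cap with a semaphore so it never sends a fifth concurrent request and takes a guaranteed 429. A minimal sketch, assuming the cap of 4 from this doc; the helper name is our own:

```python
import threading

# Client-side guard mirroring the server's in-flight cap on the
# criteria_ai bucket (the value 4 is documented; this class is not
# part of the API).
CRITERIA_AI_MAX_IN_FLIGHT = 4
_criteria_ai_slots = threading.BoundedSemaphore(CRITERIA_AI_MAX_IN_FLIGHT)

def call_criteria_ai(send_request):
    """Run `send_request` while holding one of the 4 in-flight slots.

    `send_request` is any zero-argument callable that performs the
    actual HTTP POST; callers beyond the 4th block here instead of
    receiving a 429 from the in-flight cap.
    """
    with _criteria_ai_slots:
        return send_request()
```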
If you need higher limits, contact us through the Embed Portal.
All /v1/* responses include rate limit headers for the bucket the request was classified into:
```
X-RateLimit-Bucket: scoring_intake_single
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 8
X-RateLimit-Reset: 1734187201
```
| Header | Description |
|---|---|
| X-RateLimit-Bucket | Which rate limit bucket the request was classified into |
| X-RateLimit-Limit | Maximum requests allowed per second for this bucket |
| X-RateLimit-Remaining | Requests remaining in the current 1-second window |
| X-RateLimit-Reset | Unix timestamp when the current window resets |
| X-RateLimit-Degraded | Present and set to "true" when rate limiting is operating in degraded mode (see below) |
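A small helper can pull these headers into typed values after every response. This sketch assumes only the header names documented above; the function name is our own:

```python
def parse_rate_limit_headers(headers):
    """Extract the rate-limit fields set on every /v1/* response.

    `headers` is any string mapping, e.g. the headers attribute of an
    HTTP response object. X-RateLimit-Degraded is absent in normal
    operation, so it is read defensively.
    """
    return {
        "bucket": headers.get("X-RateLimit-Bucket"),
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset": int(headers["X-RateLimit-Reset"]),
        "degraded": headers.get("X-RateLimit-Degraded") == "true",
    }
```

Watching `remaining` lets you slow down before hitting 429 rather than after.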
Checking your status
Use GET /v1/rate-limit-status to check the status of all buckets at once without consuming rate limit tokens from any bucket other than rate_limit_status.
This endpoint does not require X-Tenant-Id, only a valid API key.
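As a sketch of calling this endpoint, the helper below just assembles the request parts. The base URL is a placeholder and the Bearer authorization scheme is an assumption (this doc only says "a valid API key"); substitute whatever your environment actually uses:

```python
def rate_limit_status_request(api_key, base_url="https://api.example.com"):
    """Build the parts of a GET /v1/rate-limit-status call.

    Per the docs, no X-Tenant-Id header is needed, only an API key.
    base_url is a placeholder and the Bearer scheme is an assumption;
    adjust both for your environment.
    """
    return {
        "method": "GET",
        "url": f"{base_url}/v1/rate-limit-status",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
```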
Handling 429 responses
When a rate limit is exceeded:
- HTTP status is 429
- The Retry-After header tells you how many seconds to wait
- Rate limit headers are included so you can see which bucket was exhausted
```json
{
  "type": "https://embed.nova.dweet.com/errors/rate-limited",
  "code": "RATE_LIMITED",
  "status": 429,
  "message": "Rate limit exceeded. Retry after 2 seconds.",
  "retryable": true,
  "traceId": "5c2f4f5b2c0a4ce0b6a31a1a18f8e9a1"
}
```
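Since 429 responses are retryable and carry Retry-After, a retry loop can be quite small. A minimal sketch, assuming a callable that returns (status, headers, body); the function name and tuple shape are our own:

```python
import time

def send_with_retry(send_request, max_attempts=5):
    """Retry on 429, sleeping for the Retry-After hint between attempts.

    `send_request` is a zero-argument callable returning
    (status, headers, body). Any non-429 response is returned as-is;
    after max_attempts the last 429 is returned to the caller.
    """
    for _ in range(max_attempts):
        status, headers, body = send_request()
        if status != 429:
            return status, headers, body
        # Retry-After is in seconds; fall back to 1s if it is missing.
        time.sleep(int(headers.get("Retry-After", "1")))
    return status, headers, body
```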
For high-volume backfills, use POST /v1/jobs/{jobId}/scoring-batches with up to 25 applications per request. This lets you score many candidates within the scoring_intake_batch bucket’s 1 req/s limit while keeping recovery clean if you hit 429.
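Splitting a backfill into batches of at most 25 is a one-liner worth getting right at the edges. A sketch using the documented 25-application maximum; the function name is our own:

```python
def chunk_applications(application_ids, batch_size=25):
    """Split application IDs into scoring-batch payloads.

    25 is the documented per-request maximum for
    POST /v1/jobs/{jobId}/scoring-batches; at 1 req/s this moves up to
    25 applications per second through the scoring_intake_batch bucket.
    """
    return [
        application_ids[i:i + batch_size]
        for i in range(0, len(application_ids), batch_size)
    ]
```

Send one batch per second (or pace with the rate limit headers) and a 429 only ever costs you a single batch retry.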
Degraded mode
Rate limiting uses Redis internally. If Redis becomes temporarily unavailable, the system enters degraded mode:
- Requests are allowed through (fail-open posture) to avoid blocking your integration
- The X-RateLimit-Degraded: true header is set on responses
- Rate limit values in headers and the status endpoint are best-effort estimates
Degraded mode is transient. Once Redis recovers, normal rate limiting resumes automatically.
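Because the server fails open in degraded mode, a well-behaved client may want to fall back to its own pacing when it sees the header. A trivial check, using the documented header name:

```python
def should_self_throttle(headers):
    """True when the response signals degraded mode.

    In degraded mode the server is not enforcing limits, so a polite
    client can switch to its own client-side pacing until the header
    disappears again.
    """
    return headers.get("X-RateLimit-Degraded") == "true"
```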