Rate limits protect the API from traffic spikes while keeping throughput predictable. Every authenticated /v1/* request is rate-limited.
How it works
Rate limiting is partner-scoped: limits are shared across all tenants under the same API key and environment. Each request is classified into one of five buckets based on its HTTP method and path. Each bucket has its own per-second limit.
Buckets
| Bucket | Limit | Endpoints |
|---|---|---|
| criteria_ai | 2 req/s + 4 concurrent in-flight | POST /v1/jobs/{jobId}/question-sets, POST /v1/jobs/{jobId}/criteria/generate |
| scoring_intake_batch | 1 req/s | POST /v1/jobs/{jobId}/scoring-batches |
| scoring_intake_single | 10 req/s | POST /v1/jobs/{jobId}/applications/{applicationId}/scoring-jobs |
| read_and_ops | 20 req/s | All other /v1/* endpoints (reads, criteria CRUD, library, etc.) |
| rate_limit_status | 2 req/s | GET /v1/rate-limit-status |
Buckets are isolated: exhausting one bucket does not affect others.
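Because each bucket has its own independent per-second limit, a polite client can pace its own traffic per bucket before the server ever has to reject a request. The sketch below is a minimal client-side pacer using the limits from the table above; the class and method names are our own, and the server's actual algorithm may differ (this is pacing, not a full token bucket).

```python
import time

# Per-second limits from the bucket table above (client-side mirror).
BUCKET_LIMITS = {
    "criteria_ai": 2,
    "scoring_intake_batch": 1,
    "scoring_intake_single": 10,
    "read_and_ops": 20,
    "rate_limit_status": 2,
}

class BucketPacer:
    """Space requests so each bucket stays at or under its req/s limit.

    Buckets are tracked independently, mirroring the server-side
    isolation: waiting on one bucket never delays another.
    """

    def __init__(self, limits=BUCKET_LIMITS, clock=time.monotonic, sleep=time.sleep):
        self._interval = {bucket: 1.0 / rps for bucket, rps in limits.items()}
        self._next_ok = {}          # bucket -> earliest time the next call may go out
        self._clock = clock
        self._sleep = sleep

    def wait(self, bucket):
        """Block until the given bucket is allowed another request."""
        now = self._clock()
        ready = self._next_ok.get(bucket, now)
        if ready > now:
            self._sleep(ready - now)
            now = ready
        self._next_ok[bucket] = now + self._interval[bucket]
```

Call `pacer.wait("scoring_intake_batch")` immediately before each request in that bucket; injected `clock`/`sleep` keep the helper testable.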
In-flight concurrency (criteria_ai)
The criteria_ai bucket has an additional concurrency cap of 4 in-flight requests: at most 4 criteria-generation or question-set requests can be processing at the same time. If the in-flight limit is reached, the API returns 429 even when RPS tokens are still available.
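A client can mirror this cap with a semaphore so it never sends a fifth concurrent request and takes a guaranteed 429. A minimal sketch, assuming the cap of 4 from this doc; the helper name is our own:

```python
import threading

# Client-side guard mirroring the server's in-flight cap on the
# criteria_ai bucket (the value 4 is documented; this class is not
# part of the API).
CRITERIA_AI_MAX_IN_FLIGHT = 4
_criteria_ai_slots = threading.BoundedSemaphore(CRITERIA_AI_MAX_IN_FLIGHT)

def call_criteria_ai(send_request):
    """Run `send_request` while holding one of the 4 in-flight slots.

    `send_request` is any zero-argument callable that performs the
    actual HTTP POST; callers beyond the 4th block here instead of
    receiving a 429 from the in-flight cap.
    """
    with _criteria_ai_slots:
        return send_request()
```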
If you need higher limits, contact us through the Embed Portal.
All /v1/* responses include rate limit headers for the bucket the request was classified into:
```
X-RateLimit-Bucket: scoring_intake_single
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 8
X-RateLimit-Reset: 1734187201
```
| Header | Description |
|---|---|
| X-RateLimit-Bucket | Which rate limit bucket the request was classified into |
| X-RateLimit-Limit | Maximum requests allowed per second for this bucket |
| X-RateLimit-Remaining | Requests remaining in the current 1-second window |
| X-RateLimit-Reset | Unix timestamp when the current window resets |
| X-RateLimit-Degraded | Present and set to "true" when rate limiting is operating in degraded mode (see below) |
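A small helper can pull these headers into typed values after every response. This sketch assumes only the header names documented above; the function name is our own:

```python
def parse_rate_limit_headers(headers):
    """Extract the rate-limit fields set on every /v1/* response.

    `headers` is any string mapping, e.g. the headers attribute of an
    HTTP response object. X-RateLimit-Degraded is absent in normal
    operation, so it is read defensively.
    """
    return {
        "bucket": headers.get("X-RateLimit-Bucket"),
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset": int(headers["X-RateLimit-Reset"]),
        "degraded": headers.get("X-RateLimit-Degraded") == "true",
    }
```

Watching `remaining` lets you slow down before hitting 429 rather than after.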
Checking your status
Use GET /v1/rate-limit-status to check the status of all buckets at once without consuming rate limit tokens from any bucket other than rate_limit_status.
This endpoint does not require X-Tenant-Id, only a valid API key.
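As a sketch of calling this endpoint, the helper below just assembles the request parts. The base URL is a placeholder and the Bearer authorization scheme is an assumption (this doc only says "a valid API key"); substitute whatever your environment actually uses:

```python
def rate_limit_status_request(api_key, base_url="https://api.example.com"):
    """Build the parts of a GET /v1/rate-limit-status call.

    Per the docs, no X-Tenant-Id header is needed, only an API key.
    base_url is a placeholder and the Bearer scheme is an assumption;
    adjust both for your environment.
    """
    return {
        "method": "GET",
        "url": f"{base_url}/v1/rate-limit-status",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
```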
Handling 429 responses
When a rate limit is exceeded:
- HTTP status is 429
- The Retry-After header tells you how many seconds to wait
- Rate limit headers are included so you can see which bucket was exhausted
```json
{
  "type": "https://embed.nova.dweet.com/errors/rate-limited",
  "code": "RATE_LIMITED",
  "status": 429,
  "message": "Rate limit exceeded. Retry after 2 seconds.",
  "retryable": true,
  "traceId": "5c2f4f5b2c0a4ce0b6a31a1a18f8e9a1"
}
```
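Since 429 responses are retryable and carry Retry-After, a retry loop can be quite small. A minimal sketch, assuming a callable that returns (status, headers, body); the function name and tuple shape are our own:

```python
import time

def send_with_retry(send_request, max_attempts=5):
    """Retry on 429, sleeping for the Retry-After hint between attempts.

    `send_request` is a zero-argument callable returning
    (status, headers, body). Any non-429 response is returned as-is;
    after max_attempts the last 429 is returned to the caller.
    """
    for _ in range(max_attempts):
        status, headers, body = send_request()
        if status != 429:
            return status, headers, body
        # Retry-After is in seconds; fall back to 1s if it is missing.
        time.sleep(int(headers.get("Retry-After", "1")))
    return status, headers, body
```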
For high-volume backfills, use POST /v1/jobs/{jobId}/scoring-batches with up to 25 applications per request. This lets you score many candidates within the scoring_intake_batch bucket’s 1 req/s limit while keeping recovery clean if you hit 429.
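Splitting a backfill into batches of at most 25 is a one-liner worth getting right at the edges. A sketch using the documented 25-application maximum; the function name is our own:

```python
def chunk_applications(application_ids, batch_size=25):
    """Split application IDs into scoring-batch payloads.

    25 is the documented per-request maximum for
    POST /v1/jobs/{jobId}/scoring-batches; at 1 req/s this moves up to
    25 applications per second through the scoring_intake_batch bucket.
    """
    return [
        application_ids[i:i + batch_size]
        for i in range(0, len(application_ids), batch_size)
    ]
```

Send one batch per second (or pace with the rate limit headers) and a 429 only ever costs you a single batch retry.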
Degraded mode
Rate limiting uses Redis internally. If Redis becomes temporarily unavailable, the system enters degraded mode:
- Requests are allowed through (fail-open posture) to avoid blocking your integration
- The X-RateLimit-Degraded: true header is set on responses
- Rate limit values in headers and the status endpoint are best-effort estimates
Degraded mode is transient. Once Redis recovers, normal rate limiting resumes automatically.
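Because the server fails open in degraded mode, a well-behaved client may want to fall back to its own pacing when it sees the header. A trivial check, using the documented header name:

```python
def should_self_throttle(headers):
    """True when the response signals degraded mode.

    In degraded mode the server is not enforcing limits, so a polite
    client can switch to its own client-side pacing until the header
    disappears again.
    """
    return headers.get("X-RateLimit-Degraded") == "true"
```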