Rate limits protect the API from traffic spikes while keeping throughput predictable. Every authenticated /v1/* request is rate-limited.

How it works

Rate limiting is partner-scoped: limits are shared across all tenants under the same API key and environment. Each request is classified into one of five buckets based on its HTTP method and path. Each bucket has its own per-second limit.

Buckets

Each bucket, its limit, and the endpoints it covers:
  • criteria_ai: 2 req/s, plus 4 concurrent in-flight. Covers POST /v1/jobs/{jobId}/question-sets and POST /v1/jobs/{jobId}/criteria/generate.
  • scoring_intake_batch: 1 req/s. Covers POST /v1/jobs/{jobId}/scoring-batches.
  • scoring_intake_single: 10 req/s. Covers POST /v1/jobs/{jobId}/applications/{applicationId}/scoring-jobs.
  • read_and_ops: 20 req/s. Covers all other /v1/* endpoints (reads, criteria CRUD, library, etc.).
  • rate_limit_status: 2 req/s. Covers GET /v1/rate-limit-status.
Buckets are isolated: exhausting one bucket does not affect others.

In-flight concurrency (criteria_ai)

The criteria_ai bucket has an additional concurrency cap of 4 in-flight requests: at most 4 criteria generation or question set requests can be processing simultaneously. If the in-flight cap is reached, the API returns 429 even if per-second tokens are still available.
If you need higher limits, contact us through the Embed Portal.

Rate limit headers

All /v1/* responses include rate limit headers for the bucket the request was classified into:
X-RateLimit-Bucket: scoring_intake_single
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 8
X-RateLimit-Reset: 1734187201
  • X-RateLimit-Bucket: which rate limit bucket the request was classified into.
  • X-RateLimit-Limit: maximum requests allowed per second for this bucket.
  • X-RateLimit-Remaining: requests remaining in the current 1-second window.
  • X-RateLimit-Reset: Unix timestamp when the current window resets.
  • X-RateLimit-Degraded: present and set to "true" when rate limiting is operating in degraded mode (see below).
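Given these headers, a client can decide whether to pause before its next request. A small sketch; header values are passed as a plain dict, and the function name is ours, not part of the API:

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request: 0 if tokens
    remain in the current window, otherwise the seconds until the
    X-RateLimit-Reset Unix timestamp passes."""
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    reset = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```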

Checking your status

Use GET /v1/rate-limit-status to check the status of all buckets at once; the call consumes tokens only from the rate_limit_status bucket, never from the others. This endpoint does not require an X-Tenant-Id header, only a valid API key.

Handling 429 responses

When a rate limit is exceeded:
  • HTTP status is 429
  • Retry-After header tells you how many seconds to wait
  • Rate limit headers are included so you can see which bucket was exhausted
{
  "type": "https://embed.nova.dweet.com/errors/rate-limited",
  "code": "RATE_LIMITED",
  "status": 429,
  "message": "Rate limit exceeded. Retry after 2 seconds.",
  "retryable": true,
  "traceId": "5c2f4f5b2c0a4ce0b6a31a1a18f8e9a1"
}
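A simple retry policy is to honor Retry-After when it is present and fall back to capped exponential backoff otherwise. A sketch under those assumptions; the function and parameter names are our own:

```python
def retry_delay(status, headers, attempt, max_backoff=30.0):
    """Return seconds to wait before retrying, or None if the
    response was not a 429 and no retry is needed. Prefers the
    Retry-After header; otherwise uses capped exponential backoff
    (2**attempt seconds, at most max_backoff)."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return min(max_backoff, 2.0 ** attempt)
```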
For high-volume backfills, use POST /v1/jobs/{jobId}/scoring-batches with up to 25 applications per request. This lets you score many candidates within the scoring_intake_batch bucket’s 1 req/s limit while keeping retries simple if you do hit a 429.
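The chunking step for such a backfill can be sketched as follows; batch_applications is a hypothetical helper, not an SDK function:

```python
def batch_applications(application_ids, batch_size=25):
    """Split application IDs into chunks of at most `batch_size`,
    matching the 25-application cap per scoring-batches request.
    Submit one chunk per second to respect the 1 req/s limit."""
    return [application_ids[i:i + batch_size]
            for i in range(0, len(application_ids), batch_size)]
```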

Degraded mode

Rate limiting uses Redis internally. If Redis becomes temporarily unavailable, the system enters degraded mode:
  • Requests are allowed through (fail-open posture) to avoid blocking your integration
  • The X-RateLimit-Degraded: true header is set on responses
  • Rate limit values in headers and the status endpoint are best-effort estimates
Degraded mode is transient. Once Redis recovers, normal rate limiting resumes automatically.
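It can be worth detecting degraded mode in your client, for example to emit a metric or log line so you notice while limits are only best-effort. A one-line check (the helper name is ours):

```python
def is_degraded(headers):
    """True when the X-RateLimit-Degraded header reports that the
    limiter is in fail-open (degraded) mode."""
    return headers.get("X-RateLimit-Degraded", "").strip().lower() == "true"
```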