Nova uses bias testing to look for unfair scoring patterns in AI-assisted candidate review. The goal is practical: find role-irrelevant differences in scores or rankings before they affect hiring workflows. Bias testing is one control, not a guarantee. It works alongside job-relevant criteria, explainable scoring, human review, data protection controls, and customer hiring policies.
Recruitment and candidate-evaluation AI systems may be subject to additional legal requirements in some jurisdictions, including the EU. This page focuses only on Nova’s bias-testing method.

What We Test

Nova’s current public bias test is a synthetic cohort stress test. It varies demographic-linked signals across generated profiles, scores the cohort, and flags groups with lower selection rates for closer inspection before the result is interpreted. We test for:
  • Average score differences across demographic groups.
  • Selection-rate differences at defined scoring thresholds.
  • Race and sex intersections where the sample size supports it.
  • Criteria or scoring patterns that could introduce role-irrelevant proxy signals.
  • Whether assessments remain tied to job criteria and candidate evidence.
The current public methodology covers sex, race and ethnicity, age, disability status, and race x sex intersections using synthetic candidate profiles.
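The "where the sample size supports it" condition for intersections can be illustrated with a minimal sketch. It assumes each scored profile is a dictionary with `race`, `sex`, and `score` keys, and it uses a hypothetical minimum cell size of 30. Neither the field names nor the cutoff come from Nova's methodology; they are placeholders for illustration only.

```python
from collections import defaultdict

# Hypothetical minimum cell size; the methodology does not state one.
MIN_GROUP_SIZE = 30


def intersection_groups(profiles, min_size=MIN_GROUP_SIZE):
    """Group scores by (race, sex) and keep only cells with enough profiles.

    Returns (kept, skipped): kept maps each (race, sex) cell to its scores,
    skipped lists the cells that were too small to compare.
    """
    cells = defaultdict(list)
    for profile in profiles:
        cells[(profile["race"], profile["sex"])].append(profile["score"])

    kept = {cell: scores for cell, scores in cells.items() if len(scores) >= min_size}
    skipped = sorted(cell for cell in cells if cell not in kept)
    return kept, skipped
```

Returning the skipped cells alongside the kept ones makes it visible which intersections were left out of the comparison, rather than dropping them silently.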

Methodology

  1. Define the role and criteria
     We start with a representative job, job description, and scoring criteria. The criteria should be job-relevant, resume-verifiable, and separated by importance.

  2. Create synthetic candidate profiles
     We generate resumes with broadly comparable role-relevant qualifications and varied demographic-linked signals, such as names, locations, education signals, age-related experience patterns, or disability-related wording.

  3. Score through an evaluation harness
     The synthetic profiles are scored through a Nova scoring evaluation harness. The output is the candidate score and supporting evidence used for review.

  4. Compare outcomes
     We compare score distributions and selection rates across groups. The main check asks whether one group passes the review threshold much less often than another comparable group.

  5. Review and remediate
     If a test shows a material adverse-impact signal, we review the criteria, test profiles, and scoring evidence before treating the result as acceptable.
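The comparison step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nova's evaluation harness: the input is assumed to be a list of `(group, score)` pairs, the function name is hypothetical, and a score at or above the threshold is assumed to count as passing.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(records, threshold):
    """Summarize scores per group.

    records: iterable of (group, score) pairs (assumed input shape).
    threshold: review threshold; scores >= threshold count as selected
               (an assumption about how ties at the threshold are treated).
    """
    scores = defaultdict(list)
    for group, score in records:
        scores[group].append(score)

    return {
        group: {
            "n": len(values),
            "mean_score": mean(values),
            "selection_rate": sum(v >= threshold for v in values) / len(values),
        }
        for group, values in scores.items()
    }
```

Keeping `n` next to each rate helps a reviewer judge whether a gap between groups rests on enough profiles to be meaningful.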

Metrics

Metric | What it means
Score distribution | Whether one group receives materially different scores from another group.
Selection rate | The share of a group scoring above the selected review threshold.
Impact ratio | A group’s selection rate divided by the highest selection rate in the comparison.
Intersectional impact ratio | The same comparison across combined groups, such as race x sex.
Nova uses the four-fifths rule as one practical benchmark: a selection rate below 80% of the highest group rate is treated as a signal to review. For example, if 50% of the highest-selection group passes a review threshold, a comparable group below 40% would trigger review.
The four-fifths rule is a screening benchmark, not a complete fairness test. Small samples, unusual candidate pools, and role-specific requirements can all affect interpretation.
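The impact ratio and the four-fifths benchmark described above can be worked through in a short sketch. The function names and group labels are hypothetical, and the input is assumed to be a mapping from group to selection rate; the sample rates are made up to mirror the 50% example, not taken from any Nova run.

```python
def impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest rate in the comparison."""
    top = max(selection_rates.values())
    if top == 0:
        raise ValueError("no group passed the threshold; impact ratio is undefined")
    return {group: rate / top for group, rate in selection_rates.items()}


def flag_for_review(selection_rates, benchmark=0.8):
    """Return groups whose impact ratio is below the benchmark (four-fifths by default)."""
    ratios = impact_ratios(selection_rates)
    return sorted(group for group, ratio in ratios.items() if ratio < benchmark)


# Illustrative rates only: the highest group passes 50% of the time.
rates = {"group_a": 0.50, "group_b": 0.38, "group_c": 0.42}
flagged = flag_for_review(rates)
```

Here `group_b` has an impact ratio of 0.38 / 0.50 = 0.76 and is flagged, while `group_c` at 0.42 / 0.50 = 0.84 is not. A flag is a prompt for review, consistent with the benchmark being a screening signal and not a complete fairness test.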

Result Labels

Label | Meaning
Clear | The synthetic test did not find a material adverse-impact signal.
Review | The test found a possible signal, so a person should review the result before drawing conclusions.
Concern | The test found a stronger signal. Nova should investigate the criteria, test data, or scoring behavior before relying on that setup.
These labels describe this synthetic test result, not every role or customer workflow.

Current Public Evaluation

Nova has a public bias evaluation summary at nova.dweet.com/bias-evaluation. The public run was generated on May 26, 2025. It uses 500 synthetic profiles for a Senior Software Engineer role and compares scoring patterns across sex, race and ethnicity, age, disability status, and race x sex intersections. Use that page as the product-level summary of the public run. The public run is limited to a representative role and synthetic profiles. It does not evaluate customer-specific criteria, recruiter behavior, interview decisions, final hiring decisions, sourcing, ATS filters, or real applicant pools.

How We Use Findings

When a test raises a signal, Nova works through the following steps:
  1. Check whether the synthetic profiles are comparable and realistic.
  2. Check whether the criterion is job-relevant and verifiable.
  3. Look for role-irrelevant proxies, such as school prestige, geography, name signals, age proxies, or accommodation wording.
  4. Review candidate-level scoring evidence to see what drove the score.
  5. Adjust criteria or test data where needed.
  6. Re-run the test and keep the before/after evidence.

Customer Responsibilities

Nova can test and explain its scoring behavior, but customers still control the hiring process. You should:
  • Configure job-relevant criteria.
  • Avoid criteria that rely on protected characteristics or weak proxies.
  • Review Nova outputs before making employment decisions.
  • Provide candidate, employee, or worker notices required by your policies and applicable law.
For scoring criteria guidance, see Configuring Scoring Criteria. For human review guidance, see AI Candidate Scoring.

How To Read The Results

Bias testing can show useful evidence, but it should be read in context:
  • Synthetic tests are designed to stress-test scoring. They do not represent every real applicant pool.
  • A clean result for one role does not prove the same result for every role.
  • Real-world outcomes also depend on sourcing, human review, interviews, and final decisions.
This is why Nova treats scoring as decision support. Human reviewers stay responsible for the final hiring process.