AI in recruitment: CV extraction and legal compliance

A recruiter in a company with a thousand employees may review tens of thousands of CVs in a year. For a single specialist position, 300-600 applications can arrive within a week. Reviewing them all manually takes dozens of hours, and errors inevitably appear after the fourth hour of sifting through near-identical documents.

Where AI actually helps in HR#

The fastest ROI comes from three tasks that share common traits: they are repetitive, measurable, and previously done entirely manually.

Extraction of structured data from CVs. The model reads the document (PDF, DOCX, scan) and extracts fields into the ATS: first and last name, current title, years of experience, listed skills, universities, certifications, languages. This is what a recruiter does in the first 90 seconds of review—just without fatigue and with a full log. See OCR and extraction in the glossary.
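
A minimal sketch of what the extraction target could look like, assuming the model is asked to answer in JSON. The `CandidateProfile` schema, its field names, and `parse_model_output` are hypothetical illustrations, not a standard ATS format; the point is only that the model's answer is validated before it is written anywhere.

```python
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class CandidateProfile:
    """Hypothetical target schema; field names are illustrative only."""
    full_name: str
    current_title: str
    years_experience: float
    skills: list[str] = field(default_factory=list)
    universities: list[str] = field(default_factory=list)
    certifications: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)


REQUIRED = {"full_name", "current_title", "years_experience"}


def parse_model_output(raw_json: str) -> CandidateProfile:
    """Validate the model's JSON answer before it reaches the ATS."""
    data = json.loads(raw_json)
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    known = {f.name for f in fields(CandidateProfile)}
    # Drop keys outside the schema so unexpected fields never reach the ATS.
    clean = {k: v for k, v in data.items() if k in known}
    clean["years_experience"] = float(clean["years_experience"])
    return CandidateProfile(**clean)


raw = (
    '{"full_name": "Jan Kowalski", "current_title": "ERP Consultant", '
    '"years_experience": 7, "skills": ["SAP", "SQL"], "hobby": "chess"}'
)
profile = parse_model_output(raw)
print(asdict(profile))
```

Rejecting incomplete answers and discarding unknown keys keeps the ATS record predictable even when the model output drifts.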

Initial classification against job requirements. A classifier compares extracted features with the job profile and divides applications into three groups: meets mandatory requirements, partially meets, does not meet. The recruiter immediately sees which group to review first.
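
The three-group split can be sketched as a plain rule over mandatory requirements. This is an illustration under stated assumptions, not a description of a production classifier: `JobProfile`, the group labels, and the rule "any mandatory match counts as partial" are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical job profile: only mandatory requirements drive the split."""
    mandatory: frozenset[str]


def classify(candidate_skills: set[str], job: JobProfile) -> str:
    """Return one of three groups based on mandatory requirement coverage."""
    have = {s.lower() for s in candidate_skills}
    mandatory = {s.lower() for s in job.mandatory}
    matched = mandatory & have
    if matched == mandatory:
        return "meets"
    if matched:
        # Assumption: at least one mandatory requirement matched.
        return "partially_meets"
    return "does_not_meet"


job = JobProfile(mandatory=frozenset({"SAP", "SQL"}))
groups = {
    "a": classify({"sap", "sql", "python"}, job),
    "b": classify({"SQL"}, job),
    "c": classify({"React"}, job),
}
print(groups)
```

The output is a review priority for the recruiter, not a decision about the candidate.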

Search assistant for the candidate database. RAG (retrieval-augmented generation) over an existing application database allows natural-language queries such as “show candidates with ERP implementation experience in manufacturing who applied in the last 12 months.” The existing database does not have to be rewritten; the retrieval layer is built on top of it.
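
The retrieval half of such an assistant can be sketched as follows. To stay self-contained, the example scores candidates by keyword overlap, which is a deliberate stand-in for the embedding similarity a real RAG setup would typically use. `Application`, `retrieve`, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Application:
    candidate_id: str
    applied_on: date
    summary: str  # anonymized text produced by the extraction step


def tokenize(text: str) -> set[str]:
    """Lowercase words longer than two characters, punctuation stripped."""
    return {w.strip(".,;:()").lower() for w in text.split() if len(w) > 2}


def retrieve(query, applications, today, max_age_days=365, top_k=3):
    """Return the best-matching recent applications for a text query."""
    cutoff = today - timedelta(days=max_age_days)
    query_tokens = tokenize(query)
    scored = []
    for app in applications:
        if app.applied_on < cutoff:
            continue  # outside the requested time window
        overlap = len(query_tokens & tokenize(app.summary))
        if overlap:
            scored.append((overlap, app))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [app for _, app in scored[:top_k]]


apps = [
    Application("c-001", date(2025, 3, 10), "Led ERP implementation at a manufacturing plant"),
    Application("c-002", date(2023, 1, 5), "ERP implementation in manufacturing, SAP rollout"),
    Application("c-003", date(2025, 6, 1), "Frontend developer, React and TypeScript"),
]
hits = retrieve("ERP implementation experience in manufacturing", apps, today=date(2025, 9, 1))
print([a.candidate_id for a in hits])
```

In a full pipeline the retrieved summaries would be passed to the model as context for the answer; only the retrieval step is shown here.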

System architecture: what runs locally, what in the cloud#

Application documents often contain PII: PESEL (the Polish national identification number, found in older CV templates), date of birth, residential address, photo, and health information in cover letters. The rule: PII is masked before anything is sent to an external model.

In practice, this works as follows: a local OCR/parser component extracts text and anonymizes or pseudonymizes protected fields before the text reaches the inference layer. For companies with strict confidentiality requirements, the entire pipeline can run on their own infrastructure. See self-hosting and data residency.
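
A minimal sketch of the local pseudonymization step, assuming regex detection is enough for structured identifiers. The patterns, the token format, and `pseudonymize` are hypothetical, and the approach has a clear limit: names, addresses, and health details in free text are not caught by regexes and need a dedicated detection component.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real CVs need broader coverage and testing.
PATTERNS = {
    "PESEL": re.compile(r"\b\d{11}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\+48[ -]?)?\d{3}[ -]?\d{3}[ -]?\d{3}(?!\d)"),
}


def pseudonymize(text: str, secret: bytes) -> str:
    """Replace detected identifiers with keyed tokens before text leaves the host."""

    def token(kind: str, value: str) -> str:
        # Keyed hash: the same value maps to the same token, but the token
        # cannot be reversed without the locally stored secret.
        digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
        return f"[{kind}:{digest}]"

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: token(k, m.group()), text)
    return text


sample = "PESEL 90010112345, jan.kowalski@example.com, tel. +48 600 700 800"
masked = pseudonymize(sample, secret=b"local-secret-never-sent")
print(masked)
```

Because the tokens are deterministic per secret, records about the same candidate can still be linked locally without exposing the raw identifier to the inference layer.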

Component	Can run remotely	Must run locally
Text parser for PDF/DOCX	Yes (stateless, no PII)	Only if the file itself is sensitive data
PII anonymization	No (raw data is never sent out)	Always
Feature extraction from anonymized text	Yes	No
Candidate classifier	Yes (features without PII)	No
Results storage	No (company database)	Always
Full pipeline for medical data	No	Always

Bias and pitfalls no one talks about#

Models learn from historical data. If your company has hired mostly men from one university for a given position over the past 10 years, the model—without proper audit—will replicate this pattern and reward the same traits. Scale is deceptive: with manual review, bias spreads slowly; with a model classifying 500 CVs per minute, systematic error strikes immediately and at scale.

Practical countermeasures:

Remove features unrelated to the position from the profile passed to the classifier: first and last name (indicators of gender and ethnicity), age, photo.
Test the classifier on a test set with a controlled demographic distribution and measure whether recommendation rates are comparable across groups.
Log every decision with evidence: which features influenced the recommendation. Without logs, you cannot demonstrate lack of discrimination.
Mandate regular audits of results in the system documentation, not just a single audit at deployment.
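
The first two countermeasures can be sketched in a few lines. This is an illustration only: the protected field names, the audit data, and the helper functions are hypothetical, and the ratio that counts as acceptable is a policy decision the code does not make.

```python
from collections import defaultdict

PROTECTED = ("first_name", "last_name", "age", "photo")


def drop_protected(features: dict) -> dict:
    """Strip fields unrelated to the position before classification."""
    return {k: v for k, v in features.items() if k not in PROTECTED}


def recommendation_rates(records) -> dict:
    """records: iterable of (group_label, recommended) from a labelled audit set."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, recommended in records:
        totals[group] += 1
        positives[group] += int(recommended)
    return {g: positives[g] / totals[g] for g in totals}


def rate_ratio(rates: dict) -> float:
    """Lowest rate divided by highest rate; 1.0 means identical rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical audit set: 100 candidates per group.
audit = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 25 + [("group_b", False)] * 75
)
rates = recommendation_rates(audit)
print(rates, rate_ratio(rates))
```

The demographic labels exist only in the audit set; they are never passed to the classifier itself.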

This is not excessive caution. It’s an AI Act requirement for high-risk systems in employment.

AI Act and GDPR: implications for recruitment systems#

The AI Act explicitly lists recruitment and candidate evaluation as a high-risk area. This entails specific obligations:

Obligation	What it means in practice
Technical documentation	Description of the model, training data, quality metrics, limitations
Human oversight	Recruiter sees AI recommendation and can reject it; decision rests with human
Explainability	Candidate can ask why they didn’t proceed; system must provide an answer
Log registry	Every candidate evaluation logged with timestamp, model version, feature set
DPIA	Data protection impact assessment when processing large numbers of candidates
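
The log registry row can be illustrated with a small record builder. This is a sketch under assumptions: the field names, the JSON Lines storage, and the checksum are hypothetical choices, not a format prescribed by the regulation.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_evaluation_record(candidate_ref, model_version, features, recommendation, reviewer=None):
    """Build one log entry per candidate evaluation."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "candidate_ref": candidate_ref,  # pseudonymous reference, not a name
        "model_version": model_version,
        "features": features,  # the feature set the model actually saw
        "recommendation": recommendation,
        "human_reviewer": reviewer,  # filled in once a recruiter has acted
    }
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    # Checksum over the canonical form makes later tampering detectable.
    record["checksum"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


def append_record(path, record):
    """Append the record as one JSON line to a local log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")


record = build_evaluation_record(
    "cand-42", "classifier-v1", {"mandatory_met": True, "years_experience": 7}, "meets"
)
print(record["timestamp"], record["checksum"][:12])
```

Storing the feature set alongside the model version is what later allows an answer to a candidate asking why they did not proceed.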

Key boundary: a system that only extracts and organizes data without scoring or ranking candidates is not automatically high-risk. That’s why many implementations deliberately separate extraction (limited risk) from pre-selection (high risk) and document both layers separately.

For details on the legal regime, see the article AI Act and GDPR in 2026.

Integration with ATS and recruitment CRM#

Most companies already use some form of applicant tracking system (Workday, SAP SuccessFactors, Greenhouse, Teamtailor, local solutions). The AI layer doesn’t replace the ATS—it feeds it data.

The most common integration pattern:

1. Candidate submits an application via standard channel (form, email, portal).
2. Webhook or polling retrieves the new file and passes it to the extraction pipeline.
3. Pipeline returns a JSON structure to the ATS (standard fields + match features).
4. Recruiter sees the candidate with a populated profile and AI annotation, not the raw CV.
5. Recruiter’s action (rejection, invitation, transfer) is a business event, not an automated system decision.

Steps 2-3 are code and configuration. Step 5 is the boundary that is never crossed without human approval. See human-handoff in the glossary.
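
The split between the automated steps and the human gate can be sketched as two functions. Everything here is hypothetical: the payload keys, `handle_new_application`, `record_action`, and the stub extractor are illustrations, not the API of any particular ATS.

```python
import json
from typing import Callable, Optional

ALLOWED_ACTIONS = {"rejection", "invitation", "transfer"}


class HumanApprovalRequired(Exception):
    """Raised when a candidate action arrives without a recruiter behind it."""


def handle_new_application(file_bytes: bytes, extract: Callable[[bytes], dict]) -> str:
    """Steps 2-3: run the extraction pipeline and build the JSON sent to the ATS."""
    result = extract(file_bytes)
    payload = {
        "standard_fields": result.get("standard_fields", {}),
        "match_features": result.get("match_features", {}),
        "ai_annotation": result.get("annotation", ""),
        # The pipeline never sets a decision; that field belongs to step 5.
        "decision": None,
    }
    return json.dumps(payload, ensure_ascii=False)


def record_action(candidate_ref: str, action: str, recruiter_id: Optional[str]) -> dict:
    """Step 5: accept an action only as a business event tied to a recruiter."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if not recruiter_id:
        raise HumanApprovalRequired(f"{action} for {candidate_ref} needs a recruiter")
    return {"candidate_ref": candidate_ref, "action": action, "decided_by": recruiter_id}


def fake_extract(_: bytes) -> dict:
    # Stand-in for the real extraction pipeline.
    return {
        "standard_fields": {"current_title": "ERP Consultant"},
        "match_features": {"mandatory_met": True},
        "annotation": "Meets mandatory requirements",
    }


ats_payload = json.loads(handle_new_application(b"%PDF-...", fake_extract))
event = record_action("cand-42", "invitation", recruiter_id="recruiter-7")
print(ats_payload["decision"], event["decided_by"])
```

Making the recruiter identifier a required input means the gate is enforced in code rather than left to process discipline.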

How to start: pilot on one channel#

Before rolling out the system across all recruitments, launch a pilot on one position or application channel. Choose a recruitment with a high number of applications and low risk (e.g., operational roles, not managerial). Measure baseline review time before implementation. After the pilot, measure it again.

If the result is positive, you have hard data for scaling. If the model makes systematic errors, you catch them on a small sample before wrongly rejecting a hundred candidates.
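
The before-and-after comparison is simple arithmetic. The sketch below uses invented figures purely to show the calculation; `pilot_summary` and its inputs are hypothetical and say nothing about the results a real pilot would produce.

```python
def pilot_summary(baseline_min_per_cv: float, pilot_min_per_cv: float, applications: int) -> dict:
    """Compare review time per CV before and during the pilot."""
    if baseline_min_per_cv <= 0:
        raise ValueError("baseline must be positive")
    saved_minutes = (baseline_min_per_cv - pilot_min_per_cv) * applications
    return {
        "hours_saved": saved_minutes / 60,
        "time_reduction": 1 - pilot_min_per_cv / baseline_min_per_cv,
    }


# Invented example: 6 minutes per CV before, 3 minutes during the pilot, 400 applications.
summary = pilot_summary(6.0, 3.0, 400)
print(summary)
```

Time saved is only half of the picture; the share of wrongly classified applications in a manually checked sample belongs in the same report.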

Check the organizational readiness assessment and ROI calculator to quantify recovered hours before making a decision.

Try it live#

Paste a job description and sample requirements, and the model will extract a structured profile for classification and indicate which features should remain outside AI evaluation for legal reasons. The playground masks PII and retains no data.

FAQ#

Can AI independently reject applications?#

It shouldn’t, if we’re talking about an AI Act-compliant system. Recruitment and candidate evaluation are explicitly listed as high-risk areas, meaning human oversight is mandatory. AI can prepare recommendations and prioritize reviews, but the decision to reject or invite rests with the recruiter. A human gate for irreversible actions is a legal requirement here, not just a best practice.

Which CV data is protected by GDPR and shouldn’t be sent to the model?#

Direct identifiers (first name, last name, PESEL, address) and sensitive attributes (age, gender visible from name or photo, health information, religion, union affiliation). A good implementation pseudonymizes or removes these fields before passing the text to the inference layer. The extraction result is stored in the company’s system, not in a public cloud.

Do small HR departments also need bias audits?#

Yes. Bias doesn’t depend on department size but on the data the model was trained on. If you use an off-the-shelf external model, ask the provider for training dataset documentation and bias audit results. If you build your own classifier on historical company data, an audit is mandatory before production deployment.

How much does implementing AI in recruitment cost?#

Cost depends on scope: CV extraction alone without ATS integration is significantly cheaper than a full pipeline with a classifier, logging, and compliance dashboard. Instead of quoting numbers that don’t account for your specifics, we invite you to use the ROI calculator and contact us. Typically, the best approach is to start with a fixed-cost pilot and verify the results before scaling.

How quickly can a pilot be launched?#

For a typical HR department, a pilot on one position (CV extraction + initial classification) can usually be launched within a few weeks, depending on input data format and integration level with the existing ATS. We start with a data audit and readiness assessment, and the first measurable results appear in the first recruitment after launch. For details, see the article how to start AI implementation.
