Aadhaar
- name
- dob
- gender
- aadhaar_masked
- address
- face_crop
Send an Aadhaar, a PAN, a passport, a driving licence, or a voter ID. The agent figures out which one it is, pulls the fields, validates the format and checksum, parses the MRZ when there is one, and crops the portrait for your liveness check. PII is masked by default — you opt in to see the rest.
Most ID extraction tools were built for one job: pull text from a card. That falls apart the moment the format changes, the photo is rotated, or your compliance team asks who signed off on the field.
| Capability | Generic ID OCR | Template-based KYC | DocPro ID Agent |
|---|---|---|---|
| Handles new card revisions without retraining | No | Re-template needed | Yes |
| Validates format and checksum per ID type | No | Format only | Both |
| Parses passport MRZ to ICAO 9303 | Lines as text | Sometimes | Full, with check digits |
| Cross-checks visual zone against MRZ | No | No | Yes |
| Front-back reconciliation on two-sided cards | No | Manual stitch | Built in |
| PII masked by default on response | No | No | Yes |
| Confidence score per field | No | Per doc only | Yes |
| Face crop ready for liveness providers | No | Bolt-on | Yes |
| Audit log: who, what, when, which model | No | Varies | Default |
| Data residency in India | Some vendors | Yes | Yes (default) |
The agent ships with handlers for the documents that show up in 95% of KYC workflows. Anything else is a custom agent we build during onboarding.
Standard cards, e-Aadhaar PDFs from the UIDAI portal, the masked print version, and DigiLocker-issued downloads. Verhoeff checksum is validated on every extraction.
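For readers who want to see what the Verhoeff check involves, here is a minimal sketch using the standard published Verhoeff tables. The function names are ours for illustration and are not part of the DocPro API; the agent's own implementation may differ.

```python
# Standard Verhoeff tables: D is the multiplication table of the dihedral
# group D5, P the position-dependent permutation, INV the inverse of each digit.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> int:
    """Return the check digit to append to a string of digits."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """True when the last digit is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def aadhaar_checksum_ok(raw: str) -> bool:
    """Hypothetical helper: 12 ASCII digits (spaces or hyphens allowed) plus Verhoeff."""
    digits = raw.replace(" ", "").replace("-", "")
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    return verhoeff_is_valid(digits)
```

Verhoeff catches every single-digit typo and every swap of two adjacent digits, which is why it suits numbers that people read off a card and key in by hand.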
Both physical and e-PAN. The 10-character format is validated: five letters, four digits, one letter, with the fourth letter indicating the holder type. Father's name and signature region captured where present.
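A rough sketch of the PAN format check described above. The holder-type letters are the commonly documented ones; `check_pan` and the choice to return a warning for an unrecognised holder letter are our assumptions, not DocPro's documented behaviour.

```python
import re

# Five letters, four digits, one letter.
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Fourth character of the PAN: the holder type.
HOLDER_TYPES = {
    "P": "Individual",
    "C": "Company",
    "H": "Hindu Undivided Family",
    "F": "Firm",
    "A": "Association of Persons",
    "T": "Trust",
    "B": "Body of Individuals",
    "L": "Local Authority",
    "J": "Artificial Juridical Person",
    "G": "Government",
}


def check_pan(pan: str) -> dict:
    """Return a status (pass, fail, or warning) and the decoded holder type."""
    value = pan.strip().upper()
    if not PAN_RE.fullmatch(value):
        return {"status": "fail", "holder_type": None}
    holder = HOLDER_TYPES.get(value[3])
    if holder is None:
        # Assumption: structure is fine but the holder letter is unknown,
        # so flag it for review rather than reject outright.
        return {"status": "warning", "holder_type": None}
    return {"status": "pass", "holder_type": holder}
```

For example, `check_pan("ABCPK1234F")` passes with holder type `Individual`, while a value with only four leading letters fails the structural check.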
The two-line MRZ (44 characters per line) is parsed and each of the five check digits is verified. We surface the visual zone and the MRZ as separate objects so you can compare them rather than silently trusting one over the other.
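The five check digits follow the ICAO 9303 scheme: weights 7, 3, 1 repeating, letters valued 10 to 35, the filler `<` valued 0, and the sum taken modulo 10. A minimal sketch for the second line of a passport-size (TD3) MRZ follows; the function names are illustrative and not part of the DocPro API.

```python
WEIGHTS = (7, 3, 1)


def mrz_char_value(ch: str) -> int:
    """Digits keep their value, A-Z map to 10-35, the filler '<' is 0."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    return sum(mrz_char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def _matches(field: str, digit: str, allow_blank: bool = False) -> bool:
    # The optional personal-number field may carry '<' as its check digit
    # when the field itself is left empty.
    if digit == "<":
        return allow_blank and set(field) == {"<"}
    return digit in "0123456789" and int(digit) == mrz_check_digit(field)


def verify_td3_line2(line2: str) -> dict:
    """Check the five check digits on the second line of a TD3 MRZ."""
    if len(line2) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")
    composite = line2[0:10] + line2[13:20] + line2[21:43]
    return {
        "document_number": _matches(line2[0:9], line2[9]),
        "date_of_birth": _matches(line2[13:19], line2[19]),
        "date_of_expiry": _matches(line2[21:27], line2[27]),
        "personal_number": _matches(line2[28:42], line2[42], allow_blank=True),
        "composite": _matches(composite, line2[43]),
    }
```

Run against the ICAO specimen line `L898902C36UTO7408122F1204159ZE184226B<<<<<10`, all five entries come back `True`. Change a single digit of the date of birth and both that field's check and the composite check fail.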
Indian DLs vary wildly across states — old smart cards, new PVC, paper extensions, DigiLocker downloads. The agent treats each state as its own layout instead of forcing a national template.
Same flow as the rest of DocPro. Upload, classify, validate, return. The difference for IDs is what happens around the data: masking, face cropping, and the second-pass MRZ check.
Front and back as separate files, or both in one PDF. Photos taken on a phone are fine. The API accepts JPG, PNG, HEIC, and PDF up to 20 MB by default.
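If you want to reject obviously unsuitable files before they leave the client, a pre-check along these lines is enough. It is a sketch under assumptions: `precheck_upload` is a hypothetical helper, "20 MB" is read here as 20 × 1024 × 1024 bytes, and only the file extension is inspected, not the file contents.

```python
from pathlib import PurePath

# Formats named above; ".jpeg" is included as the usual alias for JPG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".pdf"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: the default limit read as 20 MiB


def precheck_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES} byte limit")
    return problems
```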
The agent identifies which ID type the image is, even when the photo is rotated, partially obscured, or shot at an angle. Deskew runs before extraction, not after.
Format checks per ID type. PAN structure, Aadhaar Verhoeff, passport MRZ check digits, DL state-format match, DOB and expiry sanity. Each check returns pass, fail, or warning.
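To make the three-state result concrete, here is a sketch of the DOB and expiry sanity check. The thresholds and the mapping of each condition to pass, fail, or warning are our assumptions for illustration; the agent's actual rules may differ.

```python
from datetime import date
from typing import Optional

MAX_PLAUSIBLE_AGE_YEARS = 120  # assumption: older than this is flagged, not rejected


def check_dates(dob: date, expiry: Optional[date], today: date) -> dict:
    """Return 'pass', 'fail', or 'warning' per date field."""
    results = {}

    # Age in whole years, accounting for whether the birthday has passed this year.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if dob > today:
        results["dob"] = "fail"          # born in the future
    elif age > MAX_PLAUSIBLE_AGE_YEARS:
        results["dob"] = "warning"       # implausible, likely a misread digit
    else:
        results["dob"] = "pass"

    # Aadhaar and PAN carry no expiry date, so the check is skipped for them.
    if expiry is not None:
        if expiry <= dob:
            results["expiry"] = "fail"     # expires before the holder was born
        elif expiry < today:
            results["expiry"] = "warning"  # document has expired
        else:
            results["expiry"] = "pass"
    return results
```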
Structured JSON shaped to your schema, PII masked by default. The portrait is cropped to a normalised size and exposed as a short-lived URL or inline base64.
An ID extraction API that returns full Aadhaar numbers in the default response is a data breach waiting for a misconfigured logger. We picked the safer default and made you opt in to anything looser.
unmasked_aadhaar scope, granted on a case-by-case basis.
Anywhere a person has to look at an ID and type fields into a form. Onboarding queues, periodic refresh, employee verification, vendor due diligence.
Cut the time from "user uploads ID" to "user is approved" from 36 hours to under a minute on clean cases. Borderline ones still go to a human, just faster.
PAN, Aadhaar, and bank statement pulled together in one underwriting pipeline. Field-level confidence lets you auto-approve thin-file cases when every field clears 0.95.
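A sketch of the auto-approve gate, assuming the response gives you one confidence value per field. The dictionary shape and the `route_case` name are hypothetical, and "clears 0.95" is read here as greater than or equal to 0.95.

```python
def route_case(field_confidences: dict, threshold: float = 0.95) -> tuple:
    """Return ('auto_approve', []) or ('review', [fields below the threshold])."""
    if not field_confidences:
        # Nothing extracted: never auto-approve an empty record.
        return ("review", [])
    weak = sorted(name for name, score in field_confidences.items() if score < threshold)
    return ("review", weak) if weak else ("auto_approve", [])
```

Returning the weak fields alongside the decision lets the review queue point the reviewer at the one value that needs a second look instead of the whole document.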
Re-verify existing customers without making them resubmit. Compare what you have on file with what the latest upload says, and flag the mismatches for review.
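The comparison itself can be as simple as the sketch below, which assumes both records are flat dictionaries of field name to value. `diff_records` and the normalisation rule (ignore case and extra whitespace) are illustrative choices, not DocPro's documented behaviour.

```python
def _normalise(value):
    # Compare strings without regard to case or repeated whitespace.
    if isinstance(value, str):
        return " ".join(value.split()).casefold()
    return value


def diff_records(on_file: dict, latest: dict) -> dict:
    """Return the fields whose values differ, with both values kept for review."""
    mismatches = {}
    for field in sorted(set(on_file) | set(latest)):
        old, new = on_file.get(field), latest.get(field)
        if _normalise(old) != _normalise(new):
            mismatches[field] = {"on_file": old, "latest": new}
    return mismatches
```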
HR teams stop typing PANs into spreadsheets. Procurement teams stop emailing vendor IDs around. Both flows hit the same endpoint with different schemas.
Patient ID and insurance card captured in the same upload. The face crop is reused for visit-time identity confirmation when the patient returns.
Renter, host, or rider verification with masked output. The rest of your platform never sees the full Aadhaar — only the verification result and the masked tail.
Admission packs include multiple IDs and proofs. Batch-mode extraction returns one envelope of structured records, not a folder of OCR text files.
Proposer ID, nominee ID, and address proof in one submission. Cross-document name matching catches the typos that used to surface only at claims time.
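One simple way to approximate cross-document name matching is a similarity ratio from Python's standard `difflib`. This is a stand-in technique for illustration, not necessarily what the agent uses, and the 0.85 threshold is an assumption you would tune on your own data.

```python
from difflib import SequenceMatcher


def _clean(name: str) -> str:
    # Ignore case and repeated whitespace before comparing.
    return " ".join(name.split()).casefold()


def compare_names(a: str, b: str, typo_threshold: float = 0.85) -> str:
    """Return 'pass' for identical names, 'warning' for a likely typo, else 'fail'."""
    left, right = _clean(a), _clean(b)
    if left == right:
        return "pass"
    ratio = SequenceMatcher(None, left, right).ratio()
    return "warning" if ratio >= typo_threshold else "fail"
```

A doubled letter such as "Rajesh Kumaar" against "Rajesh Kumar" comes back as a warning, while two unrelated names fail.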
The KYC & ID Agent sits behind the same /v1/extract endpoint as the rest of DocPro. There is no separate KYC product to integrate, no separate SDK to install, no separate auth flow.
This endpoint sees more PII per request than anything else in DocPro. The defaults reflect that.
Annex A controls and an ISO 9001 quality regime. Audit artefacts available under NDA. SOC 2 Type II in progress.
Customer documents are not used to train foundation models. Per-tenant tuning is opt-in only, and the tuned weights stay scoped to that tenant.
Every extraction logs the document type, model version, rules fired, reviewer actions, and unmasking events. Exportable for internal and external audits.
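As a rough picture of what one exported audit entry might hold, the sketch below models the items listed above as a record and serialises a batch to JSON Lines. The class, field names, and export format are hypothetical; the real export schema may look different.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AuditRecord:
    # Field names are illustrative, mirroring the items the log is said to capture.
    document_type: str
    model_version: str
    rules_fired: List[str] = field(default_factory=list)
    reviewer_actions: List[str] = field(default_factory=list)
    unmasking_events: List[str] = field(default_factory=list)


def export_jsonl(records: List[AuditRecord]) -> str:
    """Serialise records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```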
If yours is not here, a half-hour call with an engineer is faster than three rounds of email.
Aadhaar (card, e-Aadhaar PDF, and the masked print version), PAN card, Indian passport (Type P), driving licence across all 36 state and UT formats, and Voter ID (EPIC). The agent also reads older laminated cards, the new PVC formats, and DigiLocker-issued PDFs. International coverage runs to US, UK, EU, UAE, Singapore, and the major Southeast Asian formats — anything else is a custom agent during onboarding.
By default the API returns the Aadhaar number masked to the last four digits, in line with UIDAI's published guidance for service providers. The full 12-digit value is only available when your tenant has the unmasked_aadhaar scope enabled and your storage configuration permits it. Either way, the Verhoeff checksum is validated and the result is included in the response.
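The masking rule can be pictured with a small sketch. The `unmasked_aadhaar` scope name comes from the text above; the function name, the `storage_allows_full` flag, and the `XXXX XXXX 1234` display format are assumptions made for illustration.

```python
import re


def present_aadhaar(raw: str, scopes=frozenset(), storage_allows_full: bool = False) -> str:
    """Return the Aadhaar number masked to its last four digits unless fully unlocked."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{12}", digits):
        raise ValueError("expected a 12-digit Aadhaar number")
    # Both conditions must hold before the full value is returned.
    if "unmasked_aadhaar" in scopes and storage_allows_full:
        return digits
    return "XXXX XXXX " + digits[-4:]
```

Masking is the fall-through path, so a missing scope or a misconfigured storage flag degrades to the safe output rather than the full number.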
Yes. The two-line MRZ on Type P passports (44 characters per line) is parsed and every check digit is validated against ICAO 9303. Mismatches between the visual zone and the MRZ surface as a warning, not a silent overwrite. The agent returns both extractions in the response so you can decide which one to trust, or surface the diff to a reviewer.
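A sketch of how the name part of that cross-check could work on your side, assuming you have the first MRZ line and the visual-zone names as plain strings. The function names and the dictionary keys are hypothetical, not the response schema.

```python
def parse_td3_names(line1: str) -> dict:
    """Split the name field of a TD3 MRZ first line into surname and given names."""
    if len(line1) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")
    # Positions 6-44 hold the name: SURNAME<<GIVEN<NAMES, padded with '<'.
    surname, _, given = line1[5:44].partition("<<")
    return {
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
    }


def cross_check(visual: dict, mrz: dict) -> list:
    """Return one warning per differing field, keeping both values untouched."""
    warnings = []
    for name in sorted(mrz):
        seen = " ".join(str(visual.get(name, "")).split()).upper()
        if seen != mrz[name]:
            warnings.append({
                "field": name,
                "visual": visual.get(name),
                "mrz": mrz[name],
                "status": "warning",
            })
    return warnings
```

The MRZ truncates long names and drops diacritics, so a difference here is a reason to look, not proof of tampering. That is why the sketch reports both values instead of picking a winner.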
Yes. The portrait region on the document is cropped, normalised to a configurable size, and exposed as a short-lived signed URL or inline base64. We do not run liveness or face-match ourselves on this endpoint — that is its own product category and other vendors are good at it. The crop is shaped for the major liveness providers and most in-house models.
US driver licences and state IDs, UK passports and DLs, EU national IDs and passports, UAE Emirates ID, Singapore NRIC, and the common Southeast Asian formats are in the standard library. Anything else is a custom agent we build during onboarding — usually four to six weeks from sample documents to production endpoint, depending on document volume and edge cases.
On a clean scan or a phone photo with both sides visible, field-level accuracy lands between 96% and 99%. Older laminated cards and poor lighting drag accuracy down, and the confidence score reflects that honestly. The review queue is built for the cases the agent is not sure about — that is the part of accuracy that actually matters in production. We will run your sample documents through the sandbox so you see the real number on your real data before you commit.
Yes. India region pinning is the default for Indian tenants — AWS Mumbai, Azure Pune, or Google Cloud Delhi NCR. EU, US, and APAC regions are available on enterprise plans. On-premise and air-gapped deployments are supported for banks and regulated tenants where the data cannot leave a specific network at all.
Controls are mapped to DPDPA (India) and GDPR (EU). We sign a Data Processing Agreement before any document moves — your paper or ours. Customer documents are not used to train foundation models. Retention defaults to seven days for raw uploads and is configurable for extracted JSON. Right-to-erasure requests are honoured within seven working days and logged.
Per document, with volume tiers. Pricing is indicative until your document mix is reviewed: high-volume Aadhaar and PAN workflows tier differently from passport and DL-heavy ones. Enterprise plans bundle committed throughput, customer-managed keys, on-prem deployment, priority support, and a named solutions engineer. Sandbox is free.
Yes. Send a batch of your own IDs to the sandbox and you get extracted JSON back, typically within a working day. No credit card and no sales call needed for the first run. If the output is useful, we talk pricing. If it is not, you still have real data to compare against whatever you are evaluating.
Aadhaars, PANs, passports, the old DL nobody can read — whatever your verification team is stuck on. We run them through the agent and send back the JSON, masked by default. No deck, no sales call, no NDA before the first batch.