DocPro · Agentic Document Extraction

Document processing that reads, reasons, and hands you structured data.

DocPro is an agentic document extraction (ADE) service. Send an invoice, a bank statement, a KYC pack, or a receipt. The agent figures out what kind of document it is, pulls the fields you care about, checks them against your rules, and returns JSON your ERP can post without a human re-keying anything.

95–99% field accuracy · ISO 27001-aligned

Feature matrix

DocPro vs. legacy OCR vs. rule-based IDP

What actually differs when a new invoice layout shows up on a Monday morning.

Capability                              | Legacy OCR    | Rule-based IDP         | DocPro ADE
----------------------------------------+---------------+------------------------+--------------
Reads unseen layouts without retraining | No            | Rare                   | Yes
Confidence score per field              | No            | Per doc only           | Yes
Cross-page validation                   | No            | If coded               | Yes
Custom schema output                    | No            | Yes                    | Yes
Handles rotated / blurry scans          | Sometimes     | Sometimes              | Yes
Human-in-the-loop routing               | No            | Bolt-on                | Built in
Language coverage                       | Latin scripts | Latin + some           | 40+
Time to onboard a new document type     | Months        | Weeks, every edge case | Days to weeks
Audit log + replay                      | No            | Varies                 | Yes
On-prem / air-gapped option             | Legacy only   | Yes                    | Yes

Document agents

One API. A library of document agents you can call by name.

Each agent is tuned for a specific document type. Call a single endpoint, pass the agent name, and you get back the schema that agent produces. Custom agents for the paperwork you cannot find on this list are part of standard onboarding.
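
As a rough illustration of that call pattern, the sketch below builds (but does not send) a request to /v1/extract. The host `api.example.com`, the JSON body shape, the field names `agent`, `schema_id`, and `file_b64`, the schema name, and the bearer-token header are all assumptions made for illustration; they are not taken from DocPro's API reference.

```python
import base64
import json
import urllib.request

# Hypothetical base URL; the real host is not given on this page.
BASE_URL = "https://api.example.com"


def build_extract_request(
    file_bytes: bytes, agent: str, schema_id: str, api_key: str
) -> urllib.request.Request:
    """Build a POST to /v1/extract. The body fields are assumed, not documented."""
    body = {
        "agent": agent,          # assumed field: which document agent to run
        "schema_id": schema_id,  # assumed field: which output schema to return
        "file_b64": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_extract_request(
    b"%PDF-1.7 sample", agent="invoice", schema_id="ap-invoice-v1", api_key="test-key"
)
print(req.get_method(), req.full_url)
```

Passing the agent name as a plain string keeps the client identical across document types; only the returned schema changes.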

Invoice Agent

Reads vendor invoices in any layout — tax invoices, proforma, credit notes, utility bills — and returns a full header plus line-item breakdown. Built for AP teams that are tired of keying 200 invoices a day.

  • Header + line items + taxes, separately tagged
  • 3-way match against PO, GRN, contract
  • Duplicate detection across vendor + invoice number + date
  • GSTIN and tax computation checks on Indian invoices
Invoice agent details →
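
The duplicate check listed above (vendor + invoice number + date) could look roughly like the sketch below. The field names `vendor_id`, `invoice_number`, and `invoice_date` and the normalisation choices are assumptions for illustration, not the Invoice Agent's actual logic.

```python
from datetime import date


def duplicate_key(vendor_id: str, invoice_number: str, invoice_date: date) -> tuple:
    """Normalise the three fields so cosmetic differences do not hide a duplicate."""
    # Ignore case and drop spaces, hyphens, and slashes in the invoice number.
    cleaned = "".join(ch for ch in invoice_number.upper() if ch.isalnum())
    return (vendor_id.strip().upper(), cleaned, invoice_date.isoformat())


def find_duplicates(invoices: list[dict]) -> list[tuple[int, int]]:
    """Return (first_index, duplicate_index) pairs for invoices sharing a key."""
    seen: dict[tuple, int] = {}
    pairs = []
    for i, inv in enumerate(invoices):
        key = duplicate_key(inv["vendor_id"], inv["invoice_number"], inv["invoice_date"])
        if key in seen:
            pairs.append((seen[key], i))
        else:
            seen[key] = i
    return pairs


invoices = [
    {"vendor_id": "V001", "invoice_number": "INV-2024/001", "invoice_date": date(2024, 4, 1)},
    {"vendor_id": "V002", "invoice_number": "INV-2024/001", "invoice_date": date(2024, 4, 1)},
    {"vendor_id": "v001", "invoice_number": "inv 2024 001", "invoice_date": date(2024, 4, 1)},
]
print(find_duplicates(invoices))  # [(0, 2)]
```

The second invoice shares a number and date with the first but comes from a different vendor, so only the third is reported.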

Bank Statement Agent

Converts scanned or PDF bank statements from any major bank into clean transaction rows. Used in loan origination, reconciliation, and fraud checks where the source PDF is non-negotiable.

  • Per-transaction parsing: date, narration, debit, credit, balance
  • Opening and closing balance reconciliation
  • Transaction categorisation for underwriting
  • Flags suspicious patterns for analyst review
Bank statement agent details →
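
Opening and closing balance reconciliation comes down to simple arithmetic: the opening balance plus credits minus debits should equal the closing balance, and each row's printed balance should follow from the row before it. A minimal sketch, assuming rows arrive as dictionaries with `debit`, `credit`, and `balance` as `Decimal` values (these names are illustrative, not the agent's schema):

```python
from decimal import Decimal


def reconcile(opening: Decimal, closing: Decimal, rows: list[dict]) -> dict:
    """Check the statement total and every row's running balance."""
    running = opening
    bad_rows = []
    for i, row in enumerate(rows):
        expected = running + row["credit"] - row["debit"]
        if row["balance"] != expected:
            bad_rows.append(i)
        running = row["balance"]  # continue from the printed balance
    net = sum((r["credit"] - r["debit"] for r in rows), Decimal("0"))
    return {"closing_matches": opening + net == closing, "bad_rows": bad_rows}


rows = [
    {"debit": Decimal("250.00"), "credit": Decimal("0.00"), "balance": Decimal("750.00")},
    {"debit": Decimal("0.00"), "credit": Decimal("1200.50"), "balance": Decimal("1950.50")},
    {"debit": Decimal("99.99"), "credit": Decimal("0.00"), "balance": Decimal("1850.51")},
]
print(reconcile(Decimal("1000.00"), Decimal("1850.51"), rows))
```

`Decimal` is used instead of `float` so that amounts such as 99.99 compare exactly. A row that fails the running-balance check is a natural candidate for analyst review.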

KYC & ID Agent

Identity documents across India and beyond — PAN, Aadhaar, driver licences, passports, voter IDs. Extracts the fields, validates the format, and runs the MRZ check where applicable.

  • Format and checksum validation by document type
  • Face-on-document capture for downstream liveness checks
  • MRZ parsing on passports
  • Masked output option for PII-sensitive pipelines
KYC & ID agent details →
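
For readers unfamiliar with the MRZ check, the sketch below implements the standard check-digit calculation used in machine-readable travel documents: characters are weighted 7, 3, 1 in rotation and the sum is taken modulo 10. It illustrates the public algorithm only and is not DocPro's own code; the function names are made up for this example.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field)) % 10


def mrz_field_is_valid(field: str, check_char: str) -> bool:
    """Compare the computed digit with the one printed next to the field."""
    return check_char.isdigit() and mrz_check_digit(field) == int(check_char)


print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
print(mrz_field_is_valid("120415", "9"))  # True
```

A failed check digit usually means a misread character rather than a forged document, which is why it pairs well with a per-field confidence score.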

Receipt Agent

Payment receipts, expense receipts, the crumpled ones from a cab ride. The agent pulls totals, tax, merchant name, and category, and books up to 95% of receipts without a human in the loop.

  • Merchant recognition across thousands of formats
  • Tip, tax, and total split automatically
  • Category tagging for expense policies
  • Photo-quality scoring so users know when to retake
Receipt agent details →
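
One simple way to think about the tip, tax, and total split: whatever the total carries beyond subtotal plus tax is the tip, and a negative remainder means something was misread. The function below is a hypothetical sketch of that arithmetic, not the Receipt Agent's implementation.

```python
from decimal import Decimal


def infer_tip(subtotal: Decimal, tax: Decimal, total: Decimal) -> Decimal:
    """Treat whatever the total carries beyond subtotal + tax as the tip."""
    tip = total - subtotal - tax
    if tip < 0:
        # The parts exceed the total, so at least one amount was misread.
        raise ValueError("subtotal + tax exceeds total; send to review")
    return tip


print(infer_tip(Decimal("42.00"), Decimal("3.78"), Decimal("52.00")))  # 6.22
```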

Purchase Order Agent

Reads the POs your buyers still send as PDFs and the ones your vendors print on templates from 2004. Extracts line items, payment terms, delivery windows, and the fields your procurement team actually checks.

  • Line-item level extraction with UOM normalisation
  • Payment terms and Incoterms capture
  • PO-to-contract linkage for three-way match
  • Supplier master auto-lookup
Purchase order schema →
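
UOM normalisation means mapping the many spellings of a unit onto one canonical unit and scaling the quantity to match. A minimal sketch follows; the alias table is a small illustrative sample invented for this example, and a real deployment would presumably carry a larger, customer-specific map.

```python
from decimal import Decimal

# Illustrative aliases only: spelling -> (canonical unit, multiplier).
UOM_ALIASES = {
    "EA": ("EA", Decimal("1")),
    "EACH": ("EA", Decimal("1")),
    "PCS": ("EA", Decimal("1")),
    "NOS": ("EA", Decimal("1")),
    "DOZ": ("EA", Decimal("12")),
    "DOZEN": ("EA", Decimal("12")),
    "KG": ("KG", Decimal("1")),
    "KGS": ("KG", Decimal("1")),
    "G": ("KG", Decimal("0.001")),
    "MT": ("KG", Decimal("1000")),
}


def normalise_uom(quantity: Decimal, uom: str) -> tuple[Decimal, str]:
    """Return the quantity expressed in the canonical unit, plus that unit."""
    key = uom.strip().upper().rstrip(".")
    if key not in UOM_ALIASES:
        raise KeyError(f"unknown unit of measure: {uom!r}")
    canonical, factor = UOM_ALIASES[key]
    return quantity * factor, canonical


print(normalise_uom(Decimal("5"), "doz."))  # (Decimal('60'), 'EA')
```

Raising on an unknown unit, rather than passing it through, keeps an unrecognised spelling from silently breaking a PO-to-invoice quantity match.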

Custom Document Agents

If your document is not on this page, that is expected. Shipping manifests, loan application packs, insurance claim forms, tax notices — we build the agent during onboarding and hand you back a schema you approve.

  • Schema defined with your team in a working session
  • Tuned on a representative sample of your documents
  • Versioned like code, updated when your documents change
  • Typically in production within four to six weeks
Describe your document →

How it works

From upload to posted record in four steps.

No black box. At every step you get the data, the rules that fired, and the confidence score.

  1. Upload

    Send a PDF, image, or multi-page scan to /v1/extract. Single file, a zip, or a webhook-based drop folder — whichever fits your workflow.

  2. Classify & read

    The agent identifies the document type, locates the regions that matter, and pulls the text with a model trained to handle skew, low-resolution scans, and unfamiliar layouts.

  3. Validate

    Your rules run next. Tax math, PO match, duplicate check, vendor whitelist, format validation. Each rule returns a pass, fail, or warning alongside the fields it checked.

  4. Return JSON

    Structured JSON shaped to your schema, with a confidence score per field. High-confidence rows auto-post. Low-confidence ones land in a review queue for a human to confirm.
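
To make steps 3 and 4 concrete, here is a sketch of how a client might decide between auto-posting and review. The response shape (`fields`, `rules`, `confidence`, `status`) and the 0.90 threshold are assumptions for illustration; the actual JSON follows whatever schema you agree during onboarding.

```python
AUTO_POST_THRESHOLD = 0.90  # assumed cut-off; tune it to your own risk appetite


def route(extraction: dict, threshold: float = AUTO_POST_THRESHOLD) -> dict:
    """Auto-post only if every field clears the threshold and no rule failed."""
    low = sorted(
        name for name, f in extraction["fields"].items() if f["confidence"] < threshold
    )
    failed = [r["rule"] for r in extraction["rules"] if r["status"] == "fail"]
    decision = "review" if (low or failed) else "auto_post"
    return {"decision": decision, "low_confidence": low, "failed_rules": failed}


# Hypothetical response body for one invoice.
sample = {
    "document_type": "invoice",
    "fields": {
        "invoice_number": {"value": "INV-1042", "confidence": 0.99},
        "total": {"value": "1180.00", "confidence": 0.72},
    },
    "rules": [
        {"rule": "tax_math", "status": "pass", "fields": ["subtotal", "tax", "total"]},
    ],
}
print(route(sample))
```

In this sketch a warning does not block auto-posting; whether it should is a policy choice for each team.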

Built for engineers

One endpoint. Your schema. Your rules.

DocPro is a REST API, not a portal your developers have to wrap in yet another abstraction. Upload the document, pass the agent name and the schema identifier, get back validated JSON.

  • SDKs for Python, Node.js, Java, and .NET.
  • Webhooks for async batches and large workloads.
  • Schema definitions checked into Git like any other contract.
  • Rules expressed as predicates you can read in a code review.
  • Every request carries an audit trail: model version, rule set, reviewer, timestamp.
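
"Rules expressed as predicates" can be pictured as plain functions that take the extracted document and return a pass, fail, or warning along with the fields they checked. The sketch below is one hypothetical way to write them in Python; the rule names, field names, and vendor list are invented for the example and do not reflect DocPro's rule syntax.

```python
from dataclasses import dataclass
from decimal import Decimal

APPROVED_VENDORS = frozenset({"V001", "V002"})  # illustrative whitelist


@dataclass(frozen=True)
class RuleResult:
    rule: str
    status: str  # "pass", "fail", or "warning"
    fields: tuple[str, ...]


def tax_math(doc: dict) -> RuleResult:
    """Fail when subtotal + tax does not equal the stated total."""
    ok = doc["subtotal"] + doc["tax"] == doc["total"]
    return RuleResult("tax_math", "pass" if ok else "fail", ("subtotal", "tax", "total"))


def vendor_whitelist(doc: dict) -> RuleResult:
    """Warn, rather than fail, when the vendor is not on the approved list."""
    ok = doc["vendor_id"] in APPROVED_VENDORS
    return RuleResult("vendor_whitelist", "pass" if ok else "warning", ("vendor_id",))


RULES = [tax_math, vendor_whitelist]


def run_rules(doc: dict) -> list[RuleResult]:
    return [rule(doc) for rule in RULES]


doc = {
    "vendor_id": "V009",
    "subtotal": Decimal("1000.00"),
    "tax": Decimal("180.00"),
    "total": Decimal("1180.00"),
}
for result in run_rules(doc):
    print(result.rule, result.status, result.fields)
```

Because each rule is an ordinary function, a change to a rule shows up as a small, reviewable diff.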

Who uses it

Built for teams that move paperwork for a living.

Finance, banking, and operations teams where the document backlog is real and the cost of a miskeyed field is measured in hours of reconciliation — sometimes worse.

95–99% field accuracy
4–6 wks from signed SOW to prod
3+ yrs enterprise delivery
24/7 support on paid plans

Security & compliance

Regulated workloads, treated like regulated workloads.

Finance, healthcare, and lending teams run DocPro on documents that carry real PII. The defaults reflect that.

ISO 27001 & ISO 9001-aligned

Controls mapped to ISO 27001 Annex A and an ISO 9001 quality regime. Audit artefacts are available under NDA.

Region-pinned data

Pick the cloud region on AWS, Azure, or Google Cloud. On-premise and air-gapped deployments are available for regulated environments.

DPA before document one

We co-sign a Data Processing Agreement before any document leaves your environment. Use your paper or ours; either works.

No training on your data

Customer documents are not used to train foundation models. Period. Your agents can be tuned on your data only when you opt in, per tenant.

Audit trail by default

Every extraction logs the model version, rule set, reviewer actions, and timestamps. Exportable for internal and external audits.
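
As an illustration of what one exportable audit entry might contain, the sketch below models the four items named above and serialises them as JSON Lines. The class, field names, and export format are assumptions, not DocPro's actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    document_id: str    # hypothetical identifier
    model_version: str
    rule_set: str
    reviewer_actions: list[dict] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl(records: list[AuditRecord]) -> str:
    """One JSON object per line, with stable key order for diffing."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


record = AuditRecord(
    document_id="doc-0001",
    model_version="2024.04",
    rule_set="ap-rules-v3",
    reviewer_actions=[{"reviewer": "analyst-7", "action": "confirmed", "field": "total"}],
)
print(to_jsonl([record]))
```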

Encryption at rest & in transit

TLS 1.2+ on the wire, AES-256 at rest. Customer-managed keys available on enterprise plans.

FAQ

What engineers and finance leads usually ask first

If your question is not here, a 30-minute call with an engineer is usually faster than email.

How is DocPro different from an OCR API?

OCR gives you text. DocPro gives you structured fields. The agent reads the document, figures out what kind of document it is, extracts the fields you care about, checks them against your business rules, and returns JSON with a confidence score on every field. Your team stops re-keying OCR output into a ledger.

What document types does DocPro handle out of the box?

Invoices, purchase orders, bank statements, payment receipts, KYC and ID documents, delivery notes, remittance advices, and expense reports. Anything outside that list is handled by a custom agent we build during onboarding — usually in four to six weeks, sometimes sooner.

What accuracy should we expect?

Field-level accuracy lands between 95% and 99% on standard formats once the agent is tuned to your documents. Every field carries a confidence score so you can auto-post the high-confidence ones and route the rest to a human reviewer. The real question is rarely raw accuracy — it is what you do with the 1 to 5% that needs a second look. The review queue is built for exactly that.

Can DocPro read scanned, rotated, or photographed documents?

Yes. The pipeline handles skew, rotation, multi-page PDFs, mixed-quality scans, and photos taken on a phone. Poor scans lower confidence scores, which means they go to review rather than posting bad data silently. If the photo is genuinely unreadable, the API says so instead of guessing.

How do we integrate DocPro?

A single REST endpoint. Upload the file, receive structured JSON, hand the JSON to your ERP, loan-origination system, or in-house tool. SDKs ship for Python, Node.js, Java, and .NET. Webhooks are available for async workflows, and a review UI ships with every tenant so non-engineers can clear the exception queue.

Where is our document data stored?

In the cloud region you pick on AWS, Azure, or Google Cloud. On-premise and air-gapped deployments are available for regulated environments. We sign a DPA before the first document moves.

Is there a free trial or sample environment?

Yes. You can send a batch of your own documents to our sandbox and get the extracted JSON back within a working day. No credit card, no sales call needed to run the first batch. If it works, we talk pricing. If it does not, you have real output to compare against whatever else you are evaluating.

How is DocPro priced?

Per document, with volume tiers. Custom agents are quoted separately as a one-time build plus ongoing usage. Enterprise plans bundle committed throughput, customer-managed keys (CMK), priority support, and a named solutions engineer. Contact sales for a quote tied to your actual document mix.

Send us 100 of your hardest documents.

Invoices, statements, KYC packs — whatever your team is stuck keying today. We run them through DocPro and send back the JSON. No deck, no sales call. Just output you can compare against what you have now.