PDF Extraction Agent Playbook with Claude Code
A high-signal playbook for extracting structured data from PDFs with a run-scoped AI agent workflow.
Audience: Document-heavy SaaS teams handling invoices, contracts, or compliance packets.
The problem
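PDF extraction workflows break down when prompts, files, schemas, and validation are scattered across separate scripts, because no single run records which instructions, inputs, and checks produced a given output.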
Implementation path
Put extraction instructions in SKILL.md, attach PDFs as run inputs, validate output against a schema, and return both JSON and review artifacts.
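The "validate output against a schema" step can be sketched with the standard library alone. `INVOICE_SCHEMA` and `validate` below are hypothetical names used for illustration; they are not part of Argo or SKILL.md, and a real setup might use a JSON Schema validator instead. The sketch returns a list of violations so a failing record can be routed to review rather than silently dropped.

```python
from typing import Any

# Hypothetical invoice schema: field name -> (accepted types, required?)
INVOICE_SCHEMA: dict[str, tuple[tuple[type, ...], bool]] = {
    "invoice_number": ((str,), True),
    "issue_date": ((str,), True),
    "total_amount": ((int, float), True),
    "currency": ((str,), False),
}


def validate(record: dict[str, Any], schema: dict[str, tuple[tuple[type, ...], bool]]) -> list[str]:
    """Return schema violations for one extracted record; an empty list means valid."""
    errors: list[str] = []
    for field, (accepted, required) in schema.items():
        value = record.get(field)
        if value is None:
            # An absent key and an explicit null are treated the same way.
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(value, accepted):
            names = "/".join(t.__name__ for t in accepted)
            errors.append(f"{field}: expected {names}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    extracted = {"invoice_number": "INV-1042", "issue_date": "2024-03-01", "total_amount": "1,200.00"}
    print(validate(extracted, INVOICE_SCHEMA))  # ['total_amount: expected int/float, got str']
```

Here the amount came back as a formatted string, which is exactly the kind of output that would otherwise slip through to a downstream system.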
Tradeoffs and failure modes
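Schema-first extraction narrows the task: anything the schema does not declare is not captured. In exchange, every result has the same shape, so downstream systems can consume it without custom parsing.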
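To make the tradeoff concrete, here is a minimal sketch of projecting raw model output onto a fixed field list. `SCHEMA_FIELDS` and `project` are hypothetical names for illustration. Extra keys are dropped (but reported), and missing keys become explicit `None` values, so every record comes out with the same keys in the same order.

```python
from typing import Any

# Hypothetical fields a downstream system expects, in a fixed order.
SCHEMA_FIELDS: tuple[str, ...] = ("invoice_number", "issue_date", "total_amount", "currency")


def project(raw: dict[str, Any], fields: tuple[str, ...]) -> tuple[dict[str, Any], list[str]]:
    """Keep only schema fields (absent ones become None) and list the keys that were dropped."""
    kept = {name: raw.get(name) for name in fields}
    dropped = sorted(key for key in raw if key not in fields)
    return kept, dropped


if __name__ == "__main__":
    raw = {
        "invoice_number": "INV-1042",
        "total_amount": 1200.0,
        "vendor_notes": "Net 30",  # useful, but not declared in the schema
        "po_reference": "PO-77",
    }
    kept, dropped = project(raw, SCHEMA_FIELDS)
    print(kept)     # {'invoice_number': 'INV-1042', 'issue_date': None, 'total_amount': 1200.0, 'currency': None}
    print(dropped)  # ['po_reference', 'vendor_notes']
```

Reporting `dropped` instead of discarding it silently gives reviewers a cheap signal about which fields might be worth adding to the schema later.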
SKILL.md starter
You extract structured fields from PDFs.
Read only /skill/.argo/inputs.
Return argo.result.v1 with body.type = "pdf_extraction".
If confidence is low, include review_notes and attach a CSV artifact.
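One way to produce what this starter asks for, sketched in Python. Only `/skill/.argo/inputs`, `argo.result.v1`, `body.type = "pdf_extraction"`, `review_notes`, and the CSV artifact come from the starter itself. The envelope layout (`schema`, `body`, `fields`, `artifacts` keys), the 0.8 confidence threshold, and every function name are assumptions for illustration, not Argo's documented result format.

```python
import csv
import io
import json
from pathlib import Path
from typing import Any

INPUT_DIR = Path("/skill/.argo/inputs")  # the only directory the skill is told to read
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off for "low confidence"


def list_input_pdfs(input_dir: Path = INPUT_DIR) -> list[Path]:
    """Return PDFs attached as run inputs, sorted (the *.pdf match is case-sensitive on most systems)."""
    return sorted(p for p in input_dir.glob("*.pdf") if p.is_file())


def fields_to_csv(fields: dict[str, dict[str, Any]]) -> str:
    """Render each field's value and confidence as CSV text for human review."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["field", "value", "confidence"])
    for name, item in fields.items():
        writer.writerow([name, item["value"], item["confidence"]])
    return buf.getvalue()


def build_result(source: str, fields: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Wrap extracted fields in an argo.result.v1-style envelope (key layout is assumed)."""
    body: dict[str, Any] = {
        "type": "pdf_extraction",
        "source": source,
        "fields": {name: item["value"] for name, item in fields.items()},
    }
    result: dict[str, Any] = {"schema": "argo.result.v1", "body": body, "artifacts": []}
    low = [name for name, item in fields.items() if item["confidence"] < CONFIDENCE_THRESHOLD]
    if low:
        # Low confidence: add review notes and attach a CSV artifact, as the starter instructs.
        body["review_notes"] = [f"low confidence on {name}" for name in low]
        result["artifacts"].append(
            {"name": "review.csv", "media_type": "text/csv", "content": fields_to_csv(fields)}
        )
    return result


if __name__ == "__main__":
    extracted = {
        "invoice_number": {"value": "INV-1042", "confidence": 0.97},
        "total_amount": {"value": 1200.0, "confidence": 0.62},
    }
    print(json.dumps(build_result("invoice.pdf", extracted), indent=2))
```

The confidence check runs per field, so a single uncertain value flags the document for review without discarding the fields the agent extracted confidently.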