PDF Extraction Agent Playbook with Claude Code

A high-signal playbook for extracting structured data from PDFs with a run-scoped AI agent workflow.

Audience: Document-heavy SaaS teams handling invoices, contracts, or compliance packets.

The problem

PDF extraction workflows break when prompts, files, schemas, and validation are scattered across scripts.

Implementation path

Put extraction instructions in SKILL.md, attach PDFs as run inputs, validate output against a schema, and return both JSON and review artifacts.
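The "validate output against a schema" step can be sketched as follows. This is a minimal illustration using only the Python standard library, not Argo's own validation mechanism. The schema, its field names (invoice_number, vendor_name, total_amount, currency), and the validate_extraction helper are all hypothetical; substitute the fields your documents actually carry.

```python
from typing import Any

# Hypothetical schema for an invoice: field name -> expected Python type.
# These field names are illustrative assumptions, not part of any Argo spec.
INVOICE_SCHEMA: dict[str, type] = {
    "invoice_number": str,
    "vendor_name": str,
    "total_amount": float,
    "currency": str,
}


def validate_extraction(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems: list[str] = []

    # Check every field the schema requires, in schema order.
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")

    # Flag anything the agent returned that the schema does not define.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")

    return problems
```

Returning a list of problems instead of raising on the first failure lets the same output feed a review artifact: an empty list means the JSON can go downstream, and a non-empty list says exactly which fields need a human look.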

Tradeoffs and failure modes

Schema-first extraction narrows the task to the fields the schema defines, so anything outside the schema is not captured. In exchange, the result is usable by downstream systems.

SKILL.md starter

You extract structured fields from PDFs.
Read only /skill/.argo/inputs.
Return argo.result.v1 with body.type = "pdf_extraction".
If confidence is low, include review_notes and attach a CSV artifact.
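The low-confidence rule in the starter above could be implemented along these lines. This is a sketch under stated assumptions. The source specifies only the name argo.result.v1 and body.type = "pdf_extraction", so the envelope key "schema", the "fields" key, the 0.8 threshold, and the build_result helper are hypothetical. How the CSV is actually attached as an artifact is left out because the source does not describe it.

```python
import csv
import io
from typing import Any, Optional

# Assumed cutoff below which a field is flagged for human review.
CONFIDENCE_THRESHOLD = 0.8


def build_result(
    fields: dict[str, tuple[Any, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> tuple[dict[str, Any], Optional[str]]:
    """Build a result dict plus CSV text for low-confidence fields (None if there are none).

    `fields` maps a field name to a (value, confidence) pair.
    """
    low = {name: pair for name, pair in fields.items() if pair[1] < threshold}

    # body.type comes from the starter; the "fields" key is a hypothetical layout.
    body: dict[str, Any] = {
        "type": "pdf_extraction",
        "fields": {name: value for name, (value, _) in fields.items()},
    }

    csv_text: Optional[str] = None
    if low:
        body["review_notes"] = [
            f"{name}: confidence {conf:.2f} below {threshold:.2f}"
            for name, (_, conf) in low.items()
        ]
        # Write the review rows to an in-memory buffer; the caller decides where to store it.
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["field", "value", "confidence"])
        for name, (value, conf) in low.items():
            writer.writerow([name, value, conf])
        csv_text = buffer.getvalue()

    # The "schema" envelope key is an assumption about how the version is declared.
    return {"schema": "argo.result.v1", "body": body}, csv_text
```

Keeping review_notes and the CSV out of the result when every field clears the threshold means a clean run produces plain JSON, and the presence of either one signals that review is needed.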

Run this on Argo