MCP Tool Evaluation with Claude Code: Artifact Delivery
A production playbook for MCP tool evaluation in cross-industry operations using Claude Code, covering artifact delivery, run-scoped inputs, logs, and typed results.
Audience: AI platform teams adopting MCP
The problem
AI platform teams adopting MCP need tool evaluation that runs repeatedly against tool definitions, auth policy, traces, and test cases. In cross-industry operations, the hard part is not producing one good answer; it is repeatability, auditability, exception handling, and evidence that survives handoff.
Implementation path
Require Claude Code to write customer-visible files under /skill/output/artifacts, validate filenames and sizes, then return signed artifact metadata in argo.result.v1.
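The validation and metadata step above could be sketched roughly as follows. This is a minimal illustration, not the Argo implementation: the size limit, the filename pattern, the allowed suffixes, the result field names other than the argo.result.v1 schema label, and the HMAC-SHA256 signing scheme are all assumptions made for the example.

```python
import hashlib
import hmac
import json
import re
from pathlib import Path

ARTIFACT_DIR = Path("/skill/output/artifacts")  # directory named in the playbook
MAX_BYTES = 10 * 1024 * 1024  # assumed per-file size limit
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9._-]{0,127}$")  # assumed naming rule
ALLOWED_SUFFIXES = {".md", ".csv", ".json"}  # assumed, based on the manifest


def validate_artifact(path: Path, root: Path = ARTIFACT_DIR,
                      max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of policy violations for one file (empty means valid)."""
    errors = []
    if path.parent.resolve() != root.resolve():
        errors.append("file is outside the artifact directory")
    if not NAME_PATTERN.fullmatch(path.name):
        errors.append("filename does not match the naming rule")
    if path.suffix not in ALLOWED_SUFFIXES:
        errors.append("file type is not allowed")
    if not path.is_file():
        errors.append("file does not exist")
    elif path.stat().st_size > max_bytes:
        errors.append("file exceeds the size limit")
    return errors


def build_result(paths: list[Path], signing_key: bytes,
                 root: Path = ARTIFACT_DIR) -> dict:
    """Validate every artifact, then return signed metadata.

    The field names and the signing scheme are hypothetical.
    """
    artifacts = []
    for path in paths:
        errors = validate_artifact(path, root=root)
        if errors:
            raise ValueError(f"{path.name}: {'; '.join(errors)}")
        data = path.read_bytes()
        artifacts.append({
            "name": path.name,
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    # Sign a canonical JSON encoding so a consumer can recompute the signature.
    payload = json.dumps(artifacts, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"schema": "argo.result.v1", "artifacts": artifacts, "signature": signature}
```

A run would call build_result on the files it wrote and fail before returning a result if any file violates the policy, so an invalid artifact never reaches a customer download.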
Tradeoffs and failure modes
An artifact policy restricts what a run can write, but in exchange customers receive files that are durable, typed, and safe to download. For MCP tool evaluation, the practical test is whether a second run can be debugged, retried, and consumed by a product without anyone reading the raw agent transcript.
Artifact manifest
artifacts:
- mcp-tool-evaluation-summary.md
- mcp-tool-evaluation-evidence.csv
- mcp-tool-evaluation-review.json
signed_urls: true
retention: org_policy
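One way to use a manifest like this is to compare it with what a run actually wrote before reporting success. The sketch below assumes the manifest has already been parsed into a dictionary with the same keys; check_manifest is a hypothetical helper, not part of Argo or Claude Code.

```python
from pathlib import Path

# The manifest above, assumed to be already parsed into a dictionary.
MANIFEST = {
    "artifacts": [
        "mcp-tool-evaluation-summary.md",
        "mcp-tool-evaluation-evidence.csv",
        "mcp-tool-evaluation-review.json",
    ],
    "signed_urls": True,
    "retention": "org_policy",
}


def check_manifest(manifest: dict, artifact_dir: Path) -> dict:
    """Compare the files declared in the manifest with the files on disk."""
    expected = set(manifest["artifacts"])
    present = {p.name for p in artifact_dir.iterdir() if p.is_file()}
    return {
        "missing": sorted(expected - present),     # declared but not written
        "unexpected": sorted(present - expected),  # written but not declared
    }
```

A run with a non-empty missing or unexpected list can be flagged as an exception rather than delivered, which keeps the delivered file set predictable from one run to the next.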
Run this on Argo