MCP Tool Evaluation with Claude Code: Result JSON Schema

A production playbook for MCP tool evaluation in cross-industry operations using Claude Code: the result JSON schema, run-scoped inputs, logs, typed results, and artifacts.

Audience: AI platform teams adopting MCP

The problem

AI platform teams adopting MCP need MCP tool evaluation that runs repeatedly against tool definitions, auth policy, traces, and test cases. In cross-industry operations, the hard part is not producing one good answer. It is repeatability, auditability, exception handling, and evidence that survives handoff.

Implementation path

Define the outer result contract once, let the MCP tool evaluation skill own body.data, and reject terminal output that does not match the expected schema.
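A minimal sketch of the reject step, using only the Python standard library. The `validate_result` name and the exact field checks are illustrative assumptions derived from the result shape in this playbook, not from a published Argo schema. The outer envelope is checked strictly, while `body.data` is only required to be an object because the skill owns its contents.

```python
import json
from typing import Any

# Assumed envelope version, taken from the result shape in this playbook.
SCHEMA_VERSION = "argo.result.v1"


def validate_result(raw: str, expected_type: str = "mcp_tool_evaluation") -> list[str]:
    """Return a list of contract violations; an empty list means the output is accepted."""
    try:
        result: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Free-text terminal output ("Done! All tools passed.") fails here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(result, dict):
        return ["top level must be an object"]

    errors: list[str] = []
    if result.get("schema_version") != SCHEMA_VERSION:
        errors.append(f"schema_version must be {SCHEMA_VERSION!r}")
    if not isinstance(result.get("summary"), str):
        errors.append("summary must be a string")
    if not isinstance(result.get("artifacts"), list):
        errors.append("artifacts must be a list")

    body = result.get("body")
    if not isinstance(body, dict):
        errors.append("body must be an object")
        return errors
    if body.get("type") != expected_type:
        errors.append(f"body.type must be {expected_type!r}")
    # The skill owns body.data, so the outer contract only requires an object.
    if not isinstance(body.get("data"), dict):
        errors.append("body.data must be an object")
    if not isinstance(body.get("exceptions"), list):
        errors.append("body.exceptions must be a list")
    return errors
```

Because `body.data` is checked only for being an object, the skill can evolve its own payload without a change to the outer contract.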

Tradeoffs and failure modes

Schema enforcement adds upfront design work, but removes prompt parsing from the product surface. For MCP tool evaluation, the practical test is whether a second run can be debugged, retried, and consumed by a product without reading the raw agent transcript.
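One way to make a run retryable and debuggable without the transcript is to retry only on contract failures and keep every rejected output next to its violations. In this sketch, `run_once` is a hypothetical stand-in for whatever launches a Claude Code run, `check` is any validator that returns a list of violations, and `RunRecord` is an illustrative name, not part of Argo or Claude Code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunRecord:
    """Hypothetical record of one evaluation: the accepted output plus rejected attempts."""
    accepted: Optional[str] = None
    rejected: list[tuple[str, list[str]]] = field(default_factory=list)


def run_with_retries(
    run_once: Callable[[int], str],
    check: Callable[[str], list[str]],
    max_attempts: int = 3,
) -> RunRecord:
    """Call run_once until check() reports no violations or attempts run out."""
    record = RunRecord()
    for attempt in range(1, max_attempts + 1):
        output = run_once(attempt)
        errors = check(output)
        if not errors:
            record.accepted = output
            return record
        # Keep the rejected output and its violations as debugging evidence.
        record.rejected.append((output, errors))
    return record
```

If `accepted` is still `None` after the loop, the rejected list already says why each attempt failed, so nobody has to open the raw transcript to find out.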

Result shape

{
  "schema_version": "argo.result.v1",
  "summary": "MCP tool evaluation completed",
  "body": { "type": "mcp_tool_evaluation", "data": {}, "exceptions": [] },
  "artifacts": []
}
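A product can consume the shape above as typed objects instead of raw dictionaries. The `Result` and `ResultBody` dataclasses and the `parse_result` helper are hypothetical names. The sketch does no type checking of its own and raises `KeyError` when a field is missing, so it is meant to run after a validator has accepted the output. The shape of individual `exceptions` and `artifacts` entries is not specified here and is left untyped.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ResultBody:
    type: str
    data: dict[str, Any]   # owned by the MCP tool evaluation skill
    exceptions: list[Any]  # entry shape not specified by the outer contract


@dataclass(frozen=True)
class Result:
    schema_version: str
    summary: str
    body: ResultBody
    artifacts: list[Any]


def parse_result(raw: str) -> Result:
    """Map the outer envelope onto typed objects; raises KeyError if a field is missing."""
    obj = json.loads(raw)
    body = obj["body"]
    return Result(
        schema_version=obj["schema_version"],
        summary=obj["summary"],
        body=ResultBody(type=body["type"], data=body["data"], exceptions=body["exceptions"]),
        artifacts=obj["artifacts"],
    )
```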

Run this on Argo