The real event schema this table generalizes
Managed Agents outcome grading is not a black box. As the grader runs, it emits a typed sequence of spans: span.outcome_evaluation_start, one or more span.outcome_evaluation_ongoing events while the grader works, and a terminal span.outcome_evaluation_end carrying the final verdict. Per platform.claude.com/docs/en/managed-agents/define-outcomes.md: "the harness automatically provisions a grader to evaluate the artifact against a rubric... the grader returns an explanation summarizing which criteria passed or failed."
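The ordering described above (a start span, one or more ongoing spans, then an end span) can be checked mechanically. The sketch below is illustrative only: it assumes the event types have already been collected into a list of strings, and the helper name is_complete_evaluation is hypothetical. Real event payloads carry more fields than the type name, and how they are delivered is not covered here.

```python
from __future__ import annotations

# Span type names as given in the text above.
START = "span.outcome_evaluation_start"
ONGOING = "span.outcome_evaluation_ongoing"
END = "span.outcome_evaluation_end"


def is_complete_evaluation(event_types: list[str]) -> bool:
    """Return True if the types form: start, one or more ongoing, end.

    Hypothetical helper: it only inspects type names, not payloads.
    """
    if len(event_types) < 3:
        return False  # too short to hold start + at least one ongoing + end
    first, *middle, last = event_types
    return first == START and last == END and all(t == ONGOING for t in middle)


complete = is_complete_evaluation([START, ONGOING, ONGOING, END])
truncated = is_complete_evaluation([START, ONGOING])  # no terminal span yet
```

Here complete is True and truncated is False, since the second sequence never reaches the terminal span that carries the verdict.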
The result enum, unmodified
The terminal event's result is one of five values, and this table's result column carries a CHECK constraint against exactly that enum:
| result | meaning |
|---|---|
| satisfied | The grader judged the artifact met the rubric. |
| needs_revision | The grader found gaps; another iteration is expected. |
| max_iterations_reached | The iteration budget ran out before the rubric was satisfied. |
| failed | The grader could not evaluate, or evaluation itself failed. |
| interrupted | The evaluation run was stopped before completion. |
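A CHECK constraint over those five values could look roughly like the following. This is a minimal sketch using Python's built-in sqlite3 module, not the table's actual DDL: the table name outcome_evaluations and the helper try_insert are hypothetical, and the table is reduced to the result column alone.

```python
import sqlite3

# The five result values from the table above.
RESULTS = (
    "satisfied",
    "needs_revision",
    "max_iterations_reached",
    "failed",
    "interrupted",
)

conn = sqlite3.connect(":memory:")
allowed = ", ".join(f"'{r}'" for r in RESULTS)
conn.execute(
    f"""
    CREATE TABLE outcome_evaluations (  -- hypothetical table name
        id     INTEGER PRIMARY KEY,
        result TEXT NOT NULL CHECK (result IN ({allowed}))
    )
    """
)


def try_insert(result: str) -> bool:
    """Insert one row; return False if the CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO outcome_evaluations (result) VALUES (?)", (result,)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("satisfied")  # a value from the enum
rejected = try_insert("passed")     # not in the enum, so the insert fails
```

Building the IN list from the RESULTS tuple keeps the constraint and any application-side validation reading from one definition, which is one way to keep the enum unmodified in both places.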
Why iteration is a first-class column
The real event schema is iterative by design: a needs_revision result feeds back into another attempt, tracked by an incrementing iteration counter, until the grader returns either satisfied or max_iterations_reached. This table's iteration column (default 0) exists so that a future re-grading pass over the same target_site can be stored as a new row rather than overwriting history, which keeps every prior verdict queryable.
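The append-only behaviour described above can be sketched as follows, again with sqlite3. Everything beyond the column names target_site, iteration, and result is an assumption made for illustration: the table name, the UNIQUE constraint, the record_verdict helper, and the placeholder site value example.com.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE outcome_evaluations (  -- hypothetical table name
        id          INTEGER PRIMARY KEY,
        target_site TEXT NOT NULL,
        iteration   INTEGER NOT NULL DEFAULT 0,
        result      TEXT NOT NULL,
        UNIQUE (target_site, iteration)  -- assumption: one verdict per pass
    )
    """
)


def record_verdict(target_site: str, result: str) -> int:
    """Store a verdict as the next iteration for this site; never UPDATE."""
    (next_iteration,) = conn.execute(
        "SELECT COALESCE(MAX(iteration) + 1, 0) "
        "FROM outcome_evaluations WHERE target_site = ?",
        (target_site,),
    ).fetchone()
    conn.execute(
        "INSERT INTO outcome_evaluations (target_site, iteration, result) "
        "VALUES (?, ?, ?)",
        (target_site, next_iteration, result),
    )
    return next_iteration


# Two grading passes over the same (placeholder) site.
record_verdict("example.com", "needs_revision")
record_verdict("example.com", "satisfied")

# Both verdicts remain queryable, oldest first.
history = conn.execute(
    "SELECT iteration, result FROM outcome_evaluations "
    "WHERE target_site = ? ORDER BY iteration",
    ("example.com",),
).fetchall()
```

After the two calls, history holds one row per pass, with iteration 0 for the first verdict and iteration 1 for the re-grade.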
Explanation, not just a score
The schema deliberately pairs a pass/fail-shaped result with free text: "the grader returns an explanation summarizing which criteria passed or failed." This table mirrors that exactly: explanation is a TEXT NOT NULL column, never optional, alongside the numeric criteria_passed / criteria_total pair. A bare 15/15 tells you the shape of the verdict; the explanation is what makes it checkable by a human six months from now.
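To illustrate what "never optional" means in practice, here is a small sqlite3 sketch in which a row with a score but no explanation is rejected. The table name and the insert_verdict helper are hypothetical, declaring the two criteria columns as INTEGER NOT NULL is an assumption, and the explanation string is placeholder text rather than real grader output.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE outcome_evaluations (  -- hypothetical table name
        id              INTEGER PRIMARY KEY,
        criteria_passed INTEGER NOT NULL,  -- assumed type and nullability
        criteria_total  INTEGER NOT NULL,  -- assumed type and nullability
        explanation     TEXT NOT NULL
    )
    """
)


def insert_verdict(passed: int, total: int, explanation: Optional[str]) -> bool:
    """Insert one verdict; return False if a NOT NULL constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO outcome_evaluations "
            "(criteria_passed, criteria_total, explanation) VALUES (?, ?, ?)",
            (passed, total, explanation),
        )
        return True
    except sqlite3.IntegrityError:
        return False


with_text = insert_verdict(15, 15, "placeholder explanation text")
without_text = insert_verdict(15, 15, None)  # score alone is not accepted
```

The second insert fails, so a bare 15/15 can never be stored without the text that explains it.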
The pattern this follows
This is the same move subagenttasks.com made in generalizing the literal TaskCreate/TaskUpdate/TaskList tool schema, and that subagentcitations.com made in generalizing the real Citations API response shape. Take a schema Anthropic already ships in production, map columns to it field-for-field, and seed with real rows instead of synthetic ones.