Runs

A run represents a single execution of a test against a target AI agent. It tracks:

Execution status and progress
Individual scenario results
Aggregate metrics and pass rates
Timing information

Run Lifecycle

stateDiagram-v2
    [*] --> scheduled: Scheduled for later
    [*] --> pending: Test started
    pending --> running: Scenarios executing
    running --> completed: All scenarios graded
    running --> failed: Error occurred
    running --> canceled: User canceled

Note

Grading happens per scenario run (each scenario is graded individually), not per test run. A test run therefore has no grading status of its own: it moves directly from `running` to `completed` once all scenario runs finish.
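The lifecycle above can be read as a small state machine. The following is a minimal sketch in Python, not the product's implementation: `RunStatus`, `ALLOWED_TRANSITIONS`, and `can_transition` are hypothetical names chosen for illustration. It encodes only the transitions drawn in the diagram. The diagram shows no outgoing edge from `scheduled`, so none is assumed here.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Run statuses as listed in this document (hypothetical enum name)."""

    SCHEDULED = "scheduled"
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Only the edges drawn in the lifecycle diagram. How a scheduled run
# leaves `scheduled` is not shown there, so it is deliberately omitted.
ALLOWED_TRANSITIONS = {
    RunStatus.PENDING: {RunStatus.RUNNING},
    RunStatus.RUNNING: {
        RunStatus.COMPLETED,
        RunStatus.FAILED,
        RunStatus.CANCELED,
    },
}


def can_transition(current: RunStatus, target: RunStatus) -> bool:
    """Return True if the diagram contains an edge from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

For example, `can_transition(RunStatus.PENDING, RunStatus.COMPLETED)` is false under this sketch, because a run has to pass through `running` first.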

Status Definitions

Status	Description
`scheduled`	Run scheduled for future execution
`pending`	Run created, waiting to start
`running`	Scenarios actively executing and being graded
`completed`	All scenarios finished
`failed`	Unrecoverable error occurred
`canceled`	User canceled the run

Scenario Run Statuses

Individual scenario runs have their own status progression:

Status	Description
`pending`	Waiting to execute
`running`	Conversation in progress
`grading`	Conversation complete, being graded
`passed`	Graded and passed
`failed`	Graded and failed
`error`	Execution error
`canceled`	Canceled
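One way to relate the scenario run statuses above to the run lifecycle is to check whether every scenario run has stopped. The sketch below is illustrative only: `ScenarioRunStatus`, `FINISHED_STATUSES`, and `all_scenarios_finished` are hypothetical names, and treating `passed`, `failed`, `error`, and `canceled` as final statuses is an assumption based on the table, not a documented guarantee.

```python
from enum import Enum
from typing import Iterable


class ScenarioRunStatus(str, Enum):
    """Scenario run statuses as listed in this document (hypothetical enum name)."""

    PENDING = "pending"
    RUNNING = "running"
    GRADING = "grading"
    PASSED = "passed"
    FAILED = "failed"
    ERROR = "error"
    CANCELED = "canceled"


# Assumption: a scenario run in one of these statuses will not change again.
FINISHED_STATUSES = {
    ScenarioRunStatus.PASSED,
    ScenarioRunStatus.FAILED,
    ScenarioRunStatus.ERROR,
    ScenarioRunStatus.CANCELED,
}


def all_scenarios_finished(statuses: Iterable[ScenarioRunStatus]) -> bool:
    """Return True when there is at least one scenario run and none is still in progress."""
    items = list(statuses)
    return bool(items) and all(status in FINISHED_STATUSES for status in items)
```

Under these assumptions, a run whose scenarios are `passed`, `failed`, and `grading` is not yet finished, because one conversation is still being graded.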

Viewing Results

Summary

After completion, the run shows aggregate metrics:

Metric	Description
Pass Rate	Percentage of scenarios that passed
Passed	Count of passing scenarios
Failed	Count of failing scenarios
Errors	Count of execution errors
Duration	Total run time
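The summary metrics above can be derived from the per-scenario statuses. The sketch below shows one plausible calculation. `RunSummary` and `summarize` are hypothetical names, and two points are assumptions rather than documented behavior: the pass rate uses all scenarios in the run as its denominator, and the duration is the difference between the run's start and end timestamps.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunSummary:
    """Aggregate metrics for one run (hypothetical structure)."""

    pass_rate: float  # percentage, 0.0 to 100.0
    passed: int
    failed: int
    errors: int
    duration: timedelta


def summarize(
    statuses: list[str], started_at: datetime, finished_at: datetime
) -> RunSummary:
    """Compute summary metrics from scenario run status strings."""
    counts = Counter(statuses)
    total = len(statuses)
    passed = counts["passed"]
    # Assumption: the denominator is every scenario in the run,
    # including those that ended in `error` or `canceled`.
    pass_rate = 100.0 * passed / total if total else 0.0
    return RunSummary(
        pass_rate=pass_rate,
        passed=passed,
        failed=counts["failed"],
        errors=counts["error"],
        duration=finished_at - started_at,
    )
```

With four scenarios ending as `passed`, `passed`, `failed`, and `error`, this sketch reports a pass rate of 50.0, with 2 passed, 1 failed, and 1 error. If the product excludes errored or canceled scenarios from the denominator, the pass rate would differ.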

Detail View

Click any scenario to see:

Full conversation transcript
Per-criterion grading results
Evidence quotes from the transcript
Timing metrics