Architecture

Infrastructure

Component	Description
App	Hono API server + Vite React frontend (unified build)
Worker	pg-boss job queue (runs in the same server process)
Database	Postgres 16
Containers	2 total: Postgres, App (API + frontend)

All services start with `docker compose up`.

+───────────────────────+     +────────────────+
│   App (API + UI)      │────>│  PostgreSQL    │
│  Hono + Vite+React    │     │   + pg-boss    │
│  :3000                │     │  :5432         │
+───────────────────────+     +────────────────+
       │
       │  SSE (/events)
       │  scenario worker (in-process)

LangGraph Execution Flow

Each scenario runs as a pg-boss job that invokes two LangGraph StateGraphs:

POST /start-run → pg-boss job → testerGraph.invoke() → graderGraph.invoke() → finalize
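
The flow above can be sketched as a single job handler. This is a minimal illustration rather than the project's actual code: `InvokableGraph`, `runScenario`, the `finalize` callback, and the state shapes are hypothetical stand-ins, and only the tester, grader, finalize ordering comes from the flow above.

```typescript
// Hypothetical stand-in for a compiled graph: anything with an async invoke().
interface InvokableGraph<In, Out> {
  invoke(input: In): Promise<Out>;
}

// Assumed result shapes; the real state lives in tester-state.ts and grader-state.ts.
interface TesterResult { transcript: string[] }
interface GraderResult { score: number }

// Runs one scenario: the tester graph first, then the grader graph on the
// tester's transcript, then a finalize callback (for example, persisting the score).
async function runScenario(
  scenarioId: string,
  testerGraph: InvokableGraph<{ scenarioId: string }, TesterResult>,
  graderGraph: InvokableGraph<{ transcript: string[] }, GraderResult>,
  finalize: (scenarioId: string, score: number) => Promise<void>,
): Promise<number> {
  const tested = await testerGraph.invoke({ scenarioId });
  const graded = await graderGraph.invoke({ transcript: tested.transcript });
  await finalize(scenarioId, graded.score);
  return graded.score;
}
```

Passing the graphs in as parameters keeps the handler easy to exercise with stubs, independent of the queue and of any model calls.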

Tester graph (server/src/graphs/tester-graph.ts):

planAttack → connectProvider → executeTurn ⇄ generateNextMessage → coverageReview
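
The two-way arrow between executeTurn and generateNextMessage implies a conditional edge that either loops or exits to coverageReview. The routing function below is a sketch under assumptions: `TurnLoopState`, its field names, and the two exit conditions (turn cap reached, or the conversation has ended) are hypothetical and not taken from the source files.

```typescript
// Assumed slice of the tester state; field names are hypothetical.
interface TurnLoopState {
  turnCount: number;          // turns executed so far
  maxTurns: number;           // cap for this run
  conversationEnded: boolean; // for example, the target closed the session
}

type NextNode = "generateNextMessage" | "coverageReview";

// Decides where to go after executeTurn: loop back for another message,
// or leave the loop and hand off to coverageReview.
function routeAfterTurn(state: TurnLoopState): NextNode {
  if (state.conversationEnded || state.turnCount >= state.maxTurns) {
    return "coverageReview";
  }
  return "generateNextMessage";
}
```

In LangGraph, a function of this shape would typically be registered on the executeTurn node with `addConditionalEdges`.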

Grader graph (server/src/graphs/grader-graph.ts):

gradeTranscript → verifyEvidence → consistencyAudit → computeScore

State definitions use LangGraph Annotation.Root in server/src/graphs/tester-state.ts and grader-state.ts. Per-phase skill injection is handled by server/src/graphs/skill-loaders.ts.

Run Lifecycle

1. A user starts a run from the UI or the API, supplying an agent_id and run parameters.
2. The API creates test_runs and scenario_runs rows and enqueues a pg-boss job for each scenario.
3. The worker executes each scenario: testerGraph (plan, turn loop, coverage review), then graderGraph (grade, verify, audit, score).
4. The frontend receives updates over SSE, backed by Postgres LISTEN/NOTIFY, and each update invalidates the affected TanStack Query caches.
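
To make the last step concrete, here is a minimal sketch of how one update could be serialized as a Server-Sent Events frame and mapped to cache keys. The `RunUpdate` payload, the event name, and the query keys are assumptions for illustration; only the SSE wire format (an `event:` line, a `data:` line, and a blank line) is standard.

```typescript
// Hypothetical payload; the real NOTIFY payload shape is not documented here.
interface RunUpdate {
  runId: string;
  status: string;
}

// Serializes one update as an SSE frame: an "event:" line, a "data:" line,
// and a blank line that terminates the frame.
function toSseFrame(eventName: string, payload: RunUpdate): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Maps an update to the query keys a client might invalidate (assumed keys).
function queryKeysFor(update: RunUpdate): string[][] {
  return [["test-runs"], ["test-runs", update.runId]];
}
```

On the client, each key returned by `queryKeysFor` would be passed to the TanStack Query client's `invalidateQueries` so that the affected views refetch.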

Provider Routing

Provider	Executor	Description
`openai`	In-process	OpenAI-compatible chat completions target
`vapi`	In-process	Vapi chat API target
`browser`	In-process	Browser Use Cloud target
`livekit`	In-process	LiveKit WebRTC target
`pipecat`	In-process	Pipecat Cloud target (LiveKit transport)

Provider-target parity is enforced via target-agents/registry.json and server/src/__tests__/provider-targets.test.ts, so every server provider must ship with a runnable target agent path.
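
A parity check of this kind can be sketched as a pure function. The registry shape used here (provider name mapped to an object with a `path`) is an assumption, as is the `missingTargets` helper; the actual format of target-agents/registry.json and the real test may differ.

```typescript
// Providers listed in the routing table above.
const SERVER_PROVIDERS = ["openai", "vapi", "browser", "livekit", "pipecat"] as const;

// Hypothetical registry shape: provider name -> path to a runnable target agent.
type TargetRegistry = Record<string, { path: string }>;

// Returns the providers that have no registry entry or an empty path.
function missingTargets(providers: readonly string[], registry: TargetRegistry): string[] {
  return providers.filter((provider) => {
    const entry = registry[provider];
    return entry === undefined || entry.path.trim() === "";
  });
}
```

A test would then assert that `missingTargets(SERVER_PROVIDERS, registry)` is empty, so adding a provider without a target agent fails the build.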

Tester and Grader Agents

Tester -- Generates adversarial patient messages using a LangGraph StateGraph with five nodes: planning, provider connection, turn execution, message generation, and coverage review. Runs on TESTER_MODEL.

Grader -- Scores transcripts against rubric criteria using a LangGraph StateGraph with four nodes: grading, evidence verification, consistency audit, and score computation. Uses three-level grading (MET / PARTIALLY MET / NOT MET). Runs on GRADER_MODEL.
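
As an illustration of how three-level verdicts could roll up into a single number, the sketch below averages per-criterion credit. The credit values (1, 0.5, 0) and the unweighted average are assumptions; this document does not state the formula that computeScore actually uses.

```typescript
type Verdict = "MET" | "PARTIALLY MET" | "NOT MET";

// Assumed credit per verdict; the real weights are not stated in this document.
const CREDIT: Record<Verdict, number> = {
  "MET": 1,
  "PARTIALLY MET": 0.5,
  "NOT MET": 0,
};

// Averages per-criterion credit into a score between 0 and 1.
// An empty list of verdicts scores 0.
function computeScoreSketch(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const total = verdicts.reduce((sum, verdict) => sum + CREDIT[verdict], 0);
  return total / verdicts.length;
}
```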

Both agents use server-side OpenAI credentials. The target agent config applies only to the system under test.

Key Directories

Directory	Description
`server/`	Hono API + pg-boss worker
`server/src/graphs/`	LangGraph StateGraphs (tester, grader), state schemas, skill loaders
`server/src/shared/`	Prompts, schemas, skills, attack vectors
`server/src/providers/`	Provider implementations
`frontend/`	Vite + React + TanStack Query
`tests/`	Vitest API tests

Turn Limits

DEFAULT_MAX_TURNS=11
MIN_MAX_TURNS=5
MAX_MAX_TURNS=15
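
One way to apply these limits to a requested turn count is shown below. The `resolveMaxTurns` helper is hypothetical, and the sketch assumes out-of-range requests are clamped rather than rejected, which this document does not specify.

```typescript
const DEFAULT_MAX_TURNS = 11;
const MIN_MAX_TURNS = 5;
const MAX_MAX_TURNS = 15;

// Resolves a requested turn count: falls back to the default when the value
// is absent or not a finite number, otherwise rounds down and clamps it
// into the [MIN_MAX_TURNS, MAX_MAX_TURNS] range.
function resolveMaxTurns(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_MAX_TURNS;
  }
  return Math.min(MAX_MAX_TURNS, Math.max(MIN_MAX_TURNS, Math.floor(requested)));
}
```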