Architecture
Infrastructure
| Component | Description |
|---|---|
| App | Hono API server + Vite React frontend (unified build) |
| Worker | pg-boss job queue (runs in the same server process) |
| Database | Postgres 16 |
| Containers | 2 total: Postgres, App (API + frontend) |
All services start with `docker compose up`.
    +───────────────────────+     +────────────────+
    │ App (API + UI)        │────>│ PostgreSQL     │
    │ Hono + Vite+React     │     │ + pg-boss      │
    │ :3000                 │     │ :5432          │
    +───────────────────────+     +────────────────+
            │
            │ SSE (/events)
            │ scenario worker (in-process)
LangGraph Execution Flow
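Each scenario runs as a pg-boss job that invokes two LangGraph StateGraphs in sequence:
- Tester graph (`server/src/graphs/tester-graph.ts`): planning, provider connection, turn execution, message generation, and coverage review.
- Grader graph (`server/src/graphs/grader-graph.ts`): grading, evidence verification, consistency audit, and score computation.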
State definitions use LangGraph Annotation.Root in server/src/graphs/tester-state.ts and grader-state.ts. Per-phase skill injection is handled by server/src/graphs/skill-loaders.ts.
Run Lifecycle
- User starts a run from the UI or API with an `agent_id` and run params.
- API creates `test_runs` + `scenario_runs` rows and enqueues pg-boss jobs.
- Worker executes each scenario: testerGraph (plan, turn loop, coverage review) then graderGraph (grade, verify, audit, score).
- Frontend receives updates via SSE (PG LISTEN/NOTIFY), which invalidates TanStack Query caches.
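
The first two lifecycle steps can be sketched as a pure planning function that turns a run request into rows and job payloads. This is an illustration only: the `RunRequest` shape, the `scenario_ids` field, the `status` column, the `run-scenario` queue name, and the `planRun` helper are all hypothetical. The real server would write to Postgres and enqueue through pg-boss rather than return plain objects.

```typescript
// Hypothetical shapes: the real column names and job payloads are not documented here.
interface RunRequest {
  agent_id: string;
  scenario_ids: string[]; // assumed run param
}

interface ScenarioRunRow {
  id: string;
  test_run_id: string;
  scenario_id: string;
  status: "queued"; // assumed initial status
}

interface PlannedRun {
  testRun: { id: string; agent_id: string };
  scenarioRuns: ScenarioRunRow[];
  jobs: { queue: string; data: { scenario_run_id: string } }[];
}

// Builds one test_runs row, one scenario_runs row per scenario,
// and one job payload per scenario_runs row.
function planRun(req: RunRequest, newId: () => string): PlannedRun {
  const testRun = { id: newId(), agent_id: req.agent_id };
  const scenarioRuns = req.scenario_ids.map((scenario_id): ScenarioRunRow => ({
    id: newId(),
    test_run_id: testRun.id,
    scenario_id,
    status: "queued",
  }));
  const jobs = scenarioRuns.map((row) => ({
    queue: "run-scenario", // assumed queue name
    data: { scenario_run_id: row.id },
  }));
  return { testRun, scenarioRuns, jobs };
}
```

Passing the ID generator in as `newId` keeps the function deterministic, so the fan-out from one run to many scenario jobs is easy to check without a database.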
Provider Routing
| Provider | Executor | Description |
|---|---|---|
| `openai` | In-process | OpenAI-compatible chat completions target |
| `vapi` | In-process | Vapi chat API target |
| `browser` | In-process | Browser Use Cloud target |
| `livekit` | In-process | LiveKit WebRTC target |
| `pipecat` | In-process | Pipecat Cloud target (LiveKit transport) |
Provider-target parity is enforced via target-agents/registry.json and server/src/__tests__/provider-targets.test.ts, so every server provider must ship with a runnable target agent path.
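
A parity check of this kind could look roughly like the sketch below. The registry shape (provider name mapped to an object with a `path`) and the `findMissingTargets` helper are assumptions made for illustration. The actual layout of `target-agents/registry.json` and the real test are not shown in this document.

```typescript
// Assumed registry shape: provider name -> location of a runnable target agent.
type TargetRegistry = Record<string, { path: string }>;

// Provider names taken from the Provider Routing table.
const SERVER_PROVIDERS = ["openai", "vapi", "browser", "livekit", "pipecat"] as const;

// Returns the providers that have no registry entry or an empty path.
function findMissingTargets(
  providers: readonly string[],
  registry: TargetRegistry,
): string[] {
  return providers.filter((name) => !registry[name]?.path);
}
```

A test would then assert that `findMissingTargets(SERVER_PROVIDERS, registry)` is empty, so adding a server provider without a matching target agent fails the suite.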
Tester and Grader Agents
Tester -- Generates adversarial patient messages using a LangGraph StateGraph with five nodes: planning, provider connection, turn execution, message generation, and coverage review. Runs on TESTER_MODEL.
Grader -- Scores transcripts against rubric criteria using a LangGraph StateGraph with four nodes: grading, evidence verification, consistency audit, and score computation. Uses three-level grading (MET / PARTIALLY MET / NOT MET). Runs on GRADER_MODEL.
Both use server-side OpenAI credentials. Target agent config is only used for the system under test.
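
To make the three-level grading concrete, here is a minimal sketch of how a score computation step might aggregate verdicts. The point values (1, 0.5, 0) and the plain average are assumptions, since this document does not say how the grader weighs criteria. `computeScore` is a hypothetical name.

```typescript
type Verdict = "MET" | "PARTIALLY MET" | "NOT MET";

// Assumed point values: the document does not define how verdicts map to numbers.
const VERDICT_POINTS: Record<Verdict, number> = {
  "MET": 1,
  "PARTIALLY MET": 0.5,
  "NOT MET": 0,
};

// Averages per-criterion points into a score between 0 and 1.
// An empty rubric yields 0 rather than NaN.
function computeScore(verdicts: readonly Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const total = verdicts.reduce((sum, v) => sum + VERDICT_POINTS[v], 0);
  return total / verdicts.length;
}
```

Under these assumed weights, a rubric graded MET, PARTIALLY MET, NOT MET, MET scores 2.5 out of 4, or 0.625.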
Key Directories
| Directory | Description |
|---|---|
| `server/` | Hono API + pg-boss worker |
| `server/src/graphs/` | LangGraph StateGraphs (tester, grader), state schemas, skill loaders |
| `server/src/shared/` | Prompts, schemas, skills, attack vectors |
| `server/src/providers/` | Provider implementations |
| `frontend/` | Vite + React + TanStack Query |
| `tests/` | Vitest API tests |
Turn Limits
- `DEFAULT_MAX_TURNS=11`
- `MIN_MAX_TURNS=5`
- `MAX_MAX_TURNS=15`
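
One plausible way to apply these limits is to fall back to the default when no value is given and clamp anything else into the allowed range. The sketch below assumes that behavior. Whether the server actually clamps or rejects out-of-range values is not stated here, and `resolveMaxTurns` is a hypothetical helper.

```typescript
const DEFAULT_MAX_TURNS = 11;
const MIN_MAX_TURNS = 5;
const MAX_MAX_TURNS = 15;

// Hypothetical helper: returns the default when no finite number is given,
// otherwise rounds down and clamps into [MIN_MAX_TURNS, MAX_MAX_TURNS].
function resolveMaxTurns(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_MAX_TURNS;
  }
  return Math.min(MAX_MAX_TURNS, Math.max(MIN_MAX_TURNS, Math.floor(requested)));
}
```

With this approach a request for 3 turns runs with 5, a request for 20 runs with 15, and an omitted value runs with 11.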