Quickstart
Run your first test against a healthcare AI agent. You'll:
- Set up the platform
- Add an agent
- Start a test run
- Review results
Prefer the terminal or an AI assistant?
You can also use the CLI or agent skills instead of the UI. Install via `pip install preclinical` or `npx skills add Mentat-Lab/preclinical`.
Prerequisites
- Docker Desktop (or Docker Engine + Docker Compose)
- An API key:
  - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` (see `.env.example`)
  - `BROWSER_USE_API_KEY` if you want to run browser-based tests through Browser Use Cloud
Setup
```bash
git clone https://github.com/Mentat-Lab/preclinical.git
cd preclinical
make setup  # copies .env.example and starts services
```
Edit `.env` with your API key, then run `make restart` to pick up the changes.
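Before restarting, you can sanity-check that an API key line actually made it into `.env`. The sketch below writes a throwaway demo file so it runs standalone; the key names come from `.env.example`, and the key value is a placeholder:

```python
# Optional sanity check: confirm an API key line exists in .env before
# running `make restart`. A throwaway demo file keeps this self-contained.
import re
import tempfile
from pathlib import Path

demo_env = Path(tempfile.mkdtemp()) / ".env"
demo_env.write_text("OPENAI_API_KEY=sk-example\n")  # placeholder value

pattern = re.compile(r"^(OPENAI_API_KEY|ANTHROPIC_API_KEY)=.+$", re.MULTILINE)
configured = bool(pattern.search(demo_env.read_text()))
print("API key configured" if configured else "No API key found; edit .env")
```

Point the same pattern at your real `.env` to check your actual setup.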
Daily workflow:
| Command | Description |
|---|---|
| `make up` | Start services |
| `make down` | Stop services |
| `make logs` | Tail logs |
| `make status` | Check health |
| `make clean` | Nuke volumes, start fresh |
Open http://localhost:3000 to access the UI.
Step 1: Add an Agent
Agents are provider configurations Preclinical can execute against. In the UI, go to Agents and click New Agent.
**Vapi**

```json
{
  "provider": "vapi",
  "name": "My Vapi Agent",
  "config": {
    "assistant_id": "asst_xxxxx",
    "api_key": "your-vapi-api-key"
  }
}
```
Get your API key from the Vapi Dashboard.
**OpenAI-compatible**

```json
{
  "provider": "openai",
  "name": "My OpenAI-Compatible Agent",
  "config": {
    "base_url": "https://api.openai.com/v1",
    "api_key": "your-api-key",
    "target_model": "gpt-4o-mini"
  }
}
```
Works with any OpenAI-compatible endpoint.
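As a sketch of what "OpenAI-compatible" means here, the config fields map directly onto a standard chat-completions request. The `/chat/completions` path, Bearer auth header, and payload shape follow the common OpenAI convention; the values and message content are placeholders:

```python
# Build (but don't send) a chat-completions request from the config above.
# Endpoint path and auth header follow the OpenAI-compatible convention.
import json

config = {
    "base_url": "https://api.openai.com/v1",
    "api_key": "your-api-key",
    "target_model": "gpt-4o-mini",
}

url = f"{config['base_url'].rstrip('/')}/chat/completions"
headers = {
    "Authorization": f"Bearer {config['api_key']}",
    "Content-Type": "application/json",
}
payload = json.dumps({
    "model": config["target_model"],
    "messages": [{"role": "user", "content": "Hello"}],
})
print(url)  # → https://api.openai.com/v1/chat/completions
```

Any endpoint that accepts this request shape (vLLM, Ollama, a proxy, etc.) can be targeted by changing `base_url` and `target_model`.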
**Browser Use**

```json
{
  "provider": "browser",
  "name": "My Browser Agent",
  "config": {
    "url": "https://your-chat-app.example.com",
    "profile_id": "prof_123"
  }
}
```
Tests web chat UIs through Browser Use Cloud. Reuse the same `profile_id` across repeated runs on the same domain so Browser Use preserves auth and browser state.
**LiveKit**

```json
{
  "provider": "livekit",
  "name": "My LiveKit Agent",
  "config": {
    "url": "wss://your-project.livekit.cloud",
    "api_key": "APIxxxxx",
    "api_secret": "xxxxx",
    "dispatch_mode": "auto",
    "agent_name": "healthcare-agent"
  }
}
```
Get credentials from the LiveKit Cloud Dashboard.
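Whichever provider you pick, the configs above share a shape, so a quick client-side check can catch a missing field before you submit the form. This is a sketch: the required-field lists mirror the sample configs above, not a published schema:

```python
# Sketch: check an agent config for the fields shown in the examples above.
# Field lists mirror the sample configs, not an official schema.
REQUIRED_FIELDS = {
    "vapi": {"assistant_id", "api_key"},
    "openai": {"base_url", "api_key", "target_model"},
    "browser": {"url", "profile_id"},
    "livekit": {"url", "api_key", "api_secret", "dispatch_mode", "agent_name"},
}

def missing_fields(agent):
    """Return config fields the examples expect but this agent lacks."""
    required = REQUIRED_FIELDS.get(agent.get("provider"), set())
    return required - set(agent.get("config", {}))

agent = {
    "provider": "vapi",
    "name": "My Vapi Agent",
    "config": {"assistant_id": "asst_xxxxx"},  # api_key left out on purpose
}
print(missing_fields(agent))  # → {'api_key'}
```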
Step 2: Start a Test Run
- Open an agent detail page from Agents.
- Click New Test Run.
- Select scenarios from the 60 TriageBench cases (or use defaults).
- Set max turns.
- Click Start Test Run.
> **Note**
> Max turns are clamped server-side to the configured bounds (default range: 5-7).
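The clamping behaves like a standard min/max bound. A minimal sketch, assuming the default 5-7 range from the note above:

```python
# Sketch of server-side clamping with the default 5-7 bounds.
MIN_TURNS, MAX_TURNS = 5, 7  # assumed defaults from the note above

def clamp_max_turns(requested):
    """Clamp a requested max-turns value into the configured bounds."""
    return max(MIN_TURNS, min(MAX_TURNS, requested))

print(clamp_max_turns(20))  # → 7 (above the upper bound)
print(clamp_max_turns(1))   # → 5 (below the lower bound)
print(clamp_max_turns(6))   # → 6 (already in range)
```

So requesting 20 turns runs at 7; the UI value is a request, not a guarantee.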
Step 3: Monitor Execution
Scenarios execute automatically based on your agent's provider. You'll see live status updates for:
- Scenario progress and state transitions
- Transcript updates as conversations complete
- Grading results as they finalize
Step 4: Review Results
When the run finalizes, review:
- Summary Dashboard: pass/fail counts and breakdowns
- Scenario Results: per-scenario status and grading output
- Transcripts: full attacker vs target conversation logs
- Criterion Decisions: MET/PARTIALLY MET/NOT MET with rationale and evidence
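If you export results for your own reporting, a per-run summary can be tallied from the criterion decisions. A sketch: the three decision labels come from the list above, while the record layout is an assumption for illustration:

```python
# Sketch: tally criterion decisions from exported results. The three
# labels match the list above; the record layout is an assumption.
from collections import Counter

decisions = [
    {"criterion": "identifies red flags", "decision": "MET"},
    {"criterion": "recommends escalation", "decision": "PARTIALLY MET"},
    {"criterion": "avoids diagnosis claims", "decision": "MET"},
    {"criterion": "collects symptom history", "decision": "NOT MET"},
]

summary = Counter(d["decision"] for d in decisions)
print(dict(summary))  # → {'MET': 2, 'PARTIALLY MET': 1, 'NOT MET': 1}
```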