CI/CD Integration

Integrate Preclinical into your CI/CD pipeline to automatically test your AI agents before deployment.

Overview

Running Preclinical tests in CI/CD allows you to:

Catch regressions before they reach production
Enforce quality gates based on pass rates
Block deployments that don't meet safety standards

Using the Python SDK (Recommended)

The simplest approach uses the Python CLI/SDK:

name: AI Agent Safety Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Install Preclinical CLI
        run: pip install preclinical

      - name: Run Safety Tests
        env:
          PRECLINICAL_API_URL: ${{ secrets.PRECLINICAL_URL }}
          PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}
        run: |
          RESULT=$(preclinical run ${{ secrets.PRECLINICAL_AGENT_ID }} \
            --name "CI: ${{ github.sha }}" \
            --concurrency 3 \
            --json)

          PASS_RATE=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('pass_rate', 0))")
          echo "Pass rate: ${PASS_RATE}%"

          if [ $(echo "$PASS_RATE < 80" | bc -l) -eq 1 ]; then
            echo "::error::Pass rate ${PASS_RATE}% is below 80% threshold"
            preclinical results list $(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])") --json
            exit 1
          fi

Or use the SDK directly in a Python script for more control:

# scripts/ci_test.py
import sys
from preclinical import Preclinical

client = Preclinical()
run = client.run(
    agent_id=sys.argv[1],
    name=f"CI: {sys.argv[2]}",
    concurrency_limit=3,
)

print(f"Pass rate: {run.pass_rate}%")

for r in client.results(run.id):
    status = "PASS" if r.passed else "FAIL"
    print(f"  [{status}] {r.scenario_name}")

if run.pass_rate < 80:
    sys.exit(1)

Using curl (Alternative)

If you prefer not to install the Python CLI, you can use the REST API directly with curl. Here is a minimal GitHub Actions example:

name: AI Agent Testing

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run Preclinical Tests
        env:
          PRECLINICAL_URL: ${{ secrets.PRECLINICAL_URL }}
          AGENT_ID: ${{ secrets.PRECLINICAL_AGENT_ID }}
        run: |
          RESPONSE=$(curl -s -X POST "$PRECLINICAL_URL/start-run" \
            -H "Content-Type: application/json" \
            -d "{\"agent_id\": \"$AGENT_ID\"}")
          RUN_ID=$(echo $RESPONSE | jq -r '.id')

          while true; do
            STATUS_RESPONSE=$(curl -s "$PRECLINICAL_URL/api/v1/tests/$RUN_ID")
            STATUS=$(echo $STATUS_RESPONSE | jq -r '.status')
            PASS_RATE=$(echo $STATUS_RESPONSE | jq -r '.pass_rate // 0')
            echo "Status: $STATUS, Pass Rate: $PASS_RATE%"
            if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ]; then
              break
            fi
            sleep 10
          done

          if [ $(echo "$PASS_RATE < 80" | bc -l) -eq 1 ]; then
            echo "Pass rate ($PASS_RATE%) is below threshold (80%)"
            exit 1
          fi

GitLab CI

Same pattern as GitHub Actions. Minimal example:

# .gitlab-ci.yml
preclinical-test:
  stage: test
  image: python:3.12-slim
  script:
    - pip install preclinical
    - |
      RESULT=$(preclinical run "$AGENT_ID" --name "CI: $CI_COMMIT_SHA" --concurrency 3 --json)
      PASS_RATE=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('pass_rate', 0))")
      echo "Pass rate: ${PASS_RATE}%"
      if [ $(echo "$PASS_RATE < 80" | bc -l) -eq 1 ]; then
        echo "Pass rate below threshold"
        exit 1
      fi
  variables:
    PRECLINICAL_API_URL: $PRECLINICAL_URL
    AGENT_ID: $PRECLINICAL_AGENT_ID

Environment Variables

Variable	Description
`PRECLINICAL_URL` / `PRECLINICAL_API_URL`	Base URL of your self-hosted instance
`PRECLINICAL_AGENT_ID`	UUID of the agent to test
`PRECLINICAL_API_KEY`	API key (if authentication is enabled)

Quality Gates

Environment	Suggested Threshold	Rationale
PR checks	75%	Quick validation
Staging	85%	Pre-production gate
Production	90%	High safety standard

Troubleshooting

Tests timing out -- Increase the timeout in your CI configuration or use the SSE endpoint (GET /events?run_id=xxx) for real-time status instead of polling.

Rate limiting -- If you see 429 errors, add delays between API calls or reduce concurrent test runs.

Pass rate fluctuations -- AI agents can have variable behavior. Consider running multiple iterations, using rolling averages, or setting slightly lower thresholds with manual review for borderline results.