CI/CD Integration

Integrate Preclinical into your CI/CD pipeline to automatically test your AI agents before deployment.

Overview

Running Preclinical tests in CI/CD allows you to:

  • Catch regressions before they reach production
  • Enforce quality gates based on pass rates
  • Block deployments that don't meet safety standards

The simplest approach uses the Python CLI. For example, as a GitHub Actions workflow:

name: AI Agent Safety Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Install Preclinical CLI
        run: pip install preclinical

      - name: Run Safety Tests
        env:
          PRECLINICAL_API_URL: ${{ secrets.PRECLINICAL_URL }}
          PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}
        run: |
          RESULT=$(preclinical run ${{ secrets.PRECLINICAL_AGENT_ID }} \
            --name "CI: ${{ github.sha }}" \
            --concurrency 3 \
            --json)

          PASS_RATE=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('pass_rate', 0))")
          echo "Pass rate: ${PASS_RATE}%"

          if [ "$(echo "$PASS_RATE < 80" | bc -l)" -eq 1 ]; then
            echo "::error::Pass rate ${PASS_RATE}% is below 80% threshold"
            preclinical results list $(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])") --json
            exit 1
          fi

Or use the SDK directly in a Python script for more control:

# scripts/ci_test.py
import sys
from preclinical import Preclinical

client = Preclinical()
run = client.run(
    agent_id=sys.argv[1],
    name=f"CI: {sys.argv[2]}",
    concurrency_limit=3,
)

print(f"Pass rate: {run.pass_rate}%")

for r in client.results(run.id):
    status = "PASS" if r.passed else "FAIL"
    print(f"  [{status}] {r.scenario_name}")

if run.pass_rate < 80:
    sys.exit(1)
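The script above can then be called from a workflow step. A minimal sketch, assuming the script lives at scripts/ci_test.py (as in the comment above) and the repository has already been checked out with actions/checkout:

```yaml
- name: Run Safety Tests
  env:
    PRECLINICAL_API_URL: ${{ secrets.PRECLINICAL_URL }}
    PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}
  run: python scripts/ci_test.py "${{ secrets.PRECLINICAL_AGENT_ID }}" "${{ github.sha }}"
```

The secret names mirror the CLI workflow above; a non-zero exit from the script fails the job, which is what blocks the deployment.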

Using curl (Alternative)

If you prefer not to install the Python CLI, you can use the REST API directly with curl. Here is a minimal GitHub Actions example:

name: AI Agent Testing

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run Preclinical Tests
        env:
          PRECLINICAL_URL: ${{ secrets.PRECLINICAL_URL }}
          AGENT_ID: ${{ secrets.PRECLINICAL_AGENT_ID }}
        run: |
          RESPONSE=$(curl -s -X POST "$PRECLINICAL_URL/start-run" \
            -H "Content-Type: application/json" \
            -d "{\"agent_id\": \"$AGENT_ID\"}")
          RUN_ID=$(echo "$RESPONSE" | jq -r '.id')

          while true; do
            STATUS_RESPONSE=$(curl -s "$PRECLINICAL_URL/api/v1/tests/$RUN_ID")
            STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
            PASS_RATE=$(echo "$STATUS_RESPONSE" | jq -r '.pass_rate // 0')
            echo "Status: $STATUS, Pass Rate: $PASS_RATE%"
            if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ]; then
              break
            fi
            sleep 10
          done

          if [ "$(echo "$PASS_RATE < 80" | bc -l)" -eq 1 ]; then
            echo "Pass rate ($PASS_RATE%) is below threshold (80%)"
            exit 1
          fi

GitLab CI

Same pattern as GitHub Actions. Minimal example:

# .gitlab-ci.yml
preclinical-test:
  stage: test
  image: python:3.12-slim
  script:
    - pip install preclinical
    - |
      RESULT=$(preclinical run "$AGENT_ID" --name "CI: $CI_COMMIT_SHA" --concurrency 3 --json)
      PASS_RATE=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('pass_rate', 0))")
      echo "Pass rate: ${PASS_RATE}%"
      if [ "$(echo "$PASS_RATE < 80" | bc -l)" -eq 1 ]; then
        echo "Pass rate below threshold"
        exit 1
      fi
  variables:
    PRECLINICAL_API_URL: $PRECLINICAL_URL
    AGENT_ID: $PRECLINICAL_AGENT_ID

Environment Variables

Variable                                Description
--------------------------------------  --------------------------------------
PRECLINICAL_URL / PRECLINICAL_API_URL   Base URL of your self-hosted instance
PRECLINICAL_AGENT_ID                    UUID of the agent to test
PRECLINICAL_API_KEY                     API key (if authentication is enabled)

Quality Gates

Environment   Suggested Threshold   Rationale
-----------   -------------------   --------------------
PR checks     75%                   Quick validation
Staging       85%                   Pre-production gate
Production    90%                   High safety standard
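These thresholds can be supplied as a per-environment CI variable rather than hard-coded in each script, so the same job definition serves PR checks, staging, and production. A minimal sketch (check_threshold is an illustrative helper, not a Preclinical command):

```shell
# Succeed only if a pass rate meets the threshold for the target environment.
check_threshold() {
  rate="$1"
  threshold="$2"
  if [ "$(echo "$rate < $threshold" | bc -l)" -eq 1 ]; then
    echo "Pass rate ${rate}% is below ${threshold}% threshold"
    return 1
  fi
  echo "Pass rate ${rate}% meets ${threshold}% threshold"
}

# Example: a staging deployment gated at 85%
check_threshold 90 85
```

In a pipeline, the second argument would come from an environment-specific variable (for example, one value on pull requests and a stricter one on the production branch).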

Troubleshooting

Tests timing out -- Increase the timeout in your CI configuration or use the SSE endpoint (GET /events?run_id=xxx) for real-time status instead of polling.
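A sketch of consuming that SSE endpoint with curl. The event payload shape (data: lines carrying JSON with a status field) is an assumption here; verify it against your instance before relying on it:

```shell
# Read an SSE stream on stdin and stop once a terminal status appears.
wait_for_terminal() {
  while IFS= read -r line; do
    case "$line" in
      data:*'"status":"completed"'*|data:*'"status":"failed"'*|data:*'"status":"canceled"'*)
        echo "terminal event: ${line#data: }"
        return 0
        ;;
    esac
  done
  echo "stream ended without a terminal status" >&2
  return 1
}

# In CI: curl -N "$PRECLINICAL_URL/events?run_id=$RUN_ID" | wait_for_terminal
# (-N disables curl's output buffering so events arrive as they are sent)
```

This replaces the sleep-and-poll loop with a single long-lived request, which also avoids hammering the API from many concurrent jobs.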

Rate limiting -- If you see 429 errors, add delays between API calls or reduce concurrent test runs.

Pass rate fluctuations -- AI agents can have variable behavior. Consider running multiple iterations, using rolling averages, or setting slightly lower thresholds with manual review for borderline results.
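The multiple-iterations idea can be as simple as averaging the pass_rate from several runs before gating. A sketch where each value would come from a separate preclinical run ... --json invocation (the average helper is illustrative):

```shell
# Average several pass rates, rounded to the nearest integer.
# In CI, each value would be extracted from one run's JSON output.
average() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.0f\n", sum / NR }'
}

# Example: three runs scoring 82%, 78% and 86% average to 82%
average 82 78 86
```

Gating on the average of three runs instead of a single run makes a one-off flaky scenario less likely to block a deploy, at the cost of a longer CI step.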