AI Red-Team Benchmark

arxsec-site / library/workflows/ai-redteam-benchmark/README.md

Runs NVIDIA garak + promptfoo against a target LLM, normalizes their findings into ARX's AIFinding shape, and optionally opens Jira blockers for HIGH/CRITICAL issues. Designed as the canonical ARX "open-source red-team" workflow and as an end-to-end exercise of the community connector tier.

Time Saved

Before: ~half a day per target model — separate environments, separate result formats, manual triage to a ticketing tool.

After: One workflow run, one normalized result set, one click of governance.

Connectors

| Connector | Operations | Risk |
|-----------|-----------|------|
| garak | probes:list, scan:run | LOW / MEDIUM |
| promptfoo | eval:run, redteam:run | MEDIUM |
| Jira (optional) | issues:create | MEDIUM |

Overall Risk: MEDIUM — read-only scans against a target model the org has authorized, plus optional ticket creation. No model writes, no production traffic.

How It Works

  1. Run garak against the target model with the configured probe families.
  2. Run promptfoo's red-team plugins against the same model.
  3. Aggregate normalized AIFindings from both bundles.
  4. Optionally open Jira blockers for HIGH/CRITICAL findings (capped at 25 per run).

ARX Governance

  • Policy bundle: oss-redteam-baseline — see arxsec/policies/oss-redteam-baseline.yaml.
  • HITL gate: Disabled by default for read-only scans; enable per-org if the target endpoint is sensitive.
  • Audit Trail: Every probe / plugin invocation is intercepted via BaseConnector.execute and persisted with normalized result counts (max severity, per-severity tallies).
  • Sandbox: community-oss profile — 1 CPU / 1Gi RAM / 600s timeout, no host networking by default.
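The aggregation in steps 3 and 4 above, together with the audit-trail summary (max severity, per-severity tallies) described under ARX Governance, as an added illustration that is not the workflow's own code: the AIFinding field names, the two bundle layouts and the sample findings are all constructed assumptions, not garak or promptfoo output formats.

```python
from collections import Counter
from dataclasses import dataclass

# Severity order used for tallies and for the HIGH/CRITICAL blocker filter.
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}
JIRA_CAP = 25  # the workflow opens at most 25 blockers per run


@dataclass(frozen=True)
class AIFinding:
    # Assumed field names; ARX's real AIFinding shape may differ.
    source: str    # "garak" or "promptfoo"
    probe: str     # garak probe family or promptfoo plugin id
    severity: str  # one of SEVERITY_RANK
    detail: str


def from_garak(bundle: dict) -> list[AIFinding]:
    """Constructed garak-style bundle: one attempt per probe, failed ones become findings."""
    return [AIFinding("garak", a["probe"], a["severity"].upper(), a["detail"])
            for a in bundle["attempts"] if a["status"] == "fail"]


def from_promptfoo(bundle: dict) -> list[AIFinding]:
    """Constructed promptfoo-style bundle: one result per plugin, non-passing ones become findings."""
    return [AIFinding("promptfoo", r["plugin"], r["severity"].upper(), r["reason"])
            for r in bundle["results"] if not r["pass"]]


def summarize(findings: list[AIFinding]) -> dict:
    """Audit-trail summary persisted per run: max severity plus per-severity tallies."""
    max_sev = max((f.severity for f in findings), key=SEVERITY_RANK.__getitem__, default=None)
    return {"max_severity": max_sev, "by_severity": dict(Counter(f.severity for f in findings)),
            "total": len(findings)}


def select_blockers(findings: list[AIFinding], cap: int = JIRA_CAP) -> list[AIFinding]:
    """HIGH/CRITICAL findings only, worst first, capped so one noisy run cannot flood Jira."""
    eligible = [f for f in findings if SEVERITY_RANK[f.severity] >= SEVERITY_RANK["HIGH"]]
    eligible.sort(key=lambda f: -SEVERITY_RANK[f.severity])
    return eligible[:cap]


# Constructed sample bundles (not real tool output).
GARAK_BUNDLE = {"attempts": [
    {"probe": "dan", "status": "fail", "severity": "high", "detail": "jailbreak accepted"},
    {"probe": "encoding", "status": "pass", "severity": "low", "detail": ""},
    {"probe": "promptinject", "status": "fail", "severity": "medium", "detail": "instruction override"}]}
PROMPTFOO_BUNDLE = {"results": [
    {"plugin": "pii", "pass": False, "severity": "critical", "reason": "leaked address"},
    {"plugin": "harmful", "pass": True, "severity": "high", "reason": ""}]}


def run_benchmark(garak: dict = GARAK_BUNDLE, promptfoo: dict = PROMPTFOO_BUNDLE):
    """Steps 3 and 4: merge both bundles, summarize for the audit trail, pick Jira blockers."""
    findings = from_garak(garak) + from_promptfoo(promptfoo)
    return findings, summarize(findings), select_blockers(findings)
```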

Setup

Prerequisites

```bash
pip install arx
```

Environment Variables

```bash
# Provide ONE of the LLM provider creds (matching the target model)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Target the model under test
export TARGET_MODEL="gpt-4o-mini"   # or "claude-sonnet-4-6" etc.

# Optional: file Jira blockers for HIGH/CRITICAL findings
export JIRA_URL="https://your-org.atlassian.net"
export JIRA_API_TOKEN="your-jira-api-token"
export JIRA_PROJECT_KEY="AISEC"
```

Run

```bash
arx run workflow.py

# Register on schedule (Mondays at 06:00 UTC)
arx register --config arx.yaml
```

Customization

  • Adjust garak_probes / promptfoo_plugins per target model risk profile.
  • Set open_jira_for_high: false to skip ticket creation in non-prod runs.
  • Override the connector Docker images via params.image if you maintain forks.