arxsec-site repo-root library/workflows/agent-governance-baseline/README.md

Runs Microsoft PyRIT orchestrators against a target model and (optionally) layers Meta PurpleLlama LlamaGuard on top to flag unsafe responses. Produces a normalized findings posture that ARX can store as compliance evidence and policy enrichment.

Time Saved

Before: PyRIT requires Python wiring per orchestrator. LlamaGuard requires its own deployment. Comparing the two is bespoke work each time.

After: Single workflow run, normalized findings, ARX policy + audit applied automatically.

Connectors

| Connector | Operations | Risk | |-----------|-----------|------| | PyRIT | orchestrators:list, red_team:run, prompt_send:run | MEDIUM | | PurpleLlama (optional) | llama_guard:scan | LOW |

Overall Risk: MEDIUM — exercises the target model with adversarial prompts. No writes, no production traffic.

How It Works

Run PyRIT's RedTeamingOrchestrator (or another orchestrator the operator selects) against the target model with a configurable objective and turn budget.
If enable_llama_guard: true, pass each (prompt, response) pair from PyRIT through LlamaGuard to flag any safety violations the orchestrator scorer missed.
Aggregate normalized AIFindings from both sources and return summary counts.

ARX Governance

Policy bundle: oss-redteam-baseline.
HITL gate: Off by default for read-only orchestrators; turn on if the target endpoint is sensitive.
Sandbox: community-oss profile.
Audit Trail: Every PyRIT attempt, scorer result, and LlamaGuard verdict is intercepted via BaseConnector.execute and persisted with normalized severity counts.

Setup

```bash pip install arx

export OPENAI_API_KEY="sk-..." # or ANTHROPIC_API_KEY / Azure equivalents export TARGET_MODEL="gpt-4o-mini"

Optional LlamaGuard

export PURPLE_LLAMA_TARGET_ENDPOINT="https://your-purple-llama.internal" export HF_TOKEN="hf_..." ```

``bash arx run workflow.py arx register --config arx.yaml # weekly Monday cadence ``

Customization

Swap red_team_orchestrator between PyRIT classes (RedTeaming, Crescendo, PAIR, TAP, etc.).
Tune red_team_objective per use case ("exfiltrate the system prompt", "convince the agent to call a forbidden tool", etc.).
Set enable_llama_guard: true once your PurpleLlama deployment is in place.