Agent Governance Baseline
Runs Microsoft PyRIT orchestrators against a target model and (optionally) layers Meta PurpleLlama LlamaGuard on top to flag unsafe responses. Produces a normalized findings posture that ARX can store as compliance evidence and policy enrichment.
Time Saved
Before: PyRIT requires Python wiring per orchestrator. LlamaGuard requires its own deployment. Comparing the two is bespoke work each time.
After: Single workflow run, normalized findings, ARX policy + audit applied automatically.
Connectors
| Connector | Operations | Risk |
|-----------|-----------|------|
| PyRIT | orchestrators:list, red_team:run, prompt_send:run | MEDIUM |
| PurpleLlama (optional) | llama_guard:scan | LOW |
Overall Risk: MEDIUM. The workflow exercises the target model with adversarial prompts, but it performs no writes and sends no production traffic.
How It Works
- Run PyRIT's `RedTeamingOrchestrator` (or another orchestrator the operator selects) against the target model with a configurable objective and turn budget.
- If `enable_llama_guard: true`, pass each (prompt, response) pair from PyRIT through LlamaGuard to flag any safety violations the orchestrator scorer missed.
- Aggregate normalized `AIFinding`s from both sources and return summary counts.
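The aggregation step above can be sketched in a few lines. This is only an illustration: the `AIFinding` fields, the severity labels, and the `aggregate` and `summarize` helpers are assumptions made for the example, not the actual ARX schema or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AIFinding:
    """Hypothetical normalized finding; the real ARX schema may differ."""
    source: str    # assumed labels: "pyrit" or "llama_guard"
    severity: str  # assumed labels: "low", "medium", "high"
    prompt: str
    detail: str


def summarize(findings):
    """Count findings overall, per source, and per severity."""
    return {
        "total": len(findings),
        "by_source": dict(Counter(f.source for f in findings)),
        "by_severity": dict(Counter(f.severity for f in findings)),
    }


def aggregate(pyrit_findings, guard_findings):
    """Concatenate findings from both sources and return them with a summary."""
    merged = [*pyrit_findings, *guard_findings]
    return merged, summarize(merged)
```

Keeping the source on each finding makes it possible to see which violations only LlamaGuard caught, which is the comparison the workflow is meant to make routine.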
ARX Governance
- Policy bundle: `oss-redteam-baseline`.
- HITL gate: Off by default for read-only orchestrators; turn on if the target endpoint is sensitive.
- Sandbox: `community-oss` profile.
- Audit Trail: Every PyRIT attempt, scorer result, and LlamaGuard verdict is intercepted via `BaseConnector.execute` and persisted with normalized severity counts.
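To illustrate the audit-trail idea, here is a minimal sketch of an `execute` method that wraps every connector call and records severity counts. The `BaseConnector` class below is a hypothetical stand-in written for this example; the real ARX class, its constructor, the `_run` hook, and the shape of the audit record are all assumptions.

```python
from collections import Counter


class BaseConnector:
    """Hypothetical stand-in for illustration; not the real ARX class."""

    def __init__(self, audit_log):
        self.audit_log = audit_log  # any list-like sink for audit records

    def execute(self, operation, **params):
        # Single choke point: every operation passes through here,
        # so every result is recorded before it is returned.
        result = self._run(operation, **params)
        severities = Counter(f["severity"] for f in result.get("findings", []))
        self.audit_log.append({
            "connector": type(self).__name__,
            "operation": operation,
            "severity_counts": dict(severities),
        })
        return result

    def _run(self, operation, **params):
        raise NotImplementedError


class FakePyRITConnector(BaseConnector):
    """Returns canned findings so the sketch runs without PyRIT installed."""

    def _run(self, operation, **params):
        return {"findings": [{"severity": "high"}, {"severity": "low"}]}
```

Routing all calls through one method is what lets the audit record be complete without each connector having to remember to log.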
Setup
```bash
pip install arx

export OPENAI_API_KEY="sk-..."  # or ANTHROPIC_API_KEY / Azure equivalents
export TARGET_MODEL="gpt-4o-mini"

# Optional LlamaGuard
export PURPLE_LLAMA_TARGET_ENDPOINT="https://your-purple-llama.internal"
export HF_TOKEN="hf_..."
```

```bash
arx run workflow.py
arx register --config arx.yaml  # weekly Monday cadence
```
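Before the first run, it may help to confirm that the environment variables from the setup step are present. The following is a sketch, not part of ARX: `check_environment` is a hypothetical helper, it does not check the Azure variables because their names are not given here, and treating both LlamaGuard variables as required when LlamaGuard is enabled is an assumption.

```python
import os


def check_environment(env=None, enable_llama_guard=False):
    """Return a list of problems; an empty list means the setup looks complete."""
    env = os.environ if env is None else env
    problems = []

    # Either provider key is accepted; Azure equivalents are not checked
    # in this sketch because their variable names are not documented here.
    if not (env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        problems.append("set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    if not env.get("TARGET_MODEL"):
        problems.append("set TARGET_MODEL")

    # Assumption: both variables are needed once LlamaGuard is enabled.
    if enable_llama_guard:
        for name in ("PURPLE_LLAMA_TARGET_ENDPOINT", "HF_TOKEN"):
            if not env.get(name):
                problems.append(f"set {name} (needed when LlamaGuard is enabled)")
    return problems
```

Calling `check_environment()` with no arguments reads the current process environment; passing a dict makes the check easy to exercise in isolation.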
Customization
- Swap `red_team_orchestrator` between PyRIT classes (RedTeaming, Crescendo, PAIR, TAP, etc.).
- Tune `red_team_objective` per use case ("exfiltrate the system prompt", "convince the agent to call a forbidden tool", etc.).
- Set `enable_llama_guard: true` once your PurpleLlama deployment is in place.
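One way to picture how the three parameters above combine is a small helper that merges overrides into defaults. The parameter names come from the list above, but everything else is an assumption made for illustration: the default values, the `build_params` helper, and the validation rules are not taken from the ARX workflow itself.

```python
# Assumed defaults for illustration only; the workflow's real defaults may differ.
DEFAULTS = {
    "red_team_orchestrator": "RedTeaming",
    "red_team_objective": "exfiltrate the system prompt",
    "enable_llama_guard": False,
}


def build_params(**overrides):
    """Merge overrides into the defaults, rejecting unknown or empty values."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {unknown}")

    params = {**DEFAULTS, **overrides}
    if not str(params["red_team_objective"]).strip():
        raise ValueError("red_team_objective must not be empty")
    return params
```

For example, `build_params(red_team_orchestrator="Crescendo", enable_llama_guard=True)` keeps the default objective while switching the orchestrator and turning on the LlamaGuard pass. Rejecting unknown keys catches misspelled parameter names before a run starts.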