Adversarial-Prompt Regression Suite (scaffold)
arxsec-app / tests/llm/red_team/README.md
This directory holds the adversarial-prompt regression suite for ARX's first-party AI surface: the LLM router (app/llm/) and related code. Per docs/governance/ai-risk-policy.md §5.1 mandatory practice 5, every first-party AI feature must have an eval suite registered here (or in app/llm/eval/), and per mandatory practice 6, a pre-deploy gate must run those suites.
This is a scaffold. The corpus and the runner shape are real and exercised by test_red_team_corpus.py. What is intentionally NOT here yet:
- A large corpus of vendor-targeted jailbreaks. Foundation-model
vendors maintain those internally; ARX's corpus focuses on router-specific failure modes (scope escape, service-internal attribution, failover-induced disclosure, malformed-response handling) plus a small example set of vendor-targeted cases to document the pattern.
- Live runs against real foundation-model providers. CI runs the
corpus against mocked providers (deterministic, free). A separate scheduled job — not in this directory — runs the same corpus against live providers periodically.
- CI gating. The pytest entrypoint runs today; the deploy-gate hook
is a follow-up.
Corpus structure
corpus.json is the source of truth. Each case has:
- id: stable unique identifier.
- category: one of:
  - jailbreak: try to bypass the model's safety policy.
  - injection: try to make the model take an action the user did not intend.
  - scope-bypass: try to cause the router to call a tier or provider the caller is not authorized for.
  - attribution: try to invoke the router in a way that produces no audit row (the canonical example: service-internal call without agent_id, see router.py lines 206–210).
  - crashy: known-malformed inputs that should fail gracefully rather than crash the worker.
- severity: BLOCKER (CI-blocking) or WARN (alert-only).
- description: what this case is actually testing.
- input: the request payload or scenario.
- expectation: human-readable description of "pass."
- assertion: machine-readable predicate (currently a small DSL; see _eval_case in test_red_team_corpus.py).
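To make the field list above concrete, here is a minimal sketch of one case and a shape check for it. The field names, categories, and severities come from the list; everything else is an assumption for illustration: the example values, the structure of input and assertion, and the helper name validate_case are hypothetical and are not taken from corpus.json or test_red_team_corpus.py.

```python
import json

# Field names, categories, and severities as listed in this README.
REQUIRED_FIELDS = {
    "id", "category", "severity", "description",
    "input", "expectation", "assertion",
}
CATEGORIES = {"jailbreak", "injection", "scope-bypass", "attribution", "crashy"}
SEVERITIES = {"BLOCKER", "WARN"}

# Hypothetical case: all values are illustrative, and the shapes of
# "input" and "assertion" are assumed (the real DSL lives in _eval_case).
EXAMPLE_CASE = json.loads("""
{
  "id": "attribution-example-001",
  "category": "attribution",
  "severity": "BLOCKER",
  "description": "Service-internal call without agent_id.",
  "input": {"caller": "service-internal", "agent_id": null},
  "expectation": "The call does not complete without an audit row.",
  "assertion": {"placeholder": "see _eval_case for the real DSL"}
}
""")


def validate_case(case):
    """Return a list of problems; an empty list means the shape is valid."""
    problems = []
    missing = REQUIRED_FIELDS - set(case)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if case.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {case.get('category')!r}")
    if case.get("severity") not in SEVERITIES:
        problems.append(f"unknown severity: {case.get('severity')!r}")
    return problems
```

A check like this only covers the envelope of a case. Whether the assertion is well-formed is something only the real DSL evaluator can decide.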
Adding a case
- Open corpus.json and add a new case object to the cases array.
- Add a regression test for any case where the assertion DSL is insufficient.
- If the case is for a class not yet covered, document it in this README first.
- Run pytest tests/llm/red_team/ locally.
- Submit. The case becomes part of the pre-deploy gate once the CI wiring lands.
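Because id is meant to be a stable unique identifier, a reused id is an easy mistake to make when adding a case by hand. The sketch below shows one way to catch it before submitting. It assumes corpus.json is an object with a top-level cases array, as the steps above suggest; duplicate_ids and SAMPLE_CORPUS are hypothetical names and are not part of the suite.

```python
import json
from collections import Counter

# Hypothetical corpus text; the real file is tests/llm/red_team/corpus.json.
SAMPLE_CORPUS = """
{
  "cases": [
    {"id": "jb-001", "category": "jailbreak"},
    {"id": "crash-001", "category": "crashy"},
    {"id": "jb-001", "category": "jailbreak"}
  ]
}
"""


def duplicate_ids(corpus_text):
    """Return the sorted list of case ids that appear more than once."""
    corpus = json.loads(corpus_text)
    counts = Counter(case["id"] for case in corpus["cases"])
    return sorted(case_id for case_id, n in counts.items() if n > 1)
```

Against the real file, this could be called as duplicate_ids(open("tests/llm/red_team/corpus.json").read()), with an empty list meaning every id is unique.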
Tracker rows
This scaffold partially closes:
- SAF.4 (adversarial-prompt regression suite) — directory and
runner exist, real corpus growth is incremental.
- SAF.5 (pre-release eval gate) — pytest entrypoint runs today,
CI gate is a follow-up.
A row moves to Met when:
- The corpus covers every threat class enumerated in
docs/security/threat-model-llm-router.md (T-1 through T-8) with at least one case each, and
- The CI pipeline blocks merges when any BLOCKER case fails.
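The two conditions above can be sketched as small checks. This is an illustration under stated assumptions, not the planned CI wiring. The corpus schema in this README has no field linking a case to a threat class, so the threats key used here is hypothetical, as are the function names uncovered_threats and gate_exit_code and the (severity, passed) result shape.

```python
# T-1 through T-8, as enumerated in docs/security/threat-model-llm-router.md.
THREAT_CLASSES = [f"T-{n}" for n in range(1, 9)]


def uncovered_threats(cases):
    """Return threat classes with no case, in T-1..T-8 order.

    Assumes each case carries a hypothetical "threats" list; the real
    mapping from cases to threat classes may live elsewhere.
    """
    covered = {t for case in cases for t in case.get("threats", [])}
    return [t for t in THREAT_CLASSES if t not in covered]


def gate_exit_code(results):
    """Return 1 if any BLOCKER case failed, else 0.

    results is a list of (severity, passed) pairs. A failing WARN case
    is alert-only and does not change the exit code.
    """
    blocked = any(sev == "BLOCKER" and not passed for sev, passed in results)
    return 1 if blocked else 0
```

Under these assumptions, a row would be ready to move to Met when uncovered_threats returns an empty list and the merge pipeline honors a nonzero gate_exit_code.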