Arxsec AI Agent Governance Maturity Benchmark v1.0
Project-Agent-trust-merge / benchmark/README.md
The first independent assessment of how today's leading AI agent frameworks handle the governance requirements enterprises actually need.
Author: Mershard Frierson
Publisher: Arxsec · arxsec.io
Deployment: benchmark.arxsec.io
Status: v1.0 scaffold complete; scores pending primary-source verification
---
What This Is
A Next.js 14 static site that scores 10 AI agent frameworks across 10 governance dimensions derived verbatim from the Arxsec Security Agent Maturity Model. Every score requires traceable primary-source evidence before publication.
The benchmark uses the same 5-level scale as the Arxsec maturity model:
| Level | Name | Risk |
|-------|------|------|
| 0 | Undecided | Unknown |
| 1 | Ungoverned | Very High |
| 2 | Enforced | Managed |
| 3 | Governed | Low |
| 4 | Accountable | Minimal |
---
Updating Scores
`data/scores.json` is the single source of truth for scores. The site, the PDF report, and every page render directly from this file.
Score record format
```json
{
  "level": 2,
  "scores_pending": false,
  "evidence": "2-4 sentence justification, specific and verifiable. Cite the exact doc section.",
  "sources": ["https://docs.example.com/governance", "https://github.com/org/repo#governance"],
  "last_verified": "2026-04-23"
}
```
To update a score
- Edit `data/scores.json`: find the framework slug, then the dimension ID
- Set `"level"` to 0–4 matching the Arxsec maturity level
- Write the `"evidence"` paragraph (2–4 sentences, specific, verifiable)
- Add at least one primary source URL to `"sources"`
- Set `"last_verified"` to today's date
- Set `"scores_pending": false`
- Run `npm run build` to regenerate the site
- Commit and push
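The checklist above lends itself to an automated check before committing. The following is a minimal sketch, not part of the repository: the `ScoreRecord` type mirrors the score record format in this README, while `validateScoreRecord`, its messages, and the `YYYY-MM-DD` date rule (inferred from the example value) are assumptions introduced for illustration.

```typescript
// Hypothetical helper, not part of the benchmark codebase.
// ScoreRecord mirrors the fields of the score record format in this README.
interface ScoreRecord {
  level: number;
  scores_pending: boolean;
  evidence: string;
  sources: string[];
  last_verified: string;
}

// Returns a list of problems; an empty list means the record passes these checks.
function validateScoreRecord(record: ScoreRecord): string[] {
  const problems: string[] = [];
  // Assumption: a record still marked pending is not checked further.
  if (record.scores_pending) {
    return problems;
  }
  if (!Number.isInteger(record.level) || record.level < 0 || record.level > 4) {
    problems.push("level must be an integer from 0 to 4");
  }
  if (record.evidence.trim() === "") {
    problems.push("evidence must not be empty");
  }
  if (record.sources.length === 0) {
    problems.push("sources needs at least one primary-source URL");
  }
  // Assumption: dates use the YYYY-MM-DD shape seen in the example record.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(record.last_verified)) {
    problems.push("last_verified must be a YYYY-MM-DD date");
  }
  return problems;
}

// A deliberately broken draft: level out of range and no sources.
const draft: ScoreRecord = {
  level: 5,
  scores_pending: false,
  evidence: "Placeholder evidence text.",
  sources: [],
  last_verified: "2026-04-23",
};
const draftProblems = validateScoreRecord(draft);
console.log(draftProblems);
```

A check like this could run before `npm run build` so that a record with `"scores_pending": false` never ships without evidence and a source.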
Framework slugs
| Framework | Slug |
|-----------|------|
| LangChain / LangGraph | langchain-langgraph |
| Microsoft AutoGen | microsoft-autogen |
| CrewAI | crewai |
| LlamaIndex Agents | llamaindex-agents |
| Microsoft Semantic Kernel | semantic-kernel |
| OpenAI Assistants API | openai-assistants |
| Anthropic Claude + MCP | anthropic-claude-mcp |
| Google Vertex AI Agent Builder | google-vertex-agent |
| Dify | dify |
| Haystack Agents | haystack-agents |
Dimension IDs
| Dimension | ID |
|-----------|-----|
| Agent Registry & Inventory | agent-registry |
| Policy Enforcement | policy-enforcement |
| Audit Trail | audit-trail |
| Human-in-the-Loop (HITL) Gates | hitl-gates |
| CISO Visibility & Dashboard | ciso-visibility |
| Automated Compliance Generation | automated-compliance |
| Vendor Security Review Speed | vendor-review |
| Behavioral Drift Detection | drift-detection |
| Deployment-Time Governance | deployment-governance |
| Continuous Compliance | continuous-compliance |
---
Updating Dimension Weights
Edit `data/dimensions.json`. Each dimension has a `weight` field (default 1.0). The Overall Maturity Index is a weighted average of the dimension scores, using these weights.
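To make the effect of a weight change concrete, here is a minimal sketch of a weighted average over dimension levels. The `Dimension` shape, the `overallMaturityIndex` name, and the choice to skip unscored dimensions are assumptions for illustration; the site's actual calculation and the real layout of `data/dimensions.json` may differ.

```typescript
// Hypothetical shapes: the real data files may be structured differently.
interface Dimension {
  id: string;
  weight: number;
}

// Maps dimension ID to a maturity level (0 to 4); missing IDs are unscored.
type LevelsByDimension = Partial<Record<string, number>>;

// Weighted average: sum(weight * level) / sum(weight) over scored dimensions.
function overallMaturityIndex(
  dimensions: Dimension[],
  levels: LevelsByDimension,
): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dimensions) {
    const level = levels[dim.id];
    if (level === undefined) {
      continue; // Assumption: unscored dimensions are left out of the average.
    }
    weightedSum += dim.weight * level;
    totalWeight += dim.weight;
  }
  // No scored dimensions (or zero total weight) means there is no index to report.
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}

const exampleDimensions: Dimension[] = [
  { id: "audit-trail", weight: 2.0 },
  { id: "hitl-gates", weight: 1.0 },
];
const exampleIndex = overallMaturityIndex(exampleDimensions, {
  "audit-trail": 3,
  "hitl-gates": 0,
});
console.log(exampleIndex); // (2.0 * 3 + 1.0 * 0) / 3.0 = 2
```

With every weight left at the default 1.0, this reduces to a plain mean of the levels.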
---
Adding or Swapping a Framework
- Add an entry to `data/frameworks.json` with all required fields
- Add a score block to `data/scores.json` with all 10 dimension entries set to `"scores_pending": true`
- Optionally add an MDX narrative file at `content/frameworks/<slug>.mdx`
- Run `npm run build`
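Writing ten pending entries by hand is error-prone, so the score block for a new framework could be generated instead. This is a sketch under stated assumptions: the dimension IDs come from the table in this README, but `pendingScoreBlock`, the placeholder field values, the `my-framework` slug, and the slug-then-dimension nesting of `data/scores.json` are hypothetical.

```typescript
// Dimension IDs copied from the "Dimension IDs" table in this README.
const DIMENSION_IDS = [
  "agent-registry",
  "policy-enforcement",
  "audit-trail",
  "hitl-gates",
  "ciso-visibility",
  "automated-compliance",
  "vendor-review",
  "drift-detection",
  "deployment-governance",
  "continuous-compliance",
] as const;

interface PendingScore {
  level: number;
  scores_pending: boolean;
  evidence: string;
  sources: string[];
  last_verified: string;
}

// Builds one placeholder record per dimension, all marked pending.
function pendingScoreBlock(): Record<string, PendingScore> {
  const block: Record<string, PendingScore> = {};
  for (const id of DIMENSION_IDS) {
    block[id] = {
      level: 0, // Assumption: level 0 ("Undecided") as the placeholder.
      scores_pending: true,
      evidence: "",
      sources: [],
      last_verified: "",
    };
  }
  return block;
}

// "my-framework" is a hypothetical slug used only for this example.
const newFrameworkBlock = pendingScoreBlock();
console.log(JSON.stringify({ "my-framework": newFrameworkBlock }, null, 2));
```

The printed JSON can then be merged into `data/scores.json` by hand, after checking it against the structure of the existing entries.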
---
Local Development
```bash
cd benchmark
npm install
npm run dev
# → http://localhost:3000
```
---
Building for Deployment
```bash
npm run build
# Output: ./out/ (static export, deploy to Vercel or any CDN)
```
Vercel deployment
Point Vercel to the benchmark/ directory with:
- Build command: `npm run build`
- Output directory: `out`
- Framework preset: Next.js
---
Pages
| Route | Description |
|-------|-------------|
| / | Landing page: hero, benchmark matrix, headline findings, methodology summary |
| /methodology | Full model: 5 levels, 10 dimensions, scoring rubric, citation block |
| /framework/[slug] | Per-framework scorecard: radar chart, strengths, gaps, per-dimension breakdown |
| /compare | Multi-select comparison tool: overlaid radar, side-by-side table |
| /report | PDF report download, generated from data/scores.json |
---
Citation
Frierson, M. (2026). *Arxsec AI Agent Governance Maturity Benchmark v1.0.* Arxsec. https://benchmark.arxsec.io
```bibtex
@techreport{frierson2026arxsec,
  author = {Frierson, Mershard},
  title = {{Arxsec AI Agent Governance Maturity Benchmark v1.0}},
  institution = {Arxsec},
  year = {2026},
  month = apr,
  url = {https://benchmark.arxsec.io}
}
```
---
Submit a Correction
Framework vendors and maintainers may submit evidence-backed corrections to mershard@arxsec.io. Include: framework slug, dimension ID, proposed score, evidence paragraph, and at least one primary-source URL.