Arxsec AI Agent Governance Maturity Benchmark v1.0

Project-Agent-trust-merge / benchmark/README.md

The first independent assessment of how today's leading AI agent frameworks handle the governance requirements enterprises actually need.

Author: Mershard Frierson
Publisher: Arxsec · arxsec.io
Deployment: benchmark.arxsec.io
Status: v1.0 scaffold complete; scores pending primary-source verification

---

What This Is

A Next.js 14 static site that scores 10 AI agent frameworks across 10 governance dimensions taken verbatim from the Arxsec Security Agent Maturity Model. Every score requires traceable primary-source evidence before publication.

The benchmark uses the same 5-level scale as the Arxsec maturity model:

| Level | Name | Risk |
|-------|------|------|
| 0 | Undecided | Unknown |
| 1 | Ungoverned | Very High |
| 2 | Enforced | Managed |
| 3 | Governed | Low |
| 4 | Accountable | Minimal |

---

Updating Scores

data/scores.json is the single source of truth for scores. The site, the PDF report, and all pages render directly from this file.

Score record format

```json
{
  "level": 2,
  "scores_pending": false,
  "evidence": "2-4 sentence justification, specific and verifiable. Cite the exact doc section.",
  "sources": [
    "https://docs.example.com/governance",
    "https://github.com/org/repo#governance"
  ],
  "last_verified": "2026-04-23"
}
```

To update a score

  1. Edit data/scores.json — find the framework slug, then the dimension ID
  2. Set "level" to 0–4 matching the Arxsec maturity level
  3. Write the "evidence" paragraph (2–4 sentences, specific, verifiable)
  4. Add at least one primary source URL to "sources"
  5. Set "last_verified" to today's date
  6. Set "scores_pending": false
  7. Run npm run build — the site regenerates
  8. Commit and push
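Steps 2 through 6 above can be checked mechanically before running the build. The following is a minimal sketch, not part of the benchmark codebase: `findScoreIssues` is a hypothetical helper, the `ScoreRecord` shape is inferred from the example record in this README, and the URL and date patterns are rough assumptions. It does not check the 2–4 sentence length of the evidence paragraph.

```typescript
// Shape inferred from the score record example in this README.
interface ScoreRecord {
  level: number;
  scores_pending: boolean;
  evidence: string;
  sources: string[];
  last_verified: string; // e.g. "2026-04-23"
}

// Hypothetical helper: returns a list of problems. An empty list means the
// record satisfies steps 2 through 6 of the update checklist.
function findScoreIssues(record: ScoreRecord): string[] {
  const issues: string[] = [];

  // Step 2: level is an integer from 0 to 4.
  if (Math.floor(record.level) !== record.level || record.level < 0 || record.level > 4) {
    issues.push("level must be an integer from 0 to 4");
  }

  // Step 3: an evidence paragraph is present.
  if (record.evidence.trim().length === 0) {
    issues.push("evidence must not be empty");
  }

  // Step 4: at least one source, and each source looks like an http(s) URL.
  if (record.sources.length === 0) {
    issues.push("sources must contain at least one URL");
  }
  for (const source of record.sources) {
    if (!/^https?:\/\/\S+$/.test(source)) {
      issues.push("source is not an http(s) URL: " + source);
    }
  }

  // Step 5: last_verified uses the YYYY-MM-DD form shown in the example.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(record.last_verified)) {
    issues.push("last_verified must be formatted as YYYY-MM-DD");
  }

  // Step 6: the pending flag is cleared.
  if (record.scores_pending) {
    issues.push("scores_pending must be false for a published score");
  }

  return issues;
}
```

Returning a list of issues rather than throwing on the first one lets an editor fix every problem in a record in a single pass.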

Framework slugs

| Framework | Slug |
|-----------|------|
| LangChain / LangGraph | langchain-langgraph |
| Microsoft AutoGen | microsoft-autogen |
| CrewAI | crewai |
| LlamaIndex Agents | llamaindex-agents |
| Microsoft Semantic Kernel | semantic-kernel |
| OpenAI Assistants API | openai-assistants |
| Anthropic Claude + MCP | anthropic-claude-mcp |
| Google Vertex AI Agent Builder | google-vertex-agent |
| Dify | dify |
| Haystack Agents | haystack-agents |

Dimension IDs

| Dimension | ID |
|-----------|-----|
| Agent Registry & Inventory | agent-registry |
| Policy Enforcement | policy-enforcement |
| Audit Trail | audit-trail |
| Human-in-the-Loop (HITL) Gates | hitl-gates |
| CISO Visibility & Dashboard | ciso-visibility |
| Automated Compliance Generation | automated-compliance |
| Vendor Security Review Speed | vendor-review |
| Behavioral Drift Detection | drift-detection |
| Deployment-Time Governance | deployment-governance |
| Continuous Compliance | continuous-compliance |
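To illustrate how a slug and a dimension ID together address one score, here is a small lookup sketch. It assumes data/scores.json is nested as framework slug, then dimension ID, then record; this README does not show the top-level layout, so treat that nesting as an assumption. `getScore` and `DIMENSION_IDS` are hypothetical names.

```typescript
// Dimension IDs copied from the table above.
const DIMENSION_IDS: string[] = [
  "agent-registry",
  "policy-enforcement",
  "audit-trail",
  "hitl-gates",
  "ciso-visibility",
  "automated-compliance",
  "vendor-review",
  "drift-detection",
  "deployment-governance",
  "continuous-compliance",
];

// ASSUMPTION: scores are nested as { [frameworkSlug]: { [dimensionId]: record } }.
// Adjust the lookup if the real file is laid out differently.
function getScore<T>(
  scores: Record<string, Record<string, T>>,
  slug: string,
  dimensionId: string
): T {
  // Reject IDs that are not one of the 10 benchmark dimensions.
  if (DIMENSION_IDS.indexOf(dimensionId) === -1) {
    throw new Error("unknown dimension ID: " + dimensionId);
  }
  if (!Object.prototype.hasOwnProperty.call(scores, slug)) {
    throw new Error("unknown framework slug: " + slug);
  }
  const framework = scores[slug];
  if (!Object.prototype.hasOwnProperty.call(framework, dimensionId)) {
    throw new Error("no score for " + slug + " / " + dimensionId);
  }
  return framework[dimensionId];
}
```

Failing loudly on an unknown slug or ID catches typos such as `audit_trail` before they silently produce an empty cell.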

---

Updating Dimension Weights

Edit data/dimensions.json. Each dimension has a weight field (default 1.0). The Overall Maturity Index is the weighted average of a framework's dimension levels, using these weights.
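As a worked illustration of the weighting, the sketch below assumes the index is the sum of weight × level divided by the sum of weights, taken over whichever dimensions the caller passes in. How the site treats pending dimensions is not stated here, so the sketch leaves that choice to the caller. `overallMaturityIndex` and `WeightedLevel` are hypothetical names.

```typescript
// One dimension's contribution: its maturity level and its weight.
interface WeightedLevel {
  level: number;  // 0 to 4
  weight: number; // default 1.0 in data/dimensions.json
}

// ASSUMPTION: index = sum(weight * level) / sum(weight).
function overallMaturityIndex(entries: WeightedLevel[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { level, weight } of entries) {
    weightedSum += level * weight;
    totalWeight += weight;
  }
  // An empty list or all-zero weights has no meaningful average.
  if (totalWeight <= 0) {
    throw new RangeError("total weight must be positive");
  }
  return weightedSum / totalWeight;
}
```

For example, levels 2, 4, and 1 with weights 1.0, 1.0, and 2.0 give (2 + 4 + 2) / 4 = 2. With every weight left at the default 1.0, the result reduces to the plain mean of the levels.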

---

Adding or Swapping a Framework

  1. Add an entry to data/frameworks.json with all required fields
  2. Add a score block to data/scores.json with all 10 dimension entries set to "scores_pending": true
  3. Optionally add an MDX narrative file at content/frameworks/<slug>.mdx
  4. Run npm run build
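Step 2 above can be scaffolded rather than typed by hand. This is a sketch under stated assumptions: the placeholder values for a pending record (level 0, empty evidence, no sources, empty date) are guesses, since this README only specifies that `scores_pending` must be true. `pendingScoreBlock` is a hypothetical helper.

```typescript
// Dimension IDs copied from the "Dimension IDs" table in this README.
const DIMENSION_IDS: string[] = [
  "agent-registry",
  "policy-enforcement",
  "audit-trail",
  "hitl-gates",
  "ciso-visibility",
  "automated-compliance",
  "vendor-review",
  "drift-detection",
  "deployment-governance",
  "continuous-compliance",
];

interface ScoreRecord {
  level: number;
  scores_pending: boolean;
  evidence: string;
  sources: string[];
  last_verified: string;
}

// Build one pending record per dimension for a newly added framework.
function pendingScoreBlock(): Record<string, ScoreRecord> {
  const block: Record<string, ScoreRecord> = {};
  for (const id of DIMENSION_IDS) {
    block[id] = {
      level: 0,             // ASSUMPTION: placeholder until evidence is verified
      scores_pending: true, // required by step 2
      evidence: "",
      sources: [],          // fresh array per record, so edits do not leak
      last_verified: "",
    };
  }
  return block;
}
```

Calling `JSON.stringify(pendingScoreBlock(), null, 2)` yields text that can be pasted under the new framework's slug in data/scores.json, assuming the file is keyed by slug.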

---

Local Development

```bash
cd benchmark
npm install
npm run dev
# → http://localhost:3000
```

---

Building for Deployment

```bash
npm run build
# Output: ./out/ (static export, deploy to Vercel or any CDN)
```

Vercel deployment

Point Vercel to the benchmark/ directory with:

  • Build command: npm run build
  • Output directory: out
  • Framework preset: Next.js

---

Pages

| Route | Description |
|-------|-------------|
| / | Landing page: hero, benchmark matrix, headline findings, methodology summary |
| /methodology | Full model: 5 levels, 10 dimensions, scoring rubric, citation block |
| /framework/[slug] | Per-framework scorecard: radar chart, strengths, gaps, per-dimension breakdown |
| /compare | Multi-select comparison tool: overlaid radar, side-by-side table |
| /report | PDF report download, generated from data/scores.json |

---

Citation

Frierson, M. (2026). *Arxsec AI Agent Governance Maturity Benchmark v1.0.* Arxsec. https://benchmark.arxsec.io

```bibtex
@techreport{frierson2026arxsec,
  author      = {Frierson, Mershard},
  title       = {{Arxsec AI Agent Governance Maturity Benchmark v1.0}},
  institution = {Arxsec},
  year        = {2026},
  month       = apr,
  url         = {https://benchmark.arxsec.io}
}
```

---

Submit a Correction

Framework vendors and maintainers may submit evidence-backed corrections to mershard@arxsec.io. Include: framework slug, dimension ID, proposed score, evidence paragraph, and at least one primary-source URL.