Launch Assets for Benchmark v1.0
LinkedIn Announcement Post (Draft)
---
We just published the first version of the Arxsec AI Agent Governance Maturity Benchmark.
It scores 10 of the most widely deployed AI agent frameworks — LangGraph, AutoGen, CrewAI, Semantic Kernel, Claude + MCP, OpenAI Assistants, Vertex AI Agent Builder, LlamaIndex, Dify, and Haystack — against the five-level governance maturity model we published at arxsec.io/maturity.
The benchmark is built on a single question: how well does each framework support governance natively — not with external platforms bolted on after the fact, but as a first-class capability?
We scored 10 dimensions: agent registry, policy enforcement, audit trail, human-in-the-loop gates, CISO visibility, automated compliance generation, vendor review speed, behavioral drift detection, deployment-time governance, and continuous compliance.
Every score requires a primary-source evidence statement — official documentation, GitHub, or published changelogs. No scores are published without a traceable citation. Framework vendors can submit corrections.
This is v1.0. Scores are provisional and will be updated quarterly. We are making the scoring rubric, data, and methodology fully public, so the community can hold us accountable the same way we're holding the frameworks accountable.
If you're evaluating AI agent frameworks for enterprise deployment, this is the governance lens that's been missing.
benchmark.arxsec.io
#AIAgents #EnterpriseAI #AIGovernance #SecurityEngineering #Arxsec
---
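The evidence rule stated in the draft (no score without a traceable primary-source citation) could be enforced in the scoring data itself. The following is a minimal sketch, not the benchmark's actual schema: the `Score` record, its field names, the dimension identifiers, and the assumption that each score is an integer level from 1 to 5 are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# The ten dimensions named in the draft, written as hypothetical identifiers.
DIMENSIONS = (
    "agent_registry",
    "policy_enforcement",
    "audit_trail",
    "hitl_gates",
    "ciso_visibility",
    "automated_compliance_generation",
    "vendor_review_speed",
    "behavioral_drift_detection",
    "deployment_time_governance",
    "continuous_compliance",
)


@dataclass(frozen=True)
class Score:
    framework: str
    dimension: str
    level: int          # assumption: one integer level per dimension, 1 to 5
    evidence_url: str   # primary source backing the score
    evidence_note: str  # short statement of what the source shows

    def __post_init__(self):
        # Reject any record that could not be traced back to a source.
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")
        if not 1 <= self.level <= 5:
            raise ValueError(f"level out of range: {self.level}")
        parsed = urlparse(self.evidence_url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("score rejected: no traceable citation URL")
        if not self.evidence_note.strip():
            raise ValueError("score rejected: empty evidence statement")


# Placeholder values only; this is not a real benchmark score.
example = Score(
    framework="ExampleFramework",
    dimension="audit_trail",
    level=2,
    evidence_url="https://example.com/docs/tracing",
    evidence_note="Placeholder evidence statement.",
)
```

Validating at construction time means an uncited score fails when it is entered rather than at publication.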
Research Sources for Scoring Each Framework
Consult the following primary sources, organized by dimension, when assigning verified scores.
1. Agent Registry & Inventory
- LangSmith (LangChain): https://docs.smith.langchain.com/
- AutoGen Studio agent management: https://microsoft.github.io/autogen/docs/autogen-studio
- CrewAI Enterprise dashboard: https://docs.crewai.com/enterprise/
- OpenAI Assistants list/retrieve: https://platform.openai.com/docs/api-reference/assistants
- Google Vertex AI Agent Builder management: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-builder/manage-agents
- Dify app management: https://docs.dify.ai/guides/application-orchestrate
2. Policy Enforcement
- Semantic Kernel function invocation filters: https://learn.microsoft.com/en-us/semantic-kernel/concepts/filters
- LangGraph conditional edges / interrupt: https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/
- MCP authorization spec: https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/authorization/
- Vertex AI IAM and access controls: https://cloud.google.com/vertex-ai/docs/general/iam
- Haystack pipeline guards: https://docs.haystack.deepset.ai/docs/guardrails
3. Audit Trail
- LangSmith tracing: https://docs.smith.langchain.com/tracing
- OpenAI run lifecycle: https://platform.openai.com/docs/assistants/how-it-works/run-lifecycle
- Google Cloud Audit Logs: https://cloud.google.com/logging/docs/audit
- Haystack tracing: https://docs.haystack.deepset.ai/docs/tracing
- AutoGen logging: https://microsoft.github.io/autogen/docs/autogen-ext/events/
4. Human-in-the-Loop (HITL) Gates
- LangGraph HITL: https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/
- AutoGen human proxy: https://microsoft.github.io/autogen/docs/tutorial/human-in-the-loop
- MCP sampling (human review hooks): https://spec.modelcontextprotocol.io/specification/2024-11-05/client/sampling/
- Semantic Kernel function choice behavior: https://learn.microsoft.com/en-us/semantic-kernel/concepts/ai-services/chat-completion/function-calling/
5. CISO Visibility & Dashboard
- LangSmith dashboards: https://docs.smith.langchain.com/monitoring
- Dify monitoring: https://docs.dify.ai/guides/monitoring
- Google Cloud Monitoring for AI: https://cloud.google.com/vertex-ai/generative-ai/docs/monitor-models
- CrewAI Enterprise observability: https://docs.crewai.com/enterprise/
6. Automated Compliance Generation
- Review each framework's compliance or audit documentation
- SOC 2 / ISO 27001 reference: check vendor compliance pages
- Relevant search terms: "[framework] compliance report", "[framework] audit export"
7. Vendor Security Review Speed
- Framework SOC 2 / CAIQ availability: check security trust pages
- Shared responsibility model documentation
- Vendor security questionnaire responses (if publicly available)
8. Behavioral Drift Detection
- LangGraph state inspection and conditional edges: https://langchain-ai.github.io/langgraph/
- Haystack evaluation pipeline: https://docs.haystack.deepset.ai/docs/evaluation
- OpenTelemetry integration per framework
- Relevant search terms: "[framework] anomaly detection", "[framework] behavior monitoring"
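The bracketed search-term templates listed here and under Automated Compliance Generation can be expanded mechanically so that every framework gets the same queries. This is a small sketch under assumptions: the helper name `expand_queries` is hypothetical, and the framework names are copied from the announcement draft.

```python
from itertools import product

# Framework names as listed in the announcement draft.
FRAMEWORKS = [
    "LangGraph", "AutoGen", "CrewAI", "Semantic Kernel", "Claude + MCP",
    "OpenAI Assistants", "Vertex AI Agent Builder", "LlamaIndex", "Dify", "Haystack",
]

# Search-term templates taken from dimensions 6 and 8.
TEMPLATES = [
    "[framework] compliance report",
    "[framework] audit export",
    "[framework] anomaly detection",
    "[framework] behavior monitoring",
]


def expand_queries(frameworks, templates):
    """Return one query per (framework, template) pair, grouped by framework."""
    return [
        template.replace("[framework]", framework)
        for framework, template in product(frameworks, templates)
    ]


queries = expand_queries(FRAMEWORKS, TEMPLATES)
for query in queries[:4]:
    print(query)
```

Running an identical query set for each framework keeps the evidence search comparable across all ten.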
9. Deployment-Time Governance
- LangGraph deployment: https://langchain-ai.github.io/langgraph/concepts/deployment/
- Semantic Kernel plugin registration at startup
- MCP server initialization: https://spec.modelcontextprotocol.io/specification/2024-11-05/server/
- Dify workflow deployment: https://docs.dify.ai/guides/workflow
- Vertex AI Agent Builder deployment: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-builder/deploy
10. Continuous Compliance
- Review each framework's ongoing compliance posture
- Relevant capabilities: automated policy drift detection, real-time compliance state
- Check whether each framework offers live compliance status or only point-in-time reports