Launch Assets for Benchmark v1.0

LinkedIn Announcement Post (Draft)

---

We just published the first version of the Arxsec AI Agent Governance Maturity Benchmark.

It scores 10 of the most widely deployed AI agent frameworks — LangGraph, AutoGen, CrewAI, Semantic Kernel, Claude + MCP, OpenAI Assistants, Vertex AI Agent Builder, LlamaIndex, Dify, and Haystack — against the five-level governance maturity model we published at arxsec.io/maturity.

The benchmark is built on a single question: how well does each framework support governance natively — not with external platforms bolted on after the fact, but as a first-class capability?

We scored 10 dimensions: agent registry, policy enforcement, audit trail, human-in-the-loop gates, CISO visibility, automated compliance generation, vendor review speed, behavioral drift detection, deployment-time governance, and continuous compliance.

Every score requires a primary-source evidence statement — official documentation, GitHub, or published changelogs. No scores are published without a traceable citation. Framework vendors can submit corrections.

This is v1.0. Scores are provisional and will be updated quarterly. We are making the scoring rubric, data, and methodology fully public so the community can hold us accountable the same way we're holding the frameworks accountable.

If you're evaluating AI agent frameworks for enterprise deployment, this is the governance lens that's been missing.

benchmark.arxsec.io

#AIAgents #EnterpriseAI #AIGovernance #SecurityEngineering #Arxsec

---

Research Sources for Scoring Each Framework

Consult the following primary sources, organized by dimension, when assigning verified scores.

1. Agent Registry & Inventory

LangSmith (LangChain): https://docs.smith.langchain.com/
AutoGen Studio agent management: https://microsoft.github.io/autogen/docs/autogen-studio
CrewAI Enterprise dashboard: https://docs.crewai.com/enterprise/
OpenAI Assistants list/retrieve: https://platform.openai.com/docs/api-reference/assistants
Google Vertex AI Agent Builder management: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-builder/manage-agents
Dify app management: https://docs.dify.ai/guides/application-orchestrate

2. Policy Enforcement

Semantic Kernel function invocation filters: https://learn.microsoft.com/en-us/semantic-kernel/concepts/filters
LangGraph conditional edges / interrupt: https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/
MCP authorization spec: https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/authorization/
Vertex AI IAM and access controls: https://cloud.google.com/vertex-ai/docs/general/iam
Haystack pipeline guards: https://docs.haystack.deepset.ai/docs/guardrails

3. Audit Trail

LangSmith tracing: https://docs.smith.langchain.com/tracing
OpenAI run lifecycle: https://platform.openai.com/docs/assistants/how-it-works/run-lifecycle
Google Cloud Audit Logs: https://cloud.google.com/logging/docs/audit
Haystack tracing: https://docs.haystack.deepset.ai/docs/tracing
AutoGen logging: https://microsoft.github.io/autogen/docs/autogen-ext/events/

4. Human-in-the-Loop (HITL) Gates

LangGraph HITL: https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/
AutoGen human proxy: https://microsoft.github.io/autogen/docs/tutorial/human-in-the-loop
MCP sampling (human review hooks): https://spec.modelcontextprotocol.io/specification/2024-11-05/client/sampling/
Semantic Kernel function choice behavior: https://learn.microsoft.com/en-us/semantic-kernel/concepts/ai-services/chat-completion/function-calling/

5. CISO Visibility & Dashboard

LangSmith dashboards: https://docs.smith.langchain.com/monitoring
Dify monitoring: https://docs.dify.ai/guides/monitoring
Google Cloud Monitoring for AI: https://cloud.google.com/vertex-ai/generative-ai/docs/monitor-models
CrewAI Enterprise observability: https://docs.crewai.com/enterprise/

6. Automated Compliance Generation

Review each framework's compliance or audit documentation
SOC 2 / ISO 27001 reference: check vendor compliance pages
Relevant search terms: "[framework] compliance report", "[framework] audit export"
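
The search-term templates above can be expanded mechanically so that no framework is skipped during review. The following is a minimal Python sketch, not part of the benchmark tooling: the function name `expand_queries` is hypothetical, and the framework list is copied from the announcement draft.

```python
# Frameworks named in the announcement draft.
FRAMEWORKS = [
    "LangGraph", "AutoGen", "CrewAI", "Semantic Kernel", "Claude + MCP",
    "OpenAI Assistants", "Vertex AI Agent Builder", "LlamaIndex", "Dify",
    "Haystack",
]

# Search-term templates listed for this dimension.
TEMPLATES = ["[framework] compliance report", "[framework] audit export"]


def expand_queries(frameworks, templates):
    """Return one query per (framework, template) pair, grouped by framework."""
    return [
        template.replace("[framework]", framework)
        for framework in frameworks
        for template in templates
    ]


if __name__ == "__main__":
    for query in expand_queries(FRAMEWORKS, TEMPLATES):
        print(query)
```

The same helper works for any other dimension that lists templates with a "[framework]" placeholder; only the `TEMPLATES` list changes.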

7. Vendor Security Review Speed

Framework SOC 2 / CAIQ availability: check security trust pages
Shared responsibility model documentation
Vendor security questionnaire responses (if publicly available)

8. Behavioral Drift Detection

LangGraph state inspection and conditional edges: https://langchain-ai.github.io/langgraph/
Haystack evaluation pipeline: https://docs.haystack.deepset.ai/docs/evaluation
OpenTelemetry integration per framework
Relevant search terms: "[framework] anomaly detection", "[framework] behavior monitoring"

9. Deployment-Time Governance

LangGraph deployment: https://langchain-ai.github.io/langgraph/concepts/deployment/
Semantic Kernel plugin registration at startup
MCP server initialization: https://spec.modelcontextprotocol.io/specification/2024-11-05/server/
Dify workflow deployment: https://docs.dify.ai/guides/workflow
Vertex AI Agent Builder deployment: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-builder/deploy

10. Continuous Compliance

Review each framework's ongoing compliance posture
Relevant signals: automated policy drift detection, real-time compliance state
Check whether each framework offers live compliance status or only point-in-time reports
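
Because every score drawn from the sources above must carry a primary-source evidence statement and a traceable citation, it may help to validate each score record before publication. The sketch below is one possible shape for that check, under stated assumptions: the `ScoreRecord` fields, the `validate` function, and the 1-to-5 level range (inferred from the five-level maturity model) are hypothetical and do not describe the benchmark's actual data format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# The ten dimensions named in the announcement draft.
DIMENSIONS = (
    "agent registry", "policy enforcement", "audit trail",
    "human-in-the-loop gates", "CISO visibility",
    "automated compliance generation", "vendor review speed",
    "behavioral drift detection", "deployment-time governance",
    "continuous compliance",
)


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical record layout; field names are assumptions."""
    framework: str
    dimension: str
    level: int        # assumed range 1..5, matching a five-level model
    evidence: str     # primary-source evidence statement
    source_url: str   # traceable citation


def validate(record):
    """Return a list of problems; an empty list means the record is publishable."""
    problems = []
    if record.dimension not in DIMENSIONS:
        problems.append("unknown dimension")
    if not 1 <= record.level <= 5:
        problems.append("level out of range")
    if not record.evidence.strip():
        problems.append("missing evidence statement")
    parsed = urlparse(record.source_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("missing or malformed source URL")
    return problems
```

A record with an empty evidence statement or no URL returns a non-empty problem list, which a publishing script could treat as a hard stop.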