07 — Executive synthesis

Project-Agent / control-plane-evaluation/07-synthesis.md

What is being built

A vendor-neutral control plane that issues cryptographic non-human identities for AI agents, enforces Cedar-based action-level policy in-process at every agent runtime, and produces auditor-verifiable compliance evidence — across seven commercial agent platforms (Salesforce Agentforce, Microsoft Foundry/Copilot Studio, Google Gemini Enterprise Agent Platform, AWS Bedrock AgentCore, ServiceNow AI Agents, UiPath, IBM watsonx Orchestrate) and four open frameworks (LangChain/LangGraph, CrewAI, AutoGen, OpenAI Agents SDK).

The three things that are genuinely hard (not "execution risk")

  1. In-process policy enforcement that survives in seven foreign runtimes whose extension models are completely different. Salesforce wants Apex managed packages; Foundry wants Power Platform connectors; Bedrock wants Lambda layers; ServiceNow wants Flow Designer custom actions; UiPath wants Studio activity NuGets. Each is a separate engineering exercise and a separate certified-partner timeline. There is no shortcut and no shared abstraction. Sharding a small engineering team across seven of these in 18 months is the central engineering bet.
  2. A tamper-evident audit chain anchored to the customer's own infrastructure with externally verifiable Merkle proofs. Auditors don't trust ARX, by design. They run arx-verify against the customer's S3 bucket, walk the Merkle chain, validate RFC 3161 timestamps from an independent timestamp authority, and reach an independent integrity verdict. Building this so that it survives a regulator's adversarial review, and so that the operational burden of cross-account S3 writes doesn't make customers say no, is harder than it looks. The crypto is textbook; the operational deployment shape is not.
  3. Owner-binding tied to HR signals so a departing engineer's agents auto-orphan within minutes. This is the part of NHI (non-human identity) nobody is doing well. SCIM deprovisioning is the lever; the gap is the auto-suspension policy, the grace period, and the cohort-level cascade that keep one leaver's 23 active agents from staying live for a quarter while waiting for someone to notice. The technology is simple; the workflow is not.
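
The auditor-side check in item 2 can be illustrated with a minimal Merkle inclusion-proof sketch. This is not the arx-verify implementation, which the source does not describe. The function names, the SHA-256 choice, the leaf/node domain-separation prefixes, and the duplicate-last-node rule for odd levels are all assumptions made for illustration, and RFC 3161 timestamp validation is omitted.

```python
import hashlib
import hmac

# Hypothetical helpers: names and hashing conventions are illustrative assumptions.

def leaf_hash(record: bytes) -> bytes:
    """Hash an audit record as a leaf (0x00 prefix separates leaves from nodes)."""
    return hashlib.sha256(b"\x00" + record).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child digests into their parent (0x01 prefix)."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def _next_level(level: list) -> list:
    """Pair up adjacent digests; the caller guarantees an even length."""
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: list) -> bytes:
    """Compute the root, duplicating the last digest on odd-sized levels."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = _next_level(level)
    return level[0]

def inclusion_proof(leaves: list, index: int) -> list:
    """Return [(sibling_digest, side)] from the leaf at `index` up to the root."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour that shares a parent with `index`
        side = "left" if sibling < index else "right"
        proof.append((level[sibling], side))
        level = _next_level(level)
        index //= 2
    return proof

def verify_inclusion(record: bytes, proof: list, expected_root: bytes) -> bool:
    """Recompute the root from one record and its proof, then compare."""
    digest = leaf_hash(record)
    for sibling, side in proof:
        digest = node_hash(sibling, digest) if side == "left" else node_hash(digest, sibling)
    return hmac.compare_digest(digest, expected_root)

if __name__ == "__main__":
    records = [b"agent-7 invoked tool A", b"agent-7 denied tool B", b"agent-9 suspended"]
    leaves = [leaf_hash(r) for r in records]
    root = merkle_root(leaves)
    print(verify_inclusion(records[1], inclusion_proof(leaves, 1), root))
    print(verify_inclusion(b"tampered record", inclusion_proof(leaves, 1), root))
```

The point of the sketch is that the verifier needs only the record, the sibling digests, and a root obtained independently of the vendor, which is what lets an auditor reach a verdict without trusting ARX.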

The three things that are commoditized — do not over-invest

  1. Workload identity issuance. SPIRE has a decade of production use at scale; fork it, add the per-tenant signing keys, the RFC 8693 token-exchange endpoint, and SCIM-driven owner-binding, and stop there. The temptation to write a custom JWT issuer because Cedar+SPIRE has a learning curve is an organizational antipattern.
  2. Policy language. Cedar is open source, in-process, sub-millisecond, and designed for entity-based authorization. The temptation to write a custom DSL because "Cedar doesn't quite express X" is also an antipattern. Reshape the entity model; don't replace the language.
  3. Trace storage and visualization. OpenTelemetry GenAI semantic conventions emit the data; ClickHouse stores it; Grafana / Honeycomb / Datadog already give the customer a query and visualization experience that would take a startup three years to match. Be the policy, identity, and evidence layer above the customer's existing observability stack; do not build a Datadog competitor.
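
As a hedged illustration of the RFC 8693 endpoint mentioned in item 1, the sketch below builds the form-encoded body of a token-exchange request. The parameter names and URN values come from RFC 8693. The function name, the assumption that the subject token is a JWT issued to the agent workload, and the example audience and scope are hypothetical and not taken from the ARX design.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode

# URNs defined by RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"

def build_token_exchange_body(
    subject_token: str,
    audience: str,
    scope: Optional[str] = None,
) -> str:
    """Build an application/x-www-form-urlencoded token-exchange request body.

    Assumption: `subject_token` is a JWT that identifies the agent workload.
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
    }
    if scope:
        params["scope"] = scope  # optional, space-delimited list of scopes
    return urlencode(params)

if __name__ == "__main__":
    # Hypothetical values for illustration only.
    body = build_token_exchange_body(
        subject_token="header.payload.signature",
        audience="https://tickets.example.internal",
        scope="tickets:read",
    )
    for name, values in parse_qs(body).items():
        print(name, "=", values[0])
```

The body would be sent as a POST to whatever token endpoint the identity service exposes; the endpoint URL and client authentication are deployment details the source does not specify.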

The 18-month critical path

  Q1: PDP + identity + LangChain SDK end-to-end.
  Q2: CrewAI + MCP gateway + audit chain to customer S3.
  Q3: First paying customer in production at $250K+ ARR, running on open frameworks + ServiceNow.
  Q4: SOC 2 + ISO 42001 + NIST AI RMF + EU AI Act evidence packages; CISO board demo at customer #1; close customer #2 in financial services.
  Q5: Salesforce + Bedrock + UiPath native PEPs, $2M+ ARR, SOC 2 Type II report.
  Q6: Foundry + Gemini + watsonx + customer-VPC deployment, $4-6M ARR, 25K agents under governance.

The single quarter most likely to slip: Q5, because three commercial-platform integrations with separate ISV-review timelines stack in the same 90 days. Recovery: ship egress-proxy fallback for Salesforce in Q5, slide native Apex to Q6.

The two metrics that prove this is working by end of Q4

  1. Signed annual contract value ≥ $700K across customer #1 (production, ~$300K) and customer #2 (production-ready, ~$400K). Not pilot LOIs; paid contracts. Any paid ACV from a third customer who chose ARX over an alternative is a bonus.
  2. A documented CISO sign-off at customer #1 stating that the ARX-generated quarterly compliance package, with auditor-verifiable Merkle-anchored evidence covering SOC 2 + ISO 42001 + NIST AI RMF, is suitable for board reporting and external audit submission. This is the deliverable the platform exists to produce; if a real CISO won't sign off, the thesis doesn't work no matter how good the engineering is.

The clearest reason a sophisticated investor should pass

Microsoft, AWS, and Google each have native first-party agent governance roadmaps. Microsoft's is closest: Defender for Cloud Apps is one Ignite announcement away from cross-cloud AI agent inventory and governance. The window for a third-party cross-platform governance vendor is real but bounded; if Microsoft ships its cross-cloud version 12 months from now, the addressable market collapses to F500 companies with multi-platform agent estates and a procurement preference for non-Microsoft tooling. That is a real market, at the lower end of the $100M ARR range, but it is not the $10B platform play the thesis suggests.

The investor question that separates the bull and bear cases: is "the only one that integrates with all eleven platforms" durably defensible against "Microsoft Defender governs the eight that matter most"? The bull says yes, because consolidation premium and standards-driven interop favor multi-vendor. The bear says no, because security buyers consolidate to the existing identity / endpoint / SIEM vendor every time consolidation is offered. There is no clean way to know in advance which is right.

The board funding decision is therefore a decision about appetite for category-creation risk in a market where the platform incumbents have native distribution. If the dual-path plan — $1B+ standalone if hyperscaler bundling is delayed, $100–300M strategic acquisition if it isn't — is acceptable, fund it. If only the standalone path is acceptable, do not.

---

Quick reference — the seven evaluation questions

  1. What exactly is being built, at the component level? See 01-component-decomposition.md. Six capabilities decomposed into ~20 components with topology, latency budgets, failure modes.
  2. What is the team buying vs. building, and why? See 02-build-buy-partner.md. Build the IP (~50%); build on open source where the substrate exists (~30%); partner / OEM where the workflow lives (~10%); punt to v2 (~10%).
  3. How does it integrate with the eleven platforms it must govern? See 03-integration-matrix.md. Five different PEP topologies; brittleness ratings 2–4; three platforms (Foundry, Gemini, OpenAI) carry the highest partner-relationship risk.
  4. What is the architectural split, and which decision is highest-stakes? See 04-architecture.md. Control plane vs. data plane with PDP in-process at every PEP. The single highest-stakes decision: the PDP must run in-process at every PEP — if customers refuse to install in-process PEPs, the latency story collapses.
  5. What ships in each of the next six quarters? See 05-build-sequence.md. First paying customer Q3, CISO-grade evidence Q4, multi-platform native PEPs Q5–Q6.
  6. What would kill the thesis, and what is the leading indicator? See 06-disqualifying-risks.md. Six risks; highest is Microsoft Defender extending into cross-cloud agent governance (leading indicator: Microsoft Ignite October 2026).
  7. Should the board fund this? Conditional yes — if the dual-path plan (standalone OR strategic acquisition) is acceptable. Single-path "$10B or zero" framings are not the right shape for this market.