02 — Build vs. buy vs. partner vs. fork

For each component identified in Phase 1: classification + justification. The default position is "build only what's a defensible differentiator and reuse everything else." Founders routinely overbuild commoditized infrastructure (especially identity and policy engines); this document is structured to make those defaults visible.

Legend:

B/scratch — build from scratch
B/oss — build on open source (fork or extend)
OEM/partner — embed a commercial component
Punt — defer to v2

---

Capability 1 — Discovery

| Component | Classification | Why |
|---|---|---|
| C1.1 Discovery broker | B/scratch (the framework) + B/oss (per-platform clients) | The framework that normalizes 11+ platforms into a DiscoveredAgent model is a differentiator; nobody else does it. The per-platform clients are scripted against published APIs (Microsoft Graph, AWS SDK boto3, Salesforce REST, Google Cloud SDK, etc.) and reuse open-source SDKs. No reason to fork those. |
| C1.2 Shadow-agent egress sensor | B/oss: fork/extend Tetragon (Apache 2.0, maintainer: Isovalent/Cisco) | Cilium Tetragon already does eBPF-based process and network observability and ships with mature TLS-SNI extraction. Forking adds the LLM-vendor-hostname classifier and the ARX-API exporter. Maintainer risk: low. Tetragon is widely deployed, but Cisco acquired Isovalent, which creates a minor coupling risk to a competitor in the security-adjacent stack. Alternative: partner with Sysdig/Falco instead of forking; Falco has rules-based detection that fits this use case and is CNCF-graduated. |
| C1.3 Inventory store + change feed | B/oss: Postgres + Kafka (or Redpanda for self-hosted) | A Postgres SCD Type 2 schema is about 100 lines. Kafka is operational baggage but eventually unavoidable. Punt to Postgres LISTEN/NOTIFY for v1; add Kafka in Q4 once cardinality demands it. |
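
The SCD Type 2 pattern named for the C1.3 inventory store can be sketched as follows. This is a minimal illustration, not the project's schema: the table and column names (`agent_inventory`, `valid_from`, `valid_to`) are hypothetical, and SQLite stands in for Postgres so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per version of an agent; valid_to IS NULL marks the current one.
SCHEMA = """
CREATE TABLE agent_inventory (
    agent_id   TEXT NOT NULL,
    platform   TEXT NOT NULL,
    owner      TEXT,
    valid_from TEXT NOT NULL,
    valid_to   TEXT,
    PRIMARY KEY (agent_id, valid_from)
)
"""


def upsert_agent(conn, agent_id, platform, owner, observed_at):
    """Close the current row if tracked attributes changed, then insert a new version."""
    current = conn.execute(
        "SELECT platform, owner FROM agent_inventory "
        "WHERE agent_id = ? AND valid_to IS NULL",
        (agent_id,),
    ).fetchone()
    if current == (platform, owner):
        return False  # nothing changed, keep the current version open
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE agent_inventory SET valid_to = ? "
            "WHERE agent_id = ? AND valid_to IS NULL",
            (observed_at, agent_id),
        )
        conn.execute(
            "INSERT INTO agent_inventory VALUES (?, ?, ?, ?, NULL)",
            (agent_id, platform, owner, observed_at),
        )
    return True


def history(conn, agent_id):
    """Return every version of an agent, oldest first."""
    return conn.execute(
        "SELECT owner, valid_from, valid_to FROM agent_inventory "
        "WHERE agent_id = ? ORDER BY valid_from",
        (agent_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_agent(conn, "agent-1", "salesforce", "alice", "2025-01-01T00:00:00Z")
upsert_agent(conn, "agent-1", "salesforce", "alice", "2025-01-02T00:00:00Z")  # no change
upsert_agent(conn, "agent-1", "salesforce", "bob", "2025-01-03T00:00:00Z")    # owner change
```

Each real change closes the previous row and opens a new one, so the full ownership history stays queryable. A Postgres version would keep the same shape and could publish the change from the same transaction.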

Honest pushback: the discovery framework looks like a differentiator but might not be. It is plausible that Wiz ships a competing inventory of agents in the next 12 months on top of its existing CSPM, and Permiso Security is already positioned for NHI discovery and could extend to agents. Build the framework, but design for the case "if Wiz ships this, we depend on Wiz feeds and add value above them": the framework should be able to consume a Wiz inventory rather than re-discover the same agents.
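
One way to keep the framework able to consume a third-party inventory is to treat first-party clients and external feeds as interchangeable sources. The sketch below is an assumption-heavy illustration: the fields on `DiscoveredAgent`, the `InventorySource` interface, and `StaticSource` are all hypothetical, and nothing here models a real Wiz API.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class DiscoveredAgent:
    # Hypothetical fields; the real DiscoveredAgent model is not shown in this document.
    platform: str
    native_id: str
    display_name: str
    sources: frozenset = frozenset()

    @property
    def key(self):
        return (self.platform, self.native_id)


class InventorySource(Protocol):
    name: str

    def discover(self) -> Iterable[DiscoveredAgent]: ...


class StaticSource:
    """Stand-in for either a first-party platform client or an external feed adapter."""

    def __init__(self, name, agents):
        self.name = name
        self._agents = list(agents)

    def discover(self):
        return list(self._agents)


def merge_inventories(sources):
    """Deduplicate agents across sources and record which sources reported each one."""
    merged = {}
    for source in sources:
        for agent in source.discover():
            seen = merged.get(agent.key)
            base = seen or agent
            provenance = base.sources | {source.name}
            merged[agent.key] = DiscoveredAgent(
                base.platform, base.native_id, base.display_name, provenance
            )
    return merged


first_party = StaticSource("first_party", [
    DiscoveredAgent("salesforce", "a1", "Quote bot"),
    DiscoveredAgent("aws", "b2", "Ticket triager"),
])
external_feed = StaticSource("external_feed", [
    DiscoveredAgent("aws", "b2", "Ticket triager"),
    DiscoveredAgent("gcp", "c3", "Report writer"),
])
inventory = merge_inventories([first_party, external_feed])
```

Recording provenance per agent makes it visible which agents only the external feed knows about, which is where first-party discovery effort could be skipped.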

---

Capability 2 — Identity

| Component | Classification | Why |
|---|---|---|
| C2.1 Identity issuer (NHI authority) | B/oss: fork/extend SPIRE (Apache 2.0, maintainer: CNCF / SPIFFE TSC) | SPIRE is the reference SPIFFE implementation. The work to add: per-tenant signing keys backed by Vault Transit; JWT-SVID custom claims (human owner OIDC sub, manifest hash); RFC 8693 token-exchange endpoint; HA replica strategy. SPIRE's plugin model is designed for this. Building a workload-identity issuer from scratch is the canonical example of overbuilding commoditized infrastructure. |
| C2.2 Credential broker (ZSP) | B/scratch (the orchestration layer) + B/oss (per-cloud STS clients) | The orchestration that mints + caches + revokes per-(agent, target) credentials with sub-50ms hot-path budget is differentiated work. The actual credential mints reuse cloud SDKs (AWS STS, GCP IAM Credentials API, Azure Federated Identity Credential, Salesforce OAuth named-credential exchange). |
| C2.3 Owner-binding registry | B/scratch | Trivial table + UI. Build. The differentiation is in the workflow (HR-departure auto-orphaning), not the storage. |
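
The mint + cache + revoke loop described for C2.2 can be sketched as a small in-memory broker. This is an illustration under stated assumptions: `CredentialBroker`, its method names, and the injected `mint` callable are hypothetical, the real mint would call a cloud STS client, and revocation here only drops the local cache entry rather than invalidating anything at the target.

```python
import time
from dataclasses import dataclass


@dataclass
class Credential:
    token: str
    expires_at: float


class CredentialBroker:
    """Caches one short-lived credential per (agent_id, target) pair."""

    def __init__(self, mint, ttl_seconds=300.0, clock=time.monotonic):
        self._mint = mint          # hypothetical callable(agent_id, target) -> token string
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._cache = {}

    def get(self, agent_id, target):
        key = (agent_id, target)
        now = self._clock()
        cred = self._cache.get(key)
        if cred is None or cred.expires_at <= now:
            # Cold path: mint a fresh credential and remember when it expires.
            cred = Credential(self._mint(agent_id, target), now + self._ttl)
            self._cache[key] = cred
        return cred.token

    def revoke_agent(self, agent_id):
        """Drop every cached credential for an agent; returns how many were dropped."""
        stale = [key for key in self._cache if key[0] == agent_id]
        for key in stale:
            del self._cache[key]
        return len(stale)


# Usage with a fake clock and a counting mint function.
clock = [0.0]
minted = []


def fake_mint(agent_id, target):
    minted.append((agent_id, target))
    return f"token-{len(minted)}"


broker = CredentialBroker(fake_mint, ttl_seconds=60.0, clock=lambda: clock[0])
first = broker.get("agent-1", "aws")
second = broker.get("agent-1", "aws")   # served from cache
clock[0] = 61.0
third = broker.get("agent-1", "aws")    # expired, minted again
```

The hot path is a dictionary lookup, which is what makes a tight latency budget plausible; the mint call only happens on a miss or after expiry.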

Honest pushback: the temptation to write a custom JWT issuer "because Cedar+SPIRE is a learning curve" should be resisted. SPIRE has had years of production use at companies bigger than the realistic v1 customer base. Use it.

The non-obvious commodity item: OAuth 2.0 token exchange (RFC 8693) implementation. There are mature open-source libraries (oauthlib in Python, nimbus-oauth2-oidc-sdk in Java). Use them.
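
For orientation, the request body of an RFC 8693 token exchange is an ordinary form-encoded POST. The sketch below builds that body with the standard library only; the URN constants come from RFC 8693, while the function name, the choice of JWT as the subject token type, and the example audience are assumptions for illustration. A production implementation would go through a maintained OAuth library rather than hand-rolled code.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token, audience, scope=None, actor_token=None):
    """Return the form-encoded body for a token-exchange request (hypothetical helper)."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": JWT_TOKEN_TYPE,      # assumes the agent presents a JWT
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
    }
    if scope:
        params["scope"] = " ".join(scope)          # scopes are space-delimited
    if actor_token:
        # actor_token_type is required whenever actor_token is present.
        params["actor_token"] = actor_token
        params["actor_token_type"] = JWT_TOKEN_TYPE
    return urlencode(params)


body = build_token_exchange_body(
    subject_token="example.jwt.value",
    audience="https://target.example.com",
    scope=["read", "write"],
)
parsed = parse_qs(body)
```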

---

Capability 3 — Runtime policy enforcement

| Component | Classification | Why |
|---|---|---|
| C3.1 Policy decision point (Cedar evaluator) | B/oss: Cedar SDK (Apache 2.0, maintainer: AWS / Cedar TSC) | Cedar is open-source, the evaluator is in-process, and it's specifically designed for entity-based authorization. No reason to write a custom policy language. The work is the schema design (Agent / Action / Resource / Context entities) and policy patterns library. |
| C3.2 PEP, in-process SDK flavor | B/scratch (per-framework wrapper) | Each of OpenAI Agents SDK, LangChain/LangGraph, CrewAI, AutoGen requires a hand-written guardrail/middleware. No shared abstraction across them yet (despite OpenInference and Logfire trying). |
| C3.2 PEP, MCP gateway flavor | B/oss: fork the Anthropic MCP reference SDK (MIT, maintainer: Anthropic) + add proxy layer | The MCP gateway is a moderately complex network proxy that speaks the MCP wire protocol. Anthropic's reference SDK handles wire format; the proxy logic (intercept tools/call, query PDP, rewrite or reject) is ARX-original. |
| C3.2 PEP, sidecar flavor | B/oss: Envoy + ext_authz plugin | Envoy is the obvious choice (CNCF graduated, ext_authz is a first-class extension point). The ARX work is a small ext_authz server that wraps the Cedar PDP. The sidecar pattern is well understood: Istio and Consul Connect run Envoy sidecars, and Linkerd uses the same pattern with its own proxy. |
| C3.2 PEP, platform-native action hook | B/scratch per-platform | Every commercial platform has its own extension model (Salesforce Apex managed package; ServiceNow Flow Designer custom action; UiPath Studio activity NuGet; watsonx skill wrapper). Each is bespoke engineering gated by that platform's certified-partner program (months of partner certification per target). Cannot be shortcut. |
| C3.2 PEP, egress proxy | B/oss: fork mitmproxy or build on Envoy | Network egress proxy that classifies LLM-vendor hostnames + extracts what it can from TLS-terminated traffic. Either mitmproxy or Envoy works; mitmproxy has better Python ergonomics for the LLM-traffic classifier. License: MIT (mitmproxy), Apache 2.0 (Envoy). |
| C3.3 Policy authoring + bundle distribution | B/scratch (UI) + B/oss (S3 + signing) | UI is product surface (no shortcut). Bundle distribution is glorified S3; use sigstore Cosign for bundle signing (Apache 2.0, OpenSSF) so customers can verify bundle integrity offline. |
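
The MCP gateway's proxy logic (intercept tools/call, query the PDP, forward or reject) can be sketched as a pure function over one JSON-RPC message. This is a hedged illustration: `enforce_tools_call` and the `is_allowed` callable are hypothetical stand-ins for the gateway and the Cedar PDP query, and the error code is an assumption taken from the JSON-RPC implementation-defined server-error range.

```python
import json

POLICY_DENIED_CODE = -32000  # assumption: any code in the JSON-RPC server-error range


def enforce_tools_call(raw_message, agent_id, is_allowed):
    """Decide what to do with one client-to-server MCP message.

    Returns ("forward", original_message) or ("reject", json_rpc_error_response).
    is_allowed is a hypothetical callable(agent_id, tool_name, arguments) -> bool
    standing in for the policy decision point.
    """
    message = json.loads(raw_message)
    if message.get("method") != "tools/call":
        return ("forward", raw_message)  # only tool invocations are policy-checked here

    params = message.get("params", {})
    tool_name = params.get("name")
    arguments = params.get("arguments", {})
    if is_allowed(agent_id, tool_name, arguments):
        return ("forward", raw_message)

    denial = {
        "jsonrpc": "2.0",
        "id": message.get("id"),
        "error": {
            "code": POLICY_DENIED_CODE,
            "message": f"tool call '{tool_name}' denied by policy",
        },
    }
    return ("reject", json.dumps(denial))


def example_pdp(agent_id, tool_name, arguments):
    """Toy policy: this agent may read files but not delete them."""
    return tool_name != "delete_file"


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "delete_file", "arguments": {"path": "/tmp/report.txt"}},
})
action, payload = enforce_tools_call(request, "agent-1", example_pdp)
```

Keeping the decision as a pure function separates the policy question from the transport, so the same check could sit behind a sidecar or an in-process wrapper.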

Honest pushback: the temptation to write a custom policy DSL "because Cedar doesn't quite fit our agent semantics" should be resisted hard. Custom DSLs are a tarpit. If Cedar's entity model doesn't fit, reshape the entity model; don't replace the language. Cedar is deliberately less expressive than OPA/Rego, and that restriction is what buys predictable evaluation and analyzable policies.

The single largest "build from scratch because the team wants to" risk in this entire roadmap is C3.1. Founders building security platforms reliably overbuild policy engines. Hold the line on Cedar.

---

Capability 4 — Compliance evidence

| Component | Classification | Why |
|---|---|---|
| C4.1 Evidence emitter | B/scratch | This is the main IP. The mapping (event_type, condition) → control_id[] per framework + the source-line attribution via build-time annotation walking are differentiated work. There is no open-source library that does this for the six target frameworks. The framework texts themselves are publicly available. |
| C4.2 Audit chain (Merkle + RFC 3161) | B/oss | Merkle tree implementation is textbook (pymerkle, merklelib, or roll your own ~80 lines). RFC 3161 timestamp client: python-rfc3161-client, rfc3161ng. Customer-side arx-verify CLI is the only differentiated piece; make it open-source on PyPI so customers and auditors can read the source. |
| C4.3 Evidence package builder | B/scratch | Templates per framework (Jinja2 → PDF via WeasyPrint or wkhtmltopdf). The framework knowledge is the IP; the rendering is the easy part. |
| C4.4 Vendor-questionnaire renderer | B/scratch with OEM/partner fallback | SIG/CAIQ/HECVAT templates are publicly available. The smart move is to partner with Vanta or Drata to write directly into their questionnaire-management UI and meet the GRC-tool buyer where they already are. Vanta + Drata are also potential acquirers; partnering early reduces friction either way. |
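
To make the "roll your own" option for the C4.2 Merkle tree concrete, here is a minimal sketch of root computation, inclusion proofs, and verification. It is not the project's audit chain: the function names are hypothetical, the leaf/node prefixes follow the domain-separation style of RFC 6962, an unpaired node is promoted to the next level unchanged, and the RFC 3161 timestamping of the root is not shown.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # prefix separates leaves from nodes


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _levels(leaves):
    """Return every tree level, bottom first; an unpaired node is promoted unchanged."""
    if not leaves:
        raise ValueError("cannot build a Merkle tree over zero leaves")
    level = [_leaf(item) for item in leaves]
    levels = [level]
    while len(level) > 1:
        parent = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parent.append(_node(level[i], level[i + 1]))
            else:
                parent.append(level[i])
        level = parent
        levels.append(level)
    return levels


def merkle_root(leaves):
    return _levels(leaves)[-1][0]


def inclusion_proof(leaves, index):
    """Return [(sibling_hash, sibling_is_left), ...] from the leaf up to the root."""
    proof = []
    for level in _levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # a promoted node has no sibling at this level
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf_data, proof, root):
    current = _leaf(leaf_data)
    for sibling, sibling_is_left in proof:
        current = _node(sibling, current) if sibling_is_left else _node(current, sibling)
    return current == root


events = [f"audit-event-{i}".encode() for i in range(5)]
root = merkle_root(events)
proof = inclusion_proof(events, 4)
```

A customer-side verifier only needs `verify_inclusion`, the event bytes, the proof, and a root it trusts, which is why the verification half is the natural piece to publish.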

Honest pushback: there's a strong "build a competitor to Vanta" pull on this capability set. Resist. Vanta's moat is the GRC workflow; ARX's moat is the underlying evidence with source-line attribution. Be the better evidence layer beneath every GRC tool, not a competing GRC tool.

---

Capability 5 — Observability + cost

| Component | Classification | Why |
|---|---|---|
| C5.1 OTLP receiver + storage | B/oss | OpenTelemetry Collector (Apache 2.0, CNCF) for ingest. ClickHouse (Apache 2.0) for span storage. Both are mature and both are operationally expensive. Consider OEM/partner with Grafana Cloud or Honeycomb for the storage tier for early customers (lower ops burden; you self-host the receiver). |
| C5.2 Cost attribution | B/scratch | The model-pricing table maintenance + tool-pricing table + the join logic is differentiated. There's no off-the-shelf option for "cost per agent action across multi-vendor multi-tool." LangSmith / Langfuse / Helicone / Datadog LLM Observability all do per-LLM-call cost; the per-agent-action rollup with audit/enforcement correlation is the ARX angle. |
| C5.3 Trace ↔ audit ↔ enforcement correlation | B/scratch | Same reasoning as C4.1: the correlation is the differentiator. ClickHouse handles the joins. |
| C5.4 Behavioral telemetry / drift | B/scratch (already built) | The drift detector (app/core/drift_detector.py) is real and works; needs scaling rather than replacement. |
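
The per-agent-action rollup described for C5.2 is, at its core, a join of call records against pricing tables followed by a group-by. The sketch below assumes hypothetical record fields (`agent_id`, `action_id`, `kind`) and uses placeholder prices that do not correspond to any real vendor's pricing.

```python
from collections import defaultdict
from decimal import Decimal

# Placeholder prices in USD per 1,000 tokens; NOT real vendor pricing.
MODEL_PRICING = {
    "model-a": {"input": Decimal("0.0010"), "output": Decimal("0.0020")},
    "model-b": {"input": Decimal("0.0100"), "output": Decimal("0.0300")},
}
# Placeholder flat price per tool invocation.
TOOL_PRICING = {"web_search": Decimal("0.0050")}


def call_cost(call):
    """Price a single LLM or tool call from the pricing tables."""
    if call["kind"] == "llm":
        price = MODEL_PRICING[call["model"]]
        tokens_cost = (price["input"] * call["input_tokens"]
                       + price["output"] * call["output_tokens"])
        return tokens_cost / 1000
    return TOOL_PRICING[call["tool"]]


def rollup_by_action(calls):
    """Sum call costs per (agent_id, action_id)."""
    totals = defaultdict(Decimal)
    for call in calls:
        totals[(call["agent_id"], call["action_id"])] += call_cost(call)
    return dict(totals)


calls = [
    {"agent_id": "agent-1", "action_id": "act-1", "kind": "llm",
     "model": "model-a", "input_tokens": 1000, "output_tokens": 500},
    {"agent_id": "agent-1", "action_id": "act-1", "kind": "tool", "tool": "web_search"},
    {"agent_id": "agent-1", "action_id": "act-2", "kind": "llm",
     "model": "model-b", "input_tokens": 2000, "output_tokens": 1000},
]
totals = rollup_by_action(calls)
```

`Decimal` avoids floating-point drift when many small per-call amounts are summed, which matters once the totals feed invoices or audit evidence.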

Honest pushback: there's a real argument to OEM Honeycomb or Grafana Cloud as the trace tier and not own that storage — the difference between "best-in-class trace tooling" and "good-enough trace tooling" is a multi-year engineering investment. Customers already use one of those. Position ARX as the policy + identity + evidence layer that tags into the customer's existing OpenTelemetry pipeline rather than as another observability vendor. This is the most important build/buy decision in the entire roadmap.

Recommendation: do not build a query/visualization tier. Ship OpenTelemetry-native and let customers use whatever observability tool they already pay for. The ARX differentiated read is the join with audit/enforcement, exposed via a thin API.
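
The shape of that thin join API can be sketched in a few lines. The example assumes all three record streams carry a shared `trace_id` (plausible given W3C Trace Context propagation, but still an assumption), and the field names and `effect` values are hypothetical; in production the join would run in the storage tier rather than in application memory.

```python
from collections import defaultdict


def correlate(spans, audit_events, decisions):
    """Group trace spans, audit events, and enforcement decisions by trace_id."""
    joined = defaultdict(lambda: {"spans": [], "audit": [], "decisions": []})
    streams = (("spans", spans), ("audit", audit_events), ("decisions", decisions))
    for bucket, records in streams:
        for record in records:
            joined[record["trace_id"]][bucket].append(record)
    return dict(joined)


def denied_traces(joined):
    """Return trace IDs where at least one enforcement decision was a deny."""
    return sorted(
        trace_id
        for trace_id, records in joined.items()
        if any(d["effect"] == "deny" for d in records["decisions"])
    )


spans = [
    {"trace_id": "t1", "name": "tools/call read_file"},
    {"trace_id": "t2", "name": "tools/call delete_file"},
]
audit_events = [{"trace_id": "t2", "event_type": "policy_denial"}]
decisions = [
    {"trace_id": "t1", "effect": "allow"},
    {"trace_id": "t2", "effect": "deny"},
]
joined = correlate(spans, audit_events, decisions)
```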

---

Capability 6 — Kill switch

| Component | Classification | Why |
|---|---|---|
| C6.1 Kill-switch orchestrator | B/scratch | The saga pattern across 11+ platforms with partial-failure semantics is bespoke. Use Temporal (MIT, maintainer: Temporal Technologies) for the saga durability: orchestrating multi-step termination with retries, timeouts, and durable state is exactly Temporal's wheelhouse. Building this on naked async/await + retry decorators is a known anti-pattern. |
| C6.2 Distributed revocation list | B/oss | Redis Cluster + Redis Streams. Bloom filter implementation: an off-the-shelf library such as pybloom-live. Nothing custom. |
| C6.3 Exit attestation generator | B/scratch | Tied to evidence emitter (C4.1): same machinery, different report shape. |
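
For readers unfamiliar with why a Bloom filter sits in front of the C6.2 revocation list, the sketch below shows the check order: a filter miss is a definite "not revoked", and a filter hit is confirmed against the authoritative store. The `BloomFilter` class and `is_revoked` helper are illustrative only; in practice an existing library would replace the hand-written filter, and a Python set stands in for the Redis-backed list.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_revoked(credential_id, bloom, authoritative_lookup):
    """Cheap local check first; only consult the authoritative store on a filter hit."""
    if not bloom.might_contain(credential_id):
        return False  # definite miss
    return authoritative_lookup(credential_id)  # rules out a false positive


revoked_store = {"cred-1", "cred-2"}  # stand-in for the distributed revocation list
bloom = BloomFilter()
for cred in revoked_store:
    bloom.add(cred)
```

Most credentials are not revoked, so most checks end at the local filter and never touch the network.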

Honest pushback: the temptation here is to write a custom workflow engine for the saga ("because Temporal is heavyweight"). Resist. Temporal's operational footprint is well-known; the cost of NOT using it is lost time on workflow correctness bugs that surface only in partial-failure scenarios. Pre-build the saga in Temporal in Q1.
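
The partial-failure semantics are the part that stays ARX-specific even with Temporal underneath. As a sketch of the decision logic that could live inside the workflow code, the function below classifies per-platform termination results into an overall state. All names (`StepStatus`, `StepResult`, `summarize_kill`, the state labels) are hypothetical, and retries, timeouts, and durable state are deliberately left to the workflow engine.

```python
from dataclasses import dataclass
from enum import Enum


class StepStatus(Enum):
    DONE = "done"            # agent confirmed terminated on this platform
    RETRYABLE = "retryable"  # transient failure, worth another attempt
    FAILED = "failed"        # permanent failure, needs a human


@dataclass(frozen=True)
class StepResult:
    platform: str
    status: StepStatus


def summarize_kill(results):
    """Reduce per-platform results to one overall kill-switch state."""
    if not results:
        raise ValueError("a kill request must target at least one platform")
    done = [r.platform for r in results if r.status is StepStatus.DONE]
    retry = [r.platform for r in results if r.status is StepStatus.RETRYABLE]
    failed = [r.platform for r in results if r.status is StepStatus.FAILED]
    if failed:
        overall = "escalate"     # at least one platform cannot be terminated automatically
    elif retry:
        overall = "in_progress"  # nothing permanent, but not finished either
    else:
        overall = "terminated"
    return {"overall": overall, "terminated": done, "retry": retry, "escalate": failed}


summary = summarize_kill([
    StepResult("salesforce", StepStatus.DONE),
    StepResult("servicenow", StepStatus.RETRYABLE),
    StepResult("uipath", StepStatus.DONE),
])
```

Keeping this reduction as a pure function makes the partial-failure cases unit-testable without standing up the workflow engine.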

---

Cross-cutting infrastructure

These don't belong to one capability but show up everywhere:

| Component | Classification | Why |
|---|---|---|
| Multi-tenant data plane | B/scratch on top of Postgres RLS + Kafka topic naming convention + per-tenant Vault Transit keys | Already partially in place (arxsec-api/ uses Supabase RLS). Extend. Avoid building a custom isolation primitive. |
| Customer-data residency (US/EU/APAC) | OEM/partner: use Cloudflare Workers + R2, AWS Outposts, or per-region cell deployments on AWS | Don't build region-routing primitives. The standard "deploy a cell per region with strict no-cross-region writes" pattern is well-known. |
| CMEK / BYOK for tenant data at rest | B/oss: AWS KMS, Azure Key Vault, GCP Cloud KMS as backing | Standard envelope encryption. Already partially in place per docs/admin/cmek.md. |
| SAML/OIDC/SCIM for human admins | B/oss: Authentik, Keycloak, or Auth0 (OEM) | Already partially built. Don't write any more custom identity code than absolutely necessary. The temptation to build "because we know identity" is high; the right answer is to outsource and concentrate on agent identity (C2.1), not human identity. |
| GRC integration (Vanta, Drata, ServiceNow GRC, Workiva, AuditBoard) | OEM/partner | Native integrations, not in-house GRC features. |
| Foundation-model abstraction (LLM router) | Already built + B/oss extend: OpenRouter or LiteLLM | The current app/llm/router.py is solid for two providers. To add Bedrock, Vertex, Foundry models without rewriting the abstraction, fork or layer on LiteLLM (MIT, maintainer: BerriAI; startup risk, but the project is widely used). Or embed OpenRouter as a meta-provider (commercial). |
| MCP server registry / catalog | B/oss: extend Anthropic's reference MCP registry | Once MCP server registries become standardized (likely Q3 2026 per current SIG cadence), participate in the standard. Don't build a parallel registry. |

---

Summary — what to build vs. what to reuse

Build these from scratch (the IP) — ~50% of total engineering effort:

The discovery framework that normalizes 11+ platforms into a single inventory (C1.1)
The credential broker that mints scoped tokens per-(agent, target) on the hot path (C2.2)
The owner-binding registry tied to HR signals (C2.3)
Per-platform native action-hooks (C3.2 platform-native flavor — five different bespoke engineering exercises)
Per-framework guardrail packages for the four open frameworks (C3.2 in-process flavor)
The Cedar policy schema + policy library (C3.1) + the authoring UX (C3.3)
The evidence emitter with source-line attribution + framework mapping (C4.1)
The cost attribution engine with per-action rollup (C5.2)
The trace ↔ audit ↔ enforcement correlation API (C5.3)
The kill-switch orchestrator's saga (C6.1) — but written on Temporal, not custom

Build on open source (the substrate) — ~30%:

SPIRE for identity issuance (C2.1)
Cedar SDK for the PDP (C3.1)
Tetragon (or Falco) for shadow-agent eBPF sensing (C1.2)
Envoy + ext_authz for sidecar PEP (C3.2)
mitmproxy for egress-proxy PEP (C3.2)
Temporal for saga orchestration (C6.1)
OpenTelemetry Collector + ClickHouse for trace ingest/storage (C5.1)
LiteLLM for multi-provider LLM routing (cross-cutting)
Sigstore Cosign for bundle signing (C3.3)
RFC 3161 client lib for audit-chain timestamps (C4.2)

Partner / OEM — ~10%:

Vanta or Drata for vendor-questionnaire workflow integration (C4.4)
(Optional) Honeycomb or Grafana Cloud for trace storage/UI tier (C5.1) — strongly recommended
Auth0 or Authentik for human-admin SSO (cross-cutting)
AWS KMS / Azure Key Vault / GCP Cloud KMS for CMEK backing (cross-cutting)

Punt to v2 — ~10%:

A2A protocol support beyond W3C Trace Context propagation
eBPF kernel-level enforcement (the "Tier 2 in intune_ebpf.py" claim). Discovery via eBPF is realistic; enforcement via eBPF on production endpoints is a 2-year solo project and not worth it for v1
First-party model evaluation harness beyond what app/llm/eval/ already sketches
Custom GRC workflow (delegate to Vanta/Drata)
Workforce planning / consultant analytics tier (ride on Snowflake or Looker + customer's own BI tool)

---

The two largest "build because we want to, not because the market requires" risks

Custom policy DSL instead of Cedar. Symptom: "Cedar doesn't quite let us express X." Mitigation: hard organizational rule that any "extend Cedar" proposal goes to an architecture-review board with the burden of proof on the proposer.

First-party observability tier instead of OpenTelemetry-native + partner. Symptom: "we need our own visualization for the demo." Mitigation: run the demo on Honeycomb or Grafana Cloud and show that partner tooling is a reasonable trade for the engineering time saved. If the demo is good enough on partner tooling, the customer will not notice or care.

A third less obvious one: building a Vanta-competitor. Symptom: "while we're emitting evidence, we should manage the GRC workflow too." This is a 5-year, $50M product line. Don't.