Public documentation for governed AI labor
SDKs/Governance/Connectors
Arx / Docs / Runbook: Webhook Retry Backlog

Documentation

Runbook: Webhook Retry Backlog

Project-Agent / docs/ops/runbooks/webhook-retry-backlog.md

Project-Agent operations docs/ops/runbooks/webhook-retry-backlog.md
  • Severity: SEV-2; escalate when dead letters accumulate for customer-critical integrations.
  • Page: support on-call.
  • Triage:
  • Inspect /v1/ops/webhook-deliveries and /v1/ops/queue-health.
  • Confirm whether failures are isolated to one endpoint or global.
  • Validate signature expectations and downstream status codes.
  • Mitigation:
  • Disable webhook fanout if a global issue is amplifying retries.
  • Replay dead letters only after the destination is verified healthy.
  • Notify affected customers if approvals or governance exports are delayed.