Documentation
Runbook: Webhook Retry Backlog
arxsec-site / docs/ops/runbooks/webhook-retry-backlog.md
- Severity:
SEV-2; escalate when dead letters accumulate for customer-critical integrations. - Page: support on-call.
- Triage:
- Inspect
/v1/ops/webhook-deliveriesand/v1/ops/queue-health. - Confirm whether failures are isolated to one endpoint or global.
- Validate signature expectations and downstream status codes.
- Mitigation:
- Disable webhook fanout if a global issue is amplifying retries.
- Replay dead letters only after the destination is verified healthy.
- Notify affected customers if approvals or governance exports are delayed.