Runbook: Redis Unavailable

Project-Agent / docs/ops/runbooks/redis-unavailable.md

  • Severity: SEV-2 by default; escalate to SEV-1 if queueing, rate limiting, or LLM failover becomes unsafe.
  • Page: the on-call engineer when /health/components reports Redis as unhealthy for more than 2 minutes.
  • Triage:
      • Confirm impact on rate limiting, circuit breaker state, and retry queues.
      • Check whether webhooks and approval escalations are backing up.
  • Mitigation:
      • Disable webhook fanout if retries are compounding load.
      • Pause approval auto-escalation if state churn is unsafe.
      • Prefer restoring Redis before rolling out any new deployment wave.
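The paging rule ("Redis unhealthy for more than 2 minutes") amounts to a debounce on consecutive health observations. The sketch below is hypothetical and not part of Project-Agent: the class name `RedisPageDebouncer` and its `observe` method are illustrative, and it assumes some poller has already turned each /health/components response into a single "is Redis healthy" boolean (the response schema is not described in this runbook). It pages once per unhealthy streak, and any healthy observation resets the clock.

```python
import time
from typing import Optional

# "More than 2 minutes" from the paging rule above.
PAGE_AFTER_SECONDS = 120.0


class RedisPageDebouncer:
    """Hypothetical helper: decides when Redis has been unhealthy long enough to page.

    Feed it one observation per poll of /health/components.
    """

    def __init__(self, threshold_s: float = PAGE_AFTER_SECONDS) -> None:
        self.threshold_s = threshold_s
        self._unhealthy_since: Optional[float] = None  # start of current unhealthy streak
        self._paged = False  # whether this streak has already triggered a page

    def observe(self, redis_healthy: bool, now: Optional[float] = None) -> bool:
        """Record one observation; return True only when a page should be sent."""
        now = time.monotonic() if now is None else now
        if redis_healthy:
            # Any healthy observation ends the streak and re-arms paging.
            self._unhealthy_since = None
            self._paged = False
            return False
        if self._unhealthy_since is None:
            self._unhealthy_since = now
        # Strictly greater than the threshold, matching "more than 2 minutes".
        if not self._paged and now - self._unhealthy_since > self.threshold_s:
            self._paged = True
            return True
        return False
```

Paging once per streak keeps a flapping-free outage from re-paging every poll; a real alerting system would usually handle that deduplication itself, in which case only the duration check is needed.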
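For the triage check on whether webhooks and approval escalations are backing up, one option is to compare queue depth across a sliding time window. This is a minimal sketch under stated assumptions: `BacklogWatch` is a hypothetical name, the depth samples would come from whatever metrics source exposes those queues (not specified here), and the default window (300 s) and growth threshold (0.5 items/s) are placeholder values, not figures from this runbook.

```python
from collections import deque
from typing import Deque, Tuple


class BacklogWatch:
    """Hypothetical helper: reports a queue as backing up when its depth grows
    faster than `min_growth_per_s` across a sliding window of samples."""

    def __init__(self, window_s: float = 300.0, min_growth_per_s: float = 0.5) -> None:
        self.window_s = window_s  # placeholder window length, in seconds
        self.min_growth_per_s = min_growth_per_s  # placeholder threshold, items per second
        self._samples: Deque[Tuple[float, int]] = deque()

    def add(self, ts: float, depth: int) -> None:
        """Record a (timestamp in seconds, queue depth) sample."""
        self._samples.append((ts, depth))
        # Evict samples older than the window, measured from the newest sample.
        while self._samples and ts - self._samples[0][0] > self.window_s:
            self._samples.popleft()

    def backing_up(self) -> bool:
        """True if depth grew faster than the threshold between oldest and newest sample."""
        if len(self._samples) < 2:
            return False
        (t0, d0), (t1, d1) = self._samples[0], self._samples[-1]
        if t1 <= t0:
            return False
        return (d1 - d0) / (t1 - t0) > self.min_growth_per_s
```

One watcher per queue (for example, webhook deliveries and approval escalations) keeps the signals separate, which matters because the mitigations treat webhook fanout and approval auto-escalation independently.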
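The severity rule and the mitigation list above can be read together as a small decision table. The sketch below encodes only what the runbook states; the `RedisOutageState` fields are hypothetical names for judgments the on-call engineer makes during triage, and the returned strings simply restate the runbook's actions rather than calling any real tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RedisOutageState:
    # Hypothetical fields: each is an operator judgment made during triage.
    queueing_unsafe: bool = False
    rate_limiting_unsafe: bool = False
    llm_failover_unsafe: bool = False
    retries_compounding_load: bool = False
    approval_state_churn_unsafe: bool = False


def severity(state: RedisOutageState) -> str:
    """SEV-2 by default; SEV-1 if queueing, rate limiting, or LLM failover is unsafe."""
    if state.queueing_unsafe or state.rate_limiting_unsafe or state.llm_failover_unsafe:
        return "SEV-1"
    return "SEV-2"


def mitigations(state: RedisOutageState) -> List[str]:
    """Mitigation steps that apply to the current state, in runbook order."""
    steps: List[str] = []
    if state.retries_compounding_load:
        steps.append("disable webhook fanout")
    if state.approval_state_churn_unsafe:
        steps.append("pause approval auto-escalation")
    # Applies in every case: hold new deployment waves until Redis is restored.
    steps.append("restore Redis before rolling out any new deployment wave")
    return steps
```

For example, `severity(RedisOutageState(llm_failover_unsafe=True))` returns `"SEV-1"`, while an outage with no unsafe subsystem stays at `"SEV-2"`.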