Runbook: Redis Unavailable
Project-Agent-trust-merge / docs/ops/runbooks/redis-unavailable.md
- Severity: `SEV-2` by default; escalate to `SEV-1` if queueing, rate limiting, or LLM failover becomes unsafe.
- Page: on-call engineer when `/health/components` shows Redis unhealthy for more than 2 minutes.
- Triage:
  - Confirm impact on rate limiting, circuit breaker state, and retry queues.
  - Check whether webhooks and approval escalations are backing up.
- Mitigation:
  - Disable webhook fanout if retries are compounding load.
  - Pause approval auto-escalation if state churn is unsafe.
  - Prefer restoring Redis before rolling out any new deployment wave.
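
The paging and severity rules above can be sketched as a small decision helper. This is a minimal illustration, not the project's alerting code: `RedisHealthTracker`, `severity`, and `PAGE_AFTER_SECONDS` are hypothetical names, and the sketch assumes the caller has already reduced the `/health/components` response to a single healthy/unhealthy boolean, since the response format is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "more than 2 minutes" is measured as continuous unhealthy time.
PAGE_AFTER_SECONDS = 120


@dataclass
class RedisHealthTracker:
    """Tracks how long Redis has been continuously reported unhealthy."""

    unhealthy_since: Optional[float] = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one health observation; return True if on-call should be paged."""
        if healthy:
            # Any healthy observation resets the unhealthy window.
            self.unhealthy_since = None
            return False
        if self.unhealthy_since is None:
            self.unhealthy_since = now
        return (now - self.unhealthy_since) > PAGE_AFTER_SECONDS


def severity(queueing_unsafe: bool, rate_limiting_unsafe: bool,
             llm_failover_unsafe: bool) -> str:
    """SEV-2 by default; SEV-1 if any of the three listed functions is unsafe."""
    if queueing_unsafe or rate_limiting_unsafe or llm_failover_unsafe:
        return "SEV-1"
    return "SEV-2"
```

In use, a poller would call `observe` with the current Redis status and a monotonic timestamp on each check, then page once it returns `True`. Whether a single healthy reading should reset the window is a policy choice; this sketch assumes it does.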