Off-Hours Alert Routing
Routes PagerDuty alerts that arrive outside business hours. The workflow checks Splunk for severity context, then either routes each alert to the on-call team immediately or defers low-priority alerts to the morning queue.
Maturity: L3+ (Enforced and up). See the 5-level maturity model for where this workflow fits in your program.
Time Saved
Before: ~10 minutes per overnight alert, spent manually triaging it and deciding whether to wake the on-call responder or defer it to the morning.
After: Automated severity-based routing. On-call is only paged for confirmed high-severity events; low-priority alerts queue for morning review.
Connectors
| Connector | Operations | Risk |
|-----------|-----------|------|
| PagerDuty | incidents:read, incidents:update | HIGH |
| Splunk | search:execute | LOW |
| Slack | chat:write | LOW |
Overall Risk: HIGH -- PagerDuty incidents:update modifies incident state (snooze, re-route). Requires HITL approval.
How It Works
- During off-hours, fetch newly triggered PagerDuty incidents.
- Run a Splunk severity check for each alert to determine its actual urgency.
- High-severity alerts are routed to the on-call responder as soon as the re-route is approved through the HITL gate.
- Low-severity alerts are deferred by snoozing them until the morning window, also subject to HITL approval.
- Post a Slack summary of routing decisions.
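The route-vs-defer step can be pictured as a threshold on the Splunk severity result. This is a minimal sketch, not the workflow's actual code: the `Incident` class, the `decide` and `slack_summary` helpers, the 0-10 severity scale, and the cutoff of 7 are all assumptions, since the README does not specify the Splunk score format or the default threshold. The output is only a proposed action; under the HITL gate, nothing is snoozed or re-routed until a human approves it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed severity scale (0-10) and cutoff; the README does not specify either.
ROUTE_THRESHOLD = 7


@dataclass
class Incident:
    id: str              # PagerDuty incident ID
    title: str
    severity_score: int  # assumed: numeric result of the Splunk severity check


def decide(incident: Incident, threshold: int = ROUTE_THRESHOLD) -> str:
    """Propose 'route' (page on-call) or 'defer' (snooze until morning).

    This is only a proposal: the PagerDuty update still goes through the HITL gate.
    """
    return "route" if incident.severity_score >= threshold else "defer"


def slack_summary(incidents: list[Incident], threshold: int = ROUTE_THRESHOLD) -> str:
    """Build a plain-text routing summary suitable for a Slack message."""
    lines = [f"Off-hours routing: {len(incidents)} incident(s) evaluated"]
    for inc in incidents:
        action = decide(inc, threshold)
        lines.append(f"- {inc.id} ({inc.title}): severity {inc.severity_score} -> {action}")
    return "\n".join(lines)


if __name__ == "__main__":
    batch = [
        Incident("P1ABC", "Disk full on db-01", 9),
        Incident("P2DEF", "TLS cert expires in 20 days", 3),
    ]
    print(slack_summary(batch))
```

Keeping the threshold as a parameter mirrors the customization option for route-vs-defer thresholds. A score exactly at the cutoff is routed, which errs toward paging rather than deferring.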
ARX Governance
- Risk Classification:
  - `PagerDuty:incidents:read` -- LOW -- read-only incident retrieval
  - `PagerDuty:incidents:update` -- HIGH -- modifies incident state (snooze, re-route, escalation)
  - `Splunk:search:execute` -- LOW -- read-only severity context lookup
  - `Slack:chat:write` -- LOW -- informational routing summaries
- HITL Gate: Enabled -- all PagerDuty `incidents:update` operations require human approval. Snoozing and re-routing decisions are presented to the on-call lead for confirmation before execution.
- Approval Channel: `#ops-approvals`
- Policy Rules:
  - PERMITTED: Reading PagerDuty incidents and running Splunk severity checks
  - PERMITTED: Posting Slack routing summaries
  - ESCALATED (HITL required): Snoozing low-severity PagerDuty incidents until morning
  - ESCALATED (HITL required): Re-routing high-severity PagerDuty incidents to on-call
  - DENIED: Resolving or acknowledging PagerDuty incidents without human review
- Audit Trail: Every incident evaluated, Splunk severity score, routing decision (route vs. defer), HITL approval status, and final PagerDuty action are logged with timestamps.
- Config: See `arx.yaml` for connector permissions, HITL gate configuration, off-hours window, and approval channel.
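The policy rules above amount to a lookup from operation to PERMITTED, ESCALATED, or DENIED, plus one timestamped audit record per decision. The sketch below is hypothetical: the action names (`snooze`, `reroute`, and so on), the `evaluate` and `audit_record` helpers, and the record field names are illustrative, not ARX's policy engine or log schema. Operations that are not listed fail closed to DENIED.

```python
from __future__ import annotations

from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    PERMITTED = "permitted"
    ESCALATED = "escalated"  # held for HITL approval in the approval channel
    DENIED = "denied"


# Hypothetical encoding of the Policy Rules; action names are illustrative.
POLICY: dict[tuple[str, str], Decision] = {
    ("pagerduty", "read_incidents"): Decision.PERMITTED,
    ("splunk", "severity_search"): Decision.PERMITTED,
    ("slack", "post_summary"): Decision.PERMITTED,
    ("pagerduty", "snooze"): Decision.ESCALATED,
    ("pagerduty", "reroute"): Decision.ESCALATED,
    ("pagerduty", "resolve"): Decision.DENIED,      # no automated resolution
    ("pagerduty", "acknowledge"): Decision.DENIED,  # no automated acknowledgement
}


def evaluate(connector: str, action: str) -> Decision:
    """Look up the policy decision; anything not listed fails closed to DENIED."""
    return POLICY.get((connector, action), Decision.DENIED)


def audit_record(
    incident_id: str,
    severity_score: int,
    routing: str,
    decision: Decision,
    hitl_approved: bool | None,
    final_action: str,
) -> dict:
    """Build one timestamped audit entry covering the fields listed under Audit Trail."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "incident_id": incident_id,
        "severity_score": severity_score,
        "routing_decision": routing,     # "route" or "defer"
        "policy_decision": decision.value,
        "hitl_approved": hitl_approved,  # None when no approval was needed
        "final_action": final_action,
    }


if __name__ == "__main__":
    decision = evaluate("pagerduty", "snooze")
    print(audit_record("P2DEF", 3, "defer", decision, True, "snoozed until 08:00 UTC"))
```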
Setup
Prerequisites
```bash
pip install arx
```
Environment Variables
```bash
export PAGERDUTY_API_KEY="your-pagerduty-api-key"
export SPLUNK_URL="https://splunk.your-org.com:8089"
export SPLUNK_TOKEN="your-splunk-bearer-token"
export SLACK_BOT_TOKEN="xoxb-your-slack-token"
export SLACK_OPS_CHANNEL="#ops-overnight"
```
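Before the first run, it can help to confirm that all five variables are set and that the Splunk settings produce a sensible request. The sketch below is a pre-flight check built on assumptions, not part of the `arx` package: `load_settings` and `build_splunk_search` are hypothetical helpers, and the SPL string in the example comment is a placeholder because the README does not say which index or query the severity check uses. It targets Splunk's `/services/search/jobs/export` REST endpoint with bearer-token authentication, and it only builds the request without sending it.

```python
from __future__ import annotations

import os
from urllib.parse import urlencode
from urllib.request import Request

# Variable names as listed in the Environment Variables section.
REQUIRED_VARS = (
    "PAGERDUTY_API_KEY",
    "SPLUNK_URL",
    "SPLUNK_TOKEN",
    "SLACK_BOT_TOKEN",
    "SLACK_OPS_CHANNEL",
)


def load_settings(env: dict[str, str] | None = None) -> dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}


def build_splunk_search(settings: dict[str, str], spl: str) -> Request:
    """Prepare, without sending, a POST to Splunk's search export endpoint."""
    url = settings["SPLUNK_URL"].rstrip("/") + "/services/search/jobs/export"
    body = urlencode({"search": spl, "output_mode": "json"}).encode()
    return Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {settings['SPLUNK_TOKEN']}"},
        method="POST",
    )


# Example, with the variables exported (the SPL query is a placeholder):
#   req = build_splunk_search(load_settings(), 'search index=alerts incident_id="P1ABC"')
#   urllib.request.urlopen(req) would then send it to Splunk.
```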
Run
```bash
# One-time execution
arx run workflow.py

# Register on schedule (every 10 minutes during off-hours, 20:00-08:00 UTC)
arx register --config arx.yaml
```
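The 20:00-08:00 UTC window crosses midnight, so a plain `start <= t < end` comparison would never match. A minimal sketch of the window test follows, assuming the boundaries from the schedule comment above; `is_off_hours` and the constants are hypothetical names, and whether ARX evaluates the window this way is not documented here.

```python
from datetime import datetime, time, timezone

# Window taken from the schedule comment above (20:00-08:00 UTC).
OFF_HOURS_START = time(20, 0)
OFF_HOURS_END = time(8, 0)


def is_off_hours(now: datetime, start: time = OFF_HOURS_START, end: time = OFF_HOURS_END) -> bool:
    """Return True if the timezone-aware `now` falls inside the window (end exclusive)."""
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        # Window contained within one day, e.g. 01:00-05:00.
        return start <= t < end
    # Window wraps past midnight, e.g. 20:00-08:00.
    return t >= start or t < end


if __name__ == "__main__":
    for hour in (7, 8, 12, 20, 23):
        stamp = datetime(2024, 1, 1, hour, 0, tzinfo=timezone.utc)
        print(f"{hour:02d}:00 UTC off-hours? {is_off_hours(stamp)}")
```

Runs at 07:00, 20:00, and 23:00 UTC count as off-hours; 08:00 and 12:00 do not, because the end of the window is exclusive.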
Customization
- Adjust the `off_hours_start`, `off_hours_end`, and `morning_snooze_until` times.
- Configure severity thresholds for route-vs-defer decisions.
- Change the HITL approval channel in `arx.yaml`.
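Changing `morning_snooze_until` means turning a wall-clock time into a duration, because PagerDuty's REST API snooze operation takes a `duration` in seconds. A minimal sketch follows, assuming the snooze target is 08:00 UTC (the end of the default off-hours window). `snooze_seconds` is a hypothetical helper, and the real workflow may compute this differently.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed default: defer until the end of the off-hours window (08:00 UTC).
MORNING_SNOOZE_UNTIL = time(8, 0)


def snooze_seconds(now: datetime, until: time = MORNING_SNOOZE_UNTIL) -> int:
    """Seconds from the timezone-aware `now` until the next occurrence of `until` in UTC."""
    now_utc = now.astimezone(timezone.utc)
    target = datetime.combine(now_utc.date(), until, tzinfo=timezone.utc)
    if target <= now_utc:
        # Today's morning time has already passed, so aim for tomorrow's.
        target += timedelta(days=1)
    return int((target - now_utc).total_seconds())


if __name__ == "__main__":
    at = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
    print(snooze_seconds(at))  # 32400 (nine hours)
```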