PagerDuty Auto-Triage
Automatically triages PagerDuty alerts by correlating them with Splunk log data. Known false positives are auto-resolved; genuine alerts are escalated with enriched context.
Maturity: L3-4 (Enforced to Governed) · See the 5-level maturity model for where this workflow fits in your program.
Time Saved
Before: ~15 minutes per incident manually checking logs, identifying false positives, and resolving or escalating.
After: Automated triage with Splunk correlation. False positives are resolved instantly; genuine alerts reach responders with full log context.
Connectors
| Connector | Operations | Risk |
|-----------|-----------|------|
| PagerDuty | incidents:read, incidents:update | HIGH |
| Splunk | search:execute | LOW |
| Slack | chat:write | LOW |
Overall Risk: HIGH -- PagerDuty incidents:update auto-resolves incidents and modifies escalation state. Requires HITL approval for resolve actions.
How It Works
- Fetch new triggered PagerDuty incidents.
- Extract key indicators (host, service, error) from each incident.
- Run a Splunk lookup to check for known false positive patterns.
- Auto-resolve incidents matching false positive signatures (with HITL approval).
- Escalate remaining incidents with Splunk context added.
- Post a Slack summary of triage actions.
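The steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual code: the `Incident` class, the `triage` function, and the three callbacks (`lookup_logs`, `is_false_positive`, `approve_resolve`) are hypothetical stand-ins for the PagerDuty, Splunk, and HITL integrations. It also assumes that an incident whose resolution is declined by the reviewer falls through to escalation, which the steps above do not state.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Incident:
    # Hypothetical in-memory shape; real PagerDuty incidents carry many more fields.
    id: str
    host: str
    service: str
    error: str
    status: str = "triggered"
    escalated: bool = False
    notes: list[str] = field(default_factory=list)


def triage(
    incidents: list[Incident],
    lookup_logs: Callable[[Incident], list[str]],
    is_false_positive: Callable[[Incident, list[str]], bool],
    approve_resolve: Callable[[Incident], bool],
) -> str:
    """Triage triggered incidents and return a one-line summary for Slack."""
    resolved, escalated = [], []
    for inc in incidents:
        if inc.status != "triggered":
            continue  # only new, triggered incidents are triaged
        logs = lookup_logs(inc)  # stand-in for the Splunk correlation search
        # The approval callback is only consulted for false positive matches.
        if is_false_positive(inc, logs) and approve_resolve(inc):
            inc.status = "resolved"
            resolved.append(inc.id)
        else:
            # Assumption: genuine alerts and declined resolutions are both
            # escalated with the log context attached.
            inc.notes.extend(logs)
            inc.escalated = True
            escalated.append(inc.id)
    return f"Triage: {len(resolved)} resolved, {len(escalated)} escalated"


incidents = [
    Incident("P1", "web-01", "checkout", "health check timeout"),
    Incident("P2", "db-02", "orders", "disk full"),
]
summary = triage(
    incidents,
    lookup_logs=lambda inc: [f"{inc.host}: {inc.error}"],
    is_false_positive=lambda inc, logs: "health check" in inc.error,
    approve_resolve=lambda inc: True,
)
print(summary)  # Triage: 1 resolved, 1 escalated
```

Passing the integrations in as callbacks keeps the triage decision testable without network access; in a real run they would wrap the PagerDuty, Splunk, and approval-channel calls.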
ARX Governance
- Risk Classification:
  - `PagerDuty:incidents:read` -- LOW -- read-only incident retrieval
  - `PagerDuty:incidents:update` (resolve) -- HIGH -- auto-resolves incidents, suppressing alerts
  - `PagerDuty:incidents:update` (escalate) -- MEDIUM -- adds context and escalates to responders
  - `Splunk:search:execute` -- LOW -- read-only log correlation
  - `Slack:chat:write` -- LOW -- informational triage summaries
- HITL Gate: Enabled -- PagerDuty incident resolution requires human approval. Escalation with context enrichment is auto-approved.
- Approval Channel: `#soc-approvals`
- Policy Rules:
  - PERMITTED: Reading PagerDuty incidents and running Splunk correlation searches
  - PERMITTED: Posting Slack triage summaries
  - PERMITTED (auto-approved): Escalating incidents with enriched Splunk context
  - ESCALATED (HITL required): Resolving PagerDuty incidents identified as false positives
  - DENIED: Suppressing or silencing PagerDuty services or escalation policies
- Audit Trail: Every incident triaged, Splunk correlation results, false positive match details, HITL approval status for resolutions, and escalation actions are logged with incident IDs and timestamps.
- Config: See `arx.yaml` for connector permissions, HITL gate configuration, false positive patterns, and approval channel.
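To make the policy rules concrete, here is a minimal sketch of how they could be evaluated as a lookup table. It does not show how ARX itself enforces policy: `Decision`, `POLICY`, `evaluate`, and `audit_entry` are hypothetical names, and treating every unlisted operation as DENIED is an assumption that goes beyond the single DENIED rule listed above.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Decision(Enum):
    PERMITTED = "permitted"
    ESCALATED = "escalated"  # requires HITL approval before the action runs
    DENIED = "denied"


# Hypothetical (connector, operation, action) table mirroring the policy rules.
POLICY = {
    ("PagerDuty", "incidents:read", None): Decision.PERMITTED,
    ("Splunk", "search:execute", None): Decision.PERMITTED,
    ("Slack", "chat:write", None): Decision.PERMITTED,
    ("PagerDuty", "incidents:update", "escalate"): Decision.PERMITTED,
    ("PagerDuty", "incidents:update", "resolve"): Decision.ESCALATED,
}


def evaluate(connector: str, operation: str, action: Optional[str] = None) -> Decision:
    # Assumption: default-deny, so silencing services or editing escalation
    # policies is refused because it is never listed as permitted.
    return POLICY.get((connector, operation, action), Decision.DENIED)


def audit_entry(incident_id: str, connector: str, operation: str,
                action: Optional[str] = None) -> dict:
    """Build one audit record with the incident ID, decision, and a UTC timestamp."""
    return {
        "incident_id": incident_id,
        "operation": f"{connector}:{operation}",
        "action": action,
        "decision": evaluate(connector, operation, action).name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


print(evaluate("PagerDuty", "incidents:update", "resolve").name)  # ESCALATED
print(evaluate("PagerDuty", "services:update", "disable").name)   # DENIED
```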
Setup
Prerequisites
```bash
pip install arx
```
Environment Variables
```bash
export PAGERDUTY_API_KEY="your-pagerduty-api-key"
export SPLUNK_URL="https://splunk.your-org.com:8089"
export SPLUNK_TOKEN="your-splunk-bearer-token"
export SLACK_BOT_TOKEN="xoxb-your-slack-token"
export SLACK_TRIAGE_CHANNEL="#soc-triage"
```
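A small pre-flight check can catch a missing variable before the workflow runs. The sketch below assumes all five variables are required; `REQUIRED_VARS` and `missing_env` are hypothetical helpers, not part of `arx`.

```python
import os
from typing import Mapping, Optional

# Assumption: every variable listed in this section is mandatory.
REQUIRED_VARS = (
    "PAGERDUTY_API_KEY",
    "SPLUNK_URL",
    "SPLUNK_TOKEN",
    "SLACK_BOT_TOKEN",
    "SLACK_TRIAGE_CHANNEL",
)


def missing_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env({"SPLUNK_URL": "https://splunk.your-org.com:8089"}))
# ['PAGERDUTY_API_KEY', 'SPLUNK_TOKEN', 'SLACK_BOT_TOKEN', 'SLACK_TRIAGE_CHANNEL']
```

Calling `missing_env()` with no argument checks the current process environment.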
Run
```bash
# One-time execution
arx run workflow.py

# Register on schedule (every 5 minutes)
arx register --config arx.yaml
```
Customization
- Define false positive patterns in the `false_positive_patterns` config
- Adjust the Splunk correlation search for your log schema
- Change the HITL approval channel in `arx.yaml`
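As an illustration of the first customization point, the sketch below shows one possible shape for false positive patterns and how they might be matched. The actual `false_positive_patterns` schema is not documented here, so the `service` and `error_regex` keys, the `*` wildcard, and the `matches_false_positive` function are all assumptions.

```python
import re

# Hypothetical pattern shape; the real false_positive_patterns schema may differ.
FALSE_POSITIVE_PATTERNS = [
    {"service": "checkout", "error_regex": r"health check timeout"},
    {"service": "*", "error_regex": r"synthetic[- ]monitor"},
]


def matches_false_positive(service: str, error: str,
                           patterns=FALSE_POSITIVE_PATTERNS) -> bool:
    """Return True if any pattern matches both the service and the error text."""
    for pattern in patterns:
        # "*" is treated as a wildcard that matches every service.
        service_ok = pattern["service"] in ("*", service)
        if service_ok and re.search(pattern["error_regex"], error, re.IGNORECASE):
            return True
    return False


print(matches_false_positive("checkout", "Health check timeout on web-01"))  # True
print(matches_false_positive("orders", "disk full"))                         # False
```

Scoping a pattern to a service keeps a broad regex from resolving genuine alerts on unrelated services, which matters because a match leads to a resolve request.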