Automatically triages PagerDuty alerts by correlating them with Splunk log data. Known false positives are auto-resolved; genuine alerts are escalated with enriched context.

Maturity: L3-4 (Enforced to Governed)  ·  See the 5-level maturity model for where this workflow fits in your program.

Time Saved

Before: ~15 minutes per incident spent manually checking logs, identifying false positives, and resolving or escalating.

After: Automated triage with Splunk correlation. Known false positives are resolved as soon as the HITL approval is granted; genuine alerts reach responders with full log context.

Connectors

| Connector | Operations | Risk |
|-----------|-----------|------|
| PagerDuty | incidents:read, incidents:update | HIGH |
| Splunk | search:execute | LOW |
| Slack | chat:write | LOW |

Overall Risk: HIGH -- PagerDuty incidents:update auto-resolves incidents and modifies escalation state. Requires HITL approval for resolve actions.
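
The overall rating appears to follow the highest risk among the connectors. A minimal sketch of that roll-up, assuming a simple LOW < MEDIUM < HIGH ordering; the `overall_risk` helper and the ordering are illustrative assumptions, not part of arx:

```python
# Illustrative only: assumes the overall risk is the highest connector risk.
RISK_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

# Connector risks copied from the Connectors table.
CONNECTOR_RISK = {"PagerDuty": "HIGH", "Splunk": "LOW", "Slack": "LOW"}


def overall_risk(risks):
    """Return the highest risk level found in a mapping of name -> level."""
    return max(risks.values(), key=RISK_ORDER.__getitem__)


print(overall_risk(CONNECTOR_RISK))  # HIGH
```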

How It Works

1. Fetch new triggered PagerDuty incidents.
2. Extract key indicators (host, service, error) from each incident.
3. Run a Splunk lookup to check for known false positive patterns.
4. Auto-resolve incidents matching false positive signatures (with HITL approval).
5. Escalate remaining incidents with Splunk context added.
6. Post a Slack summary of triage actions.
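
The decision at the core of these steps can be sketched in plain Python. This is not the contents of `workflow.py`: the incident dicts are simplified stand-ins for PagerDuty payloads, the Splunk lookup is replaced by an in-memory pattern list, and every name below (`Indicators`, `is_false_positive`, `triage`, the sample patterns) is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical false positive signatures. Real ones would come from the
# false_positive_patterns config, whose schema is not shown in this README.
FALSE_POSITIVE_PATTERNS = [
    {"service": "batch-etl", "error": r"connection reset"},
    {"service": "healthcheck", "error": r"timeout after \d+ms"},
]


@dataclass
class Indicators:
    host: str
    service: str
    error: str


def extract_indicators(incident: dict) -> Indicators:
    """Pull host, service, and error out of a simplified incident dict."""
    return Indicators(
        host=incident.get("host", ""),
        service=incident.get("service", ""),
        error=incident.get("error", ""),
    )


def is_false_positive(ind: Indicators, patterns=FALSE_POSITIVE_PATTERNS) -> bool:
    """True when the service matches exactly and the error regex is found."""
    return any(
        p["service"] == ind.service
        and re.search(p["error"], ind.error, re.IGNORECASE)
        for p in patterns
    )


def triage(incidents: list) -> dict:
    """Split incidents into resolve requests (pending HITL) and escalations."""
    plan = {"resolve_pending_approval": [], "escalate": []}
    for incident in incidents:
        ind = extract_indicators(incident)
        bucket = "resolve_pending_approval" if is_false_positive(ind) else "escalate"
        plan[bucket].append(incident["id"])
    return plan


incidents = [
    {"id": "P1", "host": "etl-01", "service": "batch-etl",
     "error": "Connection reset by peer"},
    {"id": "P2", "host": "api-03", "service": "checkout",
     "error": "HTTP 500 spike"},
]
print(triage(incidents))
# {'resolve_pending_approval': ['P1'], 'escalate': ['P2']}
```

Note that a match only queues the incident for resolution; nothing is resolved until the HITL approval arrives.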

ARX Governance

- Risk Classification:
  - PagerDuty:incidents:read -- LOW -- read-only incident retrieval
  - PagerDuty:incidents:update (resolve) -- HIGH -- auto-resolves incidents, suppressing alerts
  - PagerDuty:incidents:update (escalate) -- MEDIUM -- adds context and escalates to responders
  - Splunk:search:execute -- LOW -- read-only log correlation
  - Slack:chat:write -- LOW -- informational triage summaries
- HITL Gate: Enabled -- PagerDuty incident resolution requires human approval. Escalation with context enrichment is auto-approved.
- Approval Channel: #soc-approvals
- Policy Rules:
  - PERMITTED: Reading PagerDuty incidents and running Splunk correlation searches
  - PERMITTED: Posting Slack triage summaries
  - PERMITTED (auto-approved): Escalating incidents with enriched Splunk context
  - ESCALATED (HITL required): Resolving PagerDuty incidents identified as false positives
  - DENIED: Suppressing or silencing PagerDuty services or escalation policies
- Audit Trail: Every incident triaged, Splunk correlation results, false positive match details, HITL approval status for resolutions, and escalation actions are logged with incident IDs and timestamps.
- Config: See arx.yaml for connector permissions, HITL gate configuration, false positive patterns, and approval channel.
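
To make the policy rules concrete, here is a rough sketch of how the three outcomes (PERMITTED, ESCALATED, DENIED) could be evaluated. It does not use the arx API or the `arx.yaml` schema, neither of which is shown here; the action names are hypothetical, and denying unlisted actions by default is an assumption of this sketch.

```python
from enum import Enum


class Decision(Enum):
    PERMITTED = "permitted"
    ESCALATED = "escalated"  # requires HITL approval before running
    DENIED = "denied"


# Hypothetical action names that mirror the policy rules in this section.
POLICY = {
    "pagerduty.incidents.read": Decision.PERMITTED,
    "splunk.search.execute": Decision.PERMITTED,
    "slack.chat.write": Decision.PERMITTED,
    "pagerduty.incidents.escalate": Decision.PERMITTED,
    "pagerduty.incidents.resolve": Decision.ESCALATED,
    "pagerduty.services.suppress": Decision.DENIED,
}


def evaluate(action: str) -> Decision:
    """Look up an action; unlisted actions are denied (sketch assumption)."""
    return POLICY.get(action, Decision.DENIED)


def may_execute(action: str, human_approved: bool = False) -> bool:
    """Permitted actions run; escalated ones run only after approval."""
    decision = evaluate(action)
    if decision is Decision.PERMITTED:
        return True
    if decision is Decision.ESCALATED:
        return human_approved
    return False


print(may_execute("pagerduty.incidents.resolve"))                       # False
print(may_execute("pagerduty.incidents.resolve", human_approved=True))  # True
```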

Setup

Prerequisites

```bash
pip install arx
```

Environment Variables

```bash
export PAGERDUTY_API_KEY="your-pagerduty-api-key"
export SPLUNK_URL="https://splunk.your-org.com:8089"
export SPLUNK_TOKEN="your-splunk-bearer-token"
export SLACK_BOT_TOKEN="xoxb-your-slack-token"
export SLACK_TRIAGE_CHANNEL="#soc-triage"
```
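
Before the first run it can help to confirm that all five variables are set. The check below is an optional sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not something arx provides.

```python
import os

# Variable names taken from the Environment Variables section.
REQUIRED_VARS = [
    "PAGERDUTY_API_KEY",
    "SPLUNK_URL",
    "SPLUNK_TOKEN",
    "SLACK_BOT_TOKEN",
    "SLACK_TRIAGE_CHANNEL",
]


def missing_env_vars(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```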

Run

```bash
# One-time execution
arx run workflow.py

# Register on schedule (every 5 minutes)
arx register --config arx.yaml
```

Customization

- Define false positive patterns in the false_positive_patterns config.
- Adjust the Splunk correlation search for your log schema.
- Change the HITL approval channel in arx.yaml.

PagerDuty Auto-Triage