PagerDuty Auto-Triage

Project-Agent-trust-merge / library/workflows/pagerduty-auto-triage/README.md

Automatically triages PagerDuty alerts by correlating them with Splunk log data. Known false positives are auto-resolved; genuine alerts are escalated with enriched context.

Maturity: L3-4 (Enforced to Governed) · See the 5-level maturity model for where this workflow fits in your program.

Time Saved

Before: ~15 minutes per incident manually checking logs, identifying false positives, and resolving or escalating.

After: Automated triage with Splunk correlation. False positives are resolved instantly; genuine alerts reach responders with full log context.

Connectors

| Connector | Operations | Risk |
|-----------|-----------|------|
| PagerDuty | incidents:read, incidents:update | HIGH |
| Splunk | search:execute | LOW |
| Slack | chat:write | LOW |

Overall Risk: HIGH -- PagerDuty incidents:update can auto-resolve incidents and modify escalation state, so resolve actions require human-in-the-loop (HITL) approval.
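The overall rating is consistent with taking the highest risk level among the connectors. The sketch below illustrates that reading; the aggregation rule and the names `Risk`, `CONNECTOR_RISK`, and `overall_risk` are assumptions for illustration, not part of the ARX SDK.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Connector risk levels copied from the Connectors table.
CONNECTOR_RISK = {
    "PagerDuty": Risk.HIGH,
    "Splunk": Risk.LOW,
    "Slack": Risk.LOW,
}


def overall_risk(risks):
    """Return the highest risk level present (assumed aggregation rule)."""
    return max(risks.values())


print(overall_risk(CONNECTOR_RISK).name)
```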

How It Works

  1. Fetch new triggered PagerDuty incidents.
  2. Extract key indicators (host, service, error) from each incident.
  3. Run a Splunk lookup to check for known false positive patterns.
  4. Auto-resolve incidents matching false positive signatures (with HITL approval).
  5. Escalate remaining incidents with Splunk context added.
  6. Post a Slack summary of triage actions.
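The steps above can be sketched as a small routing function. This is a minimal illustration, not the contents of workflow.py: `Incident`, `TriageReport`, `triage`, and `summarize` are hypothetical names, the PagerDuty, Splunk, and Slack calls are replaced by injected callables, and it assumes that a false positive whose resolution is not approved falls back to escalation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Incident:
    # Key indicators extracted in step 2.
    id: str
    host: str
    service: str
    error: str


@dataclass
class TriageReport:
    resolved: List[str] = field(default_factory=list)
    escalated: List[str] = field(default_factory=list)


def triage(
    incidents: Iterable[Incident],
    is_false_positive: Callable[[Incident], bool],
    approve_resolve: Callable[[Incident], bool],
) -> TriageReport:
    """Route each incident to resolve (steps 3-4) or escalate (step 5)."""
    report = TriageReport()
    for incident in incidents:
        # Resolution needs both a false positive match and human approval.
        if is_false_positive(incident) and approve_resolve(incident):
            report.resolved.append(incident.id)
        else:
            # Assumption: unapproved or unmatched incidents are escalated.
            report.escalated.append(incident.id)
    return report


def summarize(report: TriageReport) -> str:
    """Build the text for the Slack summary (step 6)."""
    return (
        f"Triage: {len(report.resolved)} resolved, "
        f"{len(report.escalated)} escalated"
    )
```

Passing the lookup and the approval step in as callables keeps the routing logic testable without live credentials.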

ARX Governance

  • Risk Classification:
      • PagerDuty:incidents:read -- LOW -- read-only incident retrieval
      • PagerDuty:incidents:update (resolve) -- HIGH -- auto-resolves incidents, suppressing alerts
      • PagerDuty:incidents:update (escalate) -- MEDIUM -- adds context and escalates to responders
      • Splunk:search:execute -- LOW -- read-only log correlation
      • Slack:chat:write -- LOW -- informational triage summaries
  • HITL Gate: Enabled -- PagerDuty incident resolution requires human approval. Escalation with context enrichment is auto-approved.
  • Approval Channel: #soc-approvals
  • Policy Rules:
      • PERMITTED: Reading PagerDuty incidents and running Splunk correlation searches
      • PERMITTED: Posting Slack triage summaries
      • PERMITTED (auto-approved): Escalating incidents with enriched Splunk context
      • ESCALATED (HITL required): Resolving PagerDuty incidents identified as false positives
      • DENIED: Suppressing or silencing PagerDuty services or escalation policies
  • Audit Trail: Each triaged incident, its Splunk correlation results, false positive match details, the HITL approval status for resolutions, and escalation actions are logged with incident IDs and timestamps.
  • Config: See arx.yaml for connector permissions, HITL gate configuration, false positive patterns, and approval channel.
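The policy rules above amount to a lookup from an operation to a decision. The sketch below is one way to picture that mapping; `Decision`, `POLICY`, and `evaluate` are hypothetical names, and the deny-by-default fallback for unlisted operations is an assumption rather than documented ARX behavior.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    PERMITTED = "permitted"
    ESCALATED = "escalated"  # requires HITL approval
    DENIED = "denied"


# Keys are (connector, operation, action), taken from the rules listed above.
POLICY = {
    ("PagerDuty", "incidents:read", None): Decision.PERMITTED,
    ("Splunk", "search:execute", None): Decision.PERMITTED,
    ("Slack", "chat:write", None): Decision.PERMITTED,
    ("PagerDuty", "incidents:update", "escalate"): Decision.PERMITTED,
    ("PagerDuty", "incidents:update", "resolve"): Decision.ESCALATED,
}


def evaluate(connector: str, operation: str, action: Optional[str] = None) -> Decision:
    """Look up the decision; anything unlisted is denied (assumed default)."""
    return POLICY.get((connector, operation, action), Decision.DENIED)


print(evaluate("PagerDuty", "incidents:update", "resolve").name)
# "services:silence" is a made-up operation name used only to show the fallback.
print(evaluate("PagerDuty", "services:silence").name)
```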

Setup

Prerequisites

```bash
pip install arx
```

Environment Variables

```bash
export PAGERDUTY_API_KEY="your-pagerduty-api-key"
export SPLUNK_URL="https://splunk.your-org.com:8089"
export SPLUNK_TOKEN="your-splunk-bearer-token"
export SLACK_BOT_TOKEN="xoxb-your-slack-token"
export SLACK_TRIAGE_CHANNEL="#soc-triage"
```
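A quick pre-flight check can catch a missing variable before the workflow runs. This is a minimal sketch that assumes all five variables listed above are required; `REQUIRED_VARS` and `missing_env_vars` are hypothetical helpers, not part of the arx package.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "PAGERDUTY_API_KEY",
    "SPLUNK_URL",
    "SPLUNK_TOKEN",
    "SLACK_BOT_TOKEN",
    "SLACK_TRIAGE_CHANNEL",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```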

Run

```bash
# One-time execution
arx run workflow.py

# Register on schedule (every 5 minutes)
arx register --config arx.yaml
```

Customization

  • Define false positive patterns in the false_positive_patterns config
  • Adjust the Splunk correlation search to match your log schema
  • Change the HITL approval channel in arx.yaml
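As an illustration of what a false positive pattern might look like, the sketch below matches incident indicators against regular expressions. The actual schema of false_positive_patterns is not shown in this README, so the pattern shape (a dict of indicator name to regex), the sample patterns, and `matches_false_positive` are all assumptions.

```python
import re

# Hypothetical pattern shape: every listed indicator must match its regex.
FALSE_POSITIVE_PATTERNS = [
    {"service": r"^batch-", "error": r"(?i)disk usage above \d+%"},
    {"host": r"^staging-", "error": r"(?i)health check timeout"},
]


def matches_false_positive(indicators, patterns=FALSE_POSITIVE_PATTERNS):
    """Return True if the indicators satisfy every field of any one pattern."""
    for pattern in patterns:
        # Skip empty patterns so they cannot match every incident.
        if pattern and all(
            re.search(regex, indicators.get(key, ""))
            for key, regex in pattern.items()
        ):
            return True
    return False


incident = {
    "host": "staging-web-1",
    "service": "web",
    "error": "Health check timeout after 30s",
}
print(matches_false_positive(incident))
```

Requiring every field in a pattern to match keeps signatures narrow, which matters because a match can lead to an incident being resolved.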