Incident response flow
Topic: Monitoring basics
Summary
When an alert fires: acknowledge it, assess the impact, mitigate or fix the problem, communicate with stakeholders, and write a postmortem. Use this flow when defining how your team responds to incidents.
Intent: How-to
Quick answer
- Acknowledge the alert, assess impact and scope, check the runbook and dashboard, then mitigate or fix.
- Communicate to stakeholders through the status page or an agreed channel, and post updates as the situation changes.
- Resolve the incident, write a postmortem with action items, and update the runbook if needed.
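Prerequisites
- Alerts that reach a responder, runbooks and dashboards for the monitored services, and a status page or channel for stakeholder updates.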
Steps
Step 1: Triage and mitigate
Acknowledge the alert and assess impact and scope. Follow the runbook to mitigate or fix the problem, and escalate if needed.
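The assess and escalate decisions in this step can be sketched in code. This is a minimal illustration, not a prescribed policy: the `Alert` fields, the severity labels, the impact thresholds, and the 30-minute limit are all hypothetical and should be replaced with values your team agrees on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    name: str
    fired_at: datetime
    users_affected_pct: float  # hypothetical impact estimate, 0 to 100
    has_runbook: bool


def assess_severity(alert: Alert) -> str:
    """Map estimated impact to a severity label (thresholds are illustrative)."""
    if alert.users_affected_pct >= 50:
        return "sev1"
    if alert.users_affected_pct >= 5:
        return "sev2"
    return "sev3"


def should_escalate(alert: Alert, now: datetime, mitigated: bool,
                    limit: timedelta = timedelta(minutes=30)) -> bool:
    """Escalate if unmitigated and: no runbook, severe impact, or overdue."""
    if mitigated:
        return False
    if not alert.has_runbook or assess_severity(alert) == "sev1":
        return True
    # Assumed rule: a lower-severity incident escalates once it runs past the limit.
    return now - alert.fired_at > limit
```

Keeping the thresholds in one place makes them easy to review in a postmortem if an incident was escalated too late.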
Step 2: Communicate
Notify stakeholders through the status page or an agreed channel, and post updates as the situation changes.
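A fixed message shape keeps updates consistent across responders. The sketch below is one possible template, not a standard: the function name `format_update`, the four status values, and the message layout are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumed set of statuses; adjust to whatever your status page or channel uses.
VALID_STATUSES = ("investigating", "identified", "monitoring", "resolved")


def format_update(title, status, impact, next_update_minutes=None, now=None):
    """Build a plain-text stakeholder update.

    `now` must be timezone-aware if given; it defaults to the current UTC time.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    lines = [
        f"[{status.upper()}] {title}",
        f"Time: {now.strftime('%Y-%m-%d %H:%M')} UTC",
        f"Impact: {impact}",
    ]
    # Promise a follow-up only while the incident is still open.
    if status != "resolved" and next_update_minutes is not None:
        lines.append(f"Next update in {next_update_minutes} minutes.")
    return "\n".join(lines)
```

The returned string can be pasted into a status page or posted to a channel; sending it is left out because that depends on the tool in use.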
Step 3: Resolve and learn
Resolve the incident, write a postmortem with action items, and update the runbook.
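A postmortem is easier to complete when it starts from a fixed skeleton. The following is a rough sketch under assumed names: `Postmortem`, `ActionItem`, and the rendered layout are hypothetical, and the sections shown (summary, timeline, action items, runbook status) are only the ones this flow mentions.

```python
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    description: str
    owner: str
    done: bool = False


@dataclass
class Postmortem:
    title: str
    summary: str
    timeline: list = field(default_factory=list)      # (time, event) pairs
    action_items: list = field(default_factory=list)  # ActionItem objects
    runbook_updated: bool = False

    def open_items(self):
        """Return action items that still need work."""
        return [a for a in self.action_items if not a.done]

    def render(self):
        """Render the postmortem as plain text."""
        lines = [f"Postmortem: {self.title}", "", "Summary", self.summary, "", "Timeline"]
        lines += [f"- {when} {event}" for when, event in self.timeline]
        lines += ["", "Action items"]
        lines += [
            f"- [{'x' if a.done else ' '}] {a.description} (owner: {a.owner})"
            for a in self.action_items
        ]
        lines += ["", f"Runbook updated: {'yes' if self.runbook_updated else 'no'}"]
        return "\n".join(lines)
```

Giving every action item an owner, and tracking `runbook_updated` explicitly, makes it harder for follow-up work to be forgotten once the incident is closed.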
Verification
- Each incident is triaged and resolved, its postmortem is written, and the runbook is updated where needed.
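If incidents are tracked as records, this check can be automated. The sketch below assumes each record is a dictionary with hypothetical keys `id`, `resolved`, `postmortem_done`, and `runbook_reviewed`; real trackers will use different field names.

```python
def find_gaps(incidents):
    """Return (incident id, problem) pairs for records that fail the checks."""
    gaps = []
    for inc in incidents:
        if not inc.get("resolved"):
            gaps.append((inc["id"], "not resolved"))
            continue  # follow-up checks only apply once the incident is resolved
        if not inc.get("postmortem_done"):
            gaps.append((inc["id"], "postmortem missing"))
        if not inc.get("runbook_reviewed"):
            gaps.append((inc["id"], "runbook not reviewed"))
    return gaps
```

An empty result means every tracked incident has been resolved and followed up.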
Troubleshooting
- No runbook: create one from this incident.
- Poor communication: define an update template and a channel.