Incident response flow

Topic: Monitoring basics

Summary

When an alert fires, acknowledge it, assess the impact, mitigate or fix the problem, communicate with stakeholders, and write a postmortem. Use this flow when defining how your team responds to incidents.

Intent: How-to

Quick answer

  • Acknowledge the alert, assess impact and scope, and check the runbook and dashboard. Then mitigate or fix.
  • Communicate with stakeholders through the status page or an agreed channel, and post updates as the situation changes.
  • Resolve the incident, write a postmortem with action items, and update the runbook if needed.
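
The order of phases above can be pictured as a small state machine. The following is a minimal sketch, not a reference to any particular incident tool: the `Phase` names and the `advance` helper are hypothetical and chosen only to mirror the bullets.

```python
from enum import Enum


class Phase(Enum):
    """Incident phases, named after the bullets above (illustrative only)."""
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"
    POSTMORTEM_DONE = "postmortem_done"


# Allowed forward transitions; the final phase has no successor.
NEXT_PHASE = {
    Phase.FIRING: Phase.ACKNOWLEDGED,
    Phase.ACKNOWLEDGED: Phase.MITIGATED,
    Phase.MITIGATED: Phase.RESOLVED,
    Phase.RESOLVED: Phase.POSTMORTEM_DONE,
}


def advance(current: Phase) -> Phase:
    """Return the phase that follows `current`, or raise if it is final."""
    if current not in NEXT_PHASE:
        raise ValueError(f"{current.value} is the final phase")
    return NEXT_PHASE[current]
```

Communication is deliberately not a phase here, since updates go out throughout the incident rather than at one point in the sequence.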

Prerequisites

Steps

  1. Triage and mitigate

    Acknowledge the alert so others know it is being handled. Assess impact and scope, then follow the runbook to mitigate or fix the problem. Escalate if needed.

  2. Communicate

    Notify stakeholders through the status page or an agreed channel, and post updates as the situation changes.

  3. Resolve and learn

    Resolve the incident, then write a postmortem with action items. Update the runbook if needed.
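
The "assess" and "escalate if needed" parts of step 1 are easier to apply when the criteria are written down in advance. The sketch below shows one possible way to do that. The severity labels, the percentage thresholds, and the time budgets are all assumptions made for illustration; a team would substitute its own values.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    users_affected_pct: float  # share of users seeing errors, 0-100
    data_loss: bool            # True if data may be lost or corrupted


def severity(impact: Impact) -> str:
    """Map assessed impact to a severity label (thresholds are assumptions)."""
    if impact.data_loss or impact.users_affected_pct >= 50:
        return "SEV1"
    if impact.users_affected_pct >= 5:
        return "SEV2"
    return "SEV3"


def should_escalate(sev: str, minutes_unmitigated: int) -> bool:
    """Escalate SEV1 immediately; otherwise once a time budget is used up."""
    budget = {"SEV1": 0, "SEV2": 30, "SEV3": 120}  # minutes, hypothetical
    return minutes_unmitigated >= budget[sev]
```

For example, `severity(Impact(users_affected_pct=10, data_loss=False))` yields `"SEV2"`, and under these assumed budgets that incident would be escalated if it were still unmitigated after 30 minutes.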

Verification

  • Incidents are triaged and resolved, and postmortems and runbooks are updated afterward.
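
This check can be run periodically over past incidents. The following is a minimal sketch assuming incidents are tracked as simple records; `IncidentRecord`, its fields, and `open_followups` are hypothetical names, not part of any tracking tool.

```python
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    incident_id: str
    resolved: bool
    postmortem_written: bool
    runbook_reviewed: bool  # reviewed, and updated if needed


def open_followups(records: list[IncidentRecord]) -> dict[str, list[str]]:
    """Return, per incident ID, the follow-up items still outstanding."""
    pending: dict[str, list[str]] = {}
    for rec in records:
        missing = []
        if not rec.resolved:
            missing.append("resolve")
        if not rec.postmortem_written:
            missing.append("postmortem")
        if not rec.runbook_reviewed:
            missing.append("runbook")
        if missing:
            pending[rec.incident_id] = missing
    return pending
```

An empty result means every recorded incident has completed the flow.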

Troubleshooting

  • No runbook: create one from this incident.
  • Poor communication: define an update template and a channel.
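
A communication template can be as small as a function that always produces the same fields in the same order. This is one possible shape, offered as a sketch: the function name, the field layout, and the example wording are assumptions rather than a standard.

```python
from datetime import datetime, timezone


def format_status_update(title, status, impact, next_update_minutes, now=None):
    """Build a plain-text status update with a fixed set of fields.

    `now` should be a timezone-aware datetime; it defaults to the current
    time and is always rendered in UTC.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"[{status.upper()}] {title}\n"
        f"Time: {stamp}\n"
        f"Impact: {impact}\n"
        f"Next update in {next_update_minutes} minutes."
    )
```

Stating when the next update is due, even if nothing has changed by then, gives stakeholders a reason not to ask for status in the meantime.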
