Incident Response

Authored — not generated. Update directly in docs-portal/_authored/07-runbooks/incident-response.md.

Severity ladder

Severity	Definition	Response
SEV-1	Customer-impacting outage, data-loss risk, or security breach	Page on-call + SRE lead immediately
SEV-2	Degraded for some tenants, no data loss	Page on-call
SEV-3	Internal degradation, customer-invisible	Slack channel
SEV-4	Cosmetic	Ticket
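
The ladder above can be encoded as a small lookup, which may help when wiring alerts to the right notification path. This is a minimal sketch, not the team's actual tooling: the `classify` function, its parameter names, and the target labels in `NOTIFY` are hypothetical, and it assumes that any one of the listed SEV-1 conditions is enough to trigger SEV-1.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = "SEV-1"
    SEV2 = "SEV-2"
    SEV3 = "SEV-3"
    SEV4 = "SEV-4"


# Notification targets taken from the last column of the severity ladder.
# The string labels are hypothetical placeholders, not real integration names.
NOTIFY = {
    Severity.SEV1: ["page:on-call", "page:sre-lead"],
    Severity.SEV2: ["page:on-call"],
    Severity.SEV3: ["slack:channel"],
    Severity.SEV4: ["ticket"],
}


def classify(outage=False, data_loss_risk=False, security_breach=False,
             tenants_degraded=False, internal_degradation=False):
    """Return the highest severity whose definition matches.

    Assumption: a single matching condition is sufficient for a level.
    """
    if outage or data_loss_risk or security_breach:
        return Severity.SEV1
    if tenants_degraded:
        return Severity.SEV2
    if internal_degradation:
        return Severity.SEV3
    # Assumption: anything that matches no higher level is treated as cosmetic.
    return Severity.SEV4


sev = classify(tenants_degraded=True)
print(sev.value, NOTIFY[sev])
```

Checking conditions from most to least severe means an incident that matches several rows is always routed at the highest matching level.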

Open the SLO board and the relevant service's runbook tab.
Capture the traceparent header of a failing request; its trace ID threads through every log/metric for that request.
Confirm tenant scope: single-tenant impact vs platform-wide.
Decide rollback vs forward-fix. Default to rollback if a deploy in the last 30 minutes is suspect.
Open the incident channel and pin the single source of truth doc.
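
Two of the steps above lend themselves to a short sketch: pulling the trace ID out of a captured traceparent value, and applying the 30-minute rollback default. The code below assumes the traceparent follows the W3C Trace Context four-field layout (`version-traceid-parentid-flags`). The helper names `parse_traceparent` and `default_action` are hypothetical and not part of any existing tooling, and the sample header value is illustrative only.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional

# Four lowercase-hex fields: version, trace-id, parent-id, flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

# Window from the runbook: a suspect deploy in the last 30 minutes.
ROLLBACK_WINDOW = timedelta(minutes=30)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a traceparent value; return None if it is malformed."""
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid in W3C Trace Context.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Bit 0 of the flags field is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def default_action(last_deploy_at: Optional[datetime],
                   deploy_suspect: bool,
                   now: datetime) -> Optional[str]:
    """Return "rollback" when the runbook default applies, else None.

    None means no default: the responder chooses rollback or forward-fix.
    """
    if last_deploy_at is None or not deploy_suspect:
        return None
    age = now - last_deploy_at
    if timedelta(0) <= age <= ROLLBACK_WINDOW:
        return "rollback"
    return None


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp.trace_id if tp else "malformed traceparent")

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(default_action(now - timedelta(minutes=10), True, now))
```

The parsed `trace_id` is the value to search for in logs. Returning `None` outside the window keeps the sketch from implying a default that the runbook does not state.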