Auth Service — Failure Modes

Status: populated
Owner: SRE
Last updated: 2026-04-18

| # | Failure | Impact | Mitigation |
|---|---------|--------|------------|
| 1 | PG down | All logins + token refresh fail | HA primary, read replicas, HPA scales down dependents |
| 2 | Redis down | JWKS cache miss; api-key lookup hotter | Fail-open with DB fallback |
| 3 | Vault down at startup | Pod fails readiness | Rolling restart blocked; alert; use cached signing key on already-running pods |
| 4 | Firebase outage | Firebase login fails; password login still works | Users fall back to password; status page note |
| 5 | JWKS rotation bug (no overlap) | Mass 401 until cache expires | Runbook: force-publish old key; rotation always schedules 10m overlap |
| 6 | API key lookup endpoint slow | Kong 5xx backpressure | Redis cache + HPA; Kong timeout 200ms |
| 7 | Clock skew → JWT expired prematurely | Intermittent 401 | NTP; nbf grace 30s |
| 8 | Leaked signing key | Token impersonation | Immediate emergency rotation; revoke all refresh tokens; force re-login |
| 9 | Password database leak | Credential stuffing | argon2id + rate limit at Kong + breach monitoring |
| 10 | Lockout storm (credential stuffing) | Legit users locked out | IP-based lockout only; email lockout more conservative; CAPTCHA after N failures |
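
Sketch for failure 7 (clock skew): one way a verifier could apply the 30s grace when checking a token's time claims. This is illustrative only, not the service's actual code. `TimeClaims` and `is_time_valid` are hypothetical names, and applying the same grace to `exp` as to `nbf` is an assumption; the table only specifies the `nbf` grace.

```python
from dataclasses import dataclass

# From failure 7: "nbf grace 30s". Reusing it for exp is an assumption.
GRACE_SECONDS = 30


@dataclass(frozen=True)
class TimeClaims:
    """Hypothetical container for the time claims of an already-decoded JWT."""
    nbf: int  # not-before, seconds since the Unix epoch
    exp: int  # expiry, seconds since the Unix epoch


def is_time_valid(claims: TimeClaims, now: int, grace: int = GRACE_SECONDS) -> bool:
    """Return True if `now` falls inside [nbf - grace, exp + grace)."""
    not_yet_valid = now < claims.nbf - grace
    expired = now >= claims.exp + grace
    return not (not_yet_valid or expired)
```

With `nbf=1000` and `exp=2000`, a verifier whose clock reads 980 (20s behind the issuer) accepts the token, while one reading 960 rejects it. The grace only absorbs small skew; NTP remains the primary mitigation.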