Auth Service — Observability
Status: populated | Owner: SRE + Security | Last updated: 2026-04-18
1. SLIs / SLOs
| SLI | SLO |
|---|---|
| /v1/auth/login P95 | ≤ 300 ms (Firebase latency included) |
| /v1/api-keys/lookup P95 (Kong caller) | ≤ 30 ms |
| /.well-known/jwks.json availability | ≥ 99.99% |
| Failed-login burst detection | < 60s to alert |
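
To make the availability target concrete, the sketch below converts an availability SLO into an allowed-downtime budget. The 30-day window is an assumption for illustration (this document does not state the SLO window), and `error_budget_seconds` is a hypothetical helper name.

```python
def error_budget_seconds(slo: float, window_days: int = 30) -> float:
    """Return the allowed downtime in seconds for an availability SLO.

    `slo` is a fraction (0.9999 for 99.99%). The 30-day default window is
    an assumption; substitute the window the SLO is actually measured over.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    window_seconds = window_days * 24 * 60 * 60
    return (1.0 - slo) * window_seconds


if __name__ == "__main__":
    budget = error_budget_seconds(0.9999, window_days=30)
    print(f"JWKS budget: {budget:.1f} s (about {budget / 60:.2f} min) per 30 days")
```

Under that assumption, 99.99% leaves roughly 259 seconds (a little over 4 minutes) of JWKS unavailability per 30 days, which is why even a short outage on this endpoint consumes most of the budget.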
2. Metrics
- auth_login_total{result="ok|invalid|locked|mfa_required"}
- auth_login_duration_seconds_bucket{method=...}
- auth_api_key_lookup_total{result="hit|miss|revoked"}
- auth_jwks_rotation_total{kid=...}
- auth_token_issue_total{type="access|refresh"}
- auth_events_emitted_total{subject=...}
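
The P95 SLIs are derived from cumulative histogram buckets such as `auth_login_duration_seconds_bucket`. The sketch below shows one way a P95 could be estimated from such buckets using linear interpolation inside the bucket, similar in spirit to Prometheus's `histogram_quantile`. The bucket bounds and counts are made-up illustration values, and `estimate_quantile` is a hypothetical helper, not part of the service.

```python
import math
from typing import Sequence, Tuple

LOGIN_P95_SLO_SECONDS = 0.300  # from the SLO table: /v1/auth/login P95 <= 300 ms


def estimate_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate quantile `q` from cumulative histogram buckets.

    `buckets` holds (upper_bound_seconds, cumulative_count) pairs sorted by
    upper bound, ending with a +Inf bucket, mirroring `le`-labelled series.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                # Overflow (or empty) bucket: fall back to the lower bound.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical bucket layout and counts, for illustration only.
    sample = [(0.05, 400), (0.1, 700), (0.25, 945), (0.5, 990), (1.0, 1000), (math.inf, 1000)]
    p95 = estimate_quantile(0.95, sample)
    print(f"P95 = {p95 * 1000:.0f} ms, within SLO: {p95 <= LOGIN_P95_SLO_SECONDS}")
```

Because the estimate interpolates within a bucket, its accuracy depends on having bucket boundaries near the SLO thresholds (300 ms and 30 ms here); that is a design consideration rather than something this document specifies.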
3. Alerts
- AuthFailedLoginBurst: rate(auth_login_total{result='invalid'}[5m]) > 50
- AuthJwksStale: JWKS not rotated in > 31 d
- AuthApiKeyLookupHigh5xx: lookup 5xx > 1%/5m (Kong fails closed → incoming 401s spike)
- AuthDbDown, AuthRedisDown
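
The alert conditions above can be read as simple predicates. The following sketch restates three of them in plain Python to make the thresholds explicit. All function names are hypothetical, the counter values are invented, and `per_second_rate` is a simplification: PromQL's `rate()` also yields a per-second average over the window, but additionally handles counter resets and extrapolation more carefully. The 5xx counts are assumed to come from a metric not listed in this document.

```python
from datetime import datetime, timedelta, timezone

FAILED_LOGIN_RATE_THRESHOLD = 50.0   # per second, from AuthFailedLoginBurst
JWKS_MAX_AGE = timedelta(days=31)    # from AuthJwksStale
LOOKUP_5XX_RATIO_THRESHOLD = 0.01    # 1% over 5 minutes, from AuthApiKeyLookupHigh5xx


def per_second_rate(start_count: float, end_count: float, window_seconds: float = 300.0) -> float:
    """Average per-second increase of a counter across the window (simplified)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Clamp at zero so a counter reset does not produce a negative rate.
    return max(end_count - start_count, 0.0) / window_seconds


def failed_login_burst(start_count: float, end_count: float) -> bool:
    """True when invalid logins exceed the per-second threshold over 5 minutes."""
    return per_second_rate(start_count, end_count) > FAILED_LOGIN_RATE_THRESHOLD


def jwks_stale(last_rotation: datetime, now: datetime) -> bool:
    """True when the last JWKS rotation is more than 31 days old."""
    return now - last_rotation > JWKS_MAX_AGE


def lookup_5xx_high(error_count: float, total_count: float) -> bool:
    """True when more than 1% of lookups in the window returned 5xx."""
    if total_count == 0:
        return False  # no traffic, nothing to alert on
    return error_count / total_count > LOOKUP_5XX_RATIO_THRESHOLD


if __name__ == "__main__":
    now = datetime(2026, 4, 18, tzinfo=timezone.utc)
    print(failed_login_burst(1_000, 17_000))                            # 16000 / 300 s
    print(jwks_stale(datetime(2026, 3, 1, tzinfo=timezone.utc), now))   # 48 days old
    print(lookup_5xx_high(5, 1_000))                                    # 0.5% errors
```

Note that a threshold of 50 on a `rate()` expression means 50 invalid logins per second, averaged over the 5-minute window, which corresponds to 15,000 failures in that window.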
4. Dashboards
- Auth Overview (logins, active sessions, lockouts)
- API Keys (lookups/s, revocations, issuance)
- JWKS (last rotation, fetches/s, cache hit)