Auth Service — Observability

Status: populated
Owner: SRE + Security
Last updated: 2026-04-18

1. SLIs / SLOs

SLI | SLO
/v1/auth/login P95 | ≤ 300 ms (Firebase latency included)
/v1/api-keys/lookup P95 (Kong caller) | ≤ 30 ms
/.well-known/jwks.json availability | ≥ 99.99%
Failed-login burst detection | < 60 s to alert
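
To make the 99.99% JWKS availability target concrete, the sketch below converts an availability SLO into a downtime budget. The 30-day window and the helper name `allowed_downtime_seconds` are assumptions for illustration; this page does not state the measurement window.

```python
def allowed_downtime_seconds(slo: float, window_days: int = 30) -> float:
    """Return the downtime budget, in seconds, for an availability target.

    `window_days` is an assumed rolling window; the SLO table does not name one.
    """
    window_seconds = window_days * 24 * 60 * 60
    return (1.0 - slo) * window_seconds


budget = allowed_downtime_seconds(0.9999)
# About 259 s, i.e. a little over 4 minutes per 30 days.
print(f"{budget:.1f} s (~{budget / 60:.2f} min) per 30 days")
```

Under that assumption, a single multi-minute outage of /.well-known/jwks.json would consume most of the budget for the window.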

2. Metrics

auth_login_total{result="ok|invalid|locked|mfa_required"}
auth_login_duration_seconds_bucket{method=...}
auth_api_key_lookup_total{result="hit|miss|revoked"}
auth_jwks_rotation_total{kid=...}
auth_token_issue_total{type="access|refresh"}
auth_events_emitted_total{subject=...}
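
The P95 SLIs in section 1 would typically be derived from auth_login_duration_seconds_bucket. As a rough sketch, the function below estimates a quantile from cumulative histogram buckets using linear interpolation within the bucket, which is the same general approach Prometheus's histogram_quantile takes. The bucket boundaries and counts are made-up sample data, not values from this service.

```python
from typing import List, Tuple

INF = float("inf")


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == INF:
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for login durations, in seconds.
sample_buckets = [(0.05, 200), (0.1, 600), (0.25, 900), (0.5, 980), (1.0, 1000), (INF, 1000)]
p95 = histogram_quantile(0.95, sample_buckets)
meets_login_slo = p95 <= 0.300  # SLO from section 1: P95 <= 300 ms
print(f"P95 ~ {p95:.3f} s, meets SLO: {meets_login_slo}")
```

Because the estimate is interpolated, its accuracy depends on having bucket boundaries near the SLO thresholds (300 ms and 30 ms).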

3. Alerts

  • AuthFailedLoginBurst: rate(auth_login_total{result='invalid'}[5m]) > 50
  • AuthJwksStale: JWKS not rotated in more than 31 days
  • AuthApiKeyLookupHigh5xx: lookup 5xx ratio > 1% over 5 min. Kong fails closed, so incoming 401s spike.
  • AuthDbDown, AuthRedisDown
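
To illustrate how the AuthFailedLoginBurst condition could be evaluated, the sketch below computes an average per-second rate from counter samples and compares it with the threshold of 50. It assumes the rule is read as PromQL, where rate() yields a per-second value. It is a simplification: it handles counter resets but does not reproduce Prometheus's extrapolation to window boundaries. The sample values and the name `per_second_rate` are hypothetical.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase over (timestamp_seconds, counter_value) samples."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a restart); count from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


THRESHOLD = 50  # from the AuthFailedLoginBurst rule

# Hypothetical auth_login_total{result="invalid"} samples over a 5-minute window.
window = [(0, 1000), (60, 4600), (120, 8200), (180, 11800), (240, 15400), (300, 19000)]
rate = per_second_rate(window)
firing = rate > THRESHOLD
print(f"rate={rate:.1f}/s firing={firing}")
```

Note that averaging over 5 minutes smooths short bursts, so a shorter window or evaluation interval may be needed to stay within the "< 60 s to alert" target.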

4. Dashboards

  • Auth Overview (logins, active sessions, lockouts)
  • API Keys (lookups/s, revocations, issuance)
  • JWKS (last rotation, fetches/s, cache hit)