Auth Service — Jira-Ready Epics & User Stories
Status: populated | Owner: Platform Engineering | Last updated: 2026-04-18 | Service prefix: AUTH
Scope: New epics/stories covering Kong JWT/API-key integration (ADR-0001), JWKS publication, Firebase federation, session management, and account provisioning.
Epic Summary
| Epic ID | Title | Stories | Points |
|---|---|---|---|
| EP-AUTH-01 | Kong JWT Integration (JWKS Publisher) | US-AUTH-001 – US-AUTH-005 | 26 |
| EP-AUTH-02 | API Key Lifecycle & Kong Plugin Integration | US-AUTH-010 – US-AUTH-015 | 34 |
| EP-AUTH-03 | User & Account Management | US-AUTH-020 – US-AUTH-025 | 28 |
| EP-AUTH-04 | Session Management & Token Refresh | US-AUTH-030 – US-AUTH-034 | 20 |
| EP-AUTH-05 | Security Hardening & Observability | US-AUTH-040 – US-AUTH-044 | 18 |
| EP-AUTH-06 | Tenant Sub-Org / Reseller Hierarchy + Cross-Tenant Token Revocation Propagation | US-AUTH-050 – US-AUTH-056 | 29 |
| EP-AUTH-07 | HSM-Backed JWT Signing (replaces Vault-only key handling) | US-AUTH-060 – US-AUTH-063 | 18 |
| EP-AUTH-08 | Break-Glass Admin Access + WebAuthn for Platform Staff | US-AUTH-070 – US-AUTH-073 | 16 |
EP-AUTH-01 · Kong JWT Integration (JWKS Publisher)
Context: Kong's `jwt` plugin validates Bearer tokens using the auth-service JWKS endpoint. Auth-service must publish RS256 public keys at `/.well-known/jwks.json` without a Kong route (Kong polls it directly via cluster-internal DNS).
US-AUTH-001 · JWKS endpoint publication
Type: Feature | Points: 5
Description:
As Kong's jwt plugin, I need to poll /.well-known/jwks.json from auth-service so that RS256 JWT access tokens issued by auth-service can be validated at the edge without forwarding requests to auth-service.
Acceptance Criteria:
- `GET /.well-known/jwks.json` returns RFC 7517 JWK Set with all active RS256 public keys
- Each JWK entry: `{ kty: "RSA", use: "sig", alg: "RS256", kid, n, e }`
- Endpoint reachable cluster-internally (no Kong route); NOT proxied through Kong
- Response cached in memory with 5-minute TTL to avoid Vault round-trips on every request
- Response includes `Cache-Control: public, max-age=300` header
- Unit test: JWKS response validates against RFC 7517 schema
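
The JWK entry shape above can be illustrated with a short sketch. It assumes the RSA modulus and exponent are already available as integers; the helper names (`b64url_uint`, `rsa_public_jwk`, `jwk_set`) are hypothetical and not part of auth-service. The `n` and `e` members use the unpadded base64url integer encoding that RFC 7518 defines for RSA keys.

```python
import base64


def b64url_uint(value: int) -> str:
    """Encode a non-negative integer as unpadded base64url, big-endian."""
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def rsa_public_jwk(kid: str, n: int, e: int) -> dict:
    """Build one JWK entry with the members listed in the acceptance criteria."""
    return {
        "kty": "RSA",
        "use": "sig",
        "alg": "RS256",
        "kid": kid,
        "n": b64url_uint(n),
        "e": b64url_uint(e),
    }


def jwk_set(keys: list) -> dict:
    """Wrap JWK entries in the RFC 7517 JWK Set envelope."""
    return {"keys": keys}
```

The common public exponent 65537 encodes to `AQAB`, which is a convenient sanity check when eyeballing a JWKS response.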
US-AUTH-002 · RS256 key generation and Vault storage
Type: Feature | Points: 8
Description:
As the auth-service, I need RSA 2048-bit signing key pairs generated and stored in HashiCorp Vault so that private keys are never stored at rest outside Vault.
Acceptance Criteria:
- Key pairs generated using Vault PKI or `transit` engine; private key path: `secret/auth/jwks/{kid}/private`
- Public key material stored in `auth.jwk_keys` table alongside `kid`, `algorithm`, `status` (ACTIVE/RETIRED), `createdAt`, `retiredAt`
- Key `kid` = UUID v4 generated at creation
- Minimum 1 active key at all times; max 2 active keys during rotation
- Private key material never logged or returned in API responses
US-AUTH-003 · JWT issuance with RS256
Type: Feature | Points: 5
Description:
As an authenticated user, I need JWT access tokens issued with RS256 so that Kong can validate them using the public JWKS without a secret shared between Kong and auth-service.
Acceptance Criteria:
- Access token: `alg: RS256`, `exp: now + 15min`, `iss: https://auth.ghasi.io`, `sub: userId`, `tenantId`, `roles: []`, `kid` (matching active JWK)
- Refresh token: opaque random string (32 bytes base62), stored hashed in `auth.refresh_tokens`
- Token pair returned from `POST /v1/auth/login` and `POST /v1/auth/refresh`
- `kid` header in JWT matches entry in JWKS response
- Unit test: issued JWT verifiable using JWKS public key
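
The access-token header and claim set above can be sketched as a small builder. This is a minimal illustration rather than the service's implementation: `build_access_token` and its parameters are hypothetical names, the `typ` header and `iat` claim are assumptions not listed in the story, and the RS256 signing step is deliberately left out.

```python
import time

ACCESS_TOKEN_TTL_SECONDS = 15 * 60  # exp: now + 15min


def build_access_token(user_id, tenant_id, roles, kid, now=None):
    """Return the (header, claims) pair for an unsigned access token."""
    issued_at = int(time.time()) if now is None else int(now)
    header = {"alg": "RS256", "typ": "JWT", "kid": kid}  # typ is an assumption
    claims = {
        "iss": "https://auth.ghasi.io",
        "sub": user_id,
        "tenantId": tenant_id,
        "roles": list(roles),
        "iat": issued_at,  # assumption: not listed in the story
        "exp": issued_at + ACCESS_TOKEN_TTL_SECONDS,
    }
    return header, claims
```

Passing `now` explicitly keeps the expiry arithmetic testable without mocking the clock.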
US-AUTH-004 · JWKS key rotation
Type: Feature | Points: 5
Description:
As the platform, I need JWKS key rotation to work without downtime so that signing keys can be cycled periodically or on compromise without invalidating current sessions.
Acceptance Criteria:
- Rotation: generate new key pair, add to JWKS (status=ACTIVE), set old key to RETIRING
- RETIRING keys remain in JWKS response for 30 minutes (JWT TTL grace period)
- After 30 minutes: old key removed from JWKS, status set to RETIRED
- Rotation can be triggered via `POST /v1/internal/auth/rotate-jwks` (admin mTLS, not via Kong)
- `auth_key_rotation_total` Prometheus counter incremented on each rotation
- Alert: if active key count < 1 → PagerDuty critical
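
The ACTIVE → RETIRING → RETIRED lifecycle described above can be sketched as follows. The `SigningKey` record and the three functions are hypothetical names; an in-memory list stands in for the `auth.jwk_keys` table, and key generation itself is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(minutes=30)


@dataclass
class SigningKey:
    kid: str
    status: str  # "ACTIVE", "RETIRING", or "RETIRED"
    retiring_since: Optional[datetime] = None


def rotate(keys, new_kid, now):
    """Mark current ACTIVE keys as RETIRING and add the new ACTIVE key."""
    for key in keys:
        if key.status == "ACTIVE":
            key.status = "RETIRING"
            key.retiring_since = now
    keys.append(SigningKey(kid=new_kid, status="ACTIVE"))


def expire_retiring(keys, now):
    """Move RETIRING keys to RETIRED once the grace period has elapsed."""
    for key in keys:
        if key.status == "RETIRING" and now - key.retiring_since >= GRACE_PERIOD:
            key.status = "RETIRED"


def published_kids(keys):
    """Key IDs that should still appear in the JWKS response."""
    return [key.kid for key in keys if key.status in ("ACTIVE", "RETIRING")]
```

Because the 30-minute grace period exceeds the 15-minute access-token TTL, every token signed by the old key expires before that key leaves the JWKS.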
US-AUTH-005 · Kong jwt plugin configuration for auth-service routes
Type: Configuration | Points: 3
Description:
As a platform operator, I need Kong's jwt plugin configured to verify RS256 tokens using auth-service's JWKS so that all protected routes validate tokens at the edge.
Acceptance Criteria:
- `jwt` plugin configured globally or per-route with `secret_is_base64: false`, `algorithm: RS256`
- `config.jwks_uri` set to `http://auth-service:3002/.well-known/jwks.json` (cluster-internal)
- `config.key_claim_name: "kid"` to select correct JWK
- 401 returned by Kong if token expired, invalid signature, or missing
- Declarative config in `services/api-gateway/kong/plugins/jwt.yaml`
EP-AUTH-02 · API Key Lifecycle & Kong Plugin Integration
Context: Customers use long-lived API keys (no expiry by default) for programmatic access. Kong's `ghasi-api-key-lookup` custom plugin validates keys by calling auth-service's internal lookup endpoint.
US-AUTH-010 · API key creation endpoint
Type: Feature | Points: 5
Description:
As a customer, I need to create API keys from the customer portal so that I can authenticate programmatic requests to the platform.
Acceptance Criteria:
- `POST /v1/api-keys` accepts `{ label: string, expiresAt?: ISO8601 }`
- Generates raw key: `ghasi_live_<24 chars base62>` (format: `ghasi_live_` prefix + random)
- Stores `sha256(rawKey)` hash in `auth.api_keys`; raw key returned exactly once in response
- Response: `{ id, label, key: "ghasi_live_...", createdAt, expiresAt }`; key never returned again
- Rate limit: max 10 active keys per account
- Unit test: verify stored hash matches `sha256(returnedKey)`
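
Key generation and hashing as described above might look like the following sketch. `generate_api_key` is a hypothetical name; the assumptions are that the random part is drawn from a cryptographically secure source and that the stored hash is the hex-encoded SHA-256 of the UTF-8 key.

```python
import hashlib
import secrets
import string

BASE62_ALPHABET = string.digits + string.ascii_letters  # 62 symbols
KEY_PREFIX = "ghasi_live_"
RANDOM_LENGTH = 24


def generate_api_key():
    """Return (raw_key, sha256_hex); only the hash should be persisted."""
    random_part = "".join(
        secrets.choice(BASE62_ALPHABET) for _ in range(RANDOM_LENGTH)
    )
    raw_key = KEY_PREFIX + random_part
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, key_hash
```

The raw key goes into the creation response once, and the hash is what lands in `auth.api_keys`.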
US-AUTH-011 · API key listing and revocation
Type: Feature | Points: 3
Description:
As a customer, I need to list and revoke my API keys so that I can manage access credentials and respond to key compromise.
Acceptance Criteria:
- `GET /v1/api-keys` returns `[{ id, label, keyPrefix (first 8 chars), createdAt, expiresAt, status }]`
- Raw key NOT returned in listing; only prefix for identification
- `DELETE /v1/api-keys/:id` sets `status = REVOKED`, `revokedAt = now()`
- Revoked keys immediately fail validation (no TTL grace period)
- `auth_api_key_revoked_total` counter incremented
US-AUTH-012 · Internal API key lookup endpoint (Kong plugin integration)
Type: Feature | Points: 8
Description:
As Kong's ghasi-api-key-lookup custom plugin, I need an internal endpoint that validates a hashed API key and returns the associated account so that Kong can authenticate API key requests at the edge.
Acceptance Criteria:
- `GET /v1/api-keys/lookup?hash=<sha256hex>` returns `{ accountId, tenantId, roles, status }` for valid key
- Returns `404` for unknown hash
- Returns `403` for REVOKED or EXPIRED key with `{ reason: "REVOKED" | "EXPIRED" }`
- Endpoint accessible cluster-internally ONLY (no Kong route; bind on internal interface)
- Response cached in Redis `auth:key:{hash}` TTL 60s
- mTLS client certificate required (Kong service account)
- P95 response time ≤ 5 ms (Redis cache hit path)
US-AUTH-013 · API key lookup Redis caching
Type: Feature | Points: 5
Description:
As the lookup endpoint, I need Redis caching for key lookups so that Kong's high-frequency validation calls don't overwhelm the auth-service database.
Acceptance Criteria:
- Cache key: `auth:key:{sha256hash}`, TTL 60s, value: `{ accountId, tenantId, roles, status }`
- Cache miss: query PG, populate cache
- Cache invalidated immediately on key revocation: `DEL auth:key:{hash}`
- `auth_key_cache_hit_total` and `auth_key_cache_miss_total` counters tracked
- Cache hit path P95 ≤ 2 ms
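
The cache-aside behaviour above can be sketched with an in-memory dictionary standing in for Redis. `KeyLookupCache`, `load_from_db`, and the injectable `clock` are hypothetical; a real implementation would issue Redis `GET`/`SET` with expiry and `DEL` instead.

```python
import time


class KeyLookupCache:
    """Cache-aside lookup with a fixed TTL and explicit invalidation."""

    def __init__(self, load_from_db, ttl_seconds=60, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # cache key -> (expires_at, value)
        self.hits = 0    # stands in for auth_key_cache_hit_total
        self.misses = 0  # stands in for auth_key_cache_miss_total

    def get(self, key_hash):
        cache_key = f"auth:key:{key_hash}"
        entry = self._entries.get(cache_key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall back to the database and repopulate.
        self.misses += 1
        value = self._load(key_hash)
        if value is not None:
            self._entries[cache_key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key_hash):
        """Equivalent of DEL auth:key:{hash} on revocation."""
        self._entries.pop(f"auth:key:{key_hash}", None)
```

Unknown hashes are not cached in this sketch, so repeated lookups of a non-existent key always reach the database; whether to cache negative results is left open by the story.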
US-AUTH-014 · API key expiry enforcement
Type: Feature | Points: 5
Description:
As the platform, I need API keys with expiresAt set to be rejected after their expiry time so that time-limited keys provide bounded access windows.
Acceptance Criteria:
- Lookup endpoint checks `expiresAt < now()` → returns `403 { reason: "EXPIRED" }`
- Background job runs daily: updates `status = EXPIRED` for keys where `expiresAt < now() AND status = ACTIVE`
- Expired keys not returned in active key listing
- `auth_api_key_expired_total` counter incremented by daily job
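
Because the daily job only runs once every 24 hours, the lookup endpoint has to check `expiresAt` itself. The sketch below shows one way the status decision from US-AUTH-012 and US-AUTH-014 could be combined. `lookup_response` and the record shape are hypothetical, and the empty 404 body is an assumption since the stories do not specify one.

```python
from datetime import datetime
from typing import Optional


def lookup_response(record: Optional[dict], now: datetime):
    """Map a stored key record to an (HTTP status, body) pair."""
    if record is None:
        return 404, {}  # body shape is an assumption
    if record["status"] == "REVOKED":
        return 403, {"reason": "REVOKED"}
    expires_at = record.get("expiresAt")
    # Treat a key as expired even if the daily job has not yet flipped its status.
    if record["status"] == "EXPIRED" or (expires_at is not None and expires_at < now):
        return 403, {"reason": "EXPIRED"}
    body = {k: record[k] for k in ("accountId", "tenantId", "roles", "status")}
    return 200, body
```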
US-AUTH-015 · Kong ghasi-api-key-lookup plugin
Type: Feature | Points: 8
Description:
As Kong, I need a custom Lua plugin that extracts the X-API-Key header, hashes it, calls auth-service lookup, and injects tenant context into upstream headers so that API key authentication works at the edge.
Acceptance Criteria:
- Plugin reads `X-API-Key` request header; returns `401` if absent
- Computes `sha256(value)` → calls `http://auth-service:3002/v1/api-keys/lookup?hash={hash}`
- On 200: injects `X-Tenant-Id`, `X-Account-Id`, `X-Roles` headers into upstream request
- On 404/403: returns `401 Unauthorized` to client
- Plugin timeout: 100ms; on timeout returns `503 Service Unavailable` with `Retry-After: 1`
- Plugin deployed to Kong as custom plugin; declarative config in `services/api-gateway/kong/plugins/`
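
The real plugin is written in Lua, but its decision flow can be sketched in Python to make the status mapping explicit. Everything here is illustrative: `handle_request` is a hypothetical name, `lookup` is an injected callable standing in for the HTTP call to auth-service, the comma-joined `X-Roles` format is an assumption, and treating unspecified lookup statuses like a timeout is also an assumption.

```python
import hashlib


def handle_request(headers: dict, lookup):
    """Return (status, response_headers, upstream_headers).

    `lookup(key_hash)` returns (status_code, body) or raises TimeoutError.
    """
    api_key = headers.get("X-API-Key")
    if not api_key:
        return 401, {}, None
    key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    try:
        status, body = lookup(key_hash)
    except TimeoutError:
        return 503, {"Retry-After": "1"}, None
    if status == 200:
        upstream = {
            "X-Tenant-Id": body["tenantId"],
            "X-Account-Id": body["accountId"],
            "X-Roles": ",".join(body["roles"]),  # joining format is an assumption
        }
        return 200, {}, upstream
    if status in (403, 404):
        return 401, {}, None
    # Other statuses are not covered by the story; handled like a timeout here.
    return 503, {"Retry-After": "1"}, None
```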
EP-AUTH-03 · User & Account Management
US-AUTH-020 · User registration and Firebase federation
Type: Feature | Points: 8
Description:
As a new user, I need to register via email/password with Firebase Auth so that I get a Ghasi platform account linked to my Firebase identity.
Acceptance Criteria:
- `POST /v1/auth/register` accepts `{ email, password, displayName, organizationName }`
- Creates Firebase user via Firebase Admin SDK; sets `customer` custom claim
- Creates `auth.accounts` and `auth.users` rows in a PG transaction
- Publishes `auth.events: { type: "user.registered", userId, tenantId, email }` to NATS
- Password stored in Firebase only (never in auth-service DB)
- Returns `201 Created { userId, accountId, email, displayName }`
- Duplicate email → `409 Conflict`
US-AUTH-021 · Login and token issuance
Type: Feature | Points: 5
Description:
As a registered user, I need to login with email/password and receive JWT + refresh token so that I can authenticate subsequent requests.
Acceptance Criteria:
- `POST /v1/auth/login` accepts `{ email, password }`, validates via Firebase Admin SDK
- Returns `{ accessToken (JWT RS256, 15m), refreshToken (opaque, 30d), expiresIn: 900 }`
- Refresh token stored as `argon2id` hash in `auth.refresh_tokens`
- Invalid credentials → `401 { code: "INVALID_CREDENTIALS" }`
- Brute force protection: 5 failed attempts per IP in 15m → `429 Too Many Requests`
US-AUTH-022 · Token refresh
Type: Feature | Points: 3
Description:
As an authenticated client, I need to refresh my access token using a refresh token so that sessions persist beyond the 15-minute access token TTL.
Acceptance Criteria:
- `POST /v1/auth/refresh` accepts `{ refreshToken }` in body
- Validates refresh token hash in `auth.refresh_tokens`; checks `expiresAt` and `revokedAt`
- Issues new access token (15m) and rotates refresh token (30d); old refresh token invalidated
- Returns same shape as login response
- Used/revoked refresh token → `401 { code: "INVALID_REFRESH_TOKEN" }`
US-AUTH-023 · Account and user profile management
Type: Feature | Points: 5
Description:
As an authenticated user, I need to view and update my profile and account details so that I can manage my Ghasi account.
Acceptance Criteria:
- `GET /v1/users/me` returns `{ userId, accountId, email, displayName, roles, createdAt }`
- `PUT /v1/users/me` accepts `{ displayName }` (email change requires re-auth)
- `GET /v1/accounts/me` returns `{ accountId, organizationName, tier, status }`
- All endpoints require valid JWT Bearer token (validated by Kong)
- Tenant isolation: `X-Tenant-Id` header injected by Kong; service verifies `userId` belongs to `tenantId`
US-AUTH-024 · RBAC role assignment
Type: Feature | Points: 5
Description:
As a platform admin, I need to assign roles to users so that access control is enforced throughout the platform.
Acceptance Criteria:
- `POST /v1/admin/users/:id/roles` accepts `{ roles: string[] }` (admin-only route)
- Valid roles: `customer`, `admin`, `operator-admin`, `billing-admin`
- Role assignment updates Firebase custom claims via Admin SDK and `auth.user_roles` table atomically
- Role changes take effect on next token refresh (access token carries stale claims for up to 15m)
- Audit log entry created in `auth.audit_log` for every role change
US-AUTH-025 · Account status management (suspend/activate)
Type: Feature | Points: 2
Description:
As a platform admin, I need to suspend and reactivate accounts so that compromised or non-paying accounts can be blocked.
Acceptance Criteria:
- `PUT /v1/admin/accounts/:id/status` accepts `{ status: "ACTIVE" | "SUSPENDED" }`
- Suspended accounts: API key lookup returns `403 { reason: "ACCOUNT_SUSPENDED" }`; JWT validated but `403` returned by auth-service middleware
- Reactivation restores normal access immediately
- Status change event published to `auth.events`
EP-AUTH-04 · Session Management & Token Refresh
US-AUTH-030 · Logout and session revocation
Type: Feature | Points: 3
Description:
As an authenticated user, I need a logout endpoint that revokes my current session so that the session cannot be used after sign-out.
Acceptance Criteria:
- `POST /v1/auth/logout` requires valid JWT; revokes the associated refresh token
- Refresh token `revokedAt` set to `now()`
- Firebase `revokeRefreshTokens(userId)` called to invalidate all Firebase sessions
- Returns `204 No Content`
- Subsequent refresh attempts with revoked token → `401`
US-AUTH-031 · Refresh token rotation
Type: Feature | Points: 3
Description:
As the security model, I need refresh token rotation so that a stolen refresh token has a bounded exploit window.
Acceptance Criteria:
- Each `/v1/auth/refresh` call issues a new refresh token and invalidates the previous one
- Old refresh token `revokedAt` set; new token linked to same `sessionId`
- Refresh token reuse detection: using an already-rotated token invalidates the entire session (`auth.refresh_tokens` for sessionId all revoked)
- `auth_refresh_token_reuse_total` counter incremented on reuse detection
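
Rotation with reuse detection can be sketched as below. `RefreshTokenStore` is a hypothetical in-memory stand-in for `auth.refresh_tokens`. For brevity it keys records by the raw token, whereas the stories require storing only a hash, and it treats any revoked token presented again as reuse.

```python
import secrets


class RefreshTokenStore:
    """In-memory sketch of refresh token rotation with reuse detection."""

    def __init__(self):
        self._tokens = {}  # token -> {"session_id": str, "revoked": bool}
        self.reuse_detected_total = 0  # stands in for the Prometheus counter

    def issue(self, session_id):
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {"session_id": session_id, "revoked": False}
        return token

    def refresh(self, token):
        """Return a new token, or None when the presented token is invalid."""
        record = self._tokens.get(token)
        if record is None:
            return None
        if record["revoked"]:
            # An already-rotated token was replayed: revoke the whole session.
            self.reuse_detected_total += 1
            self._revoke_session(record["session_id"])
            return None
        record["revoked"] = True
        return self.issue(record["session_id"])

    def _revoke_session(self, session_id):
        for record in self._tokens.values():
            if record["session_id"] == session_id:
                record["revoked"] = True
```

Revoking the whole session on reuse means both the attacker and the legitimate client lose access, which forces a fresh login and bounds the exploit window.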
US-AUTH-032 · Refresh token storage with argon2id
Type: Feature | Points: 3
Description:
As the security model, I need refresh tokens stored as argon2id hashes so that a database compromise does not expose usable tokens.
Acceptance Criteria:
- Refresh token stored as `argon2id` hash: `m=65536 (64MB), t=3, p=4`
- Raw token (32 bytes base62) returned to client exactly once; never stored or logged
- Hash verification on each refresh: `argon2.verify(storedHash, incomingToken)`
- Unit test: hash → verify round trip passes; tampered token fails
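
The phrase "32 bytes base62" can be read in more than one way. The sketch below assumes it means 32 random bytes encoded as a base62 string. `base62_encode` and `new_refresh_token` are hypothetical helpers, and the argon2id hashing step is omitted because it needs a third-party library.

```python
import secrets

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(data: bytes) -> str:
    """Encode bytes as a base62 string (leading zero bytes are not preserved)."""
    number = int.from_bytes(data, "big")
    if number == 0:
        return BASE62[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def new_refresh_token() -> str:
    """Generate an opaque token from 32 cryptographically random bytes."""
    return base62_encode(secrets.token_bytes(32))
```

Under this reading a token is at most 43 characters long, since 62^43 exceeds 2^256.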
US-AUTH-033 · Session listing and revocation (all devices)
Type: Feature | Points: 5
Description:
As a user, I need to list all active sessions and revoke all sessions so that I can respond to a suspected account compromise.
Acceptance Criteria:
- `GET /v1/auth/sessions` returns `[{ sessionId, createdAt, lastUsedAt, userAgent, ipAddress }]`
- `DELETE /v1/auth/sessions` revokes all refresh tokens for the user
- `DELETE /v1/auth/sessions/:sessionId` revokes a single session
- Firebase `revokeRefreshTokens` called when all sessions revoked
US-AUTH-034 · Brute force protection
Type: Feature | Points: 6
Description:
As the security model, I need brute force rate limiting on login and token refresh endpoints so that credential stuffing attacks are mitigated.
Acceptance Criteria:
- Login: 5 failed attempts per IP per 15 minutes → `429` with `Retry-After` header
- Login: 10 failed attempts per email per 15 minutes → `429`
- Counters stored in Redis: `auth:ratelimit:login:ip:{ip}` and `auth:ratelimit:login:email:{email}`
- TTL = 900s on both counters
- Successful login resets counters
- `auth_login_rate_limited_total` counter incremented on block
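
The counter scheme above can be sketched with a dictionary standing in for Redis. `LoginRateLimiter` is a hypothetical name. Two assumptions are made: the 900 s TTL starts at the first failure in a window, and a caller is blocked once the counter reaches the limit.

```python
import time

WINDOW_SECONDS = 900
LIMITS = {"ip": 5, "email": 10}


class LoginRateLimiter:
    """Fixed-window failed-login counters keyed like the Redis keys above."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counters = {}  # key -> (count, expires_at)

    @staticmethod
    def _keys(ip, email):
        return {
            "ip": f"auth:ratelimit:login:ip:{ip}",
            "email": f"auth:ratelimit:login:email:{email}",
        }

    def _count(self, key):
        count, expires_at = self._counters.get(key, (0, 0.0))
        return count if expires_at > self._clock() else 0

    def is_blocked(self, ip, email):
        keys = self._keys(ip, email)
        return any(self._count(keys[kind]) >= LIMITS[kind] for kind in LIMITS)

    def record_failure(self, ip, email):
        now = self._clock()
        for key in self._keys(ip, email).values():
            count, expires_at = self._counters.get(key, (0, 0.0))
            if expires_at <= now:  # window elapsed: start a new one
                count, expires_at = 0, now + WINDOW_SECONDS
            self._counters[key] = (count + 1, expires_at)

    def record_success(self, ip, email):
        for key in self._keys(ip, email).values():
            self._counters.pop(key, None)
```

A handler would call `is_blocked` before validating credentials and return `429` with `Retry-After` when it is true.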
EP-AUTH-05 · Security Hardening & Observability
US-AUTH-040 · Password hashing with argon2id
Type: Feature | Points: 3
Description:
As the security model, I need passwords hashed with argon2id before sending to Firebase (or storing locally if Firebase offline) so that password exposure risk is minimised.
Acceptance Criteria:
- argon2id parameters: `m=65536, t=3, p=4` (meets or exceeds the OWASP recommended minimum)
- Password never stored in auth-service DB (Firebase is authoritative)
- Hash parameters stored as part of Firebase password hash configuration
- Unit test verifies argon2id hash/verify round-trip
US-AUTH-041 · Audit logging for security events
Type: Feature | Points: 5
Description:
As the compliance team, I need all security-relevant events recorded in an immutable audit log so that breach investigations have a reliable event trail.
Acceptance Criteria:
- Events logged: login success/failure, logout, token refresh, API key create/revoke, role change, account suspend/activate, JWKS rotation
- Each entry: `{ id, userId, accountId, action, outcome, ipAddress, userAgent, timestamp, metadata }`
- Stored in `auth.audit_log`; no UPDATE or DELETE on audit rows (append-only enforced via PG trigger)
- Audit log queryable by admin via `GET /v1/admin/audit-log` with pagination
US-AUTH-042 · Prometheus metrics for auth-service
Type: Feature | Points: 3
Description:
As Prometheus, I need /metrics from auth-service so that authentication health is monitored.
Acceptance Criteria:
- Metrics: `auth_login_total{outcome}`, `auth_token_refresh_total`, `auth_api_key_lookup_total{result}`, `auth_key_cache_hit_total`, `auth_key_cache_miss_total`, `auth_login_rate_limited_total`
- Histogram: `auth_api_key_lookup_duration_seconds`
- `/metrics` endpoint not behind Kong (cluster-internal only)
US-AUTH-043 · Health and readiness endpoints
Type: Feature | Points: 2
Description:
As Kubernetes, I need health and readiness endpoints for auth-service so that pod lifecycle is managed correctly.
Acceptance Criteria:
- `GET /health/live` → 200 always if process running
- `GET /health/ready` → 200 only if PG, Redis, Vault, Firebase Admin SDK all reachable
- `GET /health/ready` → 503 with dependency map if any are down
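
The readiness rule above reduces to aggregating per-dependency checks. In this sketch `readiness` is a hypothetical function, each check is an injected zero-argument callable, and the `{"dependencies": {...}}` body shape is an assumption since the story only asks for a dependency map.

```python
def readiness(checks: dict):
    """Return (status, body) for /health/ready.

    `checks` maps a dependency name to a callable that returns True when
    the dependency is reachable; an exception counts as unreachable.
    """
    dependencies = {}
    for name, check in checks.items():
        try:
            dependencies[name] = "up" if check() else "down"
        except Exception:
            dependencies[name] = "down"
    all_up = all(state == "up" for state in dependencies.values())
    return (200 if all_up else 503), {"dependencies": dependencies}
```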
US-AUTH-044 · Kubernetes deployment manifest with secret management
Type: DevOps | Points: 5
Description:
As the platform, I need auth-service deployed with secrets injected via Vault Agent sidecar so that no credentials are stored in Kubernetes ConfigMaps or env files.
Acceptance Criteria:
- Vault Agent sidecar annotation on Deployment pod template
- Firebase Admin SDK service account JSON injected at `/vault/secrets/firebase-admin.json`
- Vault token for Vault transit/PKI injected via Kubernetes ServiceAccount
- PostgreSQL credentials injected from `secret/auth/db`
- Redis password injected from `secret/auth/redis`
- HPA: min 2 replicas, scale on CPU > 70% or `auth_api_key_lookup_total` rate
EP-AUTH-06 · Tenant Sub-Org / Reseller Hierarchy + Cross-Tenant Token Revocation Propagation
Context: Enterprise tenants (telcos, agencies, banks) need to model sub-organisations under a single legal entity. The platform must enforce hierarchy in RBAC, scope token issuance to the right org, and propagate revocation across all sub-orgs of a parent.
US-AUTH-050 · Sub-org data model and parent-child constraint
Type: Feature | Points: 5
Description:
As the platform, I need an `auth.tenant_orgs` table representing the legal-entity → sub-org tree so that one customer can model multiple business units without operating multiple Ghasi accounts.
Acceptance Criteria:
- Table `auth.tenant_orgs(id, tenantId, parentId NULLABLE, name, kind ENUM(LEGAL_ENTITY,BUSINESS_UNIT,AGENCY_CLIENT), createdAt)`.
- Constraint: max depth 3 (legal entity → business unit → agency client).
- FK on `auth.users.orgId`; backfill defaults to `parent = tenant.rootOrg`.
- Cycle prevention: trigger rejects parent set to a descendant.
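
The depth and cycle constraints above would be enforced in the database, but the rule itself can be sketched in application code. All names here are hypothetical. `parents` maps each org ID to its parent ID (`None` for the root) and is assumed to be acyclic already.

```python
from typing import Dict, Optional

MAX_DEPTH = 3  # legal entity -> business unit -> agency client
Parents = Dict[str, Optional[str]]


def depth_of(org_id: str, parents: Parents) -> int:
    """Number of levels from the root down to org_id (root = 1)."""
    depth, current = 1, parents.get(org_id)
    while current is not None:
        depth += 1
        current = parents.get(current)
    return depth


def subtree_height(org_id: str, parents: Parents) -> int:
    """Levels in the subtree rooted at org_id (a leaf or new org = 1)."""
    children = [c for c, p in parents.items() if p == org_id]
    return 1 + max((subtree_height(c, parents) for c in children), default=0)


def is_descendant(candidate: str, ancestor: str, parents: Parents) -> bool:
    current = parents.get(candidate)
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def validate_parent(org_id: str, new_parent_id: str, parents: Parents) -> None:
    """Raise ValueError if attaching org_id under new_parent_id breaks a rule."""
    if new_parent_id == org_id or is_descendant(new_parent_id, org_id, parents):
        raise ValueError("parent cannot be the org itself or a descendant")
    if depth_of(new_parent_id, parents) + subtree_height(org_id, parents) > MAX_DEPTH:
        raise ValueError("max depth exceeded")
```

The same check applies both when creating an org and when moving an existing one, because the moved org's whole subtree counts toward the depth limit.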
US-AUTH-051 · Org-scoped JWT claim
Type: Feature | Points: 3
Description:
The platform JWT must carry orgId so downstream services can scope queries by org without an extra DB lookup.
Acceptance Criteria:
- JWT payload includes `tenantId`, `orgId`, `orgPath: ["legalId","buId","clientId"]`.
- Downstream services use `orgPath` for ancestor-based authorisation.
- OpenAPI spec updated; contract test passes.
US-AUTH-052 · Cross-org admin role with explicit scope
Type: Feature | Points: 5
Description: A legal-entity admin must be able to administer descendant orgs but never sibling orgs.
Acceptance Criteria:
- Role `org.admin` is scoped to a specific `orgId` and applies to that org plus all descendants.
- Role assignment endpoint validates the assigner is admin of an ancestor org.
- Negative tests: sibling-org admin cannot read another sibling's data (RLS verified).
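
With an `orgPath` that lists a target org's ancestors followed by the org itself, the ancestor-or-self rule becomes a membership test. `can_administer` is a hypothetical helper shown only to illustrate the rule; the production check is expected to live in RLS policies.

```python
def can_administer(admin_org_id: str, target_org_path: list) -> bool:
    """True when the admin's scoped org is the target org or one of its ancestors.

    `target_org_path` runs from the legal entity down to the target org.
    """
    return admin_org_id in target_org_path
```

For example, an admin scoped to `buA` can administer a client whose path is `["legal", "buA", "client1"]`, but not a sibling unit whose path is `["legal", "buB"]`.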
US-AUTH-053 · Token revocation propagation across descendants
Type: Feature | Points: 5
Description: When an org is suspended, every active session and API key for that org and its descendants must be revoked within 60 s.
Acceptance Criteria:
- `POST /v1/admin/orgs/:orgId/suspend` triggers cascade revocation.
- All matching `auth.refresh_tokens` flagged `revoked_at = now()`.
- All matching `auth.api_keys.status = REVOKED`.
- Redis cache `auth:rbac:{userId}` invalidated.
- `auth.org.suspended.v1` NATS event published with descendant org IDs.
- Integration test: suspend root org → verify child-org user gets 401 within 60 s on next call.
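
The cascade starts by collecting the suspended org and every descendant. A breadth-first walk over the parent map is one way to do that. `org_and_descendants` is a hypothetical helper, and a recursive SQL query would be the likely production equivalent.

```python
from collections import deque


def org_and_descendants(root_id, parents):
    """Return root_id followed by all of its descendants in breadth-first order.

    `parents` maps org ID -> parent ID (None for the root of the tree).
    """
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    result, queue = [], deque([root_id])
    while queue:
        current = queue.popleft()
        result.append(current)
        queue.extend(children.get(current, []))
    return result
```

The resulting ID list drives the refresh-token and API-key revocations and becomes the payload of the `auth.org.suspended.v1` event.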
US-AUTH-054 · Per-org API key quota and rate limits
Type: Feature | Points: 3
Description: Quotas (max API keys, max sender IDs, max RPS) must be inherited from parent org but overridable downward (parent ≥ child).
Acceptance Criteria:
- Quota table `auth.org_quotas` with hierarchical resolution (child quota ≤ parent).
- Quota check on API-key creation; 422 with `code: "QUOTA_EXCEEDED"` on breach.
- Telemetry: `auth_quota_breach_total{resource,orgId}` counter.
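
One way to get "inherited but never larger than the parent" is to take the minimum of all quota values set along the path from the org to the root. The sketch below assumes that resolution rule. `effective_quota`, `check_quota`, and the `overrides` mapping are hypothetical names.

```python
def effective_quota(org_id, resource, parents, overrides):
    """Smallest quota configured on org_id or any ancestor; None if unset.

    `parents` maps org ID -> parent ID (None for the root);
    `overrides` maps (org ID, resource) -> configured limit.
    """
    limit, current = None, org_id
    while current is not None:
        value = overrides.get((current, resource))
        if value is not None:
            limit = value if limit is None else min(limit, value)
        current = parents.get(current)
    return limit


def check_quota(org_id, resource, current_usage, parents, overrides):
    """Return (status, body) for a creation request against the quota."""
    limit = effective_quota(org_id, resource, parents, overrides)
    if limit is not None and current_usage >= limit:
        return 422, {"code": "QUOTA_EXCEEDED"}
    return 200, {}
```

Taking the minimum means a child override larger than its parent's quota is silently capped; rejecting such overrides at write time is an alternative the story leaves open.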
US-AUTH-055 · Org-scoped audit log queries
Type: Feature | Points: 3
Description: Auditors of an ancestor org must be able to query audit events of all descendants but never of siblings.
Acceptance Criteria:
- `GET /v1/audit?orgId=:id` returns rows for `:id` and descendants only.
- RLS policy enforces ancestor-or-self.
- Negative test: non-ancestor cannot read.
US-AUTH-056 · Org transfer and re-parenting workflow
Type: Feature | Points: 5
Description: Platform admins must be able to move an agency-client org from one business-unit parent to another (e.g., when a customer restructures).
Acceptance Criteria:
- `POST /v1/admin/orgs/:id/reparent` accepts `{ newParentId }`; validates depth and cycle constraints.
- Existing tokens for the moved org are re-issued with the new `orgPath` on next refresh.
- `auth.org.reparented.v1` event includes old and new path.
- Audit entry mandatory; reason field required.
EP-AUTH-07 · HSM-Backed JWT Signing (replaces Vault-only key handling)
Context: Per `EP-PLAT-NB-04` (HSM-backed key custody), the platform JWT signing root must move from Vault-managed RSA keys to a PKCS#11 HSM (FIPS 140-2 L3). This epic replaces the key generation/storage stories in `EP-AUTH-01` (`US-AUTH-002`) for production environments while keeping local-dev fallback.
US-AUTH-060 · PKCS#11 client and HSM connection pool
Type: Feature | Points: 5
Description: As auth-service, I need a PKCS#11 client integrated so that JWT signing operations call into the HSM rather than holding private key material in process memory.
Acceptance Criteria:
- PKCS#11 library wrapper (`@ghasi/hsm-client`) loaded; supports Thales nShield, Entrust, and SoftHSM2 (for local-dev).
- Connection pool of 4 sessions per pod; health-checked every 10 s.
- `HSM_PROVIDER` env var selects backend; default `softhsm2` for local-dev, `nshield` for prod.
- Sign latency P99 ≤ 5 ms when HSM is colocated; P99 ≤ 25 ms cross-region.
US-AUTH-061 · Migrate existing JWKS keys into HSM
Type: Migration | Points: 5
Description: Existing Vault-stored RSA keys must be migrated into the HSM partition without invalidating active sessions.
Acceptance Criteria:
- Migration script generates new HSM-resident key pair with new `kid`; adds to JWKS as `ACTIVE` alongside the existing Vault-stored key (now `RETIRING`).
- Both keys served via JWKS during the rotation grace window (30 min default).
- After grace period, Vault key set to `RETIRED` and removed from JWKS.
- Vault `secret/auth/jwks/{kid}/private` keys are wiped and audit-logged.
- Rollback plan: keep Vault key `ACTIVE` until HSM key has produced ≥ 10 000 successful verifications.
US-AUTH-062 · HSM-aware key rotation cron
Type: Feature | Points: 5
Description: Key rotation must continue to work end-to-end with HSM as the issuer.
Acceptance Criteria:
- `POST /v1/internal/auth/rotate-jwks` generates a new HSM-resident key pair (PKCS#11 `C_GenerateKeyPair`).
- New key added as `ACTIVE`; previous key marked `RETIRING` for 30 min.
- Cron `0 0 1 * *` (monthly) auto-rotates with PagerDuty notification.
- Metrics `auth_hsm_rotation_total` and `auth_hsm_rotation_duration_seconds`.
- Alert: rotation failure → `AuthHsmRotationFailed` (Critical).
US-AUTH-063 · HSM unavailability fail-fast and circuit breaker
Type: Feature | Points: 3
Description: HSM unavailability must fail fast (no silent fallback to in-process keys) so that a hardware fault is loud and visible.
Acceptance Criteria:
- HSM error during sign → 503 with `code: "HSM_UNAVAILABLE"`; no in-memory key fallback ever.
- Circuit breaker opens after 3 consecutive HSM errors; half-open after 30 s.
- When circuit open, `/health/ready` returns 503 (pod removed from Service endpoints).
- Alert `AuthHsmUnavailable` (Critical) with runbook link.
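
The breaker behaviour above can be sketched as a small state machine. `HsmCircuitBreaker` and `HsmUnavailable` are hypothetical names and `sign` is any callable that performs the HSM operation. Mapping every exception to `HsmUnavailable` is a simplification for this sketch.

```python
import time


class HsmUnavailable(Exception):
    """Maps to a 503 response with code HSM_UNAVAILABLE."""


class HsmCircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_after:
            return "half-open"
        return "open"

    def call(self, sign):
        """Run sign() through the breaker; never falls back to another key."""
        if self.state == "open":
            raise HsmUnavailable("circuit open")
        try:
            result = sign()
        except Exception as exc:
            self._failures += 1
            # Open on the threshold, or re-open when a half-open probe fails.
            if self._failures >= self._threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise HsmUnavailable("HSM error") from exc
        self._failures = 0
        self._opened_at = None
        return result
```

A readiness probe can report 503 whenever `state` is `"open"`, which removes the pod from Service endpoints until a half-open probe succeeds.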
EP-AUTH-08 · Break-Glass Admin Access + WebAuthn for Platform Staff
Context: Per [13-security-compliance-tenancy.md §2.2], break-glass platform-admin accounts bypass tenant IdPs and must use hardware WebAuthn. This epic implements that capability end-to-end with audit and dual-control.
US-AUTH-070 · Break-glass account model and WebAuthn enrolment
Type: Feature | Points: 5
Description: Platform staff with break-glass authority must enrol at least one FIDO2 hardware authenticator (YubiKey or equivalent).
Acceptance Criteria:
- `auth.break_glass_users` table (userId, role, status ENUM(ACTIVE,SUSPENDED,RETIRED)).
- WebAuthn enrolment ceremony at `/v1/admin/break-glass/enroll`; minimum 2 authenticators per user.
- Authenticators stored as `auth.webauthn_credentials(credentialId, publicKey, signCount, createdAt)`.
- Enrolment requires existing platform admin approval (dual-control).
US-AUTH-071 · Break-glass login flow
Type: Feature | Points: 5
Description: Break-glass login must bypass all tenant IdPs and require WebAuthn assertion + a dual-approver acknowledgement.
Acceptance Criteria:
- `POST /v1/admin/break-glass/login/init` → returns WebAuthn challenge.
- `POST /v1/admin/break-glass/login/finish` → verifies assertion, then notifies a dual-approver (PagerDuty incident + Slack).
- Dual-approver clicks `/v1/admin/break-glass/approve/:requestId`; access is granted only after approval (< 5 min window) AND a justification is supplied.
- Issued JWT TTL is 15 min, non-refreshable, scope `platform.break_glass`.
- Audit event `auth.break_glass.granted.v1` includes initiator, approver, justification, source IP.
US-AUTH-072 · Break-glass session monitoring
Type: Feature | Points: 3
Description: Every API call made under a break-glass token must be logged in real time to a SIEM-forwarded stream.
Acceptance Criteria:
- All break-glass requests carry `X-BreakGlass: true` and are mirrored to NATS subject `auth.break_glass.activity.v1`.
- `regulator-portal-service` SIEM forwarder (per `EP-REG-02`) ingests this stream.
- Real-time NOC banner shown when any break-glass session is active.
US-AUTH-073 · Break-glass quarterly access review
Type: Feature | Points: 3
Description: Break-glass authority must be reviewed quarterly; users not reviewed are auto-suspended.
Acceptance Criteria:
- Cron `0 0 1 */3 *` lists `auth.break_glass_users` and emits review tickets.
- Users not re-attested within 30 d → status `SUSPENDED`.
- Re-attestation requires CISO + CTO sign-off, recorded in audit.