iam-service — Risk Register
Tracks the strategic risks for iam-service: the things that could materially harm Melmastoon if they happen, distinct from operational failure modes (covered in FAILURE_MODES). Reviewed quarterly with security + leadership.
1. Risk Scoring
| Likelihood | Description |
|---|---|
| Rare | < 5 % per year |
| Unlikely | 5–25 % per year |
| Possible | 25–50 % per year |
| Likely | 50–75 % per year |
| Almost certain | > 75 % per year |

| Impact | Description |
|---|---|
| Low | Minor degradation, no customer comms |
| Moderate | Single-tenant or short outage; customer comms required |
| High | Multi-tenant outage > 1 h, security event with limited blast radius |
| Critical | Platform-wide outage, breach, or compliance incident |
| Catastrophic | Existential — fundamental loss of trust, regulator action |
Inherent risk = Likelihood × Impact before mitigation. Residual risk = Likelihood × Impact after mitigation. Acceptance is one of: Acceptable, Watch, Reduce, Transfer, Avoid.
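
To make the scoring scheme concrete, here is a minimal sketch of how the two scales could be encoded. The ordinal values 1 to 5 and the product score are assumptions for illustration; the register itself only defines ordered labels. Shared band boundaries (5 %, 25 %, 50 %) are assigned to the higher band, which is also an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Likelihood(IntEnum):
    RARE = 1            # < 5 % per year
    UNLIKELY = 2        # 5-25 % per year
    POSSIBLE = 3        # 25-50 % per year
    LIKELY = 4          # 50-75 % per year
    ALMOST_CERTAIN = 5  # > 75 % per year


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4
    CATASTROPHIC = 5


@dataclass(frozen=True)
class Rating:
    likelihood: Likelihood
    impact: Impact

    @property
    def score(self) -> int:
        # Assumed convention: product of the ordinals, used only for ordering.
        return int(self.likelihood) * int(self.impact)


def likelihood_from_annual_probability(p: float) -> Likelihood:
    """Map an annual probability (0.0-1.0) onto the likelihood bands."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p < 0.05:
        return Likelihood.RARE
    if p < 0.25:
        return Likelihood.UNLIKELY
    if p < 0.50:
        return Likelihood.POSSIBLE
    if p <= 0.75:
        return Likelihood.LIKELY
    return Likelihood.ALMOST_CERTAIN


# R-IAM-01 as recorded in the register: Unlikely x Critical, reduced to Rare x High.
inherent = Rating(Likelihood.UNLIKELY, Impact.CRITICAL)
residual = Rating(Likelihood.RARE, Impact.HIGH)
```

Comparing `residual.score` with `inherent.score` gives a quick check that a mitigation actually lowers the rating.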
2. Risk Register
R-IAM-01 — KMS regional outage
| Field | Value |
|---|---|
| Description | Cloud KMS in region becomes unavailable; iam cannot sign new JWTs. |
| Inherent | Unlikely × Critical |
| Mitigation | Cross-region KMS replica (M2); circuit breaker; runbook; existing access tokens valid for ≤ 15 min; status page + comms playbook |
| Residual | Rare × High |
| Owner | SRE |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/kms-outage.md |
R-IAM-02 — Refresh-token theft from compromised device
| Field | Value |
|---|---|
| Description | Malware / XSS exfiltrates refresh token from a user device. |
| Inherent | Likely × High |
| Mitigation | Rotating refresh + reuse-detection family revoke; short access TTL (15 min); device binding for desktop; adaptive MFA on suspicious refresh; CSP + HTTPS-only cookies (browser); user "active sessions" UI to self-revoke |
| Residual | Possible × Moderate |
| Owner | Security + iam team |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/token-theft.md |
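
The "rotating refresh + reuse-detection family revoke" mitigation can be sketched as follows. This is an in-memory illustration under assumed names (`RefreshTokenStore`, `TokenReuseDetected`), not the service's actual storage model: each refresh is single-use, and presenting an already-used token revokes every token descended from the same login.

```python
import secrets
from dataclasses import dataclass, field


class TokenReuseDetected(Exception):
    """Raised when an already-rotated refresh token is presented again."""


@dataclass
class RefreshTokenStore:
    # token -> family id, for tokens that are currently valid
    active: dict[str, str] = field(default_factory=dict)
    # token -> family id, for tokens that were already exchanged
    used: dict[str, str] = field(default_factory=dict)
    revoked_families: set[str] = field(default_factory=set)

    def issue(self, family_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self.active[token] = family_id
        return token

    def rotate(self, presented: str) -> str:
        """Exchange a refresh token for a new one in the same family."""
        if presented in self.used:
            # A used token coming back means either the client or an
            # attacker holds a stale copy: revoke the whole family.
            family = self.used[presented]
            self.revoke_family(family)
            raise TokenReuseDetected(family)
        family = self.active.pop(presented, None)
        if family is None or family in self.revoked_families:
            raise PermissionError("unknown or revoked refresh token")
        self.used[presented] = family
        return self.issue(family)

    def revoke_family(self, family_id: str) -> None:
        self.revoked_families.add(family_id)
        for token in [t for t, f in self.active.items() if f == family_id]:
            del self.active[token]
```

After a reuse is detected, the legitimate holder's newest token stops working as well, which forces a fresh login and is the point at which adaptive MFA can step in.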
R-IAM-03 — SSO IdP single-point-of-failure for chain customers
| Field | Value |
|---|---|
| Description | Large chain customer mandates SSO; their IdP outage locks out all staff. |
| Inherent | Possible × High |
| Mitigation | Tenant policy can permit emergency password / magic-link fallback; clear SLA contract terms; document risk in onboarding; "break-glass" platform-admin path |
| Residual | Possible × Moderate |
| Owner | iam team + Customer Success |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/sso-outage.md |
R-IAM-04 — Breach-list provider lock-in (HIBP)
| Field | Value |
|---|---|
| Description | HIBP API price/availability change; we can't easily swap. |
| Inherent | Unlikely × Moderate |
| Mitigation | Provider abstracted behind BreachList port; alternate providers prototyped; fail-open mode preserves availability |
| Residual | Rare × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Acceptable |
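
A rough sketch of the port abstraction and fail-open behaviour described in the mitigation. Only the port name `BreachList` comes from the register; the method signature, the wrapper, and the two stand-in adapters are hypothetical.

```python
from typing import Callable, Optional, Protocol


class BreachList(Protocol):
    """Port named in the register; this method signature is an assumption."""

    def is_breached(self, password_sha1_hex: str) -> bool: ...


class FailOpenBreachList:
    """Wraps any provider and treats provider errors as 'not breached'."""

    def __init__(
        self,
        provider: BreachList,
        on_error: Optional[Callable[[Exception], None]] = None,
    ) -> None:
        self._provider = provider
        self._on_error = on_error

    def is_breached(self, password_sha1_hex: str) -> bool:
        try:
            return self._provider.is_breached(password_sha1_hex)
        except Exception as exc:
            # Fail open: availability wins over the breach check.
            if self._on_error is not None:
                self._on_error(exc)
            return False


class StaticBreachList:
    """Stand-in adapter backed by a fixed set of hashes."""

    def __init__(self, breached_hashes: set[str]) -> None:
        self._breached = set(breached_hashes)

    def is_breached(self, password_sha1_hex: str) -> bool:
        return password_sha1_hex in self._breached


class UnavailableBreachList:
    """Stand-in adapter that simulates a provider outage."""

    def is_breached(self, password_sha1_hex: str) -> bool:
        raise ConnectionError("provider unavailable")
```

Swapping providers then means writing another adapter with the same method; the `on_error` hook is where a metric or alert would be emitted so that fail-open periods stay visible.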
R-IAM-05 — Regulatory MFA mandates per jurisdiction
| Field | Value |
|---|---|
| Description | New regulation (e.g. EU NIS2, KSA NCA) requires hardware MFA for certain roles. |
| Inherent | Possible × Moderate |
| Mitigation | Tenant policy framework already supports per-role MFA mandate; WebAuthn (FIDO2) supported; quarterly compliance scan |
| Residual | Possible × Low |
| Owner | Compliance + iam team |
| Review | Quarterly |
| Acceptance | Watch |
R-IAM-06 — JWT signing key compromise
| Field | Value |
|---|---|
| Description | A signing key is extracted (this should be impossible while the KMS HSM guarantees hold, but a residual risk is assumed). |
| Inherent | Rare × Catastrophic |
| Mitigation | KMS HSM (non-extractable); strict IAM; rotation cadence; emergency rotation runbook; mandatory kid rotation on suspicion; mass session revoke procedure |
| Residual | Rare × High |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Reduce (continuously) |
| Linked runbook | runbooks/iam/jwt-emergency-rotation.md |
R-IAM-07 — Tenant CA compromise
| Field | Value |
|---|---|
| Description | Tenant intermediate CA private key misused → unauthorized device certs. |
| Inherent | Rare × Critical |
| Mitigation | KMS-held; least-privilege issuer service identity; audited signing; per-tenant blast radius (one tenant only); rapid CA rotation runbook |
| Residual | Rare × Moderate |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Watch |
R-IAM-08 — Adaptive MFA AI bias
| Field | Value |
|---|---|
| Description | AI-suggested MFA escalations (or locks) systematically affect a region/persona unfairly. |
| Inherent | Possible × Moderate |
| Mitigation | AI can only raise the bar (never lower); HITL on locks; quarterly fairness review per AI_INTEGRATION §11; appeal path for users; provenance fully logged |
| Residual | Unlikely × Low |
| Owner | iam team + AI platform |
| Review | Quarterly |
| Acceptance | Watch |
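
The "AI can only raise the bar" rule and the human-in-the-loop requirement on locks can be expressed as a small combining function. The step names and the choice to fall back to an MFA challenge while a lock awaits human approval are assumptions made for this sketch.

```python
from enum import IntEnum
from typing import Optional


class AuthStep(IntEnum):
    NONE = 0  # no extra challenge
    MFA = 1   # require an MFA challenge
    LOCK = 2  # lock the account


def effective_step(
    rules_step: AuthStep,
    ai_step: Optional[AuthStep],
    lock_approved_by_human: bool = False,
) -> AuthStep:
    """Combine the rules-based decision with an optional AI suggestion."""
    if ai_step is None:
        return rules_step
    # The AI suggestion can only escalate, never relax, the rules decision.
    step = max(rules_step, ai_step)
    ai_initiated_lock = step == AuthStep.LOCK and rules_step != AuthStep.LOCK
    if ai_initiated_lock and not lock_approved_by_human:
        # Assumed interim behaviour: challenge with MFA until a human decides.
        return max(rules_step, AuthStep.MFA)
    return step
```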
R-IAM-09 — Lockout DoS at scale
| Field | Value |
|---|---|
| Description | Coordinated attack triggers mass lockouts on real user accounts. |
| Inherent | Possible × Moderate |
| Mitigation | IP-scoped lockouts when IP reputation unknown; magic-link self-recovery; admin-lift; tenant policy auto-unlock after 15 min |
| Residual | Possible × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Watch |
R-IAM-10 — Device-cert mass expiry
| Field | Value |
|---|---|
| Description | Many offline desktops lose ability to refresh simultaneously (e.g. CA rotation without overlap). |
| Inherent | Possible × Moderate |
| Mitigation | CA rotation always overlapped; T-24h client-side renewal; mass-renew batch tool; in-app warning at T-72h |
| Residual | Unlikely × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Acceptable |
| Linked runbook | runbooks/iam/device-cert-expiry.md |
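
The T-72h warning and T-24h renewal windows from the mitigation can be illustrated with a small decision function. The function name and the returned labels are hypothetical; only the two windows come from the register.

```python
from datetime import datetime, timedelta

WARN_WINDOW = timedelta(hours=72)   # in-app warning at T-72h
RENEW_WINDOW = timedelta(hours=24)  # client-side renewal at T-24h


def cert_action(not_after: datetime, now: datetime) -> str:
    """Return what a desktop client should do for a device certificate."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= RENEW_WINDOW:
        return "renew"
    if remaining <= WARN_WINDOW:
        return "warn"
    return "ok"
```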
R-IAM-11 — GDPR erasure incomplete
| Field | Value |
|---|---|
| Description | iam misses identity rows during erasure; regulator finds residual data. |
| Inherent | Unlikely × High |
| Mitigation | Saga participation tested end-to-end; reconciliation job compares emitted-erasure vs persisted-state; idempotent retries; audit log preserved as legal-hold (Art 17(3)(b)) |
| Residual | Rare × Moderate |
| Owner | Compliance + iam team |
| Review | Quarterly |
| Acceptance | Watch |
R-IAM-12 — Vendor lock-in (GCP-wide)
| Field | Value |
|---|---|
| Description | KMS, Cloud SQL, Pub/Sub, Cloud Run all GCP-specific; abstracted poorly → migration cost prohibitive. |
| Inherent | Possible × Moderate |
| Mitigation | Ports-and-adapters keeps domain pure; database-engine choices (Postgres / Redis) are open standards; Pub/Sub abstracted via EventPublisher port; KMS via TokenSigner port; multi-cloud not in roadmap but possible |
| Residual | Possible × Low |
| Owner | Architecture |
| Review | Annually |
| Acceptance | Acceptable |
R-IAM-13 — Data residency violation
| Field | Value |
|---|---|
| Description | iam writes data to wrong region for a residency-flagged tenant. |
| Inherent | Unlikely × High |
| Mitigation | Tenant residency in tenant.created.v1; iam region-routes at write; runtime guard: any write outside tenant's region throws; quarterly audit |
| Residual | Rare × Moderate |
| Owner | iam team + Compliance |
| Review | Quarterly |
| Acceptance | Watch |
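
A minimal sketch of the runtime guard, assuming the residency region arrives in the tenant.created.v1 payload. The payload field names (`tenantId`, `residencyRegion`), the class names, and the region identifiers used in the usage note are hypothetical.

```python
class ResidencyViolation(Exception):
    """Raised when a write would land outside the tenant's region."""


class ResidencyGuard:
    def __init__(self, local_region: str) -> None:
        self.local_region = local_region
        self._tenant_regions: dict[str, str] = {}

    def on_tenant_created(self, event: dict) -> None:
        # Field names are assumed; the register only says the event
        # carries the tenant's residency.
        self._tenant_regions[event["tenantId"]] = event["residencyRegion"]

    def assert_write_allowed(self, tenant_id: str) -> None:
        region = self._tenant_regions.get(tenant_id)
        if region is None:
            # Unknown residency is treated as a violation rather than guessed.
            raise ResidencyViolation(f"no residency recorded for {tenant_id}")
        if region != self.local_region:
            raise ResidencyViolation(
                f"{tenant_id} is pinned to {region}, not {self.local_region}"
            )
```

A repository adapter would call `assert_write_allowed(tenant_id)` before every insert or update, so a routing bug surfaces as an exception instead of a silent cross-region write.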
R-IAM-14 — Operational toil from SSO onboarding per chain customer
| Field | Value |
|---|---|
| Description | Each enterprise SSO setup takes too much manual config; team can't scale. |
| Inherent | Likely × Moderate |
| Mitigation | Self-serve OIDC + SAML configuration via tenant admin UI; metadata URL auto-refresh; templated onboarding doc; Customer Success training |
| Residual | Possible × Low |
| Owner | iam team + Customer Success |
| Review | Annually |
| Acceptance | Acceptable |
R-IAM-15 — Talent / on-call concentration
| Field | Value |
|---|---|
| Description | Few engineers know iam in depth → on-call burnout, knowledge silo. |
| Inherent | Possible × Moderate |
| Mitigation | This bundle (17 docs); recorded incident reviews; runbook completeness gate; quarterly fire drills; rotation across services |
| Residual | Unlikely × Low |
| Owner | EM |
| Review | Annually |
| Acceptance | Acceptable |
R-IAM-16 — Argon2id parameter obsolescence
| Field | Value |
|---|---|
| Description | Hardware advances make current params insufficient; existing hashes weakened. |
| Inherent | Possible × Moderate |
| Mitigation | hash_version field; rehash-on-login; annual review against OWASP guidance; offline rehash-job for inactive users on parameter bump |
| Residual | Unlikely × Low |
| Owner | Security + iam team |
| Review | Annually |
| Acceptance | Acceptable |
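
Rehash-on-login driven by the hash_version field might look roughly like this. The hashing and verification functions are injected so the sketch stays independent of any particular Argon2id library; `CURRENT_HASH_VERSION`, `Credential`, and `verify_and_upgrade` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

CURRENT_HASH_VERSION = 2  # hypothetical: bumped whenever parameters change


@dataclass
class Credential:
    password_hash: str
    hash_version: int


def verify_and_upgrade(
    cred: Credential,
    password: str,
    verify: Callable[[str, str, int], bool],
    hash_with_current_params: Callable[[str], str],
) -> bool:
    """Verify a login and upgrade the stored hash if it uses old parameters."""
    if not verify(cred.password_hash, password, cred.hash_version):
        return False
    if cred.hash_version < CURRENT_HASH_VERSION:
        # The plaintext is only available at login, so this is the moment
        # to re-hash with the current parameters.
        cred.password_hash = hash_with_current_params(password)
        cred.hash_version = CURRENT_HASH_VERSION
    return True
```

Users who never log in are not covered by this path, which is why the register pairs it with an offline job for inactive accounts.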
R-IAM-17 — Pub/Sub event loss → downstream divergence
| Field | Value |
|---|---|
| Description | An iam event is lost; audit-service / gdpr-service / tenant-service drift from iam reality. |
| Inherent | Rare × High |
| Mitigation | Transactional outbox; Pub/Sub at-least-once; retention 7 d; DLQ; daily reconciliation job between audit_events and outbox row count |
| Residual | Rare × Low |
| Owner | iam team |
| Review | Quarterly |
| Acceptance | Watch |
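
A count mismatch tells the reconciliation job that something diverged; an ID-level diff tells it which events to replay. The sketch below uses an in-memory SQLite database purely for illustration. Having both tables reachable from one connection, the `outbox` table name, and the `event_id` column are all assumptions; only `audit_events` is named in the register.

```python
import sqlite3


def events_missing_downstream(conn: sqlite3.Connection) -> list[str]:
    """Return outbox event ids that have no matching audit_events row."""
    rows = conn.execute(
        """
        SELECT o.event_id
        FROM outbox AS o
        LEFT JOIN audit_events AS a ON a.event_id = o.event_id
        WHERE a.event_id IS NULL
        ORDER BY o.event_id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Illustrative data: three events written to the outbox, two seen downstream.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY);
    CREATE TABLE audit_events (event_id TEXT PRIMARY KEY);
    INSERT INTO outbox VALUES ('evt-1'), ('evt-2'), ('evt-3');
    INSERT INTO audit_events VALUES ('evt-1'), ('evt-3');
    """
)
missing = events_missing_downstream(conn)
```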
R-IAM-18 — Backward-compat break in JWT claims
| Field | Value |
|---|---|
| Description | We change a claim shape; consumers break. |
| Inherent | Possible × High |
| Mitigation | Claim contract documented in API_CONTRACTS; only additive changes inside v1; breaking changes require vN+1 rollout per MIGRATION_PLAN §4; consumer Pact tests block incompatible changes |
| Residual | Unlikely × Moderate |
| Owner | iam team + every consumer team |
| Review | Per change |
| Acceptance | Watch |
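
The "only additive changes inside v1" rule can be checked mechanically. The sketch below compares two claim schemas represented as plain name-to-type mappings; that representation and the claim names in the usage note are hypothetical, not the contract from API_CONTRACTS.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List changes that are not purely additive between two claim schemas."""
    problems = []
    for name, claim_type in old.items():
        if name not in new:
            problems.append(f"removed claim: {name}")
        elif new[name] != claim_type:
            problems.append(
                f"type changed: {name} ({claim_type} -> {new[name]})"
            )
    # Claims that exist only in `new` are additive and therefore allowed.
    return problems
```

For example, adding a `locale` claim to `{"sub": "string", "roles": "array"}` yields an empty list, while changing `roles` to `"string"` yields one entry and would call for a new version.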
R-IAM-19 — Insider threat
| Field | Value |
|---|---|
| Description | Engineer with elevated access misuses iam admin endpoints. |
| Inherent | Rare × Critical |
| Mitigation | Least-privilege IAM; admin actions emit audit events with actor; quarterly access review; production access requires JIT approval; KMS HSM means no individual can extract keys |
| Residual | Rare × High |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Watch |
R-IAM-20 — Cost runaway from AI risk classification
| Field | Value |
|---|---|
| Description | Login surge multiplies AI orchestrator calls → unexpected bill. |
| Inherent | Possible × Moderate |
| Mitigation | 60-s cache by (userId, ipMasked); budget controls; per-tenant cost dashboard; circuit breaker on AI cost per minute; rules-only fallback always available |
| Residual | Unlikely × Low |
| Owner | iam team + Finance ops |
| Review | Quarterly |
| Acceptance | Acceptable |
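
The 60-second cache keyed by (userId, ipMasked) could be sketched like this. The class name and the injected clock are assumptions; the `compute` callback stands in for the AI orchestrator call.

```python
import time
from typing import Callable


class RiskCache:
    """TTL cache keyed by (user_id, ip_masked)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[float, str]] = {}

    def get_or_compute(
        self, user_id: str, ip_masked: str, compute: Callable[[], str]
    ) -> str:
        key = (user_id, ip_masked)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Cache miss or expired entry: pay for one classification call.
        value = compute()
        self._entries[key] = (now, value)
        return value
```

During a login surge from the same user and masked IP, at most one classification per TTL window reaches the orchestrator; this sketch never evicts entries, so a production version would also need a size bound.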
3. Risk Heatmap (residual)
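Cell placement follows the Residual field of each entry in section 2.

| Impact ↓ / Likelihood → | Rare | Unlikely | Possible | Likely | Almost certain |
|---|---|---|---|---|---|
| Critical | | | | | |
| High | R-01, R-06, R-19 | | | | |
| Moderate | R-07, R-11, R-13 | R-18 | R-02, R-03 | | |
| Low | R-04, R-17 | R-08, R-10, R-15, R-16, R-20 | R-05, R-09, R-12, R-14 | | |
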
Any risk whose residual rating reaches High impact or above at Possible likelihood or above requires explicit acceptance by Security + EM at the quarterly review.
4. Top-3 Watch List (this quarter)
- R-IAM-02 (refresh token theft) — push WebAuthn adoption for staff in Q-current.
- R-IAM-03 (SSO IdP SPOF for chains) — formalize break-glass path; document tenant SLAs.
- R-IAM-18 (JWT claim breaking change) — automate consumer Pact verification; add Sunset header process.
5. Risk Treatment Workflow
- New risk identified → opened in risks/iam/<id>.md with template.
- Triage in next iam team standup; severity assigned.
- If High+ → 7-day mitigation plan due; tracked in Linear.
- Monthly review: residual rating updated.
- Quarterly review: register published; leadership signs off acceptance posture.
- Closed risks remain in register marked CLOSED with date.
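
The "High+ → 7-day mitigation plan" step can be sketched as a small helper. It assumes that severity reuses the Impact labels from section 1 and that the seven days run from the triage date; neither is stated in the workflow.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed: severity uses the Impact scale, ordered from lowest to highest.
SEVERITY_ORDER = ["Low", "Moderate", "High", "Critical", "Catastrophic"]


def mitigation_plan_due(severity: str, triaged_on: date) -> Optional[date]:
    """Return the plan due date for High+ risks, or None for lower severities."""
    if severity not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity: {severity}")
    if SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index("High"):
        return triaged_on + timedelta(days=7)
    return None
```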
6. Cross-References
- Operational failures → FAILURE_MODES
- Crypto + threat model → SECURITY_MODEL
- Readiness gates → SERVICE_READINESS
- Migration / change risk → MIGRATION_PLAN
- Platform risk register → ../../docs/strategic/RISK_REGISTER.md (if present)