
iam-service — Risk Register

Catalog · SECURITY_MODEL · FAILURE_MODES · SERVICE_READINESS

Tracks the strategic risks for iam-service: the things that could materially harm Melmastoon if they happen, distinct from operational failure modes (covered in FAILURE_MODES). Reviewed quarterly with security + leadership.

1. Risk Scoring

| Likelihood | Description |
| --- | --- |
| Rare | < 5 % per year |
| Unlikely | 5–25 % per year |
| Possible | 25–50 % per year |
| Likely | 50–75 % per year |
| Almost certain | > 75 % per year |

| Impact | Description |
| --- | --- |
| Low | Minor degradation, no customer comms |
| Moderate | Single-tenant or short outage; customer comms required |
| High | Multi-tenant outage > 1 h, security event with limited blast radius |
| Critical | Platform-wide outage, breach, or compliance incident |
| Catastrophic | Existential: fundamental loss of trust, regulator action |

Inherent risk = Likelihood × Impact before mitigation. Residual risk = Likelihood × Impact after mitigation. Acceptance is one of: Acceptable, Watch, Reduce, Transfer, Avoid.
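For sorting or tooling, the two scales can be treated as ordinals. The sketch below assumes a 1–5 numbering for each scale; the register itself does not define numeric scores, so `Likelihood`, `Impact`, and `score` are illustrative names only.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4
    CATASTROPHIC = 5


def score(likelihood: Likelihood, impact: Impact) -> int:
    """Ordinal product; useful for ranking only, not as a probability."""
    return int(likelihood) * int(impact)


# R-IAM-02 as recorded in the register: Likely x High before mitigation,
# Possible x Moderate after.
inherent = score(Likelihood.LIKELY, Impact.HIGH)
residual = score(Likelihood.POSSIBLE, Impact.MODERATE)
```

Under this assumed numbering the product is only a ranking aid; two risks with the same product can still sit in very different cells.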

2. Risk Register

R-IAM-01 — KMS regional outage

| Field | Value |
| --- | --- |
| Description | Cloud KMS in region becomes unavailable; iam cannot sign new JWTs. |
| Inherent | Unlikely × Critical |
| Mitigation | Cross-region KMS replica (M2); circuit breaker; runbook; existing access tokens valid for ≤ 15 min; status page + comms playbook |
| Residual | Rare × High |
| Owner | SRE |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/kms-outage.md |

R-IAM-02 — Refresh-token theft from compromised device

| Field | Value |
| --- | --- |
| Description | Malware / XSS exfiltrates refresh token from a user device. |
| Inherent | Likely × High |
| Mitigation | Rotating refresh + reuse-detection family revoke; short access TTL (15 min); device binding for desktop; adaptive MFA on suspicious refresh; CSP + HTTPS-only cookies (browser); user "active sessions" UI to self-revoke |
| Residual | Possible × Moderate |
| Owner | Security + iam team |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/token-theft.md |
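The "rotating refresh + reuse-detection family revoke" mitigation for R-IAM-02 can be sketched as follows. This is a minimal in-memory illustration, not the service's implementation: `RefreshStore`, `TokenFamily`, and the dictionary storage are assumptions, and a real store would persist hashed tokens.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class TokenFamily:
    user_id: str
    used: set = field(default_factory=set)   # tokens already rotated away
    revoked: bool = False


class RefreshStore:
    def __init__(self):
        self._by_token = {}                   # token -> TokenFamily

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._by_token[token] = TokenFamily(user_id=user_id)
        return token

    def rotate(self, presented: str) -> str:
        family = self._by_token.get(presented)
        if family is None or family.revoked:
            raise PermissionError("unknown or revoked refresh token")
        if presented in family.used:
            # An already-rotated token came back: treat as theft and
            # revoke every token descended from the same login.
            family.revoked = True
            raise PermissionError("refresh token reuse detected; family revoked")
        family.used.add(presented)
        new_token = secrets.token_urlsafe(32)
        self._by_token[new_token] = family
        return new_token
```

Replaying a rotated token revokes the whole family, so the legitimate holder of the newest token is also forced to re-authenticate; that is the intended trade-off of reuse detection.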

R-IAM-03 — SSO IdP single-point-of-failure for chain customers

| Field | Value |
| --- | --- |
| Description | Large chain customer mandates SSO; their IdP outage locks out all staff. |
| Inherent | Possible × High |
| Mitigation | Tenant policy can permit emergency password / magic-link fallback; clear SLA contract terms; document risk in onboarding; "break-glass" platform-admin path |
| Residual | Possible × Moderate |
| Owner | iam team + Customer Success |
| Review | Quarterly |
| Acceptance | Watch |
| Linked runbook | runbooks/iam/sso-outage.md |

R-IAM-04 — Breach-list provider lock-in (HIBP)

| Field | Value |
| --- | --- |
| Description | HIBP API price/availability change; we can't easily swap. |
| Inherent | Unlikely × Moderate |
| Mitigation | Provider abstracted behind BreachList port; alternate providers prototyped; fail-open mode preserves availability |
| Residual | Rare × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Acceptable |
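A sketch of what a `BreachList` port with a fail-open wrapper might look like. The port name comes from the register; the method signature, `FailOpenBreachList`, and the two toy providers are assumptions made for illustration.

```python
from typing import Protocol


class BreachList(Protocol):
    def is_breached(self, password_sha1_hex: str) -> bool: ...


class FailOpenBreachList:
    """Wraps any provider; a provider error is reported as 'not breached'."""

    def __init__(self, provider: BreachList):
        self._provider = provider
        self.errors = 0   # a real adapter would export this as a metric

    def is_breached(self, password_sha1_hex: str) -> bool:
        try:
            return self._provider.is_breached(password_sha1_hex)
        except Exception:
            self.errors += 1
            return False   # fail open: availability over strictness


class StaticBreachList:
    """Toy provider backed by a fixed set of hashes."""

    def __init__(self, known):
        self._known = set(known)

    def is_breached(self, password_sha1_hex: str) -> bool:
        return password_sha1_hex in self._known


class DownProvider:
    """Toy provider that always times out."""

    def is_breached(self, password_sha1_hex: str) -> bool:
        raise TimeoutError("provider unavailable")
```

Because callers depend only on the port, swapping HIBP for another provider means writing a new adapter rather than touching domain code. Fail-open trades a missed breach check for continued sign-ups during a provider outage.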

R-IAM-05 — Regulatory MFA mandates per jurisdiction

| Field | Value |
| --- | --- |
| Description | New regulation (e.g. EU NIS2, KSA NCA) requires hardware MFA for certain roles. |
| Inherent | Possible × Moderate |
| Mitigation | Tenant policy framework already supports per-role MFA mandate; WebAuthn (FIDO2) supported; quarterly compliance scan |
| Residual | Possible × Low |
| Owner | Compliance + iam team |
| Review | Quarterly |
| Acceptance | Watch |

R-IAM-06 — JWT signing key compromise

| Field | Value |
| --- | --- |
| Description | A signing key is extracted. This should be impossible while the KMS HSM holds, but a residual risk is assumed. |
| Inherent | Rare × Catastrophic |
| Mitigation | KMS HSM (non-extractable); strict IAM; rotation cadence; emergency rotation runbook; mandatory kid rotation on suspicion; mass session revoke procedure |
| Residual | Rare × High |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Reduce (continuously) |
| Linked runbook | runbooks/iam/jwt-emergency-rotation.md |

R-IAM-07 — Tenant CA compromise

| Field | Value |
| --- | --- |
| Description | Tenant intermediate CA private key misused → unauthorized device certs. |
| Inherent | Rare × Critical |
| Mitigation | KMS-held; least-privilege issuer service identity; audited signing; per-tenant blast radius (one tenant only); rapid CA rotation runbook |
| Residual | Rare × Moderate |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Watch |

R-IAM-08 — Adaptive MFA AI bias

| Field | Value |
| --- | --- |
| Description | AI-suggested MFA escalations (or locks) systematically affect a region/persona unfairly. |
| Inherent | Possible × Moderate |
| Mitigation | AI can only raise the bar (never lower); HITL on locks; quarterly fairness review per AI_INTEGRATION §11; appeal path for users; provenance fully logged |
| Residual | Unlikely × Low |
| Owner | iam team + AI platform |
| Review | Quarterly |
| Acceptance | Watch |

R-IAM-09 — Lockout DoS at scale

| Field | Value |
| --- | --- |
| Description | Coordinated attack triggers mass lockouts on real user accounts. |
| Inherent | Possible × Moderate |
| Mitigation | IP-scoped lockouts when IP reputation unknown; magic-link self-recovery; admin-lift; tenant policy auto-unlock after 15 min |
| Residual | Possible × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Watch |
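The IP-scoped lockout with 15-minute auto-unlock from R-IAM-09 might look roughly like this. The failure threshold, the class name, and the in-memory maps are assumptions; only the scoping idea and the 15-minute window come from the register.

```python
from collections import defaultdict

LOCK_SECONDS = 15 * 60   # auto-unlock window stated in the register
MAX_FAILURES = 5         # assumed threshold, not stated in the register


class IpScopedLockout:
    def __init__(self):
        self._failures = defaultdict(int)   # (account, ip) -> failure count
        self._locked_until = {}             # (account, ip) -> unix time

    def record_failure(self, account: str, ip: str, now: float) -> None:
        key = (account, ip)
        self._failures[key] += 1
        if self._failures[key] >= MAX_FAILURES:
            self._locked_until[key] = now + LOCK_SECONDS
            self._failures[key] = 0

    def is_locked(self, account: str, ip: str, now: float) -> bool:
        until = self._locked_until.get((account, ip))
        return until is not None and now < until
```

Keying the lock on the (account, IP) pair means an attacker hammering an account from their own addresses locks only those addresses, while the real user signing in from elsewhere is unaffected.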

R-IAM-10 — Device-cert mass expiry

| Field | Value |
| --- | --- |
| Description | Many offline desktops lose ability to refresh simultaneously (e.g. CA rotation without overlap). |
| Inherent | Possible × Moderate |
| Mitigation | CA rotation always overlapped; T-24h client-side renewal; mass-renew batch tool; in-app warning at T-72h |
| Residual | Unlikely × Low |
| Owner | iam team |
| Review | Annually |
| Acceptance | Acceptable |
| Linked runbook | runbooks/iam/device-cert-expiry.md |
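The T-72h warning and T-24h renewal thresholds from R-IAM-10 reduce to a small client-side decision. The function name and return labels below are hypothetical; only the two thresholds come from the register.

```python
from datetime import datetime, timedelta, timezone

WARN_BEFORE = timedelta(hours=72)    # in-app warning threshold
RENEW_BEFORE = timedelta(hours=24)   # client-side renewal threshold


def cert_action(not_after: datetime, now: datetime) -> str:
    """Return what the desktop client should do for a cert expiring at not_after."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= RENEW_BEFORE:
        return "renew"
    if remaining <= WARN_BEFORE:
        return "warn"
    return "ok"
```

A client that has been offline past both thresholds lands in "expired", which is the case the mass-renew batch tool exists for.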

R-IAM-11 — GDPR erasure incomplete

| Field | Value |
| --- | --- |
| Description | iam misses identity rows during erasure; regulator finds residual data. |
| Inherent | Unlikely × High |
| Mitigation | Saga participation tested end-to-end; reconciliation job compares emitted-erasure vs persisted-state; idempotent retries; audit log preserved as legal-hold (Art 17(3)(b)) |
| Residual | Rare × Moderate |
| Owner | Compliance + iam team |
| Review | Quarterly |
| Acceptance | Watch |

R-IAM-12 — Vendor lock-in (GCP-wide)

| Field | Value |
| --- | --- |
| Description | KMS, Cloud SQL, Pub/Sub, and Cloud Run are all GCP-specific; if they are abstracted poorly, migration cost becomes prohibitive. |
| Inherent | Possible × Moderate |
| Mitigation | Ports-and-adapters keeps domain pure; database-engine choices (Postgres / Redis) are open standards; Pub/Sub abstracted via EventPublisher port; KMS via TokenSigner port; multi-cloud not in roadmap but possible |
| Residual | Possible × Low |
| Owner | Architecture |
| Review | Annually |
| Acceptance | Acceptable |

R-IAM-13 — Data residency violation

| Field | Value |
| --- | --- |
| Description | iam writes data to wrong region for a residency-flagged tenant. |
| Inherent | Unlikely × High |
| Mitigation | Tenant residency in tenant.created.v1; iam region-routes at write; runtime guard: any write outside tenant's region throws; quarterly audit |
| Residual | Rare × Moderate |
| Owner | iam team + Compliance |
| Review | Quarterly |
| Acceptance | Watch |
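The runtime guard named in the R-IAM-13 mitigation can be sketched as below. The payload fields `tenantId` and `residencyRegion` are an assumed shape for `tenant.created.v1`, and failing closed for unknown tenants is a design choice of this sketch, not something the register states.

```python
class ResidencyViolation(RuntimeError):
    pass


class ResidencyGuard:
    def __init__(self):
        self._tenant_region = {}   # tenant id -> required region

    def on_tenant_created(self, event: dict) -> None:
        # Assumed payload shape: {"tenantId": ..., "residencyRegion": ...}
        self._tenant_region[event["tenantId"]] = event["residencyRegion"]

    def check_write(self, tenant_id: str, target_region: str) -> None:
        required = self._tenant_region.get(tenant_id)
        if required is None:
            # Fail closed: without a residency record, no write is allowed.
            raise ResidencyViolation(f"no residency on record for tenant {tenant_id}")
        if target_region != required:
            raise ResidencyViolation(
                f"tenant {tenant_id} requires {required}, write targeted {target_region}"
            )
```

Calling `check_write` immediately before every persistence call keeps the guard independent of the routing logic, so a routing bug surfaces as an exception rather than as misplaced data.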

R-IAM-14 — Operational toil from SSO onboarding per chain customer

| Field | Value |
| --- | --- |
| Description | Each enterprise SSO setup takes too much manual config; team can't scale. |
| Inherent | Likely × Moderate |
| Mitigation | Self-serve OIDC + SAML configuration via tenant admin UI; metadata URL auto-refresh; templated onboarding doc; Customer Success training |
| Residual | Possible × Low |
| Owner | iam team + Customer Success |
| Review | Annually |
| Acceptance | Acceptable |

R-IAM-15 — Talent / on-call concentration

| Field | Value |
| --- | --- |
| Description | Few engineers know iam in depth → on-call burnout, knowledge silo. |
| Inherent | Possible × Moderate |
| Mitigation | This bundle (17 docs); recorded incident reviews; runbook completeness gate; quarterly fire drills; rotation across services |
| Residual | Unlikely × Low |
| Owner | EM |
| Review | Annually |
| Acceptance | Acceptable |

R-IAM-16 — Argon2id parameter obsolescence

| Field | Value |
| --- | --- |
| Description | Hardware advances make current params insufficient; existing hashes weakened. |
| Inherent | Possible × Moderate |
| Mitigation | hash_version field; rehash-on-login; annual review against OWASP guidance; offline rehash-job for inactive users on parameter bump |
| Residual | Unlikely × Low |
| Owner | Security + iam team |
| Review | Annually |
| Acceptance | Acceptable |
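The `hash_version` plus rehash-on-login pattern from R-IAM-16 is sketched below. PBKDF2 from the standard library stands in for Argon2id so the example runs without third-party packages, and the iteration counts are deliberately tiny and illustrative. Apart from `hash_version`, the record layout and function names are assumptions.

```python
import hashlib
import hmac
import os

# hash_version -> parameters. PBKDF2 is a stand-in for Argon2id here;
# the iteration counts are illustrative, not recommendations.
PARAMS = {1: {"iterations": 1_000}, 2: {"iterations": 2_000}}
CURRENT_VERSION = 2


def hash_password(password: str, version: int = CURRENT_VERSION) -> dict:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, PARAMS[version]["iterations"]
    )
    return {"hash_version": version, "salt": salt, "digest": digest}


def verify_and_maybe_rehash(password: str, record: dict):
    """Return (ok, record); the record is upgraded when an old version verifies."""
    params = PARAMS[record["hash_version"]]
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), record["salt"], params["iterations"]
    )
    if not hmac.compare_digest(candidate, record["digest"]):
        return False, record
    if record["hash_version"] < CURRENT_VERSION:
        # The plaintext is only available at login, so upgrade now.
        record = hash_password(password)
    return True, record
```

Rehash-on-login only reaches users who actually sign in, which is why the register pairs it with a separate offline job for inactive accounts.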

R-IAM-17 — Pub/Sub event loss → downstream divergence

| Field | Value |
| --- | --- |
| Description | An iam event is lost; audit-service / gdpr-service / tenant-service drift from iam reality. |
| Inherent | Rare × High |
| Mitigation | Transactional outbox; Pub/Sub at-least-once; retention 7 d; DLQ; daily reconciliation job between audit_events and outbox row count |
| Residual | Rare × Low |
| Owner | iam team |
| Review | Quarterly |
| Acceptance | Watch |
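A minimal sketch of the transactional outbox and the row-count reconciliation from R-IAM-17, using an in-memory SQLite database as a stand-in for the service's Postgres. The table names `audit_events` and `outbox` come from the register; the columns and function names are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE audit_events (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        audit_event_id INTEGER NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def record_event(event_type: str, payload: dict) -> None:
    # "with conn" commits both inserts together or rolls both back.
    with conn:
        cur = conn.execute(
            "INSERT INTO audit_events (type) VALUES (?)", (event_type,)
        )
        conn.execute(
            "INSERT INTO outbox (audit_event_id, payload) VALUES (?, ?)",
            (cur.lastrowid, json.dumps(payload)),
        )


def reconcile() -> int:
    """Return audit_events count minus outbox count; non-zero means drift."""
    (audit,) = conn.execute("SELECT COUNT(*) FROM audit_events").fetchone()
    (outbox,) = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()
    return audit - outbox
```

Because the state change and the outbox row share one transaction, an event is either recorded with its outgoing message or not recorded at all; a separate relay (not shown) would publish unpublished rows to Pub/Sub.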

R-IAM-18 — Backward-compat break in JWT claims

| Field | Value |
| --- | --- |
| Description | We change a claim shape; consumers break. |
| Inherent | Possible × High |
| Mitigation | Claim contract documented in API_CONTRACTS; only additive changes inside v1; breaking changes require vN+1 rollout per MIGRATION_PLAN §4; consumer Pact tests block breaking changes |
| Residual | Unlikely × Moderate |
| Owner | iam team + every consumer team |
| Review | Per change |
| Acceptance | Watch |
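The "only additive changes inside v1" rule from R-IAM-18 can be checked mechanically. The sketch below compares two claim maps; the claim names other than `sub` and the type labels are hypothetical, and the actual contract lives in API_CONTRACTS.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare claim-name -> type-name maps; new claims are allowed."""
    problems = []
    for claim, old_type in old.items():
        if claim not in new:
            problems.append(f"removed claim: {claim}")
        elif new[claim] != old_type:
            problems.append(
                f"changed type of {claim}: {old_type} -> {new[claim]}"
            )
    return problems


# Hypothetical v1 claim set for illustration.
V1 = {"sub": "string", "tenant_id": "string", "roles": "array"}
```

A check like this could run in CI alongside the consumer Pact tests; any non-empty result would mean the change belongs in the next major version rather than in v1.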

R-IAM-19 — Insider threat

| Field | Value |
| --- | --- |
| Description | Engineer with elevated access misuses iam admin endpoints. |
| Inherent | Rare × Critical |
| Mitigation | Least-privilege IAM; admin actions emit audit events with actor; quarterly access review; production access requires JIT approval; KMS HSM means no individual can extract keys |
| Residual | Rare × High |
| Owner | Security |
| Review | Quarterly |
| Acceptance | Watch |

R-IAM-20 — Cost runaway from AI risk classification

| Field | Value |
| --- | --- |
| Description | Login surge multiplies AI orchestrator calls → unexpected bill. |
| Inherent | Possible × Moderate |
| Mitigation | 60-s cache by (userId, ipMasked); budget controls; per-tenant cost dashboard; circuit breaker on AI cost per minute; rules-only fallback always available |
| Residual | Unlikely × Low |
| Owner | iam team + Finance ops |
| Review | Quarterly |
| Acceptance | Acceptable |
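The 60-second cache, per-minute breaker, and rules-only fallback from R-IAM-20 combine roughly as follows. This sketch uses a call count per minute as a proxy for cost, which is an assumption; the class name and the callables passed in are hypothetical.

```python
CACHE_TTL_SECONDS = 60


class CachedRiskClassifier:
    def __init__(self, ai_classify, rules_classify, max_ai_calls_per_minute: int):
        self._ai = ai_classify
        self._rules = rules_classify
        self._budget = max_ai_calls_per_minute
        self._cache = {}       # (user_id, ip_masked) -> (expires_at, verdict)
        self._window = None    # minute bucket currently being counted
        self._calls = 0        # AI calls made in that bucket

    def classify(self, user_id: str, ip_masked: str, now: float):
        key = (user_id, ip_masked)
        hit = self._cache.get(key)
        if hit and now < hit[0]:
            return hit[1]
        minute = int(now // 60)
        if minute != self._window:
            self._window, self._calls = minute, 0
        if self._calls >= self._budget:
            # Breaker open: skip the AI call and use rules only.
            return self._rules(user_id, ip_masked)
        self._calls += 1
        verdict = self._ai(user_id, ip_masked)
        self._cache[key] = (now + CACHE_TTL_SECONDS, verdict)
        return verdict
```

Rules-only verdicts are deliberately not cached here, so the AI path is retried as soon as the next minute's budget opens.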

3. Risk Heatmap (residual)

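| Impact ↓ / Likelihood → | Rare | Unlikely | Possible | Likely | Almost certain |
| --- | --- | --- | --- | --- | --- |
| Critical | | | | | |
| High | R-01, R-06, R-19 | | | | |
| Moderate | R-07, R-11, R-13 | R-18 | R-02, R-03 | | |
| Low | R-04, R-17 | R-08, R-10, R-15, R-16, R-20 | R-05, R-09, R-12, R-14 | | |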

Any cell crossing the High / Possible quadrant requires explicit acceptance by Security + EM at quarterly review.
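The explicit-acceptance rule can be expressed as a predicate over residual ratings. The sketch assumes the rule means "likelihood of Possible or higher and impact of High or higher"; that reading is an interpretation, and the function name is hypothetical. The sample ratings are the Residual values recorded in section 2.

```python
LIKELIHOOD = ["Rare", "Unlikely", "Possible", "Likely", "Almost certain"]
IMPACT = ["Low", "Moderate", "High", "Critical", "Catastrophic"]


def needs_explicit_acceptance(likelihood: str, impact: str) -> bool:
    """True when a residual rating is at or beyond Possible x High (assumed reading)."""
    return (
        LIKELIHOOD.index(likelihood) >= LIKELIHOOD.index("Possible")
        and IMPACT.index(impact) >= IMPACT.index("High")
    )


# Residual ratings copied from the register entries.
residual = {
    "R-IAM-01": ("Rare", "High"),
    "R-IAM-02": ("Possible", "Moderate"),
    "R-IAM-03": ("Possible", "Moderate"),
    "R-IAM-06": ("Rare", "High"),
    "R-IAM-19": ("Rare", "High"),
}

flagged = [
    rid for rid, (lik, imp) in residual.items()
    if needs_explicit_acceptance(lik, imp)
]
```

With these ratings nothing is flagged; a risk would enter the list if its residual likelihood rose to Possible while its impact stayed at High or above.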

4. Top-3 Watch List (this quarter)

  1. R-IAM-02 (refresh token theft) — push WebAuthn adoption for staff in Q-current.
  2. R-IAM-03 (SSO IdP SPOF for chains) — formalize break-glass path; document tenant SLAs.
  3. R-IAM-18 (JWT claim breaking change) — automate consumer Pact verification; add Sunset header process.

5. Risk Treatment Workflow

  1. New risk identified → opened in risks/iam/<id>.md with template.
  2. Triage in next iam team standup; severity assigned.
  3. If High+ → 7-day mitigation plan due; tracked in Linear.
  4. Monthly review: residual rating updated.
  5. Quarterly review: register published; leadership signs off acceptance posture.
  6. Closed risks remain in register marked CLOSED with date.

6. Cross-References