Config Service — Service Risk Register

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template


1. Risk Register

| ID | Risk | Probability | Impact | Severity | Owner | Mitigation |
|---|---|---|---|---|---|---|
| RISK-CONFIG-01 | Resolution SLO breach: pipeline latency exceeds 500 ms p95 due to BFS depth or upstream slowness | Medium | High | High | Platform SRE | Circuit breakers on all upstreams; Redis cache hit target ≥ 85 %; BFS depth limit 10; resolution timeout returns 504 |
| RISK-CONFIG-02 | Cross-tenant data leak: RLS misconfiguration exposes config of Tenant B to Tenant A | Low | Critical | Critical | Security team | RLS policies enforced, plus an automated tenant-isolation test in CI; any failure blocks deploy |
| RISK-CONFIG-03 | Stale cache after missed eviction event: NATS message lost before eviction; user retains denied/granted access longer than TTL | Low-Medium | Medium | Medium | Platform team | Short TTL (60 s); DLQ alert on missed events; manual full-tenant cache flush tool |
| RISK-CONFIG-04 | DAG cycle in production: circular role or config node reference introduced via direct DB manipulation | Low | Medium | Medium | DBAs | Application-level cycle detection; no direct DB write access in prod; migrations only |
| RISK-CONFIG-05 | ExplicitAllow override abuse: Tenant Admin grants broad override without adequate review | Medium | High | High | Compliance team | Justification mandatory; override events audited; override expiry required; audit alerts for broad nodeId scope |
| RISK-CONFIG-06 | BFS role graph explosion: extremely deep role hierarchy degrades resolution performance | Low | High | Medium | Platform team | Max depth 10 enforced at definition time; definitions rejected with CIRCULAR_ROLE_INHERITANCE or when depth > 10 |
| RISK-CONFIG-07 | facility-service coupling: config-service cannot resolve without the hierarchy spine; if facility-service is down, all resolutions fail | Medium | High | High | Platform SRE | Fail closed (deny); serve cached spines that have not expired; facility-service SLO aligned with config-service SLO |
| RISK-CONFIG-08 | Design token bloat: tenants create thousands of token overrides; token merge becomes slow | Low | Low | Low | Product | Paginate token list API; compress token map in Redis; alert on token count > 500 per tenant |
| RISK-CONFIG-09 | System role mutation during incident: SUPER_ADMIN modifies system roles under pressure; unintended access granted | Low | Critical | High | Platform lead | SUPER_ADMIN role changes require a change-management ticket, an audit event, and 4-eyes approval (procedural) |
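
The definition-time checks named for RISK-CONFIG-04 and RISK-CONFIG-06 (cycle detection and a maximum depth of 10) could look roughly like the sketch below. It is illustrative only and not the service's actual implementation. It assumes the role graph is available as a mapping from each role to the roles it inherits from, and it measures depth as the number of inheritance edges on the longest chain. `validate_role_graph`, `RoleGraphError`, and the `ROLE_DEPTH_EXCEEDED` code are hypothetical names; only `CIRCULAR_ROLE_INHERITANCE` and the limit of 10 come from the register.

```python
from typing import Dict, List, Set

MAX_DEPTH = 10  # depth limit stated in the register


class RoleGraphError(ValueError):
    """Hypothetical error type carrying a machine-readable rejection code."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


def validate_role_graph(parents: Dict[str, List[str]], max_depth: int = MAX_DEPTH) -> None:
    """Raise RoleGraphError if the graph has a cycle or a chain deeper than max_depth.

    `parents` maps a role to the roles it inherits from (an assumed representation).
    Depth is counted in inheritance edges along the longest chain.
    Recursive for brevity; a very long chain could hit Python's recursion limit.
    """
    depth_cache: Dict[str, int] = {}
    in_progress: Set[str] = set()  # roles on the current DFS path

    def depth(role: str) -> int:
        if role in depth_cache:
            return depth_cache[role]
        if role in in_progress:
            # Reaching a role that is already on the current path means a cycle.
            raise RoleGraphError("CIRCULAR_ROLE_INHERITANCE", f"cycle through {role!r}")
        in_progress.add(role)
        d = max((1 + depth(p) for p in parents.get(role, [])), default=0)
        in_progress.discard(role)
        depth_cache[role] = d
        return d

    for role in parents:
        d = depth(role)
        if d > max_depth:
            # ROLE_DEPTH_EXCEEDED is a made-up code for this sketch.
            raise RoleGraphError("ROLE_DEPTH_EXCEEDED", f"{role!r} has depth {d} > {max_depth}")
```

Running the check when a role definition is written, rather than at resolution time, keeps the BFS on the hot path bounded without extra work per request.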

2. Accepted Risks

| ID | Accepted risk | Rationale |
|---|---|---|
| RISK-CONFIG-03 | Short window of stale cache (≤ 60 s) after event loss | TTL provides self-healing; a 60 s window is clinically acceptable for config metadata |
| RISK-CONFIG-08 | Token count growth | Token operations are infrequent; mitigation deferred to v2 |
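
The self-healing argument for RISK-CONFIG-03 can be illustrated with a small in-memory stand-in for the cache. This is a sketch under assumptions, not the production Redis setup: `TtlCache`, its method names, and the `(tenant_id, key)` keying are hypothetical, and only the 60 s TTL and the idea of a full-tenant flush come from the register. The clock is injectable so the expiry behaviour can be shown without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 60.0  # TTL stated in the register


class TtlCache:
    """Hypothetical in-memory stand-in for a per-tenant cache with TTL expiry."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        # (tenant_id, key) -> (expires_at, value)
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self._entries[(tenant_id, key)] = (self._clock() + self._ttl, value)

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        entry = self._entries.get((tenant_id, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry even if no eviction event ever arrived.
            del self._entries[(tenant_id, key)]
            return None
        return value

    def evict(self, tenant_id: str, key: str) -> None:
        """Normal path: called when an eviction event is received."""
        self._entries.pop((tenant_id, key), None)

    def flush_tenant(self, tenant_id: str) -> int:
        """Manual fallback: remove every entry for one tenant; returns the count removed."""
        doomed = [k for k in self._entries if k[0] == tenant_id]
        for k in doomed:
            del self._entries[k]
        return len(doomed)


# Simulate a lost eviction event with a fake clock.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("tenant-a", "perm:reports", "granted")
now[0] = 59.0
stale = cache.get("tenant-a", "perm:reports")   # still served inside the TTL window
now[0] = 60.0
healed = cache.get("tenant-a", "perm:reports")  # expired, so the next read misses
```

In this model, the worst case after a lost event is that a stale value is served until the TTL elapses, which is the window the register accepts. The flush method covers cases where waiting is not acceptable.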