Operator Management Service — Service Risk Register
Status: populated; Owner: Platform Engineering; Last updated: 2026-04-18

| ID | Risk | Likelihood | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| R-OPS-01 | Vault outage during operator create leaves PG row without credentials | Low | High | Compensating delete of PG row on Vault failure; alert on OpsVaultErrors | Engineering |
| R-OPS-02 | SMPP password exposed in logs during error handling | Medium | Critical | Pino redaction on password field; CI log scanner; code review rule | Security |
| R-OPS-03 | Stale health cache causes routing to degraded operator | Medium | High | Redis TTL 60 s; smpp-connector publishes heartbeat every 10 s; worst-case staleness 70 s (60 s TTL + 10 s heartbeat interval) | Engineering + SRE |
| R-OPS-04 | Legacy migration introduces duplicate operators (same host/port/systemId under different names) | Medium | Medium | Migration script dry-run mode; duplicate guard rejects operators with a matching host/port/systemId; ops team reviews the dry-run report | Engineering |
| R-OPS-05 | Routing rule prefix conflict not caught (concurrent create) | Low | High | Serializable PG transaction + unique index; conflict checker unit tested | Engineering |
| R-OPS-06 | Vault K8s SA token expires and is not renewed | Low | High | Vault Agent sidecar renews at 50% TTL; alert on auth failure | SRE |
| R-OPS-07 | mTLS cert expiry blocks smpp-connector credential refresh | Low | High | cert-manager auto-renews 30 days before expiry; alert when 14 days remain | SRE |
| R-OPS-08 | NATS config event published but routing-engine misses it (consumer restart gap) | Low | Medium | Durable NATS consumer resumes from the last acknowledged message; routing-engine bootstraps from REST API on cold start | Engineering |
| R-OPS-09 | Admin with ops:admin scope creates malicious routing rule (insider threat) | Low | High | All admin actions audit-logged; anomaly detection on routing rule changes (future) | Security |
| R-OPS-10 | Vault path policy too broad (lateral access to other services' secrets) | Low | Critical | Policy scoped to secret/ops/operators/* only; Vault policy test in CI | Security |
| R-OPS-11 | Soft-delete bypassed by direct PG write | Low | High | Only OMS has PG write access to the ops schema; DB user policy enforced | Security + DBA |
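
The duplicate guard and dry-run mode listed for R-OPS-04 could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual migration script: the `OperatorRecord` shape, the `identityKey` and `dryRunMigration` names, and the choice to compare hosts case-insensitively are all hypothetical. Only the host/port/systemId triple comes from the register.

```typescript
// Hypothetical operator shape. Only host/port/systemId are named in
// R-OPS-04; the rest is assumed for illustration.
interface OperatorRecord {
  name: string;
  host: string;
  port: number;
  systemId: string;
}

interface DryRunReport {
  accepted: OperatorRecord[];
  rejected: { candidate: OperatorRecord; duplicateOf: string }[];
}

// Identity key for the duplicate guard. Assumption: hostnames compare
// case-insensitively, systemId compares exactly.
function identityKey(op: OperatorRecord): string {
  return JSON.stringify([op.host.trim().toLowerCase(), op.port, op.systemId]);
}

// Dry run: classify every legacy record without writing anything, so the
// resulting report can be reviewed before the real migration runs.
function dryRunMigration(
  existing: OperatorRecord[],
  legacy: OperatorRecord[],
): DryRunReport {
  const seen = new Map<string, string>(); // identity key -> operator name
  for (const op of existing) {
    seen.set(identityKey(op), op.name);
  }

  const report: DryRunReport = { accepted: [], rejected: [] };
  for (const candidate of legacy) {
    const key = identityKey(candidate);
    const owner = seen.get(key);
    if (owner !== undefined) {
      // Same host/port/systemId already known under another name.
      report.rejected.push({ candidate, duplicateOf: owner });
    } else {
      seen.set(key, candidate.name);
      report.accepted.push(candidate);
    }
  }
  return report;
}

// Usage: the legacy record differs only in name and host casing, so the
// guard reports it as a duplicate of the existing operator.
const report = dryRunMigration(
  [{ name: "acme-primary", host: "smpp.acme.example", port: 2775, systemId: "acme01" }],
  [
    { name: "ACME (legacy)", host: "SMPP.acme.example", port: 2775, systemId: "acme01" },
    { name: "globex", host: "smpp.globex.example", port: 2775, systemId: "gbx" },
  ],
);
console.log(report.rejected.map((r) => `${r.candidate.name} duplicates ${r.duplicateOf}`));
```

Keying on the normalized triple rather than the operator name matches the risk as stated (the same endpoint under different names). In this sketch a duplicate inside the legacy batch itself is also rejected, because accepted candidates are added to the `seen` map as they are processed.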