SMS Orchestrator — Service Risk Register
Status: populated · Owner: Platform Engineering · Last updated: 2026-04-18

| ID | Risk | Likelihood | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| R-ORCH-01 | Idempotency key collision across accounts | Low | High | Hash includes accountId; unit test covers collision | Engineering |
| R-ORCH-02 | Redis fail-open on pipeline dedupe causes double SMS send | Medium | High | NATS AckWait (30s) bounds the duplicate window; PG unique constraint on (tenant_id, message_id) as second line of defense | Engineering |
| R-ORCH-03 | Clock skew between pods causes stale idempotency TTL | Low | Low | NTP + tolerance in SLO computation | SRE |
| R-ORCH-04 | routing-engine deploys a breaking gRPC change | Medium | High | Pact contract tests block merge; versioned proto | Engineering |
| R-ORCH-05 | PG partition not created in advance | Low | Medium | pg_partman or monthly cron; alert on missing partition | SRE |
| R-ORCH-06 | DLQ publish fails silently and the message is lost | Low | High | Secondary write to orch.dead_letters table before ACK; alert on table growth | Engineering |
| R-ORCH-07 | Cutover from custom api-gateway introduces regression | Medium | High | Dual-run period; replay compare; canary | SRE + Engineering |
| R-ORCH-08 | Message with a high segment count causes a cost spike | Medium | Medium | Pre-submit quota check (future); ops alert on segment count P99 | Product + Engineering |
| R-ORCH-09 | PII leaks into Pino logs when a development flag is misused | Medium | High | Pino transport redaction enforced; CI log-scanner | Security |
| R-ORCH-10 | Kong misroute sends traffic to wrong service | Low | High | Kong route lint in CI against OpenAPI; smoke tests post-deploy | SRE |
| R-ORCH-11 | attempt_count drift on restart causes extra SMS | Low | Medium | Pipeline reads PG state on redelivery before deciding retry | Engineering |
| R-ORCH-12 | Zod schema is too strict and rejects valid E.164 edge cases | Low | Medium | Property-based tests; fuzz corpus | Engineering |
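
The R-ORCH-11 mitigation (read PG state on redelivery before deciding whether to retry) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: the `PersistedMessageState` shape, the status values, the `decideOnRedelivery` function, the `maxAttempts` parameter, and the dead-letter outcome are all hypothetical names introduced here. The point it shows is that the decision depends only on persisted state, so an in-memory counter reset by a pod restart cannot trigger an extra send.

```typescript
// Hypothetical shape of the row the pipeline would read back from PG.
type DeliveryStatus = "pending" | "sent" | "failed";

interface PersistedMessageState {
  status: DeliveryStatus;
  attemptCount: number; // attempts already recorded in PG (assumed column)
}

type RedeliveryDecision =
  | { action: "skip"; reason: string }
  | { action: "dead-letter"; reason: string }
  | { action: "retry"; nextAttempt: number };

// Decide what to do with a redelivered message using persisted state only,
// never an in-memory counter that a restart would reset to zero.
function decideOnRedelivery(
  state: PersistedMessageState | undefined,
  maxAttempts: number,
): RedeliveryDecision {
  // No row yet: nothing was recorded, so this counts as the first attempt.
  if (state === undefined) {
    return { action: "retry", nextAttempt: 1 };
  }
  // Already sent: ACK and drop the redelivery instead of sending again.
  if (state.status === "sent") {
    return { action: "skip", reason: "already sent" };
  }
  // Attempt budget used up according to PG: hand off to the dead-letter path.
  if (state.attemptCount >= maxAttempts) {
    return { action: "dead-letter", reason: "attempts exhausted" };
  }
  // Otherwise continue from the persisted count.
  return { action: "retry", nextAttempt: state.attemptCount + 1 };
}
```

In this sketch the caller would load the row for the redelivered message first, then act on the returned decision; how the row is loaded and how the attempt count is written back are left out because the register does not specify them.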