SMS Orchestrator — Service Risk Register

Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18

| ID | Risk | Likelihood | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| R-ORCH-01 | Idempotency key collision across accounts | Low | High | Hash includes accountId; unit test covers collision | Engineering |
| R-ORCH-02 | Redis fail-open on pipeline dedupe causes double SMS send | Medium | High | NATS AckWait (30s) bounds duplicates; PG unique on (tenant_id, message_id) as second line | Engineering |
| R-ORCH-03 | Clock skew between pods causes stale idempotency TTL | Low | Low | NTP + tolerance in SLO computation | SRE |
| R-ORCH-04 | routing-engine deploys a breaking gRPC change | Medium | High | Pact contract tests block merge; versioned proto | Engineering |
| R-ORCH-05 | PG partition not created in advance | Low | Medium | pg_partman or monthly cron; alert on missing partition | SRE |
| R-ORCH-06 | DLQ publish fails silently leaving message lost | Low | High | Secondary write to orch.dead_letters table before ACK; alert on table growth | Engineering |
| R-ORCH-07 | Cutover from custom api-gateway introduces regression | Medium | High | Dual-run period; replay compare; canary | SRE + Engineering |
| R-ORCH-08 | High segment count message causes cost spike | Medium | Medium | Pre-submit quota check (future); ops alert on segment count P99 | Product + Engineering |
| R-ORCH-09 | PII in Pino logs during development flag misuse | Medium | High | Pino transport redaction enforced; CI log-scanner | Security |
| R-ORCH-10 | Kong misroute sends traffic to wrong service | Low | High | Kong route lint in CI against OpenAPI; smoke tests post-deploy | SRE |
| R-ORCH-11 | attempt_count drift on restart causes extra SMS | Low | Medium | Pipeline reads PG state on redelivery before deciding retry | Engineering |
| R-ORCH-12 | Zod schema too strict, rejects valid E.164 edge cases | Low | Medium | Property-based tests; fuzz corpus | Engineering |
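
The R-ORCH-01 mitigation ("Hash includes accountId") could look roughly like the sketch below. It is illustrative only: the register does not specify the hash function or the key layout, so SHA-256, the length-prefixed field encoding, and the names `IdempotencyInput`, `clientKey`, `encodeField`, and `idempotencyKey` are all assumptions, not the service's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; the real service's field names may differ.
interface IdempotencyInput {
  accountId: string; // tenant/account that owns the request
  clientKey: string; // caller-supplied idempotency key
}

// Length-prefix each field so that ("ab", "c") and ("a", "bc")
// never serialize to the same byte sequence before hashing.
function encodeField(value: string): string {
  return `${Buffer.byteLength(value, "utf8")}:${value}`;
}

// Derive the stored idempotency key from BOTH the account and the client key,
// so two accounts reusing the same client key get different digests.
function idempotencyKey(input: IdempotencyInput): string {
  return createHash("sha256")
    .update(encodeField(input.accountId))
    .update(encodeField(input.clientKey))
    .digest("hex");
}

// Usage: the same client key under two accounts yields two distinct keys.
const keyA = idempotencyKey({ accountId: "acct-1", clientKey: "order-42" });
const keyB = idempotencyKey({ accountId: "acct-2", clientKey: "order-42" });
console.log(keyA !== keyB);
```

The length prefix is a design choice in this sketch, not something the register requires: plain concatenation of `accountId` and `clientKey` would let differently split values collide, which is the kind of case the unit test named in the mitigation would presumably cover.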