smpp-connector — Service Risk Register
Status: populated | Last updated: 2026-04-18
Risk Register
| ID | Risk | Likelihood | Impact | Severity | Mitigation | Owner |
|---|---|---|---|---|---|---|
| R1 | Silent SMPP session death: enquire_link keeps succeeding, so the session looks healthy, but the MNO silently drops submit_sm_resp PDUs (a known issue with certain MNO SMPP implementations). Messages time out with an uncertain delivery state. | Medium | High | High | 30 s enquire_link heartbeat; 30 s submit_sm_resp timeout with explicit NAK; pendingPduMap TTL eviction; sms-orchestrator DLR timeout handling; MNO-specific protocol flags configurable per operator | Platform Engineering |
| R2 | TPS overshoot due to Redis unavailability: when Redis is down, TPS enforcement is disabled (fail-open), smpp-connector exceeds the operator's contracted TPS, and the MNO returns ESME_RTHROTTLED for all PDUs. | Low | High | High | Alert on Redis unavailability; fail-open is intentional to avoid message loss; ESME_RTHROTTLED handled with NATS NAK + backoff; MNO-level throttling acts as a backstop | Platform Engineering |
| R3 | SMPP credential leakage: a logging or error-reporting bug, or a core dump, exposes the SMPP password for one or more operators. | Low | Critical | High | Credentials are held in memory only during bind and are never logged (enforced by code review and a lint rule prohibiting passwords in log objects); Vault rotation on any suspected exposure; regular security review | Security / Platform Engineering |
| R4 | Message loss during pod restart: the smpp-connector pod restarts (OOMKill, rolling update) while PDUs are in flight. The in-memory pendingPduMap is lost, outstanding submit_sm_resp PDUs are never processed, and correlation records remain SUBMITTED. | Medium | Medium | Medium | sms-orchestrator implements a DLR timeout (e.g. 72 h); messages still SUBMITTED after the timeout are re-dispatched or moved to the DLQ; the NATS message is ACKed only after submit_sm_resp is received, which narrows the loss window | Platform Engineering |
| R5 | Long message encoding mismatch: some MNOs do not support the optional message_payload TLV, causing silent message drops or ESME_ROPTPARNOTALLWD errors for long messages. | Medium | Medium | Medium | Per-operator longMessageStrategy configuration (CSMS vs TLV) managed via operator-management-service; an integration test verifies the encoding against each MNO's test endpoint before the production bind | Platform Engineering / Partnerships |
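The pendingPduMap mitigation in R1 (and the in-flight state lost in R4) can be illustrated with a minimal TypeScript sketch. It assumes the map is keyed by the SMPP sequence_number and swept periodically. The class shape, the `track`/`resolve`/`sweep` methods, and the `messageId` field are hypothetical names for illustration, not the service's actual code. The 30 s TTL comes from the register.

```typescript
// Hypothetical sketch of pendingPduMap (R1): submit_sm PDUs awaiting their
// submit_sm_resp, keyed by SMPP sequence_number and evicted after a TTL.
// Field and method names are illustrative, not the service's actual API.
interface PendingEntry {
  messageId: string; // correlation id of the outbound message (assumed field)
  deadlineMs: number; // absolute time after which the PDU counts as timed out
}

class PendingPduMap {
  private readonly entries = new Map<number, PendingEntry>();
  private readonly ttlMs: number;
  private readonly onTimeout: (messageId: string) => void;

  // ttlMs: e.g. 30_000 for the 30 s submit_sm_resp timeout.
  // onTimeout: e.g. NAK the originating NATS message so it is redelivered.
  constructor(ttlMs: number, onTimeout: (messageId: string) => void) {
    this.ttlMs = ttlMs;
    this.onTimeout = onTimeout;
  }

  // Record a submit_sm that has just been written to the SMPP socket.
  track(sequenceNumber: number, messageId: string, nowMs: number): void {
    this.entries.set(sequenceNumber, { messageId, deadlineMs: nowMs + this.ttlMs });
  }

  // Match an incoming submit_sm_resp. Returns the correlation id, or
  // undefined if the sequence number is unknown (e.g. already evicted).
  resolve(sequenceNumber: number): string | undefined {
    const entry = this.entries.get(sequenceNumber);
    if (entry === undefined) return undefined;
    this.entries.delete(sequenceNumber);
    return entry.messageId;
  }

  // Evict every entry whose deadline has passed and report it through
  // onTimeout. Meant to be called periodically, e.g. from setInterval.
  sweep(nowMs: number): number {
    const expired: number[] = [];
    this.entries.forEach((entry, seq) => {
      if (entry.deadlineMs <= nowMs) expired.push(seq);
    });
    for (const seq of expired) {
      const entry = this.entries.get(seq)!;
      this.entries.delete(seq);
      this.onTimeout(entry.messageId);
    }
    return expired.length;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A submit_sm_resp that arrives after its entry has been evicted resolves to `undefined`. At that point the delivery state is genuinely uncertain, which is why R1 also relies on the sms-orchestrator DLR timeout rather than on this map alone.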
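The fail-open behaviour in R2 can be sketched as follows. The sketch assumes a fixed one-second window counter per operator, since the register does not say which rate-limiting algorithm is used. `CounterStore`, `incrWithTtl`, `checkTps`, and the key format are hypothetical. The call is synchronous for brevity; a real Redis client call would be awaited.

```typescript
// Hypothetical sketch of the fail-open TPS check described in R2.
// Assumption: a fixed one-second window counter per operator (INCR + EXPIRE
// style); the register does not specify the rate-limiting algorithm.
// Synchronous for brevity; a real Redis client call would be awaited.
interface CounterStore {
  // Increments `key`, setting its TTL when it is first created, and returns
  // the new count. Throws if the store is unreachable.
  incrWithTtl(key: string, ttlSeconds: number): number;
}

interface TpsDecision {
  allowed: boolean; // may the PDU be sent in this window?
  failOpen: boolean; // true when the decision was made without Redis
}

function checkTps(
  store: CounterStore,
  operatorId: string,
  tpsLimit: number,
  nowMs: number,
): TpsDecision {
  const windowSecond = Math.floor(nowMs / 1000);
  const key = `tps:${operatorId}:${windowSecond}`;
  try {
    // TTL of 2 s so stale window keys expire shortly after their second ends.
    const count = store.incrWithTtl(key, 2);
    return { allowed: count <= tpsLimit, failOpen: false };
  } catch {
    // Redis unavailable: fail open to avoid message loss. The caller should
    // emit the Redis-unavailability alert; if the operator's contracted TPS is
    // exceeded, the MNO's ESME_RTHROTTLED responses act as the backstop.
    return { allowed: true, failOpen: true };
  }
}
```

Returning `failOpen` separately from `allowed` lets the caller tell a normal pass from an unenforced one, so the Redis-unavailability alert can fire without blocking traffic. A fail-closed variant would return `allowed: false` in the `catch` branch. The register deliberately chooses fail-open to avoid message loss and relies on the MNO throttle instead.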
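The DLR-timeout handling that mitigates R4 could look like the following TypeScript sketch. The record shape, `dispatchAttempts`, `maxAttempts`, and the rule that chooses between re-dispatch and the DLQ are illustrative assumptions. The register states only that messages still SUBMITTED after the timeout (e.g. 72 h) are re-dispatched or moved to the DLQ.

```typescript
// Hypothetical sketch of the sms-orchestrator DLR-timeout sweep from R4.
// Record shape, dispatchAttempts and maxAttempts are illustrative
// assumptions; the register only says stale SUBMITTED messages are
// re-dispatched or moved to the DLQ.
type CorrelationStatus = "SUBMITTED" | "DELIVERED" | "FAILED";

interface CorrelationRecord {
  messageId: string;
  status: CorrelationStatus;
  submittedAtMs: number;
  dispatchAttempts: number; // how many times the message has been dispatched
}

interface SweepResult {
  redispatch: string[]; // message ids to send again
  deadLetter: string[]; // message ids to move to the DLQ
}

// Example timeout from the register: 72 h.
const DLR_TIMEOUT_MS = 72 * 60 * 60 * 1000;

function sweepStaleSubmissions(
  records: CorrelationRecord[],
  nowMs: number,
  maxAttempts: number,
  timeoutMs: number = DLR_TIMEOUT_MS,
): SweepResult {
  const result: SweepResult = { redispatch: [], deadLetter: [] };
  for (const r of records) {
    // Only records stuck in SUBMITTED past the timeout are treated as lost.
    if (r.status !== "SUBMITTED" || nowMs - r.submittedAtMs < timeoutMs) continue;
    if (r.dispatchAttempts < maxAttempts) {
      result.redispatch.push(r.messageId);
    } else {
      result.deadLetter.push(r.messageId);
    }
  }
  return result;
}
```

Re-dispatching is at-least-once. If the original submit_sm was accepted but its submit_sm_resp or DLR was lost, the recipient may receive a duplicate, so a cap such as `maxAttempts` bounds the retries.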
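The CSMS-vs-TLV choice in R5 can be sketched as a planning function that turns an encoded payload into submit_sm bodies. This assumes UCS-2 text (data_coding 0x08) and an 8-bit-reference concatenation UDH (`05 00 03 ref total seq`); GSM-7 text would need different segment sizes. `planSubmission`, `SubmitPlan`, and the `"CSMS" | "TLV"` values are hypothetical names, not the operator-management-service schema.

```typescript
// Hypothetical sketch of the per-operator longMessageStrategy from R5.
// Assumptions: the text is sent as UCS-2 (data_coding 0x08), the strategy
// values are "CSMS" | "TLV", and CSMS uses an 8-bit concatenation UDH.
type LongMessageStrategy = "CSMS" | "TLV";

type SubmitPlan =
  | { kind: "SINGLE"; shortMessage: Uint8Array }
  // One submit_sm per segment; each needs the esm_class UDHI bit (0x40) set.
  | { kind: "CSMS"; segments: Uint8Array[] }
  // One submit_sm with empty short_message and the text in message_payload.
  | { kind: "TLV"; messagePayload: Uint8Array };

const MAX_SHORT_MESSAGE_OCTETS = 140;
const UDH_OCTETS = 6; // 05 00 03 <ref> <total> <seq>

// UTF-16 code units as big-endian octets (UCS-2 style).
function encodeUcs2(text: string): Uint8Array {
  const out = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out[2 * i] = unit >> 8;
    out[2 * i + 1] = unit & 0xff;
  }
  return out;
}

function planSubmission(
  payload: Uint8Array,
  strategy: LongMessageStrategy,
  reference: number, // concatenation reference shared by all segments
): SubmitPlan {
  if (payload.length <= MAX_SHORT_MESSAGE_OCTETS) {
    return { kind: "SINGLE", shortMessage: payload };
  }
  if (strategy === "TLV") {
    return { kind: "TLV", messagePayload: payload };
  }
  // 134 octets of body per segment = 67 UCS-2 characters. Even, so no
  // UTF-16 code unit is split, though a surrogate pair still could be.
  const bodyOctets = MAX_SHORT_MESSAGE_OCTETS - UDH_OCTETS;
  const total = Math.ceil(payload.length / bodyOctets);
  if (total > 255) throw new Error("too many segments for an 8-bit UDH");
  const segments: Uint8Array[] = [];
  for (let i = 0; i < total; i++) {
    const body = payload.subarray(i * bodyOctets, (i + 1) * bodyOctets);
    const segment = new Uint8Array(UDH_OCTETS + body.length);
    segment.set([0x05, 0x00, 0x03, reference & 0xff, total, i + 1], 0);
    segment.set(body, UDH_OCTETS);
    segments.push(segment);
  }
  return { kind: "CSMS", segments };
}
```

With TLV a single submit_sm carries the whole text and the SMSC handles segmentation, but an MNO that does not accept message_payload rejects it (for example with ESME_ROPTPARNOTALLWD). CSMS avoids the TLV at the cost of one submit_sm per segment, which typically counts against the operator's TPS budget.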