smpp-connector — Epics & User Stories
Last updated: 2026-04-18
Story point scale: 1 (trivial) · 2 (small) · 3 (medium) · 5 (large) · 8 (XL)
EP-SC-01: SMPP Session Management
Description: Establish, maintain, and recover persistent SMPP 3.4 sessions with MNO SMPP servers. Includes bind modes, enquire_link heartbeat, exponential backoff reconnection, and health event publishing.
US-SC-001 — Bootstrap NestJS SMPP application
Title: As a platform engineer, I want a NestJS application bootstrapped with NATS JetStream consumer transport and an SMPP session manager so that the service can start connecting to MNOs.
Description: Create the NestJS app with the NATS microservice transport and an initial SmppSessionManager module. Include configuration loading for OPERATOR_IDS and basic startup logging.
Acceptance Criteria:
- Service starts and logs all configured `operatorId` values
- NATS connection established with NKey credentials on startup
- `/health` returns 200 immediately after process start
- ESLint and TypeScript compiler report 0 errors
Story Points: 3
US-SC-002 — SMPP bind_transceiver with credential fetch
Title: As a platform engineer, I want smpp-connector to fetch SMPP credentials from operator-management-service and establish a bind_transceiver session so that outbound messages can be transmitted.
Description: Implement SmppSessionManager.connect(operatorId). Fetch credentials via HTTP, open TCP socket, send bind_transceiver PDU, await bind_transceiver_resp, transition state to BOUND.
Acceptance Criteria:
- Credentials fetched via `GET /internal/operators/{id}/smpp-credentials`
- `bind_transceiver` PDU sent with correct `system_id`, `password`, `system_type`
- State transitions DISCONNECTED → CONNECTING → BOUND on success
- `operator.health BOUND` event published on successful bind
- Integration test: mock SMPP server accepts bind; verify state = BOUND
Story Points: 5
US-SC-003 — Fallback to bind_transmitter + bind_receiver
Title: As a platform engineer, I want smpp-connector to fall back to separate TX/RX bind modes when bind_transceiver is rejected so that compatibility with older MNO SMPP servers is maintained.
Acceptance Criteria:
- On `ESME_RINVCMDID` response to `bind_transceiver` → retry with `bind_transmitter`
- Open separate `bind_receiver` connection for DLR receipts in TX-only mode
- Both connections tracked in `sessionMap`
- Integration test: mock SMPP server rejects transceiver; verify TX+RX fallback binds
Story Points: 3
US-SC-004 — enquire_link heartbeat and timeout handling
Title: As a platform engineer, I want smpp-connector to send enquire_link PDUs every 30 s and mark the session UNBOUND when no response arrives within 10 s so that silent session deaths are detected quickly.
Acceptance Criteria:
- `enquire_link` PDU sent every 30 s per active session
- 10 s response timeout configured per heartbeat
- On timeout: session state → UNBOUND; `operator.health UNBOUND` event published
- On timeout: reconnect sequence (UC-01) initiated
- Metric `smpp_connector_enquire_link_timeout_total` incremented
- Unit test verifies 10 s timer fires and triggers disconnect
Story Points: 3
US-SC-005 — Exponential backoff reconnection
Title: As a platform engineer, I want smpp-connector to reconnect with exponential backoff (5 s → 10 s → 20 s → 40 s → 60 s max) so that MNO SMPP servers are not overwhelmed during outages.
Acceptance Criteria:
- Backoff delay sequence: 5 000, 10 000, 20 000, 40 000, 60 000 ms (capped)
- `reconnectAttempts` counter increments on each failed attempt
- `reconnectAttempts` resets to 0 after successful bind
- `smpp_connector_reconnect_attempts_total` counter increments
- Unit tests validate delay calculation for attempts 1–6
Story Points: 2
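The delay calculation in US-SC-005 can be sketched as a pure function. This is a minimal illustration, not the service's implementation: the name `reconnectDelayMs` and the 1-based attempt numbering are assumptions made for the example.

```typescript
// Hypothetical helper for US-SC-005; names and 1-based attempts are assumptions.
const BASE_DELAY_MS = 5000;
const MAX_DELAY_MS = 60000;

/** Delay before reconnect attempt `attempt` (1-based): doubles each time, capped at 60 s. */
function reconnectDelayMs(attempt: number): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // Bound the exponent so very large attempt counts cannot overflow.
  const exponent = Math.min(attempt - 1, 10);
  return Math.min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
}

// Attempts 1 to 6, as required by the unit-test criterion.
const reconnectDelays = [1, 2, 3, 4, 5, 6].map(reconnectDelayMs);
console.log(reconnectDelays.join(", ")); // 5000, 10000, 20000, 40000, 60000, 60000
```

Keeping the calculation free of timers makes the "attempts 1–6" unit test a plain table check.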
EP-SC-02: PDU Transmission
Description: Consume NATS dispatch commands and transmit submit_sm PDUs with correct encoding and segmentation for both single and long messages.
US-SC-006 — NATS consumer for smpp.operator.{operatorId}
Title: As a platform engineer, I want smpp-connector to consume SmsDispatchCommand messages from NATS so that outbound SMS can be dispatched.
Description: Configure a durable JetStream consumer per operator. Validate incoming SmsDispatchCommand payload. If session is not BOUND, NAK the message.
Acceptance Criteria:
- Consumer group: `smpp-connector-{operatorId}` (durable)
- `AckExplicit` policy: manual ACK only after successful `submit_sm_resp`
- NAK (requeue) if session state is not BOUND
- Invalid payload (missing `to`, `text`, `messageId`) → ACK + DLQ event (do not requeue invalid messages)
- Integration test: publish 5 commands; verify all 5 `submit_sm` PDUs sent to mock SMPP server
Story Points: 3
US-SC-007 — GSM7 and UCS2 submit_sm encoding
Title: As a platform engineer, I want smpp-connector to correctly encode submit_sm PDUs in GSM7 or UCS2 so that messages are delivered with the correct character set.
Acceptance Criteria:
- GSM7 messages use `data_coding = 0x00`; encoded with GSM 7-bit alphabet
- UCS2 messages use `data_coding = 0x08`; encoded as UTF-16BE
- Non-GSM7 characters in text automatically trigger UCS2 encoding
- `encoding` field in dispatch command overrides auto-detection
- Unit tests cover: pure ASCII, GSM7 extended chars, Arabic text (UCS2)
Story Points: 3
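One way to picture the auto-detection and UCS2 encoding rules in US-SC-007 is the sketch below. It only checks character membership in the GSM 7-bit default alphabet and extension table and encodes UTF-16BE bytes; septet packing is left out. The function names (`detectEncoding`, `dataCodingFor`, `encodeUcs2`) are illustrative assumptions, not names from the codebase.

```typescript
// GSM 7-bit default alphabet and extension table (characters only; septet packing omitted).
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\u001bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
const GSM7_EXTENDED = "^{}\\[~]|€\f";

type SmsEncoding = "GSM7" | "UCS2";

/** Any character outside the GSM7 tables switches the whole message to UCS2. */
function detectEncoding(text: string, override?: SmsEncoding): SmsEncoding {
  if (override) return override; // the dispatch command's `encoding` field wins
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (GSM7_BASIC.indexOf(ch) === -1 && GSM7_EXTENDED.indexOf(ch) === -1) {
      return "UCS2";
    }
  }
  return "GSM7";
}

/** data_coding values from the acceptance criteria. */
function dataCodingFor(encoding: SmsEncoding): number {
  return encoding === "GSM7" ? 0x00 : 0x08;
}

/** Encode text as UTF-16BE: two bytes per UTF-16 code unit, high byte first. */
function encodeUcs2(text: string): Uint8Array {
  const out = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out[i * 2] = unit >> 8;
    out[i * 2 + 1] = unit & 0xff;
  }
  return out;
}

console.log(detectEncoding("Hello, world!"), detectEncoding("سلام"));
```

Note that some ASCII characters, such as the backtick, are not part of the GSM7 tables, so "pure ASCII" test inputs should be chosen with that in mind.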
US-SC-008 — Long message support: CSMS segmentation
Title: As a platform engineer, I want smpp-connector to split long messages into concatenated SMS segments with SAR UDH headers so that MNOs that do not support TLV receive long messages correctly.
Acceptance Criteria:
- GSM7 messages > 160 chars split into 153-char segments (SAR UDH overhead)
- UCS2 messages > 70 chars split into 67-char segments
- Each segment includes correct SAR reference number, total parts, part index in UDH
- All segments sent as separate `submit_sm` PDUs with `esm_class = 0x40`
- `smpp_connector_long_message_segments_total{strategy="csms"}` incremented
- Unit test: 161-char GSM7 message → exactly 2 segments with correct UDH
Story Points: 5
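The segmentation thresholds in US-SC-008 can be illustrated with the sketch below, which builds the 6-byte concatenation UDH (IEI `0x00`, 8-bit reference) for each part. It is a simplification under stated assumptions: it counts UTF-16 code units, so it does not account for GSM7 extension characters occupying two septets or for surrogate pairs straddling a segment boundary. `splitCsms` and `CsmsSegment` are hypothetical names.

```typescript
interface CsmsSegment {
  udh: number[];    // concatenation header bytes; empty for a single-part message
  text: string;     // segment text before character encoding
  esmClass: number; // 0x40 marks a PDU whose short_message starts with a UDH
}

/** Split text into concatenated-SMS segments using the limits from US-SC-008. */
function splitCsms(text: string, encoding: "GSM7" | "UCS2", refNum: number): CsmsSegment[] {
  const singleLimit = encoding === "GSM7" ? 160 : 70;
  const segmentLimit = encoding === "GSM7" ? 153 : 67;
  if (text.length <= singleLimit) {
    return [{ udh: [], text, esmClass: 0x00 }];
  }
  const total = Math.ceil(text.length / segmentLimit);
  if (total > 255) {
    throw new RangeError("message needs more than 255 segments");
  }
  const segments: CsmsSegment[] = [];
  for (let part = 1; part <= total; part++) {
    const start = (part - 1) * segmentLimit;
    segments.push({
      // UDHL=5, IEI=0x00, IE length=3, reference, total parts, part index (1-based)
      udh: [0x05, 0x00, 0x03, refNum & 0xff, total, part],
      text: text.slice(start, start + segmentLimit),
      esmClass: 0x40,
    });
  }
  return segments;
}

const csmsParts = splitCsms("a".repeat(161), "GSM7", 0x2a);
console.log(csmsParts.length, csmsParts.map((p) => p.text.length));
```

For the 161-character case named in the criteria, this yields two segments of 153 and 8 characters that share reference `0x2a`.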
US-SC-009 — Long message support: message_payload TLV
Title: As a platform engineer, I want smpp-connector to send long messages as a single submit_sm PDU with the message_payload optional parameter so that MNOs supporting TLV receive undivided messages.
Acceptance Criteria:
- When `longMessageStrategy = TLV`: single `submit_sm` with `message_payload` TLV (tag `0x0424`)
- `short_message` field left empty when TLV is used
- `smpp_connector_long_message_segments_total{strategy="tlv"}` incremented
- Integration test: mock SMPP server verifies TLV parameter in PDU
Story Points: 3
EP-SC-03: DLR Handling
Description: Receive deliver_sm DLRs from MNOs, correlate them to internal message IDs, and publish sms.dlr.inbound events to NATS.
US-SC-010 — Message correlation persistence
Title: As a platform engineer, I want smpp-connector to write a MessageCorrelation record for each successfully sent submit_sm so that incoming DLRs can be matched to internal message IDs.
Acceptance Criteria:
- `INSERT INTO smpp.message_correlations` on `submit_sm_resp ESME_ROK`
- `operator_message_id` taken from `submit_sm_resp.message_id`
- `expires_at` set to `submitted_at + 72 hours`
- Write failure → NATS NAK (message re-dispatched); no silent loss
- Integration test: submit → verify DB record created with correct fields
Story Points: 3
US-SC-011 — deliver_sm DLR parsing and publishing
Title: As a platform engineer, I want smpp-connector to parse deliver_sm DLR PDUs and publish sms.dlr.inbound events so that dlr-processor can update message delivery status.
Acceptance Criteria:
- `deliver_sm` received → `receipted_message_id` extracted from UDH or `short_message` text
- Correlation lookup: `SELECT FROM smpp.message_correlations WHERE operator_id AND operator_message_id`
- `DlrInboundEvent` published to `sms.dlr.inbound` with correct `messageId` and `status`
- `deliver_sm_resp ESME_ROK` sent to MNO after successful publish
- SMPP `message_state` → internal `DlrStatus` mapping covers all 8 values
- Integration test: submit → receive mock DLR → verify `sms.dlr.inbound` event on NATS
Story Points: 5
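The parsing and mapping steps of US-SC-011 might look roughly like the sketch below. The receipt text layout (`id:... stat:...`) follows the commonly used format described in the SMPP 3.4 appendix on delivery receipts, which the specification itself notes is vendor-specific, so real MNOs may deviate. The internal `DlrStatus` values shown here are invented for illustration; only the eight `message_state` numbers come from SMPP 3.4.

```typescript
// Hypothetical internal statuses; the document does not list the real DlrStatus values.
type DlrStatus = "PENDING" | "DELIVERED" | "FAILED" | "EXPIRED" | "REJECTED" | "UNKNOWN";

// SMPP 3.4 message_state values 1..8 mapped to the illustrative internal status.
const MESSAGE_STATE_TO_STATUS: Record<number, DlrStatus> = {
  1: "PENDING",   // ENROUTE
  2: "DELIVERED", // DELIVERED
  3: "EXPIRED",   // EXPIRED
  4: "FAILED",    // DELETED
  5: "FAILED",    // UNDELIVERABLE
  6: "PENDING",   // ACCEPTED
  7: "UNKNOWN",   // UNKNOWN
  8: "REJECTED",  // REJECTED
};

function mapMessageState(state: number): DlrStatus {
  return MESSAGE_STATE_TO_STATUS[state] || "UNKNOWN";
}

interface ParsedReceipt {
  operatorMessageId: string; // the MNO's message id, used for the correlation lookup
  stat: string;              // textual state such as DELIVRD or EXPIRED
}

/** Extract `id:` and `stat:` from receipt text; returns null if either is missing. */
function parseReceiptText(shortMessage: string): ParsedReceipt | null {
  const id = /\bid:(\S+)/i.exec(shortMessage);
  const stat = /\bstat:(\S+)/i.exec(shortMessage);
  if (!id || !stat) return null;
  return { operatorMessageId: id[1], stat: stat[1].toUpperCase() };
}

const sampleReceipt =
  "id:0123456789 sub:001 dlvrd:001 submit date:2604181200 done date:2604181201 stat:DELIVRD err:000 text:Hello";
console.log(parseReceiptText(sampleReceipt), mapMessageState(2));
```

Returning `null` for unparseable text lets the caller decide whether to fall back to another source of the message id or to log and drop the receipt.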
EP-SC-04: TPS Throttling and Failover
Description: Enforce per-operator TPS limits via Redis, implement primary/backup operator failover, and validate observability coverage.
US-SC-012 — Redis TPS sliding-window throttling
Title: As a platform engineer, I want smpp-connector to enforce per-operator TPS limits using a Redis sliding-window counter so that the platform does not breach MNO TPS contracts.
Acceptance Criteria:
- `INCR tps:{operatorId}:{windowStart}` + `EXPIRE ... 2` pipeline per submit attempt
- `INCR` result > `tpsLimit` → NAK NATS message with 500 ms delay
- `INCR` result ≤ `tpsLimit` → proceed with `submit_sm`
- Redis unavailability → fail-open (allow submit); `warn` log; alert fires
- `smpp_connector_tps_throttle_total` counter incremented on throttle
- Integration test: tpsLimit=10, send 15 messages in 1 s → verify 5 NAKed
Story Points: 3
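The throttle decision in US-SC-012 can be sketched without a Redis dependency by standing in an in-memory counter for the `INCR` + `EXPIRE` pipeline. With the key scheme from the criteria (`tps:{operatorId}:{windowStart}`), the counter behaves as a one-second window keyed by its start. `WindowCounter`, `tpsKey`, and `checkTps` are hypothetical names, and a real Redis client would be asynchronous.

```typescript
// In-memory stand-in for the Redis INCR + EXPIRE pipeline (assumption for the sketch).
class WindowCounter {
  private readonly counts = new Map<string, { value: number; expiresAtMs: number }>();

  incrWithExpiry(key: string, ttlSeconds: number, nowMs: number): number {
    const entry = this.counts.get(key);
    if (!entry || entry.expiresAtMs <= nowMs) {
      this.counts.set(key, { value: 1, expiresAtMs: nowMs + ttlSeconds * 1000 });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }
}

/** Key for the one-second window containing `nowMs`. */
function tpsKey(operatorId: string, nowMs: number): string {
  const windowStart = Math.floor(nowMs / 1000);
  return `tps:${operatorId}:${windowStart}`;
}

type TpsDecision = "PROCEED" | "THROTTLE";

function checkTps(counter: WindowCounter, operatorId: string, tpsLimit: number, nowMs: number): TpsDecision {
  try {
    const count = counter.incrWithExpiry(tpsKey(operatorId, nowMs), 2, nowMs);
    return count > tpsLimit ? "THROTTLE" : "PROCEED";
  } catch {
    // Fail open if the counter store is unavailable, as the criteria require.
    return "PROCEED";
  }
}

// 15 submits inside the same second with tpsLimit = 10.
const tpsCounter = new WindowCounter();
let tpsThrottled = 0;
for (let i = 0; i < 15; i++) {
  if (checkTps(tpsCounter, "awcc", 10, 1000000 + i * 50) === "THROTTLE") tpsThrottled++;
}
console.log(tpsThrottled); // 5
```

Passing `nowMs` in explicitly keeps the decision deterministic in tests; production code would read the clock at the call site.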
US-SC-013 — Primary/backup operator failover
Title: As a platform engineer, I want smpp-connector to initiate a bind with the backup operator when the primary operator becomes UNBOUND so that message delivery continues during primary operator outages.
Acceptance Criteria:
- Primary UNBOUND → read `backupOperatorId` from credentials response
- Initiate bind with backup operator if `backupOperatorId` is configured
- `operator.health UNBOUND` published for primary; `operator.health BOUND` published for backup
- On primary recovery (successful rebind): `operator.health FAILBACK` published
- Integration test: primary mock SMPP server drops → verify backup bind + health events
Story Points: 5
US-SC-014 — Complete Prometheus metrics instrumentation
Title: As a platform engineer, I want all defined Prometheus metrics emitting correctly so that I can monitor SMPP session health and throughput in Grafana.
Acceptance Criteria:
- All 12 metrics from OBSERVABILITY.md registered and emitting
- `smpp_connector_sessions_total` gauge correctly reflects BOUND/UNBOUND/CONNECTING counts
- `SmppConnectorSessionUnbound` alert tested and fires in staging
- Grafana dashboard `dashboards/smpp-connector.json` created with session state + TPS panels
Story Points: 3
US-SC-015 — Kubernetes deployment with readiness and liveness probes
Title: As a platform engineer, I want smpp-connector deployed to Kubernetes with correct health probes so that Kubernetes correctly manages pod lifecycle during SMPP session establishment.
Description: The /ready endpoint must reflect SMPP session state. A pod should not receive NATS messages until at least one operator session is BOUND.
Acceptance Criteria:
- `/ready` returns 503 until at least one operator session is BOUND
- `/ready` returns 200 when ≥ 1 operator session is BOUND
- Rolling update completes without dropping in-flight NATS messages (verified with load test)
- NetworkPolicy egress allows TCP to each MNO's SMPP IP:port
- Pre-stop hook sends `unbind` PDU to gracefully close sessions before pod termination
Story Points: 3
EP-SC-05: Per-MNO × Per-Direction Connector Pool with Bind Affinity
Description: Replace the single StatefulSet model with one Deployment per (MNO × bind-direction) so that each pod owns exactly one bind, with stable identity, anti-affinity, and dedicated egress IPs whitelisted by the MNO. Aligns with ADR-0004 §7.
US-SC-016 — Per-MNO Deployment manifests with stable bind identity
Title: As a platform engineer, I want one Deployment per smpp-connector-{mno}-{direction} so that bind-affinity and per-MNO scaling are independent.
Acceptance Criteria:
- Deployments named `smpp-connector-awcc-tx`, `-awcc-rx`, `-awcc-trx`, etc. for each MNO × direction.
- Each pod reads its `MNO_ID` and `BIND_DIRECTION` from env (set by Deployment).
- Pod anti-affinity: same `(mno, direction)` cannot colocate on the same node.
- Each pod registers itself in a NATS KV bucket `smpp.bind.owners.{mno}.{direction}` with TTL 30 s; only one owner at a time per (mno, direction, bindId).
- Helm/Kustomize chart parameterises MNO list; new MNO requires only adding to `values.yaml`.
Story Points: 8
US-SC-017 — Bind-id partitioning across replicas
Title: As a platform engineer, I want bind-ids partitioned across replicas of the same Deployment so that horizontal scale-out doesn't cause bind collisions.
Acceptance Criteria:
- Each pod claims a bindId range based on its `POD_ORDINAL` (StatefulSet) or NATS KV lock (Deployment).
- Bind-id reclaim on pod loss within 30 s (TTL).
- Metric `smpp_bind_owner_changes_total{mno,direction}` increments on bind ownership change.
- Integration test: scale 2→4 pods; verify all 4 bindIds are bound across distinct pods.
Story Points: 5
US-SC-018 — Lane-specific bind subjects
Title: As the smpp-connector pool, I want lane-specific NATS subjects so that priority-lane pods bind separately and don't share queue heads.
Acceptance Criteria:
- Subjects: `lane.p0.smpp.{mno}.{direction}`, `lane.p1.smpp.{mno}.{direction}`, etc.
- Per-lane consumer ack/lag metrics independent.
- Pod owns one or more lanes via env `LANES=P0,P1` (default: all lanes).
- Reserved P0/P1 pods may exist with `LANES=P0,P1` only, separate Deployment.
Story Points: 5
US-SC-019 — Bind-affinity NATS queue group of 1
Title: As the smpp-connector pool, I want each (mno, direction, bindId) NATS subject to be consumed by exactly one pod so that PDU sequence order is preserved per bind.
Acceptance Criteria:
- Subject `smpp.{mno}.{direction}.{bindId}.dispatch` consumer queue group `qg-{mno}-{direction}-{bindId}` (size 1 enforced).
- Pod ownership tied to bind ownership (US-SC-017).
- On bind ownership transfer, JetStream consumer is recreated atomically to avoid double-consumption.
Story Points: 5
US-SC-020 — Bind ownership reconciliation cron
Title: As an SRE, I want a reconciler that periodically checks no (mno, direction, bindId) is owned by zero or multiple pods.
Acceptance Criteria:
- Cron every 30 s: list NATS KV entries; reconcile against Deployment pods.
- Orphan ownership (TTL expired but not cleaned) is forcibly deleted.
- Metric `smpp_bind_orphan_total` increments on cleanup.
- Alert `SmppBindOrphan` if > 0 sustained for 5 min.
Story Points: 3
US-SC-021 — Per-MNO connector pool dashboard
Title: As the NOC, I want a Grafana panel showing each (mno, direction) bind status, owner pod, lag, and TPS.
Acceptance Criteria:
- Panel displays binding state machine per MNO.
- Drill-down to per-bind metrics.
- Linked alert `SmppMnoBindAllUnbound{mno}` if all binds for an MNO are UNBOUND for 60 s.
Story Points: 3
EP-SC-06: Per-Bind Submit Window, Sequence-Number Manager, and Concatenation Buffer
Description: Each SMPP bind requires its own (a) submit window (max in-flight PDUs awaiting submit_sm_resp), (b) monotonic sequence-number counter (survives pod restart with 60 s warm-up), (c) concatenation buffer (UDH 0x00 / 0x08) for inbound MO concat reassembly.
US-SC-022 — Per-bind sequence-number manager (Redis-backed)
Title: As an smpp-connector pod, I want a per-bind monotonic sequence-number counter persisted in Redis so that pod restart doesn't reuse sequence numbers within the SMPP bind_seq window.
Acceptance Criteria:
- Key `smpp:seq:{mno}:{direction}:{bindId}` is `INCR`-incremented in Redis.
- Value modulo 0x7FFFFFFF (per SMPP spec).
- On pod start, current Redis value loaded; in-process counter is `Redis + 1000` to avoid races during 60 s warm-up.
- Sequence reset only on `unbind`/rebind.
- Metric `smpp_seq_skip_total{mno,direction,bindId}` (should always be 0; skip indicates overflow).
Story Points: 5
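The wrap-around and warm-start rules in US-SC-022 can be sketched as two small functions. SMPP 3.4 allows sequence numbers from `0x00000001` to `0x7FFFFFFF`, so this sketch assumes the "modulo 0x7FFFFFFF" criterion is meant to wrap back to 1 rather than to 0. The names `toSequenceNumber` and `warmStartCounter` are illustrative.

```typescript
const MAX_SEQUENCE = 0x7fffffff; // highest sequence_number allowed by SMPP 3.4
const WARMUP_OFFSET = 1000;      // head-room added after restart (from the criteria)

/** Map an ever-growing counter (1, 2, 3, ...) onto the range 1..0x7FFFFFFF. */
function toSequenceNumber(counter: number): number {
  if (!Number.isInteger(counter) || counter < 1) {
    throw new RangeError("counter must be a positive integer");
  }
  // Shift to zero-based before the modulo so the result never lands on 0.
  return ((counter - 1) % MAX_SEQUENCE) + 1;
}

/** In-process starting counter after a pod restart: persisted value plus head-room. */
function warmStartCounter(persistedValue: number): number {
  return persistedValue + WARMUP_OFFSET;
}

console.log(toSequenceNumber(1), toSequenceNumber(MAX_SEQUENCE), toSequenceNumber(MAX_SEQUENCE + 1));
// 1 2147483647 1
```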
US-SC-023 — Per-bind submit window enforcement
Title: As an smpp-connector pod, I want a per-bind submit window of N (default 100, configurable per MNO) so that the platform doesn't overwhelm an MNO peer.
Acceptance Criteria:
- In-flight counter incremented on each `submit_sm`; decremented on `submit_sm_resp` or timeout (30 s).
- When counter ≥ window: NATS message NAKed with delay 100 ms.
- Window value loaded from `ops.smpp_binds.window` per bind; reload every 60 s.
- Metric `smpp_window_inflight{mno,direction,bindId}` gauge.
- Integration test: window=10; submit 20 messages → first 10 sent, next 10 NAKed.
Story Points: 5
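A minimal sketch of the in-flight accounting from US-SC-023 follows. The `SubmitWindow` class is hypothetical; timers for the 30 s response timeout and the NATS NAK call are left to the caller.

```typescript
class SubmitWindow {
  private inFlight = 0;
  private windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  /** Reserve a slot for one submit_sm; false means the caller should NAK with a delay. */
  tryAcquire(): boolean {
    if (this.inFlight >= this.windowSize) return false;
    this.inFlight += 1;
    return true;
  }

  /** Call on submit_sm_resp or when the response timeout fires. */
  release(): void {
    if (this.inFlight > 0) this.inFlight -= 1;
  }

  /** Apply a reloaded window value (for example from the periodic config refresh). */
  resize(windowSize: number): void {
    this.windowSize = windowSize;
  }

  get current(): number {
    return this.inFlight;
  }
}

// window = 10, 20 submits with no responses yet: 10 sent, 10 NAKed.
const submitWindow = new SubmitWindow(10);
let windowSent = 0;
let windowNaked = 0;
for (let i = 0; i < 20; i++) {
  if (submitWindow.tryAcquire()) windowSent++;
  else windowNaked++;
}
console.log(windowSent, windowNaked); // 10 10
```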
US-SC-024 — Adaptive window learning
Title: As an smpp-connector pod, I want the submit window to auto-tune downward when the MNO returns ESME_RTHROTTLED so that we converge to the MNO's actual capacity.
Acceptance Criteria:
- On `ESME_RTHROTTLED` response, in-process window halved (min 10).
- Window grows additively (+5 every 60 s) until reaching configured max.
- Window value reported via metric `smpp_window_dynamic{mno,direction,bindId}`.
- Tests cover: throttle event halves, growth back to baseline.
Story Points: 5
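The halve-on-throttle, grow-by-5 rule in US-SC-024 is an additive-increase/multiplicative-decrease scheme. The sketch below shows only the arithmetic; `AdaptiveWindow` is a hypothetical class, the 60 s tick is assumed to be driven by an external timer, and the configured maximum is assumed to be at least the floor of 10.

```typescript
class AdaptiveWindow {
  private size: number;
  private readonly maxSize: number;
  private readonly minSize = 10;   // floor from the criteria
  private readonly growthStep = 5; // additive increase per tick

  constructor(configuredMax: number) {
    this.maxSize = configuredMax;
    this.size = configuredMax;
  }

  /** Multiplicative decrease on ESME_RTHROTTLED. */
  onThrottled(): number {
    this.size = Math.max(this.minSize, Math.floor(this.size / 2));
    return this.size;
  }

  /** Additive increase; intended to be called once every 60 s. */
  onGrowthTick(): number {
    this.size = Math.min(this.maxSize, this.size + this.growthStep);
    return this.size;
  }

  get value(): number {
    return this.size;
  }
}

const adaptiveWindow = new AdaptiveWindow(100);
const afterThrottles = [1, 2, 3, 4].map(() => adaptiveWindow.onThrottled());
console.log(afterThrottles.join(", ")); // 50, 25, 12, 10
```

From the floor of 10, eighteen growth ticks (18 minutes at one tick per 60 s) bring the window back to a configured maximum of 100.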
US-SC-025 — Inbound MO concatenation buffer (UDH 0x00 + 0x08)
Title: As an smpp-connector pod (RX/TRX), I want to reassemble multi-segment inbound MO messages so that downstream consumers see a single logical MO.
Acceptance Criteria:
- Concat buffer keyed by `(originator, refNum, totalParts)` with TTL = `concat_window_seconds` (default 60).
- Both UDH formats supported: `0x00` (8-bit refNum) and `0x08` (16-bit refNum).
- On all parts received → publish single reassembled MO to `sms.mo.inbound`.
- On TTL expiry with missing parts → publish partial MO with `incomplete: true` flag.
- Metric `smpp_concat_complete_total`, `smpp_concat_partial_total`.
Story Points: 8
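The reassembly flow in US-SC-025 can be sketched in two parts: parsing the concatenation information element from the UDH, and buffering parts until the set is complete or the TTL lapses. This is an in-memory illustration only; `parseConcatUdh` and `ConcatBuffer` are hypothetical names, and publishing to NATS and metric updates are left to the caller.

```typescript
interface ConcatInfo { refNum: number; totalParts: number; partIndex: number; headerLength: number; }

/** Find a concatenation IE (0x00: 8-bit ref, 0x08: 16-bit ref) in a payload that starts with a UDH. */
function parseConcatUdh(payload: Uint8Array): ConcatInfo | null {
  if (payload.length < 1) return null;
  const end = 1 + payload[0]; // first byte is the UDH length
  if (end > payload.length) return null;
  let pos = 1;
  while (pos + 2 <= end) {
    const iei = payload[pos];
    const len = payload[pos + 1];
    const data = pos + 2;
    if (data + len > end) return null;
    if (iei === 0x00 && len === 3) {
      return { refNum: payload[data], totalParts: payload[data + 1], partIndex: payload[data + 2], headerLength: end };
    }
    if (iei === 0x08 && len === 4) {
      const refNum = (payload[data] << 8) | payload[data + 1];
      return { refNum, totalParts: payload[data + 2], partIndex: payload[data + 3], headerLength: end };
    }
    pos = data + len; // skip unrelated information elements
  }
  return null;
}

interface PendingMo { originator: string; parts: Map<number, string>; totalParts: number; expiresAtMs: number; }
interface ReassembledMo { originator: string; text: string; incomplete: boolean; }

function joinParts(entry: PendingMo): string {
  let text = "";
  for (let i = 1; i <= entry.totalParts; i++) text += entry.parts.get(i) || "";
  return text;
}

class ConcatBuffer {
  private readonly pending = new Map<string, PendingMo>();
  private readonly ttlMs: number;

  constructor(ttlSeconds = 60) {
    this.ttlMs = ttlSeconds * 1000;
  }

  /** Store one part; returns the full message once every part has arrived. */
  add(originator: string, info: ConcatInfo, text: string, nowMs: number): ReassembledMo | null {
    const key = `${originator}|${info.refNum}|${info.totalParts}`;
    let entry = this.pending.get(key);
    if (!entry) {
      entry = { originator, parts: new Map<number, string>(), totalParts: info.totalParts, expiresAtMs: nowMs + this.ttlMs };
      this.pending.set(key, entry);
    }
    entry.parts.set(info.partIndex, text);
    if (entry.parts.size < entry.totalParts) return null;
    this.pending.delete(key);
    return { originator, text: joinParts(entry), incomplete: false };
  }

  /** Remove expired entries and return them as partial messages. */
  expire(nowMs: number): ReassembledMo[] {
    const partials: ReassembledMo[] = [];
    this.pending.forEach((entry, key) => {
      if (entry.expiresAtMs > nowMs) return;
      this.pending.delete(key);
      partials.push({ originator: entry.originator, text: joinParts(entry), incomplete: true });
    });
    return partials;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

The `size` getter is the natural source for a buffer-fill gauge, and parts are joined by index so out-of-order arrival does not affect the result.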
US-SC-026 — Outbound concatenation reference-number manager
Title: As an smpp-connector pod, I want unique 8-bit (or 16-bit) reference numbers per concatenated outbound message so that recipient handsets reassemble correctly.
Acceptance Criteria:
- Per-bind `INCR` Redis counter `smpp:concat:ref:{bindId}` modulo 256 (8-bit).
- 16-bit ref allocated when payload `> 65535` 8-bit refs in 60 s.
- All segments of one outbound message share the same ref; carried in UDH.
- Test: send a 4-segment GSM7 message; verify all 4 PDUs share one refNum.
Story Points: 3
US-SC-027 — Concat-buffer pressure metric and alert
Title: As the NOC, I want a metric on the concat-buffer fill so that runaway buffer growth is detected.
Acceptance Criteria:
- Gauge `smpp_concat_buffer_size{mno,bindId}`.
- Alert `SmppConcatBufferLeak` if > 10 000 sustained for 5 min.
Story Points: 2
EP-SC-07: ESME_R* Error Taxonomy
Description: Comprehensive handling of MNO error responses with appropriate behaviour per error class.
US-SC-028 — ESME_RTHROTTLED back-off (linked to US-SC-024)
Title: As an smpp-connector pod, I want ESME_RTHROTTLED to halve the window AND requeue the failed message after 500 ms.
Acceptance Criteria:
- On RTHROTTLED → window halved (US-SC-024); failing message NAKed with 500 ms delay.
- Repeated RTHROTTLED in 60 s → emit `smpp.bind.degraded` event.
Story Points: 3
US-SC-029 — ESME_RSUBMITFAIL half-close
Title: As an smpp-connector pod, I want ESME_RSUBMITFAIL on consecutive submits to half-close the bind (stop submitting; keep RX) so we don't hammer a confused MNO.
Acceptance Criteria:
- 3 consecutive RSUBMITFAIL → bind enters `HALF_CLOSED_TX` state.
- Half-closed bind continues to receive DLR but stops submitting.
- Recovery: 60 s test-submit at 1/min until success → restore to BOUND.
- Metric `smpp_bind_half_closed_total`.
Story Points: 3
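The consecutive-failure rule in US-SC-029 could be tracked along the lines of the sketch below. `SubmitFailTracker` is a hypothetical name, and two behaviours are assumptions not stated in the criteria: any response other than `ESME_RSUBMITFAIL` breaks the streak, and only `ESME_ROK` (for example from a recovery test-submit) restores BOUND.

```typescript
type TxState = "BOUND" | "HALF_CLOSED_TX";

class SubmitFailTracker {
  private consecutiveFailures = 0;
  private txState: TxState = "BOUND";
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  /** Feed every submit_sm_resp status through this method. */
  onSubmitResp(esmeStatus: string): TxState {
    if (esmeStatus === "ESME_RSUBMITFAIL") {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.threshold) this.txState = "HALF_CLOSED_TX";
      return this.txState;
    }
    // Assumption: any other status breaks the streak of consecutive failures.
    this.consecutiveFailures = 0;
    if (esmeStatus === "ESME_ROK") this.txState = "BOUND";
    return this.txState;
  }

  /** Submitting is allowed only while the TX side is not half-closed. */
  canSubmit(): boolean {
    return this.txState === "BOUND";
  }
}

const failTracker = new SubmitFailTracker();
["ESME_RSUBMITFAIL", "ESME_RSUBMITFAIL", "ESME_RSUBMITFAIL"].forEach((s) => failTracker.onSubmitResp(s));
console.log(failTracker.canSubmit()); // false
```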
US-SC-030 — ESME_RMSGQFUL drain mode
Title: As an smpp-connector pod, I want ESME_RMSGQFUL (queue full at MNO) to enter drain mode (stop submitting, wait for capacity) so we don't lose messages to repeated rejection.
Acceptance Criteria:
- On RMSGQFUL → bind state = `DRAINING`; in-flight finishes.
- Resume submit when in-flight = 0 + 30 s wait.
- Customer-portal alert if drain duration > 5 min.
Story Points: 3
US-SC-031 — Comprehensive error → metric + log mapping
Title: As an SRE, I want every SMPP error code mapped to a Prometheus label and logged with structured context so that incidents can be triaged quickly.
Acceptance Criteria:
- Counter `smpp_response_status_total{mno,direction,bindId,esme_status}`.
- Pino log with `traceId`, `messageId`, `esme_status`, `pduSeqNum`.
- Runbook entry per error class.
Story Points: 2
EP-SC-08: Dedicated Egress IP Pools per MNO
Description: MNOs whitelist source IPs. The platform must present a stable per-MNO egress and prevent collateral damage from one MNO's traffic affecting another.
US-SC-032 — Per-MNO egress IP allocation
Title: As a platform engineer, I want each MNO bind to egress through a dedicated IP from a per-MNO pool so that an MNO can rate-limit our IPs without affecting other MNOs.
Acceptance Criteria:
- NodePool `np-data` has dedicated egress NAT gateway per MNO.
- Pod env `EGRESS_IP_POOL=awcc` selects the right egress.
- NetworkPolicy egress allows TCP only to the MNO's published SMPP IP:port.
- Verified via `curl ifconfig.me` from inside pod returns the expected IP.
Story Points: 5
US-SC-033 — Egress-IP rotation and MNO notification
Title: As a network admin, I want a documented procedure for rotating an egress IP with MNO coordination so that whitelist changes are predictable.
Acceptance Criteria:
- Runbook `runbooks/smpp-egress-ip-rotation.md` exists.
- Rotation requires: MNO notification (T-72h), new IP added to allow-list, old IP retained 30 d, then removed.
- Automation: `scripts/rotate-egress-ip.sh` updates `np-data` NAT.
Story Points: 3
US-SC-034 — Source-IP egress dashboard
Title: As the NOC, I want a panel showing per-MNO egress IP and bytes/s so that egress saturation is visible.
Acceptance Criteria:
- Panel: per-MNO egress bytes/s; per-MNO active connections.
- Alert if any egress NAT exceeds 80% bandwidth budget.
Story Points: 2
EP-SC-09: Pashto/Dari/Arabic UCS-2 Conformance Suite
Description: SMPP encoding for Pashto/Dari/Arabic must be UCS-2; segment counts and content fingerprints must match billing's calculation. Inconsistency causes billing complaints.
US-SC-035 — UCS-2 round-trip conformance test
Title: As an SMPP engineer, I want a conformance test that round-trips Pashto and Dari content through GSM7-detection → UCS-2 encoding → decode back so that no characters are corrupted.
Acceptance Criteria:
- Test corpus: 100 strings each of Pashto, Dari, Arabic, Persian-with-Arabic-numerals, mixed Latin+Arabic.
- Each string: encode → decode → assert byte-equal to source.
- Edge cases: ZWJ, ZWNJ, Tatweel, Arabic-Indic digits.
- Test runs in CI; fails build on regression.
Story Points: 5
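A round-trip check of the kind US-SC-035 describes could be structured as below. The sample strings are illustrative stand-ins for the 100-string corpora, and `encodeUtf16Be` / `decodeUtf16Be` are hypothetical helper names; the encoder simply writes each UTF-16 code unit high byte first.

```typescript
/** Encode text as UTF-16BE bytes (the UCS2 wire format used for data_coding 0x08). */
function encodeUtf16Be(text: string): Uint8Array {
  const out = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out[i * 2] = unit >> 8;
    out[i * 2 + 1] = unit & 0xff;
  }
  return out;
}

/** Decode UTF-16BE bytes back into a string. */
function decodeUtf16Be(bytes: Uint8Array): string {
  if (bytes.length % 2 !== 0) throw new RangeError("UTF-16BE payload must have even length");
  let text = "";
  for (let i = 0; i < bytes.length; i += 2) {
    text += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return text;
}

// Illustrative samples covering the edge cases named in the criteria.
const roundTripSamples: string[] = [
  "پښتو",                 // Pashto
  "می\u200Cخواهم",        // Dari with ZWNJ (U+200C)
  "لا\u200Dـ",            // ZWJ (U+200D) followed by Tatweel (U+0640)
  "٠١٢٣٤٥٦٧٨٩",           // Arabic-Indic digits
  "Order 42: سلام",       // mixed Latin + Arabic
];

const roundTripFailures = roundTripSamples.filter((s) => decodeUtf16Be(encodeUtf16Be(s)) !== s);
console.log(roundTripFailures.length); // 0
```

In CI, a non-empty `roundTripFailures` list would fail the build and report the offending strings.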
US-SC-036 — Segment-count parity with billing
Title: As a billing engineer, I want smpp-connector's segment-count calculation to match billing-service's calculation byte-for-byte so that billing matches actual SMPP segments dispatched.
Acceptance Criteria:
- Shared library `@ghasi/sms-segment-counter` published; both services consume the same version.
- CI contract test: 1000-string corpus → both services compute identical `(encoding, segmentCount)`.
- Fingerprint included in NATS payload `sms.dispatch.command.segmentFingerprint`.
Story Points: 5
US-SC-037 — Auto-detection threshold tuning
Title: As an SMPP engineer, I want auto-detection of UCS-2 to trigger correctly even with one non-GSM7 character so that the recipient is not garbled.
Acceptance Criteria:
- Detection: any character not in GSM7 default + extension table → UCS-2.
- Test: ASCII string with one Pashto character → UCS-2 segments (60 chars max).
- Override: `encoding=GSM7_FORCE` lossy with replacement char (?); audit-logged.
- Telemetry: `smpp_encoding_total{encoding,detection=auto|forced}`.
Story Points: 3