smpp-connector — Epics & User Stories

Last updated: 2026-04-18
Story point scale: 1 (trivial) · 2 (small) · 3 (medium) · 5 (large) · 8 (XL)


EP-SC-01: SMPP Session Management

Description: Establish, maintain, and recover persistent SMPP 3.4 sessions with MNO SMPP servers. Includes bind modes, enquire_link heartbeat, exponential backoff reconnection, and health event publishing.


US-SC-001 — Bootstrap NestJS SMPP application

Title: As a platform engineer, I want a NestJS application bootstrapped with NATS JetStream consumer transport and an SMPP session manager so that the service can start connecting to MNOs.

Description: Create the NestJS app with the NATS microservice transport and an initial SmppSessionManager module. Include configuration loading for OPERATOR_IDS and basic startup logging.

Acceptance Criteria:

  • Service starts and logs all configured operatorId values
  • NATS connection established with NKey credentials on startup
  • /health returns 200 immediately after process start
  • ESLint and TypeScript compiler report 0 errors

Story Points: 3


US-SC-002 — SMPP bind_transceiver with credential fetch

Title: As a platform engineer, I want smpp-connector to fetch SMPP credentials from operator-management-service and establish a bind_transceiver session so that outbound messages can be transmitted.

Description: Implement SmppSessionManager.connect(operatorId). Fetch credentials via HTTP, open TCP socket, send bind_transceiver PDU, await bind_transceiver_resp, transition state to BOUND.

Acceptance Criteria:

  • Credentials fetched via GET /internal/operators/{id}/smpp-credentials
  • bind_transceiver PDU sent with correct system_id, password, system_type
  • State transitions DISCONNECTED → CONNECTING → BOUND on success
  • operator.health BOUND event published on successful bind
  • Integration test: mock SMPP server accepts bind; verify state = BOUND

Story Points: 5


US-SC-003 — Fallback to bind_transmitter + bind_receiver

Title: As a platform engineer, I want smpp-connector to fall back to separate TX/RX bind modes when bind_transceiver is rejected so that compatibility with older MNO SMPP servers is maintained.

Acceptance Criteria:

  • On ESME_RINVCMDID response to bind_transceiver → retry with bind_transmitter
  • Open separate bind_receiver connection for DLR receipts in TX-only mode
  • Both connections tracked in sessionMap
  • Integration test: mock SMPP server rejects transceiver; verify TX+RX fallback binds

Story Points: 3


US-SC-004 — enquire_link heartbeat

Title: As a platform engineer, I want smpp-connector to send enquire_link PDUs every 30 s and mark the session UNBOUND after a 10 s response timeout so that silent session deaths are detected quickly.

Acceptance Criteria:

  • enquire_link PDU sent every 30 s per active session
  • 10 s response timeout configured per heartbeat
  • On timeout: session state → UNBOUND; operator.health UNBOUND event published
  • On timeout: reconnect sequence (UC-01) initiated
  • Metric smpp_connector_enquire_link_timeout_total incremented
  • Unit test verifies 10 s timer fires and triggers disconnect

Story Points: 3


US-SC-005 — Exponential backoff reconnection

Title: As a platform engineer, I want smpp-connector to reconnect with exponential backoff (5 s → 10 s → 20 s → 40 s → 60 s max) so that MNO SMPP servers are not overwhelmed during outages.

Acceptance Criteria:

  • Backoff delay sequence: 5 000, 10 000, 20 000, 40 000, 60 000 ms (capped)
  • reconnectAttempts counter increments on each failed attempt
  • reconnectAttempts resets to 0 after successful bind
  • smpp_connector_reconnect_attempts_total counter increments
  • Unit tests validate delay calculation for attempts 1–6

Story Points: 2
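
Implementation sketch for the delay calculation above. The function name and constants are hypothetical; only the 5 000 → 60 000 ms sequence comes from the acceptance criteria.

```typescript
// Hypothetical helper: delay before reconnect attempt `attempt` (1-based).
const BASE_DELAY_MS = 5000;
const MAX_DELAY_MS = 60000;

function backoffDelayMs(attempt: number): number {
  if (Math.floor(attempt) !== attempt || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // Clamp the exponent so very large attempt counts cannot overflow.
  const exponent = Math.min(attempt - 1, 10);
  // Doubles per attempt (5 s, 10 s, 20 s, 40 s), then stays at the 60 s cap.
  return Math.min(BASE_DELAY_MS * Math.pow(2, exponent), MAX_DELAY_MS);
}
```

Attempt 5 would compute 80 000 ms uncapped, so the cap is what produces the 60 000 ms step; attempt 6 and later stay at the cap.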


EP-SC-02: PDU Transmission

Description: Consume NATS dispatch commands and transmit submit_sm PDUs with correct encoding and segmentation for both single and long messages.


US-SC-006 — NATS consumer for smpp.operator.{operatorId}

Title: As a platform engineer, I want smpp-connector to consume SmsDispatchCommand messages from NATS so that outbound SMS can be dispatched.

Description: Configure a durable JetStream consumer per operator. Validate incoming SmsDispatchCommand payload. If session is not BOUND, NAK the message.

Acceptance Criteria:

  • Consumer group: smpp-connector-{operatorId} (durable)
  • AckExplicit policy — manual ACK only after successful submit_sm_resp
  • NAK (requeue) if session state is not BOUND
  • Invalid payload (missing to, text, messageId) → ACK + DLQ event (do not requeue invalid messages)
  • Integration test: publish 5 commands; verify all 5 submit_sm PDUs sent to mock SMPP server

Story Points: 3


US-SC-007 — GSM7 and UCS2 submit_sm encoding

Title: As a platform engineer, I want smpp-connector to correctly encode submit_sm PDUs in GSM7 or UCS2 so that messages are delivered with the correct character set.

Acceptance Criteria:

  • GSM7 messages use data_coding = 0x00; encoded with GSM 7-bit alphabet
  • UCS2 messages use data_coding = 0x08; encoded as UTF-16BE
  • Non-GSM7 characters in text automatically trigger UCS2 encoding
  • encoding field in dispatch command overrides auto-detection
  • Unit tests cover: pure ASCII, GSM7 extended chars, Arabic text (UCS2)

Story Points: 3
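
Implementation sketch for auto-detection and UCS2 encoding. Function and constant names are hypothetical. The character tables follow the GSM 7-bit default alphabet and its extension table; the ESC code point is deliberately left out of the basic table because it is not a printable character.

```typescript
// GSM 7-bit default alphabet (printable characters only, ESC omitted).
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
// Extension table: each of these costs two septets (ESC + char).
const GSM7_EXTENSION = "\f^{}\\[~]|€";

type SmsEncoding = "GSM7" | "UCS2";

// Any character outside both tables forces UCS2.
function detectEncoding(text: string): SmsEncoding {
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (GSM7_BASIC.indexOf(ch) === -1 && GSM7_EXTENSION.indexOf(ch) === -1) {
      return "UCS2";
    }
  }
  return "GSM7";
}

// data_coding values from the acceptance criteria.
function dataCodingFor(encoding: SmsEncoding): number {
  return encoding === "GSM7" ? 0x00 : 0x08;
}

// UTF-16BE: each UTF-16 code unit is written high byte first.
function encodeUcs2(text: string): Uint8Array {
  const out = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out[i * 2] = unit >> 8;
    out[i * 2 + 1] = unit & 0xff;
  }
  return out;
}
```

An explicit encoding field in the dispatch command would bypass detectEncoding entirely; that override is not shown here.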


US-SC-008 — Long message support: CSMS segmentation

Title: As a platform engineer, I want smpp-connector to split long messages into concatenated SMS segments with SAR UDH headers so that MNOs that do not support TLV receive long messages correctly.

Acceptance Criteria:

  • GSM7 messages > 160 chars split into 153-char segments (SAR UDH overhead)
  • UCS2 messages > 70 chars split into 67-char segments
  • Each segment includes correct SAR reference number, total parts, part index in UDH
  • All segments sent as separate submit_sm PDUs with esm_class = 0x40
  • smpp_connector_long_message_segments_total{strategy="csms"} incremented
  • Unit test: 161-char GSM7 message → exactly 2 segments with correct UDH

Story Points: 5
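
Implementation sketch for GSM7 segmentation. It assumes the text has already been validated as GSM7-encodable and uses the 8-bit concatenation information element (IEI 0x00). All names are hypothetical. Limits are counted in septets, so extension-table characters cost two.

```typescript
const GSM7_EXTENSION_CHARS = "\f^{}\\[~]|€";
const SINGLE_LIMIT = 160;  // septets in an unsegmented message
const SEGMENT_LIMIT = 153; // septets per segment once the UDH is present

function septetCost(ch: string): number {
  return GSM7_EXTENSION_CHARS.indexOf(ch) !== -1 ? 2 : 1;
}

// Splits GSM7 text into segments; never splits inside an escape pair
// because the cost of a character is added as a whole.
function splitGsm7(text: string): string[] {
  const chars = text.split("");
  let total = 0;
  for (let i = 0; i < chars.length; i++) total += septetCost(chars[i]);
  if (total <= SINGLE_LIMIT) return [text];

  const segments: string[] = [];
  let current = "";
  let used = 0;
  for (let i = 0; i < chars.length; i++) {
    const cost = septetCost(chars[i]);
    if (used + cost > SEGMENT_LIMIT) {
      segments.push(current);
      current = "";
      used = 0;
    }
    current += chars[i];
    used += cost;
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// 6-byte UDH: length 0x05, IEI 0x00, IE length 0x03, ref, total, index.
function buildConcatUdh(ref: number, total: number, index: number): Uint8Array {
  if (total < 2 || total > 255 || index < 1 || index > total) {
    throw new RangeError("invalid total/index for concatenated SMS");
  }
  return new Uint8Array([0x05, 0x00, 0x03, ref & 0xff, total, index]);
}
```

Each segment would then go out as its own submit_sm with esm_class = 0x40 and the UDH prepended to the encoded payload.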


US-SC-009 — Long message support: message_payload TLV

Title: As a platform engineer, I want smpp-connector to send long messages as a single submit_sm PDU with the message_payload optional parameter so that MNOs supporting TLV receive undivided messages.

Acceptance Criteria:

  • When longMessageStrategy = TLV: single submit_sm with message_payload TLV (tag 0x0424)
  • short_message field left empty when TLV is used
  • smpp_connector_long_message_segments_total{strategy="tlv"} incremented
  • Integration test: mock SMPP server verifies TLV parameter in PDU

Story Points: 3


EP-SC-03: DLR Handling

Description: Receive deliver_sm DLRs from MNOs, correlate them to internal message IDs, and publish sms.dlr.inbound events to NATS.


US-SC-010 — Message correlation persistence

Title: As a platform engineer, I want smpp-connector to write a MessageCorrelation record for each successfully sent submit_sm so that incoming DLRs can be matched to internal message IDs.

Acceptance Criteria:

  • INSERT INTO smpp.message_correlations on submit_sm_resp ESME_ROK
  • operator_message_id taken from submit_sm_resp.message_id
  • expires_at set to submitted_at + 72 hours
  • Write failure → NATS NAK (message re-dispatched); no silent loss
  • Integration test: submit → verify DB record created with correct fields

Story Points: 3


US-SC-011 — deliver_sm DLR parsing and publishing

Title: As a platform engineer, I want smpp-connector to parse deliver_sm DLR PDUs and publish sms.dlr.inbound events so that dlr-processor can update message delivery status.

Acceptance Criteria:

  • deliver_sm received → operator message ID extracted from the receipted_message_id TLV or, if absent, from the id: field of the short_message receipt text
  • Correlation lookup: SELECT FROM smpp.message_correlations WHERE operator_id AND operator_message_id
  • DlrInboundEvent published to sms.dlr.inbound with correct messageId and status
  • deliver_sm_resp ESME_ROK sent to MNO after successful publish
  • SMPP message_state → internal DlrStatus mapping covers all 8 values
  • Integration test: submit → receive mock DLR → verify sms.dlr.inbound event on NATS

Story Points: 5
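
Implementation sketch for the short_message fallback path. It assumes the common "id:… stat:… err:…" receipt layout described in SMPP 3.4 Appendix B; that layout is vendor-specific, so real MNO receipts may need per-operator patterns. The function and field names are hypothetical, and the mapping to the internal DlrStatus is left out.

```typescript
interface ParsedReceipt {
  operatorMessageId: string; // value of the id: field
  stat: string;              // e.g. DELIVRD, EXPIRED, UNDELIV
  err?: string;              // network error code, if present
}

// Returns null when the text does not look like a delivery receipt.
function parseReceiptText(shortMessage: string): ParsedReceipt | null {
  const id = /\bid:(\S+)/i.exec(shortMessage);
  const stat = /\bstat:(\S+)/i.exec(shortMessage);
  if (!id || !stat) return null;
  const err = /\berr:(\S+)/i.exec(shortMessage);
  return {
    operatorMessageId: id[1],
    stat: stat[1].toUpperCase(),
    err: err ? err[1] : undefined,
  };
}
```

The returned operatorMessageId is what the correlation lookup would use together with operator_id.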


EP-SC-04: TPS Throttling and Failover

Description: Enforce per-operator TPS limits via Redis, implement primary/backup operator failover, and validate observability coverage.


US-SC-012 — Redis TPS sliding-window throttling

Title: As a platform engineer, I want smpp-connector to enforce per-operator TPS limits using a Redis sliding-window counter so that the platform does not breach MNO TPS contracts.

Acceptance Criteria:

  • INCR tps:{operatorId}:{windowStart} + EXPIRE ... 2 pipeline per submit attempt
  • INCR result > tpsLimit → NAK NATS message with 500 ms delay
  • INCR result ≤ tpsLimit → proceed with submit_sm
  • Redis unavailability → fail-open (allow submit); warn log; alert fires
  • smpp_connector_tps_throttle_total counter incremented on throttle
  • Integration test: tpsLimit=10, send 15 messages in 1 s → verify 5 NAKed

Story Points: 3
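
Implementation sketch for the throttle decision. CounterStore is a hypothetical, synchronous stand-in for the Redis pipeline so the logic can be exercised without a server. The 1-second window keyed by epoch seconds and the 2-second TTL are assumptions read from the key pattern above.

```typescript
// Hypothetical abstraction over the two Redis commands used per submit.
interface CounterStore {
  incr(key: string): number;
  expire(key: string, seconds: number): void;
}

// In-memory stand-in for tests; TTL handling is intentionally omitted.
class InMemoryCounterStore implements CounterStore {
  private counts: Record<string, number> = {};
  incr(key: string): number {
    this.counts[key] = (this.counts[key] || 0) + 1;
    return this.counts[key];
  }
  expire(_key: string, _seconds: number): void {}
}

// Assumption: windowStart is the current epoch second.
function tpsWindowKey(operatorId: string, nowMs: number): string {
  return "tps:" + operatorId + ":" + Math.floor(nowMs / 1000);
}

type ThrottleDecision = "SUBMIT" | "NAK";

function checkTps(
  store: CounterStore,
  operatorId: string,
  tpsLimit: number,
  nowMs: number,
): ThrottleDecision {
  try {
    const key = tpsWindowKey(operatorId, nowMs);
    const count = store.incr(key);
    store.expire(key, 2); // keep the key slightly longer than its window
    return count > tpsLimit ? "NAK" : "SUBMIT";
  } catch (err) {
    // Fail-open: a store outage must not block traffic.
    return "SUBMIT";
  }
}
```

A real caller would also log a warning and increment smpp_connector_tps_throttle_total on "NAK"; both are omitted to keep the sketch small.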


US-SC-013 — Primary/backup operator failover

Title: As a platform engineer, I want smpp-connector to initiate a bind with the backup operator when the primary operator becomes UNBOUND so that message delivery continues during primary operator outages.

Acceptance Criteria:

  • Primary UNBOUND → read backupOperatorId from credentials response
  • Initiate bind with backup operator if backupOperatorId is configured
  • operator.health UNBOUND published for primary; operator.health BOUND published for backup
  • On primary recovery (successful rebind): operator.health FAILBACK published
  • Integration test: primary mock SMPP server drops → verify backup bind + health events

Story Points: 5


US-SC-014 — Complete Prometheus metrics instrumentation

Title: As a platform engineer, I want all defined Prometheus metrics emitting correctly so that I can monitor SMPP session health and throughput in Grafana.

Acceptance Criteria:

  • All 12 metrics from OBSERVABILITY.md registered and emitting
  • smpp_connector_sessions_total gauge correctly reflects BOUND/UNBOUND/CONNECTING counts
  • SmppConnectorSessionUnbound alert tested and fires in staging
  • Grafana dashboard dashboards/smpp-connector.json created with session state + TPS panels

Story Points: 3


US-SC-015 — Kubernetes deployment with readiness and liveness probes

Title: As a platform engineer, I want smpp-connector deployed to Kubernetes with correct health probes so that Kubernetes correctly manages pod lifecycle during SMPP session establishment.

Description: The /ready endpoint must reflect SMPP session state. A pod should not receive NATS messages until at least one operator session is BOUND.

Acceptance Criteria:

  • /ready returns 503 until at least one operator session is BOUND
  • /ready returns 200 when ≥ 1 operator session is BOUND
  • Rolling update completes without dropping in-flight NATS messages (verified with load test)
  • NetworkPolicy egress allows TCP to each MNO's SMPP IP:port
  • Pre-stop hook sends unbind PDU to gracefully close sessions before pod termination

Story Points: 3


EP-SC-05: Per-MNO × Per-Direction Connector Pool with Bind Affinity

Description: Replace the single StatefulSet model with one Deployment per (MNO × bind-direction) so that each pod owns exactly one bind, with stable identity, anti-affinity, and dedicated egress IPs whitelisted by the MNO. Aligns with ADR-0004 §7.


US-SC-016 — Per-MNO Deployment manifests with stable bind identity

Title: As a platform engineer, I want one Deployment per smpp-connector-{mno}-{direction} so that bind-affinity and per-MNO scaling are independent.

Acceptance Criteria:

  • Deployments named smpp-connector-awcc-tx, -awcc-rx, -awcc-trx, etc. for each MNO × direction.
  • Each pod reads its MNO_ID and BIND_DIRECTION from env (set by Deployment).
  • Pod anti-affinity: same (mno, direction) cannot colocate on the same node.
  • Each pod registers itself in a NATS KV bucket smpp.bind.owners.{mno}.{direction} with TTL 30 s; only one owner at a time per (mno, direction, bindId).
  • Helm/Kustomize chart parameterises MNO list; new MNO requires only adding to values.yaml.

Story Points: 8


US-SC-017 — Bind-id partitioning across replicas

Title: As a platform engineer, I want bind-ids partitioned across replicas of the same Deployment so that horizontal scale-out doesn't cause bind collisions.

Acceptance Criteria:

  • Each pod claims a bindId range based on its POD_ORDINAL (StatefulSet) or NATS KV lock (Deployment).
  • Bind-id reclaim on pod loss within 30 s (TTL).
  • Metric smpp_bind_owner_changes_total{mno,direction} increments on bind ownership change.
  • Integration test: scale 2→4 pods; verify all 4 bindIds are bound across distinct pods.

Story Points: 5


US-SC-018 — Lane-specific bind subjects

Title: As the smpp-connector pool, I want lane-specific NATS subjects so that priority-lane pods bind separately and don't share queue heads.

Acceptance Criteria:

  • Subjects: lane.p0.smpp.{mno}.{direction}, lane.p1.smpp.{mno}.{direction}, etc.
  • Per-lane consumer ack/lag metrics independent.
  • Pod owns one or more lanes via env LANES=P0,P1 (default: all lanes).
  • Reserved P0/P1 pods may exist with LANES=P0,P1 only, separate Deployment.

Story Points: 5


US-SC-019 — Bind-affinity NATS queue group of 1

Title: As the smpp-connector pool, I want each (mno, direction, bindId) NATS subject to be consumed by exactly one pod so that PDU sequence order is preserved per bind.

Acceptance Criteria:

  • Subject smpp.{mno}.{direction}.{bindId}.dispatch is consumed via queue group qg-{mno}-{direction}-{bindId} (size 1 enforced).
  • Pod ownership tied to bind ownership (US-SC-017).
  • On bind ownership transfer, JetStream consumer is recreated atomically to avoid double-consumption.

Story Points: 5


US-SC-020 — Bind ownership reconciliation cron

Title: As an SRE, I want a reconciler that periodically checks no (mno, direction, bindId) is owned by zero or multiple pods.

Acceptance Criteria:

  • Cron every 30 s: list NATS KV entries; reconcile against Deployment pods.
  • Orphan ownership (TTL expired but not cleaned) is forcibly deleted.
  • Metric smpp_bind_orphan_total increments on cleanup.
  • Alert SmppBindOrphan if > 0 sustained for 5 min.

Story Points: 3


US-SC-021 — Per-MNO connector pool dashboard

Title: As the NOC, I want a Grafana panel showing each (mno, direction) bind status, owner pod, lag, and TPS.

Acceptance Criteria:

  • Panel displays binding state machine per MNO.
  • Drill-down to per-bind metrics.
  • Linked alert SmppMnoBindAllUnbound{mno} if all binds for an MNO are UNBOUND for 60 s.

Story Points: 3


EP-SC-06: Per-Bind Submit Window, Sequence-Number Manager, and Concatenation Buffer

Description: Each SMPP bind requires its own (a) submit window (max in-flight PDUs awaiting submit_sm_resp), (b) monotonic sequence-number counter (survives pod restart with 60 s warm-up), (c) concatenation buffer (UDH 0x00 / 0x08) for inbound MO concat reassembly.


US-SC-022 — Per-bind sequence-number manager (Redis-backed)

Title: As an smpp-connector pod, I want a per-bind monotonic sequence-number counter persisted in Redis so that pod restart doesn't reuse sequence numbers within the SMPP bind_seq window.

Acceptance Criteria:

  • Key smpp:seq:{mno}:{direction}:{bindId} is INCR-incremented in Redis.
  • Value wrapped into the range 0x00000001–0x7FFFFFFF (the sequence_number range the SMPP spec allows).
  • On pod start, the current Redis value is loaded; the in-process counter starts at the Redis value + 1000 to avoid races during the 60 s warm-up.
  • Sequence reset only on unbind/rebind.
  • Metric smpp_seq_skip_total{mno,direction,bindId} (should always be 0; skip indicates overflow).

Story Points: 5
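
Implementation sketch for mapping the ever-growing Redis counter onto a valid sequence_number. SMPP 3.4 allows 0x00000001 to 0x7FFFFFFF, so the sketch wraps into that range and never yields 0. Function names are hypothetical.

```typescript
const MAX_SEQUENCE = 0x7fffffff;

// Maps a 1-based monotonically increasing counter onto 1..0x7FFFFFFF.
function toSequenceNumber(counter: number): number {
  if (Math.floor(counter) !== counter || counter < 1) {
    throw new RangeError("counter must be a positive integer");
  }
  return ((counter - 1) % MAX_SEQUENCE) + 1;
}

// Warm-up offset from the acceptance criteria: start 1000 ahead of Redis.
function warmStartCounter(redisValue: number): number {
  return redisValue + 1000;
}
```

A plain `counter % 0x7FFFFFFF` would return 0 once per cycle, which is why the sketch shifts by one before and after the modulo.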


US-SC-023 — Per-bind submit window enforcement

Title: As an smpp-connector pod, I want a per-bind submit window of N (default 100, configurable per MNO) so that the platform doesn't overwhelm an MNO peer.

Acceptance Criteria:

  • In-flight counter incremented on each submit_sm; decremented on submit_sm_resp or timeout (30 s).
  • When counter ≥ window: NATS message NAKed with delay 100 ms.
  • Window value loaded from ops.smpp_binds.window per bind; reload every 60 s.
  • Metric smpp_window_inflight{mno,direction,bindId} gauge.
  • Integration test: window=10; submit 20 messages → first 10 sent, next 10 NAKed.

Story Points: 5


US-SC-024 — Adaptive window learning

Title: As an smpp-connector pod, I want the submit window to auto-tune downward when the MNO returns ESME_RTHROTTLED so that we converge to the MNO's actual capacity.

Acceptance Criteria:

  • On ESME_RTHROTTLED response, in-process window halved (min 10).
  • Window grows additively (+5 every 60 s) until reaching configured max.
  • Window value reported via metric smpp_window_dynamic{mno,direction,bindId}.
  • Tests cover: throttle event halves, growth back to baseline.

Story Points: 5
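
Implementation sketch of the halve-on-throttle, grow-additively behaviour. The class and method names are hypothetical; the minimum of 10 and the step of 5 come from the acceptance criteria. The caller is assumed to invoke onGrowthTick once every 60 s.

```typescript
class AdaptiveWindow {
  private current: number;
  private readonly max: number;
  private readonly min: number;
  private readonly step: number;

  constructor(configuredMax: number, min: number = 10, step: number = 5) {
    this.max = configuredMax;
    // Never let the floor exceed the configured maximum.
    this.min = Math.min(min, configuredMax);
    this.step = step;
    this.current = configuredMax;
  }

  value(): number {
    return this.current;
  }

  // Multiplicative decrease on ESME_RTHROTTLED.
  onThrottled(): number {
    this.current = Math.max(this.min, Math.floor(this.current / 2));
    return this.current;
  }

  // Additive increase, called by a 60 s timer.
  onGrowthTick(): number {
    this.current = Math.min(this.max, this.current + this.step);
    return this.current;
  }
}
```

With a configured max of 100, four consecutive throttle responses take the window 100 → 50 → 25 → 12 → 10, after which it needs 18 growth ticks (18 minutes) to return to 100.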


US-SC-025 — Inbound MO concatenation buffer (UDH 0x00 + 0x08)

Title: As an smpp-connector pod (RX/TRX), I want to reassemble multi-segment inbound MO messages so that downstream consumers see a single logical MO.

Acceptance Criteria:

  • Concat buffer keyed by (originator, refNum, totalParts) with TTL = concat_window_seconds (default 60).
  • Both UDH formats supported: 0x00 (8-bit refNum) and 0x08 (16-bit refNum).
  • On all parts received → publish single reassembled MO to sms.mo.inbound.
  • On TTL expiry with missing parts → publish partial MO with incomplete: true flag.
  • Metric smpp_concat_complete_total, smpp_concat_partial_total.

Story Points: 8
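
Implementation sketch for UDH parsing and reassembly. It covers both concatenation information elements (IEI 0x00 with an 8-bit reference, IEI 0x08 with a 16-bit reference). All names are hypothetical, and TTL expiry with partial publishing is left out to keep the example short.

```typescript
interface ConcatInfo {
  refNum: number;
  totalParts: number;
  partIndex: number; // 1-based
}

// udh starts at the UDHL byte. Returns null if no concat IE is found.
function parseConcatUdh(udh: Uint8Array): ConcatInfo | null {
  if (udh.length < 1) return null;
  const end = Math.min(udh.length, 1 + udh[0]);
  let i = 1;
  while (i + 2 <= end) {
    const iei = udh[i];
    const ieLen = udh[i + 1];
    const d = i + 2;
    if (d + ieLen > end) return null; // truncated element
    if (iei === 0x00 && ieLen === 3) {
      return { refNum: udh[d], totalParts: udh[d + 1], partIndex: udh[d + 2] };
    }
    if (iei === 0x08 && ieLen === 4) {
      return {
        refNum: (udh[d] << 8) | udh[d + 1],
        totalParts: udh[d + 2],
        partIndex: udh[d + 3],
      };
    }
    i = d + ieLen; // skip unrelated elements
  }
  return null;
}

class ConcatBuffer {
  private pending: Record<string, string[]> = {};

  // Returns the reassembled text once every part has arrived, else null.
  add(originator: string, info: ConcatInfo, text: string): string | null {
    if (info.partIndex < 1 || info.partIndex > info.totalParts) return null;
    const key = originator + "|" + info.refNum + "|" + info.totalParts;
    const parts = this.pending[key] || (this.pending[key] = []);
    parts[info.partIndex - 1] = text;
    for (let n = 0; n < info.totalParts; n++) {
      if (parts[n] === undefined) return null;
    }
    delete this.pending[key];
    return parts.slice(0, info.totalParts).join("");
  }
}
```

Parts may arrive out of order; the buffer places each one by its index, so the joined text is always in sequence.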


US-SC-026 — Outbound concatenation reference-number manager

Title: As an smpp-connector pod, I want unique 8-bit (or 16-bit) reference numbers per concatenated outbound message so that recipient handsets reassemble correctly.

Acceptance Criteria:

  • Per-bind INCR Redis counter smpp:concat:ref:{bindId} modulo 256 (8-bit).
  • 16-bit ref allocated when more than 256 8-bit refs are consumed within 60 s (8-bit reference space exhausted).
  • All segments of one outbound message share the same ref; carried in UDH.
  • Test: send a 4-segment GSM7 message; verify all 4 PDUs share one refNum.

Story Points: 3


US-SC-027 — Concat-buffer pressure metric and alert

Title: As the NOC, I want a metric on the concat-buffer fill so that runaway buffer growth is detected.

Acceptance Criteria:

  • Gauge smpp_concat_buffer_size{mno,bindId}.
  • Alert SmppConcatBufferLeak if > 10 000 sustained for 5 min.

Story Points: 2


EP-SC-07: ESME_R* Error Taxonomy

Description: Comprehensive handling of MNO error responses with appropriate behaviour per error class.


US-SC-028 — ESME_RTHROTTLED back-off (linked to US-SC-024)

Title: As an smpp-connector pod, I want ESME_RTHROTTLED to halve the window AND requeue the failed message after 500 ms.

Acceptance Criteria:

  • On RTHROTTLED → window halved (US-SC-024); failing message NAKed with 500 ms delay.
  • Repeated RTHROTTLED in 60 s → emit smpp.bind.degraded event.

Story Points: 3


US-SC-029 — ESME_RSUBMITFAIL half-close

Title: As an smpp-connector pod, I want ESME_RSUBMITFAIL on consecutive submits to half-close the bind (stop submitting; keep RX) so we don't hammer a confused MNO.

Acceptance Criteria:

  • 3 consecutive RSUBMITFAIL → bind enters HALF_CLOSED_TX state.
  • Half-closed bind continues to receive DLR but stops submitting.
  • Recovery: one test submit every 60 s (1/min) until success → restore to BOUND.
  • Metric smpp_bind_half_closed_total.

Story Points: 3


US-SC-030 — ESME_RMSGQFUL drain mode

Title: As an smpp-connector pod, I want ESME_RMSGQFUL (queue full at MNO) to enter drain mode (stop submitting, wait for capacity) so we don't lose messages to repeated rejection.

Acceptance Criteria:

  • On RMSGQFUL → bind state = DRAINING; in-flight finishes.
  • Resume submitting 30 s after in-flight reaches 0.
  • Customer-portal alert if drain duration > 5 min.

Story Points: 3


US-SC-031 — Comprehensive error → metric + log mapping

Title: As an SRE, I want every SMPP error code mapped to a Prometheus label and logged with structured context so that incidents can be triaged quickly.

Acceptance Criteria:

  • Counter smpp_response_status_total{mno,direction,bindId,esme_status}.
  • Pino log with traceId, messageId, esme_status, pduSeqNum.
  • Runbook entry per error class.

Story Points: 2


EP-SC-08: Dedicated Egress IP Pools per MNO

Description: MNOs whitelist source IPs. The platform must present a stable per-MNO egress and prevent collateral damage from one MNO's traffic affecting another.


US-SC-032 — Per-MNO egress IP allocation

Title: As a platform engineer, I want each MNO bind to egress through a dedicated IP from a per-MNO pool so that an MNO can rate-limit our IPs without affecting other MNOs.

Acceptance Criteria:

  • NodePool np-data has dedicated egress NAT gateway per MNO.
  • Pod env EGRESS_IP_POOL=awcc selects the right egress.
  • NetworkPolicy egress allows TCP only to the MNO's published SMPP IP:port.
  • Verification: curl ifconfig.me from inside the pod returns the expected IP.

Story Points: 5


US-SC-033 — Egress-IP rotation and MNO notification

Title: As a network admin, I want a documented procedure for rotating an egress IP with MNO coordination so that whitelist changes are predictable.

Acceptance Criteria:

  • Runbook runbooks/smpp-egress-ip-rotation.md exists.
  • Rotation requires: MNO notification (T-72h), new IP added to allow-list, old IP retained 30 d, then removed.
  • Automation: scripts/rotate-egress-ip.sh updates np-data NAT.

Story Points: 3


US-SC-034 — Source-IP egress dashboard

Title: As the NOC, I want a panel showing per-MNO egress IP and bytes/s so that egress saturation is visible.

Acceptance Criteria:

  • Panel: per-MNO egress bytes/s; per-MNO active connections.
  • Alert if any egress NAT exceeds 80% bandwidth budget.

Story Points: 2


EP-SC-09: Pashto/Dari/Arabic UCS-2 Conformance Suite

Description: SMPP encoding for Pashto/Dari/Arabic must be UCS-2; segment counts and content fingerprints must match billing's calculation. Inconsistency causes billing complaints.


US-SC-035 — UCS-2 round-trip conformance test

Title: As an SMPP engineer, I want a conformance test that round-trips Pashto and Dari content through GSM7-detection → UCS-2 encoding → decode back so that no characters are corrupted.

Acceptance Criteria:

  • Test corpus: 100 strings each of Pashto, Dari, Arabic, Persian-with-Arabic-numerals, mixed Latin+Arabic.
  • Each string: encode → decode → assert byte-equal to source.
  • Edge cases: ZWJ, ZWNJ, Tatweel, Arabic-Indic digits.
  • Test runs in CI; fails build on regression.

Story Points: 5


US-SC-036 — Segment-count parity with billing

Title: As a billing engineer, I want smpp-connector's segment-count calculation to match billing-service's calculation byte-for-byte so that billing matches actual SMPP segments dispatched.

Acceptance Criteria:

  • Shared library @ghasi/sms-segment-counter published; both services consume the same version.
  • CI contract test: 1000-string corpus → both services compute identical (encoding, segmentCount).
  • Fingerprint included in NATS payload sms.dispatch.command.segmentFingerprint.

Story Points: 5


US-SC-037 — Auto-detection threshold tuning

Title: As an SMPP engineer, I want auto-detection of UCS-2 to trigger correctly even with one non-GSM7 character so that the recipient is not garbled.

Acceptance Criteria:

  • Detection: any character not in GSM7 default + extension table → UCS-2.
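  • Test: ASCII string with one Pashto character → UCS-2 encoding (70 chars max in a single segment; 67 per segment when concatenated).
  • Override: encoding=GSM7_FORCE performs lossy GSM7 encoding, substituting the replacement char (?) for unmappable characters; audit-logged.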
  • Telemetry: smpp_encoding_total{encoding,detection=auto|forced}.

Story Points: 3