
SMS Orchestrator — Service Overview

Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · ADR-0001 Kong edge · 01 Architecture

1. Purpose

sms-orchestrator is the central processing engine for outbound SMS in Ghasi-SMS-Gateway. It:

  • Accepts HTTP submissions of outbound SMS from external clients (via Kong). Per ADR-0001, this responsibility moved to this service from the retired custom api-gateway.
  • Validates, normalizes, and idempotency-checks each request at the HTTP boundary.
  • Publishes sms.outbound.request to NATS JetStream for asynchronous pipeline processing.
  • Consumes the same subject (it is both producer and consumer) and executes the five-stage pipeline: idempotency → validation → routing → operator publish → state persistence.
  • Emits status domain events (sms.events.status) on every state transition, and DLQ events (sms.outbound.deadletter) on terminal failures.
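
The five stages run strictly in order for each consumed sms.outbound.request message, and a stage that rejects the message stops the chain. The sketch below shows only that control flow. It assumes hypothetical `Stage`, `StageResult` and `PipelineOutcome` types and a simplified message shape. The stage bodies are stubs, not the service's real idempotency, validation, routing, publish or persistence logic.

```typescript
// Hypothetical message shape; field names are illustrative, not the real schema.
interface OutboundRequest {
  messageId: string;
  to: string;
  body: string;
}

// A stage either passes the (possibly enriched) message on or rejects it.
type StageResult =
  | { ok: true; msg: OutboundRequest }
  | { ok: false; reason: string };

type Stage = (msg: OutboundRequest) => StageResult;

interface PipelineOutcome {
  completed: string[]; // stages that succeeded, in execution order
  failure?: { stage: string; reason: string };
}

// Runs stages in declaration order and short-circuits on the first rejection.
function runPipeline(stages: Array<[string, Stage]>, msg: OutboundRequest): PipelineOutcome {
  const completed: string[] = [];
  let current = msg;
  for (const [name, stage] of stages) {
    const result = stage(current);
    if (!result.ok) {
      return { completed, failure: { stage: name, reason: result.reason } };
    }
    completed.push(name);
    current = result.msg;
  }
  return { completed };
}

// Stub stages in the documented order; real ones would call Redis,
// routing-engine (gRPC), NATS and PostgreSQL.
const pass: Stage = (msg) => ({ ok: true, msg });

const STAGES: Array<[string, Stage]> = [
  ["idempotency", pass],
  ["validation", (msg) =>
    msg.body.length > 0 ? { ok: true, msg } : { ok: false, reason: "empty body" }],
  ["routing", pass],
  ["operatorPublish", pass],
  ["statePersistence", pass],
];
```

Returning the completed stage names makes it visible where a message stopped. In the real pipeline, a rejection would feed the retry or DLQ path instead of simply being returned.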

2. Bounded Context

Outbound Messaging Pipeline: the authoritative owner of every SMS message's lifecycle, from HTTP accept through carrier-bound publish. Classified as Core (business-critical; financial correctness and SLA depend on it).

3. Responsibilities

| Area | What the orchestrator owns |
|---|---|
| HTTP submit API | POST /v1/sms/send, POST /v1/sms/bulk, GET /v1/sms/{messageId}, fronted by Kong |
| Idempotency | Idempotency-Key header replay window (Redis, 48 h TTL) |
| Input validation | Zod schema for the SMS payload; E.164 phone normalization; body length and segment count |
| Submission accept | Persists the sms_messages row with status QUEUED and publishes sms.outbound.request to NATS |
| Pipeline orchestration | Idempotency check → domain validation → routing (gRPC to routing-engine) → operator publish → PG state update |
| Retry logic | Exponential backoff (1s → 2s → 4s), max 3 attempts; attempt_count persisted in PG |
| DLQ routing | Publish sms.outbound.deadletter on terminal failure; ACK the original NATS message |
| Status events | Publish sms.events.status on every state transition |
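
The input-validation row combines E.164 normalization with a segment count. Below is a minimal sketch in plain TypeScript. The service itself uses a Zod schema, which is left out here so the example has no dependencies. The normalization rules (stripping spaces, dashes, dots and parentheses, and treating a leading `00` as `+`) and both helper names are assumptions. The per-segment limits of 160/153 (GSM-7) and 70/67 (UCS-2) are the standard single and concatenated SMS values.

```typescript
// E.164: '+' then up to 15 digits, first digit 1-9.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical normalization rules; align with the real API contract.
function normalizeE164(raw: string): string | null {
  let s = raw.trim().replace(/[\s\-().]/g, "");
  if (s.startsWith("00")) s = "+" + s.slice(2); // international-prefix form
  return E164.test(s) ? s : null;
}

// GSM 03.38 default alphabet (1 septet each) and extension table (2 septets: ESC + char).
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
const GSM7_EXT = "\f^{}\\[~]|€";

interface SegmentInfo {
  encoding: "GSM-7" | "UCS-2";
  units: number; // septets for GSM-7, UTF-16 code units for UCS-2
  segments: number;
}

function countSegments(body: string): SegmentInfo {
  let septets = 0;
  for (const ch of body) {
    if (GSM7_BASIC.includes(ch)) {
      septets += 1;
    } else if (GSM7_EXT.includes(ch)) {
      septets += 2;
    } else {
      // Any character outside GSM-7 forces UCS-2 for the whole message.
      const units = body.length;
      return { encoding: "UCS-2", units, segments: units <= 70 ? 1 : Math.ceil(units / 67) };
    }
  }
  return { encoding: "GSM-7", units: septets, segments: septets <= 160 ? 1 : Math.ceil(septets / 153) };
}
```

Counting septets rather than characters is why `€` or `{` costs two units. A production counter would also avoid splitting an escape sequence or a UTF-16 surrogate pair across a segment boundary, which this sketch ignores.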

4. Non-Responsibilities

| Area | Owner | Why not orchestrator |
|---|---|---|
| AuthN (JWT / API key) | Kong (+ auth-service JWKS/consumer lookup) | Edge gateway layer |
| Rate limiting | Kong (rate-limiting-advanced plugin, Redis) | Edge concern |
| TLS termination | Cloudflare + Kong | Edge concern |
| Route selection (operator) | routing-engine (gRPC) | Separate bounded context (LCR + QoS) |
| SMPP submission | smpp-connector | Separate bounded context (protocol) |
| DLR correlation | dlr-processor | Separate bounded context (ingest) |
| Rating + billing | billing-service | Consumes domain events |
| Customer webhook delivery | webhook-dispatcher | Consumes domain events |

5. Dependencies

| Dependency | Kind | Purpose |
|---|---|---|
| Kong Gateway | Upstream (HTTP) | Proxies /v1/sms/* routes to this service |
| NATS JetStream | Event bus | Publish + consume sms.outbound.request; publish smpp.operator.*, sms.events.status, sms.outbound.retry, sms.outbound.deadletter |
| PostgreSQL (schema orch) | Data store | sms_messages, idempotency_keys |
| Redis | Cache | Idempotency-Key storage (orch:idem:*) |
| routing-engine | gRPC | Operator selection (P95 ≤ 50 ms) |
| auth-service | HTTP | Account metadata lookup when needed |
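
Idempotency relies on a single atomic Redis `SET` with `NX` and a 48 h TTL under the `orch:idem:*` prefix, and it fails open if Redis is unavailable (see Key Design Decisions). The minimal sketch below uses a hypothetical `IdemStore` interface and an in-memory stand-in so it runs without Redis. The key layout `orch:idem:<accountId>:<idempotencyKey>` and the stored value (the first request's message ID) are assumptions, not the documented schema. With ioredis, the claim would roughly map to `redis.set(key, value, "EX", ttl, "NX")`, which resolves to `"OK"` when the key is set and to `null` when it already exists.

```typescript
// Hypothetical store interface; a Redis-backed version would wrap SET ... EX ... NX and GET.
interface IdemStore {
  setNx(key: string, value: string, ttlSeconds: number): Promise<boolean>;
  get(key: string): Promise<string | null>;
}

const IDEM_TTL_SECONDS = 48 * 60 * 60; // 48 h replay window

type IdemResult =
  | { kind: "new"; messageId: string }
  | { kind: "replay"; messageId: string }
  | { kind: "unchecked"; messageId: string }; // no claim recorded (fail open)

// Assumed key layout, scoped per account so different clients cannot collide.
function idemKey(accountId: string, idempotencyKey: string): string {
  return `orch:idem:${accountId}:${idempotencyKey}`;
}

async function claimIdempotencyKey(
  store: IdemStore,
  accountId: string,
  idempotencyKey: string,
  newMessageId: string,
): Promise<IdemResult> {
  const key = idemKey(accountId, idempotencyKey);
  try {
    if (await store.setNx(key, newMessageId, IDEM_TTL_SECONDS)) {
      return { kind: "new", messageId: newMessageId };
    }
    // Key already claimed: replay the message ID stored by the first request.
    const existing = await store.get(key);
    if (existing !== null) return { kind: "replay", messageId: existing };
    // Key expired between SET NX and GET; proceed without a claim.
    return { kind: "unchecked", messageId: newMessageId };
  } catch {
    // Store unavailable: fail open, per the documented trade-off.
    return { kind: "unchecked", messageId: newMessageId };
  }
}

// In-memory stand-in for local tests only; TTL is checked lazily on access.
class InMemoryIdemStore implements IdemStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async setNx(key: string, value: string, ttlSeconds: number): Promise<boolean> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return false;
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return true;
  }

  async get(key: string): Promise<string | null> {
    const hit = this.entries.get(key);
    return hit && hit.expiresAt > Date.now() ? hit.value : null;
  }
}
```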

6. High-Level Flow

7. Key Design Decisions

| Decision | Rationale | Trade-off |
|---|---|---|
| HTTP submit lives here, not in an edge gateway | Kong cannot cleanly own idempotency storage and Zod validation; these are application concerns | One more responsibility for this service; acceptable, since it already owns the message lifecycle |
| Async pipeline after HTTP accept | Sub-second 202 response; the pipeline absorbs operator latency | Extra NATS hop; mitigated by idempotency replay |
| Redis SET NX for idempotency, not PG | Atomic and fast (~1 ms) | Redis outage → idempotency fails open; mitigated by NATS AckWait |
| Application-level retry (not NATS MaxDeliver) | NATS lacks per-attempt backoff | Must persist attempt_count to survive restarts |
| Status update AFTER operator publish ACK | Operator publish is the point of no return | Small crash window → reconciliation job detects rows stuck in ROUTED |
| Synchronous gRPC to routing-engine | Tight latency budget (P95 500 ms end to end) | routing-engine outage handled by orchestrator retry |
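
Because retries are handled at the application level, the decision after a failed attempt depends only on the persisted attempt_count. The minimal sketch below reads the documented 1s → 2s → 4s schedule as the delays before retries 1, 2 and 3. If "max 3 attempts" counts the first try, the last entry should be dropped. The function name `onAttemptFailed` is hypothetical. Routing retries through the sms.outbound.retry subject listed under Dependencies is also an assumption.

```typescript
// Assumed reading of "1s → 2s → 4s": delay before retry 1, 2 and 3.
const BACKOFF_MS: readonly number[] = [1000, 2000, 4000];

type RetryDecision =
  | { action: "retry"; subject: "sms.outbound.retry"; delayMs: number; attemptCount: number }
  | { action: "deadletter"; subject: "sms.outbound.deadletter"; attemptCount: number };

// failedAttempts: attempts that have failed so far, including this one.
// The caller persists it as attempt_count, so a restart resumes the schedule
// from the stored value instead of starting over.
function onAttemptFailed(failedAttempts: number): RetryDecision {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new RangeError(`invalid attempt count: ${failedAttempts}`);
  }
  const delayMs = BACKOFF_MS[failedAttempts - 1];
  if (delayMs !== undefined) {
    return { action: "retry", subject: "sms.outbound.retry", delayMs, attemptCount: failedAttempts };
  }
  // Schedule exhausted: terminal failure. Per the DLQ routing row, the
  // original NATS message is ACKed once the dead-letter event is published.
  return { action: "deadletter", subject: "sms.outbound.deadletter", attemptCount: failedAttempts };
}
```

Keeping the decision a pure function of the stored counter makes it easy to test. It also keeps the schedule identical whether a retry follows an in-process failure or a restart.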

8. Status

Pipeline + pre-Kong HTTP submit: design approved (HTTP submit moved here from the retired api-gateway per ADR-0001). See MIGRATION_PLAN for the cutover and SERVICE_READINESS for the gate checklist.