SMS Orchestrator — Deployment Topology

Status: populated | Owner: SRE | Last updated: 2026-04-18

1. Runtime

  • Node.js 22 LTS, TypeScript 5.x strict.
  • NestJS 10 + @nestjs/platform-fastify.
  • Image: distroless Node, multi-stage build, non-root.
  • Listen ports: 3010 HTTP, 9464 metrics, 9100 gRPC (reserved for future use).

2. Kubernetes

| Object | Value |
| --- | --- |
| Kind | Deployment |
| Replicas | 3 (min) / HPA 3–10 |
| HPA triggers | CPU 70%, custom metric orch_submit_inflight (target 60 per pod) |
| Rolling update | maxSurge=25%, maxUnavailable=0 |
| PDB | minAvailable=2 |
| Resources | requests: 500m CPU / 512Mi; limits: 2 CPU / 1Gi |
| ServiceAccount | sms-orchestrator (Vault K8s auth) |
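
To make the HPA and rolling-update settings concrete, the sketch below applies the standard Kubernetes scaling formula, desired = ceil(current × currentValue / targetValue), to the two triggers listed above and clamps the result to the 3–10 range. It is a rough illustration only: the sample load figures are invented, and the HPA's tolerance band and stabilization windows are ignored.

```typescript
// Sketch of the replica count the HPA would propose for this Deployment.
// Bounds and targets come from the table above; sample inputs are made up.

interface MetricSample {
  name: string;
  current: number; // observed average per pod
  target: number;  // HPA target per pod
}

const MIN_REPLICAS = 3;
const MAX_REPLICAS = 10;

function desiredReplicas(currentReplicas: number, metrics: MetricSample[]): number {
  // With several metrics, the HPA acts on the largest proposal.
  const proposals = metrics.map((m) =>
    Math.ceil(currentReplicas * (m.current / m.target)),
  );
  const wanted = Math.max(...proposals);
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, wanted));
}

// maxSurge percentages round up, so 25% of 3 replicas allows 1 extra pod.
function surgePods(replicas: number, maxSurgePercent: number): number {
  return Math.ceil((replicas * maxSurgePercent) / 100);
}

// Hypothetical load: CPU at 85% (target 70%), 90 in-flight submits per pod (target 60).
const example = desiredReplicas(4, [
  { name: "cpu", current: 85, target: 70 },
  { name: "orch_submit_inflight", current: 90, target: 60 },
]);
console.log(example);          // 6
console.log(surgePods(3, 25)); // 1
```

With maxUnavailable=0 and a surge of one pod at the minimum replica count, a rollout at 3 replicas proceeds one pod at a time and never drops below 3 ready pods, which also keeps it clear of the PDB's minAvailable=2.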

3. Dependencies (at runtime)

| Dep | Mode | Notes |
| --- | --- | --- |
| PostgreSQL | Primary writer + read replicas | Connection pool 20 per pod |
| Redis | Cluster (6 nodes) | Dedicated namespace; pipeline commands |
| NATS JetStream | 3-node cluster | Durable consumer orch-consumer |
| routing-engine | gRPC | Client-side LB, timeout 200 ms, retry 1 |
| Vault | Sidecar | JWT signing keys fetched at startup |
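
The per-pod figures above translate into fleet-wide budgets that are worth checking against database and latency limits. The following back-of-envelope sketch assumes a single pool of 20 PostgreSQL connections per pod (the table does not say whether the writer and replicas have separate pools) and assumes the routing-engine retry fires immediately with no backoff; both are assumptions, not documented behavior.

```typescript
// Back-of-envelope budgets derived from the dependency table above.
const POOL_PER_POD = 20;        // PostgreSQL connections per pod
const ROUTING_TIMEOUT_MS = 200; // per-attempt gRPC timeout
const ROUTING_RETRIES = 1;      // retries after the first attempt

// Total PostgreSQL connections the Deployment can open at a given replica count.
function totalPgConnections(replicas: number): number {
  return replicas * POOL_PER_POD;
}

// Upper bound on time spent waiting for routing-engine, assuming every attempt
// runs to its full timeout and the retry starts immediately (no backoff).
function worstCaseRoutingMs(): number {
  return ROUTING_TIMEOUT_MS * (1 + ROUTING_RETRIES);
}

console.log(totalPgConnections(3));  // 60
console.log(totalPgConnections(10)); // 200
console.log(worstCaseRoutingMs());   // 400
```

Under these assumptions, PostgreSQL would need headroom for roughly 60 to 200 connections across the HPA range, plus whatever is open during a surge.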

4. Kong routes (upstream of this service)

See api-gateway/API_CONTRACTS. Relevant routes:

  • /v1/sms/send → sms-orchestrator:3010
  • /v1/sms/bulk → sms-orchestrator:3010
  • /v1/sms/{id} → sms-orchestrator:3010
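
As an illustration of how the three routes above map request paths to the same upstream, here is a small path-matching sketch. It is not Kong configuration and does not describe Kong's router: the `ROUTES` table, the `resolveRoute` helper, and the rule that static paths are checked before the `{id}` template are all assumptions made for the example. The authoritative definitions are in api-gateway/API_CONTRACTS.

```typescript
// Illustrative route table mirroring the list above.
const UPSTREAM = "sms-orchestrator:3010";

interface Route {
  template: string;
  upstream: string;
}

// Static paths come first so "send" and "bulk" are not captured as an {id}.
const ROUTES: Route[] = [
  { template: "/v1/sms/send", upstream: UPSTREAM },
  { template: "/v1/sms/bulk", upstream: UPSTREAM },
  { template: "/v1/sms/{id}", upstream: UPSTREAM },
];

// Turn "/v1/sms/{id}" into an anchored regex where each placeholder
// matches exactly one non-empty path segment.
function templateToRegex(template: string): RegExp {
  const pattern = template.replace(/\{\w+\}/g, "[^/]+");
  return new RegExp("^" + pattern + "$");
}

// Return the first route whose template matches the path, if any.
function resolveRoute(path: string): Route | undefined {
  return ROUTES.find((r) => templateToRegex(r.template).test(path));
}

console.log(resolveRoute("/v1/sms/send")?.template);   // /v1/sms/send
console.log(resolveRoute("/v1/sms/abc123")?.template); // /v1/sms/{id}
console.log(resolveRoute("/v1/other"));                // undefined
```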

5. Regions

Current: single primary region, optional DR (assumption A-001 in 01). Active-active multi-region is a future work item.

6. Scaling model

  • Submit path: CPU-bound on JSON + Zod; scales linearly.
  • Pipeline path: mostly I/O wait; limited by routing-engine + NATS publish throughput.
  • Benchmarked: 1500 submits/sec per pod at 70% CPU (4 vCPU).
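
The linear-scaling claim and the benchmark figure can be turned into a quick capacity estimate. The sketch below assumes the 1500 submits/sec per-pod figure holds in production; since it was measured on 4 vCPU and the Resources row lists a 2 CPU limit, the real per-pod number may well be lower. The function name and the sample loads are hypothetical.

```typescript
// Capacity estimate for the submit path, assuming linear scaling per pod.
const BENCH_SUBMITS_PER_POD = 1500; // submits/sec per pod at 70% CPU (benchmark)
const HPA_MIN_PODS = 3;
const HPA_MAX_PODS = 10;

interface CapacityPlan {
  pods: number;     // pods needed to serve the load at about 70% CPU
  fitsHpa: boolean; // false when the load exceeds what 10 pods can absorb
}

function podsForLoad(targetSubmitsPerSec: number): CapacityPlan {
  const needed = Math.ceil(targetSubmitsPerSec / BENCH_SUBMITS_PER_POD);
  const pods = Math.max(HPA_MIN_PODS, needed);
  return { pods, fitsHpa: pods <= HPA_MAX_PODS };
}

console.log(podsForLoad(4000));  // pods: 3, fitsHpa: true
console.log(podsForLoad(10000)); // pods: 7, fitsHpa: true
console.log(podsForLoad(20000)); // pods: 14, fitsHpa: false
```

Under these assumptions the HPA ceiling of 10 pods corresponds to about 15,000 submits/sec on the submit path. The pipeline path is bounded by routing-engine and NATS publish throughput instead, so this estimate does not apply to it.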

7. Rollout

  • Canary: 5% traffic for 30 min, 2 replicas.
  • Promote if the SLIs hold for the full canary window (error rate < 1%, P95 latency < 250 ms).
  • Rollback = revert Deployment image tag; Kong needs no change.
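
The promotion rule above can be expressed as a small gate function. This is a minimal sketch using the thresholds from this section. The `CanaryWindow` shape and the choice to roll back as soon as a threshold is breached, instead of waiting out the 30 minutes, are assumptions for illustration, not the documented pipeline behavior.

```typescript
// Canary gate using the thresholds from the rollout rules above.
interface CanaryWindow {
  errorRate: number;       // fraction of failed requests, 0..1
  p95LatencyMs: number;    // observed P95 latency in milliseconds
  observedMinutes: number; // how long the canary has taken traffic
}

const MAX_ERROR_RATE = 0.01;    // error rate must stay below 1%
const MAX_P95_MS = 250;         // P95 latency must stay below 250 ms
const MIN_OBSERVATION_MIN = 30; // canary duration

type Decision = "promote" | "rollback" | "wait";

function canaryDecision(w: CanaryWindow): Decision {
  const healthy = w.errorRate < MAX_ERROR_RATE && w.p95LatencyMs < MAX_P95_MS;
  // Assumption: a breached SLI triggers rollback immediately.
  if (!healthy) return "rollback";
  return w.observedMinutes >= MIN_OBSERVATION_MIN ? "promote" : "wait";
}

console.log(canaryDecision({ errorRate: 0.002, p95LatencyMs: 180, observedMinutes: 30 })); // promote
console.log(canaryDecision({ errorRate: 0.002, p95LatencyMs: 180, observedMinutes: 10 })); // wait
console.log(canaryDecision({ errorRate: 0.02, p95LatencyMs: 180, observedMinutes: 30 }));  // rollback
```

Both comparisons are strict, matching the "<" in the rule, so a P95 of exactly 250 ms fails the gate.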