SMS Orchestrator — Deployment Topology

Status: populated | Owner: SRE | Last updated: 2026-04-18

1. Runtime

Node.js 22 LTS, TypeScript 5.x strict.
NestJS 10 + @nestjs/platform-fastify.
Image: distroless Node, multi-stage build, non-root.
Listen ports: 3010 HTTP, 9464 metrics, 9100 reserved for gRPC (future use).

2. Kubernetes

Object	Value
Kind	Deployment
Replicas	3 (min) / HPA 3–10
HPA triggers	CPU 70%, custom metric `orch_submit_inflight` (target 60 per pod)
Rolling update	`maxSurge=25%, maxUnavailable=0`
PDB	`minAvailable=2`
Resources	`requests: 500m CPU / 512Mi`; `limits: 2 CPU / 1Gi`
ServiceAccount	`sms-orchestrator` (Vault K8s auth)
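
As a rough illustration of how the two HPA triggers in the table interact, the sketch below applies the standard Kubernetes HPA rule (ceil of current replicas times current over target, taking the highest proposal across metrics) and clamps the result to the 3–10 range. The type and function names are hypothetical, and the HPA's tolerance band and stabilization window are deliberately left out.

```typescript
// Hypothetical reading of one HPA metric, averaged per pod.
interface MetricReading {
  name: string;
  current: number; // observed average per pod
  target: number;  // target average per pod (e.g. 70 for CPU %, 60 for orch_submit_inflight)
}

const HPA_MIN_REPLICAS = 3;
const HPA_MAX_REPLICAS = 10;

// Clamp a replica count to the HPA bounds from the table.
function clampReplicas(n: number): number {
  return Math.min(HPA_MAX_REPLICAS, Math.max(HPA_MIN_REPLICAS, n));
}

// Each metric proposes ceil(currentReplicas * current / target);
// the largest proposal wins, then the bounds are applied.
function desiredReplicas(currentReplicas: number, metrics: MetricReading[]): number {
  if (metrics.length === 0) {
    return clampReplicas(currentReplicas);
  }
  const proposals = metrics.map((m) =>
    Math.ceil((currentReplicas * m.current) / m.target),
  );
  return clampReplicas(Math.max(...proposals));
}

// Example: CPU is well under target, but in-flight submits are 50% over target.
const exampleReplicas = desiredReplicas(3, [
  { name: "cpu", current: 35, target: 70 },
  { name: "orch_submit_inflight", current: 90, target: 60 },
]);
```

In this example the in-flight metric drives the decision, so `exampleReplicas` is 5 even though CPU alone would suggest scaling down.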

3. Dependencies (at runtime)

Dep	Mode	Notes
PostgreSQL	Primary writer + read replicas	Connection pool 20 per pod
Redis	Cluster (6 nodes)	Dedicated namespace; pipeline commands
NATS JetStream	3-node cluster	Durable consumer `orch-consumer`
routing-engine	gRPC	Client-side LB, timeout 200 ms, retry 1
Vault	Sidecar	JWT signing keys fetched at startup
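
The per-pod pool size matters mostly in aggregate. The sketch below estimates the peak number of PostgreSQL connections the service could open, combining the pool of 20 per pod with the replica range and `maxSurge=25%` from section 2. It assumes every pod fills its pool and that Kubernetes rounds the surge percentage up; how the pool is split between the writer and the read replicas is not stated here, so the figure is a total. Function names are illustrative.

```typescript
const POOL_PER_POD = 20;         // from the dependencies table
const MAX_SURGE_FRACTION = 0.25; // from the rolling update setting

// With maxUnavailable=0, old pods stay up until new ones are ready,
// so a rollout temporarily runs replicas + ceil(replicas * maxSurge) pods.
function podsDuringRollout(replicas: number): number {
  return replicas + Math.ceil(replicas * MAX_SURGE_FRACTION);
}

// Upper bound on open connections if every pod fills its pool.
function peakConnections(replicas: number): number {
  return podsDuringRollout(replicas) * POOL_PER_POD;
}

const steadyStateAtMax = 10 * POOL_PER_POD; // 200 connections at 10 replicas
const rolloutPeakAtMin = peakConnections(3);  // 4 pods
const rolloutPeakAtMax = peakConnections(10); // 13 pods
```

Under these assumptions a rollout at the HPA maximum peaks at 13 pods and 260 connections, which is the number to compare against the database's connection limit.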

4. Kong routes (upstream of this service)

See api-gateway/API_CONTRACTS. Relevant routes:

/v1/sms/send → sms-orchestrator:3010
/v1/sms/bulk → sms-orchestrator:3010
/v1/sms/{id} → sms-orchestrator:3010

5. Regions

Current: single primary region, optional DR (assumption A-001 in 01). Active-active multi-region is a future work item.

6. Scaling model

Submit path: CPU-bound on JSON + Zod; scales linearly.
Pipeline path: mostly I/O wait; limited by routing-engine + NATS publish throughput.
Benchmarked: 1500 submits/sec per pod at 70% CPU (4 vCPU).
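
A back-of-the-envelope capacity check, assuming the linear scaling stated above and the benchmarked 1500 submits/sec per pod. Note that the benchmark used 4 vCPU while section 2 lists a 2 CPU limit, so the real per-pod figure may be lower; the constant below is only a placeholder for whatever number holds in production. Names are hypothetical.

```typescript
const SUBMITS_PER_POD = 1500; // benchmarked figure at 70% CPU (4 vCPU)
const FLEET_MIN_PODS = 3;     // HPA lower bound
const FLEET_MAX_PODS = 10;    // HPA upper bound

interface CapacityEstimate {
  pods: number;         // pods to run, clamped to the HPA range
  withinRange: boolean; // false if the load exceeds what 10 pods can absorb
}

// Linear model: pods needed = ceil(load / per-pod throughput).
function podsForLoad(targetSubmitsPerSec: number): CapacityEstimate {
  const needed = Math.ceil(targetSubmitsPerSec / SUBMITS_PER_POD);
  const pods = Math.min(FLEET_MAX_PODS, Math.max(FLEET_MIN_PODS, needed));
  return { pods, withinRange: needed <= FLEET_MAX_PODS };
}

// Ceiling of the submit path under this model.
const maxFleetSubmitsPerSec = FLEET_MAX_PODS * SUBMITS_PER_POD;

const moderateLoad = podsForLoad(6000);  // needs 4 pods
const excessiveLoad = podsForLoad(20000); // needs 14 pods, beyond the HPA range
```

This covers the submit path only; the pipeline path is bounded by routing-engine and NATS publish throughput, which this model does not capture.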

7. Rollout

Canary: 5% of traffic for 30 min on 2 replicas.
Promote if the SLIs hold: error rate < 1% and P95 latency < 250 ms.
Rollback = revert Deployment image tag; Kong needs no change.
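
The promotion rule can be sketched as a small gate function. The thresholds and the 30-minute window come from the list above; the three-way outcome (in particular rolling back as soon as an SLI is breached, rather than waiting out the window) is an assumption, as are all type and function names.

```typescript
// Hypothetical snapshot of canary SLIs, as read from the metrics backend.
interface CanarySnapshot {
  errorRate: number;       // fraction of failed requests, e.g. 0.004 = 0.4%
  p95LatencyMs: number;    // P95 latency in milliseconds
  observedMinutes: number; // how long the canary has been taking traffic
}

type CanaryDecision = "promote" | "hold" | "rollback";

const MAX_ERROR_RATE = 0.01;  // error rate must stay below 1%
const MAX_P95_MS = 250;       // P95 latency must stay below 250 ms
const CANARY_WINDOW_MIN = 30; // observation window

function decideCanary(s: CanarySnapshot): CanaryDecision {
  const sliOk = s.errorRate < MAX_ERROR_RATE && s.p95LatencyMs < MAX_P95_MS;
  if (!sliOk) {
    return "rollback"; // assumption: a breach ends the canary early
  }
  return s.observedMinutes >= CANARY_WINDOW_MIN ? "promote" : "hold";
}

const healthyAfterWindow = decideCanary({ errorRate: 0.004, p95LatencyMs: 180, observedMinutes: 30 });
const healthyTooEarly = decideCanary({ errorRate: 0.004, p95LatencyMs: 180, observedMinutes: 10 });
const slowCanary = decideCanary({ errorRate: 0.004, p95LatencyMs: 310, observedMinutes: 30 });
```

Both comparisons are strict, so an error rate of exactly 1% or a P95 of exactly 250 ms does not pass the gate.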
