SMS Orchestrator — Deployment Topology
Status: populated · Owner: SRE · Last updated: 2026-04-18
1. Runtime
- Node.js 22 LTS, TypeScript 5.x strict.
- NestJS 10 + @nestjs/platform-fastify.
- Image: distroless Node, multi-stage build, non-root.
- Listen ports: 3010 (HTTP), 9464 (metrics), 9100 (gRPC, reserved for future use).
2. Kubernetes
| Object | Value |
|---|---|
| Kind | Deployment |
| Replicas | 3 (min) / HPA 3–10 |
| HPA triggers | CPU 70%, custom metric orch_submit_inflight (target 60 per pod) |
| Rolling update | maxSurge=25%, maxUnavailable=0 |
| PDB | minAvailable=2 |
| Resources | requests: 500m CPU / 512Mi; limits: 2 CPU / 1Gi |
| ServiceAccount | sms-orchestrator (Vault K8s auth) |
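The HPA, rolling-update and PDB settings interact, so the arithmetic is worth seeing in one place. The sketch below is a minimal model, not the controller's code. It assumes the upstream Kubernetes HPA formula (desired = ceil(current × currentMetric / targetMetric)), the default 0.1 tolerance, the largest proposal winning when several metrics are configured, and no scale-down stabilization window. The function and type names are illustrative, not part of any Kubernetes client.

```typescript
// Sketch of HPA and rollout arithmetic for this Deployment (not the real controller).
// Assumes the documented HPA formula, default tolerance 0.1, no stabilization windows.

interface MetricSample {
  name: string;    // e.g. "cpu" or "orch_submit_inflight"
  current: number; // observed per-pod average (CPU % of request, or in-flight submits)
  target: number;  // HPA target (70 for CPU, 60 for orch_submit_inflight)
}

const MIN_REPLICAS = 3;
const MAX_REPLICAS = 10;
const TOLERANCE = 0.1; // Kubernetes default HPA tolerance

const clampReplicas = (n: number): number =>
  Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, n));

function desiredReplicas(currentReplicas: number, metrics: MetricSample[]): number {
  if (metrics.length === 0) return clampReplicas(currentReplicas);
  const proposals = metrics.map((m) => {
    const ratio = m.current / m.target;
    // Inside the tolerance band the metric proposes no change.
    if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics, the HPA acts on the largest proposal.
  return clampReplicas(Math.max(...proposals));
}

// maxSurge percentages round up; maxUnavailable percentages round down.
function rollingUpdateBounds(replicas: number, maxSurgePct: number, maxUnavailablePct: number) {
  const surge = Math.ceil((replicas * maxSurgePct) / 100);
  const unavailable = Math.floor((replicas * maxUnavailablePct) / 100);
  return { maxPods: replicas + surge, minReadyPods: replicas - unavailable };
}

// A PDB with minAvailable allows this many simultaneous voluntary evictions.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}
```

Under these rules, 3 replicas with maxSurge=25% round up to one extra pod during a rollout, and maxUnavailable=0 keeps all 3 serving, so a rollout never approaches the PDB floor of 2; at 10 replicas the surge is 3 pods.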
3. Dependencies (at runtime)
| Dep | Mode | Notes |
|---|---|---|
| PostgreSQL | Primary writer + read replicas | Connection pool 20 per pod |
| Redis | Cluster (6 nodes) | Dedicated namespace; pipelined commands |
| NATS JetStream | 3-node cluster | Durable consumer orch-consumer |
| routing-engine | gRPC | Client-side load balancing; 200 ms timeout; 1 retry |
| Vault | Sidecar | Fetches JWT signing keys at startup |
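Two of the dependency settings are worth sanity-checking in code. The sketch below uses hypothetical helper names. It first computes the worst-case PostgreSQL connection count from the 20-per-pod pool, assuming every pod fills its pool against a single server and counting the extra pods created by maxSurge. It then wraps a routing-engine call with a 200 ms timeout and one retry, assuming the timeout applies per attempt; the real client may instead rely on gRPC deadlines and retry only transient status codes.

```typescript
// Sketch of two dependency settings from the table above; helper names are hypothetical.

// 1) Worst-case PostgreSQL connections opened by this service.
// Assumes every pod fills its pool, and counts surge pods that exist during a rollout.
function maxPgConnections(poolPerPod: number, maxReplicas: number, maxSurgePct: number): number {
  const surgePods = Math.ceil((maxReplicas * maxSurgePct) / 100);
  return poolPerPod * (maxReplicas + surgePods);
}

// 2) routing-engine call policy: 200 ms timeout per attempt, one retry.
// Assumption: the timeout is per attempt and any error is retried.
const ROUTING_TIMEOUT_MS = 200;
const ROUTING_RETRIES = 1;

class TimeoutError extends Error {}

async function withTimeout<T>(work: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // lets the attempt cancel its in-flight RPC
      reject(new TimeoutError(`timed out after ${ms} ms`));
    }, ms);
  });
  try {
    // Whichever settles first wins; the timer is always cleared afterwards.
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function callWithRetry<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = ROUTING_TIMEOUT_MS,
  retries: number = ROUTING_RETRIES,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(work, timeoutMs);
    } catch (err) {
      lastError = err; // a production client would retry only transient or idempotent failures
    }
  }
  throw lastError;
}
```

At the HPA ceiling of 10 pods plus 3 surge pods, the pool setting implies up to 260 connections, which PostgreSQL max_connections must accommodate alongside any other clients.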
4. Kong routes (upstream of this service)
See api-gateway/API_CONTRACTS. Relevant routes:
- /v1/sms/send → sms-orchestrator:3010
- /v1/sms/bulk → sms-orchestrator:3010
- /v1/sms/{id} → sms-orchestrator:3010
5. Regions
Current: a single primary region with optional disaster recovery (DR), per assumption A-001 in 01. Active-active multi-region is future work.
6. Scaling model
- Submit path: CPU-bound (JSON parsing and Zod validation); scales linearly with pod count.
- Pipeline path: mostly I/O wait; limited by routing-engine and NATS publish throughput.
- Benchmarked: 1500 submits/sec per pod at 70% CPU (4 vCPU).
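The scaling claims can be turned into a back-of-envelope model. This sketch assumes throughput really is linear in pod count and uses the 1500 submits/sec benchmark as the per-pod budget; that figure was measured on 4 vCPU, while the pod CPU limit in section 2 is 2 CPU, so the production per-pod number may be lower. The Little's law helper further assumes that orch_submit_inflight counts submits concurrently in flight on one pod. All names are illustrative.

```typescript
// Back-of-envelope capacity model; assumes the linear scaling stated above holds.
const SUBMITS_PER_POD = 1500; // benchmark figure at 70% CPU on 4 vCPU
const MIN_REPLICAS = 3;
const MAX_REPLICAS = 10;

function clusterCapacity(pods: number): number {
  return SUBMITS_PER_POD * pods;
}

// Pods needed to keep each pod at or below its benchmarked load, clamped to the HPA range.
// overCapacity flags rates that even MAX_REPLICAS pods cannot absorb.
function podsForRate(submitsPerSec: number): { pods: number; overCapacity: boolean } {
  const needed = Math.ceil(submitsPerSec / SUBMITS_PER_POD);
  return {
    pods: Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, needed)),
    overCapacity: needed > MAX_REPLICAS,
  };
}

// Little's law (L = λW): mean latency implied by `inflight` concurrent requests at `ratePerSec`.
// Assumes orch_submit_inflight counts concurrently in-flight submits on one pod.
function impliedLatencyMs(inflight: number, ratePerSec: number): number {
  return (inflight * 1000) / ratePerSec;
}
```

Under that assumption, 60 in-flight submits at 1500 submits/sec implies a mean submit latency of 40 ms; if submits take longer, the inflight trigger would fire before the CPU trigger does.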
7. Rollout
- Canary: 5% of traffic for 30 min on 2 replicas.
- Promote if SLIs are within bounds: error rate < 1% and P95 latency < 250 ms.
- Rollback: revert the Deployment image tag; no Kong change is needed.
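The promotion rule can be expressed as a small predicate. This is a hypothetical sketch, assuming the canary's request count, error count and per-request latency samples for the 30-minute window are available; it uses a nearest-rank P95, whereas a metrics backend would typically estimate percentiles from histograms. Both bounds are strict, so an error rate of exactly 1% blocks promotion.

```typescript
// Hypothetical canary promotion check mirroring the rule above.
interface CanaryWindow {
  requests: number;      // requests served by canary pods during the window
  errors: number;        // failed requests in the same window
  latenciesMs: number[]; // per-request latency samples
}

const MAX_ERROR_RATE = 0.01; // error rate must stay below 1%
const MAX_P95_MS = 250;      // P95 latency must stay below 250 ms

// Nearest-rank percentile; monitoring backends often interpolate instead.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no latency samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function shouldPromote(w: CanaryWindow): boolean {
  // No traffic means no evidence, so the canary is not promoted.
  if (w.requests === 0 || w.latenciesMs.length === 0) return false;
  const errorRate = w.errors / w.requests;
  return errorRate < MAX_ERROR_RATE && percentile(w.latenciesMs, 95) < MAX_P95_MS;
}
```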