
Operator Management Service — Deployment Topology

Status: populated · Owner: SRE · Last updated: 2026-04-18

1. Runtime

  • Node.js 22 LTS, TypeScript 5.x strict.
  • NestJS 10 + @nestjs/platform-fastify.
  • Image: distroless Node, multi-stage build, non-root.
  • Listen ports: 3020 HTTP (admin + internal), 9464 metrics.

2. Kubernetes

| Object | Value |
|---|---|
| Kind | Deployment |
| Replicas | 2 (min) / HPA 2–6 |
| HPA trigger | CPU 70% |
| Rolling update | maxSurge=1, maxUnavailable=0 |
| PDB | minAvailable=1 |
| Resources | requests: 250m CPU / 256Mi; limits: 1 CPU / 512Mi |
| ServiceAccount | operator-management-service (Vault K8s auth) |
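For a resource target, Kubernetes computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). It skips scaling when that ratio is within the default 0.1 tolerance of 1.0, and it clamps the result to minReplicas/maxReplicas. Utilization is measured against the pod's CPU request, so the 70% target corresponds to roughly 175m average usage per pod (70% of 250m), not 70% of the 1 CPU limit. The sketch below reproduces only that core formula. It ignores stabilization windows, scaling policies, and the special handling of unready or metric-less pods. The function and constant names are illustrative.

```typescript
// Core of the HorizontalPodAutoscaler calculation for a CPU-utilization target.
// Simplified: no stabilization window, scaling policies, or unready-pod handling.

interface HpaSpec {
  minReplicas: number;
  maxReplicas: number;
  targetCpuUtilizationPercent: number; // measured against the pod's CPU request
}

// Default HPA controller tolerance: ratios within 1.0 +/- 0.1 do not trigger scaling.
const HPA_TOLERANCE = 0.1;

/** Average CPU usage across pods as a percentage of the per-pod request (millicores). */
function cpuUtilizationPercent(usageMilli: number[], requestMilli: number): number {
  const total = usageMilli.reduce((sum, u) => sum + u, 0);
  return (total / usageMilli.length / requestMilli) * 100;
}

/** desired = ceil(current * utilization / target), clamped to [minReplicas, maxReplicas]. */
function desiredReplicas(current: number, utilizationPercent: number, spec: HpaSpec): number {
  const ratio = utilizationPercent / spec.targetCpuUtilizationPercent;
  const raw = Math.abs(ratio - 1) <= HPA_TOLERANCE ? current : Math.ceil(current * ratio);
  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, raw));
}

// Values from the table: 2 to 6 replicas, 70% CPU target, 250m CPU request.
const operatorHpa: HpaSpec = { minReplicas: 2, maxReplicas: 6, targetCpuUtilizationPercent: 70 };
const CPU_REQUEST_MILLI = 250;
```

If two pods each use 250m (100% of request), the formula asks for ceil(2 × 100/70) = 3 pods. At 20% it computes 1, and the minReplicas floor raises that to 2.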

3. Vault Agent Sidecar

annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "operator-management-service"
  vault.hashicorp.com/agent-pre-populate-only: "false"

Vault Agent renews the service's Vault token automatically. The service reads credentials on demand through the Vault HTTP API rather than from an injected secrets file, so credentials rotated at runtime are picked up without a restart.
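Here is a minimal sketch of the on-demand read. It assumes the secret/ mount is a KV version 2 engine, so secret/ops/db is fetched from GET /v1/secret/data/ops/db and the key/value pairs sit under data.data. It also assumes the caller supplies the agent-maintained token through a getToken callback. The path ops/db, getToken, FetchLike and readKvV2Secret are hypothetical names. fetch is injected so the function can be exercised without a live Vault. Reading the token again on every call lets a renewed or re-issued token take effect without a restart.

```typescript
// Minimal sketch: read a KV v2 secret on demand over the Vault HTTP API.
// Assumption: "secret/" is a KV v2 mount, so secret/<path> is read from
// GET /v1/secret/data/<path>. Token sourcing is left to the caller.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface VaultReadOptions {
  vaultAddr: string;      // hypothetical, e.g. "http://vault:8200"
  getToken: () => string; // called on every read so a renewed token is used
  fetchImpl: FetchLike;   // pass Node 22's global fetch in production
}

async function readKvV2Secret(path: string, opts: VaultReadOptions): Promise<Record<string, string>> {
  const base = opts.vaultAddr.replace(/\/+$/, ""); // tolerate a trailing slash
  const res = await opts.fetchImpl(`${base}/v1/secret/data/${path}`, {
    method: "GET",
    headers: { "X-Vault-Token": opts.getToken() },
  });
  if (!res.ok) {
    throw new Error(`Vault read of secret/${path} failed with HTTP ${res.status}`);
  }
  // KV v2 response shape: { data: { data: { ...pairs }, metadata: { ... } } }
  const body = (await res.json()) as { data?: { data?: Record<string, string> } };
  const pairs = body.data?.data;
  if (!pairs) {
    throw new Error(`Unexpected Vault response shape for secret/${path}`);
  }
  return pairs;
}
```

If the mount is KV version 1 instead, the URL drops the /data segment and the pairs sit directly under data.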

4. Dependencies (at runtime)

| Dependency | Mode | Notes |
|---|---|---|
| PostgreSQL | Primary writer + read replica | Pool 10 per pod |
| Redis | Cluster (6 nodes) | Health cache, key namespace ops: |
| NATS JetStream | 3-node cluster | Durable consumer ops-health-consumer |
| Vault | Sidecar + HTTP API | secret/ops/* policy |
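The per-pod pool size multiplies with the replica count. So the worst case for PostgreSQL is the HPA ceiling plus the rolling-update surge: (6 + 1) × 10 = 70 connections. This sketch assumes every pod fills its pool and that maxSurge adds a pod on top of the HPA maximum during a rollout. The table does not say whether the pool of 10 covers writer and replica together or each one separately. The figure therefore applies to whatever endpoint(s) that pool covers. The names are illustrative.

```typescript
// Worst-case PostgreSQL connections from this service, assuming every pod
// opens its full pool and a rollout surges on top of the HPA maximum.

interface PoolBudget {
  maxReplicas: number;    // HPA ceiling
  maxSurge: number;       // extra pods allowed during a rolling update
  poolSizePerPod: number; // connections each pod may hold
}

function peakPoolConnections(b: PoolBudget): number {
  return (b.maxReplicas + b.maxSurge) * b.poolSizePerPod;
}

// Figures from the Kubernetes and dependency tables: HPA max 6, maxSurge=1, pool 10.
const peak = peakPoolConnections({ maxReplicas: 6, maxSurge: 1, poolSizePerPod: 10 });
```

PostgreSQL's default max_connections is 100. If the server still uses that default, 70 connections from this service alone leave little headroom for other clients.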

5. Kong Routes (admin)

| Route | Upstream |
|---|---|
| /v1/admin/operators* | operator-management-service:3020 |

There is no Kong route for /v1/internal/*; those endpoints are reachable only from inside the cluster.
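/v1/internal/* must never become reachable through Kong, so the declared route paths could be checked cheaply in CI. The helper below is hypothetical and is not part of Kong or decK. It assumes the routes use plain paths, which Kong matches as string prefixes. Kong 3.x marks regex paths with a leading ~, and the helper flags any such path as unverifiable. How the list of paths is pulled from the Kong configuration is left open.

```typescript
// Hypothetical CI guard: list Kong route paths that would also route /v1/internal/*.
// Assumes plain paths are matched as string prefixes; regex paths (leading "~"
// in Kong 3.x) cannot be checked this simply and are flagged for review.

const INTERNAL_PREFIX = "/v1/internal/";

function routesExposingInternal(kongPaths: string[]): string[] {
  return kongPaths.filter((path) => {
    if (path.startsWith("~")) return true; // regex route: flag for manual review
    // A broader prefix such as "/v1" or "/v1/internal" would match internal requests...
    const coversInternal = INTERNAL_PREFIX.startsWith(path);
    // ...and so would any route declared inside /v1/internal/ itself.
    const insideInternal = path.startsWith(INTERNAL_PREFIX);
    return coversInternal || insideInternal;
  });
}
```

The single documented route, /v1/admin/operators, passes. A catch-all such as /v1 would be reported.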

6. mTLS (internal API)

The service mesh (Istio/Linkerd) enforces mTLS on port 3020 for internal callers. The namespace's PeerAuthentication policy is set to STRICT, and an AuthorizationPolicy admits only callers from the smpp-connector and routing-engine namespaces.
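Enforcement belongs to the mesh. The application may still want to log or double-check the caller's namespace. Istio workload identities follow the SPIFFE form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, and Envoy can pass the verified peer identity to the app as a URI= field in the x-forwarded-client-cert (XFCC) header. This sketch assumes the sidecar sets that header from the mTLS peer certificate and strips any client-supplied copy. Its parser is simplified and does not handle quoted values that contain commas or semicolons. It illustrates defense in depth and does not replace the AuthorizationPolicy. The function names and the "ops" namespace in the test header are hypothetical.

```typescript
// Defense-in-depth sketch: derive the caller's namespace from the Envoy XFCC
// header and compare it with the namespaces the AuthorizationPolicy allows.
// Assumes the sidecar sets XFCC from the verified peer and strips client copies.

const ALLOWED_CALLER_NAMESPACES = new Set(["smpp-connector", "routing-engine"]);

/** Namespace from an Istio-style SPIFFE ID: spiffe://<trust-domain>/ns/<ns>/sa/<sa>. */
function namespaceFromSpiffeId(id: string): string | null {
  const match = /^spiffe:\/\/[^\/]+\/ns\/([^\/]+)\/sa\/[^\/]+$/.exec(id);
  return match?.[1] ?? null;
}

/**
 * URI field of the last XFCC element (the hop nearest this pod).
 * Simplified: does not handle quoted values containing "," or ";".
 */
function peerUriFromXfcc(header: string): string | null {
  const lastElement = header.split(",").pop() ?? "";
  for (const pair of lastElement.split(";")) {
    const eq = pair.indexOf("=");
    if (eq > 0 && pair.slice(0, eq).trim() === "URI") {
      return pair.slice(eq + 1).trim();
    }
  }
  return null;
}

function isAllowedInternalCaller(xfcc: string | undefined): boolean {
  if (!xfcc) return false; // no verified identity: reject
  const uri = peerUriFromXfcc(xfcc);
  const ns = uri === null ? null : namespaceFromSpiffeId(uri);
  return ns !== null && ALLOWED_CALLER_NAMESPACES.has(ns);
}
```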

7. Regions

The HTTP layer is stateless; PostgreSQL and Redis are regional. Operator config is global: in a multi-region deployment it would be replicated via NATS geo-replication (future). Currently the service runs in a single region.

8. Rollout

  • Standard rolling deploy (no canary needed: admin-only, low-traffic service).
  • Feature flags for any breaking config schema changes.
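The Kubernetes table sets maxSurge=1 and maxUnavailable=0. Under those settings a rollout adds one new pod, waits for it to become Ready, then removes one old pod. The pod count therefore peaks at replicas + 1, and Ready pods never drop below replicas. The simplified model below shows this under stated assumptions: surge and unavailability are absolute counts, new pods always become Ready before old ones are removed, and failed probes, minReadySeconds and progress deadlines are ignored. The rollout is bounded by these two fields rather than by the PDB, which is aimed at voluntary disruptions such as node drains. simulateRollingUpdate is a hypothetical name.

```typescript
// Simplified model of a Deployment rolling update with absolute maxSurge and
// maxUnavailable. Assumes new pods become Ready before old pods are removed;
// ignores failed readiness probes, minReadySeconds, and progress deadlines.

interface RolloutState {
  oldPods: number;
  newPods: number;
  total: number;     // pods that exist
  available: number; // pods that are Ready
}

function simulateRollingUpdate(replicas: number, maxSurge: number, maxUnavailable: number): RolloutState[] {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new Error("maxSurge and maxUnavailable cannot both be 0");
  }
  let oldPods = replicas;
  let newPods = 0;
  const states: RolloutState[] = [];
  while (oldPods > 0 || newPods < replicas) {
    // Surge: create new pods while the total stays within replicas + maxSurge.
    const room = replicas + maxSurge - (oldPods + newPods);
    const created = Math.max(0, Math.min(room, replicas - newPods));
    // Freshly created pods exist but are not Ready yet.
    states.push({
      oldPods,
      newPods: newPods + created,
      total: oldPods + newPods + created,
      available: oldPods + newPods,
    });
    newPods += created;
    // Drain: with the new pods Ready, remove old pods down to replicas - maxUnavailable.
    const excess = oldPods + newPods - (replicas - maxUnavailable);
    oldPods -= Math.max(0, Math.min(excess, oldPods));
    states.push({ oldPods, newPods, total: oldPods + newPods, available: oldPods + newPods });
  }
  return states;
}
```

With the documented settings, simulateRollingUpdate(2, 1, 0) peaks at 3 pods and keeps 2 Ready throughout. At the HPA ceiling of 6 it peaks at 7.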