
Operator Management Service — Service Overview

Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · SECURITY_MODEL

1. Purpose

operator-management-service is the authoritative configuration store for all SMPP operators (carriers) in the Ghasi-SMS-Gateway platform. It:

  • Stores operator metadata, SMPP credentials (exclusively in HashiCorp Vault), routing rules, TPS limits, and real-time health state.
  • Exposes an admin REST API (via Kong, JWT-protected) for full CRUD operations on operators, routing rules, TPS limits, and destination prefix tables.
  • Exposes an internal REST API (mTLS, no Kong route) that peer services call to fetch operator credentials and config without going through the public edge.
  • Propagates all configuration changes via NATS JetStream events so downstream services (routing-engine, smpp-connector) can react without polling.
  • Maintains an operator health cache in Redis (ops:health:{operatorId}, TTL 60 s) that routing-engine reads directly for real-time routing decisions.
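The health-cache contract can be sketched with an in-memory stand-in for Redis. Only two details come from this document: the key pattern ops:health:{operatorId} and the 60 s TTL. The rest is hypothetical: the `HealthCache` class, its `put`/`get` methods, and the dict payload shape. In production, each write would map to a Redis `SET key value EX 60`, which stores the value and sets its expiry in one command.

```python
import time
from typing import Callable, Optional

HEALTH_TTL_SECONDS = 60  # TTL stated for ops:health:{operatorId}


def health_key(operator_id: str) -> str:
    """Build the Redis key that routing-engine reads for an operator."""
    return f"ops:health:{operator_id}"


class HealthCache:
    """Hypothetical in-memory stand-in for the Redis health cache.

    Mimics SET ... EX 60: an entry disappears once its TTL has elapsed.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[dict, float]] = {}

    def put(self, operator_id: str, health: dict) -> None:
        # Each write refreshes the expiry, as a new SET ... EX would.
        expires_at = self._clock() + HEALTH_TTL_SECONDS
        self._entries[health_key(operator_id)] = (health, expires_at)

    def get(self, operator_id: str) -> Optional[dict]:
        key = health_key(operator_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        health, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: behave like a Redis miss and drop the entry.
            del self._entries[key]
            return None
        return health
```

A miss (`None`) means no health event has been written for that operator within the last 60 s. How routing-engine treats a miss is outside this service's scope.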

2. Bounded Context

Operator Configuration: the authoritative source of truth for carrier connectivity configuration. Classified as Core-Support: no message can be routed without its data, but downtime blocks new operator onboarding without immediately halting in-flight traffic, as long as downstream caches are warm.

3. Responsibilities

| Area | What this service owns |
|---|---|
| Operator CRUD | Full lifecycle: create, read, update, soft-delete operators |
| Credential management | Write/read SMPP passwords via Vault at secret/ops/operators/{operatorId}/credentials; never stored in PostgreSQL |
| Routing rules | Destination prefix tables, priority, weight, cost per segment |
| TPS limits | Per-operator TPS cap, burst window, enforcement mode |
| Health state | Ingest health events from smpp-connector; persist to ops.operator_health_log; cache to Redis |
| Config propagation | Publish operator.config.* and operator.health NATS events on every change |
| Duplicate prevention | Unique constraint on (host, port, system_id) across operators |
| Audit trail | Soft-delete (never hard-delete); created_by and updated_by columns on all rows |
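The credential-path, duplicate-prevention and soft-delete rules above can be sketched with an in-memory repository. Only these details come from this document: the Vault path template, the (host, port, system_id) key, and the created_by / updated_by / deleted_at column names. The `Operator` and `OperatorRepo` types are hypothetical, and the operators table in schema ops is represented by a plain dict.

The sketch also makes one assumption about soft-deleted rows: they do not block re-creating an operator with the same endpoint. That matches a PostgreSQL partial unique index with `WHERE deleted_at IS NULL`. The real constraint may instead cover all rows.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def vault_credentials_path(operator_id: str) -> str:
    """Vault location of an operator's SMPP credentials (never PostgreSQL)."""
    return f"secret/ops/operators/{operator_id}/credentials"


@dataclass
class Operator:
    # Hypothetical row shape; note there is no password field.
    operator_id: str
    host: str
    port: int
    system_id: str
    created_by: str
    updated_by: str
    deleted_at: Optional[datetime] = None


class DuplicateOperatorError(Exception):
    pass


class OperatorRepo:
    """Hypothetical in-memory stand-in for the operators table in schema ops."""

    def __init__(self) -> None:
        self._rows: dict[str, Operator] = {}

    def live(self) -> list[Operator]:
        # Equivalent of the mandatory `WHERE deleted_at IS NULL` filter.
        return [r for r in self._rows.values() if r.deleted_at is None]

    def create(self, op: Operator) -> None:
        key = (op.host, op.port, op.system_id)
        if any((r.host, r.port, r.system_id) == key for r in self.live()):
            raise DuplicateOperatorError(f"endpoint already registered: {key}")
        self._rows[op.operator_id] = op

    def soft_delete(self, operator_id: str, actor: str) -> None:
        # Never remove the row: stamp deleted_at and record who did it.
        row = self._rows[operator_id]
        row.deleted_at = datetime.now(timezone.utc)
        row.updated_by = actor
```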

4. Non-Responsibilities

| Area | Owner |
|---|---|
| SMPP connection lifecycle | smpp-connector |
| Route selection (LCR / QoS) | routing-engine |
| SMS message lifecycle | sms-orchestrator |
| Billing rates | billing-service |
| Customer account management | auth-service / admin-dashboard |

5. Dependencies

| Dependency | Kind | Purpose |
|---|---|---|
| Kong Gateway | Upstream (HTTP) | Proxies /v1/admin/operators/* admin routes; enforces JWT admin scope |
| HashiCorp Vault | Secret store | Stores/retrieves SMPP credentials; authenticates via the Kubernetes auth method |
| PostgreSQL (schema ops) | Data store | Operators, routing rules, prefix tables, TPS limits, health log |
| Redis | Cache | ops:health:{operatorId}, TTL 60 s; read by routing-engine |
| NATS JetStream | Event bus | Publishes operator.config.* and operator.health events |
| smpp-connector | Caller | Calls internal /v1/internal/operators/:id/credentials via mTLS |
| routing-engine | Subscriber | Reads Redis health cache; consumes NATS config events |
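The server side of the mTLS hop from smpp-connector to the internal credential endpoint can be sketched with Python's standard `ssl` module. The function name and its file-path parameters are hypothetical. This document does not say how certificates are issued or rotated.

The key setting is `CERT_REQUIRED`. A caller that does not present a client certificate signed by the trusted CA fails the TLS handshake, before any HTTP request reaches /v1/internal/operators/:id/credentials.

```python
import ssl
from typing import Optional


def build_internal_tls_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context for the internal (non-Kong) listener.

    Paths are optional only so the policy can be inspected without certs;
    a real listener must supply all three.
    """
    # Purpose.CLIENT_AUTH builds a context for a server authenticating clients.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any peer that does not present a certificate chaining to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```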

6. High-Level Flow — Admin Create Operator
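A minimal sketch of one plausible ordering of the create path, assembled from the responsibilities in sections 3 and 5. The following are all assumptions, not the service's confirmed implementation:

- the ordering: duplicate check, Vault write, PostgreSQL insert, NATS publish;
- the `create_operator` function and its injected callables;
- the subject operator.config.created;
- the payload fields, including a starting version of 1.

The document itself only states that passwords go to Vault and never to PostgreSQL, and that every change publishes an operator.config.* event.

```python
import uuid
from typing import Any, Callable

# Hypothetical ports; in the service these would wrap Vault, PostgreSQL and NATS.
DuplicateCheck = Callable[[str, int, str], bool]
VaultWrite = Callable[[str, dict], None]
DbInsert = Callable[[dict], None]
Publish = Callable[[str, dict], None]


def create_operator(
    req: dict[str, Any],
    actor: str,
    exists: DuplicateCheck,
    vault_write: VaultWrite,
    db_insert: DbInsert,
    publish: Publish,
) -> str:
    """Sketch of an admin 'create operator' request after Kong has checked the JWT."""
    if exists(req["host"], req["port"], req["system_id"]):
        raise ValueError("operator with this (host, port, system_id) already exists")

    operator_id = str(uuid.uuid4())
    # 1. The password goes to Vault only.
    vault_write(
        f"secret/ops/operators/{operator_id}/credentials",
        {"password": req["password"]},
    )
    # 2. Metadata row without the password; version starts at 1 (assumed).
    row = {k: v for k, v in req.items() if k != "password"}
    row.update(operator_id=operator_id, created_by=actor, updated_by=actor, version=1)
    db_insert(row)
    # 3. Notify routing-engine and smpp-connector; the payload carries no secret.
    publish("operator.config.created", {"operatorId": operator_id, "version": 1})
    return operator_id
```

With this ordering, a failed PostgreSQL insert can leave an orphaned Vault secret. A real implementation would need either a compensating delete or the reverse ordering.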

7. Key Design Decisions

| Decision | Rationale | Trade-off |
|---|---|---|
| Credentials stored only in Vault | SMPP passwords are privileged secrets; PostgreSQL at-rest encryption is insufficient for this threat model | A Vault outage blocks credential reads; mitigated by smpp-connector's in-memory cache (30 min) |
| Soft-delete, not hard-delete | Preserves the audit trail; routing-engine caches operator config, so a hard-delete without draining leaves orphaned cache entries | Every query must filter on deleted_at IS NULL to exclude soft-deleted rows |
| Redis health cache authoritative for routing | routing-engine needs sub-millisecond health lookups; a DB query on every routing decision is too slow | Cache can be up to 60 s stale (the TTL); acceptable lag per SLA |
| NATS propagation (not polling) | Config changes should reach routing-engine and smpp-connector in < 1 s | Consumers must handle out-of-order events using the version field on the event payload |
| mTLS for internal credential endpoint | Credential access must be strictly service-to-service; with no Kong route, a misconfigured Kong plugin cannot expose the endpoint | mTLS certificate rotation adds operational overhead |
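A consumer can handle the out-of-order trade-off by tracking the highest version it has applied for each operator and discarding anything older. This is a minimal sketch that assumes `version` is a per-operator integer that increases on every change; the document only says a version field exists. The `ConfigProjection` class and the operatorId / version / config payload keys are hypothetical, and the NATS subscription itself is omitted.

```python
from typing import Any


class ConfigProjection:
    """Hypothetical consumer-side view of operator config (e.g. in routing-engine)."""

    def __init__(self) -> None:
        self._versions: dict[str, int] = {}
        self.configs: dict[str, dict[str, Any]] = {}

    def apply(self, event: dict[str, Any]) -> bool:
        """Apply an operator.config.* payload; return False if it was stale."""
        op_id = event["operatorId"]
        version = event["version"]
        if version <= self._versions.get(op_id, 0):
            # Older or duplicate event (reordering or redelivery): ignore it.
            return False
        self._versions[op_id] = version
        self.configs[op_id] = event["config"]
        return True
```

JetStream delivers at least once, so redelivered duplicates will occur. A duplicate carries the version already stored and is rejected by the same `<=` check, which makes redelivery harmless.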

8. Status

Design approved. Implementation in progress. See SERVICE_READINESS for the gate checklist.