Operator Management Service — Service Overview
Status: populated; Owner: Platform Engineering; Last updated: 2026-04-18; Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · SECURITY_MODEL
1. Purpose
operator-management-service is the authoritative configuration store for all SMPP operators (carriers) in the Ghasi-SMS-Gateway platform. It:
- Stores operator metadata, SMPP credentials (exclusively in HashiCorp Vault), routing rules, TPS limits, and real-time health state.
- Exposes an admin REST API (via Kong, JWT-protected) for full CRUD operations on operators, routing rules, TPS limits, and destination prefix tables.
- Exposes an internal REST API (mTLS, no Kong route) that peer services call to fetch operator credentials and config without going through the public edge.
- Propagates all configuration changes via NATS JetStream events so downstream services (routing-engine, smpp-connector) can react without polling.
- Maintains an operator health cache in Redis (ops:health:{operatorId}, TTL 60 s) that routing-engine reads directly for real-time routing decisions.
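To make the health-cache behaviour concrete, here is a minimal sketch of the key format and the 60 s expiry. It uses an in-memory class as a stand-in for Redis so it can run without a server. The class name, the injectable clock, and the `"UP"` state value are assumptions for illustration; the document does not specify the cached payload or what routing-engine does on a cache miss.

```python
import time
from typing import Callable, Dict, Optional, Tuple

HEALTH_TTL_SECONDS = 60  # TTL stated in this overview


def health_key(operator_id: str) -> str:
    """Build the cache key for an operator's health state."""
    return f"ops:health:{operator_id}"


class InMemoryHealthCache:
    """Hypothetical stand-in that mimics a Redis key with an expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, operator_id: str, state: str) -> None:
        # Every write restarts the TTL window, like SET with an expiry.
        expires_at = self._clock() + HEALTH_TTL_SECONDS
        self._entries[health_key(operator_id)] = (state, expires_at)

    def get(self, operator_id: str) -> Optional[str]:
        key = health_key(operator_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        state, expires_at = entry
        if self._clock() >= expires_at:
            # The entry outlived its TTL; behave like an expired key.
            del self._entries[key]
            return None
        return state
```

A reader such as routing-engine would treat `None` as "no fresh health information"; how it falls back in that case is outside the scope of this overview.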
2. Bounded Context
Operator Configuration — authoritative source of truth for carrier connectivity configuration. Classified as Core-Support (no business transaction can route without its data; downtime prevents new operator onboarding but does not immediately halt in-flight traffic if caches are warm).
3. Responsibilities
| Area | What this service owns |
|---|---|
| Operator CRUD | Full lifecycle: create, read, update, soft-delete operators |
| Credential management | Write/read SMPP passwords via Vault at secret/ops/operators/{operatorId}/credentials; never stored in PostgreSQL |
| Routing rules | Destination prefix tables, priority, weight, cost per segment |
| TPS limits | Per-operator TPS cap, burst window, enforcement mode |
| Health state | Ingest health events from smpp-connector; persist to ops.operator_health_log; cache to Redis |
| Config propagation | Publish operator.config.* and operator.health NATS events on every change |
| Duplicate prevention | Unique constraint on (host, port, system_id) across operators |
| Audit trail | Soft-delete (never hard-delete); updated_by, created_by columns on all rows |
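The duplicate-prevention and audit-trail rows can be illustrated with a small runnable sketch. It uses SQLite in memory as a stand-in for PostgreSQL, and the table name `operators`, the column types, and the helper function names are assumptions, not the service's actual schema. Note that a plain unique constraint, as written here, also counts soft-deleted rows; whether the real constraint does is not stated in this document.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List

# Hypothetical schema: deleted_at stays NULL while the operator is live.
SCHEMA = """
CREATE TABLE operators (
    id         TEXT PRIMARY KEY,
    host       TEXT NOT NULL,
    port       INTEGER NOT NULL,
    system_id  TEXT NOT NULL,
    created_by TEXT NOT NULL,
    updated_by TEXT NOT NULL,
    deleted_at TEXT,
    UNIQUE (host, port, system_id)
)
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def create_operator(conn: sqlite3.Connection, op_id: str, host: str,
                    port: int, system_id: str, actor: str) -> None:
    # Raises sqlite3.IntegrityError if (host, port, system_id) already exists.
    conn.execute(
        "INSERT INTO operators (id, host, port, system_id, created_by, updated_by) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (op_id, host, port, system_id, actor, actor),
    )


def soft_delete(conn: sqlite3.Connection, op_id: str, actor: str) -> None:
    # Mark the row as deleted instead of removing it, and record who did it.
    conn.execute(
        "UPDATE operators SET deleted_at = ?, updated_by = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), actor, op_id),
    )


def live_operator_ids(conn: sqlite3.Connection) -> List[str]:
    # Live rows are the ones whose deleted_at is still NULL.
    rows = conn.execute(
        "SELECT id FROM operators WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

After a soft delete the row remains in the table for audit purposes, so any query that should see only live operators has to carry the `deleted_at IS NULL` predicate.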
4. Non-Responsibilities
| Area | Owner |
|---|---|
| SMPP connection lifecycle | smpp-connector |
| Route selection (LCR / QoS) | routing-engine |
| SMS message lifecycle | sms-orchestrator |
| Billing rates | billing-service |
| Customer account management | auth-service / admin-dashboard |
5. Dependencies
| Dependency | Kind | Purpose |
|---|---|---|
| Kong Gateway | Upstream (HTTP) | Proxies /v1/admin/operators/* admin routes; enforces JWT admin scope |
| HashiCorp Vault | Secret store | Stores/retrieves SMPP credentials; K8s auth |
| PostgreSQL (schema ops) | Data store | Operators, routing rules, prefix tables, TPS limits, health log |
| Redis | Cache | ops:health:{operatorId} TTL 60 s; read by routing-engine |
| NATS JetStream | Event bus | Publishes operator.config.*, operator.health events |
| smpp-connector | Caller | Calls internal /v1/internal/operators/:id/credentials via mTLS |
| routing-engine | Subscriber | Reads Redis health cache; consumes NATS config events |
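As an illustration of the smpp-connector dependency, the sketch below shows how a caller might wrap the internal credentials endpoint in a 30-minute in-memory cache, the mitigation named in the design decisions. The class name, the `fetch` callable (which stands in for an mTLS HTTP GET), and the shape of the returned dictionary are all assumptions; the real client is not described in this document.

```python
import time
from typing import Callable, Dict, Tuple

CREDENTIAL_CACHE_TTL_SECONDS = 30 * 60  # 30 min, per the design decisions table


def credentials_path(operator_id: str) -> str:
    """Path of the internal credentials endpoint for one operator."""
    return f"/v1/internal/operators/{operator_id}/credentials"


class CachedCredentialClient:
    """Hypothetical caller-side cache in front of the internal endpoint."""

    def __init__(self, fetch: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # assumed to perform the mTLS request for a path
        self._clock = clock
        self._cache: Dict[str, Tuple[dict, float]] = {}

    def get(self, operator_id: str) -> dict:
        now = self._clock()
        cached = self._cache.get(operator_id)
        if cached is not None and now < cached[1]:
            # Still fresh: no call to the service (or Vault) is needed.
            return cached[0]
        creds = self._fetch(credentials_path(operator_id))
        self._cache[operator_id] = (creds, now + CREDENTIAL_CACHE_TTL_SECONDS)
        return creds
```

With this shape, a Vault outage shorter than the cache lifetime does not affect operators whose credentials were fetched recently. An expired or never-fetched entry still fails until the upstream recovers.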
6. High-Level Flow — Admin Create Operator
7. Key Design Decisions
| Decision | Rationale | Trade-off |
|---|---|---|
| Credentials stored only in Vault | SMPP passwords are privileged secrets; PG at-rest encryption is insufficient for this threat model | Vault outage blocks credential reads; mitigated by smpp-connector in-memory cache (30 min) |
| Soft-delete, not hard-delete | Audit trail; routing-engine caches operator config, so a hard-delete without drain causes orphaned cache entries | deleted_at IS NULL filter required on every query that should return only live operators |
| Redis health cache authoritative for routing | routing-engine needs sub-millisecond health lookup; DB query on every route decision is too slow | Cache can be stale up to TTL 60 s; acceptable lag per SLA |
| NATS propagation (not polling) | Config changes should propagate in < 1 s to routing-engine and smpp-connector | Consumers must handle out-of-order events; version field on event payload |
| mTLS for internal credential endpoint | Credential exposure must be strictly service-to-service; no Kong route means no risk of misconfigured Kong plugin exposing it | mTLS cert rotation is an operational overhead |
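The out-of-order trade-off in the NATS row can be sketched as a consumer that keeps the highest version seen per operator and ignores anything older. The `ConfigEvent` fields and the assumption that `version` is an integer that only increases per operator are illustrative; the actual event schema lives in EVENT_SCHEMAS and may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfigEvent:
    """Hypothetical shape of an operator.config.* event."""
    operator_id: str
    version: int  # assumed to increase with every change to the operator
    payload: dict


class OperatorConfigView:
    """Consumer-side view that tolerates reordered or repeated deliveries."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._configs: Dict[str, dict] = {}

    def apply(self, event: ConfigEvent) -> bool:
        """Apply the event if it is newer than what we hold; report whether it was used."""
        current = self._versions.get(event.operator_id)
        if current is not None and event.version <= current:
            # Stale or duplicate delivery: keep the newer config we already have.
            return False
        self._versions[event.operator_id] = event.version
        self._configs[event.operator_id] = event.payload
        return True

    def config_for(self, operator_id: str) -> Optional[dict]:
        return self._configs.get(operator_id)
```

Because the comparison is per operator, a late event for one operator never blocks a fresh event for another.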
8. Status
Design approved. Implementation in progress. See SERVICE_READINESS for gate checklist.