Operator Management Service — Sync Contract
Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18
1. Per-Aggregate Policy
| Aggregate | Policy | Rationale |
|---|---|---|
| Operator | server_authoritative | OMS is the sole writer; consumers (routing-engine, smpp-connector) are read-only subscribers |
| RoutingRule | server_authoritative | Admin-only writes; downstream caches are invalidated via NATS |
| Health state | server_authoritative (OMS aggregates) | smpp-connector reports raw health; OMS reduces to authoritative state |
2. Downstream Cache Coherence
routing-engine and smpp-connector maintain local in-memory caches of operator config:
| Consumer | Cache TTL | Invalidation mechanism |
|---|---|---|
| routing-engine | 5 min in-memory + Redis `ops:health:*` (60 s) | NATS `operator.config.*` events + health events |
| smpp-connector | 30 min in-memory | NATS `operator.config.*` events; credentials refreshed via internal API on bind |
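Typical propagation delay: NATS publish latency (~50 ms) + consumer processing (~10 ms), roughly 60 ms and comfortably under 1 s. This assumes the invalidation event is delivered promptly. If delivery is delayed or fails, a cached entry can stay stale until its TTL expires (up to 5 min for routing-engine, 30 min for smpp-connector).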
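A minimal sketch of a consumer-side cache that combines the two mechanisms in the table above: event-driven invalidation as the fast path and the in-memory TTL as a backstop. It assumes, hypothetically, that the last token of an `operator.config.*` subject is the operator ID and that configs are keyed by that ID. `OperatorConfigCache` and its methods are illustrative names, not an existing OMS API. The clock is injectable so the TTL behaviour can be exercised without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class OperatorConfigCache:
    """In-memory operator-config cache with a TTL backstop (hypothetical sketch)."""

    ttl_seconds: float  # e.g. 300 for routing-engine, 1800 for smpp-connector
    clock: Callable[[], float] = time.monotonic
    _entries: Dict[str, Tuple[Any, float]] = field(default_factory=dict, init=False)

    def put(self, operator_id: str, config: Any) -> None:
        # Store the config together with its expiry deadline.
        self._entries[operator_id] = (config, self.clock() + self.ttl_seconds)

    def get(self, operator_id: str) -> Optional[Any]:
        entry = self._entries.get(operator_id)
        if entry is None:
            return None
        config, expires_at = entry
        if self.clock() >= expires_at:
            # TTL expired: drop the entry so the caller refetches from OMS.
            del self._entries[operator_id]
            return None
        return config

    def on_config_event(self, subject: str) -> None:
        # Assumed subject layout: operator.config.<operator_id> (hypothetical).
        operator_id = subject.rsplit(".", 1)[-1]
        # Evict immediately; the next get() misses and triggers a refetch.
        self._entries.pop(operator_id, None)
```

With `ttl_seconds=300` (routing-engine), a missed invalidation leaves an entry stale for at most 5 minutes after it was cached; a delivered event evicts it at once, so the next read goes back to OMS.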
3. Eventual Consistency Guarantees
- Config events carry a `version` field (monotonically incrementing per operator). Consumers ignore events with `version` ≤ their current known version.
- Health state: the Redis TTL (60 s) provides a soft consistency boundary. On a Redis miss, consumers fall back to the internal REST API (synchronous, strongly consistent).
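The two guarantees above can be sketched as follows, assuming events arrive already decoded into a `ConfigEvent` carrying the operator ID, `version`, and payload. `VersionedConfigStore`, `read_health`, the injected `redis_get`/`rest_get` callables, and the `ops:health:<operator_id>` key layout are all hypothetical; a real consumer would wire them to its Redis client and the OMS internal API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ConfigEvent:
    operator_id: str
    version: int  # monotonically incrementing per operator, assigned by OMS
    payload: Any


class VersionedConfigStore:
    """Applies a config event only if it is newer than what the consumer holds."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._configs: Dict[str, Any] = {}

    def apply(self, event: ConfigEvent) -> bool:
        current = self._versions.get(event.operator_id)
        if current is not None and event.version <= current:
            # Stale or duplicate event: ignore it.
            return False
        self._versions[event.operator_id] = event.version
        self._configs[event.operator_id] = event.payload
        return True

    def get(self, operator_id: str) -> Optional[Any]:
        return self._configs.get(operator_id)


def read_health(
    operator_id: str,
    redis_get: Callable[[str], Optional[str]],
    rest_get: Callable[[str], str],
) -> str:
    """Read health from Redis; on a miss, fall back to the OMS internal REST API."""
    cached = redis_get(f"ops:health:{operator_id}")  # key layout assumed
    if cached is not None:
        return cached
    # Synchronous, strongly consistent fallback.
    return rest_get(operator_id)
```

Because `apply` rejects anything at or below the stored version, out-of-order and duplicate events leave the store unchanged.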
4. Replay Tolerance
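- NATS `operator.config.*` subjects are backed by a JetStream stream with WorkQueue retention and durable consumers. JetStream delivery is at-least-once: unacknowledged messages are redelivered, so consumers must tolerate duplicates (the per-operator `version` check in section 3 turns a reapplied event into a no-op). If a consumer misses events (for example, during a service restart), it resumes from its durable consumer's last acknowledged position.
- On routing-engine cold start: it fetches the full operator list via `GET /v1/internal/operators` (REST), then subscribes to NATS for incremental updates.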
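A sketch of the cold-start sequence, assuming (this is not stated above) that each item returned by `GET /v1/internal/operators` carries the operator's current `version`, so the version check from section 3 applies uniformly to the snapshot and the stream. `cold_start`, `fetch_all`, and the `(operator_id, version, payload)` event tuple are illustrative placeholders for the REST call and the NATS subscription.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical event shape: (operator_id, version, payload)
Event = Tuple[str, int, Any]


def cold_start(
    fetch_all: Callable[[], List[Dict[str, Any]]],
    events: Iterable[Event],
) -> Dict[str, Dict[str, Any]]:
    """Bootstrap from a full snapshot, then apply incremental events.

    fetch_all stands in for GET /v1/internal/operators; each item is assumed
    to carry "id", "version" and "config". events stands in for NATS.
    """
    state: Dict[str, Dict[str, Any]] = {}

    # 1. Full snapshot from the REST API.
    for op in fetch_all():
        state[op["id"]] = {"version": op["version"], "config": op["config"]}

    # 2. Incremental updates; skip anything the snapshot already reflects.
    for operator_id, version, payload in events:
        known = state.get(operator_id)
        if known is not None and version <= known["version"]:
            continue  # already in the snapshot, or a redelivered duplicate
        state[operator_id] = {"version": version, "config": payload}
    return state
```

Since events at or below the snapshot version are skipped, overlap between the snapshot and the first incremental events is harmless, and so is any redelivered duplicate.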