Operator Management Service — Sync Contract

Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18

1. Per-Aggregate Policy

| Aggregate | Policy | Rationale |
| --- | --- | --- |
| Operator | server_authoritative | OMS is the sole writer; consumers (routing-engine, smpp-connector) are read-only subscribers |
| RoutingRule | server_authoritative | Admin-only writes; downstream caches invalidated via NATS |
| Health state | server_authoritative (OMS aggregates) | smpp-connector reports raw health; OMS reduces to authoritative state |

2. Downstream Cache Coherence

routing-engine and smpp-connector maintain local in-memory caches of operator config:

| Consumer | Cache TTL | Invalidation mechanism |
| --- | --- | --- |
| routing-engine | 5 min in-memory + Redis ops:health:* (60 s) | NATS operator.config.* events + health events |
| smpp-connector | 30 min in-memory | NATS operator.config.* events; credential refresh via internal API on bind |
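
The pairing of a TTL with event-driven invalidation can be sketched as follows. This is a minimal illustration, not the services' actual code: the class name `OperatorConfigCache`, its methods, and the injectable clock are all hypothetical. The TTL acts as a backstop in case an invalidation event is missed, while `invalidate` is what an operator.config.* event handler would call.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OperatorConfigCache:
    """Hypothetical in-memory cache: TTL backstop plus event-driven invalidation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # operator_id -> (config, expiry timestamp)
        self._entries: Dict[str, Tuple[dict, float]] = {}

    def put(self, operator_id: str, config: dict) -> None:
        """Store a config and restart its TTL."""
        self._entries[operator_id] = (config, self._clock() + self._ttl)

    def get(self, operator_id: str) -> Optional[dict]:
        """Return the cached config, or None if it is absent or expired."""
        entry = self._entries.get(operator_id)
        if entry is None:
            return None
        config, expires_at = entry
        if self._clock() >= expires_at:
            # TTL elapsed: drop the entry so the caller reloads it.
            del self._entries[operator_id]
            return None
        return config

    def invalidate(self, operator_id: str) -> None:
        """Drop an entry; intended to be called when a config event arrives."""
        self._entries.pop(operator_id, None)
```

Under these assumptions, routing-engine would construct the cache with `ttl_seconds=300` and smpp-connector with `ttl_seconds=1800`, matching the table above.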

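Propagation delay in normal operation (consumer connected and subscribed): NATS publish latency (~50 ms) + consumer processing (~10 ms) ≈ 60 ms, so effectively < 1 s. A consumer that is restarting or disconnected catches up only after it resumes (see section 4).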

3. Eventual Consistency Guarantees

  • Config events carry a version field (monotonically incrementing per operator). Consumers ignore events with version ≤ their current known version.
  • Health state: Redis TTL provides a soft consistency boundary. On Redis miss, consumers fall back to the internal REST API (synchronous, strongly consistent).
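
Both guarantees above can be sketched in a few lines. The event field names `operator_id` and `config`, the key layout `ops:health:<operator_id>`, and the two injected callables are assumptions made for illustration; only the `version` field and the ops:health:* prefix come from this document.

```python
from typing import Callable, Dict, Optional


class OperatorView:
    """Hypothetical consumer-side state, guarded by the per-operator version."""

    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}
        self.configs: Dict[str, dict] = {}

    def apply(self, event: dict) -> bool:
        """Apply a config event; return False if it is stale or a duplicate."""
        operator_id = event["operator_id"]   # assumed field name
        version = event["version"]
        known = self.versions.get(operator_id)
        if known is not None and version <= known:
            return False  # version <= current known version: ignore
        self.versions[operator_id] = version
        self.configs[operator_id] = event["config"]  # assumed field name
        return True


def read_health(
    operator_id: str,
    redis_get: Callable[[str], Optional[str]],
    fetch_from_api: Callable[[str], str],
) -> str:
    """Read health from Redis; on a miss, fall back to the internal REST API."""
    cached = redis_get(f"ops:health:{operator_id}")  # assumed key layout
    if cached is not None:
        return cached
    # Key absent or expired: use the synchronous, strongly consistent path.
    return fetch_from_api(operator_id)
```

Because `apply` rejects any version at or below the known one, applying the same event twice leaves the state unchanged, which is what makes redelivered events safe.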

4. Replay Tolerance

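  • The NATS operator.config.* subject uses the WorkQueue policy with durable consumers. On its own this gives at-least-once delivery, not exactly-once: an unacknowledged event is redelivered, and the version check in section 3 makes such duplicates harmless. If a consumer misses events (for example, during a service restart), it resumes from its durable consumer offset.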
  • On cold start, routing-engine fetches the full operator list via GET /v1/internal/operators (REST), then subscribes to NATS for incremental updates.
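
The cold-start sequence can be sketched as a snapshot followed by replayed events. This assumes that each operator in the REST response carries the same `version` field as the config events, which this document does not state; `fetch_operators` is a hypothetical stand-in for the GET /v1/internal/operators call, and the field names `operator_id` and `config` are likewise illustrative.

```python
from typing import Callable, Dict, Iterable, List


def cold_start(
    fetch_operators: Callable[[], List[dict]],
    replayed_events: Iterable[dict],
) -> Dict[str, dict]:
    """Build consumer state from a full snapshot, then apply incremental events."""
    state: Dict[str, dict] = {}

    # Step 1: full snapshot (stand-in for GET /v1/internal/operators).
    for op in fetch_operators():
        state[op["operator_id"]] = {
            "version": op["version"],
            "config": op["config"],
        }

    # Step 2: incremental events. Some may already be reflected in the
    # snapshot or be delivered more than once; the version check skips them.
    for event in replayed_events:
        current = state.get(event["operator_id"])
        if current is not None and event["version"] <= current["version"]:
            continue
        state[event["operator_id"]] = {
            "version": event["version"],
            "config": event["config"],
        }
    return state
```

With this ordering, events published while the snapshot is being fetched are either already included in it (and skipped by the version check) or applied afterwards, so the resulting state does not depend on exactly where replay begins.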