
routing-engine — Migration Plan

Status: populated | Last updated: 2026-04-18

Current State

routing-engine is a greenfield service: there is no existing routing service or data store to migrate from. Routing decisions are currently made inline within sms-orchestrator from a basic static configuration (no database-driven rules and no health-aware operator selection). This plan describes the path from that inline routing logic to a standalone routing-engine microservice.


Phase 1: Schema and Seed Data (Weeks 1–2)

Goal: Deploy the ops_routing PostgreSQL schema and populate it with initial routing rules.

Tasks:

  1. Apply the ops_routing schema migrations via Flyway (managed by the operator-management-service migration pipeline).
  2. Seed initial routing rules from the existing static configuration in sms-orchestrator.
  3. Verify that the routing_rules, routing_rule_operators, destination_prefixes, and operators tables are populated correctly in staging.
  4. Confirm that the routing-engine database user is read-only (SELECT grants only).
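A minimal sketch of tasks 2 and 3. It assumes, hypothetically, that the static configuration in sms-orchestrator maps each destination prefix to an ordered list of operator codes, with the most preferred operator first. The sketch expands that mapping into rows for the four ops_routing tables. The column names (id, name, prefix, rule_id, operator_id, priority, code) are illustrative and should be replaced with the real schema.

```python
from dataclasses import dataclass, field


@dataclass
class SeedRows:
    """Rows destined for the ops_routing tables (column names are hypothetical)."""
    operators: list[dict] = field(default_factory=list)
    routing_rules: list[dict] = field(default_factory=list)
    destination_prefixes: list[dict] = field(default_factory=list)
    routing_rule_operators: list[dict] = field(default_factory=list)


def build_seed_rows(static_routes: dict[str, list[str]]) -> SeedRows:
    """Expand a hypothetical {prefix: [operator_code, ...]} config into seed rows.

    Priority 1 is the most preferred operator for a prefix.
    """
    rows = SeedRows()
    operator_ids: dict[str, int] = {}
    # Sort by prefix so rule ids are deterministic across seed runs.
    for rule_id, (prefix, codes) in enumerate(sorted(static_routes.items()), start=1):
        rows.routing_rules.append({"id": rule_id, "name": f"static:{prefix}"})
        rows.destination_prefixes.append({"rule_id": rule_id, "prefix": prefix})
        for priority, code in enumerate(codes, start=1):
            # Register each operator once and reuse its id across rules.
            if code not in operator_ids:
                operator_ids[code] = len(operator_ids) + 1
                rows.operators.append({"id": operator_ids[code], "code": code})
            rows.routing_rule_operators.append(
                {"rule_id": rule_id, "operator_id": operator_ids[code], "priority": priority}
            )
    return rows
```

Because the conversion is deterministic, the same rows can be regenerated and compared against staging at any time.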

Exit criteria: All existing static routes are represented as database rows in staging.
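The exit criterion can be checked mechanically by reading the routes back from staging and diffing them against the static configuration. This sketch rests on the same kind of assumptions. The static configuration is treated as a hypothetical {prefix: [operator_code, ...]} mapping, and the columns in the join query are illustrative rather than the actual ops_routing schema.

```python
from collections.abc import Iterable

# Illustrative read-back query; column names are assumptions. Each result row
# is (prefix, operator_code, priority) and feeds missing_routes() below.
STAGING_ROUTES_SQL = """
SELECT dp.prefix, o.code, rro.priority
FROM destination_prefixes dp
JOIN routing_rule_operators rro ON rro.rule_id = dp.rule_id
JOIN operators o ON o.id = rro.operator_id
"""


def missing_routes(
    static_routes: dict[str, list[str]],
    db_rows: Iterable[tuple[str, str, int]],
) -> list[tuple[str, str, int]]:
    """Return static routes that have no matching database row.

    An empty list means every static route is represented in staging.
    """
    in_db = set(db_rows)
    expected = [
        (prefix, code, priority)
        for prefix, codes in sorted(static_routes.items())
        for priority, code in enumerate(codes, start=1)
    ]
    return [route for route in expected if route not in in_db]
```

The check only runs in one direction. Extra rows in the database are not flagged, which matches the wording of the criterion.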


Phase 2: Service Deployment in Shadow Mode (Weeks 3–4)

Goal: Deploy routing-engine alongside the current inline routing logic. sms-orchestrator calls routing-engine but does not act on the result; it continues to route using its own inline decision.

Tasks:

  1. Deploy routing-engine to staging with 2 replicas.
  2. Update sms-orchestrator to call SelectOperator in parallel in shadow mode: the call is non-blocking, and its result is recorded for comparison only, never used for routing.
  3. Compare shadow results against current decisions and log every discrepancy.
  4. Tune routing rules until shadow decisions match current decisions for ≥ 99% of messages.
  5. Validate latency: shadow calls must complete within 50 ms at P95.
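A minimal sketch of tasks 2 and 3 using Python's asyncio. It assumes a hypothetical async client function that wraps the SelectOperator call and returns an operator code. The inline decision is returned immediately. The routing-engine call runs as a background task with a timeout, and its result is only counted and logged, never used for routing. The 200 ms timeout and the ShadowStats fields are illustrative choices, not part of the plan.

```python
import asyncio
import logging
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable

log = logging.getLogger("shadow_routing")

# Hypothetical client signature: destination number -> operator code.
SelectOperatorFn = Callable[[str], Awaitable[str]]


@dataclass
class ShadowStats:
    compared: int = 0        # shadow calls that returned a decision
    discrepancies: int = 0   # decisions that differed from the inline choice
    failures: int = 0        # shadow calls that raised or timed out
    latencies_ms: list[float] = field(default_factory=list)


async def _shadow_compare(select_operator: SelectOperatorFn, destination: str,
                          inline_choice: str, stats: ShadowStats,
                          timeout_s: float) -> None:
    start = time.perf_counter()
    try:
        shadow_choice = await asyncio.wait_for(select_operator(destination), timeout_s)
    except Exception as exc:  # a failed shadow call must never affect delivery
        stats.failures += 1
        log.warning("shadow SelectOperator failed dest=%s err=%r", destination, exc)
        return
    stats.latencies_ms.append((time.perf_counter() - start) * 1000.0)
    stats.compared += 1
    if shadow_choice != inline_choice:
        stats.discrepancies += 1
        log.info("routing discrepancy dest=%s inline=%s shadow=%s",
                 destination, inline_choice, shadow_choice)


async def route_with_shadow(destination: str,
                            inline_route: Callable[[str], str],
                            select_operator: SelectOperatorFn,
                            stats: ShadowStats,
                            pending: set[asyncio.Task],
                            timeout_s: float = 0.2) -> str:
    """Route with the inline logic; compare against routing-engine in the background."""
    inline_choice = inline_route(destination)
    task = asyncio.create_task(
        _shadow_compare(select_operator, destination, inline_choice, stats, timeout_s))
    # Hold a reference so the background task is not garbage-collected mid-flight.
    pending.add(task)
    task.add_done_callback(pending.discard)
    return inline_choice
```

Timed-out or failed shadow calls are counted as failures rather than as latency samples, so the recorded P95 can understate tail latency. Recording them at the timeout value is a stricter alternative.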

Exit criteria: Shadow mode discrepancy rate < 1%; P95 latency ≤ 50 ms.
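The Phase 2 exit criteria reduce to two numbers. This sketch of the gate assumes that the number of compared decisions, the discrepancy count, and the per-call latencies in milliseconds have already been collected. It uses the nearest-rank definition of P95. A metrics backend may compute percentiles differently, for example by interpolating within histogram buckets, so small differences from dashboard values are expected.

```python
def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def shadow_exit_criteria_met(compared: int, discrepancies: int,
                             latencies_ms: list[float],
                             max_discrepancy_rate: float = 0.01,
                             p95_budget_ms: float = 50.0) -> bool:
    """Phase 2 gate: discrepancy rate < 1% and P95 latency <= 50 ms."""
    if compared == 0:
        return False  # no evidence yet
    rate = discrepancies / compared
    return rate < max_discrepancy_rate and p95_nearest_rank(latencies_ms) <= p95_budget_ms
```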


Phase 3: Cutover (Week 5)

Goal: sms-orchestrator uses routing-engine as the authoritative routing source.

Tasks:

  1. Update sms-orchestrator to use the routing-engine response as the primary routing decision.
  2. Keep shadow comparison logging for 72 hours post-cutover.
  3. Monitor error rate, latency, and routing decision quality dashboards.
  4. Remove the legacy inline routing logic from sms-orchestrator after the 72-hour validation window.

Rollback plan: a feature flag in sms-orchestrator reverts to inline routing without a redeployment. This rollback path is available only until the legacy inline logic is removed (task 4).
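A sketch of the cutover switch. It assumes hypothetical flag names (use_routing_engine, shadow_compare) that sms-orchestrator reads from its feature-flag system on each request, so flipping a flag takes effect without a redeployment. When shadow_compare is enabled, the legacy inline decision is still computed and logged whenever it differs, which covers the 72-hour comparison window.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("routing_cutover")


@dataclass
class RoutingFlags:
    # Hypothetical flags, evaluated per request rather than at startup.
    use_routing_engine: bool = True   # False = roll back to inline routing
    shadow_compare: bool = True       # keep on during the 72-hour window


def choose_operator(destination: str, flags: RoutingFlags,
                    inline_route: Callable[[str], str],
                    routing_engine_route: Callable[[str], str]) -> str:
    if not flags.use_routing_engine:
        return inline_route(destination)  # rollback path: legacy inline logic
    chosen = routing_engine_route(destination)  # authoritative after cutover
    if flags.shadow_compare:
        legacy = inline_route(destination)
        if legacy != chosen:
            log.info("post-cutover discrepancy dest=%s engine=%s inline=%s",
                     destination, chosen, legacy)
    return chosen
```

The sketch deliberately has no automatic fallback when routing-engine returns an error, because the plan specifies a manual, flag-based rollback. Adding such a fallback would be a separate design decision.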

Exit criteria: Zero regressions in message delivery rate; P95 ≤ 50 ms sustained; no critical alerts.


Phase 4: Production Hardening (Week 6+)

Tasks:

  1. Enable the Horizontal Pod Autoscaler (HPA) with a minimum of 3 and a maximum of 10 replicas.
  2. Apply NetworkPolicy and mTLS in production.
  3. Tune Redis cache TTLs based on observed cache-miss ratio.
  4. Implement a cache-bust webhook from operator-management-service so that routing-rule changes invalidate cached entries without waiting for TTL expiry.
  5. Document the operational runbook.
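Tasks 3 and 4 both depend on knowing how often the cache misses and on being able to evict entries on demand. The following is a minimal in-memory stand-in for the Redis cache, assuming (hypothetically) that entries are keyed by destination prefix. It tracks hits and misses to produce the cache-miss ratio, and it exposes an invalidate() method that a cache-bust webhook handler could call. In Redis itself, the equivalents would be SET with an EX expiry and DEL. The class and method names here are illustrative.

```python
import time
from typing import Callable, Iterable, Optional


class TtlCache:
    """In-memory stand-in for the Redis routing cache (illustrative only)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], object]) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: load from the database (via loader) and re-cache.
        self.misses += 1
        value = loader(key)
        self._entries[key] = (now + self.ttl_s, value)
        return value

    def invalidate(self, keys: Optional[Iterable[str]] = None) -> None:
        """Cache-bust hook: drop the given keys, or everything when keys is None."""
        if keys is None:
            self._entries.clear()
            return
        for key in keys:
            self._entries.pop(key, None)

    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

As a rough guide, a persistently high miss ratio suggests TTLs that are short relative to the traffic pattern. With an invalidation hook in place, longer TTLs carry less risk of serving stale rules.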