routing-engine — Migration Plan
Status: populated | Last updated: 2026-04-18
Current State
routing-engine is a greenfield service: there is no existing standalone routing system to migrate from. Routing decisions are currently made inline within sms-orchestrator from a basic static configuration, with no database-driven rules and no health-aware selection. This plan describes the path from that inline routing logic to a standalone routing-engine microservice.
Phase 1: Schema and Seed Data (Week 1–2)
Goal: Deploy the ops_routing PostgreSQL schema and populate it with initial routing rules.
Tasks:
- Apply ops_routing schema migrations via Flyway (managed by the operator-management-service migration pipeline).
- Seed initial routing rules from the existing static configuration in sms-orchestrator.
- Verify that the routing_rules, routing_rule_operators, destination_prefixes, and operators tables are populated correctly in staging.
- Confirm that the routing-engine read-only DB user has SELECT grants only.
Exit criteria: All existing static routes are represented as database rows in staging.
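The seeding task and its exit criterion can be sketched as a flatten-then-verify step. This is a minimal illustration, not the actual seed script: the shape of the static configuration (a prefix mapped to operators in preference order) and the `RuleRow` fields are assumptions, since the plan does not specify the column layout of the ops_routing tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleRow:
    # Hypothetical row shape; the real ops_routing columns may differ.
    prefix: str
    operator: str
    priority: int


def flatten_static_routes(static_routes):
    """Turn {prefix: [operators in preference order]} into one row per pair."""
    rows = []
    for prefix in sorted(static_routes):
        for priority, operator in enumerate(static_routes[prefix], start=1):
            rows.append(RuleRow(prefix=prefix, operator=operator, priority=priority))
    return rows


def missing_routes(static_routes, rows):
    """Return (prefix, operator) pairs from the static config that have no row."""
    seen = {(row.prefix, row.operator) for row in rows}
    return [
        (prefix, operator)
        for prefix in sorted(static_routes)
        for operator in static_routes[prefix]
        if (prefix, operator) not in seen
    ]
```

An empty result from `missing_routes`, run against rows read back from staging, would correspond to the exit criterion that every static route is represented in the database.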
Phase 2: Service Deployment in Shadow Mode (Week 3–4)
Goal: Deploy routing-engine alongside the current inline routing logic. sms-orchestrator calls routing-engine but ignores the result, using its own decision.
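As a rough sketch of the shadow pattern described in this goal, the caller can return its inline decision immediately and run the routing-engine call as a background task that only logs disagreements. The function names, the asyncio-based client, and the 50 ms timeout are assumptions for illustration; the plan names SelectOperator but does not define its signature.

```python
import asyncio
import logging

log = logging.getLogger("shadow")


async def route_with_shadow(message, inline_route, select_operator, timeout_s=0.05):
    """Return the inline decision; the shadow result is logged, never used."""
    decision = inline_route(message)

    async def shadow():
        try:
            candidate = await asyncio.wait_for(select_operator(message), timeout_s)
            if candidate != decision:
                log.info("shadow mismatch: inline=%s shadow=%s", decision, candidate)
        except Exception as exc:
            # Shadow failures and timeouts must never affect delivery.
            log.warning("shadow call failed: %r", exc)

    # The task is returned so the caller can keep a reference to it;
    # asyncio may garbage-collect tasks that nothing references.
    task = asyncio.create_task(shadow())
    return decision, task
```

The inline decision is computed first and returned regardless of what the shadow task does, so a slow or failing routing-engine cannot change message routing during this phase.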
Tasks:
- Deploy routing-engine to staging with 2 replicas.
- Update sms-orchestrator to call SelectOperator in parallel (shadow mode: fire-and-forget, do not use the result).
- Compare shadow results against current decisions; log discrepancies.
- Tune routing rules until shadow decisions match current decisions for ≥ 99% of messages.
- Validate latency: P95 latency of shadow calls must be ≤ 50 ms.
Exit criteria: Shadow mode discrepancy rate < 1%; P95 latency ≤ 50 ms.
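The Phase 2 exit criteria can be checked mechanically from the logged shadow comparisons. The sketch below assumes each sample is a tuple of (inline operator, shadow operator, latency in ms) and uses a nearest-rank P95; the sample format and function names are hypothetical, and the dashboards may compute the percentile differently.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def shadow_gate(samples, max_discrepancy=0.01, max_p95_ms=50.0):
    """Evaluate the exit criteria: discrepancy rate < 1% and P95 <= 50 ms."""
    if not samples:
        raise ValueError("no shadow samples")
    mismatches = sum(1 for inline, shadow, _ in samples if inline != shadow)
    rate = mismatches / len(samples)
    latency = p95([ms for _, _, ms in samples])
    return {
        "discrepancy_rate": rate,
        "p95_ms": latency,
        "passed": rate < max_discrepancy and latency <= max_p95_ms,
    }
```

For example, 200 samples with a single mismatch and all latencies at or below 40 ms would pass, while 2 mismatches in 100 samples would fail on the discrepancy rate.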
Phase 3: Cutover (Week 5)
Goal: sms-orchestrator uses routing-engine as the authoritative routing source.
Tasks:
- Update sms-orchestrator to use the routing-engine response as the primary routing decision.
- Keep shadow comparison logging for 72 hours post-cutover.
- Monitor error rate, latency, and routing decision quality dashboards.
- Remove legacy inline routing logic from sms-orchestrator after the 72-hour validation window.
Rollback plan: Feature flag in sms-orchestrator to revert to inline routing without redeployment.
Exit criteria: Zero regressions in message delivery rate; P95 ≤ 50 ms sustained; no critical alerts.
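The rollback plan depends on the flag being read at request time rather than at startup. A minimal sketch, assuming a hypothetical `flag_enabled` callable backed by whatever dynamic configuration mechanism sms-orchestrator uses (the plan does not name one):

```python
def choose_operator(message, inline_route, engine_route, flag_enabled):
    """Pick the routing source per message.

    The flag is evaluated on every call, so flipping it takes effect
    without redeploying the service.
    """
    if flag_enabled():
        return engine_route(message)
    return inline_route(message)
```

With the flag on, routing-engine is authoritative; turning it off sends every subsequent message back through the inline path, which is why the legacy logic has to remain in place until the validation window closes.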
Phase 4: Production Hardening (Week 6+)
Tasks:
- Enable HPA (min 3, max 10 replicas).
- Apply NetworkPolicy and mTLS in production.
- Tune Redis cache TTLs based on observed cache-miss ratio.
- Implement cache-bust webhook from operator-management-service.
- Document operational runbook.
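The relationship between TTLs, the cache-miss ratio, and the cache-bust webhook can be illustrated with a small in-memory stand-in. This is not Redis and not the service's cache layer; the class, its method names, and the injectable clock are assumptions made so the behaviour can be shown in isolation.

```python
import time


class TtlCache:
    """In-memory stand-in for a TTL cache with hit/miss counters."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, reloading it on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = loader(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value

    def bust(self, key):
        """Drop one key; a cache-bust webhook handler would call this."""
        self._entries.pop(key, None)

    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

A longer TTL lowers the miss ratio but lets stale rules live longer; an explicit bust on rule changes is what makes the longer TTL safe to use.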