routing-engine — Service Readiness

Status: populated | Last updated: 2026-04-18

Production Readiness Checklist

Functional

  • SelectOperator gRPC handler implements all three routing strategies (COST, PRIORITY, FAILOVER)
  • Longest-prefix matching resolves E.164 numbers correctly
  • Routing decision is cached in Redis with 300 s TTL
  • Operator health cache is updated from NATS operator.health events
  • Cache invalidation on UNBOUND events is implemented and tested
  • /health, /ready, /metrics endpoints operational
  • gRPC error codes (NOT_FOUND, UNAVAILABLE, INVALID_ARGUMENT, INTERNAL) returned correctly
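As an illustration of the longest-prefix matching item above, here is a minimal sketch. It assumes routes are held as an in-memory list of digit prefixes; the PrefixRoute shape, the longestPrefixMatch name, and the operator IDs are hypothetical and not taken from the routing-engine source.

```typescript
// Hypothetical route table entry; field names are assumptions for illustration.
interface PrefixRoute {
  prefix: string; // digits only, no leading "+"
  operatorId: string;
}

// Returns the route whose prefix is the longest one matching the number,
// or undefined when no prefix matches (a handler would likely map that
// case to NOT_FOUND).
function longestPrefixMatch(
  e164: string,
  routes: PrefixRoute[],
): PrefixRoute | undefined {
  // Strip the leading "+" of the E.164 form before comparing digits.
  const digits = e164.startsWith("+") ? e164.slice(1) : e164;
  let best: PrefixRoute | undefined;
  for (const route of routes) {
    if (
      digits.startsWith(route.prefix) &&
      (best === undefined || route.prefix.length > best.prefix.length)
    ) {
      best = route;
    }
  }
  return best;
}

// Example data (made up): "447" is more specific than "44".
const exampleRoutes: PrefixRoute[] = [
  { prefix: "44", operatorId: "op-uk" },
  { prefix: "447", operatorId: "op-uk-mobile" },
  { prefix: "1", operatorId: "op-nanp" },
];

const mobileMatch = longestPrefixMatch("+447700900123", exampleRoutes);
const landlineMatch = longestPrefixMatch("+442079460000", exampleRoutes);
const noMatch = longestPrefixMatch("+33123456789", exampleRoutes);
```

A linear scan keeps the sketch short; a production table with many prefixes would more plausibly use a trie or a database lookup ordered by prefix length.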

Testing

  • Unit test coverage ≥ 80%
  • Integration tests pass with Testcontainers (PostgreSQL + Redis)
  • Consumer-driven contract tests pass (Pact verification with sms-orchestrator)
  • Performance smoke test: P95 ≤ 50 ms at 500 RPS
  • No ESLint or TypeScript compilation errors
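The P95 threshold in the performance smoke test can be checked against collected latency samples roughly as follows. This is a sketch using the nearest-rank percentile definition, which is an assumption; the load tool actually used may compute percentiles differently. The percentile and meetsP95Budget names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("no samples");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// True when the 95th percentile latency is within the budget (50 ms by default,
// matching the checklist threshold).
function meetsP95Budget(samplesMs: number[], budgetMs = 50): boolean {
  return percentile(samplesMs, 95) <= budgetMs;
}

// Example data (made up): latencies of 1..100 ms, one sample each.
const exampleSamples = Array.from({ length: 100 }, (_, i) => i + 1);
const exampleP95 = percentile(exampleSamples, 95);
```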

Security

  • mTLS configured and verified in staging environment
  • PostgreSQL connection uses read-only user
  • Secrets mounted via Kubernetes Secrets (not environment literals)
  • NetworkPolicy applied restricting inbound to sms-orchestrator only
  • The "to" field is masked in logs (only the prefix is shown)
  • No hardcoded credentials in source code
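The log masking item above could look something like the following sketch. The number of visible digits (4) and the maskTo function name are assumptions for illustration; the checklist only states that the prefix is shown.

```typescript
// Keeps the first `visible` digits of the destination number and replaces
// the remaining digits with "*". A leading "+" is preserved.
function maskTo(to: string, visible = 4): string {
  const hasPlus = to.startsWith("+");
  const digits = hasPlus ? to.slice(1) : to;
  const shown = digits.slice(0, visible);
  const masked = "*".repeat(Math.max(digits.length - visible, 0));
  return (hasPlus ? "+" : "") + shown + masked;
}

// Example (made-up number): only the prefix survives in the log line.
const maskedExample = maskTo("+447700900123");
const logLine = JSON.stringify({ msg: "route selected", to: maskedExample });
```

Masking at the point where the log entry is built, rather than in the log pipeline, means the full number never leaves the process.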

Observability

  • All Prometheus metrics emitting correctly (verified in staging)
  • Structured JSON logs confirmed in log aggregation system
  • OpenTelemetry traces visible in tracing backend
  • Alerts configured and tested: RoutingEngineHighLatency, RoutingEngineHighErrorRate, RoutingEngineNoHealthyOperators

Deployment

  • Docker image published to registry with correct tag
  • Kubernetes Deployment, Service, HPA manifests applied to staging
  • Rolling update tested (zero downtime confirmed)
  • Resource requests/limits validated under load
  • Horizontal Pod Autoscaler triggers tested

Operational

  • Runbook written and linked from internal wiki
  • On-call rotation updated to include routing-engine alerts
  • Database seed and migration scripts verified in staging
  • Redis eviction policy confirmed (allkeys-lru or volatile-lru)
  • NATS consumer group durable name registered in JetStream config