routing-engine — Application Logic
Status: populated | Last updated: 2026-04-18
1. Use Cases
UC-01: SelectOperator (gRPC Handler)
Trigger: sms-orchestrator calls RoutingService/SelectOperator
Input: { to: string, accountId: string, messageType: MessageType }
Output: OperatorConfig { operatorId, host, port, systemId, tpsLimit }
Steps:
- Validate input fields; reject with `INVALID_ARGUMENT` if `to` is not a valid E.164 number or `accountId` is missing.
- Build cache key: `route:decision:{prefix}:{accountId}:{messageType}`, where `prefix` is computed via longest-prefix match (see sub-step a of the cache-miss branch).
- Attempt Redis `GET` on the cache key.
- Cache HIT → deserialize and return `OperatorConfig` immediately.
- Cache MISS:
  a. Run longest-prefix match against the `destination_prefixes` table for the `to` number.
  b. If no prefix matches → return gRPC status `NOT_FOUND` with message `"No routing rule for destination"`.
  c. Load all active `routing_rules` for `(prefixId, accountId OR global)`. Prefer account-scoped over global rules.
  d. Load operator health states from Redis (`MGET operator:health:{operatorId}` for all candidate operators).
  e. Filter candidates to `BOUND` or `FAILBACK` status. If none → return `UNAVAILABLE`.
  f. Apply routing strategy:
     - COST: Sort by `cost ASC`; pick first.
     - PRIORITY: Sort by `priority ASC`; pick first.
     - FAILOVER: Iterate in priority order; return first healthy operator.
  g. Fetch full `OperatorConfig` from the `operators` table for the selected `operatorId`.
  h. Serialise decision and `SET` Redis cache key with TTL 300 s.
- Return `OperatorConfig`.
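The cache key layout and sub-steps c, e and f above can be sketched as two pure functions. This is a minimal illustration, not the service's actual code: the `RoutingRule` shape, the use of `null` for global rules, the reading of "prefer account-scoped" as "use global rules only when the account has none", and the treatment of a missing health entry as unhealthy are all assumptions.

```typescript
// Hypothetical shapes: the source names these fields but not the full types.
type Strategy = "COST" | "PRIORITY" | "FAILOVER";
type HealthStatus = "BOUND" | "FAILBACK" | "UNBOUND";

interface RoutingRule {
  operatorId: string;
  accountId: string | null; // assumption: null marks a global rule
  cost: number;
  priority: number;
}

// Cache key layout: route:decision:{prefix}:{accountId}:{messageType}
function buildCacheKey(prefix: string, accountId: string, messageType: string): string {
  return `route:decision:${prefix}:${accountId}:${messageType}`;
}

// Sub-steps c, e and f. Returns the chosen operatorId, or null when the
// handler should answer UNAVAILABLE.
function pickOperator(
  rules: RoutingRule[],
  accountId: string,
  health: Map<string, HealthStatus>,
  strategy: Strategy,
): string | null {
  // Assumption: global rules are used only when the account has no rules of its own.
  const scoped = rules.filter(r => r.accountId === accountId);
  const pool = scoped.length > 0 ? scoped : rules.filter(r => r.accountId === null);

  // Operators with no health entry (for example an expired key) count as unhealthy here.
  const healthy = pool.filter(r => {
    const state = health.get(r.operatorId);
    return state === "BOUND" || state === "FAILBACK";
  });

  // After the health filter, PRIORITY and FAILOVER both reduce to
  // "lowest priority value first" in this sketch.
  const ranked = healthy.sort((a, b) =>
    strategy === "COST" ? a.cost - b.cost : a.priority - b.priority,
  );
  const first = ranked[0];
  return first ? first.operatorId : null;
}
```

Keeping the selection logic free of I/O means the handler can load rules and health states however it likes and still unit-test the strategy in isolation.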
Error codes:
| gRPC status | Condition |
|---|---|
| `INVALID_ARGUMENT` | `to` is not E.164, or `accountId` is missing |
| `NOT_FOUND` | No prefix match or no routing rule |
| `UNAVAILABLE` | All candidate operators unhealthy |
| `INTERNAL` | Unexpected DB/Redis error (logged, not surfaced) |
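The `INVALID_ARGUMENT` row could be enforced with a check along the following lines. This is a sketch under assumptions: `validateRequest`, `GrpcStatus` and `E164_PATTERN` are illustrative names, and the regular expression is one common reading of E.164 (a leading `+`, a non-zero first digit, at most 15 digits in total), not necessarily the rule the service applies.

```typescript
// Status names taken from the error-code table; the type itself is hypothetical.
type GrpcStatus = "INVALID_ARGUMENT" | "NOT_FOUND" | "UNAVAILABLE" | "INTERNAL";

interface SelectOperatorRequest {
  to: string;
  accountId: string;
  messageType: string;
}

// "+" followed by 2 to 15 digits, the first of which is non-zero.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Returns the status to reject with, or null when the request is acceptable.
function validateRequest(req: SelectOperatorRequest): GrpcStatus | null {
  if (!E164_PATTERN.test(req.to)) return "INVALID_ARGUMENT";
  if (req.accountId.trim() === "") return "INVALID_ARGUMENT";
  return null;
}
```

Returning a status value rather than throwing keeps the check easy to call from the gRPC handler before any cache or database work starts.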
UC-02: ConsumeOperatorHealthEvent (NATS Consumer)
Trigger: Message on NATS subject operator.health
Input: OperatorHealthEvent { operatorId, status, timestamp }
Steps:
- Deserialize and validate the event payload.
- Compute Redis key: `operator:health:{operatorId}`.
- Write `{ status, updatedAt }` as a JSON string with `SET ... EX 60`.
- If status transitions to `UNBOUND`: proactively invalidate all `route:decision:*:*:*` cache keys that reference this `operatorId` (scan pattern `route:decision:*` and delete matching entries). This is a best-effort operation; cache TTL will eventually self-correct.
- Acknowledge NATS message (manual ack with JetStream).
- Emit structured log: `{ event: "health.updated", operatorId, status }`.
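The Redis side of these steps can be sketched as a pure planning function that decides what to write and which cached decisions to drop, leaving the actual `SET`, `SCAN` and `DEL` calls to the caller. Several things here are assumptions: the names `planHealthUpdate` and `HealthWritePlan`, cached decisions being JSON with an `operatorId` field, `updatedAt` being copied from the event timestamp, and every `UNBOUND` event being treated as a transition (the previous status is not consulted).

```typescript
interface OperatorHealthEvent {
  operatorId: string;
  status: string;
  timestamp: string;
}

// Hypothetical result type describing the Redis operations to perform.
interface HealthWritePlan {
  key: string;                 // operator:health:{operatorId}
  value: string;               // JSON string passed to SET
  ttlSeconds: number;          // EX argument
  staleDecisionKeys: string[]; // route:decision:* keys to delete
}

function planHealthUpdate(
  evt: OperatorHealthEvent,
  cachedDecisions: ReadonlyMap<string, string>, // key -> serialised decision
): HealthWritePlan {
  const stale: string[] = [];
  if (evt.status === "UNBOUND") {
    cachedDecisions.forEach((raw, key) => {
      if (!key.startsWith("route:decision:")) return;
      try {
        const decision = JSON.parse(raw) as { operatorId?: string };
        if (decision.operatorId === evt.operatorId) stale.push(key);
      } catch {
        // Best-effort: skip unreadable entries and let their TTL expire them.
      }
    });
  }
  return {
    key: `operator:health:${evt.operatorId}`,
    value: JSON.stringify({ status: evt.status, updatedAt: evt.timestamp }),
    ttlSeconds: 60,
    staleDecisionKeys: stale,
  };
}
```

Because invalidation is best-effort, a failure while deleting `staleDecisionKeys` need not block the NATS acknowledgement; the 300 s decision TTL bounds how long a stale entry can survive.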
UC-03: Health / Readiness Probe
Trigger: Kubernetes kubelet HTTP GET /health or /ready
Steps for /ready:
- Attempt a `PING` to Redis.
- Attempt a lightweight `SELECT 1` on the PostgreSQL read replica.
- Return HTTP 200 if both pass; HTTP 503 otherwise with `{ reason }`.
Steps for /health (liveness):
- Return HTTP 200 with `{ status: "ok", uptime }`. No dependency checks are performed (avoids false restarts).
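The two probes differ only in whether dependency results feed into the response, which the following sketch makes explicit. The function names, the `ProbeResponse` type, the 200 body for `/ready` and the wording of `reason` are assumptions; the source specifies only the status codes, the `{ reason }` field and the liveness body.

```typescript
// Hypothetical response shape shared by both probes.
interface ProbeResponse {
  httpStatus: number;
  body: Record<string, unknown>;
}

// /ready: both dependency checks must pass.
function readinessResponse(redisOk: boolean, postgresOk: boolean): ProbeResponse {
  const failed: string[] = [];
  if (!redisOk) failed.push("redis");
  if (!postgresOk) failed.push("postgres");
  if (failed.length === 0) {
    return { httpStatus: 200, body: { status: "ready" } }; // body is an assumption
  }
  return { httpStatus: 503, body: { reason: `${failed.join(", ")} check failed` } };
}

// /health: never looks at dependencies, so a Redis or PostgreSQL outage
// cannot cause the kubelet to restart the pod.
function livenessResponse(uptimeSeconds: number): ProbeResponse {
  return { httpStatus: 200, body: { status: "ok", uptime: uptimeSeconds } };
}
```

The HTTP handler would run the `PING` and `SELECT 1` checks, pass their boolean outcomes in, and write the returned status and body.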
2. Longest-Prefix Match Algorithm
The service must resolve a full E.164 number (e.g. +447911123456) to its most specific matching prefix (e.g. +4479 before +447 before +44).
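```typescript
// Returns the most specific prefix that `to` starts with, or null if none matches.
function findLongestPrefixMatch(to: string, prefixes: DestinationPrefix[]): DestinationPrefix | null {
  return prefixes
    .filter(p => to.startsWith(p.prefix))
    // Longest prefix first, so index 0 is the most specific match.
    .sort((a, b) => b.prefix.length - a.prefix.length)[0] ?? null;
}
```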
In production, prefixes are loaded once at startup and re-fetched on a 60 s background interval. They are not stored in Redis because the working set is small enough to fit in process memory.
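Since the prefixes already live in process memory, one possible alternative to filtering and sorting on every call is to index them by string and probe from the longest candidate downward. This is a different technique from the function above, shown only as a sketch: `buildPrefixIndex`, `lookupLongestPrefix` and the `id` field on `DestinationPrefix` are hypothetical.

```typescript
// Hypothetical shape: the source only shows the `prefix` field.
interface DestinationPrefix {
  id: number;
  prefix: string;
}

// Built once per refresh of the in-memory prefix list.
function buildPrefixIndex(prefixes: DestinationPrefix[]): Map<string, DestinationPrefix> {
  const index = new Map<string, DestinationPrefix>();
  prefixes.forEach(p => index.set(p.prefix, p));
  return index;
}

// Tries the full number first, then drops one trailing character at a time,
// so the first hit is the longest matching prefix.
function lookupLongestPrefix(
  to: string,
  index: Map<string, DestinationPrefix>,
): DestinationPrefix | null {
  for (let len = to.length; len > 0; len--) {
    const hit = index.get(to.slice(0, len));
    if (hit) return hit;
  }
  return null;
}
```

For example, with `+44`, `+447` and `+4479` indexed, looking up `+447911123456` returns the `+4479` entry. Each lookup costs at most one map probe per character of the number, independent of how many prefixes are loaded.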
3. Cache Invalidation Policy
| Trigger | Action |
|---|---|
| operator.health → UNBOUND | Scan and delete route:decision:* entries referencing that operatorId |
| Routing rule updated (future webhook) | Delete all route:decision:* keys for affected prefix+account pair |
| Cache TTL expiry (300 s) | Natural expiry; next call re-resolves from DB |
| Health TTL expiry (60 s) | Natural expiry; next health check re-reads or NATS event refreshes |
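For the routing-rule row, the affected keys share a fixed head, so selecting them from a scanned key list could look like the sketch below. `keysForRuleUpdate` is a hypothetical helper, and the message-type values in the usage note are placeholders, since the source does not list the `MessageType` members.

```typescript
// Selects cached decisions for one prefix+account pair, across all message types.
// The trailing ":" matters: without it "acc-1" would also match "acc-10".
function keysForRuleUpdate(allKeys: string[], prefix: string, accountId: string): string[] {
  const head = `route:decision:${prefix}:${accountId}:`;
  return allKeys.filter(k => k.startsWith(head));
}
```

Given keys for `+4479` under `acc-1` (two message types), `+4479` under `acc-10`, and `+44` under `acc-1`, calling `keysForRuleUpdate(keys, "+4479", "acc-1")` keeps only the first two. The same head followed by `*` could serve as the match pattern when scanning Redis, assuming the prefix and account ID contain no glob metacharacters.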