Channel Router Service — Application Logic
Version: 1.0 Status: Draft Owner: Messaging Core Last Updated: 2026-04-21 Companion: SERVICE_OVERVIEW · DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS
1. Use Case Catalogue
| # | Use case | Trigger | Latency budget (P95) |
|---|---|---|---|
| UC-01 | RouteWithFallback | gRPC from sms-orchestrator | ≤ 50 ms (decision only) |
| UC-02 | DeliverViaChannel | Internal dispatch per step | Per-adapter (see §3) |
| UC-03 | HandleFallback | Step deadline elapsed or negative outcome | ≤ 5 s full cascade (3-step OTP) |
| UC-04 | RouteMo | NATS mo.allowed.v1 | ≤ 1 s to tenant webhook ACK |
| UC-05 | ConversationSession (lookup / refresh / close) | Per-MT and per-MO | ≤ 5 ms |
| UC-06 | UpdateRecipientProfile | Delivery outcome feedback | ≤ 100 ms (async) |
| UC-07 | EnforceChannelConsent | Ladder resolution | ≤ 15 ms (with cache) |
| UC-08 | EmitBillingEvent | Per-attempt terminal status | ≤ 200 ms (async) |
| UC-09 | EnforceTenantFallbackCap | Every step transition | In-process; O(1) |
2. UC-01 — RouteWithFallback (gRPC hot path)
Trigger. sms-orchestrator calls ChannelRouterService.RouteWithFallback over mTLS after compliance has returned ALLOW/FLAG and the orchestrator chose the omnichannel path for the tenant.
Input — RouteWithFallbackRequest { notificationId, recipientId, tenantId, useCase, msisdn, body, segments, senderId, requestedChannels?, policyOverride? }
Output — RouteWithFallbackAck { executionId, ladderAccepted: Channel[], excludedReasons: { channel, reason }[], eta }
Steps (synchronous, sub-50 ms):
- Input validation.
INVALID_ARGUMENTon missing fields; MSISDN parsed to E.164; reject MCC ≠412unlessallowForeignRecipientfeature flag set for tenant. - Idempotency guard. Compute
dispatchKey = {notificationId}:{recipientId}.SETNX chan:inflight:{dispatchKey} <executionId> EX 300. On collision return the existingexecutionId(at-least-once safety). - Load policy.
GET chan:policy:{tenantId}:{useCase}from Redis (TTL 300 s). Miss → read fromchan.fallback_policies; cache. If absent, use default (see DOMAIN_MODEL §2.2). - Load recipient profile.
GET chan:profile:{msisdnHash}from Redis (TTL 60 s). Miss →chan.recipient_profiles; profile-absent is treated asUNSEENwith synthetic defaults. - Gate: consent per channel. Call
ConsentLedgerService.CheckConsent(msisdn, tenantId, channels[])with 10 ms deadline; cached 60 s underchan:gate:{tenantId}:{recipientHash}:{useCase}. Channels without consent are flaggedrecipient_opt_out. - Gate: compliance per channel. Call
ComplianceEngine.EvaluateChannelCompliance(channels[], body, senderId)with 15 ms deadline; cached 60 s. Channels with verdictBLOCKare flaggedcompliance_block. - Gate: sender-ID authorisation.
SenderIdRegistry.VerifySender(tenantId, senderId)— hot-cached (see sender-id-registry OBSERVABILITY); on failure emitREFUSED_SENDER_UNAUTHORIZED. - Gate: per-adapter circuit-breaker. In-process;
OPENadapter →step_skipped reason=adapter_circuit_open. - Gate: OTT link existence. WhatsApp/Telegram/Viber steps require a known link on the profile; otherwise
step_skipped reason=no_link. - Build resolved ladder = policy.ladder filtered by (5–9), preserving order. If resolved length is 0 → emit
notification.delivery.outcome.v1 { final: REFUSED_NO_CHANNEL }and ACK. - Pre-flight cost cap. Sum of worst-case per-step cost must be ≤
policy.costCapPerMessage; if not, shrink the ladder tail until it fits. If even the first step breaches →REFUSED_COST_CAP. - Persist execution. INSERT
chan.fallback_executionswithladderSnapshot; emitchannel.fallback.ladder.resolved.v1(internal tracing). - Kick off step 0 dispatch via UC-02. Return ack to orchestrator immediately — the ladder progression is async beyond step 0.
Error mapping — table in API_CONTRACTS §1.
Fail-closed on dependency unavailable — if consent-ledger is unreachable past its 10 ms deadline and the cached gating entry is missing or stale > 5 min → return REFUSED_CONSENT_UNKNOWN; do not dispatch. Compliance failures similarly fail-closed (match compliance-engine fail-closed posture).
3. UC-02 — DeliverViaChannel (per-step adapter dispatch)
Trigger. Internal call from UC-01 (step 0) or UC-03 (subsequent steps). One attempt = one DeliveryAttempt row + one provider call.
Per-channel dispatch.
| Channel | Adapter call | Expected P95 | Terminal via |
|---|---|---|---|
SMS | NATS publish sms.outbound.dispatch.v1 { messageId, tenantId, to, from, body, classOfService } | 4 s | sms.dlr.inbound |
WHATSAPP | POST https://graph.facebook.com/v20.0/{phoneNumberId}/messages with Authorization: Bearer {token} and template payload | 1 s | Meta status webhook (sent / delivered / read / failed) |
TELEGRAM | POST https://api.telegram.org/bot{token}/sendMessage { chat_id, text, parse_mode: "HTML" } | 500 ms | 200 response contains message_id; no richer terminal (best-effort) |
VIBER | POST https://chatapi.viber.com/pa/send_message with X-Viber-Auth-Token | 1 s | Viber webhook (delivered / seen / failed) |
VOICE | gRPC VoiceOtpService.PlayOtp(recipient, otpDigits, language, retries) | 9 s | Call-completion callback (ANSWERED / BUSY / NO_ANSWER / FAILED) |
EMAIL | SMTP submit to egress MTA | 2 s | Bounce or MTA ACK |
Shared steps:
- Acquire per-provider token-bucket slot:
INCR chan:tps:{provider}:{accountId}with window 1 s; on deny retry after jitter up toretryBudget. - Register deadline timer in Redis ZSET:
ZADD chan:deadlines <now+deadlineSeconds> <attemptId>. - Call provider; on 2xx, persist
providerMessageIdand transition attempt tosent. - On 4xx permanent (e.g. WhatsApp
131026 recipient unsupported, Telegram403 bot blocked), emitchannel.delivery.failed.v1withreason=rejected_by_recipient; mark link INVALID on the profile. - On 5xx / transient, apply breaker logic and back-off (exponential, base 500 ms, max 8 s, jitter ±25%).
- On provider-webhook arrival (WhatsApp
wamid..., Vibermessage_token), correlate viachan.delivery_attempts.provider_message_idand transition to terminal status.
HMAC signature verification on provider webhooks:
| Provider | Header | Verification |
|---|---|---|
| WhatsApp Cloud | X-Hub-Signature-256: sha256=... | HMAC_SHA256(appSecret, rawBody) — drop on mismatch |
| Telegram | Secret path /v1/webhooks/telegram/{secretPath} + optional X-Telegram-Bot-Api-Secret-Token | Compare constant-time |
| Viber | X-Viber-Content-Signature | HMAC_SHA256(authToken, rawBody) |
4. UC-03 — HandleFallback
Trigger. Two conditions both emit a progression event:
- Deadline scanner (every 1 s) reads
ZRANGEBYSCORE chan:deadlines 0 nowand finds an elapsed attempt whose status is not terminal. - Provider-side terminal negative status (failed, rejected_by_provider/recipient).
Steps:
- Race resolution.
WATCH chan:exec:{executionId}; MULTI; check current stepIndex; a race between positive DLR and deadline elapse is resolved so exactly one progression outcome is emitted. - Progression vs termination decision.
- If current
stepIndex + 1 < len(ladder)→ progress (UC-02 on next step), emitchannel.fallback.taken.v1. - Otherwise → terminate with
final = FAILED; emitnotification.delivery.outcome.v1.
- If current
- Cost-cap check. Before calling UC-02 on step N+1, sum current
totalCostNgn + expectedCostOfNextStep. If >policy.costCapPerMessage→ terminate withREFUSED_COST_CAP. - Emit per-step billing event for the step just terminated (UC-08).
- Update recipient profile (UC-06) with the observed per-channel outcome — the LWW feedback loop.
Positive DLR path — arrival of delivered / delivered_read / ANSWERED+played terminates the execution: cancel the deadline timer (ZREM chan:deadlines {attemptId}), emit channel.delivery.confirmed.v1 + notification.delivery.outcome.v1 { final: DELIVERED }.
5. UC-04 — RouteMo (inbound MO routing)
Trigger. JetStream durable consumer chan-mo-router on mo.allowed.v1 (upstream: sms-firewall-service; fallback sms.mo.received.v1 when firewall bypass is active).
Steps:
- Parse MO envelope
{ messageId, originatorMsisdn, destination, body, mno, receivedAt }. - Normalise MSISDN to E.164; compute
msisdnHash. - Session lookup —
HGETALL chan:session:{senderOrDestination}:{msisdnHash}. Session has precedence over static map because a two-way conversation must return to the originating tenant. - Static fallback — if no session,
SELECT tenant_id, webhook_url, secret_ref FROM chan.tenant_inbound_routes WHERE inbound = $destination AND active = TRUE AND (grace_period_ends_at IS NULL OR grace_period_ends_at > now()). - Unmatched — no session, no route → publish
mo.unmatched.v1 { messageId, destination, originatorMsisdnMasked }for analytics; do not store body. - STOP keyword detection (pre-dispatch). If body matches the tenant's STOP set (default
STOP, UNSUBSCRIBE, کنسل, متوقف, ایست), callConsentLedgerService.RecordOptOut(msisdn, tenantId, channel, reason="stop_keyword"); mark sessionCLOSED_STOP(even before webhook delivery attempt). - Dispatch to tenant webhook via
webhook-dispatcher.Deliver(webhookId, payload)which handles HMAC v2 signing, retries, and dead-lettering (seewebhook-dispatcher/SERVICE_OVERVIEW.md). - Re-publish the MO to
sms.mo.inboundsoconsent-ledger-service(STOP consumer) andsms-firewall-service(feedback) see it — this is in addition to the tenant webhook fan-out (which is the tenant's interface, not a platform consumer interface). - Session refresh — if session was OPEN,
HSET lastSeenAt now; EXPIRE chan:session:... sessionTtlSeconds.
Latency target — P95 ≤ 1 s from mo.allowed.v1 consume to tenant-webhook 2xx.
6. UC-05 — ConversationSession management
Create. On first successful MT dispatch for (senderId, msisdnHash, tenantId) that was not already session-backed:
HSET chan:session:{senderId}:{msisdnHash} tenantId ... conversationId conv_ULID openedAt now lastSeenAt now turnCount 1EXPIRE ... sessionTtlSeconds- INSERT
chan.conversationsvia outbox - Emit
channel.conversation.started.v1
Refresh. On every subsequent MT or MO success: HINCRBY turnCount 1; HSET lastSeenAt now; EXPIRE.
Close.
- STOP keyword →
status = CLOSED_STOP - Redis key expiry → keyspace-notification worker writes
status = CLOSED_IDLE - Explicit admin close via REST →
status = CLOSED_MANUALAll closures emitchannel.conversation.ended.v1and are terminal.
Reconciliation. Daily job compares chan:session:* vs chan.conversations WHERE status = OPEN; Redis-lost entries are closed with reason=redis_loss.
7. UC-06 — UpdateRecipientProfile
Trigger. A delivery-feedback consumer on channel.delivery.confirmed.v1 / .failed.v1. Runs async so the hot path is unaffected.
Steps:
- For each attempt, compute a channel-preference delta:
- Positive terminal (
DELIVERED,ANSWERED) → +Δ (min 5, max 20 depending on confidence) rejected_by_recipient→ −Δ (min 10, max 40) + link marked INVALID- Ambiguous (
sentonly, timeout) → ±0
- Positive terminal (
- LWW merge:
UPDATE chan.recipient_profiles SET channelPreferences = jsonb_set(..., ..., ...) WHERE msisdn_hash = $1 AND updatedAt < NEW.updatedAt. - Transition
discoveryState:UNSEEN → LEARNINGafter first observation;LEARNING → STABLEafter ≥ 5 outcomes across ≥ 2 channels. - Emit
channel.recipient.profile.updated.v1(analytics only; subject retained 90 d).
8. UC-07 — EnforceChannelConsent
Per-channel scopes (per consent-ledger's scope model):
SMS—TRANSACTIONALfor OTP/txn,MARKETINGfor marketing,EMERGENCYalways-onWHATSAPP— MARKETING category requires explicit opt-in (Meta business-template rule)TELEGRAM— Telegram Terms of Service require the user initiated the conversation; platform enforceslink exists + last activity ≤ 30 dVIBER— PA-initiated outbound permitted under Viber business rules; opt-out respectedVOICE— OTP-only by default;MARKETINGvoice gated by distinct scopeVOICE_MARKETINGEMAIL— CAN-SPAM / GDPR-style double opt-in; verified-email flag required
Cache structure — chan:gate:{tenantId}:{sha256(msisdn ‖ tenantSalt)}:{useCase} stores a {channel: allowed|denied|reason} map with TTL 60 s. Invalidated on consent.revoked.v1 for the affected (tenantId, msisdnHash).
9. UC-08 — EmitBillingEvent
Per-attempt metering. For every terminal DeliveryAttempt (positive or negative) that actually reached the provider, emit one channel.billing.event.v1 with SKU by channel (sms.outbound.v1, whatsapp.outbound.v1, telegram.outbound.v1, viber.outbound.v1, voice.otp.v1, email.outbound.v1).
Rules:
- Adapter-refused attempts (breaker open, no-link) do not meter.
- Voice OTP meters only when
status = ANSWERED && playedDigits = otp.length. - WhatsApp
template_rejectedstill meters (the API call was made). Nats-Msg-Id = {notificationId}:{recipientId}:{stepIndex}for inbox-side dedup at the billing consumer.
10. UC-09 — EnforceTenantFallbackCap
Pre-step-transition guard. Evaluated in-process (no DB/Redis call) using the ladderSnapshot + attempts[].costNgn projection already loaded on the execution. Two counters:
executionTotalCostNgn(observed) — sum ofcostNgnacross completed attempts.nextStepExpectedCostNgn(model-driven) — based onchan.channel_adapter_configs.pricing_modellookup at policy-load time.
If executionTotalCostNgn + nextStepExpectedCostNgn > policy.costCapPerMessage → terminate with REFUSED_COST_CAP. This protects against runaway fallback cost, e.g. a mis-configured policy chain that sends via Voice OTP on marketing traffic.
11. Sequence Diagram — Typical Fallback (SMS → WhatsApp → Voice)
12. Performance & Concurrency
- Per-pod inflight cap:
MAX_INFLIGHT = 200(consumer) +MAX_GRPC_INFLIGHT = 1000. Exceeding either returnsRESOURCE_EXHAUSTEDon gRPC and leaves NATS messages un-acked for JetStream redelivery. - Deadline scanner runs per replica with a distributed lock (
chan:deadline:lock) claimed for 2 s at a time; only one replica does the scan to avoid duplicate progressions. - Profile-update consumer scales independently on a separate subject (
channel.delivery.confirmed.v1,.failed.v1) with HPA on consumer lag. - OTT-adapter Deployments are separate so WhatsApp adapter pressure does not drain Telegram / Viber pools.