
Channel Router Service (channel-router-service) — Service Overview

Status: populated | Version: 1.0 | Owner: Messaging Core | Last updated: 2026-04-20 | Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL


1. Purpose

The Channel Router Service is the platform's omnichannel delivery decisioner. For every notification a tenant submits, the service decides — per recipient, per use case, per priority class — which channel(s) to attempt and in what order, with explicit fallback rules, deadlines, and provider adapters. It owns:

  1. Multi-channel fallback engine — Given a recipient profile and a tenant policy, chain attempts across SMS → WhatsApp BSP → Voice OTP → Email; emit a single canonical delivery outcome per recipient regardless of how many attempts occurred.
  2. OTT provider adapters — First-class integration with WhatsApp Cloud API (POST /{phone-number-id}/messages), Telegram Bot API, and Viber Business; all conform to a single internal ChannelAdapter port.
  3. Inbound MO routing — Inbound mobile-originated SMS (from smpp-connector MO subjects) is routed to the correct tenant webhook based on inbound number / shortcode → tenant mapping; full HMAC-signed delivery with at-least-once semantics.
  4. Conversational session manager — Sticky correlation across alpha sender ID ↔ MSISDN ↔ tenant for two-way SMS conversations: each MO has a session that joins it back to the originating MT, so reply text "STOP" or "1" is contextual within the right campaign / OTT thread.
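The fallback engine in item 1 can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the method shape of the `ChannelAdapter` port, the result codes, the `Outcome` type, and `StubAdapter` are hypothetical, and real per-step deadlines and retry budgets are left out. It only shows the core idea: try channels in order, skip a broken adapter, and return exactly one outcome per recipient.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ChannelAdapter(Protocol):
    """Hypothetical shape of the internal ChannelAdapter port."""
    channel: str

    def send(self, recipient: str, body: str) -> str:
        """Return a short result code such as 'delivered' or 'failed_carrier'."""
        ...


@dataclass
class Outcome:
    status: str  # "DELIVERED" or "DELIVERY_FAILED"
    fallback_path: list[str] = field(default_factory=list)


def run_ladder(adapters: list[ChannelAdapter], recipient: str, body: str) -> Outcome:
    """Try each channel in order and return a single outcome for the recipient."""
    path: list[str] = []
    for adapter in adapters:
        try:
            result = adapter.send(recipient, body)
        except Exception:
            # Fail-degraded: an unavailable adapter is recorded and skipped.
            result = "adapter_down"
        path.append(f"{adapter.channel}_{result}")
        if result == "delivered":
            return Outcome("DELIVERED", path)
    # Only a fully exhausted ladder is a terminal failure.
    return Outcome("DELIVERY_FAILED", path)


@dataclass
class StubAdapter:
    """Test double standing in for a real provider adapter."""
    channel: str
    result: str

    def send(self, recipient: str, body: str) -> str:
        if self.result == "raise":
            raise ConnectionError("provider unreachable")
        return self.result
```

With a ladder of `StubAdapter("sms", "failed_carrier")`, `StubAdapter("whatsapp", "raise")`, and `StubAdapter("voice", "delivered")`, the sketch yields one `DELIVERED` outcome whose `fallback_path` records all three steps.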

The service sits between sms-orchestrator (which produces channel-agnostic notification.dispatch.requested.v1 events for tenants opting into omnichannel) and the channel-specific connectors (smpp-connector for SMS, OTT adapters for WhatsApp/Telegram/Viber, Voice OTP gateway, SMTP egress for email).

It is also a peer collaborator with compliance-engine: the channel-router asks compliance for per-channel verdicts (a message may be allowed on SMS but blocked on WhatsApp, e.g. business-template policy violations), and provides compliance with channel-attribution context for audit.
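A rough sketch of how per-channel verdicts might prune the ladder follows. The `verdict_for` callable is a hypothetical stand-in for the call to compliance-engine; it is assumed to return `None` when a channel is allowed and a short reason string when it is blocked. The reason strings are illustrative, not a documented vocabulary.

```python
from typing import Callable, Optional


def apply_verdicts(
    ladder: list[str],
    verdict_for: Callable[[str], Optional[str]],
) -> tuple[list[str], list[str]]:
    """Split a ladder into allowed channels and recorded exclusions.

    Order is preserved, so the remaining channels are still attempted
    in the sequence the tenant policy asked for.
    """
    allowed: list[str] = []
    exclusions: list[str] = []
    for channel in ladder:
        reason = verdict_for(channel)  # None means the channel is allowed
        if reason is None:
            allowed.append(channel)
        else:
            # Keep an explanatory entry rather than dropping the channel silently.
            exclusions.append(f"{channel}_{reason}")
    return allowed, exclusions
```

For example, with `verdicts = {"whatsapp": "template_rejected"}`, calling `apply_verdicts(["sms", "whatsapp", "voice"], verdicts.get)` keeps SMS and Voice on the ladder and records `whatsapp_template_rejected` as the exclusion.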


2. Bounded Context

| Dimension | Value |
| --- | --- |
| Domain | Omnichannel Messaging: channel selection, fallback, OTT adapters, MO routing, conversation sessions |
| Owner squad | Messaging Core |
| Deployment unit | Kubernetes Deployment channel-router-service (control plane) + Deployment per OTT adapter (data plane, isolated egress) |
| Communication style | Inbound: NATS JetStream (notification.dispatch.requested.v1, sms.mo.received.v1), gRPC (mTLS) admin · Outbound: HTTPS to OTT providers, NATS to smpp-connector, HMAC-signed HTTPS webhooks to tenants |
| Storage | PostgreSQL schema chan (Patroni HA) · Redis (session state, idempotency, fallback decision cache) |
| Failure mode | Fail-degraded per channel; channel substitution before refusal. If the WhatsApp adapter is down, the fallback ladder skips it without aborting; only a fully exhausted ladder yields a terminal DELIVERY_FAILED |
| Region pinning | Active-active in kbl and mzr per ADR-0004 §2; OTT adapter pods may be pinned to nodes with the appropriate egress IP for provider IP-allowlists |

3. Responsibilities

| # | Responsibility |
| --- | --- |
| R1 | Consume notification.dispatch.requested.v1 and route per recipient through the configured fallback ladder |
| R2 | Maintain chan.recipient_profile and chan.tenant_policy to drive channel selection (preferred channel, opt-out per channel, capability gating) |
| R3 | Provide ChannelAdapter port and concrete adapters for sms, whatsapp_cloud, telegram_bot, viber_business, voice_otp, email_smtp |
| R4 | Route inbound MO traffic from sms-firewall-service and smpp-connector MO streams to the correct tenant webhook with HMAC v2 signing |
| R5 | Manage conversational sessions (sticky (senderId, msisdn, tenantId) keys) with TTL and explicit close semantics |
| R6 | Emit a single canonical notification.delivery.outcome.v1 per recipient regardless of how many channels were attempted |
| R7 | Surface per-channel attempt records as channel.attempt.recorded.v1 for analytics & billing |
| R8 | Honour per-channel quotas, per-tenant fallback policy, and per-recipient opt-out flags |
| R9 | Enforce per-channel content rules (WhatsApp template approval state, Voice TTS language support, SMTP DMARC alignment) before adapter dispatch |
| R10 | Provide a fallback-decision explainer in the outcome event (fallback_path: ["sms_failed_carrier", "whatsapp_template_rejected", "voice_delivered"]) for tenant debuggability |

4. Non-Responsibilities

  • Does not transmit SMS at the SMPP layer — smpp-connector does
  • Does not decide compliance verdict — compliance-engine does (channel-router asks for verdict per-channel)
  • Does not select MNO routing — routing-engine does (channel-router publishes to the MNO-agnostic SMS subject which routing-engine consumes)
  • Does not own DLR correlation for SMS — dlr-processor does (channel-router subscribes to outcome events)
  • Does not authenticate tenants — auth-service does
  • Does not charge for OTT messages — billing-service does (channel-router publishes per-attempt metering events)
  • Does not manage WhatsApp template approvals — compliance-engine and developer-portal-service do (channel-router only checks the approved-template state)

5. Upstream / Downstream Dependencies

| Direction | System | Protocol | Purpose |
| --- | --- | --- | --- |
| Inbound producer | sms-orchestrator | NATS notification.dispatch.requested.v1 | Per-recipient channel routing requests |
| Inbound producer | smpp-connector MO | NATS sms.mo.received.v1 | Inbound MO for tenant webhook routing |
| Inbound producer | sms-firewall-service | NATS mo.allowed.v1 | Pre-filtered MO that passed the firewall |
| Inbound caller | compliance-engine | gRPC (mTLS) | Per-channel verdict requests, channel attribution feedback |
| Inbound caller | Admin via Kong | HTTPS REST | Tenant policy and recipient-profile management |
| Outbound | smpp-connector (SMS) | NATS sms.outbound.dispatch.v1 | SMS dispatch via per-MNO connector pool |
| Outbound | WhatsApp Cloud API | HTTPS POST https://graph.facebook.com/v20.0/{phone-number-id}/messages | OTT adapter |
| Outbound | Telegram Bot API | HTTPS POST https://api.telegram.org/bot{token}/sendMessage | OTT adapter |
| Outbound | Viber Business API | HTTPS POST https://chatapi.viber.com/pa/send_message | OTT adapter |
| Outbound | Voice OTP gateway | gRPC | Voice TTS OTP delivery |
| Outbound | Tenant webhook | HTTPS POST + HMAC v2 | Inbound MO delivery |
| Outbound | PostgreSQL chan schema | TCP | State, sessions, profiles |
| Outbound | Redis | TCP | Session state, idempotency, decision cache |
| Outbound | NATS JetStream | TCP | Outcome and attempt events |
| Outbound | billing-service | NATS billing.metering.recorded.v1 | Per-channel attempt metering |
| Outbound | compliance-engine | gRPC EvaluateChannelCompliance | Per-channel verdict |
| Outbound | consent-ledger-service | gRPC CheckConsent | Opt-in / opt-out per channel |
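To make the WhatsApp row concrete, here is a hedged sketch of how an adapter might assemble a template-message request for the endpoint listed above. The function name and parameters are hypothetical, and the payload reflects the publicly documented Cloud API template-message shape as understood at the time of writing; check it against the current Meta documentation before relying on it. The request is only built, never sent.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.facebook.com/v20.0"  # version taken from the table above


def build_whatsapp_template_request(
    phone_number_id: str,
    access_token: str,
    to_msisdn: str,
    template_name: str,
    language_code: str = "en_US",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /{phone-number-id}/messages."""
    payload = {
        "messaging_product": "whatsapp",
        "to": to_msisdn,
        "type": "template",
        "template": {
            "name": template_name,  # must already be an approved template
            "language": {"code": language_code},
        },
    }
    return urllib.request.Request(
        url=f"{GRAPH_BASE}/{phone_number_id}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping request construction separate from transmission makes the adapter easy to unit-test without touching the provider, which fits the isolated-egress design of the OTT adapter Deployments.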

6. Runtime Topology

| Component | Replicas (per region) | Notes |
| --- | --- | --- |
| channel-router-service (decision core) | 8 in kbl, 6 in mzr (HPA min) | Stateless. Scales on NATS consumer lag + CPU. |
| chan-adapter-whatsapp | 4 in each region | Pinned to nodes with the WhatsApp-allowlisted egress IP pool. |
| chan-adapter-telegram | 2 in each region | |
| chan-adapter-viber | 2 in each region | |
| chan-adapter-voice | 4 in each region | gRPC client to Voice OTP gateway. |
| chan-adapter-email | 2 in each region | SMTP egress from dedicated mail IP pool. |
| chan-mo-router | 4 in each region | Inbound MO → tenant webhook fanout. |
| Postgres chan | Patroni 1+2 sync standbys per region | kbl ↔ mzr logical replication for sessions and profiles |
| Redis chan | 6-node Sentinel per region | Session and idempotency state |

7. Outbound Fallback Flow


8. Inbound MO Routing Flow


9. Position in the Platform


10. Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Single canonical outcome event per recipient | Tenants want one webhook per recipient ("delivered via voice OTP") not one per attempt. Per-attempt detail is available via separate channel.attempt.recorded.v1 for analytics. |
| Fallback ladder is policy-driven, not hard-coded | Tenant policy stored in chan.tenant_policy: ladder, per-step deadlines, per-step retry budget. New channels added without code change. |
| Per-channel verdict from compliance-engine | A message may pass SMS rules but fail WhatsApp business-template rules. Channel-router asks compliance per channel and excludes the channel from the ladder if blocked. |
| OTT adapters are separate Deployments | Egress IP allowlists and provider rate limits differ per OTT. Separating adapters allows independent scaling and provider-specific NetworkPolicies. |
| Voice OTP is treated as a channel, not a separate service | Lets tenants set Voice as a fallback step (e.g. SMS → WhatsApp → Voice). Voice OTP gateway is the protocol terminus. |
| Inbound MO routing uses session table first, static map second | Conversations need stickiness. If a session exists for (senderId, msisdn), the MO returns to the originating tenant regardless of static inbound-number mapping. |
| HMAC v2 signing on tenant webhooks | X-Ghasi-Signature: t={ts},v2={hex(HMAC_SHA256(secret, ts + "." + body))}; replay-protected, secret-rotatable. |
| Adapter is a port, providers are adapters (Hex) | New OTT (e.g. RCS) added by implementing ChannelAdapter port; no router-core change. |
| Fallback decision cached for 60 s | Identical (tenantId, recipient, useCase) triplet will resolve to the same ladder for 60 s; saves consent and compliance round trips during burst. |
| Per-attempt metering, not per-recipient | OTT and Voice OTP cost differently per attempt. Tenants are billed accurately even if the first attempt failed. |
| Fail-degraded per channel | If WhatsApp adapter is down, ladder skips WhatsApp without aborting. Only a fully-exhausted ladder yields DELIVERY_FAILED. |
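The HMAC v2 header format in the table can be exercised with a small sketch, written from the tenant's point of view as well as the sender's. The header layout comes from the row above; the rest is assumption: `ts` is taken to be Unix seconds, the 300-second tolerance window is an arbitrary illustrative value, and accepting a list of secrets is one hypothetical way to support rotation.

```python
import hashlib
import hmac
from typing import Iterable


def sign(secret: bytes, body: bytes, ts: int) -> str:
    """Build the X-Ghasi-Signature value: t={ts},v2={hex(HMAC_SHA256(secret, ts.body))}."""
    mac = hmac.new(secret, f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    return f"t={ts},v2={mac}"


def verify(
    secrets: Iterable[bytes],
    body: bytes,
    header: str,
    now: int,
    tolerance_s: int = 300,  # assumed replay window, not a documented value
) -> bool:
    """Check the timestamp window, then the MAC against each active secret."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        ts = int(parts["t"])
        received = parts["v2"]
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - ts) > tolerance_s:
        return False  # too old or too far in the future: treat as a replay
    for secret in secrets:
        expected = sign(secret, body, ts).split("v2=", 1)[1]
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected.encode(), received.encode()):
            return True
    return False
```

Binding the timestamp into the MAC is what makes the scheme replay-protected: an attacker cannot refresh `t` without invalidating `v2`. Checking several secrets lets a tenant accept both the old and the new secret during a rotation window.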

11. Latency and Throughput Budget

| Path | Target P50 | Target P95 |
| --- | --- | --- |
| Channel decision (cached) | 5 ms | 15 ms |
| Channel decision (cold: consent + compliance) | 25 ms | 80 ms |
| SMS attempt → DLR receipt (carrier-dependent) | 4 s | 30 s |
| WhatsApp Cloud API attempt → status webhook | 1 s | 8 s |
| Voice OTP attempt (call setup + TTS playback) | 9 s | 25 s |
| Inbound MO → tenant webhook delivery | 200 ms | 1 s |

End-to-end SLO for a 3-step OTP fallback ladder: P95 ≤ 25 s per recipient (worst case: SMS fails, then voice succeeds).
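The gap between the cached and cold decision targets comes from skipping the consent and compliance round trips. The sketch below illustrates the 60-second decision cache with an in-process dictionary; the service itself is described as keeping this cache in Redis, so the class, its method names, and the injectable clock are all hypothetical simplifications.

```python
import time
from typing import Callable


class DecisionCache:
    """TTL cache keyed on (tenantId, recipient, useCase)."""

    def __init__(self, ttl_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        # key -> (expiry time, resolved ladder)
        self._entries: dict[tuple[str, str, str], tuple[float, list[str]]] = {}

    def resolve(
        self,
        tenant_id: str,
        recipient: str,
        use_case: str,
        compute: Callable[[], list[str]],
    ) -> list[str]:
        """Return the cached ladder, or run the cold path and cache its result."""
        key = (tenant_id, recipient, use_case)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # warm path: no consent or compliance calls
        ladder = compute()  # cold path: consent + compliance round trips
        self._entries[key] = (now + self._ttl, ladder)
        return ladder
```

A burst of notifications to the same recipient for the same use case would invoke `compute` once and reuse the result until the entry expires.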


12. Cross-Service Invariants

  1. One outcome per recipient. No matter how many attempts occur, exactly one notification.delivery.outcome.v1 is published per (notificationId, recipientId).
  2. No silent channel skip. If a channel is excluded from the ladder (compliance block, opt-out, adapter down), an explanatory entry is included in fallback_path.
  3. Per-channel metering. Every adapter attempt produces exactly one billing.metering.recorded.v1 event with the right SKU.
  4. Idempotent inbound webhook delivery. MO delivery to tenants is at-least-once; every delivery of the same MO carries an Idempotency-Key header equal to the MO's canonical messageId, so tenants can deduplicate.
  5. Session integrity. A conversational session is closed on explicit STOP keyword, idle TTL (default 24 h), or tenant-initiated close. Closure is final; the next MO from the same MSISDN starts a new session.
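Invariant 5, combined with the session-first routing rule from the design decisions, can be sketched as below. This is an illustrative in-memory model only: the class and method names are hypothetical, the address an MO was sent to is treated as the senderId for lookup, and tenant-initiated close is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    tenant_id: str
    last_seen: float


class SessionRouter:
    """Route MOs by live session first, static inbound-number map second."""

    def __init__(self, static_map: dict[str, str], ttl_s: float = 24 * 3600):
        self._static_map = static_map  # inbound number / shortcode -> tenant
        self._ttl = ttl_s              # idle TTL, default 24 h
        self._sessions: dict[tuple[str, str], Session] = {}

    def open(self, sender_id: str, msisdn: str, tenant_id: str, now: float) -> None:
        """Record a session when an MT goes out, so replies can be joined back."""
        self._sessions[(sender_id, msisdn)] = Session(tenant_id, now)

    def route_mo(self, sender_id: str, msisdn: str, text: str, now: float) -> Optional[str]:
        """Return the tenant that should receive this MO, or None if unmapped."""
        key = (sender_id, msisdn)
        session = self._sessions.get(key)
        if session is not None and now - session.last_seen > self._ttl:
            session = None  # idle TTL expired; the old session is not revived
        if session is None:
            tenant_id = self._static_map.get(sender_id)
            if tenant_id is None:
                self._sessions.pop(key, None)
                return None
            session = Session(tenant_id, now)  # a new session starts here
            self._sessions[key] = session
        session.last_seen = now
        if text.strip().upper() == "STOP":
            del self._sessions[key]  # closure is final
        return session.tenant_id
```

Note that the STOP message itself is still routed to the tenant that owned the session, so the opt-out lands in the right campaign context before the session is closed.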

13. References

  • ADR-0004 §3 (new bounded contexts) — defines this service's scope
  • docs/07-epics-and-user-stories.md §6.7 — EP-CHAN-01..04
  • services/sms-orchestrator/SERVICE_OVERVIEW.md — primary upstream
  • services/compliance-engine/SERVICE_OVERVIEW.md — per-channel verdict
  • services/consent-ledger-service/SERVICE_OVERVIEW.md — opt-in/out source of truth
  • services/dlr-processor/SERVICE_OVERVIEW.md — SMS DLR feedback
  • WhatsApp Cloud API — https://developers.facebook.com/docs/whatsapp/cloud-api
  • Telegram Bot API — https://core.telegram.org/bots/api
  • Viber Business API — https://developers.viber.com/docs/api/rest-bot-api/