Channel Router Service (channel-router-service) — Service Overview
Status: populated | Version: 1.0 | Owner: Messaging Core | Last updated: 2026-04-20 | Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL
1. Purpose
The Channel Router Service is the platform's omnichannel delivery decisioner. For every notification a tenant submits, the service decides — per recipient, per use case, per priority class — which channel(s) to attempt and in what order, with explicit fallback rules, deadlines, and provider adapters. It owns:
- Multi-channel fallback engine: given a recipient profile and a tenant policy, chain attempts across SMS → WhatsApp BSP → Voice OTP → Email; emit a single canonical delivery outcome per recipient regardless of how many attempts occurred.
- OTT provider adapters: first-class integration with WhatsApp Cloud API (POST /{phone-number-id}/messages), Telegram Bot API, and Viber Business; all conform to a single internal ChannelAdapter port.
- Inbound MO routing: inbound mobile-originated SMS (from smpp-connector MO subjects) is routed to the correct tenant webhook based on inbound number / shortcode → tenant mapping; full HMAC-signed delivery with at-least-once semantics.
- Conversational session manager: sticky correlation across alpha sender ID ↔ MSISDN ↔ tenant for two-way SMS conversations. Each MO has a session that joins it back to the originating MT, so reply text "STOP" or "1" is contextual within the right campaign / OTT thread.
The service sits between sms-orchestrator (which produces channel-agnostic notification.dispatch.requested.v1 events for tenants opting into omnichannel) and the channel-specific connectors (smpp-connector for SMS, OTT adapters for WhatsApp/Telegram/Viber, Voice OTP gateway, SMTP egress for email).
It is also a peer collaborator with compliance-engine: the channel-router asks compliance for per-channel verdicts (a message may be allowed on SMS but blocked on WhatsApp, e.g. business-template policy violations), and provides compliance with channel-attribution context for audit.
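The fallback engine and per-channel verdicts described above can be illustrated with a minimal sketch. It is not the service's implementation: the `Outcome` type, the `run_ladder` function, and the reason strings are hypothetical names chosen for illustration, and real attempts would go through adapters with deadlines and retry budgets. It shows only the core idea: walk the ladder in order, record every skipped or failed step, and return exactly one outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Outcome:
    """The single canonical result for one recipient."""
    status: str                      # "DELIVERED" or "DELIVERY_FAILED"
    delivered_via: Optional[str]     # channel that succeeded, if any
    fallback_path: List[str] = field(default_factory=list)


def run_ladder(
    ladder: List[str],
    attempt: Callable[[str], str],
    excluded: Dict[str, str],
) -> Outcome:
    """Walk the ladder in order and return exactly one outcome.

    attempt(channel) returns "delivered" or a failure reason (assumed shape).
    excluded maps a channel to the reason it must be skipped, for example a
    compliance block, an opt-out, or an adapter that is down.
    """
    path: List[str] = []
    for channel in ladder:
        reason = excluded.get(channel)
        if reason is not None:
            # Skipped channels still leave an explanatory entry.
            path.append(f"{channel}_{reason}")
            continue
        result = attempt(channel)
        path.append(f"{channel}_{result}")
        if result == "delivered":
            return Outcome("DELIVERED", channel, path)
    # Only a fully exhausted ladder is terminal.
    return Outcome("DELIVERY_FAILED", None, path)


# Hypothetical run: SMS fails at the carrier, WhatsApp is blocked, voice succeeds.
results = {"sms": "failed_carrier", "voice": "delivered"}
outcome = run_ladder(
    ladder=["sms", "whatsapp", "voice"],
    attempt=lambda channel: results[channel],
    excluded={"whatsapp": "template_rejected"},
)
print(outcome.status, outcome.fallback_path)
```

With these inputs the path reads `sms_failed_carrier`, `whatsapp_template_rejected`, `voice_delivered`, which mirrors the `fallback_path` example given under responsibility R10.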
2. Bounded Context
| Dimension | Value |
|---|---|
| Domain | Omnichannel Messaging — channel selection, fallback, OTT adapters, MO routing, conversation sessions |
| Owner squad | Messaging Core |
| Deployment unit | Kubernetes Deployment channel-router-service (control plane) + Deployment per OTT adapter (data plane, isolated egress) |
| Communication style | Inbound: NATS JetStream (notification.dispatch.requested.v1, sms.mo.received.v1), gRPC (mTLS) admin · Outbound: HTTPS to OTT providers, NATS to smpp-connector, HMAC-signed HTTPS webhooks to tenants |
| Storage | PostgreSQL schema chan (Patroni HA) · Redis (session state, idempotency, fallback decision cache) |
| Failure mode | Fail-degraded per channel; channel substitution before refusal — if WhatsApp adapter is down, fallback ladder skips it without aborting; only a fully-exhausted ladder yields a terminal DELIVERY_FAILED |
| Region pinning | Active-active in kbl and mzr per ADR-0004 §2; OTT adapter pods may be pinned to nodes with the appropriate egress IP for provider IP-allowlists |
3. Responsibilities
| # | Responsibility |
|---|---|
| R1 | Consume notification.dispatch.requested.v1 and route per recipient through the configured fallback ladder |
| R2 | Maintain chan.recipient_profile and chan.tenant_policy to drive channel selection (preferred channel, opt-out per channel, capability gating) |
| R3 | Provide ChannelAdapter port and concrete adapters for sms, whatsapp_cloud, telegram_bot, viber_business, voice_otp, email_smtp |
| R4 | Route inbound MO traffic from sms-firewall-service and smpp-connector MO streams to the correct tenant webhook with HMAC v2 signing |
| R5 | Manage conversational sessions (sticky (senderId, msisdn, tenantId) keys) with TTL and explicit close semantics |
| R6 | Emit a single canonical notification.delivery.outcome.v1 per recipient regardless of how many channels were attempted |
| R7 | Surface per-channel attempt records as channel.attempt.recorded.v1 for analytics & billing |
| R8 | Honour per-channel quotas, per-tenant fallback policy, and per-recipient opt-out flags |
| R9 | Enforce per-channel content rules (WhatsApp template approval state, Voice TTS language support, SMTP DMARC alignment) before adapter dispatch |
| R10 | Provide a fallback-decision explainer in the outcome event (fallback_path: ["sms_failed_carrier", "whatsapp_template_rejected", "voice_delivered"]) for tenant debuggability |
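To make R3 concrete, here is one possible shape for the ChannelAdapter port, sketched in Python. The method name `send`, the `AttemptResult` fields, the `AdapterRegistry` class, and the stand-in SMS adapter are all assumptions made for illustration; the actual port signature is defined by the service and is not given in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class AttemptResult:
    channel: str
    delivered: bool
    reason: str


class ChannelAdapter(Protocol):
    """Hypothetical port: every provider adapter exposes the same surface."""
    channel: str

    def send(self, recipient: str, body: str) -> AttemptResult:
        ...


class RecordingSmsAdapter:
    """Stand-in adapter that records messages instead of publishing them."""
    channel = "sms"

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, recipient: str, body: str) -> AttemptResult:
        self.sent.append((recipient, body))
        return AttemptResult(self.channel, True, "accepted")


class AdapterRegistry:
    """Router core depends only on the port, never on a concrete provider."""

    def __init__(self) -> None:
        self._adapters: Dict[str, ChannelAdapter] = {}

    def register(self, adapter: ChannelAdapter) -> None:
        self._adapters[adapter.channel] = adapter

    def dispatch(self, channel: str, recipient: str, body: str) -> AttemptResult:
        adapter = self._adapters.get(channel)
        if adapter is None:
            # A missing adapter is reported as a failed attempt, not an exception.
            return AttemptResult(channel, False, "adapter_unavailable")
        return adapter.send(recipient, body)
```

Under this shape, adding a new channel means registering one more object that satisfies the protocol, which is the property the hexagonal design in the key design decisions relies on.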
4. Non-Responsibilities
- Does not transmit SMS at the SMPP layer; smpp-connector does
- Does not decide compliance verdict; compliance-engine does (channel-router asks for verdict per-channel)
- Does not select MNO routing; routing-engine does (channel-router publishes to the MNO-agnostic SMS subject which routing-engine consumes)
- Does not own DLR correlation for SMS; dlr-processor does (channel-router subscribes to outcome events)
- Does not authenticate tenants; auth-service does
- Does not charge for OTT messages; billing-service does (channel-router publishes per-attempt metering events)
- Does not manage WhatsApp template approvals; compliance-engine and developer-portal-service do (channel-router only checks the approved-template state)
5. Upstream / Downstream Dependencies
| Direction | System | Protocol | Purpose |
|---|---|---|---|
| Inbound producer | sms-orchestrator | NATS notification.dispatch.requested.v1 | Per-recipient channel routing requests |
| Inbound producer | smpp-connector MO | NATS sms.mo.received.v1 | Inbound MO for tenant webhook routing |
| Inbound producer | sms-firewall-service | NATS mo.allowed.v1 | Pre-filtered MO that passed the firewall |
| Inbound caller | compliance-engine | gRPC (mTLS) | Per-channel verdict requests, channel attribution feedback |
| Inbound caller | Admin via Kong | HTTPS REST | Tenant policy and recipient-profile management |
| Outbound | smpp-connector (SMS) | NATS sms.outbound.dispatch.v1 | SMS dispatch via per-MNO connector pool |
| Outbound | WhatsApp Cloud API | HTTPS POST https://graph.facebook.com/v20.0/{phone-number-id}/messages | OTT adapter |
| Outbound | Telegram Bot API | HTTPS POST https://api.telegram.org/bot{token}/sendMessage | OTT adapter |
| Outbound | Viber Business API | HTTPS POST https://chatapi.viber.com/pa/send_message | OTT adapter |
| Outbound | Voice OTP gateway | gRPC | Voice TTS OTP delivery |
| Outbound | Tenant webhook | HTTPS POST + HMAC v2 | Inbound MO delivery |
| Outbound | PostgreSQL chan schema | TCP | State, sessions, profiles |
| Outbound | Redis | TCP | Session state, idempotency, decision cache |
| Outbound | NATS JetStream | TCP | Outcome and attempt events |
| Outbound | billing-service | NATS billing.metering.recorded.v1 | Per-channel attempt metering |
| Outbound | compliance-engine | gRPC EvaluateChannelCompliance | Per-channel verdict |
| Outbound | consent-ledger-service | gRPC CheckConsent | Opt-in / opt-out per channel |
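As an illustration of the WhatsApp Cloud API row above, the following sketch builds (but does not send) a template-message request. The function name and parameters are hypothetical. The payload fields follow the publicly documented template-message shape as best understood here and should be checked against the current Cloud API reference before use.

```python
import json
from typing import Dict, Tuple

# Base URL and version taken from the dependency table above.
GRAPH_BASE = "https://graph.facebook.com/v20.0"


def build_whatsapp_template_request(
    phone_number_id: str,
    access_token: str,
    to_msisdn: str,
    template_name: str,
    language_code: str,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a template message; nothing is sent."""
    url = f"{GRAPH_BASE}/{phone_number_id}/messages"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    payload = {
        "messaging_product": "whatsapp",
        "to": to_msisdn,
        "type": "template",
        "template": {
            "name": template_name,
            "language": {"code": language_code},
        },
    }
    return url, headers, json.dumps(payload).encode("utf-8")
```

Keeping request construction separate from transport makes the adapter easy to test without network access, and leaves retries and rate limiting to the adapter's HTTP layer.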
6. Runtime Topology
| Component | Replicas (per region) | Notes |
|---|---|---|
| channel-router-service (decision core) | 8 in kbl, 6 in mzr (HPA min) | Stateless. Scales on NATS consumer lag + CPU. |
| chan-adapter-whatsapp | 4 in each region | Pinned to nodes with the WhatsApp-allowlisted egress IP pool. |
| chan-adapter-telegram | 2 in each region | |
| chan-adapter-viber | 2 in each region | |
| chan-adapter-voice | 4 in each region | gRPC client to Voice OTP gateway. |
| chan-adapter-email | 2 in each region | SMTP egress from dedicated mail IP pool. |
| chan-mo-router | 4 in each region | Inbound MO → tenant webhook fanout. |
| Postgres chan | Patroni 1+2 sync standbys per region | kbl ↔ mzr logical replication for sessions and profiles. |
| Redis chan | 6-node Sentinel per region | Session and idempotency state. |
7. Outbound Fallback Flow
8. Inbound MO Routing Flow
9. Position in the Platform
10. Key Design Decisions
| Decision | Rationale |
|---|---|
| Single canonical outcome event per recipient | Tenants want one webhook per recipient ("delivered via voice OTP"), not one per attempt. Per-attempt detail is available separately via channel.attempt.recorded.v1 for analytics. |
| Fallback ladder is policy-driven, not hard-coded | Tenant policy stored in chan.tenant_policy: ladder, per-step deadlines, per-step retry budget. New channels added without code change. |
| Per-channel verdict from compliance-engine | A message may pass SMS rules but fail WhatsApp business-template rules. Channel-router asks compliance per channel and excludes the channel from the ladder if blocked. |
| OTT adapters are separate Deployments | Egress IP allowlists and provider rate limits differ per OTT. Separating adapters allows independent scaling and provider-specific NetworkPolicies. |
| Voice OTP is treated as a channel, not a separate service | Lets tenants set Voice as a fallback step (e.g. SMS → WhatsApp → Voice). Voice OTP gateway is the protocol terminus. |
| Inbound MO routing uses session table first, static map second | Conversations need stickiness. If a session exists for (senderId, msisdn), the MO returns to the originating tenant regardless of static inbound-number mapping. |
| HMAC v2 signing on tenant webhooks | X-Ghasi-Signature: t={ts},v2={hex(HMAC_SHA256(secret, ts + "." + body))} — replay-protected, secret-rotatable. |
| Adapter is a port, providers are adapters (Hex) | New OTT (e.g. RCS) added by implementing ChannelAdapter port; no router-core change. |
| Fallback decision cached for 60 s | An identical (tenantId, recipient, useCase) triplet resolves to the same ladder for 60 s, which saves consent and compliance round trips during bursts. |
| Per-attempt metering, not per-recipient | OTT and Voice OTP attempts are priced differently. Metering each attempt bills tenants accurately even when the first attempt failed. |
| Fail-degraded per channel | If WhatsApp adapter is down, ladder skips WhatsApp without aborting. Only a fully-exhausted ladder yields DELIVERY_FAILED. |
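The HMAC v2 header format in the table above can be sketched with Python's standard `hmac` module. The signing function follows the stated formula directly. The verification side is an assumption about how a tenant might check the header: the 300-second tolerance window and the idea of accepting several candidate secrets during rotation are illustrative choices, not values stated in this document.

```python
import hashlib
import hmac
from typing import Iterable


def sign_webhook(secret: bytes, body: bytes, ts: int) -> str:
    """Build the X-Ghasi-Signature value: t={ts},v2={hex(HMAC_SHA256(secret, ts + "." + body))}."""
    signed = str(ts).encode("ascii") + b"." + body
    digest = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return f"t={ts},v2={digest}"


def verify_webhook(
    secrets: Iterable[bytes],
    body: bytes,
    header: str,
    now: int,
    tolerance_s: int = 300,  # assumed replay window, not from the spec
) -> bool:
    """Check the header against any currently valid secret."""
    try:
        fields = dict(part.split("=", 1) for part in header.split(","))
        ts = int(fields["t"])
        received = fields["v2"].encode("utf-8")
    except (KeyError, ValueError):
        return False
    # Reject timestamps outside the window to limit replay.
    if abs(now - ts) > tolerance_s:
        return False
    signed = str(ts).encode("ascii") + b"." + body
    for secret in secrets:
        expected = hmac.new(secret, signed, hashlib.sha256).hexdigest().encode("ascii")
        # Constant-time comparison avoids leaking digest prefixes.
        if hmac.compare_digest(expected, received):
            return True
    return False
```

Because the timestamp is part of the signed string, an attacker cannot reuse an old signature with a fresh `t` value, and accepting a short list of secrets lets a tenant overlap old and new keys while rotating.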
11. Latency and Throughput Budget
| Path | Target P50 | Target P95 |
|---|---|---|
| Channel decision (cached) | 5 ms | 15 ms |
| Channel decision (cold: consent + compliance) | 25 ms | 80 ms |
| SMS attempt → DLR receipt (carrier-dependent) | 4 s | 30 s |
| WhatsApp Cloud API attempt → status webhook | 1 s | 8 s |
| Voice OTP attempt (call setup + TTS playback) | 9 s | 25 s |
| Inbound MO → tenant webhook delivery | 200 ms | 1 s |
End-to-end SLO for a 3-step OTP fallback ladder: P95 ≤ 25 s per recipient (worst case: SMS fails, then voice succeeds).
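The difference between the cached and cold decision rows comes from the 60 s fallback-decision cache listed in the key design decisions. A minimal sketch of that behaviour follows. It uses an in-process dictionary purely for illustration, whereas the document places this cache in Redis. The `DecisionCache` class, the `resolve` callback, and the injectable clock are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (tenantId, recipient, useCase)


class DecisionCache:
    """Remember the resolved ladder for a triplet for a fixed TTL."""

    def __init__(
        self,
        resolve: Callable[[Key], List[str]],
        ttl_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve      # cold path: consent + compliance lookups
        self._ttl_s = ttl_s
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[Key, Tuple[float, List[str]]] = {}

    def ladder_for(self, key: Key) -> List[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # cached path
        ladder = self._resolve(key)  # cold path
        self._entries[key] = (now + self._ttl_s, ladder)
        return ladder
```

One consequence of this design is that a consent or compliance change can take up to the TTL to affect an already-cached triplet, which is the trade-off accepted in exchange for fewer round trips during bursts.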
12. Cross-Service Invariants
- One outcome per recipient. No matter how many attempts occur, exactly one notification.delivery.outcome.v1 is published per (notificationId, recipientId).
- No silent channel skip. If a channel is excluded from the ladder (compliance block, opt-out, adapter down), an explanatory entry is included in fallback_path.
- Per-channel metering. Every adapter attempt produces exactly one billing.metering.recorded.v1 event with the right SKU.
- Idempotent inbound webhook delivery. Delivery is at-least-once, so a tenant may receive the same MO more than once; every delivery carries an Idempotency-Key header equal to the MO's canonical messageId.
- Session integrity. A conversational session is closed on explicit STOP keyword, idle TTL (default 24 h), or tenant-initiated close. Closure is final; the next MO from the same MSISDN starts a new session.
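The session-integrity invariant, together with the "session table first, static map second" routing rule from the key design decisions, can be sketched as below. This is an in-memory illustration under several assumptions: the `SessionRouter` class and its method names are hypothetical, the static map is keyed by the same value as the sender ID, a STOP message is still routed to the session's tenant before the session closes, and a new session is opened from the static mapping when none exists. Tenant-initiated close is omitted for brevity.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

IDLE_TTL_S = 24 * 3600  # default idle TTL from the invariant above


@dataclass
class Session:
    session_id: int
    tenant_id: str
    last_seen: float


class SessionRouter:
    def __init__(self, static_map: Dict[str, str]) -> None:
        self._static = static_map  # inbound number / shortcode -> tenantId
        self._sessions: Dict[Tuple[str, str], Session] = {}
        self._ids = itertools.count(1)

    def open_for_mt(self, sender_id: str, msisdn: str, tenant_id: str, now: float) -> Session:
        """Record the originating MT so later replies stick to its tenant."""
        session = Session(next(self._ids), tenant_id, now)
        self._sessions[(sender_id, msisdn)] = session
        return session

    def route_mo(
        self, sender_id: str, msisdn: str, text: str, now: float
    ) -> Optional[Tuple[str, int]]:
        """Return (tenantId, sessionId) for an MO, or None if unroutable."""
        key = (sender_id, msisdn)
        session = self._sessions.get(key)
        if session is not None and now - session.last_seen > IDLE_TTL_S:
            del self._sessions[key]       # idle TTL elapsed: closure is final
            session = None
        if session is None:
            tenant_id = self._static.get(sender_id)  # static map second
            if tenant_id is None:
                return None
            session = self.open_for_mt(sender_id, msisdn, tenant_id, now)
        session.last_seen = now
        if text.strip().upper() == "STOP":
            del self._sessions[key]       # explicit close on STOP
        return session.tenant_id, session.session_id
```

In this sketch a closed session is never reopened: the next MO from the same MSISDN receives a fresh session ID, resolved through the static mapping.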
13. References
- ADR-0004 §3 (new bounded contexts): defines this service's scope
- docs/07-epics-and-user-stories.md §6.7: EP-CHAN-01..04
- services/sms-orchestrator/SERVICE_OVERVIEW.md: primary upstream
- services/compliance-engine/SERVICE_OVERVIEW.md: per-channel verdict
- services/consent-ledger-service/SERVICE_OVERVIEW.md: opt-in/out source of truth
- services/dlr-processor/SERVICE_OVERVIEW.md: SMS DLR feedback
- WhatsApp Cloud API: https://developers.facebook.com/docs/whatsapp/cloud-api
- Telegram Bot API: https://core.telegram.org/bots/api
- Viber Business API: https://developers.viber.com/docs/api/rest-bot-api/