Channel Router Service — Domain Model
Version: 1.0 · Status: Draft · Owner: Messaging Core · Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · DATA_MODEL · EVENT_SCHEMAS · API_CONTRACTS
Related: ADR-0004 §3, docs/07-epics-and-user-stories.md §6.7, EP-CHAN-01..04
1. Bounded Context
Omnichannel Messaging — channel selection, fallback, OTT adapters, MO routing, conversation sessions. The channel-router owns every decision about which channel a notification takes, in what order a fallback ladder progresses, and how an inbound MO returns to the correct tenant and conversation thread. It is the sole authority for ChannelAdapter dispatch and for Conversation lifecycle.
Context boundary:
- Inside the boundary: `RecipientProfile`, `FallbackPolicy`, `FallbackExecution`, `Conversation`, `MoRouting`, `ChannelAdapterConfig`, `DeliveryAttempt`, `OutcomeEnvelope`, `TenantInboundRoute`, `StatusMap`, per-channel circuit-breakers, per-OTT token buckets.
- Outside the boundary: tenant / account identity (`auth-service`), per-channel consent (`consent-ledger-service`), content-policy verdicts (`compliance-engine`), sender-ID authorisation (`sender-id-registry-service`), SMS transport (`smpp-connector`), MNO selection (`routing-engine`), DLR correlation (`dlr-processor`), per-attempt billing aggregation (`billing-service`), tenant notification UI (`notification-service`).
Channel-router is a hot-path data-plane service: both RouteWithFallback (P95 ≤ 50 ms decision) and inbound MO routing (chan-mo-router, P95 ≤ 1 s end-to-end) share the same aggregates and PG/Redis infra but run as separate Deployments for blast-radius isolation.
2. Aggregates
2.1 RecipientProfile
The learned per-recipient capability map. One row per (tenantId, msisdnHash). Written by profile-update consumers (delivery outcome feedback, OTT capability discovery) and read on every RouteWithFallback call.
| Field | Type | Notes |
|---|---|---|
| `profileId` | `profile_…` | External ID |
| `tenantId` | UUIDv4 | Scope |
| `msisdnHash` | `sha256(msisdn ‖ tenantSalt)` | Never raw MSISDN in profile store |
| `channelPreferences` | ChannelPreference[] VO | Per-channel score (0–100), last-observed, confidence |
| `hasWhatsappBusiness` | TriState | KNOWN_YES / KNOWN_NO / UNKNOWN |
| `telegramChatId` | opaque string | Populated only when recipient linked a bot |
| `viberId` | opaque string | Populated only when linked |
| `voiceOtpSupported` | TriState | Inferred from prior ANSWERED / NO_ANSWER voice attempts |
| `emailVerified` | bool | Opt-in, double-verified |
| `lastSuccessfulChannel` | Channel \| null | Used to bias ladder order |
| `lastObservedAt` | Instant | Latest signal across any channel |
| `discoveryState` | UNSEEN \| LEARNING \| STABLE | Gating for ML-assisted ordering |
| `updatedAt` | Instant | LWW tie-break key |
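The `msisdnHash` derivation in the table can be illustrated with a minimal sketch. It assumes that `‖` denotes byte concatenation, that the MSISDN is normalised E.164 text encoded as UTF-8, and that the tenant salt is a byte string; the function name `msisdn_hash` and the hex encoding of the digest are illustrative choices, not part of the data model.

```python
import hashlib


def msisdn_hash(msisdn: str, tenant_salt: bytes) -> str:
    """Return the hex digest of sha256(msisdn ‖ tenantSalt).

    Assumption: the MSISDN is already normalised (E.164) and is
    concatenated with the salt as UTF-8 bytes, MSISDN first.
    """
    return hashlib.sha256(msisdn.encode("utf-8") + tenant_salt).hexdigest()
```

Because the salt is tenant-specific, the same MSISDN hashes to different values for different tenants, which keeps profiles from being joined across tenants by hash alone.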
Invariants
- MSISDN is never stored in plaintext at this aggregate; the raw number lives only transiently in the dispatch request and in the `delivery_attempts` row until `lastDispatchAt + 90d`.
- Per-channel preference score is bounded `[0, 100]`; it decays exponentially with a half-life of 30 d.
- A profile row MUST NOT be created for an unknown recipient: the first `RouteWithFallback` uses a synthetic in-memory default, and the row is written only on the first successful attempt.
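The decay rule above corresponds to `score(t) = score0 × 0.5^(Δt / 30 d)`. The sketch below shows that arithmetic under the assumption that elapsed time is measured in days since the score was last observed; `decayed_score` and `HALF_LIFE_DAYS` are hypothetical names.

```python
HALF_LIFE_DAYS = 30.0  # half-life stated in the invariant


def decayed_score(score: float, elapsed_days: float) -> float:
    """Apply exponential half-life decay to a preference score.

    The input is clamped to the documented [0, 100] range first, so the
    result also stays inside that range.
    """
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    clamped = min(100.0, max(0.0, score))
    return clamped * 0.5 ** (elapsed_days / HALF_LIFE_DAYS)
```

For example, a score of 80 observed 30 days ago decays to 40, and to 20 after 60 days.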
2.2 FallbackPolicy
The tenant-scoped ladder definition. One row per (tenantId, useCase) where useCase ∈ {otp, txn, marketing, alert, conversational}.
| Field | Type | Notes |
|---|---|---|
| `policyId` | UUIDv4 | Identity |
| `tenantId`, `useCase` | composite | Lookup key |
| `ladder` | LadderStep[] (ordered, length 1..6) | Each step: `{ channel, deadlineSeconds, retryBudget, escalateOn[], costCapNgn? }` |
| `strategy` | SEQUENTIAL \| PARALLEL \| FAILOVER | SEQUENTIAL is the default (wait-then-fallback); PARALLEL sends on N channels simultaneously (OTT only, fraud-review-gated); FAILOVER is single-channel primary with hot-backup adapter |
| `costCapPerMessage` | NUMERIC | Hard per-message upper bound across all attempts; breach short-circuits with REFUSED_COST_CAP |
| `sessionTtlSeconds` | int | Overrides the platform default 24 h for this tenant+useCase |
| `stopKeywordsOverride` | string[] | Appends to default set |
| `version` | int | Monotonic; bumped on every mutation |
| `createdBy`, `updatedBy`, `createdAt`, `updatedAt` | audit | - |
Invariants
- Ladder length ≤ 6 steps; enforced at write.
- No step may repeat the same `channel` within the same ladder (prevents double-charge loops).
- `strategy = PARALLEL` is permitted only if all steps are OTT (not SMS, not Voice); enforced on write.
- `costCapPerMessage` must be ≥ sum of cheapest path through the ladder; otherwise the write is rejected with `POLICY_UNREACHABLE_COST_CAP`.
- Default ladder when no row exists (per use-case):
  - otp: `[sms(60s), whatsapp(30s), voice(45s)]`
  - txn: `[sms(60s), whatsapp(30s)]`
  - marketing: `[sms(120s)]`
  - alert: `[sms(30s), voice(30s)]`
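The write-time checks above could be sketched as follows. This is a simplified illustration, not the service's validator: the error strings are hypothetical, the OTT set is assumed to be WhatsApp, Telegram and Viber (email is treated as non-OTT here, which the source does not state), and the cost-cap check is omitted because the source does not define how the cheapest path is priced.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_LADDER_STEPS = 6
# Assumption: which channels count as OTT is not enumerated in the source.
OTT_CHANNELS = frozenset({"WHATSAPP", "TELEGRAM", "VIBER"})


@dataclass(frozen=True)
class LadderStep:
    channel: str
    deadline_seconds: int


# Default ladders from section 2.2, used when no policy row exists.
DEFAULT_LADDERS: Dict[str, List[LadderStep]] = {
    "otp": [LadderStep("SMS", 60), LadderStep("WHATSAPP", 30), LadderStep("VOICE", 45)],
    "txn": [LadderStep("SMS", 60), LadderStep("WHATSAPP", 30)],
    "marketing": [LadderStep("SMS", 120)],
    "alert": [LadderStep("SMS", 30), LadderStep("VOICE", 30)],
}


def validate_policy(ladder: List[LadderStep], strategy: str) -> List[str]:
    """Return a list of violated invariants (empty means the write may proceed)."""
    errors: List[str] = []
    if not 1 <= len(ladder) <= MAX_LADDER_STEPS:
        errors.append("LADDER_LENGTH_OUT_OF_RANGE")  # hypothetical code
    channels = [step.channel for step in ladder]
    if len(set(channels)) != len(channels):
        errors.append("DUPLICATE_CHANNEL")  # hypothetical code
    if strategy == "PARALLEL" and any(c not in OTT_CHANNELS for c in channels):
        errors.append("PARALLEL_REQUIRES_OTT_ONLY")  # hypothetical code
    return errors
```

Returning all violations at once, rather than failing on the first, lets a policy editor show every problem in a single round trip.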
2.3 FallbackExecution
A per-message append-only trace of the ladder evaluation. Written once per RouteWithFallback.
| Field | Notes |
|---|---|
| `executionId` | UUIDv4 (`exec_…`) |
| `notificationId`, `recipientId`, `tenantId`, `useCase` | Scope |
| `policyId` + `policyVersion` | Snapshot of which policy governed this execution |
| `ladderSnapshot` | JSONB of the resolved ladder at decision time (consent-filtered, circuit-breaker-filtered) |
| `attempts` | DeliveryAttempt[]; appended as the ladder progresses |
| `outcome` | DELIVERED \| FAILED \| REFUSED_NO_CHANNEL \| REFUSED_COST_CAP \| REFUSED_CONSENT |
| `finalChannel` | Channel \| null |
| `totalCostNgn` | NUMERIC |
| `fallbackPath` | FallbackPathEntry[]; human-readable explainer |
| `startedAt`, `terminatedAt` | Instants |
Invariants — append-only at DB level (Postgres rule); terminal outcome is emitted exactly once (outbox table UNIQUE(notificationId, recipientId)).
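The exactly-once guarantee rests on the unique key of the outbox table. The sketch below demonstrates the idea with an in-memory SQLite database standing in for Postgres; the table layout, column names and the `emit_outcome_once` helper are assumptions made for illustration only.

```python
import sqlite3


def open_outbox() -> sqlite3.Connection:
    """Create an in-memory stand-in for the delivery outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE delivery_outbox (
            notification_id TEXT NOT NULL,
            recipient_id    TEXT NOT NULL,
            outcome         TEXT NOT NULL,
            UNIQUE (notification_id, recipient_id)
        )
        """
    )
    return conn


def emit_outcome_once(
    conn: sqlite3.Connection, notification_id: str, recipient_id: str, outcome: str
) -> bool:
    """Insert the terminal outcome; return False if one was already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO delivery_outbox VALUES (?, ?, ?)",
                (notification_id, recipient_id, outcome),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second outcome for the same pair.
        return False
```

A second writer racing on the same `(notificationId, recipientId)` pair loses at the constraint, so only the first outcome can ever be relayed from the outbox.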
2.4 Conversation
The sticky correlation across one (senderId, msisdnHash, tenantId) triple for two-way messaging.
| Field | Notes |
|---|---|
| `conversationId` | `conv_…` (ULID-based; time-sortable) |
| `tenantId`, `senderId`, `msisdnHash` | Identity triple |
| `status` | OPEN \| CLOSED_STOP \| CLOSED_IDLE \| CLOSED_MANUAL |
| `openedAt`, `lastSeenAt`, `expiresAt`, `closedAt` | Instants |
| `turnCount` | int; monotonic, incremented on every MT or MO |
| `lastMtMessageId`, `lastMoMessageId` | Back-pointers for debug |
| `channel` | Which channel initiated the session (SMS for the default flow; OTT otherwise) |
Invariants
- `turnCount` is strictly monotonic; a race that would violate monotonicity is resolved via Redis `WATCH`/`MULTI`.
- Closure is terminal: a STOP keyword during `CLOSED_*` state creates a fresh conversation only after consent is re-granted.
- TTL slides on every MO/MT success; the Redis key TTL and Postgres `expiresAt` must be within 30 s of each other (reconciliation job).
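The lifecycle rules above can be sketched as a small in-memory model. It is single-threaded and does not reproduce the Redis `WATCH`/`MULTI` race handling; the class shape, method names and the `ttl_in_sync` helper are hypothetical, while the 24 h default TTL and the 30 s tolerance come from this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_SESSION_TTL = timedelta(hours=24)  # platform default per section 2.2
MAX_TTL_DRIFT = timedelta(seconds=30)      # Redis vs Postgres tolerance
CLOSED_STATES = {"CLOSED_STOP", "CLOSED_IDLE", "CLOSED_MANUAL"}


@dataclass
class Conversation:
    status: str
    turn_count: int
    expires_at: datetime

    def record_turn(self, now: datetime, ttl: timedelta = DEFAULT_SESSION_TTL) -> None:
        """Count one MT or MO and slide the expiry forward."""
        if self.status != "OPEN":
            raise ValueError("conversation is closed; closure is terminal")
        self.turn_count += 1          # only ever increases
        self.expires_at = now + ttl   # sliding TTL

    def close(self, status: str) -> None:
        """Move an open conversation to one of the terminal states."""
        if self.status != "OPEN":
            raise ValueError("conversation is already closed")
        if status not in CLOSED_STATES:
            raise ValueError(f"not a terminal status: {status}")
        self.status = status


def ttl_in_sync(redis_expiry: datetime, pg_expires_at: datetime) -> bool:
    """Check the reconciliation tolerance between the two stores."""
    return abs(redis_expiry - pg_expires_at) <= MAX_TTL_DRIFT
```

In this model a closed conversation can never be reopened; a new session after re-granted consent would be a new `Conversation` instance with its own `conversationId`.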
2.5 MoRouting
The inbound-number → tenant static mapping. Used only on session miss.
| Field | Notes |
|---|---|
| `routeId` | UUIDv4 |
| `tenantId` | Owner |
| `inbound` | Shortcode or long-code E.164; platform-unique |
| `webhookUrl` | HTTPS URL |
| `secretRef` | Vault reference to HMAC secret; never exposed |
| `gracePeriodEndsAt` | Instant \| null; 24 h soft-delete window |
| `active` | bool |
Invariants — number uniqueness enforced by unique index; cross-check with numbering-service.GetLease(tenantId, inbound) on write.
2.6 ChannelAdapterConfig
Per-provider / per-tenant credentials + adapter-level behaviour.
| Field | Notes |
|---|---|
| `adapterConfigId` | `adcfg_…` |
| `tenantId` | Owner (platform-wide adapters are `tenantId = null`) |
| `provider` | WHATSAPP_CLOUD \| TELEGRAM_BOT \| VIBER \| VOICE_OTP_GATEWAY \| SMTP \| SMPP_CONNECTOR |
| `phoneNumberIdOrBotHandle` | Provider-specific routing key |
| `secretRef` | Vault path (`secrets/data/chan/ott/{tenantId}/{provider}`) |
| `circuitState` | CLOSED \| OPEN \| HALF_OPEN (adapter circuit-breaker) |
| `rateLimitPerSecond`, `rateLimitPerDay` | Token-bucket sizing (per-provider defaults; see §5) |
| `regionalEgressIpPool` | For allow-listed providers (WhatsApp Cloud) |
2.7 DeliveryAttempt
Single-step dispatch record (child of FallbackExecution).
| Field | Notes |
|---|---|
| `attemptId` | `attempt_…` |
| `executionId`, `stepIndex` | Parent FK |
| `channel`, `adapterConfigId` | Used |
| `providerMessageId` | e.g. `wamid.HBgL...` for WhatsApp, SMSC messageId for SMS |
| `status` | accepted \| sent \| delivered \| delivered_read \| failed_temp \| failed_perm \| rejected_by_provider \| rejected_by_recipient \| step_skipped |
| `reason` | Canonical reason string (mapped by `chan.adapter_status_map`) |
| `costNgn` | NUMERIC |
| `durationMs` | From dispatch to terminal status |
| `startedAt`, `terminatedAt` | Instants |
| `rawProviderPayload` | JSONB; retained 30 d for debug, redacted for PII |
3. Value Objects
| VO | Shape | Invariants |
|---|---|---|
| `Channel` | enum SMS \| WHATSAPP \| TELEGRAM \| VIBER \| VOICE \| EMAIL | Additive only; unknown → UNKNOWN |
| `ChannelCapability` | `{ channel, supported: TriState, confidence: [0..1], lastObservedAt }` | confidence = 1 for direct API probes, ≤ 0.8 for inferred |
| `FallbackStrategy` | SEQUENTIAL \| PARALLEL \| FAILOVER | See §2.2 constraint on PARALLEL |
| `DeliveryConfidence` | DEFINITIVE \| PROBABLE \| AMBIGUOUS | DEFINITIVE = provider ACK; PROBABLE = sent only; AMBIGUOUS = timeout |
| `ConversationId` | ULID-based `conv_…` | Time-sortable |
| `TurnNumber` | int ≥ 0 | Monotonic per conversation |
| `FallbackPathEntry` | `{ channel, status, reason, durationMs, costNgn }` | Redacted; no raw body |
| `MessageContextRef` | `{ notificationId, recipientId, tenantId, useCase }` | No PII |
| `OutcomeEnvelope` | `{ final, channel, attempts, fallback_path[], occurredAt, executionId }` | Emitted exactly once per `(notificationId, recipientId)` |
| `ReasonCode` | canonical string | See docs/standards/ERROR_CODES.md (`CHAN_*`) |
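The "additive only; unknown → UNKNOWN" rule for `Channel` could be modelled as a tolerant enum, as in this sketch. It assumes channels are serialised as upper-case strings and that `UNKNOWN` is represented as an extra enum member; neither detail is specified in the table.

```python
from enum import Enum


class Channel(Enum):
    SMS = "SMS"
    WHATSAPP = "WHATSAPP"
    TELEGRAM = "TELEGRAM"
    VIBER = "VIBER"
    VOICE = "VOICE"
    EMAIL = "EMAIL"
    UNKNOWN = "UNKNOWN"  # assumption: sentinel member for unrecognised values

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when a lookup by value fails; mapping to UNKNOWN
        # lets older consumers survive a newly added channel.
        return cls.UNKNOWN
```

With this shape, `Channel("SMS")` resolves normally, while a value introduced later by a newer producer resolves to `Channel.UNKNOWN` instead of raising.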
4. Domain Events (produced)
Full schemas in EVENT_SCHEMAS.md.
| Event | Trigger |
|---|---|
| `channel.delivery.attempted.v1` | An adapter `dispatch()` call starts |
| `channel.delivery.confirmed.v1` | Terminal positive provider status (DELIVERED or DELIVERED_READ) |
| `channel.delivery.failed.v1` | Terminal negative status |
| `channel.fallback.taken.v1` | Ladder progression (step N → step N+1) |
| `channel.mo.inbound.v1` | Inbound MO routed; also re-published to `sms.mo.inbound` for consent-ledger + sms-firewall consumers |
| `channel.conversation.started.v1` | First MT on a new (senderId, msisdn, tenantId) key |
| `channel.conversation.ended.v1` | Any terminal state on a Conversation |
| `channel.recipient.profile.updated.v1` | Profile LWW merge from delivery feedback or capability probe |
| `channel.billing.event.v1` | Per-attempt metering feed to billing-service |
| `notification.delivery.outcome.v1` | Single canonical outcome per (notificationId, recipientId); consumed platform-wide |
| `channel.tenant_policy.changed.v1` | FallbackPolicy CRUD |
| `channel.inbound_route.changed.v1` | MoRouting CRUD |
Events consumed
| Subject | Producer | Purpose |
|---|---|---|
| `notification.dispatch.requested.v1` | sms-orchestrator | Trigger RouteWithFallback on durable consumer |
| `mo.allowed.v1` | sms-firewall-service | Pre-filtered MO for tenant-webhook routing |
| `sms.mo.received.v1` | smpp-connector (MO) | Ingress from MNO SMSC (when firewall bypass in effect) |
| `sms.dlr.inbound` | dlr-processor | Carrier DLR → terminates SMS step of ladder |
| `consent.revoked.v1` | consent-ledger-service | Per-channel consent invalidation; update recipient profile |
| `sender.id.suspended.v1` | sender-id-registry-service | Stop dispatches from that sender |
| WhatsApp Cloud API webhook | Meta Graph API | Provider status (sent/delivered/read/failed) |
| Telegram Bot API webhook | Telegram | Inbound updates + message confirmations |
| Viber webhook | Viber | Message + event updates |
5. Per-Provider Rate-Limit Baselines
Sourced from published vendor docs and reflected in ChannelAdapterConfig defaults. These are starting thresholds; production values are tuned per account tier.
| Provider | Per-second | Per-day | Notes |
|---|---|---|---|
| WhatsApp Cloud API | 80 msg/s per phone-number-id (Tier-1 default) | 1K–100K unique recipients depending on quality rating | Meta raises tier on low block rate; see developers.facebook.com/docs/whatsapp/cloud-api/overview#rate-limits |
| Telegram Bot API | 30 msg/s per bot, 1 msg/s per chat | ~soft-capped by Telegram | core.telegram.org/bots/faq#broadcasting-to-users |
| Viber Business | 40 msg/s per PA (default) | Negotiated per PA | developers.viber.com/docs/api/rest-bot-api/#rate-limiting |
| Voice OTP | 20 call setups/s per outbound trunk | Trunk CPS contract | 3GPP TS 22.171 voice-assisted auth pattern |
| SMTP egress | 50 msg/s per IP reputation tier | Per-ISP dependent | RFC 5321 |
| SMS (via smpp-connector) | — | — | Governed by routing-engine / bind TPS |
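The per-second baselines above feed the per-OTT token buckets. A minimal token-bucket sketch follows, assuming the burst capacity equals one second's worth of tokens and that the caller supplies a monotonic clock value in seconds; the class and method names are hypothetical and the service's actual bucket (for example a Redis-backed one) may differ.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_second: float
    capacity: float
    tokens: float
    last_refill: float

    @classmethod
    def full(cls, rate_per_second: float, now: float) -> "TokenBucket":
        # Assumption: burst capacity is one second of traffic.
        return cls(rate_per_second, rate_per_second, rate_per_second, now)

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then take `cost` tokens if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For the Telegram baseline of 30 msg/s per bot, `TokenBucket.full(30, now)` admits 30 sends immediately and rejects the next one until time has passed; the separate 1 msg/s per-chat limit would need its own bucket keyed by chat.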
6. Global Invariants
- One canonical outcome per recipient. Exactly one `notification.delivery.outcome.v1` per `(notificationId, recipientId)`; enforced via `chan.delivery_outbox UNIQUE(notificationId, recipientId)`.
- Consent-aware ladders. The ladder is consent-filtered before the first dispatch (`consent-ledger-service.CheckConsent(channels[])`); a channel missing consent is excluded with reason `recipient_opt_out` and audit-trailed in `fallback_path`.
- Cost-capped fallbacks. `FallbackPolicy.costCapPerMessage` is evaluated at each step transition; breach short-circuits with `REFUSED_COST_CAP` and a single outcome event.
- Idempotent dispatch. Per-adapter `dispatch` must honour `Idempotency-Key = {notificationId}:{recipientId}:{stepIndex}`; duplicate provider calls are prevented by the adapter.
- Session authority. For inbound MO, session lookup precedes static routing; session key scope is per `senderId` so one MSISDN may simultaneously hold sessions with multiple tenants.
- No silent channel drop. Any channel excluded from the live ladder (compliance, consent, circuit-breaker, unavailable adapter, no-link) contributes an entry to `fallback_path[]` with a canonical reason.
- Profile writes are LWW. Two concurrent profile updates for the same `(tenantId, msisdnHash)` resolve by `updatedAt` monotonic timestamp; causally-related updates are batched via the profile-update worker.
- Append-only executions + audit. `fallback_executions` and `channel.audit` reject UPDATE/DELETE at the Postgres schema level; retention is by partition drop.
- PII hygiene. MSISDN never appears in events, logs, or profile storage; only `sha256(msisdn ‖ tenantSalt)`. Message bodies are emitted only to providers and to the tenant webhook, never to analytics.
- Fail-closed on consent lookup. If `consent-ledger-service` is unreachable on the hot path beyond the retry budget, the `RouteWithFallback` call returns `REFUSED_CONSENT_UNKNOWN` and no dispatch occurs.
- Fail-degraded on adapter failure. An OTT adapter breaker in `OPEN` state causes ladder-step skip, not call refusal.
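Several of these invariants meet in the step that resolves the live ladder. The sketch below combines consent filtering, breaker skips, fail-closed consent lookup and the idempotency key format. It is a simplified illustration: the function names, the `ConsentUnavailable` exception, the `circuit_open` reason string and the choice of `REFUSED_NO_CHANNEL` when every step is excluded are assumptions, and `check_consent` is a caller-supplied stand-in for `consent-ledger-service.CheckConsent` that is assumed to raise only after its retry budget is spent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ConsentUnavailable(Exception):
    """Hypothetical error: consent lookup failed beyond the retry budget."""


@dataclass(frozen=True)
class PathEntry:
    channel: str
    status: str
    reason: str


def idempotency_key(notification_id: str, recipient_id: str, step_index: int) -> str:
    """Build the key in the documented {notificationId}:{recipientId}:{stepIndex} form."""
    return f"{notification_id}:{recipient_id}:{step_index}"


def resolve_ladder(
    ladder: List[str],
    check_consent: Callable[[List[str]], Dict[str, bool]],
    breaker_state: Dict[str, str],
) -> Tuple[Optional[str], List[str], List[PathEntry]]:
    """Return (refusal, live_channels, fallback_path) for one recipient."""
    try:
        consent = check_consent(ladder)
    except ConsentUnavailable:
        # Fail closed: no dispatch when consent cannot be determined.
        return "REFUSED_CONSENT_UNKNOWN", [], []
    live: List[str] = []
    path: List[PathEntry] = []
    for channel in ladder:
        if not consent.get(channel, False):
            path.append(PathEntry(channel, "step_skipped", "recipient_opt_out"))
        elif breaker_state.get(channel, "CLOSED") == "OPEN":
            # Fail degraded: skip the step instead of refusing the call.
            path.append(PathEntry(channel, "step_skipped", "circuit_open"))
        else:
            live.append(channel)
    if not live:
        return "REFUSED_NO_CHANNEL", [], path  # assumption: single refusal code
    return None, live, path
```

Every excluded channel leaves a `PathEntry`, so the resulting `fallback_path` explains why a step was not attempted even when the message is ultimately delivered on a later step.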
7. Cross-Service References
- `sms-orchestrator/SERVICE_OVERVIEW.md`: upstream producer of `notification.dispatch.requested.v1`
- `consent-ledger-service/DOMAIN_MODEL.md`: per-channel scope semantics (MARKETING / TRANSACTIONAL / OTP / EMERGENCY)
- `sender-id-registry-service/API_CONTRACTS.md`: `VerifySender(tenantId, senderId)` consumed at dispatch time
- `sms-firewall-service/EVENT_SCHEMAS.md`: `mo.allowed.v1` producer contract
- `webhook-dispatcher/SERVICE_OVERVIEW.md`: downstream HMAC v2 signing implementation
- `billing-service/EVENT_SCHEMAS.md`: consumer of `channel.billing.event.v1`
- `compliance-engine/API_CONTRACTS.md`: `EvaluateChannelCompliance` per-channel verdict
- `fraud-intel-service/EVENT_SCHEMAS.md`: `fraud.detected.*` signals gating PARALLEL strategy
- ADR-0004 §3 (new bounded contexts; channel-router's position in the national backbone)