Skip to main content

Channel Router Service — Application Logic

Version: 1.0 Status: Draft Owner: Messaging Core Last Updated: 2026-04-21 Companion: SERVICE_OVERVIEW · DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS


1. Use Case Catalogue

#Use caseTriggerLatency budget (P95)
UC-01RouteWithFallbackgRPC from sms-orchestrator≤ 50 ms (decision only)
UC-02DeliverViaChannelInternal dispatch per stepPer-adapter (see §3)
UC-03HandleFallbackStep deadline elapsed or negative outcome≤ 5 s full cascade (3-step OTP)
UC-04RouteMoNATS mo.allowed.v1≤ 1 s to tenant webhook ACK
UC-05ConversationSession (lookup / refresh / close)Per-MT and per-MO≤ 5 ms
UC-06UpdateRecipientProfileDelivery outcome feedback≤ 100 ms (async)
UC-07EnforceChannelConsentLadder resolution≤ 15 ms (with cache)
UC-08EmitBillingEventPer-attempt terminal status≤ 200 ms (async)
UC-09EnforceTenantFallbackCapEvery step transitionIn-process; O(1)

2. UC-01 — RouteWithFallback (gRPC hot path)

Trigger. sms-orchestrator calls ChannelRouterService.RouteWithFallback over mTLS after compliance has returned ALLOW/FLAG and the orchestrator chose the omnichannel path for the tenant.

InputRouteWithFallbackRequest { notificationId, recipientId, tenantId, useCase, msisdn, body, segments, senderId, requestedChannels?, policyOverride? }

OutputRouteWithFallbackAck { executionId, ladderAccepted: Channel[], excludedReasons: { channel, reason }[], eta }

Steps (synchronous, sub-50 ms):

  1. Input validation. INVALID_ARGUMENT on missing fields; MSISDN parsed to E.164; reject MCC ≠ 412 unless allowForeignRecipient feature flag set for tenant.
  2. Idempotency guard. Compute dispatchKey = {notificationId}:{recipientId}. SETNX chan:inflight:{dispatchKey} <executionId> EX 300. On collision return the existing executionId (at-least-once safety).
  3. Load policy. GET chan:policy:{tenantId}:{useCase} from Redis (TTL 300 s). Miss → read from chan.fallback_policies; cache. If absent, use default (see DOMAIN_MODEL §2.2).
  4. Load recipient profile. GET chan:profile:{msisdnHash} from Redis (TTL 60 s). Miss → chan.recipient_profiles; profile-absent is treated as UNSEEN with synthetic defaults.
  5. Gate: consent per channel. Call ConsentLedgerService.CheckConsent(msisdn, tenantId, channels[]) with 10 ms deadline; cached 60 s under chan:gate:{tenantId}:{recipientHash}:{useCase}. Channels without consent are flagged recipient_opt_out.
  6. Gate: compliance per channel. Call ComplianceEngine.EvaluateChannelCompliance(channels[], body, senderId) with 15 ms deadline; cached 60 s. Channels with verdict BLOCK are flagged compliance_block.
  7. Gate: sender-ID authorisation. SenderIdRegistry.VerifySender(tenantId, senderId) — hot-cached (see sender-id-registry OBSERVABILITY); on failure emit REFUSED_SENDER_UNAUTHORIZED.
  8. Gate: per-adapter circuit-breaker. In-process; OPEN adapter → step_skipped reason=adapter_circuit_open.
  9. Gate: OTT link existence. WhatsApp/Telegram/Viber steps require a known link on the profile; otherwise step_skipped reason=no_link.
  10. Build resolved ladder = policy.ladder filtered by (5–9), preserving order. If resolved length is 0 → emit notification.delivery.outcome.v1 { final: REFUSED_NO_CHANNEL } and ACK.
  11. Pre-flight cost cap. Sum of worst-case per-step cost must be ≤ policy.costCapPerMessage; if not, shrink the ladder tail until it fits. If even the first step breaches → REFUSED_COST_CAP.
  12. Persist execution. INSERT chan.fallback_executions with ladderSnapshot; emit channel.fallback.ladder.resolved.v1 (internal tracing).
  13. Kick off step 0 dispatch via UC-02. Return ack to orchestrator immediately — the ladder progression is async beyond step 0.

Error mapping — table in API_CONTRACTS §1.

Fail-closed on dependency unavailable — if consent-ledger is unreachable past its 10 ms deadline and the cached gating entry is missing or stale > 5 min → return REFUSED_CONSENT_UNKNOWN; do not dispatch. Compliance failures similarly fail-closed (match compliance-engine fail-closed posture).


3. UC-02 — DeliverViaChannel (per-step adapter dispatch)

Trigger. Internal call from UC-01 (step 0) or UC-03 (subsequent steps). One attempt = one DeliveryAttempt row + one provider call.

Per-channel dispatch.

ChannelAdapter callExpected P95Terminal via
SMSNATS publish sms.outbound.dispatch.v1 { messageId, tenantId, to, from, body, classOfService }4 ssms.dlr.inbound
WHATSAPPPOST https://graph.facebook.com/v20.0/{phoneNumberId}/messages with Authorization: Bearer {token} and template payload1 sMeta status webhook (sent / delivered / read / failed)
TELEGRAMPOST https://api.telegram.org/bot{token}/sendMessage { chat_id, text, parse_mode: "HTML" }500 ms200 response contains message_id; no richer terminal (best-effort)
VIBERPOST https://chatapi.viber.com/pa/send_message with X-Viber-Auth-Token1 sViber webhook (delivered / seen / failed)
VOICEgRPC VoiceOtpService.PlayOtp(recipient, otpDigits, language, retries)9 sCall-completion callback (ANSWERED / BUSY / NO_ANSWER / FAILED)
EMAILSMTP submit to egress MTA2 sBounce or MTA ACK

Shared steps:

  1. Acquire per-provider token-bucket slot: INCR chan:tps:{provider}:{accountId} with window 1 s; on deny retry after jitter up to retryBudget.
  2. Register deadline timer in Redis ZSET: ZADD chan:deadlines <now+deadlineSeconds> <attemptId>.
  3. Call provider; on 2xx, persist providerMessageId and transition attempt to sent.
  4. On 4xx permanent (e.g. WhatsApp 131026 recipient unsupported, Telegram 403 bot blocked), emit channel.delivery.failed.v1 with reason=rejected_by_recipient; mark link INVALID on the profile.
  5. On 5xx / transient, apply breaker logic and back-off (exponential, base 500 ms, max 8 s, jitter ±25%).
  6. On provider-webhook arrival (WhatsApp wamid..., Viber message_token), correlate via chan.delivery_attempts.provider_message_id and transition to terminal status.

HMAC signature verification on provider webhooks:

ProviderHeaderVerification
WhatsApp CloudX-Hub-Signature-256: sha256=...HMAC_SHA256(appSecret, rawBody) — drop on mismatch
TelegramSecret path /v1/webhooks/telegram/{secretPath} + optional X-Telegram-Bot-Api-Secret-TokenCompare constant-time
ViberX-Viber-Content-SignatureHMAC_SHA256(authToken, rawBody)

4. UC-03 — HandleFallback

Trigger. Two conditions both emit a progression event:

  • Deadline scanner (every 1 s) reads ZRANGEBYSCORE chan:deadlines 0 now and finds an elapsed attempt whose status is not terminal.
  • Provider-side terminal negative status (failed, rejected_by_provider/recipient).

Steps:

  1. Race resolution. WATCH chan:exec:{executionId}; MULTI; check current stepIndex; a race between positive DLR and deadline elapse is resolved so exactly one progression outcome is emitted.
  2. Progression vs termination decision.
    • If current stepIndex + 1 < len(ladder) → progress (UC-02 on next step), emit channel.fallback.taken.v1.
    • Otherwise → terminate with final = FAILED; emit notification.delivery.outcome.v1.
  3. Cost-cap check. Before calling UC-02 on step N+1, sum current totalCostNgn + expectedCostOfNextStep. If > policy.costCapPerMessage → terminate with REFUSED_COST_CAP.
  4. Emit per-step billing event for the step just terminated (UC-08).
  5. Update recipient profile (UC-06) with the observed per-channel outcome — the LWW feedback loop.

Positive DLR path — arrival of delivered / delivered_read / ANSWERED+played terminates the execution: cancel the deadline timer (ZREM chan:deadlines {attemptId}), emit channel.delivery.confirmed.v1 + notification.delivery.outcome.v1 { final: DELIVERED }.


5. UC-04 — RouteMo (inbound MO routing)

Trigger. JetStream durable consumer chan-mo-router on mo.allowed.v1 (upstream: sms-firewall-service; fallback sms.mo.received.v1 when firewall bypass is active).

Steps:

  1. Parse MO envelope { messageId, originatorMsisdn, destination, body, mno, receivedAt }.
  2. Normalise MSISDN to E.164; compute msisdnHash.
  3. Session lookupHGETALL chan:session:{senderOrDestination}:{msisdnHash}. Session has precedence over static map because a two-way conversation must return to the originating tenant.
  4. Static fallback — if no session, SELECT tenant_id, webhook_url, secret_ref FROM chan.tenant_inbound_routes WHERE inbound = $destination AND active = TRUE AND (grace_period_ends_at IS NULL OR grace_period_ends_at > now()).
  5. Unmatched — no session, no route → publish mo.unmatched.v1 { messageId, destination, originatorMsisdnMasked } for analytics; do not store body.
  6. STOP keyword detection (pre-dispatch). If body matches the tenant's STOP set (default STOP, UNSUBSCRIBE, کنسل, متوقف, ایست), call ConsentLedgerService.RecordOptOut(msisdn, tenantId, channel, reason="stop_keyword"); mark session CLOSED_STOP (even before webhook delivery attempt).
  7. Dispatch to tenant webhook via webhook-dispatcher.Deliver(webhookId, payload) which handles HMAC v2 signing, retries, and dead-lettering (see webhook-dispatcher/SERVICE_OVERVIEW.md).
  8. Re-publish the MO to sms.mo.inbound so consent-ledger-service (STOP consumer) and sms-firewall-service (feedback) see it — this is in addition to the tenant webhook fan-out (which is the tenant's interface, not a platform consumer interface).
  9. Session refresh — if session was OPEN, HSET lastSeenAt now; EXPIRE chan:session:... sessionTtlSeconds.

Latency target — P95 ≤ 1 s from mo.allowed.v1 consume to tenant-webhook 2xx.


6. UC-05 — ConversationSession management

Create. On first successful MT dispatch for (senderId, msisdnHash, tenantId) that was not already session-backed:

  • HSET chan:session:{senderId}:{msisdnHash} tenantId ... conversationId conv_ULID openedAt now lastSeenAt now turnCount 1
  • EXPIRE ... sessionTtlSeconds
  • INSERT chan.conversations via outbox
  • Emit channel.conversation.started.v1

Refresh. On every subsequent MT or MO success: HINCRBY turnCount 1; HSET lastSeenAt now; EXPIRE.

Close.

  • STOP keyword → status = CLOSED_STOP
  • Redis key expiry → keyspace-notification worker writes status = CLOSED_IDLE
  • Explicit admin close via REST → status = CLOSED_MANUAL All closures emit channel.conversation.ended.v1 and are terminal.

Reconciliation. Daily job compares chan:session:* vs chan.conversations WHERE status = OPEN; Redis-lost entries are closed with reason=redis_loss.


7. UC-06 — UpdateRecipientProfile

Trigger. A delivery-feedback consumer on channel.delivery.confirmed.v1 / .failed.v1. Runs async so the hot path is unaffected.

Steps:

  1. For each attempt, compute a channel-preference delta:
    • Positive terminal (DELIVERED, ANSWERED) → +Δ (min 5, max 20 depending on confidence)
    • rejected_by_recipient → −Δ (min 10, max 40) + link marked INVALID
    • Ambiguous (sent only, timeout) → ±0
  2. LWW merge: UPDATE chan.recipient_profiles SET channelPreferences = jsonb_set(..., ..., ...) WHERE msisdn_hash = $1 AND updatedAt < NEW.updatedAt.
  3. Transition discoveryState: UNSEEN → LEARNING after first observation; LEARNING → STABLE after ≥ 5 outcomes across ≥ 2 channels.
  4. Emit channel.recipient.profile.updated.v1 (analytics only; subject retained 90 d).

8. UC-07 — EnforceChannelConsent

Per-channel scopes (per consent-ledger's scope model):

  • SMSTRANSACTIONAL for OTP/txn, MARKETING for marketing, EMERGENCY always-on
  • WHATSAPP — MARKETING category requires explicit opt-in (Meta business-template rule)
  • TELEGRAM — Telegram Terms of Service require the user initiated the conversation; platform enforces link exists + last activity ≤ 30 d
  • VIBER — PA-initiated outbound permitted under Viber business rules; opt-out respected
  • VOICE — OTP-only by default; MARKETING voice gated by distinct scope VOICE_MARKETING
  • EMAIL — CAN-SPAM / GDPR-style double opt-in; verified-email flag required

Cache structurechan:gate:{tenantId}:{sha256(msisdn ‖ tenantSalt)}:{useCase} stores a {channel: allowed|denied|reason} map with TTL 60 s. Invalidated on consent.revoked.v1 for the affected (tenantId, msisdnHash).


9. UC-08 — EmitBillingEvent

Per-attempt metering. For every terminal DeliveryAttempt (positive or negative) that actually reached the provider, emit one channel.billing.event.v1 with SKU by channel (sms.outbound.v1, whatsapp.outbound.v1, telegram.outbound.v1, viber.outbound.v1, voice.otp.v1, email.outbound.v1).

Rules:

  • Adapter-refused attempts (breaker open, no-link) do not meter.
  • Voice OTP meters only when status = ANSWERED && playedDigits = otp.length.
  • WhatsApp template_rejected still meters (the API call was made).
  • Nats-Msg-Id = {notificationId}:{recipientId}:{stepIndex} for inbox-side dedup at the billing consumer.

10. UC-09 — EnforceTenantFallbackCap

Pre-step-transition guard. Evaluated in-process (no DB/Redis call) using the ladderSnapshot + attempts[].costNgn projection already loaded on the execution. Two counters:

  • executionTotalCostNgn (observed) — sum of costNgn across completed attempts.
  • nextStepExpectedCostNgn (model-driven) — based on chan.channel_adapter_configs.pricing_model lookup at policy-load time.

If executionTotalCostNgn + nextStepExpectedCostNgn > policy.costCapPerMessage → terminate with REFUSED_COST_CAP. This protects against runaway fallback cost, e.g. a mis-configured policy chain that sends via Voice OTP on marketing traffic.


11. Sequence Diagram — Typical Fallback (SMS → WhatsApp → Voice)


12. Performance & Concurrency

  • Per-pod inflight cap: MAX_INFLIGHT = 200 (consumer) + MAX_GRPC_INFLIGHT = 1000. Exceeding either returns RESOURCE_EXHAUSTED on gRPC and leaves NATS messages un-acked for JetStream redelivery.
  • Deadline scanner runs per replica with a distributed lock (chan:deadline:lock) claimed for 2 s at a time; only one replica does the scan to avoid duplicate progressions.
  • Profile-update consumer scales independently on a separate subject (channel.delivery.confirmed.v1, .failed.v1) with HPA on consumer lag.
  • OTT-adapter Deployments are separate so WhatsApp adapter pressure does not drain Telegram / Viber pools.