Channel Router Service — Jira-Ready Epics & User Stories

Status: populated | Owner: Messaging Core | Last updated: 2026-04-20 | Service prefix: CHAN

Scope: Multi-channel fallback engine (SMS → WhatsApp → Voice → Email), OTT provider adapters (WhatsApp Cloud API, Telegram Bot, Viber Business), inbound MO routing to tenant webhooks, conversational session management.

Derived from docs/07-epics-and-user-stories.md §6.7 and ADR-0004 §3.


Epic Summary

| Epic ID | Title | Stories | Points |
|---|---|---|---|
| EP-CHAN-01 | Multi-Channel Fallback Engine (SMS → WhatsApp BSP → Voice OTP → Email) per Recipient Profile | US-CHAN-001 – US-CHAN-008 | 42 |
| EP-CHAN-02 | OTT Provider Adapters (WhatsApp Cloud API, Telegram Bot, Viber) | US-CHAN-009 – US-CHAN-014 | 29 |
| EP-CHAN-03 | Inbound MO Routing to Tenant Webhook (2-way SMS) | US-CHAN-015 – US-CHAN-018 | 18 |
| EP-CHAN-04 | Conversational Session Manager | US-CHAN-019 – US-CHAN-023 | 19 |
| Total | | 23 stories | 108 |

EP-CHAN-01 · Multi-Channel Fallback Engine

Context: Per-recipient ladder traversal across SMS, WhatsApp, Voice OTP, and Email channels driven by tenant policy and recipient profile. Single canonical outcome event per recipient.

US-CHAN-001 · Consume notification dispatch request

Type: Feature | Points: 5

Description: As the channel router NATS consumer, I need to subscribe to notification.dispatch.requested.v1 so every omnichannel notification from sms-orchestrator enters the ladder.

Acceptance Criteria:

  • Durable JetStream consumer chan-router, AckExplicit, AckWait 60 s, MaxDeliver 5
  • Inbox dedup on Nats-Msg-Id = {notificationId}:{recipientId}
  • MAX_INFLIGHT configurable; backpressure via un-acked messages
  • After MaxDeliver, message lands on notification.dispatch.deadletter.v1
  • Metrics: chan_consumer_lag, chan_inflight_dispatches, chan_deadletter_total
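
The inbox dedup rule above can be sketched without a broker. This is a minimal illustration, assuming an in-memory set stands in for the durable inbox; the `Inbox` class and `dedup_id` helper are hypothetical names, not part of the service.

```python
from dataclasses import dataclass, field


def dedup_id(notification_id: str, recipient_id: str) -> str:
    """Build the Nats-Msg-Id value: {notificationId}:{recipientId}."""
    return f"{notification_id}:{recipient_id}"


@dataclass
class Inbox:
    """Hypothetical in-memory inbox; a real one would be a durable store."""
    seen: set = field(default_factory=set)

    def accept(self, msg_id: str) -> bool:
        """Return True the first time msg_id is seen, False on redelivery."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        return True


inbox = Inbox()
first = inbox.accept(dedup_id("n-1", "r-9"))
again = inbox.accept(dedup_id("n-1", "r-9"))  # redelivery of the same message
```

A redelivered message (for example after AckWait elapses) yields the same id and is dropped before it re-enters the ladder.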

US-CHAN-002 · Compose channel ladder per tenant policy

Type: Feature | Points: 8

Description: As a tenant configuring an omnichannel campaign, I need to define a fallback ladder per use case so the platform attempts channels in my preferred order.

Acceptance Criteria:

  • chan.tenant_policy row keyed (tenantId, useCase) with ladder JSONB
  • Default ladders applied for otp and marketing when row absent
  • Admin REST PUT /v1/admin/tenants/{tenantId}/policy/{useCase}
  • Ladder cached in Redis; refreshed on chan.tenant_policy.changed.v1
  • Max 6 steps; invalid → 400 INVALID_LADDER
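
A ladder validator for the admin PUT might look like the sketch below. The ladder JSON shape (`channel`, `deadlineSeconds`), the allowed channel names, and the rule that an empty ladder is invalid are all assumptions for illustration; only the 6-step cap and the INVALID_LADDER code come from the criteria above.

```python
ALLOWED_CHANNELS = {"sms", "whatsapp", "voice", "email"}  # assumed channel names
MAX_STEPS = 6


class InvalidLadder(ValueError):
    """Would map to HTTP 400 INVALID_LADDER in the admin API."""
    code = "INVALID_LADDER"


def validate_ladder(ladder: list) -> list:
    """Return the ladder unchanged if valid, else raise InvalidLadder."""
    if not ladder or len(ladder) > MAX_STEPS:
        raise InvalidLadder(f"ladder must have 1..{MAX_STEPS} steps")
    for i, step in enumerate(ladder):
        if step.get("channel") not in ALLOWED_CHANNELS:
            raise InvalidLadder(f"step {i}: unknown channel")
        deadline = step.get("deadlineSeconds")  # hypothetical field name
        if isinstance(deadline, bool) or not isinstance(deadline, int) or deadline <= 0:
            raise InvalidLadder(f"step {i}: deadlineSeconds must be a positive integer")
    return ladder
```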

US-CHAN-003 · Per-channel compliance and consent gating

Type: Feature | Points: 5

Description: As the decision core, I need per-channel verdicts from compliance-engine and per-channel consent from consent-ledger-service before composing the final ladder.

Acceptance Criteria:

  • Per-channel verdicts ALLOW/FLAG/BLOCK map to ladder inclusion
  • Per-channel consent maps to ladder inclusion; opted-out channels are excluded with reason recipient_opt_out
  • Combined exclusion list captured in fallback_path[]
  • No channel survives → final = "REFUSED_NO_CHANNEL" with reasons
  • Combined gating call P95 ≤ 30 ms
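
The gating rules can be illustrated as a pure function over the ladder. This sketch assumes that FLAG keeps a channel in the ladder, that a missing verdict or consent entry fails closed, and that the reason string `compliance_block` exists; none of these are stated in the criteria.

```python
def gate_ladder(ladder, verdicts, consent):
    """Split ladder channels into survivors and exclusions.

    verdicts: channel -> "ALLOW" | "FLAG" | "BLOCK"
    consent:  channel -> bool (True means the recipient has consented)
    """
    surviving, excluded = [], []
    for channel in ladder:
        if verdicts.get(channel, "BLOCK") == "BLOCK":  # fail closed (assumption)
            excluded.append({"channel": channel, "reason": "compliance_block"})
        elif not consent.get(channel, False):
            excluded.append({"channel": channel, "reason": "recipient_opt_out"})
        else:
            surviving.append(channel)  # ALLOW and FLAG both survive (assumption)
    return surviving, excluded


def decide(ladder, verdicts, consent):
    """Return the gated ladder, or a refusal when no channel survives."""
    surviving, excluded = gate_ladder(ladder, verdicts, consent)
    if not surviving:
        return {
            "final": "REFUSED_NO_CHANNEL",
            "reasons": [e["reason"] for e in excluded],
            "fallback_path": excluded,
        }
    return {"ladder": surviving, "fallback_path": excluded}
```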

US-CHAN-004 · Step deadline enforcement and ladder progression

Type: Feature | Points: 8

Description: As the fallback engine, I need to enforce per-step deadlines and progress on negative outcome or timeout.

Acceptance Criteria:

  • Deadline timers in Redis ZSET chan:deadlines
  • Worker scans every 1 s and progresses elapsed ladders
  • Positive DLR/status before deadline terminates with DELIVERED
  • Race between positive DLR and deadline guarded by Redis WATCH/MULTI
  • Exactly one terminal outcome per recipient
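
The deadline scan and the single-terminal-outcome guard can be modelled in memory. In this sketch a heap stands in for the Redis ZSET chan:deadlines and a plain check-and-set stands in for the WATCH/MULTI transaction; the `EXHAUSTED` terminal name and all class names are hypothetical.

```python
import heapq


class LadderState:
    def __init__(self, steps):
        self.steps = steps
        self.index = 0
        self.terminal = None  # set at most once


class DeadlineScheduler:
    """In-memory stand-in for chan:deadlines (score = deadline in seconds)."""

    def __init__(self, step_timeout):
        self.step_timeout = step_timeout
        self._heap = []  # entries: (deadline, key, step_index)
        self.ladders = {}

    def start(self, key, steps, now):
        self.ladders[key] = LadderState(steps)
        heapq.heappush(self._heap, (now + self.step_timeout, key, 0))

    def on_positive_dlr(self, key):
        """Terminate with DELIVERED unless a terminal outcome already exists."""
        state = self.ladders[key]
        if state.terminal is None:
            state.terminal = "DELIVERED"
            return True
        return False

    def scan(self, now):
        """Progress every ladder whose current step deadline has elapsed."""
        while self._heap and self._heap[0][0] <= now:
            _, key, step_index = heapq.heappop(self._heap)
            state = self.ladders[key]
            if state.terminal is not None or state.index != step_index:
                continue  # stale timer: ladder finished or already progressed
            state.index += 1
            if state.index >= len(state.steps):
                state.terminal = "EXHAUSTED"  # hypothetical terminal name
            else:
                heapq.heappush(self._heap, (now + self.step_timeout, key, state.index))
```

A worker would call `scan` roughly once per second; a timer that fires after a positive DLR is discarded as stale, so the recipient keeps exactly one terminal outcome.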

US-CHAN-005 · Per-channel adapter dispatch

Type: Feature | Points: 5

Description: As the decision core, I need to dispatch each step via the appropriate adapter (smpp-connector for SMS, OTT adapters, Voice gateway, SMTP).

Acceptance Criteria:

  • ChannelAdapter port with concrete impls per channel
  • SMS publishes sms.outbound.dispatch.v1 with idempotent messageId
  • WhatsApp calls POST /v20.0/{phoneNumberId}/messages with template payload
  • Voice gRPC Voice.PlayOtp(recipient, otp_digits, language, retries)
  • Per-adapter timeout and circuit breaker; failures emit channel.attempt.recorded.v1
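
One way to express the ChannelAdapter port is a structural interface with one implementation per channel. The sketch below shows only an SMS adapter; the result fields other than `status`, the payload keys, and the injected `publish` callable are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class AdapterDispatchResult:
    status: str
    provider_message_id: Optional[str] = None  # hypothetical field


class ChannelAdapter(Protocol):
    channel: str

    def dispatch(self, recipient: str, payload: dict) -> AdapterDispatchResult:
        ...


class SmsAdapter:
    """Publishes to sms.outbound.dispatch.v1 through an injected publisher."""

    channel = "sms"

    def __init__(self, publish: Callable[[str, dict], None]) -> None:
        self._publish = publish

    def dispatch(self, recipient: str, payload: dict) -> AdapterDispatchResult:
        message_id = payload["messageId"]  # idempotent id chosen by the caller
        self._publish(
            "sms.outbound.dispatch.v1",
            {"messageId": message_id, "to": recipient, "body": payload["body"]},
        )
        return AdapterDispatchResult(status="in_progress", provider_message_id=message_id)


def dispatch_step(adapters: dict, channel: str, recipient: str, payload: dict):
    """Route one ladder step to the adapter registered for its channel."""
    return adapters[channel].dispatch(recipient, payload)
```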

US-CHAN-006 · Single canonical outcome per recipient

Type: Feature | Points: 5

Description: As a tenant integrator, I need exactly one notification.delivery.outcome.v1 per (notificationId, recipientId) regardless of attempts.

Acceptance Criteria:

  • Payload includes final, channel, attempts, fallback_path, occurredAt
  • Uniqueness enforced via chan.delivery_outbox UNIQUE(notificationId, recipientId)
  • Outbox is source of truth; relay never re-publishes
  • Tenant receives outcome via webhook-dispatcher
  • Per-attempt detail surfaces via separate channel.attempt.recorded.v1
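
The uniqueness guarantee rests on a database constraint rather than application logic. The sketch below demonstrates the idea with SQLite from the standard library; the real table lives in Postgres, and the column names and `record_outcome` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE delivery_outbox (
        notification_id TEXT NOT NULL,
        recipient_id    TEXT NOT NULL,
        final           TEXT NOT NULL,
        UNIQUE (notification_id, recipient_id)
    )
    """
)


def record_outcome(conn, notification_id, recipient_id, final) -> bool:
    """Insert the terminal outcome; return False if one already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO delivery_outbox VALUES (?, ?, ?)",
                (notification_id, recipient_id, final),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second writer racing to record an outcome for the same pair gets `False` and must not publish.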

US-CHAN-007 · Per-attempt billing metering

Type: Feature | Points: 3

Description: As billing-service, I need one billing.metering.recorded.v1 per channel attempt with the appropriate SKU.

Acceptance Criteria:

  • SKUs per channel (sms.outbound.v1, whatsapp.outbound.v1, voice.otp.v1, etc.)
  • Nats-Msg-Id = {notificationId}:{recipientId}:{stepIndex} for inbox dedup
  • Failed adapter dispatch (no provider call) → no metering event
  • Voice OTP only meters on ANSWERED + completed playback
  • WhatsApp template-rejected attempts ARE metered
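
The metering rules reduce to a small decision function. This sketch assumes the caller knows whether a provider call was made and, for voice, whether the call was answered with completed playback; the function and parameter names are hypothetical, and only the three SKUs listed above are included.

```python
SKU_BY_CHANNEL = {
    "sms": "sms.outbound.v1",
    "whatsapp": "whatsapp.outbound.v1",
    "voice": "voice.otp.v1",
}


def metering_event(notification_id, recipient_id, step_index, channel,
                   provider_called, voice_answered_and_played=False):
    """Return the metering event for one attempt, or None if not billable."""
    if not provider_called:
        return None  # failed adapter dispatch, no provider call
    if channel == "voice" and not voice_answered_and_played:
        return None  # voice OTP meters only on ANSWERED + completed playback
    return {
        "msgId": f"{notification_id}:{recipient_id}:{step_index}",  # Nats-Msg-Id
        "sku": SKU_BY_CHANNEL[channel],
    }
```

A WhatsApp attempt rejected at the template stage still reached the provider, so it passes `provider_called=True` and is metered.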

US-CHAN-008 · Fallback decision cache

Type: Feature | Points: 3

Description: As the decision core under burst load, I need to cache (tenantId, recipientId, useCase) gating decisions for 60 s.

Acceptance Criteria:

  • Redis key chan:gate:{tenantId}:{recipientHash}:{useCase}, TTL 60 s
  • Cache hit short-circuits gating round trips
  • Invalidated on consent.changed.v1 and compliance.policy.changed.v1 events
  • Hit/miss ratio metric chan_gate_cache_hits_total
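
The cache key and TTL behaviour can be sketched as follows. The hash used for `recipientHash` (truncated SHA-256) is an assumption, and the dictionary-backed `GateCache` is a hypothetical stand-in for Redis with an explicit clock so expiry is easy to reason about.

```python
import hashlib


def recipient_hash(recipient_id: str) -> str:
    """Assumed hashing scheme: first 16 hex chars of SHA-256."""
    return hashlib.sha256(recipient_id.encode("utf-8")).hexdigest()[:16]


def gate_cache_key(tenant_id: str, recipient_id: str, use_case: str) -> str:
    return f"chan:gate:{tenant_id}:{recipient_hash(recipient_id)}:{use_case}"


class GateCache:
    """In-memory stand-in for the Redis cache; entries expire after ttl_s."""

    def __init__(self, ttl_s: float = 60.0) -> None:
        self.ttl_s = ttl_s
        self._entries = {}  # key -> (expires_at, decision)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:
            self._entries.pop(key, None)
            return None  # miss: caller performs the gating round trips
        return entry[1]

    def put(self, key, decision, now):
        self._entries[key] = (now + self.ttl_s, decision)

    def invalidate(self, key):
        """Called on consent.changed.v1 or compliance.policy.changed.v1."""
        self._entries.pop(key, None)
```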

EP-CHAN-02 · OTT Provider Adapters

Context: First-class OTT integration with WhatsApp Cloud API, Telegram Bot API, and Viber Business. Provider-agnostic via ChannelAdapter port; status normalisation, circuit breakers, secret rotation.

US-CHAN-009 · WhatsApp Cloud API adapter

Type: Feature | Points: 8

Description: As the WhatsApp adapter pod, I need to send template messages and process status webhooks for the cloud-api provider.

Acceptance Criteria:

  • Template state validated against chan.whatsapp_template_state before dispatch
  • Token rotation takes effect within 60 s via developer-portal-service push
  • Webhook POST /v1/webhooks/whatsapp validated by X-Hub-Signature-256
  • Status mapping sent/delivered/read/failed → in_progress/delivered/delivered_read/failed
  • Per-phone-number-id Redis token bucket; 429 → exponential back-off
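
Validating X-Hub-Signature-256 means recomputing an HMAC-SHA256 over the raw request body with the app secret and comparing it to the header value, which carries a `sha256=` prefix. A minimal sketch, with a hypothetical function name:

```python
import hashlib
import hmac


def verify_hub_signature(app_secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    digest = hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode("ascii")
    # Constant-time comparison; encode first so odd header bytes cannot raise.
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The check must run on the exact bytes received, before any JSON parsing or re-serialisation.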

US-CHAN-010 · Telegram Bot API adapter

Type: Feature | Points: 5

Description: As the Telegram adapter pod, I need to send via sendMessage and process inbound updates.

Acceptance Criteria:

  • Outbound payload { chat_id, text, parse_mode: "HTML", disable_web_page_preview: true }
  • Recipient resolution via chan.recipient_telegram_link
  • Per-bot 30/s and per-chat 1/s Redis token buckets
  • Inbound webhook /v1/webhooks/telegram/{secretPath} validated by secret-path
  • 401/403 marks link INVALID and excludes Telegram for that recipient
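
The two rate limits compose as a pair of token buckets that must both have capacity before a send. The sketch below is an in-memory model with an explicit clock; the production buckets live in Redis, and the class and function names are hypothetical.

```python
class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float, now: float) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def _refill(self, now: float) -> None:
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def available(self, now: float) -> bool:
        self._refill(now)
        return self.tokens >= 1

    def take(self) -> None:
        self.tokens -= 1


def may_send(bot: TokenBucket, chat: TokenBucket, now: float) -> bool:
    """Consume one token from each bucket only if both allow the send."""
    if bot.available(now) and chat.available(now):
        bot.take()
        chat.take()
        return True
    return False
```

With `TokenBucket(30, 30, now)` per bot and `TokenBucket(1, 1, now)` per chat, a second message to the same chat within one second is held back even though the bot bucket still has tokens.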

US-CHAN-011 · Viber Business adapter

Type: Feature | Points: 5

Description: As the Viber adapter pod, I need to send via chatapi.viber.com/pa/send_message and process events.

Acceptance Criteria:

  • Outbound payload with X-Viber-Auth-Token; events message/delivered/seen/failed correlated by message_token
  • Viber 429 → exponential back-off
  • chan.recipient_viber_link mapping
  • Webhook signature validation per Viber spec

US-CHAN-012 · OTT adapter health and circuit breaker

Type: Feature | Points: 3

Description: As the channel router, I need to skip an OTT channel when its adapter is unhealthy.

Acceptance Criteria:

  • Per-adapter Hystrix breaker, 50-call window, 50 % error threshold
  • Open breaker for 60 s; dispatches return step_skipped with adapter_circuit_open
  • Half-open probe every 30 s
  • State exposed as metric chan_adapter_circuit_state{adapter}
  • Manual override via POST /v1/admin/adapters/{adapter}/circuit
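
The breaker behaviour can be approximated with a sliding window of recent call results. This sketch is not Hystrix; it assumes the breaker opens once the window is full and the error rate reaches the threshold, and it allows a probe only after the open period ends. The separate 30 s probe interval is not modelled.

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, window=50, error_threshold=0.5, open_seconds=60.0):
        self.results = deque(maxlen=window)  # True = success, False = error
        self.error_threshold = error_threshold
        self.open_seconds = open_seconds
        self.opened_at = None

    def state(self, now):
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at < self.open_seconds:
            return "open"
        return "half_open"

    def allow(self, now):
        """False while open; callers would return step_skipped instead."""
        return self.state(now) != "open"

    def record(self, ok, now):
        if self.state(now) == "half_open":
            if ok:
                self.opened_at = None  # probe succeeded: close and reset
                self.results.clear()
            else:
                self.opened_at = now  # probe failed: re-open
            return
        self.results.append(ok)
        window_full = len(self.results) == self.results.maxlen
        error_rate = self.results.count(False) / len(self.results)
        if window_full and error_rate >= self.error_threshold:
            self.opened_at = now
```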

US-CHAN-013 · Per-OTT account onboarding & token storage

Type: Feature | Points: 5

Description: As a tenant onboarding admin, I need to register OTT credentials securely.

Acceptance Criteria:

  • POST /v1/admin/tenants/{tenantId}/ott/{provider} encrypts with chan-ott-kek
  • Plaintext secrets never returned in GETs
  • Rotation via POST .../rotate; effective in ≤ 60 s on all pods via chan.ott_account.rotated.v1
  • Validation step on registration (no-op test call)
  • All admin actions emit audit.admin.action.v1

US-CHAN-014 · OTT delivery status mapping

Type: Feature | Points: 3

Description: As the channel router, I need to normalise per-provider statuses into the canonical AdapterDispatchResult.status enum.

Acceptance Criteria:

  • Canonical statuses enumerated
  • Versioned chan.adapter_status_map table
  • Unmapped codes → failed_temp + metric chan_unknown_provider_status_total
  • Map updates loaded without deploy on chan.status_map.changed.v1
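
Status normalisation is a lookup with a counted fallback. The sketch below uses only the WhatsApp mapping given in US-CHAN-009; the dictionary layout and the `Counter` standing in for chan_unknown_provider_status_total are assumptions.

```python
from collections import Counter

# Stand-in for the metric chan_unknown_provider_status_total, keyed by provider.
unknown_status_total = Counter()

# One version of the status map; real rows would come from chan.adapter_status_map.
STATUS_MAP = {
    ("whatsapp", "sent"): "in_progress",
    ("whatsapp", "delivered"): "delivered",
    ("whatsapp", "read"): "delivered_read",
    ("whatsapp", "failed"): "failed",
}


def normalise(provider: str, code: str, status_map=STATUS_MAP) -> str:
    """Map a provider status code to the canonical status."""
    canonical = status_map.get((provider, code))
    if canonical is None:
        unknown_status_total[provider] += 1
        return "failed_temp"  # unmapped codes fall back to failed_temp
    return canonical
```

Passing the map as an argument lets a reload triggered by chan.status_map.changed.v1 swap in a new version without redeploying.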

EP-CHAN-03 · Inbound MO Routing to Tenant Webhook

Context: Inbound MO from sms-firewall-service and smpp-connector routed to tenant webhooks with HMAC v2 signatures and at-least-once delivery.

US-CHAN-015 · Inbound MO ingest from firewall

Type: Feature | Points: 5

Description: As chan-mo-router, I need to consume mo.allowed.v1 from sms-firewall-service.

Acceptance Criteria:

  • Durable consumer chan-mo-router on mo.allowed.v1
  • Payload { messageId, originatorMsisdn, destinationMsisdn|shortCode, body, mno, receivedAt }
  • Tenant resolution: session-table first, static chan.tenant_inbound_routes second
  • No match → mo.unmatched.v1; no webhook
  • chan_mo_inbound_total{tenant,shortcode} metric

US-CHAN-016 · HMAC v2 signed webhook delivery

Type: Feature | Points: 5

Description: As a tenant integrator, I need HMAC-signed webhook delivery so I can authenticate the platform.

Acceptance Criteria:

  • X-Ghasi-Signature: t={ts},v2={hex(HMAC_SHA256(secret, ts + "." + body))} header
  • Idempotency-Key: {messageId} header
  • Body schema includes messageId, tenantId, originatorMsisdn, destinationShortCode, body, mno, receivedAt, sessionId?
  • 2xx ACK; 4xx terminate; 5xx/timeout retry
  • Secret rotation with 24 h grace accepting both old and new
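
The signature scheme can be sketched from both sides: the platform signs `ts + "." + body`, and the receiver verifies against every secret that is currently valid, which covers the rotation grace window. The 300 s timestamp tolerance and the function names are assumptions, not part of the spec.

```python
import hashlib
import hmac


def sign(secret: bytes, ts: int, body: bytes) -> str:
    """Build the X-Ghasi-Signature header value."""
    mac = hmac.new(secret, str(ts).encode("ascii") + b"." + body, hashlib.sha256)
    return f"t={ts},v2={mac.hexdigest()}"


def verify(header: str, body: bytes, secrets: list, now: int,
           tolerance_s: int = 300) -> bool:
    """Verify a header against any currently valid secret (old or new)."""
    try:
        parts = dict(p.split("=", 1) for p in header.split(","))
        ts_raw = parts["t"]
        ts = int(ts_raw)
        received = parts["v2"].encode("ascii")
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - ts) > tolerance_s:  # replay window (assumed, not in the spec)
        return False
    message = ts_raw.encode("ascii") + b"." + body
    for secret in secrets:
        expected = hmac.new(secret, message, hashlib.sha256).hexdigest().encode("ascii")
        if hmac.compare_digest(expected, received):
            return True
    return False
```

During the 24 h grace period the receiver would pass both secrets, for example `verify(header, body, [new_secret, old_secret], now)`.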

US-CHAN-017 · Tenant webhook retry & dead-letter

Type: Feature | Points: 5

Description: As the channel router, I need to retry tenant webhooks on transient failures and dead-letter after 5 attempts.

Acceptance Criteria:

  • Schedule 1 s, 5 s, 30 s, 5 min, 1 h with ±25 % jitter
  • Per-tenant concurrent webhook cap (default 50)
  • After 5 failures → mo.webhook.deadletter.v1 + portal alert via notification-service
  • Tenant re-deliver via POST /v1/admin/mo/redeliver/{messageId}
  • Health metrics: chan_tenant_webhook_5xx_total, chan_tenant_webhook_p95_latency_ms
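
The retry schedule with jitter can be computed as below. The sketch assumes each schedule entry is the delay after the n-th failed attempt and that exhausting the schedule means dead-lettering; the response classification follows US-CHAN-016. Function names are hypothetical.

```python
import random

BASE_DELAYS_S = [1, 5, 30, 300, 3600]  # 1 s, 5 s, 30 s, 5 min, 1 h
JITTER = 0.25


def retry_delay(failed_attempts: int, rng=random):
    """Delay in seconds after the given number of failures, or None to dead-letter."""
    if failed_attempts < 1 or failed_attempts > len(BASE_DELAYS_S):
        return None
    base = BASE_DELAYS_S[failed_attempts - 1]
    return base * rng.uniform(1 - JITTER, 1 + JITTER)


def classify(status_code) -> str:
    """Map a webhook response to an action; None models a timeout."""
    if status_code is None or status_code >= 500:
        return "retry"
    if 200 <= status_code < 300:
        return "ack"
    return "terminate"  # 4xx (and anything else unexpected) is not retried
```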

US-CHAN-018 · Inbound number → tenant static mapping

Type: Feature | Points: 3

Description: As a tenant admin, I need to map an inbound number to my webhook URL.

Acceptance Criteria:

  • PUT /v1/admin/tenants/{tenantId}/inbound/{numberOrShortCode} persists in chan.tenant_inbound_routes
  • Number uniqueness enforced platform-wide
  • Cross-checked against numbering-service lease state
  • Soft-delete with 24 h grace
  • Changes emit chan.inbound_route.changed.v1

EP-CHAN-04 · Conversational Session Manager

Context: Sticky correlation across (senderId, msisdn, tenantId) for two-way SMS conversations. Sessions use a 24 h sliding expiry by default; a STOP keyword closes the session and registers an opt-out.

US-CHAN-019 · Conversational session creation

Type: Feature | Points: 5

Description: As the channel router, I need to create a session on the first MT for each (senderId, msisdn, tenantId).

Acceptance Criteria:

  • Redis HASH chan:session:{senderId}:{msisdn} with tenantId, conversationId, openedAt, lastSeenAt, expiresAt
  • Created on first successful adapter dispatch; refreshed on subsequent MTs
  • TTL 24 h sliding; per-tenant override supported
  • Persistent backup row in chan.session via outbox
  • chan.session.opened.v1 emitted on creation
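
Session creation and sliding refresh can be sketched with a dictionary in place of the Redis HASH. The `touch_session` helper, the UUID conversation id, and the explicit `now` parameter are assumptions for illustration; the key format and field names come from the criteria above.

```python
import uuid

SESSION_TTL_S = 24 * 3600  # default; per-tenant override would replace this


def session_key(sender_id: str, msisdn: str) -> str:
    return f"chan:session:{sender_id}:{msisdn}"


def touch_session(store: dict, sender_id: str, msisdn: str, tenant_id: str,
                  now: int, ttl_s: int = SESSION_TTL_S):
    """Create the session on first MT, or slide its expiry on later MTs.

    Returns (session, created); created=True is where chan.session.opened.v1
    would be emitted.
    """
    key = session_key(sender_id, msisdn)
    session = store.get(key)
    created = session is None or session["expiresAt"] <= now
    if created:
        session = {
            "tenantId": tenant_id,
            "conversationId": str(uuid.uuid4()),
            "openedAt": now,
        }
    session["lastSeenAt"] = now
    session["expiresAt"] = now + ttl_s  # sliding window restarts on each MT
    store[key] = session
    return session, created
```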

US-CHAN-020 · Session-aware MO routing

Type: Feature | Points: 3

Description: As chan-mo-router, I need to resolve MO tenant by session lookup before static inbound mapping.

Acceptance Criteria:

  • Lookup order: session → static map → unmatched
  • Webhook payload includes sessionId and inReplyTo
  • Per-senderId session scoping (one MSISDN can hold multiple sessions, one per alphanumeric sender ID)
  • Lookup latency P95 ≤ 5 ms
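
The lookup order can be written as a short resolver. This sketch assumes the MO destination corresponds to the session's senderId and that the session's conversationId serves as sessionId; both are illustrative guesses, as are the dictionary-backed stores.

```python
def resolve_tenant(mo: dict, sessions: dict, static_routes: dict) -> dict:
    """Resolve an inbound MO: session first, static map second, else unmatched.

    sessions:      (senderId, msisdn) -> session dict
    static_routes: numberOrShortCode  -> tenantId
    """
    destination = mo.get("destinationMsisdn") or mo.get("shortCode")
    session = sessions.get((destination, mo["originatorMsisdn"]))
    if session is not None:
        return {
            "route": "session",
            "tenantId": session["tenantId"],
            "sessionId": session["conversationId"],  # assumed mapping
        }
    tenant_id = static_routes.get(destination)
    if tenant_id is not None:
        return {"route": "static", "tenantId": tenant_id, "sessionId": None}
    return {"route": "unmatched", "subject": "mo.unmatched.v1"}
```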

US-CHAN-021 · STOP keyword closes session

Type: Feature | Points: 5

Description: As a recipient, I need STOP (and local-language equivalents) to close the session and register opt-out.

Acceptance Criteria:

  • Keywords default: STOP, UNSUBSCRIBE, کنسل, متوقف, ایست (configurable per tenant)
  • Match → mark session CLOSED, call consent-ledger-service.RecordOptOut with reason = "stop_keyword"
  • MO still delivered to webhook with closedSessionByStop: true
  • chan.session.closed.v1 emitted with reason = "stop_keyword"
  • Subsequent dispatches exclude the channel via consent gating (US-CHAN-003)
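
Keyword matching might be implemented as below. The matching rule (whole message body after trimming, case-insensitive) is an assumption; the criteria list the default keywords but not how a match is defined.

```python
DEFAULT_STOP_KEYWORDS = frozenset({"STOP", "UNSUBSCRIBE", "کنسل", "متوقف", "ایست"})


def is_stop(body: str, keywords=DEFAULT_STOP_KEYWORDS) -> bool:
    """True when the whole message body is a stop keyword."""
    normalised = body.strip().casefold()
    return normalised in {k.casefold() for k in keywords}


def handle_mo(body: str, keywords=DEFAULT_STOP_KEYWORDS) -> dict:
    """Return the webhook flag and follow-up action names for one MO."""
    if is_stop(body, keywords):
        return {
            "closedSessionByStop": True,
            "actions": ["close_session", "record_opt_out"],  # hypothetical names
            "reason": "stop_keyword",
        }
    return {"closedSessionByStop": False, "actions": [], "reason": None}
```

Per-tenant configuration would pass a different `keywords` set; the MO is delivered to the webhook in both branches.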

US-CHAN-022 · Session expiry and cleanup

Type: Feature | Points: 3

Description: As an operator, I need idle sessions to expire and be cleaned up automatically.

Acceptance Criteria:

  • Per-key Redis TTL with keyspace expiration events
  • Worker writes chan.session.closed.v1 with reason = "idle_expiry"
  • Postgres rows aged out by daily job after 7 d in CLOSED state
  • Daily reconciliation handles orphaned PG rows (reason = "redis_loss")

US-CHAN-023 · Session inspector REST endpoint

Type: Feature | Points: 3

Description: As a tenant support engineer, I need to inspect active sessions for my tenant.

Acceptance Criteria:

  • GET /v1/sessions?tenantId=...&senderId=...&msisdnHash=... paginated from Postgres
  • Returns session fields with hashed MSISDN only
  • Tenant scope enforced; admin role bypasses tenant scoping
  • POST /v1/sessions/{sessionId}/close (admin / tenant-admin) with reason = "manual"