Channel Router Service — Jira-Ready Epics & User Stories
Status: populated
Owner: Messaging Core
Last updated: 2026-04-20
Service prefix: CHAN
Scope: Multi-channel fallback engine (SMS → WhatsApp → Voice → Email), OTT provider adapters (WhatsApp Cloud API, Telegram Bot, Viber Business), inbound MO routing to tenant webhooks, conversational session management.
Derived from docs/07-epics-and-user-stories.md §6.7 and ADR-0004 §3.
Epic Summary
| Epic ID | Title | Stories | Points |
|---|---|---|---|
| EP-CHAN-01 | Multi-Channel Fallback Engine (SMS → WhatsApp BSP → Voice OTP → Email) per Recipient Profile | US-CHAN-001 – US-CHAN-008 | 42 |
| EP-CHAN-02 | OTT Provider Adapters (WhatsApp Cloud API, Telegram Bot, Viber) | US-CHAN-009 – US-CHAN-014 | 29 |
| EP-CHAN-03 | Inbound MO Routing to Tenant Webhook (2-way SMS) | US-CHAN-015 – US-CHAN-018 | 18 |
| EP-CHAN-04 | Conversational Session Manager | US-CHAN-019 – US-CHAN-023 | 19 |
| Total | | 23 stories | 108 |
EP-CHAN-01 · Multi-Channel Fallback Engine
Context: Per-recipient ladder traversal across SMS, WhatsApp, Voice OTP, and Email channels driven by tenant policy and recipient profile. Single canonical outcome event per recipient.
US-CHAN-001 · Consume notification dispatch request
Type: Feature | Points: 5
Description:
As the channel router NATS consumer, I need to subscribe to notification.dispatch.requested.v1 so every omnichannel notification from sms-orchestrator enters the ladder.
Acceptance Criteria:
- Durable JetStream consumer chan-router, AckExplicit, AckWait 60 s, MaxDeliver 5
- Inbox dedup on Nats-Msg-Id = {notificationId}:{recipientId}
- MAX_INFLIGHT configurable; backpressure via un-acked messages
- After MaxDeliver, message lands on notification.dispatch.deadletter.v1
- Metrics: chan_consumer_lag, chan_inflight_dispatches, chan_deadletter_total
US-CHAN-002 · Compose channel ladder per tenant policy
Type: Feature | Points: 8
Description: As a tenant configuring an omnichannel campaign, I need to define a fallback ladder per use case so the platform attempts channels in my preferred order.
Acceptance Criteria:
- chan.tenant_policy row keyed (tenantId, useCase) with ladder JSONB
- Default ladders applied for otp and marketing when row absent
- Admin REST PUT /v1/admin/tenants/{tenantId}/policy/{useCase}
- Ladder cached in Redis; refreshed on chan.tenant_policy.changed.v1
- Max 6 steps; invalid → 400 INVALID_LADDER
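
The ladder validation above could look roughly like the sketch below. Only the 6-step cap and the INVALID_LADDER error come from the criteria; the step fields (channel, deadlineSeconds), the accepted channel names, and the rule that an empty ladder is invalid are assumptions made for illustration.

```python
MAX_STEPS = 6  # from the acceptance criteria

# Assumed channel identifiers; the source does not list the accepted values.
KNOWN_CHANNELS = {"sms", "whatsapp", "voice", "email", "telegram", "viber"}


class InvalidLadder(ValueError):
    """Would be translated to HTTP 400 INVALID_LADDER by the REST layer."""


def validate_ladder(steps):
    """Return the ladder unchanged if it is valid, otherwise raise InvalidLadder."""
    if not isinstance(steps, list) or not steps:
        raise InvalidLadder("ladder must be a non-empty list")  # assumption
    if len(steps) > MAX_STEPS:
        raise InvalidLadder(f"ladder has {len(steps)} steps; max is {MAX_STEPS}")
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise InvalidLadder(f"step {index}: expected an object")
        channel = step.get("channel")            # hypothetical field name
        deadline = step.get("deadlineSeconds")   # hypothetical field name
        if channel not in KNOWN_CHANNELS:
            raise InvalidLadder(f"step {index}: unknown channel {channel!r}")
        if not isinstance(deadline, int) or deadline <= 0:
            raise InvalidLadder(f"step {index}: deadlineSeconds must be a positive integer")
    return steps
```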
US-CHAN-003 · Compliance & consent gating per channel
Type: Feature | Points: 5
Description:
As the decision core, I need per-channel verdicts from compliance-engine and per-channel consent from consent-ledger-service before composing the final ladder.
Acceptance Criteria:
- Per-channel verdicts ALLOW/FLAG/BLOCK map to ladder inclusion
- Per-channel consent maps to inclusion with reason recipient_opt_out
- Combined exclusion list captured in fallback_path[]
- No channel survives → final = "REFUSED_NO_CHANNEL" with reasons
- Combined gating call P95 ≤ 30 ms
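
A minimal sketch of the gating step, to make the inclusion rules concrete. The source does not say how FLAG maps to inclusion or what the compliance exclusion reason is called, so this sketch assumes FLAG keeps the channel, a missing verdict is treated as BLOCK, and the reason string compliance_block is hypothetical. Only recipient_opt_out and REFUSED_NO_CHANNEL are taken from the criteria.

```python
def compose_ladder(ladder, verdicts, consent):
    """Filter a ladder by compliance verdicts and consent.

    ladder:   ordered channel names from tenant policy
    verdicts: channel -> "ALLOW" | "FLAG" | "BLOCK"
    consent:  channel -> bool (True means the recipient has opted in)
    Returns (surviving_channels, fallback_path, final).
    """
    surviving = []
    fallback_path = []  # exclusions with reasons, in ladder order
    for channel in ladder:
        verdict = verdicts.get(channel, "BLOCK")  # assumption: fail closed
        if verdict == "BLOCK":
            fallback_path.append({"channel": channel, "reason": "compliance_block"})
        elif not consent.get(channel, False):
            fallback_path.append({"channel": channel, "reason": "recipient_opt_out"})
        else:
            surviving.append(channel)  # ALLOW, or FLAG under the assumption above
    final = None if surviving else "REFUSED_NO_CHANNEL"
    return surviving, fallback_path, final
```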
US-CHAN-004 · Step deadline enforcement and ladder progression
Type: Feature | Points: 8
Description: As the fallback engine, I need to enforce per-step deadlines and progress on negative outcome or timeout.
Acceptance Criteria:
- Deadline timers in Redis ZSET chan:deadlines
- Worker scans every 1 s and progresses elapsed ladders
- Positive DLR/status before deadline terminates with DELIVERED
- Race between positive DLR and deadline guarded by Redis WATCH/MULTI
- Exactly one terminal outcome per recipient
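
The DLR-versus-deadline race can be illustrated with an in-memory stand-in. This is not the Redis implementation: a plain dict plays the role of the chan:deadlines ZSET (member → deadline score), and single-threaded execution replaces the WATCH/MULTI guard. The class and method names are hypothetical, and ignoring a DLR that arrives after the deadline was consumed is an assumption.

```python
class LadderDeadlines:
    """In-memory stand-in for the deadline ZSET plus a terminal-outcome guard."""

    def __init__(self):
        self.deadlines = {}  # member "notificationId:recipientId" -> deadline (epoch seconds)
        self.terminal = {}   # member -> terminal outcome, written at most once

    def arm(self, member, deadline_ts):
        """Register (or re-arm) the deadline for the current ladder step."""
        self.deadlines[member] = deadline_ts

    def on_positive_dlr(self, member):
        """Terminate with DELIVERED only if the step is still pending."""
        if member in self.terminal or member not in self.deadlines:
            return False  # already terminal, or the deadline scan won the race
        del self.deadlines[member]
        self.terminal[member] = "DELIVERED"
        return True

    def scan(self, now):
        """Return members whose deadline elapsed; the caller progresses their ladders."""
        due = sorted(m for m, ts in self.deadlines.items() if ts <= now)
        for member in due:
            del self.deadlines[member]
        return due
```

Whichever side removes the member first wins, which is the property the WATCH/MULTI transaction would need to provide across pods.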
US-CHAN-005 · Per-channel adapter dispatch
Type: Feature | Points: 5
Description:
As the decision core, I need to dispatch each step via the appropriate adapter (smpp-connector for SMS, OTT adapters, Voice gateway, SMTP).
Acceptance Criteria:
- ChannelAdapter port with concrete impls per channel
- SMS publishes sms.outbound.dispatch.v1 with idempotent messageId
- WhatsApp calls POST /v20.0/{phoneNumberId}/messages with template payload
- Voice gRPC Voice.PlayOtp(recipient, otp_digits, language, retries)
- Per-adapter timeout and circuit breaker; failures emit channel.attempt.recorded.v1
US-CHAN-006 · Single canonical outcome per recipient
Type: Feature | Points: 5
Description:
As tenant integrator, I need exactly one notification.delivery.outcome.v1 per (notificationId, recipientId) regardless of attempts.
Acceptance Criteria:
- Payload includes final, channel, attempts, fallback_path, occurredAt
- Uniqueness enforced via chan.delivery_outbox UNIQUE(notificationId, recipientId)
- Outbox is source of truth; relay never re-publishes
- Tenant receives outcome via webhook-dispatcher
- Per-attempt detail surfaces via separate channel.attempt.recorded.v1
US-CHAN-007 · Per-attempt billing metering
Type: Feature | Points: 3
Description:
As billing-service, I need one billing.metering.recorded.v1 per channel attempt with the appropriate SKU.
Acceptance Criteria:
- SKUs per channel (sms.outbound.v1, whatsapp.outbound.v1, voice.otp.v1, etc.)
- Nats-Msg-Id = {notificationId}:{recipientId}:{stepIndex} for inbox dedup
- Failed adapter dispatch (no provider call) → no metering event
- Voice OTP only meters on ANSWERED + completed playback
- WhatsApp template-rejected attempts ARE metered
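
The metering rules above reduce to a small decision function. The sketch below is illustrative: the three SKUs and the Nats-Msg-Id format come from the criteria, while the function name, parameter names, and returned dict shape are hypothetical.

```python
# SKUs named in the criteria; other channels would need their own entries.
SKUS = {
    "sms": "sms.outbound.v1",
    "whatsapp": "whatsapp.outbound.v1",
    "voice": "voice.otp.v1",
}


def metering_event(notification_id, recipient_id, step_index, channel,
                   provider_called, voice_answered_and_played=False):
    """Return the metering event for one attempt, or None if it must not be billed."""
    if not provider_called:
        return None  # adapter failed before any provider call
    if channel == "voice" and not voice_answered_and_played:
        return None  # voice OTP bills only on ANSWERED + completed playback
    # A WhatsApp template rejection still reached the provider, so it falls through.
    return {
        "sku": SKUS[channel],
        "natsMsgId": f"{notification_id}:{recipient_id}:{step_index}",
    }
```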
US-CHAN-008 · Fallback decision cache
Type: Feature | Points: 3
Description:
As the decision core under burst load, I need to cache (tenantId, recipientId, useCase) gating decisions for 60 s.
Acceptance Criteria:
- Redis key chan:gate:{tenantId}:{recipientHash}:{useCase}, TTL 60 s
- Cache hit short-circuits gating round trips
- Invalidated on consent.changed.v1 and compliance.policy.changed.v1 events
- Hit/miss ratio metric chan_gate_cache_hits_total
EP-CHAN-02 · OTT Provider Adapters
Context: First-class OTT integration with WhatsApp Cloud API, Telegram Bot API, and Viber Business. Provider-agnostic via ChannelAdapter port; status normalisation, circuit breakers, secret rotation.
US-CHAN-009 · WhatsApp Cloud API adapter
Type: Feature | Points: 8
Description:
As the WhatsApp adapter pod, I need to send template messages and process status webhooks for the cloud-api provider.
Acceptance Criteria:
- Template state validated against chan.whatsapp_template_state before dispatch
- Token rotation in 60 s via developer-portal-service push
- Webhook POST /v1/webhooks/whatsapp validated by X-Hub-Signature-256
- Status mapping sent/delivered/read/failed → in_progress/delivered/delivered_read/failed
- Per-phone-number-id Redis token bucket; 429 → exponential back-off
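
For the webhook check, Meta signs the raw request body with HMAC-SHA256 keyed by the app secret and sends it as X-Hub-Signature-256: sha256=<hex>. A minimal verification sketch follows; the function names are hypothetical, and how the app secret is loaded is out of scope here.

```python
import hashlib
import hmac


def expected_hub_signature(app_secret: bytes, raw_body: bytes) -> str:
    """Header value the provider is expected to send for this exact body."""
    digest = hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_hub_signature(app_secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header against the expected one."""
    expected = expected_hub_signature(app_secret, raw_body)
    # Compare as bytes so a non-ASCII header cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

The signature must be computed over the bytes as received, before any JSON parsing or re-serialisation.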
US-CHAN-010 · Telegram Bot API adapter
Type: Feature | Points: 5
Description:
As the Telegram adapter pod, I need to send via sendMessage and process inbound updates.
Acceptance Criteria:
- Outbound payload { chat_id, text, parse_mode: "HTML", disable_web_page_preview: true }
- Recipient resolution via chan.recipient_telegram_link
- Per-bot 30/s and per-chat 1/s Redis token buckets
- Inbound webhook /v1/webhooks/telegram/{secretPath} validated by secret-path
- 401/403 marks link INVALID and excludes Telegram for that recipient
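
The two rate limits compose as a pair of token buckets that must both have capacity before a send. The sketch below keeps the buckets in memory to show the arithmetic; the story calls for Redis-backed buckets, and the class and function names here are hypothetical.

```python
class TokenBucket:
    """Token bucket with lazy refill; time is passed in to keep it deterministic."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def refill(self, now):
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


def try_send(bot_bucket, chat_bucket, now):
    """Consume one token from each bucket, or none if either is empty."""
    bot_bucket.refill(now)
    chat_bucket.refill(now)
    if bot_bucket.tokens >= 1 and chat_bucket.tokens >= 1:
        bot_bucket.tokens -= 1
        chat_bucket.tokens -= 1
        return True
    return False
```

Checking both buckets before consuming avoids burning a per-bot token on a send that the per-chat limit would reject anyway.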
US-CHAN-011 · Viber Business adapter
Type: Feature | Points: 5
Description:
As the Viber adapter pod, I need to send via chatapi.viber.com/pa/send_message and process events.
Acceptance Criteria:
- Outbound payload with X-Viber-Auth-Token; events message/delivered/seen/failed correlated by message_token
- Viber 429 → exponential back-off
- chan.recipient_viber_link mapping
- Webhook signature validation per Viber spec
US-CHAN-012 · OTT adapter health and circuit breaker
Type: Feature | Points: 3
Description: As channel router, I need to skip an OTT channel when its adapter is unhealthy.
Acceptance Criteria:
- Per-adapter Hystrix breaker, 50-call window, 50 % error threshold
- Open breaker for 60 s; dispatches return step_skipped with adapter_circuit_open
- Half-open probe every 30 s
- State exposed chan_adapter_circuit_state{adapter}
- Manual override via POST /v1/admin/adapters/{adapter}/circuit
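
To make the breaker parameters concrete, here is a hand-rolled sketch of the state machine. It is not Hystrix and does not model Hystrix configuration; it assumes the error rate is evaluated only once the 50-call window is full, and that the breaker becomes half-open after the 60 s open period. The 30 s probe cadence is left out.

```python
from collections import deque


class AdapterBreaker:
    def __init__(self, window=50, error_threshold=0.5, open_seconds=60.0):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.error_threshold = error_threshold
        self.open_seconds = open_seconds
        self.opened_at = None

    def state(self, now):
        if self.opened_at is None:
            return "closed"
        return "open" if now - self.opened_at < self.open_seconds else "half_open"

    def allow(self, now):
        """False means the step is skipped with adapter_circuit_open."""
        return self.state(now) != "open"

    def record(self, ok, now):
        if self.state(now) == "half_open":
            if ok:
                self.opened_at = None   # probe succeeded: close and start fresh
                self.results.clear()
            else:
                self.opened_at = now    # probe failed: re-open for another period
            return
        self.results.append(ok)
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.results.count(False) / len(self.results) >= self.error_threshold:
            self.opened_at = now
```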
US-CHAN-013 · Per-OTT account onboarding & token storage
Type: Feature | Points: 5
Description: As tenant onboarding admin, I need to register OTT credentials securely.
Acceptance Criteria:
- POST /v1/admin/tenants/{tenantId}/ott/{provider} encrypts with chan-ott-kek
- Plaintext secrets never returned in GETs
- Rotation via POST .../rotate; effective in ≤ 60 s on all pods via chan.ott_account.rotated.v1
- Validation step on registration (no-op test call)
- All admin actions emit audit.admin.action.v1
US-CHAN-014 · OTT delivery status mapping
Type: Feature | Points: 3
Description:
As channel router, I need to normalise per-provider statuses into the canonical AdapterDispatchResult.status enum.
Acceptance Criteria:
- Canonical statuses enumerated
- Versioned chan.adapter_status_map table
- Unmapped codes → failed_temp + metric chan_unknown_provider_status_total
- Map updates loaded without deploy on chan.status_map.changed.v1
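
A sketch of the normalisation lookup with the unmapped-code fallback. The WhatsApp rows reuse the mapping from US-CHAN-009; the in-memory dict stands in for the chan.adapter_status_map table, and the Counter stands in for the chan_unknown_provider_status_total metric. Function and variable names are hypothetical.

```python
from collections import Counter

# Stand-in for rows of chan.adapter_status_map: (provider, provider_code) -> canonical status.
STATUS_MAP = {
    ("whatsapp", "sent"): "in_progress",
    ("whatsapp", "delivered"): "delivered",
    ("whatsapp", "read"): "delivered_read",
    ("whatsapp", "failed"): "failed",
}

# Stand-in for the chan_unknown_provider_status_total counter, labelled by provider.
unknown_provider_status_total = Counter()


def normalise_status(provider, provider_code, status_map=STATUS_MAP):
    """Map a provider status to the canonical enum; unmapped codes become failed_temp."""
    canonical = status_map.get((provider, provider_code))
    if canonical is None:
        unknown_provider_status_total[provider] += 1
        return "failed_temp"
    return canonical
```

Passing the map as a parameter is one way to swap in a freshly loaded version when chan.status_map.changed.v1 arrives.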
EP-CHAN-03 · Inbound MO Routing to Tenant Webhook
Context: Inbound MO from sms-firewall-service and smpp-connector routed to tenant webhooks with HMAC v2 signatures and at-least-once delivery.
US-CHAN-015 · Inbound MO ingest from firewall
Type: Feature | Points: 5
Description:
As chan-mo-router, I need to consume mo.allowed.v1 from sms-firewall-service.
Acceptance Criteria:
- Durable consumer chan-mo-router on mo.allowed.v1
- Payload { messageId, originatorMsisdn, destinationMsisdn|shortCode, body, mno, receivedAt }
- Tenant resolution: session-table first, static chan.tenant_inbound_routes second
- No match → mo.unmatched.v1; no webhook
- chan_mo_inbound_total{tenant,shortcode} metric
US-CHAN-016 · HMAC v2 signed webhook delivery
Type: Feature | Points: 5
Description: As tenant integrator, I need HMAC-signed webhook delivery so I can authenticate the platform.
Acceptance Criteria:
- X-Ghasi-Signature: t={ts},v2={hex(HMAC_SHA256(secret, ts + "." + body))} header
- Idempotency-Key: {messageId} header
- Body schema includes messageId, tenantId, originatorMsisdn, destinationShortCode, body, mno, receivedAt, sessionId?
- 2xx ACK; 4xx terminate; 5xx/timeout retry
- Secret rotation with 24 h grace accepting both old and new
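
The signature format above can be produced and checked as in the following sketch. It assumes ts is an integer epoch-seconds timestamp and adds a replay tolerance of 300 s on the verifying side; neither is stated in the criteria. Accepting a list of secrets models the 24 h rotation grace.

```python
import hashlib
import hmac


def _mac(secret: bytes, ts: int, body: bytes) -> str:
    # HMAC_SHA256(secret, ts + "." + body), hex-encoded
    return hmac.new(secret, str(ts).encode("ascii") + b"." + body, hashlib.sha256).hexdigest()


def sign(secret: bytes, ts: int, body: bytes) -> str:
    """Build the X-Ghasi-Signature header value."""
    return f"t={ts},v2={_mac(secret, ts, body)}"


def verify(header: str, body: bytes, secrets, now: int, tolerance_s: int = 300) -> bool:
    """Tenant-side check; `secrets` holds the current secret and, during grace, the old one."""
    try:
        fields = dict(part.split("=", 1) for part in header.split(","))
        ts = int(fields["t"])
        given = fields["v2"]
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - ts) > tolerance_s:  # assumed replay window
        return False
    return any(
        hmac.compare_digest(_mac(secret, ts, body).encode(), given.encode())
        for secret in secrets
    )
```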
US-CHAN-017 · Tenant webhook retry & dead-letter
Type: Feature | Points: 5
Description: As channel router, I need to retry tenant webhooks on transient failures and dead-letter after 5 attempts.
Acceptance Criteria:
- Schedule 1s, 5s, 30s, 5m, 1h with ±25 % jitter
- Per-tenant concurrent webhook cap (default 50)
- After 5 failures → mo.webhook.deadletter.v1 + portal alert via notification-service
- Tenant re-deliver via POST /v1/admin/mo/redeliver/{messageId}
- Health metrics: chan_tenant_webhook_5xx_total, chan_tenant_webhook_p95_latency_ms
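
The jittered schedule is simple arithmetic: each base delay is scaled by a factor drawn uniformly from 0.75 to 1.25. The sketch below shows only that calculation; how schedule positions line up with the five-failure dead-letter rule is left to the story, and the function name is hypothetical.

```python
import random

# Base delays from the criteria: 1s, 5s, 30s, 5m, 1h (in seconds).
SCHEDULE_S = [1, 5, 30, 300, 3600]
JITTER = 0.25  # ±25 %


def jittered_delay(schedule_index, rng=None):
    """Delay in seconds for the given schedule position, with uniform ±25 % jitter."""
    rng = rng or random
    base = SCHEDULE_S[schedule_index]
    return base * rng.uniform(1 - JITTER, 1 + JITTER)
```

For example, the 5 m entry yields a delay between 225 s and 375 s. Passing a seeded random.Random instance makes the output reproducible in tests.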
US-CHAN-018 · Inbound number → tenant static mapping
Type: Feature | Points: 3
Description: As tenant admin, I need to map an inbound number to my webhook URL.
Acceptance Criteria:
- PUT /v1/admin/tenants/{tenantId}/inbound/{numberOrShortCode} persists in chan.tenant_inbound_routes
- Number uniqueness enforced platform-wide
- Cross-checked against numbering-service lease state
- Soft-delete with 24 h grace
- Changes emit chan.inbound_route.changed.v1
EP-CHAN-04 · Conversational Session Manager
Context: Sticky correlation across (senderId, msisdn, tenantId) for two-way SMS conversations. Sessions persist 24 h sliding by default; STOP keyword closes with opt-out registration.
US-CHAN-019 · Conversational session creation
Type: Feature | Points: 5
Description:
As channel router, I need to create a session on the first MT for each (senderId, msisdn, tenantId).
Acceptance Criteria:
- Redis HASH chan:session:{senderId}:{msisdn} with tenantId, conversationId, openedAt, lastSeenAt, expiresAt
- Created on first successful adapter dispatch; refreshed on subsequent MTs
- TTL 24 h sliding; per-tenant override supported
- Persistent backup row in chan.session via outbox
- chan.session.opened.v1 emitted on creation
US-CHAN-020 · Session-aware MO routing
Type: Feature | Points: 3
Description:
As chan-mo-router, I need to resolve MO tenant by session lookup before static inbound mapping.
Acceptance Criteria:
- Lookup order: session → static map → unmatched
- Webhook payload includes sessionId and inReplyTo
- Per-senderId session scoping (one MSISDN can have multiple sessions to different alphas)
- Lookup latency P95 ≤ 5 ms
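
The lookup order can be sketched as a two-stage resolution. Plain dicts stand in for the Redis session hashes and the chan.tenant_inbound_routes table. The sketch assumes the MO destination (number or short code) is the senderId of the earlier MT, which the source implies but does not state. The result shape is hypothetical.

```python
def resolve_mo_tenant(destination, originator_msisdn, sessions, static_routes):
    """Resolve the tenant for an inbound MO: session, then static map, then unmatched.

    sessions:      (senderId, msisdn) -> session dict containing at least tenantId
    static_routes: number or short code -> tenantId
    """
    session = sessions.get((destination, originator_msisdn))
    if session is not None:
        return {"via": "session", "tenantId": session["tenantId"], "session": session}
    tenant_id = static_routes.get(destination)
    if tenant_id is not None:
        return {"via": "static", "tenantId": tenant_id, "session": None}
    # Caller publishes mo.unmatched.v1 and sends no webhook.
    return {"via": "unmatched", "tenantId": None, "session": None}
```

Keying sessions by (senderId, msisdn) is what lets one MSISDN hold separate sessions with different senders.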
US-CHAN-021 · STOP keyword closes session
Type: Feature | Points: 5
Description: As recipient, I need STOP (and local-language equivalents) to close the session and register opt-out.
Acceptance Criteria:
- Keywords default: STOP, UNSUBSCRIBE, کنسل, متوقف, ایست (configurable per tenant)
- Match → mark session CLOSED, call consent-ledger-service.RecordOptOut with reason = "stop_keyword"
- MO still delivered to webhook with closedSessionByStop: true
- chan.session.closed.v1 emitted with reason = "stop_keyword"
- Subsequent dispatches exclude the channel via consent gating (US-CHAN-003)
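
Keyword matching is sketched below under one loudly stated assumption: the whole message body, after whitespace trimming and case folding, must equal a keyword. The source does not say whether a keyword embedded in a longer message should also match. The default keyword set is taken from the criteria; the function name is hypothetical.

```python
# Defaults from the criteria; tenants may override the set.
DEFAULT_STOP_KEYWORDS = frozenset({"STOP", "UNSUBSCRIBE", "کنسل", "متوقف", "ایست"})


def is_stop_keyword(body, keywords=DEFAULT_STOP_KEYWORDS):
    """True if the MO body is exactly a stop keyword, ignoring case and outer whitespace."""
    normalised = " ".join(body.split()).casefold()
    return normalised in {keyword.casefold() for keyword in keywords}
```

On a match the router would close the session and record the opt-out, while still delivering the MO to the tenant webhook with closedSessionByStop set to true.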
US-CHAN-022 · Session expiry and cleanup
Type: Feature | Points: 3
Description: As operator, I need idle sessions to expire and be cleaned up automatically.
Acceptance Criteria:
- Per-key Redis TTL with keyspace expiration events
- Worker writes chan.session.closed.v1 with reason = "idle_expiry"
- Postgres rows aged out by daily job after 7 d in CLOSED state
- Daily reconciliation handles orphaned PG rows (reason = "redis_loss")
US-CHAN-023 · Session inspector REST endpoint
Type: Feature | Points: 3
Description: As tenant support engineer, I need to inspect active sessions for my tenant.
Acceptance Criteria:
- GET /v1/sessions?tenantId=...&senderId=...&msisdnHash=... paginated from Postgres
- Returns session fields with hashed MSISDN only
- Tenant scope enforced; admin role bypass
- POST /v1/sessions/{sessionId}/close (admin / tenant-admin) with reason = "manual"