Channel Router Service — Testing Strategy

Version: 1.0
Status: Draft
Owner: Messaging Core
Last Updated: 2026-04-21
Companion: SERVICE_READINESS · APPLICATION_LOGIC · SECURITY_MODEL

The channel-router sits on the hot path, so tests must cover correctness (one outcome per recipient, no silent channel drop, fail-closed on consent), performance (sub-50 ms decision under 5000 RPS), and security (no cross-tenant leakage, no PII in events). The test pyramid follows the platform standard: unit coverage ≥ 80%, integration tests covering the major flows end to end, contract tests on every cross-service surface, and load and chaos tests as gating criteria.


1. Test pyramid

Layer | Coverage target | Tooling
Unit | ≥ 85% line / ≥ 80% branch (domain ≥ 90%) | Jest + ts-mockito
Integration | All hot-path use cases | Jest + Testcontainers (PG, Redis, NATS, mock-Triton)
Contract | Every gRPC + REST + NATS surface | Pact, buf breaking, JSON Schema
E2E | Critical flows | Cypress (admin), k6 (gRPC)
Load | 5000 RPS sustained 1 h | k6 / ghz
Chaos | All FM-01..FM-13 | LitmusChaos
Security | OWASP API Top-10 + RLS + audit chain | OWASP ZAP, custom RLS test harness

2. Unit tests

2.1 Fallback policy evaluator

  • Default ladder applied when no policy row exists for (tenantId, useCase).
  • Ladder length validation (1..6); reject 7-step write with INVALID_LADDER.
  • Duplicate-channel rejection.
  • PARALLEL strategy with a non-OTT channel rejected with CHAN_PARALLEL_STRATEGY_INVALID.
  • costCapPerMessage < cheapest path → POLICY_UNREACHABLE_COST_CAP.
  • LWW ordering on policy version conflict (resolved by updated_at).
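
The write-time ladder checks above can be sketched as a pure function, which keeps the unit tests free of mocks. This is a minimal sketch, not the service's implementation: the `Channel` and `Strategy` types, the `SEQUENTIAL` strategy name, the `DUPLICATE_CHANNEL` code and the choice of which channels count as OTT are all assumptions; only `INVALID_LADDER` and `CHAN_PARALLEL_STRATEGY_INVALID` come from the list above.

```typescript
// Hypothetical data model: the source names two error codes but not the types.
type Channel = "SMS" | "WHATSAPP" | "TELEGRAM" | "VIBER" | "VOICE";
type Strategy = "SEQUENTIAL" | "PARALLEL"; // "SEQUENTIAL" is an assumed name
type LadderError =
  | "INVALID_LADDER"
  | "CHAN_PARALLEL_STRATEGY_INVALID"
  | "DUPLICATE_CHANNEL"; // assumed code; the source does not name it

// Assumption: the OTT channels are the three used in the PARALLEL alert flow.
const OTT_CHANNELS: ReadonlySet<Channel> = new Set<Channel>([
  "WHATSAPP",
  "TELEGRAM",
  "VIBER",
]);

// Returns null when the ladder is acceptable, otherwise the first violation found.
function validateLadder(
  ladder: readonly Channel[],
  strategy: Strategy,
): LadderError | null {
  // Length must be within 1..6; a 7-step ladder is rejected here.
  if (ladder.length < 1 || ladder.length > 6) return "INVALID_LADDER";
  // A channel may appear at most once.
  if (new Set(ladder).size !== ladder.length) return "DUPLICATE_CHANNEL";
  // PARALLEL is only valid when every step is an OTT channel.
  if (strategy === "PARALLEL" && ladder.some((c) => !OTT_CHANNELS.has(c))) {
    return "CHAN_PARALLEL_STRATEGY_INVALID";
  }
  return null;
}
```

Checking length before duplicates means an over-long ladder always reports `INVALID_LADDER`, which matches the 7-step case in the list.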

2.2 Cost calculator

  • Per-channel cost lookup by pricing_model JSON; defaults applied for missing entries.
  • Rounding semantics (NUMERIC(12,4); banker's rounding).
  • Cumulative cost across attempts equals sum of delivery_attempts.cost_ngn.
  • EnforceTenantFallbackCap short-circuit on next-step expected cost > remaining budget.

2.3 Consent + compliance gate

  • Per-channel consent map respected (deny SMS for marketing scope when MARKETING opt-out).
  • consent.revoked.v1 invalidates chan:gate:* for the affected (tenantId, msisdnHash).
  • Cache miss + dependency unreachable beyond deadline → REFUSED_CONSENT_UNKNOWN (fail-closed).
  • Compliance verdict BLOCK for one channel does not affect other channels in the ladder.
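
The fail-closed and invalidation cases above can be illustrated with an in-memory stand-in for the gate. This is a sketch under stated assumptions: the `ConsentGate` class, the synchronous `Lookup` callback (standing in for a deadline-bounded call to consent-ledger) and the key layout after the `chan:gate:` prefix are hypothetical; the source only specifies the `chan:gate:*` pattern, the `consent.revoked.v1` event and the `REFUSED_CONSENT_UNKNOWN` result.

```typescript
type ConsentVerdict = "ALLOW" | "DENY";
type GateResult = ConsentVerdict | "REFUSED_CONSENT_UNKNOWN";
// Hypothetical: returns "UNREACHABLE" when the dependency misses its deadline.
type Lookup = (tenantId: string, msisdnHash: string) => ConsentVerdict | "UNREACHABLE";

class ConsentGate {
  private readonly cache = new Map<string, ConsentVerdict>();
  private readonly lookup: Lookup;

  constructor(lookup: Lookup) {
    this.lookup = lookup;
  }

  // Assumed key layout; only the "chan:gate:" prefix appears in the source.
  private key(tenantId: string, msisdnHash: string): string {
    return `chan:gate:${tenantId}:${msisdnHash}`;
  }

  check(tenantId: string, msisdnHash: string): GateResult {
    const k = this.key(tenantId, msisdnHash);
    const cached = this.cache.get(k);
    if (cached !== undefined) return cached; // cache hit
    const fresh = this.lookup(tenantId, msisdnHash);
    // Fail closed: an unknown verdict is never treated as ALLOW.
    if (fresh === "UNREACHABLE") return "REFUSED_CONSENT_UNKNOWN";
    this.cache.set(k, fresh);
    return fresh;
  }

  // Handler for consent.revoked.v1: drop the cached verdict for this recipient.
  onConsentRevoked(tenantId: string, msisdnHash: string): void {
    this.cache.delete(this.key(tenantId, msisdnHash));
  }
}
```

A unit test would then drive three paths: a cache hit that skips the lookup, a revocation that forces a fresh lookup, and an unreachable dependency on a cache miss.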

2.4 State machine — DeliveryAttempt

  • Transitions: accepted → sent → delivered (positive); accepted → failed_temp → step_skipped (transient); accepted → rejected_by_recipient (terminal negative); accepted → timed_out (deadline elapse).
  • Race: positive DLR vs deadline elapse — Redis WATCH/MULTI guard ensures exactly one terminal outcome.
  • step_skipped produces no billing event.
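
One way to make the transition rules testable is a lookup table of allowed moves, with terminal states having no outgoing edges. The sketch below is in-memory only and is not the Redis WATCH/MULTI guard the race test refers to; the `DeliveryAttempt` class shape is hypothetical, and the `sent → timed_out` edge is an assumption added so the DLR-versus-deadline race can be modelled.

```typescript
type AttemptState =
  | "accepted"
  | "sent"
  | "delivered"
  | "failed_temp"
  | "step_skipped"
  | "rejected_by_recipient"
  | "timed_out";

// Allowed transitions, taken from the list above.
const TRANSITIONS: Record<AttemptState, readonly AttemptState[]> = {
  accepted: ["sent", "failed_temp", "rejected_by_recipient", "timed_out"],
  sent: ["delivered", "timed_out"], // "timed_out" from "sent" is an assumption
  failed_temp: ["step_skipped"],
  delivered: [],
  step_skipped: [],
  rejected_by_recipient: [],
  timed_out: [],
};

class DeliveryAttempt {
  private state: AttemptState = "accepted";

  current(): AttemptState {
    return this.state;
  }

  // Applies the transition if it is legal from the current state.
  // Returns false otherwise, so a late competing event is a no-op.
  transition(next: AttemptState): boolean {
    if (TRANSITIONS[this.state].indexOf(next) === -1) return false;
    this.state = next;
    return true;
  }
}
```

Because terminal states have empty transition lists, whichever of the positive DLR or the deadline arrives second is rejected, leaving exactly one terminal outcome.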

2.5 Conversation correlation

  • Session created on first successful MT.
  • Sliding TTL refresh on every MT/MO success.
  • STOP keyword (multi-language) closes session and emits consent.RecordOptOut.
  • Two simultaneous OPEN sessions for one MSISDN under different senderIds route MOs independently (US-CHAN-020 AC#3).
  • turnCount strictly monotonic under concurrent MT race.

2.6 MSISDN normalisation + hashing

  • E.164 parser handles Afghan local-format inputs (07XXXXXXXX → +937XXXXXXXX).
  • Per-tenant salt produces distinct hashes for the same MSISDN across tenants.
  • Reject MCC ≠ 412 unless allowForeignRecipient = true.
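
The normalisation cases can be sketched as follows. Several things here are assumptions: the function name `normaliseMsisdn`, the set of separator characters stripped, and the use of the `+93` country-code prefix as a stand-in for the MCC 412 check (an MSISDN carries a country code, not an MCC, so the real check presumably needs a lookup this sketch omits). Per-tenant salted hashing is left out.

```typescript
// Returns the E.164 form, or null when the input is rejected.
function normaliseMsisdn(
  raw: string,
  allowForeignRecipient: boolean = false,
): string | null {
  // Drop common separators (assumed set: whitespace, hyphens, parentheses).
  const compact = raw.replace(/[\s\-()]/g, "");

  // Afghan local format: 07XXXXXXXX becomes +937XXXXXXXX.
  if (/^07\d{8}$/.test(compact)) return "+93" + compact.slice(1);

  // Already an Afghan mobile number in E.164 form.
  if (/^\+937\d{8}$/.test(compact)) return compact;

  // Anything else that is well-formed E.164 is treated as foreign (assumption).
  if (allowForeignRecipient && /^\+[1-9]\d{1,14}$/.test(compact)) return compact;

  return null;
}
```

For example, `normaliseMsisdn("0701234567")` yields `"+93701234567"`, while a foreign number returns `null` unless `allowForeignRecipient` is set.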

2.7 STOP-keyword matcher

  • ASCII set: STOP, UNSUBSCRIBE.
  • Pashto/Dari/Arabic set: کنسل, متوقف, ایست.
  • Case-insensitive; whitespace-trimmed; segment-aware (only first segment matched).
  • Tenant-override appends to default set (US-CHAN-021 AC#1).
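
A matcher covering these rules might look like the sketch below. It assumes that a match means the whole first segment equals a keyword after trimming and case-folding (the source does not say whether partial matches count), and the `buildStopMatcher` name and segment-array input are hypothetical.

```typescript
// Default keywords from the two sets listed above.
const DEFAULT_STOP_KEYWORDS: readonly string[] = [
  "STOP",
  "UNSUBSCRIBE",
  "کنسل",
  "متوقف",
  "ایست",
];

// Builds a matcher; tenant keywords are appended to the defaults, never replace them.
function buildStopMatcher(
  tenantKeywords: readonly string[] = [],
): (segments: readonly string[]) => boolean {
  const keywords = new Set<string>();
  for (const k of DEFAULT_STOP_KEYWORDS.concat(tenantKeywords)) {
    // Case-folding is a no-op for the Pashto/Dari/Arabic entries.
    keywords.add(k.trim().toUpperCase());
  }
  return (segments) => {
    // Segment-aware: only the first segment is inspected.
    if (segments.length === 0) return false;
    return keywords.has(segments[0].trim().toUpperCase());
  };
}
```

Building the keyword set once per tenant keeps the per-message check to a single set lookup.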

2.8 Provider-status canonicalisation

  • WhatsApp 131026 → rejected_by_recipient.
  • Telegram 403 (bot blocked) → rejected_by_recipient.
  • Viber failed event → failed_perm.
  • Unmapped code defaults to failed_temp and increments chan_unknown_provider_status_total.
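
The mapping rules above fit naturally into a per-provider lookup table with a counted default. In this sketch the table layout, the `canonicalise` function name and the module-level counter (a stand-in for the `chan_unknown_provider_status_total` metric) are assumptions; the `131008 → failed_perm` entry is taken from the template-rejected case in the WhatsApp adapter integration tests.

```typescript
type CanonicalStatus =
  | "delivered"
  | "failed_temp"
  | "failed_perm"
  | "rejected_by_recipient";
type Provider = "whatsapp" | "telegram" | "viber";

// Provider-specific codes mapped to canonical statuses.
const STATUS_MAP: Record<Provider, Record<string, CanonicalStatus>> = {
  whatsapp: { "131026": "rejected_by_recipient", "131008": "failed_perm" },
  telegram: { "403": "rejected_by_recipient" },
  viber: { failed: "failed_perm" },
};

// Stand-in for the chan_unknown_provider_status_total counter.
let chanUnknownProviderStatusTotal = 0;

function canonicalise(provider: Provider, code: string): CanonicalStatus {
  const table = STATUS_MAP[provider];
  if (Object.prototype.hasOwnProperty.call(table, code)) return table[code];
  // Unmapped codes are treated as transient and counted for follow-up.
  chanUnknownProviderStatusTotal += 1;
  return "failed_temp";
}
```

Defaulting to `failed_temp` lets the ladder progress on an unknown code, and the counter makes the gap in the table visible.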

3. Integration tests (Testcontainers)

3.1 Hot-path RouteWithFallback

  • gRPC at 5000 RPS sustained 60 s; P95 ≤ 50 ms; no errors.
  • Cold-cache cold-start path (Redis empty); P95 ≤ 80 ms; no fail-closed unless the dependency itself is down.
  • Cache invalidation by consent.revoked.v1 reflects in next decision within 1 s.

3.2 OTT adapter integration (mock providers)

  • WhatsApp Cloud mock at https://wa-mock:8443/v20.0/{phone-id}/messages:
    • Happy path: 200 + status webhook → terminal delivered.
    • Template rejected (131008): adapter returns failed_perm; ladder progresses.
    • 429 rate-limit: exponential back-off; eventual success or failed_temp → ladder progresses.
  • Telegram mock: bot-blocked path returns 403; profile link marked INVALID.
  • Viber mock: signature-mismatch on webhook → 401; no correlation; counter increments.
  • Voice OTP mock: NO_ANSWER triggers ladder progression; ANSWERED terminates.

3.3 Consent gate

  • Live consent-ledger Testcontainer; revoke MARKETING for an MSISDN; subsequent marketing dispatch excludes SMS+WhatsApp; outcome REFUSED_NO_CHANNEL if no allowed channel.
  • Cache age 60 s — revocation propagates via consent.revoked.v1 within 5 s.

3.4 Full fallback cascade (SMS → WhatsApp → Voice)

  • Inject SMS DLR UNDELIVERED after 60 s.
  • Inject WhatsApp failed webhook after 25 s.
  • Voice mock returns ANSWERED.
  • Assert: 3 attempts, 1 outcome DELIVERED via voice, 3 billing events, fallback_path with 3 entries.
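
The assertions for this scenario can be prototyped against a small in-memory model before wiring up the Testcontainers harness. The sketch below assumes that every dispatched attempt emits one billing event and that a ladder with no successful step ends in an outcome named `FAILED`; the `runCascade` function and `CascadeReport` shape are hypothetical.

```typescript
type Channel = "SMS" | "WHATSAPP" | "VOICE";
type AttemptResult = "delivered" | "failed";

interface CascadeReport {
  attempts: number;
  outcome: "DELIVERED" | "FAILED"; // "FAILED" is an assumed name
  via: Channel | null;
  billingEvents: Channel[];
  fallbackPath: Channel[];
}

// Walks the ladder in order and stops at the first delivered attempt.
function runCascade(
  ladder: readonly Channel[],
  send: (channel: Channel) => AttemptResult,
): CascadeReport {
  const report: CascadeReport = {
    attempts: 0,
    outcome: "FAILED",
    via: null,
    billingEvents: [],
    fallbackPath: [],
  };
  for (const channel of ladder) {
    report.attempts += 1;
    report.fallbackPath.push(channel);
    report.billingEvents.push(channel); // assumption: each dispatched attempt is metered
    if (send(channel) === "delivered") {
      report.outcome = "DELIVERED";
      report.via = channel;
      break;
    }
  }
  return report;
}
```

With a `send` stub that fails SMS and WhatsApp and succeeds on voice, the report shows three attempts, three billing events and a single `DELIVERED` outcome via `VOICE`.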

3.5 MO routing roundtrip

  • Publish mo.allowed.v1 for tenant T with destination 2211; webhook mock asserts HMAC-v2 signature; payload includes routeType=STATIC.
  • Open session via prior MT; subsequent MO routes via SESSION; inReplyTo populated.
  • STOP keyword closes session; consent-ledger receives RecordOptOut; tenant webhook still receives MO with closedSessionByStop=true.

3.6 Profile learning loop

  • 10 successive delivered outcomes on WhatsApp for an MSISDN raise whatsapp.score by ≥ +30 across 1 hour; LWW merge correctness under simulated out-of-order events.
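
For the out-of-order part of this test, the property to assert is that a last-writer-wins merge gives the same result regardless of arrival order. A minimal sketch follows; the `ProfileEvent` shape is hypothetical, and the `eventId` tie-break on equal timestamps is an assumption added so the merge stays deterministic.

```typescript
interface ProfileEvent {
  eventId: string;
  updatedAt: number; // epoch milliseconds
  score: number;
}

// Last-writer-wins: the event with the later timestamp survives.
function lwwMerge(a: ProfileEvent, b: ProfileEvent): ProfileEvent {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  // Assumed tie-break: the lexicographically larger eventId wins.
  return a.eventId >= b.eventId ? a : b;
}
```

Folding the same events with `reduce(lwwMerge)` in two different orders should produce the same surviving event.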

3.7 Outbox + outcome uniqueness

  • Crash relay mid-publish; restart; assert no duplicate notification.delivery.outcome.v1 for the same (notificationId, recipientId).
  • Concurrent dispatch with same idempotencyKey produces one execution.
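
The uniqueness assertion can be expressed against a publisher that deduplicates on the (notificationId, recipientId) pair. How the service actually enforces this is not stated in this document, so the in-memory `DedupingPublisher` below is only a test-side model; the class, its fields and the `Outcome` shape are hypothetical.

```typescript
interface Outcome {
  notificationId: string;
  recipientId: string;
  status: string;
}

class DedupingPublisher {
  private readonly seen = new Set<string>();
  readonly published: Outcome[] = [];

  // Returns true when the outcome was published, false when it was a replay.
  publish(outcome: Outcome): boolean {
    // JSON-encode the pair so ids containing separators cannot collide.
    const key = JSON.stringify([outcome.notificationId, outcome.recipientId]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.published.push(outcome);
    return true;
  }
}
```

A crash-and-restart test replays the same outbox row through `publish` and asserts that `published` still holds a single entry for that pair.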

4. Contract tests

Producer ↔ Consumer | Contract
sms-orchestrator → channel-router | gRPC RouteWithFallback request/response (Pact)
channel-router → consent-ledger | gRPC CheckConsent (Pact)
channel-router → compliance-engine | gRPC EvaluateChannelCompliance (Pact)
channel-router → sender-id-registry | gRPC VerifySender (Pact)
channel-router → webhook-dispatcher | gRPC Deliver(...) MO webhook (Pact)
channel-router → billing-service | NATS channel.billing.event.v1 JSON Schema
channel-router → analytics-service | NATS notification.delivery.outcome.v1 JSON Schema
consent-ledger → channel-router | NATS consent.revoked.v1 JSON Schema
WhatsApp Cloud → channel-router | Webhook payload fixtures (Meta v20.0 spec)
Telegram → channel-router | Webhook payload fixtures
Viber → channel-router | Webhook payload fixtures

CI gate: buf breaking on every PR; oasdiff on REST OpenAPI; Pact provider verification on every PR.


5. E2E tests

5.1 Bank OTP flow

  • Submit OTP via sms-orchestrator HTTP API → channel-router routes via SMS-first ladder → DLR positive → outcome DELIVERED via sms.
  • Submit OTP for a flaky carrier MSISDN; SMS times out → WhatsApp succeeds → outcome DELIVERED via whatsapp; verify billing has 2 events.

5.2 Government emergency alert (PARALLEL strategy)

  • Tenant policy strategy = PARALLEL for useCase = alert with [WHATSAPP, VIBER, TELEGRAM].
  • All three OTT adapters fire simultaneously; first delivered wins; remaining cancelled (idempotency tagged).
  • Cost capped: assert all three meterings recorded but only one outcome.

5.3 MO conversational flow

  • Tenant sends MT (insurance reminder).
  • Recipient replies "YES" → routed to webhook with routeType=SESSION, inReplyTo populated.
  • Recipient replies "STOP" → session closed, opt-out recorded, MO still delivered with flag.
  • Subsequent MT excluded by consent gate.

5.4 Admin policy edit

  • tenant-admin updates ladder via PUT /v1/channel/tenants/{tenantId}/policies/otp; invalidate-cache event observed; next decision within 1 s reflects new ladder.

6. Load tests

  • 5000 RPS sustained 1 h on RouteWithFallback; SLO must hold (P95 ≤ 50 ms).
  • 10 000 RPS for 5 min — verify HPA scale-up within 60 s.
  • 1000 RPS MO routing for 30 min — webhook-dispatcher integration sustained; no backlog.
  • Burst test: 0 → 5000 RPS in 5 s — verify cache warm-up and circuit-breaker behaviour.

Tooling: ghz for gRPC, k6 for REST + NATS publish loads.
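
When post-processing raw latency samples from a load run, the P95 check can be computed with the nearest-rank method. This is a sketch only: ghz and k6 report percentiles themselves and may use a different interpolation, and the `percentile` and `meetsSlo` helpers are hypothetical names.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // 1-based rank; multiplying before dividing avoids rounding drift on p/100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// True when P95 is within the threshold (50 ms for RouteWithFallback).
function meetsSlo(samplesMs: readonly number[], thresholdMs: number = 50): boolean {
  return percentile(samplesMs, 95) <= thresholdMs;
}
```

With 20 samples, rank 19 is the P95 value, so a single slow outlier passes the gate while two do not.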


7. Chaos tests (LitmusChaos)

Scenario | Expected behaviour
Postgres primary failover | Reads continue from cache for 5 min; new writes return UNAVAILABLE; orchestrator redelivers
Redis Sentinel failover | Latency degrades but correctness preserved (PG fallback)
consent-ledger pod kill | Hot path uses cache; on cache expiry → REFUSED_CONSENT_UNKNOWN (fail-closed)
WhatsApp adapter pod kill | Breaker opens; ladder skips WhatsApp; eventual recovery via half-open probe
NATS partition | Outbox grows; on heal, drains within 5 min; no event loss
OTT provider 503 storm | Per-provider breaker opens within 30 s; ladder progresses
Network partition between kbl and mzr | Each region operates independently; cross-region MO forwarding queues; on heal, drains
Vault unavailable | Existing pods continue on cached creds (60 s); new pods can't start

8. Security tests

  • RLS — automated harness asserts cross-tenant SELECT returns 0 rows; cross-tenant UPDATE rejected.
  • JWT — expired token → 401; missing tenantId claim → 403; tampered signature → 401.
  • mTLS — non-allow-listed SPIFFE ID → connection rejected at sidecar.
  • Webhook signature — synthetic invalid X-Hub-Signature-256 → 401; no body parse; counter increments.
  • Audit chain — tamper one row's payload; daily verifier detects the broken chain within 24 h; alert fires.
  • MSISDN erasure — request erasure; assert recipient_profiles row deleted, conversations.msisdn_hash re-tokenised, cached entries purged within 30 d.
  • Consent-bypass attempts — synthetic RouteWithFallback to opted-out recipient → REFUSED_NO_CHANNEL; metrics + audit captured.
  • Cost-cap evasion — synthetic ladder mutation mid-flight rejected; admin audit captured.
  • Multi-tenant profile leakage — fuzz GET /v1/channel/tenants/{tenantId}/profiles?msisdnHash= with cross-tenant hash → 404, not 200.

9. CI gates

Stage | Gate
PR | unit + buf breaking + oasdiff + lint + test coverage report
Merge to develop | integration suite (Testcontainers) + contract verification
Nightly | full E2E + chaos suite
Pre-release | load test + security scan (OWASP ZAP) + RLS test harness
Production deploy | smoke + canary (10% traffic for 30 min, no SLO regression)

Coverage threshold enforced in CI: PR fails if coverage drops by > 1 pp.


10. Test data

  • Synthetic MSISDNs: +9379000XXXX reserved test range; never billed.
  • WhatsApp test phone-number-id from Meta sandbox app.
  • Telegram bot test handle @ghasi_chan_test_bot.
  • Viber sandbox PA token rotated quarterly.
  • Voice OTP mock returns deterministic outcomes per recipient suffix.

Test fixtures versioned in services/channel-router-service/test/fixtures/.