Channel Router Service — Testing Strategy
Version: 1.0 | Status: Draft | Owner: Messaging Core | Last Updated: 2026-04-21 | Companion: SERVICE_READINESS · APPLICATION_LOGIC · SECURITY_MODEL
The channel-router sits on the hot path, so tests must cover correctness (one outcome per recipient, no silent channel drop, fail-closed on consent), performance (P95 decision latency ≤ 50 ms at 5000 RPS), and security (no cross-tenant leakage, no PII in events). The test pyramid follows the platform standard: unit coverage ≥ 80%, integration tests covering the major flows end to end, contract tests on every cross-service surface, and load + chaos tests as gating criteria.
1. Test pyramid
| Layer | Coverage target | Tooling |
|---|---|---|
| Unit | ≥ 85% line / ≥ 80% branch (domain ≥ 90%) | Jest + ts-mockito |
| Integration | All hot-path use cases | Jest + Testcontainers (PG, Redis, NATS, mock-Triton) |
| Contract | Every gRPC + REST + NATS surface | Pact, buf breaking, JSON Schema |
| E2E | Critical flows | Cypress (admin), k6 (gRPC) |
| Load | 5000 RPS sustained 1 h | k6 / ghz |
| Chaos | All FM-01..FM-13 | LitmusChaos |
| Security | OWASP API Top-10 + RLS + audit chain | OWASP ZAP, custom RLS test harness |
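The unit-layer targets in the table could be enforced with Jest's `coverageThreshold` option. The following is a minimal sketch, not the project's actual configuration: the `./src/domain/` path is a hypothetical location for domain code, and reading "domain ≥ 90%" as a line-coverage target is an assumption.

```typescript
// jest.config.ts (sketch): fail the unit run when coverage falls below the table's targets.
// "./src/domain/" is an assumed path; adjust it to the real source layout.
const config = {
  collectCoverage: true,
  coverageThreshold: {
    // Service-wide floors from the Unit row of the table.
    global: { lines: 85, branches: 80 },
    // Stricter floor for domain code (assumed to mean line coverage).
    "./src/domain/": { lines: 90 },
  },
};

export default config;
```

This sets absolute floors only. A rule based on the coverage delta against the base branch would need a separate comparison step in CI.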
2. Unit tests
2.1 Fallback policy evaluator
- Default ladder applied when no policy row exists for `(tenantId, useCase)`.
- Ladder length validation (1..6); reject 7-step write with `INVALID_LADDER`.
- Duplicate-channel rejection.
- `PARALLEL` strategy with a non-OTT channel rejected with `CHAN_PARALLEL_STRATEGY_INVALID`.
- `costCapPerMessage` < cheapest path → `POLICY_UNREACHABLE_COST_CAP`.
- LWW ordering on policy version conflict (resolved by `updated_at`).
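The structural checks above (length, duplicates, PARALLEL restricted to OTT channels) might look like the sketch below. The `Channel` and `Strategy` unions, the `SEQUENTIAL` strategy name, the OTT channel set, and the `DUPLICATE_CHANNEL` code are all assumptions made for illustration; only `INVALID_LADDER` and `CHAN_PARALLEL_STRATEGY_INVALID` come from this document.

```typescript
type Channel = "SMS" | "WHATSAPP" | "VIBER" | "TELEGRAM" | "VOICE";
type Strategy = "SEQUENTIAL" | "PARALLEL"; // "SEQUENTIAL" is a hypothetical name

// Assumption: these three are the OTT channels, matching the PARALLEL example in 5.2.
const OTT_CHANNELS: ReadonlySet<Channel> = new Set<Channel>(["WHATSAPP", "VIBER", "TELEGRAM"]);

interface Policy {
  strategy: Strategy;
  ladder: Channel[];
}

// Returns an error code, or null when the policy passes all structural checks.
function validatePolicy(p: Policy): string | null {
  // Ladder must have between 1 and 6 steps.
  if (p.ladder.length < 1 || p.ladder.length > 6) return "INVALID_LADDER";
  // Each channel may appear at most once.
  if (new Set<Channel>(p.ladder).size !== p.ladder.length) return "DUPLICATE_CHANNEL"; // hypothetical code
  // PARALLEL is only valid when every step is an OTT channel.
  if (p.strategy === "PARALLEL" && p.ladder.some((c) => !OTT_CHANNELS.has(c))) {
    return "CHAN_PARALLEL_STRATEGY_INVALID";
  }
  return null;
}
```

Returning a code rather than throwing keeps each unit test to a single equality assertion per rule.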
2.2 Cost calculator
- Per-channel cost lookup by `pricing_model` JSON; defaults applied for missing entries.
- Rounding semantics (NUMERIC(12,4); banker's rounding).
- Cumulative cost across attempts equals sum of `delivery_attempts.cost_ngn`.
- `EnforceTenantFallbackCap` short-circuit on next-step expected cost > remaining budget.
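To make the rounding and cap cases above concrete, here is a minimal sketch of half-even rounding to four decimals plus the budget short-circuit. It assumes non-negative amounts held as integers (micro-units before rounding, 1e-4 units after) to avoid floating-point drift; the function names are illustrative and not the service's API.

```typescript
// Assumption: raw costs are integers in micro-units (1e-6); stored costs use 4 decimals,
// matching NUMERIC(12,4). Amounts are non-negative.
const MICRO_PER_STORED_UNIT = 100; // 1e-6 -> 1e-4

// Round half to even (banker's rounding) from micro-units to 1e-4 units.
function roundHalfEven(micro: number): number {
  const q = Math.floor(micro / MICRO_PER_STORED_UNIT);
  const r = micro % MICRO_PER_STORED_UNIT;
  const twice = r * 2;
  if (twice > MICRO_PER_STORED_UNIT) return q + 1; // above the midpoint: round up
  if (twice === MICRO_PER_STORED_UNIT) return q % 2 === 0 ? q : q + 1; // tie: go to even
  return q; // below the midpoint: round down
}

// Cumulative cost is the plain sum of per-attempt costs (already in 1e-4 units).
function cumulativeCost(attemptCosts: number[]): number {
  return attemptCosts.reduce((sum, c) => sum + c, 0);
}

// Short-circuit check: the next step runs only if its expected cost fits the remaining budget.
function canAttemptNext(attemptCosts: number[], nextExpected: number, cap: number): boolean {
  return nextExpected <= cap - cumulativeCost(attemptCosts);
}
```

For example, `roundHalfEven(1234550)` (1.234550) yields `12346`, while `roundHalfEven(1234450)` (1.234450) yields `12344`, because ties go to the even neighbour.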
2.3 Consent / compliance gate
- Per-channel consent map respected (deny SMS for marketing scope when MARKETING opt-out).
- `consent.revoked.v1` invalidates `chan:gate:*` for the affected `(tenantId, msisdnHash)`.
- Cache miss + dependency unreachable beyond deadline → `REFUSED_CONSENT_UNKNOWN` (fail-closed).
- Compliance verdict `BLOCK` for one channel does not affect other channels in the ladder.
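The fail-closed rule can be reduced to a small pure function for unit testing. This sketch assumes the cache lookup and the consent-ledger call are wrapped by the caller; `consentGate`, `LedgerLookup`, and the `REFUSED_NO_CONSENT` code are hypothetical names, while `REFUSED_CONSENT_UNKNOWN` is taken from the list above.

```typescript
type Verdict = "ALLOW" | "DENY";
type GateResult = "ALLOW" | "REFUSED_NO_CONSENT" | "REFUSED_CONSENT_UNKNOWN";

// Result of the live consent-ledger lookup; "UNREACHABLE" covers errors and deadline expiry.
type LedgerLookup = Verdict | "UNREACHABLE";

// cached: verdict from the gate cache, or undefined on a miss.
// lookup: invoked only on a cache miss (hypothetical wrapper around the ledger call).
function consentGate(cached: Verdict | undefined, lookup: () => LedgerLookup): GateResult {
  const verdict: LedgerLookup = cached !== undefined ? cached : lookup();
  if (verdict === "ALLOW") return "ALLOW";
  if (verdict === "DENY") return "REFUSED_NO_CONSENT"; // hypothetical code
  return "REFUSED_CONSENT_UNKNOWN"; // fail-closed: unknown consent never sends
}
```

Keeping the deadline handling outside the function means the test only has to supply `"UNREACHABLE"` to exercise the fail-closed branch.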
2.4 State machine — DeliveryAttempt
- Transitions: `accepted → sent → delivered` (positive); `accepted → failed_temp → step_skipped` (transient); `accepted → rejected_by_recipient` (terminal negative); `accepted → timed_out` (deadline elapse).
- Race: positive DLR vs deadline elapse; the Redis WATCH/MULTI guard ensures exactly one terminal outcome.
- `step_skipped` produces no billing event.
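The transition rules can be sketched as a lookup table with a guard that rejects any move out of a terminal state. This is an in-memory stand-in for illustration only, not the Redis WATCH/MULTI mechanism. It also assumes that `sent` may move to `timed_out`, which the transition list does not state explicitly but the DLR-versus-deadline race implies.

```typescript
type State =
  | "accepted" | "sent" | "delivered"
  | "failed_temp" | "step_skipped"
  | "rejected_by_recipient" | "timed_out";

// Allowed successors per state. Terminal states have an empty list.
// Assumption: `sent -> timed_out` is legal, since the race happens after sending.
const NEXT: Record<State, State[]> = {
  accepted: ["sent", "failed_temp", "rejected_by_recipient", "timed_out"],
  sent: ["delivered", "timed_out"],
  failed_temp: ["step_skipped"],
  delivered: [],
  step_skipped: [],
  rejected_by_recipient: [],
  timed_out: [],
};

class DeliveryAttempt {
  state: State = "accepted";

  // Applies the transition if legal; otherwise returns false and leaves state unchanged.
  // Whichever terminal event arrives second in a race is rejected here.
  transition(to: State): boolean {
    if (NEXT[this.state].indexOf(to) === -1) return false;
    this.state = to;
    return true;
  }
}
```

A race test then applies `delivered` and `timed_out` in both orders and asserts that exactly one call returns `true`.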
2.5 Conversation correlation
- Session created on first successful MT.
- Sliding TTL refresh on every MT/MO success.
- STOP keyword (multi-language) closes session and emits `consent.RecordOptOut`.
- Two simultaneous OPEN sessions for one MSISDN under different `senderId`s route MOs independently (US-CHAN-020 AC#3).
- `turnCount` strictly monotonic under concurrent MT race.
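A small in-memory model can help reason about the sliding TTL and per-sender isolation cases. The sketch below is an assumption-heavy illustration: `SessionStore`, the 30-minute TTL, and the key format are all hypothetical, and it simplifies by letting any successful message create a session, whereas the service creates one only on the first successful MT.

```typescript
interface Session {
  expiresAt: number; // epoch ms
  turnCount: number;
  open: boolean;
}

const SESSION_TTL_MS = 30 * 60 * 1000; // assumed TTL; the real value is not stated here

// Keyed by senderId + MSISDN so two senders keep independent sessions for one recipient.
class SessionStore {
  private sessions = new Map<string, Session>();

  private key(senderId: string, msisdn: string): string {
    return senderId + "|" + msisdn;
  }

  // Called on each successful MT or MO: reuses a live session or starts a new one,
  // pushes the expiry forward (sliding TTL), and increments the turn counter.
  touch(senderId: string, msisdn: string, now: number): Session {
    const k = this.key(senderId, msisdn);
    const existing = this.sessions.get(k);
    const live = existing !== undefined && existing.open && existing.expiresAt > now;
    const s: Session = live && existing !== undefined
      ? existing
      : { expiresAt: 0, turnCount: 0, open: true };
    s.expiresAt = now + SESSION_TTL_MS;
    s.turnCount += 1;
    this.sessions.set(k, s);
    return s;
  }

  // STOP closes the session; the opt-out call to the consent ledger is out of scope here.
  close(senderId: string, msisdn: string): void {
    const s = this.sessions.get(this.key(senderId, msisdn));
    if (s !== undefined) s.open = false;
  }
}
```

Passing `now` explicitly keeps the TTL tests deterministic without fake timers.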
2.6 MSISDN normalisation + hashing
- E.164 parser handles Afghan local-format inputs (`07XXXXXXXX` → `+937XXXXXXXX`).
- Per-tenant salt produces distinct hashes for the same MSISDN across tenants.
- Reject MCC ≠ 412 unless `allowForeignRecipient = true`.
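The normalisation cases above can be exercised against a function shaped like the following sketch. It covers normalisation only, not hashing. Because an MSISDN string carries no MCC, the foreign-recipient check is approximated here by the `+93` country code; that substitution, the function name, and the accepted separator characters are assumptions.

```typescript
// Normalise an Afghan MSISDN to E.164. Returns null when the input is rejected.
// Assumptions: local format is 0 followed by 9 digits starting with 7; foreign numbers
// must already be in +<country><number> form; the MCC rule is approximated by "+93".
function normaliseMsisdn(input: string, allowForeignRecipient: boolean = false): string | null {
  // Strip spaces, hyphens, and parentheses commonly used as separators.
  const compact = input.replace(/[\s\-()]/g, "");
  // Local format: 07XXXXXXXX -> +937XXXXXXXX
  if (/^07\d{8}$/.test(compact)) return "+93" + compact.slice(1);
  // Already E.164 and Afghan.
  if (/^\+937\d{8}$/.test(compact)) return compact;
  // Any other E.164-shaped number is accepted only when foreign recipients are allowed.
  if (allowForeignRecipient && /^\+[1-9]\d{6,14}$/.test(compact)) return compact;
  return null;
}
```

For instance, `normaliseMsisdn("079 000 1234")` returns `"+93790001234"`, and `normaliseMsisdn("+447911123456")` returns `null` unless the second argument is `true`.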
2.7 STOP-keyword matcher
- ASCII set: `STOP`, `UNSUBSCRIBE`.
- Pashto/Dari/Arabic set: `کنسل`, `متوقف`, `ایست`.
- Case-insensitive; whitespace-trimmed; segment-aware (only first segment matched).
- Tenant-override appends to default set (US-CHAN-021 AC#1).
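A matcher satisfying the rules above could be as small as the sketch below. It assumes the keyword must equal the entire first segment after trimming (not merely appear in it) and that the caller has already split the message into segments; `isStopKeyword` and `canon` are illustrative names.

```typescript
// Default keyword set from the two lists above.
const DEFAULT_STOP: string[] = ["STOP", "UNSUBSCRIBE", "کنسل", "متوقف", "ایست"];

// Trim, Unicode-normalise, and upper-case. Upper-casing gives case-insensitivity for
// Latin script; the Perso-Arabic keywords have no case and pass through unchanged.
function canon(s: string): string {
  return s.trim().normalize("NFC").toUpperCase();
}

// segments: the message split into SMS segments; only the first one is inspected.
// tenantExtra: tenant overrides, appended to the defaults rather than replacing them.
function isStopKeyword(segments: string[], tenantExtra: string[] = []): boolean {
  const firstRaw = segments.length > 0 ? segments[0] : undefined;
  const first = canon(firstRaw !== undefined ? firstRaw : "");
  if (first === "") return false;
  return DEFAULT_STOP.concat(tenantExtra).some((k) => canon(k) === first);
}
```

With this shape, `isStopKeyword([" stop "])` is `true`, while `isStopKeyword(["hello", "STOP"])` is `false` because the keyword is not in the first segment.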
2.8 Provider-status canonicalisation
- WhatsApp `131026` → `rejected_by_recipient`.
- Telegram `403 (bot blocked)` → `rejected_by_recipient`.
- Viber `failed` event → `failed_perm`.
- Unmapped code defaults to `failed_temp` and increments `chan_unknown_provider_status_total`.
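The mapping rules above amount to a table lookup with a counted default. In this sketch the `"<provider>:<status>"` key format and the module-level counter are assumptions standing in for the real adapter types and metrics client.

```typescript
type Canonical = "rejected_by_recipient" | "failed_perm" | "failed_temp";

// Mappings listed above; keys use an assumed "<provider>:<raw status>" format.
const STATUS_MAP = new Map<string, Canonical>([
  ["whatsapp:131026", "rejected_by_recipient"],
  ["telegram:403", "rejected_by_recipient"],
  ["viber:failed", "failed_perm"],
]);

// In-memory stand-in for the chan_unknown_provider_status_total counter.
let unknownProviderStatusTotal = 0;

function canonicalise(provider: string, rawStatus: string): Canonical {
  const mapped = STATUS_MAP.get(provider + ":" + rawStatus);
  if (mapped !== undefined) return mapped;
  // Unmapped codes are treated as transient and counted so they can be triaged.
  unknownProviderStatusTotal += 1;
  return "failed_temp";
}
```

Defaulting to `failed_temp` lets the ladder progress on an unknown code, and the counter makes the gap in the mapping visible.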
3. Integration tests (Testcontainers)
3.1 Hot-path RouteWithFallback
- gRPC at 5000 RPS sustained 60 s; P95 ≤ 50 ms; no errors.
- Cold-cache cold-start path (Redis empty); P95 ≤ 80 ms; no fail-closed unless the dependency itself is down.
- Cache invalidation by `consent.revoked.v1` reflects in next decision within 1 s.
3.2 OTT adapter integration (mock providers)
- WhatsApp Cloud mock at `https://wa-mock:8443/v20.0/{phone-id}/messages`:
  - Happy path: 200 + status webhook → terminal `delivered`.
  - Template rejected (`131008`): adapter returns `failed_perm`; ladder progresses.
  - 429 rate-limit: exponential back-off; eventual success or `failed_temp` → ladder progresses.
- Telegram mock: bot-blocked path returns 403; profile link marked INVALID.
- Viber mock: signature-mismatch on webhook → 401; no correlation; counter increments.
- Voice OTP mock: `NO_ANSWER` triggers ladder progression; `ANSWERED` terminates.
3.3 Consent-ledger integration
- Live consent-ledger Testcontainer; revoke MARKETING for an MSISDN; subsequent marketing dispatch excludes SMS+WhatsApp; outcome `REFUSED_NO_CHANNEL` if no allowed channel.
- Cache age 60 s: revocation propagates via `consent.revoked.v1` within 5 s.
3.4 Full fallback cascade (SMS → WhatsApp → Voice)
- Inject SMS DLR `UNDELIVERED` after 60 s.
- Inject WhatsApp `failed` webhook after 25 s.
- Voice mock returns `ANSWERED`.
- Assert: 3 attempts, 1 outcome `DELIVERED via voice`, 3 billing events, `fallback_path[3]`.
3.5 MO routing roundtrip
- Publish `mo.allowed.v1` for tenant T with destination `2211`; webhook mock asserts HMAC-v2 signature; payload includes `routeType=STATIC`.
- Open session via prior MT; subsequent MO routes via SESSION; `inReplyTo` populated.
- STOP keyword closes session; consent-ledger receives `RecordOptOut`; tenant webhook still receives MO with `closedSessionByStop=true`.
3.6 Profile learning loop
- 10 successive `delivered` outcomes on WhatsApp for an MSISDN raise `whatsapp.score` by ≥ +30 across 1 hour; LWW merge correctness under simulated out-of-order events.
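The out-of-order property in the test above can be illustrated with a last-writer-wins merge keyed on the event's own timestamp. This is a sketch under assumptions: it treats each event as a full score snapshot with an `observedAt` field, which may differ from how the service actually accumulates scores, and all names are hypothetical.

```typescript
interface ScoreEvent {
  channel: string;
  score: number;
  observedAt: number; // epoch ms when the outcome occurred, not when it arrived
}

type Profile = Map<string, ScoreEvent>;

// Last-writer-wins: keep the event with the newest observedAt per channel, so
// replaying the same events in any arrival order converges to the same profile.
function mergeLww(profile: Profile, ev: ScoreEvent): Profile {
  const current = profile.get(ev.channel);
  if (current === undefined || ev.observedAt > current.observedAt) {
    profile.set(ev.channel, ev);
  }
  return profile;
}
```

Events with identical timestamps keep whichever arrived first in this sketch, so a real implementation would need a deterministic tiebreaker to stay order-independent.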
3.7 Outbox + outcome uniqueness
- Crash relay mid-publish; restart; assert no duplicate `notification.delivery.outcome.v1` for the same `(notificationId, recipientId)`.
- Concurrent dispatch with same `idempotencyKey` produces one execution.
4. Contract tests
| Producer ↔ Consumer | Contract |
|---|---|
| `sms-orchestrator` → `channel-router` | gRPC `RouteWithFallback` request/response (Pact) |
| `channel-router` → `consent-ledger` | gRPC `CheckConsent` (Pact) |
| `channel-router` → `compliance-engine` | gRPC `EvaluateChannelCompliance` (Pact) |
| `channel-router` → `sender-id-registry` | gRPC `VerifySender` (Pact) |
| `channel-router` → `webhook-dispatcher` | gRPC `Deliver(...)` MO webhook (Pact) |
| `channel-router` → `billing-service` | NATS `channel.billing.event.v1` JSON Schema |
| `channel-router` → `analytics-service` | NATS `notification.delivery.outcome.v1` JSON Schema |
| `consent-ledger` → `channel-router` | NATS `consent.revoked.v1` JSON Schema |
| WhatsApp Cloud → `channel-router` | Webhook payload fixtures (Meta v20.0 spec) |
| Telegram → `channel-router` | Webhook payload fixtures |
| Viber → `channel-router` | Webhook payload fixtures |
CI gate: buf breaking on every PR; oasdiff on REST OpenAPI; Pact provider verification on every PR.
5. E2E tests
5.1 Bank OTP flow
- Submit OTP via `sms-orchestrator` HTTP API → channel-router routes via SMS-first ladder → DLR positive → outcome `DELIVERED via sms`.
- Submit OTP for a flaky carrier MSISDN; SMS times out → WhatsApp succeeds → outcome `DELIVERED via whatsapp`; verify billing has 2 events.
5.2 Government emergency alert (PARALLEL strategy)
- Tenant policy `strategy = PARALLEL` for `useCase = alert` with [WHATSAPP, VIBER, TELEGRAM].
- All three OTT adapters fire simultaneously; first `delivered` wins; remaining cancelled (idempotency tagged).
- Cost capped: assert all three meterings recorded but only one outcome.
5.3 MO conversational flow
- Tenant sends MT (insurance reminder).
- Recipient replies "YES" → routed to webhook with `routeType=SESSION`, `inReplyTo` populated.
- Recipient replies "STOP" → session closed, opt-out recorded, MO still delivered with flag.
- Subsequent MT excluded by consent gate.
5.4 Admin policy edit
- tenant-admin updates ladder via `PUT /v1/channel/tenants/{tenantId}/policies/otp`; invalidate-cache event observed; next decision within 1 s reflects new ladder.
6. Load tests
- 5000 RPS sustained 1 h on `RouteWithFallback`; SLO must hold (P95 ≤ 50 ms).
- 10 000 RPS for 5 min: verify HPA scale-up within 60 s.
- 1000 RPS MO routing for 30 min: webhook-dispatcher integration sustained; no backlog.
- Burst test: 0 → 5000 RPS in 5 s; verify cache warm-up and circuit-breaker behaviour.
Tooling: ghz for gRPC, k6 for REST + NATS publish loads.
7. Chaos tests (LitmusChaos)
| Scenario | Expected behaviour |
|---|---|
| Postgres primary failover | Reads from cache continue 5 min; new writes UNAVAILABLE; orchestrator redelivers |
| Redis Sentinel failover | Latency degrades but correctness preserved (PG fallback) |
| consent-ledger pod kill | Hot path uses cache; on cache expiry → REFUSED_CONSENT_UNKNOWN (fail-closed) |
| WhatsApp adapter pod kill | Breaker opens; ladder skips WhatsApp; eventual recovery via half-open probe |
| NATS partition | Outbox grows; on heal, drains within 5 min; no event loss |
| OTT provider 503 storm | Per-provider breaker opens within 30 s; ladder progresses |
| Network partition between `kbl` and `mzr` | Each region operates independently; cross-region MO forwarding queues; on heal, drains |
| Vault unavailable | Existing cached creds (60 s) continue; new pods can't start; existing pods continue |
8. Security tests
- RLS: automated harness asserts cross-tenant SELECT returns 0 rows; cross-tenant UPDATE rejected.
- JWT: expired token → 401; missing `tenantId` claim → 403; tampered signature → 401.
- mTLS: non-allow-listed SPIFFE ID → connection rejected at sidecar.
- Webhook signature: synthetic invalid `X-Hub-Signature-256` → 401; no body parse; counter increments.
- Audit chain: tamper with one row's payload; daily verifier detects the broken chain within 24 h; alert fires.
- MSISDN erasure: request erasure; assert `recipient_profiles` row deleted, `conversations.msisdn_hash` re-tokenised, cached entries purged within 30 d.
- Consent-bypass attempts: synthetic `RouteWithFallback` to opted-out recipient → `REFUSED_NO_CHANNEL`; metrics + audit captured.
- Cost-cap evasion: synthetic ladder mutation mid-flight rejected; admin audit captured.
- Multi-tenant profile leakage: fuzz `GET /v1/channel/tenants/{tenantId}/profiles?msisdnHash=` with cross-tenant hash → 404, not 200.
9. CI gates
| Stage | Gate |
|---|---|
| PR | unit + buf breaking + oasdiff + lint + test coverage report |
| Merge to develop | integration suite (Testcontainers) + contract verification |
| Nightly | full E2E + chaos suite |
| Pre-release | load test + security scan (OWASP ZAP) + RLS test harness |
| Production deploy | smoke + canary (10% traffic for 30 min, no SLO regression) |
Coverage threshold enforced in CI: PR fails if coverage drops by > 1 pp.
10. Test data
- Synthetic MSISDNs: `+9379000XXXX` reserved test range; never billed.
- WhatsApp test phone-number-id from Meta sandbox app.
- Telegram bot test handle `@ghasi_chan_test_bot`.
- Viber sandbox PA token rotated quarterly.
- Voice OTP mock returns deterministic outcomes per recipient suffix.
Test fixtures are versioned in `services/channel-router-service/test/fixtures/`.