Channel Router Service — AI Integration
Version: 1.0 | Status: Draft | Owner: Messaging Core + Platform ML | Last Updated: 2026-04-21 | Companion: APPLICATION_LOGIC · SECURITY_MODEL · DOMAIN_MODEL
ML in the channel-router is applied narrowly: channel-preference learning (predicting which channel is most likely to succeed for a given recipient), adaptive ladder ordering, and an optional session intent classifier for conversational sessions. No cloud LLM is on the hot path; no raw MSISDN, body, or PII is ever sent to a remote inference endpoint.
1. Scope and non-scope
In scope
- Channel-preference scoring: learn per-recipient `P(success | channel)` and adaptively reorder the ladder.
- STOP-keyword extension (language-detection): identify local-language STOP equivalents beyond the hard-coded set.
- Session-intent classification: detect "question", "complaint", "STOP", "out-of-office" patterns on MO text to annotate tenant webhook payload.
- Voice-OTP success prediction: feature `voice_answer_rate` per recipient (time-of-day, past ANSWERED history).
Out of scope
- Content-policy classification: owned by `compliance-engine`.
- Fraud / SIM-box detection: owned by `sms-firewall-service` and `fraud-intel-service`.
- Sender-ID verification: owned by `sender-id-registry-service`.
2. Model inventory
| Model | Type | Hosting | Inference budget | Cache TTL | Fallback |
|---|---|---|---|---|---|
| `channel-preference-v1` | Gradient-boosted tree (LightGBM) on hashed features | On-prem model-server (Triton) | 5 ms P95 | 300 s per `(tenantId, msisdnHash, useCase)` | Static preference order from `recipient_profiles.channel_preferences` |
| `stop-keyword-multilang-v1` | Small BERT (multilingual-mini), fine-tuned on Pashto/Dari/Farsi/Arabic STOP corpora | On-prem Triton | 15 ms P95 | 60 s per `(body_hash)` | Exact-match keyword list (US-CHAN-021 defaults) |
| `session-intent-v1` | Small classification head on multilingual-mini | On-prem Triton | 15 ms P95 | 60 s per `(body_hash)` | Intent marked `UNKNOWN` |
All models run on-prem in the np-ml namespace. No model weights leave the data-sovereignty boundary (ADR-0004 §11).
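The per-model cache TTLs in the table above could be honoured with a small in-process cache along these lines. This is a minimal sketch, not the service's actual cache: the `TtlCache` class, its method names, and the injectable clock are all hypothetical and only illustrate keying by `(tenantId, msisdnHash, useCase)` with a 300 s expiry.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TtlCache:
    """Hypothetical in-process cache; entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-runs inference.
            del self._entries[key]
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


# Assumed usage for channel-preference-v1 (300 s TTL from the inventory table).
preference_cache = TtlCache(ttl_seconds=300)
preference_cache.put(("tenant-a", "9f2c1e", "otp"), {"sms": 0.91, "whatsapp": 0.84})
```

A cache miss (or an expired entry) returns `None`, which the caller would treat as a signal to call the model server.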
3. Feature engineering
All features are hashed / aggregated — no raw MSISDN or body.
3.1 Channel-preference features
| Feature | Source | Type |
|---|---|---|
| `msisdn_bucket` | First 5 chars of `msisdn_hash` | categorical (hashed) |
| `has_wa_business_tristate` | `recipient_profiles.has_whatsapp_business` | categorical |
| `voice_answer_rate_7d` | `delivery_attempts` rolling aggregate | numeric |
| `sms_delivery_rate_7d` | `dlr-processor` feed | numeric |
| `last_successful_channel` | `recipient_profiles.last_successful_channel` | categorical |
| `time_of_day_bucket` | request time → {morning, afternoon, evening, night} | categorical |
use_case | `otp | txn |
| `mno` | from `number-intelligence-service` (numint) | categorical |
| `language` | tenant-default or detected from template | categorical |
| `segments` | SMS segment count | numeric |
| `tenant_tier` | tenant's compliance tier | categorical |
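Two of the features above can be illustrated with a short sketch. The document does not specify how `msisdn_hash` is produced or where the time-of-day boundaries fall, so the keyed HMAC-SHA256 construction, the hex encoding, and the hour ranges below are assumptions made only for illustration; the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime


def msisdn_hash(msisdn: str, key: bytes) -> str:
    """Assumed scheme: keyed HMAC-SHA256 of the MSISDN, hex-encoded."""
    return hmac.new(key, msisdn.encode("utf-8"), hashlib.sha256).hexdigest()


def msisdn_bucket(hash_hex: str) -> str:
    """First 5 characters of msisdn_hash, as listed in the feature table."""
    return hash_hex[:5]


def time_of_day_bucket(ts: datetime) -> str:
    """Map request time to a coarse bucket; the hour boundaries are assumed."""
    hour = ts.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


# Only the hash and its 5-character bucket would be used downstream.
h = msisdn_hash("+93700000000", key=b"example-key")
features = {
    "msisdn_bucket": msisdn_bucket(h),
    "time_of_day_bucket": time_of_day_bucket(datetime(2026, 4, 21, 9, 30)),
}
```

Because the bucket is derived from the hash rather than the number itself, the raw MSISDN never needs to appear in the feature vector.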
3.2 Session-intent features
- Body SHA-256 (as cache key only; model sees body tokens only inside the sandboxed inference pod)
- Body length, segment count
- Detected script (`fa`, `ps`, `ar`, `en`, `mixed`)
- Prior MT template class (OTP, marketing, alert), from `conversations.last_mt_message_id` lookup
4. Online inference architecture
Fail-closed for ML: every inference has a 10 ms deadline (stop-keyword: 15 ms). Budget exhaustion → fallback to static preference; metric chan_ml_budget_exceeded_total incremented. The fallback never changes a BLOCK decision (compliance owns BLOCK), but it may degrade ladder ordering quality.
The feature flag `CHAN_ML_PREFERENCE_ORDERING_ENABLED` gates ML ordering (default `true` in prod; default `false` during shadow mode). When disabled, the ladder uses the static `channel_preferences` order from `recipient_profiles`.
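The deadline-and-fallback behaviour described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the service's implementation: `order_with_deadline`, the `rank` callable (standing in for the Triton call), and the plain dict used in place of the `chan_ml_budget_exceeded_total` counter are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, List, Tuple

_executor = ThreadPoolExecutor(max_workers=4)

# Stand-in for the chan_ml_budget_exceeded_total{model} counter.
budget_exceeded_total: Dict[str, int] = {}


def order_with_deadline(
    model: str,
    rank: Callable[[dict], List[str]],  # hypothetical wrapper around the model call
    features: dict,
    static_order: List[str],
    deadline_s: float = 0.010,          # 10 ms deadline from the text above
    ml_enabled: bool = True,            # CHAN_ML_PREFERENCE_ORDERING_ENABLED
) -> Tuple[List[str], str]:
    """Return (channel order, source), where source is "ml" or "static"."""
    if not ml_enabled:
        return static_order, "static"
    future = _executor.submit(rank, features)
    try:
        return future.result(timeout=deadline_s), "ml"
    except FutureTimeout:
        # Budget exhausted: count it and fall back to the static preference.
        # cancel() cannot stop a call that is already running; the late
        # result is simply discarded.
        future.cancel()
        budget_exceeded_total[model] = budget_exceeded_total.get(model, 0) + 1
        return static_order, "static"
```

The fallback only affects ladder ordering; compliance decisions such as BLOCK are made elsewhere and are not touched by this path.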
5. Training
- Training data: `delivery_attempts` + `fallback_executions` partitions (13 m hot).
- Labels: binary success (`delivered` / `delivered_read` / `ANSWERED` with played OTP) vs failure.
- Pipeline: Nightly job in `np-ml` (`/ml-pipelines/channel-preference-v1.yml`) using Kubeflow; model artefacts written to `s3://ghasi-model-registry/channel/channel-preference-v1/{version}/`.
- Rollout: canary deploy 10% → 50% → 100% over 72 h; guardrails on `notification_delivery_success_rate` regression (must not drop > 0.5 pp).
- Drift monitoring: PSI (Population Stability Index) on top-5 features; alert `ChannelMlFeatureDrift` when PSI > 0.2.
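For reference, PSI compares the share of each bucket in a baseline sample with its share in a recent sample and sums `(actual - expected) * ln(actual / expected)` over buckets. The sketch below computes it for a categorical feature; the function name, the epsilon floor for empty buckets, and the sample data are illustrative assumptions, and a numeric feature would need to be binned first.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def psi(expected: Sequence[Hashable], actual: Sequence[Hashable], eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent sample."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    e_total, a_total = len(expected), len(actual)
    total = 0.0
    for category in set(e_counts) | set(a_counts):
        # Floor proportions at eps so an empty bucket does not yield log(0).
        e = max(e_counts[category] / e_total, eps)
        a = max(a_counts[category] / a_total, eps)
        total += (a - e) * math.log(a / e)
    return total


# Illustrative data: training-time vs. serving-time last_successful_channel.
baseline = ["sms"] * 50 + ["whatsapp"] * 50
recent = ["sms"] * 90 + ["whatsapp"] * 10
drifted = psi(baseline, recent) > 0.2  # threshold from the drift-monitoring bullet
```

With identical distributions the index is 0; the larger the shift between the two samples, the larger the value.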
6. Adaptive ladder ordering
The ML model returns a per-channel score p_success ∈ [0, 1]. The ordering algorithm:
- Start from the tenant policy's static ladder.
- Filter by consent + compliance gating (UC-01 step 5–6).
- If `CHAN_ML_PREFERENCE_ORDERING_ENABLED` and `discoveryState = STABLE`:
  - Compute per-step expected-value score: `p_success * weight(step) - cost(step) * costWeight`.
  - Stable-sort the filtered ladder by descending score within a bounded reorder window of 2 positions (to respect tenant-declared order for compliance reasons).
- Truncate to cost-cap budget (UC-01 step 11).
Guardrail — the first step of a tenant policy with useCase = otp may never be reordered out of position 0 (OTP delivery has regulatory and UX implications that override ML preference).
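The scoring and bounded reordering steps can be sketched as below. The document does not define the window precisely, so this sketch assumes it means that no step may end up more than 2 positions away from its tenant-declared position; the `Step` type, the function names, and the greedy selection strategy are hypothetical. Consent filtering and cost-cap truncation are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    channel: str
    p_success: float  # model score in [0, 1]
    weight: float     # weight(step)
    cost: float       # cost(step)


def expected_value(step: Step, cost_weight: float) -> float:
    return step.p_success * step.weight - step.cost * cost_weight


def reorder_ladder(ladder: List[Step], cost_weight: float, use_case: str, window: int = 2) -> List[Step]:
    """Greedy reorder; assumes each step may move at most `window` positions."""
    n = len(ladder)
    scores = [expected_value(s, cost_weight) for s in ladder]
    remaining = list(range(n))  # original indices, ascending
    order: List[int] = []
    if use_case == "otp" and n > 0:
        # Guardrail: the first OTP step stays at position 0.
        order.append(0)
        remaining.remove(0)
    for pos in range(len(order), n):
        # A step that would otherwise slip more than `window` places must go now.
        forced = [i for i in remaining if i <= pos - window]
        if forced:
            pick = forced[0]
        else:
            # Best score among steps allowed to move up to this position;
            # ties keep the tenant-declared order (stable).
            candidates = [i for i in remaining if i <= pos + window]
            pick = max(candidates, key=lambda i: (scores[i], -i))
        order.append(pick)
        remaining.remove(pick)
    return [ladder[i] for i in order]


ladder = [
    Step("sms", 0.50, 1.0, 0.0),
    Step("whatsapp", 0.90, 1.0, 0.0),
    Step("voice", 0.30, 1.0, 0.0),
    Step("rcs", 0.95, 1.0, 0.0),
]
txn_order = [s.channel for s in reorder_ladder(ladder, cost_weight=0.0, use_case="txn")]
otp_order = [s.channel for s in reorder_ladder(ladder, cost_weight=0.0, use_case="otp")]
```

In this example `rcs` has the highest score but sits at index 3, so it can rise no higher than position 1; for `otp` traffic `sms` stays pinned at position 0 regardless of score.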
7. Session-intent classification
Invoked on MO ingest in chan-mo-router after tenant-webhook delivery attempt (so classification latency never impacts MO delivery SLA):
- Input: `{ bodyTokens, script, prior_mt_class, turn_count }`
- Output: `{ intent: STOP|QUESTION|COMPLAINT|CONFIRMATION|OUT_OF_OFFICE|UNKNOWN, confidence: [0..1] }`
- Result appended to the tenant webhook payload as optional `aiIntent` field; tenants opt-in per-route.
STOP handling note. The ML STOP classifier is additive to the exact-match STOP keyword list: it may UPGRADE a message to STOP but NEVER downgrade a keyword-matched STOP. That is, the session is closed if the exact-match list matches, or if the ML classifier returns STOP with confidence ≥ 0.85.
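The upgrade-only rule amounts to a logical OR in which the confidence threshold applies only to the ML branch. A minimal sketch, with a hypothetical function name and argument shape:

```python
from typing import Optional

STOP_CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the note above


def is_stop(keyword_match: bool, ml_intent: Optional[str], ml_confidence: float = 0.0) -> bool:
    """True if the session should be closed as a STOP."""
    if keyword_match:
        # An exact-match STOP is final; the classifier cannot downgrade it.
        return True
    # The classifier may only upgrade, and only at or above the threshold.
    return ml_intent == "STOP" and ml_confidence >= STOP_CONFIDENCE_THRESHOLD
```

When the classifier is unavailable (for example after a budget fallback), passing `ml_intent=None` leaves the exact-match list as the sole decider.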
8. Privacy, PII, sovereignty
- No external LLM calls on the hot path. A startup check reads `CHAN_EXTERNAL_LLM_ENABLED`; the pod refuses to start if set to `true` (same posture as `sms-firewall-service`).
- Bodies leave the pod only into on-prem Triton over mTLS; Triton logs are PII-masked.
- Feature hashing guarantees no raw MSISDN is written to the feature-store (`s3://ghasi-feature-store/`).
- Model weights never cross data-sovereignty boundary; training on-prem only.
- Audit: every inference call increments `chan_ml_inference_total{model, cache_hit}`; each stale-cache refresh is traced.
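The startup check in the first bullet could look roughly like the sketch below. The function name is hypothetical, and treating the value case-insensitively (so that `TRUE` or ` true ` also block startup) is an assumption, not something the document specifies.

```python
import os
from typing import Mapping


def assert_no_external_llm(env: Mapping[str, str] = os.environ) -> None:
    """Abort startup if external LLM calls have been switched on."""
    raw = env.get("CHAN_EXTERNAL_LLM_ENABLED", "false")
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    if raw.strip().lower() == "true":
        raise SystemExit(
            "CHAN_EXTERNAL_LLM_ENABLED=true is not permitted; refusing to start"
        )


# Intended to run once, before the service begins accepting traffic.
assert_no_external_llm({"CHAN_EXTERNAL_LLM_ENABLED": "false"})
```

Raising `SystemExit` with a message makes the process exit with a non-zero status, which an orchestrator would surface as a failed start.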
9. Observability (ML-specific)
| Metric | Type | Purpose |
|---|---|---|
| `chan_ml_inference_duration_seconds` | Histogram | Per-model latency |
| `chan_ml_inference_total{model, cache_hit}` | Counter | Call volume + cache effectiveness |
| `chan_ml_budget_exceeded_total{model}` | Counter | Fallbacks triggered |
| `chan_ml_prediction_quality` | Gauge | Rolling 24 h AUC vs offline eval |
| `chan_ml_feature_drift_psi{feature}` | Gauge | Drift per feature |
| `chan_ml_fallback_rate` | Gauge | Ratio of decisions using static fallback (target ≤ 5%) |
Alerts:
- `ChannelMlLatencyHigh`: `inference_duration P95 > 20 ms` for 10 min.
- `ChannelMlFeatureDrift`: `PSI > 0.2` for top-5 features.
- `ChannelMlFallbackRateHigh`: `fallback_rate > 15%` for 30 min.
10. Governance
- Model cards maintained in `docs/ml/model-cards/channel-preference-v1.md` (data sources, bias analysis, performance tiers).
- ML retrospective every quarter: Messaging Core + Platform ML review aggregate impact on delivery-success-rate.
- Tenant opt-out: any tenant may set `fallback_policies.ml_ordering_enabled = false` to disable ML-assisted ordering for their traffic.
11. Future work (not in v1)
- Multi-armed-bandit for active exploration on `UNSEEN` recipients.
- Voice-call timing optimisation (best time-of-day predictor).
- Template-language matching (learn recipient-preferred Pashto vs Dari from prior interactions).
All future work must preserve the on-prem, data-sovereign, fail-closed posture established in v1.