Channel Router Service — AI Integration

Version: 1.0
Status: Draft
Owner: Messaging Core + Platform ML
Last Updated: 2026-04-21
Companion: APPLICATION_LOGIC · SECURITY_MODEL · DOMAIN_MODEL

ML in the channel-router is applied narrowly: channel-preference learning (predicting which channel is most likely to succeed for a given recipient), adaptive ladder ordering, and an optional session intent classifier for conversational sessions. No cloud LLM is on the hot path; no raw MSISDN, body, or PII is ever sent to a remote inference endpoint.

1. Scope and non-scope

In scope

Channel-preference scoring: learn per-recipient P(success | channel) and adaptively reorder the ladder.
STOP-keyword extension (language-detection): identify local-language STOP equivalents beyond the hard-coded set.
Session-intent classification: detect "question", "complaint", "STOP", "out-of-office" patterns on MO text to annotate tenant webhook payload.
Voice-OTP success prediction: a per-recipient voice_answer_rate feature derived from time-of-day and past ANSWERED history.

Out of scope

Content-policy classification — owned by compliance-engine.
Fraud / SIM-box detection — owned by sms-firewall-service and fraud-intel-service.
Sender-ID verification — owned by sender-id-registry-service.

2. Model inventory

Model	Type	Hosting	Inference budget	Cache TTL	Fallback
`channel-preference-v1`	Gradient-boosted tree (LightGBM) on hashed features	On-prem model-server (Triton)	5 ms P95	300 s per `(tenantId, msisdnHash, useCase)`	Static preference order from `recipient_profiles.channel_preferences`
`stop-keyword-multilang-v1`	Small fine-tuned BERT (multilingual-mini), fine-tuned on Pashto/Dari/Farsi/Arabic STOP corpora	On-prem Triton	15 ms P95	60 s per `(body_hash)`	Exact-match keyword list (US-CHAN-021 defaults)
`session-intent-v1`	Small classification head on multilingual-mini	On-prem Triton	15 ms P95	60 s per `(body_hash)`	Intent marked `UNKNOWN`

All models run on-prem in the np-ml namespace. No model weights leave the data-sovereignty boundary (ADR-0004 §11).
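The Cache TTL column can be read as a small keyed cache sitting in front of the model server. The following is a minimal sketch of that idea, not the service's actual cache: the `TtlCache` class, its injected clock, and the example key values are all hypothetical, and a production cache would also need eviction and thread safety.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TtlCache:
    """Tiny in-process TTL cache (illustrative only, not thread-safe)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-runs inference.
            del self._entries[key]
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


# 300 s TTL keyed by (tenantId, msisdnHash, useCase), as listed for
# channel-preference-v1. The key values below are made up.
preference_cache = TtlCache(ttl_seconds=300)
preference_cache.put(("tenant-a", "ab12cd34ef", "otp"), {"sms": 0.91, "voice": 0.40})
```

A miss (or an expired entry) returns `None`, at which point the caller would run inference and `put` the fresh result.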

3. Feature engineering

All features are hashed / aggregated — no raw MSISDN or body.

3.1 Channel-preference features

Feature	Source	Type
`msisdn_bucket`	First 5 chars of `msisdn_hash`	categorical (hashed)
`has_wa_business_tristate`	`recipient_profiles.has_whatsapp_business`	categorical
`voice_answer_rate_7d`	`delivery_attempts` rolling aggregate	numeric
`sms_delivery_rate_7d`	`dlr-processor` feed	numeric
`last_successful_channel`	`recipient_profiles.last_successful_channel`	categorical
`time_of_day_bucket`	request time → {morning, afternoon, evening, night}	categorical
`use_case`	request use case (`otp`, `txn`, …)	categorical
`mno`	from `number-intelligence-service` (numint)	categorical
`language`	tenant-default or detected from template	categorical
`segments`	SMS segment count	numeric
`tenant_tier`	tenant's compliance tier	categorical
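Two of the features above are simple derivations and can be sketched directly. This is an illustration under assumptions: the hour boundaries for `time_of_day_bucket` are not given in this document and are chosen arbitrarily here, and the SHA-256 hex digest only stands in for whatever hashing scheme produces `msisdn_hash`.

```python
import hashlib
from datetime import datetime


def msisdn_bucket(msisdn_hash: str) -> str:
    """First 5 characters of the already-hashed MSISDN."""
    return msisdn_hash[:5]


def time_of_day_bucket(request_time: datetime) -> str:
    """Map the request hour to a bucket. Boundaries are ASSUMED, not from the spec."""
    hour = request_time.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


# Assumption: msisdn_hash is a hex digest. The real scheme (salt, keyed hash)
# is not specified in this document.
example_hash = hashlib.sha256(b"example-not-a-real-msisdn").hexdigest()

features = {
    "msisdn_bucket": msisdn_bucket(example_hash),
    "time_of_day_bucket": time_of_day_bucket(datetime(2026, 4, 21, 9, 30)),
}
```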

3.2 Session-intent features

Body SHA-256 (as cache key only; model sees body tokens only inside the sandboxed inference pod)
Body length, segment count
Detected script (fa, ps, ar, en, mixed)
Prior MT template class (OTP, marketing, alert) — from conversations.last_mt_message_id lookup

4. Online inference architecture

Fail-closed for ML: every inference call has a 10 ms deadline (stop-keyword: 15 ms). When the deadline is exceeded, the caller uses that model's fallback from section 2 (for channel-preference, the static preference order) and increments chan_ml_budget_exceeded_total. The fallback never changes a BLOCK decision (compliance owns BLOCK), but it may degrade ladder-ordering quality.

Feature flag CHAN_ML_PREFERENCE_ORDERING_ENABLED (default true in prod; default false during shadow mode). When disabled, the ladder uses the static channel_preferences order from recipient_profiles.

5. Training

Training data: delivery_attempts + fallback_executions partitions (13 months hot).
Labels: binary success (delivered / delivered_read / ANSWERED with played OTP) vs failure.
Pipeline: Nightly job in np-ml (/ml-pipelines/channel-preference-v1.yml) using Kubeflow; model artefacts written to s3://ghasi-model-registry/channel/channel-preference-v1/{version}/.
Rollout: canary deploy 10% → 50% → 100% over 72 h; guardrails on notification_delivery_success_rate regression (must not drop > 0.5 pp).
Drift monitoring: PSI (Population Stability Index) on top-5 features; alert ChannelMlFeatureDrift when PSI > 0.2.
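For reference, PSI over a binned feature is commonly computed as the sum over bins of (actual share − expected share) × ln(actual share / expected share). The sketch below assumes the feature has already been binned into aligned counts for the training (expected) and serving (actual) populations; the binning strategy and the `eps` floor for empty bins are assumptions, not part of this document.

```python
import math
from typing import Sequence


def psi(expected_counts: Sequence[float],
        actual_counts: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned counts."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("bin counts must align")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the proportions so empty bins do not produce log(0).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        score += (a_p - e_p) * math.log(a_p / e_p)
    return score


PSI_ALERT_THRESHOLD = 0.2  # from the ChannelMlFeatureDrift rule


def drifted(expected_counts: Sequence[float], actual_counts: Sequence[float]) -> bool:
    return psi(expected_counts, actual_counts) > PSI_ALERT_THRESHOLD
```

Identical distributions give a PSI of 0, and larger shifts between the two populations give larger values.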

6. Adaptive ladder ordering

The ML model returns a per-channel score p_success ∈ [0, 1]. The ordering algorithm:

Start from the tenant policy's static ladder.
Filter by consent + compliance gating (UC-01 step 5–6).
If CHAN_ML_PREFERENCE_ORDERING_ENABLED and discoveryState = STABLE:
- Compute per-step expected-value score: p_success * weight(step) - cost(step) * costWeight.
- Stable-sort the filtered ladder by descending score within a bounded reorder window of 2 positions (to respect tenant-declared order for compliance reasons).
Truncate to cost-cap budget (UC-01 step 11).

Guardrail — the first step of a tenant policy with useCase = otp may never be reordered out of position 0 (OTP delivery has regulatory and UX implications that override ML preference).
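The reordering step and the OTP guardrail can be sketched as follows. Several things here are assumptions rather than the documented algorithm: "bounded reorder window of 2 positions" is read as "no step moves more than 2 positions from its place in the filtered ladder", the pin is applied to the first element of the already-filtered ladder, and the `Step` fields, `bounded_reorder`, and `order_ladder` are hypothetical names. Consent filtering and cost-cap truncation are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    channel: str
    p_success: float  # model score in [0, 1]
    weight: float     # weight(step); hypothetical per-step value
    cost: float       # cost(step); hypothetical per-step cost


def score(step: Step, cost_weight: float) -> float:
    return step.p_success * step.weight - step.cost * cost_weight


def bounded_reorder(ladder: List[Step], cost_weight: float, window: int = 2) -> List[Step]:
    """Greedy descending-score ordering where no step moves more than `window` slots."""
    remaining = list(enumerate(ladder))  # (original_index, step), in tenant order
    result: List[Step] = []
    for position in range(len(ladder)):
        first_idx, _ = remaining[0]
        if first_idx + window <= position:
            # The earliest remaining step cannot be pushed back any further.
            chosen = 0
        else:
            eligible = [k for k, (idx, _) in enumerate(remaining)
                        if idx <= position + window]
            # max() returns the first maximal item, so ties keep tenant order.
            chosen = max(eligible, key=lambda k: score(remaining[k][1], cost_weight))
        result.append(remaining.pop(chosen)[1])
    return result


def order_ladder(ladder: List[Step], use_case: str,
                 cost_weight: float, window: int = 2) -> List[Step]:
    if use_case == "otp" and ladder:
        # Guardrail: position 0 is pinned for OTP; only the tail is reordered.
        return [ladder[0]] + bounded_reorder(ladder[1:], cost_weight, window)
    return bounded_reorder(ladder, cost_weight, window)
```

With a ladder of sms, whatsapp, voice, email scored 0.2, 0.5, 0.6, 0.9 (unit weights, zero cost), a non-OTP use case yields voice, email, sms, whatsapp: every step has moved exactly two positions, so email cannot jump to the top. For `otp` the same ladder yields sms, email, voice, whatsapp, with sms held at position 0.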

7. Session-intent classification

Invoked on MO ingest in chan-mo-router after the tenant-webhook delivery attempt, so that classification latency never impacts the MO delivery SLA:

Input: { bodyTokens, script, prior_mt_class, turn_count }
Output: { intent: STOP|QUESTION|COMPLAINT|CONFIRMATION|OUT_OF_OFFICE|UNKNOWN, confidence: [0..1] }
Result appended to the tenant webhook payload as optional aiIntent field; tenants opt-in per-route.

STOP handling note. The ML STOP classifier is additive to the exact-match STOP keyword list: it may UPGRADE a message to STOP but NEVER downgrade a keyword-matched STOP. That is, the session is closed if the exact-match list matches, or if the ML classifier returns STOP with confidence ≥ 0.85. The confidence threshold applies only to the ML path; a keyword match closes the session regardless of the classifier's output.
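The additive rule reduces to a short predicate. A minimal sketch, with `is_stop` and its parameter names chosen for illustration:

```python
from typing import Optional

STOP_CONFIDENCE_THRESHOLD = 0.85  # from the STOP handling note


def is_stop(keyword_match: bool, ml_intent: Optional[str], ml_confidence: float) -> bool:
    """Exact-match STOP always wins; the classifier can only add STOPs."""
    if keyword_match:
        return True
    return ml_intent == "STOP" and ml_confidence >= STOP_CONFIDENCE_THRESHOLD
```

Because the keyword check short-circuits, a classifier outage (`ml_intent=None`) or a low-confidence prediction can never suppress a keyword-matched STOP.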

8. Privacy, PII, sovereignty

No external LLM calls on the hot path. A startup check reads CHAN_EXTERNAL_LLM_ENABLED; the pod refuses to start if set to true (same posture as sms-firewall-service).
Bodies leave the pod only into on-prem Triton over mTLS; Triton logs are PII-masked.
Feature hashing guarantees no raw MSISDN is written to the feature-store (s3://ghasi-feature-store/).
Model weights never cross the data-sovereignty boundary; training runs on-prem only.
Audit: every inference call increments chan_ml_inference_total{model, cache_hit}; each stale-cache refresh is traced.
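The startup check on CHAN_EXTERNAL_LLM_ENABLED could look roughly like the sketch below. The function names and the case-insensitive parsing of the value are assumptions; only the variable name and the refuse-to-start behaviour come from this document.

```python
import os
import sys
from typing import Mapping


def external_llm_enabled(env: Mapping[str, str]) -> bool:
    # Assumption: only the literal string "true" (any case) counts as enabled.
    return env.get("CHAN_EXTERNAL_LLM_ENABLED", "false").strip().lower() == "true"


def startup_check(env: Mapping[str, str] = os.environ) -> None:
    """Abort process start if an external LLM has been switched on."""
    if external_llm_enabled(env):
        sys.exit("refusing to start: CHAN_EXTERNAL_LLM_ENABLED=true is not permitted")
```

Calling `startup_check()` first thing in the service entry point makes a misconfigured pod exit with a non-zero status, which the orchestrator would then report as a failed start.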

9. Observability (ML-specific)

Metric	Type	Purpose
`chan_ml_inference_duration_seconds`	Histogram	Per-model latency
`chan_ml_inference_total{model, cache_hit}`	Counter	Call volume + cache effectiveness
`chan_ml_budget_exceeded_total{model}`	Counter	Fallbacks triggered
`chan_ml_prediction_quality`	Gauge	Rolling 24 h AUC vs offline eval
`chan_ml_feature_drift_psi{feature}`	Gauge	Drift per feature
`chan_ml_fallback_rate`	Gauge	Ratio of decisions using static fallback (target ≤ 5%)

Alerts:

ChannelMlLatencyHigh — inference_duration P95 > 20 ms for 10 min.
ChannelMlFeatureDrift — PSI > 0.2 for top-5 features.
ChannelMlFallbackRateHigh — fallback_rate > 15% for 30 min.

10. Governance

Model cards maintained in docs/ml/model-cards/channel-preference-v1.md (data sources, bias analysis, performance tiers).
ML retrospective every quarter — Messaging Core + Platform ML review aggregate impact on delivery-success-rate.
Tenant opt-out: any tenant may set fallback_policies.ml_ordering_enabled = false to disable ML-assisted ordering for their traffic.
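Taken together, the global feature flag, the tenant opt-out, and the discoveryState condition from section 6 form a single gate. The sketch below shows one way to combine them; the function name and the idea of evaluating all three in one place are assumptions.

```python
def ml_ordering_active(global_flag_enabled: bool,
                       tenant_ml_ordering_enabled: bool,
                       discovery_state: str) -> bool:
    """ML reordering applies only if every gate is open.

    global_flag_enabled        -> CHAN_ML_PREFERENCE_ORDERING_ENABLED
    tenant_ml_ordering_enabled -> fallback_policies.ml_ordering_enabled
    discovery_state            -> must be "STABLE" (section 6)
    """
    return (global_flag_enabled
            and tenant_ml_ordering_enabled
            and discovery_state == "STABLE")
```

When the gate is closed for any reason, the ladder keeps the static order.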

11. Future work (not in v1)

Multi-armed-bandit for active exploration on UNSEEN recipients.
Voice-call timing optimisation (best time-of-day predictor).
Template-language matching (learn recipient-preferred Pashto vs Dari from prior interactions).

All future work must preserve the on-prem, data-sovereign, fail-closed posture established in v1.
