SMS Firewall Service — AI Integration
Version: 1.0 | Status: Draft | Owner: Trust & Safety | Last Updated: 2026-04-21 | Companion: SERVICE_OVERVIEW · SECURITY_MODEL · APPLICATION_LOGIC · OBSERVABILITY
1. Purpose
The sms-firewall-service uses machine-learned signals in three ways:
- Consume ML detections from fraud-intel-service. The firewall is a consumer, not a producer: it subscribes to fraud.detected.* events carrying AIT, SIM-box, grey-route, and OTP-harvesting classifications produced by fraud-intel-service's training pipelines, and it materialises those detections into firewall.simbox_signals, firewall.ait_patterns, and firewall.peer_quarantine for deterministic lookup during verdict evaluation.
- Invoke a lightweight on-cluster content classifier. CLASSIFIER-type firewall rules may call the local LLM service for inline content classification (spam/phishing/fraud patterns) within the per-call data-plane budget. Classifier output never exceeds FLAG as a standalone signal; promoting to BLOCK requires pairing with a deterministic rule hit.
- A/B shadow-model evaluation. Rules authored with shadowMode=true invoke the classifier but do not affect the verdict; they emit firewall.rule.shadow.v1 events for drift analysis.
The firewall does not train models. Training, labelling, and versioning happen in fraud-intel-service per its own AI_INTEGRATION document. The firewall stamps every AI-influenced verdict with aiProvenance for regulator-grade auditability.
2. Consumed fraud-intel signals
| Event | Signal type | Effect on firewall |
|---|---|---|
| fraud.detected.simbox.v1 | Graph + ML: SIM-box originator detection (IMEI churn, A-number rotation, cell-ID volatility) | Upsert firewall.simbox_signals; matching srcMsisdn auto-BLOCK with SIMBOX_SIGNATURE |
| fraud.detected.ait.v1 | Pattern recognition: Artificially Inflated Traffic destination ranges (per GSMA FF.21 taxonomy: OTP harvest, pumped traffic, IRSF) | Upsert firewall.ait_patterns; matching dstMsisdn range auto-BLOCK with AIT_SIGNATURE |
| fraud.detected.greyroute.v1 | ML: grey-route arbitrage peer detection | Add peer to firewall.peer_quarantine; subsequent MT from peer auto-QUARANTINE |
| fraud.detected.otp_harvest.v1 | Pattern: OTP-harvesting campaign | Add associated dstMsisdnRange + content regex to dynamic AIT rule |
All events are signed by fraud-intel-service with an Ed25519 key; the firewall verifies each signature against the service's JWKS before any state mutation.
2.1 Signal lifecycle
- Signals carry confidence ∈ [0, 1]. Only signals with confidence ≥ 0.7 auto-promote to hard BLOCK; signals with 0.5 ≤ confidence < 0.7 promote to QUARANTINE; signals with confidence < 0.5 are advisory (FLAG only).
- Signals sliding-expire after 60 days of no re-confirmation; active=FALSE after expiry. fraud-intel-service re-emits fresh events for sustained detections.
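The thresholds and sliding expiry above can be sketched as two small pure functions. This is a minimal illustration, not the service's code: the names `actionForConfidence` and `isSignalActive` are hypothetical, and treating exactly 60 days as expired is an assumption the text does not settle.

```typescript
type SignalAction = "BLOCK" | "QUARANTINE" | "FLAG";

// 60-day sliding window from the last re-confirmation.
const SIGNAL_TTL_MS = 60 * 24 * 60 * 60 * 1000;

// Map a fraud-intel confidence score to the promotion level described above.
function actionForConfidence(confidence: number): SignalAction {
  if (Number.isNaN(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence out of range: ${confidence}`);
  }
  if (confidence >= 0.7) return "BLOCK";
  if (confidence >= 0.5) return "QUARANTINE";
  return "FLAG"; // advisory only
}

// A signal stays active while its last confirmation is less than 60 days old.
// Assumption: a signal confirmed exactly 60 days ago is treated as expired.
function isSignalActive(lastConfirmedAt: Date, now: Date): boolean {
  return now.getTime() - lastConfirmedAt.getTime() < SIGNAL_TTL_MS;
}
```

Each re-emitted event from fraud-intel-service would simply refresh `lastConfirmedAt`, which is what makes the expiry sliding rather than fixed.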
3. Local LLM for CLASSIFIER rules
3.1 Deployment model
The content classifier is the same local-llm-service deployment used by compliance-engine and fraud-intel-service. The firewall calls it over in-cluster HTTP with a strict latency ceiling.
┌───────────────────────┐ ┌─────────────────────────┐
│ sms-firewall-service │────▶│ local-llm-service │
│ (NestJS, 5–15 pods) │HTTP │ vLLM + Llama-3.1-8B │
│ │ ≤15ms│ AWQ 4-bit, GPU nodes │
└───────────────────────┘ └─────────────────────────┘
│
▼
Redis classifier cache
(24h body-hash → result)
3.2 Classifier model configuration
| Attribute | Value |
|---|---|
| Inference engine | vLLM (primary) with grammar-constrained JSON decoding |
| Model | Llama-3.1-8B-Instruct (AWQ 4-bit) — shared with compliance-engine |
| GPU | NVIDIA L4 / A10 (24 GB) |
| Max tokens | 200 input + 64 output |
| Temperature | 0.0 (deterministic) |
| Timeout | 15 ms P95 — aggressive relative to compliance-engine's 2000 ms because the firewall's total budget is 30 ms |
| Concurrency cap | 50 in-flight per firewall pod |
| Cache | Redis fw:classifier:{sha256(pii_redacted_body)}, TTL 24h |
3.3 When CLASSIFIER rules run
CLASSIFIER rules are disabled by default in PANIC mode and are excluded from the hot path unless all of the following hold:
- The rule has type = CLASSIFIER and enabled = TRUE
- Current operating mode ∈ {NORMAL, DEGRADED}
- Eval-budget remaining ≥ 20 ms at the point of rule invocation (soft-skip otherwise; emits flag=BUDGET_SKIP_CLASSIFIER)
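The conditions above amount to a small gate evaluated just before the classifier call. The sketch below is illustrative only: `classifierGate`, `RuleLike`, and `GateResult` are hypothetical names, and the budget threshold is a parameter defaulting to the 20 ms stated in this section.

```typescript
type OperatingMode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

interface RuleLike {
  type: string;
  enabled: boolean;
}

interface GateResult {
  run: boolean;
  flag?: "BUDGET_SKIP_CLASSIFIER";
}

// Decide whether a rule may invoke the classifier on the hot path.
function classifierGate(
  rule: RuleLike,
  mode: OperatingMode,
  remainingBudgetMs: number,
  minBudgetMs: number = 20,
): GateResult {
  // Only enabled CLASSIFIER rules are candidates at all.
  if (rule.type !== "CLASSIFIER" || !rule.enabled) return { run: false };
  // The classifier is not invoked outside NORMAL and DEGRADED.
  if (mode !== "NORMAL" && mode !== "DEGRADED") return { run: false };
  // Soft-skip: the verdict proceeds without the rule, and the skip is flagged.
  if (remainingBudgetMs < minBudgetMs) {
    return { run: false, flag: "BUDGET_SKIP_CLASSIFIER" };
  }
  return { run: true };
}
```

Only the budget case sets a flag here, because that is the only skip reason this section says is emitted.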
3.4 Prompt template
Single-turn, structured-output prompt. Response enforced via grammar-constrained JSON:
System:
You classify inbound SMS content for a national telecom firewall. Given
an SMS body (may contain [PHONE], [AMOUNT], [OTP_PLACEHOLDER] redactions),
return a JSON object mapping each of the following categories to a
confidence score in [0.0, 1.0]:
OTP_HARVEST, PHISHING, SPAM, MALWARE_LINK, HATE_SPEECH,
FINANCIAL_FRAUD, POLITICAL_INCITEMENT, GAMBLING.
Return ONLY the JSON object, no explanation.
User:
[PII-REDACTED MESSAGE BODY]
Expected response (enforced by grammar):
{
"OTP_HARVEST": 0.05,
"PHISHING": 0.88,
"SPAM": 0.12,
"MALWARE_LINK": 0.72,
"HATE_SPEECH": 0.00,
"FINANCIAL_FRAUD": 0.03,
"POLITICAL_INCITEMENT": 0.00,
"GAMBLING": 0.00
}
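Grammar-constrained decoding should already guarantee this shape, but a defensive parse on the firewall side is cheap. The following is a sketch under that assumption; `parseClassifierResponse` and `topCategory` are hypothetical helper names, not part of the documented service.

```typescript
const CATEGORIES = [
  "OTP_HARVEST", "PHISHING", "SPAM", "MALWARE_LINK",
  "HATE_SPEECH", "FINANCIAL_FRAUD", "POLITICAL_INCITEMENT", "GAMBLING",
] as const;

type Category = (typeof CATEGORIES)[number];
type ClassifierScores = Record<Category, number>;

// Parse the raw model output and check every category has a score in [0, 1].
function parseClassifierResponse(raw: string): ClassifierScores {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("classifier response is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const scores = {} as ClassifierScores;
  for (const cat of CATEGORIES) {
    const value = obj[cat];
    if (typeof value !== "number" || !(value >= 0 && value <= 1)) {
      throw new Error(`missing or out-of-range score for ${cat}`);
    }
    scores[cat] = value;
  }
  return scores;
}

// Highest-scoring category; ties resolve to the earlier entry in CATEGORIES.
function topCategory(scores: ClassifierScores): Category {
  return CATEGORIES.reduce((best, cat) => (scores[cat] > scores[best] ? cat : best));
}
```

A parse failure would most naturally be treated like any other classifier error, so the verdict proceeds without classifier input.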
3.5 PII anonymisation before inference
Per SECURITY_MODEL §3.4, the body is anonymised before the LLM sees it (defence-in-depth even though the LLM runs on-cluster — no cloud-LLM calls are ever permitted for firewall traffic):
| Pattern | Replacement |
|---|---|
| E.164 phone numbers | [PHONE] |
| 5+ digit sequences | [OTP_PLACEHOLDER] |
| Monetary amounts (AFN, USD, EUR) | [AMOUNT] |
| URLs | [URL] (presence preserved) |
| Name tokens (curated Dari/Pashto/English list) | [NAME] |
The body is hashed after replacement, so the cache key is stable across messages that share a template but differ only in the redacted values (for example, two OTP messages carrying different codes).
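The replacement table can be approximated with ordered regular expressions. The patterns below are illustrative assumptions, not the production rules, and `redact` and `REDACTION_RULES` are hypothetical names. Name tokens are replaced first and URLs before the digit rules, so digits inside a URL are not rewritten separately.

```typescript
// Illustrative patterns only; not the production rule set.
const REDACTION_RULES: ReadonlyArray<readonly [RegExp, string]> = [
  [/\bhttps?:\/\/\S+/gi, "[URL]"],
  [/\b(?:AFN|USD|EUR)\s?\d+(?:[.,]\d+)*|\b\d+(?:[.,]\d+)*\s?(?:AFN|USD|EUR)\b/gi, "[AMOUNT]"],
  [/\+[1-9]\d{7,14}\b/g, "[PHONE]"], // E.164: "+" then 8 to 15 digits
  [/\d{5,}/g, "[OTP_PLACEHOLDER]"],  // generic rule runs last
];

function redact(body: string, nameTokens: readonly string[] = []): string {
  let out = body;
  for (const name of nameTokens) {
    // Escape regex metacharacters in the curated token.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Unicode-aware boundaries, since \b does not work for Dari/Pashto script.
    const pattern = new RegExp(`(?<![\\p{L}\\p{N}])${escaped}(?![\\p{L}\\p{N}])`, "giu");
    out = out.replace(pattern, "[NAME]");
  }
  for (const [pattern, replacement] of REDACTION_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out;
}
```

Two messages that differ only in their OTP digits redact to the same string, which is the property the classifier cache relies on.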
4. Provenance stamping
Every AI-influenced verdict carries provenance in the ruleHits[].evidence field and in the firewall.audit row's rule_hits JSONB:
{
"ruleId": "fr_01HKX...",
"ruleType": "CLASSIFIER",
"action": "FLAG",
"severity": "MEDIUM",
"evidence": "ai:model=llama-3.1-8b-awq@rev-2026-04-01; cat=PHISHING; conf=0.88",
"confidence": 0.88,
"aiProvenance": {
"modelId": "llama-3.1-8b-instruct-awq",
"modelVersion": "rev-2026-04-01",
"promptTemplateId": "classify-content.v1",
"promptHash": "sha256:abc123...",
"bodyHashRedacted": "sha256:def456...",
"inferenceLatencyMs": 12,
"classifiedAt": "2026-04-21T10:14:23.123Z",
"cacheHit": false
}
}
The model version is surfaced in the firewall.audit.v1 event so downstream consumers (regulator-portal-service, analytics-service) can reconstruct model-at-time-of-verdict during audit.
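For consumers that parse these rows, the provenance object could be typed as follows. The field names are taken from the example above; the `evidenceString` helper and its `modelShortName` parameter are hypothetical, introduced only because the evidence string in the example uses a shorter model name than modelId.

```typescript
interface AiProvenance {
  modelId: string;
  modelVersion: string;
  promptTemplateId: string;
  promptHash: string;        // "sha256:..." of the rendered prompt template
  bodyHashRedacted: string;  // "sha256:..." of the PII-redacted body
  inferenceLatencyMs: number;
  classifiedAt: string;      // ISO 8601 timestamp, UTC
  cacheHit: boolean;
}

// Build the compact evidence string in the format shown in the example row.
function evidenceString(
  modelShortName: string,
  provenance: AiProvenance,
  category: string,
  confidence: number,
): string {
  return `ai:model=${modelShortName}@${provenance.modelVersion}; cat=${category}; conf=${confidence.toFixed(2)}`;
}
```

Keeping the structured object alongside the string means auditors can query by field while log readers still get a one-line summary.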
5. Moderation policy — fairness & false-positive controls
5.1 Standalone classifier → FLAG only
A verdict of BLOCK or QUARANTINE driven solely by a classifier result is forbidden by default; the only standalone exceptions are the OTP_HARVEST and GAMBLING rows below. Required pairings:
| Classifier category | Minimum confidence | Co-signal required to escalate | Escalation verdict |
|---|---|---|---|
| PHISHING | 0.85 | + blocklist hit on sender or URL | BLOCK |
| PHISHING | 0.85 | (alone) | FLAG |
| OTP_HARVEST | 0.80 | + AIT pattern match on dst range | BLOCK |
| OTP_HARVEST | 0.80 | (alone) | QUARANTINE |
| SPAM | 0.90 | + rate-governor tripped | BLOCK |
| SPAM | 0.90 | (alone) | FLAG |
| FINANCIAL_FRAUD | 0.85 | + sender-id spoof | BLOCK |
| GAMBLING | 0.90 | (alone, blanket illegal per ATRA) | BLOCK |
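One way to keep this table auditable is to encode it as data and derive the verdict from a lookup. The sketch below is an assumption-laden illustration: the co-signal identifiers and the `classifierVerdict` function are hypothetical, and returning `null` for below-threshold scores and for FINANCIAL_FRAUD without a co-signal is a guess, since the table has no row for those cases.

```typescript
type EscalationVerdict = "FLAG" | "QUARANTINE" | "BLOCK";
type CoSignal =
  | "BLOCKLIST_HIT" | "AIT_PATTERN_MATCH" | "RATE_GOVERNOR_TRIPPED" | "SENDER_ID_SPOOF";

interface PolicyRow {
  minConfidence: number;
  coSignal?: CoSignal;              // deterministic signal needed to escalate
  withCoSignal?: EscalationVerdict; // verdict when the co-signal is present
  alone?: EscalationVerdict;        // verdict on classifier output alone
}

const POLICY: Record<string, PolicyRow> = {
  PHISHING: { minConfidence: 0.85, coSignal: "BLOCKLIST_HIT", withCoSignal: "BLOCK", alone: "FLAG" },
  OTP_HARVEST: { minConfidence: 0.80, coSignal: "AIT_PATTERN_MATCH", withCoSignal: "BLOCK", alone: "QUARANTINE" },
  SPAM: { minConfidence: 0.90, coSignal: "RATE_GOVERNOR_TRIPPED", withCoSignal: "BLOCK", alone: "FLAG" },
  FINANCIAL_FRAUD: { minConfidence: 0.85, coSignal: "SENDER_ID_SPOOF", withCoSignal: "BLOCK" },
  GAMBLING: { minConfidence: 0.90, alone: "BLOCK" },
};

// Returns null when the table defines no classifier-driven verdict.
function classifierVerdict(
  category: string,
  confidence: number,
  coSignals: ReadonlySet<CoSignal>,
): EscalationVerdict | null {
  const row = POLICY[category];
  if (!row || confidence < row.minConfidence) return null;
  if (row.coSignal && row.withCoSignal && coSignals.has(row.coSignal)) {
    return row.withCoSignal;
  }
  return row.alone ?? null;
}
```

Encoding the policy as a constant also makes a change to any threshold a reviewable one-line diff.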
5.2 MNO-fairness monitor
To avoid a bias against any one MNO (e.g. if the training data disproportionately represented one operator's traffic), a daily fairness job computes:
ai_block_rate_by_mno = count(BLOCK verdicts with CLASSIFIER hit, last 24h) /
count(total MO verdicts, last 24h)
group by mnoId
If stdev(ai_block_rate_by_mno) / mean(...) > 0.5, alert FirewallAiFairnessDrift (HIGH). The standing response is: disable the CLASSIFIER rule pending T&S review and model retraining at fraud-intel-service.
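The ratio stdev / mean is the coefficient of variation of the per-MNO block rates. A minimal sketch of the daily check follows; the type and function names are hypothetical, and using the population standard deviation and skipping MNOs with zero traffic are assumptions the text does not specify.

```typescript
interface MnoCounts {
  mnoId: string;
  aiBlocks: number;        // BLOCK verdicts with a CLASSIFIER hit, last 24h
  totalMoVerdicts: number; // all MO verdicts, last 24h
}

// Population standard deviation divided by the mean; 0 for empty or all-zero input.
function coefficientOfVariation(rates: readonly number[]): number {
  if (rates.length === 0) return 0;
  const mean = rates.reduce((sum, r) => sum + r, 0) / rates.length;
  if (mean === 0) return 0;
  const variance = rates.reduce((sum, r) => sum + (r - mean) ** 2, 0) / rates.length;
  return Math.sqrt(variance) / mean;
}

function fairnessDrift(
  counts: readonly MnoCounts[],
  threshold: number = 0.5,
): { cv: number; alert: boolean } {
  const rates = counts
    .filter((c) => c.totalMoVerdicts > 0) // avoid division by zero
    .map((c) => c.aiBlocks / c.totalMoVerdicts);
  const cv = coefficientOfVariation(rates);
  return { cv, alert: cv > threshold };
}
```

For example, three operators with block rates of 1%, 1%, and 4% give a mean of 2% and a coefficient of variation of about 0.71, which would raise FirewallAiFairnessDrift.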
5.3 Shadow mode
Rules may be authored with shadowMode=true:
- The classifier runs; the verdict is computed as if the rule were active; but the actual returned verdict is the non-shadow pipeline's result.
- Shadow-mode decisions are emitted as firewall.rule.shadow.v1 events for offline comparison with the live decision.
- T&S promotes a shadow rule to live only after a 7-day comparison shows false-positive rate < 1% and false-negative delta < 0.5%.
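The shadow contract and the promotion criteria can be sketched as below. Everything beyond the event name and the three thresholds is an assumption for illustration: the event payload fields, the `ALLOW` verdict value, and the function names are hypothetical.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

interface ShadowEvent {
  type: "firewall.rule.shadow.v1";
  ruleId: string;
  shadowVerdict: Verdict; // what the verdict would have been with the rule live
  liveVerdict: Verdict;   // what was actually returned
}

// Emit the comparison event, then return the live verdict unchanged.
function applyShadowRule(
  ruleId: string,
  liveVerdict: Verdict,
  shadowVerdict: Verdict,
  emit: (event: ShadowEvent) => void,
): Verdict {
  emit({ type: "firewall.rule.shadow.v1", ruleId, shadowVerdict, liveVerdict });
  return liveVerdict;
}

interface ShadowStats {
  windowDays: number;
  falsePositiveRate: number;  // fraction, e.g. 0.008 for 0.8%
  falseNegativeDelta: number; // fraction, e.g. 0.004 for 0.4%
}

// Promotion gate: at least 7 days of data, FP rate < 1%, FN delta < 0.5%.
function canPromote(stats: ShadowStats): boolean {
  return (
    stats.windowDays >= 7 &&
    stats.falsePositiveRate < 0.01 &&
    stats.falseNegativeDelta < 0.005
  );
}
```

The gate only automates the arithmetic; the promotion decision itself stays with T&S.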
6. HITL (human-in-the-loop) flow
Classifier-driven QUARANTINE verdicts always route to NOC review:
CLASSIFIER match → QUARANTINE
→ firewall.quarantine.held.v1
→ NOC console displays: decrypted PDU, classifier JSON, model version
→ NOC → RELEASE or REJECT with notes
→ Review outcome emitted as firewall.quarantine.{released|rejected}.v1
→ fraud-intel-service subscribes to these and feeds them back as
labelled training examples for the next model revision
This closes the loop: operator decisions become training signal for the next classifier version.
7. Latency budget
| Stage | Budget | Operating mode |
|---|---|---|
| PII anonymisation pre-process | 0.5 ms | all modes |
| Cache lookup (fw:classifier:{hash}) | 0.5 ms | all modes |
| LLM inference (on cache MISS) | 15 ms P95 | NORMAL, DEGRADED |
| LLM inference | disabled | PANIC, MAINTENANCE |
| Total classifier overhead on verdict path | ≤ 16 ms | — |
If remaining evaluation budget at the classifier invocation point is < 16 ms, the rule is soft-skipped and the verdict assembles without it (flag BUDGET_SKIP_CLASSIFIER added).
8. Cache strategy
Classifier responses are cached aggressively because SMS templates (OTPs, alerts, campaigns) are highly repetitive:
| Key | TTL | Expected hit rate |
|---|---|---|
| fw:classifier:{sha256(piiRedactedBody)} | 24 h | ≥ 95% for OTP/template traffic |
| fw:classifier:coldset:{sha256(piiRedactedBody)} | 7 days (secondary cold cache) | ≥ 98% including cross-pod |
When the LLM model version changes, cache keys take the form fw:classifier:{modelVersion}:{sha256}, so entries written under the previous version are never read and no explicit purge is needed.
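The effect of putting the model version in the key can be shown with a tiny sketch. `classifierCacheKey` is a hypothetical helper, the digest is assumed to be a lowercase hex SHA-256 computed elsewhere over the PII-redacted body, and a `Map` stands in for Redis.

```typescript
// Build a version-namespaced key of the form fw:classifier:{modelVersion}:{sha256}.
function classifierCacheKey(modelVersion: string, bodySha256Hex: string): string {
  if (!/^[0-9a-f]{64}$/.test(bodySha256Hex)) {
    throw new Error("expected a 64-character lowercase hex SHA-256 digest");
  }
  return `fw:classifier:${modelVersion}:${bodySha256Hex}`;
}

// In-memory stand-in for the Redis cache, for illustration only.
const cache = new Map<string, string>();
const digest = "ab".repeat(32); // placeholder digest, not a real hash

cache.set(classifierCacheKey("rev-2026-04-01", digest), '{"PHISHING":0.88}');

// Same body, same model version: hit.
const hit = cache.get(classifierCacheKey("rev-2026-04-01", digest));
// Same body, hypothetical newer model version: miss, so inference runs again.
const miss = cache.get(classifierCacheKey("rev-2026-05-01", digest));
```

Old entries are simply left to age out under their TTL.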
9. Failure modes
| Failure | Detection | Firewall behaviour |
|---|---|---|
| local-llm-service returns 5xx | firewall_classifier_errors_total | Circuit breaker (5 errors / 10 s → open 60 s); CLASSIFIER rules skip with flag=CLASSIFIER_UNAVAILABLE; verdict proceeds without classifier input |
| LLM timeout > 15 ms | histogram overflow | Same as above; metric firewall_classifier_timeout_total++ |
| Cache (Redis) unavailable | firewall_redis_errors_total | Inference runs live; latency impact ~15 ms per call; caller adjusts effective budget down |
| Model version mismatch between pods (rollout in progress) | firewall_classifier_model_version gauge divergent | Audit rows record the specific model version that produced each verdict — no corrective action needed beyond observability |
| Fraud-intel signal pipeline paused (no fraud.detected.* for 1 h) | firewall_fraudintel_event_age_seconds > 3600 | Alert FirewallFraudIntelStale; existing signatures remain active (conservative); NOC notified |
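The circuit-breaker row (5 errors / 10 s → open 60 s) could be implemented along these lines. This is a sketch, not the service's breaker: the class name is hypothetical, the 10 s window is assumed to be sliding, and the clock is passed in explicitly so the behaviour is easy to test.

```typescript
class ClassifierCircuitBreaker {
  private errorTimestamps: number[] = [];
  private openUntilMs = 0;
  private readonly maxErrors: number;
  private readonly windowMs: number;
  private readonly openMs: number;

  constructor(maxErrors: number = 5, windowMs: number = 10000, openMs: number = 60000) {
    this.maxErrors = maxErrors;
    this.windowMs = windowMs;
    this.openMs = openMs;
  }

  // While open, CLASSIFIER rules skip with flag=CLASSIFIER_UNAVAILABLE.
  isOpen(nowMs: number): boolean {
    return nowMs < this.openUntilMs;
  }

  // Record a 5xx or timeout; trip the breaker once the window holds maxErrors.
  recordError(nowMs: number): void {
    this.errorTimestamps = this.errorTimestamps.filter((t) => nowMs - t < this.windowMs);
    this.errorTimestamps.push(nowMs);
    if (this.errorTimestamps.length >= this.maxErrors) {
      this.openUntilMs = nowMs + this.openMs;
      this.errorTimestamps = [];
    }
  }
}
```

A caller would check `isOpen(Date.now())` before each classifier call and invoke `recordError(Date.now())` on every failure. This sketch has no half-open probing state; the breaker closes again once the 60 s elapse.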
10. No cloud-LLM for firewall traffic
Unlike compliance-engine (which has an opt-in external-LLM failover for non-regulated tenants), the sms-firewall-service is forbidden from calling any external LLM API. Rationale:
- Firewall traffic includes inbound MO from national subscribers: this is the most PII-sensitive plane on the platform (A-number + B-number + body, all real subscribers).
- ADR-0004 §3 classifies the national perimeter as a sovereign-asset data class; no content may cross the national boundary for analysis.
- A start-up guard in the firewall binary refuses to boot if EXTERNAL_LLM_ENABLED=true is set; there is no configuration path to enable it.
Enforced at code level:
// Start-up guard: fail fast rather than boot with an external LLM enabled.
if (process.env.EXTERNAL_LLM_ENABLED === 'true') {
  throw new Error('External LLM is architecturally forbidden in sms-firewall-service; refusing to start');
}
11. Operations runbook (summary)
| Task | Owner | Cadence |
|---|---|---|
| Classifier accuracy audit against labelled SMS sample (1000 msgs) | Trust & Safety + ML | Monthly |
| MNO-fairness drift review | Trust & Safety | Weekly |
| Shadow-mode promotion review | Trust & Safety lead | Weekly |
| Model version upgrade (coordinated with compliance-engine) | Platform Engineering | As released, with 7-day shadow first |
| Fraud-intel signal volume review | Trust & Safety | Weekly |
| Training-set update (from NOC review outcomes) | ML (fraud-intel-service) | Monthly |