SMS Firewall Service — AI Integration

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · SECURITY_MODEL · APPLICATION_LOGIC · OBSERVABILITY


1. Purpose

The sms-firewall-service uses machine-learned signals in three ways:

  1. Consume ML detections from fraud-intel-service. The firewall is a consumer, not a producer — it subscribes to fraud.detected.* events carrying AIT, SIM-box, grey-route, and OTP-harvesting classifications produced by fraud-intel-service's training pipelines, and it materialises those detections into firewall.simbox_signals, firewall.ait_patterns, and firewall.peer_quarantine for deterministic lookup during verdict evaluation.

  2. Invoke a lightweight on-cluster content classifier. CLASSIFIER-type firewall rules may call the local LLM service for inline content classification (spam/phishing/fraud patterns) within the per-call data-plane budget. As a standalone signal, classifier output is capped at FLAG, apart from the category-specific exceptions listed in §5.1; promoting to BLOCK otherwise requires pairing with a deterministic rule hit.

  3. A/B shadow-model evaluation. Rules authored with shadowMode=true invoke the classifier but do not affect the verdict; they emit firewall.rule.shadow.v1 events for drift analysis.

The firewall does not train models. Training, labelling, and versioning happen in fraud-intel-service per its own AI_INTEGRATION document. The firewall stamps every AI-influenced verdict with aiProvenance for regulator-grade auditability.


2. Consumed fraud-intel signals

| Event | Signal type | Effect on firewall |
| --- | --- | --- |
| fraud.detected.simbox.v1 | Graph + ML: SIM-box originator detection (IMEI churn, A-number rotation, cell-ID volatility) | Upsert firewall.simbox_signals; matching srcMsisdn auto-BLOCK with SIMBOX_SIGNATURE |
| fraud.detected.ait.v1 | Pattern recognition: Artificially Inflated Traffic destination ranges (per GSMA FF.21 taxonomy: OTP harvest, pumped traffic, IRSF) | Upsert firewall.ait_patterns; matching dstMsisdn range auto-BLOCK with AIT_SIGNATURE |
| fraud.detected.greyroute.v1 | ML: grey-route arbitrage peer detection | Add peer to firewall.peer_quarantine; subsequent MT from peer auto-QUARANTINE |
| fraud.detected.otp_harvest.v1 | Pattern: OTP-harvesting campaign | Add associated dstMsisdnRange + content regex to dynamic AIT rule |

All events are signed by fraud-intel-service with an Ed25519 key; signature verified against the service's JWKS before state mutation.
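The verify-before-mutate rule above can be sketched as follows. This is a minimal illustration using Node's built-in crypto module, not the service's actual code: the SignedEvent shape, the detached-signature field, and the function names are assumptions, and resolving the public key from the JWKS endpoint is left out.

```typescript
import { KeyObject, generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical envelope: the real event schema is not given in this document.
interface SignedEvent {
  type: string;       // e.g. "fraud.detected.simbox.v1"
  payload: string;    // serialized body, byte-for-byte as it was signed
  signature: Buffer;  // detached Ed25519 signature over the payload
}

function verifyFraudIntelEvent(event: SignedEvent, publicKey: KeyObject): boolean {
  if (!event.type.startsWith("fraud.detected.")) return false;
  // Ed25519 hashes internally, so the digest algorithm argument is null.
  return verify(null, Buffer.from(event.payload, "utf8"), publicKey, event.signature);
}

// Runs the state mutation only if the signature checks out.
function applyIfVerified(event: SignedEvent, publicKey: KeyObject, mutate: () => void): boolean {
  if (!verifyFraudIntelEvent(event, publicKey)) return false;
  mutate();
  return true;
}

// Demo with a throwaway key pair and made-up payload values.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = JSON.stringify({ srcMsisdn: "+93700000000", confidence: 0.91 });
const goodEvent: SignedEvent = {
  type: "fraud.detected.simbox.v1",
  payload,
  signature: sign(null, Buffer.from(payload, "utf8"), privateKey),
};
// Same signature, altered body: verification must fail.
const tamperedEvent: SignedEvent = { ...goodEvent, payload: payload.replace("0.91", "0.99") };
```

Verifying the exact signed bytes, rather than a re-serialized object, avoids false rejections caused by key reordering or whitespace differences.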

2.1 Signal lifecycle

  • Signals carry confidence ∈ [0, 1]. Signals with confidence ≥ 0.7 auto-promote to hard BLOCK; those in [0.5, 0.7) promote to QUARANTINE; those below 0.5 are advisory (FLAG only).
  • Signals expire on a sliding window: after 60 days without re-confirmation, active is set to FALSE. fraud-intel-service re-emits fresh events for sustained detections.
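The two lifecycle rules can be expressed as a pair of pure functions. This is a sketch under stated assumptions: the function names are hypothetical, 0.7 is treated as belonging to the BLOCK band, and a signal is treated as expired at exactly 60 days.

```typescript
type SignalAction = "BLOCK" | "QUARANTINE" | "FLAG";

// Maps a fraud-intel confidence score to the promoted action.
function actionForConfidence(confidence: number): SignalAction {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence out of range: ${confidence}`);
  }
  if (confidence >= 0.7) return "BLOCK";
  if (confidence >= 0.5) return "QUARANTINE";
  return "FLAG"; // advisory only
}

const SIGNAL_TTL_MS = 60 * 24 * 60 * 60 * 1000; // 60 days

// Sliding expiry: every re-confirmation resets lastConfirmedAt.
function isSignalActive(lastConfirmedAt: Date, now: Date): boolean {
  return now.getTime() - lastConfirmedAt.getTime() < SIGNAL_TTL_MS;
}
```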

3. Local LLM for CLASSIFIER rules

3.1 Deployment model

The content classifier is the same local-llm-service deployment used by compliance-engine and fraud-intel-service. The firewall calls it over in-cluster HTTP with a strict latency ceiling.

┌───────────────────────┐      ┌─────────────────────────┐
│ sms-firewall-service  │─────▶│ local-llm-service       │
│ (NestJS, 5–15 pods)   │ HTTP │ vLLM + Llama-3.1-8B     │
│                       │≤15 ms│ AWQ 4-bit, GPU nodes    │
└───────────┬───────────┘      └─────────────────────────┘
            │
            ▼
  Redis classifier cache
  (24h body-hash → result)

3.2 Classifier model configuration

| Attribute | Value |
| --- | --- |
| Inference engine | vLLM (primary) with grammar-constrained JSON decoding |
| Model | Llama-3.1-8B-Instruct (AWQ 4-bit), shared with compliance-engine |
| GPU | NVIDIA L4 / A10 (24 GB) |
| Max tokens | 200 input + 64 output |
| Temperature | 0.0 (deterministic) |
| Timeout | 15 ms P95; aggressive relative to compliance-engine's 2000 ms because the firewall's total budget is 30 ms |
| Concurrency cap | 50 in-flight per firewall pod |
| Cache | Redis fw:classifier:{sha256(pii_redacted_body)}, TTL 24h |

3.3 When CLASSIFIER rules run

CLASSIFIER rules are disabled in PANIC and MAINTENANCE modes. In other modes they run on the hot path only when all of the following hold:

  • Rule type = CLASSIFIER
  • enabled = TRUE
  • Current operating mode ∈ {NORMAL, DEGRADED}
  • Eval-budget remaining ≥ 20 ms at the point of rule invocation (soft-skip otherwise; emits flag=BUDGET_SKIP_CLASSIFIER)
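The gating conditions above amount to a single predicate evaluated just before the classifier call. The sketch below is illustrative only: the type and function names are hypothetical, and the minimum budget is a parameter defaulting to the 20 ms given in this section (§7 quotes 16 ms for the same check).

```typescript
type OperatingMode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

interface RuleMeta {
  type: string;      // e.g. "CLASSIFIER"
  enabled: boolean;
}

interface GateResult {
  run: boolean;
  flag?: string;     // set only for a budget soft-skip
}

function classifierGate(
  rule: RuleMeta,
  mode: OperatingMode,
  remainingBudgetMs: number,
  minBudgetMs: number = 20,
): GateResult {
  if (rule.type !== "CLASSIFIER" || !rule.enabled) return { run: false };
  if (mode !== "NORMAL" && mode !== "DEGRADED") return { run: false };
  if (remainingBudgetMs < minBudgetMs) {
    // Soft-skip: the verdict assembles without the classifier.
    return { run: false, flag: "BUDGET_SKIP_CLASSIFIER" };
  }
  return { run: true };
}
```

Only the budget check produces a flag; a disabled rule or an excluded operating mode is a silent skip in this sketch.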

3.4 Prompt template

Single-turn, structured-output prompt. Response enforced via grammar-constrained JSON:

System:
You classify inbound SMS content for a national telecom firewall. Given
an SMS body (may contain [PHONE], [AMOUNT], [OTP_PLACEHOLDER] redactions),
return a JSON object mapping each of the following categories to a
confidence score in [0.0, 1.0]:
OTP_HARVEST, PHISHING, SPAM, MALWARE_LINK, HATE_SPEECH,
FINANCIAL_FRAUD, POLITICAL_INCITEMENT, GAMBLING.
Return ONLY the JSON object, no explanation.

User:
[PII-REDACTED MESSAGE BODY]

Expected response (enforced by grammar):

{
  "OTP_HARVEST": 0.05,
  "PHISHING": 0.88,
  "SPAM": 0.12,
  "MALWARE_LINK": 0.72,
  "HATE_SPEECH": 0.00,
  "FINANCIAL_FRAUD": 0.03,
  "POLITICAL_INCITEMENT": 0.00,
  "GAMBLING": 0.00
}
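Even with grammar-constrained decoding, a caller may want to validate the response before acting on it. The following is a minimal sketch of such a check; the function name and the strictness choices (all eight keys required, no extra keys) are assumptions rather than documented behaviour.

```typescript
const CATEGORIES = [
  "OTP_HARVEST", "PHISHING", "SPAM", "MALWARE_LINK",
  "HATE_SPEECH", "FINANCIAL_FRAUD", "POLITICAL_INCITEMENT", "GAMBLING",
] as const;

type Category = (typeof CATEGORIES)[number];
type CategoryScores = Record<Category, number>;

// Parses the raw model output and rejects anything outside the expected shape.
function parseClassifierResponse(raw: string): CategoryScores {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("classifier response is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const scores = {} as CategoryScores;
  for (const category of CATEGORIES) {
    const value = obj[category];
    if (typeof value !== "number" || !(value >= 0 && value <= 1)) {
      throw new Error(`missing or invalid score for ${category}`);
    }
    scores[category] = value;
  }
  if (Object.keys(obj).length !== CATEGORIES.length) {
    throw new Error("classifier response contains unexpected keys");
  }
  return scores;
}
```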

3.5 PII anonymisation before inference

Per SECURITY_MODEL §3.4, the body is anonymised before the LLM sees it (defence-in-depth even though the LLM runs on-cluster — no cloud-LLM calls are ever permitted for firewall traffic):

| Pattern | Replacement |
| --- | --- |
| E.164 phone numbers | [PHONE] |
| 5+ digit sequences | [OTP_PLACEHOLDER] |
| Monetary amounts (AFN, USD, EUR) | [AMOUNT] |
| URLs | [URL] (presence preserved) |
| Name tokens (curated Dari/Pashto/English list) | [NAME] |

The body is hashed after replacement, not before, so the cache key is stable across messages that share a template but differ in the redacted values (for example, the same OTP text carrying different codes).
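A rough sketch of the redact-then-hash step, keyed the way §3.2 describes (sha256 over the PII-redacted body). The regular expressions are illustrative approximations, not the patterns from SECURITY_MODEL §3.4, and the curated name list is omitted. The replacement order matters: URLs and phone numbers are handled before the generic digit rule so they are not partially consumed by it.

```typescript
import { createHash } from "node:crypto";

// Illustrative patterns only; the production rules are defined elsewhere.
function anonymiseBody(body: string): string {
  return body
    .replace(/https?:\/\/\S+/gi, "[URL]")
    .replace(/\+[1-9]\d{6,14}/g, "[PHONE]")
    .replace(
      /\b(?:AFN|USD|EUR)\s?\d+(?:[.,]\d+)*|\b\d+(?:[.,]\d+)*\s?(?:AFN|USD|EUR)\b/g,
      "[AMOUNT]",
    )
    .replace(/\d{5,}/g, "[OTP_PLACEHOLDER]");
}

// Hash of the redacted body: identical templates collapse to one digest.
function redactedBodyHash(body: string): string {
  return createHash("sha256").update(anonymiseBody(body), "utf8").digest("hex");
}
```

Two OTP messages that differ only in the code and the callback number redact to the same string, so they share a digest and therefore a cache entry.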


4. Provenance stamping

Every AI-influenced verdict carries provenance in the ruleHits[].evidence field and in the firewall.audit row's rule_hits JSONB:

{
  "ruleId": "fr_01HKX...",
  "ruleType": "CLASSIFIER",
  "action": "FLAG",
  "severity": "MEDIUM",
  "evidence": "ai:model=llama-3.1-8b-awq@rev-2026-04-01; cat=PHISHING; conf=0.88",
  "confidence": 0.88,
  "aiProvenance": {
    "modelId": "llama-3.1-8b-instruct-awq",
    "modelVersion": "rev-2026-04-01",
    "promptTemplateId": "classify-content.v1",
    "promptHash": "sha256:abc123...",
    "bodyHashRedacted": "sha256:def456...",
    "inferenceLatencyMs": 12,
    "classifiedAt": "2026-04-21T10:14:23.123Z",
    "cacheHit": false
  }
}

The model version is surfaced in the firewall.audit.v1 event so downstream consumers (regulator-portal-service, analytics-service) can reconstruct model-at-time-of-verdict during audit.
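For readers implementing a consumer, the provenance object can be typed and the evidence string derived from it. This is a sketch inferred from the example above, not a published schema. In particular, the short model label in the evidence string ("llama-3.1-8b-awq") differs from modelId in the example, so it is passed in separately here; how the service actually derives it is not documented.

```typescript
interface AiProvenance {
  modelId: string;
  modelVersion: string;
  promptTemplateId: string;
  promptHash: string;
  bodyHashRedacted: string;
  inferenceLatencyMs: number;
  classifiedAt: string; // ISO 8601 timestamp
  cacheHit: boolean;
}

// Builds the compact evidence string in the format shown in the example.
function buildEvidence(
  modelLabel: string,
  provenance: AiProvenance,
  category: string,
  confidence: number,
): string {
  return `ai:model=${modelLabel}@${provenance.modelVersion}; cat=${category}; conf=${confidence.toFixed(2)}`;
}
```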


5. Moderation policy — fairness & false-positive controls

5.1 Standalone classifier → FLAG only

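By default, a verdict of BLOCK or QUARANTINE driven solely by a classifier result is forbidden. The only standalone exceptions are the two rows marked (alone) with a verdict above FLAG in the table below: OTP_HARVEST, which goes to QUARANTINE, and GAMBLING, which is blanket illegal per ATRA. All other escalations require the listed co-signal: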

| Classifier category | Minimum confidence | Co-signal required to escalate | Escalation verdict |
| --- | --- | --- | --- |
| PHISHING | 0.85 | + blocklist hit on sender or URL | BLOCK |
| PHISHING | 0.85 | (alone) | FLAG |
| OTP_HARVEST | 0.80 | + AIT pattern match on dst range | BLOCK |
| OTP_HARVEST | 0.80 | (alone) | QUARANTINE |
| SPAM | 0.90 | + rate-governor tripped | BLOCK |
| SPAM | 0.90 | (alone) | FLAG |
| FINANCIAL_FRAUD | 0.85 | + sender-id spoof | BLOCK |
| GAMBLING | 0.90 | (alone, blanket illegal per ATRA) | BLOCK |
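The pairing table can be read as a lookup from category, confidence, and co-signals to a verdict. The sketch below encodes it that way. The co-signal identifiers (BLOCKLIST_HIT and so on) are hypothetical names invented for illustration, and returning null for FINANCIAL_FRAUD without a co-signal is an assumption, since the table gives no standalone row for it.

```typescript
type EscalationVerdict = "FLAG" | "QUARANTINE" | "BLOCK";

interface PolicyRow {
  minConfidence: number;
  coSignal?: string;               // hypothetical identifier for the paired signal
  withCoSignal?: EscalationVerdict;
  alone?: EscalationVerdict;       // undefined: no standalone action listed
}

const ESCALATION_POLICY: Record<string, PolicyRow> = {
  PHISHING:        { minConfidence: 0.85, coSignal: "BLOCKLIST_HIT",         withCoSignal: "BLOCK", alone: "FLAG" },
  OTP_HARVEST:     { minConfidence: 0.80, coSignal: "AIT_PATTERN_MATCH",     withCoSignal: "BLOCK", alone: "QUARANTINE" },
  SPAM:            { minConfidence: 0.90, coSignal: "RATE_GOVERNOR_TRIPPED", withCoSignal: "BLOCK", alone: "FLAG" },
  FINANCIAL_FRAUD: { minConfidence: 0.85, coSignal: "SENDER_ID_SPOOF",       withCoSignal: "BLOCK" },
  GAMBLING:        { minConfidence: 0.90, alone: "BLOCK" },
};

// Returns the verdict the classifier hit contributes, or null if it contributes none.
function resolveEscalation(
  category: string,
  confidence: number,
  coSignals: ReadonlySet<string>,
): EscalationVerdict | null {
  const row = ESCALATION_POLICY[category];
  if (!row || confidence < row.minConfidence) return null;
  if (row.coSignal && row.withCoSignal && coSignals.has(row.coSignal)) {
    return row.withCoSignal;
  }
  return row.alone ?? null;
}
```

Keeping the policy as data rather than branching logic makes it straightforward to diff against the table during review.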

5.2 MNO-fairness monitor

To avoid a bias against any one MNO (e.g. if the training data disproportionately represented one operator's traffic), a daily fairness job computes:

ai_block_rate_by_mno = count(BLOCK verdicts with CLASSIFIER hit, last 24h)
                     / count(total MO verdicts, last 24h)
                       group by mnoId

If the coefficient of variation across MNOs, stdev(ai_block_rate_by_mno) / mean(ai_block_rate_by_mno), exceeds 0.5, alert FirewallAiFairnessDrift (HIGH). The standing response is: disable the CLASSIFIER rule pending T&S review and model retraining at fraud-intel-service.
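The fairness check is a coefficient-of-variation test over per-MNO block rates. A minimal sketch follows, with two assumptions the text does not settle: the standard deviation is the population form (divide by n), and MNOs with no MO traffic in the window are excluded from the calculation. The input shape and function names are hypothetical.

```typescript
interface MnoCounts {
  mnoId: string;
  aiBlocks: number;   // BLOCK verdicts with a CLASSIFIER hit, last 24h
  totalMo: number;    // all MO verdicts, last 24h
}

interface FairnessResult {
  rates: Map<string, number>;
  cv: number;         // stdev / mean of the per-MNO rates
  alert: boolean;
}

function fairnessDrift(counts: MnoCounts[], threshold: number = 0.5): FairnessResult {
  const rates = new Map<string, number>();
  for (const c of counts) {
    if (c.totalMo > 0) rates.set(c.mnoId, c.aiBlocks / c.totalMo);
  }
  const values = [...rates.values()];
  if (values.length === 0) return { rates, cv: 0, alert: false };

  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  if (mean === 0) return { rates, cv: 0, alert: false }; // no AI blocks anywhere

  // Population variance (assumption; the document does not say which form).
  const variance = values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const cv = Math.sqrt(variance) / mean;
  return { rates, cv, alert: cv > threshold };
}
```

For example, two MNOs with rates of 1% and 5% have mean 3% and population standard deviation 2%, giving a coefficient of variation of about 0.67, which would trip the alert.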

5.3 Shadow mode

Rules may be authored with shadowMode=true:

  • The classifier runs; the verdict is computed as if the rule were active; but the actual returned verdict is the non-shadow pipeline's result.
  • Shadow-mode decisions are emitted as firewall.rule.shadow.v1 events for offline comparison with the live decision.
  • T&S promotes a shadow rule to live only after a 7-day comparison shows false-positive rate < 1% and false-negative delta < 0.5%.
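The shadow-mode contract and the promotion gate can be sketched as two small functions. Everything here other than the event name and the numeric thresholds is an assumption: the event payload fields, the function names, and the interpretation of the rates as fractions (1% = 0.01).

```typescript
type PipelineVerdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

interface ShadowEvent {
  type: "firewall.rule.shadow.v1";
  ruleId: string;
  liveVerdict: PipelineVerdict;
  shadowVerdict: PipelineVerdict;
  diverged: boolean;
}

// The shadow verdict is recorded for offline comparison but never returned.
function evaluateWithShadow(
  ruleId: string,
  liveVerdict: PipelineVerdict,
  shadowVerdict: PipelineVerdict,
  emit: (event: ShadowEvent) => void,
): PipelineVerdict {
  emit({
    type: "firewall.rule.shadow.v1",
    ruleId,
    liveVerdict,
    shadowVerdict,
    diverged: liveVerdict !== shadowVerdict,
  });
  return liveVerdict;
}

interface ShadowComparison {
  windowDays: number;
  falsePositiveRate: number;   // fraction, e.g. 0.008 for 0.8%
  falseNegativeDelta: number;  // fraction, e.g. 0.003 for 0.3%
}

// Promotion gate: 7-day window, FP rate < 1%, FN delta < 0.5%.
function canPromoteShadowRule(c: ShadowComparison): boolean {
  return c.windowDays >= 7 && c.falsePositiveRate < 0.01 && c.falseNegativeDelta < 0.005;
}
```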

6. HITL (human-in-the-loop) flow

Classifier-driven QUARANTINE verdicts always route to NOC review:

CLASSIFIER match → QUARANTINE
  → firewall.quarantine.held.v1
  → NOC console displays: decrypted PDU, classifier JSON, model version
  → NOC → RELEASE or REJECT with notes
  → Review outcome emitted as firewall.quarantine.{released|rejected}.v1
  → fraud-intel-service subscribes to these and feeds them back as
    labelled training examples for the next model revision

This closes the loop: operator decisions become training signal for the next classifier version.


7. Latency budget

| Stage | Budget | Operating mode |
| --- | --- | --- |
| PII anonymisation pre-process | 0.5 ms | all modes |
| Cache lookup (fw:classifier:{hash}) | 0.5 ms | all modes |
| LLM inference (on cache MISS) | 15 ms P95 | NORMAL, DEGRADED |
| LLM inference | disabled | PANIC, MAINTENANCE |
| Total classifier overhead on verdict path | ≤ 16 ms | |

If remaining evaluation budget at the classifier invocation point is < 16 ms, the rule is soft-skipped and the verdict assembles without it (flag BUDGET_SKIP_CLASSIFIER added).


8. Cache strategy

Classifier responses are cached aggressively because SMS templates (OTPs, alerts, campaigns) are highly repetitive:

| Key | TTL | Expected hit rate |
| --- | --- | --- |
| fw:classifier:{sha256(piiRedactedBody)} | 24 h | ≥ 95% for OTP/template traffic |
| fw:classifier:coldset:{sha256(piiRedactedBody)} | 7 days (secondary cold cache) | ≥ 98% including cross-pod |

To handle LLM model-version changes, the model version is included in the cache key: fw:classifier:{modelVersion}:{sha256}. A version change therefore bypasses stale entries implicitly, without an explicit purge.
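A small sketch of the versioned key construction described above. The function name is hypothetical; the key layout follows the fw:classifier:{modelVersion}:{sha256} format given in the text.

```typescript
import { createHash } from "node:crypto";

// Builds the cache key for a PII-redacted body under a given model version.
function classifierCacheKey(modelVersion: string, redactedBody: string): string {
  const digest = createHash("sha256").update(redactedBody, "utf8").digest("hex");
  return `fw:classifier:${modelVersion}:${digest}`;
}
```

Because the version is part of the key, entries written under the previous model are simply never read again and age out with their TTL.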


9. Failure modes

| Failure | Detection | Firewall behaviour |
| --- | --- | --- |
| local-llm-service returns 5xx | firewall_classifier_errors_total | Circuit breaker (5 errors / 10 s → open 60 s); CLASSIFIER rules skip with flag=CLASSIFIER_UNAVAILABLE; verdict proceeds without classifier input |
| LLM timeout > 15 ms | histogram overflow | Same as above; metric firewall_classifier_timeout_total++ |
| Cache (Redis) unavailable | firewall_redis_errors_total | Inference runs live; latency impact ~15 ms per call; caller adjusts effective budget down |
| Model version mismatch between pods (rollout in progress) | firewall_classifier_model_version gauge divergent | Audit rows record the specific model version that produced each verdict; no corrective action needed beyond observability |
| Fraud-intel signal pipeline paused (no fraud.detected.* for 1 h) | firewall_fraudintel_event_age_seconds > 3600 | Alert FirewallFraudIntelStale; existing signatures remain active (conservative); NOC notified |
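The circuit-breaker behaviour in the first row (5 errors within 10 s opens the breaker for 60 s) can be sketched as a small class. This is an illustrative implementation with an injected clock, not the service's own; the class name is hypothetical, and it treats timeouts and 5xx responses alike as errors, in line with the "same as above" row.

```typescript
class ClassifierCircuitBreaker {
  private readonly maxErrors: number;
  private readonly windowMs: number;
  private readonly openMs: number;
  private errorTimestamps: number[] = [];
  private openUntil = 0;

  constructor(maxErrors: number = 5, windowMs: number = 10000, openMs: number = 60000) {
    this.maxErrors = maxErrors;
    this.windowMs = windowMs;
    this.openMs = openMs;
  }

  // While open, CLASSIFIER rules skip with flag=CLASSIFIER_UNAVAILABLE.
  isOpen(nowMs: number): boolean {
    return nowMs < this.openUntil;
  }

  recordError(nowMs: number): void {
    // Keep only errors inside the sliding window, then add the new one.
    this.errorTimestamps = this.errorTimestamps.filter((t) => nowMs - t < this.windowMs);
    this.errorTimestamps.push(nowMs);
    if (this.errorTimestamps.length >= this.maxErrors) {
      this.openUntil = nowMs + this.openMs;
      this.errorTimestamps = [];
    }
  }
}
```

Passing the timestamp in explicitly keeps the class deterministic under test; a caller would typically supply Date.now().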

10. No cloud-LLM for firewall traffic

Unlike compliance-engine (which has an opt-in external-LLM failover for non-regulated tenants), the sms-firewall-service is forbidden from calling any external LLM API. Rationale:

  • Firewall traffic includes inbound MO from national subscribers — this is the most PII-sensitive plane on the platform (A-number + B-number + body, all real subscribers).
  • ADR-0004 §3 classifies the national perimeter as a sovereign-asset data class; no content may cross the national boundary for analysis.
  • A start-up guard in the firewall binary refuses to boot if EXTERNAL_LLM_ENABLED=true is set; there is no configuration path to enable it.

Enforced at code level:

if (process.env.EXTERNAL_LLM_ENABLED === 'true') {
  throw new Error('External LLM is architecturally forbidden in sms-firewall-service; refusing to start');
}

11. Operations runbook (summary)

| Task | Owner | Cadence |
| --- | --- | --- |
| Classifier accuracy audit against labelled SMS sample (1000 msgs) | Trust & Safety + ML | Monthly |
| MNO-fairness drift review | Trust & Safety | Weekly |
| Shadow-mode promotion review | Trust & Safety lead | Weekly |
| Model version upgrade (coordinated with compliance-engine) | Platform Engineering | As released, with 7-day shadow first |
| Fraud-intel signal volume review | Trust & Safety | Weekly |
| Training-set update (from NOC review outcomes) | ML (fraud-intel-service) | Monthly |