SMS Firewall Service — AI Integration

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · SECURITY_MODEL · APPLICATION_LOGIC · OBSERVABILITY

1. Purpose

The sms-firewall-service uses machine-learned signals in three ways:

Consume ML detections from fraud-intel-service. The firewall is a consumer, not a producer — it subscribes to fraud.detected.* events carrying AIT, SIM-box, grey-route, and OTP-harvesting classifications produced by fraud-intel-service's training pipelines, and it materialises those detections into firewall.simbox_signals, firewall.ait_patterns, and firewall.peer_quarantine for deterministic lookup during verdict evaluation.
Invoke a lightweight on-cluster content classifier. CLASSIFIER-type firewall rules may call the local LLM service for inline content classification (spam/phishing/fraud patterns) within the per-call data-plane budget. Classifier output never exceeds FLAG as a standalone signal; promoting to BLOCK requires pairing with a deterministic rule hit.
A/B shadow-model evaluation. Rules authored with shadowMode=true invoke the classifier but do not affect the verdict; they emit firewall.rule.shadow.v1 events for drift analysis.

The firewall does not train models. Training, labelling, and versioning happen in fraud-intel-service per its own AI_INTEGRATION document. The firewall stamps every AI-influenced verdict with aiProvenance for regulator-grade auditability.

2. Consumed fraud-intel signals

Event	Signal type	Effect on firewall
`fraud.detected.simbox.v1`	Graph + ML: SIM-box originator detection (IMEI churn, A-number rotation, cell-ID volatility)	Upsert `firewall.simbox_signals`; matching `srcMsisdn` auto-BLOCK with `SIMBOX_SIGNATURE`
`fraud.detected.ait.v1`	Pattern recognition: Artificially Inflated Traffic destination ranges (per GSMA FF.21 taxonomy — OTP harvest, pumped traffic, IRSF)	Upsert `firewall.ait_patterns`; matching `dstMsisdn` range auto-BLOCK with `AIT_SIGNATURE`
`fraud.detected.greyroute.v1`	ML: grey-route arbitrage peer detection	Add peer to `firewall.peer_quarantine`; subsequent MT from peer auto-QUARANTINE
`fraud.detected.otp_harvest.v1`	Pattern: OTP-harvesting campaign	Add associated `dstMsisdnRange` + content regex to dynamic AIT rule

All events are signed by fraud-intel-service with an Ed25519 key; signature verified against the service's JWKS before state mutation.

2.1 Signal lifecycle

Signals carry confidence ∈ [0, 1]. Signals with confidence ≥ 0.7 auto-promote to hard BLOCK; signals with 0.5 ≤ confidence < 0.7 promote to QUARANTINE; signals with confidence < 0.5 are advisory (FLAG only).
Signals expire on a sliding 60-day window: a signal that is not re-confirmed within 60 days is set to active=FALSE. fraud-intel-service re-emits fresh events for sustained detections.
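
The lifecycle rules above can be sketched as two small pure functions. This is an illustration under stated assumptions, not the service's actual code: the names `actionForConfidence` and `isSignalActive` are hypothetical, and the sketch assumes a confidence of exactly 0.7 maps to BLOCK.

```typescript
// Hypothetical helper names; thresholds and the 60-day window come from the text above.
type SignalAction = "BLOCK" | "QUARANTINE" | "FLAG";

const BLOCK_THRESHOLD = 0.7;
const QUARANTINE_THRESHOLD = 0.5;
const EXPIRY_MS = 60 * 24 * 60 * 60 * 1000; // 60 days in milliseconds

// Map a signal confidence in [0, 1] to the action it promotes to.
function actionForConfidence(confidence: number): SignalAction {
  // The negated form also rejects NaN.
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence out of range: ${confidence}`);
  }
  if (confidence >= BLOCK_THRESHOLD) return "BLOCK";
  if (confidence >= QUARANTINE_THRESHOLD) return "QUARANTINE";
  return "FLAG";
}

// Sliding expiry: a signal stays active only if it was re-confirmed
// less than 60 days before `now`.
function isSignalActive(lastConfirmedAt: Date, now: Date): boolean {
  return now.getTime() - lastConfirmedAt.getTime() < EXPIRY_MS;
}
```

Passing `now` explicitly rather than reading the clock inside the function keeps the expiry check deterministic in tests.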

3. Local LLM for CLASSIFIER rules

3.1 Deployment model

The content classifier is the same local-llm-service deployment used by compliance-engine and fraud-intel-service. The firewall calls it over in-cluster HTTP with a strict latency ceiling.

┌───────────────────────┐     ┌─────────────────────────┐
│  sms-firewall-service │────▶│   local-llm-service     │
│  (NestJS, 5–15 pods)  │HTTP │   vLLM + Llama-3.1-8B   │
│                       │≤15ms│   AWQ 4-bit, GPU nodes  │
└───────────────────────┘     └─────────────────────────┘
         │
         ▼
   Redis classifier cache
     (24h body-hash → result)

3.2 Classifier model configuration

Attribute	Value
Inference engine	vLLM (primary) with grammar-constrained JSON decoding
Model	Llama-3.1-8B-Instruct (AWQ 4-bit) — shared with compliance-engine
GPU	NVIDIA L4 / A10 (24 GB)
Max tokens	200 input + 64 output
Temperature	0.0 (deterministic)
Timeout	15 ms P95 — aggressive relative to compliance-engine's 2000 ms because the firewall's total budget is 30 ms
Concurrency cap	50 in-flight per firewall pod
Cache	Redis `fw:classifier:{sha256(pii_redacted_body)}`, TTL 24h

3.3 When CLASSIFIER rules run

CLASSIFIER rules are disabled by default in PANIC mode. In other modes they are excluded from the hot path unless all of the following conditions hold:

Rule type = CLASSIFIER
enabled = TRUE
Current operating mode ∈ {NORMAL, DEGRADED}
Eval-budget remaining ≥ 20 ms at the point of rule invocation (soft-skip otherwise; emits flag=BUDGET_SKIP_CLASSIFIER)
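
The conditions above amount to a single gating predicate. The sketch below is hypothetical (`classifierGate`, `FirewallRule`, and `ClassifierGate` are illustrative names, not taken from the service), and it uses the 20 ms threshold stated in this section.

```typescript
type OperatingMode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

// Hypothetical minimal rule shape; the real rule model is not shown in this document.
interface FirewallRule {
  type: string; // e.g. "CLASSIFIER"
  enabled: boolean;
}

interface ClassifierGate {
  run: boolean;
  flag?: "BUDGET_SKIP_CLASSIFIER"; // set only when the rule is soft-skipped for budget
}

const MIN_BUDGET_MS = 20;

function classifierGate(
  rule: FirewallRule,
  mode: OperatingMode,
  remainingBudgetMs: number,
): ClassifierGate {
  // The rule must be an enabled CLASSIFIER rule.
  if (rule.type !== "CLASSIFIER" || !rule.enabled) return { run: false };
  // Only NORMAL and DEGRADED modes may invoke the classifier.
  if (mode !== "NORMAL" && mode !== "DEGRADED") return { run: false };
  // Not enough budget left: soft-skip and surface the flag.
  if (remainingBudgetMs < MIN_BUDGET_MS) {
    return { run: false, flag: "BUDGET_SKIP_CLASSIFIER" };
  }
  return { run: true };
}
```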

3.4 Prompt template

Single-turn, structured-output prompt. Response enforced via grammar-constrained JSON:

System:
You classify inbound SMS content for a national telecom firewall. Given
an SMS body (may contain [PHONE], [AMOUNT], [OTP_PLACEHOLDER] redactions),
return a JSON object mapping each of the following categories to a
confidence score in [0.0, 1.0]:
  OTP_HARVEST, PHISHING, SPAM, MALWARE_LINK, HATE_SPEECH,
  FINANCIAL_FRAUD, POLITICAL_INCITEMENT, GAMBLING.
Return ONLY the JSON object, no explanation.

User:
[PII-REDACTED MESSAGE BODY]

Expected response (enforced by grammar):

{
  "OTP_HARVEST": 0.05,
  "PHISHING": 0.88,
  "SPAM": 0.12,
  "MALWARE_LINK": 0.72,
  "HATE_SPEECH": 0.00,
  "FINANCIAL_FRAUD": 0.03,
  "POLITICAL_INCITEMENT": 0.00,
  "GAMBLING": 0.00
}
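
Even with grammar-constrained decoding, a caller may want to validate the response before using it. The following is a minimal sketch of such a check, assuming the eight categories listed in the prompt; `parseClassifierResponse` is a hypothetical name and the service's real validation is not shown in this document.

```typescript
const CATEGORIES = [
  "OTP_HARVEST", "PHISHING", "SPAM", "MALWARE_LINK", "HATE_SPEECH",
  "FINANCIAL_FRAUD", "POLITICAL_INCITEMENT", "GAMBLING",
] as const;

type Category = (typeof CATEGORIES)[number];
type ClassifierScores = Record<Category, number>;

// Parse the raw model output and reject anything that is not exactly
// one score in [0, 1] per expected category.
function parseClassifierResponse(raw: string): ClassifierScores {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("classifier response is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;

  const known = CATEGORIES as readonly string[];
  const extra = Object.keys(obj).filter((k) => known.indexOf(k) < 0);
  if (extra.length > 0) {
    throw new Error(`unexpected keys: ${extra.join(", ")}`);
  }

  const scores = {} as ClassifierScores;
  for (const category of CATEGORIES) {
    const value = obj[category];
    if (typeof value !== "number" || !(value >= 0 && value <= 1)) {
      throw new Error(`invalid score for ${category}`);
    }
    scores[category] = value;
  }
  return scores;
}
```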

3.5 PII anonymisation before inference

Per SECURITY_MODEL §3.4, the body is anonymised before the LLM sees it (defence-in-depth even though the LLM runs on-cluster — no cloud-LLM calls are ever permitted for firewall traffic):

Pattern	Replacement
E.164 phone numbers	`[PHONE]`
5+ digit sequences	`[OTP_PLACEHOLDER]`
Monetary amounts (AFN, USD, EUR)	`[AMOUNT]`
URLs	`[URL]` (presence preserved)
Name tokens (curated Dari/Pashto/English list)	`[NAME]`

The body is hashed after the replacements are applied, so the cache key is stable across messages that share a template but differ in the redacted values (for example, the same OTP text carrying different codes).
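
The replacement table above can be sketched as a chain of regex substitutions. The patterns below are illustrative assumptions, since the production regexes and the curated name list are not given in this document; `redactBody` and `NAME_TOKENS` are hypothetical names. Order matters in this sketch: URLs, phone numbers, and amounts are replaced before the generic 5+ digit rule so that their digits are not relabelled as OTPs.

```typescript
// Hypothetical stand-in for the curated Dari/Pashto/English name list.
const NAME_TOKENS = ["Ahmad", "Fatima"];

function redactBody(body: string): string {
  const namePattern = new RegExp(`\\b(?:${NAME_TOKENS.join("|")})\\b`, "g");
  return body
    // URLs first, so digits inside a URL are not treated as OTPs.
    .replace(/https?:\/\/\S+/gi, "[URL]")
    // E.164: "+" followed by 7 to 15 digits, the first non-zero.
    .replace(/\+[1-9]\d{6,14}\b/g, "[PHONE]")
    // Amounts with a leading or trailing AFN/USD/EUR currency code.
    .replace(
      /(?:\b(?:AFN|USD|EUR)\s?\d+(?:[.,]\d+)*|\b\d+(?:[.,]\d+)*\s?(?:AFN|USD|EUR)\b)/g,
      "[AMOUNT]",
    )
    // Any remaining run of 5 or more digits.
    .replace(/\d{5,}/g, "[OTP_PLACEHOLDER]")
    .replace(namePattern, "[NAME]");
}
```

Under these assumed patterns, two OTP messages that differ only in the code redact to the same string, which is the property the shared cache key relies on.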

4. Provenance stamping

Every AI-influenced verdict carries provenance in the ruleHits[].evidence field and in the firewall.audit row's rule_hits JSONB:

{
  "ruleId": "fr_01HKX...",
  "ruleType": "CLASSIFIER",
  "action": "FLAG",
  "severity": "MEDIUM",
  "evidence": "ai:model=llama-3.1-8b-awq@rev-2026-04-01; cat=PHISHING; conf=0.88",
  "confidence": 0.88,
  "aiProvenance": {
    "modelId": "llama-3.1-8b-instruct-awq",
    "modelVersion": "rev-2026-04-01",
    "promptTemplateId": "classify-content.v1",
    "promptHash": "sha256:abc123...",
    "bodyHashRedacted": "sha256:def456...",
    "inferenceLatencyMs": 12,
    "classifiedAt": "2026-04-21T10:14:23.123Z",
    "cacheHit": false
  }
}

The model version is surfaced in the firewall.audit.v1 event so downstream consumers (regulator-portal-service, analytics-service) can reconstruct model-at-time-of-verdict during audit.

5. Moderation policy — fairness & false-positive controls

5.1 Standalone classifier → FLAG only

A verdict of BLOCK or QUARANTINE driven solely by a classifier result is forbidden unless the table below explicitly permits it for that category. Required pairings:

Classifier category	Minimum confidence	Co-signal required to escalate	Escalation verdict
`PHISHING`	0.85	+ blocklist hit on sender or URL	BLOCK
`PHISHING`	0.85	(alone)	FLAG
`OTP_HARVEST`	0.80	+ AIT pattern match on dst range	BLOCK
`OTP_HARVEST`	0.80	(alone)	QUARANTINE
`SPAM`	0.90	+ rate-governor tripped	BLOCK
`SPAM`	0.90	(alone)	FLAG
`FINANCIAL_FRAUD`	0.85	+ sender-id spoof	BLOCK
`GAMBLING`	0.90	(alone, blanket illegal per ATRA)	BLOCK
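
The pairing table can be encoded as a lookup plus a small decision function. This is a sketch under assumptions: the co-signal identifiers and the function name `classifierVerdict` are hypothetical, and because the table has no "(alone)" row for FINANCIAL_FRAUD, the sketch returns `null` (no classifier-driven verdict) in that case and whenever confidence is below the minimum.

```typescript
type Verdict = "FLAG" | "QUARANTINE" | "BLOCK";
type Category = "PHISHING" | "OTP_HARVEST" | "SPAM" | "FINANCIAL_FRAUD" | "GAMBLING";
// Hypothetical identifiers for the deterministic co-signals named in the table.
type CoSignal =
  | "BLOCKLIST_HIT"
  | "AIT_PATTERN_MATCH"
  | "RATE_GOVERNOR_TRIPPED"
  | "SENDER_ID_SPOOF";

interface Pairing {
  minConfidence: number;
  coSignal?: CoSignal; // co-signal that escalates the verdict to BLOCK
  alone?: Verdict;     // verdict when the classifier fires without the co-signal
}

const PAIRINGS: Record<Category, Pairing> = {
  PHISHING: { minConfidence: 0.85, coSignal: "BLOCKLIST_HIT", alone: "FLAG" },
  OTP_HARVEST: { minConfidence: 0.8, coSignal: "AIT_PATTERN_MATCH", alone: "QUARANTINE" },
  SPAM: { minConfidence: 0.9, coSignal: "RATE_GOVERNOR_TRIPPED", alone: "FLAG" },
  FINANCIAL_FRAUD: { minConfidence: 0.85, coSignal: "SENDER_ID_SPOOF" },
  GAMBLING: { minConfidence: 0.9, alone: "BLOCK" },
};

function classifierVerdict(
  category: Category,
  confidence: number,
  coSignals: readonly CoSignal[],
): Verdict | null {
  const pairing = PAIRINGS[category];
  if (confidence < pairing.minConfidence) return null;
  if (pairing.coSignal !== undefined && coSignals.indexOf(pairing.coSignal) >= 0) {
    return "BLOCK";
  }
  return pairing.alone ?? null;
}
```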

5.2 MNO-fairness monitor

To avoid a bias against any one MNO (e.g. if the training data disproportionately represented one operator's traffic), a daily fairness job computes:

ai_block_rate_by_mno = count(BLOCK verdicts with CLASSIFIER hit, last 24h) /
                       count(total MO verdicts, last 24h)
                       group by mnoId

If stdev(ai_block_rate_by_mno) / mean(...) > 0.5, alert FirewallAiFairnessDrift (HIGH). The standing response is: disable the CLASSIFIER rule pending T&S review and model retraining at fraud-intel-service.
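
The alert condition is a coefficient-of-variation check on the per-MNO block rates. A minimal sketch follows, assuming population standard deviation (the document does not say whether population or sample is intended) and assuming MNOs with zero MO verdicts in the window are skipped; `fairnessDrift` and `MnoCounts` are hypothetical names.

```typescript
interface MnoCounts {
  mnoId: string;
  classifierBlocks: number; // BLOCK verdicts with a CLASSIFIER hit, last 24h
  totalMoVerdicts: number;  // all MO verdicts, last 24h
}

const CV_ALERT_THRESHOLD = 0.5;

function fairnessDrift(counts: MnoCounts[]): { cv: number; alert: boolean } {
  // Per-MNO block rate; MNOs with no traffic are skipped (assumption).
  const rates = counts
    .filter((c) => c.totalMoVerdicts > 0)
    .map((c) => c.classifierBlocks / c.totalMoVerdicts);
  if (rates.length === 0) return { cv: 0, alert: false };

  const mean = rates.reduce((a, b) => a + b, 0) / rates.length;
  if (mean === 0) return { cv: 0, alert: false }; // no AI blocks anywhere

  // Population variance (assumption).
  const variance =
    rates.reduce((acc, r) => acc + (r - mean) * (r - mean), 0) / rates.length;
  const cv = Math.sqrt(variance) / mean;
  return { cv, alert: cv > CV_ALERT_THRESHOLD };
}
```

As a worked case, rates of 1%, 1%, and 4% have mean 2% and population standard deviation of about 1.41%, giving a ratio of about 0.71, which exceeds 0.5 and would raise FirewallAiFairnessDrift.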

5.3 Shadow mode

Rules may be authored with shadowMode=true:

The classifier runs and a verdict is computed as if the rule were active, but the verdict actually returned is the non-shadow pipeline's result.
Shadow-mode decisions are emitted as firewall.rule.shadow.v1 events for offline comparison with the live decision.
T&S promotes a shadow rule to live only after a 7-day comparison shows false-positive rate < 1% and false-negative delta < 0.5%.
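
The shadow behaviour described above might look like the following sketch: the shadow verdict is recorded for comparison, and the live verdict is returned unchanged. The event payload fields, the `ALLOW` verdict value, and the `evaluateWithShadow` name are assumptions for illustration; only the event type `firewall.rule.shadow.v1` comes from this document.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK"; // "ALLOW" is assumed

// Hypothetical payload shape for the shadow event.
interface ShadowEvent {
  type: "firewall.rule.shadow.v1";
  ruleId: string;
  liveVerdict: Verdict;
  shadowVerdict: Verdict;
  agrees: boolean;
}

function evaluateWithShadow(
  ruleId: string,
  liveVerdict: Verdict,   // result of the non-shadow pipeline
  shadowVerdict: Verdict, // what the verdict would be if the rule were active
  emit: (event: ShadowEvent) => void,
): Verdict {
  emit({
    type: "firewall.rule.shadow.v1",
    ruleId,
    liveVerdict,
    shadowVerdict,
    agrees: liveVerdict === shadowVerdict,
  });
  // The shadow result never influences what the caller receives.
  return liveVerdict;
}
```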

6. HITL (human-in-the-loop) flow

Classifier-driven QUARANTINE verdicts always route to NOC review:

CLASSIFIER match → QUARANTINE
    → firewall.quarantine.held.v1
    → NOC console displays: decrypted PDU, classifier JSON, model version
    → NOC → RELEASE or REJECT with notes
    → Review outcome emitted as firewall.quarantine.{released|rejected}.v1
    → fraud-intel-service subscribes to these and feeds them back as
      labelled training examples for the next model revision

This closes the loop: operator decisions become training signal for the next classifier version.

7. Latency budget

Stage	Budget	Operating mode
PII anonymisation pre-process	0.5 ms	all modes
Cache lookup (`fw:classifier:{hash}`)	0.5 ms	all modes
LLM inference (on cache MISS)	15 ms P95	`NORMAL`, `DEGRADED`
LLM inference	disabled	`PANIC`, `MAINTENANCE`
Total classifier overhead on verdict path	≤ 16 ms	—

If remaining evaluation budget at the classifier invocation point is < 16 ms, the rule is soft-skipped and the verdict assembles without it (flag BUDGET_SKIP_CLASSIFIER added).

8. Cache strategy

Classifier responses are cached aggressively because SMS templates (OTPs, alerts, campaigns) are highly repetitive:

Key	TTL	Expected hit rate
`fw:classifier:{sha256(piiRedactedBody)}`	24 h	≥ 95% for OTP/template traffic
`fw:classifier:coldset:{sha256(piiRedactedBody)}`	7 days (secondary cold cache)	≥ 98% including cross-pod

On an LLM model-version change, the cache key format becomes fw:classifier:{modelVersion}:{sha256}, so entries cached under the previous version are bypassed implicitly and no explicit purge is needed.
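
A minimal sketch of version-namespaced keys follows, using a plain in-memory object as a stand-in for Redis. The function names and the second version string are hypothetical; the hash argument is assumed to be the SHA-256 of the PII-redacted body, computed elsewhere.

```typescript
// Build the version-namespaced key described above.
function classifierCacheKey(modelVersion: string, redactedBodySha256: string): string {
  return `fw:classifier:${modelVersion}:${redactedBodySha256}`;
}

// In-memory stand-in for the Redis cache (illustration only; no TTL handling).
const store: Record<string, string> = {};

function putResult(modelVersion: string, hash: string, resultJson: string): void {
  store[classifierCacheKey(modelVersion, hash)] = resultJson;
}

function getResult(modelVersion: string, hash: string): string | undefined {
  return store[classifierCacheKey(modelVersion, hash)];
}

// A result cached under one model version is invisible to the next version.
putResult("rev-2026-04-01", "def456", '{"PHISHING":0.88}');
const sameVersion = getResult("rev-2026-04-01", "def456"); // hit
const nextVersion = getResult("rev-2026-05-01", "def456"); // miss (hypothetical newer revision)
```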

9. Failure modes

Failure	Detection	Firewall behaviour
`local-llm-service` returns 5xx	`firewall_classifier_errors_total`	Circuit breaker (5 errors / 10 s → open 60 s); CLASSIFIER rules skip with `flag=CLASSIFIER_UNAVAILABLE`; verdict proceeds without classifier input
LLM timeout > 15 ms	histogram overflow	Same as above; metric `firewall_classifier_timeout_total++`
Cache (Redis) unavailable	`firewall_redis_errors_total`	Inference runs live; latency impact ~15 ms per call; caller adjusts effective budget down
Model version mismatch between pods (rollout in progress)	`firewall_classifier_model_version` gauge divergent	Audit rows record the specific model version that produced each verdict — no corrective action needed beyond observability
Fraud-intel signal pipeline paused (no `fraud.detected.*` for 1 h)	`firewall_fraudintel_event_age_seconds > 3600`	Alert `FirewallFraudIntelStale`; existing signatures remain active (conservative); NOC notified
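
The circuit-breaker row above (5 errors within 10 s opens the breaker for 60 s) can be sketched as a small class. This is an illustrative assumption, not the service's code: `ClassifierCircuitBreaker` is a hypothetical name, and the document does not describe half-open probing, so the sketch closes the breaker when the 60 s elapse.

```typescript
class ClassifierCircuitBreaker {
  private readonly maxErrors = 5;
  private readonly windowMs = 10000; // 10 s error window
  private readonly openMs = 60000;   // stay open for 60 s
  private errorTimestamps: number[] = [];
  private openUntil = 0;

  // While open, CLASSIFIER rules should skip with flag=CLASSIFIER_UNAVAILABLE.
  isOpen(nowMs: number): boolean {
    return nowMs < this.openUntil;
  }

  // Record a 5xx or timeout from local-llm-service.
  recordError(nowMs: number): void {
    // Keep only errors that are still inside the sliding window.
    this.errorTimestamps = this.errorTimestamps.filter(
      (t) => nowMs - t < this.windowMs,
    );
    this.errorTimestamps.push(nowMs);
    if (this.errorTimestamps.length >= this.maxErrors) {
      this.openUntil = nowMs + this.openMs;
      this.errorTimestamps = [];
    }
  }
}
```

Timestamps are passed in as arguments so the breaker can be exercised in tests without waiting on a real clock.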

10. No cloud-LLM for firewall traffic

Unlike compliance-engine (which has an opt-in external-LLM failover for non-regulated tenants), the sms-firewall-service is forbidden from calling any external LLM API. Rationale:

Firewall traffic includes inbound MO from national subscribers — this is the most PII-sensitive plane on the platform (A-number + B-number + body, all real subscribers).
ADR-0004 §3 classifies the national perimeter as a sovereign-asset data class; no content may cross the national boundary for analysis.
A start-up guard in the firewall binary refuses to boot if EXTERNAL_LLM_ENABLED=true is set; there is no configuration path to enable it.

Enforced at code level:

// Start-up guard: fail fast if an external LLM is ever switched on via the environment.
if (process.env.EXTERNAL_LLM_ENABLED === 'true') {
  throw new Error('External LLM is architecturally forbidden in sms-firewall-service; refusing to start');
}

11. Operations runbook (summary)

Task	Owner	Cadence
Classifier accuracy audit against labelled SMS sample (1000 msgs)	Trust & Safety + ML	Monthly
MNO-fairness drift review	Trust & Safety	Weekly
Shadow-mode promotion review	Trust & Safety lead	Weekly
Model version upgrade (coordinated with compliance-engine)	Platform Engineering	As released, with 7-day shadow first
Fraud-intel signal volume review	Trust & Safety	Weekly
Training-set update (from NOC review outcomes)	ML (fraud-intel-service)	Monthly
