Compliance Layer — AI Integration

Status: populated | Last updated: 2026-04-18

1. Purpose

The Compliance Layer uses Large Language Models (LLMs) to perform content classification on SMS bodies when keyword and regex rules are insufficient. AI classification is applied only when an active AI_CLASSIFICATION rule is present in the tenant's rule set — most messages never invoke the LLM.

AI classification strengthens detection for:

  • Obfuscated spam (misspellings, unicode substitution, zero-width characters)
  • Context-dependent phishing (URLs + social engineering patterns)
  • Multilingual content not covered by keyword lists
  • Emerging fraud patterns that outpace keyword list updates

2. Provider Strategy — Local LLM First

| Provider | Priority | Use Case |
|---|---|---|
| Local LLM (self-hosted) | Primary | Default for all production traffic; preferred for data residency, cost predictability, and no-DPA footprint |
| Anthropic Claude API | Secondary / failover | Optional; enabled per tenant or as automatic failover when local LLM is unavailable |
| OpenAI API | Tertiary / failover | Optional; secondary external failover |
| Mock provider | Dev/test only | In-memory responses for local development and CI |

Provider selection is governed by the AI_PROVIDER environment variable and a failover chain. The async pipeline's relaxed latency SLA (500 ms) accommodates local LLM inference comfortably.
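A minimal sketch of how a failover chain could be derived from AI_PROVIDER, using the priority order in the table above. The provider identifiers ("local", "claude", "openai", "mock"), the function name, and the per-tenant allow-list parameter are assumptions for illustration, not the engine's actual configuration surface.

```typescript
type ProviderName = "local" | "claude" | "openai" | "mock";

// Failover order from the provider table: local first, then Claude, then OpenAI.
// The mock provider never takes part in failover.
const DEFAULT_FAILOVER_ORDER: ProviderName[] = ["local", "claude", "openai"];
const KNOWN_PROVIDERS: string[] = ["local", "claude", "openai", "mock"];

function isProviderName(value: string): value is ProviderName {
  return KNOWN_PROVIDERS.indexOf(value) !== -1;
}

// Returns the ordered list of providers to try.
// `enabledExternal` is a hypothetical per-tenant allow-list for the optional
// external providers; the local LLM is always eligible as a fallback.
function resolveProviderChain(
  env: Record<string, string | undefined>,
  enabledExternal: readonly ProviderName[] = [],
): ProviderName[] {
  const raw = (env.AI_PROVIDER ?? "local").trim().toLowerCase();
  if (!isProviderName(raw)) {
    throw new Error(`Unsupported AI_PROVIDER value: ${raw}`);
  }
  if (raw === "mock") return ["mock"]; // dev/test only, no failover
  const fallbacks = DEFAULT_FAILOVER_ORDER.filter(
    (p) => p !== raw && (p === "local" || enabledExternal.indexOf(p) !== -1),
  );
  return [raw, ...fallbacks];
}
```

With no external providers enabled the chain is just the local LLM, so a local outage goes straight to the rule's fallbackAction.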


3. Local LLM Architecture

3.1 Deployment Topology

The local LLM runs as a separate Kubernetes deployment in the same namespace as compliance-engine, not as a sidecar (different scaling characteristics):

┌───────────────────────┐        ┌───────────────────────┐
│ compliance-engine     │ ─────► │ local-llm             │
│ (Node.js pods)        │ gRPC/  │ (GPU pods, OpenAI-    │
│ 3–20 replicas         │ HTTP   │ compatible API)       │
│                       │        │ 2–6 replicas          │
└───────────────────────┘        └───────────────────────┘

Redis (AI cache)

| Component | Choice | Rationale |
|---|---|---|
| Inference server | vLLM (primary) or Ollama (dev) or TGI | vLLM provides OpenAI-compatible API + excellent throughput via PagedAttention |
| Model | Llama-3.1-8B-Instruct or Mistral-7B-Instruct or Qwen2.5-7B-Instruct | Small enough for cost-effective GPU use; strong multilingual (English, Dari/Farsi, Pashto, Arabic) |
| GPU | NVIDIA A10 / L4 (24 GB) or A100 (40 GB) for higher throughput | Smaller GPUs suffice for 7–8B models |
| Quantisation | AWQ or GPTQ 4-bit | ~4× memory reduction with <1% quality loss on classification tasks |
| API interface | OpenAI-compatible (/v1/chat/completions) | Allows seamless provider abstraction; same client code as Claude/OpenAI |

3.3 Capacity Planning

Assumptions:

  • 7B model with 4-bit quantisation on A10 GPU
  • Typical request: 200 input tokens, 150 output tokens (structured JSON)
  • Single-GPU throughput: ~10–20 requests/second

| Expected AI eval RPS | GPU pods | Cache hit target |
|---|---|---|
| 1–5 RPS | 2 (HA) | ≥ 90% |
| 5–20 RPS | 2–4 | ≥ 95% |
| 20–100 RPS | 4–6 | ≥ 95% |
| 100+ RPS | 6+ with load balancing | ≥ 97% |

Cache hit rate is the dominant cost driver — 95% cache hit means only 5% of AI-rule evaluations reach the LLM.
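The relationship between cache hit rate and LLM load can be made concrete with a short sketch. The function names and the minimum of 2 pods (taken from the "2 (HA)" row) are illustrative assumptions; per-pod throughput should come from measurement rather than the ~10–20 requests/second estimate.

```typescript
// Requests per second that miss the cache and therefore reach the LLM.
function llmRps(aiEvalRps: number, cacheHitRate: number): number {
  return aiEvalRps * (1 - cacheHitRate);
}

// GPU pods needed to serve a given LLM load, never fewer than `minPods`
// so that the deployment stays highly available.
function gpuPodsNeeded(llmLoadRps: number, perPodRps: number, minPods = 2): number {
  return Math.max(minPods, Math.ceil(llmLoadRps / perPodRps));
}

// Example: 800 AI-rule evaluations/s at a 95% hit rate leaves about 40 req/s
// for the LLM, which two pods at 20 req/s each can absorb.
const load = llmRps(800, 0.95);
const pods = gpuPodsNeeded(load, 20);
```

Because the miss rate multiplies the whole evaluation stream, moving from a 95% to a 90% hit rate doubles the LLM load for the same traffic.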

3.4 Provider Abstraction

compliance-engine implements a provider-agnostic LLMClient interface:

interface LLMClient {
  classify(body: string, categories: AiContentCategory[]): Promise<ClassificationResult>;
}

// Concrete implementations:
class LocalLLMClient implements LLMClient { /* vLLM endpoint */ }
class ClaudeClient implements LLMClient { /* Anthropic SDK */ }
class OpenAIClient implements LLMClient { /* OpenAI SDK */ }
class MockClient implements LLMClient { /* deterministic test responses */ }

An LLMRouter selects the client based on AI_PROVIDER and handles failover:

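class LLMRouter implements LLMClient {
  constructor(
    private readonly primary: LLMClient,
    private readonly circuitBreaker: CircuitBreaker, // breaker type is not shown in this document
    private readonly secondary?: LLMClient,
  ) {}

  async classify(body: string, categories: AiContentCategory[]): Promise<ClassificationResult> {
    // While the primary's circuit is open, skip it and go straight to failover.
    if (!this.circuitBreaker.isOpenFor(this.primary)) {
      try {
        return await this.primary.classify(body, categories);
      } catch (err) {
        this.circuitBreaker.recordFailure();
        if (!this.secondary) throw err; // caller applies rule.fallbackAction
      }
    }
    if (!this.secondary) {
      throw new Error("primary circuit open and no secondary provider configured");
    }
    // A failure here propagates; the caller applies rule.fallbackAction.
    return this.secondary.classify(body, categories);
  }
}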

4. Classification Categories

| Category | Description |
|---|---|
| SPAM | Unsolicited commercial messaging, clickbait |
| PHISHING | Credential-harvesting URLs, fake bank/service pages, impersonation |
| FINANCIAL_FRAUD | Advance-fee fraud, fake investment schemes, money muling |
| ADULT_CONTENT | Sexually explicit content |
| HATE_SPEECH | Slurs, incitement against protected groups |
| POLITICAL_CONTENT | Political campaigning (relevant in election blackout periods) |
| DRUG_REFERENCE | Drug sales, narcotic-related content |
| GAMBLING | Gambling promotions in restricted jurisdictions |
| TERRORISM | Terrorist propaganda, recruitment, incitement |
| MALWARE_LINK | URLs pointing to known malware distribution, APK sideload prompts |
| HEALTH_MISINFORMATION | False health claims, unlicensed medical advice |

Categories are evaluated independently — a single message can score high on multiple categories.
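One way to model the category set and independent scoring in TypeScript is sketched below. The `CategoryScores` shape and the helper name are assumptions; the document does not show how AiContentCategory or ClassificationResult are actually defined.

```typescript
const AI_CONTENT_CATEGORIES = [
  "SPAM",
  "PHISHING",
  "FINANCIAL_FRAUD",
  "ADULT_CONTENT",
  "HATE_SPEECH",
  "POLITICAL_CONTENT",
  "DRUG_REFERENCE",
  "GAMBLING",
  "TERRORISM",
  "MALWARE_LINK",
  "HEALTH_MISINFORMATION",
] as const;

type AiContentCategory = (typeof AI_CONTENT_CATEGORIES)[number];

// Assumed result shape: one independent confidence score per category.
type CategoryScores = Record<AiContentCategory, number>;

// Every category whose score meets the threshold. More than one can qualify,
// because the scores are not required to sum to 1.
function categoriesAtOrAbove(
  scores: CategoryScores,
  minConfidence: number,
): AiContentCategory[] {
  return AI_CONTENT_CATEGORIES.filter((c) => scores[c] >= minConfidence);
}
```

A message scoring 0.87 on PHISHING and 0.65 on MALWARE_LINK would be reported under both categories at a 0.6 threshold.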


5. Prompt Design

Single-turn, structured-output prompt. The LLM returns only a JSON object with confidence scores — no free-text reasoning, which minimises prompt-injection risk and token usage.

System:
You are an SMS compliance content classifier. Given an SMS message body,
return a JSON object mapping each of the following categories to a
confidence score between 0.0 and 1.0:
SPAM, PHISHING, FINANCIAL_FRAUD, ADULT_CONTENT, HATE_SPEECH,
POLITICAL_CONTENT, DRUG_REFERENCE, GAMBLING, TERRORISM, MALWARE_LINK,
HEALTH_MISINFORMATION.

A score of 1.0 means certain match; 0.0 means no indication.
Return ONLY the JSON object, no explanation or other text.

User:
[MESSAGE BODY HERE]

Expected response (enforced via grammar-constrained decoding on local LLM):

{
  "SPAM": 0.12,
  "PHISHING": 0.87,
  "FINANCIAL_FRAUD": 0.05,
  "ADULT_CONTENT": 0.0,
  "HATE_SPEECH": 0.0,
  "POLITICAL_CONTENT": 0.0,
  "DRUG_REFERENCE": 0.0,
  "GAMBLING": 0.0,
  "TERRORISM": 0.0,
  "MALWARE_LINK": 0.65,
  "HEALTH_MISINFORMATION": 0.0
}

Grammar-constrained decoding

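Local inference servers (vLLM, llama.cpp, etc.) support constrained decoding that forces the output to match a JSON schema. On the local path this removes malformed-JSON responses as a failure class, provided the output token limit is large enough for the full object. External providers are not covered by this guarantee, so the response parser's schema check still applies to them.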
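As an illustration, the prompt and output schema could be assembled into an OpenAI-style chat completions payload as sketched below. The `response_format` field follows the OpenAI structured-output request shape; whether the deployed inference server honours it or needs a server-specific option is an assumption to verify. The function names, `temperature`, and `max_tokens` values are also illustrative.

```typescript
interface ScoreSchema {
  type: "object";
  properties: Record<string, { type: "number" }>;
  required: string[];
  additionalProperties: false;
}

// System prompt text from the section above, with the category list injected.
function buildSystemPrompt(categories: readonly string[]): string {
  return [
    "You are an SMS compliance content classifier. Given an SMS message body,",
    "return a JSON object mapping each of the following categories to a",
    "confidence score between 0.0 and 1.0:",
    categories.join(", ") + ".",
    "",
    "A score of 1.0 means certain match; 0.0 means no indication.",
    "Return ONLY the JSON object, no explanation or other text.",
  ].join("\n");
}

// JSON Schema: every category required, each a number, no extra keys.
// The 0..1 range is left to the response parser, since not every
// constrained-decoding backend supports numeric bounds.
function buildScoreSchema(categories: readonly string[]): ScoreSchema {
  const properties: Record<string, { type: "number" }> = {};
  for (const c of categories) properties[c] = { type: "number" };
  return { type: "object", properties, required: [...categories], additionalProperties: false };
}

function buildClassificationRequest(
  model: string,
  messageBody: string,
  categories: readonly string[],
) {
  return {
    model,
    temperature: 0, // assumed: deterministic scoring is preferable for caching
    max_tokens: 256, // assumed: headroom over the ~150 output tokens estimated earlier
    messages: [
      { role: "system", content: buildSystemPrompt(categories) },
      // The SMS body goes only into the user message, never into the system prompt.
      { role: "user", content: messageBody },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "category_scores", schema: buildScoreSchema(categories) },
    },
  };
}
```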

Prompt injection resistance

  • User input (message body) lives in a clearly delimited user message.
  • System prompt restricts output to JSON — injection attempts have no free-text channel.
  • Response parser rejects any response not matching the expected schema — treated as LLM failure (fallback action applies).
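The strict response parser described in the last bullet might look like the following sketch. The function name and error messages are hypothetical; the point is that anything other than exactly the expected keys with scores in [0, 1] is treated as an LLM failure.

```typescript
// Parses a raw LLM response and returns the scores, or throws so that the
// caller can treat the response as an LLM failure and apply fallbackAction.
// `expected` is assumed to contain unique category names.
function parseScores(raw: string, expected: readonly string[]): Record<string, number> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error("LLM response is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("LLM response is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  // Same key count plus every expected key present means no extra keys.
  if (Object.keys(obj).length !== expected.length) {
    throw new Error("LLM response has missing or unexpected keys");
  }
  const scores: Record<string, number> = {};
  for (const key of expected) {
    const value = Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
    if (typeof value !== "number" || !isFinite(value) || value < 0 || value > 1) {
      throw new Error(`Invalid or missing score for ${key}`);
    }
    scores[key] = value;
  }
  return scores;
}
```

Free text around the JSON, such as "Sure! {...}", fails at the first step, which leaves an injected instruction with no channel to change the outcome other than the scores themselves.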

6. PII Anonymisation Before Inference

Although the local LLM runs inside our trust boundary, anonymisation is still recommended as defence-in-depth (and critical if external LLM failover is ever enabled). When ANONYMIZE_BODY_BEFORE_AI=true:

| Pattern | Replacement |
|---|---|
| E.164 phone numbers | [PHONE] |
| Monetary amounts | [AMOUNT] |
| 5+ digit sequences (OTPs, account numbers) | [NUMERIC] |
| Common first names (curated list) | [NAME] |
| URLs | [URL] (presence preserved for phishing detection) |
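The replacement table can be sketched as an ordered chain of substitutions. The regular expressions, the currency symbols and codes, and the function name are illustrative assumptions rather than the production patterns; the ordering matters because a phone number or an amount would otherwise be caught by the generic 5+ digit rule.

```typescript
// Illustrative patterns only; production rules would need locale-aware tuning.
const URL_RE = /https?:\/\/\S+/gi;
const PHONE_RE = /\+[1-9]\d{6,14}\b/g; // "+" followed by 7 to 15 digits
const AMOUNT_RE =
  /[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?|\b\d+(?:,\d{3})*(?:\.\d+)?\s?(?:USD|EUR|GBP|AFN)\b/g;
const NUMERIC_RE = /\b\d{5,}\b/g;
const WORD_RE = /[A-Za-z]+/g; // ASCII only; non-Latin names need Unicode-aware matching

// `firstNames` stands in for the curated list and must hold lower-case entries.
function anonymiseBody(body: string, firstNames: ReadonlySet<string>): string {
  return body
    .replace(URL_RE, "[URL]") // first, so digits inside URLs are not rewritten
    .replace(PHONE_RE, "[PHONE]") // before the generic digit rule
    .replace(AMOUNT_RE, "[AMOUNT]") // before the generic digit rule
    .replace(NUMERIC_RE, "[NUMERIC]")
    .replace(WORD_RE, (word) => (firstNames.has(word.toLowerCase()) ? "[NAME]" : word));
}
```

Replacing values with fixed placeholders also normalises templated messages such as OTPs to identical text, so they hash to the same value.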

7. Caching Strategy

AI classification is the slowest operation in compliance evaluation. Aggressive caching is critical:

| Cache Layer | Key | TTL | Hit Expectation |
|---|---|---|---|
| Redis L1 | ai:cache:{modelVersion}:{sha256(anonymised_body)} | 24 h | ≥ 95% for templated messages (OTPs, alerts, campaigns) |
| PostgreSQL L2 (future) | ai_classification_cache table | 7 d | ≥ 98% including cross-pod sharing |

Cache entry format

{
  "version": "1.0",
  "classifiedAt": "2026-04-18T12:00:00Z",
  "provider": "local",
  "model": "llama-3.1-8b-instruct-awq",
  "categories": {
    "SPAM": 0.12,
    "PHISHING": 0.87,
    ...
  }
}

Cache invalidation

  • Time-based TTL (24 h) is the primary mechanism.
  • On model version upgrade: cache key format is ai:cache:{modelVersion}:{sha256} — a model change implicitly bypasses cache without an explicit purge.
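A cache-aside lookup using the versioned key format could be sketched as follows. `CacheStore` is a hypothetical stand-in for the Redis client, hex encoding of the digest is an assumption, and `classify` represents whatever call produces a fresh classification.

```typescript
import { createHash } from "node:crypto";

interface CacheEntry {
  version: string;
  classifiedAt: string;
  provider: string;
  model: string;
  categories: Record<string, number>;
}

// Hypothetical minimal store interface; a Redis client wrapper would satisfy it.
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const CACHE_TTL_SECONDS = 24 * 60 * 60; // 24 h, the Redis L1 TTL

// ai:cache:{modelVersion}:{sha256}. Including the model version means an
// upgrade starts from an empty key space without an explicit purge.
function cacheKey(modelVersion: string, anonymisedBody: string): string {
  const digest = createHash("sha256").update(anonymisedBody, "utf8").digest("hex");
  return `ai:cache:${modelVersion}:${digest}`;
}

async function classifyCached(
  store: CacheStore,
  modelVersion: string,
  anonymisedBody: string,
  classify: () => Promise<CacheEntry>,
): Promise<CacheEntry> {
  const key = cacheKey(modelVersion, anonymisedBody);
  const hit = await store.get(key);
  if (hit !== null) return JSON.parse(hit) as CacheEntry;
  const fresh = await classify(); // a failure propagates and nothing is cached
  await store.set(key, JSON.stringify(fresh), CACHE_TTL_SECONDS);
  return fresh;
}
```

Old entries are never read after a model change and simply expire through their TTL.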

8. Budget and Timeout Control

| Control | Value | Purpose |
|---|---|---|
| AI_TIMEOUT_MS | 2000 ms (default for local LLM) | Hard limit for a single LLM call |
| Eval budget allocation | 300 ms of 450 ms total | Maximum AI spend within one evaluation |
| Concurrency limit | 200 in-flight LLM calls per pod | Prevent thundering herd on LLM service |
| Circuit breaker | 5 consecutive failures in 30 s opens circuit for 60 s | Shed load when LLM degraded |

When timeout or concurrency limits are reached, the rule's fallbackAction applies. For fail-closed operation, fallbackAction: HOLD is the recommended default for all AI rules: the message is queued for manual review rather than being let through.
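The circuit-breaker rule (5 consecutive failures in 30 s opens the circuit for 60 s) can be sketched as a small state machine. The class shape, the injectable clock, and the reading of "consecutive failures in 30 s" as "failures with no success in between, all inside the window" are assumptions; this is a single-provider sketch, not the breaker the router uses.

```typescript
class SimpleCircuitBreaker {
  private failureTimes: number[] = [];
  private openUntil = 0;

  constructor(
    private readonly threshold = 5,
    private readonly windowMs = 30_000,
    private readonly openMs = 60_000,
    private readonly now: () => number = Date.now, // injectable for tests
  ) {}

  // True while calls should be shed instead of sent to the LLM.
  isOpen(): boolean {
    return this.now() < this.openUntil;
  }

  // Any success breaks the run of consecutive failures.
  recordSuccess(): void {
    this.failureTimes = [];
  }

  recordFailure(): void {
    const t = this.now();
    // Keep only failures that are still inside the sliding window.
    this.failureTimes = this.failureTimes.filter((f) => t - f < this.windowMs);
    this.failureTimes.push(t);
    if (this.failureTimes.length >= this.threshold) {
      this.openUntil = t + this.openMs;
      this.failureTimes = [];
    }
  }
}
```

A caller would check `isOpen()` before each LLM call and apply the rule's fallbackAction (or fail over) while it returns true.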


9. Rule Configuration Examples

High-severity phishing detection (HOLD on AI unavailable)

{
  "name": "Phishing URL Detection",
  "ruleType": "AI_CLASSIFICATION",
  "action": "HOLD",
  "priority": 10,
  "config": {
    "categories": ["PHISHING", "MALWARE_LINK"],
    "minConfidence": 0.75,
    "fallbackAction": "HOLD"
  }
}

Enhanced spam detection (still HOLD on AI unavailable — fail-closed)

{
  "name": "Enhanced Spam Detection",
  "ruleType": "AI_CLASSIFICATION",
  "action": "FLAG",
  "priority": 50,
  "config": {
    "categories": ["SPAM"],
    "minConfidence": 0.85,
    "fallbackAction": "HOLD"
  }
}

National security — combined keyword + AI

{
  "name": "Terrorism Content (Combined)",
  "ruleType": "COMPOSITE",
  "action": "BLOCK",
  "priority": 1,
  "config": {
    "operator": "OR",
    "ruleIds": ["keyword-terror-rule-id", "ai-terrorism-rule-id"]
  }
}
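How these configurations might be evaluated is sketched below. The semantics are assumptions: an AI rule is taken to match when any listed category scores at or above minConfidence, a failed LLM call is represented as `null` scores, and the `Action` union lists only the values that appear in the examples.

```typescript
type Action = "HOLD" | "FLAG" | "BLOCK";

interface AiRule {
  name: string;
  action: Action;
  priority: number;
  config: { categories: string[]; minConfidence: number; fallbackAction: Action };
}

// Returns the action to apply, or null when the rule does not match.
// `scores` is null when the LLM call failed, timed out, or was shed.
function evaluateAiRule(rule: AiRule, scores: Record<string, number> | null): Action | null {
  if (scores === null) return rule.config.fallbackAction;
  const matched = rule.config.categories.some((category) => {
    const score = scores[category];
    return typeof score === "number" && score >= rule.config.minConfidence;
  });
  return matched ? rule.action : null;
}

// COMPOSITE rule with operator "OR": fires when any referenced rule fired.
function evaluateCompositeOr(ruleIds: readonly string[], firedRuleIds: readonly string[]): boolean {
  return ruleIds.some((id) => firedRuleIds.indexOf(id) !== -1);
}
```

With OR semantics the keyword rule alone can trigger the composite BLOCK, so an LLM outage does not disable the combined terrorism rule.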

10. Monitoring AI Usage

| Metric | Target |
|---|---|
| compliance_ai_cache_hits_total / compliance_ai_requests_total | ≥ 95% |
| compliance_ai_duration_seconds{quantile="0.95"} | ≤ 500 ms (local LLM) |
| compliance_ai_fallback_total | ≤ 0.1% of AI requests |
| Local LLM GPU utilisation | 40–70% (headroom for spikes) |

11. Cost Model

Local LLM (primary)

Assumptions:

  • 2× A10 GPU nodes, $0.60/hr each = $1.20/hr = ~$864/month
  • Throughput: 20 RPS sustained per pod, 40 RPS total
  • At 95% cache hit rate, 40 RPS supports ~800 RPS of AI-rule evaluations
  • Monthly capacity: ~2 billion AI evaluations at full sustained throughput (800 RPS × 2,592,000 s in a 30-day month)

Fixed cost model — scale cost by GPU capacity, not per-message. Cost-effective at volume.
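The arithmetic behind these assumptions can be reproduced with a few helper functions. The function names and the 30-day month are assumptions for illustration; the input figures are the ones listed above.

```typescript
const HOURS_PER_MONTH = 24 * 30; // 30-day month, as in the $864 figure
const SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600;

function monthlyGpuCostUsd(nodes: number, usdPerHour: number): number {
  return nodes * usdPerHour * HOURS_PER_MONTH;
}

// AI-rule evaluations per second that a given LLM throughput can back,
// given that only cache misses reach the LLM.
function supportedEvalRps(llmRps: number, cacheHitRate: number): number {
  return llmRps / (1 - cacheHitRate);
}

function monthlyEvalCapacity(evalRps: number): number {
  return evalRps * SECONDS_PER_MONTH;
}

function usdPerMillionEvals(monthlyCostUsd: number, monthlyEvals: number): number {
  return (monthlyCostUsd / monthlyEvals) * 1_000_000;
}

const cost = monthlyGpuCostUsd(2, 0.6); // about 864 USD
const evalRps = supportedEvalRps(40, 0.95); // about 800 RPS
const capacity = monthlyEvalCapacity(800); // 2,073,600,000 evaluations
const unitCost = usdPerMillionEvals(864, capacity); // about 0.42 USD per 1M
```

The unit cost holds only at full sustained utilisation; at lower traffic the same fixed cost is spread over fewer evaluations.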

Fallback external LLM (Claude Haiku — illustrative)

  • ~$21.60 per 1M messages evaluated (per prior estimate)
  • Used only on local LLM failover or per-tenant opt-in
  • Budget cap configurable via Anthropic SDK rate limits

12. Local LLM Operations Runbook (Summary)

| Task | Owner | Cadence |
|---|---|---|
| Model evaluation against labelled SMS dataset | Trust & Safety + ML | Quarterly |
| Model version upgrade (e.g., Llama 3.1 → 3.2) | Platform Engineering | As released, with A/B test first |
| GPU pod scaling review | Platform SRE | Monthly |
| Classification accuracy audit (precision/recall per category) | Trust & Safety | Monthly |
| Prompt tuning based on false-positive patterns | Trust & Safety + ML | Ongoing |
| Fine-tuning on platform-specific examples (future) | ML | 2026 Q4 |

13. Future Enhancements

| Enhancement | Rationale | Timeline |
|---|---|---|
| Fine-tuned SMS-domain classifier | Lower latency, higher accuracy for our specific traffic patterns | 2026 Q4 |
| Active learning: reviewer feedback updates training set | Continuous improvement | 2027 Q1 |
| Multilingual keyword generation via LLM | Auto-expand keyword lists for new languages | 2027 |
| Sender-reputation + content joint model | Reduce false positives on legitimate senders | 2027 |
| On-device inference for edge deployments | Regional compliance, offline resilience | 2027+ |