Compliance Layer — AI Integration

Status: populated | Last updated: 2026-04-18

1. Purpose

The Compliance Layer uses Large Language Models (LLMs) to perform content classification on SMS bodies when keyword and regex rules are insufficient. AI classification is applied only when an active AI_CLASSIFICATION rule is present in the tenant's rule set — most messages never invoke the LLM.

AI classification strengthens detection for:

Obfuscated spam (misspellings, unicode substitution, zero-width characters)
Context-dependent phishing (URLs + social engineering patterns)
Multilingual content not covered by keyword lists
Emerging fraud patterns that outpace keyword list updates

2. Provider Strategy — Local LLM First

Provider	Priority	Use Case
Local LLM (self-hosted)	Primary	Default for all production traffic; preferred for data residency, cost predictability, and no-DPA footprint
Anthropic Claude API	Secondary / failover	Optional — enabled per tenant or as automatic failover when local LLM is unavailable
OpenAI API	Tertiary / failover	Optional — secondary external failover
Mock provider	Dev/test only	In-memory responses for local development and CI

Provider selection is governed by the AI_PROVIDER environment variable and a failover chain. The async pipeline's relaxed latency SLA (500 ms) accommodates local LLM inference comfortably.
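As an illustration of how an ordered failover chain could be derived from configuration, here is a minimal sketch. Only AI_PROVIDER comes from this document; the AI_FAILOVER_PROVIDERS variable, the provider identifiers, and the rule that the mock provider is never a failover target are assumptions made for the example.

```typescript
// Hypothetical sketch: resolve an ordered provider chain from environment values.
// AI_PROVIDER is named in this document; AI_FAILOVER_PROVIDERS is an assumed name.
type ProviderName = "local" | "claude" | "openai" | "mock";

const KNOWN_PROVIDERS: readonly string[] = ["local", "claude", "openai", "mock"];

function isProviderName(value: string): value is ProviderName {
  return KNOWN_PROVIDERS.includes(value);
}

function resolveProviderChain(env: Record<string, string | undefined>): ProviderName[] {
  // Default to the local LLM when AI_PROVIDER is unset (assumption).
  const primary = (env["AI_PROVIDER"] ?? "local").trim().toLowerCase();
  if (!isProviderName(primary)) {
    throw new Error(`Unknown AI_PROVIDER: ${primary}`);
  }

  const failover = (env["AI_FAILOVER_PROVIDERS"] ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0);

  const chain: ProviderName[] = [primary];
  for (const name of failover) {
    if (!isProviderName(name)) {
      throw new Error(`Unknown failover provider: ${name}`);
    }
    // Skip duplicates, and never fail over to the dev/test mock provider.
    if (name !== "mock" && !chain.includes(name)) {
      chain.push(name);
    }
  }
  return chain;
}
```

Passing the environment as a parameter rather than reading it globally keeps the function easy to test.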

3. Local LLM Architecture

3.1 Deployment Topology

The local LLM runs as a separate Kubernetes deployment in the same namespace as compliance-engine, not as a sidecar (different scaling characteristics):

┌───────────────────────┐        ┌───────────────────────┐
│   compliance-engine   │ ─────► │      local-llm        │
│   (Node.js pods)      │ gRPC/  │   (GPU pods, OpenAI-  │
│   3–20 replicas       │ HTTP   │    compatible API)    │
│                       │        │   2–6 replicas        │
└───────────────────────┘        └───────────────────────┘
         │
         ▼
      Redis (AI cache)

3.2 Recommended Serving Stack

Component	Choice	Rationale
Inference server	vLLM (primary) or Ollama (dev) or TGI	vLLM provides OpenAI-compatible API + excellent throughput via PagedAttention
Model	Llama-3.1-8B-Instruct or Mistral-7B-Instruct or Qwen2.5-7B-Instruct	Small enough for cost-effective GPU use; strong multilingual (English, Dari/Farsi, Pashto, Arabic)
GPU	NVIDIA A10 / L4 (24 GB) or A100 (40 GB) for higher throughput	Smaller GPUs suffice for 7–8B models
Quantisation	AWQ or GPTQ 4-bit	~4× memory reduction with <1% quality loss on classification tasks
API interface	OpenAI-compatible (`/v1/chat/completions`)	Allows seamless provider abstraction — same client code as Claude/OpenAI

3.3 Capacity Planning

Assumptions:

7B model with 4-bit quantisation on A10 GPU
Typical request: 200 input tokens, 150 output tokens (structured JSON)
Single-GPU throughput: ~10–20 requests/second

Expected AI eval RPS	GPU pods	Cache hit target
1–5 RPS	2 (HA)	≥ 90%
5–20 RPS	2–4	≥ 95%
20–100 RPS	4–6	≥ 95%
100+ RPS	6+ with load balancing	≥ 97%

Cache hit rate is the dominant cost driver — 95% cache hit means only 5% of AI-rule evaluations reach the LLM.
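The relationship between evaluation traffic, cache hit rate, and GPU load can be sketched as simple arithmetic. The function names and the minimum of two pods for HA are assumptions for illustration. The result is a lower bound based on the stated per-GPU throughput and is not meant to reproduce the table above.

```typescript
// Sketch: estimate how much AI-rule traffic actually reaches the LLM,
// and a lower bound on GPU pods needed to serve it.

/** Requests per second that miss the cache and reach the LLM. */
function llmRequestsPerSecond(aiEvalRps: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  return aiEvalRps * (1 - cacheHitRate);
}

/** Lower bound on pod count; minPodsForHa = 2 is an assumed HA floor. */
function gpuPodsNeeded(llmRps: number, perPodRps: number, minPodsForHa = 2): number {
  if (perPodRps <= 0) {
    throw new RangeError("perPodRps must be positive");
  }
  return Math.max(minPodsForHa, Math.ceil(llmRps / perPodRps));
}

// 100 RPS of AI-rule evaluations at a 95% hit rate leaves about 5 RPS for the LLM.
const exampleLlmRps = llmRequestsPerSecond(100, 0.95);
// At the conservative end of 10 requests/second per GPU, the HA floor dominates.
const examplePods = gpuPodsNeeded(exampleLlmRps, 10);
```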

3.4 Provider Abstraction

compliance-engine implements a provider-agnostic LLMClient interface:

interface LLMClient {
  classify(body: string, categories: AiContentCategory[]): Promise<ClassificationResult>;
}

// Concrete implementations:
class LocalLLMClient implements LLMClient { /* vLLM endpoint */ }
class ClaudeClient   implements LLMClient { /* Anthropic SDK */ }
class OpenAIClient   implements LLMClient { /* OpenAI SDK */ }
class MockClient     implements LLMClient { /* deterministic test responses */ }

An LLMRouter selects the client based on AI_PROVIDER and handles failover:

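class LLMRouter implements LLMClient {
  async classify(body: string, categories: AiContentCategory[]) {
    let primaryError: unknown = new Error('primary LLM circuit is open');
    // Skip the primary while its circuit is open.
    if (!this.circuitBreaker.isOpenFor(this.primary)) {
      try {
        return await this.primary.classify(body, categories);
      } catch (err) {
        this.circuitBreaker.recordFailure();
        primaryError = err;
      }
    }
    // Primary failed or was skipped: fail over if a secondary is configured.
    if (this.secondary) {
      return await this.secondary.classify(body, categories);
    }
    throw primaryError; // caller applies rule.fallbackAction
  }
}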

4. Classification Categories

Category	Description
`SPAM`	Unsolicited commercial messaging, clickbait
`PHISHING`	Credential-harvesting URLs, fake bank/service pages, impersonation
`FINANCIAL_FRAUD`	Advance-fee fraud, fake investment schemes, money muling
`ADULT_CONTENT`	Sexually explicit content
`HATE_SPEECH`	Slurs, incitement against protected groups
`POLITICAL_CONTENT`	Political campaigning (relevant in election blackout periods)
`DRUG_REFERENCE`	Drug sales, narcotic-related content
`GAMBLING`	Gambling promotions in restricted jurisdictions
`TERRORISM`	Terrorist propaganda, recruitment, incitement
`MALWARE_LINK`	URLs pointing to known malware distribution, APK sideload prompts
`HEALTH_MISINFORMATION`	False health claims, unlicensed medical advice

Categories are evaluated independently — a single message can score high on multiple categories.
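Because each category is scored independently, a rule can be checked by comparing only the categories it watches against its threshold. The sketch below assumes a rule matches when a watched score is greater than or equal to minConfidence; the document does not state whether the comparison is inclusive, and the function name is hypothetical.

```typescript
// Sketch: the category list from the table above, as a TypeScript union type.
const AI_CONTENT_CATEGORIES = [
  "SPAM", "PHISHING", "FINANCIAL_FRAUD", "ADULT_CONTENT", "HATE_SPEECH",
  "POLITICAL_CONTENT", "DRUG_REFERENCE", "GAMBLING", "TERRORISM",
  "MALWARE_LINK", "HEALTH_MISINFORMATION",
] as const;

type AiContentCategory = (typeof AI_CONTENT_CATEGORIES)[number];
type CategoryScores = Record<AiContentCategory, number>;

/** Returns the watched categories whose score meets the threshold (inclusive; assumption). */
function matchedCategories(
  scores: CategoryScores,
  watched: readonly AiContentCategory[],
  minConfidence: number,
): AiContentCategory[] {
  return watched.filter((category) => scores[category] >= minConfidence);
}

// One message can score high on several categories at once.
const sampleScores: CategoryScores = {
  SPAM: 0.12, PHISHING: 0.87, FINANCIAL_FRAUD: 0.05, ADULT_CONTENT: 0,
  HATE_SPEECH: 0, POLITICAL_CONTENT: 0, DRUG_REFERENCE: 0, GAMBLING: 0,
  TERRORISM: 0, MALWARE_LINK: 0.65, HEALTH_MISINFORMATION: 0,
};
const strictMatches = matchedCategories(sampleScores, ["PHISHING", "MALWARE_LINK"], 0.75);
const looseMatches = matchedCategories(sampleScores, ["PHISHING", "MALWARE_LINK"], 0.6);
```

With these scores, the 0.75 threshold matches only PHISHING, while 0.6 matches both watched categories.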

5. Prompt Design

Single-turn, structured-output prompt. The LLM returns only a JSON object with confidence scores — no free-text reasoning, which minimises prompt-injection risk and token usage.

System:
You are an SMS compliance content classifier. Given an SMS message body,
return a JSON object mapping each of the following categories to a
confidence score between 0.0 and 1.0:
SPAM, PHISHING, FINANCIAL_FRAUD, ADULT_CONTENT, HATE_SPEECH,
POLITICAL_CONTENT, DRUG_REFERENCE, GAMBLING, TERRORISM, MALWARE_LINK,
HEALTH_MISINFORMATION.

A score of 1.0 means certain match; 0.0 means no indication.
Return ONLY the JSON object, no explanation or other text.

User:
[MESSAGE BODY HERE]

Expected response (enforced via grammar-constrained decoding on local LLM):

{
  "SPAM": 0.12,
  "PHISHING": 0.87,
  "FINANCIAL_FRAUD": 0.05,
  "ADULT_CONTENT": 0.0,
  "HATE_SPEECH": 0.0,
  "POLITICAL_CONTENT": 0.0,
  "DRUG_REFERENCE": 0.0,
  "GAMBLING": 0.0,
  "TERRORISM": 0.0,
  "MALWARE_LINK": 0.65,
  "HEALTH_MISINFORMATION": 0.0
}

Grammar-constrained decoding

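Local inference servers (vLLM, llama.cpp, etc.) support constrained decoding, which restricts generation to tokens that are valid under a JSON schema. A completed response is therefore always valid JSON of the expected shape. The remaining failure mode is a response cut off by the output token limit, which the response parser treats like any other schema mismatch.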
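A schema for the score object can be generated from the category list so that the prompt and the constraint never drift apart. This is a minimal sketch with hypothetical names. How the schema is handed to the inference server depends on the server and its version, so that step is deliberately left out.

```typescript
// Sketch: build a JSON Schema describing {"<CATEGORY>": number in [0, 1], ...}.
interface ScoreSchema {
  type: "object";
  properties: Record<string, { type: "number"; minimum: number; maximum: number }>;
  required: string[];
  additionalProperties: false;
}

function buildScoreSchema(categories: readonly string[]): ScoreSchema {
  const properties: ScoreSchema["properties"] = {};
  for (const category of categories) {
    // Every category is a required confidence score between 0.0 and 1.0.
    properties[category] = { type: "number", minimum: 0, maximum: 1 };
  }
  return {
    type: "object",
    properties,
    required: [...categories],
    additionalProperties: false, // no extra keys, so no free-text channel
  };
}

const exampleSchema = buildScoreSchema(["SPAM", "PHISHING", "MALWARE_LINK"]);
```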

Prompt injection resistance

User input (message body) lives in a clearly delimited user message.
System prompt restricts output to JSON — injection attempts have no free-text channel.
Response parser rejects any response not matching the expected schema — treated as LLM failure (fallback action applies).
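The strict parsing step described above might look like the following sketch. The function name is hypothetical, and returning null to signal an LLM failure is an assumption about how the caller triggers the fallback action.

```typescript
// Sketch: accept only a JSON object with exactly the expected categories,
// each mapped to a finite number in [0, 1]. Anything else counts as a failure.
function parseClassification(
  raw: string,
  categories: readonly string[],
): Record<string, number> | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (e.g. free text around the object)
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  const candidate = parsed as Record<string, unknown>;
  // Same key count plus every expected key present means no extra keys.
  if (Object.keys(candidate).length !== categories.length) {
    return null;
  }
  const scores: Record<string, number> = {};
  for (const category of categories) {
    const value = candidate[category];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 1) {
      return null;
    }
    scores[category] = value;
  }
  return scores;
}
```

A response such as `Sure! {"SPAM": 0.1}` fails at the JSON.parse step, so an injected instruction that makes the model add prose results in a rejected response rather than a trusted one.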

6. PII Anonymisation Before Inference

Although the local LLM runs inside our trust boundary, anonymisation is still recommended as defence-in-depth (and critical if external LLM failover is ever enabled). When ANONYMIZE_BODY_BEFORE_AI=true:

Pattern	Replacement
E.164 phone numbers	`[PHONE]`
Monetary amounts	`[AMOUNT]`
5+ digit sequences (OTPs, account numbers)	`[NUMERIC]`
Common first names (curated list)	`[NAME]`
URLs	`[URL]` (presence preserved for phishing detection)
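The replacement table above could be applied as an ordered series of regular-expression passes, as in this sketch. The specific patterns, the currency symbols and codes, and the order of the passes are assumptions for illustration: URLs go first so digits inside them are not rewritten, and phone numbers go before the generic digit rule.

```typescript
// Sketch: ordered anonymisation passes. All patterns are illustrative assumptions.
const URL_PATTERN = /\bhttps?:\/\/\S+/gi;
// "+" followed by 7 to 15 digits, first digit non-zero (minimum length is assumed).
const E164_PATTERN = /\+[1-9]\d{6,14}(?!\d)/g;
// Currency symbol before the number, or an ISO-style code after it (assumed list).
const AMOUNT_PATTERN =
  /[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?|\b\d+(?:,\d{3})*(?:\.\d+)?\s?(?:USD|EUR|AFN)\b/g;
const LONG_DIGITS_PATTERN = /\d{5,}/g;

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function anonymiseBody(body: string, firstNames: readonly string[]): string {
  let result = body
    .replace(URL_PATTERN, "[URL]")
    .replace(E164_PATTERN, "[PHONE]")
    .replace(AMOUNT_PATTERN, "[AMOUNT]")
    .replace(LONG_DIGITS_PATTERN, "[NUMERIC]");
  if (firstNames.length > 0) {
    // Case-sensitive whole-word match against the curated list.
    const namePattern = new RegExp(`\\b(?:${firstNames.map(escapeRegExp).join("|")})\\b`, "g");
    result = result.replace(namePattern, "[NAME]");
  }
  return result;
}

const anonymised = anonymiseBody(
  "Hi Ahmad, your code is 482913. Pay $1,250.00 at https://pay.example.com/x?id=99999 or call +93701234567",
  ["Ahmad"],
);
// anonymised === "Hi [NAME], your code is [NUMERIC]. Pay [AMOUNT] at [URL] or call [PHONE]"
```

Because templated messages differ mostly in these fields, anonymising before hashing also lets them share a cache entry.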

7. Caching Strategy

AI classification is the slowest operation in compliance evaluation. Aggressive caching is critical:

Cache Layer	Key	TTL	Hit Expectation
Redis L1	`ai:cache:{modelVersion}:{sha256(anonymised_body)}`	24 h	≥ 95% for templated messages (OTPs, alerts, campaigns)
PostgreSQL L2 (future)	`ai_classification_cache` table	7 d	≥ 98% including cross-pod sharing

Cache entry format

{
  "version": "1.0",
  "classifiedAt": "2026-04-18T12:00:00Z",
  "provider": "local",
  "model": "llama-3.1-8b-instruct-awq",
  "categories": {
    "SPAM": 0.12,
    "PHISHING": 0.87,
    ...
  }
}

Cache invalidation

Time-based TTL (24 h) is the primary mechanism.
On model version upgrade: cache key format is ai:cache:{modelVersion}:{sha256} — a model change implicitly bypasses cache without an explicit purge.
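A key in the ai:cache:{modelVersion}:{sha256} format can be built with Node's built-in crypto module, as sketched below. The function name is hypothetical, and hashing the anonymised body as UTF-8 with a hex digest is an assumption about encoding.

```typescript
import { createHash } from "node:crypto";

// Sketch: build the Redis key for a classification result.
// Including the model version means a model upgrade simply stops hitting old entries.
function aiCacheKey(modelVersion: string, anonymisedBody: string): string {
  const digest = createHash("sha256").update(anonymisedBody, "utf8").digest("hex");
  return `ai:cache:${modelVersion}:${digest}`;
}

const keyOld = aiCacheKey("llama-3.1-8b-instruct-awq", "Your code is [NUMERIC]");
const keyNew = aiCacheKey("hypothetical-next-model", "Your code is [NUMERIC]");
// Same body, different model version: different keys, so no explicit purge is needed.
```

Entries written under the old model version are never read again and expire through the normal 24 h TTL.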

8. Budget and Timeout Control

Control	Value	Purpose
`AI_TIMEOUT_MS`	2000 ms (default for local LLM)	Hard limit for a single LLM call
Eval budget allocation	300 ms of 450 ms total	Maximum AI spend within one evaluation
Concurrency limit	200 in-flight LLM calls per pod	Prevent thundering herd on LLM service
Circuit breaker	5 consecutive failures in 30 s opens circuit for 60 s	Shed load when LLM degraded

When a timeout or concurrency limit is reached, the rule's fallbackAction applies. For fail-closed operation, fallbackAction: HOLD is the recommended default for all AI rules: the message is queued for manual review rather than being let through.
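The circuit breaker row in the table above (5 consecutive failures in 30 s opens the circuit for 60 s) can be sketched as a small class. This is a simplified single-provider version with hypothetical method names; it does not model the per-provider isOpenFor call used by the router, and the injectable clock exists only to make the behaviour testable.

```typescript
// Sketch: consecutive-failure circuit breaker with a sliding time window.
class SimpleCircuitBreaker {
  private failureTimes: number[] = [];
  private openUntil = 0;

  constructor(
    private readonly threshold = 5,
    private readonly windowMs = 30000,
    private readonly openMs = 60000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** True while calls should be shed and the fallback action applied. */
  isOpen(): boolean {
    return this.now() < this.openUntil;
  }

  /** Any success breaks the run of consecutive failures. */
  recordSuccess(): void {
    this.failureTimes = [];
  }

  recordFailure(): void {
    const t = this.now();
    // Keep only failures that are still inside the window, then add this one.
    this.failureTimes = this.failureTimes.filter((ts) => t - ts < this.windowMs);
    this.failureTimes.push(t);
    if (this.failureTimes.length >= this.threshold) {
      this.openUntil = t + this.openMs;
      this.failureTimes = [];
    }
  }
}
```

A caller would check isOpen() before each LLM call and report the outcome with recordSuccess() or recordFailure().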

9. Rule Configuration Examples

High-severity phishing detection (HOLD on AI unavailable)

{
  "name": "Phishing URL Detection",
  "ruleType": "AI_CLASSIFICATION",
  "action": "HOLD",
  "priority": 10,
  "config": {
    "categories": ["PHISHING", "MALWARE_LINK"],
    "minConfidence": 0.75,
    "fallbackAction": "HOLD"
  }
}

Enhanced spam detection (still HOLD on AI unavailable — fail-closed)

{
  "name": "Enhanced Spam Detection",
  "ruleType": "AI_CLASSIFICATION",
  "action": "FLAG",
  "priority": 50,
  "config": {
    "categories": ["SPAM"],
    "minConfidence": 0.85,
    "fallbackAction": "HOLD"
  }
}

National security — combined keyword + AI

{
  "name": "Terrorism Content (Combined)",
  "ruleType": "COMPOSITE",
  "action": "BLOCK",
  "priority": 1,
  "config": {
    "operator": "OR",
    "ruleIds": ["keyword-terror-rule-id", "ai-terrorism-rule-id"]
  }
}
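One way a COMPOSITE rule could combine the outcomes of its child rules is sketched below. Only the OR operator appears in this document; the AND operator, the treatment of unknown rule IDs as non-matching, and the function name are assumptions.

```typescript
// Sketch: combine child rule outcomes for a COMPOSITE rule.
type CompositeOperator = "OR" | "AND"; // "AND" is assumed; only "OR" is documented

interface CompositeConfig {
  operator: CompositeOperator;
  ruleIds: string[];
}

function evaluateComposite(
  config: CompositeConfig,
  childMatches: ReadonlyMap<string, boolean>,
): boolean {
  if (config.ruleIds.length === 0) {
    return false; // an empty composite never matches
  }
  // A child rule with no recorded outcome is treated as not matched (assumption).
  const results = config.ruleIds.map((id) => childMatches.get(id) ?? false);
  return config.operator === "OR"
    ? results.some((matched) => matched)
    : results.every((matched) => matched);
}

const terrorismRule: CompositeConfig = {
  operator: "OR",
  ruleIds: ["keyword-terror-rule-id", "ai-terrorism-rule-id"],
};
// The keyword rule matched even though the AI rule did not: OR still triggers BLOCK.
const compositeMatched = evaluateComposite(
  terrorismRule,
  new Map([["keyword-terror-rule-id", true], ["ai-terrorism-rule-id", false]]),
);
```

With OR, the keyword rule continues to protect the highest-priority category even when the AI rule produces no match.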

10. Monitoring AI Usage

Metric	Target
`compliance_ai_cache_hits_total / compliance_ai_requests_total`	≥ 95%
`compliance_ai_duration_seconds{quantile="0.95"}`	≤ 500 ms (local LLM)
`compliance_ai_fallback_total`	≤ 0.1% of AI requests
Local LLM GPU utilisation	40–70% (headroom for spikes)

11. Cost Model

Local LLM (primary)

Assumptions:

2× A10 GPU nodes, $0.60/hr each = $1.20/hr = ~$864/month
Throughput: 20 RPS sustained per pod, 40 RPS total
At 95% cache hit rate, 40 RPS supports ~800 RPS of AI-rule evaluations
Monthly capacity: ~2 billion AI evaluations (800 RPS × 2,592,000 s in a 30-day month ≈ 2.07 billion) at sustained full utilisation, before any headroom

Fixed cost model — scale cost by GPU capacity, not per-message. Cost-effective at volume.
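The arithmetic behind these assumptions can be reproduced with a short calculation. The function and field names are hypothetical, and a 30-day month (720 hours) is assumed, matching the $864 figure above.

```typescript
// Sketch: reproduce the local LLM cost and capacity figures from the stated assumptions.
interface LocalLlmCostInput {
  gpuNodes: number;       // number of GPU nodes
  hourlyRateUsd: number;  // cost per node per hour
  rpsPerNode: number;     // sustained LLM requests/second per node
  cacheMissRate: number;  // 1 - cache hit rate, e.g. 0.05 for a 95% hit rate
  hoursPerMonth: number;  // 720 for a 30-day month
}

function localLlmCostModel(input: LocalLlmCostInput) {
  const monthlyCostUsd = input.gpuNodes * input.hourlyRateUsd * input.hoursPerMonth;
  const llmRps = input.gpuNodes * input.rpsPerNode;
  // Only cache misses reach the LLM, so supported evaluation traffic is larger.
  const evalRps = llmRps / input.cacheMissRate;
  const monthlyEvaluations = evalRps * input.hoursPerMonth * 3600;
  const usdPerMillionEvaluations = monthlyCostUsd / (monthlyEvaluations / 1_000_000);
  return { monthlyCostUsd, llmRps, evalRps, monthlyEvaluations, usdPerMillionEvaluations };
}

const costEstimate = localLlmCostModel({
  gpuNodes: 2,
  hourlyRateUsd: 0.6,
  rpsPerNode: 20,
  cacheMissRate: 0.05,
  hoursPerMonth: 720,
});
```

With the figures above this gives roughly $864 per month, 800 RPS of AI-rule evaluations, and about 2.07 billion evaluations per month. The resulting cost of roughly $0.42 per million evaluations holds only at sustained full utilisation, so real traffic will cost more per message.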

Fallback external LLM (Claude Haiku — illustrative)

~$21.60 per 1M messages evaluated (per prior estimate)
Used only on local LLM failover or per-tenant opt-in
Budget cap configurable via Anthropic API rate limits

12. Local LLM Operations Runbook (Summary)

Task	Owner	Cadence
Model evaluation against labelled SMS dataset	Trust & Safety + ML	Quarterly
Model version upgrade (e.g., Llama 3.1 → 3.2)	Platform Engineering	As released, with A/B test first
GPU pod scaling review	Platform SRE	Monthly
Classification accuracy audit (precision/recall per category)	Trust & Safety	Monthly
Prompt tuning based on false-positive patterns	Trust & Safety + ML	Ongoing
Fine-tuning on platform-specific examples (future)	ML	2026 Q4

13. Future Enhancements

Enhancement	Rationale	Timeline
Fine-tuned SMS-domain classifier	Lower latency, higher accuracy for our specific traffic patterns	2026 Q4
Active learning — reviewer feedback updates training set	Continuous improvement	2027 Q1
Multilingual keyword generation via LLM	Auto-expand keyword lists for new languages	2027
Sender-reputation + content joint model	Reduce false positives on legitimate senders	2027
On-device inference for edge deployments	Regional compliance, offline resilience	2027+
