
Consent Ledger Service — AI Integration

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: SECURITY_MODEL · APPLICATION_LOGIC · SERVICE_RISK_REGISTER

1. Posture: AI is minimal and offline-only

consent-ledger-service is intentionally an AI-light service. Consent decisions, audit integrity, DND mirroring, and STOP-keyword matching are all deterministic and rule-based. AI is limited to the two narrowly scoped advisory roles described below, and only the first (the offline keyword suggester) is active in v1. Neither use case touches the CheckConsent hot path, and neither sends MSISDN, MO body, audit content, or any other PII to a cloud LLM or third-party model API.

Non-use guarantees (explicit)

  • No cloud LLM, ever, for PII-bearing content. The service does not call Anthropic, OpenAI, Google, or any other cloud LLM with subscriber MSISDN, MO body, consent records, audit rows, or false-positive feedback. This is enforced by the egress NetworkPolicy (see DEPLOYMENT_TOPOLOGY §1 NetworkPolicy), which allow-lists only Postgres, Redis, NATS, ATRA SFTP, Vault, and the on-cluster compliance-ai LLM service.
  • No real-time AI in CheckConsent. The hot path is pure SQL + Redis with a 5 ms P95 budget; injecting AI here would violate the SLA and the determinism the regulator expects.
  • No AI authoring of audit rows. Every consent.audit row is produced from deterministic state changes; AI does not draft or summarise audit content.
  • No PII leaves Afghanistan, per ADR-0004 §3 and the consent residency invariant. Even the on-cluster compliance-ai runs in Afghan regions, and no model weights are pulled at runtime from offshore.

2. AI use case A — STOP-keyword variant suggestion (offline batch, advisory)

2.1 Purpose

Help Trust & Safety admins discover new STOP keyword variants as they appear in the wild — slang, dialect shifts, transliterations, ZWJ-bracketed obfuscations. Output is a suggestion list for human review (CONS-US-009 §1 admin-driven catalog). The model never auto-adds keywords.

2.2 Topology

  • Runs as a scheduled Kubernetes CronJob, consent-keyword-suggester, weekly on Sundays at 04:00 Asia/Kabul.
  • Reads consent.false_positive_feedback rows and consent.audit rows for STOP_MO_RECEIVED events from the past 30 days where tenantsRevoked is empty, i.e. the MO matched nothing in the catalog but arrived from the same MSISDN within ±60 s of a successful STOP.
  • The model is the same on-cluster compliance-ai deployment that compliance-engine uses (vLLM serving Llama-3.1-8B-Instruct-AWQ; see compliance-engine AI_INTEGRATION §3). No new infra.

2.3 Input redaction

Before any token leaves consent-ledger-service for the LLM:

| Pattern | Replacement |
| --- | --- |
| MSISDN | [PHONE] |
| Tenant identifier (sender ID) | [SENDER] |
| Numeric sequences (≥ 5 digits, likely OTP) | [NUMERIC] |
| URLs | [URL] |
| Names matching the curated PS/DR/AR/EN name list | [NAME] |

The redactor is a strict deny-by-default pipeline in services/consent-ledger-service/src/ai/redactor.ts: only text that has passed through it may reach the LLM client, and an ESLint rule forbids calling the LLM client with raw input.
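
The replacement table above can be sketched as a small pure function. This is a minimal illustration, not the contents of redactor.ts: the function name `redact`, the `RedactorLists` shape, and the MSISDN pattern (a +93 / 0093 / 0 prefix followed by a 9-digit number starting with 7) are assumptions made for the example, and names are matched only as whole whitespace-delimited tokens.

```typescript
// Hypothetical inputs: the real sender-ID and curated name lists live elsewhere.
interface RedactorLists {
  senderIds: string[]; // tenant identifiers (sender IDs)
  names: string[];     // curated PS/DR/AR/EN name list
}

// Escape a literal string so it can be embedded in a RegExp.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function redact(body: string, lists: RedactorLists): string {
  let out = body;

  // URLs first, because they may contain digit runs of their own.
  out = out.replace(/\b(?:https?:\/\/|www\.)\S+/gi, "[URL]");

  // ASSUMPTION: MSISDN written as +93 / 0093 / 0, then 7 and eight more digits.
  out = out.replace(/(?:\+93|0093|0)7\d{8}(?!\d)/g, "[PHONE]");

  // Sender IDs, matched case-insensitively as literal substrings.
  for (const id of lists.senderIds) {
    if (id.length === 0) continue; // an empty pattern would match everywhere
    out = out.replace(new RegExp(escapeRegExp(id), "gi"), "[SENDER]");
  }

  // Any remaining run of 5 or more digits is treated as a likely OTP.
  out = out.replace(/\d{5,}/g, "[NUMERIC]");

  // Names: compare each whitespace-delimited token against the curated list.
  const lowered = lists.names.map((n) => n.toLowerCase());
  out = out
    .split(/(\s+)/) // the capture group keeps the original whitespace
    .map((tok) => (lowered.indexOf(tok.toLowerCase()) >= 0 ? "[NAME]" : tok))
    .join("");

  return out;
}
```

The order matters in this sketch: URLs and MSISDNs are replaced before the generic numeric rule so that a phone number is tagged [PHONE] rather than [NUMERIC].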

2.4 Prompt (single-turn, JSON-constrained output)

System:
You are an SMS opt-out keyword variant detector for a national SMS gateway in
Afghanistan. The gateway recognises these keywords (per language) as opt-out
signals: <CATALOG_DUMP>. Given a list of recently received SMS bodies that did
NOT match the catalog but were sent by subscribers who later issued a confirmed
STOP, propose up to 10 candidate keywords per language that should be added.

For each candidate, return:
- keyword (NFKC-normalised, lowercase)
- language (EN | DR | PS | AR)
- evidence_count (number of inputs that contained it)
- example_redacted (one example body with PII redacted)
- confidence (0.0 - 1.0)

Reply with ONLY the JSON object {candidates: [...]}, no explanation.

User:
<REDACTED INPUTS, 1 PER LINE>

vLLM grammar-constrained decoding enforces the JSON shape. A response that still fails to parse is dropped, and the drop is counted in the metric consent_ai_keyword_suggester_parse_error_total.
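
A defensive parse on the consumer side might look like the sketch below. It assumes the field names from the prompt above; the function names and the in-process counter standing in for consent_ai_keyword_suggester_parse_error_total are hypothetical, and treating a shape mismatch the same way as a JSON syntax error is an assumption of this example.

```typescript
type Language = "EN" | "DR" | "PS" | "AR";

interface Candidate {
  keyword: string;
  language: Language;
  evidence_count: number;
  example_redacted: string;
  confidence: number;
}

const LANGUAGES: string[] = ["EN", "DR", "PS", "AR"];

// Stand-in for the Prometheus counter named in the text.
let parseErrorTotal = 0;

// Runtime check that an arbitrary value has the shape requested in the prompt.
function isCandidate(v: unknown): v is Candidate {
  if (typeof v !== "object" || v === null) return false;
  const c = v as Record<string, unknown>;
  const keyword = c["keyword"];
  const language = c["language"];
  const evidence = c["evidence_count"];
  const example = c["example_redacted"];
  const confidence = c["confidence"];
  return (
    typeof keyword === "string" && keyword.length > 0 &&
    typeof language === "string" && LANGUAGES.indexOf(language) >= 0 &&
    typeof evidence === "number" && evidence >= 0 && Math.floor(evidence) === evidence &&
    typeof example === "string" &&
    typeof confidence === "number" && confidence >= 0 && confidence <= 1
  );
}

// Returns the candidates, or null (and bumps the counter) if the response is unusable.
function parseSuggesterResponse(raw: string): Candidate[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const candidates = (parsed as Record<string, unknown>)["candidates"];
      if (Array.isArray(candidates) && candidates.every(isCandidate)) {
        return candidates as Candidate[];
      }
    }
  } catch (err) {
    // Fall through: a syntax error is handled the same way as a bad shape.
  }
  parseErrorTotal += 1;
  return null;
}
```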

2.5 HITL (human-in-the-loop)

  1. Output written to consent_keyword_suggestions (a non-DDL workshop table, not part of consent schema).
  2. A daily Slack/Email digest to T&S leads lists the top 20 candidates.
  3. T&S admin reviews each suggestion in the admin dashboard. Approval triggers POST /v1/admin/consent/stop-keywords with attribution addedBy = AI_SUGGESTED_REVIEWED_BY:{userId}.
  4. Rejected candidates feed back as negative examples for the next run.

No keyword is ever added to the catalog without explicit human approval. AI's role is candidate generation; the human is the decision authority.

3. AI use case B — Multi-language NLU enhancement (deferred to Phase 2)

3.1 Purpose (deferred)

Recognise free-text natural-language opt-outs that current keyword matching misses (e.g., "stop sending me messages please" or its Pashto/Dari/Arabic equivalents).

3.2 Status

Out of scope for v1. The risk of false-positive opt-outs from misclassification is too high to deploy without significant red-team validation. Acceptance bar for Phase 2:

  • ≥ 99.5% precision on a Trust & Safety-curated 10,000-message labelled dataset (per language).
  • ≥ 95% recall on the same dataset.
  • < 50 ms P95 inference latency on the on-cluster LLM (so it could be added to the STOP MO consumer without breaching the 2 s end-to-end SLA).
  • Dual-track verification: any AI-detected opt-out is held for 60 s and commits only if the subscriber does not rescind it in that window, which gives subscribers a "wait, no, undo" opportunity.

3.3 Architecture (when activated)

If activated in Phase 2, the consumer would:

  1. Run keyword match first (deterministic, current behaviour). If matched, no AI invocation.
  2. On no match, send the redacted body to the local LLM with a classification prompt ({INTENT: STOP|UNSUBSCRIBE|OTHER, confidence}).
  3. If INTENT == STOP && confidence >= 0.9, place the revoke into a "pending NLU revocation" queue with 60 s defer.
  4. After 60 s with no further MO from the same MSISDN, commit the revocation with verificationMethod = NLU_AI_REVIEWED (a new method that consumers can treat differently).
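
Steps 3 and 4 can be sketched as a small deferral queue. This is illustrative only: the class and method names are hypothetical, the in-memory map stands in for whatever durable store a real consumer would need, and the clock is passed in explicitly so the behaviour is easy to test. How the further MO itself is processed is not specified above and is left out.

```typescript
type Intent = "STOP" | "UNSUBSCRIBE" | "OTHER";

interface Classification {
  intent: Intent;
  confidence: number;
}

const DEFER_MS = 60000;     // 60 s hold before commit
const MIN_CONFIDENCE = 0.9; // threshold from step 3

class PendingNluRevocations {
  // msisdn -> timestamp (ms) at which the revocation may be committed
  private pending: { [msisdn: string]: number } = {};

  // Step 3: queue the revoke only for a confident STOP classification.
  onClassified(msisdn: string, c: Classification, nowMs: number): boolean {
    if (c.intent === "STOP" && c.confidence >= MIN_CONFIDENCE) {
      this.pending[msisdn] = nowMs + DEFER_MS;
      return true;
    }
    return false;
  }

  // A further MO from the same MSISDN inside the window cancels the pending revoke.
  onFurtherMo(msisdn: string): void {
    delete this.pending[msisdn];
  }

  // Step 4: return (and remove) every MSISDN whose 60 s window has elapsed.
  // The caller would commit these with verificationMethod = NLU_AI_REVIEWED.
  dueForCommit(nowMs: number): string[] {
    const due: string[] = [];
    for (const msisdn of Object.keys(this.pending)) {
      if (this.pending[msisdn] <= nowMs) {
        due.push(msisdn);
        delete this.pending[msisdn];
      }
    }
    return due;
  }
}
```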

This use case is documented here for forward-compatibility; it ships disabled in v1.

4. AI provenance

When AI is used (case A), every AI-suggested catalog entry carries these provenance fields:

| Field | Value |
| --- | --- |
| aiProvider | local-vllm |
| aiModel | e.g., llama-3.1-8b-instruct-awq |
| aiModelVersion | Semantic version + content hash |
| aiPromptTemplateVersion | e.g., keyword_suggester.v3 |
| aiInferredAt | RFC 3339 UTC |
| aiConfidenceScore | 0.0–1.0 |
| humanReviewedBy | UUID of the admin who approved |
| humanReviewedAt | RFC 3339 UTC |

Provenance is stored in consent.stop_keywords.metadata.ai_provenance (JSON). The audit row for the keyword addition (KEYWORD_CATALOG_CHANGED) embeds the same provenance, so a regulator query can trace any catalog entry back to the model, prompt, and reviewer.
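
As a rough illustration, the provenance fields could be typed as below. The interface and helper names are hypothetical, and marking the two human-review fields optional (so that a suggestion can exist before it is reviewed) is an assumption of this sketch rather than something stated above.

```typescript
interface AiProvenance {
  aiProvider: "local-vllm";
  aiModel: string;                 // e.g. "llama-3.1-8b-instruct-awq"
  aiModelVersion: string;          // semantic version + content hash
  aiPromptTemplateVersion: string; // e.g. "keyword_suggester.v3"
  aiInferredAt: string;            // RFC 3339 UTC
  aiConfidenceScore: number;       // 0.0 to 1.0
  humanReviewedBy?: string;        // UUID of the approving admin (ASSUMED optional until review)
  humanReviewedAt?: string;        // RFC 3339 UTC (ASSUMED optional until review)
}

// Stamp the human-review fields at approval time without mutating the input.
// Date.prototype.toISOString() yields a UTC timestamp that is valid RFC 3339.
function withHumanReview(
  p: AiProvenance,
  reviewerId: string,
  reviewedAt: Date
): AiProvenance {
  return {
    ...p,
    humanReviewedBy: reviewerId,
    humanReviewedAt: reviewedAt.toISOString(),
  };
}

// Placeholder values only; none of these are real identifiers.
const inferred: AiProvenance = {
  aiProvider: "local-vllm",
  aiModel: "llama-3.1-8b-instruct-awq",
  aiModelVersion: "<semver>+<content-hash>",
  aiPromptTemplateVersion: "keyword_suggester.v3",
  aiInferredAt: new Date(0).toISOString(),
  aiConfidenceScore: 0.8,
};
```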

5. Moderation policy

The on-cluster LLM is the same model + system prompt used by compliance-engine's content classifier; its safety posture inherits from that service's moderation envelope (no harmful generation, JSON-only output, prompt-injection resistance via constrained decoding).

consent-ledger-service adds these specific moderation guards:

  • Output filter: Suggestions matching any platform-default keyword are dropped silently (already in catalog).
  • Length cap: Each suggested keyword ≤ 32 chars; longer suggestions dropped.
  • Script consistency: Each suggestion's script must match its declared language (Latin for EN; Arabic-script for DR/PS/AR). Mixed-script suggestions dropped.
  • Profanity filter: Suggestions matching the platform profanity list are dropped — opt-out keywords should not double as slurs.
  • Rate cap: ≤ 50 candidate suggestions per run, regardless of LLM output length.
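
The guards above compose naturally into a single filter. The sketch below is one possible shape, under stated assumptions: the function and type names are hypothetical, script detection only distinguishes basic Latin letters from the main Arabic Unicode blocks, length is measured in UTF-16 code units, and keeping the highest-confidence suggestions when the rate cap applies is a choice made for this example (the text does not say which 50 are kept).

```typescript
type Language = "EN" | "DR" | "PS" | "AR";

interface Suggestion {
  keyword: string;
  language: Language;
  confidence: number;
}

const MAX_KEYWORD_CHARS = 32;
const MAX_SUGGESTIONS_PER_RUN = 50;

// Coarse script detection: basic Latin letters vs. the main Arabic blocks.
const LATIN = /[A-Za-z]/;
const ARABIC_SCRIPT = /[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]/;

// Latin for EN; Arabic script for DR/PS/AR; mixed-script keywords are rejected.
function scriptMatchesLanguage(s: Suggestion): boolean {
  const hasLatin = LATIN.test(s.keyword);
  const hasArabic = ARABIC_SCRIPT.test(s.keyword);
  if (hasLatin && hasArabic) return false;
  return s.language === "EN" ? hasLatin : hasArabic;
}

function applyModerationGuards(
  suggestions: Suggestion[],
  catalogKeywords: string[], // platform-default keywords already in the catalog
  profanityList: string[]    // platform profanity list
): Suggestion[] {
  const kept = suggestions.filter(
    (s) =>
      catalogKeywords.indexOf(s.keyword) < 0 &&  // output filter
      s.keyword.length <= MAX_KEYWORD_CHARS &&   // length cap
      scriptMatchesLanguage(s) &&                // script consistency
      profanityList.indexOf(s.keyword) < 0       // profanity filter
  );
  // Rate cap. ASSUMPTION: highest-confidence suggestions are kept first.
  return kept
    .slice()
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, MAX_SUGGESTIONS_PER_RUN);
}
```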

6. Observability

Metrics (Prometheus) for the keyword-suggester job:

| Metric | Type | Notes |
| --- | --- | --- |
| consent_ai_keyword_suggester_runs_total | Counter | Per status (success, failed, skipped_no_input) |
| consent_ai_keyword_suggester_candidates_total | Counter | Per language |
| consent_ai_keyword_suggester_accepted_total | Counter | After human review |
| consent_ai_keyword_suggester_latency_seconds | Histogram | Per run |
| consent_ai_keyword_suggester_parse_error_total | Counter | LLM responses that failed JSON parse |
| consent_ai_redactor_violations_total | Counter | Inputs that the redactor flagged for raw PII (CRITICAL: should be zero) |

Logs are redacted JSON. The prompt is logged only in its redacted form. The response JSON is logged in full because it carries no PII, only candidate keywords and already-redacted example bodies.

7. Cost & capacity model

The keyword-suggester is one weekly batch run with at most a few thousand input lines and a few KB of output. It uses < 1 GPU-minute per run. No incremental cost over the existing compliance-ai deployment.

8. Future enhancements (post-v1)

| Enhancement | Rationale | Timeline |
| --- | --- | --- |
| Phase 2 NLU opt-out (case B above) | Capture free-text opt-outs not in catalog | 2027 Q1 (subject to red-team validation) |
| Multi-language ack-back personalisation | Adjust ack-back template per dialect | 2027 Q2 |
| Anomaly detection on STOP rates per tenant | Flag surge as potential adversarial / spam-induced storm | 2027 Q1 |
| AI-assisted regulator query summarisation | Take a regulator question and propose the SQL/audit window | 2027 Q3, only with HITL |

All future enhancements remain bound by:

  • No PII to cloud LLM.
  • Human approval for any consent-state-affecting decision.
  • Deterministic primary path; AI is always advisory.