Consent Ledger Service — AI Integration
Version: 1.0 | Status: Draft | Owner: Trust & Safety | Last Updated: 2026-04-21 | Companion: SECURITY_MODEL · APPLICATION_LOGIC · SERVICE_RISK_REGISTER
1. Posture: AI is minimal and offline-only
consent-ledger-service is intentionally an AI-light service. Consent decisions, audit integrity, DND mirroring, and STOP-keyword matching are all deterministic and rule-based. AI is used only in two narrowly-scoped offline / advisory roles described below. Neither AI use case is on the hot path; neither AI use case sends MSISDN, MO body, audit content, or any other PII to any cloud LLM or third-party model API.
Non-use guarantees (explicit)
- No cloud LLM, ever, for PII-bearing content. The service does not call Anthropic, OpenAI, Google, or any other cloud LLM with subscriber MSISDN, MO body, consent records, audit rows, or false-positive feedback. This is enforced at the egress NetworkPolicy (see DEPLOYMENT_TOPOLOGY §1 NetworkPolicy) which whitelists only Postgres, Redis, NATS, ATRA SFTP, Vault, and the on-cluster compliance-ai LLM service.
- No real-time AI in CheckConsent. The hot path is pure SQL + Redis with a 5 ms P95 budget; injecting AI here would violate the SLA and the determinism the regulator expects.
- No AI authoring of audit rows. Every consent.audit row is produced from deterministic state changes; AI does not draft or summarise audit content.
- No PII leaves Afghanistan. Per ADR-0004 §3 and the consent residency invariant. Even the on-cluster compliance-ai runs in Afghan regions; no model weights are pulled at runtime from offshore.
2. AI use case A — STOP-keyword variant suggestion (offline batch, advisory)
2.1 Purpose
Help Trust & Safety admins discover new STOP keyword variants as they appear in the wild — slang, dialect shifts, transliterations, ZWJ-bracketed obfuscations. Output is a suggestion list for human review (CONS-US-009 §1 admin-driven catalog). The model never auto-adds keywords.
2.2 Topology
- Runs as a scheduled Kubernetes CronJob consent-keyword-suggester, weekly on Sundays at 04:00 Asia/Kabul.
- Reads consent.false_positive_feedback and consent.audit rows of STOP_MO_RECEIVED events from the past 30 days where tenantsRevoked is empty (that is, no match in the catalog yet, but a STOP was attempted on the same MSISDN within ±60 s of a successful STOP).
- The model is the same on-cluster compliance-ai deployment that compliance-engine uses (vLLM serving Llama-3.1-8B-Instruct-AWQ; see compliance-engine AI_INTEGRATION §3). No new infra.
2.3 Input redaction
Before any token leaves consent-ledger-service for the LLM:
| Pattern | Replacement |
|---|---|
| MSISDN | [PHONE] |
| Tenant identifier (sender ID) | [SENDER] |
| Numeric sequences (≥ 5 digits — likely OTP) | [NUMERIC] |
| URLs | [URL] |
| Names matching the curated PS/DR/AR/EN name list | [NAME] |
The redactor is a strict default-deny pipeline in services/consent-ledger-service/src/ai/redactor.ts; an ESLint rule forbids calling the LLM client with raw input.
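The replacement table can be read as an ordered list of rewrite rules. The following is a minimal sketch of that idea, not the contents of redactor.ts: the regexes, the assumed MSISDN shapes (+93, 0093 or a leading 0 followed by a 7 and eight more digits), and the function names are all illustrative assumptions. Sender IDs and names are matched only as whole whitespace-delimited tokens here, which is deliberately simplistic.

```typescript
// Illustrative sketch only; NOT the real redactor.ts.
// All patterns and names below are assumptions made for this example.

const URL_RE = /\b(?:https?:\/\/|www\.)\S+/gi;
// Assumed MSISDN shapes: +93, 0093 or a leading 0, then 7 and eight more digits.
const MSISDN_RE = /(?:\+93|0093|0)7\d{8}/g;
// Any remaining run of five or more digits (likely an OTP).
const NUMERIC_RE = /\d{5,}/g;

// Replaces whole whitespace-delimited tokens found in `tokens` (lowercase entries).
function replaceTokens(text: string, tokens: string[], replacement: string): string {
  return text
    .split(/(\s+)/) // the capture group keeps the separators, so spacing survives
    .map((tok) => (tokens.indexOf(tok.toLowerCase()) !== -1 ? replacement : tok))
    .join("");
}

function redact(body: string, senderIds: string[], names: string[]): string {
  // Order matters: URLs and phone numbers go first so the generic digit
  // rule does not rewrite part of them as [NUMERIC].
  let out = body.replace(URL_RE, "[URL]");
  out = out.replace(MSISDN_RE, "[PHONE]");
  out = out.replace(NUMERIC_RE, "[NUMERIC]");
  out = replaceTokens(out, senderIds, "[SENDER]");
  return replaceTokens(out, names, "[NAME]");
}
```

Running the more specific rules first is the main design point: a phone number is also a run of five or more digits, so the rule order decides which placeholder it receives.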
2.4 Prompt (single-turn, JSON-constrained output)
System:
You are an SMS opt-out keyword variant detector for a national SMS gateway in
Afghanistan. The gateway recognises these keywords (per language) as opt-out
signals: <CATALOG_DUMP>. Given a list of recently received SMS bodies that did
NOT match the catalog but were sent by subscribers who later issued a confirmed
STOP, propose up to 10 candidate keywords per language that should be added.
For each candidate, return:
- keyword (NFKC-normalised, lowercase)
- language (EN | DR | PS | AR)
- evidence_count (number of inputs that contained it)
- example_redacted (one example body with PII redacted)
- confidence (0.0 - 1.0)
Reply with ONLY the JSON object {candidates: [...]}, no explanation.
User:
<REDACTED INPUTS, 1 PER LINE>
vLLM grammar-constrained decoding enforces the JSON shape. A response that does not parse is dropped with a metric consent_ai_keyword_suggester_parse_error_total.
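Even with constrained decoding, the consumer of the reply still has to decide what counts as a parseable response. A possible validator is sketched below; the field list follows the prompt above, while the type and function names are hypothetical. A `null` result stands for "drop the reply and increment the parse-error counter", which is left to the caller.

```typescript
// Hypothetical validator for the {candidates: [...]} reply.
// Names are assumptions; the fields mirror the prompt in this section.

type Language = "EN" | "DR" | "PS" | "AR";

interface Candidate {
  keyword: string;
  language: Language;
  evidence_count: number;
  example_redacted: string;
  confidence: number;
}

const LANGUAGES: readonly string[] = ["EN", "DR", "PS", "AR"];

function isCandidate(v: unknown): v is Candidate {
  if (typeof v !== "object" || v === null) return false;
  const c = v as Record<string, unknown>;
  return (
    typeof c.keyword === "string" &&
    typeof c.language === "string" &&
    LANGUAGES.indexOf(c.language) !== -1 &&
    typeof c.evidence_count === "number" &&
    c.evidence_count >= 0 &&
    c.evidence_count % 1 === 0 && // whole number of supporting inputs
    typeof c.example_redacted === "string" &&
    typeof c.confidence === "number" &&
    c.confidence >= 0 &&
    c.confidence <= 1
  );
}

// Returns the candidates, or null when the reply must be dropped.
function parseSuggesterReply(raw: string): Candidate[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidates = (parsed as Record<string, unknown>).candidates;
  if (!Array.isArray(candidates) || !candidates.every(isCandidate)) return null;
  return candidates;
}
```

This sketch rejects the whole reply if any single candidate is malformed; dropping only the bad entries would be an equally defensible choice that the text does not settle.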
2.5 HITL (human-in-the-loop)
- Output written to consent_keyword_suggestions (a non-DDL workshop table, not part of consent schema).
- A daily Slack/Email digest to T&S leads lists the top 20 candidates.
- T&S admin reviews each suggestion in the admin dashboard. Approval triggers POST /v1/admin/consent/stop-keywords with attribution addedBy = AI_SUGGESTED_REVIEWED_BY:{userId}.
- Rejected candidates feed back as negative examples for the next run.
No keyword is ever added to the catalog without explicit human approval. AI's role is candidate generation; the human is the decision authority.
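As an illustration of the approval step, the sketch below builds the request that an approval might send. Only the endpoint path and the `addedBy` format come from this section; the body fields `keyword` and `language`, and every type and function name, are assumptions.

```typescript
// Sketch of the approval request. Only the path and the addedBy format are
// taken from the text; the other body fields are assumed for illustration.

interface ReviewedSuggestion {
  keyword: string;
  language: "EN" | "DR" | "PS" | "AR";
}

interface ApprovalRequest {
  method: "POST";
  path: "/v1/admin/consent/stop-keywords";
  body: { keyword: string; language: string; addedBy: string };
}

function buildApprovalRequest(s: ReviewedSuggestion, reviewerUserId: string): ApprovalRequest {
  // No reviewer, no catalog change: the human is the decision authority.
  if (reviewerUserId.trim() === "") {
    throw new Error("explicit human approval is required");
  }
  return {
    method: "POST",
    path: "/v1/admin/consent/stop-keywords",
    body: {
      keyword: s.keyword,
      language: s.language,
      addedBy: `AI_SUGGESTED_REVIEWED_BY:${reviewerUserId}`,
    },
  };
}
```

Making the reviewer ID a required argument means there is no code path that produces the request without naming a human.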
3. AI use case B — Multi-language NLU enhancement (deferred to Phase 2)
3.1 Purpose (deferred)
Recognise free-text natural-language opt-outs that current keyword matching misses (e.g., "stop sending me messages please" or its Pashto/Dari/Arabic equivalents).
3.2 Status
Out of scope for v1. The risk of false-positive opt-outs from misclassification is too high to deploy without significant red-team validation. Acceptance bar for Phase 2:
- ≥ 99.5% precision on a Trust & Safety-curated 10,000-message labelled dataset (per language).
- ≥ 95% recall on the same dataset.
- < 50 ms P95 inference latency on the on-cluster LLM (so it could be added to the STOP MO consumer without breaching the 2 s end-to-end SLA).
- Dual-track verification: any AI opt-out is held for 60 s and only commits if no human rescind arrives — gives subscribers a "wait, no, undo" window.
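To make the numeric thresholds concrete, the check below computes precision and recall from confusion-matrix counts and compares them, together with P95 latency, against the bar. It is a sketch under assumptions: the function names are hypothetical, and the counts in the usage note are invented purely to show how tight the precision bar is.

```typescript
// Hypothetical acceptance check for one language's labelled dataset.

interface EvalCounts {
  truePositives: number;  // real opt-outs the model flagged
  falsePositives: number; // non-opt-outs the model flagged
  falseNegatives: number; // real opt-outs the model missed
}

function precision(c: EvalCounts): number {
  const flagged = c.truePositives + c.falsePositives;
  return flagged === 0 ? 0 : c.truePositives / flagged;
}

function recall(c: EvalCounts): number {
  const actual = c.truePositives + c.falseNegatives;
  return actual === 0 ? 0 : c.truePositives / actual;
}

// Thresholds copied from the acceptance bar above.
function meetsPhase2Bar(c: EvalCounts, p95LatencyMs: number): boolean {
  return precision(c) >= 0.995 && recall(c) >= 0.95 && p95LatencyMs < 50;
}
```

For example, if a dataset contained 2,000 true opt-outs (an invented figure) and the model flagged 1,930 messages, nine wrong flags would give a precision of 1921/1930 ≈ 0.9953 and pass, while ten wrong flags would give 1920/1930 ≈ 0.9948 and fail.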
3.3 Architecture (when activated)
If activated in Phase 2, the consumer would:
- Run keyword match first (deterministic, current behaviour). If matched, no AI invocation.
- On no match, send the redacted body to the local LLM with a classification prompt ({INTENT: STOP|UNSUBSCRIBE|OTHER, confidence}).
- If INTENT == STOP && confidence >= 0.9, place the revoke into a "pending NLU revocation" queue with 60 s defer.
- After 60 s with no further MO from the same MSISDN, commit the revocation with verificationMethod = NLU_AI_REVIEWED (a new method that consumers can treat differently).
This use case is documented here for forward-compatibility; it ships disabled in v1.
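The deferred flow described in this section can be sketched as two pure functions: one that decides what to do with an incoming MO, and one that decides whether a pending revocation may commit. This is an illustration only; the names are hypothetical, the classifier is passed in as a synchronous callback for brevity (a real call to the LLM would be asynchronous), and nothing here exists in v1.

```typescript
// Hypothetical sketch of the Phase 2 decision flow; not part of v1.

type Intent = "STOP" | "UNSUBSCRIBE" | "OTHER";

interface Classification {
  intent: Intent;
  confidence: number;
}

type Decision =
  | { kind: "KEYWORD_MATCH" } // deterministic path, no AI invocation
  | { kind: "PENDING_NLU"; queuedAtMs: number; commitAtMs: number }
  | { kind: "NO_ACTION" };

const CONFIDENCE_THRESHOLD = 0.9;
const DEFER_MS = 60 * 1000;

function decideOptOut(
  keywordMatched: boolean,
  classify: () => Classification,
  nowMs: number
): Decision {
  if (keywordMatched) return { kind: "KEYWORD_MATCH" };
  const c = classify(); // only reached when the keyword match failed
  if (c.intent === "STOP" && c.confidence >= CONFIDENCE_THRESHOLD) {
    return { kind: "PENDING_NLU", queuedAtMs: nowMs, commitAtMs: nowMs + DEFER_MS };
  }
  return { kind: "NO_ACTION" };
}

// Commits only after the defer window, and only if no further MO from the
// same MSISDN arrived after the revocation was queued.
function shouldCommit(
  pending: { queuedAtMs: number; commitAtMs: number },
  lastMoAtMs: number | null,
  nowMs: number
): boolean {
  const quiet = lastMoAtMs === null || lastMoAtMs <= pending.queuedAtMs;
  return nowMs >= pending.commitAtMs && quiet;
}
```

Passing the classifier as a callback keeps the guarantee visible in the code: when the keyword match succeeds, the function returns before the model is ever called.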
4. AI provenance
When AI is used (case A), every record carries provenance fields:
| Field | Value |
|---|---|
| aiProvider | local-vllm |
| aiModel | e.g., llama-3.1-8b-instruct-awq |
| aiModelVersion | Semantic version + content hash |
| aiPromptTemplateVersion | e.g., keyword_suggester.v3 |
| aiInferredAt | RFC 3339 UTC |
| aiConfidenceScore | 0.0–1.0 |
| humanReviewedBy | UUID of the admin who approved |
| humanReviewedAt | RFC 3339 UTC |
Provenance is stored on consent.stop_keywords.metadata.ai_provenance (JSON). The audit row for the keyword addition (KEYWORD_CATALOG_CHANGED) embeds the same provenance, so a regulator query can trace any catalog entry back to the model + prompt + reviewer.
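The provenance fields map naturally onto a typed record. The sketch below shows one possible shape, split into the part known at inference time and the part added at review time; the interface and function names are assumptions, and the field names are taken from the table.

```typescript
// Hypothetical typing of the ai_provenance JSON; field names follow the table.

interface AiInference {
  aiProvider: "local-vllm";
  aiModel: string;
  aiModelVersion: string; // semantic version + content hash
  aiPromptTemplateVersion: string;
  aiInferredAt: string; // RFC 3339 UTC
  aiConfidenceScore: number; // 0.0 to 1.0
}

interface AiProvenance extends AiInference {
  humanReviewedBy: string; // UUID of the approving admin
  humanReviewedAt: string; // RFC 3339 UTC
}

// Completes the record at approval time. Date.toISOString() always emits a
// UTC timestamp with a trailing "Z", which is a valid RFC 3339 value.
function withHumanReview(
  inference: AiInference,
  reviewerId: string,
  reviewedAt: Date
): AiProvenance {
  return {
    ...inference,
    humanReviewedBy: reviewerId,
    humanReviewedAt: reviewedAt.toISOString(),
  };
}
```

Keeping the review fields out of the inference-time type means a record cannot be typed as complete provenance until a reviewer and a review time have been supplied.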
5. Moderation policy
The on-cluster LLM is the same model + system prompt used by compliance-engine's content classifier; its safety posture inherits from that service's moderation envelope (no harmful generation, JSON-only output, prompt-injection resistance via constrained decoding).
consent-ledger-service adds these specific moderation guards:
- Output filter: Suggestions matching any platform-default keyword are dropped silently (already in catalog).
- Length cap: Each suggested keyword ≤ 32 chars; longer suggestions dropped.
- Script consistency: Each suggestion's script must match its declared language (Latin for EN; Arabic-script for DR/PS/AR). Mixed-script suggestions dropped.
- Profanity filter: Suggestions matching the platform profanity list are dropped; opt-out keywords should not double as slurs.
- Rate cap: ≤ 50 candidate suggestions per run, regardless of LLM output length.
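The guards above compose as a chain of filters followed by a cap. The sketch below is one way to express that, with several simplifying assumptions that are not in the source: "Latin" is approximated as ASCII letters and spaces, "Arabic-script" as the U+0600 to U+06FF block plus spaces, length is measured in UTF-16 code units, catalog and profanity lookups are exact matches, and the cap keeps the first 50 survivors in the order the model returned them.

```typescript
// Hypothetical guard pipeline; script ranges and names are assumptions.

interface Suggestion {
  keyword: string;
  language: "EN" | "DR" | "PS" | "AR";
  confidence: number;
}

const MAX_KEYWORD_CHARS = 32;
const MAX_PER_RUN = 50;

// Simplified script checks; a mixed-script keyword fails both patterns.
const LATIN_ONLY = /^[a-z ]+$/i;
const ARABIC_SCRIPT_ONLY = /^[\u0600-\u06FF ]+$/;

function scriptMatches(s: Suggestion): boolean {
  return s.language === "EN"
    ? LATIN_ONLY.test(s.keyword)
    : ARABIC_SCRIPT_ONLY.test(s.keyword);
}

function applyGuards(
  suggestions: Suggestion[],
  catalog: string[],
  profanity: string[]
): Suggestion[] {
  return suggestions
    .filter((s) => catalog.indexOf(s.keyword) === -1) // already in catalog
    .filter((s) => s.keyword.length <= MAX_KEYWORD_CHARS) // length cap
    .filter(scriptMatches) // script consistency
    .filter((s) => profanity.indexOf(s.keyword) === -1) // profanity filter
    .slice(0, MAX_PER_RUN); // rate cap, applied last
}
```

Applying the cap last means the 50-suggestion budget is spent only on candidates that survived every other guard.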
6. Observability
Metrics (Prometheus) for the keyword-suggester job:
| Metric | Type | Notes |
|---|---|---|
| consent_ai_keyword_suggester_runs_total | Counter | Per status (success, failed, skipped_no_input) |
| consent_ai_keyword_suggester_candidates_total | Counter | Per language |
| consent_ai_keyword_suggester_accepted_total | Counter | After human review |
| consent_ai_keyword_suggester_latency_seconds | Histogram | Per run |
| consent_ai_keyword_suggester_parse_error_total | Counter | LLM responses that failed JSON parse |
| consent_ai_redactor_violations_total | Counter | Inputs that the redactor flagged for raw PII (CRITICAL; should be zero) |
Logs are JSON and redacted. The prompt is logged only with redacted content; the response JSON is logged in full because it contains no PII (only candidate keywords and already-redacted example bodies).
7. Cost & capacity model
The keyword-suggester is one weekly batch run with at most a few thousand input lines and a few KB of output. It uses < 1 GPU-minute per run. No incremental cost over the existing compliance-ai deployment.
8. Future enhancements (post-v1)
| Enhancement | Rationale | Timeline |
|---|---|---|
| Phase 2 NLU opt-out (case B above) | Capture free-text opt-outs not in catalog | 2027 Q1 (subject to red-team validation) |
| Multi-language ack-back personalisation | Adjust ack-back template per dialect | 2027 Q2 |
| Anomaly detection on STOP rates per tenant | Flag surge as potential adversarial / spam-induced storm | 2027 Q1 |
| AI-assisted regulator query summarisation | Take a regulator question and propose the SQL/audit window | 2027 Q3 — only with HITL |
All future enhancements remain bound by:
- No PII to cloud LLM.
- Human approval for any consent-state-affecting decision.
- Deterministic primary path; AI is always advisory.