Consent Ledger Service — AI Integration

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: SECURITY_MODEL · APPLICATION_LOGIC · SERVICE_RISK_REGISTER

1. Posture: AI is minimal and offline-only

consent-ledger-service is intentionally an AI-light service. Consent decisions, audit integrity, DND mirroring, and STOP-keyword matching are all deterministic and rule-based. AI is used only in two narrowly-scoped offline / advisory roles described below. Neither AI use case is on the hot path; neither AI use case sends MSISDN, MO body, audit content, or any other PII to any cloud LLM or third-party model API.

Non-use guarantees (explicit)

No cloud LLM, ever, for PII-bearing content. The service does not call Anthropic, OpenAI, Google, or any other cloud LLM with subscriber MSISDN, MO body, consent records, audit rows, or false-positive feedback. This is enforced at the egress NetworkPolicy (see DEPLOYMENT_TOPOLOGY §1 NetworkPolicy) which whitelists only Postgres, Redis, NATS, ATRA SFTP, Vault, and the on-cluster compliance-ai LLM service.
No real-time AI in CheckConsent. The hot path is pure SQL + Redis with a 5 ms P95 budget; injecting AI here would violate the SLA and the determinism the regulator expects.
No AI authoring of audit rows. Every consent.audit row is produced from deterministic state changes; AI does not draft or summarise audit content.
No PII leaves Afghanistan, per ADR-0004 §3 and the consent residency invariant. Even the on-cluster compliance-ai runs in Afghan regions; no model weights are pulled at runtime from offshore.

2. AI use case A — STOP-keyword variant suggestion (offline batch, advisory)

2.1 Purpose

Help Trust & Safety admins discover new STOP keyword variants as they appear in the wild — slang, dialect shifts, transliterations, ZWJ-bracketed obfuscations. Output is a suggestion list for human review (CONS-US-009 §1 admin-driven catalog). The model never auto-adds keywords.

2.2 Topology

Runs as a scheduled Kubernetes CronJob consent-keyword-suggester, weekly at Sunday 04:00 Asia/Kabul.
Reads consent.false_positive_feedback rows and consent.audit rows for STOP_MO_RECEIVED events from the past 30 days where tenantsRevoked is empty. An empty tenantsRevoked means the body did not match the catalog, yet the same MSISDN sent it within ±60 s of a successful STOP.
The model is the same on-cluster compliance-ai deployment that compliance-engine uses (vLLM serving Llama-3.1-8B-Instruct-AWQ; see compliance-engine AI_INTEGRATION §3). No new infra.

2.3 Input redaction

Before any token leaves consent-ledger-service for the LLM:

Pattern	Replacement
MSISDN	`[PHONE]`
Tenant identifier (sender ID)	`[SENDER]`
Numeric sequences (≥ 5 digits — likely OTP)	`[NUMERIC]`
URLs	`[URL]`
Names matching the curated PS/DR/AR/EN name list	`[NAME]`

The redactor is a strict deny-by-default (allow-list) pipeline in services/consent-ledger-service/src/ai/redactor.ts; an ESLint rule forbids calling the LLM client with raw input.
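The replacement table above can be illustrated with a minimal sketch. This is not the contents of redactor.ts, which this document does not show. The names redactBody, RedactionContext, and normaliseToken are hypothetical, the MSISDN pattern assumes Afghan mobile numbers written as +93, 0093, or 0 followed by 7 and eight digits, and the order of the passes (list lookups, then URLs, then phone numbers, then digit runs) is an assumption chosen so that digits inside a URL or MSISDN are not redacted twice.

```typescript
// Hypothetical sketch; the real redactor.ts is not shown in this document.
interface RedactionContext {
  senderIds: string[]; // tenant sender IDs (assumed to be supplied by the caller)
  names: string[];     // curated PS/DR/AR/EN name list (assumed to be supplied)
}

// Lowercase a token and strip surrounding ASCII punctuation for list lookups.
function normaliseToken(token: string): string {
  return token.toLowerCase().replace(/^[.,!?;:"'()]+|[.,!?;:"'()]+$/g, "");
}

function redactBody(body: string, ctx: RedactionContext): string {
  const senders = new Set(ctx.senderIds.map(normaliseToken));
  const names = new Set(ctx.names.map(normaliseToken));

  // Token-level pass: sender IDs and names are matched as whole tokens.
  // Splitting on a captured group keeps the original whitespace in the output.
  const tokens = body.split(/(\s+)/).map((tok) => {
    const key = normaliseToken(tok);
    if (key === "") return tok;
    if (senders.has(key)) return "[SENDER]";
    if (names.has(key)) return "[NAME]";
    return tok;
  });

  return tokens
    .join("")
    // URLs before digits, so digits inside a URL are not redacted separately.
    .replace(/(?:https?:\/\/|www\.)\S+/gi, "[URL]")
    // Assumed MSISDN shape: +93 / 0093 / 0 prefix, then 7 and eight more digits.
    .replace(/(^|[^\d+])(?:\+93|0093|0)7\d{8}(?!\d)/g, "$1[PHONE]")
    // Any remaining run of five or more ASCII digits (likely an OTP).
    .replace(/\d{5,}/g, "[NUMERIC]");
}
```

The sketch matches ASCII digits only; bodies written with Eastern Arabic digits would need an additional pattern, which is left out here.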

2.4 Prompt (single-turn, JSON-constrained output)

System:
You are an SMS opt-out keyword variant detector for a national SMS gateway in
Afghanistan. The gateway recognises these keywords (per language) as opt-out
signals: <CATALOG_DUMP>. Given a list of recently received SMS bodies that did
NOT match the catalog but were sent by subscribers who later issued a confirmed
STOP, propose up to 10 candidate keywords per language that should be added.

For each candidate, return:
- keyword (NFKC-normalised, lowercase)
- language (EN | DR | PS | AR)
- evidence_count (number of inputs that contained it)
- example_redacted (one example body with PII redacted)
- confidence (0.0 - 1.0)

Reply with ONLY the JSON object {candidates: [...]}, no explanation.

User:
<REDACTED INPUTS, 1 PER LINE>

vLLM grammar-constrained decoding enforces the JSON shape. A response that does not parse is dropped with a metric consent_ai_keyword_suggester_parse_error_total.
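As a rough illustration of the drop-on-parse-failure behaviour, the sketch below parses a raw model response and validates each candidate against the five fields listed in the prompt. The function names parseCandidates and isCandidate are hypothetical, the counter variable is a plain stand-in for the Prometheus metric, and treating a well-formed JSON document with the wrong shape as a parse error is an assumption rather than something this document specifies.

```typescript
type Language = "EN" | "DR" | "PS" | "AR";

interface Candidate {
  keyword: string;
  language: Language;
  evidence_count: number;
  example_redacted: string;
  confidence: number;
}

const LANGUAGES: readonly string[] = ["EN", "DR", "PS", "AR"];

// Stand-in for the counter consent_ai_keyword_suggester_parse_error_total.
let parseErrorTotal = 0;

// Runtime check that a value has the candidate shape described in the prompt.
function isCandidate(v: unknown): v is Candidate {
  if (typeof v !== "object" || v === null) return false;
  const c = v as Record<string, unknown>;
  return (
    typeof c.keyword === "string" &&
    typeof c.language === "string" &&
    LANGUAGES.indexOf(c.language) !== -1 &&
    typeof c.evidence_count === "number" &&
    Number.isInteger(c.evidence_count) &&
    typeof c.example_redacted === "string" &&
    typeof c.confidence === "number" &&
    c.confidence >= 0 &&
    c.confidence <= 1
  );
}

// Returns the validated candidates, or an empty list (and bumps the counter)
// when the response is not valid JSON or does not match the expected shape.
function parseCandidates(raw: string): Candidate[] {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) throw new Error("not an object");
    const list = (parsed as { candidates?: unknown }).candidates;
    if (!Array.isArray(list) || !list.every(isCandidate)) throw new Error("bad shape");
    return list;
  } catch {
    parseErrorTotal += 1;
    return [];
  }
}
```

Grammar-constrained decoding should make the failure branch rare; the validation is a second line of defence, not a replacement for it.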

2.5 HITL (human-in-the-loop)

Output written to consent_keyword_suggestions (a non-DDL workshop table, not part of consent schema).
A daily Slack/Email digest to T&S leads lists the top 20 candidates.
T&S admin reviews each suggestion in the admin dashboard. Approval triggers POST /v1/admin/consent/stop-keywords with attribution addedBy = AI_SUGGESTED_REVIEWED_BY:{userId}.
Rejected candidates feed back as negative examples for the next run.

No keyword is ever added to the catalog without explicit human approval. AI's role is candidate generation; the human is the decision authority.

3. AI use case B — Multi-language NLU enhancement (deferred to Phase 2)

3.1 Purpose (deferred)

Recognise free-text natural-language opt-outs that current keyword matching misses (e.g., "stop sending me messages please" or its Pashto/Dari/Arabic equivalents).

3.2 Status

Out of scope for v1. The risk of false-positive opt-outs from misclassification is too high to deploy without significant red-team validation. Acceptance bar for Phase 2:

≥ 99.5% precision on a Trust & Safety-curated 10,000-message labelled dataset (per language).
≥ 95% recall on the same dataset.
< 50 ms P95 inference latency on the on-cluster LLM (so it could be added to the STOP MO consumer without breaching the 2 s end-to-end SLA).
Dual-track verification: any AI opt-out is held for 60 s and commits only if no human rescind arrives, giving subscribers a "wait, no, undo" window.
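The three numeric thresholds above could be checked per language with a small evaluation helper. The sketch below is illustrative only: the names EvalCounts and meetsPhase2Bar are hypothetical, the P95 uses the nearest-rank method (an assumption; the document does not name a percentile method), and the dual-track hold is a behavioural requirement that this check does not cover.

```typescript
// Confusion counts for the STOP class on one language's labelled dataset.
interface EvalCounts {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

function precision(c: EvalCounts): number {
  const predicted = c.truePositives + c.falsePositives;
  return predicted === 0 ? 0 : c.truePositives / predicted;
}

function recall(c: EvalCounts): number {
  const actual = c.truePositives + c.falseNegatives;
  return actual === 0 ? 0 : c.truePositives / actual;
}

// Nearest-rank P95: the smallest sample such that 95% of samples are <= it.
function p95(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return Number.POSITIVE_INFINITY;
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}

// True only when all three numeric thresholds from the acceptance bar hold.
function meetsPhase2Bar(c: EvalCounts, latenciesMs: number[]): boolean {
  return precision(c) >= 0.995 && recall(c) >= 0.95 && p95(latenciesMs) < 50;
}
```

With purely illustrative counts of 1,920 true positives and 80 false negatives (recall 0.96), 9 false positives give a precision of 1920/1929 ≈ 0.9953 and pass, while 10 false positives give 1920/1930 ≈ 0.9948 and fail. A single extra misclassification can therefore decide the outcome at this precision bar.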

3.3 Architecture (when activated)

If activated in Phase 2, the consumer would:

Run keyword match first (deterministic, current behaviour). If matched, no AI invocation.
On no match, send the redacted body to the local LLM with a classification prompt ({INTENT: STOP|UNSUBSCRIBE|OTHER, confidence}).
If INTENT == STOP && confidence >= 0.9, place the revoke into a "pending NLU revocation" queue with 60 s defer.
After 60 s with no further MO from the same MSISDN, commit the revocation with verificationMethod = NLU_AI_REVIEWED (a new method that consumers can treat differently).

This use case is documented here for forward-compatibility; it ships disabled in v1.
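The four consumer steps above can be sketched as a small in-memory state machine. Everything named here (PendingNluRevocations, onMo, due, the Decision values) is hypothetical, the clock is passed in explicitly so the 60 s window is testable, and the reading that any further MO from the same MSISDN cancels a pending revocation is an assumption about how the cancellation rule would be implemented.

```typescript
type Intent = "STOP" | "UNSUBSCRIBE" | "OTHER";

interface Classification {
  intent: Intent;
  confidence: number;
}

type Decision = "REVOKE_NOW" | "DEFER_NLU" | "IGNORE";

const DEFER_MS = 60000;     // 60 s defer window
const MIN_CONFIDENCE = 0.9; // threshold from step 3

class PendingNluRevocations {
  // MSISDN -> timestamp (ms) at which the revocation may be committed.
  private pending = new Map<string, number>();

  // Steps 1-3: keyword match first, classifier only on no match, then defer.
  onMo(
    msisdn: string,
    keywordMatched: boolean,
    classify: () => Classification,
    nowMs: number
  ): Decision {
    // Assumption: any further MO from the same MSISDN cancels a pending entry.
    this.pending.delete(msisdn);
    if (keywordMatched) return "REVOKE_NOW"; // deterministic path, no AI call
    const result = classify();
    if (result.intent === "STOP" && result.confidence >= MIN_CONFIDENCE) {
      this.pending.set(msisdn, nowMs + DEFER_MS);
      return "DEFER_NLU";
    }
    return "IGNORE";
  }

  // Step 4: MSISDNs whose window has elapsed; the caller would commit these
  // with verificationMethod = NLU_AI_REVIEWED.
  due(nowMs: number): string[] {
    const ready: string[] = [];
    this.pending.forEach((dueAt, msisdn) => {
      if (dueAt <= nowMs) ready.push(msisdn);
    });
    ready.forEach((m) => this.pending.delete(m));
    return ready;
  }
}
```

A production version would need a durable queue rather than process memory, so that a consumer restart neither loses nor prematurely commits a pending revocation.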

4. AI provenance

When AI is used (case A), every record carries provenance fields:

Field	Value
`aiProvider`	`local-vllm`
`aiModel`	e.g., `llama-3.1-8b-instruct-awq`
`aiModelVersion`	Semantic version + content hash
`aiPromptTemplateVersion`	e.g., `keyword_suggester.v3`
`aiInferredAt`	RFC 3339 UTC
`aiConfidenceScore`	0.0–1.0
`humanReviewedBy`	UUID of the admin who approved
`humanReviewedAt`	RFC 3339 UTC

Provenance is stored on consent.stop_keywords.metadata.ai_provenance (JSON). The audit row for the keyword addition (KEYWORD_CATALOG_CHANGED) embeds the same provenance, so a regulator query can trace any catalog entry back to the model + prompt + reviewer.
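For illustration, the provenance fields in the table could be typed and assembled as follows. The field names come from the table; the ModelInfo type and the buildProvenance helper are hypothetical, and the two range checks are assumptions about sensible validation rather than documented behaviour.

```typescript
// Field names follow the provenance table; the builder itself is hypothetical.
interface AiProvenance {
  aiProvider: "local-vllm";
  aiModel: string;
  aiModelVersion: string;          // semantic version + content hash
  aiPromptTemplateVersion: string;
  aiInferredAt: string;            // RFC 3339 UTC
  aiConfidenceScore: number;       // 0.0-1.0
  humanReviewedBy: string;         // UUID of the approving admin
  humanReviewedAt: string;         // RFC 3339 UTC
}

interface ModelInfo {
  model: string;
  version: string;
  promptTemplateVersion: string;
}

function buildProvenance(
  model: ModelInfo,
  confidence: number,
  inferredAt: Date,
  reviewerId: string,
  reviewedAt: Date
): AiProvenance {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError("aiConfidenceScore must be within 0.0-1.0");
  }
  if (reviewedAt.getTime() < inferredAt.getTime()) {
    throw new RangeError("human review cannot precede inference");
  }
  return {
    aiProvider: "local-vllm",
    aiModel: model.model,
    aiModelVersion: model.version,
    aiPromptTemplateVersion: model.promptTemplateVersion,
    // Date.prototype.toISOString always emits UTC with a trailing "Z".
    aiInferredAt: inferredAt.toISOString(),
    aiConfidenceScore: confidence,
    humanReviewedBy: reviewerId,
    humanReviewedAt: reviewedAt.toISOString(),
  };
}
```

Building the object once and writing the same value to both metadata.ai_provenance and the KEYWORD_CATALOG_CHANGED audit row would keep the two copies from drifting apart.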

5. Moderation policy

The on-cluster LLM is the same model + system prompt used by compliance-engine's content classifier; its safety posture inherits from that service's moderation envelope (no harmful generation, JSON-only output, prompt-injection resistance via constrained decoding).

consent-ledger-service adds these specific moderation guards:

Output filter: Suggestions matching any platform-default keyword are dropped silently (already in catalog).
Length cap: Each suggested keyword ≤ 32 chars; longer suggestions dropped.
Script consistency: Each suggestion's script must match its declared language (Latin for EN; Arabic-script for DR/PS/AR). Mixed-script suggestions dropped.
Profanity filter: Suggestions matching the platform profanity list are dropped — opt-out keywords should not double as slurs.
Rate cap: ≤ 50 candidate suggestions per run, regardless of LLM output length.
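The five guards above compose naturally as a filter chain. The sketch below is an assumption-laden illustration: applyGuards and scriptMatches are hypothetical names, the Arabic-script check covers only the Unicode Arabic and Arabic Supplement blocks, length is measured in UTF-16 code units, and keeping the 50 highest-confidence candidates is one possible reading of the rate cap, since this document does not say which candidates survive the cap.

```typescript
type Language = "EN" | "DR" | "PS" | "AR";

interface Suggestion {
  keyword: string;
  language: Language;
  confidence: number;
}

const MAX_KEYWORD_CHARS = 32;
const MAX_PER_RUN = 50;

// Script consistency: Latin letters for EN, Arabic-script letters otherwise.
// A keyword mixing both scripts fails either pattern and is dropped.
function scriptMatches(keyword: string, language: Language): boolean {
  const latin = /^[A-Za-z\s]+$/;
  const arabic = /^[\u0600-\u06FF\u0750-\u077F\s]+$/;
  return language === "EN" ? latin.test(keyword) : arabic.test(keyword);
}

function applyGuards(
  suggestions: Suggestion[],
  catalog: Set<string>,   // keywords already in the catalog
  profanity: Set<string>  // platform profanity list
): Suggestion[] {
  return suggestions
    .filter((s) => !catalog.has(s.keyword))                // output filter
    .filter((s) => s.keyword.length <= MAX_KEYWORD_CHARS)  // length cap
    .filter((s) => scriptMatches(s.keyword, s.language))   // script consistency
    .filter((s) => !profanity.has(s.keyword))              // profanity filter
    // Assumption: when over the cap, keep the highest-confidence candidates.
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, MAX_PER_RUN);                                // rate cap
}
```

Because filter returns a new array, the sort does not reorder the caller's input.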

6. Observability

Metrics (Prometheus) for the keyword-suggester job:

Metric	Type	Notes
`consent_ai_keyword_suggester_runs_total`	Counter	Per `status` (`success`, `failed`, `skipped_no_input`)
`consent_ai_keyword_suggester_candidates_total`	Counter	Per `language`
`consent_ai_keyword_suggester_accepted_total`	Counter	After human review
`consent_ai_keyword_suggester_latency_seconds`	Histogram	Per run
`consent_ai_keyword_suggester_parse_error_total`	Counter	LLM responses that failed JSON parse
`consent_ai_redactor_violations_total`	Counter	Inputs that the redactor flagged for raw PII (CRITICAL — should be zero)

Logs are JSON, redacted; the prompt is logged only with redacted content; the response JSON is logged in full because it does not contain PII (only candidate keywords).

7. Cost & capacity model

The keyword-suggester is one weekly batch run with at most a few thousand input lines and a few KB of output. It uses < 1 GPU-minute per run. No incremental cost over the existing compliance-ai deployment.

8. Future enhancements (post-v1)

Enhancement	Rationale	Timeline
Phase 2 NLU opt-out (case B above)	Capture free-text opt-outs not in catalog	2027 Q1 (subject to red-team validation)
Multi-language ack-back personalisation	Adjust ack-back template per dialect	2027 Q2
Anomaly detection on STOP rates per tenant	Flag surge as potential adversarial / spam-induced storm	2027 Q1
AI-assisted regulator query summarisation	Take a regulator question and propose the SQL/audit window	2027 Q3 — only with HITL

All future enhancements remain bound by:

No PII to cloud LLM.
Human approval for any consent-state-affecting decision.
Deterministic primary path; AI is always advisory.
