CDR Mediation Service — AI Integration

Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · SECURITY_MODEL · TESTING_STRATEGY

1. Position statement — AI-minimal by design

The cdr-mediation-service is intentionally AI-free at launch, and its hot path will remain AI-free for the foreseeable future. Two principles drive this:

Byte-deterministic evidence. Every CDR field, every hash chain link, every TAP/RAP byte must be exactly reproducible by an auditor holding the original DLR stream + pricing snapshot + operator registry state. AI inference in the generation path would make reproduction impossible, which is incompatible with ATRA's regulator-grade evidence requirement (see SERVICE_OVERVIEW §11 Key Design Decisions).
No customer content exposure. CDRs are metadata — no SMS message body is ever present. Consequently there is no content-classification problem for AI to solve on the critical path.

This document therefore describes what is explicitly not done today, and a minimal Phase 2 enhancement that is additive, advisory, and kept strictly off the generation path.

2. AI surface-area matrix

Surface	Phase 1 (launch)	Phase 2 (post-GA, optional)	Rationale
DLR → CDR projection (UC-01)	No AI	No AI	Deterministic field mapping only
Pricing snapshot lookup	No AI	No AI	Sourced from `billing-service` authoritative tables
Hourly rollup + hash chain	No AI	No AI	Pure SHA-256 / Merkle math
TAP/RAP encoding	No AI	No AI	ASN.1 schema-driven
Signature generation (HSM)	No AI	No AI	Ed25519 primitive
Adjustment content	No AI	No AI	Operator-initiated, ticket-linked
Export anomaly detection (advisory)	Not deployed	Optional: flag unusual file size, duration, or row count vs. baseline	Advisory only; may prompt human review but never holds or alters file content
Fraud-pattern sniff → `fraud-intel-service`	Not deployed	Optional — forward aggregated rollup stats to `fraud-intel-service` for anomaly detection	Runs in `fraud-intel-service`, not here
Regulator-portal NL query assistance	Not deployed	N/A — lives in `regulator-portal-service`	Outside this service

Launch posture: AI is turned off entirely. No LLM endpoints are configured, no ML models are loaded, and no inference workers run. The AI_ENABLED=false environment flag doubles as a start-up guard: the pod refuses to boot if the flag is ever set to true in production without a security-approved model registry entry (see §8).
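A minimal sketch of how such a start-up guard could be written, assuming the pod reads its configuration from environment variables. AI_ENABLED comes from this document; APP_ENV, AI_MODEL_ID, StartupRefused, and the approved_model_ids set are hypothetical names used only for illustration, not the service's actual configuration.

```python
from collections.abc import Mapping


class StartupRefused(RuntimeError):
    """Raised to abort pod start-up before any inference worker is launched."""


def enforce_ai_guard(env: Mapping[str, str], approved_model_ids: set[str]) -> bool:
    """Return True if AI features may start, False if AI is off; raise on violation.

    APP_ENV and AI_MODEL_ID are hypothetical variable names used for illustration.
    """
    ai_enabled = env.get("AI_ENABLED", "false").strip().lower() == "true"
    if not ai_enabled:
        return False  # launch posture: AI fully off, nothing else to check
    if env.get("APP_ENV") == "production":
        model_id = env.get("AI_MODEL_ID", "")
        if model_id not in approved_model_ids:
            raise StartupRefused(
                "AI_ENABLED=true in production without an approved registry entry "
                f"for model {model_id!r}"
            )
    return True
```

At start-up the service would call enforce_ai_guard(os.environ, approved_ids) before launching any worker; an uncaught StartupRefused exits the process with a non-zero status, which is what makes the pod fail to boot.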

3. Phase 2 — Export anomaly detection (post-GA roadmap)

Goal. Give on-call operators an early warning when the next daily TAP file will be "unusual" in size or duration — e.g., a 10× spike in row count, or a file taking 8× the P95 encoding time. Anomalies may indicate a regulator schema drift, an operator-registry misconfiguration, or a DLR ingest flood.

Non-goal. The detector never modifies file content, holds exports, or issues adjustments. Its only outputs are an increment of the Prometheus cdr_export_anomaly_flag counter and an entry in cdr.audit with entryType=SCHEMA_CHANGED (to be reclassified as ANOMALY_ADVISORY once that value is added to the entryType enum).

Topology.

┌───────────────────────┐       ┌───────────────────────┐
│  cdr.exported.v1      │       │  anomaly-detector     │
│  (NATS)               │──────►│  (batch worker,       │
│                       │       │   runs post-UC-07)    │
└───────────────────────┘       └─────────┬─────────────┘
                                          │
                                          ▼
                                ┌───────────────────────┐
                                │  Prometheus +         │
                                │  cdr.audit row        │
                                │  (advisory only)      │
                                └───────────────────────┘

Model. A simple statistical baseline: a rolling 14-day median and MAD (Median Absolute Deviation) per (operatorId, exportType, dayOfWeek). No deep learning is required; the baseline can be implemented as a Python worker or an inline SQL query with zero external dependencies. An observation more than 4 MADs from the baseline median is flagged as an anomaly.
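A sketch of the MAD threshold rule, assuming the caller has already selected the baseline history for one (operatorId, exportType, dayOfWeek) key and that the threshold of 4 applies to the unscaled MAD. Some implementations multiply MAD by 1.4826 to make it comparable to a standard deviation; the text does not say which is intended. The function names are hypothetical.

```python
from statistics import median

MAD_THRESHOLD = 4.0  # threshold stated in the design text


def mad_score(value: float, history: list[float]) -> float:
    """Return |value - median(history)| / MAD(history)."""
    if not history:
        raise ValueError("baseline history is empty")
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate baseline (all values equal the median). Assumption:
        # any deviation at all is treated as infinitely far from baseline.
        return 0.0 if value == med else float("inf")
    return abs(value - med) / mad


def is_anomalous(value: float, history: list[float], threshold: float = MAD_THRESHOLD) -> bool:
    """Advisory flag only: True when the value lies more than `threshold` MADs out."""
    return mad_score(value, history) > threshold
```

With a row-count history of [100, 102, 98, 101, 99, 100, 103], the median is 100 and the MAD is 1, so a 1,000-row file scores 900 and is flagged while a 102-row file scores 2 and is not. One arithmetic caveat: a 14-day window holds only two samples per dayOfWeek, so the per-key history passed in here may need a longer lookback in practice.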

If the statistical baseline proves insufficient (e.g., platform volume grows non-linearly), a supervised model trained on labelled historical exports is a candidate replacement. Its training data is exclusively platform-internal aggregate counts: no PII, no message bodies, no MSISDN.

Explicit constraints.

Training data locality. All training data stays on-platform. No data is sent to any cloud AI provider.
No LLMs. The detector is numeric / statistical. There is no natural-language component and no prompt.
No AI in the verify path. POST /v1/cdr/chain/verify is pure SHA-256 / Merkle math; no ML component touches verification.
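To illustrate why verification needs no model, here is a sketch that recomputes a Merkle root and a hash chain with SHA-256 alone. The layout is assumed for illustration only: each leaf is SHA-256 of one serialized CDR, odd-sized levels duplicate their last node, and each chain link is SHA-256 of the previous link concatenated with that hour's Merkle root. The service's actual encoding is defined elsewhere, not in this document.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Merkle root over serialized records (assumed leaf = SHA-256(record))."""
    if not records:
        return sha256(b"")
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def chain_links(genesis: bytes, hourly_batches: list[list[bytes]]) -> list[bytes]:
    """Assumed link rule: link_n = SHA-256(link_{n-1} || Merkle root of hour n)."""
    links, prev = [], genesis
    for batch in hourly_batches:
        prev = sha256(prev + merkle_root(batch))
        links.append(prev)
    return links


def verify_chain(genesis: bytes, hourly_batches: list[list[bytes]], published: list[bytes]) -> bool:
    """Recompute every link from the raw records and compare byte-for-byte."""
    return chain_links(genesis, hourly_batches) == list(published)
```

Given the same records and genesis value, the output is identical on every run, which is the byte-determinism property §1 requires. Duplicating the last node is one common convention and is used here only for brevity.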

4. AI-Provenance record (reserved for Phase 2)

Should anomaly detection or any future AI-adjacent feature be activated, every AI-sourced signal carries an AIProvenance record attached to the cdr.audit entry:

{
  "provenanceId":       "aud_01HZ...",
  "modelId":            "cdr-anomaly-stat-v1",
  "modelVersion":       "1.3.0",
  "modelSha256":        "ba3f...",
  "inputFeaturesHash":  "f09a...",
  "inferenceLatencyMs": 38,
  "confidence":         0.82,
  "reviewerUserId":     null,
  "reviewedAt":         null,
  "actionTaken":        "ADVISORY_ONLY",
  "modelArtifactUri":   "s3://ghasi-model-registry/cdr-anomaly/v1.3.0/model.pkl"
}

No AI signal is ever converted into an authoritative CDR mutation — the actionTaken vocabulary is restricted to ADVISORY_ONLY, NOTIFIED_ONCALL, SUGGESTED_TICKET. Mutations remain operator-initiated.
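A sketch of how the provenance record and its restricted actionTaken vocabulary could be enforced in code, assuming a Python worker. The AIProvenance and ActionTaken types and the canonical-JSON rule for inputFeaturesHash (sorted keys, no whitespace) are hypothetical; field names mirror the JSON example above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class ActionTaken(str, Enum):
    # The only permitted outcomes; none of them mutates a CDR.
    ADVISORY_ONLY = "ADVISORY_ONLY"
    NOTIFIED_ONCALL = "NOTIFIED_ONCALL"
    SUGGESTED_TICKET = "SUGGESTED_TICKET"


def features_hash(features: dict) -> str:
    """SHA-256 over an assumed canonical JSON encoding of the input features."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AIProvenance:
    provenanceId: str
    modelId: str
    modelVersion: str
    modelSha256: str
    inputFeaturesHash: str
    inferenceLatencyMs: int
    confidence: float
    actionTaken: ActionTaken
    modelArtifactUri: str
    reviewerUserId: Optional[str] = None
    reviewedAt: Optional[str] = None

    def __post_init__(self) -> None:
        # Coerce strings to the enum; an unknown action raises ValueError.
        object.__setattr__(self, "actionTaken", ActionTaken(self.actionTaken))
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json(self) -> str:
        record = asdict(self)
        record["actionTaken"] = self.actionTaken.value
        return json.dumps(record)
```

Rejecting unknown actions at construction time means a record that claims, say, a CDR mutation can never be written to cdr.audit in the first place.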

5. Human-in-the-loop (HITL)

When Phase 2 anomaly detection is active:

Detector emits an advisory; cdr.audit row recorded with provenance.
Prometheus cdr_export_anomaly_flag counter increments (labelled by operator and type).
Grafana panel shows the anomaly in the on-call dashboard (see OBSERVABILITY §5).
On-call engineer decides whether to investigate, add a ticket, or clear.
The advisory does not block the export. The file is signed and delivered as usual via UC-10 / UC-11.

There is no automated action on AI output. All mitigations are human-initiated.
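A sketch of the non-blocking handoff described in the steps above, assuming a Python worker. The AdvisorySinks class stands in, hypothetically, for the cdr.audit table and the Prometheus counter (a collections.Counter replaces the real metric), and the export dict's keys are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AdvisorySinks:
    """Hypothetical stand-ins for cdr.audit rows and the cdr_export_anomaly_flag counter."""
    audit_rows: list = field(default_factory=list)
    anomaly_flag: Counter = field(default_factory=Counter)


def handle_export(export: dict, is_anomalous: bool, sinks: AdvisorySinks) -> dict:
    """Record an advisory when flagged, then pass the export on unchanged."""
    if is_anomalous:
        sinks.audit_rows.append({
            "entryType": "SCHEMA_CHANGED",  # interim type until ANOMALY_ADVISORY exists
            "exportId": export["exportId"],
            "actionTaken": "ADVISORY_ONLY",
        })
        # Counter labelled by operator and export type, as in the steps above.
        sinks.anomaly_flag[(export["operatorId"], export["exportType"])] += 1
    # No branch holds, edits, or drops the file; signing and delivery proceed.
    return export
```

The function returns the same export object in both branches, so the flag can only ever add information for the on-call engineer; it has no path to change what gets signed.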

6. Moderation, training data, and sub-processors

No generative AI. No LLM endpoint is called from this service.
No user-content classification. CDRs contain metadata only; content classification is owned by compliance-engine on the outbound message path.
No external sub-processors for AI. The service's model registry (s3://ghasi-model-registry/) is on-platform, Vault-encrypted, and subject to the same audit-log regime as code deployments.
No personal data in training. Any Phase 2 statistical baseline uses only aggregate counts (recordCount, byteSize, encodeDurationMs). No MSISDN, no sender ID, no tenant identity.
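A sketch of how a Phase 2 worker could enforce the aggregate-only rule by allowlisting fields rather than blocklisting them, so a newly added identifier can never leak into features by default. The function name and the shape of the input dict are hypothetical; the three allowed field names come from the rule above.

```python
ALLOWED_FEATURES = ("recordCount", "byteSize", "encodeDurationMs")


def extract_features(export_stats: dict) -> dict:
    """Project export statistics onto the allowed aggregate counts only.

    Every other key (e.g. MSISDN, sender ID, tenant identity) is dropped.
    Missing or non-numeric allowed fields raise instead of being guessed.
    """
    features = {}
    for name in ALLOWED_FEATURES:
        value = export_stats[name]  # KeyError if an expected count is missing
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be numeric, got {type(value).__name__}")
        features[name] = value
    return features
```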

7. Future enhancements (2027+)

Enhancement	Rationale	Blocking factors
Per-operator fraud-pattern detection fed from CDR rollups	Strengthens `fraud-intel-service` signal set	Model training data governance; ATRA acceptance of AI-generated fraud flags
Regulator-facing NL query helper (in regulator-portal, not here)	Lets regulator auditors ask "show me CDRs for MSISDN X on date Y" in plain Dari/Pashto	Lives in `regulator-portal-service`; cdr-mediation only exposes structured REST
Automatic schema drift detection on ATRA responses	Detect when ATRA silently changes a field expectation	Requires labelled corpus of ATRA rejection responses

8. Security & compliance boundaries

The service's K8s NetworkPolicy explicitly does not whitelist any external LLM egress (see DEPLOYMENT_TOPOLOGY §3).
If Phase 2 models are loaded, they are read from s3://ghasi-model-registry/ only — Vault-signed, SHA-256-pinned per environment.
AI activation in production requires: (a) Security review of the model card; (b) Commerce + Regulator Liaison sign-off; (c) cdr.audit row with entryType=SCHEMA_CHANGED capturing the activation.
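A sketch of the SHA-256 pin check, assuming the artifact has already been fetched from the model registry to a local path and that the expected digest for the current environment is supplied by deployment configuration (how the pin is stored is not specified here). Because the example artifact is a pickle, the check must run before the file is deserialized. Names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


class ModelPinMismatch(RuntimeError):
    """Raised when an artifact's digest differs from the environment's pin."""


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_artifact(path: Path, pinned_sha256: str) -> Path:
    """Return the path only if its digest matches the pin; never load on mismatch."""
    actual = file_sha256(path)
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ModelPinMismatch(f"{path}: expected {pinned_sha256}, got {actual}")
    return path
```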

9. Cross-References

SERVICE_OVERVIEW.md §11 — design decisions including no-AI on generation path
SECURITY_MODEL.md — network egress + data residency
OBSERVABILITY.md §5 — Grafana panel for anomaly flag
TESTING_STRATEGY.md — determinism tests that AI cannot be silently added without failing
compliance-engine/AI_INTEGRATION.md — contrast service where AI is on the critical path
fraud-intel-service — downstream consumer of optional CDR aggregate statistics
docs/policies/ai-usage-policy.md — platform-wide AI governance

End of AI_INTEGRATION.md
