cdr-mediation-service — Service Risk Register

Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + Security + SRE
Last Updated: 2026-04-21
References: FAILURE_MODES.md, SECURITY_MODEL.md, ADR-0004 §15

Each risk is scored 1–5 for Likelihood and 1–5 for Impact; the rating is derived from Likelihood × Impact. A residual rating of Medium or lower is required for GA.
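
The scoring rule can be sketched as below. The band thresholds (Low up to 4, Medium 5 to 9, High 10 and above) are not stated in this register; they are an assumption inferred from the rows in the Risk Summary and should be checked against the actual risk matrix. The function names are hypothetical.

```python
def risk_band(likelihood: int, impact: int) -> str:
    """Return the rating band for a 1-5 likelihood and a 1-5 impact score."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    score = likelihood * impact
    # Assumed thresholds, inferred from the summary table (not stated in the source).
    if score >= 10:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"


def ga_ready(residual_bands: list[str]) -> bool:
    """GA requires every residual rating to be Medium or lower."""
    return all(band in ("Low", "Medium") for band in residual_bands)
```

Under these assumed thresholds, `risk_band(3, 4)` gives High (CDR-RISK-01 pre-mitigation) and `risk_band(2, 2)` gives Low (CDR-RISK-11).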


1. Risk Summary

| ID | Risk | Category | Likelihood | Impact | Pre-mitigation | Residual | Owner |
|---|---|---|---|---|---|---|---|
| CDR-RISK-01 | ATRA schema changes without notice | Dependency | 3 | 4 | High | Medium | Regulator Liaison |
| CDR-RISK-02 | SFTP credentials rotation mismatched with ATRA | Dependency | 3 | 3 | Medium | Low | SRE + Regulator Liaison |
| CDR-RISK-03 | Hash chain break invalidates regulator claim | Correctness | 2 | 5 | High | Low | Commerce + Security |
| CDR-RISK-04 | S3 bucket misconfiguration loses archived records | Operations | 1 | 5 | Medium | Low | SRE |
| CDR-RISK-05 | Adjustment abuse (voiding legitimate records) | Insider | 2 | 4 | Medium | Low | Security + Commerce |
| CDR-RISK-06 | CDR MSISDN correlation + reverse engineering (privacy) | Privacy | 3 | 3 | Medium | Low | Security + Legal |
| CDR-RISK-07 | Exports delayed during ATRA outage (regulator penalty) | Availability | 3 | 4 | High | Medium | SRE + Regulator Liaison |
| CDR-RISK-08 | Rollup cron failure leaves gap | Ops | 2 | 3 | Medium | Low | SRE |
| CDR-RISK-09 | HSM outage during export window | Dependency | 2 | 3 | Medium | Low | SRE + Security |
| CDR-RISK-10 | NATS consumer lag during traffic surge | Performance | 3 | 2 | Medium | Low | SRE |
| CDR-RISK-11 | Multi-region replication conflict | Correctness | 2 | 2 | Low | Low | Platform Arch |
| CDR-RISK-12 | Regulator changes SFTP → HTTPS (or vice-versa) mid-schedule | Dependency | 2 | 3 | Medium | Medium | Regulator Liaison |
| CDR-RISK-13 | CDR format mismatch with billing events (revenue leakage) | Correctness | 2 | 4 | Medium | Low | Commerce |
| CDR-RISK-14 | ClickHouse analytics tier lag during spike | Performance | 3 | 2 | Medium | Low | Data Eng |
| CDR-RISK-15 | Export-file tampering between HSM sign + ATRA receive | Security | 1 | 5 | Medium | Low | Security |

2. Risk Details

CDR-RISK-01 — ATRA schema changes

ATRA publishes a new schema with 30 days' notice or less.

Mitigation. Adapter pattern per regulator schema; Regulator Liaison maintains 90-day rolling forecast; parallel-schema window during transition; schema-adapter versioning.

Residual. Medium.


CDR-RISK-02 — SFTP credentials rotation mismatch

Ghasi rotates its SFTP key while ATRA still has the old public key cached.

Mitigation. Rotation runbook with 30-day overlap window (both old + new keys active); Regulator Liaison coordinates; monitoring catches first-failure quickly.

Residual. Low.
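
The overlap window in the rotation runbook can be illustrated with a small check. This is a sketch under assumptions: the window is taken to start on the rotation day, and the labels `"old"` and `"new"` and the function name are hypothetical.

```python
from datetime import date, timedelta

OVERLAP = timedelta(days=30)  # overlap window length from the runbook


def accepted_keys(rotation_day: date, today: date) -> list[str]:
    """Return which SFTP keys should be accepted on a given day."""
    if today < rotation_day:
        return ["old"]
    # Assumption: the overlap starts on the rotation day itself.
    if today < rotation_day + OVERLAP:
        return ["old", "new"]
    return ["new"]
```

During the overlap either key authenticates, so a stale cached public key on the ATRA side does not cause a delivery failure until the window closes.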


CDR-RISK-03 — Chain break

A bug or deliberate tampering breaks the hash chain.

Mitigation. Canonicalisation per RFC 8785 + two-implementation cross-check; Postgres trigger rejects UPDATE/DELETE; daily verifier; property-based tests; CISO-paging alert.

Residual. Low.
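
A daily verifier of the kind listed in the mitigation might look roughly like this. It is a minimal sketch, not the service's implementation: the entry layout (`record`, `prev_hash`, `hash`), the genesis value, and the function names are hypothetical, and the canonicalisation shown only approximates RFC 8785 (sorted keys, compact separators) without its number and string serialisation rules.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical anchor value for the first record


def canonical(record: dict) -> bytes:
    # Approximation only: sorted keys and compact separators. Full RFC 8785
    # also fixes number and string serialisation, which this does not cover.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def link_hash(prev_hash: str, record: dict) -> str:
    """Hash of the previous link's hash followed by the canonical record."""
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(record)).hexdigest()


def append(chain: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev,
                  "hash": link_hash(prev, record)})


def first_break(chain: list[dict]) -> Optional[int]:
    """Return the index of the first invalid link, or None if the chain verifies."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry["hash"] != link_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None
```

Because each hash covers the previous one, altering any stored record makes verification fail at that index, which is the condition the alert would fire on.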


CDR-RISK-04 — S3 bucket loss

Wrong lifecycle or versioning policy deletes archived CDRs.

Mitigation. Object-lock in governance mode (7 y); versioning on; cross-region replication; weekly policy scan; bucket-policy change dual-control.

Residual. Low.


CDR-RISK-05 — Adjustment abuse

Insider creates void adjustments on legitimate records to falsify regulator data.

Mitigation. Adjustment requires platform-finance role + reason; high-volume adjustments trigger anomaly alert (CdrAdjustmentAnomaly); audit row immutable; monthly audit review by finance + commerce leads.

Residual. Low.


CDR-RISK-06 — MSISDN correlation

CDRs contain hashed MSISDNs, but sophisticated analysis may still correlate a hashed MSISDN with other data sources.

Mitigation. SHA-256 hash with per-tenant salt (not reversible without salt); data-processing agreements with regulator; Legal-signed DPIA.

Residual. Low.
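
The salted hashing described in the mitigation can be sketched as follows. The function names, the 32-byte salt length, and the normalisation step (trimming whitespace and a leading `+`) are assumptions for illustration; the register does not specify how the salt and MSISDN are combined.

```python
import hashlib
import secrets


def new_tenant_salt() -> bytes:
    """Generate a random per-tenant salt (length is an assumption)."""
    return secrets.token_bytes(32)


def hash_msisdn(msisdn: str, tenant_salt: bytes) -> str:
    """Return the hex SHA-256 of the tenant salt followed by the MSISDN."""
    # Assumed normalisation so "+15550100" and "15550100" hash identically.
    normalised = msisdn.strip().lstrip("+")
    return hashlib.sha256(tenant_salt + normalised.encode("ascii")).hexdigest()
```

The same MSISDN hashes to different values under different tenant salts, which limits cross-tenant correlation. The MSISDN number space is small enough to enumerate, so the protection depends on the salt staying secret.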


CDR-RISK-07 — Regulator penalty for delayed export

ATRA requires daily submission; a multi-day outage incurs regulator penalty.

Mitigation. Retry + manual-delivery fallback runbook; Regulator Liaison notified within 6 h of first failure; export queued locally until delivery possible; 36 h SLA commitment.

Residual. Medium.


CDR-RISK-08 — Rollup failure gap

The rollup cron job fails, leaving an hour missing from the aggregates.

Mitigation. CronJob backoffLimit: 3; idempotent rollup (re-run produces same result); alert fires; manual re-trigger.

Residual. Low.
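
The idempotency property is what makes retries and manual re-triggers safe. A minimal sketch, assuming a hypothetical event shape (`ts`, `tenant`, `units`) and an in-memory dict standing in for the aggregate table:

```python
from collections import defaultdict
from datetime import datetime


def rollup_hour(events: list[dict], hour_start: datetime, store: dict) -> None:
    """Recompute aggregates for one hour and overwrite them in the store."""
    totals: dict[str, int] = defaultdict(int)
    for event in events:
        bucket = event["ts"].replace(minute=0, second=0, microsecond=0)
        if bucket == hour_start:
            totals[event["tenant"]] += event["units"]
    # Replace, never increment: remove any rows already written for this hour,
    # then write the freshly computed totals.
    for key in [k for k in store if k[0] == hour_start]:
        del store[key]
    for tenant, units in totals.items():
        store[(hour_start, tenant)] = units
```

Running `rollup_hour` twice for the same hour leaves the store unchanged after the second run, because totals are recomputed from the source events and overwritten instead of being added to existing values.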


CDR-RISK-09 — HSM outage during export window

HSM unavailable during nightly export window.

Mitigation. HSM HA with regional quorum; export job retries on HSM recovery; 6 h manual-delivery fallback window before regulator SLA breach.

Residual. Low.


CDR-RISK-10 — NATS surge lag

DLR event volume spike exceeds ingest capacity.

Mitigation. HPA on cdr_nats_consumer_lag; scale to 12 replicas; JetStream retains events 7 d.

Residual. Low.


CDR-RISK-11 — Multi-region conflict

Replication conflict on same-row update.

Mitigation. CDR rows are append-only (no conflict possible); adjustments are new rows; no multi-master write path for same key.

Residual. Low.


CDR-RISK-12 — Regulator endpoint switch

ATRA changes transport mid-schedule (e.g., deprecates SFTP for HTTPS).

Mitigation. Adapter abstraction supports both; configurable per destination; Regulator Liaison tracks regulator roadmap.

Residual. Medium.


CDR-RISK-13 — CDR vs. billing mismatch

A CDR is generated with no corresponding billing event (or vice versa).

Mitigation. Revenue-assurance reconciliation job (EP-BILL-09 US-BILL-055) runs daily; flags discrepancies for finance review.

Residual. Low.
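
At its core, a reconciliation of this kind is a two-way set difference. The sketch below assumes CDRs and billing events share a common identifier, which the register does not confirm; the function and key names are hypothetical and do not describe the job in EP-BILL-09.

```python
def reconcile(cdr_ids: set[str], billing_ids: set[str]) -> dict[str, list[str]]:
    """Return identifiers present on one side only, sorted for stable reports."""
    return {
        "cdr_without_billing": sorted(cdr_ids - billing_ids),
        "billing_without_cdr": sorted(billing_ids - cdr_ids),
    }
```

An empty result on both keys means every CDR has a billing counterpart and vice versa; anything else would be flagged for finance review.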


CDR-RISK-14 — ClickHouse lag

Analytics mirror falls behind.

Mitigation. Buffer in Redis during lag; alert at > 10 min; ClickHouse replica fail-over.

Residual. Low.


CDR-RISK-15 — Mid-flight file tamper

An attacker intercepts the signed file between Ghasi and ATRA and modifies it.

Mitigation. Detached PKCS#7 signature; ATRA verifies before processing; TLS 1.3 on HTTPS; SFTP with strong ciphers; per-file manifest with SHA-256 cross-checked at ATRA.

Residual. Low.
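
The per-file manifest cross-check can be sketched as below. This covers only the SHA-256 manifest, not the PKCS#7 signature; the manifest layout (file name mapped to hex digest) and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large exports are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> dict[str, str]:
    """Sender side: map each file name to its SHA-256 hex digest."""
    return {p.name: sha256_file(p) for p in paths}


def mismatches(manifest: dict[str, str], directory: Path) -> list[str]:
    """Receiver side: list files that are missing or whose digest differs."""
    bad = []
    for name, expected in manifest.items():
        target = directory / name
        if not target.is_file() or sha256_file(target) != expected:
            bad.append(name)
    return sorted(bad)
```

A manifest only detects tampering if the manifest itself is protected, for example by being covered by the detached signature; otherwise an attacker could rewrite both the file and its digest.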


3. Residual-Risk Summary

| Residual | Count |
|---|---|
| Low | 12 |
| Medium | 3 |
| High | 0 |

4. Risk Review Cadence

  • Weekly during dev (Platform Arch).
  • Monthly post-GA (Commerce + Regulator Liaison + SRE + Security).
  • Quarterly (CTO + Regulator Liaison for dependency risks).