cdr-mediation-service — Service Risk Register
Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + Security + SRE
Last Updated: 2026-04-21
References: FAILURE_MODES.md, SECURITY_MODEL.md, ADR-0004 §15
Likelihood and Impact are each scored 1–5; the rating is derived from Likelihood × Impact. A residual rating of Medium or lower is required for GA.
1. Risk Summary
| ID | Risk | Category | Likelihood | Impact | Pre-mitigation | Residual | Owner |
|---|---|---|---|---|---|---|---|
| CDR-RISK-01 | ATRA schema changes without notice | Dependency | 3 | 4 | High | Medium | Regulator Liaison |
| CDR-RISK-02 | SFTP credentials rotation mismatched with ATRA | Dependency | 3 | 3 | Medium | Low | SRE + Regulator Liaison |
| CDR-RISK-03 | Hash chain break invalidates regulator claim | Correctness | 2 | 5 | High | Low | Commerce + Security |
| CDR-RISK-04 | S3 bucket misconfiguration loses archived records | Operations | 1 | 5 | Medium | Low | SRE |
| CDR-RISK-05 | Adjustment abuse (voiding legitimate records) | Insider | 2 | 4 | Medium | Low | Security + Commerce |
| CDR-RISK-06 | Re-identification of hashed MSISDNs in CDRs via correlation or reverse engineering | Privacy | 3 | 3 | Medium | Low | Security + Legal |
| CDR-RISK-07 | Exports delayed during ATRA outage (regulator penalty) | Availability | 3 | 4 | High | Medium | SRE + Regulator Liaison |
| CDR-RISK-08 | Rollup cron failure leaves gap | Operations | 2 | 3 | Medium | Low | SRE |
| CDR-RISK-09 | HSM outage during export window | Dependency | 2 | 3 | Medium | Low | SRE + Security |
| CDR-RISK-10 | NATS consumer lag during traffic surge | Performance | 3 | 2 | Medium | Low | SRE |
| CDR-RISK-11 | Multi-region replication conflict | Correctness | 2 | 2 | Low | Low | Platform Arch |
| CDR-RISK-12 | Regulator changes SFTP → HTTPS (or vice-versa) mid-schedule | Dependency | 2 | 3 | Medium | Medium | Regulator Liaison |
| CDR-RISK-13 | CDR format mismatch with billing events (revenue leakage) | Correctness | 2 | 4 | Medium | Low | Commerce |
| CDR-RISK-14 | ClickHouse analytics tier lag during spike | Performance | 3 | 2 | Medium | Low | Data Eng |
| CDR-RISK-15 | Export-file tampering between HSM signing and ATRA receipt | Security | 1 | 5 | Medium | Low | Security |
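Every pre-mitigation rating in the table above is consistent with banding the Likelihood × Impact product as Low (≤ 4), Medium (5–9) and High (≥ 10). Those thresholds are inferred from the table, not stated in this register, so the sketch below treats them as an assumption; `band` and `ga_blockers` are hypothetical helper names.

```python
from __future__ import annotations

# Assumed bands, inferred from the pre-mitigation column of the summary
# table (every row fits them); this register does not state them explicitly.
BANDS = ("Low", "Medium", "High")


def band(likelihood: int, impact: int) -> str:
    """Map 1-5 Likelihood and 1-5 Impact scores to a rating band."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be in 1..5, got {value}")
    score = likelihood * impact
    if score <= 4:
        return "Low"
    if score <= 9:
        return "Medium"
    return "High"  # 10-25; no higher band appears in the table


def ga_blockers(residuals: dict[str, str]) -> list[str]:
    """IDs whose residual rating exceeds Medium, i.e. would block GA."""
    limit = BANDS.index("Medium")
    return sorted(rid for rid, rating in residuals.items()
                  if BANDS.index(rating) > limit)
```

For example, `band(3, 4)` gives "High" and `band(3, 3)` gives "Medium", matching CDR-RISK-01 and CDR-RISK-02.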
2. Risk Details
CDR-RISK-01 — ATRA schema changes
ATRA may publish a new schema with 30 days' notice or less.
Mitigation. Adapter pattern per regulator schema, with versioned schema adapters; Regulator Liaison maintains a 90-day rolling forecast of schema changes; old and new schemas run in parallel during the transition window.
Residual. Medium.
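A minimal sketch of versioned schema adapters with a parallel-schema window. The version labels, field names and dates are hypothetical; the point is that each ATRA schema version gets its own adapter and, while two versions overlap, every CDR is rendered in both.

```python
from __future__ import annotations

from datetime import date
from typing import Callable

# Hypothetical schema versions and field names, for illustration only.
Adapter = Callable[[dict], dict]


def to_atra_v1(cdr: dict) -> dict:
    return {"msisdn_hash": cdr["msisdn_hash"], "ts": cdr["event_time"]}


def to_atra_v2(cdr: dict) -> dict:
    # v2 renames the hash field and adds the tenant identifier.
    return {"subscriber_hash": cdr["msisdn_hash"],
            "event_time": cdr["event_time"],
            "tenant": cdr["tenant_id"]}


ADAPTERS: dict[str, Adapter] = {"v1": to_atra_v1, "v2": to_atra_v2}

# (version, first valid day, last valid day or None if open-ended).
# The overlap in June is the parallel-schema window.
SCHEDULE = [
    ("v1", date(2026, 1, 1), date(2026, 6, 30)),
    ("v2", date(2026, 6, 1), None),
]


def active_versions(day: date) -> list[str]:
    return [v for v, start, end in SCHEDULE
            if start <= day and (end is None or day <= end)]


def render(cdr: dict, day: date) -> dict[str, dict]:
    """Render one CDR in every schema version active on `day`."""
    return {v: ADAPTERS[v](cdr) for v in active_versions(day)}
```

Keeping the schedule as data means a new ATRA schema is one adapter plus one schedule row, and the parallel window is simply an overlap in dates.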
CDR-RISK-02 — SFTP credentials rotation mismatch
Ghasi rotates SFTP key; ATRA still has old public key cached.
Mitigation. Rotation runbook with a 30-day overlap window (old and new keys both active); Regulator Liaison coordinates the key exchange with ATRA; monitoring alerts on the first failed SFTP delivery.
Residual. Low.
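The 30-day overlap can be expressed as a rule for which key fingerprints should be accepted at a given time. This is a sketch under assumptions: `SftpKey`, `accepted_keys` and the fingerprint strings are hypothetical, and the overlap is assumed to start when the new key is activated.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

OVERLAP = timedelta(days=30)  # overlap window from the rotation runbook


@dataclass(frozen=True)
class SftpKey:
    fingerprint: str        # hypothetical key identifier
    activated_at: datetime


def accepted_keys(keys: list[SftpKey], now: datetime) -> list[str]:
    """Fingerprints that should be valid at `now`: the newest activated key,
    plus its predecessor until OVERLAP has elapsed since the newest activation."""
    live = sorted((k for k in keys if k.activated_at <= now),
                  key=lambda k: k.activated_at)
    if not live:
        return []
    newest = live[-1]
    result = [newest.fingerprint]
    if len(live) >= 2 and now < newest.activated_at + OVERLAP:
        result.insert(0, live[-2].fingerprint)
    return result
```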
CDR-RISK-03 — Chain break
A software bug or deliberate tampering breaks the hash chain.
Mitigation. Canonicalisation per RFC 8785 + two-implementation cross-check; Postgres trigger rejects UPDATE/DELETE; daily verifier; property-based tests; CISO-paging alert.
Residual. Low.
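A sketch of the chain and of the daily verifier's core check, assuming each link is `SHA-256(previous_hash || canonical(record))` with a fixed genesis value; the actual link layout is not specified here. `json.dumps` with sorted keys and compact separators only approximates RFC 8785 (it matches for string, integer, boolean and null fields but not for JCS number formatting), which is the kind of divergence the two-implementation cross-check exists to catch.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # hypothetical value preceding the first link


def canonical(record: dict) -> bytes:
    # Approximation of RFC 8785 (JCS): sorted keys, no whitespace, UTF-8.
    # JCS number formatting and UTF-16 key ordering are NOT fully reproduced.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def link_hash(prev_hash: str, record: dict) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(record)).hexdigest()


def build_chain(records: list[dict]) -> list[str]:
    hashes, prev = [], GENESIS
    for rec in records:
        prev = link_hash(prev, rec)
        hashes.append(prev)
    return hashes


def first_break(records: list[dict], stored: list[str]) -> int | None:
    """Index of the first link whose stored hash does not verify, else None."""
    if len(records) != len(stored):
        raise ValueError("record and hash counts differ")
    prev = GENESIS
    for i, (rec, expected) in enumerate(zip(records, stored)):
        if link_hash(prev, rec) != expected:
            return i
        prev = expected
    return None
```

Because every link folds in the previous hash, editing one record invalidates its own link, and the verifier reports the earliest broken index.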
CDR-RISK-04 — S3 bucket loss
Wrong lifecycle or versioning policy deletes archived CDRs.
Mitigation. S3 Object Lock in governance mode (7-year retention); versioning enabled; cross-region replication; weekly policy scan; bucket-policy changes require dual control.
Residual. Low.
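The weekly policy scan could reduce to a check like the one below. The input dictionaries are shaped like the responses of boto3's `get_bucket_versioning`, `get_object_lock_configuration` and `get_bucket_replication`; fetching them is omitted so the sketch runs offline, and `scan_bucket` is a hypothetical name. It does not verify that the replication destination is in another region.

```python
from __future__ import annotations

MIN_YEARS = 7
MIN_DAYS = 7 * 365 + 2  # 7 years including up to two leap days


def scan_bucket(versioning: dict, lock: dict, replication: dict | None,
                expected_mode: str = "GOVERNANCE") -> list[str]:
    """Return policy findings for one archive bucket; [] means compliant.

    replication is None when the bucket has no replication configuration.
    """
    findings = []
    if versioning.get("Status") != "Enabled":
        findings.append("versioning not enabled")

    cfg = lock.get("ObjectLockConfiguration", {})
    if cfg.get("ObjectLockEnabled") != "Enabled":
        findings.append("object lock not enabled")
    retention = cfg.get("Rule", {}).get("DefaultRetention", {})
    if retention.get("Mode") != expected_mode:
        findings.append(f"default retention mode is {retention.get('Mode')!r}, "
                        f"expected {expected_mode!r}")
    # Retention is expressed in either Years or Days.
    if retention.get("Years", 0) < MIN_YEARS and retention.get("Days", 0) < MIN_DAYS:
        findings.append("default retention shorter than 7 years")

    rules = (replication or {}).get("ReplicationConfiguration", {}).get("Rules", [])
    if not any(r.get("Status") == "Enabled" for r in rules):
        findings.append("no enabled replication rule")
    return findings
```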
CDR-RISK-05 — Adjustment abuse
Insider creates void adjustments on legitimate records to falsify regulator data.
Mitigation. Adjustment requires platform-finance role + reason; high-volume adjustments trigger anomaly alert (CdrAdjustmentAnomaly); audit row immutable; monthly audit review by finance + commerce leads.
Residual. Low.
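One way the CdrAdjustmentAnomaly condition could be evaluated is to count void adjustments per actor in a sliding window. The 24-hour window and the threshold of 50 are placeholders, not agreed alert parameters, and the `(actor, timestamp)` input shape is assumed.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta


def anomalous_actors(adjustments: list[tuple[str, datetime]],
                     window: timedelta = timedelta(hours=24),
                     threshold: int = 50) -> set[str]:
    """Actors with more than `threshold` void adjustments inside any sliding `window`.
    Default window and threshold are placeholders."""
    by_actor: dict[str, list[datetime]] = defaultdict(list)
    for actor, ts in adjustments:
        by_actor[actor].append(ts)

    flagged = set()
    for actor, times in by_actor.items():
        times.sort()
        start = 0
        for end, ts in enumerate(times):
            # Advance the window start until the window spans less than `window`.
            while ts - times[start] >= window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(actor)
                break
    return flagged
```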
CDR-RISK-06 — MSISDN correlation
CDRs contain hashed MSISDNs, but sophisticated analysis may correlate those hashes with other data sources to re-identify subscribers.
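Mitigation. SHA-256 hash with a per-tenant salt (not reversible without the salt; anyone holding a salt can enumerate that tenant's MSISDN space, so salts must stay secret); data-processing agreements with the regulator; Legal-signed DPIA.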
Residual. Low.
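A minimal sketch of per-tenant salted hashing with the standard library. The salt-then-MSISDN concatenation order and the digits-only normalisation are assumptions. The same MSISDN hashes to unrelated values under different tenant salts, which limits cross-tenant correlation; HMAC-SHA-256 (`hmac.new(salt, msisdn_bytes, hashlib.sha256)`) is the more conventional keyed construction if the design is revisited.

```python
import hashlib
import secrets
import string


def new_tenant_salt() -> bytes:
    """32 random bytes per tenant; storing and protecting the salt is out of scope."""
    return secrets.token_bytes(32)


def hash_msisdn(msisdn: str, tenant_salt: bytes) -> str:
    """SHA-256 over salt || normalised MSISDN, hex-encoded."""
    # Hypothetical normalisation: keep ASCII digits only, so "+93 700 ..." and "93700..." match.
    digits = "".join(ch for ch in msisdn if ch in string.digits)
    if not digits:
        raise ValueError("MSISDN contains no digits")
    return hashlib.sha256(tenant_salt + digits.encode("ascii")).hexdigest()
```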
CDR-RISK-07 — Regulator penalty for delayed export
ATRA requires daily submission; a multi-day outage incurs regulator penalty.
Mitigation. Retry + manual-delivery fallback runbook; Regulator Liaison notified within 6 h of first failure; export queued locally until delivery possible; 36 h SLA commitment.
Residual. Medium.
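The notification and SLA clocks can be sketched as a small state function. It assumes both the 6 h notification deadline and the 36 h SLA run from the first failed delivery; the register does not say where the SLA clock starts, so treat that as an assumption. `escalation` and its state names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta

NOTIFY_AFTER = timedelta(hours=6)  # Regulator Liaison notification deadline
SLA = timedelta(hours=36)          # SLA commitment; assumed to run from first failure


def escalation(first_failure: datetime | None, now: datetime) -> str:
    """Escalation state for a queued export; retries continue in every non-ok state."""
    if first_failure is None:
        return "ok"
    elapsed = now - first_failure
    if elapsed >= SLA:
        return "sla-breached"
    if elapsed >= NOTIFY_AFTER:
        return "notify-liaison"
    return "retrying"
```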
CDR-RISK-08 — Rollup failure gap
Cron fails; hour missing from aggregates.
Mitigation. CronJob backoffLimit: 3; idempotent rollup (re-run produces same result); alert fires; manual re-trigger.
Residual. Low.
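A sketch of why idempotency matters here. The rollup recomputes an hour from raw CDRs and overwrites the bucket, so a CronJob retry or manual re-trigger cannot double-count, and the gap check distinguishes a zero-traffic hour from one that never ran. A dict stands in for the aggregate table; field and function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def hour_floor(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup_hour(cdrs: list[dict], hour: datetime, aggregates: dict) -> None:
    """Recompute one hourly bucket (CDR count per tenant) and overwrite it.

    Overwriting instead of incrementing makes re-runs produce the same result.
    An hour with no traffic is stored as {} so "ran, zero CDRs" differs from
    "never ran"."""
    counts = Counter(c["tenant_id"] for c in cdrs
                     if hour_floor(c["event_time"]) == hour)
    aggregates[hour] = dict(counts)


def missing_hours(aggregates: dict, start: datetime, end: datetime) -> list[datetime]:
    """Hours in [start, end) with no bucket at all: the gap to alert on."""
    gaps, h = [], hour_floor(start)
    while h < end:
        if h not in aggregates:
            gaps.append(h)
        h += timedelta(hours=1)
    return gaps
```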
CDR-RISK-09 — HSM outage during export window
HSM unavailable during nightly export window.
Mitigation. HSM HA with regional quorum; export job retries on HSM recovery; 6 h manual-delivery fallback window before regulator SLA breach.
Residual. Low.
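The export job's retry behaviour might look like the following: capped exponential backoff until a deadline, then hand-off to the manual-delivery runbook. `HsmUnavailable`, the signing callable and the delay values are hypothetical; the deadline would be set 6 h before the regulator SLA breach, per the fallback window above. Clock and sleep are injectable so the logic can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class HsmUnavailable(Exception):
    """Hypothetical error raised by the signing client when the HSM is unreachable."""


def sign_with_retry(sign: Callable[[bytes], bytes], payload: bytes, *,
                    deadline: float, base_delay: float = 30.0,
                    max_delay: float = 900.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bytes | None:
    """Retry signing with capped exponential backoff until `deadline`
    (in `clock` seconds). Returns the signature, or None to trigger the
    manual-delivery runbook."""
    delay = base_delay
    while True:
        try:
            return sign(payload)
        except HsmUnavailable:
            remaining = deadline - clock()
            if remaining <= 0:
                return None
            sleep(min(delay, remaining))  # never sleep past the deadline
            delay = min(delay * 2, max_delay)
```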
CDR-RISK-10 — NATS surge lag
DLR event volume spike exceeds ingest capacity.
Mitigation. HPA on cdr_nats_consumer_lag, scaling out to at most 12 replicas; JetStream retains events for 7 d, so a backlog is delayed rather than lost while lag stays within that window.
Residual. Low.
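The Kubernetes HPA computes `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)` and skips scaling when that ratio is within a tolerance (0.1 by default). The sketch applies the rule to `cdr_nats_consumer_lag`, assuming an average-value target per pod; the target value and a minimum of 1 replica are assumptions, and stabilization windows and scaling policies are not modelled.

```python
import math


def desired_replicas(current: int, total_lag: float, target_lag_per_pod: float,
                     min_replicas: int = 1, max_replicas: int = 12,
                     tolerance: float = 0.1) -> int:
    """HPA scaling rule for an average-value target on consumer lag.

    ratio = (total_lag / current) / target_lag_per_pod
    desired = ceil(current * ratio), unchanged when |ratio - 1| <= tolerance,
    then clamped to [min_replicas, max_replicas].
    """
    if current < 1 or target_lag_per_pod <= 0:
        raise ValueError("current must be >= 1 and target must be positive")
    ratio = (total_lag / current) / target_lag_per_pod
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        # current * ratio == total_lag / target; computed directly to avoid rounding noise.
        desired = math.ceil(total_lag / target_lag_per_pod)
    return max(min_replicas, min(max_replicas, desired))
```

With a hypothetical target of 1000 pending messages per pod, a surge to 50 000 pending messages asks for 50 replicas and is capped at 12; JetStream retention then absorbs the remaining backlog.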
CDR-RISK-11 — Multi-region conflict
Replication conflict on same-row update.
Mitigation. CDR rows are append-only (no conflict possible); adjustments are new rows; no multi-master write path for same key.
Residual. Low.
CDR-RISK-12 — Regulator endpoint switch
ATRA changes transport mid-schedule (e.g., deprecates SFTP for HTTPS).
Mitigation. Adapter abstraction supports both; configurable per destination; Regulator Liaison tracks regulator roadmap.
Residual. Medium.
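A minimal sketch of the transport abstraction: one interface, one implementation per transport, selected from per-destination configuration. Class names and config keys are hypothetical, and both `deliver` methods are stubs that return a receipt string without doing any network I/O.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Transport(ABC):
    @abstractmethod
    def deliver(self, filename: str, payload: bytes) -> str:
        """Deliver one signed export file; return a transport-specific receipt."""


class SftpTransport(Transport):
    def __init__(self, host: str, remote_dir: str):
        self.host, self.remote_dir = host, remote_dir

    def deliver(self, filename: str, payload: bytes) -> str:
        # Stub: a real implementation would upload over SFTP here.
        return f"sftp://{self.host}{self.remote_dir}/{filename}"


class HttpsTransport(Transport):
    def __init__(self, url: str):
        self.url = url

    def deliver(self, filename: str, payload: bytes) -> str:
        # Stub: a real implementation would upload over HTTPS (TLS 1.3) here.
        return f"{self.url}/{filename}"


FACTORIES = {
    "sftp": lambda cfg: SftpTransport(cfg["host"], cfg["remote_dir"]),
    "https": lambda cfg: HttpsTransport(cfg["url"]),
}


def transport_for(destination: dict) -> Transport:
    """Build the transport named in a destination's config."""
    factory = FACTORIES.get(destination["transport"])
    if factory is None:
        raise ValueError(f"unsupported transport {destination['transport']!r}")
    return factory(destination)
```

Switching a destination from SFTP to HTTPS then becomes a configuration change rather than a code change in the export job.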
CDR-RISK-13 — CDR vs. billing mismatch
A CDR is generated but no corresponding billing event (or vice versa).
Mitigation. Revenue-assurance reconciliation job (EP-BILL-09 US-BILL-055) runs daily; flags discrepancies for finance review.
Residual. Low.
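At its core, the reconciliation is a two-way set difference on a key shared by CDRs and billing events. This sketch assumes such a correlation key exists (for example a message ID); it is illustrative and is not the reconciliation job referenced above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReconReport:
    cdr_without_billing: list[str]
    billing_without_cdr: list[str]

    def clean(self) -> bool:
        return not self.cdr_without_billing and not self.billing_without_cdr


def reconcile(cdr_ids: Iterable[str], billing_ids: Iterable[str]) -> ReconReport:
    """Both directions matter: a CDR with no billing event suggests revenue
    leakage; a billing event with no CDR suggests a reporting gap."""
    cdrs, bills = set(cdr_ids), set(billing_ids)
    return ReconReport(sorted(cdrs - bills), sorted(bills - cdrs))
```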
CDR-RISK-14 — ClickHouse lag
Analytics mirror falls behind.
Mitigation. Buffer events in Redis during lag; alert when lag exceeds 10 min; ClickHouse replica fail-over.
Residual. Low.
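The lag alert and the buffer-and-drain behaviour can be sketched with an in-memory deque standing in for the Redis list (in Redis, the equivalent would typically use RPUSH, LRANGE and LTRIM). Names are hypothetical; the 10-minute threshold is from the mitigation above.

```python
from __future__ import annotations

from collections import deque
from datetime import datetime, timedelta
from itertools import islice
from typing import Callable

LAG_ALERT = timedelta(minutes=10)


def lag_alert(newest_source: datetime, newest_mirrored: datetime) -> bool:
    """True when the mirror trails the source by more than 10 minutes."""
    return newest_source - newest_mirrored > LAG_ALERT


class MirrorBuffer:
    """In-memory stand-in for the Redis buffer: FIFO, drained in order."""

    def __init__(self, write: Callable[[list[dict]], None]):
        self._queue: deque[dict] = deque()
        self._write = write

    def __len__(self) -> int:
        return len(self._queue)

    def push(self, event: dict) -> None:
        self._queue.append(event)

    def drain(self, batch_size: int = 1000) -> int:
        """Write queued events in batches; returns the number written.
        If `write` raises, the failed batch stays queued for the next drain."""
        sent = 0
        while self._queue:
            batch = list(islice(self._queue, batch_size))
            self._write(batch)
            for _ in batch:
                self._queue.popleft()
            sent += len(batch)
        return sent
```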
CDR-RISK-15 — Mid-flight file tamper
An attacker intercepts the signed file between Ghasi and ATRA and modifies it.
Mitigation. Detached PKCS#7 signature; ATRA verifies before processing; TLS 1.3 on HTTPS; SFTP with strong ciphers; per-file manifest with SHA-256 cross-checked at ATRA.
Residual. Low.
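A sketch of the per-file manifest and the receiver-side cross-check. The JSON layout is hypothetical. For the check to resist an in-flight attacker, the manifest itself must be covered by the detached signature; otherwise an attacker can rewrite a file and its manifest entry together.

```python
from __future__ import annotations

import hashlib
import json


def build_manifest(files: dict[str, bytes]) -> str:
    """JSON manifest listing each file's name, SHA-256 and size."""
    entries = [{"file": name,
                "sha256": hashlib.sha256(data).hexdigest(),
                "bytes": len(data)}
               for name, data in sorted(files.items())]
    return json.dumps({"files": entries}, indent=2)


def verify_manifest(manifest: str, files: dict[str, bytes]) -> list[str]:
    """Receiver-side check: names that are missing, altered, or not listed."""
    expected = {e["file"]: e["sha256"] for e in json.loads(manifest)["files"]}
    problems = [name for name, digest in expected.items()
                if name not in files
                or hashlib.sha256(files[name]).hexdigest() != digest]
    problems += [name for name in files if name not in expected]
    return sorted(problems)
```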
3. Residual-Risk Summary
| Residual | Count |
|---|---|
| Low | 12 |
| Medium | 3 |
| High | 0 |
4. Risk Review Cadence
- Weekly during dev (Platform Arch).
- Monthly post-GA (Commerce + Regulator Liaison + SRE + Security).
- Quarterly (CTO + Regulator Liaison for dependency risks).