
cdr-mediation-service — Migration Plan

Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + Platform Engineering
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, SERVICE_READINESS.md

The service is greenfield. Migration focuses on (1) ATRA regulator handshake, (2) schema validation dry-run, (3) bootstrap of archive pipeline with object-lock S3 buckets, and (4) enabling daily exports only after confidence is established.


1. What Is Migrating

| Input | Source | Volume | Notes |
|---|---|---|---|
| DLR events (ongoing) | sms.dlr.inbound NATS stream | ~100 M events/month at steady state | Primary ingest |
| Compliance audit events | compliance.audit.v1 | ~10 M/month | Compliance-context on CDR |
| Billing events | billing.events.v1 | Per-message basis | Billing cross-reference |
| ATRA schema | Regulator Liaison engagement | ~1 schema doc | TAP 3.12 + any Afghan variants |
| SFTP credentials for ATRA drop-box | MoU exchange | 1 keypair per destination | Vault-stored |
| Initial 30-d retrospective | Platform logs | Bootstrap dataset | For seed archive |

2. Migration Phases

Phase 0 — Pre-migration engagement (30 days)

| Step | Owner | Output |
|---|---|---|
| ATRA MoU for CDR submission | Regulator Liaison + Legal | Signed MoU |
| ATRA schema-dry-run: exchange 7 days of sample CDRs in proposed format | Regulator Liaison + Engineering | Schema approval / feedback |
| SFTP credentials exchanged | SRE + Regulator Liaison | Keys in Vault |
| HSM provisioning + key generation for export signing | Security + SRE | Key in HSM; dual-control backup |
| S3 buckets (hot + cold) created with object-lock + cross-region replication | SRE | Buckets operational |
| ClickHouse cluster provisioned | Data Eng | CDR schema deployed |
| 3 Deployments (ingest + batch + exporter) deployed to staging | SRE | Staging healthy |
| Adapter implementations for ATRA schema variants complete | Engineering | Tests pass against ATRA staging |

Phase 1 — Shadow (30 days)

| Step | Owner | Output |
|---|---|---|
| Ingest NATS streams; generate CDRs; retain in hot tier | Service | CDR volume growing |
| Hourly rollups active | Service | Aggregates populated |
| Daily ClickHouse sync active | Service | Analytics working |
| Hash-chain verifier running daily | Service | Clean-run log |
| cdr.* events published (except cdr.exported.v1) | Service | Regulator-portal can query status |
| Feature flag CDR_EXPORT_ENABLED=false | SRE | No ATRA delivery yet |

Exit criteria. CDR ingest lag P99 ≤ 30 s for 14 consecutive days; chain verifier 100% clean; ClickHouse lag < 10 min; volume matches forecast.
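The Phase 1 gate above can be checked mechanically. The following is a minimal sketch, not the service's actual gate: the `DailyStats` record and function names are hypothetical, and it assumes that all three numeric criteria must hold on each of the same 14 trailing days (the source states the 14-day window explicitly only for ingest lag). The "volume matches forecast" criterion is left to manual review.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DailyStats:
    """One day of shadow-phase measurements (hypothetical record shape)."""
    ingest_lag_p99_s: float    # P99 CDR ingest lag for the day, in seconds
    chain_breaks: int          # breaks reported by the daily hash-chain verifier
    clickhouse_lag_min: float  # worst ClickHouse sync lag for the day, in minutes


def day_passes(d: DailyStats) -> bool:
    """Apply the numeric thresholds from the exit criteria to a single day."""
    return (
        d.ingest_lag_p99_s <= 30
        and d.chain_breaks == 0
        and d.clickhouse_lag_min < 10
    )


def trailing_streak(days: Sequence[DailyStats]) -> int:
    """Count consecutive passing days, walking back from the most recent day."""
    streak = 0
    for d in reversed(days):
        if not day_passes(d):
            break
        streak += 1
    return streak


def phase1_exit_ready(days: Sequence[DailyStats], required: int = 14) -> bool:
    """True once the most recent `required` days all pass."""
    return trailing_streak(days) >= required
```

Counting from the most recent day backwards means a single bad day resets the streak, which matches the "consecutive" wording.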

Phase 2 — Export Live (30 days)

| Step | Owner | Output |
|---|---|---|
| CDR_EXPORT_ENABLED=true for ATRA SFTP destination | SRE | Daily exports begin |
| First live export delivered + ATRA ACK received | Service + Regulator Liaison | Confirmation logged |
| Monitoring: export ACK SLA (100% within 36 h) | SRE | SLO attainment |
| Weekly regulator call to verify data quality | Regulator Liaison | Issue tracker |
| Chain verifier continues daily | Service | Continued clean |

Exit criteria. 14 consecutive days of ATRA ACKs within 36 h; zero rejections; data quality sign-off from ATRA.
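The 36-hour ACK criterion can be sketched as a small check over the per-export log. This is illustrative only: the tuple layout `(delivered_at, acked_at, rejected)` and the function names are assumptions, as is treating an ACK at exactly 36 h as in time. The ATRA data-quality sign-off remains a manual step.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple

ACK_DEADLINE = timedelta(hours=36)

# (delivered_at, acked_at or None, rejected) -- hypothetical log entry shape
ExportLogEntry = Tuple[datetime, Optional[datetime], bool]


def ack_within_sla(
    delivered_at: datetime,
    acked_at: Optional[datetime],
    rejected: bool = False,
) -> bool:
    """True if the export was ACKed, not rejected, within 36 h of delivery."""
    if acked_at is None or rejected:
        return False
    elapsed = acked_at - delivered_at
    return timedelta(0) <= elapsed <= ACK_DEADLINE


def ack_rate(exports: Sequence[ExportLogEntry]) -> float:
    """Fraction of exports that met the SLA; 0.0 for an empty log."""
    if not exports:
        return 0.0
    ok = sum(ack_within_sla(d, a, r) for d, a, r in exports)
    return ok / len(exports)


if __name__ == "__main__":
    t0 = datetime(2026, 6, 1, 2, 0, tzinfo=timezone.utc)
    log = [
        (t0, t0 + timedelta(hours=5), False),                       # in time
        (t0 + timedelta(days=1), t0 + timedelta(days=3), False),    # 48 h, late
    ]
    print(f"ACK rate: {ack_rate(log):.0%}")
```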

Phase 3 — Full Production (ongoing)

| Step | Owner | Output |
|---|---|---|
| Adjustment (VOID/CORRECT) enabled | Commerce + Finance | Admin workflow live |
| Tenant-facing CDR queries (via analytics-service) | Product | Self-serve analytics |
| Revenue-assurance reconciliation with billing (EP-BILL-09) live | Commerce + Finance | Leakage alerts |
| Cold-tier restore drill quarterly | SRE | Verified recovery |
| ATRA partnership ongoing | Regulator Liaison | Quarterly review |

Rollback flags.

  • CDR_EXPORT_ENABLED: daily export on/off.
  • CDR_ADJUSTMENT_ENABLED: adjustment workflow on/off.
  • CDR_CHAIN_VERIFY_FAIL_FAST: when set, the verifier halts on the first chain break; production leaves it off so the verifier continues and reports.
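One way the three flags might be read at process start is sketched below. It assumes the flags arrive as environment variables and that every flag defaults to off when unset; the source confirms only that export is off by default. The `RollbackFlags` type and the accepted truthy strings are illustrative choices.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Strings treated as "on"; anything else (or unset) is "off". Assumed convention.
_TRUTHY = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse one boolean flag, falling back to `default` when it is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class RollbackFlags:
    export_enabled: bool          # CDR_EXPORT_ENABLED
    adjustment_enabled: bool      # CDR_ADJUSTMENT_ENABLED
    chain_verify_fail_fast: bool  # CDR_CHAIN_VERIFY_FAIL_FAST


def load_flags(env: Mapping[str, str] = os.environ) -> RollbackFlags:
    """Read the rollback flags; all default to off (an assumption)."""
    return RollbackFlags(
        export_enabled=_flag(env, "CDR_EXPORT_ENABLED", False),
        adjustment_enabled=_flag(env, "CDR_ADJUSTMENT_ENABLED", False),
        chain_verify_fail_fast=_flag(env, "CDR_CHAIN_VERIFY_FAIL_FAST", False),
    )
```

Defaulting to off keeps an unconfigured deployment in the Phase 1 posture: no ATRA delivery and no adjustments.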

3. ATRA Handshake (Phase 0 detail)

3.1 Schema dry-run exchange

  1. Ghasi generates 7 days of retrospective CDRs in proposed TAP 3.12 format.
  2. Ghasi signs with HSM; delivers via SFTP to ATRA staging.
  3. ATRA team parses + validates; returns feedback within 14 d.
  4. Ghasi addresses any schema issues + re-submits if needed.
  5. ATRA formally approves the schema, which then becomes the contracted schema in the MoU.

3.2 SFTP exchange

  1. Ghasi generates SSH keypair (Ed25519); private key in Vault.
  2. Ghasi public key shared with ATRA.
  3. ATRA SFTP drop-box created; Ghasi gets upload path.
  4. Test upload + retrieval confirmed by both sides.
  5. Rotation policy: annual; 30-day overlap during rotation.
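The rotation policy in step 5 implies a date window in which both the old and new keys are accepted. The sketch below shows one reading of it: it assumes the new key is introduced exactly 365 days after the old key was issued and that the 30-day overlap runs forward from that date. The plan does not say whether the overlap precedes or follows the anniversary, so treat the direction as an assumption.

```python
from datetime import date, timedelta
from typing import List, Tuple

ROTATION_PERIOD = timedelta(days=365)  # "annual"
OVERLAP = timedelta(days=30)           # both keys accepted during this window


def rotation_window(old_key_issued: date) -> Tuple[date, date]:
    """Return (new_key_valid_from, old_key_retired_on) for one rotation."""
    new_key_valid_from = old_key_issued + ROTATION_PERIOD
    old_key_retired_on = new_key_valid_from + OVERLAP
    return new_key_valid_from, old_key_retired_on


def accepted_keys(today: date, old_key_issued: date) -> List[str]:
    """Which keys should be accepted on `today` (labels are illustrative)."""
    new_from, old_until = rotation_window(old_key_issued)
    keys = []
    if today < old_until:
        keys.append("old")
    if today >= new_from:
        keys.append("new")
    return keys
```

For example, a key issued on 2026-04-21 would be joined by its successor on 2027-04-21 and retired on 2027-05-21 under these assumptions.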

3.3 HTTPS alternative (future)

  • If ATRA offers HTTPS endpoint, same flow with mTLS client cert.
  • Adapter supports either per destination.
  • Fall-back to SFTP if HTTPS fails.
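The per-destination fallback described above can be sketched as an ordered list of transports tried in turn. The `Transport` callable type and `deliver` function are hypothetical; a real adapter would wrap an SFTP client and an mTLS HTTPS client, neither of which is shown here.

```python
from typing import Callable, List, Sequence, Tuple

# A transport takes the signed export bytes and raises on failure (assumed contract).
Transport = Callable[[bytes], None]


def deliver(payload: bytes, transports: Sequence[Tuple[str, Transport]]) -> str:
    """Try each transport in order; return the name of the one that succeeded.

    Raises RuntimeError listing every failure if none succeeds.
    """
    errors: List[str] = []
    for name, send in transports:
        try:
            send(payload)
            return name
        except Exception as exc:  # any transport failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transports failed: " + "; ".join(errors))


if __name__ == "__main__":
    def https_down(_: bytes) -> None:
        raise ConnectionError("endpoint unreachable")

    sent: List[bytes] = []
    used = deliver(b"signed-export", [("https", https_down), ("sftp", sent.append)])
    print("delivered via", used)
```

Listing HTTPS first and SFTP second gives the fall-back order named in the plan; a destination without an HTTPS endpoint would simply configure SFTP alone.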

4. Bootstrap Retrospective CDR

Pre-launch one-shot:

  1. Extract 30 days of sms.dlr.inbound from NATS archive.
  2. Run through CDR encoder in batch mode (not real-time).
  3. Hash-chain the bootstrap rows into a genesis partition.
  4. Archive to S3 cold tier.
  5. Notify ATRA that the 30 days of historical CDRs are available if requested retrospectively. A request is unlikely, since ATRA typically expects forward-only submission.

Bootstrap is audit-tagged source=BOOTSTRAP_RETROSPECTIVE_30D.
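Step 3 and the audit tag can be illustrated with a small hash-chaining routine. This is a sketch under stated assumptions, not the service's encoder: it assumes SHA-256, a canonical JSON serialisation of each row, and an all-zero previous hash for the first row of the genesis partition. The field names `prev_hash` and `hash` are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List

GENESIS_PREV_HASH = "0" * 64  # assumed sentinel for the first row in the chain
BOOTSTRAP_SOURCE = "BOOTSTRAP_RETROSPECTIVE_30D"


def chain_rows(
    rows: Iterable[Dict[str, Any]],
    source: str = BOOTSTRAP_SOURCE,
) -> List[Dict[str, Any]]:
    """Tag each row with its source and link it to the previous row's hash."""
    prev = GENESIS_PREV_HASH
    chained: List[Dict[str, Any]] = []
    for row in rows:
        body = dict(row, source=source)
        # Canonical form: sorted keys, no whitespace, so the hash is reproducible.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        digest = hashlib.sha256(prev.encode() + canonical).hexdigest()
        chained.append({**body, "prev_hash": prev, "hash": digest})
        prev = digest
    return chained
```

Because each hash covers the previous hash, altering or removing any bootstrap row changes every later link, which is what the daily verifier relies on.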


5. Downstream Consumer Migration

| Consumer | Change | Timing |
|---|---|---|
| regulator-portal-service | Consume cdr.exported.v1 for submission-status panel | Phase 1 |
| billing-service | Revenue-assurance reconciliation via cdr.generated.v1 | Phase 2 |
| analytics-service | ClickHouse CDR mirror for long-range queries | Phase 1 |
| admin-dashboard | CDR admin UI (list, rollup status, export status, adjustments) | Phase 1 |

6. Success Metrics for Migration

| Metric | Target | Measurement |
|---|---|---|
| Phase 0 ATRA MoU signed | Yes | Contract |
| Phase 0 schema dry-run approved | Yes | ATRA feedback |
| Phase 1 ingest lag P99 | ≤ 30 s | Prometheus |
| Phase 1 chain-verifier breaks | 0 | Daily |
| Phase 2 export ACK rate | 100% within 36 h | Per-export log |
| Phase 3 adjustment rate | < 2% of original CDRs | Monthly |
| Phase 3 rev-assurance discrepancy | < 0.1% | Daily reconciliation |

7. Rollback Plan

7.1 During Phase 1 (Shadow)

  • No rollback needed. Export stays off by default.

7.2 During Phase 2 (Export Live)

  • CDR_EXPORT_ENABLED=false stops new ATRA submissions.
  • In-flight exports complete their retry cycle.
  • Regulator Liaison notified immediately.

7.3 During Phase 3 (Full Production)

  • CDR_ADJUSTMENT_ENABLED=false stops new adjustments; existing persist.
  • Export continues or pauses per Phase 2 rollback.

7.4 Catastrophic (chain break detected)

  • Quarantine affected partition.
  • Notify regulator within 24 h.
  • Investigate root cause.
  • Resume exports with new chain partition + audit row documenting the incident.
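Detecting which rows to quarantine requires locating the break. The sketch below shows a verifier that either halts at the first break or continues and reports every break, mirroring the CDR_CHAIN_VERIFY_FAIL_FAST behaviour. The `Row` shape, the SHA-256 link function, and the all-zero genesis hash are assumptions made for illustration.

```python
import hashlib
from typing import List, NamedTuple, Sequence


class Row(NamedTuple):
    """Minimal chained record (hypothetical shape)."""
    payload: bytes
    prev_hash: str
    hash: str


def link_hash(prev_hash: str, payload: bytes) -> str:
    """Hash of a row given its predecessor's hash (assumed: SHA-256)."""
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def find_breaks(
    rows: Sequence[Row],
    genesis: str = "0" * 64,
    fail_fast: bool = False,
) -> List[int]:
    """Return indexes of rows whose link or content hash does not verify.

    With fail_fast=True the scan stops at the first break; otherwise it
    continues and reports every break found.
    """
    breaks: List[int] = []
    expected_prev = genesis
    for i, row in enumerate(rows):
        linked = row.prev_hash == expected_prev
        intact = row.hash == link_hash(row.prev_hash, row.payload)
        if not (linked and intact):
            breaks.append(i)
            if fail_fast:
                break
        # Continue from the stored hash so one bad row is reported once.
        expected_prev = row.hash
    return breaks
```

The returned indexes identify where the chain stops verifying, which gives the starting point for deciding what to quarantine and what to document in the incident audit row.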

8. Dependencies

  • ATRA MoU (blocker for Phase 2).
  • HSM operational (blocker for export).
  • S3 with object-lock (blocker for archive).
  • ClickHouse cluster (blocker for analytics mirror).
  • sms.dlr.inbound NATS stream operational (blocker for ingest).
  • billing-service EP-BILL-09 (blocker for revenue-assurance, Phase 3).
  • regulator-portal-service EP-REG-01 (blocker for regulator view, Phase 1).

9. Post-Launch Refinement

Within 90 days of entering Phase 3:

  • Regulator feedback loop: quarterly data-quality review with ATRA.
  • Tune rollup windows based on observed query patterns.
  • Optimise S3 archive granularity (hourly files) based on restore frequency.
  • ClickHouse query SLA refinement per regulator-portal long-range queries.
  • Adjustment playbook refinement based on real-world cases.