cdr-mediation-service — Service Readiness
Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, FAILURE_MODES.md, TESTING_STRATEGY.md
Production-readiness checklist. The bar emphasises regulator-facing correctness (ATRA handshake + export SLA), hash-chain integrity, HSM availability, and the hot→cold archive pipeline.
1. Code Readiness
| Criterion | Status |
|---|---|
| Ingest (NATS → Postgres) with dedup + idempotency | ☐ |
| Hourly rollup with distributed lock + idempotency | ☐ |
| Daily regulator export (build → HSM sign → SFTP/HTTPS deliver → ACK tracking) | ☐ |
| TAP 3.12 encoder + RAP encoder | ☐ |
| Schema-adapter abstraction (ATRA_TAP_312_V1 + configurable alternates) | ☐ |
| Adjustment (VOID/CORRECT) semantics with audit | ☐ |
| Hash-chain (prev_hash, record_hash) with canonical JSON (RFC 8785) | ☐ |
| Daily chain verifier | ☐ |
| S3 cold archive (13 m hot → S3 with object-lock 7 y) | ☐ |
| ClickHouse mirror (analytics-tier) | ☐ |
| Admin REST (CDR list, rollup status, export status, adjustment create, audit query) | ☐ |
| Chain-break alert + incident playbook | ☐ |
| mTLS on service mesh | ☐ |
| Idempotency on admin writes | ☐ |
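
The hash-chain criterion above (`prev_hash`, `record_hash` over canonical JSON) can be illustrated with a minimal sketch. It assumes SHA-256, an all-zero genesis hash, and a record hash computed over `prev_hash` concatenated with the canonical payload; none of these choices are confirmed by this document. The `canonicalize` helper only approximates RFC 8785 (sorted keys, no insignificant whitespace), so a vetted JCS implementation would likely be preferable in the real service.

```typescript
import { createHash } from "crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical JSON: sorted object keys, no whitespace.
// Approximates RFC 8785; not a complete JCS implementation.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return (
    "{" +
    keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k] as Json)).join(",") +
    "}"
  );
}

interface ChainedRecord {
  payload: Json;
  prev_hash: string;
  record_hash: string;
}

// Hypothetical genesis value for the first record in a chain.
const GENESIS_HASH = "0".repeat(64);

function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Appends a record whose hash covers the previous hash and the canonical payload.
function appendRecord(chain: ChainedRecord[], payload: Json): ChainedRecord {
  const last = chain[chain.length - 1];
  const prev_hash = last ? last.record_hash : GENESIS_HASH;
  const record_hash = sha256Hex(prev_hash + canonicalize(payload));
  const record: ChainedRecord = { payload, prev_hash, record_hash };
  chain.push(record);
  return record;
}

// Returns the index of the first record that fails verification, or -1 if the chain is intact.
function verifyChain(chain: ChainedRecord[]): number {
  let expectedPrev = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i] as ChainedRecord;
    const recomputed = sha256Hex(r.prev_hash + canonicalize(r.payload));
    if (r.prev_hash !== expectedPrev || r.record_hash !== recomputed) {
      return i;
    }
    expectedPrev = r.record_hash;
  }
  return -1;
}
```

A daily verifier along these lines would walk each chain from its genesis record and raise the chain-break alert when `verifyChain` returns a non-negative index.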
2. Testing Readiness
| Criterion | Target | Status |
|---|---|---|
| Unit coverage | ≥ 90% line (domain), ≥ 80% branch | ☐ |
| TAP 3.12 + RAP encoder golden tests | ≥ 25 each | ☐ |
| Hash-chain tests | ≥ 15 | ☐ |
| Adjustment semantics tests | ≥ 20 | ☐ |
| Rollup idempotency tests | ≥ 15 | ☐ |
| Property-based tests (fast-check) | ≥ 10 × 500 runs | ☐ |
| Integration: NATS → PG → rollup → S3 → SFTP happy | Passed | ☐ |
| Integration: ATRA mock ACK/reject/timeout | Passed | ☐ |
| Integration: HSM sign with softhsm2 | Passed | ☐ |
| Integration: adjustment + audit append + chain verify | Passed | ☐ |
| Integration: cold-tier archive + restore | Passed | ☐ |
| Contract with dlr-processor + compliance-engine + billing-service | Passed | ☐ |
| E2E: 10 k DLR → daily export → ATRA ACK | Passed | ☐ |
| Chaos: Postgres out → ingest queues; no CDR loss on recovery | Passed | ☐ |
| Chaos: HSM out → exports queue | Passed | ☐ |
| Chaos: ATRA unreachable → retries + manual fallback | Passed | ☐ |
| Chaos: S3 out → hot retention extends; manual intervention | Passed | ☐ |
| Security: chain tamper detected | Passed | ☐ |
| Security: UPDATE/DELETE rejected on CDR + audit | Passed | ☐ |
| Security: signed-file tamper detected at ATRA | Passed | ☐ |
| Load: 500 k DLR/h sustained 1 h, ingest lag < 30 s P99 | Passed | ☐ |
| Load: 10 M-row export builds + delivers within 30 min | Passed | ☐ |
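
The rollup idempotency tests listed above exercise one property: re-running the hourly rollup over the same input must leave the stored result unchanged. The sketch below shows that property in memory. The `Cdr` and `Rollup` shapes, the `tenant` and `units` fields, and the `tenant|hour` key are hypothetical, and the distributed lock is out of scope here.

```typescript
interface Cdr {
  id: string;
  tenant: string;
  ts: string; // ISO-8601 UTC timestamp, e.g. "2026-04-21T10:05:00Z"
  units: number;
}

interface Rollup {
  tenant: string;
  hour: string;
  count: number;
  units: number;
}

// Hour bucket derived from an ISO-8601 UTC timestamp, e.g. "2026-04-21T10".
function hourBucket(ts: string): string {
  return ts.slice(0, 13);
}

// Recomputes totals for one hour and upserts them into the store.
// Overwriting (rather than incrementing) makes a re-run converge to the same state.
function rollupHour(cdrs: Cdr[], store: Map<string, Rollup>, hour: string): void {
  const seen = new Set<string>();
  const totals = new Map<string, Rollup>();
  for (let i = 0; i < cdrs.length; i++) {
    const cdr = cdrs[i] as Cdr;
    // Skip records outside the hour and duplicate deliveries of the same CDR id.
    if (hourBucket(cdr.ts) !== hour || seen.has(cdr.id)) continue;
    seen.add(cdr.id);
    const current = totals.get(cdr.tenant) ?? { tenant: cdr.tenant, hour, count: 0, units: 0 };
    current.count += 1;
    current.units += cdr.units;
    totals.set(cdr.tenant, current);
  }
  totals.forEach((rollup, tenant) => store.set(tenant + "|" + hour, rollup));
}
```

An idempotency test can call `rollupHour` twice with identical input and assert that the serialized store is byte-for-byte the same after both runs.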
3. Observability Readiness
| Criterion | Status |
|---|---|
| Prometheus metrics all emitting (OBSERVABILITY §1) | ☐ |
| Grafana dashboard deployed (Commerce + Regulator + SRE rows) | ☐ |
| All alerts configured with runbook links | ☐ |
| Structured JSON logs with MSISDN hashing | ☐ |
| OTel trace propagation verified (dlr → cdr → regulator-portal) | ☐ |
| SIEM forwarding of cdr.audit.v1 verified | ☐ |
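
As an illustration of the "structured JSON logs with MSISDN hashing" criterion, the sketch below emits a log line carrying a keyed hash instead of the raw number. The use of HMAC-SHA256, the digit-only normalisation, and the field names (`msisdn_hash`, `event`, `ts`) are assumptions; the document only states that MSISDNs are hashed. A keyed hash is chosen here because the MSISDN space is small enough that an unkeyed digest could be brute-forced.

```typescript
import { createHmac } from "crypto";

// Keyed hash of a normalised MSISDN (digits only). The key is a hypothetical secret
// that would be supplied by the service's secret store.
function hashMsisdn(msisdn: string, key: string): string {
  const normalized = msisdn.replace(/[^0-9]/g, "");
  return createHmac("sha256", key).update(normalized, "utf8").digest("hex");
}

// Builds one structured JSON log line; the raw MSISDN never appears in the output.
// Callers must not place raw MSISDNs in `fields`.
function logEvent(
  event: string,
  msisdn: string,
  key: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({
    ...fields,
    ts: new Date().toISOString(),
    event,
    msisdn_hash: hashMsisdn(msisdn, key),
  });
}
```

Because the hash is deterministic for a given key, log lines for the same subscriber can still be correlated without exposing the number itself.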
4. Security Readiness
| Criterion | Status |
|---|---|
| mTLS on service mesh; SPIRE SVIDs | ☐ |
| NetworkPolicy restricts egress to PG + Redis + NATS + S3 + CH + HSM + ATRA CIDRs | ☐ |
| ATRA SFTP key-auth (no password) | ☐ |
| ATRA HTTPS mTLS if supported by ATRA endpoint | ☐ |
| HSM signing key provisioned; dual-control on key rotation | ☐ |
| Postgres UPDATE/DELETE trigger rejects mutation on CDR + audit | ☐ |
| S3 bucket object-lock 7 y; bucket policy change dual-control | ☐ |
| Signed-file tamper detected at ATRA verified | ☐ |
| Pen test against REST admin | ☐ |
| Security team sign-off | ☐ |
5. Operational Readiness
| Criterion | Status |
|---|---|
| 3 Deployments (ingest, batch, exporter) reviewed | ☐ |
| HPA on ingest (lag-driven) | ☐ |
| PDB per Deployment | ☐ |
| Rolling update: zero-drop on ingest at 100 k DLR/h | ☐ |
| Graceful shutdown: batch finishes current job before restart | ☐ |
| Postgres conn pool sized (pgbouncer) | ☐ |
| Redis conn pool sized | ☐ |
| CronJobs for rollup, archive, verifier, daily export | ☐ |
| ClickHouse mirror operational | ☐ |
| Runbook set (OBSERVABILITY.md §6) complete | ☐ |
| On-call: Commerce primary; SRE secondary; Regulator Liaison exec | ☐ |
6. Documentation Readiness
All 16 SERVICE_TEMPLATE docs at "Complete". Plus:
7. Compliance / Regulatory Readiness
| Criterion | Status |
|---|---|
| ATRA MoU for CDR submission | ☐ |
| ATRA schema dry-run passed (T-7d) | ☐ |
| SFTP credentials exchanged with ATRA (dual-control at both ends) | ☐ |
| 7-year retention configured (S3 object-lock governance mode) | ☐ |
| Audit retention (13 m hot + 7 y cold) tested via restore drill | ☐ |
| Revenue-assurance reconciliation with billing-service (EP-BILL-09) verified | ☐ |
| Signed-file format approved (PKCS#7 detached signature on ZIP) | ☐ |
8. Go/No-Go Criteria Summary
Production deployment is GO when:
9. Post-Launch Review
Within 30 days:
10. Phased Rollout
| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P0 — Pre-migration | 30 d | ATRA engagement; schema-dry-run; SFTP handshake; HSM provisioning | MoU signed; dry-run exports ACK |
| P1 — Shadow | 30 d | Generate CDRs; retain locally; no ATRA export | Schema validated; volume matches forecast |
| P2 — Export Live | 30 d | Daily ATRA exports begin; observation mode | 3 consecutive daily exports ACKed |
| P3 — Full Production | Ongoing | Adjustments live; tenant-facing CDR queries (via analytics); ongoing ATRA partnership | Steady state |
Rollback flags: CDR_EXPORT_ENABLED, CDR_ADJUSTMENT_ENABLED, CDR_CHAIN_VERIFY_FAIL_FAST.
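
A minimal sketch of how the rollback flags and the P2 exit criterion might be evaluated follows. The flag names come from this document; the accepted truthy strings, the default values, the `ExportStatus` values, and the reading of "3 consecutive daily exports ACKed" as "the three most recent exports" are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface RollbackFlags {
  exportEnabled: boolean;
  adjustmentEnabled: boolean;
  chainVerifyFailFast: boolean;
}

// Parses a boolean flag; unset or empty values fall back to the given default.
function parseFlag(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  return ["1", "true", "yes", "on"].indexOf(raw.trim().toLowerCase()) !== -1;
}

// Defaults are hypothetical: exports and adjustments off, fail-fast verification on.
function loadRollbackFlags(env: Env): RollbackFlags {
  return {
    exportEnabled: parseFlag(env, "CDR_EXPORT_ENABLED", false),
    adjustmentEnabled: parseFlag(env, "CDR_ADJUSTMENT_ENABLED", false),
    chainVerifyFailFast: parseFlag(env, "CDR_CHAIN_VERIFY_FAIL_FAST", true),
  };
}

type ExportStatus = "ACKED" | "REJECTED" | "PENDING";

// True when the most recent `required` daily exports (ordered oldest first) were all ACKed.
function meetsAckStreak(statuses: ExportStatus[], required: number = 3): boolean {
  if (statuses.length < required) return false;
  return statuses.slice(-required).every((s) => s === "ACKED");
}
```

In a Node process, `loadRollbackFlags(process.env)` would read the live configuration, and `meetsAckStreak` could gate the move from P2 to P3.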