CDR Mediation Service — Sync Contract
Version: 1.0 | Status: Draft | Owner: Commerce + Regulator Liaison | Last Updated: 2026-04-21 | Companion: DATA_MODEL · EVENT_SCHEMAS · FAILURE_MODES · DEPLOYMENT_TOPOLOGY
This document defines:
- Per-aggregate conflict-resolution policy (what happens when two writes race).
- Multi-region synchronisation — hot region vs DR, regional divergence rules.
- The outbox-to-NATS publication contract.
- The ATRA schema negotiation lifecycle.
- Backwards-compatibility guarantees for the REST API and NATS event payloads.
The service is designed around one core simplification: every domain object is append-only. Conflict resolution collapses to an ordering problem, not a merging problem.
1. Per-aggregate conflict policy
| Aggregate | Policy | Rationale |
|---|---|---|
| `CdrRecord` | append_only | Originals immutable; corrections are new rows keyed by (messageId, adjustmentOf) |
| `CdrRollup` | append_only | One row per (bucketHour, operatorId); computed once by leader |
| `CdrChainEntry` (chain inside `CdrRollup`) | single_writer_leader | Only the leader's scheduled UC-03 computes chain hashes |
| `CdrExport` | single_writer_leader | File generation is leader-only; fileSequenceNumber monotonic |
| `ExportDelivery` | append_only | Each attempt logged; latest status derived from MAX(started_at) |
| `CdrAdjustment` | append_only | Adjustments can themselves be adjusted, chained via adjustmentOf |
| `CdrAuditEntry` | append_only | Pure audit; UPDATE/DELETE rejected at DB layer |
| `TapSequence` (per `recording_entity`) | single_writer_serial | UPDATE ... RETURNING in serialised transaction; collisions impossible by construction |
| `TransparencyAnchor` | append_only | One per (day, operatorId); once submitted, never revoked |
| `RegulatorSchema` (activation) | server_authoritative + four_eyes | Exactly one active variant per exportType |
| `MsisdnVault` | append_only | Per-CDR encrypted MSISDN; never mutated |
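As an illustration of how an append-only aggregate yields a "current" value without any UPDATE, the sketch below derives the latest `ExportDelivery` status from the attempt with the greatest `started_at`. The `DeliveryAttempt` type, its field names, and the status strings are assumptions made for this example; the real model is defined in DATA_MODEL.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeliveryAttempt:
    """One immutable row of the (hypothetical) ExportDelivery log."""
    export_id: str
    started_at: datetime
    status: str


def latest_status(attempts: Iterable[DeliveryAttempt], export_id: str) -> Optional[str]:
    """Return the status of the attempt with the greatest started_at.

    Mirrors "latest status derived from MAX(started_at)": nothing is
    overwritten, the current state is a projection over the log.
    """
    own = [a for a in attempts if a.export_id == export_id]
    if not own:
        return None
    return max(own, key=lambda a: a.started_at).status
```

Because attempts are only ever appended, two racing writers can at worst add two rows; ordering by `started_at` then decides which one is reported as current.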
2. Multi-region synchronisation
Per ADR-0004 §14, the platform runs hot in Kabul with warm-standby in Mazar (both Afghanistan) and async DR to dxb (Dubai sovereign cloud).
2.1 Writes
- Regional locality. CDRs are written in the region where the terminating DLR landed (usually Kabul for domestic, dxb for international MT).
- Postgres replication. Intra-region: synchronous Patroni replicas. Inter-region: logical replication `kabul → mazar → dxb`; dxb is read-only fail-over only (never promoted to primary without Commerce-director approval).
- Leader election is region-scoped: each region elects its own export leader via `pg_advisory_lock`. Exports are produced from the region that owns the data for that settlement day.
2.2 Reads
- `admin-dashboard` and `regulator-portal-service` read from the region-local primary.
- Cross-region reads are served by the DR read-replica with a `region=dxb` header and are informational only; verification APIs (`POST /v1/cdr/chain/verify`) always execute on the writing region to guarantee up-to-date chain state.
2.3 Multi-region conflicts
Because every CDR aggregate is append-only and keyed on (bucketHour, cdrId) with a region-scoped operatorId, two regions cannot produce conflicting rows for the same CDR. The scenarios that could produce divergence are:
| Scenario | Behaviour |
|---|---|
| Same DLR delivered to two regional consumers simultaneously (NATS mirror) | ux_records_source_event unique index rejects the second insert; NATS ACKs both to avoid redelivery loops |
| Kabul writes a CDR; Mazar's logical replica has a 60 s lag when a query arrives | Query returns stale data; client may retry. No chain divergence because Mazar does not write. |
| DR failover to dxb before Kabul fully drained | Re-sync procedure: promote dxb, verify chain walk via UC-12, publish cdr.audit.v1 entryType=CHAIN_VERIFY_OK. Any bucket on Kabul not yet shipped to dxb is reconstructed from S3 (UC-05 archive is the authoritative record). |
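The first scenario in the table (the same DLR reaching two consumers) can be sketched as follows. This is a minimal illustration that swaps in SQLite for Postgres so it runs standalone; the table layout and the `project_dlr` helper are assumptions, and only the index name `ux_records_source_event` comes from this document.

```python
import sqlite3
from typing import Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the CDR records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        " cdr_id INTEGER PRIMARY KEY,"
        " source_event_id TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX ux_records_source_event ON records (source_event_id)"
    )
    return conn


def project_dlr(conn: sqlite3.Connection, source_event_id: str, payload: str) -> Tuple[bool, bool]:
    """Insert a CDR for a DLR event and return (ack, inserted).

    A duplicate is rejected by the unique index, but the message is
    still ACKed so the broker does not redeliver it forever.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO records (source_event_id, payload) VALUES (?, ?)",
                (source_event_id, payload),
            )
        return True, True
    except sqlite3.IntegrityError:
        return True, False
```

Both deliveries return `ack=True`; only the first reports `inserted=True`, so the table ends up with exactly one row per source event.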
2.4 S3 cross-region replication
- Hot-write region (Kabul) writes to `s3://ghasi-cdr-archive-kabul/…` with Object-Lock Compliance.
- DR replicates to `s3://ghasi-cdr-archive-dxb/…` via async `BucketReplication` with retention preserved (replica also Object-Locked).
- Replication lag target: ≤ 15 min P95, ≤ 1 h P99. Breach → SEV2 `CdrReplicationLagHigh`.
- The DR bucket is read-only until a confirmed failover runbook is executed; write access requires break-glass credentials.
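One way the lag target could be evaluated is sketched below. It assumes lag samples are collected in seconds over some evaluation window and uses a nearest-rank percentile; the sampling window, the percentile method, and the function names are assumptions, not part of this contract.

```python
from typing import Sequence

P95_LIMIT_S = 15 * 60   # 15 min, from the lag target above
P99_LIMIT_S = 60 * 60   # 1 h, from the lag target above


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def replication_lag_breached(lag_seconds: Sequence[float]) -> bool:
    """True when either target is exceeded (would raise CdrReplicationLagHigh)."""
    return (
        percentile(lag_seconds, 95) > P95_LIMIT_S
        or percentile(lag_seconds, 99) > P99_LIMIT_S
    )
```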
3. Outbox to NATS publication contract
Per the outbox pattern (see EVENT_SCHEMAS §6):
- Aggregate state change + outbox row are a single Postgres transaction.
- A single leader-elected relay process polls `SELECT event_id, subject, partition_key, payload FROM cdr.outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 1000` every 250 ms.
- For each row, publish to NATS JetStream with `Nats-Msg-Id = event_id` (JetStream dedup), `Nats-Partition-Key = operatorId`, body = JSON payload.
- On ACK from JetStream, `UPDATE cdr.outbox SET published_at = now() WHERE event_id = $1`.
- Rows unpublished > 7 days → SEV1 `CdrOutboxStalled`. Rows unpublished > 30 days → Legal notification (regulator evidence at risk).
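The relay steps above can be sketched as a single polling pass. This is an illustration only: SQLite stands in for Postgres (so the table is named `outbox` rather than `cdr.outbox`), and `publish` is a hypothetical callable that returns `True` once JetStream has ACKed. Stopping at the first failed publish is a design choice of this sketch, made so that a later row is never marked published ahead of an earlier one.

```python
import sqlite3
from typing import Callable, Dict

BATCH_SIZE = 1000  # matches the LIMIT in the polling query

Publish = Callable[[str, Dict[str, str], bytes], bool]


def create_outbox(conn: sqlite3.Connection) -> None:
    """Create a minimal stand-in for the outbox table."""
    conn.execute(
        "CREATE TABLE outbox ("
        " event_id TEXT PRIMARY KEY,"
        " subject TEXT NOT NULL,"
        " partition_key TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " created_at INTEGER NOT NULL,"
        " published_at TEXT)"
    )


def relay_once(conn: sqlite3.Connection, publish: Publish) -> int:
    """Run one polling pass and return the number of rows published."""
    rows = conn.execute(
        "SELECT event_id, subject, partition_key, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY created_at LIMIT ?",
        (BATCH_SIZE,),
    ).fetchall()
    published = 0
    for event_id, subject, partition_key, payload in rows:
        headers = {"Nats-Msg-Id": event_id, "Nats-Partition-Key": partition_key}
        if not publish(subject, headers, payload.encode("utf-8")):
            break  # no ACK: leave this and later rows for the next pass
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE event_id = ?",
                (event_id,),
            )
        published += 1
    return published
```

If the process crashes between the ACK and the UPDATE, the row is published again on the next pass; the `Nats-Msg-Id` header is what lets JetStream discard that duplicate.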
Ordering guarantee. Within a given partitionKey (= operatorId), events are delivered to consumers in the order they were committed to Postgres. Cross-operator ordering is not guaranteed.
Exactly-once delivery. The combination of (outbox transaction, JetStream Msg-Id dedup, consumer durable AckExplicit) yields effectively-exactly-once semantics. Consumers still implement idempotency on eventId.
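Consumer-side idempotency on `eventId` might look like the sketch below. The in-memory set is an assumption made to keep the example self-contained; a real consumer would presumably persist seen ids (or rely on a unique constraint) so that deduplication survives restarts.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so each eventId is processed at most once."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def on_message(self, event: Dict[str, Any]) -> bool:
        """Process the event; return False if it was a duplicate.

        The caller should ACK in both cases: a duplicate has already
        been handled, so redelivery would achieve nothing.
        """
        event_id = event["eventId"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Record the id only after the handler succeeds, so a failed
        # attempt can be retried on redelivery.
        self._seen.add(event_id)
        return True
```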
4. ATRA schema negotiation lifecycle
ATRA may evolve its TAP/RAP schema. The service handles this with a pluggable adapter pattern:
┌─────────────────────────────────────────────────────────────────┐
│ Phase A — ATRA announces new variant (e.g. atra-tap-v3) │
│ Commerce + Regulator Liaison receive schema docs. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase B — Developer ships adapter class + ASN.1 module │
│ `AtraTapV3Adapter` + config/tap-schemas/atra-tap-v3.asn1│
│ cdr.regulator_schemas row INSERTed with active=FALSE. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase C — Dry-run validation │
│ GET /v1/cdr/regulator-schemas/atra-tap-v3/validate │
│ encodes 100 sample CDRs; returns per-row success. │
│ ATRA runs cross-check against their ingestion tool. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase D — Shadow generation (optional, 3 days) │
│ Generate atra-tap-v3 files alongside atra-tap-v2, │
│ upload to ATRA stage URL, verify ACK. │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase E — Cutover │
│ POST /v1/cdr/regulator-schemas/atra-tap-v3:activate │
│ (four-eyes required). Sets active=TRUE for TAP_3_12. │
│ Old variant rows retained; next TAP encoder run uses v3.│
│ Publishes cdr.config.changed.v1 and cdr.audit.v1. │
└─────────────────────────────────────────────────────────────────┘
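The Phase E activation rule (four-eyes, exactly one active variant per export type, old rows retained) can be sketched in memory as follows. The class, method, and parameter names are hypothetical; the real endpoint and the `cdr.regulator_schemas` table may enforce these rules differently.

```python
from typing import Dict


class FourEyesError(Exception):
    """Raised when requester and approver are the same person."""


class RegulatorSchemas:
    """In-memory stand-in for variant registration and activation."""

    def __init__(self) -> None:
        self._export_type_of: Dict[str, str] = {}  # variant -> exportType
        self._active: Dict[str, str] = {}          # exportType -> active variant

    def register(self, variant: str, export_type: str) -> None:
        """Phase B: the variant exists but is not active."""
        self._export_type_of[variant] = export_type

    def activate(self, variant: str, requested_by: str, approved_by: str) -> None:
        """Phase E: switch the active variant for the variant's export type."""
        if requested_by == approved_by:
            raise FourEyesError("activation needs two distinct people")
        export_type = self._export_type_of[variant]
        # Overwriting the single slot keeps exactly one active variant
        # per export type; the previous variant stays registered.
        self._active[export_type] = variant

    def is_active(self, variant: str) -> bool:
        return self._active.get(self._export_type_of[variant]) == variant
```

Since the previous variant is never deleted, re-activating it is the same operation as the original cutover.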
Reversibility. Activation is reversible — re-activating the previous variant is allowed for the next encoder run. Files already delivered under the new variant remain authoritative; no retroactive re-encoding.
Pin at generation time, not CDR-creation time. When UC-07 runs for settlement day D-1, it pins the schema variant value as of job start. This prevents a mid-job activation from producing inconsistent output.
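A minimal sketch of pinning, assuming a registry lookup that returns the active variant for an export type. All names here (`SchemaRegistry`, `ExportJob`, `start_job`, `encode_batch`) are illustrative rather than taken from the service.

```python
from dataclasses import dataclass
from typing import Dict, List


class SchemaRegistry:
    """Hypothetical lookup of the active variant per export type."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def activate(self, export_type: str, variant: str) -> None:
        self._active[export_type] = variant

    def active_variant(self, export_type: str) -> str:
        return self._active[export_type]


@dataclass(frozen=True)
class ExportJob:
    export_type: str
    settlement_day: str
    pinned_variant: str  # captured once, at job start


def start_job(registry: SchemaRegistry, export_type: str, settlement_day: str) -> ExportJob:
    """Read the active variant exactly once and freeze it into the job."""
    return ExportJob(export_type, settlement_day, registry.active_variant(export_type))


def encode_batch(job: ExportJob, cdr_ids: List[str]) -> List[str]:
    """Encode using the pinned variant, never a fresh registry lookup."""
    return [f"{job.pinned_variant}:{cdr_id}" for cdr_id in cdr_ids]
```

An activation that lands between two batches of the same job changes nothing for that job; only the next `start_job` call observes the new variant.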
5. Schema stability guarantees
5.1 REST API
| Surface | Stability |
|---|---|
| `/v1/cdr/*` path | Additive changes are non-breaking: new endpoints, new optional fields |
| Response envelope (`items`, `nextCursor`, `total`) | Stable |
| Error envelope (`error.code`, `error.message`, `error.details`) | Stable |
| `schemaVariant` parameter in export trigger body | Independent of REST version; drives TAP/RAP emission only |
| New required fields | Breaking → /v2/cdr/*, 90-day overlap |
5.2 NATS events
| Attribute | Stability |
|---|---|
| Subjects (`cdr.*.v1`) | Stable within major version |
| Additive optional fields | Non-breaking |
| Enum expansion (e.g., new `adjustmentType`) | Non-breaking; consumers must treat unknowns as UNKNOWN |
| Required field removal | Breaking → bump to .v2 subject, co-publish 90 days |
| `partitionKey` NATS header semantics | Stable; always operatorId |
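For the enum-expansion rule, a consumer could map unrecognised values to `UNKNOWN` as sketched below. The `CREDIT` and `DEBIT` members are placeholders; the actual `adjustmentType` values are defined in EVENT_SCHEMAS.

```python
from enum import Enum


class AdjustmentType(Enum):
    CREDIT = "CREDIT"    # placeholder member for illustration
    DEBIT = "DEBIT"      # placeholder member for illustration
    UNKNOWN = "UNKNOWN"

    @classmethod
    def _missing_(cls, value: object) -> "AdjustmentType":
        # Called by Enum when the lookup fails: a value added by a newer
        # producer degrades to UNKNOWN instead of raising ValueError.
        return cls.UNKNOWN
```

With this in place, `AdjustmentType("SOMETHING_NEW")` yields `AdjustmentType.UNKNOWN`, so an older consumer keeps processing events emitted by a newer producer.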
5.3 File format (TAP/RAP)
| Attribute | Stability |
|---|---|
| File-name pattern | Stable (variant defines sequence format) |
| Signature sidecar format | Stable; new signers via KeyId addition |
| ASN.1 module | Managed per variant — variant swap is the evolution mechanism |
6. Idempotency guarantees
| Operation | Idempotency key | Behaviour on replay |
|---|---|---|
| DLR → CDR projection (UC-01) | source_event_id (NATS event id) | Second insert rejected by unique index; NATS ACKs both |
| Hourly rollup seal (UC-03) | (bucketHour, operatorId) | Unique constraint; retry finds existing row and ACKs |
| TAP file generation (UC-07) | (recordingEntity, exportType, settlementDay) | Second run with X-Force-Regen: true required; else 409 ALREADY_GENERATED |
| Adjustment issue (UC-06) | Idempotency-Key header (24 h retention) | Replay returns original response |
| Bulk re-rate enqueue | Idempotency-Key | — |
| Export redrop | exportId | Second redrop within 60 s is rejected as REDROP_COOLDOWN |
| Outbox publication | event_id == NATS Msg-Id | JetStream dedup window 5 min; subsequent redelivery is a no-op |
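The `Idempotency-Key` row (replay returns the original response, 24 h retention) can be sketched as a small store with an injectable clock. Storage, eviction, and the class name are assumptions; only the 24 h retention figure comes from the table.

```python
import time
from typing import Any, Callable, Dict, Tuple

RETENTION_S = 24 * 3600  # 24 h retention, per the table above


class IdempotencyStore:
    """Remembers the response produced for each Idempotency-Key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def execute(self, key: str, operation: Callable[[], Any]) -> Any:
        """Run operation once per key within the retention window."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < RETENTION_S:
            return entry[1]  # replay: return the original response
        response = operation()
        self._entries[key] = (now, response)
        return response
```

Passing the clock in makes the retention boundary testable without waiting; a production store would also need to be shared across service instances, which this sketch does not attempt.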
7. Failure-mode interactions with sync contract
See FAILURE_MODES for full catalog. Sync-relevant modes:
- FM-01 NATS consumer lag → CDR generation behind; outbox relay continues but publication to downstream consumers (regulator-portal-service, analytics-service) may lag.
- FM-10 Schema-version mismatch during ATRA schema change window → exports pause until cutover completes; inbound generation unaffected.
- FM-11 Multi-region replication conflict → resolved by append-only policy; any divergence reconciled from the S3 archive (authoritative).
- FM-13 Duplicate CDR from NATS redelivery → unique index on `source_event_id` rejects second insert.
8. Cross-References
- DATA_MODEL.md §1 + §3 — append-only DB constraints
- EVENT_SCHEMAS.md §6 — outbox pattern
- APPLICATION_LOGIC.md §20 — concurrency & leadership
- ADR-0004 §14 + §15
- compliance-engine/SYNC_CONTRACT.md — gRPC hot-path contract pattern (not adopted here)
- consent-ledger-service/SYNC_CONTRACT.md — sibling append-only audit model
End of SYNC_CONTRACT.md