
CDR Mediation Service — Sync Contract

Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison
Last Updated: 2026-04-21
Companion: DATA_MODEL · EVENT_SCHEMAS · FAILURE_MODES · DEPLOYMENT_TOPOLOGY

This document defines:

  1. Per-aggregate conflict-resolution policy (what happens when two writes race).
  2. Multi-region synchronisation — hot region vs DR, regional divergence rules.
  3. The outbox-to-NATS publication contract.
  4. The ATRA schema negotiation lifecycle.
  5. Backwards-compatibility guarantees for the REST API and NATS event payloads.

The service is designed around one core simplification: every domain object is either append-only or has exactly one writer. Conflict resolution therefore collapses to an ordering problem, not a merging problem.


1. Per-aggregate conflict policy

| Aggregate | Policy | Rationale |
| --- | --- | --- |
| CdrRecord | append_only | Originals immutable; corrections are new rows keyed by (messageId, adjustmentOf) |
| CdrRollup | append_only | One row per (bucketHour, operatorId); computed once by leader |
| CdrChainEntry (chain inside CdrRollup) | single_writer_leader | Only the leader's scheduled UC-03 computes chain hashes |
| CdrExport | single_writer_leader | File generation is leader-only; fileSequenceNumber monotonic |
| ExportDelivery | append_only | Each attempt logged; latest status derived from MAX(started_at) |
| CdrAdjustment | append_only | Adjustments can themselves be adjusted, chained via adjustmentOf |
| CdrAuditEntry | append_only | Pure audit; UPDATE/DELETE rejected at DB layer |
| TapSequence (per recording_entity) | single_writer_serial | UPDATE ... RETURNING in serialised transaction; collisions impossible by construction |
| TransparencyAnchor | append_only | One per (day, operatorId); once submitted, never revoked |
| RegulatorSchema (activation) | server_authoritative + four_eyes | Exactly one active variant per exportType |
| MsisdnVault | append_only | Per-CDR encrypted MSISDN; never mutated |
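
To illustrate why append-only rows reduce conflict handling to ordering, the following is a minimal sketch of resolving the effective row for each message by walking adjustmentOf links. The `CdrRow` type, its `row_id` and `amount` fields, and the `effective_rows` helper are hypothetical names introduced for illustration; only messageId and adjustmentOf come from the table above. Treating two adjustments of the same row as an error is an assumption of this sketch, not a rule stated in this contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdrRow:
    row_id: str                   # hypothetical surrogate id for this row
    message_id: str               # messageId from the policy table
    adjustment_of: Optional[str]  # row_id this row corrects; None for an original
    amount: int                   # illustrative payload field


def effective_rows(rows: list[CdrRow]) -> dict[str, CdrRow]:
    """Return, per messageId, the row at the tip of its adjustment chain."""
    # A row is superseded as soon as any other row names it in adjustment_of.
    superseded = {r.adjustment_of for r in rows if r.adjustment_of is not None}
    tips: dict[str, CdrRow] = {}
    for row in rows:
        if row.row_id in superseded:
            continue
        if row.message_id in tips:
            # Two tips for one message means the chain forked (assumed invalid).
            raise ValueError(f"forked adjustment chain for {row.message_id}")
        tips[row.message_id] = row
    return tips
```

No existing row is ever rewritten: the current view is derived purely from which rows point at which, so two writers never need to merge state.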

2. Multi-region synchronisation

Per ADR-0004 §14, the platform runs hot in Kabul with warm-standby in Mazar (both Afghanistan) and async DR to dxb (Dubai sovereign cloud).

2.1 Writes

  • Regional locality. CDRs are written in the region where the terminating DLR landed (usually Kabul for domestic, dxb for international MT).
  • Postgres replication. Intra-region: synchronous Patroni replicas. Inter-region: logical replication kabul → mazar → dxb. dxb is a read-only failover target and is never promoted to primary without Commerce-director approval.
  • Leader election is region-scoped: each region elects its own export leader via pg_advisory_lock. Exports are produced from the region that owns the data for that settlement day.
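
Region-scoped leader election could look roughly like the sketch below. It assumes the lock key is derived from the region name (the contract does not say how keys are chosen) and it uses the non-blocking `pg_try_advisory_lock` rather than the blocking `pg_advisory_lock` named above, so that a standby can keep polling instead of waiting. The helper names, the `cdr-export-leader` role string, and the psycopg-style `%s` placeholder are all assumptions.

```python
import hashlib


def advisory_lock_key(region: str, role: str = "cdr-export-leader") -> int:
    """Map a (role, region) pair to a signed 64-bit advisory-lock key."""
    digest = hashlib.sha256(f"{role}:{region}".encode("utf-8")).digest()
    # Postgres advisory locks take a bigint, so keep 8 bytes as a signed int.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def try_become_leader(cursor, region: str) -> bool:
    """Try to take the region's leader lock without blocking.

    `cursor` is assumed to be a DB-API cursor on a long-lived Postgres
    session; a session-level advisory lock is released when that session ends.
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (advisory_lock_key(region),))
    (acquired,) = cursor.fetchone()
    return bool(acquired)
```

Because the key includes the region, a leader in one region never contends with a leader in another, which matches the region-scoped rule above.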

2.2 Reads

  • admin-dashboard and regulator-portal-service read from the region-local primary.
  • Cross-region reads are served by the DR read-replica with a region=dxb header and are informational only — verification APIs (POST /v1/cdr/chain/verify) always execute on the writing region to guarantee up-to-date chain state.

2.3 Multi-region conflicts

Because every CDR aggregate is append-only and keyed on (bucketHour, cdrId) with a region-scoped operatorId, two regions cannot produce conflicting rows for the same CDR. The scenarios that could produce divergence are:

| Scenario | Behaviour |
| --- | --- |
| Same DLR delivered to two regional consumers simultaneously (NATS mirror) | ux_records_source_event unique index rejects the second insert; the consumer ACKs both messages to avoid redelivery loops |
| Kabul writes a CDR; Mazar's logical replica has a 60 s lag when a query arrives | Query returns stale data; client may retry. No chain divergence because Mazar does not write. |
| DR failover to dxb before Kabul fully drained | Re-sync procedure: promote dxb, verify chain walk via UC-12, publish cdr.audit.v1 entryType=CHAIN_VERIFY_OK. Any bucket on Kabul not yet shipped to dxb is reconstructed from S3 (UC-05 archive is the authoritative record). |
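
The duplicate-DLR scenario can be sketched with an in-memory SQLite database standing in for Postgres. The `records` table layout, the `project_dlr` function and the `ack` callback are hypothetical; only the index name ux_records_source_event and the rule that both deliveries are acknowledged come from this contract.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    """Create a toy records table with the unique index on the source event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "cdr_id INTEGER PRIMARY KEY, source_event_id TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX ux_records_source_event ON records (source_event_id)"
    )
    return conn


def project_dlr(
    conn: sqlite3.Connection, source_event_id: str, ack: Callable[[], None]
) -> bool:
    """Insert one CDR for a DLR event; return True if a row was written."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO records (source_event_id) VALUES (?)",
                (source_event_id,),
            )
        inserted = True
    except sqlite3.IntegrityError:
        inserted = False  # duplicate delivery: the first insert already won
    # ACK in both cases so the broker does not redeliver the duplicate forever.
    ack()
    return inserted
```

The important detail is that the duplicate is treated as success from the broker's point of view: the database has already recorded the event, so a negative acknowledgement would only cause a redelivery loop.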

2.4 S3 cross-region replication

  • Hot-write region (Kabul) writes to s3://ghasi-cdr-archive-kabul/… with Object-Lock Compliance.
  • DR replicates to s3://ghasi-cdr-archive-dxb/… via async BucketReplication with retention preserved (replica also Object-Locked).
  • Replication lag target: ≤ 15 min P95, ≤ 1 h P99. Breach → SEV2 CdrReplicationLagHigh.
  • The DR bucket is read-only until a confirmed failover runbook is executed; write access requires break-glass credentials.
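
As a rough illustration of the lag target, the check below evaluates a window of replication-lag samples against the P95 and P99 limits. The nearest-rank percentile method, the function names and the idea of a sample window are assumptions; the contract fixes only the thresholds and the resulting alert.

```python
import math
from typing import Sequence

P95_LIMIT_SECONDS = 15 * 60  # lag target: at most 15 min at P95
P99_LIMIT_SECONDS = 60 * 60  # lag target: at most 1 h at P99


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def replication_lag_breached(lag_seconds: Sequence[float]) -> bool:
    """True when either target is exceeded (would raise CdrReplicationLagHigh)."""
    return (
        percentile(lag_seconds, 95) > P95_LIMIT_SECONDS
        or percentile(lag_seconds, 99) > P99_LIMIT_SECONDS
    )
```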

3. Outbox to NATS publication contract

Per the outbox pattern (see EVENT_SCHEMAS §6):

  1. Aggregate state change + outbox row are a single Postgres transaction.
  2. A single leader-elected relay process polls SELECT event_id, subject, partition_key, payload FROM cdr.outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 1000 every 250 ms.
  3. For each row, publish to NATS JetStream with Nats-Msg-Id = event_id (JetStream dedup), Nats-Partition-Key = operatorId, body = JSON payload.
  4. On ACK from JetStream, UPDATE cdr.outbox SET published_at = now() WHERE event_id = $1.
  5. Rows unpublished > 7 days → SEV1 CdrOutboxStalled. Rows unpublished > 30 days → Legal notification (regulator evidence at risk).

Ordering guarantee. Within a given partitionKey (= operatorId), events are delivered to consumers in the order they were committed to Postgres. Cross-operator ordering is not guaranteed.

Exactly-once delivery. The combination of (outbox transaction, JetStream Msg-Id dedup, consumer durable AckExplicit) yields effectively-exactly-once semantics. Consumers still implement idempotency on eventId.
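
One poll cycle of the relay described in this section can be sketched as follows. SQLite stands in for Postgres (so the table is named `outbox` rather than `cdr.outbox`), and the `publish` callable is a hypothetical wrapper that returns True once JetStream has acknowledged the message. Stopping the cycle at the first failed publish is a conservative choice made in this sketch to keep commit order intact; the contract itself only requires ordering within a partition key.

```python
import sqlite3
from typing import Callable

# publish(subject, headers, payload) -> True once JetStream has ACKed.
Publish = Callable[[str, dict[str, str], str], bool]

POLL_SQL = """
    SELECT event_id, subject, partition_key, payload
    FROM outbox
    WHERE published_at IS NULL
    ORDER BY created_at
    LIMIT 1000
"""


def relay_once(conn: sqlite3.Connection, publish: Publish) -> int:
    """Run one poll cycle; return how many rows were marked published."""
    published = 0
    rows = conn.execute(POLL_SQL).fetchall()
    for event_id, subject, partition_key, payload in rows:
        headers = {
            "Nats-Msg-Id": event_id,              # JetStream dedup key
            "Nats-Partition-Key": partition_key,  # operatorId
        }
        if not publish(subject, headers, payload):
            break  # leave this row and all later rows for the next cycle
        with conn:  # mark as published only after the broker ACK
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') "
                "WHERE event_id = ?",
                (event_id,),
            )
        published += 1
    return published
```

If the process crashes between the broker ACK and the UPDATE, the row is published again on the next cycle with the same Nats-Msg-Id, which is why the dedup header and consumer-side idempotency are both part of the contract.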


4. ATRA schema negotiation lifecycle

ATRA may evolve its TAP/RAP schema. The service handles this with a pluggable adapter pattern:

┌──────────────────────────────────────────────────────────────────┐
│ Phase A: ATRA announces new variant (e.g. atra-tap-v3)           │
│          Commerce + Regulator Liaison receive schema docs.       │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase B: Developer ships adapter class + ASN.1 module            │
│          `AtraTapV3Adapter` + config/tap-schemas/atra-tap-v3.asn1│
│          cdr.regulator_schemas row INSERTed with active=FALSE.   │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase C: Dry-run validation                                      │
│          GET /v1/cdr/regulator-schemas/atra-tap-v3/validate      │
│          encodes 100 sample CDRs; returns per-row success.       │
│          ATRA runs cross-check against their ingestion tool.     │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase D: Shadow generation (optional, 3 days)                    │
│          Generate atra-tap-v3 files alongside atra-tap-v2,       │
│          upload to ATRA stage URL, verify ACK.                   │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase E: Cutover                                                 │
│          POST /v1/cdr/regulator-schemas/atra-tap-v3:activate     │
│          (four-eyes required). Sets active=TRUE for TAP_3_12.    │
│          Old variant rows retained; next TAP encoder run uses v3.│
│          Publishes cdr.config.changed.v1 and cdr.audit.v1.       │
└──────────────────────────────────────────────────────────────────┘

Reversibility. Activation is reversible — re-activating the previous variant is allowed for the next encoder run. Files already delivered under the new variant remain authoritative; no retroactive re-encoding.

Pin at generation time, not CDR-creation time. When UC-07 runs for settlement day D-1, it pins the schema variant value as of job start. This prevents a mid-job activation from producing inconsistent output.
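
The pinning rule can be illustrated with a small in-memory model. `SchemaRegistry`, `run_export` and the `between_batches` hook are hypothetical stand-ins for the cdr.regulator_schemas lookup and the UC-07 job; the sketch only shows that reading the variant once at job start keeps the output consistent when an activation lands mid-job.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SchemaRegistry:
    """In-memory stand-in for the active-variant lookup."""
    active: dict[str, str] = field(default_factory=dict)  # exportType -> variant

    def activate(self, export_type: str, variant: str) -> None:
        self.active[export_type] = variant

    def active_variant(self, export_type: str) -> str:
        return self.active[export_type]


def run_export(
    registry: SchemaRegistry,
    export_type: str,
    batches: list[str],
    between_batches: Callable[[], None] = lambda: None,
) -> list[tuple[str, str]]:
    """Encode every batch with the variant that was active at job start."""
    pinned = registry.active_variant(export_type)  # read exactly once
    encoded = []
    for batch in batches:
        encoded.append((pinned, batch))  # never re-read the registry here
        between_batches()  # hook used to simulate a concurrent activation
    return encoded
```

A job that looked up the active variant per batch would instead emit a file that mixes two encodings, which is the failure this rule exists to prevent.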


5. Schema stability guarantees

5.1 REST API

| Surface | Stability |
| --- | --- |
| /v1/cdr/* path | Additive changes non-breaking: new endpoints, new optional fields |
| Response envelope (items, nextCursor, total) | Stable |
| Error envelope (error.code, error.message, error.details) | Stable |
| schemaVariant parameter in export trigger body | Independent of REST version; drives TAP/RAP emission only |
| New required fields | Breaking → /v2/cdr/*, 90-day overlap |

5.2 NATS events

| Attribute | Stability |
| --- | --- |
| Subjects (cdr.*.v1) | Stable within major version |
| Additive optional fields | Non-breaking |
| Enum expansion (e.g., new adjustmentType) | Non-breaking; consumers must treat unknowns as UNKNOWN |
| Required field removal | Breaking → bump to .v2 subject, co-publish 90 days |
| partitionKey NATS header semantics | Stable; always operatorId |
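
A consumer can satisfy the enum-expansion rule by mapping any unrecognised value to an UNKNOWN member instead of failing. The sketch below uses Python's `Enum._missing_` hook; the CREDIT and DEBIT members and the `parse_adjustment_type` helper are hypothetical, since this contract does not list the real adjustmentType values.

```python
from enum import Enum


class AdjustmentType(Enum):
    CREDIT = "CREDIT"    # hypothetical member, for illustration only
    DEBIT = "DEBIT"      # hypothetical member, for illustration only
    UNKNOWN = "UNKNOWN"  # catch-all required by the stability table

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when the lookup value matches no member.
        return cls.UNKNOWN


def parse_adjustment_type(event: dict) -> AdjustmentType:
    """Read adjustmentType from an event payload, tolerating new values."""
    return AdjustmentType(event.get("adjustmentType"))
```

With this in place a producer can ship a new enum value without coordinating a consumer release; the consumer keeps processing and can route UNKNOWN cases to whatever fallback handling it chooses.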

5.3 File format (TAP/RAP)

| Attribute | Stability |
| --- | --- |
| File-name pattern | Stable (variant defines sequence format) |
| Signature sidecar format | Stable; new signers via KeyId addition |
| ASN.1 module | Managed per variant; variant swap is the evolution mechanism |

6. Idempotency guarantees

| Operation | Idempotency key | Behaviour on replay |
| --- | --- | --- |
| DLR → CDR projection (UC-01) | source_event_id (NATS event id) | Second insert rejected by unique index; the consumer ACKs both messages |
| Hourly rollup seal (UC-03) | (bucketHour, operatorId) | Unique constraint; retry finds existing row and ACKs |
| TAP file generation (UC-07) | (recordingEntity, exportType, settlementDay) | Second run requires X-Force-Regen: true; otherwise 409 ALREADY_GENERATED |
| Adjustment issue (UC-06) | Idempotency-Key header (24 h retention) | Replay returns original response |
| Bulk re-rate enqueue | Idempotency-Key | |
| Export redrop | exportId | Second redrop within 60 s is rejected as REDROP_COOLDOWN |
| Outbox publication | event_id == NATS Msg-Id | JetStream dedup window 5 min; republication within the window is a no-op |
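
The Idempotency-Key behaviour for adjustment issue (replay returns the original response, 24 h retention) can be sketched with an in-memory store. `IdempotencyStore` and its injectable clock are hypothetical; a real service would presumably persist keys in the database, and this sketch does not handle two concurrent first requests with the same key.

```python
import time
from typing import Any, Callable


class IdempotencyStore:
    """Remember the first response per key for a fixed retention period."""

    def __init__(
        self,
        retention_seconds: float = 24 * 3600,  # 24 h retention
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._retention = retention_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def execute(self, key: str, handler: Callable[[], Any]) -> Any:
        """Run handler once per key; replays return the stored response."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._retention:
            return entry[1]  # replay inside the window: no side effects
        response = handler()
        self._entries[key] = (now, response)
        return response
```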

7. Failure-mode interactions with sync contract

See FAILURE_MODES for full catalog. Sync-relevant modes:

  • FM-01 NATS consumer lag → CDR generation falls behind; the outbox relay continues, but publication to downstream consumers (regulator-portal-service, analytics-service) may lag.
  • FM-10 Schema-version mismatch during ATRA schema change window → exports pause until cutover completes; inbound generation unaffected.
  • FM-11 Multi-region replication conflict → resolved by append-only policy; any divergence reconciled from the S3 archive (authoritative).
  • FM-13 Duplicate CDR from NATS redelivery → unique index on source_event_id rejects second insert.

8. Cross-References

End of SYNC_CONTRACT.md