
cdr-mediation-service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, docs/architecture/ADR-0004-national-backbone-resilience.md §15 CDR pipeline

Runtime + Kubernetes topology for cdr-mediation-service. The service runs as three distinct Deployments (ingest pool, batch worker pool, exporter pool) because each has different scaling dynamics.


1. Runtime

| Dimension | Choice |
|---|---|
| Language | TypeScript 5.x strict |
| Framework | NestJS + Fastify |
| Node.js | 20 LTS |
| ORM | Prisma 5.x |
| NATS | nats 2.10+ via shared @ghasi/nats-client |
| HSM | PKCS#11 via @ghasi/hsm-client |
| S3 | AWS SDK v3 (compatible with MinIO/Ceph) |
| SFTP | ssh2-sftp-client with strict host-key checking |
| TAP 3.12 encoder | asn1js + custom encoder |
| Container | Distroless gcr.io/distroless/nodejs20 |

2. Kubernetes Resources

Three Deployments — separate lifecycles.

2.1 Ingest Deployment

Consumes NATS subjects; writes CDRs; stateless + horizontally scalable.

apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec:
  replicas: 3
  selector: { matchLabels: { app: cdr-mediation, component: ingest } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: ingest, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector: { node-pool: np-ctrl }
      containers:
        - name: cdr-ingest
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/ingest/main.js"]
          ports:
            - { name: http, containerPort: 3071 }
            - { name: metrics, containerPort: 9464 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "250m", memory: "256Mi" }
            limits: { cpu: "1000m", memory: "1Gi" }
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            periodSeconds: 10
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }

2.2 Batch-Worker Deployment

Hourly rollup + daily archive + chain verifier + ClickHouse sync. Distributed lock via Redis; only one instance runs each job at a time.

apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-batch, namespace: ghasi-prod }
spec:
  replicas: 2 # 2 for HA; internal distributed lock ensures single-runner per job
  selector: { matchLabels: { app: cdr-mediation, component: batch } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: batch, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector: { node-pool: np-ctrl }
      containers:
        - name: cdr-batch
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/batch/main.js"]
          ports:
            - { name: http, containerPort: 3072 }
            - { name: metrics, containerPort: 9465 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "2000m", memory: "4Gi" } # rollups can be memory-heavy
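
The single-runner guarantee described for the batch pool can be sketched as follows. This is a minimal illustration, not the service's actual lock code: `LockStore`, `InMemoryLockStore`, `runExclusive`, and the `cdr:lock:` key prefix are hypothetical names. The in-memory store stands in for Redis, where the same acquire step would typically be a `SET key token NX PX ttl` command and the release an ownership-checked delete.

```typescript
// Hypothetical abstraction over the lock backend (Redis in production).
interface LockStore {
  // Store token under key only if the key is absent or expired; true if acquired.
  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean;
  // Remove the key only if it still holds this token; true if removed.
  deleteIfOwner(key: string, token: string): boolean;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { token: string; expiresAt: number }>();

  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean {
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAt > nowMs) {
      return false; // another instance holds an unexpired lock
    }
    this.entries.set(key, { token, expiresAt: nowMs + ttlMs });
    return true;
  }

  deleteIfOwner(key: string, token: string): boolean {
    const current = this.entries.get(key);
    if (current === undefined || current.token !== token) {
      return false; // never delete a lock owned by someone else
    }
    this.entries.delete(key);
    return true;
  }
}

// Runs the job only if this instance wins the lock; returns whether it ran.
function runExclusive(
  store: LockStore,
  jobName: string,
  instanceId: string,
  ttlMs: number,
  nowMs: number,
  job: () => void,
): boolean {
  const key = `cdr:lock:${jobName}`; // assumed key scheme
  if (!store.setIfAbsent(key, instanceId, ttlMs, nowMs)) {
    return false;
  }
  try {
    job();
  } finally {
    store.deleteIfOwner(key, instanceId); // release even if the job throws
  }
  return true;
}
```

The TTL bounds how long a crashed pod can block the other replica; it would need to exceed the longest expected job run, or be renewed while the job is in progress.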

2.3 Exporter Deployment

Daily regulator export: builds the export file, signs it via the HSM, and delivers it over SFTP or HTTPS.

apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-exporter, namespace: ghasi-prod }
spec:
  replicas: 2
  selector: { matchLabels: { app: cdr-mediation, component: exporter } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: exporter, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector:
        node-pool: np-ctrl
        hsm-accessible: "true"
      containers:
        - name: cdr-exporter
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/exporter/main.js"]
          ports:
            - { name: http, containerPort: 3073 }
            - { name: metrics, containerPort: 9466 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "2000m", memory: "4Gi" }
          volumeMounts:
            - { name: tmp, mountPath: /tmp }
            - { name: hsm-socket, mountPath: /var/run/hsm }
      volumes:
        - name: tmp
          emptyDir: { sizeLimit: 5Gi } # for temporary file assembly
        - name: hsm-socket
          hostPath: { path: /var/run/hsm, type: Socket }

2.4 HPA (ingest only — batch & exporter fixed)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: cdr-ingest }
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Pods
      pods:
        metric: { name: cdr_nats_consumer_lag }
        target: { type: AverageValue, averageValue: "500" }
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
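
To see how the two metrics above interact, the sketch below applies the documented HPA rule `desired = ceil(currentReplicas × currentValue / targetValue)` to each metric, takes the larger result, and clamps it to the 3-12 range. It is a simplified model: the function names are hypothetical, and the real controller's tolerance band, stabilization windows, and handling of unready pods are left out.

```typescript
interface HpaBounds {
  minReplicas: number;
  maxReplicas: number;
}

// Core HPA proportion: scale replicas by the ratio of observed to target value.
function replicasForMetric(currentReplicas: number, currentValue: number, targetValue: number): number {
  return Math.ceil(currentReplicas * (currentValue / targetValue));
}

// Combines the lag and CPU metrics from the manifest above.
// avgConsumerLag: mean cdr_nats_consumer_lag per pod (target 500).
// avgCpuUtilizationPct: mean CPU use as a percentage of requests (target 70).
function desiredIngestReplicas(
  currentReplicas: number,
  avgConsumerLag: number,
  avgCpuUtilizationPct: number,
  bounds: HpaBounds = { minReplicas: 3, maxReplicas: 12 },
): number {
  const byLag = replicasForMetric(currentReplicas, avgConsumerLag, 500);
  const byCpu = replicasForMetric(currentReplicas, avgCpuUtilizationPct, 70);
  const wanted = Math.max(byLag, byCpu); // the HPA follows the most demanding metric
  return Math.min(bounds.maxReplicas, Math.max(bounds.minReplicas, wanted));
}

// 3 pods averaging 1200 messages of lag at 35% CPU: lag asks for 8, CPU for 2.
console.log(desiredIngestReplicas(3, 1200, 35)); // 8
```

Because the larger proposal wins, a lag backlog scales the pool out even when CPU is idle, which is the usual reason to pair a queue-depth metric with a resource metric.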

2.5 PodDisruptionBudgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec: { minAvailable: 2, selector: { matchLabels: { app: cdr-mediation, component: ingest } } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-batch, namespace: ghasi-prod }
spec: { minAvailable: 1, selector: { matchLabels: { app: cdr-mediation, component: batch } } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-exporter, namespace: ghasi-prod }
spec: { minAvailable: 1, selector: { matchLabels: { app: cdr-mediation, component: exporter } } }
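
As a quick check on what these budgets permit, the number of voluntary evictions allowed at any moment is the count of healthy pods minus `minAvailable`, floored at zero. The helper below is an illustrative sketch with a hypothetical name, not part of the service.

```typescript
// Voluntary evictions a minAvailable-style PDB allows right now.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// Replica counts and budgets taken from the manifests in this section.
const pools = [
  { name: "cdr-ingest", healthy: 3, minAvailable: 2 },
  { name: "cdr-batch", healthy: 2, minAvailable: 1 },
  { name: "cdr-exporter", healthy: 2, minAvailable: 1 },
];

for (const pool of pools) {
  console.log(`${pool.name}: ${allowedDisruptions(pool.healthy, pool.minAvailable)} eviction(s) allowed`);
}
```

At the configured replica counts each pool tolerates one eviction at a time, so a node drain proceeds pod by pod; if a batch or exporter pod is already unhealthy, the drain blocks until it recovers.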

2.6 Services

Ingest has an HTTP admin service; Batch / Exporter are workers (no inbound traffic beyond admin + metrics).

apiVersion: v1
kind: Service
metadata: { name: cdr-ingest-http, namespace: ghasi-prod }
spec:
  selector: { app: cdr-mediation, component: ingest }
  ports:
    - { name: http, port: 3071, targetPort: http }
    - { name: metrics, port: 9464, targetPort: metrics }
  type: ClusterIP

2.7 NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cdr-mediation-ingress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cdr-mediation } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-prod-edge } }
          podSelector: { matchLabels: { app: kong } }
      ports: [{ port: 3071, protocol: TCP }]
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-obs } }
      ports: [{ port: 9464, protocol: TCP }, { port: 9465, protocol: TCP }, { port: 9466, protocol: TCP }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cdr-mediation-egress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cdr-mediation } }
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector: { matchLabels: { app: postgres-primary } }
      ports: [{ port: 5432, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: redis-cluster } }
      ports: [{ port: 6379, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: nats } }
      ports: [{ port: 4222, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: minio } }
      ports: [{ port: 9000, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: clickhouse } }
      ports: [{ port: 9000, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: hsm-proxy } }
      ports: [{ port: 9211, protocol: TCP }]
    - to: # ATRA SFTP + HTTPS (configured CIDRs)
        - ipBlock: { cidr: 198.18.0.0/24 } # ATRA example
      ports:
        - { port: 22, protocol: TCP }
        - { port: 443, protocol: TCP }

3. CronJobs

Rollup + archive + chain verifier + ClickHouse sync + daily export run as CronJobs (not always-on pods) for cost + isolation:

apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-rollup-hourly, namespace: ghasi-prod }
spec:
  schedule: "5 * * * *" # 5 past every hour
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 7
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure
          containers:
            - name: rollup
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/rollup.js"]
              envFrom: [ { configMapRef: { name: cdr-mediation-config } }, { secretRef: { name: cdr-mediation-secrets } } ]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-daily-export, namespace: ghasi-prod }
spec:
  schedule: "30 0 * * *" # 00:30 Kabul daily
  timeZone: "Asia/Kabul" # the schedule is evaluated in this zone; TZ in the ConfigMap only affects the container
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure
          containers:
            - name: exporter
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/exporter/daily.js"]
              envFrom: [ { configMapRef: { name: cdr-mediation-config } }, { secretRef: { name: cdr-mediation-secrets } } ]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-audit-verifier, namespace: ghasi-prod }
spec:
  schedule: "0 2 * * *" # daily 02:00 UTC
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure # Job pods must use OnFailure or Never
          containers:
            - name: verifier
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/audit-verify.js"]
              envFrom: [ { configMapRef: { name: cdr-mediation-config } }, { secretRef: { name: cdr-mediation-secrets } } ]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-archive, namespace: ghasi-prod }
spec:
  schedule: "0 3 * * 0" # weekly Sunday 03:00 UTC
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure # Job pods must use OnFailure or Never
          containers:
            - name: archive
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/archive.js"]
              envFrom: [ { configMapRef: { name: cdr-mediation-config } }, { secretRef: { name: cdr-mediation-secrets } } ]
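
The schedule comments above mix Kabul-local and UTC times. A CronJob schedule is evaluated in the kube-controller-manager's time zone (commonly UTC) unless `spec.timeZone` is set, so it can help to know the UTC equivalent of a Kabul-local time. The sketch below does that conversion with fixed-offset arithmetic, relying on Asia/Kabul being UTC+04:30 with no daylight saving; the function names are hypothetical.

```typescript
const KABUL_OFFSET_MINUTES = 4 * 60 + 30; // Asia/Kabul = UTC+04:30, no DST
const MINUTES_PER_DAY = 24 * 60;

interface UtcTime {
  hour: number;
  minute: number;
  dayShift: number; // -1 when the UTC time falls on the previous calendar day
}

// Converts a Kabul wall-clock time to the equivalent UTC time of day.
function kabulToUtc(hour: number, minute: number): UtcTime {
  const total = hour * 60 + minute - KABUL_OFFSET_MINUTES;
  const dayShift = Math.floor(total / MINUTES_PER_DAY);
  const wrapped = ((total % MINUTES_PER_DAY) + MINUTES_PER_DAY) % MINUTES_PER_DAY;
  return { hour: Math.floor(wrapped / 60), minute: wrapped % 60, dayShift };
}

// Renders a daily cron expression for the UTC equivalent of a Kabul time.
function dailyUtcCronForKabul(hour: number, minute: number): string {
  const utc = kabulToUtc(hour, minute);
  return `${utc.minute} ${utc.hour} * * *`;
}

console.log(dailyUtcCronForKabul(0, 30)); // "0 20 * * *"
```

So a 00:30 Kabul export corresponds to 20:00 UTC on the previous day, which matters when comparing it against the 02:00 UTC verifier and 03:00 UTC archive runs.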

4. Region Affinity (ADR-0004 §5 + §15)

| Data flow | Region posture |
|---|---|
| CDR rows | Region-local (writes in home region) |
| cdr.audit.v1 events | Cross-region mirrored + leaf to dxb (audit only) |
| S3 hot + cold buckets | Regional + cross-region replication (async) |
| Regulator export | Run in primary region only (no concurrent exports to avoid duplicates) |
| ClickHouse | Regional cluster; cross-region analytical queries federate |

The export CronJob runs only in ghasi-prod-kbl; ghasi-prod-mzr is the standby and can take over during a manually gated failover.


5. Infrastructure Dependencies

| Dependency | Purpose |
|---|---|
| PostgreSQL 16 | cdr schema + partitions |
| Redis 7 | Distributed locks for batch jobs + dedup cache |
| NATS JetStream | Event bus (cdr.* + consumer subscriptions) |
| S3 / MinIO | Hot (optional hotpart) + cold (archive) CDR store |
| HSM (PKCS#11) | Export file signing |
| Vault | ATRA SFTP creds, HSM PIN, S3 creds |
| ClickHouse | Analytics mirror (EP-ANLYT-02) |
| SPIRE / SPIFFE | Workload identity |

6. Secrets (Vault paths)

| Secret | Path | Use |
|---|---|---|
| Postgres dynamic cred | secret/data/cdr-mediation/db | Service user |
| NATS NKey | secret/data/cdr-mediation/nats-nkey | NATS auth |
| HSM PIN | secret/data/cdr-mediation/hsm-pin | PKCS#11 session |
| ATRA SFTP key | secret/data/cdr-mediation/atra-sftp/{destination} | Per-destination SFTP private key |
| ATRA HTTPS client cert | secret/data/cdr-mediation/atra-https/{destination} | mTLS client cert for HTTPS variant |
| S3 access key | secret/data/cdr-mediation/s3 | S3 auth |
| Signing key reference | secret/data/cdr-mediation/sign-key-ref | HSM key handle |
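
The two ATRA paths are templated on `{destination}`. A small helper like the one below could build them while rejecting values that would escape the intended prefix. This is a sketch only: `atraSecretPath` is a hypothetical name, and the allowed character set for destination identifiers (lowercase letters, digits, hyphens) is an assumption, not something this document specifies.

```typescript
const VAULT_BASE = "secret/data/cdr-mediation";

type AtraChannel = "atra-sftp" | "atra-https";

// Assumed naming rule for destinations; adjust to the real convention.
const DESTINATION_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Builds the per-destination Vault path from the table above.
function atraSecretPath(channel: AtraChannel, destination: string): string {
  if (!DESTINATION_PATTERN.test(destination)) {
    // Blocks slashes and dots so a destination cannot address another secret.
    throw new Error(`invalid destination identifier: ${destination}`);
  }
  return `${VAULT_BASE}/${channel}/${destination}`;
}

// "atra-primary" is a made-up destination used purely for illustration.
console.log(atraSecretPath("atra-sftp", "atra-primary"));
```

Validating the identifier before interpolation keeps a misconfigured destination such as `../db` from resolving to an unrelated secret.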

7. Config (ConfigMap)

apiVersion: v1
kind: ConfigMap
metadata: { name: cdr-mediation-config, namespace: ghasi-prod }
data:
  LOG_LEVEL: "info"
  REGION: "kbl"
  TZ: "Asia/Kabul"
  POSTGRES_URL: "postgres://cdr-mediation@postgres-primary:5432/cdr"
  REDIS_URL: "redis://redis-cluster:6379/3"
  NATS_URL: "nats://nats:4222"
  S3_ENDPOINT: "https://minio.ghasi-prod.svc:9000"
  S3_BUCKET_HOT: "cdr-hot-kbl"
  S3_BUCKET_COLD: "cdr-cold-kbl" # cross-replicated to dxb
  CLICKHOUSE_URL: "clickhouse://clickhouse:9000/cdr"
  HSM_PKCS11_LIB: "/usr/lib/softhsm/libsofthsm2.so" # prod: thales library path
  INGEST_CONSUMER_GROUP: "cdr-ingest"
  EXPORT_SCHEMA_DEFAULT: "ATRA_TAP_312_V1"
  HOT_RETENTION_DAYS: "30"
  ARCHIVE_RETENTION_YEARS: "7"
  DAILY_EXPORT_HOUR_KBL: "00" # 00:30 in cron → picks up prev-day CDRs
  CHAIN_VERIFIER_WINDOW_HOURS: "24"
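
ConfigMap values reach the process as strings, so numeric settings such as `HOT_RETENTION_DAYS: "30"` or `DAILY_EXPORT_HOUR_KBL: "00"` need parsing and range checks at startup. The sketch below shows one way to do that for a few of the keys above. The `CdrConfig` shape, the function names, and the chosen bounds are assumptions for illustration, not the service's actual config module.

```typescript
interface CdrConfig {
  region: string;
  hotRetentionDays: number;
  archiveRetentionYears: number;
  dailyExportHourKbl: number;
  chainVerifierWindowHours: number;
}

type Env = Record<string, string | undefined>;

function requireString(env: Env, name: string): string {
  const raw = env[name];
  if (raw === undefined || raw === "") {
    throw new Error(`missing required config: ${name}`);
  }
  return raw;
}

// Parses a non-negative decimal integer and enforces inclusive bounds.
function requireInt(env: Env, name: string, min: number, max: number): number {
  const raw = requireString(env, name);
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  const value = Number.parseInt(raw, 10); // "00" parses to 0
  if (value < min || value > max) {
    throw new Error(`${name} must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}

// Bounds below are illustrative guesses, not documented limits.
function loadConfig(env: Env): CdrConfig {
  return {
    region: requireString(env, "REGION"),
    hotRetentionDays: requireInt(env, "HOT_RETENTION_DAYS", 1, 3650),
    archiveRetentionYears: requireInt(env, "ARCHIVE_RETENTION_YEARS", 1, 100),
    dailyExportHourKbl: requireInt(env, "DAILY_EXPORT_HOUR_KBL", 0, 23),
    chainVerifierWindowHours: requireInt(env, "CHAIN_VERIFIER_WINDOW_HOURS", 1, 24 * 31),
  };
}
```

In a Node.js entry point this would be called once as `loadConfig(process.env)`, so a malformed value fails the pod at startup rather than partway through a rollup or export.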

8. Deployment Gate Checklist

  • All 16 spec docs at "Complete" status.
  • Canary deploy to 1 ingest replica for 30 min; no lag spike.
  • Chain verifier runs clean for 7 consecutive days in staging.
  • ATRA staging export delivered + ACKed (dry-run).
  • HSM signing of staging export validated.
  • kubectl diff shows no surprise changes.
  • Rollback tested: reverting to previous image restores ingest + rollup SLOs within 5 min.
  • On-call acknowledges + approves.

9. Cost Envelope

Approximate per-region monthly cost at national-backbone scale (5 MNOs, expected ~100 M CDRs/month):

| Component | Monthly |
|---|---|
| Ingest pods (3-12) | ~$200 |
| Batch + Exporter pods | ~$100 |
| CronJobs | ~$20 |
| Postgres (shared; CDR schema) | ~$60 |
| Redis (shared) | ~$15 |
| NATS (shared) | ~$15 |
| S3 hot storage | ~$50 |
| S3 cold archive (7 y) | ~$30 |
| HSM (amortised) | ~$100 |
| ClickHouse (shared; CDR fact tables) | ~$40 |
| ATRA egress | nominal (SFTP + daily file) |

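By the figures above, ingest pods (~$200) and the amortised HSM share (~$100) are the largest line items, while Postgres + S3 storage together come to roughly $140. HSM cost is amortised across regulator-facing services.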
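
Summing the table's approximate line items gives a rough per-region total and makes each component's share easy to check. The snippet below is plain arithmetic over the figures above; ATRA egress is left out because the table lists it as nominal, and the helper name is hypothetical.

```typescript
interface CostLine {
  component: string;
  monthlyUsd: number;
}

// Approximate figures copied from the cost table (ATRA egress omitted: nominal).
const costLines: CostLine[] = [
  { component: "Ingest pods", monthlyUsd: 200 },
  { component: "Batch + Exporter pods", monthlyUsd: 100 },
  { component: "CronJobs", monthlyUsd: 20 },
  { component: "Postgres", monthlyUsd: 60 },
  { component: "Redis", monthlyUsd: 15 },
  { component: "NATS", monthlyUsd: 15 },
  { component: "S3 hot storage", monthlyUsd: 50 },
  { component: "S3 cold archive", monthlyUsd: 30 },
  { component: "HSM", monthlyUsd: 100 },
  { component: "ClickHouse", monthlyUsd: 40 },
];

const totalMonthlyUsd = costLines.reduce((sum, line) => sum + line.monthlyUsd, 0);

// Fraction of the total attributable to the named components.
function shareOf(components: string[]): number {
  const subtotal = costLines
    .filter((line) => components.includes(line.component))
    .reduce((sum, line) => sum + line.monthlyUsd, 0);
  return subtotal / totalMonthlyUsd;
}

console.log(`total: ~$${totalMonthlyUsd}/month`); // total: ~$630/month
console.log(`storage share: ${(shareOf(["Postgres", "S3 hot storage", "S3 cold archive"]) * 100).toFixed(1)}%`);
```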