cdr-mediation-service — Deployment Topology
Version: 1.0
Status: Draft
Owner: Commerce + Regulator Liaison + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, docs/architecture/ADR-0004-national-backbone-resilience.md §15 CDR pipeline
Runtime + Kubernetes topology for cdr-mediation-service. The service runs as three distinct Deployments (ingest pool, batch worker pool, exporter pool) because each has different scaling dynamics.
1. Runtime
| Dimension | Choice |
|---|---|
| Language | TypeScript 5.x strict |
| Framework | NestJS + Fastify |
| Node.js | 20 LTS |
| ORM | Prisma 5.x |
| NATS | nats 2.10+ via shared @ghasi/nats-client |
| HSM | PKCS#11 via @ghasi/hsm-client |
| S3 | AWS SDK v3 (compatible with MinIO/Ceph) |
| SFTP | ssh2-sftp-client with strict host-key checking |
| TAP 3.12 encoder | asn1js + custom encoder |
| Container | Distroless gcr.io/distroless/nodejs20 |
2. Kubernetes Resources
Three Deployments — separate lifecycles.
2.1 Ingest Deployment
Consumes NATS subjects; writes CDRs; stateless + horizontally scalable.
apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec:
  replicas: 3
  selector: { matchLabels: { app: cdr-mediation, component: ingest } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: ingest, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector: { node-pool: np-ctrl }
      containers:
        - name: cdr-ingest
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/ingest/main.js"]
          ports:
            - { name: http, containerPort: 3071 }
            - { name: metrics, containerPort: 9464 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "250m", memory: "256Mi" }
            limits: { cpu: "1000m", memory: "1Gi" }
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            periodSeconds: 10
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }
2.2 Batch-Worker Deployment
Hourly rollup, daily archive, chain verifier, and ClickHouse sync. A Redis-backed distributed lock ensures that only one instance runs each job at a time.
apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-batch, namespace: ghasi-prod }
spec:
  replicas: 2   # 2 for HA; internal distributed lock ensures single-runner per job
  selector: { matchLabels: { app: cdr-mediation, component: batch } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: batch, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector: { node-pool: np-ctrl }
      containers:
        - name: cdr-batch
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/batch/main.js"]
          ports:
            - { name: http, containerPort: 3072 }
            - { name: metrics, containerPort: 9465 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "2000m", memory: "4Gi" }   # rollups can be memory-heavy
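The single-runner behaviour described for the batch pool can be sketched as follows. This is a minimal illustration, not the service's actual lock code: `InMemoryLockStore`, `runIfLeader`, and the `cdr:lock:` key prefix are hypothetical names, and the in-memory map stands in for Redis, where the equivalent acquire would be `SET key owner NX PX ttl` and the release an owner-checked delete.

```typescript
// In-memory stand-in for Redis (assumption for illustration only).
interface LockEntry {
  owner: string;
  expiresAtMs: number;
}

class InMemoryLockStore {
  private readonly entries = new Map<string, LockEntry>();

  // Mirrors SET NX PX: succeeds only if the key is absent or has expired.
  tryAcquire(key: string, owner: string, ttlMs: number, nowMs: number): boolean {
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAtMs > nowMs) {
      return false;
    }
    this.entries.set(key, { owner, expiresAtMs: nowMs + ttlMs });
    return true;
  }

  // Compare-and-delete: only the current owner may release the lock.
  release(key: string, owner: string): boolean {
    const current = this.entries.get(key);
    if (current === undefined || current.owner !== owner) {
      return false;
    }
    this.entries.delete(key);
    return true;
  }
}

// Runs the job only if this pod wins the lock; returns whether the job ran.
function runIfLeader(
  store: InMemoryLockStore,
  jobName: string,
  podName: string,
  ttlMs: number,
  nowMs: number,
  job: () => void,
): boolean {
  const key = `cdr:lock:${jobName}`; // hypothetical key layout
  if (!store.tryAcquire(key, podName, ttlMs, nowMs)) {
    return false;
  }
  try {
    job();
  } finally {
    store.release(key, podName);
  }
  return true;
}
```

The TTL matters for the two-replica setup: if the lock holder crashes mid-job, the key expires and the surviving replica can acquire it on its next attempt instead of waiting indefinitely.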
2.3 Exporter Deployment
Daily regulator export; builds + HSM-signs + SFTP/HTTPS delivers.
apiVersion: apps/v1
kind: Deployment
metadata: { name: cdr-exporter, namespace: ghasi-prod }
spec:
  replicas: 2
  selector: { matchLabels: { app: cdr-mediation, component: exporter } }
  template:
    metadata:
      labels: { app: cdr-mediation, component: exporter, tier: commerce }
    spec:
      serviceAccountName: cdr-mediation
      nodeSelector:
        node-pool: np-ctrl
        hsm-accessible: "true"
      containers:
        - name: cdr-exporter
          image: ghcr.io/ghasi/cdr-mediation-service:<digest>
          args: ["node", "dist/apps/exporter/main.js"]
          ports:
            - { name: http, containerPort: 3073 }
            - { name: metrics, containerPort: 9466 }
          envFrom:
            - { configMapRef: { name: cdr-mediation-config } }
            - { secretRef: { name: cdr-mediation-secrets } }
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "2000m", memory: "4Gi" }
          volumeMounts:
            - { name: tmp, mountPath: /tmp }
            - { name: hsm-socket, mountPath: /var/run/hsm }
      volumes:
        - name: tmp
          emptyDir: { sizeLimit: 5Gi }   # for temporary file assembly
        - name: hsm-socket
          hostPath: { path: /var/run/hsm, type: Socket }
2.4 HPA (ingest only — batch & exporter fixed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: cdr-ingest }
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Pods
      pods:
        metric: { name: cdr_nats_consumer_lag }
        target: { type: AverageValue, averageValue: "500" }
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
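To show how the two metrics interact, the sketch below applies the documented HPA scaling rule: for each metric, desired = ceil(currentReplicas × current / target), take the largest proposal, then clamp to the min/max bounds. It is a simplification that ignores stabilisation windows and unready pods. The function name and the 0.1 tolerance default (the usual controller default) are assumptions here, not values taken from this deployment.

```typescript
interface MetricSample {
  current: number; // observed per-pod average (lag) or utilisation (%)
  target: number;  // target from the HPA spec (500 lag, 70 % CPU)
}

// Simplified HPA decision: largest per-metric proposal, clamped to bounds.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricSample[],
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1,
): number {
  const proposals = metrics.map((m) => {
    const ratio = m.current / m.target;
    // Within tolerance of the target: this metric proposes no change.
    if (Math.abs(ratio - 1) <= tolerance) {
      return currentReplicas;
    }
    return Math.ceil(currentReplicas * ratio);
  });
  const raw = proposals.length > 0 ? Math.max(...proposals) : currentReplicas;
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// Example: 3 pods, average consumer lag 1200 (target 500), CPU at 50 % (target 70 %).
// Lag proposes ceil(3 * 2.4) = 8, CPU proposes ceil(3 * 0.714) = 3, so lag wins.
const exampleReplicas = desiredReplicas(
  3,
  [
    { current: 1200, target: 500 },
    { current: 50, target: 70 },
  ],
  3,
  12,
);
```

Because the largest proposal wins, a lag spike scales the pool out even when CPU is idle, which is the reason for pairing the consumer-lag metric with CPU.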
2.5 PodDisruptionBudgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-ingest, namespace: ghasi-prod }
spec: { minAvailable: 2, selector: { matchLabels: { app: cdr-mediation, component: ingest } } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-batch, namespace: ghasi-prod }
spec: { minAvailable: 1, selector: { matchLabels: { app: cdr-mediation, component: batch } } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cdr-exporter, namespace: ghasi-prod }
spec: { minAvailable: 1, selector: { matchLabels: { app: cdr-mediation, component: exporter } } }
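As a quick check on these budgets, the number of voluntary evictions a budget permits is the count of healthy pods minus `minAvailable`, floored at zero. The sketch below applies that to the replica counts stated in this document; the function and variable names are illustrative.

```typescript
// Voluntary disruptions a minAvailable-style PDB permits right now.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// Replica counts and budgets as stated in sections 2.1 to 2.5.
const ingestAtMinScale = allowedDisruptions(3, 2);   // one pod may be drained
const batchAtFullHealth = allowedDisruptions(2, 1);  // one pod may be drained
const batchWithOneDown = allowedDisruptions(1, 1);   // drains are blocked
```

With the batch or exporter pool already down to one healthy pod, a node drain will wait until the second replica is back, which is the intended effect of `minAvailable: 1` on a two-replica pool.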
2.6 Services
Ingest has an HTTP admin service; Batch / Exporter are workers (no inbound traffic beyond admin + metrics).
apiVersion: v1
kind: Service
metadata: { name: cdr-ingest-http, namespace: ghasi-prod }
spec:
  selector: { app: cdr-mediation, component: ingest }
  ports:
    - { name: http, port: 3071, targetPort: http }
    - { name: metrics, port: 9464, targetPort: metrics }
  type: ClusterIP
2.7 NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cdr-mediation-ingress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cdr-mediation } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-prod-edge } }
          podSelector: { matchLabels: { app: kong } }
      ports: [{ port: 3071, protocol: TCP }]
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-obs } }
      ports: [{ port: 9464, protocol: TCP }, { port: 9465, protocol: TCP }, { port: 9466, protocol: TCP }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cdr-mediation-egress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cdr-mediation } }
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector: { matchLabels: { app: postgres-primary } }
      ports: [{ port: 5432, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: redis-cluster } }
      ports: [{ port: 6379, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: nats } }
      ports: [{ port: 4222, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: minio } }
      ports: [{ port: 9000, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: clickhouse } }
      ports: [{ port: 9000, protocol: TCP }]
    - to:
        - podSelector: { matchLabels: { app: hsm-proxy } }
      ports: [{ port: 9211, protocol: TCP }]
    - to:   # ATRA SFTP + HTTPS (configured CIDRs)
        - ipBlock: { cidr: 198.18.0.0/24 }   # ATRA example
      ports:
        - { port: 22, protocol: TCP }
        - { port: 443, protocol: TCP }
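Since the egress policy only admits ATRA traffic inside the configured `ipBlock`, it can help to check a destination address against that CIDR before attempting delivery. The following is a minimal IPv4-only sketch; `ipv4ToInt` and `isInCidr` are hypothetical helpers, and the addresses used are examples within the documentation range shown above.

```typescript
// Parses dotted-quad IPv4 into an unsigned integer; throws on malformed input.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) {
      throw new Error(`invalid IPv4 octet in: ${ip}`);
    }
    const octet = Number(part);
    if (octet > 255) {
      throw new Error(`IPv4 octet out of range in: ${ip}`);
    }
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (for example "198.18.0.0/24").
function isInCidr(ip: string, cidr: string): boolean {
  const [base, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  if (base === undefined || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}
```

A destination that resolves outside the allowed block would be dropped by the policy at connect time, so failing early with a clear message is usually easier to diagnose than an SFTP timeout.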
3. CronJobs
Rollup + archive + chain verifier + ClickHouse sync + daily export run as CronJobs (not always-on pods) for cost + isolation:
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-rollup-hourly, namespace: ghasi-prod }
spec:
  schedule: "5 * * * *"   # 5 past every hour
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 7
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure
          containers:
            - name: rollup
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/rollup.js"]
              envFrom: [ { configMapRef: { name: cdr-mediation-config } }, { secretRef: { name: cdr-mediation-secrets } } ]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-daily-export, namespace: ghasi-prod }
spec:
  schedule: "30 0 * * *"   # 00:30 Asia/Kabul daily
  # Without spec.timeZone the schedule is evaluated in the controller's time zone
  # (typically UTC); the TZ value in the ConfigMap only affects the container.
  timeZone: "Asia/Kabul"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cdr-mediation
          restartPolicy: OnFailure
          containers:
            - name: exporter
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/exporter/daily.js"]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-audit-verifier, namespace: ghasi-prod }
spec:
  schedule: "0 2 * * *"   # daily 02:00 UTC
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Job pods must use OnFailure or Never
          containers:
            - name: verifier
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/audit-verify.js"]
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: cdr-archive, namespace: ghasi-prod }
spec:
  schedule: "0 3 * * 0"   # weekly Sunday 03:00 UTC
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Job pods must use OnFailure or Never
          containers:
            - name: archive
              image: ghcr.io/ghasi/cdr-mediation-service:<digest>
              args: ["node", "dist/apps/batch/archive.js"]
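The schedules above mix Kabul-local and UTC times, so a small conversion helper makes the intended firing times explicit. The sketch below relies on the fact that Afghanistan observes a fixed UTC+04:30 offset with no daylight saving; the helper names are illustrative, and a production implementation would more likely use a time-zone library.

```typescript
const KABUL_OFFSET_MINUTES = 4 * 60 + 30; // UTC+04:30, no DST

interface WallClock {
  dayShift: number; // -1 = previous day, 0 = same day, 1 = next day
  hour: number;
  minute: number;
}

// Shifts a wall-clock time by a number of minutes, wrapping across midnight.
function shiftWallClock(hour: number, minute: number, offsetMinutes: number): WallClock {
  const total = hour * 60 + minute + offsetMinutes;
  const dayShift = Math.floor(total / 1440);
  const withinDay = ((total % 1440) + 1440) % 1440;
  return { dayShift, hour: Math.floor(withinDay / 60), minute: withinDay % 60 };
}

function kabulToUtc(hour: number, minute: number): WallClock {
  return shiftWallClock(hour, minute, -KABUL_OFFSET_MINUTES);
}

function utcToKabul(hour: number, minute: number): WallClock {
  return shiftWallClock(hour, minute, KABUL_OFFSET_MINUTES);
}

// 00:30 Kabul is 20:00 UTC on the previous day.
const exportInUtc = kabulToUtc(0, 30);
// If "30 0 * * *" were evaluated in UTC instead, it would fire at 05:00 Kabul.
const misreadInKabul = utcToKabul(0, 30);
```

The second value shows why the evaluation time zone matters for the daily export: the same cron expression read as UTC would deliver the file four and a half hours later than intended.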
4. Region Affinity (ADR-0004 §5 + §15)
| Data flow | Region posture |
|---|---|
| CDR rows | Region-local (writes in home region) |
| cdr.audit.v1 events | Cross-region mirrored + leaf to dxb (audit only) |
| S3 hot + cold buckets | Regional + cross-region replication (async) |
| Regulator export | Run in primary region only (no concurrent exports to avoid duplicates) |
| ClickHouse | Regional cluster; cross-region analytical queries federate |
Export cron runs only in ghasi-prod-kbl; ghasi-prod-mzr is standby and can take over during manual-gated fail-over.
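The primary-only export rule can be expressed as a small guard that the exporter evaluates before doing any work. This is a sketch under assumptions: the `ExportGate` shape and the `failoverApprovedFor` field are hypothetical, standing in for however the manual fail-over approval is actually recorded.

```typescript
type Region = "kbl" | "mzr";

interface ExportGate {
  region: Region;               // region this pod runs in
  primaryRegion: Region;        // normally "kbl"
  failoverApprovedFor?: Region; // set only after a manual fail-over approval
}

// Exactly one region may export at a time, which avoids duplicate deliveries.
function mayRunExport(gate: ExportGate): boolean {
  if (gate.failoverApprovedFor !== undefined) {
    return gate.region === gate.failoverApprovedFor;
  }
  return gate.region === gate.primaryRegion;
}
```

Once a fail-over to `mzr` is approved, the guard returns false in `kbl` as well as true in `mzr`, so a recovering primary does not resume exporting on its own.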
5. Infrastructure Dependencies
| Dependency | Purpose |
|---|---|
| PostgreSQL 16 | cdr schema + partitions |
| Redis 7 | Distributed locks for batch jobs + dedup cache |
| NATS JetStream | Event bus (cdr.* + consumer subscriptions) |
| S3 / MinIO | Hot (optional hotpart) + cold (archive) CDR store |
| HSM (PKCS#11) | Export file signing |
| Vault | ATRA SFTP creds, HSM PIN, S3 creds |
| ClickHouse | Analytics mirror (EP-ANLYT-02) |
| SPIRE / SPIFFE | Workload identity |
6. Secrets (Vault paths)
| Secret | Path | Use |
|---|---|---|
| Postgres dynamic cred | secret/data/cdr-mediation/db | Service user |
| NATS NKey | secret/data/cdr-mediation/nats-nkey | NATS auth |
| HSM PIN | secret/data/cdr-mediation/hsm-pin | PKCS#11 session |
| ATRA SFTP key | secret/data/cdr-mediation/atra-sftp/{destination} | Per-destination SFTP private key |
| ATRA HTTPS client cert | secret/data/cdr-mediation/atra-https/{destination} | mTLS client cert for HTTPS variant |
| S3 access key | secret/data/cdr-mediation/s3 | S3 auth |
| Signing key reference | secret/data/cdr-mediation/sign-key-ref | HSM key handle |
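For the two per-destination rows, the `{destination}` segment has to be filled in at runtime. A minimal sketch of a path builder follows; the function name and the allowed destination format (lowercase letters, digits, hyphens) are assumptions, chosen so that a value such as `../db` cannot escape into another secret's path.

```typescript
const VAULT_BASE = "secret/data/cdr-mediation";

type PerDestinationSecret = "atra-sftp" | "atra-https";

// Builds the Vault path for a per-destination secret, rejecting unsafe names.
function destinationSecretPath(kind: PerDestinationSecret, destination: string): string {
  // Assumed naming rule: 1 to 63 chars, lowercase alphanumerics and hyphens.
  if (!/^[a-z0-9][a-z0-9-]{0,62}$/.test(destination)) {
    throw new Error(`invalid destination name: ${destination}`);
  }
  return `${VAULT_BASE}/${kind}/${destination}`;
}

// "atra-primary" is a made-up destination name used only for illustration.
const sftpKeyPath = destinationSecretPath("atra-sftp", "atra-primary");
```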
7. Config (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata: { name: cdr-mediation-config, namespace: ghasi-prod }
data:
  LOG_LEVEL: "info"
  REGION: "kbl"
  TZ: "Asia/Kabul"
  POSTGRES_URL: "postgres://cdr-mediation@postgres-primary:5432/cdr"
  REDIS_URL: "redis://redis-cluster:6379/3"
  NATS_URL: "nats://nats:4222"
  S3_ENDPOINT: "https://minio.ghasi-prod.svc:9000"
  S3_BUCKET_HOT: "cdr-hot-kbl"
  S3_BUCKET_COLD: "cdr-cold-kbl"   # cross-replicated to dxb
  CLICKHOUSE_URL: "clickhouse://clickhouse:9000/cdr"
  HSM_PKCS11_LIB: "/usr/lib/softhsm/libsofthsm2.so"   # prod: thales library path
  INGEST_CONSUMER_GROUP: "cdr-ingest"
  EXPORT_SCHEMA_DEFAULT: "ATRA_TAP_312_V1"
  HOT_RETENTION_DAYS: "30"
  ARCHIVE_RETENTION_YEARS: "7"
  DAILY_EXPORT_HOUR_KBL: "00"   # 00:30 in cron → picks up prev-day CDRs
  CHAIN_VERIFIER_WINDOW_HOURS: "24"
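Every ConfigMap value arrives in the container as a string, so numeric settings such as `HOT_RETENTION_DAYS` need parsing and range checks at start-up. The sketch below shows one way to do that for a subset of the keys; the interface, the helper names, and the accepted ranges are assumptions for illustration, not the service's actual validation rules.

```typescript
type Env = Readonly<Record<string, string | undefined>>;

interface RetentionConfig {
  region: string;
  hotRetentionDays: number;
  archiveRetentionYears: number;
  dailyExportHourKbl: number;
  chainVerifierWindowHours: number;
}

function requireString(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`missing config value: ${key}`);
  }
  return value;
}

// Parses a non-negative integer and checks it against an inclusive range.
function requireInt(env: Env, key: string, min: number, max: number): number {
  const raw = requireString(env, key);
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  const value = Number(raw);
  if (value < min || value > max) {
    throw new Error(`${key} must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}

// Ranges below are illustrative assumptions.
function loadRetentionConfig(env: Env): RetentionConfig {
  return {
    region: requireString(env, "REGION"),
    hotRetentionDays: requireInt(env, "HOT_RETENTION_DAYS", 1, 365),
    archiveRetentionYears: requireInt(env, "ARCHIVE_RETENTION_YEARS", 1, 50),
    dailyExportHourKbl: requireInt(env, "DAILY_EXPORT_HOUR_KBL", 0, 23),
    chainVerifierWindowHours: requireInt(env, "CHAIN_VERIFIER_WINDOW_HOURS", 1, 168),
  };
}
```

In the running service this would typically be called once with `process.env`, so that a malformed value fails the pod at start-up rather than during the first export.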
8. Deployment Gate Checklist
- All 16 spec docs at "Complete" status.
- Canary deploy to 1 ingest replica for 30 min; no lag spike.
- Chain verifier runs clean for 7 consecutive days in staging.
- ATRA staging export delivered + ACKed (dry-run).
- HSM signing of staging export validated.
- kubectl diff shows no surprise changes.
- Rollback tested: reverting to previous image restores ingest + rollup SLOs within 5 min.
- On-call acknowledges + approves.
9. Cost Envelope
Approximate per-region monthly cost at national-backbone scale (5 MNOs, expected ~100 M CDRs/month):
| Component | Monthly |
|---|---|
| Ingest pods (3-12) | ~$200 |
| Batch + Exporter pods | ~$100 |
| CronJobs | ~$20 |
| Postgres (shared; CDR schema) | ~$60 |
| Redis (shared) | ~$15 |
| NATS (shared) | ~$15 |
| S3 hot storage | ~$50 |
| S3 cold archive (7 y) | ~$30 |
| HSM (amortised) | ~$100 |
| ClickHouse (shared; CDR fact tables) | ~$40 |
| ATRA egress | nominal (SFTP + daily file) |
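Ingest compute (~$200) is the largest line item in this table, followed by the amortised HSM share (~$100). Postgres and S3 storage together come to roughly $140 and are the components that grow with retained CDR volume. HSM is amortised across regulator-facing services.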
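For reference, the line items in the table can be totalled and grouped as below. The figures are copied from the table (ATRA egress is left out because it is listed as nominal); the grouping into a "storage" share and the helper names are illustrative.

```typescript
// Approximate monthly USD figures, copied from the cost table.
const monthlyCostUsd: ReadonlyArray<readonly [string, number]> = [
  ["ingest-pods", 200],
  ["batch-exporter-pods", 100],
  ["cronjobs", 20],
  ["postgres", 60],
  ["redis", 15],
  ["nats", 15],
  ["s3-hot", 50],
  ["s3-cold", 30],
  ["hsm", 100],
  ["clickhouse", 40],
];

function totalCost(items: ReadonlyArray<readonly [string, number]>): number {
  return items.reduce((sum, [, cost]) => sum + cost, 0);
}

// Fraction of the total attributable to the named components.
function costShare(items: ReadonlyArray<readonly [string, number]>, names: string[]): number {
  const subset = items.filter(([name]) => names.includes(name));
  return totalCost(subset) / totalCost(items);
}

const totalUsd = totalCost(monthlyCostUsd);
const storageShare = costShare(monthlyCostUsd, ["postgres", "s3-hot", "s3-cold"]);
```

On these numbers the per-region total is about $630 per month, with Postgres and S3 storage accounting for a little over a fifth of it.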