cbc-bridge-service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Government / Emergency + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, docs/architecture/ADR-0004-national-backbone-resilience.md, services/compliance-engine/DEPLOYMENT_TOPOLOGY.md

This document describes the runtime and deployment topology for cbc-bridge-service. Because the service carries rare but critical traffic, the topology prioritises availability, correctness, and regulator-defensibility over raw throughput.

1. Runtime

Dimension	Choice	Rationale
Language	TypeScript 5.x strict	Platform default
Framework	NestJS + Fastify adapter	Platform default
Node.js	20 LTS	Platform default
gRPC	`@grpc/grpc-js` via NestJS microservice	BroadcastEmergency, CancelBroadcast, GetBroadcastStatus
HTTP	NestJS HTTP over Fastify	Admin REST + health + metrics
ORM	Prisma 5.x	Platform default
NATS	`nats` 2.10+ via shared `@ghasi/nats-client`	Event publishing
HSM	PKCS#11 via `@ghasi/hsm-client`	Signature verification + report signing
Container	Distroless `gcr.io/distroless/nodejs20`	Minimal attack surface

2. Kubernetes Resources

2.1 Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cbc-bridge
  namespace: ghasi-prod
  labels:
    app: cbc-bridge
    tier: national-asset
    data-plane: government-emergency
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0       # never drop availability — emergency service
  selector:
    matchLabels: { app: cbc-bridge }
  template:
    metadata:
      labels:
        app: cbc-bridge
        tier: national-asset
      annotations:
        spire.io/workload: "cbc-bridge"
        vault.hashicorp.com/agent-inject: "true"
    spec:
      serviceAccountName: cbc-bridge
      nodeSelector:
        node-pool: np-data                # ADR-0004 §6
        hsm-accessible: "true"
      tolerations:
        - key: node-pool
          operator: Equal
          value: np-data
          effect: NoSchedule
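      topologySpreadConstraints:
        # Spread evenly across zones. A required one-pod-per-zone anti-affinity
        # would leave the maxSurge pod and any HPA replica beyond one per zone unschedulable.
        - maxSkew: 1
          topologyKey: "topology.kubernetes.io/zone"
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels: { app: cbc-bridge }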
      containers:
        - name: cbc-bridge
          image: ghcr.io/ghasi/cbc-bridge-service:<digest>
          ports:
            - { name: grpc, containerPort: 50061 }
            - { name: http, containerPort: 3061 }
            - { name: metrics, containerPort: 9464 }
          envFrom:
            - configMapRef: { name: cbc-bridge-config }
            - secretRef:    { name: cbc-bridge-secrets }
          resources:
            requests: { cpu: "500m",  memory: "512Mi" }
            limits:   { cpu: "2000m", memory: "2Gi"  }
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            periodSeconds: 10
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["node", "/app/dist/scripts/drain.js", "--seconds", "20"]
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities: { drop: [ALL] }
          volumeMounts:
            - { name: tmp,    mountPath: /tmp }
            - { name: hsm-socket, mountPath: /var/run/hsm }
      volumes:
        - name: tmp
          emptyDir: {}
        - name: hsm-socket
          hostPath: { path: /var/run/hsm, type: Socket }
      terminationGracePeriodSeconds: 30
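
The `preStop` hook and `terminationGracePeriodSeconds` share one clock: Kubernetes starts the grace-period countdown when termination begins, runs `preStop`, and only then sends SIGTERM. The sketch below works out the time left for in-process shutdown, assuming the drain script blocks for its full `--seconds` value. The function name is illustrative and not part of the service.

```typescript
// Hypothetical helper: seconds the process has between SIGTERM and SIGKILL.
// Kubernetes counts preStop time against terminationGracePeriodSeconds.
function secondsAfterSigterm(
  terminationGracePeriodSeconds: number,
  preStopSeconds: number,
): number {
  if (terminationGracePeriodSeconds < 0 || preStopSeconds < 0) {
    throw new RangeError("durations must be non-negative");
  }
  // If preStop overruns the grace period, effectively no time is left.
  return Math.max(0, terminationGracePeriodSeconds - preStopSeconds);
}

// Values from the manifest above: 30 s grace period, 20 s drain.
const shutdownBudgetSeconds = secondsAfterSigterm(30, 20);
```

With the values in the manifest, about 10 s remain after the drain for closing gRPC, database, and HSM sessions. Raising the drain time without raising the grace period shrinks that margin.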

2.2 HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: cbc-bridge, namespace: ghasi-prod }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: cbc-bridge }
  minReplicas: 3
  maxReplicas: 8
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 60 } }
    - type: Pods
      pods:
        metric: { name: cbc_broadcast_requested_rate_per_minute }
        target: { type: AverageValue, averageValue: "3" }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600          # conservative; this is emergency infra
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - { type: Percent, value: 100, periodSeconds: 60 }

2.3 PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cbc-bridge, namespace: ghasi-prod }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: cbc-bridge } }

2.4 Services

apiVersion: v1
kind: Service
metadata: { name: cbc-bridge-grpc, namespace: ghasi-prod }
spec:
  selector: { app: cbc-bridge }
  ports:
    - { name: grpc, port: 50061, targetPort: grpc, protocol: TCP }
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: cbc-bridge-http, namespace: ghasi-prod }
spec:
  selector: { app: cbc-bridge }
  ports:
    - { name: http, port: 3061, targetPort: http, protocol: TCP }
    - { name: metrics, port: 9464, targetPort: metrics, protocol: TCP }
  type: ClusterIP

2.5 NetworkPolicy

Deny-by-default; explicit allow per source / destination.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cbc-bridge-ingress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cbc-bridge } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: regulator-portal-service }
      ports: [{ port: 50061, protocol: TCP }]
    - from:
        - namespaceSelector:
            matchLabels: { name: ghasi-prod-edge }
          podSelector:
            matchLabels: { app: kong }
      ports: [{ port: 3061, protocol: TCP }]
    - from:
        - podSelector:
            matchLabels: { app: admin-dashboard }
      ports: [{ port: 3061, protocol: TCP }]
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-obs } }
      ports: [{ port: 9464, protocol: TCP }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cbc-bridge-egress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cbc-bridge } }
  policyTypes: [Egress]
  egress:
    - to:  # Postgres
        - podSelector: { matchLabels: { app: postgres-primary } }
      ports: [{ port: 5432, protocol: TCP }]
    - to:  # Redis
        - podSelector: { matchLabels: { app: redis-cluster } }
      ports: [{ port: 6379, protocol: TCP }]
    - to:  # NATS
        - podSelector: { matchLabels: { app: nats } }
      ports: [{ port: 4222, protocol: TCP }]
    - to:  # HSM proxy
        - podSelector: { matchLabels: { app: hsm-proxy } }
      ports: [{ port: 9211, protocol: TCP }]
    - to:  # Vault Agent
        - podSelector: { matchLabels: { app: vault-agent } }
      ports: [{ port: 8200, protocol: TCP }]
    - to:  # MNO CBE endpoints — explicit CIDR per MNO (PrometheusRule / ConfigMap driven)
        - ipBlock: { cidr: 203.0.113.0/24 }    # AWCC CBE
        - ipBlock: { cidr: 198.51.100.0/24 }   # Roshan CBE
        - ipBlock: { cidr: 203.0.114.0/24 }    # Etisalat AF CBE
        - ipBlock: { cidr: 203.0.115.0/24 }    # MTN AF CBE
        - ipBlock: { cidr: 203.0.116.0/24 }    # Salaam CBE
      ports:
        - { port: 443, protocol: TCP }      # HTTPS / proprietary-over-TLS
        - { port: 2775, protocol: TCP }     # SMPP-over-TLS fallback

3. Region Affinity (ADR-0004 §5, §14)

Concern	Region posture
Active deployment	`ghasi-prod-kbl` (Kabul — primary), `ghasi-prod-mzr` (Mazar — hot standby)
HSM custody	Per-region HSM HA pair (ADR-0004 §11) with asymmetric key escrow in `dxb`
Postgres	Per-region Patroni cluster; control-plane tables (authorised callers, restricted patterns) multi-master; broadcasts region-local
NATS	Super-cluster kbl/mzr; `cbc.audit.v1` mirrored cross-region + leaf to `dxb`
MNO CBE endpoints	Region-local egress: Kabul pods use Kabul egress IP pool; Mazar pods use Mazar egress pool. Both registered with MNOs.
Failover for broadcasts	Manual-gated region switch (no split-brain on in-flight broadcasts)

4. Background Workers

Background workers run as separate Deployments (not as cron jobs on the main pods) so that they are isolated from broadcast handling and can scale independently:

Worker	Schedule	Replicas	Responsibility
`cbc-bridge-audit-verifier`	cron `0 2 * * *` (daily 02:00 UTC)	1 (distributed lock)	Verify hash chain last 24 h; emit status metric + alert on break
`cbc-bridge-drill-scheduler`	cron `0 10 1-7 * 2` (first Tuesday of month, 10:00 Asia/Kabul)	1	Submits monthly drill broadcast
`cbc-bridge-cell-db-refresher`	cron `0 3 * * 0` (weekly Sunday 03:00)	1	Pulls cell-tower database per MNO
`cbc-bridge-cert-monitor`	cron `*/15 * * * *` (every 15 min)	1	Checks caller-cert expiry; emits metric
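
One caveat on `0 10 1-7 * 2`: in classic cron implementations (Vixie cron and schedulers that copy its semantics), restricting both day-of-month and day-of-week triggers a run when either field matches. The job would then fire on every Tuesday and on each of days 1 to 7. This document does not state which scheduler `cbc-bridge-drill-scheduler` uses, so a guard inside the job is a cheap safeguard. A minimal sketch, assuming the worker is given the local calendar date in Asia/Kabul (function names are hypothetical):

```typescript
// True only for a Tuesday that falls on days 1..7 of the month.
// `dayOfWeek` follows the cron/JavaScript convention: 0 = Sunday, 2 = Tuesday.
function isFirstTuesday(dayOfMonth: number, dayOfWeek: number): boolean {
  return dayOfWeek === 2 && dayOfMonth >= 1 && dayOfMonth <= 7;
}

// Convenience wrapper for a calendar date (month is 1-12).
// UTC accessors are used so the host time zone cannot shift the weekday.
function isFirstTuesdayOfMonth(year: number, month: number, day: number): boolean {
  const weekday = new Date(Date.UTC(year, month - 1, day)).getUTCDay();
  return isFirstTuesday(day, weekday);
}
```

The drill job would call the guard first and exit without submitting a broadcast when it returns `false`.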

5. Infrastructure Dependencies

Dependency	Purpose	Version
PostgreSQL 16	`cbc` schema (broadcasts, audit, authorised callers, cell DB)	16.x
Redis 7 cluster	Hot cache: caller lookup, CRL/OCSP cache, dispatch-inflight tracking	7.x
NATS JetStream	Event bus; `cbc.*` streams	2.10+
HSM (PKCS#11)	Signature verification + signing	Thales nShield / SoftHSM2 (dev)
Vault	MNO CBE credentials; CA trust material	—
S3 (MinIO compat)	Drill after-action reports; signed files	—
SPIRE / SPIFFE	Workload identity	—
ClickHouse (optional)	Cold-tier audit query	—
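
The audit data in the `cbc` schema is hash-chained and re-verified daily by `cbc-bridge-audit-verifier` (section 4). The sketch below shows the general shape of such a check. The entry fields and the chaining rule `hash = H(prevHash + "|" + payload)` are assumptions for illustration and are not taken from the service's schema. The hash function is injected so the logic can be tested without a crypto dependency.

```typescript
// Hypothetical audit row; field names are assumptions.
interface AuditEntry {
  payload: string;  // canonical serialisation of the audited event
  prevHash: string; // stored hash of the preceding entry
  hash: string;     // stored hash of this entry
}

type HashFn = (input: string) => string;

// Returns the index of the first entry that breaks the chain, or -1 if intact.
// `anchorHash` is the stored hash of the entry just before the verified window.
function firstBrokenLink(
  entries: AuditEntry[],
  hashFn: HashFn,
  anchorHash = "",
): number {
  let expectedPrev = anchorHash;
  return entries.findIndex((entry) => {
    const linked = entry.prevHash === expectedPrev;
    const intact = hashFn(entry.prevHash + "|" + entry.payload) === entry.hash;
    expectedPrev = entry.hash;
    return !(linked && intact);
  });
}
```

In production the injected function would typically be SHA-256, and a result other than `-1` would drive the status metric and alert described for the verifier.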

6. Secrets

All secrets are injected by the Vault Agent sidecar at /vault/secrets/, except the SPIRE SVID, which the SPIRE agent issues and rotates.

Secret	Vault path	Purpose
Postgres credentials	`secret/data/cbc/db`	PG auth (dynamic, short-lived)
NATS NKey	`secret/data/cbc/nats-nkey`	NATS auth
MNO CBE credentials (per MNO)	`secret/data/cbc/mno/{mnoId}/cbe-creds`	Per-MNO adapter auth
HSM PIN	`secret/data/cbc/hsm-pin`	PKCS#11 session open
SPIRE SVID	—	Auto-rotated by SPIRE agent hourly
Trust anchors	`secret/data/cbc/trust-anchors`	National-PKI root + intermediate certs
ATRA SFTP (for sub-packages)	`secret/data/cbc/atra-sftp`	If Ghasi archives drill reports to ATRA

No secret is ever written to a ConfigMap, Helm values file, or source repository.

7. Config

The `cbc-bridge-config` ConfigMap holds non-secret runtime configuration:

apiVersion: v1
kind: ConfigMap
metadata: { name: cbc-bridge-config, namespace: ghasi-prod }
data:
  LOG_LEVEL: "info"
  GRPC_TLS_ENABLED: "true"
  REGION: "kbl"
  DISPATCH_TIMEOUT_SECONDS: "30"
  CANCEL_WINDOW_SECONDS: "60"
  DRILL_CADENCE_CRON: "0 10 1-7 * 2"
  CELL_DB_REFRESH_CRON: "0 3 * * 0"
  AUDIT_VERIFIER_CRON: "0 2 * * *"
  AUTH_CALLER_CACHE_TTL_SECONDS: "300"
  CRL_CACHE_TTL_SECONDS: "14400"
  OCSP_STAPLE_REQUIRED: "true"
  BROADCAST_REPLAY_WINDOW_SECONDS: "300"
  NATS_STREAM_PREFIX: "CBC"
  DEFAULT_CBS_SERIAL_BASE: "1"
  CBE_ADAPTER_AWCC: "standard3gpp"
  CBE_ADAPTER_ROSHAN: "ericsson"
  CBE_ADAPTER_ETISALAT_AF: "standard3gpp"
  CBE_ADAPTER_MTN_AF: "huawei"
  CBE_ADAPTER_SALAAM: "standard3gpp"

Adapter selection is config-driven, so a new MNO or a protocol change can be deployed without rebuilding the image.
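
A minimal sketch of how the `CBE_ADAPTER_*` keys might be resolved at startup, failing fast on a misspelt adapter name. The adapter list contains only the names that appear in the ConfigMap above, and the function and constant names are illustrative rather than the service's actual code.

```typescript
// Adapter names seen in the ConfigMap; a real registry may contain more.
const KNOWN_ADAPTERS = ["standard3gpp", "ericsson", "huawei"] as const;
type AdapterName = (typeof KNOWN_ADAPTERS)[number];

const ADAPTER_KEY_PREFIX = "CBE_ADAPTER_";

// Build an MNO -> adapter map from environment-style config.
// Throws on an unknown adapter so a bad ConfigMap fails the rollout early.
function resolveAdapters(
  env: Record<string, string | undefined>,
): Map<string, AdapterName> {
  const result = new Map<string, AdapterName>();
  for (const key of Object.keys(env)) {
    if (!key.startsWith(ADAPTER_KEY_PREFIX)) continue;
    const value = env[key];
    const adapter = KNOWN_ADAPTERS.find((name) => name === value);
    if (adapter === undefined) {
      throw new Error(`${key}: unknown CBE adapter "${value}"`);
    }
    result.set(key.slice(ADAPTER_KEY_PREFIX.length), adapter);
  }
  return result;
}
```

Called with `process.env`, this would yield entries such as `AWCC -> standard3gpp` and `MTN_AF -> huawei`. Unrelated keys like `LOG_LEVEL` are skipped.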

8. Scaling Considerations

Because broadcast traffic is rare, the replica floor (3) is driven by availability, not load:

- Three replicas across three AZs ensure that a single-zone outage does not take down the service.
- The HPA scales up on the broadcast-rate metric for burst events (national emergency).
- Scale-down is slow (10-minute stabilisation window) to avoid thrashing during investigation windows.

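HSM capacity: each pod uses up to 4 HSM sessions, so even at the HPA ceiling of 8 replicas the main pods hold at most 32 sessions. The HSM pair supports 200+ concurrent sessions, which leaves ample headroom, although that capacity is shared with other services.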

9. Deployment Gate Checklist

Before a new image is promoted to ghasi-prod:

- All 16 spec docs at Complete status.
- Canary deploy to 1 replica for 30 min, with broadcast-accept success and zero new critical alerts.
- Staging GameDay passes (see TESTING_STRATEGY §6).
- `kubectl diff` shows no unexpected changes.
- On-call acknowledges and approves.
- CISO and CTO both sign off for emergency-broadcast-impacting changes.
- Rollback tested: reverting to the previous image restores P99 within 5 min.

10. Cost Envelope

Approximate per-region monthly cost at steady state (3 replicas + 1 audit verifier + 1 drill scheduler + 1 cell-db refresher + 1 cert monitor):

Component	Monthly
Compute	~$180 (3 × small-ish TypeScript pods + 4 cron workers)
Postgres (shared cluster, cbc tables only)	~$40
Redis (shared)	~$20
NATS (shared)	~$15
HSM (shared per-region)	~$800 amortised
S3	< $10
Egress to MNO CBE	per-MNO contract (dedicated link)

The HSM dominates the cost, but it is shared across multiple services (compliance-engine, cdr-mediation, regulator-portal, consent-ledger).
