cbc-bridge-service — Deployment Topology
Version: 1.0
Status: Draft
Owner: Government / Emergency + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, docs/architecture/ADR-0004-national-backbone-resilience.md, services/compliance-engine/DEPLOYMENT_TOPOLOGY.md
Runtime and deployment topology for cbc-bridge-service. Because this service has rare but critical traffic, the topology prioritises availability + correctness + regulator-defensibility over raw throughput.
1. Runtime
| Dimension | Choice | Rationale |
|---|---|---|
| Language | TypeScript 5.x strict | Platform default |
| Framework | NestJS + Fastify adapter | Platform default |
| Node.js | 20 LTS | Platform default |
| gRPC | @grpc/grpc-js via NestJS microservice | BroadcastEmergency, CancelBroadcast, GetBroadcastStatus |
| HTTP | NestJS HTTP over Fastify | Admin REST + health + metrics |
| ORM | Prisma 5.x | Platform default |
| NATS | nats 2.10+ via shared @ghasi/nats-client | Event publishing |
| HSM | PKCS#11 via @ghasi/hsm-client | Signature verification + report signing |
| Container | Distroless gcr.io/distroless/nodejs20 | Minimal attack surface |
2. Kubernetes Resources
2.1 Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cbc-bridge
  namespace: ghasi-prod
  labels:
    app: cbc-bridge
    tier: national-asset
    data-plane: government-emergency
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # never drop availability; emergency service
  selector:
    matchLabels: { app: cbc-bridge }
  template:
    metadata:
      labels:
        app: cbc-bridge
        tier: national-asset
      annotations:
        spire.io/workload: "cbc-bridge"
        vault.hashicorp.com/agent-inject: "true"
    spec:
      serviceAccountName: cbc-bridge
      nodeSelector:
        node-pool: np-data   # ADR-0004 §6
        hsm-accessible: "true"
      tolerations:
        - key: node-pool
          operator: Equal
          value: np-data
          effect: NoSchedule
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: ["cbc-bridge"]
              topologyKey: "topology.kubernetes.io/zone"
      containers:
        - name: cbc-bridge
          image: ghcr.io/ghasi/cbc-bridge-service:<digest>
          ports:
            - { name: grpc, containerPort: 50061 }
            - { name: http, containerPort: 3061 }
            - { name: metrics, containerPort: 9464 }
          envFrom:
            - configMapRef: { name: cbc-bridge-config }
            - secretRef: { name: cbc-bridge-secrets }
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
            limits: { cpu: "2000m", memory: "2Gi" }
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            periodSeconds: 10
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["node", "/app/dist/scripts/drain.js", "--seconds", "20"]
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities: { drop: [ALL] }
          volumeMounts:
            - { name: tmp, mountPath: /tmp }
            - { name: hsm-socket, mountPath: /var/run/hsm }
      volumes:
        - name: tmp
          emptyDir: {}
        - name: hsm-socket
          hostPath: { path: /var/run/hsm, type: Socket }
      terminationGracePeriodSeconds: 30
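The probe and shutdown settings in this manifest imply a few time budgets that are easier to review side by side. The sketch below recomputes them from the manifest values; the type and function names are illustrative assumptions rather than part of the service, and the figures ignore probe timeouts and kubelet jitter.

```typescript
// Illustrative sketch: type and function names are hypothetical; the numbers
// are copied from the Deployment manifest above.
interface ProbeSpec {
  periodSeconds: number;
  failureThreshold: number;
}

// Rough time for a continuously failing probe to trip: failureThreshold
// consecutive failures spaced periodSeconds apart (timeouts and jitter ignored).
function secondsToTrip(probe: ProbeSpec): number {
  return probe.periodSeconds * probe.failureThreshold;
}

// The preStop hook runs inside terminationGracePeriodSeconds, so whatever the
// drain does not use is all that remains for SIGTERM handling before SIGKILL.
function shutdownSlackSeconds(drainSeconds: number, graceSeconds: number): number {
  return graceSeconds - drainSeconds;
}

const readinessProbe: ProbeSpec = { periodSeconds: 5, failureThreshold: 3 };
const livenessProbe: ProbeSpec = { periodSeconds: 10, failureThreshold: 3 };

const unreadyAfterSeconds = secondsToTrip(readinessProbe); // 15
const restartAfterSeconds = secondsToTrip(livenessProbe);  // 30
const shutdownSlack = shutdownSlackSeconds(20, 30);        // 10
```

With these values a failing pod leaves the Service endpoints after roughly 15 s, is restarted after roughly 30 s, and has about 10 s left after the 20 s drain before the grace period expires.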
2.2 HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: cbc-bridge, namespace: ghasi-prod }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: cbc-bridge }
  minReplicas: 3
  maxReplicas: 8
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 60 } }
    - type: Pods
      pods:
        metric: { name: cbc_broadcast_requested_rate_per_minute }
        target: { type: AverageValue, averageValue: "3" }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # conservative; this is emergency infra
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - { type: Percent, value: 100, periodSeconds: 60 }
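To see how these HPA settings interact during a burst, the following sketch applies the replica formula from the Kubernetes HPA documentation (desired = ceil(current replicas × observed / target), largest proposal wins) together with the Percent scale-up policy. It is a simplification under stated assumptions: the function names are hypothetical, stabilization windows are not modelled, and the burst readings are invented for illustration.

```typescript
// Simplified sketch of the documented HPA rule; names are hypothetical.
interface MetricReading {
  current: number; // observed value (utilization % or per-pod average)
  target: number;  // target from the HPA spec
}

// Per metric: desired = ceil(currentReplicas * current / target). The largest
// proposal wins and is clamped to [minReplicas, maxReplicas]. Ratios within the
// tolerance (Kubernetes default 0.1) leave the replica count unchanged.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1,
): number {
  let proposal = 0;
  for (const m of metrics) {
    const ratio = m.current / m.target;
    const wanted = Math.abs(ratio - 1) <= tolerance
      ? currentReplicas
      : Math.ceil(currentReplicas * ratio);
    proposal = Math.max(proposal, wanted);
  }
  return Math.min(maxReplicas, Math.max(minReplicas, proposal));
}

// A Percent scale-up policy caps growth within one policy period.
function capScaleUp(currentReplicas: number, desired: number, percent: number): number {
  const limit = Math.ceil(currentReplicas * (1 + percent / 100));
  return Math.min(desired, limit);
}

// Invented burst: CPU at 20% (target 60), 9 broadcast requests/min per pod (target 3).
const burstTarget = desiredReplicas(
  3,
  [{ current: 20, target: 60 }, { current: 9, target: 3 }],
  3,
  8,
); // 8 (raw proposal of 9 clamped to maxReplicas)
const firstMinute = capScaleUp(3, burstTarget, 100); // 6
```

In this example the broadcast-rate metric drives the decision: the raw proposal of 9 is clamped to 8, and the 100 % per 60 s policy allows at most a doubling to 6 replicas in the first minute.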
2.3 PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: cbc-bridge, namespace: ghasi-prod }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: cbc-bridge } }
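As a quick check of what this budget permits, the arithmetic for a minAvailable budget can be sketched as below. The helper name is hypothetical; it mirrors the rule that voluntary evictions are allowed only while the healthy pod count exceeds minAvailable.

```typescript
// Hypothetical helper: voluntary disruptions allowed under a minAvailable budget.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

const withAllHealthy = allowedDisruptions(3, 2); // 1: one pod may be evicted
const withOneDown = allowedDisruptions(2, 2);    // 0: evictions are blocked
```

At the replica floor of 3, a node drain can therefore evict one pod at a time, and none while another pod is already unhealthy.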
2.4 Services
apiVersion: v1
kind: Service
metadata: { name: cbc-bridge-grpc, namespace: ghasi-prod }
spec:
  selector: { app: cbc-bridge }
  ports:
    - { name: grpc, port: 50061, targetPort: grpc, protocol: TCP }
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: cbc-bridge-http, namespace: ghasi-prod }
spec:
  selector: { app: cbc-bridge }
  ports:
    - { name: http, port: 3061, targetPort: http, protocol: TCP }
    - { name: metrics, port: 9464, targetPort: metrics, protocol: TCP }
  type: ClusterIP
2.5 NetworkPolicy
Deny-by-default; explicit allow per source / destination.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cbc-bridge-ingress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cbc-bridge } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: regulator-portal-service }
      ports: [{ port: 50061, protocol: TCP }]
    - from:
        - namespaceSelector:
            matchLabels: { name: ghasi-prod-edge }
          podSelector:
            matchLabels: { app: kong }
      ports: [{ port: 3061, protocol: TCP }]
    - from:
        - podSelector:
            matchLabels: { app: admin-dashboard }
      ports: [{ port: 3061, protocol: TCP }]
    - from:
        - namespaceSelector: { matchLabels: { name: ghasi-obs } }
      ports: [{ port: 9464, protocol: TCP }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: cbc-bridge-egress, namespace: ghasi-prod }
spec:
  podSelector: { matchLabels: { app: cbc-bridge } }
  policyTypes: [Egress]
  egress:
    - to:   # Postgres
        - podSelector: { matchLabels: { app: postgres-primary } }
      ports: [{ port: 5432, protocol: TCP }]
    - to:   # Redis
        - podSelector: { matchLabels: { app: redis-cluster } }
      ports: [{ port: 6379, protocol: TCP }]
    - to:   # NATS
        - podSelector: { matchLabels: { app: nats } }
      ports: [{ port: 4222, protocol: TCP }]
    - to:   # HSM proxy
        - podSelector: { matchLabels: { app: hsm-proxy } }
      ports: [{ port: 9211, protocol: TCP }]
    - to:   # Vault Agent
        - podSelector: { matchLabels: { app: vault-agent } }
      ports: [{ port: 8200, protocol: TCP }]
    - to:   # MNO CBE endpoints: explicit CIDR per MNO (PrometheusRule / ConfigMap driven)
        - ipBlock: { cidr: 203.0.113.0/24 }    # AWCC CBE
        - ipBlock: { cidr: 198.51.100.0/24 }   # Roshan CBE
        - ipBlock: { cidr: 203.0.114.0/24 }    # Etisalat AF CBE
        - ipBlock: { cidr: 203.0.115.0/24 }    # MTN AF CBE
        - ipBlock: { cidr: 203.0.116.0/24 }    # Salaam CBE
      ports:
        - { port: 443, protocol: TCP }    # HTTPS / proprietary-over-TLS
        - { port: 2775, protocol: TCP }   # SMPP-over-TLS fallback
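The ipBlock entries in the egress policy act as an allow-list: a destination is reachable only if its address falls inside one of the listed CIDRs. The sketch below reproduces that membership test for IPv4 so an endpoint change can be sanity-checked before the policy is edited. The helper names are assumptions made for illustration, and the CIDR values are copied from the policy as written.

```typescript
// Hypothetical helpers; CIDRs copied from the cbc-bridge-egress policy above.
const MNO_CBE_CIDRS: readonly string[] = [
  "203.0.113.0/24",  // AWCC CBE
  "198.51.100.0/24", // Roshan CBE
  "203.0.114.0/24",  // Etisalat AF CBE
  "203.0.115.0/24",  // MTN AF CBE
  "203.0.116.0/24",  // Salaam CBE
];

// Convert a dotted-quad IPv4 address into an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`invalid IPv4 address: ${ip}`);
  let value = 0;
  for (const part of parts) {
    const octet = Number(part);
    if (!/^\d{1,3}$/.test(part) || octet > 255) {
      throw new Error(`invalid IPv4 address: ${ip}`);
    }
    value = value * 256 + octet;
  }
  return value;
}

// True when ip lies inside cidr (both share the same network prefix).
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) throw new Error(`invalid CIDR: ${cidr}`);
  const prefixText = cidr.slice(slash + 1);
  const prefix = Number(prefixText);
  if (!/^\d{1,2}$/.test(prefixText) || prefix > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  const blockSize = 2 ** (32 - prefix);
  const base = ipv4ToInt(cidr.slice(0, slash));
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(base / blockSize);
}

// True when the destination is covered by any MNO CBE ipBlock.
function isAllowedMnoEgress(ip: string): boolean {
  return MNO_CBE_CIDRS.some((cidr) => inCidr(ip, cidr));
}
```

For example, `isAllowedMnoEgress("203.0.113.25")` is true, while `isAllowedMnoEgress("203.0.117.1")` is false because no listed block covers it.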
3. Region Affinity (ADR-0004 §5, §14)
| Concern | Region posture |
|---|---|
| Active deployment | ghasi-prod-kbl (Kabul — primary), ghasi-prod-mzr (Mazar — hot standby) |
| HSM custody | Per-region HSM HA pair (ADR-0004 §11) with asymmetric key escrow in dxb |
| Postgres | Per-region Patroni cluster; control-plane tables (authorised callers, restricted patterns) multi-master; broadcasts region-local |
| NATS | Super-cluster kbl/mzr; cbc.audit.v1 mirrored cross-region + leaf to dxb |
| MNO CBE endpoints | Region-local egress: Kabul pods use Kabul egress IP pool; Mazar pods use Mazar egress pool. Both registered with MNOs. |
| Failover for broadcasts | Manual-gated region switch (no split-brain on in-flight broadcasts) |
4. Background Workers
Separate Deployments (not cron on main pods) for isolation + independent scaling:
| Worker | Schedule | Replicas | Responsibility |
|---|---|---|---|
| cbc-bridge-audit-verifier | cron 0 2 * * * (daily 02:00 UTC) | 1 (distributed lock) | Verify hash chain for the last 24 h; emit status metric + alert on break |
| cbc-bridge-drill-scheduler | cron 0 10 1-7 * 2 (intended: first Tuesday of month, 10:00 Asia/Kabul; standard cron ORs the day-of-month and day-of-week fields, so the job must also guard on the date) | 1 | Submits monthly drill broadcast |
| cbc-bridge-cell-db-refresher | cron 0 3 * * 0 (weekly Sunday 03:00) | 1 | Pulls cell-tower database per MNO |
| cbc-bridge-cert-monitor | cron */15 * * * * (every 15 min) | 1 | Checks caller-cert expiry; emits metric |
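The drill schedule deserves a closer look. In standard cron semantics, when both the day-of-month and day-of-week fields are restricted, a day matches if either field matches, so `0 10 1-7 * 2` fires on days 1 to 7 and on every Tuesday. A common workaround is to keep the cron expression and add a date guard inside the job. The sketch below contrasts the two behaviours; the function names are hypothetical, and whether the scheduler used here follows the OR rule is an assumption to verify.

```typescript
// Weekday of a calendar date (0 = Sunday ... 6 = Saturday), independent of
// the host timezone because the date is built and read in UTC.
function weekdayOf(year: number, month: number, day: number): number {
  return new Date(Date.UTC(year, month - 1, day)).getUTCDay();
}

// Standard cron day matching for "1-7 * 2": day-of-month OR day-of-week.
function cronDayMatches(year: number, month: number, day: number): boolean {
  return (day >= 1 && day <= 7) || weekdayOf(year, month, day) === 2;
}

// Intended behaviour: day-of-month AND day-of-week, i.e. the first Tuesday.
// The caller passes the calendar date in the schedule's timezone (Asia/Kabul).
function isFirstTuesday(year: number, month: number, day: number): boolean {
  return day >= 1 && day <= 7 && weekdayOf(year, month, day) === 2;
}

// May 2026: the 1st is a Friday, the 5th and 12th are Tuesdays.
const cronFiresOnMay1 = cronDayMatches(2026, 5, 1);   // true (day 1-7)
const cronFiresOnMay12 = cronDayMatches(2026, 5, 12); // true (a Tuesday)
const drillOnMay5 = isFirstTuesday(2026, 5, 5);       // true
const drillOnMay12 = isFirstTuesday(2026, 5, 12);     // false
```

A drill job that returns early unless `isFirstTuesday` holds would submit exactly one drill per month regardless of how often the cron expression fires.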
5. Infrastructure Dependencies
| Dependency | Purpose | Version |
|---|---|---|
| PostgreSQL 16 | cbc schema (broadcasts, audit, authorised callers, cell DB) | 16.x |
| Redis 7 cluster | Hot cache: caller lookup, CRL/OCSP cache, dispatch-inflight tracking | 7.x |
| NATS JetStream | Event bus; cbc.* streams | 2.10+ |
| HSM (PKCS#11) | Signature verification + signing | Thales nShield / SoftHSM2 (dev) |
| Vault | MNO CBE credentials; CA trust material | — |
| S3 (MinIO compat) | Drill after-action reports; signed files | — |
| SPIRE / SPIFFE | Workload identity | — |
| ClickHouse (optional) | Cold-tier audit query | — |
6. Secrets
All secrets injected via Vault Agent sidecar at /vault/secrets/.
| Secret | Vault path | Purpose |
|---|---|---|
| Postgres credentials | secret/data/cbc/db | PG auth (dynamic, short-lived) |
| NATS NKey | secret/data/cbc/nats-nkey | NATS auth |
| MNO CBE credentials (per MNO) | secret/data/cbc/mno/{mnoId}/cbe-creds | Per-MNO adapter auth |
| HSM PIN | secret/data/cbc/hsm-pin | PKCS#11 session open |
| SPIRE SVID | — | Auto-rotated by SPIRE agent hourly |
| Trust anchors | secret/data/cbc/trust-anchors | National-PKI root + intermediate certs |
| ATRA SFTP (for sub-packages) | secret/data/cbc/atra-sftp | If Ghasi archives drill reports to ATRA |
No secret is ever written to a ConfigMap, Helm values file, or source repository.
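Because the per-MNO credential path is built from an identifier, it helps to validate that identifier before interpolating it. A minimal sketch follows; the helper name and the accepted `mnoId` format (lowercase letters, digits, hyphens) are assumptions, since the document does not define the identifier format.

```typescript
// Assumed identifier format; the real mnoId convention is not specified here.
const MNO_ID_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Build the Vault path for one MNO's CBE credentials, rejecting identifiers
// that could escape the intended prefix (for example "../db").
function mnoCredsVaultPath(mnoId: string): string {
  if (!MNO_ID_PATTERN.test(mnoId)) {
    throw new Error(`invalid mnoId: ${mnoId}`);
  }
  return `secret/data/cbc/mno/${mnoId}/cbe-creds`;
}

const awccPath = mnoCredsVaultPath("awcc"); // "secret/data/cbc/mno/awcc/cbe-creds"
```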
7. Config
ConfigMap cbc-bridge-config holds non-secret runtime config:
apiVersion: v1
kind: ConfigMap
metadata: { name: cbc-bridge-config, namespace: ghasi-prod }
data:
  LOG_LEVEL: "info"
  GRPC_TLS_ENABLED: "true"
  REGION: "kbl"
  DISPATCH_TIMEOUT_SECONDS: "30"
  CANCEL_WINDOW_SECONDS: "60"
  DRILL_CADENCE_CRON: "0 10 1-7 * 2"
  CELL_DB_REFRESH_CRON: "0 3 * * 0"
  AUDIT_VERIFIER_CRON: "0 2 * * *"
  AUTH_CALLER_CACHE_TTL_SECONDS: "300"
  CRL_CACHE_TTL_SECONDS: "14400"
  OCSP_STAPLE_REQUIRED: "true"
  BROADCAST_REPLAY_WINDOW_SECONDS: "300"
  NATS_STREAM_PREFIX: "CBC"
  DEFAULT_CBS_SERIAL_BASE: "1"
  CBE_ADAPTER_AWCC: "standard3gpp"
  CBE_ADAPTER_ROSHAN: "ericsson"
  CBE_ADAPTER_ETISALAT_AF: "standard3gpp"
  CBE_ADAPTER_MTN_AF: "huawei"
  CBE_ADAPTER_SALAAM: "standard3gpp"
Adapter selection is config-driven, so a new MNO or a protocol change that maps to an existing adapter can be deployed without an image rebuild.
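One way to consume the `CBE_ADAPTER_*` keys is to scan the environment once at startup and fail fast on unknown adapter names. The sketch below illustrates that idea only: the function and type names are hypothetical, the adapter kinds are limited to the three values that appear in the ConfigMap, and the environment is passed in as a plain object rather than read from the process.

```typescript
// Adapter kinds seen in the ConfigMap above; the real set may be larger.
const ADAPTER_KINDS = ["standard3gpp", "ericsson", "huawei"] as const;
type AdapterKind = (typeof ADAPTER_KINDS)[number];

function isAdapterKind(value: string): value is AdapterKind {
  return (ADAPTER_KINDS as readonly string[]).includes(value);
}

// Map each MNO key (the part after CBE_ADAPTER_) to its configured adapter,
// throwing on values that no adapter implements.
function resolveAdapters(env: Record<string, string | undefined>): Map<string, AdapterKind> {
  const prefix = "CBE_ADAPTER_";
  const adapters = new Map<string, AdapterKind>();
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (!key.startsWith(prefix) || value === undefined) continue;
    if (!isAdapterKind(value)) {
      throw new Error(`unknown adapter "${value}" for ${key}`);
    }
    adapters.set(key.slice(prefix.length), value);
  }
  return adapters;
}

const adapters = resolveAdapters({
  LOG_LEVEL: "info",
  CBE_ADAPTER_AWCC: "standard3gpp",
  CBE_ADAPTER_ROSHAN: "ericsson",
  CBE_ADAPTER_ETISALAT_AF: "standard3gpp",
  CBE_ADAPTER_MTN_AF: "huawei",
  CBE_ADAPTER_SALAAM: "standard3gpp",
});
```

Here `adapters.get("ROSHAN")` yields `"ericsson"`, and a typo such as `CBE_ADAPTER_AWCC: "standard3gp"` fails at startup instead of at the first broadcast.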
8. Scaling Considerations
Because broadcast traffic is rare, the replica floor (3) is driven by availability, not load:
- 3 replicas across 3 AZs ensure that a single-zone outage doesn't take down the service.
- HPA scales up on the broadcast-rate metric for burst events (national emergency).
- Scale-down is slow (10-min stabilisation) to avoid thrashing during investigation windows.
HSM capacity: each pod uses up to 4 HSM sessions. The HSM pair supports 200+ concurrent sessions, which leaves ample headroom.
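The headroom claim can be checked with simple arithmetic, and the same exercise surfaces a scheduling constraint. The required zone anti-affinity in section 2.1 admits at most one cbc-bridge pod per zone, so with 3 AZs any replica beyond the third (an HPA scale-out, or the maxSurge pod during a rollout) would stay Pending unless more zones exist or the rule is relaxed. The sketch below uses hypothetical function names, and it counts only the main pods: sessions used by the workers and by the other services sharing the HSM are not included.

```typescript
// Peak HSM sessions used by the main Deployment at a given replica count.
function hsmSessionsAtPeak(replicas: number, sessionsPerPod: number): number {
  return replicas * sessionsPerPod;
}

// Under a required one-pod-per-zone anti-affinity rule, the scheduler can
// place at most one pod in each zone.
function schedulablePods(desiredReplicas: number, zones: number): number {
  return Math.min(desiredReplicas, zones);
}

const peakSessions = hsmSessionsAtPeak(8, 4);   // 32 at maxReplicas
const sessionHeadroom = 200 - peakSessions;     // 168 of the 200 quoted
const placeableAtMax = schedulablePods(8, 3);   // 3 with three zones
```

Even at the HPA ceiling the main pods would use about 32 of the 200+ sessions, so HSM capacity is not the limiting factor; zone count is.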
9. Deployment Gate Checklist
Before a new image is promoted to ghasi-prod:
- All 16 spec docs at Complete status.
- Canary deploy to 1 replica for 30 min; broadcast-accept success + zero new critical alerts.
- Staging GameDay passes (see TESTING_STRATEGY §6).
- kubectl diff shows no surprise changes.
- On-call acknowledges + approves.
- CISO + CTO both sign off for emergency-broadcast-impacting changes.
- Rollback tested: reverting to previous image restores P99 within 5 min.
10. Cost Envelope
Approximate per-region monthly cost at steady state (3 replicas + 1 audit verifier + 1 drill scheduler + 1 cell-db refresher + 1 cert monitor):
| Component | Monthly |
|---|---|
| Compute | ~$180 (3 × small-ish TypeScript pods + 4 cron workers) |
| Postgres (shared cluster, cbc tables only) | ~$40 |
| Redis (shared) | ~$20 |
| NATS (shared) | ~$15 |
| HSM (shared per-region) | ~$800 amortised |
| S3 | < $10 |
| Egress to MNO CBE | per-MNO contract (dedicated link) |
HSM dominates cost; shared across multiple services (compliance-engine, cdr-mediation, regulator-portal, consent-ledger).
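Summing the fixed rows of the table gives a rough per-region total and makes the HSM share explicit. The sketch treats the "< $10" S3 row as its $10 upper bound and leaves out the contract-priced MNO egress, so the result is an approximation; the variable names are illustrative.

```typescript
// Monthly figures (USD) copied from the table; S3 uses its upper bound and
// per-MNO egress is excluded because it is contract-priced.
const monthlyCostRows: ReadonlyArray<readonly [string, number]> = [
  ["Compute", 180],
  ["Postgres", 40],
  ["Redis", 20],
  ["NATS", 15],
  ["HSM", 800],
  ["S3", 10],
];

const monthlyTotalUsd = monthlyCostRows.reduce((sum, row) => sum + row[1], 0); // 1065
const hsmSharePercent = Math.round((800 / monthlyTotalUsd) * 100);             // 75
```

On these figures the total is roughly $1,065 per region per month, of which the amortised HSM accounts for about three quarters.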