
cbc-bridge-service — Service Readiness

Version: 1.0
Status: Draft
Owner: Government / Emergency + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, FAILURE_MODES.md, TESTING_STRATEGY.md

Readiness criteria for production deployment. Because this service connects to MNO RAN cell-broadcast interfaces for civil emergency alerts, go-live requires government, regulator, and MNO partner sign-off in addition to engineering readiness.


1. Code Readiness

Criterion | Status | Notes
gRPC CbcBridgeService.v1 (BroadcastEmergency, GetBroadcastStatus, CancelBroadcast, ScheduleDrill, VerifyAuthorisedCaller) | |
REST admin (broadcast list, audit query, caller registry CRUD, cell-DB refresh, drill schedule) | |
CBS PDU encoder (severity → Message Identifier; per-language DCS; multi-page) | | Conformance per 3GPP TS 23.041
Per-MNO adapter implementations: Standard3gppCbeAdapter, EricssonProprietaryCbeAdapter, HuaweiProprietaryCbeAdapter | | One per MNO vendor stack
PKI signature verification via HSM/PKCS#11 (no in-process key fallback) | | Fail-loud on HSM outage
Authorised-caller registry + cert-subject mapping | | Dual-control for registry changes
Hash-chained audit (prev_hash, record_hash) with daily verifier | | RFC 8785 canonicalisation
Cancellation dual-control within 60 s grace window | | Atomic state transition
Monthly drill scheduler with test-range Message Identifier | | 4370..4379 test slot
Cell-database refresh cron (weekly, per MNO) | |
Replay-attack defence: nonce + timestamp window | | 5-min window
mTLS gRPC + SPIRE SVID | |
Idempotency on admin writes | |
NATS event emission (outbox + DB atomic) | |
Per-MNO egress IP pool (NetworkPolicy) | | MNO-whitelisted IPs
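
The hash-chained audit criterion (prev_hash, record_hash, daily verifier) could be sketched as follows. This is a minimal illustration, not the service's implementation. The record shape, the all-zero genesis hash, the choice of SHA-256, and the function names are all assumptions. The `canonicalise` helper only sorts object keys and is a simplified stand-in for RFC 8785, not a conformant implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; only prev_hash and record_hash are named in this document.
interface AuditRecord {
  payload: Record<string, unknown>;
  prev_hash: string;
  record_hash: string;
}

// Assumed sentinel used as prev_hash of the first record.
const GENESIS_HASH = "0".repeat(64);

// Simplified stand-in for RFC 8785: recursively sorts object keys.
// It does NOT implement the RFC's number or string serialisation rules.
function canonicalise(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalise).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalise(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// record_hash = SHA-256(prev_hash || canonical payload); SHA-256 is an assumption.
function hashRecord(prevHash: string, payload: Record<string, unknown>): string {
  return createHash("sha256").update(prevHash).update(canonicalise(payload)).digest("hex");
}

// Returns a new chain with one record appended and linked to the previous one.
function appendRecord(chain: AuditRecord[], payload: Record<string, unknown>): AuditRecord[] {
  const prev_hash = chain.length > 0 ? chain[chain.length - 1].record_hash : GENESIS_HASH;
  return [...chain, { payload, prev_hash, record_hash: hashRecord(prev_hash, payload) }];
}

// Returns the index of the first broken record, or -1 if the chain is intact.
function verifyChain(chain: AuditRecord[]): number {
  let expectedPrev = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i];
    if (r.prev_hash !== expectedPrev || r.record_hash !== hashRecord(r.prev_hash, r.payload)) {
      return i;
    }
    expectedPrev = r.record_hash;
  }
  return -1;
}
```

A daily verifier in this style would walk the whole chain and raise an alert on any non-negative return value. Because each record hashes its predecessor's hash, editing an early record invalidates that record and cannot be hidden without rewriting every later one.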

2. Testing Readiness

Criterion | Target | Status
Unit coverage | ≥ 90% line (domain), ≥ 80% branch |
CBS PDU encoder tests (all severity × language combinations) | ≥ 40 |
Language conformance tests (en/fa/ps/ar round-trip) | 100 samples per language |
Hash-chain integrity tests (append, verify, tamper-detect) | ≥ 15 |
State-machine transition tests | ≥ 30 |
Authorisation-gate tests (allowedSeverities × allowedRegions × expiry) | ≥ 40 |
Property-based tests (fast-check) | ≥ 10 properties × 500 runs |
Integration: PKI happy + CRL + OCSP + expired + tampered + replay | Passed |
Integration: All 3 adapter types against mock CBEs | Passed |
Integration: drill fires on schedule + test Message Identifier | Passed |
Integration: cancel dual-control within window | Passed |
Integration: hash-chain verifier runs + detects tamper | Passed |
Contract test with regulator-portal-service (BroadcastEmergency provider) | Passed |
E2E: happy P0 + partial success + all-MNO fail + drill + cancel + audit verify | Passed |
Adversarial-input corpus test (PKI bypass) | 0 bypasses |
Chaos: HSM out → fail-closed verified | Passed |
Chaos: 1 MNO CBE out → PARTIAL verdict | Passed |
Chaos: all MNO CBE out → FAILED + runbook executed | Passed |
Chaos: Postgres out → in-flight complete via Redis | Passed |
Chaos: region partition → region-local operation | Passed |
Security: audit UPDATE/DELETE rejected | Passed |
Security: authorised-caller escalation blocked | Passed |
Security: replay-attack rejected | Passed |
Load: 100 concurrent cancellations without state-race | Passed |
Load: 500 bad-cert/s for 5 min does not block legitimate | Passed |
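
The authorisation-gate tests cover the product of allowedSeverities, allowedRegions, and expiry. A gate with that shape might look like the sketch below. The `allowedSeverities` and `allowedRegions` names come from the test matrix above. The interfaces, the `expiresAt` field, the reason codes, the region identifiers, and the rule that every requested region must be allowed are all assumptions made for illustration.

```typescript
type Severity = "P0" | "P1" | "P2";

// Hypothetical registry entry for one authorised caller.
interface AuthorisedCaller {
  certSubject: string;
  allowedSeverities: Severity[];
  allowedRegions: string[];
  expiresAt: Date; // assumed field carrying the "expiry" dimension
}

// Hypothetical view of an incoming broadcast request.
interface BroadcastRequest {
  certSubject: string;
  severity: Severity;
  regions: string[];
}

type GateResult = { allowed: true } | { allowed: false; reason: string };

// Denies unless every dimension (identity, expiry, severity, region) passes.
function authorise(
  caller: AuthorisedCaller | undefined,
  req: BroadcastRequest,
  now: Date
): GateResult {
  if (caller === undefined || caller.certSubject !== req.certSubject) {
    return { allowed: false, reason: "UNKNOWN_CALLER" };
  }
  if (now.getTime() >= caller.expiresAt.getTime()) {
    return { allowed: false, reason: "AUTHORISATION_EXPIRED" };
  }
  if (!caller.allowedSeverities.includes(req.severity)) {
    return { allowed: false, reason: "SEVERITY_NOT_ALLOWED" };
  }
  if (req.regions.length === 0) {
    return { allowed: false, reason: "NO_REGION" };
  }
  // Assumed rule: a single disallowed region rejects the whole request.
  if (!req.regions.every((r) => caller.allowedRegions.includes(r))) {
    return { allowed: false, reason: "REGION_NOT_ALLOWED" };
  }
  return { allowed: true };
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the expiry dimension deterministic, which suits both table-driven and property-based tests.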

3. Observability Readiness

Criterion | Status
All Prometheus metrics emitting (OBSERVABILITY.md §1) |
Grafana dashboard cbc-bridge-service.json deployed (NOC + Regulator + Engineering rows) |
All alerts configured in Alertmanager with runbook links |
Structured JSON logs (Pino); caller-org logged; no subscriber PII |
OTel trace propagation end-to-end verified |
Loki parsing rules validated |
SIEM forwarding of cbc.audit.v1 via regulator-portal verified |

Alerts Configured

  • CbcBroadcastDispatchFailureCritical (≥ 25% dispatch failure per MNO over 2 min)
  • CbcBroadcastAllMnoFailed (FAILED verdict) — CEO-paging
  • CbcPkiVerifyFailureSpike (> 5 failures/min for 5 min)
  • CbcHsmUnavailable (circuit open or up=0)
  • CbcAuditChainBroken — Critical, CISO-paging
  • CbcDrillOverdue (> 7 d past cadence)
  • CbcCellDatabaseStale (> 14 d)
  • CbcAuthorisedCallerCertExpiringSoon (< 14 d)
  • CbcBroadcastAcceptLatencyHigh (P99 > 1 s)
  • CbcPartialDispatchRateHigh (PARTIAL > 10% for 30 min)
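
The FAILED and PARTIAL verdicts behind CbcBroadcastAllMnoFailed and CbcPartialDispatchRateHigh can be derived from per-MNO dispatch results roughly as sketched below. The result shape, the `SUCCESS` verdict name, and the function names are assumptions. Only `PARTIAL` and `FAILED` appear in this document.

```typescript
// Hypothetical per-MNO dispatch result; the field names are illustrative.
interface MnoDispatchResult {
  mno: string;
  delivered: boolean;
}

// "PARTIAL" and "FAILED" appear in this document; "SUCCESS" is an assumed name.
type Verdict = "SUCCESS" | "PARTIAL" | "FAILED";

// All MNOs delivered -> SUCCESS; none delivered -> FAILED; otherwise PARTIAL.
function aggregateVerdict(results: MnoDispatchResult[]): Verdict {
  if (results.length === 0) {
    throw new Error("no MNO results to aggregate");
  }
  const deliveredCount = results.filter((r) => r.delivered).length;
  if (deliveredCount === results.length) return "SUCCESS";
  if (deliveredCount === 0) return "FAILED";
  return "PARTIAL";
}

// Fraction of broadcasts in a window that ended PARTIAL (0 for an empty window).
function partialRate(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  return verdicts.filter((v) => v === "PARTIAL").length / verdicts.length;
}
```

In practice the rate would more likely be computed by a Prometheus expression over counters than in application code. The function is included only to make the 10% threshold concrete.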

4. Security Readiness

Criterion | Status
mTLS on gRPC + SPIRE SVIDs with hourly rotation |
National-PKI trust chain configured in HSM trust anchors |
NetworkPolicy restricts ingress to regulator-portal + government-PKI callers |
NetworkPolicy egress limited to per-MNO CBE CIDRs (deny everything else) |
Audit UPDATE/DELETE rejected by Postgres trigger |
PKI-bypass adversarial corpus passes (0 bypasses) |
Replay-attack corpus passes |
Authorised-caller registry immutable history; dual-control on edits |
HSM HA + regional quorum (ADR-0004 §11) |
Pen test against gRPC + REST admin |
CISO sign-off |
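
The replay-attack corpus exercises the nonce and 5-minute timestamp window listed under Code Readiness. A minimal sketch of such a check follows. The class name, the result codes, the symmetric treatment of past and future timestamps, and the in-memory nonce store are assumptions. A multi-replica deployment would need a shared store instead of a per-process map.

```typescript
// 5-minute window, as stated in the Code Readiness notes.
const REPLAY_WINDOW_MS = 5 * 60 * 1000;

type ReplayCheck = "OK" | "STALE_TIMESTAMP" | "REPLAYED_NONCE";

// Illustrative in-memory guard; not shared across replicas.
class ReplayGuard {
  // nonce -> request timestamp in milliseconds since the epoch
  private seen = new Map<string, number>();

  check(nonce: string, timestampMs: number, nowMs: number): ReplayCheck {
    // Evict nonces whose timestamp has left the window. This is safe because
    // any replay of them would now be rejected by the timestamp check.
    this.seen.forEach((ts, n) => {
      if (nowMs - ts > REPLAY_WINDOW_MS) this.seen.delete(n);
    });

    // Reject requests stamped too far in the past or the future (assumed symmetric).
    if (Math.abs(nowMs - timestampMs) > REPLAY_WINDOW_MS) {
      return "STALE_TIMESTAMP";
    }
    // Within the window, each nonce may be used only once.
    if (this.seen.has(nonce)) {
      return "REPLAYED_NONCE";
    }
    this.seen.set(nonce, timestampMs);
    return "OK";
  }
}
```

The timestamp window bounds how long nonces must be remembered. Without it, the nonce store would have to grow without limit to stay safe.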

5. Operational Readiness

Criterion | Status
K8s Deployment (3 replicas min across 3 AZs) reviewed |
HPA scaled on broadcast-rate metric with conservative scale-down |
PDB minAvailable: 2 |
Rolling update: zero-drop verified under steady 100 broadcast/h |
Graceful shutdown: 20 s SIGTERM drain for in-flight dispatches |
Postgres conn pool (pgbouncer) |
Redis conn pool sized |
HSM pool (4 sessions/pod) |
Multi-region topology (kbl primary + mzr standby + dxb DR mirror) verified |
Per-MNO egress IP pool whitelisted with MNOs (exchanged during Phase 0) |
Runbooks drafted for every alert |
On-call rotation: Government / Emergency primary; SRE secondary; CISO executive |
CEO + Board Secretary contact channel established |
Out-of-band communication bridge (phone + Slack) documented |

6. Documentation Readiness

All 16 SERVICE_TEMPLATE docs must be at "Complete". In addition, the following must exist:

  • Runbooks per alert (§3 list)
  • Government-caller integration guide (gRPC client + SDK)
  • MNO onboarding playbook (per MNO: adapter, egress, test-broadcast SLA)
  • Drill playbook (monthly cadence + after-action report template)
  • Regulator engagement document (audit-log format, SIEM stream contents)

7. Compliance / Regulatory Readiness

Criterion | Status
ATRA MoU for CBC service |
National-PKI CA trust chain ratified by Legal + CISO |
Government MoU with NDMA / Police / Civil Defence (authorised callers) |
MNO MoU per operator (CBE protocol, staging endpoints, production endpoints, incident contacts) |
3GPP TS 23.041 conformance declaration |
Audit retention policy (13 months hot + 7 years cold) configured |
Drill cadence agreed with NDMA + ATRA (monthly) |
Emergency-broadcast legal authority documented (Government + Legal) |
SIEM forwarding approved by regulator-portal team |

8. Go/No-Go Criteria Summary

Production deployment is GO when:

  • All §1 Code Readiness complete.
  • Coverage targets met.
  • Load tests pass (at expected emergency-burst scale).
  • PKI-bypass corpus: 0 bypasses.
  • 14-day shadow mode in staging completed — drill-only with ATRA observation.
  • Chaos drills (HSM, 1 MNO, all MNO, Postgres, region partition) all recover per runbook.
  • Sign-offs obtained from Government, Legal, Security, Regulator Liaison, and MNO partners.
  • On-call rotation assigned and drilled once.
  • Out-of-band communication bridge tested.
  • Rollback procedure validated in staging.

9. Post-Launch Review

Within 30 days:

  • Monthly drill cadence held (100% of scheduled drills executed).
  • PKI verification success rate for legitimate callers (target: 100%).
  • Per-MNO CBE availability (target: 99.5% per MNO).
  • Audit chain integrity (0 breaks).
  • Partial-dispatch rate (target: < 1% of broadcasts go PARTIAL).
  • No unauthorised broadcasts (target: 0).
  • Government-client feedback gathered.
  • ATRA acknowledgement of audit visibility.

Within 90 days:

  • Real emergency broadcast exercise (tabletop or live drill with NDMA).
  • Cost analysis: HSM usage, MNO CBE link costs, on-call hours.
  • Cell-database accuracy (target: > 95% of national area covered per MNO).

10. Phased Rollout

Phase | Duration | Behaviour | Exit criteria
P0: Pre-engagement | 3 months | MNO / ATRA / NDMA engagement; adapter development; test-PKI obtained | MoUs signed; adapters pass against MNO staging
P1: Drill-only | 60 d | Service live; monthly drills only; no real emergency broadcasts | 2 consecutive drills 100% delivered
P2: Non-life-critical emergencies | 30 d | Enable P1/P2 severity for public-health / security advisories | ATRA satisfaction; no unauthorised broadcasts
P3: Full emergency | Ongoing | Enable P0 for life-critical (earthquake, civil defence) | N/A (steady state)

Rollback at any phase via feature flag CBC_ACCEPT_SEVERITY_MAX = NONE|P2|P1|P0.
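
One way to apply this flag is sketched below. The flag values are severity ceilings, not rollout phase names. The ordering NONE < P2 < P1 < P0 is inferred from the rollout sequence, and the function names and the choice to throw on an unrecognised value are assumptions. How drills are treated when the flag is NONE is not modelled here.

```typescript
type Severity = "P0" | "P1" | "P2";
type SeverityMax = "NONE" | "P2" | "P1" | "P0";

const SEVERITY_MAX_VALUES: readonly SeverityMax[] = ["NONE", "P2", "P1", "P0"];

// Assumed ordering: a higher rank means a more severe class is accepted.
const RANK: Record<SeverityMax, number> = { NONE: 0, P2: 1, P1: 2, P0: 3 };

// Parses the raw flag value. Throwing on unknown input is an assumption,
// chosen so that a misconfigured flag is loud instead of silently permissive.
function parseSeverityMax(raw: string | undefined): SeverityMax {
  const match = SEVERITY_MAX_VALUES.find((v) => v === raw);
  if (match === undefined) {
    throw new Error(`invalid CBC_ACCEPT_SEVERITY_MAX: ${String(raw)}`);
  }
  return match;
}

// A broadcast is accepted only if its severity is at or below the ceiling.
function isSeverityAccepted(severity: Severity, max: SeverityMax): boolean {
  return RANK[severity] <= RANK[max];
}
```

Under these assumptions, rolling back from full emergency to the non-life-critical phase would mean lowering the flag from P0 to P1. P1 and P2 broadcasts would still be accepted and P0 requests would be rejected at the gate.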