sms-firewall-service — Service Risk Register
Version: 1.0
Status: Draft
Owner: Trust and Safety + Security + SRE
Last Updated: 2026-04-21
References: FAILURE_MODES.md, SECURITY_MODEL.md, ADR-0004
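Known service-level risks with owners, mitigations, and residual-risk classification. Each risk is scored 1–5 for Likelihood and 1–5 for Impact, and the pre-mitigation rating follows from Likelihood × Impact. Residual risk must be Medium or lower for GA.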
1. Risk Summary
| ID | Risk | Category | Likelihood | Impact | Pre-mitigation | Residual | Owner |
|---|---|---|---|---|---|---|---|
| FW-RISK-01 | False-positive BLOCKs drop legitimate OTP traffic | Correctness | 4 | 5 | Critical | Medium | Trust & Safety |
| FW-RISK-02 | ML model biased against a specific MNO gateway | ML fairness | 3 | 4 | High | Medium | Trust & Safety + ML Ops |
| FW-RISK-03 | Adversarial homoglyph / encoded-payload bypass | Security | 3 | 4 | High | Low | Security |
| FW-RISK-04 | Blocklist-federation source delivers poisoned entries | Dependency | 2 | 5 | High | Low | Security |
| FW-RISK-05 | Audit hash-chain break loses regulator-defensibility | Correctness | 2 | 5 | High | Low | Trust & Safety |
| FW-RISK-06 | Fail-closed Postgres incident causes national SMS outage | Availability | 2 | 5 | High | Medium | SRE |
| FW-RISK-07 | Emergency-bypass abuse by privileged insider | Insider | 1 | 5 | Medium | Low | Security + Legal |
| FW-RISK-08 | Fingerprint-storm adversary exhausts cache / rate-limit | Adversarial | 3 | 3 | Medium | Low | Security |
| FW-RISK-09 | Cross-region blocklist-state divergence | Correctness | 2 | 3 | Medium | Low | Platform Arch |
| FW-RISK-10 | Rule rollback under pressure breaks enforcement | Process | 2 | 3 | Medium | Low | Trust & Safety |
| FW-RISK-11 | Legitimate bulk-sender blocked as AIT / SIM-box | ML | 3 | 3 | Medium | Low | Trust & Safety |
| FW-RISK-12 | Federation partner disputes Ghasi blocklist entries | Political | 2 | 3 | Medium | Medium | Regulator Liaison |
| FW-RISK-13 | GDPR subject-access request on historical block | Legal | 2 | 3 | Medium | Low | Legal |
| FW-RISK-14 | HSM outage pauses outbound federation export | Dependency | 2 | 2 | Low | Low | Security |
| FW-RISK-15 | ML model drift causes silent detection-rate degradation | ML | 3 | 4 | High | Medium | ML Ops |
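The pre-mitigation ratings in the table can be cross-checked against the Likelihood × Impact product. The register does not state the band cut-offs, so the thresholds below (Low ≤ 4, Medium 5–9, High 10–19, Critical ≥ 20) are an assumption inferred from the fifteen rows; in particular, no row has a product between 13 and 19, so where High ends and Critical begins is a guess. The `rating` function and `REGISTER` mapping are hypothetical names.

```python
def rating(likelihood: int, impact: int) -> str:
    """Map a Likelihood x Impact product to a rating band (assumed cut-offs)."""
    score = likelihood * impact
    if score >= 20:
        return "Critical"
    if score >= 10:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"

# (likelihood, impact, pre-mitigation rating) copied from the summary table.
REGISTER = {
    "FW-RISK-01": (4, 5, "Critical"),
    "FW-RISK-02": (3, 4, "High"),
    "FW-RISK-03": (3, 4, "High"),
    "FW-RISK-04": (2, 5, "High"),
    "FW-RISK-05": (2, 5, "High"),
    "FW-RISK-06": (2, 5, "High"),
    "FW-RISK-07": (1, 5, "Medium"),
    "FW-RISK-08": (3, 3, "Medium"),
    "FW-RISK-09": (2, 3, "Medium"),
    "FW-RISK-10": (2, 3, "Medium"),
    "FW-RISK-11": (3, 3, "Medium"),
    "FW-RISK-12": (2, 3, "Medium"),
    "FW-RISK-13": (2, 3, "Medium"),
    "FW-RISK-14": (2, 2, "Low"),
    "FW-RISK-15": (3, 4, "High"),
}

# Rows whose stated rating disagrees with the assumed bands.
mismatches = [
    risk_id
    for risk_id, (likelihood, impact, stated) in REGISTER.items()
    if rating(likelihood, impact) != stated
]
```

Under these assumed bands every row in the table is self-consistent, which makes the check usable as a lint step when rows are added or rescored.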
2. Risk Details
FW-RISK-01 — False-positive BLOCKs drop OTP
Banking OTP traffic is misclassified as AIT during a traffic spike and legitimate messages are blocked.
Mitigation. Shadow mode required for rule/model updates; automatic rollback when the BLOCK rate exceeds baseline + 50% for 10 min; per-tenant whitelist for design-partner banks; trusted-tenant fast-path (EP-CE-13) bypasses the firewall for pre-approved templates; tenant escalation dashboard.
Residual. Medium.
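The automatic-rollback trigger for FW-RISK-01 could be evaluated roughly as follows. This is a minimal sketch, not the service's implementation: it assumes one BLOCK-rate sample per minute, reads "baseline + 50%" as a relative increase (1.5 × baseline), and requires the breach to hold for 10 consecutive samples. `BlockRateSample` and `should_roll_back` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class BlockRateSample:
    minute: int        # minutes since the rule/model update went live
    block_rate: float  # fraction of messages BLOCKed during that minute

def should_roll_back(
    samples: Iterable[BlockRateSample],
    baseline: float,
    window_min: int = 10,   # sustained-breach window from the register
    excess: float = 0.50,   # assumed to mean +50% relative to baseline
) -> bool:
    """Return True once the BLOCK rate exceeds the threshold for
    `window_min` consecutive one-minute samples."""
    threshold = baseline * (1.0 + excess)
    streak = 0
    for sample in samples:
        # Any sample at or below the threshold resets the streak.
        streak = streak + 1 if sample.block_rate > threshold else 0
        if streak >= window_min:
            return True
    return False
```

If "baseline + 50%" is instead meant as 50 percentage points, only the `threshold` line changes.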
FW-RISK-02 — ML bias against specific MNO
AIT model under-represents one MNO's legitimate bulk traffic.
Mitigation. Fairness audit on the model (per-MNO recall/precision; fail CI if disparate recall > 15%); balanced training corpus; per-MNO block-rate monitoring post-launch; human-in-the-loop review limited to the highest-confidence classifications.
Residual. Medium.
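The CI fairness gate for FW-RISK-02 might look like the sketch below. The register does not define "disparate recall", so this assumes it is the absolute gap between the best and worst per-MNO recall; the input shape (true positives and false negatives per MNO) and the function names are likewise hypothetical.

```python
from typing import Dict, Tuple

def recall(true_pos: int, false_neg: int) -> float:
    """Recall = TP / (TP + FN); defined as 0.0 when there are no positives."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0

def fairness_gate(
    per_mno_counts: Dict[str, Tuple[int, int]],  # mno -> (TP, FN)
    max_gap: float = 0.15,                       # 15% limit from the register
) -> Tuple[bool, float, Dict[str, float]]:
    """Return (passed, gap, per-MNO recalls) for the evaluation set."""
    recalls = {mno: recall(tp, fn) for mno, (tp, fn) in per_mno_counts.items()}
    gap = max(recalls.values()) - min(recalls.values())
    return gap <= max_gap, gap, recalls
```

A CI job would call `fairness_gate` on the held-out evaluation counts and exit non-zero when the first element is `False`.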
FW-RISK-03 — Adversarial homoglyph / encoded-payload bypass
Attacker obfuscates content to bypass rules.
Mitigation. NFKC + TR39 normalisation at ingest; canonicalisation before match; 500+ homoglyph corpus test in CI; security review on new rule types.
Residual. Low.
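To illustrate the canonicalise-then-match idea in FW-RISK-03, the sketch below applies NFKC normalisation and case folding, then maps a handful of confusable characters to their Latin look-alikes. The `CONFUSABLES` table is a tiny hand-written stand-in, not the Unicode TR39 confusables data, and the function is a simplification rather than the TR39 skeleton algorithm; a production system would load the full data set.

```python
import unicodedata

# Illustrative subset only; the real TR39 confusables data is far larger.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0455": "s",  # Cyrillic small dze
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
}

def canonicalise(text: str) -> str:
    """Fold compatibility forms, case, and known confusables before matching."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in folded)
    # Re-normalise in case the mapping produced new composable sequences.
    return unicodedata.normalize("NFKC", mapped)

def matches_rule(message: str, keyword: str) -> bool:
    """Substring match performed on canonical forms of both sides."""
    return canonicalise(keyword) in canonicalise(message)
```

Because both the rule keyword and the message pass through the same function, a rule written in plain ASCII also catches fullwidth and mixed-script variants that appear in the table.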
FW-RISK-04 — Poisoned federation entries
Compromised source pushes malicious entries.
Mitigation. Federation source authentication via HSM-signed mutual certificates; anomaly detection (a sudden influx of > 1 000 entries triggers review); dual-source corroboration for public-figure / bank-class entries; rollback capability.
Residual. Low.
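The intake checks for FW-RISK-04 can be sketched as a triage step on each inbound batch. The assumptions here are loud ones: "sudden > 1 000 entries" is treated as a per-batch size limit, the sensitive class labels are hypothetical strings, and corroboration means at least one other source has already reported the same value. Certificate verification is out of scope for the sketch.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (blocked value, entry class)

# Hypothetical class labels for entries needing a second source.
SENSITIVE_CLASSES = {"public-figure", "bank"}

def triage_batch(
    source: str,
    entries: List[Entry],
    seen_by: Dict[str, Set[str]],  # value -> sources that already reported it
    max_batch: int = 1000,
) -> Dict[str, object]:
    """Hold oversized batches for review; otherwise accept entries unless a
    sensitive-class entry lacks corroboration from another source."""
    if len(entries) > max_batch:
        return {"status": "held_for_review", "accepted": [], "pending": list(entries)}

    accepted: List[Entry] = []
    pending: List[Entry] = []
    for value, entry_class in entries:
        other_sources = seen_by.get(value, set()) - {source}
        if entry_class in SENSITIVE_CLASSES and not other_sources:
            pending.append((value, entry_class))
        else:
            accepted.append((value, entry_class))
    return {"status": "processed", "accepted": accepted, "pending": pending}
```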
FW-RISK-05 — Audit hash-chain break
Bug or tamper corrupts the chain.
Mitigation. Daily verifier; canonicalised payload (RFC 8785); two-implementation cross-check; weekly tamper-detection drill.
Residual. Low.
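A daily verifier for the audit chain in FW-RISK-05 could walk the records and recompute each link. In this sketch the record layout, the all-zero genesis value, and the use of SHA-256 are assumptions. The `canonical` helper only approximates RFC 8785 by sorting keys and removing whitespace; full JCS also prescribes number and string serialisation rules that `json.dumps` does not guarantee.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # assumed previous-hash value for the first record

def canonical(payload: Dict[str, Any]) -> bytes:
    """Approximate canonical JSON: sorted keys, no insignificant whitespace."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def link_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash of the previous link concatenated with the canonical payload."""
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(payload)).hexdigest()

def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": link_hash(prev, payload)})

def first_break(chain: List[Dict[str, Any]]) -> Optional[int]:
    """Return the index of the first inconsistent record, or None if intact."""
    prev = GENESIS
    for index, record in enumerate(chain):
        expected = link_hash(prev, record["payload"])
        if record["prev"] != prev or record["hash"] != expected:
            return index
        prev = record["hash"]
    return None
```

The two-implementation cross-check named in the mitigation would run a second, independently written verifier over the same records and compare the results.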
FW-RISK-06 — Fail-closed outage blocks national SMS
A Postgres outage takes the firewall down; because the firewall fails closed, SMS traffic is blocked nationally.
Mitigation. Postgres HA with auto-failover ≤ 30 s; Redis cache masks outages of up to 5 min; multi-region failover ≤ 15 min; emergency bypass for P0/P1 only (dual-approval, time-boxed).
Residual. Medium.
FW-RISK-07 — Emergency-bypass insider abuse
Insider engages bypass for personal gain.
Mitigation. Dual-approval (CISO + CTO); time-boxed ≤ 1 h; prominent SIEM audit event; real-time alert to CEO + Board Secretary; quarterly engagement review.
Residual. Low.
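The dual-approval and time-box controls for FW-RISK-07 reduce to a small predicate. The sketch assumes approvals arrive as a role-to-approver mapping and that one person may not satisfy both roles; the register states the two roles and the 1 h cap, and everything else (names, data shape) is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

REQUIRED_ROLES = {"CISO", "CTO"}   # dual approval per the register
MAX_DURATION = timedelta(hours=1)  # time-box cap per the register

def bypass_active(
    approvals: Dict[str, str],  # role -> approver identity
    started_at: datetime,
    requested: timedelta,
    now: datetime,
) -> bool:
    """True only while a properly approved bypass is inside its time box."""
    if not REQUIRED_ROLES.issubset(approvals):
        return False
    # Assumed rule: the two approvals must come from distinct people.
    if len({approvals[role] for role in REQUIRED_ROLES}) < len(REQUIRED_ROLES):
        return False
    duration = min(requested, MAX_DURATION)  # longer requests are clamped to 1 h
    return started_at <= now < started_at + duration
```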
FW-RISK-08 — Fingerprint storm
Attacker rotates JA3 fingerprints at extreme rate.
Mitigation. Cloudflare + Kong edge absorbs; LFU cache; tarpit; scale-out; manual edge-filter runbook.
Residual. Low.
FW-RISK-09 — Cross-region divergence
Blocklist state differs between regions.
Mitigation. Logical replication with last-write-wins (LWW) conflict resolution; hourly reconciliation cron; alert when more than 100 rows remain divergent for 1 h.
Residual. Low.
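The hourly reconciliation for FW-RISK-09 can be pictured as a diff between two regional snapshots followed by a last-write-wins merge. The row shape `(updated_at, state)` and the function names are assumptions; on an exact timestamp tie this sketch keeps the first region's row, which is an arbitrary choice a real system would need to pin down.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (updated_at as epoch seconds, blocklist state)

def divergent_keys(region_a: Dict[str, Row], region_b: Dict[str, Row]) -> List[str]:
    """Keys that are missing in one region or carry different rows."""
    keys = region_a.keys() | region_b.keys()
    return sorted(k for k in keys if region_a.get(k) != region_b.get(k))

def lww_resolve(region_a: Dict[str, Row], region_b: Dict[str, Row]) -> Dict[str, Row]:
    """Merge two snapshots, keeping the row with the newest timestamp."""
    merged: Dict[str, Row] = {}
    for key in region_a.keys() | region_b.keys():
        candidates = [region[key] for region in (region_a, region_b) if key in region]
        merged[key] = max(candidates, key=lambda row: row[0])
    return merged

def should_alert(divergent_count: int, threshold: int = 100) -> bool:
    """Alert condition from the register: more than 100 divergent rows."""
    return divergent_count > threshold
```

Since the cron runs hourly, one reading is that two consecutive runs above the threshold satisfy the "for 1 h" condition; the register does not spell this out.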
FW-RISK-10 — Rule rollback breaks enforcement
Protective rule rolled back under pressure; real abuse slips through.
Mitigation. Rule rollback requires reason + ticket; time-boxed (auto-re-enable after 7 d unless replaced); T&S lead sign-off required.
Residual. Low.
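The rollback guardrails for FW-RISK-10 could be encoded as a validation step plus a time-based check. This sketch assumes an incomplete rollback request never takes effect and that a replacement rule keeps the original disabled indefinitely; the `RuleRollback` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

AUTO_REENABLE_AFTER = timedelta(days=7)  # time box from the register

@dataclass
class RuleRollback:
    rule_id: str
    reason: str
    ticket: str
    ts_lead_signoff: Optional[str]   # identity of the approving T&S lead
    rolled_back_at: datetime
    replaced_by: Optional[str] = None  # id of a replacement rule, if any

def missing_fields(rb: RuleRollback) -> List[str]:
    """List the mandatory fields that are absent from the request."""
    missing = []
    if not rb.reason.strip():
        missing.append("reason")
    if not rb.ticket.strip():
        missing.append("ticket")
    if not rb.ts_lead_signoff:
        missing.append("ts_lead_signoff")
    return missing

def rule_is_enforcing(rb: RuleRollback, now: datetime) -> bool:
    """Whether the original rule is enforcing at time `now`."""
    if missing_fields(rb):
        return True   # invalid rollback is rejected, rule stays on
    if rb.replaced_by is not None:
        return False  # superseded by a replacement rule
    return now >= rb.rolled_back_at + AUTO_REENABLE_AFTER
```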
FW-RISK-11 — Legitimate bulk-sender flagged
High-volume legitimate use misclassified.
Mitigation. Pre-registered bulk-sender exemption; ML model consumes sender-ID reputation; human-in-loop on high volume; per-tenant tier whitelist.
Residual. Low.
FW-RISK-12 — Federation partner dispute
MNO disputes Ghasi's federation entries.
Mitigation. Per-entry provenance in export; conflict-resolution via Regulator Liaison; MNO attestation in federation agreement.
Residual. Medium.
FW-RISK-13 — GDPR subject-access on historical block
A data subject requests the data held about them in connection with a historical block.
Mitigation. MSISDN-hash tokenisation in audit; Legal-drafted response format; 30-d SLA.
Residual. Low.
FW-RISK-14 — HSM outage pauses federation export
While the HSM is unavailable, the outbound federation export cannot be signed.
Mitigation. HSM HA with regional quorum; export queues; backup manual-signing with dual-control.
Residual. Low.
FW-RISK-15 — ML drift
Attacker traffic evolves; model silently loses recall.
Mitigation. Continuous model-accuracy monitoring (held-out test + weekly freshly-labelled corpus); drift alert on F1 drop > 5%; quarterly retraining cadence.
Residual. Medium.
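The drift alert for FW-RISK-15 compares a current F1 score against a reference value. The sketch assumes "F1 drop > 5%" means an absolute drop of more than 0.05 between the reference evaluation and the latest freshly-labelled corpus; if a relative drop is intended, the comparison would divide by the reference instead. Function names are hypothetical.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 0.0

def drift_alert(reference_f1: float, current_f1: float, max_drop: float = 0.05) -> bool:
    """Fire when F1 has fallen by more than `max_drop` (absolute, assumed)."""
    return (reference_f1 - current_f1) > max_drop
```

For example, a reference confusion of TP=80, FP=10, FN=10 against a weekly corpus of TP=70, FP=15, FN=25 gives a drop of roughly 0.11 and would raise the alert.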
3. Residual-Risk Summary
| Residual | Count | Acceptance |
|---|---|---|
| Low | 10 | Accepted for GA |
| Medium | 5 | Accepted with mitigation commitments and named owners |
| High | 0 | — |
4. Risk Review Cadence
- Weekly during development (Platform Arch).
- Monthly post-GA (T&S + SRE + Security).
- Quarterly (Regulator Liaison + Legal + CTO).