SMS Firewall (sms-firewall-service) — Service Overview

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-20
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · AI_INTEGRATION
Related ADR: ADR-0004 National-Backbone Resilience §3

1. Purpose — The National Perimeter for SMS Traffic

The SMS Firewall is the national perimeter enforcement layer for the Ghasi SMS backbone. Conceptually it sits at every SMPP-facing boundary — both inbound MO (mobile-originated) traffic from MNOs into the platform, and inbound transit MT (mobile-terminated) traffic from peer aggregators that attempt to inject messages destined for Afghan subscribers.

The firewall is a first-class network-layer service, equivalent in role to:

Edge / WAF for HTTP traffic (Kong + Cloudflare)
Compliance Layer for outbound tenant-originated SMS (compliance-engine)
Consent Ledger for opt-in / DND enforcement (consent-ledger-service)

Where the Compliance Layer protects the platform's own tenants from violating regulator rules, and the Consent Ledger protects subscribers from messages that violate their opt-out or DND preferences, the SMS Firewall protects the national network itself from:

AIT (Artificially Inflated Traffic) entering via inbound MO bind
SIM-box / grey-route MT traffic from peer aggregators bypassing legitimate interconnect
Spoofed sender IDs in transit MT not matching the origin AS / aggregator
Content-class violations (regulator-forbidden categories — gambling, certain political content, malware payloads)
Geo / origin violations (e.g. inbound MO claiming to originate from a non-Afghan MSISDN range over an Afghan MNO bind)
DND violations at the perimeter (national DND list applied before the message reaches routing-engine)
Rate / volume floods (per source MSISDN, per source aggregator, per destination MSISDN)

The firewall produces a verdict (ALLOW, FLAG, BLOCK, QUARANTINE) for every PDU that crosses the perimeter, and emits structured events that downstream services (routing-engine, fraud-intel-service, cdr-mediation-service, regulator-portal-service) consume.
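
As a minimal sketch of how the four verdicts might be typed and combined when several rules match the same PDU: the severity ordering (BLOCK over QUARANTINE over FLAG over ALLOW) and the names `RuleHit` and `combineVerdicts` are assumptions made for illustration, since the overview does not state how conflicting rule outcomes are resolved.

```typescript
// The four verdicts named in the overview.
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

// ASSUMPTION: numeric severity ranking, used only to pick a winner.
const SEVERITY: Record<Verdict, number> = {
  ALLOW: 0,
  FLAG: 1,
  QUARANTINE: 2,
  BLOCK: 3,
};

// Hypothetical shape of a single rule match.
interface RuleHit {
  ruleId: string;
  verdict: Verdict;
}

// Returns the most severe hit, or ALLOW when no rule matched.
// On a tie, the earliest hit in the list is kept.
function combineVerdicts(hits: RuleHit[]): { verdict: Verdict; ruleId: string | null } {
  let winner: RuleHit | null = null;
  for (const hit of hits) {
    if (winner === null || SEVERITY[hit.verdict] > SEVERITY[winner.verdict]) {
      winner = hit;
    }
  }
  return winner === null
    ? { verdict: "ALLOW", ruleId: null }
    : { verdict: winner.verdict, ruleId: winner.ruleId };
}
```

Returning the winning `ruleId` alongside the verdict keeps the result usable as a label for a metric such as `firewall_verdict_total{verdict,rule_id,mno}`.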

2. Position in the Platform — Two Inbound Choke-Points

                       ┌──────────────────────────────────────┐
                       │   MNO peers (AWCC, Roshan, Etisalat, │
                       │   MTN AF, Salaam) — inbound MO        │
                       └──────────────────┬───────────────────┘
                                          │ SMPP deliver_sm (MO)
                                          ▼
              ┌────────────────────────────────────────────────┐
              │   smpp-connector-{mno}-rx / -trx (data-plane)  │
              └──────────────────┬─────────────────────────────┘
                                 │ NATS sms.mo.inbound.raw
                                 ▼
              ╔═════════════════════════════════════════════════╗
              ║  sms-firewall-service                            ║
              ║   gRPC FilterInbound(MO)  — invoked synchronously║
              ║                                                  ║
              ║   ┌────────────────────────────────────────┐     ║
              ║   │ origin / content / rate / geo / DND    │     ║
              ║   │ rule pipeline (CEL-style expressions)  │     ║
              ║   └─────────────┬──────────────────────────┘     ║
              ║                 │                                ║
              ║       ┌─────────┴──────────┐                     ║
              ║       ▼                    ▼                     ║
              ║   ALLOW / FLAG         BLOCK / QUARANTINE        ║
              ║       │                    │                     ║
              ║       ▼                    ▼                     ║
              ║   downstream         firewall.alert.*            ║
              ║   (routing /          firewall.audit.*           ║
              ║    consent /          NATS                       ║
              ║    fraud-intel)                                  ║
              ╚═════════════════════════════════════════════════╝

                       ┌──────────────────────────────────────┐
                       │  Peer aggregators / international     │
                       │  carriers — transit MT inbound        │
                       └──────────────────┬───────────────────┘
                                          │ SMPP submit_sm (transit MT)
                                          ▼
              ┌────────────────────────────────────────────────┐
              │   smpp-connector-transit-rx (data-plane)       │
              └──────────────────┬─────────────────────────────┘
                                 │ gRPC EvaluateTransit
                                 ▼
              ┌─────────────────────────────────────────────────┐
              │   sms-firewall-service                           │
              │   — peer-aggregator hygiene                      │
              │   — grey-route exclusion (HLR / MNP cross-check) │
              │   — sender-ID origin verification                │
              └──────────────────┬──────────────────────────────┘
                                 │ verdict
                                 ▼
                         routing-engine (or BLOCK terminal)

3. Bounded Context

Dimension	Value
Domain	Trust & Safety / National Perimeter
Owner squad	Trust & Safety
Deployment unit	Kubernetes `Deployment` — `sms-firewall-service` (per Afghan region: `kbl`, `mzr`)
Communication style	Inbound: gRPC (mTLS, from `smpp-connector` pods) · HTTP REST (admin) · NATS (rule federation, blocklist sync)
Storage	PostgreSQL schema `firewall` · Redis (rule cache, rate counters, blocklist Bloom filter) · MinIO (signed audit archive)
Failure mode	Fail-closed for transit MT (no transit traffic enters without a verdict); fail-open with quarantine for inbound MO (subscriber MO must not be silently dropped, but is shunted to a quarantine bucket reviewable by NOC)
Region affinity	Region-pinned per `smpp-connector` pool; rules and blocklists are mirrored cross-region via NATS JetStream stream `firewall.audit.*`
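
The failure-mode row above can be sketched as a small wrapper around the evaluator. This is an illustration under stated assumptions, not the service's code: `evaluateWithFailurePolicy`, `fallbackVerdict`, and the `timeoutMs` parameter are hypothetical names, and the sketch treats a timeout the same as an evaluator error.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";
type Path = "INBOUND_MO" | "TRANSIT_MT";

// Verdict used when no real verdict can be produced:
// transit MT fails closed (BLOCK), inbound MO fails open into quarantine.
function fallbackVerdict(path: Path): Verdict {
  return path === "TRANSIT_MT" ? "BLOCK" : "QUARANTINE";
}

// Runs `evaluate` against a deadline; any error or timeout maps to the
// fallback verdict for the given path. (Hypothetical helper.)
async function evaluateWithFailurePolicy(
  path: Path,
  evaluate: () => Promise<Verdict>,
  timeoutMs: number,
): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("verdict deadline exceeded")), timeoutMs);
  });
  try {
    return await Promise.race([evaluate(), deadline]);
  } catch {
    return fallbackVerdict(path);
  } finally {
    // Always clear the timer so it cannot fire after a verdict is returned.
    clearTimeout(timer);
  }
}
```

Keeping the fallback in one pure function makes the asymmetric policy easy to audit and to unit-test separately from the timeout plumbing.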

4. Responsibilities

#	Responsibility
R1	Accept `FilterInbound` gRPC calls from `smpp-connector-{mno}-rx/-trx` for every inbound MO PDU and return a verdict within P95 ≤ 30 ms (data-plane SLO)
R2	Accept `EvaluateTransit` gRPC calls from `smpp-connector-transit-rx` for inbound MT PDUs from peer aggregators and return a verdict within P95 ≤ 50 ms
R3	Evaluate against the active rule set: origin (source-addr / source-MNO / source-AS), content (keyword / regex / classifier), rate (per src-MSISDN, per dst-MSISDN, per aggregator), geo (MCC/MNC, A-number country), DND (national list), AIT signature, SIM-box signature, grey-route signature
R4	Maintain a per-source sliding-window rate governor in Redis (`fw:rate:src-msisdn:{e164}:{window}`) for all configured time windows (1 s, 1 min, 5 min, 1 h, 24 h)
R5	Maintain a national-blocklist Bloom filter in Redis updated from `regulator-portal-service` and from cross-MNO federation; backed by `firewall.blocklist_entries` Postgres table
R6	Quarantine messages with `verdict = QUARANTINE` into the `firewall.quarantine_queue` Postgres table for NOC manual review (24 h auto-expiry)
R7	Expose a REST admin API (`POST /v1/admin/firewall/rules`, `GET /v1/admin/firewall/quarantine`, `POST /v1/admin/firewall/quarantine/{id}/release`) for NOC and Trust & Safety leads
R8	Publish `firewall.alert.*` (subscriber-impacting) and `firewall.audit.*` (regulator-grade evidence) events to NATS JetStream; `firewall.audit.*` is mirrored to `mzr` and to `dxb` cold archive
R9	Federate national blocklist updates with peer MNOs and ATRA via the regulator portal (`POST /v1/internal/firewall/federation/import` consumer)
R10	Provide read-only metrics (`firewall_verdict_total{verdict,rule_id,mno}`, `firewall_inbound_pdus_total`, `firewall_quarantine_depth`, `firewall_rule_eval_seconds`) for Prometheus scrape
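
To illustrate R4, here is a minimal sliding-window governor that uses the key layout `fw:rate:src-msisdn:{e164}:{window}`. It is a sketch under assumptions: an in-memory `Map` stands in for Redis, the window labels (`1s`, `1m`, and so on) and the per-window limits are hypothetical, and the timestamp-log approach is one of several ways to implement a sliding window.

```typescript
// Window labels (assumed spelling) mapped to their length in milliseconds.
const WINDOWS_MS: Record<string, number> = {
  "1s": 1_000,
  "1m": 60_000,
  "5m": 300_000,
  "1h": 3_600_000,
  "24h": 86_400_000,
};

// In-memory stand-in for Redis: key -> timestamps (ms) of recent PDUs.
class SlidingWindowGovernor {
  private readonly log = new Map<string, number[]>();

  // `limits` maps a window label to the max PDUs allowed in it (hypothetical values).
  constructor(private readonly limits: Record<string, number>) {}

  // Key layout follows the pattern given in R4.
  static key(e164: string, label: string): string {
    return `fw:rate:src-msisdn:${e164}:${label}`;
  }

  // Records one PDU at `nowMs`; returns the labels of windows whose limit is now exceeded.
  record(e164: string, nowMs: number): string[] {
    const exceeded: string[] = [];
    for (const [label, limit] of Object.entries(this.limits)) {
      const span = WINDOWS_MS[label];
      if (span === undefined) throw new Error(`unknown window: ${label}`);
      const key = SlidingWindowGovernor.key(e164, label);
      // Drop timestamps that have slid out of the window, then add the new one.
      const recent = (this.log.get(key) ?? []).filter((t) => t > nowMs - span);
      recent.push(nowMs);
      this.log.set(key, recent);
      if (recent.length > limit) exceeded.push(label);
    }
    return exceeded;
  }
}
```

A caller would map a non-empty result to whatever verdict the matching rate rule prescribes. A production version would more likely keep per-window counters or sorted sets in Redis rather than full timestamp lists.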

5. Non-Responsibilities

Does not evaluate outbound tenant SMS — that is compliance-engine's job
Does not decide DLR routing or operator selection — that is routing-engine
Does not persist subscriber consent — that is consent-ledger-service (firewall reads the DND projection, it does not own it)
Does not train or score ML fraud models — that is fraud-intel-service (firewall consumes the published fraud.detected.* events and applies them as a rule input)
Does not generate CDRs — that is cdr-mediation-service
Does not terminate SMPP binds or hold session state with MNOs — smpp-connector-{mno}-{tx|rx|trx} owns bind state

6. Upstream / Downstream Dependencies

Direction	Service	Protocol	Purpose
Inbound caller	`smpp-connector-{mno}-rx` / `smpp-connector-{mno}-trx`	gRPC (mTLS, SPIFFE)	`FilterInbound(MoContext)` per inbound `deliver_sm` PDU
Inbound caller	`smpp-connector-transit-rx`	gRPC (mTLS, SPIFFE)	`EvaluateTransit(TransitMtContext)` per inbound `submit_sm` PDU
Inbound caller	`routing-engine`	gRPC (mTLS)	`CheckOutboundEgress(routeId, dstMsisdn)` for DND-at-perimeter check on tenant-originated MT (read-only verdict)
Inbound admin	`admin-dashboard` (NOC console)	HTTP REST (mTLS, JWT + role=`tns-admin` / `noc`)	Rule CRUD, quarantine review, blocklist management
Inbound event	`fraud-intel-service`	NATS JetStream `fraud.detected.v1`	Promotes high-confidence fraud verdicts into firewall rule inputs (e.g. SIM-box signature)
Inbound event	`consent-ledger-service`	NATS JetStream `consent.dnd.snapshot.v1`	Hourly snapshot of national DND list, materialised into Redis Bloom filter
Inbound event	`regulator-portal-service`	NATS JetStream `regulator.blocklist.published.v1`	Regulator-mandated blocklist entries (sender IDs, MSISDN ranges)
Outbound read/write	PostgreSQL `firewall` schema	TCP (pg driver)	Rules, blocklist entries, quarantine queue, audit log (partitioned monthly)
Outbound cache	Redis (region-local)	TCP	Rule set hot cache, rate counters, blocklist Bloom filter
Outbound events	NATS JetStream	TCP	`firewall.alert.*`, `firewall.audit.*`, `firewall.quarantine.*`
Outbound (sink)	`routing-engine`	none	Verdict is returned over the originating gRPC call; routing-engine receives it via the calling `smpp-connector` pod or via NATS
Outbound (sink)	`fraud-intel-service`	NATS JetStream `firewall.audit.v1`	Firewall evidence is fed into fraud-intel ML training
Outbound (sink)	`cdr-mediation-service`	NATS JetStream `firewall.audit.v1`	Firewall verdict is appended to the inbound MO CDR
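
The table notes that the DND snapshot is materialised into a Redis Bloom filter. The sketch below shows the general technique with an in-process bit array in place of Redis. It is illustrative only: the FNV-1a hash, the double-hashing scheme, and the names `bloomParams`, `BloomFilter`, and `isBlocklisted` are assumptions, and the `definitiveLookup` callback stands in for a query against the authoritative store.

```typescript
// Standard sizing formulas for n entries at false-positive rate p.
function bloomParams(n: number, p: number): { bits: number; hashes: number } {
  const bits = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2));
  const hashes = Math.max(1, Math.round((bits / n) * Math.LN2));
  return { bits, hashes };
}

// Seeded 32-bit FNV-1a; chosen for brevity, not necessarily what the service uses.
function fnv1a(value: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  private readonly store: Uint8Array;

  constructor(private readonly bits: number, private readonly hashes: number) {
    this.store = new Uint8Array(Math.ceil(bits / 8));
  }

  // Double hashing: the i-th position is (h1 + i * h2) mod bits.
  private positions(value: string): number[] {
    const h1 = fnv1a(value, 0);
    const h2 = (fnv1a(value, 0x9e3779b9) | 1) >>> 0;
    const out: number[] = [];
    for (let i = 0; i < this.hashes; i++) out.push((h1 + i * h2) % this.bits);
    return out;
  }

  add(value: string): void {
    for (const pos of this.positions(value)) this.store[pos >> 3] |= 1 << (pos & 7);
  }

  // False means definitely absent; true means possibly present.
  mightContain(value: string): boolean {
    return this.positions(value).every((pos) => (this.store[pos >> 3] & (1 << (pos & 7))) !== 0);
  }
}

// A negative from the filter is final; a positive is confirmed against the
// authoritative store before it is treated as a blocklist hit.
function isBlocklisted(
  value: string,
  filter: BloomFilter,
  definitiveLookup: (v: string) => boolean,
): boolean {
  return filter.mightContain(value) && definitiveLookup(value);
}
```

With these formulas, 5 M entries at a 1% false-positive rate need roughly 48 Mbit (about 6 MB) and 7 hash functions.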

7. High-Level Flow — Inbound MO

8. High-Level Flow — Transit MT

9. Runtime Topology Summary

Aspect	Value
Process model	Single-binary NestJS application; gRPC server (port `50061`) + HTTP REST admin (port `3061`) + Prometheus `/metrics` (port `9061`)
Replicas	`kbl`: minimum 4, HPA scales on `firewall_rule_eval_seconds` P95 > 25 ms; `mzr`: minimum 3
Node pool	`np-data` (telecom-grade NICs, dedicated egress VLAN to MNO IPSec tunnels)
Zonal placement	Pod anti-affinity across 3 AZs per region
Startup	Loads active rule set from Postgres into Redis `fw:rules:active` hash on boot; rebuilds Bloom filter from `firewall.blocklist_entries`
Hot reload	Rule edits via REST trigger `firewall.rule.changed.v1` event; all replicas refresh cache within 5 s
Shutdown	SIGTERM → drain in-flight gRPC (max 10 s) → close NATS consumers → flush rate counters → exit
Region affinity	Region-pinned (no cross-region failover for the data-plane verdict path; control-plane data — rules, blocklist — replicates via Postgres logical replication)
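
The shutdown sequence in the table can be sketched as a strictly ordered async routine. The `ShutdownDeps` interface and its method names are hypothetical handles for the resources named in the Shutdown row; how the service actually wires them is not described in this overview.

```typescript
// Hypothetical handles for the resources named in the Shutdown row.
interface ShutdownDeps {
  drainGrpc(maxMs: number): Promise<void>;
  closeNatsConsumers(): Promise<void>;
  flushRateCounters(): Promise<void>;
  exit(code: number): void;
}

// Runs each step only after the previous one has completed:
// drain in-flight gRPC (max 10 s), close NATS consumers, flush rate counters, exit.
async function gracefulShutdown(deps: ShutdownDeps): Promise<void> {
  await deps.drainGrpc(10_000);
  await deps.closeNatsConsumers();
  await deps.flushRateCounters();
  deps.exit(0);
}

// Example wiring (commented out, since `realDeps` is not defined here):
// process.once("SIGTERM", () => void gracefulShutdown(realDeps));
```

Injecting the dependencies keeps the ordering testable without a real gRPC server or NATS connection.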

10. Key Design Decisions

Decision	Rationale
Firewall is a synchronous gRPC call from `smpp-connector`, not an async NATS step	An MNO MO PDU has a strict response window (`deliver_sm_resp` must come back within seconds); a synchronous verdict keeps the bind healthy and avoids enqueue / dequeue latency
Fail-closed for transit MT, fail-open with quarantine for inbound MO	Transit MT is third-party traffic with no subscriber relationship — silent block is safe. Inbound MO from a real Afghan subscriber must never be silently dropped on a service outage; quarantine + NOC review preserves the subscriber relationship
Rules expressed as CEL-style sandboxed expressions with typed inputs (`src.msisdn`, `dst.msisdn`, `mno.id`, `pdu.body`, `pdu.coding`, `peer.asn`, `consent.dndPresent`)	Same authoring model as `compliance-engine` so Trust & Safety can move between the two; sandboxed and auditable; no arbitrary code
Region-pinned, not multi-master	The verdict path must be sub-50 ms; cross-region consensus adds latency. Control-plane (rules / blocklist) replicates asynchronously
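Bloom filter for national blocklist in Redis	National DND + blocklist may exceed 5 M entries; a Bloom filter gives O(1) lookup with a small memory footprint that is fixed for a given capacity and false-positive rate; a negative result is final, while a positive result (which may be a false positive) is confirmed by a definitive Postgres check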
Audit log is append-only, partitioned monthly, mirrored to `dxb` WORM bucket	Regulator (ATRA) requires 7-year retention with tamper evidence — append-only + immutable cold archive satisfies this
Verdict events are dual-channel: synchronous (over the calling gRPC stream) + async (`firewall.audit.v1` to NATS)	Synchronous gives the connector a definitive answer; async gives downstream consumers (`fraud-intel`, `cdr-mediation`, `regulator-portal`) a single canonical evidence stream
Per-bind concurrency cap on `FilterInbound` (default 200 in-flight)	Prevents one runaway MNO bind (e.g. AIT flood from a compromised aggregator) from starving evaluation capacity for healthy binds
No tenant ID on inbound MO firewall path	Inbound MO does not yet belong to a tenant — tenant resolution happens later in the consent / routing layer. Firewall keys exclusively on `srcMsisdn`, `dstMsisdn`, `mnoBindId`
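
The typed rule inputs listed above could be modelled as follows. This is a sketch under assumptions: the nesting of `RuleInput`, the two sample rules, their verdicts, and the first-match-wins ordering are all hypothetical, and hand-written TypeScript predicates stand in for the sandboxed CEL-style evaluator, which this overview does not show.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

// Typed inputs named in the design-decision table; the nesting is an assumed shape.
interface RuleInput {
  src: { msisdn: string };
  dst: { msisdn: string };
  mno: { id: string };
  pdu: { body: string; coding: number };
  peer: { asn: number | null };
  consent: { dndPresent: boolean };
}

// A rule pairs an auditable expression string with a predicate.
// ASSUMPTION: `matches` is a stand-in for compiling `expression` in a sandbox.
interface Rule {
  id: string;
  expression: string;
  verdict: Verdict;
  matches: (input: RuleInput) => boolean;
}

// Two hypothetical rules: national DND, and a non-Afghan (+93) source on an MO bind.
const RULES: Rule[] = [
  {
    id: "dnd-national",
    expression: "consent.dndPresent",
    verdict: "BLOCK",
    matches: (i) => i.consent.dndPresent,
  },
  {
    id: "geo-non-af-source",
    expression: '!src.msisdn.startsWith("+93")',
    verdict: "FLAG",
    matches: (i) => !i.src.msisdn.startsWith("+93"),
  },
];

// ASSUMPTION: rules are ordered and the first match decides the verdict.
function evaluate(input: RuleInput, rules: Rule[]): { verdict: Verdict; ruleId: string | null } {
  for (const rule of rules) {
    if (rule.matches(input)) return { verdict: rule.verdict, ruleId: rule.id };
  }
  return { verdict: "ALLOW", ruleId: null };
}
```

Note that the input carries no tenant identifier, in line with the decision to key the inbound MO path on MSISDNs and the MNO bind only.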

11. Service Operating Modes

Mode	Trigger	Behaviour
NORMAL	Default	Full rule evaluation, all verdict types active
DEGRADED	Postgres or Redis unhealthy	Falls back to in-memory rule cache (last known good); rate counters become best-effort; quarantine writes are queued in a local on-disk WAL until Postgres recovers
PANIC	P95 verdict latency > 100 ms sustained for 60 s	Disables expensive rules (regex / classifier); origin + blocklist Bloom checks remain active; emits `firewall.mode.panic.entered.v1`
MAINTENANCE	Manual via admin REST	Returns `ALLOW + flag=MAINTENANCE` for all calls; full audit; only used during planned rule-engine upgrades; requires NOC + Trust & Safety dual approval
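
The mode triggers above can be expressed as a single pure function over health and latency signals. This is a sketch: the `ModeSignals` field names are hypothetical, and the precedence (MAINTENANCE, then PANIC, then DEGRADED) is an assumption, since the table does not say which mode wins when several triggers hold at once.

```typescript
type Mode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

// Hypothetical signal bundle sampled by a health monitor.
interface ModeSignals {
  maintenanceRequested: boolean; // set manually via admin REST
  postgresHealthy: boolean;
  redisHealthy: boolean;
  p95LatencyMs: number;          // current P95 verdict latency
  msAboveThreshold: number;      // how long P95 has stayed above 100 ms
}

// ASSUMPTION: precedence is MAINTENANCE > PANIC > DEGRADED > NORMAL.
function selectMode(s: ModeSignals): Mode {
  if (s.maintenanceRequested) return "MAINTENANCE";
  if (s.p95LatencyMs > 100 && s.msAboveThreshold >= 60_000) return "PANIC";
  if (!s.postgresHealthy || !s.redisHealthy) return "DEGRADED";
  return "NORMAL";
}
```

Because the function has no side effects, emitting an event such as `firewall.mode.panic.entered.v1` would belong to the caller that observes a change in the returned mode.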

12. Cross-Service Citations

Related epic	Owner service	Why it matters here
`EP-FW-01` Inbound MO Firewall	`sms-firewall-service` (this)	Defines `FilterInbound` contract used by all `smpp-connector-*-rx` pods
`EP-FW-02` Transit MT Firewall	`sms-firewall-service` (this)	Defines `EvaluateTransit` consumed by `smpp-connector-transit-rx`; downstream sink is `routing-engine`
`EP-FW-03` National Blocklist Federation	`sms-firewall-service` (this)	Consumes `regulator.blocklist.published.v1` from `regulator-portal-service`; produces federated diff to peer MNOs
`EP-CONS-02` STOP-Keyword Handling	`consent-ledger-service`	Firewall surfaces STOP-keyword candidates from `pdu.body` and forwards to `consent-ledger`; consent revocation is emitted by consent-ledger and replayed back into firewall's DND projection
`EP-FRAUD-02` SIM-Box / Grey-Route Detection	`fraud-intel-service`	Firewall provides the raw evidence; fraud-intel produces the high-confidence signatures that the firewall promotes to BLOCK rules
`EP-RE-*` (routing-engine)	`routing-engine`	Routing engine calls `CheckOutboundEgress` on tenant-originated MT to enforce national DND at the perimeter before a carrier submit

13. Open Questions

ID	Question	Owner	Target
OQ-FW-01	Does ATRA require firewall verdict events to be co-signed by an HSM-held key for regulator submission?	Regulator Liaison	2026-05-30
OQ-FW-02	Should the Bloom filter false-positive rate be 1% (current) or tightened to 0.1% (about 1.5× memory for an optimally sized filter, since bits per entry scale with log2(1/p))?	Trust & Safety	2026-05-15
OQ-FW-03	Are MNO peers willing to consume our federated blocklist diff over MNO IPSec, or via a regulator-mediated SFTP only?	Carrier Relations	2026-06-15
