SMS Firewall (sms-firewall-service) — Service Overview
Version: 1.0 | Status: Draft | Owner: Trust & Safety | Last Updated: 2026-04-20 | Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · AI_INTEGRATION | Related ADR: ADR-0004 National-Backbone Resilience §3
1. Purpose — The National Perimeter for SMS Traffic
The SMS Firewall is the national perimeter enforcement layer for the Ghasi SMS backbone. Conceptually it sits at every SMPP-facing boundary — both inbound MO (mobile-originated) traffic from MNOs into the platform, and inbound transit MT (mobile-terminated) traffic from peer aggregators that attempt to inject messages destined for Afghan subscribers.
The firewall is a first-class network-layer service, equivalent in role to:
- Edge / WAF for HTTP traffic (Kong + Cloudflare)
- Compliance Layer for outbound tenant-originated SMS (compliance-engine)
- Consent Ledger for opt-in / DND enforcement (consent-ledger-service)
Where the Compliance Layer protects the platform's own tenants from violating regulator rules, and the Consent Ledger protects subscribers from messages that violate their opt-out or DND preferences, the SMS Firewall protects the national network itself from:
- AIT (Artificially Inflated Traffic) entering via inbound MO bind
- SIM-box / grey-route MT traffic from peer aggregators bypassing legitimate interconnect
- Spoofed sender IDs in transit MT not matching the origin AS / aggregator
- Content-class violations (regulator-forbidden categories — gambling, certain political content, malware payloads)
- Geo / origin violations (e.g. inbound MO claiming to originate from a non-Afghan MSISDN range over an Afghan MNO bind)
- DND violations at the perimeter (national DND list applied before the message reaches routing-engine)
- Rate / volume floods (per source MSISDN, per source aggregator, per destination MSISDN)
The firewall produces a verdict (ALLOW, FLAG, BLOCK, QUARANTINE) for every PDU that crosses the perimeter, and emits structured events that downstream services (routing-engine, fraud-intel-service, cdr-mediation-service, regulator-portal-service) consume.
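As a rough illustration of how a single verdict could be derived from several matching rules, the sketch below models the four verdict values and a subset of the typed rule inputs named in this document. The `Rule` and `Decision` shapes, the "most severe match wins" ordering, the rule ids, and the sample values are all assumptions made for this example, not the service's actual contract.

```typescript
// Verdict values are taken from the overview; the severity order is an assumption.
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";
const SEVERITY: Record<Verdict, number> = { ALLOW: 0, FLAG: 1, QUARANTINE: 2, BLOCK: 3 };

// Subset of the typed rule inputs named in this document.
interface PduContext {
  src: { msisdn: string };
  dst: { msisdn: string };
  mno: { id: string };
  pdu: { body: string; coding: number };
  consent: { dndPresent: boolean };
}

// Hypothetical rule shape: a predicate standing in for a CEL-style expression.
interface Rule {
  id: string;
  verdict: Verdict;
  matches: (ctx: PduContext) => boolean;
}

interface Decision {
  verdict: Verdict;
  matchedRuleIds: string[];
}

// Runs every rule and keeps the most severe verdict among the matches.
function evaluate(rules: Rule[], ctx: PduContext): Decision {
  let verdict: Verdict = "ALLOW";
  const matchedRuleIds: string[] = [];
  for (const rule of rules) {
    if (!rule.matches(ctx)) continue;
    matchedRuleIds.push(rule.id);
    if (SEVERITY[rule.verdict] > SEVERITY[verdict]) verdict = rule.verdict;
  }
  return { verdict, matchedRuleIds };
}

// Illustrative rules only; ids and conditions are made up for this sketch.
const exampleRules: Rule[] = [
  { id: "dnd-national", verdict: "BLOCK", matches: (c) => c.consent.dndPresent },
  { id: "kw-lottery", verdict: "FLAG", matches: (c) => /lottery/i.test(c.pdu.body) },
];

const sample: PduContext = {
  src: { msisdn: "+93700000001" },
  dst: { msisdn: "+93700000002" },
  mno: { id: "awcc" },
  pdu: { body: "You won the lottery", coding: 0 },
  consent: { dndPresent: false },
};

// Only the keyword rule matches, so the verdict is FLAG.
const decision = evaluate(exampleRules, sample);
```

Collecting every matched rule id, instead of stopping at the first match, keeps the evidence that an audit event would need, at the cost of always running the full rule list.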
2. Position in the Platform — Two Inbound Choke-Points
    ┌────────────────────────────────────────┐
    │ MNO peers (AWCC, Roshan, Etisalat,     │
    │ MTN AF, Salaam): inbound MO            │
    └───────────────────┬────────────────────┘
                        │ SMPP deliver_sm (MO)
                        ▼
    ┌────────────────────────────────────────────────┐
    │ smpp-connector-{mno}-rx / -trx (data-plane)    │
    └───────────────────┬────────────────────────────┘
                        │ NATS sms.mo.inbound.raw
                        ▼
    ╔════════════════════════════════════════════════╗
    ║ sms-firewall-service                           ║
    ║ gRPC FilterInbound(MO), invoked synchronously  ║
    ║                                                ║
    ║   ┌────────────────────────────────────────┐   ║
    ║   │ origin / content / rate / geo / DND    │   ║
    ║   │ rule pipeline (CEL-style expressions)  │   ║
    ║   └───────────────┬────────────────────────┘   ║
    ║                   │                            ║
    ║         ┌─────────┴──────────┐                 ║
    ║         ▼                    ▼                 ║
    ║   ALLOW / FLAG        BLOCK / QUARANTINE       ║
    ║         │                    │                 ║
    ║         ▼                    ▼                 ║
    ║   downstream          firewall.alert.*         ║
    ║   (routing /          firewall.audit.*         ║
    ║    consent /          NATS                     ║
    ║    fraud-intel)                                ║
    ╚════════════════════════════════════════════════╝

    ┌────────────────────────────────────────┐
    │ Peer aggregators / international       │
    │ carriers: transit MT inbound           │
    └───────────────────┬────────────────────┘
                        │ SMPP submit_sm (transit MT)
                        ▼
    ┌────────────────────────────────────────────────┐
    │ smpp-connector-transit-rx (data-plane)         │
    └───────────────────┬────────────────────────────┘
                        │ gRPC EvaluateTransit
                        ▼
    ┌────────────────────────────────────────────────┐
    │ sms-firewall-service                           │
    │ - peer-aggregator hygiene                      │
    │ - grey-route exclusion (HLR / MNP cross-check) │
    │ - sender-ID origin verification                │
    └───────────────────┬────────────────────────────┘
                        │ verdict
                        ▼
        routing-engine (or BLOCK terminal)
3. Bounded Context
| Dimension | Value |
|---|---|
| Domain | Trust & Safety / National Perimeter |
| Owner squad | Trust & Safety |
| Deployment unit | Kubernetes Deployment — sms-firewall-service (per Afghan region: kbl, mzr) |
| Communication style | Inbound: gRPC (mTLS, from smpp-connector pods) · HTTP REST (admin) · NATS (rule federation, blocklist sync) |
| Storage | PostgreSQL schema firewall · Redis (rule cache, rate counters, blocklist Bloom filter) · MinIO (signed audit archive) |
| Failure mode | Fail-closed for transit MT (no transit traffic enters without a verdict); fail-open with quarantine for inbound MO (subscriber MO must not be silently dropped, but is shunted to a quarantine bucket reviewable by NOC) |
| Region affinity | Region-pinned per smpp-connector pool; rules and blocklists are mirrored cross-region via NATS JetStream stream firewall.audit.* |
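The failure-mode row can be read as a per-path fallback that applies whenever no verdict arrives in time. The following is a minimal sketch of that idea from a caller's point of view; the function names, the timeout handling, and the mapping of "fail-closed" to `BLOCK` are assumptions made for illustration, not the documented connector behaviour.

```typescript
type Verdict = "ALLOW" | "FLAG" | "BLOCK" | "QUARANTINE";
type Path = "INBOUND_MO" | "TRANSIT_MT";

// Fail-closed for transit MT; fail-open with quarantine for inbound MO.
function fallbackVerdict(path: Path): Verdict {
  return path === "TRANSIT_MT" ? "BLOCK" : "QUARANTINE";
}

// Wraps a verdict call (hypothetical `evaluate`) with a deadline and the fallback.
async function verdictOrFallback(
  path: Path,
  evaluate: () => Promise<Verdict>,
  timeoutMs: number,
): Promise<Verdict> {
  let cancelTimer = () => {};
  const deadline = new Promise<never>((_, reject) => {
    const t = setTimeout(() => reject(new Error("verdict timeout")), timeoutMs);
    cancelTimer = () => clearTimeout(t);
  });
  try {
    // Whichever settles first wins: the real verdict or the deadline.
    return await Promise.race([evaluate(), deadline]);
  } catch {
    // Any error or timeout falls back to the per-path policy.
    return fallbackVerdict(path);
  } finally {
    cancelTimer();
  }
}
```

Keeping the policy in one small pure function (`fallbackVerdict`) makes the asymmetry between the two paths easy to test in isolation.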
4. Responsibilities
| # | Responsibility |
|---|---|
| R1 | Accept FilterInbound gRPC calls from smpp-connector-{mno}-rx/-trx for every inbound MO PDU and return a verdict with P95 latency ≤ 30 ms (data-plane SLO) |
| R2 | Accept EvaluateTransit gRPC calls from smpp-connector-transit-rx for inbound MT PDUs from peer aggregators and return a verdict with P95 latency ≤ 50 ms |
| R3 | Evaluate against the active rule set: origin (source-addr / source-MNO / source-AS), content (keyword / regex / classifier), rate (per src-MSISDN, per dst-MSISDN, per aggregator), geo (MCC/MNC, A-number country), DND (national list), AIT signature, SIM-box signature, grey-route signature |
| R4 | Maintain a per-source sliding-window rate governor in Redis (fw:rate:src-msisdn:{e164}:{window}) for all configured time windows (1 s, 1 min, 5 min, 1 h, 24 h) |
| R5 | Maintain a national-blocklist Bloom filter in Redis updated from regulator-portal-service and from cross-MNO federation; backed by firewall.blocklist_entries Postgres table |
| R6 | Quarantine messages with verdict = QUARANTINE into the firewall.quarantine_queue Postgres table for NOC manual review (24 h auto-expiry) |
| R7 | Expose a REST admin API (POST /v1/admin/firewall/rules, GET /v1/admin/firewall/quarantine, POST /v1/admin/firewall/quarantine/{id}/release) for NOC and Trust & Safety leads |
| R8 | Publish firewall.alert.* (subscriber-impacting) and firewall.audit.* (regulator-grade evidence) events to NATS JetStream; firewall.audit.* is mirrored to mzr and to dxb cold archive |
| R9 | Federate national blocklist updates with peer MNOs and ATRA via the regulator portal (POST /v1/internal/firewall/federation/import consumer) |
| R10 | Provide read-only metrics (firewall_verdict_total{verdict,rule_id,mno}, firewall_inbound_pdus_total, firewall_quarantine_depth, firewall_rule_eval_seconds) for Prometheus scrape |
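To make the R4 rate governor more concrete, here is a small in-memory sketch of a sliding-window counter keyed the same way as the Redis key pattern above. It is not the service's implementation: the window labels (`1s`, `1m`, and so on) are assumed spellings for the `{window}` placeholder, and a `Map` stands in for Redis. In Redis, one common way to get the same behaviour is a sorted set per key, trimmed by timestamp on each write.

```typescript
// Window lengths from R4; the label spellings are assumptions.
const WINDOWS_MS = {
  "1s": 1_000,
  "1m": 60_000,
  "5m": 300_000,
  "1h": 3_600_000,
  "24h": 86_400_000,
} as const;
type WindowName = keyof typeof WINDOWS_MS;

// Builds a key in the fw:rate:src-msisdn:{e164}:{window} format.
function rateKey(e164: string, window: WindowName): string {
  return `fw:rate:src-msisdn:${e164}:${window}`;
}

// In-memory stand-in for the Redis-backed governor (sliding-window log).
class SlidingWindowCounter {
  private readonly hits = new Map<string, number[]>();

  // Records one PDU at nowMs and returns the count inside the window.
  record(e164: string, window: WindowName, nowMs: number): number {
    const key = rateKey(e164, window);
    const cutoff = nowMs - WINDOWS_MS[window];
    // Drop timestamps that have slid out of the window, then add this one.
    const kept = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    kept.push(nowMs);
    this.hits.set(key, kept);
    return kept.length;
  }
}
```

A rate rule would then compare the returned count against a configured threshold for that window. A sliding-window log is exact but stores one timestamp per PDU, so a high-volume deployment might prefer bucketed counters for the longer windows.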
5. Non-Responsibilities
- Does not evaluate outbound tenant SMS; that is compliance-engine's job
- Does not decide DLR routing or operator selection; that is routing-engine
- Does not persist subscriber consent; that is consent-ledger-service (firewall reads the DND projection, it does not own it)
- Does not train or score ML fraud models; that is fraud-intel-service (firewall consumes the published fraud.detected.* events and applies them as a rule input)
- Does not generate CDRs; that is cdr-mediation-service
- Does not terminate SMPP binds or hold session state with MNOs; smpp-connector-{mno}-{tx|rx|trx} owns bind state
6. Upstream / Downstream Dependencies
| Direction | Service | Protocol | Purpose |
|---|---|---|---|
| Inbound caller | smpp-connector-{mno}-rx / smpp-connector-{mno}-trx | gRPC (mTLS, SPIFFE) | FilterInbound(MoContext) per inbound deliver_sm PDU |
| Inbound caller | smpp-connector-transit-rx | gRPC (mTLS, SPIFFE) | EvaluateTransit(TransitMtContext) per inbound submit_sm PDU |
| Inbound caller | routing-engine | gRPC (mTLS) | CheckOutboundEgress(routeId, dstMsisdn) for DND-at-perimeter check on tenant-originated MT (read-only verdict) |
| Inbound admin | admin-dashboard (NOC console) | HTTP REST (mTLS, JWT + role=tns-admin / noc) | Rule CRUD, quarantine review, blocklist management |
| Inbound event | fraud-intel-service | NATS JetStream fraud.detected.v1 | Promotes high-confidence fraud verdicts into firewall rule inputs (e.g. SIM-box signature) |
| Inbound event | consent-ledger-service | NATS JetStream consent.dnd.snapshot.v1 | Hourly snapshot of national DND list, materialised into Redis Bloom filter |
| Inbound event | regulator-portal-service | NATS JetStream regulator.blocklist.published.v1 | Regulator-mandated blocklist entries (sender IDs, MSISDN ranges) |
| Outbound read/write | PostgreSQL firewall schema | TCP (pg driver) | Rules, blocklist entries, quarantine queue, audit log (partitioned monthly) |
| Outbound cache | Redis (region-local) | TCP | Rule set hot cache, rate counters, blocklist Bloom filter |
| Outbound events | NATS JetStream | TCP | firewall.alert.*, firewall.audit.*, firewall.quarantine.* |
| Outbound (sink) | routing-engine | none | The verdict is returned over the originating gRPC call; routing-engine receives it via the calling smpp-connector pod or via NATS |
| Outbound (sink) | fraud-intel-service | NATS JetStream firewall.audit.v1 | Firewall evidence is fed into fraud-intel ML training |
| Outbound (sink) | cdr-mediation-service | NATS JetStream firewall.audit.v1 | Firewall verdict is appended to the inbound MO CDR |
7. High-Level Flow — Inbound MO
8. High-Level Flow — Transit MT
9. Runtime Topology Summary
| Aspect | Value |
|---|---|
| Process model | Single-binary NestJS application; gRPC server (port 50061) + HTTP REST admin (port 3061) + Prometheus /metrics (port 9061) |
| Replicas | kbl: minimum 4, HPA scales on firewall_rule_eval_seconds P95 > 25 ms; mzr: minimum 3 |
| Node pool | np-data (telecom-grade NICs, dedicated egress VLAN to MNO IPSec tunnels) |
| Zonal placement | Pod anti-affinity across 3 AZs per region |
| Startup | Loads active rule set from Postgres into Redis fw:rules:active hash on boot; rebuilds Bloom filter from firewall.blocklist_entries |
| Hot reload | Rule edits via REST trigger firewall.rule.changed.v1 event; all replicas refresh cache within 5 s |
| Shutdown | SIGTERM → drain in-flight gRPC (max 10 s) → close NATS consumers → flush rate counters → exit |
| Region affinity | Region-pinned (no cross-region failover for the data-plane verdict path; control-plane data — rules, blocklist — replicates via Postgres logical replication) |
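The shutdown row describes an ordered sequence with a bounded drain. A minimal sketch of that ordering follows; the hook names and the returned step log are hypothetical, and the real service would supply its own gRPC, NATS, and Redis teardown logic.

```typescript
// Hypothetical hooks; each would wrap the real server or client teardown.
interface ShutdownHooks {
  drainGrpc: () => Promise<void>;
  closeNatsConsumers: () => Promise<void>;
  flushRateCounters: () => Promise<void>;
}

// Runs the steps in the documented order and returns a log of what happened.
async function gracefulShutdown(
  hooks: ShutdownHooks,
  drainTimeoutMs = 10_000, // "max 10 s" from the Shutdown row
): Promise<string[]> {
  const steps: string[] = [];
  let cancelTimer = () => {};
  const deadline = new Promise<"timeout">((resolve) => {
    const t = setTimeout(() => resolve("timeout"), drainTimeoutMs);
    cancelTimer = () => clearTimeout(t);
  });
  const drained = hooks.drainGrpc().then(() => "drained" as const);

  // Wait for in-flight gRPC calls, but never longer than the drain budget.
  let outcome: "drained" | "timeout" = "timeout";
  try {
    outcome = await Promise.race([drained, deadline]);
  } finally {
    cancelTimer();
  }
  steps.push(`grpc:${outcome}`);

  await hooks.closeNatsConsumers();
  steps.push("nats:closed");
  await hooks.flushRateCounters();
  steps.push("rate-counters:flushed");
  return steps;
}

// In a Node.js process this would typically be wired to the SIGTERM handler,
// with the process exiting once the returned promise settles.
```

Draining before closing the NATS consumers matches the order in the table: calls already accepted still get a verdict, while no new event work is picked up afterwards.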
10. Key Design Decisions
| Decision | Rationale |
|---|---|
| Firewall is a synchronous gRPC call from smpp-connector, not an async NATS step | An MNO MO PDU has a strict response window (deliver_sm_resp must come back within seconds); a synchronous verdict keeps the bind healthy and avoids enqueue / dequeue latency |
| Fail-closed for transit MT, fail-open with quarantine for inbound MO | Transit MT is third-party traffic with no subscriber relationship — silent block is safe. Inbound MO from a real Afghan subscriber must never be silently dropped on a service outage; quarantine + NOC review preserves the subscriber relationship |
| Rules expressed as CEL-style sandboxed expressions with typed inputs (src.msisdn, dst.msisdn, mno.id, pdu.body, pdu.coding, peer.asn, consent.dndPresent) | Same authoring model as compliance-engine so Trust & Safety can move between the two; sandboxed and auditable; no arbitrary code |
| Region-pinned, not multi-master | The verdict path must be sub-50 ms; cross-region consensus adds latency. Control-plane (rules / blocklist) replicates asynchronously |
| Bloom filter for national blocklist in Redis | National DND + blocklist may exceed 5 M entries; a Bloom filter gives O(1) lookup at constant memory; false-positives fall through to a definitive Postgres check |
| Audit log is append-only, partitioned monthly, mirrored to dxb WORM bucket | Regulator (ATRA) requires 7-year retention with tamper evidence; append-only + immutable cold archive satisfies this |
| Verdict events are dual-channel: synchronous (over the calling gRPC stream) + async (firewall.audit.v1 to NATS) | Synchronous gives the connector a definitive answer; async gives downstream consumers (fraud-intel, cdr-mediation, regulator-portal) a single canonical evidence stream |
| Per-bind concurrency cap on FilterInbound (default 200 in-flight) | Prevents one runaway MNO bind (e.g. AIT flood from a compromised aggregator) from starving evaluation capacity for healthy binds |
| No tenant ID on inbound MO firewall path | Inbound MO does not yet belong to a tenant — tenant resolution happens later in the consent / routing layer. Firewall keys exclusively on srcMsisdn, dstMsisdn, mnoBindId |
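The per-bind concurrency cap in the table above can be pictured as a counter per mnoBindId. The sketch below is one simple way to express it; the class name and methods are hypothetical, and what the caller does when `tryAcquire` returns `false` (throttle, reject, or apply the failure-mode policy) is left open because this document does not specify it.

```typescript
// Hypothetical in-process limiter; 200 is the documented default cap.
class BindConcurrencyCap {
  private readonly inFlight = new Map<string, number>();
  private readonly limit: number;

  constructor(limit = 200) {
    this.limit = limit;
  }

  // Reserves a slot for one FilterInbound call; false means the bind is at its cap.
  tryAcquire(mnoBindId: string): boolean {
    const current = this.inFlight.get(mnoBindId) ?? 0;
    if (current >= this.limit) return false;
    this.inFlight.set(mnoBindId, current + 1);
    return true;
  }

  // Frees the slot once the verdict has been returned.
  release(mnoBindId: string): void {
    const current = this.inFlight.get(mnoBindId) ?? 0;
    if (current <= 1) this.inFlight.delete(mnoBindId);
    else this.inFlight.set(mnoBindId, current - 1);
  }
}
```

Because the counters are keyed by bind, a flood on one bind exhausts only its own slots and leaves the other binds' budgets untouched.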
11. Service Operating Modes
| Mode | Trigger | Behaviour |
|---|---|---|
| NORMAL | Default | Full rule evaluation, all verdict types active |
| DEGRADED | Postgres or Redis unhealthy | Falls back to in-memory rule cache (last known good); rate counters become best-effort; quarantine writes are queued in local disk WAL until Postgres recovers |
| PANIC | Verdict latency > 100 ms P95 sustained 60 s | Disables expensive rules (regex / classifier); origin + blocklist Bloom checks remain active; emits firewall.mode.panic.entered.v1 |
| MAINTENANCE | Manual via admin REST | Returns ALLOW + flag=MAINTENANCE for all calls; full audit; only used during planned rule-engine upgrades; requires NOC + Trust & Safety dual approval |
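The mode table can be summarised as a small decision function over health signals. The sketch below is only one possible reading: the signal names are invented for the example, and the precedence used when several triggers hold at once (MAINTENANCE, then PANIC, then DEGRADED) is an assumption, since the table does not state one.

```typescript
type Mode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

// Hypothetical health snapshot; field names are illustrative.
interface HealthSignals {
  maintenanceRequested: boolean; // set manually through the admin REST API
  postgresHealthy: boolean;
  redisHealthy: boolean;
  // Time at which P95 verdict latency first rose above 100 ms, or null if it is below.
  p95BreachStartedAtMs: number | null;
}

const PANIC_SUSTAIN_MS = 60_000; // "sustained 60 s" from the PANIC row

function selectMode(s: HealthSignals, nowMs: number): Mode {
  // Assumed precedence: manual override first, then latency, then storage health.
  if (s.maintenanceRequested) return "MAINTENANCE";
  if (s.p95BreachStartedAtMs !== null && nowMs - s.p95BreachStartedAtMs >= PANIC_SUSTAIN_MS) {
    return "PANIC";
  }
  if (!s.postgresHealthy || !s.redisHealthy) return "DEGRADED";
  return "NORMAL";
}
```

Tracking the start of the latency breach, and not just the current P95 value, is what lets the function distinguish a brief spike from the sustained condition that the PANIC row requires.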
12. Cross-Service Citations
| Related epic | Owner service | Why it matters here |
|---|---|---|
| EP-FW-01 Inbound MO Firewall | sms-firewall-service (this) | Defines FilterInbound contract used by all smpp-connector-*-rx pods |
| EP-FW-02 Transit MT Firewall | sms-firewall-service (this) | Defines EvaluateTransit consumed by smpp-connector-transit-rx; downstream sink is routing-engine |
| EP-FW-03 National Blocklist Federation | sms-firewall-service (this) | Consumes regulator.blocklist.published.v1 from regulator-portal-service; produces federated diff to peer MNOs |
| EP-CONS-02 STOP-Keyword Handling | consent-ledger-service | Firewall surfaces STOP-keyword candidates from pdu.body and forwards to consent-ledger; consent revocation is emitted by consent-ledger and replayed back into firewall's DND projection |
| EP-FRAUD-02 SIM-Box / Grey-Route Detection | fraud-intel-service | Firewall provides the raw evidence; fraud-intel produces the high-confidence signatures that the firewall promotes to BLOCK rules |
| EP-RE-* (routing-engine) | routing-engine | Routing engine calls CheckOutboundEgress on tenant-originated MT to enforce national DND at the perimeter before a carrier submit |
13. Open Questions
| ID | Question | Owner | Target |
|---|---|---|---|
| OQ-FW-01 | Does ATRA require firewall verdict events to be co-signed by an HSM-held key for regulator submission? | Regulator Liaison | 2026-05-30 |
| OQ-FW-02 | Should the Bloom filter false-positive rate be 1% (current) or tightened to 0.1% (about 1.5× memory for an optimally sized filter)? | Trust & Safety | 2026-05-15 |
| OQ-FW-03 | Are MNO peers willing to consume our federated blocklist diff over MNO IPSec, or via a regulator-mediated SFTP only? | Carrier Relations | 2026-06-15 |
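As background for OQ-FW-02, the memory cost of a tighter false-positive rate can be estimated with the standard Bloom filter sizing formula, m = -n · ln(p) / (ln 2)², with k = (m / n) · ln 2 hash functions. The sketch below applies it to the 5 M-entry figure mentioned in the design decisions. It assumes an optimally sized classic Bloom filter; the actual Redis data structure in use may size itself differently.

```typescript
// Bits needed per entry for a target false-positive rate p (optimal classic Bloom filter).
function bitsPerEntry(falsePositiveRate: number): number {
  return -Math.log(falsePositiveRate) / (Math.LN2 * Math.LN2);
}

// Total size and hash count for a given number of entries and target rate.
function bloomSize(entries: number, falsePositiveRate: number) {
  const perEntry = bitsPerEntry(falsePositiveRate);
  const bits = Math.ceil(entries * perEntry);
  return {
    bits,
    megabytes: bits / 8 / 1_000_000,
    hashes: Math.round(perEntry * Math.LN2),
  };
}

const ENTRIES = 5_000_000; // "may exceed 5 M entries"
const current = bloomSize(ENTRIES, 0.01); // 1% false positives
const tightened = bloomSize(ENTRIES, 0.001); // 0.1% false positives
const memoryRatio = tightened.bits / current.bits;
```

Under these assumptions, 5 M entries need roughly 6 MB at 1% and roughly 9 MB at 0.1%, so the tighter rate costs about 1.5 times the memory and three more hash evaluations per lookup (10 instead of 7).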