
SMS Firewall (sms-firewall-service) — Service Overview

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-20
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · AI_INTEGRATION
Related ADR: ADR-0004 National-Backbone Resilience §3


1. Purpose — The National Perimeter for SMS Traffic

The SMS Firewall is the national perimeter enforcement layer for the Ghasi SMS backbone. Conceptually it sits at every SMPP-facing boundary — both inbound MO (mobile-originated) traffic from MNOs into the platform, and inbound transit MT (mobile-terminated) traffic from peer aggregators that attempt to inject messages destined for Afghan subscribers.

The firewall is a first-class network-layer service, equivalent in role to:

  • Edge / WAF for HTTP traffic (Kong + Cloudflare)
  • Compliance Layer for outbound tenant-originated SMS (compliance-engine)
  • Consent Ledger for opt-in / DND enforcement (consent-ledger-service)

Where the Compliance Layer keeps the platform's own tenants from violating regulator rules, and the Consent Ledger protects subscribers from messages that ignore their opt-out or DND choices, the SMS Firewall protects the national network itself from:

  1. AIT (Artificially Inflated Traffic) entering via inbound MO bind
  2. SIM-box / grey-route MT traffic from peer aggregators bypassing legitimate interconnect
  3. Spoofed sender IDs in transit MT not matching the origin AS / aggregator
  4. Content-class violations (regulator-forbidden categories — gambling, certain political content, malware payloads)
  5. Geo / origin violations (e.g. inbound MO claiming to originate from a non-Afghan MSISDN range over an Afghan MNO bind)
  6. DND violations at the perimeter (national DND list applied before the message reaches routing-engine)
  7. Rate / volume floods (per source MSISDN, per source aggregator, per destination MSISDN)

The firewall produces a verdict (ALLOW, FLAG, BLOCK, QUARANTINE) for every PDU that crosses the perimeter, and emits structured events that downstream services (routing-engine, fraud-intel-service, cdr-mediation-service, regulator-portal-service) consume.
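
As a minimal sketch of how a caller might branch on the four verdicts, assuming the split drawn in Section 2 (ALLOW and FLAG continue downstream, BLOCK and QUARANTINE stop at the perimeter). The type names, the `ruleId` field, and the disposition labels are illustrative assumptions, not taken from API_CONTRACTS.

```typescript
// Hypothetical types for illustration; the real contract lives in API_CONTRACTS.
type Verdict = "ALLOW" | "FLAG" | "BLOCK" | "QUARANTINE";

interface VerdictResult {
  verdict: Verdict;
  ruleId?: string; // rule that produced the verdict, if any (assumed field)
}

// ALLOW and FLAG continue to downstream services; BLOCK and QUARANTINE do not.
function isForwarded(verdict: Verdict): boolean {
  return verdict === "ALLOW" || verdict === "FLAG";
}

// Assumed disposition labels, used only to make the branching explicit.
type Disposition = "forward" | "forward-flagged" | "drop" | "hold-for-noc";

function dispositionFor(result: VerdictResult): Disposition {
  switch (result.verdict) {
    case "ALLOW":
      return "forward";
    case "FLAG":
      return "forward-flagged";
    case "BLOCK":
      return "drop";
    case "QUARANTINE":
      return "hold-for-noc";
  }
}
```

Modelling the verdict as a string union lets the compiler check that every verdict is handled if a new one is ever added.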


2. Position in the Platform — Two Inbound Choke-Points

┌─────────────────────────────────────────────────┐
│ MNO peers (AWCC, Roshan, Etisalat, MTN AF,      │
│ Salaam): inbound MO                             │
└────────────────────────┬────────────────────────┘
                         │ SMPP deliver_sm (MO)
                         ▼
┌─────────────────────────────────────────────────┐
│ smpp-connector-{mno}-rx / -trx (data-plane)     │
└────────────────────────┬────────────────────────┘
                         │ NATS sms.mo.inbound.raw
                         ▼
╔═════════════════════════════════════════════════╗
║ sms-firewall-service                            ║
║ gRPC FilterInbound(MO), invoked synchronously   ║
║                                                 ║
║  ┌───────────────────────────────────────────┐  ║
║  │ origin / content / rate / geo / DND       │  ║
║  │ rule pipeline (CEL-style expressions)     │  ║
║  └─────────────────────┬─────────────────────┘  ║
║                        │                        ║
║            ┌───────────┴───────────┐            ║
║            ▼                       ▼            ║
║      ALLOW / FLAG         BLOCK / QUARANTINE    ║
║            │                       │            ║
║            ▼                       ▼            ║
║       downstream           firewall.alert.*     ║
║   (routing / consent /     firewall.audit.*     ║
║       fraud-intel)              (NATS)          ║
╚═════════════════════════════════════════════════╝

┌─────────────────────────────────────────────────┐
│ Peer aggregators / international carriers:      │
│ transit MT inbound                              │
└────────────────────────┬────────────────────────┘
                         │ SMPP submit_sm (transit MT)
                         ▼
┌─────────────────────────────────────────────────┐
│ smpp-connector-transit-rx (data-plane)          │
└────────────────────────┬────────────────────────┘
                         │ gRPC EvaluateTransit
                         ▼
┌─────────────────────────────────────────────────┐
│ sms-firewall-service                            │
│ - peer-aggregator hygiene                       │
│ - grey-route exclusion (HLR / MNP cross-check)  │
│ - sender-ID origin verification                 │
└────────────────────────┬────────────────────────┘
                         │ verdict
                         ▼
        routing-engine (or BLOCK terminal)
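
The rule pipeline in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's rule engine: rules are plain TypeScript predicates standing in for sandboxed CEL-style expressions, the input shape mirrors the typed inputs named in Section 10, and the "most severe matching verdict wins" precedence is an assumption. The two example rules and their IDs are hypothetical.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

// Mirrors the typed inputs listed in Section 10 (src.msisdn, pdu.body, ...).
interface RuleInput {
  src: { msisdn: string };
  dst: { msisdn: string };
  mno: { id: string };
  pdu: { body: string; coding: number };
  peer: { asn?: number };
  consent: { dndPresent: boolean };
}

interface Rule {
  id: string;
  verdict: Exclude<Verdict, "ALLOW">; // verdict applied when the rule matches
  matches: (input: RuleInput) => boolean; // stand-in for a CEL-style expression
}

// Assumed precedence: the most severe matching verdict wins.
const SEVERITY: Record<Verdict, number> = { ALLOW: 0, FLAG: 1, QUARANTINE: 2, BLOCK: 3 };

function evaluate(rules: Rule[], input: RuleInput): { verdict: Verdict; ruleId?: string } {
  let best: { verdict: Verdict; ruleId?: string } = { verdict: "ALLOW" };
  for (const rule of rules) {
    if (rule.matches(input) && SEVERITY[rule.verdict] > SEVERITY[best.verdict]) {
      best = { verdict: rule.verdict, ruleId: rule.id };
    }
  }
  return best;
}

// Example rules (illustrative only).
const rules: Rule[] = [
  { id: "dnd-national", verdict: "BLOCK", matches: (i) => i.consent.dndPresent },
  // +93 is the Afghan country code; a non-+93 source on an Afghan MNO bind is suspicious.
  { id: "geo-non-af-src", verdict: "QUARANTINE", matches: (i) => !i.src.msisdn.startsWith("+93") },
];
```

Returning the winning `ruleId` alongside the verdict keeps the result attributable, which matters for the audit events described later.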

3. Bounded Context

| Dimension | Value |
|---|---|
| Domain | Trust & Safety / National Perimeter |
| Owner squad | Trust & Safety |
| Deployment unit | Kubernetes Deployment sms-firewall-service (per Afghan region: kbl, mzr) |
| Communication style | Inbound: gRPC (mTLS, from smpp-connector pods) · HTTP REST (admin) · NATS (rule federation, blocklist sync) |
| Storage | PostgreSQL schema firewall · Redis (rule cache, rate counters, blocklist Bloom filter) · MinIO (signed audit archive) |
| Failure mode | Fail-closed for transit MT (no transit traffic enters without a verdict); fail-open with quarantine for inbound MO (subscriber MO must not be silently dropped, but is shunted to a quarantine bucket reviewable by NOC) |
| Region affinity | Region-pinned per smpp-connector pool; rules and blocklists are mirrored cross-region via NATS JetStream stream firewall.audit.* |
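
The asymmetric failure mode in the table can be illustrated with a small wrapper. This is a sketch only: the function name, the `Direction` labels, and the `reason` field are hypothetical, and the evaluator is synchronous here for brevity where a real gRPC handler would be asynchronous.

```typescript
type Verdict = "ALLOW" | "FLAG" | "BLOCK" | "QUARANTINE";
type Direction = "INBOUND_MO" | "TRANSIT_MT";

interface Outcome {
  verdict: Verdict;
  reason: string; // hypothetical field, used to show which path produced the verdict
}

// Wraps an evaluator so that an internal failure maps to the documented default:
// BLOCK for transit MT (fail-closed), QUARANTINE for inbound MO (fail-open with quarantine).
function evaluateWithFailurePolicy(direction: Direction, evaluate: () => Verdict): Outcome {
  try {
    return { verdict: evaluate(), reason: "rule-evaluation" };
  } catch {
    return direction === "TRANSIT_MT"
      ? { verdict: "BLOCK", reason: "fail-closed" }
      : { verdict: "QUARANTINE", reason: "fail-open-quarantine" };
  }
}
```

Putting the policy in one wrapper keeps the two defaults in a single, reviewable place rather than spread across handlers.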

4. Responsibilities

| # | Responsibility |
|---|---|
| R1 | Accept FilterInbound gRPC calls from smpp-connector-{mno}-rx/-trx for every inbound MO PDU and return a verdict within P95 ≤ 30 ms (data-plane SLO) |
| R2 | Accept EvaluateTransit gRPC calls from smpp-connector-transit-rx for inbound MT PDUs from peer aggregators and return a verdict within P95 ≤ 50 ms |
| R3 | Evaluate against the active rule set: origin (source-addr / source-MNO / source-AS), content (keyword / regex / classifier), rate (per src-MSISDN, per dst-MSISDN, per aggregator), geo (MCC/MNC, A-number country), DND (national list), AIT signature, SIM-box signature, grey-route signature |
| R4 | Maintain a per-source sliding-window rate governor in Redis (fw:rate:src-msisdn:{e164}:{window}) for all configured time windows (1 s, 1 min, 5 min, 1 h, 24 h) |
| R5 | Maintain a national-blocklist Bloom filter in Redis updated from regulator-portal-service and from cross-MNO federation; backed by firewall.blocklist_entries Postgres table |
| R6 | Quarantine messages with verdict = QUARANTINE into the firewall.quarantine_queue Postgres table for NOC manual review (24 h auto-expiry) |
| R7 | Expose a REST admin API (POST /v1/admin/firewall/rules, GET /v1/admin/firewall/quarantine, POST /v1/admin/firewall/quarantine/{id}/release) for NOC and Trust & Safety leads |
| R8 | Publish firewall.alert.* (subscriber-impacting) and firewall.audit.* (regulator-grade evidence) events to NATS JetStream; firewall.audit.* is mirrored to mzr and to dxb cold archive |
| R9 | Federate national blocklist updates with peer MNOs and ATRA via the regulator portal (POST /v1/internal/firewall/federation/import consumer) |
| R10 | Provide read-only metrics (firewall_verdict_total{verdict,rule_id,mno}, firewall_inbound_pdus_total, firewall_quarantine_depth, firewall_rule_eval_seconds) for Prometheus scrape |
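
The sliding-window rate governor in R4 can be illustrated with an in-memory stand-in. This is a sketch under assumptions: it keeps a timestamp log per key rather than talking to Redis, and the window labels (`1s`, `1m`, and so on) substituted into the `{window}` slot of the key pattern are hypothetical.

```typescript
// Window lengths from R4, in milliseconds. The label strings are assumed.
const WINDOWS_MS = { "1s": 1000, "1m": 60000, "5m": 300000, "1h": 3600000, "24h": 86400000 } as const;
type WindowName = keyof typeof WINDOWS_MS;

// Builds a key following the fw:rate:src-msisdn:{e164}:{window} pattern.
function rateKey(e164: string, window: WindowName): string {
  return `fw:rate:src-msisdn:${e164}:${window}`;
}

// In-memory stand-in for the Redis-backed governor.
class SlidingWindowGovernor {
  private readonly hits = new Map<string, number[]>(); // key -> timestamps (ms)

  // Records one PDU and returns how many PDUs fall inside the trailing window.
  record(e164: string, window: WindowName, nowMs: number): number {
    const key = rateKey(e164, window);
    const cutoff = nowMs - WINDOWS_MS[window];
    const kept = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    kept.push(nowMs);
    this.hits.set(key, kept);
    return kept.length;
  }
}
```

A rate rule would then compare the returned count against a per-window threshold. A timestamp log is the simplest exact form of a sliding window; a production version would likely trade exactness for memory.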

5. Non-Responsibilities

  • Does not evaluate outbound tenant SMS — that is compliance-engine's job
  • Does not decide DLR routing or operator selection — that is routing-engine
  • Does not persist subscriber consent — that is consent-ledger-service (firewall reads the DND projection, it does not own it)
  • Does not train or score ML fraud models — that is fraud-intel-service (firewall consumes the published fraud.detected.* events and applies them as a rule input)
  • Does not generate CDRs — that is cdr-mediation-service
  • Does not terminate SMPP binds or hold session state with MNOs — smpp-connector-{mno}-{tx|rx|trx} owns bind state

6. Upstream / Downstream Dependencies

| Direction | Service | Protocol | Purpose |
|---|---|---|---|
| Inbound caller | smpp-connector-{mno}-rx / smpp-connector-{mno}-trx | gRPC (mTLS, SPIFFE) | FilterInbound(MoContext) per inbound deliver_sm PDU |
| Inbound caller | smpp-connector-transit-rx | gRPC (mTLS, SPIFFE) | EvaluateTransit(TransitMtContext) per inbound submit_sm PDU |
| Inbound caller | routing-engine | gRPC (mTLS) | CheckOutboundEgress(routeId, dstMsisdn) for DND-at-perimeter check on tenant-originated MT (read-only verdict) |
| Inbound admin | admin-dashboard (NOC console) | HTTP REST (mTLS, JWT + role=tns-admin / noc) | Rule CRUD, quarantine review, blocklist management |
| Inbound event | fraud-intel-service | NATS JetStream fraud.detected.v1 | Promotes high-confidence fraud verdicts into firewall rule inputs (e.g. SIM-box signature) |
| Inbound event | consent-ledger-service | NATS JetStream consent.dnd.snapshot.v1 | Hourly snapshot of national DND list, materialised into Redis Bloom filter |
| Inbound event | regulator-portal-service | NATS JetStream regulator.blocklist.published.v1 | Regulator-mandated blocklist entries (sender IDs, MSISDN ranges) |
| Outbound read/write | PostgreSQL firewall schema | TCP (pg driver) | Rules, blocklist entries, quarantine queue, audit log (partitioned monthly) |
| Outbound cache | Redis (region-local) | TCP | Rule set hot cache, rate counters, blocklist Bloom filter |
| Outbound events | NATS JetStream | TCP | firewall.alert.*, firewall.audit.*, firewall.quarantine.* |
| Outbound (sink) | routing-engine | none | Verdict is returned over the originating gRPC call; routing-engine receives the verdict via the calling smpp-connector pod or via NATS |
| Outbound (sink) | fraud-intel-service | NATS JetStream firewall.audit.v1 | Firewall evidence is fed into fraud-intel ML training |
| Outbound (sink) | cdr-mediation-service | NATS JetStream firewall.audit.v1 | Firewall verdict is appended to the inbound MO CDR |

7. High-Level Flow — Inbound MO


8. High-Level Flow — Transit MT


9. Runtime Topology Summary

| Aspect | Value |
|---|---|
| Process model | Single-binary NestJS application; gRPC server (port 50061) + HTTP REST admin (port 3061) + Prometheus /metrics (port 9061) |
| Replicas | kbl: minimum 4, HPA scales on firewall_rule_eval_seconds P95 > 25 ms; mzr: minimum 3 |
| Node pool | np-data (telecom-grade NICs, dedicated egress VLAN to MNO IPSec tunnels) |
| Zonal placement | Pod anti-affinity across 3 AZs per region |
| Startup | Loads active rule set from Postgres into Redis fw:rules:active hash on boot; rebuilds Bloom filter from firewall.blocklist_entries |
| Hot reload | Rule edits via REST trigger firewall.rule.changed.v1 event; all replicas refresh cache within 5 s |
| Shutdown | SIGTERM → drain in-flight gRPC (max 10 s) → close NATS consumers → flush rate counters → exit |
| Region affinity | Region-pinned (no cross-region failover for the data-plane verdict path; control-plane data, i.e. rules and blocklist, replicates via Postgres logical replication) |

10. Key Design Decisions

| Decision | Rationale |
|---|---|
| Firewall is a synchronous gRPC call from smpp-connector, not an async NATS step | An MNO MO PDU has a strict response window (deliver_sm_resp must come back within seconds); a synchronous verdict keeps the bind healthy and avoids enqueue / dequeue latency |
| Fail-closed for transit MT, fail-open with quarantine for inbound MO | Transit MT is third-party traffic with no subscriber relationship, so a silent block is safe. Inbound MO from a real Afghan subscriber must never be silently dropped on a service outage; quarantine + NOC review preserves the subscriber relationship |
| Rules expressed as CEL-style sandboxed expressions with typed inputs (src.msisdn, dst.msisdn, mno.id, pdu.body, pdu.coding, peer.asn, consent.dndPresent) | Same authoring model as compliance-engine so Trust & Safety can move between the two; sandboxed and auditable; no arbitrary code |
| Region-pinned, not multi-master | The verdict path must be sub-50 ms; cross-region consensus adds latency. Control-plane (rules / blocklist) replicates asynchronously |
| Bloom filter for national blocklist in Redis | National DND + blocklist may exceed 5 M entries; a Bloom filter gives constant-time lookups in a fixed, pre-sized memory footprint; positive matches, which may be false positives, fall through to a definitive Postgres check |
| Audit log is append-only, partitioned monthly, mirrored to dxb WORM bucket | Regulator (ATRA) requires 7-year retention with tamper evidence; an append-only log plus an immutable cold archive satisfies this |
| Verdict events are dual-channel: synchronous (over the calling gRPC stream) + async (firewall.audit.v1 to NATS) | Synchronous gives the connector a definitive answer; async gives downstream consumers (fraud-intel, cdr-mediation, regulator-portal) a single canonical evidence stream |
| Per-bind concurrency cap on FilterInbound (default 200 in-flight) | Prevents one runaway MNO bind (e.g. AIT flood from a compromised aggregator) from starving evaluation capacity for healthy binds |
| No tenant ID on inbound MO firewall path | Inbound MO does not yet belong to a tenant; tenant resolution happens later in the consent / routing layer. Firewall keys exclusively on srcMsisdn, dstMsisdn, mnoBindId |
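
The per-bind concurrency cap can be sketched as a small in-flight counter keyed by bind. The class and method names are hypothetical, and what the service does when a bind is over its cap (reject, shed, or queue) is not stated in this document, so the sketch only reports whether a slot was available.

```typescript
class BindConcurrencyCap {
  private readonly inFlight = new Map<string, number>(); // mnoBindId -> calls in progress
  private readonly limit: number;

  constructor(limit: number = 200) {
    this.limit = limit; // 200 is the documented default
  }

  // Reserves a slot and returns true if the bind is under its cap.
  tryAcquire(mnoBindId: string): boolean {
    const current = this.inFlight.get(mnoBindId) ?? 0;
    if (current >= this.limit) return false;
    this.inFlight.set(mnoBindId, current + 1);
    return true;
  }

  // Must be called once per successful tryAcquire, when the call completes.
  release(mnoBindId: string): void {
    const current = this.inFlight.get(mnoBindId) ?? 0;
    if (current <= 1) this.inFlight.delete(mnoBindId);
    else this.inFlight.set(mnoBindId, current - 1);
  }
}
```

Because the counter is per bind, a flood on one bind exhausts only that bind's slots and leaves the others untouched.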

11. Service Operating Modes

| Mode | Trigger | Behaviour |
|---|---|---|
| NORMAL | Default | Full rule evaluation, all verdict types active |
| DEGRADED | Postgres or Redis unhealthy | Falls back to in-memory rule cache (last known good); rate counters become best-effort; quarantine writes are queued in local disk WAL until Postgres recovers |
| PANIC | Verdict latency > 100 ms P95 sustained 60 s | Disables expensive rules (regex / classifier); origin + blocklist Bloom checks remain active; emits firewall.mode.panic.entered.v1 |
| MAINTENANCE | Manual via admin REST | Returns ALLOW + flag=MAINTENANCE for all calls; full audit; only used during planned rule-engine upgrades; requires NOC + Trust & Safety dual approval |
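
The mode triggers in the table can be sketched as a small selector fed with periodic health samples. Several points here are assumptions, not documented behaviour: the precedence between modes (MAINTENANCE, then PANIC, then DEGRADED), the rule that PANIC ends as soon as P95 latency drops back to 100 ms or below, and all type and field names.

```typescript
type Mode = "NORMAL" | "DEGRADED" | "PANIC" | "MAINTENANCE";

interface Health {
  postgresHealthy: boolean;
  redisHealthy: boolean;
  p95LatencyMs: number;
  maintenanceRequested: boolean; // set via admin REST after dual approval
}

class ModeSelector {
  private breachStartedAtMs: number | null = null; // start of the current >100 ms streak

  // Returns the mode for the current sample.
  next(h: Health, nowMs: number): Mode {
    if (h.p95LatencyMs > 100) {
      if (this.breachStartedAtMs === null) this.breachStartedAtMs = nowMs;
    } else {
      this.breachStartedAtMs = null; // streak broken (assumed exit condition)
    }
    if (h.maintenanceRequested) return "MAINTENANCE";
    if (this.breachStartedAtMs !== null && nowMs - this.breachStartedAtMs >= 60000) return "PANIC";
    if (!h.postgresHealthy || !h.redisHealthy) return "DEGRADED";
    return "NORMAL";
  }
}
```

Tracking when the breach started, rather than counting samples, keeps the 60 s condition independent of how often health is sampled.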

12. Cross-Service Citations

| Related epic | Owner service | Why it matters here |
|---|---|---|
| EP-FW-01 Inbound MO Firewall | sms-firewall-service (this) | Defines FilterInbound contract used by all smpp-connector-*-rx pods |
| EP-FW-02 Transit MT Firewall | sms-firewall-service (this) | Defines EvaluateTransit consumed by smpp-connector-transit-rx; downstream sink is routing-engine |
| EP-FW-03 National Blocklist Federation | sms-firewall-service (this) | Consumes regulator.blocklist.published.v1 from regulator-portal-service; produces federated diff to peer MNOs |
| EP-CONS-02 STOP-Keyword Handling | consent-ledger-service | Firewall surfaces STOP-keyword candidates from pdu.body and forwards to consent-ledger; consent revocation is emitted by consent-ledger and replayed back into firewall's DND projection |
| EP-FRAUD-02 SIM-Box / Grey-Route Detection | fraud-intel-service | Firewall provides the raw evidence; fraud-intel produces the high-confidence signatures that the firewall promotes to BLOCK rules |
| EP-RE-* (routing-engine) | routing-engine | Routing engine calls CheckOutboundEgress on tenant-originated MT to enforce national DND at the perimeter before a carrier submit |

13. Open Questions

| ID | Question | Owner | Target |
|---|---|---|---|
| OQ-FW-01 | Does ATRA require firewall verdict events to be co-signed by an HSM-held key for regulator submission? | Regulator Liaison | 2026-05-30 |
| OQ-FW-02 | Should the Bloom filter false-positive rate be 1% (current) or tightened to 0.1% (roughly 1.5× memory for a standard Bloom filter)? | Trust & Safety | 2026-05-15 |
| OQ-FW-03 | Are MNO peers willing to consume our federated blocklist diff over MNO IPSec, or via a regulator-mediated SFTP only? | Carrier Relations | 2026-06-15 |
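
To put numbers on OQ-FW-02, the following sketch applies the textbook sizing formula for an optimally tuned Bloom filter to the 5 M entry estimate from Section 10. It assumes a classic Bloom filter; a specific Redis module may allocate memory differently, so treat the figures as a first-order estimate. The function name is hypothetical.

```typescript
// Standard sizing for an optimally tuned Bloom filter:
//   bits m = -n * ln(p) / (ln 2)^2, hash count k = (m / n) * ln 2
function bloomSizing(n: number, p: number): { bits: number; megabytes: number; hashes: number } {
  const bits = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2));
  const hashes = Math.round((bits / n) * Math.LN2);
  return { bits, megabytes: bits / 8 / 1e6, hashes };
}

const entries = 5000000; // from the "may exceed 5 M entries" estimate
const atOnePercent = bloomSizing(entries, 0.01); // roughly 6 MB, 7 hash functions
const atPointOnePercent = bloomSizing(entries, 0.001); // roughly 9 MB, 10 hash functions
const memoryRatio = atPointOnePercent.bits / atOnePercent.bits; // about 1.5
```

Under this formula memory grows with ln(1/p), so moving from 1% to 0.1% costs about half again as much memory plus three more hash evaluations per lookup.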