SMS Firewall Service — Application Logic
Version: 1.0 · Status: Draft · Owner: Trust & Safety · Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · DOMAIN_MODEL · API_CONTRACTS · SECURITY_MODEL
1. Use Cases
UC-FilterInboundMo (gRPC handler — synchronous, hot path)
Trigger: smpp-connector-{mno}-rx calls SmsFirewallService/FilterInbound(MoContext) over mTLS gRPC immediately upon receipt of a deliver_sm PDU and before the connector returns deliver_sm_resp to the originating MNO.
Input: MoContext — srcMsisdn (E.164), dstMsisdn (E.164), mnoBindId, pduBody (≤ 1600 chars), pduCoding (SMPP data_coding 0/3/8), pduTon, pduNpi, recvTs, traceId, smppSequenceNumber.
Output: Verdict — verdict, traceId, evaluatedRuleIds[], ruleHits[], blockReason?, holdId?, effectiveTtlSeconds, flags[].
SLA: P95 ≤ 30 ms (data-plane budget), P99 ≤ 50 ms. The MNO's deliver_sm_resp window is governed by the SMPP response timer on the bind. Bind-level timers such as enquire_link_timer are typically 60 s, but a real-time response is still required to avoid bind health degradation.
Steps:
1. Caller authentication. Validate the gRPC peer SVID against the spiffe://ghasi/np-data/smpp-connector-* allowlist. Non-matching → PERMISSION_DENIED. The mTLS certificate is reloaded by Vault Agent on rotation (every 30 days).
2. MAINTENANCE-mode short-circuit. If firewall.operating_mode = MAINTENANCE, return ALLOW with flags=["MAINTENANCE_MODE"]. The audit row is still written. Skip steps 3–10.
3. Verdict cache check. Compute pduFingerprint = sha256(srcMsisdn:dstMsisdn:senderId:body) and GET fw:verdict:{pduFingerprint} from Redis (TTL effectiveTtlSeconds). On HIT, reuse the cached verdict and skip directly to step 11.
4. Input validation. Reject a malformed srcMsisdn or dstMsisdn (must match ^\+[1-9]\d{6,14}$) with INVALID_ARGUMENT. Reject an mnoBindId that is not in firewall.mno_bind_registry with FAILED_PRECONDITION.
5. Origin / blocklist check (≤ 1 ms). BF.EXISTS fw:blocklist:national {srcMsisdn} (Redis Bloom filter, capacity 10M, false-positive rate 0.01).
   - On Bloom HIT: run the definitive SELECT 1 FROM firewall.blocklist_entries WHERE entry=$1 AND active=TRUE. On a Postgres hit → BLOCK + ORIGIN_BLOCKLIST.
   - On Bloom MISS → not blocked.
6. Geo check (≤ 2 ms). Resolve the permitted country codes for mnoBindId from the local cache. If the srcMsisdn country code is not in permittedCountryCodes → BLOCK + GEO_FORBIDDEN. When number-intelligence-service.Lookup is available, additionally cross-check lineType=MOBILE and country=AF. If number-intelligence is UNAVAILABLE, fall back to the MCC/MNC table only and add flags=["NUMINT_UNAVAILABLE"].
7. Rate-governor check (≤ 5 ms). For each configured window (1s, 1m, 5m, 1h, 24h), run the sliding-window sequence (shown here for 1m):
   - ZADD fw:rate:src-msisdn:{e164}:1m <ts> <pduFingerprint>
   - ZREMRANGEBYSCORE fw:rate:src-msisdn:{e164}:1m -inf (now-60s)
   - ZCARD fw:rate:src-msisdn:{e164}:1m
   If ZCARD > threshold (defaults: 10/1s, 100/1m, 500/1h) → BLOCK + RATE_EXCEEDED. Tenant-allowlisted short codes use elevated thresholds from firewall.rate_overrides. If Redis is unavailable → flags=["RATE_GOVERNOR_DEGRADED"], the governor is skipped, and the metric firewall_rate_governor_skip_total is incremented.
8. DND check (≤ 2 ms). BF.EXISTS fw:dnd:bloom {dstMsisdn}, then the definitive SELECT 1 FROM firewall.dnd_snapshot WHERE msisdn=$1. On hit → BLOCK + DND_PRESENT (only if rule type DND_PRESENT is enabled for this scope).
9. Content-class evaluation (≤ 15 ms). Load the active rule set from the in-process cache (refreshed every 60 s and on firewall.rule.changed.v1). Evaluate ALLOW rules first (whitelist short-circuit), then BLOCK > QUARANTINE > FLAG rules in priority order, with AND logic within a rule. Evaluation is CEL-style and sandboxed, with a per-rule timeout of 50 ms; on timeout → auto-disable the rule and emit firewall.rule.degraded.v1.
10. AIT / SIM-box signature check (≤ 3 ms). Look up fw:ait:pattern:{dstMsisdn-prefix} and fw:simbox:{srcMsisdn}. A match → BLOCK + AIT_SIGNATURE or BLOCK + SIMBOX_SIGNATURE respectively. These signatures are populated from fraud-intel-service via the fraud.detected.* consumer.
11. Verdict assembly + side effects:
   - INSERT INTO firewall.audit (...) with hash-chained prevHash/rowHash (transactional with the verdict-cache write).
   - If verdict = QUARANTINE: INSERT INTO firewall.quarantine_queue with the encrypted PDU; expires_at = now() + 24h.
   - SETEX fw:verdict:{pduFingerprint} {effectiveTtlSeconds} <verdictJson>.
   - Outbox: write the firewall.audit.v1 event row to firewall.outbox (transactional with the audit insert).
   - Async: Prometheus counter increment, Pino structured log.
12. Return the Verdict to the connector.
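The verdict-cache fingerprint and the rate-governor round trip can be sketched together. The TypeScript below is a minimal sketch, assuming a hex-encoded SHA-256 over the colon-joined fields and replacing Redis with a hypothetical in-memory SlidingWindow class that mirrors the ZADD / ZREMRANGEBYSCORE / ZCARD sequence for one key; it is not the service's Redis client code. It also surfaces one property of the design: ZADD on an existing member only updates its score, so PDUs that share a pduFingerprint inside a window are counted once.

```typescript
import { createHash } from "node:crypto";

// pduFingerprint = sha256(srcMsisdn:dstMsisdn:senderId:body); hex output is an assumption.
function pduFingerprint(src: string, dst: string, senderId: string, body: string): string {
  return createHash("sha256").update(`${src}:${dst}:${senderId}:${body}`, "utf8").digest("hex");
}

// Hypothetical in-memory stand-in for one sorted set such as fw:rate:src-msisdn:{e164}:1m.
class SlidingWindow {
  private readonly members = new Map<string, number>(); // member -> score (epoch ms)
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // One governor round trip; returns true when the count exceeds the threshold (RATE_EXCEEDED).
  hit(nowMs: number, member: string): boolean {
    // ZADD key nowMs member: inserts, or only updates the score if the member already exists.
    this.members.set(member, nowMs);
    // ZREMRANGEBYSCORE key -inf (cutoff: the "(" prefix makes the bound exclusive,
    // so only scores strictly below the cutoff are removed.
    const cutoff = nowMs - this.windowMs;
    this.members.forEach((score, m) => {
      if (score < cutoff) this.members.delete(m);
    });
    // ZCARD key > threshold
    return this.members.size > this.threshold;
  }

  count(): number {
    return this.members.size;
  }
}

// Default 1s window: 10 PDUs pass, the 11th within the same second trips the governor.
const perSecond = new SlidingWindow(1_000, 10);
const results: boolean[] = [];
for (let i = 0; i < 11; i++) {
  const fp = pduFingerprint("+93700000001", "+93700000002", "+93700000001", `msg ${i}`);
  results.push(perSecond.hit(1_000 + i, fp));
}
```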
Error mapping:
| gRPC status | Condition | Connector behaviour |
|---|---|---|
| OK | Verdict returned | Act per verdict |
| PERMISSION_DENIED | Caller SVID not allowlisted | Crash with auth error; page via PagerDuty |
| INVALID_ARGUMENT | Malformed input | Connector logs, drops PDU, alerts |
| FAILED_PRECONDITION | mnoBindId unregistered | Connector self-deregisters and alerts |
| RESOURCE_EXHAUSTED | Per-pod concurrency cap (200/bind) hit | Connector backs off and retries on a sibling firewall replica |
| UNAVAILABLE / DEADLINE_EXCEEDED (> 100 ms) | Pod down / overloaded | Connector writes the PDU to its local-disk WAL for replay (preserving the subscriber relationship) and returns deliver_sm_resp with ESME_ROK to the MNO, so the MO is not NACK'd back to the subscriber |
| INTERNAL | Handler exception | Same as UNAVAILABLE: fail closed to the local WAL |
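The input-validation rules and the error-mapping table above can be expressed as a validator plus a lookup table. This TypeScript sketch is illustrative: the Status union stands in for gRPC status codes, the bind registry is a plain Set rather than firewall.mno_bind_registry, and the action strings are shorthand for the table's third column.

```typescript
type Status =
  | "OK"
  | "PERMISSION_DENIED"
  | "INVALID_ARGUMENT"
  | "FAILED_PRECONDITION"
  | "RESOURCE_EXHAUSTED"
  | "UNAVAILABLE"
  | "DEADLINE_EXCEEDED"
  | "INTERNAL";

// E.164 as validated by the handler: "+", a non-zero leading digit, 7 to 15 digits in total.
const E164 = /^\+[1-9]\d{6,14}$/;

// Input validation; returns the status the handler would fail with, or "OK" to continue.
function validateMoInput(
  srcMsisdn: string,
  dstMsisdn: string,
  mnoBindId: string,
  bindRegistry: ReadonlySet<string>,
): Status {
  if (!E164.test(srcMsisdn) || !E164.test(dstMsisdn)) return "INVALID_ARGUMENT";
  if (!bindRegistry.has(mnoBindId)) return "FAILED_PRECONDITION";
  return "OK";
}

// Connector reaction per row of the error-mapping table (labels are illustrative).
const CONNECTOR_ACTION: Record<Status, string> = {
  OK: "act on verdict",
  PERMISSION_DENIED: "crash with auth error, page",
  INVALID_ARGUMENT: "log, drop PDU, alert",
  FAILED_PRECONDITION: "self-deregister, alert",
  RESOURCE_EXHAUSTED: "back off, retry on sibling replica",
  UNAVAILABLE: "write PDU to local WAL, deliver_sm_resp ESME_ROK",
  DEADLINE_EXCEEDED: "write PDU to local WAL, deliver_sm_resp ESME_ROK",
  INTERNAL: "write PDU to local WAL, deliver_sm_resp ESME_ROK",
};
```

A Record keyed on the full Status union makes the compiler reject a table that forgets a status.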
UC-EvaluateTransitMt (gRPC handler — synchronous, hot path)
Trigger: smpp-connector-transit-rx calls SmsFirewallService/EvaluateTransit(TransitMtContext) for every inbound submit_sm from a peer aggregator, before dispatching to routing-engine.
Input: TransitMtContext — peerAsn, peerSystemId, srcAddr, dstMsisdn, senderId, pduBody, pduTon, pduNpi, registeredDelivery, esmClass.
Output: Verdict (same shape as UC-FilterInboundMo).
SLA: P95 ≤ 50 ms, P99 ≤ 100 ms.
Steps:
1. Caller auth + maintenance check (as in UC-FilterInboundMo).
2. Peer ASN allowlist (≤ 1 ms). SELECT 1 FROM firewall.peer_asn_allowlist WHERE peer_asn=$1 AND active=TRUE. Miss → BLOCK + PEER_ASN_UNKNOWN. Cached in process for 60 s; refreshed on firewall.peer.allowlist.changed.v1.
3. Peer-quarantine check. SELECT quarantined FROM firewall.peer_aggregators WHERE peer_id=$1. If quarantined=TRUE → QUARANTINE + PEER_QUARANTINED.
4. Sender-ID origin verification (≤ 10 ms). Call sender-id-registry-service.Verify(senderId, peerId):
   - status=OWNED_BY_PEER → continue.
   - status=OWNERSHIP_MISMATCH → BLOCK + SENDER_ID_SPOOFED.
   - status=SUSPENDED → BLOCK + SENDER_ID_SUSPENDED.
   - status=UNKNOWN → QUARANTINE (NOC review; do not auto-block, to avoid breaking newly onboarded peers in their grace window).
   - UNAVAILABLE → fall back to the local hourly cache firewall.peer_senderid_allowlist.
5. Grey-route detection (≤ 15 ms). Resolve the dstMsisdn HLR/MNP via number-intelligence-service.Lookup and compare the resolved homeMnoId with the peer's permittedDstMnoIds from firewall.peer_mno_routes. Mismatch → BLOCK + GREY_ROUTE. Heuristic: a peer sending > 30% of MT to non-peered MNOs over its last 1000 submissions triggers firewall.alert.greyroute.heuristic.v1 (warning, not blocking).
6. Content-class + rate evaluation (as UC-FilterInboundMo steps 7 and 9; geo is skipped for transit).
7. Verdict assembly + side effects (as UC-FilterInboundMo step 11). On BLOCK, the connector returns submit_sm_resp with command_status = ESME_RSUBMITFAIL (0x00000045).
Fail-closed for transit MT: UNAVAILABLE/DEADLINE_EXCEEDED → connector returns ESME_RSUBMITFAIL to peer; emit firewall.transit.unavailable.v1. No local WAL — transit traffic is third-party with no subscriber relationship.
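The sender-ID and grey-route decisions in UC-EvaluateTransitMt are pure functions of lookup results, so they can be sketched without I/O. In this TypeScript sketch the registry status and the resolved homeMnoId are passed in as plain values, the hourly firewall.peer_senderid_allowlist cache is modelled as a Set of sender IDs, and PASS is a hypothetical "continue to the next step" outcome. How a cache miss is handled when the registry is UNAVAILABLE is not specified in this document; the sketch assumes it is treated like UNKNOWN.

```typescript
type SenderIdStatus = "OWNED_BY_PEER" | "OWNERSHIP_MISMATCH" | "SUSPENDED" | "UNKNOWN" | "UNAVAILABLE";

// PASS means "no decision here, continue with the next step" (illustrative).
interface Outcome {
  verdict: "PASS" | "BLOCK" | "QUARANTINE";
  reason?: string;
}

// Sender-ID origin verification, given the sender-id-registry-service result.
function senderIdOutcome(status: SenderIdStatus, senderId: string, hourlyAllowlist: ReadonlySet<string>): Outcome {
  if (status === "OWNED_BY_PEER") return { verdict: "PASS" };
  if (status === "OWNERSHIP_MISMATCH") return { verdict: "BLOCK", reason: "SENDER_ID_SPOOFED" };
  if (status === "SUSPENDED") return { verdict: "BLOCK", reason: "SENDER_ID_SUSPENDED" };
  if (status === "UNKNOWN") return { verdict: "QUARANTINE" }; // NOC review, never auto-block
  // UNAVAILABLE: fall back to the local hourly cache (firewall.peer_senderid_allowlist).
  // ASSUMPTION: a cache miss is handled like UNKNOWN; the spec does not say.
  return hourlyAllowlist.has(senderId) ? { verdict: "PASS" } : { verdict: "QUARANTINE" };
}

// Grey-route check: the resolved homeMnoId must be one of the peer's permittedDstMnoIds.
function greyRouteOutcome(homeMnoId: string, permittedDstMnoIds: ReadonlySet<string>): Outcome {
  return permittedDstMnoIds.has(homeMnoId) ? { verdict: "PASS" } : { verdict: "BLOCK", reason: "GREY_ROUTE" };
}

// Heuristic warning: > 30% of the peer's last 1000 submissions went to non-peered MNOs.
function greyRouteWarning(recentHomeMnoIds: readonly string[], permittedDstMnoIds: ReadonlySet<string>): boolean {
  const last = recentHomeMnoIds.slice(-1000);
  if (last.length === 0) return false;
  const offRoute = last.filter((id) => !permittedDstMnoIds.has(id)).length;
  return offRoute / last.length > 0.3;
}
```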
UC-AitDetection (NATS consumer)
Trigger: fraud-intel-service publishes fraud.detected.ait.v1 after its ML pipeline classifies a destination range or originator pattern as Artificially Inflated Traffic.
Steps:
- Validate the event signature against the fraud-intel-service JWKS.
- Upsert into firewall.ait_patterns keyed on (patternType, dstMsisdnRange).
- Refresh the in-process AIT lookup and the fw:ait:pattern:* keys in Redis (TTL 1h).
- Emit firewall.alert.ait.detected.v1.
- ACK the NATS message.
Idempotency: Replays produce identical state (UPSERT semantics).
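Why replays are safe can be shown with a small model of the upsert. In this TypeScript sketch a Map stands in for firewall.ait_patterns, keyed on (patternType, dstMsisdnRange) as above; the confidence field and the sample values are hypothetical.

```typescript
// Hypothetical event shape: the two key columns come from the spec; confidence is illustrative.
interface AitPatternEvent {
  patternType: string;
  dstMsisdnRange: string;
  confidence: number;
}

// Stand-in for firewall.ait_patterns, keyed on (patternType, dstMsisdnRange).
const aitPatterns = new Map<string, AitPatternEvent>();

// UPSERT: insert a new row or overwrite the existing row for the same key.
function upsertAitPattern(ev: AitPatternEvent): void {
  // JSON-encoding the key tuple avoids collisions that a plain separator could cause.
  const key = JSON.stringify([ev.patternType, ev.dstMsisdnRange]);
  aitPatterns.set(key, { ...ev });
}

// Replaying the same event (for example after a NATS redelivery) leaves the state unchanged.
const event1: AitPatternEvent = { patternType: "DST_RANGE", dstMsisdnRange: "+9379912", confidence: 0.97 };
upsertAitPattern(event1);
const before = JSON.stringify(Array.from(aitPatterns.entries()));
upsertAitPattern(event1);
const after = JSON.stringify(Array.from(aitPatterns.entries()));
```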
UC-SimBoxDetection (NATS consumer)
Trigger: fraud-intel-service publishes fraud.detected.simbox.v1 based on graph + ML signatures (IMEI churn, A-number rotation, geo cell-id volatility).
Steps:
- Validate signature.
- Upsert into
firewall.simbox_signalskeyed onoriginator(E.164). - Refresh
fw:simbox:{originator}in Redis (TTL 24h). - Emit
firewall.simbox.detected.v1. - ACK.
UC-NationalBlocklistFederation-Import (NATS consumer)
Trigger: regulator-portal-service publishes regulator.blocklist.published.v1 (HSM-signed by ATRA).
Steps:
- Verify HSM signature. Validate the event payload against the ATRA public key in Vault Transit (transit/atra-public). Failure → emit firewall.alert.federation.signature.invalid.v1 (PagerDuty severity); do NOT consume.
- Idempotency check. For each entry in the payload, compute the (source='REGULATOR', regulatorRef, type, value) key.
- Upsert.
  - action='ADD' → INSERT ... ON CONFLICT (source, regulator_ref, type, value) DO UPDATE SET active=TRUE, updated_at=now().
  - action='REMOVE' → UPDATE ... SET active=FALSE, deactivated_at=now().
- Confidence rescore. Recompute confidence_score per entry (formula in DOMAIN_MODEL §6).
- Bloom rebuild. If the entryCount delta is > 5% of the total OR entryCount > current_capacity * 0.8, schedule a background Bloom rebuild (the worker picks it up within 5 s).
- Emit firewall.blocklist.federated.v1 with addedCount, removedCount, regulatorRef, entryCountTotal.
- Append to the blocklist audit (firewall.blocklist_audit).
- ACK.
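The Bloom-rebuild trigger combines a churn test and a capacity test. The TypeScript below is a hedged sketch: it assumes "entryCount delta" means entries added plus entries removed by this import, measured against the pre-import count, which this document does not pin down. It also includes the textbook Bloom sizing formula, which shows why the 80% capacity guard matters: once a fixed-size filter holds more than its configured capacity, its false-positive rate climbs above the 0.01 target.

```typescript
// ASSUMPTION: "entryCount delta" is read as added + removed entries in this import,
// compared with the entry count before the import.
function needsBloomRebuild(entryCountBefore: number, added: number, removed: number, currentCapacity: number): boolean {
  const entryCountAfter = entryCountBefore + added - removed;
  const delta = added + removed;
  const churnTrigger = delta > 0.05 * entryCountBefore; // > 5% of the total
  const capacityTrigger = entryCountAfter > currentCapacity * 0.8; // > 80% of capacity
  return churnTrigger || capacityTrigger;
}

// Textbook Bloom sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
function bloomSizing(n: number, p: number): { bits: number; hashes: number } {
  const bits = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2));
  const hashes = Math.round((bits / n) * Math.LN2);
  return { bits, hashes };
}
```

For the national filter (capacity 10M, fp-rate 0.01) the formula gives roughly 95.9 million bits (about 12 MB) and 7 hash functions, and the capacity test fires once the entry count passes 8,000,000. RedisBloom's actual memory footprint can differ from this textbook figure.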
UC-NationalBlocklistFederation-Export (cron job)
Trigger: Daily cron at 02:00 Asia/Kabul (single-leader via Redis lock fw:fed:export:lock).
Steps:
- Compute the diff: entries with share_with_peers=TRUE whose updated_at > lastExportedAt.
- Render JSON Lines: one entry per line, {type, value, action, confidence, sources, ts}.
- Sign with the platform HSM key via PKCS#11 (pkcs11:object=ghasi-firewall-fed-signer).
- Upload to MinIO bucket firewall-federation-out/{yyyymmdd}.jsonl.sig (Object Lock enabled, 7-year retention).
- Mirror the file to the regulator-mediated SFTP within 5 minutes.
- Even on zero-diff days, emit firewall.federation.heartbeat.v1 so peer MNOs can detect a stalled exporter.
- On a non-empty diff, emit firewall.federation.exported.v1 with exportSha256, signature, presignedUrl (24h validity).
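The diff, JSON Lines rendering, digest and event selection can be sketched as follows. This TypeScript is illustrative: the StoredEntry fields shareWithPeers and updatedAt mirror the share_with_peers and updated_at columns, hashing the exact JSON Lines bytes for exportSha256 is an assumption, emitting the heartbeat on every run is one reading of "even on zero-diff days", and PKCS#11 signing plus the MinIO and SFTP uploads are left out.

```typescript
import { createHash } from "node:crypto";

// One exported line: {type, value, action, confidence, sources, ts}.
interface FedEntry {
  type: string;
  value: string;
  action: "ADD" | "REMOVE";
  confidence: number;
  sources: string[];
  ts: string; // ISO-8601
}

// Hypothetical row shape carrying the two columns the diff filters on.
interface StoredEntry extends FedEntry {
  shareWithPeers: boolean;
  updatedAt: number; // epoch ms
}

// Diff: share_with_peers=TRUE and updated_at > lastExportedAt; internal columns are dropped.
function computeDiff(rows: readonly StoredEntry[], lastExportedAt: number): FedEntry[] {
  return rows
    .filter((r) => r.shareWithPeers && r.updatedAt > lastExportedAt)
    .map(({ type, value, action, confidence, sources, ts }) => ({ type, value, action, confidence, sources, ts }));
}

// JSON Lines: one JSON object per line, each newline-terminated.
function renderJsonl(entries: readonly FedEntry[]): string {
  return entries.map((e) => JSON.stringify(e) + "\n").join("");
}

// Heartbeat on every run; the exported event and digest only for a non-empty diff.
function exportEvents(jsonl: string): { events: string[]; exportSha256?: string } {
  if (jsonl.length === 0) return { events: ["firewall.federation.heartbeat.v1"] };
  const exportSha256 = createHash("sha256").update(jsonl, "utf8").digest("hex");
  return { events: ["firewall.federation.heartbeat.v1", "firewall.federation.exported.v1"], exportSha256 };
}
```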
UC-BlocklistAdminCRUD (REST handlers)
| Endpoint | Caller | Effect |
|---|---|---|
| POST /v1/admin/firewall/blocklists/{id}/entries | tns-admin | Insert entry with source='OPERATOR_MANUAL'; emit firewall.blocklist.changed.v1 |
| DELETE /v1/admin/firewall/blocklists/{id}/entries/{entryId} | tns-admin | Soft-delete (active=FALSE); emit event |
| POST /v1/admin/firewall/blocklists/{id}/entries:bulk | tns-admin | Atomic batch insert (≤ 10000 entries); per-entry validation; rollback on first failure |
| GET /v1/admin/firewall/blocklist/{entryId}/history | tns-admin / regulator-auditor | Chronological events from firewall.blocklist_audit: created, source_added, source_removed, confidence_changed, manually_overridden, deactivated, reactivated |
All admin operations are audited to firewall.blocklist_audit with actor, timestamp (UTC µs), reason.
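The all-or-nothing contract of the bulk endpoint amounts to validate-then-commit. In this TypeScript sketch an array stands in for the blocklist entries table, validateEntry accepts only E.164 MSISDNs, and bulk rows are assumed to carry source='OPERATOR_MANUAL' like single inserts; all three are hypothetical. In Postgres the equivalent is a single transaction that rolls back on the first failing entry.

```typescript
interface BulkEntry {
  type: "MSISDN"; // hypothetical: only MSISDN entries in this sketch
  value: string;
}

interface EntryRow {
  value: string;
  source: string;
  active: boolean;
}

type BulkResult = { ok: true; inserted: number } | { ok: false; index: number; error: string };

const MAX_BATCH = 10_000;
const E164 = /^\+[1-9]\d{6,14}$/;

// Hypothetical per-entry validation; returns an error message or null.
function validateEntry(entry: BulkEntry): string | null {
  return E164.test(entry.value) ? null : `invalid MSISDN: ${entry.value}`;
}

// Validate every entry first; write nothing unless the whole batch is valid.
function bulkInsert(table: EntryRow[], batch: readonly BulkEntry[]): BulkResult {
  if (batch.length > MAX_BATCH) {
    return { ok: false, index: MAX_BATCH, error: `batch exceeds ${MAX_BATCH} entries` };
  }
  const staged: EntryRow[] = [];
  for (let i = 0; i < batch.length; i++) {
    const error = validateEntry(batch[i]);
    if (error !== null) return { ok: false, index: i, error }; // first failure: nothing is committed
    staged.push({ value: batch[i].value, source: "OPERATOR_MANUAL", active: true });
  }
  table.push(...staged); // commit the whole batch at once
  return { ok: true, inserted: staged.length };
}
```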
UC-AuditQuery (REST handler)
Endpoint: GET /v1/admin/firewall/audit?from={iso8601}&to={iso8601}&direction=&verdict=&srcMsisdn=&dstMsisdn=&mnoBindId=&peerAsn=&cursor=&limit=
Caller: tns-admin, regulator-auditor.
Steps:
- RBAC enforcement: regulator-auditor cannot specify srcMsisdn/dstMsisdn filters narrower than 3 digits, to prevent targeted enumeration; its results are masked.
- Cursor-paginated query against firewall.audit with partition pruning: the WHERE clause must include a verdict_at range so Postgres can skip non-matching partitions (predicate order within the clause does not matter).
- Mask MSISDNs in the response according to the caller role:
  - tns-admin: full MSISDN visible.
  - regulator-auditor: masked as +CCNNN***.
- Audit the audit query itself (meta-audit) into firewall.access_log.
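Role-based masking can be sketched as a pure function. The TypeScript below reads +CCNNN*** as: keep the "+" and the first five digits (a two-digit country code such as 93 plus three more digits) and replace every remaining digit with "*". That reading, the keepDigits parameter and the Role type are assumptions; because country codes run from one to three digits, a production version would likely derive the prefix length from a country-code table.

```typescript
type Role = "tns-admin" | "regulator-auditor";

// Keep "+" and the first keepDigits digits; replace the remaining digits with "*".
// ASSUMPTION: keepDigits = 5 matches the +CCNNN*** pattern for two-digit country codes.
function maskMsisdn(e164: string, role: Role, keepDigits = 5): string {
  if (role === "tns-admin") return e164; // full MSISDN visible
  const digits = e164.startsWith("+") ? e164.slice(1) : e164;
  const hidden = Math.max(0, digits.length - keepDigits);
  return "+" + digits.slice(0, keepDigits) + "*".repeat(hidden);
}
```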
2. Rule Pipeline Ordering Rationale
The pipeline runs fast-path → medium-path → slow-path to maximise early termination:
| Order | Check | Latency budget | Rationale |
|---|---|---|---|
| 1 | MAINTENANCE-mode short-circuit | < 0.1 ms | Single in-process flag |
| 2 | Verdict cache | 1 ms | Repeat-PDU dedup |
| 3 | Origin Bloom + PG fallthrough | 1 ms | 99% of lookups miss the Bloom filter, so no PG read |
| 4 | Geo (cached MCC/MNC table) | 2 ms | In-process map |
| 5 | Rate governor (Redis ZADD/ZCARD) | 5 ms | Sliding-window |
| 6 | DND Bloom + PG fallthrough | 2 ms | Same Bloom pattern |
| 7 | CEL content rules (priority order) | 15 ms | Sandboxed; per-rule 50 ms timeout |
| 8 | AIT / SIM-box signature lookup | 3 ms | Redis-cached patterns |
| 9 | Audit + outbox write | 2 ms | Transactional |
| | Total budget | 30 ms (P95) | |
Per SERVICE_OVERVIEW §11, PANIC mode disables rules of type REGEX and CLASSIFIER to keep the pipeline within the 30 ms budget during an incident.
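The early-termination idea reduces to running checks in table order and stopping at the first decision. This TypeScript sketch is illustrative only: the Check and Ctx shapes are hypothetical, an empty run is assumed to yield ALLOW, non-terminal outcomes such as FLAG are not modelled, and PANIC mode is represented by skipping checks tagged REGEX or CLASSIFIER.

```typescript
type Verdict = "ALLOW" | "BLOCK" | "QUARANTINE";

interface Result {
  verdict: Verdict;
  reason?: string;
}

interface Ctx {
  srcMsisdn: string;
  body: string;
}

interface Check {
  name: string;
  ruleType?: string; // set only on content rules, e.g. "REGEX" or "CLASSIFIER"
  run: (ctx: Ctx) => Result | null; // null = no decision, fall through to the next check
}

// Run checks cheapest-first; the first non-null result ends the pipeline.
function runPipeline(checks: readonly Check[], ctx: Ctx, mode: "NORMAL" | "PANIC"): { result: Result; ran: string[] } {
  const ran: string[] = [];
  for (const check of checks) {
    // PANIC mode disables REGEX and CLASSIFIER rules to protect the latency budget.
    if (mode === "PANIC" && (check.ruleType === "REGEX" || check.ruleType === "CLASSIFIER")) continue;
    ran.push(check.name);
    const result = check.run(ctx);
    if (result !== null) return { result, ran };
  }
  return { result: { verdict: "ALLOW" }, ran }; // assumed default when nothing fires
}

// Example: an origin-blocklist check followed by a slower regex content rule.
const checks: Check[] = [
  {
    name: "origin-blocklist",
    run: (c) => (c.srcMsisdn === "+93700000666" ? { verdict: "BLOCK", reason: "ORIGIN_BLOCKLIST" } : null),
  },
  {
    name: "lottery-regex",
    ruleType: "REGEX",
    run: (c) => (/lottery/i.test(c.body) ? { verdict: "QUARANTINE" } : null),
  },
];
```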
3. Quarantine Workflow (UC-QuarantineReview)
Auto-expiry: a cron worker runs the following every 5 minutes:
UPDATE firewall.quarantine_queue
SET status='AUTO_EXPIRED'
WHERE status='PENDING' AND expires_at < now()
RETURNING hold_id;
For each expired holdId, emit firewall.quarantine.expired.v1. The distributed lock fw:quarantine:expiry:lock ensures that only a single leader runs the job.
4. Operating-Mode Transition Workflow
Manual switch:
- NOC operator A calls POST /v1/admin/firewall/mode {targetMode, reason} with their JWT. The service stores a pending request keyed by (targetMode, requesterId) with a 60 s TTL.
- Within 60 s, operator B calls the same endpoint with {targetMode, reason, secondApproverToken: <A's request id>}. If B.userId != A.userId → the mode switches.
- Single approver → HTTP 412 DUAL_APPROVAL_REQUIRED.
- The switch is persisted to the firewall.operating_mode table and broadcast via firewall.mode.changed.v1; all replicas hot-reload the mode within 5 s.
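The dual-approval handshake is a small state machine over pending requests with a 60 s TTL. In this TypeScript sketch a Map replaces the service's pending-request store, the request id is assumed to be the (targetMode, requesterId) key itself, and 200 is an assumed success status; only the 412 DUAL_APPROVAL_REQUIRED response comes from this document.

```typescript
type Mode = "NORMAL" | "PANIC" | "MAINTENANCE";

interface PendingRequest {
  targetMode: Mode;
  requesterId: string;
  expiresAtMs: number;
}

const PENDING_TTL_MS = 60_000;

class ModeSwitch {
  private mode: Mode = "NORMAL";
  private readonly pending = new Map<string, PendingRequest>();

  // Operator A: open (or refresh) a pending request; the id is the (targetMode, requesterId) key.
  request(targetMode: Mode, requesterId: string, nowMs: number): string {
    const requestId = `${targetMode}:${requesterId}`;
    this.pending.set(requestId, { targetMode, requesterId, expiresAtMs: nowMs + PENDING_TTL_MS });
    return requestId;
  }

  // Operator B: confirm with A's request id as secondApproverToken. Returns an HTTP status.
  approve(targetMode: Mode, approverId: string, secondApproverToken: string, nowMs: number): number {
    const p = this.pending.get(secondApproverToken);
    const valid =
      p !== undefined && nowMs <= p.expiresAtMs && p.targetMode === targetMode && p.requesterId !== approverId;
    if (!valid) return 412; // DUAL_APPROVAL_REQUIRED
    this.pending.delete(secondApproverToken);
    this.mode = targetMode; // real service: persist, broadcast firewall.mode.changed.v1
    return 200;
  }

  current(): Mode {
    return this.mode;
  }
}
```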
Auto-trip to PANIC:
- A background watcher samples firewall_rule_eval_seconds{quantile="0.95"} every 10 s.
- P95 > 100 ms sustained for 60 s → atomically transition to PANIC; emit firewall.alert.mode.auto_panic.v1 (PagerDuty); disable rules with type IN ('REGEX', 'CLASSIFIER').
- P95 < 30 ms sustained for 5 min → atomically transition back to NORMAL; emit firewall.mode.changed.v1.
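The auto-trip rules form a hysteresis loop, so brief spikes or dips do not flip the mode. This TypeScript sketch assumes "sustained" means consecutive 10 s samples: 6 samples above 100 ms (60 s) trip PANIC and 30 samples below 30 ms (5 min) restore NORMAL. The PanicWatcher class is hypothetical, and the real watcher also emits events and toggles rules.

```typescript
type Mode = "NORMAL" | "PANIC";

const SAMPLE_SECONDS = 10;
const TRIP_SAMPLES = 60 / SAMPLE_SECONDS; // > 100 ms for 60 s: 6 consecutive samples
const RECOVER_SAMPLES = 300 / SAMPLE_SECONDS; // < 30 ms for 5 min: 30 consecutive samples

class PanicWatcher {
  private mode: Mode = "NORMAL";
  private above = 0; // consecutive samples with P95 > 100 ms
  private below = 0; // consecutive samples with P95 < 30 ms

  // Feed one P95 sample of firewall_rule_eval_seconds (in seconds); returns the resulting mode.
  observe(p95Seconds: number): Mode {
    const ms = p95Seconds * 1000;
    this.above = ms > 100 ? this.above + 1 : 0;
    this.below = ms < 30 ? this.below + 1 : 0;
    if (this.mode === "NORMAL" && this.above >= TRIP_SAMPLES) {
      this.mode = "PANIC"; // real watcher: emit auto_panic alert, disable REGEX/CLASSIFIER rules
    } else if (this.mode === "PANIC" && this.below >= RECOVER_SAMPLES) {
      this.mode = "NORMAL"; // real watcher: emit firewall.mode.changed.v1
    }
    return this.mode;
  }

  current(): Mode {
    return this.mode;
  }
}
```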
5. Background Workers
| Worker | Schedule | Purpose | Distributed lock |
|---|---|---|---|
| RuleCacheRefreshWorker | Every 60s + on firewall.rule.changed.v1 | Reload active rules into process cache | None (idempotent) |
| BloomRebuildWorker | On-demand + nightly 02:30 Asia/Kabul | Rebuild blocklist + DND Bloom filters | fw:bloom:rebuild:lock |
| QuarantineExpiryWorker | Every 5 min | Auto-expire PENDING holds past expires_at | fw:quarantine:expiry:lock |
| FederationExportWorker | Daily 02:00 Asia/Kabul | Sign + publish blocklist diff | fw:fed:export:lock |
| AuditVerifierWorker | Daily 03:30 Asia/Kabul | Verify hash-chain integrity in yesterday's audit partition; emit firewall.audit.chain.verified.v1 or chain-break alert | fw:audit:verify:lock |
| AuditArchiveWorker | Daily 03:00 Asia/Kabul | Export yesterday's audit partition to MinIO Parquet+zstd, HSM-sign, Object-Lock 7y | fw:audit:archive:lock |
| PartitionMaintenanceWorker | Daily 02:00 Asia/Kabul | Pre-create next 3 monthly partitions for firewall.audit | fw:partition:lock |
| PeerHygieneScoreWorker | Every 5 min | Recompute peer hygiene scores; trigger quarantine if score < 30 | fw:peer:score:lock |
| ConsentDndConsumer | Continuous | Materialise DND snapshot from consent.dnd.snapshot.v1 | NATS durable consumer (queue group) |
| FraudIntelConsumer | Continuous | Process fraud.detected.* | NATS durable consumer |
| RegulatorBlocklistConsumer | Continuous | Process regulator.blocklist.published.v1 | NATS durable consumer |
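The prevHash/rowHash chain written during verdict assembly and checked daily by AuditVerifierWorker can be sketched as follows. This document does not define the rowHash input, so the TypeScript assumes rowHash = sha256(prevHash || canonical row payload) with an all-zero genesis value; the AuditRow shape and function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  payload: string; // canonical serialisation of the audit columns (assumed)
  prevHash: string;
  rowHash: string;
}

const GENESIS = "0".repeat(64); // assumed chain start

// rowHash = sha256(prevHash || payload); prevHash is fixed-length hex, so concatenation is unambiguous.
function hashRow(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update(payload).digest("hex");
}

// Writer side: append a row linked to the previous one.
function appendRow(chain: AuditRow[], payload: string): void {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].rowHash : GENESIS;
  chain.push({ payload, prevHash, rowHash: hashRow(prevHash, payload) });
}

// Verifier side: index of the first broken row, or -1 if the chain is intact.
function verifyChain(chain: readonly AuditRow[], expectedStart: string = GENESIS): number {
  let prev = expectedStart;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.prevHash !== prev || row.rowHash !== hashRow(row.prevHash, row.payload)) return i;
    prev = row.rowHash;
  }
  return -1;
}
```

To verify one day's partition on its own, the verifier would pass the last rowHash of the previous day as expectedStart instead of the genesis value.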