SMS Firewall Service — Application Logic

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · DOMAIN_MODEL · API_CONTRACTS · SECURITY_MODEL


1. Use Cases

UC-FilterInboundMo (gRPC handler — synchronous, hot path)

Trigger: smpp-connector-{mno}-rx calls SmsFirewallService/FilterInbound(MoContext) over mTLS gRPC immediately upon receipt of a deliver_sm PDU and before the connector returns deliver_sm_resp to the originating MNO.

Input: MoContext with fields srcMsisdn (E.164), dstMsisdn (E.164), mnoBindId, pduBody (≤ 1600 chars), pduCoding (SMPP data_coding 0/3/8), pduTon, pduNpi, recvTs, traceId, smppSequenceNumber.

Output: Verdict with fields verdict, traceId, evaluatedRuleIds[], ruleHits[], blockReason?, holdId?, effectiveTtlSeconds, flags[].

SLA: P95 ≤ 30 ms (data-plane budget), P99 ≤ 50 ms. The MNO deliver_sm_resp window is constrained at the SMPP bind level (typically an enquire_link_timer of 60 s), but a real-time response is still required to avoid bind health degradation.

Steps:

  1. Caller authentication. Validate gRPC peer SVID against spiffe://ghasi/np-data/smpp-connector-* allowlist. Non-matching → PERMISSION_DENIED. mTLS cert is reloaded by Vault Agent on rotation (every 30 days).

  2. MAINTENANCE-mode short-circuit. If firewall.operating_mode = MAINTENANCE, return ALLOW + flags=["MAINTENANCE_MODE"]. Audit row is still written. Skip steps 3–10.

  3. Verdict cache check. Compute pduFingerprint = sha256(srcMsisdn:dstMsisdn:senderId:body). GET fw:verdict:{pduFingerprint} from Redis (TTL effectiveTtlSeconds). On HIT, return cached verdict immediately and skip to step 11.

  4. Input validation. Reject malformed srcMsisdn / dstMsisdn (must match ^\+[1-9]\d{6,14}$) with INVALID_ARGUMENT. Reject mnoBindId not in firewall.mno_bind_registry with FAILED_PRECONDITION.

  5. Origin / blocklist check (≤ 1 ms).

    • BF.EXISTS fw:blocklist:national {srcMsisdn} (Redis Bloom filter, capacity 10M, fp-rate 0.01).
    • On Bloom HIT: definitive SELECT 1 FROM firewall.blocklist_entries WHERE entry=$1 AND active=TRUE. On Postgres hit → BLOCK + ORIGIN_BLOCKLIST.
    • On Bloom MISS → not blocked.
  6. Geo check (≤ 2 ms). Resolve the permitted country codes for mnoBindId from the local cache. If the srcMsisdn country code is not in permittedCountryCodes → BLOCK + GEO_FORBIDDEN. If number-intelligence-service.Lookup is available, additionally cross-check lineType=MOBILE and country=AF. If number-intel is UNAVAILABLE, fall back to the MCC/MNC table only and add flags=["NUMINT_UNAVAILABLE"].

  7. Rate-governor check (≤ 5 ms). For each configured window (1s, 1m, 5m, 1h, 24h):

    ZADD fw:rate:src-msisdn:{e164}:1m <ts> <pduFingerprint>
    ZREMRANGEBYSCORE fw:rate:src-msisdn:{e164}:1m -inf (now-60s)
    ZCARD fw:rate:src-msisdn:{e164}:1m

    If ZCARD > threshold (default 10/1s, 100/1m, 500/1h) → BLOCK + RATE_EXCEEDED. Tenant-allowlisted short-codes use elevated thresholds from firewall.rate_overrides. Redis unavailable → flags=["RATE_GOVERNOR_DEGRADED"], governor skipped, metric firewall_rate_governor_skip_total++.

  8. DND check (≤ 2 ms). BF.EXISTS fw:dnd:bloom {dstMsisdn} then definitive SELECT 1 FROM firewall.dnd_snapshot WHERE msisdn=$1. On hit → BLOCK + DND_PRESENT (only if rule type DND_PRESENT is enabled for this scope).

  9. Content-class evaluation (≤ 15 ms). Load active rule set from in-process cache (refreshed every 60 s via firewall.rule.changed.v1). Evaluate ALLOW rules first (whitelist short-circuit). Then evaluate BLOCK > QUARANTINE > FLAG rules in priority order, AND-logic within a rule. CEL-style sandboxed evaluation; per-rule timeout 50 ms; on timeout → auto-disable rule + emit firewall.rule.degraded.v1.

  10. AIT / SIM-box signature check (≤ 3 ms). Lookup fw:ait:pattern:{dstMsisdn-prefix} and fw:simbox:{srcMsisdn}. Match → BLOCK + AIT_SIGNATURE or BLOCK + SIMBOX_SIGNATURE. These signatures are populated by fraud-intel-service via the fraud.detected.* consumer.

  11. Verdict assembly + side effects:

    • INSERT INTO firewall.audit (...) with hash-chained prevHash/rowHash (transactional with verdict-cache write).
    • If verdict = QUARANTINE: INSERT INTO firewall.quarantine_queue with encrypted PDU; expires_at = now() + 24h.
    • SETEX fw:verdict:{pduFingerprint} {effectiveTtlSeconds} <verdictJson>.
    • Outbox: write firewall.audit.v1 event row to firewall.outbox (transactional with audit insert).
    • Async: Prometheus counter increment, Pino structured log.
  12. Return Verdict to the connector.
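
Steps 3, 4 and 7 above can be illustrated with a small in-memory sketch. This is not the service's implementation: it assumes the fingerprint input is the colon-joined string encoded as UTF-8, takes senderId as an explicit argument because MoContext does not list such a field, and replaces the Redis sorted set with a dict. The names is_valid_msisdn, pdu_fingerprint and SlidingWindow are hypothetical.

```python
import hashlib
import re

# E.164 pattern quoted in step 4.
MSISDN_RE = re.compile(r"^\+[1-9]\d{6,14}$")


def is_valid_msisdn(msisdn: str) -> bool:
    """Return True when the value matches the step-4 pattern."""
    return MSISDN_RE.fullmatch(msisdn) is not None


def pdu_fingerprint(src: str, dst: str, sender_id: str, body: str) -> str:
    """sha256 over 'src:dst:senderId:body' (UTF-8 encoding is an assumption)."""
    joined = f"{src}:{dst}:{sender_id}:{body}"
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class SlidingWindow:
    """In-memory stand-in for one fw:rate:src-msisdn:{e164}:{window} sorted set."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._scores = {}  # member -> score, like a Redis sorted set

    def record_and_check(self, member: str, now: float) -> bool:
        """Mirror ZADD, ZREMRANGEBYSCORE, ZCARD; True means RATE_EXCEEDED."""
        # ZADD: re-adding an existing member only updates its score.
        self._scores[member] = now
        cutoff = now - self.window_seconds
        # ZREMRANGEBYSCORE -inf (cutoff: drop scores strictly below the cutoff.
        self._scores = {m: ts for m, ts in self._scores.items() if ts >= cutoff}
        # ZCARD > threshold
        return len(self._scores) > self.threshold
```

Because the sorted-set member is the PDU fingerprint, a byte-identical repeat refreshes its score instead of adding a member, so in this sketch identical PDUs count once per window.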

Error mapping:

| gRPC status | Condition | Connector behaviour |
|---|---|---|
| OK | Verdict returned | Act per verdict |
| PERMISSION_DENIED | Caller SVID not allowlisted | Crash with auth error; PagerDuty |
| INVALID_ARGUMENT | Malformed input | Connector logs, drops PDU, alerts |
| FAILED_PRECONDITION | mnoBindId unregistered | Connector self-deregisters and alerts |
| RESOURCE_EXHAUSTED | Per-pod concurrency cap (200/bind) hit | Connector backs off and retries on a sibling firewall replica |
| UNAVAILABLE / DEADLINE_EXCEEDED (> 100 ms) | Pod down / overloaded | Connector writes the PDU to a local-disk WAL for replay (preserves the subscriber relationship) and returns deliver_sm_resp ESME_ROK to the MNO so the MO is not NACK'd back to the subscriber |
| INTERNAL | Handler exception | Same as UNAVAILABLE (fail-closed local WAL) |
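
On the connector side, the error mapping above amounts to a lookup from gRPC status to an action. The sketch below is one hypothetical way to express it. The ConnectorAction names are invented for illustration, and treating an unlisted status like INTERNAL is an assumption, not something the mapping states.

```python
from enum import Enum


class ConnectorAction(Enum):
    """Hypothetical labels for the connector behaviours in the mapping."""
    ACT_ON_VERDICT = "act per verdict"
    CRASH_AND_PAGE = "crash with auth error; PagerDuty"
    DROP_AND_ALERT = "log, drop PDU, alert"
    DEREGISTER_AND_ALERT = "self-deregister and alert"
    RETRY_SIBLING = "back off and retry on a sibling replica"
    WAL_AND_ACK = "write PDU to local WAL; reply ESME_ROK"


_ACTIONS = {
    "OK": ConnectorAction.ACT_ON_VERDICT,
    "PERMISSION_DENIED": ConnectorAction.CRASH_AND_PAGE,
    "INVALID_ARGUMENT": ConnectorAction.DROP_AND_ALERT,
    "FAILED_PRECONDITION": ConnectorAction.DEREGISTER_AND_ALERT,
    "RESOURCE_EXHAUSTED": ConnectorAction.RETRY_SIBLING,
    "UNAVAILABLE": ConnectorAction.WAL_AND_ACK,
    "DEADLINE_EXCEEDED": ConnectorAction.WAL_AND_ACK,
    "INTERNAL": ConnectorAction.WAL_AND_ACK,
}


def connector_action(grpc_status: str) -> ConnectorAction:
    """Return the action for a status name; unknown statuses fall back to
    the INTERNAL behaviour (an assumption made for this sketch)."""
    return _ACTIONS.get(grpc_status, ConnectorAction.WAL_AND_ACK)
```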

UC-EvaluateTransitMt (gRPC handler — synchronous, hot path)

Trigger: smpp-connector-transit-rx calls SmsFirewallService/EvaluateTransit(TransitMtContext) for every inbound submit_sm from a peer aggregator, before dispatching to routing-engine.

Input: TransitMtContext with fields peerAsn, peerSystemId, srcAddr, dstMsisdn, senderId, pduBody, pduTon, pduNpi, registeredDelivery, esmClass.

Output: Verdict (same shape as UC-FilterInboundMo).

SLA: P95 ≤ 50 ms, P99 ≤ 100 ms.

Steps:

  1. Caller auth + maintenance check (as UC-FilterInboundMo).

  2. Peer ASN allowlist (≤ 1 ms). SELECT 1 FROM firewall.peer_asn_allowlist WHERE peer_asn=$1 AND active=TRUE. Miss → BLOCK + PEER_ASN_UNKNOWN. Cached in process for 60s; refreshed on firewall.peer.allowlist.changed.v1.

  3. Peer-quarantine check. SELECT quarantined FROM firewall.peer_aggregators WHERE peer_id=$1. If quarantined=TRUE → QUARANTINE + PEER_QUARANTINED.

  4. Sender-ID origin verification (≤ 10 ms). Call sender-id-registry-service.Verify(senderId, peerId):

    • status=OWNED_BY_PEER → continue.
    • status=OWNERSHIP_MISMATCH → BLOCK + SENDER_ID_SPOOFED.
    • status=SUSPENDED → BLOCK + SENDER_ID_SUSPENDED.
    • status=UNKNOWN → QUARANTINE (NOC review; do not auto-block to avoid breaking newly-onboarded peers in their grace window).
    • UNAVAILABLE → fall back to local hourly cache firewall.peer_senderid_allowlist.
  5. Grey-route detection (≤ 15 ms). Resolve dstMsisdn HLR/MNP via number-intelligence-service.Lookup. Compare resolved homeMnoId with peer's permittedDstMnoIds from firewall.peer_mno_routes. Mismatch → BLOCK + GREY_ROUTE. Heuristic: peer with > 30% MT to non-peered MNO over last 1000 submissions emits firewall.alert.greyroute.heuristic.v1 (warning, not blocking).

  6. Content-class + rate evaluation (as UC-FilterInboundMo steps 7+9; geo skipped for transit).

  7. Verdict assembly + side effects (as UC-FilterInboundMo step 11). On BLOCK, the connector returns submit_sm_resp with command_status = ESME_RSUBMITFAIL (0x00000045).

Fail-closed for transit MT: UNAVAILABLE/DEADLINE_EXCEEDED → connector returns ESME_RSUBMITFAIL to peer; emit firewall.transit.unavailable.v1. No local WAL — transit traffic is third-party with no subscriber relationship.
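
The warning heuristic in step 5 (more than 30% of a peer's MT going to non-peered MNOs over its last 1000 submissions) can be sketched as a fixed-size rolling window per peer. The class name, the integer-percent comparison, and the choice to stay silent until the window is full are assumptions of this sketch.

```python
from collections import deque


class GreyRouteHeuristic:
    """Rolling window over one peer's most recent submissions."""

    def __init__(self, window: int = 1000, percent: int = 30) -> None:
        self.percent = percent
        self._recent = deque(maxlen=window)  # True = routed to a non-peered MNO

    def observe(self, to_non_peered_mno: bool) -> bool:
        """Record one submission; return True when the warning should fire."""
        self._recent.append(to_non_peered_mno)
        # Assumption: do not alert before a full window has been observed.
        if len(self._recent) < self._recent.maxlen:
            return False
        hits = sum(self._recent)
        # Integer arithmetic avoids float rounding at exactly 30%.
        return hits * 100 > self.percent * len(self._recent)
```

A caller would keep one instance per peer and emit firewall.alert.greyroute.heuristic.v1 when observe returns True; the verdict itself is unaffected, since the heuristic is warning-only.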


UC-AitDetection (NATS consumer)

Trigger: fraud-intel-service publishes fraud.detected.ait.v1 after its ML pipeline classifies a destination range or originator pattern as Artificially Inflated Traffic.

Steps:

  1. Validate event signature against fraud-intel-service JWKS.
  2. Upsert into firewall.ait_patterns keyed on (patternType, dstMsisdnRange).
  3. Refresh the AIT lookup (fw:ait:pattern:* keys updated in Redis; TTL 1h).
  4. Emit firewall.alert.ait.detected.v1.
  5. ACK NATS message.

Idempotency: Replays produce identical state (UPSERT semantics).


UC-SimBoxDetection (NATS consumer)

Trigger: fraud-intel-service publishes fraud.detected.simbox.v1 based on graph + ML signatures (IMEI churn, A-number rotation, geo cell-id volatility).

Steps:

  1. Validate signature.
  2. Upsert into firewall.simbox_signals keyed on originator (E.164).
  3. Refresh fw:simbox:{originator} in Redis (TTL 24h).
  4. Emit firewall.simbox.detected.v1.
  5. ACK.

UC-NationalBlocklistFederation-Import (NATS consumer)

Trigger: regulator-portal-service publishes regulator.blocklist.published.v1 (HSM-signed by ATRA).

Steps:

  1. Verify HSM signature. Validate event payload against ATRA public key in Vault Transit (transit/atra-public). Failure → emit firewall.alert.federation.signature.invalid.v1 (PagerDuty severity); do NOT consume.
  2. Idempotency check. For each entry in payload, compute (source='REGULATOR', regulatorRef, type, value) key.
  3. Upsert.
    • action='ADD' → INSERT ... ON CONFLICT (source, regulator_ref, type, value) DO UPDATE SET active=TRUE, updated_at=now().
    • action='REMOVE' → UPDATE ... SET active=FALSE, deactivated_at=now().
  4. Confidence rescore. Recompute confidence_score per entry (formula in DOMAIN_MODEL §6).
  5. Bloom rebuild. If entryCount delta > 5% of total OR entryCount > current_capacity * 0.8, schedule background Bloom rebuild (worker picks up within 5 s).
  6. Emit firewall.blocklist.federated.v1 with addedCount, removedCount, regulatorRef, entryCountTotal.
  7. Append blocklist audit (firewall.blocklist_audit).
  8. ACK.
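
Steps 2, 3 and 5 above can be sketched in memory as follows. A dict keyed on (source, regulatorRef, type, value) stands in for the table. Counting only real state changes towards addedCount/removedCount, and reading "5% of total" as 5% of the post-import entry count, are assumptions of this sketch; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str, str]  # (source, regulatorRef, type, value)


def apply_entries(table: Dict[Key, bool], regulator_ref: str,
                  entries: List[dict]) -> Tuple[int, int]:
    """Apply ADD/REMOVE actions; return (added, removed) state changes."""
    added = removed = 0
    for entry in entries:
        key = ("REGULATOR", regulator_ref, entry["type"], entry["value"])
        if entry["action"] == "ADD":
            if not table.get(key, False):
                added += 1
            table[key] = True  # upsert: insert or reactivate
        elif entry["action"] == "REMOVE":
            if table.get(key, False):
                removed += 1
                table[key] = False  # soft-deactivate, row is kept
    return added, removed


def should_rebuild_bloom(count_before: int, count_after: int,
                         capacity: int) -> bool:
    """Step-5 trigger: delta > 5% of total OR count > 80% of capacity."""
    delta = abs(count_after - count_before)
    return delta * 100 > 5 * count_after or count_after * 10 > capacity * 8
```

Replaying the same payload leaves the table unchanged and reports (0, 0), which is the idempotency property the upsert is meant to provide.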

UC-NationalBlocklistFederation-Export (cron job)

Trigger: Daily cron at 02:00 Asia/Kabul (single-leader via Redis lock fw:fed:export:lock).

Steps:

  1. Compute diff: entries with share_with_peers=TRUE whose updated_at > lastExportedAt.
  2. Render JSON Lines: one entry per line, {type, value, action, confidence, sources, ts}.
  3. Sign with platform HSM key via PKCS#11 (pkcs11:object=ghasi-firewall-fed-signer).
  4. Upload to MinIO bucket firewall-federation-out/{yyyymmdd}.jsonl.sig (Object Lock enabled, 7-year retention).
  5. Mirror file to regulator-mediated SFTP within 5 minutes.
  6. Even on zero-diff days, emit firewall.federation.heartbeat.v1 so peer MNOs detect a stalled exporter.
  7. On non-empty diff: emit firewall.federation.exported.v1 with exportSha256, signature, presignedUrl (24h validity).
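
The rendering in step 2 and the exportSha256 in step 7 can be sketched as below. HSM signing and upload are left out. Compact separators, UTF-8 output, a trailing newline per line, and hashing the exact bytes that are uploaded are assumptions; render_jsonl and export_sha256 are hypothetical names.

```python
import hashlib
import json
from typing import List

# Field order from step 2.
FIELDS = ("type", "value", "action", "confidence", "sources", "ts")


def render_jsonl(entries: List[dict]) -> bytes:
    """Render one JSON object per line, restricted to the step-2 fields."""
    lines = [
        json.dumps({k: e[k] for k in FIELDS},
                   separators=(",", ":"), ensure_ascii=False)
        for e in entries
    ]
    if not lines:
        return b""
    return ("\n".join(lines) + "\n").encode("utf-8")


def export_sha256(payload: bytes) -> str:
    """Hex digest of the rendered payload."""
    return hashlib.sha256(payload).hexdigest()
```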

UC-BlocklistAdminCRUD (REST handlers)

| Endpoint | Caller | Effect |
|---|---|---|
| POST /v1/admin/firewall/blocklists/{id}/entries | tns-admin | Insert entry with source='OPERATOR_MANUAL'; emit firewall.blocklist.changed.v1 |
| DELETE /v1/admin/firewall/blocklists/{id}/entries/{entryId} | tns-admin | Soft-delete (active=FALSE); emit event |
| POST /v1/admin/firewall/blocklists/{id}/entries:bulk | tns-admin | Atomic batch insert (≤ 10000 entries); per-entry validation; rollback on first failure |
| GET /v1/admin/firewall/blocklist/{entryId}/history | tns-admin / regulator-auditor | Chronological events from firewall.blocklist_audit: created, source_added, source_removed, confidence_changed, manually_overridden, deactivated, reactivated |

All admin operations are audited to firewall.blocklist_audit with actor, timestamp (UTC µs), reason.


UC-AuditQuery (REST handler)

Endpoint: GET /v1/admin/firewall/audit?from={iso8601}&to={iso8601}&direction=&verdict=&srcMsisdn=&dstMsisdn=&mnoBindId=&peerAsn=&cursor=&limit=

Caller: tns-admin, regulator-auditor.

Steps:

  1. RBAC enforcement — regulator-auditor cannot specify srcMsisdn/dstMsisdn filters narrower than 3 digits to prevent targeted enumeration; results are masked.
  2. Cursor-paginated query against firewall.audit with partition pruning (use verdict_at first in WHERE clause).
  3. Mask MSISDNs in response according to caller role:
    • tns-admin: full MSISDN visible
    • regulator-auditor: +CCNNN*** masked
  4. Audit the audit query itself (meta-audit) into firewall.access_log.
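
The role-based masking in step 3 might look like the sketch below. It reads the +CCNNN*** pattern as: keep the plus sign, the country code and the next three digits, then append a literal "***". The country-code length is passed in (default 2, as for +93) because deriving it from the number needs a dialling-plan table. Masking every role other than tns-admin is an assumption, as is the function name.

```python
def mask_msisdn(msisdn: str, role: str, cc_len: int = 2) -> str:
    """Return the MSISDN as the given role is allowed to see it."""
    if role == "tns-admin":
        return msisdn  # full MSISDN visible
    # Assumption: any other role gets the masked form.
    visible = 1 + cc_len + 3  # '+', country code, three national digits
    return msisdn[:visible] + "***"
```

For example, under these assumptions a regulator-auditor querying a row for +93701234567 would see +93701***.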

2. Rule Pipeline Ordering Rationale

The pipeline runs fast-path → medium-path → slow-path to maximise early termination:

| Order | Check | Latency budget | Rationale |
|---|---|---|---|
| 1 | MAINTENANCE-mode short-circuit | < 0.1 ms | Single in-process flag |
| 2 | Verdict cache | 1 ms | Repeat-PDU dedup |
| 3 | Origin Bloom + PG fallthrough | 1 ms | 99% cases miss Bloom; no PG read |
| 4 | Geo (cached MCC/MNC table) | 2 ms | In-process map |
| 5 | Rate governor (Redis ZADD/ZCARD) | 5 ms | Sliding-window |
| 6 | DND Bloom + PG fallthrough | 2 ms | Same Bloom pattern |
| 7 | CEL content rules (priority order) | 15 ms | Sandboxed; per-rule 50 ms timeout |
| 8 | AIT / SIM-box signature lookup | 3 ms | Redis-cached patterns |
| 9 | Audit + outbox write | 2 ms | Transactional |
| | Total budget | 30 ms (P95) | |

Per SERVICE_OVERVIEW §11, PANIC mode disables rules of type REGEX and CLASSIFIER to keep the budget under 30 ms during an incident.
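
The early-termination idea behind this ordering can be shown with a minimal runner: checks execute cheapest first, and the first one that returns a block reason ends evaluation. This is a simplification under stated assumptions: it models only ALLOW and BLOCK (no cache, QUARANTINE or FLAG), and the check functions are hypothetical stand-ins for the real stages.

```python
from typing import Callable, List, Optional, Tuple

# A check returns a block reason, or None to let the PDU continue.
Check = Tuple[str, Callable[[dict], Optional[str]]]


def run_pipeline(ctx: dict, checks: List[Check]) -> dict:
    """Run checks in order and stop at the first block reason."""
    evaluated = []
    for name, check in checks:
        evaluated.append(name)
        reason = check(ctx)
        if reason is not None:
            # Later, more expensive checks are never reached.
            return {"verdict": "BLOCK", "blockReason": reason,
                    "evaluated": evaluated}
    return {"verdict": "ALLOW", "blockReason": None, "evaluated": evaluated}
```

With checks ordered as in the table, a PDU rejected by the origin blocklist never pays for the 15 ms content-rule stage.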


3. Quarantine Workflow (UC-QuarantineReview)

Auto-expiry: A cron worker every 5 minutes runs:

UPDATE firewall.quarantine_queue
SET status='AUTO_EXPIRED'
WHERE status='PENDING' AND expires_at < now()
RETURNING hold_id;

For each released holdId, emit firewall.quarantine.expired.v1. Distributed lock fw:quarantine:expiry:lock ensures single-leader.


4. Operating-Mode Transition Workflow

Manual switch:

  1. NOC operator A calls POST /v1/admin/firewall/mode {targetMode, reason} with their JWT. Service stores pending request keyed by (targetMode, requesterId) with 60 s TTL.
  2. Within 60 s, operator B calls same endpoint with {targetMode, reason, secondApproverToken: <A's request id>}. If B.userId != A.userId → mode switches.
  3. Single approver → HTTP 412 DUAL_APPROVAL_REQUIRED.
  4. Switch persists to firewall.operating_mode table; broadcast via firewall.mode.changed.v1; all replicas hot-reload mode within 5 s.
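
The dual-approval rule can be sketched with an in-memory pending store. For simplicity the sketch keys pending requests by a generated request id (the value operator B passes as secondApproverToken) rather than by (targetMode, requesterId), starts in NORMAL, and returns False where the service would answer HTTP 412. These choices and the class name are assumptions.

```python
import uuid


class ModeSwitch:
    """In-memory sketch of the two-operator mode switch."""

    TTL_SECONDS = 60

    def __init__(self) -> None:
        self.mode = "NORMAL"  # assumed starting mode
        self._pending = {}  # request_id -> (target_mode, requester_id, created_at)

    def request(self, target_mode: str, requester_id: str, now: float) -> str:
        """Operator A opens a pending request and receives its id."""
        request_id = uuid.uuid4().hex
        self._pending[request_id] = (target_mode, requester_id, now)
        return request_id

    def approve(self, request_id: str, target_mode: str,
                approver_id: str, now: float) -> bool:
        """Operator B confirms; False stands for DUAL_APPROVAL_REQUIRED."""
        pending = self._pending.get(request_id)
        if pending is None:
            return False
        pending_mode, requester_id, created_at = pending
        if now - created_at > self.TTL_SECONDS:
            del self._pending[request_id]  # expired
            return False
        if approver_id == requester_id or target_mode != pending_mode:
            return False
        del self._pending[request_id]
        self.mode = target_mode
        return True
```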

Auto-trip to PANIC:

  • Background watcher samples firewall_rule_eval_seconds{quantile="0.95"} every 10 s.
  • > 100 ms sustained for 60 s → atomically transition to PANIC; emit firewall.alert.mode.auto_panic.v1 (PagerDuty); disable rules with type IN ('REGEX', 'CLASSIFIER').

  • < 30 ms sustained 5 m → atomically transition back to NORMAL; emit firewall.mode.changed.v1.
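
The auto-trip behaves like a hysteresis loop: one threshold and hold time to enter PANIC, a lower threshold and a longer hold time to leave it. The sketch below assumes the trip condition is P95 above 100 ms for 60 s, that any sample outside the condition resets the streak, and that only NORMAL and PANIC are involved; the class name is hypothetical.

```python
class PanicWatcher:
    """Tracks sustained P95 breaches and recoveries from periodic samples."""

    def __init__(self, trip_ms: float = 100.0, trip_hold_s: float = 60.0,
                 clear_ms: float = 30.0, clear_hold_s: float = 300.0) -> None:
        self.trip_ms = trip_ms
        self.trip_hold_s = trip_hold_s
        self.clear_ms = clear_ms
        self.clear_hold_s = clear_hold_s
        self.mode = "NORMAL"
        self._since = None  # start time of the current streak, if any

    def sample(self, p95_ms: float, now: float) -> str:
        """Feed one sample (taken every 10 s in the text); return the mode."""
        if self.mode == "NORMAL":
            in_streak = p95_ms > self.trip_ms
            hold, target = self.trip_hold_s, "PANIC"
        else:
            in_streak = p95_ms < self.clear_ms
            hold, target = self.clear_hold_s, "NORMAL"
        if not in_streak:
            self._since = None  # streak broken, start over
        else:
            if self._since is None:
                self._since = now
            if now - self._since >= hold:
                self.mode, self._since = target, None
        return self.mode
```

The gap between the 100 ms trip level and the 30 ms clear level, together with the longer clear hold, keeps the mode from flapping when latency hovers near a single threshold.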

5. Background Workers

| Worker | Schedule | Purpose | Distributed lock |
|---|---|---|---|
| RuleCacheRefreshWorker | Every 60s + on firewall.rule.changed.v1 | Reload active rules into process cache | None (idempotent) |
| BloomRebuildWorker | On-demand + nightly 02:30 Asia/Kabul | Rebuild blocklist + DND Bloom filters | fw:bloom:rebuild:lock |
| QuarantineExpiryWorker | Every 5 min | Auto-expire PENDING holds past expires_at | fw:quarantine:expiry:lock |
| FederationExportWorker | Daily 02:00 Asia/Kabul | Sign + publish blocklist diff | fw:fed:export:lock |
| AuditVerifierWorker | Daily 03:30 Asia/Kabul | Verify hash-chain integrity in yesterday's audit partition; emit firewall.audit.chain.verified.v1 or chain-break alert | fw:audit:verify:lock |
| AuditArchiveWorker | Daily 03:00 Asia/Kabul | Export yesterday's audit partition to MinIO Parquet+zstd, HSM-sign, Object-Lock 7y | fw:audit:archive:lock |
| PartitionMaintenanceWorker | Daily 02:00 Asia/Kabul | Pre-create next 3 monthly partitions for firewall.audit | fw:partition:lock |
| PeerHygieneScoreWorker | Every 5 min | Recompute peer hygiene scores; trigger quarantine if score < 30 | fw:peer:score:lock |
| ConsentDndConsumer | Continuous | Materialise DND snapshot from consent.dnd.snapshot.v1 | NATS durable consumer (queue group) |
| FraudIntelConsumer | Continuous | Process fraud.detected.* | NATS durable consumer |
| RegulatorBlocklistConsumer | Continuous | Process regulator.blocklist.published.v1 | NATS durable consumer |
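
The hash-chain check performed by AuditVerifierWorker can be illustrated as follows. The actual rowHash formula is not given in this document, so the sketch assumes rowHash = sha256(prevHash + canonical JSON of the row) with an all-zero genesis hash; the function names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed prevHash for the first row


def row_hash(prev_hash: str, row: dict) -> str:
    """Hash the previous hash together with a canonical form of the row."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_row(chain: List[dict], row: dict) -> None:
    """Append a row, linking it to the current tail of the chain."""
    prev = chain[-1]["rowHash"] if chain else GENESIS
    chain.append({"row": row, "prevHash": prev,
                  "rowHash": row_hash(prev, row)})


def first_break(chain: List[dict]) -> Optional[int]:
    """Return the index of the first inconsistent entry, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if (entry["prevHash"] != prev
                or entry["rowHash"] != row_hash(prev, entry["row"])):
            return i
        prev = entry["rowHash"]
    return None
```

Editing any stored row changes its recomputed hash, so the verifier reports the first index at which the stored and recomputed values diverge.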