SMS Firewall Service — Jira-Ready Epics & User Stories

Status: populated
Owner: Trust & Safety
Last updated: 2026-04-20
Service prefix: FW
Scope: National perimeter firewall — inbound MO + transit MT, AIT detection, SIM-box exclusion, grey-route exclusion, DND enforcement, regulator + cross-MNO federation, admin REST + audit log.
Source of truth: _sources/sms-firewall-service/user_stories.md


Epic Summary

Epic ID | Title | Stories | Points
EP-FW-01 | Inbound MO Firewall (origin, content, rate, geo) | US-FW-001 – US-FW-006 | 36
EP-FW-02 | Transit MT Firewall (peer hygiene, grey-route exclusion) | US-FW-007 – US-FW-011 | 31
EP-FW-03 | National Blocklist Federation (regulator + cross-MNO) | US-FW-012 – US-FW-015 | 18
EP-FW-04 | Firewall Admin REST + Audit Log | US-FW-016 – US-FW-019 | 21
Total | | 19 stories | 106

EP-FW-01 · Inbound MO Firewall

Context: Per ADR-0004 §3, every inbound deliver_sm PDU from any MNO bind must be evaluated by the firewall before any downstream service (consent-ledger, fraud-intel, routing-engine) sees it. Verdict latency budget is 30 ms P95.

US-FW-001 · Synchronous Inbound MO Verdict gRPC

Type: Feature | Points: 8

Description: As smpp-connector-{mno}-rx, I need to call sms-firewall-service.FilterInbound(MoContext) over mTLS gRPC and receive a verdict (ALLOW | FLAG | BLOCK | QUARANTINE) before responding to the MNO with deliver_sm_resp.

Acceptance Criteria:

  • FilterInbound returns within P95 ≤ 30 ms on port 50061
  • Verdict enum: ALLOW | FLAG | BLOCK | QUARANTINE; BLOCK includes blockReason (ORIGIN_BLOCKLIST | CONTENT_FORBIDDEN | RATE_EXCEEDED | GEO_FORBIDDEN | DND_PRESENT | AIT_SIGNATURE | SIMBOX_SIGNATURE | REGULATOR_BLOCK) and ruleHit.ruleId
  • QUARANTINE returns holdId (UUID v4) and inserts firewall.quarantine_queue row with expires_at = now() + 24h
  • Caller SVID validated against spiffe://ghasi/np-data/smpp-connector-* allow-list; non-matching → PERMISSION_DENIED
  • MAINTENANCE mode returns ALLOW + flags=["MAINTENANCE_MODE"] and audit-logs the bypass
  • Integration test: 1000 calls at 200 RPS → P95 ≤ 30 ms with full rule pipeline active
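The verdict shape in the criteria above can be sketched as plain data types. This is a minimal illustration, not the service's actual protobuf contract: the names `FilterResult`, `quarantine`, `maintenance_bypass`, and `svid_allowed` are hypothetical, and the SVID check uses simple glob matching as a stand-in for whatever SPIFFE validation the service performs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from fnmatch import fnmatchcase
from typing import Optional
import uuid

# Allow-list pattern taken from the acceptance criteria.
SVID_ALLOW_PATTERN = "spiffe://ghasi/np-data/smpp-connector-*"


class Verdict(Enum):
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    BLOCK = "BLOCK"
    QUARANTINE = "QUARANTINE"


@dataclass
class FilterResult:  # hypothetical name, for illustration only
    verdict: Verdict
    block_reason: Optional[str] = None   # set only for BLOCK
    rule_id: Optional[str] = None        # ruleHit.ruleId
    hold_id: Optional[str] = None        # set only for QUARANTINE
    expires_at: Optional[datetime] = None
    flags: list = field(default_factory=list)


def quarantine(now: datetime) -> FilterResult:
    """QUARANTINE result with a UUID v4 holdId and expires_at = now + 24 h."""
    return FilterResult(
        verdict=Verdict.QUARANTINE,
        hold_id=str(uuid.uuid4()),
        expires_at=now + timedelta(hours=24),
    )


def maintenance_bypass() -> FilterResult:
    """MAINTENANCE mode: ALLOW with the MAINTENANCE_MODE flag set."""
    return FilterResult(verdict=Verdict.ALLOW, flags=["MAINTENANCE_MODE"])


def svid_allowed(svid: str) -> bool:
    """True when the caller SVID matches the connector allow-list pattern."""
    return fnmatchcase(svid, SVID_ALLOW_PATTERN)
```

A caller whose SVID fails `svid_allowed` would receive PERMISSION_DENIED before any rule runs; writing the `firewall.quarantine_queue` row and the audit log entry is left out of the sketch.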

US-FW-002 · Per-Source Sliding-Window Rate Governor

Type: Feature | Points: 5

Description: As the national perimeter, I need a sliding-window rate limit per srcMsisdn, per source aggregator, and per dstMsisdn so that a single compromised source cannot flood the platform.

Acceptance Criteria:

  • Default thresholds: 10/1s, 100/1m, 500/1h per srcMsisdn; configurable per scope in firewall.rate_overrides
  • Implementation: Redis sorted set fw:rate:src-msisdn:{e164}:{window} with ZADD + ZREMRANGEBYSCORE for true sliding behaviour
  • Threshold breach → BLOCK with blockReason = RATE_EXCEEDED
  • Redis unavailable → verdict ALLOW + flag=RATE_GOVERNOR_DEGRADED; metric firewall_rate_governor_skip_total
  • Tenant-allowlisted short-codes use elevated thresholds from override table
  • Unit test: 1000 events at 100 ms intervals; threshold 10/1s → first 10 ALLOW, remainder BLOCK
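The sliding-window behaviour described above can be modelled in memory. The sketch below is not the Redis implementation: a per-key deque stands in for the sorted set, trimming the deque plays the role of ZREMRANGEBYSCORE, and appending plays the role of ZADD. The class name `SlidingWindowGovernor` is hypothetical, and the choice not to record blocked events is an assumption the story does not settle.

```python
from collections import defaultdict, deque

# Default per-srcMsisdn thresholds from the story: (limit, window in ms).
DEFAULT_THRESHOLDS = [(10, 1_000), (100, 60_000), (500, 3_600_000)]


class SlidingWindowGovernor:  # hypothetical name, in-memory model only
    def __init__(self, thresholds=DEFAULT_THRESHOLDS):
        self.thresholds = thresholds
        # (key, window) -> timestamps of admitted events, oldest first
        self._events = defaultdict(deque)

    def check(self, key: str, now_ms: int) -> str:
        """Return "ALLOW" or "BLOCK" (blockReason RATE_EXCEEDED) for one event."""
        for limit, window in self.thresholds:
            q = self._events[(key, window)]
            # Drop timestamps that have slid out of the window.
            while q and q[0] <= now_ms - window:
                q.popleft()
            if len(q) >= limit:
                return "BLOCK"
        # Assumption: only admitted events count towards the windows.
        for _, window in self.thresholds:
            self._events[(key, window)].append(now_ms)
        return "ALLOW"
```

Integer millisecond timestamps avoid floating-point edge cases at window boundaries. A burst of 1000 events inside one second admits the first 10 and blocks the rest; once the oldest admitted event is a full second old, the next event is admitted again.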

US-FW-003 · National Blocklist Bloom-Filter Lookup

Type: Feature | Points: 5

Description: As the firewall evaluator, I need a Redis Bloom filter (BF.EXISTS fw:blocklist:national) for srcMsisdn and senderId lookups, with a Postgres fallthrough for definitive confirmation.

Acceptance Criteria:

  • Bloom filter capacity 10M, target false-positive rate 0.01
  • BF.EXISTS = 0 → not blocked, no Postgres read
  • BF.EXISTS = 1 → SELECT 1 FROM firewall.blocklist_entries WHERE entry = $1 AND active = TRUE for definitive verdict
  • BF read latency P99 ≤ 0.5 ms
  • Hot reload: firewall.blocklist.changed.v1 event → all replicas refresh within 5 s
  • Redis unavailable → direct Postgres query + firewall_bloom_unavailable_total increment
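The lookup order in these criteria (Bloom first, Postgres only on a positive or when Redis is down) can be sketched with injected callables. The function name `is_blocklisted` and its parameters are hypothetical; `bloom_exists` stands in for BF.EXISTS, `pg_confirm` for the Postgres query, and `ConnectionError` is an assumed signal for Redis being unavailable.

```python
from typing import Callable, Optional


def is_blocklisted(
    entry: str,
    bloom_exists: Callable[[str], bool],   # stand-in for BF.EXISTS fw:blocklist:national
    pg_confirm: Callable[[str], bool],     # stand-in for the definitive Postgres query
    on_bloom_unavailable: Optional[Callable[[], None]] = None,
) -> bool:
    """Return True only when Postgres confirms an active blocklist entry."""
    try:
        maybe_present = bloom_exists(entry)
    except ConnectionError:
        # Redis unavailable: count it and query Postgres directly.
        if on_bloom_unavailable is not None:
            on_bloom_unavailable()  # e.g. increment firewall_bloom_unavailable_total
        return pg_confirm(entry)
    if not maybe_present:
        # A Bloom filter has no false negatives, so skip the Postgres read.
        return False
    # A positive may be a false positive, so confirm it.
    return pg_confirm(entry)
```

Because a Bloom filter can report false positives but never false negatives, the Postgres read is needed only on the positive branch, which keeps the common path to a single sub-millisecond Redis call.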

US-FW-004 · Content-Class Rule Evaluation (CEL Expressions)

Type: Feature | Points: 8

Description: As a Trust & Safety lead, I need to author content-class rules using sandboxed CEL-style expressions referencing typed inputs (pdu.body, pdu.coding, src.msisdn, mno.id, peer.asn, consent.dndPresent).

Acceptance Criteria:

  • CEL grammar with allowed functions: matches(re), contains(s), startsWith(s), endsWith(s), len(), comparison + boolean operators
  • Typed inputs validated at admission; pdu.foo → HTTP 400 RULE_INVALID_INPUT_REF
  • Unsafe expressions (os.system, file IO, network) → HTTP 422 RULE_UNSAFE_EXPRESSION
  • Per-rule eval timeout 50 ms; exceeded → auto-disable and firewall.rule.degraded.v1 with reason = REGEX_TIMEOUT
  • Rule publish → firewall.rule.changed.v1 → all replicas hot-reload within 5 s
  • Match fires BLOCK | QUARANTINE | FLAG per action; ruleHit.ruleId populated
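Admission-time validation of input references might look like the sketch below. It is a deliberately simplified text scan, not a CEL parser: a real implementation would walk the parsed expression tree. The function name `validate_input_refs` is hypothetical, string literals with escaped quotes are not handled, and treating any unknown root namespace (such as `os`) as unsafe is an assumption about how the two error codes divide.

```python
import re

# Typed inputs listed in the story description.
ALLOWED_INPUTS = {
    "pdu.body", "pdu.coding", "src.msisdn",
    "mno.id", "peer.asn", "consent.dndPresent",
}
KNOWN_ROOTS = {ref.split(".")[0] for ref in ALLOWED_INPUTS}

# Simplification: quoted literals without escaped quotes.
_STRING_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")
# A dotted reference that does not continue an earlier dotted chain.
_DOTTED_REF = re.compile(r"(?<![\w.])([A-Za-z_]\w*)\.([A-Za-z_]\w*)")


def validate_input_refs(expression: str):
    """Return None when all references are valid, else (http_status, error_code)."""
    # Blank out string literals so text such as 'pdu.foo' inside a regex is ignored.
    stripped = _STRING_LITERAL.sub("''", expression)
    for root, attr in _DOTTED_REF.findall(stripped):
        if root not in KNOWN_ROOTS:
            return (422, "RULE_UNSAFE_EXPRESSION")
        if f"{root}.{attr}" not in ALLOWED_INPUTS:
            return (400, "RULE_INVALID_INPUT_REF")
    return None
```

For example, `pdu.body.matches('win.*prize')` passes because only `pdu.body` is read as an input reference, while `pdu.foo == 1` is rejected with `RULE_INVALID_INPUT_REF`.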

US-FW-005 · Geo / Origin Mismatch Detection

Type: Feature | Points: 5

Description: As the national perimeter, I need to reject inbound MO PDUs whose srcMsisdn country code does not match the permitted MCC/MNC for the originating MNO bind.

Acceptance Criteria:

  • firewall.mno_bind_registry row with permittedCountryCodes JSONB array consulted per call
  • Mismatch (+1... over permittedCountryCodes = ['+93']) → BLOCK with blockReason = GEO_FORBIDDEN
  • number-intelligence-service.Lookup cross-check used when available; UNAVAILABLE → fall back to MCC/MNC table + firewall_numint_unavailable_total
  • Per-bind override via PATCH /v1/admin/firewall/mno-binds/{id} audit-logged
  • Integration test: spoofed +1 source over AWCC bind → BLOCK + correct audit row
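The core mismatch check reduces to a prefix comparison against the bind's permitted country codes. This is a minimal sketch under that assumption; the function name `geo_verdict` is hypothetical, and the number-intelligence cross-check and per-bind overrides are left out.

```python
def geo_verdict(src_msisdn: str, permitted_country_codes: list) -> tuple:
    """Return (verdict, blockReason) for one inbound MO source number.

    permitted_country_codes mirrors the permittedCountryCodes JSONB array,
    for example ["+93"]. Numbers are assumed to be in E.164 form.
    """
    if any(src_msisdn.startswith(cc) for cc in permitted_country_codes):
        return ("ALLOW", None)
    return ("BLOCK", "GEO_FORBIDDEN")
```

With `permitted_country_codes = ["+93"]`, a source beginning with `+1` yields `("BLOCK", "GEO_FORBIDDEN")`, matching the spoofed-source integration test described above.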

US-FW-006 · National DND Snapshot Materialisation

Type: Feature | Points: 5

Description: As the firewall data-plane, I need to consume the hourly consent.dnd.snapshot.v1 event from consent-ledger-service and materialise the national DND list into Redis Bloom + Postgres firewall.dnd_snapshot.

Acceptance Criteria:

  • Event payload includes snapshotUrl (MinIO presigned, 24 h validity), entryCount, snapshotSha256
  • Snapshot rebuild completes within 60 s of event ACK; failure → previous snapshot remains active + staleness metric
  • DND check: BF.EXISTS fw:dnd:bloom then Postgres firewall.dnd_snapshot for definitive
  • Snapshot age > 6 h → firewall.alert.dnd.snapshot.stale.v1
  • Bloom auto-resize when entryCount > current_capacity × 0.8
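Three of the checks above are simple predicates and can be sketched directly: integrity of the downloaded snapshot against `snapshotSha256`, the 6 h staleness threshold, and the 80% resize trigger. The function names are hypothetical, and the download from the presigned URL and the Bloom rebuild itself are not shown.

```python
import hashlib

STALE_AFTER_SECONDS = 6 * 3600  # snapshot age > 6 h raises the stale alert


def snapshot_is_valid(payload: bytes, snapshot_sha256: str) -> bool:
    """Compare the downloaded bytes against the snapshotSha256 from the event."""
    return hashlib.sha256(payload).hexdigest() == snapshot_sha256.lower()


def snapshot_is_stale(age_seconds: int) -> bool:
    """True when the active snapshot is older than 6 h."""
    return age_seconds > STALE_AFTER_SECONDS


def bloom_needs_resize(entry_count: int, current_capacity: int) -> bool:
    """True when entryCount > current_capacity x 0.8.

    Written with integers (x * 5 > capacity * 4) to avoid float rounding
    exactly at the 80% boundary.
    """
    return entry_count * 5 > current_capacity * 4
```

If `snapshot_is_valid` returns False, the rebuild would be abandoned and the previous snapshot kept active, in line with the failure behaviour in the criteria.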

EP-FW-02 · Transit MT Firewall

Context: Inbound submit_sm from peer aggregators must be evaluated for ASN hygiene, sender-ID spoofing, and grey-route signature before reaching routing-engine.

US-FW-007 · EvaluateTransit gRPC for Peer-Aggregator MT

Type: Feature | Points: 8

Description: As smpp-connector-transit-rx, I need to call sms-firewall-service.EvaluateTransit(TransitMtContext) for every inbound submit_sm from a peer aggregator.

Acceptance Criteria:

  • gRPC EvaluateTransit returns within P95 ≤ 50 ms
  • Inputs: peerAsn, peerSystemId, srcAddr, dstMsisdn, senderId, pduBody, pduTon, pduNpi
  • Unknown peerAsn → BLOCK with blockReason = PEER_ASN_UNKNOWN
  • Sender-ID not in peer's allowlist → BLOCK with blockReason = SENDER_ID_SPOOFED
  • BLOCK verdict → connector returns submit_sm_resp command_status = ESME_RSUBMITFAIL
  • Firewall unavailable → connector fail-closes (ESME_RSUBMITFAIL + firewall.transit.unavailable.v1)
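The two peer checks and the connector's fail-closed mapping can be sketched as follows. The function names and the `peer_allowlists` structure are hypothetical. The status values are the standard SMPP v3.4 codes for ESME_ROK and ESME_RSUBMITFAIL. The story does not say which status FLAG or QUARANTINE verdicts map to, so accepting them here is an assumption.

```python
# SMPP v3.4 command_status values.
ESME_ROK = 0x00000000
ESME_RSUBMITFAIL = 0x00000045


def evaluate_transit(peer_asn: int, sender_id: str, peer_allowlists: dict) -> tuple:
    """Return (verdict, blockReason).

    peer_allowlists is a hypothetical mapping: peer ASN -> set of permitted sender IDs.
    """
    allowlist = peer_allowlists.get(peer_asn)
    if allowlist is None:
        return ("BLOCK", "PEER_ASN_UNKNOWN")
    if sender_id not in allowlist:
        return ("BLOCK", "SENDER_ID_SPOOFED")
    return ("ALLOW", None)


def command_status_for(call_firewall) -> int:
    """Connector side: map the firewall outcome to a submit_sm_resp status."""
    try:
        verdict, _reason = call_firewall()
    except ConnectionError:
        # Firewall unavailable: fail closed.
        return ESME_RSUBMITFAIL
    # Assumption: only BLOCK is rejected; other verdicts are accepted.
    return ESME_RSUBMITFAIL if verdict == "BLOCK" else ESME_ROK
```

Failing closed on transit traffic is the opposite of the inbound rate governor's degrade-to-ALLOW behaviour, which reflects that unvetted peer MT is treated as the higher risk.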

US-FW-008 · Grey-Route Signature Detection (HLR + ASN Heuristic)

Type: Feature | Points: 8

Description: As the national perimeter, I need to detect grey-route arbitrage by cross-checking dstMsisdn HLR/MNP resolution against the peer's ASN.

Acceptance Criteria:

  • dstMsisdn resolves via number-intelligence-service.Lookup to homeMnoId; mismatch with peer's registered peer_mno_routes → BLOCK with blockReason = GREY_ROUTE
  • Heuristic: peer with > 30% MT to non-peered MNO in last 1000 submissions → firewall.alert.greyroute.heuristic.v1 (warning only)
  • Consume fraud.detected.greyroute.v1: implicated peer added to firewall.peer_quarantine; subsequent MT auto-QUARANTINE
  • Heuristic disable via admin override → firewall.audit.policy.changed.v1 with operator identity
  • Integration test: peer ASN 64500 attempts MT to AWCC subscriber → BLOCK with blockReason = GREY_ROUTE
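The warning heuristic (more than 30% of a peer's last 1000 submissions going to a non-peered MNO) can be sketched with a bounded deque per peer. The class name `GreyRouteHeuristic` is hypothetical, and waiting for a full 1000-submission window before evaluating is an assumption; the story does not say how partial windows are treated.

```python
from collections import deque


class GreyRouteHeuristic:  # hypothetical name; one instance per peer
    def __init__(self, window: int = 1000, threshold_percent: int = 30):
        self.window = window
        self.threshold_percent = threshold_percent
        # True for each submission whose homeMnoId is not peered with this peer.
        self._non_peered = deque(maxlen=window)

    def record(self, home_mno_id: str, peered_mno_ids: set) -> bool:
        """Record one submission; return True when the warning should fire."""
        self._non_peered.append(home_mno_id not in peered_mno_ids)
        if len(self._non_peered) < self.window:
            return False  # assumption: evaluate only on a full window
        # Integer comparison of hits / window > threshold_percent / 100.
        return sum(self._non_peered) * 100 > self.threshold_percent * self.window
```

A True result corresponds to emitting `firewall.alert.greyroute.heuristic.v1`. It is a warning only, so the verdict for the submission itself is unchanged.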

US-FW-009 · Peer-Aggregator Hygiene Score

Type: Feature | Points: 5

Description: As a Trust & Safety analyst, I need a rolling 24 h hygiene score per peer aggregator combining sender-ID conformance, grey-route hits, and content violations.

Acceptance Criteria:

  • 5-minute rollup writes firewall.peer_hygiene_scores (peer_id, score 0-100, windowStart, windowEnd, sampleCount)
  • score < 60 for 3 consecutive windows → firewall.alert.peer.degraded.v1
  • score < 30 → auto-QUARANTINE peer; firewall.peer_quarantine.entered.v1
  • Auto-quarantine release requires dual-approval POST /v1/admin/firewall/peers/{id}/release
  • Score formula documented and reproducible from raw audit log
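The two thresholds above can be expressed as a small function over the chronological list of 5-minute window scores. The function name is hypothetical, the score formula itself is not specified in this story and is not reproduced here, and suppressing repeated alerts on later windows is left out.

```python
DEGRADED_EVENT = "firewall.alert.peer.degraded.v1"
QUARANTINE_EVENT = "firewall.peer_quarantine.entered.v1"


def events_for_latest_window(scores: list) -> list:
    """Events triggered by the newest score, given chronological window scores (0-100)."""
    events = []
    # score < 60 for 3 consecutive windows, ending at the newest one.
    if len(scores) >= 3 and all(s < 60 for s in scores[-3:]):
        events.append(DEGRADED_EVENT)
    # score < 30 in the newest window triggers auto-QUARANTINE.
    if scores and scores[-1] < 30:
        events.append(QUARANTINE_EVENT)
    return events
```

As written, the degraded alert fires again on every further window below 60, so a real rollup job would likely track whether the alert is already open.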

US-FW-010 · Sender-ID Origin Verification on Transit MT

Type: Feature | Points: 5

Description: As the national perimeter, I need to verify each transit MT senderId against sender-id-registry-service for ownership and status.

Acceptance Criteria:

  • gRPC sender-id-registry-service.Verify(senderId, peerId) called per transit MT
  • Ownership mismatch → BLOCK with blockReason = SENDER_ID_SPOOFED
  • Suspended sender-ID → BLOCK with blockReason = SENDER_ID_SUSPENDED
  • Unknown sender-ID (status = UNKNOWN) → QUARANTINE (NOC review)
  • Registry unavailable → fall back to local hourly cache firewall.peer_senderid_allowlist

US-FW-011 · Quarantine Queue with NOC Manual Release

Type: Feature | Points: 5

Description: As an NOC operator, I need to view the quarantine queue, inspect quarantined PDUs, and either release (re-evaluate) or permanently reject.

Acceptance Criteria:

  • GET /v1/admin/firewall/quarantine?status=PENDING&page=1&pageSize=50 → paginated list
  • GET /v1/admin/firewall/quarantine/{holdId} → full MoContext or TransitMtContext (PDU body redacted in logs)
  • POST /v1/admin/firewall/quarantine/{holdId}/release → re-evaluate; if ALLOW, re-inject via firewall.quarantine.released.v1
  • POST /v1/admin/firewall/quarantine/{holdId}/reject {reason} → firewall.quarantine.rejected.v1; 7-year cold retention
  • Auto-expiry at expires_at → firewall.quarantine.expired.v1

EP-FW-03 · National Blocklist Federation

Context: Per ADR-0004, the platform must consume regulator-issued blocklist updates and publish a daily signed diff back to peer Afghan MNOs.

US-FW-012 · National Blocklist Federation Import

Type: Feature | Points: 5

Description: As the national perimeter, I need to consume regulator.blocklist.published.v1 events from regulator-portal-service and import the regulator's mandated additions/removals.

Acceptance Criteria:

  • Event payload: entries: [{ type: 'MSISDN'|'SENDER_ID'|'KEYWORD', value, action: 'ADD'|'REMOVE', issuedBy, issuedAt, regulatorRef }]
  • Upsert into firewall.blocklist_entries with source = 'REGULATOR', regulator_ref
  • action = REMOVE → active = FALSE (soft delete; audit retained)
  • Idempotent on (source, regulator_ref, type, value) unique constraint
  • HSM signature validation; invalid → firewall.alert.federation.signature.invalid.v1 (PagerDuty)
  • Post-import: rebuild Bloom + emit firewall.blocklist.federated.v1 with counts
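The idempotency and soft-delete rules can be illustrated with an in-memory dictionary keyed on the same tuple as the unique constraint. This is only a model of the behaviour, not the Postgres upsert; the function name `apply_regulator_entries` and the shape of the returned counts are hypothetical, and signature validation is assumed to have passed already.

```python
def apply_regulator_entries(store: dict, entries: list) -> dict:
    """Apply ADD/REMOVE entries to store and return per-outcome counts.

    store maps (source, regulator_ref, type, value) -> active flag, mirroring
    the unique constraint on firewall.blocklist_entries.
    """
    counts = {"added": 0, "removed": 0, "unchanged": 0}
    for entry in entries:
        key = ("REGULATOR", entry["regulatorRef"], entry["type"], entry["value"])
        active = entry["action"] == "ADD"
        if store.get(key) == active:
            counts["unchanged"] += 1  # replayed event: nothing to do
            continue
        # REMOVE keeps the row and flips active to False (soft delete).
        store[key] = active
        counts["added" if active else "removed"] += 1
    return counts
```

Replaying the same event leaves the store untouched and reports every entry as unchanged, which is the property the unique constraint is there to guarantee.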

US-FW-013 · Cross-MNO Blocklist Federation Export

Type: Feature | Points: 5

Description: As carrier relations, I need a daily HSM-signed diff of firewall.blocklist_entries (with share_with_peers = TRUE) published to peer Afghan MNOs.

Acceptance Criteria:

  • Cron 02:00 Asia/Kabul; output firewall-federation-out/{yyyymmdd}.jsonl.sig (JSON Lines, HSM PKCS#11 signature)
  • firewall.federation.exported.v1 published with SHA-256, signature, presigned URL (24 h)
  • Mirror to regulator-mediated SFTP within 5 minutes of upload
  • Heartbeat firewall.federation.heartbeat.v1 even on zero-diff days
  • Integration test: signed file verifies against published HSM public key

US-FW-014 · Federated Entry Reputation and Confidence Scoring

Type: Feature | Points: 5

Description: As a Trust & Safety lead, I need per-entry source attribution and a confidence score so that single-source entries enter probation while multi-source entries auto-apply.

Acceptance Criteria:

  • firewall.blocklist_entries.sources JSONB: [{ sourceId, sourceType, reportedAt }]
  • confidence_score = clamp(regulator_count*1.0 + peer_count*0.5 + internal_count*0.7, 0, 1)
  • score >= 0.8 → auto_apply = TRUE
  • 0.4 <= score < 0.8 → 24 h probation: matches yield QUARANTINE instead of BLOCK
  • score < 0.4 → auto-deactivate + firewall.blocklist.entry.deactivated.v1
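A worked version of the formula and thresholds makes the tiers concrete. The function names and the tier labels returned by `disposition` are hypothetical; the weights and cut-offs are the ones stated above.

```python
def confidence_score(regulator_count: int, peer_count: int, internal_count: int) -> float:
    """clamp(regulator_count*1.0 + peer_count*0.5 + internal_count*0.7, 0, 1)."""
    raw = regulator_count * 1.0 + peer_count * 0.5 + internal_count * 0.7
    return max(0.0, min(1.0, raw))


def disposition(score: float) -> str:
    """Map a confidence score to the tier described in the criteria."""
    if score >= 0.8:
        return "AUTO_APPLY"   # auto_apply = TRUE
    if score >= 0.4:
        return "PROBATION"    # matches yield QUARANTINE, not BLOCK
    return "DEACTIVATE"       # firewall.blocklist.entry.deactivated.v1
```

With these weights a single regulator source scores 1.0 and auto-applies, a single peer (0.5) or single internal source (0.7) lands in probation, and one peer plus one internal source clamps from 1.2 to 1.0. Since the smallest non-zero score is 0.5, the deactivation tier is reached only when an entry has no remaining sources.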

US-FW-015 · Per-Entry Audit Trail for Blocklist Changes

Type: Feature | Points: 3

Description: As a regulator auditor, I need the full chain of additions, removals, source attributions, and operator decisions for any blocklist entry.

Acceptance Criteria:

  • GET /v1/admin/firewall/blocklist/{entryId}/history → chronological events: created, source_added, source_removed, confidence_changed, manually_overridden, deactivated, reactivated
  • Each event: actor (operator ID / SYSTEM / regulator ref), timestamp (UTC µs), reason
  • GET /v1/internal/firewall/blocklist/export?since={iso8601} → JSON Lines + HSM signature (regulator-only mTLS)
  • firewall.blocklist_audit append-only enforced by Postgres trigger blocking UPDATE/DELETE

EP-FW-04 · Firewall Admin REST + Audit Log

US-FW-016 · Admin REST: Rule CRUD with Versioning

Type: Feature | Points: 5

Description: As a Trust & Safety admin, I need to create, list, fetch, update, deactivate, and version firewall rules via authenticated REST.

Acceptance Criteria:

  • POST /v1/admin/firewall/rules (role tns-admin) → 201 with ruleId, version: 1
  • PUT /v1/admin/firewall/rules/{ruleId} → new immutable version row; version bumped
  • GET /v1/admin/firewall/rules?scope=MO&enabled=true → paginated
  • enabled = false → excluded from runtime
  • Non-admin token → 403
  • Pact contract test against admin-dashboard consumer

US-FW-017 · Admin REST: MNO Bind Registry

Type: Feature | Points: 3

Description: As carrier relations, I need to register mnoBindId ↔ mnoId ↔ permitted-country-codes ↔ permitted-sender-IDs mappings.

Acceptance Criteria:

  • POST /v1/admin/firewall/mno-binds with { mnoBindId, mnoId, direction, permittedCountryCodes[], permittedSenderIds[], notes } → 201
  • GET /v1/admin/firewall/mno-binds → list (no secrets)
  • DELETE /v1/admin/firewall/mno-binds/{id} → soft delete + firewall.mno_bind.deactivated.v1
  • POST /v1/internal/firewall/mno-binds/{id}/heartbeat from connector pods; missing > 60 s → firewall.alert.bind.missing.v1

US-FW-018 · Append-Only Audit Log to NATS + Postgres + Cold Archive

Type: Feature | Points: 8

Description: As the regulator-grade evidence pipeline, I need every verdict to produce an append-only firewall.audit.v1 event mirrored to Postgres and to MinIO WORM cold archive.

Acceptance Criteria:

  • NATS JetStream stream FIREWALL_AUDIT receives every verdict event with traceId, verdict, verdictAt, evaluatedRuleIds[], srcMsisdn, dstMsisdn, mnoBindId, peerAsn?, holdId?
  • Postgres firewall.audit_log partitioned PARTITION BY RANGE (verdict_at) monthly
  • Daily archive job 03:00 Asia/Kabul exports yesterday's partition to MinIO firewall-audit-archive/{yyyymmdd}.parquet.zst.sig (HSM-signed, Object Lock Compliance, 7-year retention)
  • FIREWALL_AUDIT mirrored to mzr and to dxb leaf
  • Postgres trigger blocks UPDATE/DELETE on firewall.audit_log

US-FW-019 · Operating-Mode Switch (NORMAL ↔ DEGRADED ↔ PANIC ↔ MAINTENANCE)

Type: Feature | Points: 5

Description: As NOC + Trust & Safety lead, I need to switch operating mode via dual-approval REST with full audit and auto-trip on latency breach.

Acceptance Criteria:

  • POST /v1/admin/firewall/mode { targetMode, reason, secondApproverToken } requires two distinct admin tokens within 60 s
  • Single approver → 412 DUAL_APPROVAL_REQUIRED
  • targetMode = PANIC → disables type IN ('REGEX','CLASSIFIER') rules at runtime; firewall_mode_panic_active = 1
  • Auto-trip: firewall_rule_eval_seconds{quantile="0.95"} > 100 ms for 60 s → auto-PANIC + firewall.alert.mode.auto_panic.v1 (PagerDuty)
  • Auto-recovery: < 30 ms P95 sustained 5 m → auto-restore to NORMAL + event
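The auto-trip and auto-recovery rules form a small two-state machine driven by periodic P95 samples. The sketch below is an illustration under several assumptions: the class name `ModeAutoTrip` is hypothetical, samples are passed in milliseconds although the metric is in seconds, any sample that breaks a streak resets the timer, and manual switches plus the DEGRADED and MAINTENANCE modes are ignored.

```python
class ModeAutoTrip:  # hypothetical name; models NORMAL <-> PANIC only
    TRIP_MS, TRIP_HOLD_S = 100, 60         # P95 > 100 ms for 60 s -> PANIC
    RECOVER_MS, RECOVER_HOLD_S = 30, 300   # P95 < 30 ms for 5 min -> NORMAL

    def __init__(self):
        self.mode = "NORMAL"
        self._since = None  # start time of the current breach or healthy streak

    def observe(self, now_s: int, p95_ms: float) -> str:
        """Feed one P95 sample and return the mode after applying it."""
        if self.mode == "NORMAL":
            in_streak = p95_ms > self.TRIP_MS
            hold, target = self.TRIP_HOLD_S, "PANIC"
        elif self.mode == "PANIC":
            in_streak = p95_ms < self.RECOVER_MS
            hold, target = self.RECOVER_HOLD_S, "NORMAL"
        else:
            return self.mode
        if not in_streak:
            self._since = None  # streak broken: restart the timer
            return self.mode
        if self._since is None:
            self._since = now_s
        if now_s - self._since >= hold:
            self.mode, self._since = target, None
        return self.mode
```

The gap between the trip threshold (100 ms) and the recovery threshold (30 ms), together with the longer recovery hold, gives the switch hysteresis so it does not flap when latency hovers near a single limit.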