SMS Firewall Service — Security Model

Version: 1.0 | Status: Draft | Owner: Trust & Safety + Security | Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy
Related ADR: ADR-0004 §11–§12


1. Authentication

1.1 gRPC plane — FilterInbound, EvaluateTransit, CheckOutboundEgress, GetVerdict

  • mTLS required. The gRPC server accepts only connections presenting an X.509 client certificate signed by the platform CA and bearing a SPIFFE ID matching the per-RPC allowlist:
| RPC | Allowed peer SPIFFE IDs |
| --- | --- |
| FilterInbound | spiffe://ghasi/np-data/smpp-connector-{awcc,roshan,etisalat,mtn-af,salaam}-rx, -trx |
| EvaluateTransit | spiffe://ghasi/np-data/smpp-connector-transit-rx |
| CheckOutboundEgress | spiffe://ghasi/np-ctrl/routing-engine, spiffe://ghasi/np-ctrl/channel-router-service |
| GetVerdict | spiffe://ghasi/np-data/cdr-mediation-service, spiffe://ghasi/np-data/fraud-intel-service |
| RegisterBindHeartbeat | spiffe://ghasi/np-data/smpp-connector-* |
  • Client certs are issued by SPIRE (per ADR-0004 §12) and rotated every 1 hour. The server hot-reloads on file change via inotify on the SVID mount path.
  • Local dev bypass: setting GRPC_TLS_ENABLED=false turns off TLS for local development only; a start-up guard rejects it when NODE_ENV != 'development'.
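
The per-RPC allowlist above can be sketched as a small lookup. This is a minimal illustration, not the service's actual code: the function and constant names are hypothetical, a trailing `*` is assumed to be a prefix wildcard, and the `{...}-rx, -trx` notation is assumed to mean that each listed MNO has both an `-rx` and a `-trx` identity.

```typescript
// Illustrative per-RPC SPIFFE allowlist check (names are hypothetical).
const MNOS = ["awcc", "roshan", "etisalat", "mtn-af", "salaam"];

// Assumption: every listed MNO has an -rx and a -trx connector identity.
function inboundConnectorIds(): string[] {
  const ids: string[] = [];
  for (const mno of MNOS) {
    for (const mode of ["rx", "trx"]) {
      ids.push(`spiffe://ghasi/np-data/smpp-connector-${mno}-${mode}`);
    }
  }
  return ids;
}

const RPC_ALLOWLIST = new Map<string, string[]>([
  ["FilterInbound", inboundConnectorIds()],
  ["EvaluateTransit", ["spiffe://ghasi/np-data/smpp-connector-transit-rx"]],
  ["CheckOutboundEgress", [
    "spiffe://ghasi/np-ctrl/routing-engine",
    "spiffe://ghasi/np-ctrl/channel-router-service",
  ]],
  ["GetVerdict", [
    "spiffe://ghasi/np-data/cdr-mediation-service",
    "spiffe://ghasi/np-data/fraud-intel-service",
  ]],
  ["RegisterBindHeartbeat", ["spiffe://ghasi/np-data/smpp-connector-*"]],
]);

// Assumption: a trailing "*" is a prefix wildcard; anything else is an exact match.
function matchesPattern(spiffeId: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? spiffeId.startsWith(pattern.slice(0, -1))
    : spiffeId === pattern;
}

// Unknown RPC names are denied, so a new RPC is unreachable until it is listed.
function isPeerAllowed(rpc: string, spiffeId: string): boolean {
  const patterns = RPC_ALLOWLIST.get(rpc) ?? [];
  return patterns.some((pattern) => matchesPattern(spiffeId, pattern));
}
```

Denying unknown RPC names by default keeps the check fail-closed, in line with the posture described in §5.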

1.2 REST admin plane

  • Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). No JWT → 401 at the edge.
  • Kong forwards X-User-Id, X-Roles, X-Tenant-Id (null for platform-scoped admin), and X-Trace-Id headers. The firewall never parses JWTs directly — it trusts Kong.
  • For cross-region admin failover (mzr operating-mode handoff during kbl outage), Kong is region-pinned; the mzr Kong issues its own forwarding headers based on the same auth-service JWKS.

1.3 Internal NATS-event signing

  • Every event published to FIREWALL_* JetStream streams is signed with the firewall service's Ed25519 key (rotated quarterly via Vault).
  • Critical consumers (regulator-portal-service, cdr-mediation-service) verify the signature opportunistically; analytics-service does not (best-effort).
  • Inbound regulator.blocklist.published.v1 events are validated against the regulator's HSM public key (PKCS#11) before consumption — failure → firewall.alert.federation.signature.invalid.v1 (PagerDuty).
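
Event signing and consumer-side verification could look roughly like the sketch below, using Node's built-in `crypto` module. The envelope shape (`SignedEvent`) and the choice to sign `subject + "\n" + payload` are assumptions made for illustration; the source does not specify how the signature travels with the event. The demo uses a throwaway key pair, whereas the design above sources the key from Vault.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical envelope; the real wire format is not given in this document.
interface SignedEvent {
  subject: string;
  payload: string;   // serialized event body, signed byte-for-byte
  signature: string; // base64-encoded Ed25519 signature
}

function signEvent(subject: string, payload: string, privateKey: KeyObject): SignedEvent {
  const message = Buffer.from(`${subject}\n${payload}`, "utf8");
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, message, privateKey).toString("base64");
  return { subject, payload, signature };
}

function verifyEvent(signed: SignedEvent, publicKey: KeyObject): boolean {
  const message = Buffer.from(`${signed.subject}\n${signed.payload}`, "utf8");
  return verify(null, message, publicKey, Buffer.from(signed.signature, "base64"));
}

// Demo with a throwaway key pair.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const signedEvent = signEvent(
  "firewall.rule.changed.v1",
  JSON.stringify({ ruleId: "r-1" }),
  privateKey,
);
// Changing the payload after signing makes verification fail.
const tamperedEvent = { ...signedEvent, payload: signedEvent.payload.replace("r-1", "r-2") };
```

Binding the subject into the signed message is one way to stop a valid signature from being replayed under a different event type.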

2. Authorisation (RBAC)

Role definitions live in auth-service; the firewall enforces scope/role checks at the handler boundary using NestJS Guards.

| Role | Capabilities |
| --- | --- |
| tns-admin | Full CRUD on rules, blocklists, MNO binds, peer aggregators; mode switching (one of two approvers); decrypt quarantine PDU bodies; full MSISDN visibility |
| noc | Read quarantine queue; release/reject single quarantine entries (one of two approvers for release); read audit log; mode switching (second approver) |
| carrier-relations | CRUD on MNO bind registry, peer aggregators, peer routes; read-only on rules/blocklists |
| regulator-auditor | Read audit log + blocklist history + federation log; MSISDN-masked responses; cannot view PDU bodies; cannot modify any state |
| tns-reader | Read-only on rules + blocklists; no PII access |
| service (programmatic) | gRPC SVID-authenticated; per-RPC allowlist (see §1.1) |

Enforcement layers

  1. Kong: rejects un-authenticated requests at the edge.
  2. NestJS @RequireRoles(...) decorator + RoleGuard: rejects insufficient roles with 403 INSUFFICIENT_SCOPE before handler entry.
  3. Postgres RLS on firewall.quarantine_queue: app.caller_role set per request; only noc/tns-admin may SELECT.
  4. Body-decryption interceptor: PDU plaintext decryption (Vault Transit unwrap of the DEK) happens only inside handlers gated by a role check; an ESLint rule forbids any code path that would log plaintext.
  5. MSISDN-masking interceptor: regulator-auditor responses run through the masker (+CCNNN***); platform-internal scopes pass through.
  6. Dual-approval check: mode-switch and quarantine-release endpoints require approvals from two distinct user IDs within 60 s, enforced in the use-case handler.
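
The dual-approval condition in layer 6 reduces to a small predicate. The sketch below is illustrative only: the `Approval` shape and function name are hypothetical, and treating the 60 s boundary as inclusive is an assumption.

```typescript
// Hypothetical approval record; field names are not taken from the source.
interface Approval {
  userId: string;
  approvedAtMs: number; // epoch milliseconds
}

const APPROVAL_WINDOW_MS = 60 * 1000;

// True when the two approvals come from distinct users and are at most 60 s apart.
// Assumption: the window boundary is inclusive.
function isDualApproved(first: Approval, second: Approval): boolean {
  const distinctUsers = first.userId !== second.userId;
  const withinWindow =
    Math.abs(second.approvedAtMs - first.approvedAtMs) <= APPROVAL_WINDOW_MS;
  return distinctUsers && withinWindow;
}
```

A production handler would also need to check that each approver holds a role entitled to approve that action (see the role table above); that part is omitted here.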

3. Data protection

3.1 PII inventory & classification

| Field | Classification | Storage | Transit |
| --- | --- | --- | --- |
| firewall.audit.src_msisdn / dst_msisdn | CONFIDENTIAL (E.164 PII) | Plain in Postgres (TDE + disk encryption); masked in REST responses to regulator-auditor; masked in NATS events to all consumers | TLS 1.3 only |
| firewall.quarantine_queue.original_pdu_cipher | RESTRICTED (full PDU body) | AES-256-GCM at rest; per-MNO KEK in Vault Transit (transit/ghasi-firewall-mno-{mnoId}); decrypted only inside noc-gated handlers | TLS 1.3 |
| firewall.audit.pdu_body_sha256 | INTERNAL | One-way hash; no PII recoverable | TLS 1.3 |
| firewall.blocklist_entries.value (when type=MSISDN) | CONFIDENTIAL | Plain; masked to regulator-auditor in API | TLS 1.3 |
| firewall.dnd_snapshot.msisdn | CONFIDENTIAL | Plain; never exposed via REST (internal mesh-only) | TLS 1.3 |
| Rule evidence strings (in ruleHits) | CONFIDENTIAL | Redacted at write-time (matched span replaced with *** + 4-char prefix/suffix context) | TLS 1.3 |
| Federation export (JSON Lines + .sig) | INTERNAL | Plain in MinIO Object-Lock; signature non-secret | TLS 1.3 |

3.2 Encryption keys

| Key | Store | Rotation | Algorithm |
| --- | --- | --- | --- |
| Per-MNO KEK for quarantine PDU encryption | Vault Transit | 365 days OR on incident | AES-256 wrap (Vault Transit) |
| DEK per-quarantine row | Wrapped inline; unwrapped per read via Vault Transit | Implicit (per-row) | AES-256-GCM |
| mTLS server cert | SPIRE (workload SVID) | 1 hour | EC P-256 |
| mTLS client cert (in caller pods) | SPIRE | 1 hour | EC P-256 |
| Federation export PKCS#11 HSM signing key | Hardware HSM (Thales Luna or AWS CloudHSM in dxb) | 365 days | RSA-3072 PKCS#1v1.5 |
| Event-signing Ed25519 key | Vault KV (rotated cryptographically) | 90 days | Ed25519 |
| Postgres TDE master | KMS (Vault Transit + cluster-managed) | 365 days | AES-256 |
| MinIO archive encryption | KMS (per-bucket); per-day data key rotated by HSM | Daily | AES-256-GCM |
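
The KEK/DEK relationship in the first two rows is a standard envelope-encryption pattern, sketched below with Node's built-in `crypto` module. The `KeyWrapper` interface is a hypothetical, synchronous stand-in for Vault Transit (a real client would be asynchronous), `FakeKeyWrapper` exists only to make the example runnable and provides no protection, and the `SealedPdu` layout is an assumption. The KEK name follows the `transit/ghasi-firewall-mno-{mnoId}` pattern from §3.1.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stand-in for Vault Transit: wraps and unwraps a DEK under a named KEK.
interface KeyWrapper {
  wrap(kekName: string, dek: Buffer): Buffer;
  unwrap(kekName: string, wrappedDek: Buffer): Buffer;
}

// Assumed storage layout for one quarantine row.
interface SealedPdu {
  kekName: string;
  wrappedDek: Buffer;
  iv: Buffer;
  authTag: Buffer;
  ciphertext: Buffer;
}

function sealPdu(pdu: Buffer, mnoId: string, kms: KeyWrapper): SealedPdu {
  const kekName = `transit/ghasi-firewall-mno-${mnoId}`;
  const dek = randomBytes(32); // fresh AES-256 key for this row
  const iv = randomBytes(12);  // 96-bit nonce, the customary size for GCM
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ciphertext = Buffer.concat([cipher.update(pdu), cipher.final()]);
  const authTag = cipher.getAuthTag();
  return { kekName, wrappedDek: kms.wrap(kekName, dek), iv, authTag, ciphertext };
}

function openPdu(sealed: SealedPdu, kms: KeyWrapper): Buffer {
  const dek = kms.unwrap(sealed.kekName, sealed.wrappedDek);
  const decipher = createDecipheriv("aes-256-gcm", dek, sealed.iv);
  decipher.setAuthTag(sealed.authTag);
  // final() throws if the ciphertext or tag was modified.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}

// In-memory fake for the example only; it offers no real key protection.
class FakeKeyWrapper implements KeyWrapper {
  private readonly store = new Map<string, Buffer>();

  wrap(kekName: string, dek: Buffer): Buffer {
    const handle = randomBytes(16);
    this.store.set(`${kekName}:${handle.toString("hex")}`, dek);
    return handle;
  }

  unwrap(kekName: string, wrappedDek: Buffer): Buffer {
    const dek = this.store.get(`${kekName}:${wrappedDek.toString("hex")}`);
    if (!dek) throw new Error(`cannot unwrap DEK under ${kekName}`);
    return dek;
  }
}
```

Because each row's DEK is wrapped under the KEK of its own MNO, a wrapped DEK presented under another MNO's KEK name fails to unwrap.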

3.3 Redaction rules

  • In events. MSISDNs appear as toMasked / srcMsisdnMasked (+CCNNN***). PDU body never appears (only pduBodySha256). Rule evidence strings are redacted with *** for the matched span.
  • In logs. Pino redactor masks all fields named srcMsisdn, dstMsisdn, senderId, pduBody. ESLint rule no-pdu-body-in-logs forbids logger.{info|warn|error}(..., { pduBody }) patterns at PR time.
  • In REST responses. A response interceptor applies role-based masking before serialization:
    • tns-admin: full visibility
    • noc: full MSISDNs but PDU body only via explicit decrypt endpoint
    • regulator-auditor: masked MSISDNs; no PDU body access
    • tns-reader: masked MSISDNs; no PDU body
  • In quarantine decrypt. The plaintext PDU body is returned only via GET /v1/admin/firewall/quarantine/{holdId} to noc/tns-admin; the decrypt action itself is audited in firewall.access_log.
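
The masking and evidence-redaction rules above can be sketched as pure functions. This is an illustration under stated assumptions: the function names are hypothetical, the country-code length is assumed to be supplied by the caller (2 for Afghanistan, +93), any role not listed as having full visibility is masked by default, and "4-char prefix/suffix context" is read as keeping up to four characters on each side of the matched span.

```typescript
// Keeps "+", the country code, and the next three digits; the rest becomes "***".
// Assumption: countryCodeLength is known to the caller (2 for +93).
function maskMsisdn(msisdn: string, countryCodeLength: number = 2): string {
  const digits = msisdn.replace(/\D/g, "");
  const visible = digits.slice(0, countryCodeLength + 3);
  return `+${visible}***`;
}

// Roles listed above as seeing full MSISDNs in REST responses.
const FULL_MSISDN_ROLES = new Set<string>(["tns-admin", "noc"]);

// Any role not explicitly listed gets the masked form (deny by default).
function presentMsisdn(msisdn: string, role: string): string {
  return FULL_MSISDN_ROLES.has(role) ? msisdn : maskMsisdn(msisdn);
}

// Replaces text[matchStart, matchEnd) with "***", keeping up to 4 characters
// of context on each side and dropping everything else.
function redactEvidence(text: string, matchStart: number, matchEnd: number): string {
  const prefix = text.slice(Math.max(0, matchStart - 4), matchStart);
  const suffix = text.slice(matchEnd, matchEnd + 4);
  return `${prefix}***${suffix}`;
}

const maskedExample = maskMsisdn("+93701234567"); // "+93701***"
const evidenceExample = redactEvidence("call 0700123456 today", 5, 15); // "all *** tod"
```

The mask always emits exactly three asterisks, so the output does not reveal the length of the original number.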

3.4 Classifier data residency

  • The CLASSIFIER LLM is strictly on-cluster (local-llm-service). External LLM calls are architecturally forbidden — the firewall binary refuses to start with EXTERNAL_LLM_ENABLED=true. See AI_INTEGRATION §10.
  • PII is anonymised before LLM inference (defence-in-depth even on-cluster).
  • Cache key is sha256(piiRedactedBody) — no PII recoverable from cache.
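
A start-up guard and the cache-key derivation might look like the following sketch. All function names are hypothetical, and `redactPii` is a deliberately crude placeholder (it only replaces long digit runs); the actual anonymisation step is specified elsewhere. The guard also covers the `GRPC_TLS_ENABLED` rule from §1.1, since both are checked at start-up.

```typescript
import { createHash } from "node:crypto";

type Env = Record<string, string | undefined>;

// Throws at start-up if a forbidden configuration is present.
function assertSafeStartup(env: Env): void {
  if (env["EXTERNAL_LLM_ENABLED"] === "true") {
    throw new Error("EXTERNAL_LLM_ENABLED=true is forbidden: the classifier must stay on-cluster");
  }
  if (env["GRPC_TLS_ENABLED"] === "false" && env["NODE_ENV"] !== "development") {
    throw new Error("GRPC_TLS_ENABLED=false is only allowed when NODE_ENV=development");
  }
}

// Placeholder redactor (assumption): replaces runs of 7 or more digits,
// optionally prefixed with "+", which is a rough proxy for MSISDNs.
function redactPii(body: string): string {
  return body.replace(/\+?\d{7,}/g, "<msisdn>");
}

// Cache key is the SHA-256 of the redacted body, so the raw body never
// contributes to the key.
function classifierCacheKey(body: string): string {
  return createHash("sha256").update(redactPii(body), "utf8").digest("hex");
}
```

Two messages that differ only in the redacted spans map to the same key, which is what lets the cache be shared without storing PII.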

4. Audit & integrity

4.1 Append-only verdict audit (firewall.audit)

  • Hash-chained. Every row carries prev_hash (= previous row's row_hash in the same partition) and row_hash (= SHA-256(prev_hash ‖ canonicalJson(row))).
  • Append-only at DB level. Postgres rules reject UPDATE/DELETE.
  • Offline integrity verification. AuditVerifierWorker runs daily at 03:30 Asia/Kabul over the prior day's partition; replays the chain end-to-end; on break, emits firewall.audit.chain.break.v1 (CRITICAL — PagerDuty + auto-engage Security incident response).
  • Cold archive. Daily Parquet+zstd export to MinIO firewall-audit-archive/{yyyymmdd}.parquet.zst.sig with HSM signature; Object Lock Compliance mode, 7-year retention.
  • Cross-region mirror. JetStream FIREWALL_AUDIT mirrored kbl→mzr; leaf-mirrored to dxb cold archive. Loss of kbl does not lose audit evidence.
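
The hash chain and its offline verification can be illustrated with a short sketch. Assumptions are labelled in the comments: rows are flat, canonical JSON is approximated by serializing key-sorted `[key, value]` pairs, and the first row of a partition chains from an all-zero seed. None of these details are confirmed by this document.

```typescript
import { createHash } from "node:crypto";

type AuditRow = Record<string, string | number | boolean | null>;

interface ChainedRow {
  row: AuditRow;
  prevHash: string;
  rowHash: string;
}

// Assumption: flat rows, canonicalized as key-sorted [key, value] pairs.
function canonicalJson(row: AuditRow): string {
  const entries = Object.entries(row).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(entries);
}

// row_hash = SHA-256(prev_hash ‖ canonicalJson(row))
function computeRowHash(prevHash: string, row: AuditRow): string {
  return createHash("sha256").update(prevHash + canonicalJson(row), "utf8").digest("hex");
}

// Assumed seed for the first row of a partition.
const GENESIS_HASH = "0".repeat(64);

function appendRow(chain: ChainedRow[], row: AuditRow): ChainedRow[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].rowHash : GENESIS_HASH;
  return [...chain, { row, prevHash, rowHash: computeRowHash(prevHash, row) }];
}

// Replays the chain; returns the index of the first broken link, or -1 if intact.
function firstBreak(chain: ChainedRow[]): number {
  let expectedPrev = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const link = chain[i];
    const linkOk =
      link.prevHash === expectedPrev &&
      link.rowHash === computeRowHash(link.prevHash, link.row);
    if (!linkOk) return i;
    expectedPrev = link.rowHash;
  }
  return -1;
}
```

Editing any stored row changes its recomputed hash, so the replay stops at that row; rewriting the stored hash instead breaks the link to the next row.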

4.2 Admin-action audit (firewall.blocklist_audit + firewall.outbox)

Every admin REST action emits firewall.rule.changed.v1 (or firewall.blocklist.changed.v1, etc.) with:

  • actorUserId
  • before / after snapshots (JSONB)
  • ip, userAgent
  • traceId
  • occurredAt (UTC µs)

Retention ≥ 13 months for admin events; permanent for blocklist audit (regulator dispute resolution).

4.3 Meta-audit (audit of audit reads)

When regulator-auditor queries /v1/admin/firewall/audit, the query itself is recorded into firewall.access_log. This deters and detects attempts to enumerate sensitive ranges by abusing read access.


5. Fail-closed posture

The firewall is the national perimeter — security is preserved by never letting an un-evaluated PDU through:

  • gRPC handler error / timeout → UNAVAILABLE → connector behaviour:
    • Transit MT → submit_sm_resp ESME_RSUBMITFAIL to peer (silent block; subscriber not impacted)
    • MO → connector writes to local-disk WAL and returns deliver_sm_resp ESME_ROK to the MNO (preserves subscriber relationship); replays once the firewall is back
  • Postgres unavailable → service degrades to Redis-only verdict cache; new evaluations after 60s of cache miss → INTERNAL → fail-closed
  • Vault Transit unavailable → quarantine inserts cannot encrypt → new QUARANTINE verdicts upgrade to BLOCK with flag=KEK_UNAVAILABLE (still fail-closed)
  • HSM unavailable → federation export postponed; emits firewall.federation.export.postponed.v1
  • Local LLM unavailable → CLASSIFIER rules skip; non-classifier rules continue (reduced detection coverage but firewall still operational)

Security implication: availability attacks against the firewall cannot weaken policy — they can at worst delay traffic.
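
Two of the rows above, the connector reaction to UNAVAILABLE and the QUARANTINE-to-BLOCK upgrade, can be sketched as pure decision functions. The type and function names are hypothetical, and the other rows (Postgres, HSM, local LLM) are not modelled here.

```typescript
type FirewallVerdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

interface VerdictDecision {
  verdict: FirewallVerdict;
  flag?: string;
}

// Vault Transit down: a QUARANTINE hold cannot be encrypted, so it is upgraded to BLOCK.
function applyKekAvailability(decision: VerdictDecision, vaultTransitUp: boolean): VerdictDecision {
  if (decision.verdict === "QUARANTINE" && !vaultTransitUp) {
    return { verdict: "BLOCK", flag: "KEK_UNAVAILABLE" };
  }
  return decision;
}

type TrafficDirection = "TRANSIT_MT" | "MO";

interface ConnectorAction {
  smppStatus: "ESME_RSUBMITFAIL" | "ESME_ROK";
  writeToWal: boolean; // park the PDU on local disk for later replay
  forward: boolean;    // whether the un-evaluated PDU is passed on
}

// What a connector does when the firewall call returns UNAVAILABLE.
function onFirewallUnavailable(direction: TrafficDirection): ConnectorAction {
  if (direction === "TRANSIT_MT") {
    // Silent block towards the peer.
    return { smppStatus: "ESME_RSUBMITFAIL", writeToWal: false, forward: false };
  }
  // MO: acknowledge to the MNO, park the PDU, replay once the firewall is back.
  return { smppStatus: "ESME_ROK", writeToWal: true, forward: false };
}
```

In both branches `forward` is false: the un-evaluated PDU is either rejected or parked, never delivered.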


6. Tenant isolation

  • Inbound MO is pre-tenant — there is no tenant ID on the verdict path. Isolation is by mnoBindId and srcMsisdn.
  • Egress DND check is per-tenant (called by routing-engine with tenantId); tenant-specific allowlists in firewall.rate_overrides provide isolation.
  • Per-MNO KEKs ensure that compromise of one MNO's key does not expose another MNO's quarantined PDUs.

7. Secrets

| Secret | Store | Injected as |
| --- | --- | --- |
| gRPC server SVID + key | SPIRE → tmpfs mount | File mount, hot-reloaded |
| gRPC client SVIDs (in caller pods) | SPIRE | File mount in caller pod |
| Postgres credentials | Vault DB dynamic secret (24h) | Env var (rotated by Vault Agent) |
| Redis credentials | Vault KV | Env var |
| NATS credentials | Vault KV | Env var |
| KEKs for quarantine (per MNO) | Vault Transit (referenced, not exported) | |
| HSM signer | PKCS#11 token (no plaintext export) | PKCS#11 URI in env |
| Event-signing Ed25519 key | Vault KV | Env var |
| Regulator HSM public key | Vault KV (rotated when regulator rotates) | File mount |

gitleaks and trufflehog pre-commit + CI scans block accidental commits. No secret is ever written to logs, events, or config files.


8. Threat model

| Threat | Mitigation |
| --- | --- |
| MNO bind compromise → attacker-controlled MO injection | mTLS SVID per connector; SPIFFE ID strict allowlist; per-bind concurrency cap 200 in-flight; rate governor per srcMsisdn; bind heartbeat watchdog (firewall.alert.bind.missing.v1) |
| Admin token leakage → unauthorised rule edits / blocklist tampering | MFA required for tns-admin (auth-service enforced); all admin actions audit-logged with before/after; firewall.rule.changed.v1 fans out to SOC channel; rule mutations require dual approval for action=ALLOW (whitelist) on inbound MO |
| Blocklist injection via federation source poisoning | HSM signature validation on regulator events; federation entries enter probation (QUARANTINE for 24h) until confidenceScore >= 0.8; per-source confidence_count capped (one source can't push score alone) |
| Bloom-filter poisoning | Bloom is rebuilt from authoritative firewall.blocklist_entries table (which is hash-audited); a poisoned Bloom only causes false-positive Postgres reads (no false-negative); rebuild verifies row-count delta < 5% per cycle |
| Rule expression DoS (regex catastrophic backtracking) | re2 engine (linear time, no backtracking); pattern length cap 500 chars; per-rule eval timeout 50 ms; auto-disable on timeout via firewall.rule.degraded.v1; pattern admission ReDoS screen tests against known catastrophic patterns |
| CEL expression sandbox escape | Whitelisted function set; os.system/file IO/network forbidden; AST validation at admission rejects with HTTP 422 RULE_UNSAFE_EXPRESSION |
| Replay of cached verdict to bypass updated rules | Verdict cache TTL ≤ 60 s; on rule version bump, fw:verdict:* keys are invalidated by pattern deletion |
| Audit log tampering at DB level | Postgres rules reject UPDATE/DELETE; hash-chain verified offline daily; replication to mzr provides independent attest |
| Audit log tampering at archive | MinIO Object Lock Compliance mode (7y); HSM-signed Parquet exports; signatures verified by regulator on import |
| Compromised quarantine release endpoint | Dual-approval; both approvers' identities recorded in firewall.quarantine.released.v1; review notes mandatory |
| Lateral movement via Postgres credential | Vault dynamic credentials (24h); least-privilege DB role (no DROP, no superuser); pgaudit logs all credentialed sessions |
| Quarantine PDU exfiltration via /quarantine/{holdId} | RLS enforces noc/tns-admin only; decrypt action audited to firewall.access_log; PDU plaintext never logged; download endpoints are streaming (no response cache) |
| Inference-pipeline manipulation (poisoned classifier output) | Local LLM only; grammar-constrained JSON decode rejects malformed; classifier alone never escalates beyond FLAG; pairing-rule §5.1 in AI_INTEGRATION |

9. GDPR & regulatory

  • Right to erasure (GDPR Art. 17): the firewall does not own subscriber records. On auth.user.erased.v1, the firewall participates as follows:
    • Rows older than the regulatory floor (7 years): redacting dst_msisdn / src_msisdn in place in firewall.audit is impossible because of the append-only invariant. Instead, such rows are pseudonymised at archive time, in future archive partitions, using a per-tenant pseudonymisation key.
    • Rows within the 7-year regulatory window: no erasure (the regulator obligation overrides, per ATRA Art. 18).
  • Audit evidence window: ≥ 7 years (regulated). Regulator submissions via firewall.federation.exported.v1 and the /v1/internal/firewall/blocklist/export endpoint.
  • Data residency: all PII stays in Afghanistan (kbl + mzr regions). Cold archive in dxb (UAE leaf) is encrypted with a key escrowed in Afghan-operator-controlled HSM partition; UAE side cannot decrypt without Afghan key release.
  • Sub-processor list: none — all infra in-cluster.
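
Archive-time pseudonymisation could be sketched as below. This document only states that a per-tenant pseudonymisation key is used; the choice of HMAC-SHA256, the `psn_` prefix, the truncation to 32 hex characters, and both function names are assumptions made for illustration.

```typescript
import { createHmac } from "node:crypto";

const REGULATORY_FLOOR_YEARS = 7;

// True when the row is older than the 7-year regulatory floor relative to `now`.
function shouldPseudonymise(rowDate: Date, now: Date): boolean {
  const floor = new Date(now.getTime());
  floor.setUTCFullYear(floor.getUTCFullYear() - REGULATORY_FLOOR_YEARS);
  return rowDate.getTime() < floor.getTime();
}

// Assumption: a keyed HMAC-SHA256 serves as the pseudonymisation function.
// The same MSISDN and key always give the same pseudonym, so archived rows
// remain joinable without exposing the number itself.
function pseudonymiseMsisdn(msisdn: string, tenantKey: Buffer): string {
  const mac = createHmac("sha256", tenantKey).update(msisdn, "utf8").digest("hex");
  return `psn_${mac.slice(0, 32)}`;
}
```

With a keyed function, destroying the per-tenant key makes the pseudonyms unlinkable to the original MSISDNs, which is one way the erasure intent can be met without rewriting archived data.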

10. Security testing

  • Contract tests per API_CONTRACTS §7.
  • Property-based rule-evaluator tests including known ReDoS patterns, CEL injection attempts, Unicode edge cases.
  • ZAP baseline + API scan on every main-branch build.
  • Quarterly external pen test scoped to firewall REST + gRPC surface.
  • Role-matrix integration test — every endpoint × every role × every PII field — verifies 200/403/redaction behaviour.
  • Hash-chain tamper-evidence test in CI: insert tampered row directly via raw SQL → AuditVerifierWorker emits chain.break within 1h.
  • Federation signature-injection test: corrupt 1 byte of regulator event → service rejects + emits signature.invalid event.
  • Secret scanning in CI (gitleaks); dependency scanning (osv-scanner); container scanning (trivy); SBOM generated per release.