SMS Firewall Service — Security Model

Version: 1.0
Status: Draft
Owner: Trust & Safety + Security
Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy
Related ADR: ADR-0004 §11–§12

1. Authentication

1.1 gRPC plane — `FilterInbound`, `EvaluateTransit`, `CheckOutboundEgress`, `GetVerdict`

mTLS required. The gRPC server accepts only connections presenting an X.509 client certificate signed by the platform CA and bearing a SPIFFE ID matching the per-RPC allowlist:

| RPC | Allowed peer SPIFFE IDs |
| --- | --- |
| `FilterInbound` | `spiffe://ghasi/np-data/smpp-connector-{awcc,roshan,etisalat,mtn-af,salaam}-rx`, `-trx` |
| `EvaluateTransit` | `spiffe://ghasi/np-data/smpp-connector-transit-rx` |
| `CheckOutboundEgress` | `spiffe://ghasi/np-ctrl/routing-engine`, `spiffe://ghasi/np-ctrl/channel-router-service` |
| `GetVerdict` | `spiffe://ghasi/np-data/cdr-mediation-service`, `spiffe://ghasi/np-data/fraud-intel-service` |
| `RegisterBindHeartbeat` | `spiffe://ghasi/np-data/smpp-connector-*` |
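The per-RPC allowlist above could be checked along the following lines. This is a minimal sketch, not the service's actual interceptor: the `RPC_ALLOWLIST` map, the `isPeerAllowed` helper, and the rule that a trailing `*` means "prefix match" are assumptions made for illustration, and only a subset of the table is reproduced.

```typescript
// Hypothetical allowlist mirroring part of the table above.
// Assumption: a trailing "*" is treated as a prefix wildcard.
const RPC_ALLOWLIST: Record<string, string[]> = {
  EvaluateTransit: ["spiffe://ghasi/np-data/smpp-connector-transit-rx"],
  CheckOutboundEgress: [
    "spiffe://ghasi/np-ctrl/routing-engine",
    "spiffe://ghasi/np-ctrl/channel-router-service",
  ],
  RegisterBindHeartbeat: ["spiffe://ghasi/np-data/smpp-connector-*"],
};

// Returns true only if the peer's SPIFFE ID matches an entry for this RPC.
function isPeerAllowed(rpc: string, peerSpiffeId: string): boolean {
  const patterns = RPC_ALLOWLIST[rpc] ?? []; // unknown RPC: deny by default
  return patterns.some((p) =>
    p.endsWith("*")
      ? peerSpiffeId.startsWith(p.slice(0, -1))
      : peerSpiffeId === p,
  );
}
```

An RPC that is missing from the map is denied, which keeps the check consistent with the fail-closed posture described later in this document.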

Client certs are issued by SPIRE (per ADR-0004 §12) and rotated every 1 hour. The server hot-reloads on file change via inotify on the SVID mount path.
A local-dev bypass via GRPC_TLS_ENABLED=false exists, but a start-up guard rejects it whenever NODE_ENV != 'development'.
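The start-up guard can be pictured as a small check that runs before the gRPC server binds. The sketch below is illustrative only: the function name `assertTlsConfig` and the error message are hypothetical, while the two environment variable names come from the text above.

```typescript
// Hypothetical start-up guard: throws if TLS is disabled outside development.
// The env map is passed in (e.g. process.env) so the check is easy to test.
function assertTlsConfig(env: Record<string, string | undefined>): void {
  const tlsDisabled = env["GRPC_TLS_ENABLED"] === "false";
  if (tlsDisabled && env["NODE_ENV"] !== "development") {
    throw new Error(
      "GRPC_TLS_ENABLED=false is only permitted when NODE_ENV=development",
    );
  }
}
```

Note that an unset NODE_ENV is treated the same as a non-development value, so a misconfigured deployment cannot fall through to plaintext gRPC.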

1.2 REST admin plane

Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). No JWT → 401 at the edge.
Kong forwards X-User-Id, X-Roles, X-Tenant-Id (null for platform-scoped admin), and X-Trace-Id headers. The firewall never parses JWTs directly — it trusts Kong.
For cross-region admin failover (mzr operating-mode handoff during kbl outage), Kong is region-pinned; the mzr Kong issues its own forwarding headers based on the same auth-service JWKS.

1.3 Internal NATS-event signing

Every event published to FIREWALL_* JetStream streams is signed with the firewall service's Ed25519 key (rotated quarterly via Vault).
Critical consumers (regulator-portal-service, cdr-mediation-service) verify the signature opportunistically; analytics-service does not (best-effort).
Inbound regulator.blocklist.published.v1 events are validated against the regulator's HSM public key (PKCS#11) before consumption — failure → firewall.alert.federation.signature.invalid.v1 (PagerDuty).
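As a rough illustration of the Ed25519 signing and verification described above, the following sketch uses Node's built-in `crypto` module. The helper names `signEvent` and `verifyEvent` are hypothetical, the key pair is generated in-process purely for the demo (the real key is managed via Vault), and `JSON.stringify` stands in for whatever canonical serialisation the event schema actually mandates.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical helper: signs the serialised payload, returns a base64 signature.
// Assumption: JSON.stringify is a placeholder for a canonical encoding.
function signEvent(payload: object, privateKey: KeyObject): string {
  const bytes = Buffer.from(JSON.stringify(payload));
  // For Ed25519 the algorithm argument must be null.
  return sign(null, bytes, privateKey).toString("base64");
}

// Hypothetical helper: returns true only if the signature matches the payload.
function verifyEvent(
  payload: object,
  signatureB64: string,
  publicKey: KeyObject,
): boolean {
  const bytes = Buffer.from(JSON.stringify(payload));
  return verify(null, bytes, publicKey, Buffer.from(signatureB64, "base64"));
}

// Demo-only key pair; not how the service obtains its key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
```

A consumer that verifies signatures would call `verifyEvent` before acting on the event and discard or alert on a `false` result.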

2. Authorisation (RBAC)

Role definitions live in auth-service; the firewall enforces scope/role checks at the handler boundary using NestJS Guards.

| Role | Capabilities |
| --- | --- |
| `tns-admin` | Full CRUD on rules, blocklists, MNO binds, peer aggregators; mode switching (one of two approvers); decrypt quarantine PDU bodies; full MSISDN visibility |
| `noc` | Read quarantine queue; release/reject single quarantine entries (one of two approvers for release); read audit log; mode switching (second approver) |
| `carrier-relations` | CRUD on MNO bind registry, peer aggregators, peer routes; read-only on rules/blocklists |
| `regulator-auditor` | Read audit log + blocklist history + federation log; MSISDN-masked responses; cannot view PDU bodies; cannot modify any state |
| `tns-reader` | Read-only on rules + blocklists; no PII access |
| `service` (programmatic) | gRPC SVID-authenticated; per-RPC allowlist (see §1.1) |

Enforcement layers

Kong: rejects un-authenticated requests at the edge.
NestJS @RequireRoles(...) decorator + RoleGuard: rejects insufficient roles with 403 INSUFFICIENT_SCOPE before handler entry.
Postgres RLS on firewall.quarantine_queue: app.caller_role set per request; only noc/tns-admin may SELECT.
Body-decryption interceptor: PDU plaintext decryption (Vault Transit unwrap of DEK) only inside handlers gated by role check; any code path that would log plaintext is forbidden by ESLint rule.
MSISDN-masking interceptor: regulator-auditor responses run through the masker (+CCNNN***); platform-internal scopes pass through.
Dual-approval check: mode-switch and quarantine-release endpoints require two distinct user IDs within 60 s — enforced in the use-case handler.
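The dual-approval rule (two distinct user IDs within 60 s) reduces to a small predicate. The sketch below is an assumption-laden illustration: the `Approval` shape and `isDualApproved` name are hypothetical, and the role pairing from the table above (for example `tns-admin` plus `noc` for mode switching) is deliberately left out.

```typescript
// Hypothetical record of one approver's action.
interface Approval {
  userId: string;
  approvedAtMs: number; // epoch milliseconds
}

const APPROVAL_WINDOW_MS = 60_000; // 60 s window from the text above

// True only when the approvals come from two different users
// and are no more than 60 s apart.
function isDualApproved(first: Approval, second: Approval): boolean {
  const distinctUsers = first.userId !== second.userId;
  const withinWindow =
    Math.abs(second.approvedAtMs - first.approvedAtMs) <= APPROVAL_WINDOW_MS;
  return distinctUsers && withinWindow;
}
```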

3. Data protection

3.1 PII inventory & classification

| Field | Classification | Storage | Transit |
| --- | --- | --- | --- |
| `firewall.audit.src_msisdn` / `dst_msisdn` | CONFIDENTIAL (E.164 PII) | Plain in Postgres (TDE + disk encryption); masked in REST responses to `regulator-auditor`; masked in NATS events to all consumers | TLS 1.3 only |
| `firewall.quarantine_queue.original_pdu_cipher` | RESTRICTED (full PDU body) | AES-256-GCM at rest; per-MNO KEK in Vault Transit (`transit/ghasi-firewall-mno-{mnoId}`); decrypted only inside `noc`/`tns-admin`-gated handlers | TLS 1.3 |
| `firewall.audit.pdu_body_sha256` | INTERNAL | One-way hash; no PII recoverable | TLS 1.3 |
| `firewall.blocklist_entries.value` (when type=MSISDN) | CONFIDENTIAL | Plain; masked to `regulator-auditor` in API | TLS 1.3 |
| `firewall.dnd_snapshot.msisdn` | CONFIDENTIAL | Plain; never exposed via REST (internal mesh-only) | TLS 1.3 |
| Rule `evidence` strings (in `ruleHits`) | CONFIDENTIAL | Redacted at write-time (matched span replaced with `***` + 4-char prefix/suffix context) | TLS 1.3 |
| Federation export (JSON Lines + .sig) | INTERNAL | Plain in MinIO Object-Lock; signature non-secret | TLS 1.3 |

3.2 Encryption keys

| Key | Store | Rotation | Algorithm |
| --- | --- | --- | --- |
| Per-MNO KEK for quarantine PDU encryption | Vault Transit | 365 days OR on incident | AES-256 wrap (Vault Transit) |
| DEK per-quarantine row | Wrapped inline; unwrapped per read via Vault Transit | Implicit (per-row) | AES-256-GCM |
| mTLS server cert | SPIRE (workload SVID) | 1 hour | EC P-256 |
| mTLS client cert (in caller pods) | SPIRE | 1 hour | EC P-256 |
| Federation export PKCS#11 HSM signing key | Hardware HSM (Thales Luna or AWS CloudHSM in `dxb`) | 365 days | RSA-3072 PKCS#1v1.5 |
| Event-signing Ed25519 key | Vault KV (rotated cryptographically) | 90 days | Ed25519 |
| Postgres TDE master | KMS (Vault Transit + cluster-managed) | 365 days | AES-256 |
| MinIO archive encryption | KMS (per-bucket); per-day data key rotated by HSM | Daily | AES-256-GCM |
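The KEK/DEK rows above describe envelope encryption: each quarantine row gets its own AES-256-GCM data key, and that key is stored wrapped under the per-MNO KEK. The sketch below illustrates the pattern with Node's `crypto` module. It is a simplification under stated assumptions: the KEK is a local buffer standing in for Vault Transit (where the KEK is never exported), and `seal`, `unseal`, `encryptPdu`, `decryptPdu`, and the row shape are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedBox { iv: Buffer; tag: Buffer; ciphertext: Buffer }
interface QuarantineRow { wrappedDek: SealedBox; body: SealedBox }

// AES-256-GCM encrypt with a fresh 96-bit IV.
function seal(key: Buffer, plaintext: Buffer): SealedBox {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// AES-256-GCM decrypt; final() throws if the auth tag does not verify.
function unseal(key: Buffer, box: SealedBox): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}

// Envelope step: new DEK per row, DEK wrapped under the KEK.
// Assumption: local wrap stands in for a Vault Transit encrypt call.
function encryptPdu(kek: Buffer, pdu: Buffer): QuarantineRow {
  const dek = randomBytes(32);
  return { wrappedDek: seal(kek, dek), body: seal(dek, pdu) };
}

// Read path: unwrap the DEK first, then decrypt the PDU body.
function decryptPdu(kek: Buffer, row: QuarantineRow): Buffer {
  const dek = unseal(kek, row.wrappedDek);
  return unseal(dek, row.body);
}
```

Because each row carries its own wrapped DEK, rotating a KEK only requires re-wrapping DEKs, not re-encrypting PDU bodies.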

3.3 Redaction rules

In events. MSISDNs appear as toMasked / srcMsisdnMasked (+CCNNN***). PDU body never appears (only pduBodySha256). Rule evidence strings are redacted with *** for the matched span.
In logs. Pino redactor masks all fields named srcMsisdn, dstMsisdn, senderId, pduBody. ESLint rule no-pdu-body-in-logs forbids logger.{info|warn|error}(..., { pduBody }) patterns at PR time.
In REST responses. A response interceptor applies role-based masking before serialization:
- tns-admin: full visibility
- noc: full MSISDNs but PDU body only via explicit decrypt endpoint
- regulator-auditor: masked MSISDNs; no PDU body access
- tns-reader: masked MSISDNs; no PDU body
In quarantine decrypt: plaintext PDU body returned only via GET /v1/admin/firewall/quarantine/{holdId} to noc/tns-admin; the decrypt action itself is audited in firewall.access_log.
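The role-based MSISDN masking above can be sketched as follows. The `+CCNNN***` shape is taken from the text; everything else is an assumption for illustration, including the helper names and the default two-digit country code (which fits `+93` but would need a proper E.164 parser for other prefixes).

```typescript
type Role = "tns-admin" | "noc" | "regulator-auditor" | "tns-reader";

// Keeps "+", the country code, and the next three digits, then appends "***".
// Assumption: countryCodeLength defaults to 2 (e.g. +93).
function maskMsisdn(msisdn: string, countryCodeLength = 2): string {
  const visible = 1 + countryCodeLength + 3;
  return msisdn.slice(0, visible) + "***";
}

// Roles that see full MSISDNs according to the list above.
const FULL_MSISDN_ROLES: ReadonlySet<Role> = new Set<Role>(["tns-admin", "noc"]);

// Returns the MSISDN as it should appear in a response for the given role.
function presentMsisdn(role: Role, msisdn: string): string {
  return FULL_MSISDN_ROLES.has(role) ? msisdn : maskMsisdn(msisdn);
}
```

For example, `presentMsisdn("regulator-auditor", "+93701234567")` yields `+93701***`, while the same call with `noc` returns the number unchanged.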

3.4 Classifier data residency

The CLASSIFIER LLM is strictly on-cluster (local-llm-service). External LLM calls are architecturally forbidden — the firewall binary refuses to start with EXTERNAL_LLM_ENABLED=true. See AI_INTEGRATION §10.
PII is anonymised before LLM inference (defence-in-depth even on-cluster).
Cache key is sha256(piiRedactedBody) — no PII recoverable from cache.

4. Audit & integrity

4.1 Append-only verdict audit (`firewall.audit`)

Hash-chained. Every row carries prev_hash (= previous row's row_hash in the same partition) and row_hash (= SHA-256(prev_hash ‖ canonicalJson(row))).
Append-only at DB level. Postgres rules reject UPDATE/DELETE.
Offline integrity verification. AuditVerifierWorker runs daily at 03:30 Asia/Kabul over the prior day's partition; replays the chain end-to-end; on break, emits firewall.audit.chain.break.v1 (CRITICAL — PagerDuty + auto-engage Security incident response).
Cold archive. Daily Parquet+zstd export to MinIO firewall-audit-archive/{yyyymmdd}.parquet.zst.sig with HSM signature; Object Lock Compliance mode, 7-year retention.
Cross-region mirror. JetStream FIREWALL_AUDIT mirrored kbl→mzr; leaf-mirrored to dxb cold archive. Loss of kbl does not lose audit evidence.
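The hash-chain rule above, row_hash = SHA-256(prev_hash ‖ canonicalJson(row)), and the verifier's end-to-end replay can be illustrated with a short in-memory sketch. Several details are assumptions: the all-zero genesis hash, hex string concatenation, and a `canonicalJson` that only sorts top-level keys of flat objects. `appendRow` and `firstBreak` are hypothetical names, not the AuditVerifierWorker's API.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  prevHash: string;
  rowHash: string;
  payload: Record<string, unknown>;
}

const GENESIS = "0".repeat(64); // assumed prev_hash for the first row

// Stand-in canonicaliser: stable key order for flat objects only.
function canonicalJson(obj: Record<string, unknown>): string {
  return JSON.stringify(obj, Object.keys(obj).sort());
}

function rowHash(prevHash: string, payload: Record<string, unknown>): string {
  return createHash("sha256")
    .update(prevHash + canonicalJson(payload))
    .digest("hex");
}

// Returns a new chain with the payload appended and linked to the last row.
function appendRow(chain: AuditRow[], payload: Record<string, unknown>): AuditRow[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.rowHash : GENESIS;
  return [...chain, { prevHash, rowHash: rowHash(prevHash, payload), payload }];
}

// Replays the chain; returns the index of the first broken row, or -1 if intact.
function firstBreak(chain: AuditRow[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (
      !row ||
      row.prevHash !== expectedPrev ||
      row.rowHash !== rowHash(row.prevHash, row.payload)
    ) {
      return i;
    }
    expectedPrev = row.rowHash;
  }
  return -1;
}
```

Editing any stored payload changes its recomputed hash, so the replay stops at the tampered row; rewriting that row's hash as well would instead break the link from the following row.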

4.2 Admin-action audit (`firewall.blocklist_audit` + `firewall.outbox`)

Every admin REST action emits firewall.rule.changed.v1 (or firewall.blocklist.changed.v1, etc.) with:

actorUserId
before / after snapshots (JSONB)
ip, userAgent
traceId
occurredAt (UTC µs)

Retention ≥ 13 months for admin events; permanent for blocklist audit (regulator dispute resolution).

4.3 Meta-audit (audit of audit reads)

When regulator-auditor queries /v1/admin/firewall/audit, the query itself is recorded into firewall.access_log. This deters and detects attempts to enumerate sensitive ranges by abusing read access.

5. Fail-closed posture

The firewall is the national perimeter — security is preserved by never letting an un-evaluated PDU through:

gRPC handler error / timeout → UNAVAILABLE → connector behaviour:
- Transit MT → submit_sm_resp ESME_RSUBMITFAIL to peer (silent block; subscriber not impacted)
- MO → connector writes to local-disk WAL; MNO deliver_sm_resp ESME_ROK (preserves subscriber relationship); replays once firewall back
Postgres unavailable → service degrades to Redis-only verdict cache; new evaluations after 60s of cache miss → INTERNAL → fail-closed
Vault Transit unavailable → quarantine inserts cannot encrypt → new QUARANTINE verdicts upgrade to BLOCK with flag=KEK_UNAVAILABLE (still fail-closed)
HSM unavailable → federation export postponed; emits firewall.federation.export.postponed.v1
Local LLM unavailable → CLASSIFIER rules are skipped; non-classifier rules continue (reduced detection coverage, but the firewall stays operational)
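Two of the degradation rules above lend themselves to a compact illustration: the QUARANTINE to BLOCK upgrade when Vault Transit is down, and the connector's reaction to UNAVAILABLE. The sketch below uses hypothetical type and function names; the verdict names, the `KEK_UNAVAILABLE` flag, and the SMPP status codes are taken from the text.

```typescript
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

interface Decision {
  verdict: Verdict;
  flag?: string;
}

// If the PDU cannot be encrypted for quarantine, block it instead of holding it.
function finaliseVerdict(verdict: Verdict, vaultTransitUp: boolean): Decision {
  if (verdict === "QUARANTINE" && !vaultTransitUp) {
    return { verdict: "BLOCK", flag: "KEK_UNAVAILABLE" };
  }
  return { verdict };
}

type Direction = "TRANSIT_MT" | "MO";

interface ConnectorAction {
  smppStatus: "ESME_RSUBMITFAIL" | "ESME_ROK";
  writeToWal: boolean; // true: park the PDU on local disk for later replay
}

// Connector-side behaviour when the firewall answers UNAVAILABLE.
function onFirewallUnavailable(direction: Direction): ConnectorAction {
  return direction === "TRANSIT_MT"
    ? { smppStatus: "ESME_RSUBMITFAIL", writeToWal: false }
    : { smppStatus: "ESME_ROK", writeToWal: true };
}
```

In both branches the PDU is either rejected or parked for replay; neither path forwards it without an evaluation.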

Security implication: availability attacks against the firewall cannot let an un-evaluated PDU through. At worst they delay traffic or, while the local LLM is down, reduce CLASSIFIER coverage.

6. Tenant isolation

Inbound MO is pre-tenant — there is no tenant ID on the verdict path. Isolation is by mnoBindId and srcMsisdn.
Egress DND check is per-tenant (called by routing-engine with tenantId); tenant-specific allowlists in firewall.rate_overrides provide isolation.
Per-MNO KEKs ensure that compromise of one MNO's key does not expose another MNO's quarantined PDUs.

7. Secrets

| Secret | Store | Injected as |
| --- | --- | --- |
| gRPC server SVID + key | SPIRE → tmpfs mount | File mount, hot-reloaded |
| gRPC client SVIDs (in caller pods) | SPIRE | File mount in caller pod |
| Postgres credentials | Vault DB dynamic secret (24h) | Env var (rotated by Vault Agent) |
| Redis credentials | Vault KV | Env var |
| NATS credentials | Vault KV | Env var |
| KEKs for quarantine (per MNO) | Vault Transit (referenced, not exported) | — |
| HSM signer | PKCS#11 token (no plaintext export) | PKCS#11 URI in env |
| Event-signing Ed25519 key | Vault KV | Env var |
| Regulator HSM public key | Vault KV (rotated when regulator rotates) | File mount |

gitleaks and trufflehog pre-commit + CI scans block accidental commits. No secret is ever written to logs, events, or config files.

8. Threat model

| Threat | Mitigation |
| --- | --- |
| MNO bind compromise → attacker-controlled MO injection | mTLS SVID per connector; SPIFFE ID strict allowlist; per-bind concurrency cap 200 in-flight; rate governor per srcMsisdn; bind heartbeat watchdog (`firewall.alert.bind.missing.v1`) |
| Admin token leakage → unauthorised rule edits / blocklist tampering | MFA required for `tns-admin` (auth-service enforced); all admin actions audit-logged with before/after; `firewall.rule.changed.v1` fans out to SOC channel; rule mutations require dual approval for `action=ALLOW` (whitelist) on inbound MO |
| Blocklist injection via federation source poisoning | HSM signature validation on regulator events; federation entries enter probation (`QUARANTINE` for 24h) until `confidenceScore >= 0.8`; per-source `confidence_count` capped (one source can't push score alone) |
| Bloom-filter poisoning | Bloom is rebuilt from authoritative `firewall.blocklist_entries` table (which is hash-audited); a poisoned Bloom only causes false-positive Postgres reads (no false-negative); rebuild verifies row-count delta < 5% per cycle |
| Rule expression DoS (regex catastrophic backtracking) | `re2` engine (linear time, no backtracking); pattern length cap 500 chars; per-rule eval timeout 50 ms; auto-disable on timeout via `firewall.rule.degraded.v1`; pattern admission ReDoS screen tests against known catastrophic patterns |
| CEL expression sandbox escape | Whitelisted function set; `os.system`/file IO/network forbidden; AST validation at admission rejects with HTTP 422 `RULE_UNSAFE_EXPRESSION` |
| Replay of cached verdict to bypass updated rules | Verdict cache TTL ≤ 60 s; on rule version bump, `fw:verdict:*` keys are invalidated by pattern deletion |
| Audit log tampering at DB level | Postgres rules reject UPDATE/DELETE; hash-chain verified offline daily; replication to `mzr` provides independent attestation |
| Audit log tampering at archive | MinIO Object Lock Compliance mode (7y); HSM-signed Parquet exports; signatures verified by regulator on import |
| Compromised quarantine release endpoint | Dual-approval; both approvers' identities recorded in `firewall.quarantine.released.v1`; review notes mandatory |
| Lateral movement via Postgres credential | Vault dynamic credentials (24h); least-privilege DB role (no DROP, no superuser); pgaudit logs all credentialed sessions |
| Quarantine PDU exfiltration via /quarantine/{holdId} | RLS enforces noc/tns-admin only; decrypt action audited to `firewall.access_log`; PDU plaintext never logged; download endpoints are streaming (no response cache) |
| Inference-pipeline manipulation (poisoned classifier output) | Local LLM only; grammar-constrained JSON decode rejects malformed output; classifier alone never escalates beyond FLAG (see the pairing rule, AI_INTEGRATION §5.1) |
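To make the regex-admission mitigations in the table more concrete, the sketch below shows what a pre-admission screen might look like. It is a heuristic under stated assumptions, not the service's actual screen: the 500-character cap comes from the table, while the nested-quantifier check, the `screenPattern` name, and the reason strings are hypothetical. It complements rather than replaces a linear-time engine such as `re2`.

```typescript
const MAX_PATTERN_LENGTH = 500; // cap taken from the threat table

interface ScreenResult {
  ok: boolean;
  reason?: string; // hypothetical reason codes, for illustration only
}

// Heuristic: a group that contains + or * and is itself followed by a
// quantifier, e.g. (a+)+ or (x*)*. It does not parse the pattern, so it
// can miss cases and can misfire on escaped parentheses.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function screenPattern(pattern: string): ScreenResult {
  if (pattern.length > MAX_PATTERN_LENGTH) {
    return { ok: false, reason: "PATTERN_TOO_LONG" };
  }
  if (NESTED_QUANTIFIER.test(pattern)) {
    return { ok: false, reason: "NESTED_QUANTIFIER" };
  }
  return { ok: true };
}
```

With this sketch, `screenPattern("(a+)+$")` is rejected and `screenPattern("^[0-9]{4,8}$")` is admitted.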

9. GDPR & regulatory

Right to erasure (GDPR Art. 17): the firewall does not own subscriber records. On auth.user.erased.v1, it participates as follows:
- In-place redaction of dst_msisdn / src_msisdn in firewall.audit is impossible because of the append-only invariant. Instead, future archive partitions are pseudonymised at archive time, using a per-tenant pseudonymisation key, for rows older than the regulatory floor (7 years).
- For rows within the 7-year regulatory window, no erasure is performed (the regulator obligation overrides, per ATRA Art. 18).
Audit evidence window: ≥ 7 years (regulated). Regulator submissions via firewall.federation.exported.v1 and the /v1/internal/firewall/blocklist/export endpoint.
Data residency: all PII stays in Afghanistan (kbl + mzr regions). Cold archive in dxb (UAE leaf) is encrypted with a key escrowed in Afghan-operator-controlled HSM partition; UAE side cannot decrypt without Afghan key release.
Sub-processor list: none — all infra in-cluster.
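The text above mentions a per-tenant pseudonymisation key but does not say which construction is used. One common option is a keyed hash, sketched below with HMAC-SHA256. This choice is an assumption, as are the `pseudonymiseMsisdn` name and the hex output format.

```typescript
import { createHmac } from "node:crypto";

// Assumed construction: HMAC-SHA256 of the MSISDN under the tenant's key.
// The same (key, MSISDN) pair always maps to the same pseudonym, so archived
// rows stay joinable, while destroying the key makes the mapping unrecoverable.
function pseudonymiseMsisdn(msisdn: string, tenantKey: Buffer): string {
  return createHmac("sha256", tenantKey).update(msisdn).digest("hex");
}
```

A keyed construction matters here because MSISDNs come from a small, enumerable space: an unkeyed hash could be reversed by brute force.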

10. Security testing

Contract tests per API_CONTRACTS §7.
Property-based rule-evaluator tests including known ReDoS patterns, CEL injection attempts, Unicode edge cases.
ZAP baseline + API scan on every main-branch build.
Quarterly external pen test scoped to firewall REST + gRPC surface.
Role-matrix integration test — every endpoint × every role × every PII field — verifies 200/403/redaction behaviour.
Hash-chain tamper-evidence test in CI: insert tampered row directly via raw SQL → AuditVerifierWorker emits chain.break within 1h.
Federation signature-injection test: corrupt 1 byte of regulator event → service rejects + emits signature.invalid event.
Secret scanning in CI (gitleaks); dependency scanning (osv-scanner); container scanning (trivy); SBOM generated per release.
