SMS Firewall Service — Security Model
Version: 1.0
Status: Draft
Owner: Trust & Safety + Security
Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy
Related ADR: ADR-0004 §11–§12
1. Authentication
1.1 gRPC plane — FilterInbound, EvaluateTransit, CheckOutboundEgress, GetVerdict
- mTLS required. The gRPC server accepts only connections presenting an X.509 client certificate signed by the platform CA and bearing a SPIFFE ID matching the per-RPC allowlist:
| RPC | Allowed peer SPIFFE IDs |
|---|---|
| `FilterInbound` | `spiffe://ghasi/np-data/smpp-connector-{awcc,roshan,etisalat,mtn-af,salaam}-rx`, `-trx` |
| `EvaluateTransit` | `spiffe://ghasi/np-data/smpp-connector-transit-rx` |
| `CheckOutboundEgress` | `spiffe://ghasi/np-ctrl/routing-engine`, `spiffe://ghasi/np-ctrl/channel-router-service` |
| `GetVerdict` | `spiffe://ghasi/np-data/cdr-mediation-service`, `spiffe://ghasi/np-data/fraud-intel-service` |
| `RegisterBindHeartbeat` | `spiffe://ghasi/np-data/smpp-connector-*` |
- Client certs are issued by SPIRE (per ADR-0004 §12) and rotated every 1 hour. The server hot-reloads on file change via inotify on the SVID mount path.
- Local dev bypass: `GRPC_TLS_ENABLED=false` is rejected by a start-up guard when `NODE_ENV != 'development'`.
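The per-RPC allowlist check can be sketched as below. This is a minimal illustration, not the service's matcher: it assumes the `{…}-rx, -trx` shorthand in the table expands to one SPIFFE ID per MNO and suffix, and that a trailing `*` means prefix match. The names `ALLOWLIST` and `isPeerAllowed` are hypothetical.

```typescript
// Per-RPC peer allowlist, transcribed from the table above.
// Assumptions: "{a,b}-rx, -trx" expands to one ID per MNO per suffix,
// and a trailing "*" matches any suffix. The real matcher may differ.
type Rpc =
  | "FilterInbound"
  | "EvaluateTransit"
  | "CheckOutboundEgress"
  | "GetVerdict"
  | "RegisterBindHeartbeat";

const NP_DATA = "spiffe://ghasi/np-data/";
const NP_CTRL = "spiffe://ghasi/np-ctrl/";
const MNOS = ["awcc", "roshan", "etisalat", "mtn-af", "salaam"];

// Expand the MNO shorthand into explicit -rx and -trx identities.
const inboundConnectors: string[] = [];
for (const mno of MNOS) {
  inboundConnectors.push(`${NP_DATA}smpp-connector-${mno}-rx`);
  inboundConnectors.push(`${NP_DATA}smpp-connector-${mno}-trx`);
}

const ALLOWLIST: Record<Rpc, string[]> = {
  FilterInbound: inboundConnectors,
  EvaluateTransit: [`${NP_DATA}smpp-connector-transit-rx`],
  CheckOutboundEgress: [`${NP_CTRL}routing-engine`, `${NP_CTRL}channel-router-service`],
  GetVerdict: [`${NP_DATA}cdr-mediation-service`, `${NP_DATA}fraud-intel-service`],
  RegisterBindHeartbeat: [`${NP_DATA}smpp-connector-*`],
};

// Returns true only if the peer's SPIFFE ID is listed for this RPC.
function isPeerAllowed(rpc: Rpc, peerSpiffeId: string): boolean {
  return ALLOWLIST[rpc].some((pattern) =>
    pattern.endsWith("*")
      ? peerSpiffeId.startsWith(pattern.slice(0, -1))
      : peerSpiffeId === pattern,
  );
}
```

Exact-match entries keep a transit connector from calling `FilterInbound`, while the heartbeat RPC stays open to every connector identity.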
1.2 REST admin plane
- Kong validates the platform JWT (issued by `auth-service`, RS256, JWKS-backed). No JWT → 401 at the edge.
- Kong forwards `X-User-Id`, `X-Roles`, `X-Tenant-Id` (null for platform-scoped admin), and `X-Trace-Id` headers. The firewall never parses JWTs directly; it trusts Kong.
- For cross-region admin failover (`mzr` operating-mode handoff during `kbl` outage), Kong is region-pinned; the `mzr` Kong issues its own forwarding headers based on the same auth-service JWKS.
1.3 Internal NATS-event signing
- Every event published to `FIREWALL_*` JetStream streams is signed with the firewall service's Ed25519 key (rotated quarterly via Vault).
- Critical consumers (`regulator-portal-service`, `cdr-mediation-service`) verify the signature opportunistically; analytics-service does not (best-effort).
- Inbound `regulator.blocklist.published.v1` events are validated against the regulator's HSM public key (PKCS#11) before consumption; failure → `firewall.alert.federation.signature.invalid.v1` (PagerDuty).
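As an illustration of the signing step, the sketch below signs and verifies an event payload with Ed25519 using Node's built-in `crypto` module. It assumes the signature covers the UTF-8 bytes of the serialized event and generates a throwaway key pair in place of the Vault-managed key; `signEvent` and `verifyEvent` are hypothetical names.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway key pair for the example; the service would load its key from Vault.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Assumption: the signature covers the UTF-8 bytes of the serialized event.
function signEvent(payload: string): Buffer {
  // Ed25519 has no separate digest step, so the algorithm argument is null.
  return sign(null, Buffer.from(payload, "utf8"), privateKey);
}

// Returns false if the payload or the signature was altered.
function verifyEvent(payload: string, signature: Buffer): boolean {
  return verify(null, Buffer.from(payload, "utf8"), publicKey, signature);
}
```

A consumer that verifies would call `verifyEvent` before acting on the event and drop or alert on a `false` result.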
2. Authorisation (RBAC)
Role definitions live in auth-service; the firewall enforces scope/role checks at the handler boundary using NestJS Guards.
| Role | Capabilities |
|---|---|
| `tns-admin` | Full CRUD on rules, blocklists, MNO binds, peer aggregators; mode switching (one of two approvers); decrypt quarantine PDU bodies; full MSISDN visibility |
| `noc` | Read quarantine queue; release/reject single quarantine entries (one of two approvers for release); read audit log; mode switching (second approver) |
| `carrier-relations` | CRUD on MNO bind registry, peer aggregators, peer routes; read-only on rules/blocklists |
| `regulator-auditor` | Read audit log + blocklist history + federation log; MSISDN-masked responses; cannot view PDU bodies; cannot modify any state |
| `tns-reader` | Read-only on rules + blocklists; no PII access |
| `service` (programmatic) | gRPC SVID-authenticated; per-RPC allowlist (see §1.1) |
Enforcement layers
- Kong: rejects un-authenticated requests at the edge.
- NestJS `@RequireRoles(...)` decorator + RoleGuard: rejects insufficient roles with 403 `INSUFFICIENT_SCOPE` before handler entry.
- Postgres RLS on `firewall.quarantine_queue`: `app.caller_role` set per request; only `noc`/`tns-admin` may SELECT.
- Body-decryption interceptor: PDU plaintext decryption (Vault Transit unwrap of DEK) only inside handlers gated by role check; any code path that would log plaintext is forbidden by ESLint rule.
- MSISDN-masking interceptor: `regulator-auditor` responses run through the masker (`+CCNNN***`); platform-internal scopes pass through.
- Dual-approval check: mode-switch and quarantine-release endpoints require two distinct user IDs within 60 s; enforced in the use-case handler.
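The dual-approval rule reduces to two conditions: distinct approvers and a bounded time gap. A minimal sketch follows, assuming approvals carry a millisecond timestamp and that the 60 s bound is inclusive. The `Approval` shape and `isDualApproved` are hypothetical, and role pairing (for example `tns-admin` plus `noc`) is left out.

```typescript
// Hypothetical approval record; field names are illustrative only.
interface Approval {
  userId: string;
  approvedAtMs: number; // epoch milliseconds
}

const APPROVAL_WINDOW_MS = 60 * 1000;

// True when two different users approved within the 60 s window.
// Assumption: the bound is inclusive and order of approvals does not matter.
function isDualApproved(first: Approval, second: Approval): boolean {
  const distinctUsers = first.userId !== second.userId;
  const gapMs = Math.abs(second.approvedAtMs - first.approvedAtMs);
  return distinctUsers && gapMs <= APPROVAL_WINDOW_MS;
}
```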
3. Data protection
3.1 PII inventory & classification
| Field | Classification | Storage | Transit |
|---|---|---|---|
| `firewall.audit.src_msisdn` / `dst_msisdn` | CONFIDENTIAL (E.164 PII) | Plain in Postgres (TDE + disk encryption); masked in REST responses to `regulator-auditor`; masked in NATS events to all consumers | TLS 1.3 only |
| `firewall.quarantine_queue.original_pdu_cipher` | RESTRICTED (full PDU body) | AES-256-GCM at rest; per-MNO KEK in Vault Transit (`transit/ghasi-firewall-mno-{mnoId}`); decrypted only inside `noc`/`tns-admin`-gated handlers | TLS 1.3 |
| `firewall.audit.pdu_body_sha256` | INTERNAL | One-way hash; no PII recoverable | TLS 1.3 |
| `firewall.blocklist_entries.value` (when `type=MSISDN`) | CONFIDENTIAL | Plain; masked to `regulator-auditor` in API | TLS 1.3 |
| `firewall.dnd_snapshot.msisdn` | CONFIDENTIAL | Plain; never exposed via REST (internal mesh-only) | TLS 1.3 |
| Rule evidence strings (in `ruleHits`) | CONFIDENTIAL | Redacted at write-time (matched span replaced with `***` + 4-char prefix/suffix context) | TLS 1.3 |
| Federation export (JSON Lines + .sig) | INTERNAL | Plain in MinIO Object-Lock; signature non-secret | TLS 1.3 |
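The write-time redaction of rule evidence strings listed in the table can be illustrated as follows. This is one reading of "matched span replaced with `***` + 4-char prefix/suffix context": keep up to four characters on each side of the match and drop everything else. The function name and the offset-based signature are assumptions.

```typescript
const CONTEXT_CHARS = 4;

// Replaces text[start, end) with "***" and keeps up to four characters of
// context on each side. Assumption: only this window is stored as evidence.
function redactEvidence(text: string, start: number, end: number): string {
  const prefix = text.slice(Math.max(0, start - CONTEXT_CHARS), start);
  const suffix = text.slice(end, end + CONTEXT_CHARS);
  return `${prefix}***${suffix}`;
}
```

For a body such as `Click http://bad.example/x to win`, where the rule matched the URL, the stored evidence keeps only the characters immediately around the match and never the URL itself.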
3.2 Encryption keys
| Key | Store | Rotation | Algorithm |
|---|---|---|---|
| Per-MNO KEK for quarantine PDU encryption | Vault Transit | 365 days OR on incident | AES-256 wrap (Vault Transit) |
| DEK per-quarantine row | Wrapped inline; unwrapped per read via Vault Transit | Implicit (per-row) | AES-256-GCM |
| mTLS server cert | SPIRE (workload SVID) | 1 hour | EC P-256 |
| mTLS client cert (in caller pods) | SPIRE | 1 hour | EC P-256 |
| Federation export PKCS#11 HSM signing key | Hardware HSM (Thales Luna or AWS CloudHSM in dxb) | 365 days | RSA-3072 PKCS#1v1.5 |
| Event-signing Ed25519 key | Vault KV (rotated cryptographically) | 90 days | Ed25519 |
| Postgres TDE master | KMS (Vault Transit + cluster-managed) | 365 days | AES-256 |
| MinIO archive encryption | KMS (per-bucket); per-day data key rotated by HSM | Daily | AES-256-GCM |
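The KEK/DEK rows above describe envelope encryption: a fresh DEK encrypts each quarantine row, and the per-MNO KEK wraps that DEK. The sketch below shows the pattern with Node's `crypto` module. It is an assumption-laden stand-in: the KEK is a local buffer here, whereas in the design it stays inside Vault Transit and the wrap and unwrap steps are remote calls. `GcmBox`, `SealedPdu`, `sealPdu`, and `openPdu` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface GcmBox {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

interface SealedPdu {
  body: GcmBox; // PDU encrypted under the per-row DEK
  wrappedDek: GcmBox; // DEK encrypted under the per-MNO KEK
}

function gcmEncrypt(key: Buffer, plaintext: Buffer): GcmBox {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function gcmDecrypt(key: Buffer, box: GcmBox): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  // final() throws if the key is wrong or the ciphertext was modified.
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}

// Local stand-in for "encrypt row, then wrap the DEK via Vault Transit".
function sealPdu(kek: Buffer, pdu: Buffer): SealedPdu {
  const dek = randomBytes(32); // one 256-bit DEK per quarantine row
  return { body: gcmEncrypt(dek, pdu), wrappedDek: gcmEncrypt(kek, dek) };
}

// Local stand-in for "unwrap the DEK via Vault Transit, then decrypt the row".
function openPdu(kek: Buffer, sealed: SealedPdu): Buffer {
  const dek = gcmDecrypt(kek, sealed.wrappedDek);
  return gcmDecrypt(dek, sealed.body);
}
```

Because each row has its own DEK, rotating a KEK only requires re-wrapping DEKs, not re-encrypting PDU bodies.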
3.3 Redaction rules
- In events. MSISDNs appear as `toMasked`/`srcMsisdnMasked` (`+CCNNN***`). PDU body never appears (only `pduBodySha256`). Rule `evidence` strings are redacted with `***` for the matched span.
- In logs. Pino redactor masks all fields named `srcMsisdn`, `dstMsisdn`, `senderId`, `pduBody`. ESLint rule `no-pdu-body-in-logs` forbids `logger.{info|warn|error}(..., { pduBody })` patterns at PR time.
- In REST responses. A response interceptor applies role-based masking before serialization:
  - `tns-admin`: full visibility
  - `noc`: full MSISDNs but PDU body only via explicit decrypt endpoint
  - `regulator-auditor`: masked MSISDNs; no PDU body access
  - `tns-reader`: masked MSISDNs; no PDU body
- In quarantine decrypt: plaintext PDU body returned only via `GET /v1/admin/firewall/quarantine/{holdId}` to `noc`/`tns-admin`; the decrypt action itself is audited in `firewall.access_log`.
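The role-based MSISDN masking above can be sketched as a pure function. The example assumes `+CCNNN***` means "keep the `+` and the first five digits, replace the rest with `***`", which fits a two-digit country code such as +93; the actual masker may treat country-code length differently. `maskMsisdn` and `presentMsisdn` are hypothetical names.

```typescript
type Role = "tns-admin" | "noc" | "regulator-auditor" | "tns-reader";

// Roles that see full MSISDNs in REST responses, per the list above.
const FULL_MSISDN_ROLES: ReadonlySet<Role> = new Set<Role>(["tns-admin", "noc"]);

// Assumption: the mask keeps "+" plus the first five digits (CC + NNN).
function maskMsisdn(msisdn: string): string {
  const digits = msisdn.replace(/\D/g, "");
  return `+${digits.slice(0, 5)}***`;
}

// Applies the role-based rule before a response is serialized.
function presentMsisdn(role: Role, msisdn: string): string {
  return FULL_MSISDN_ROLES.has(role) ? msisdn : maskMsisdn(msisdn);
}
```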
3.4 Classifier data residency
- The CLASSIFIER LLM is strictly on-cluster (`local-llm-service`). External LLM calls are architecturally forbidden: the firewall binary refuses to start with `EXTERNAL_LLM_ENABLED=true`. See AI_INTEGRATION §10.
- PII is anonymised before LLM inference (defence-in-depth even on-cluster).
- Cache key is `sha256(piiRedactedBody)`; no PII recoverable from cache.
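To illustrate why the cache key carries no PII, the sketch below derives `sha256(piiRedactedBody)` after a placeholder redaction step. The redaction shown (replacing long digit runs with a token) is an assumption made for the example only; the document does not specify the anonymiser. `redactPii` and `classifierCacheKey` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Placeholder anonymiser: replaces phone-number-like digit runs with a token.
// Assumption for illustration; the real anonymisation rules are not given here.
function redactPii(body: string): string {
  return body.replace(/\+?\d{7,15}/g, "<MSISDN>");
}

// Cache key is the SHA-256 of the redacted body, hex-encoded.
function classifierCacheKey(body: string): string {
  return createHash("sha256").update(redactPii(body), "utf8").digest("hex");
}
```

A side effect of hashing the redacted text is that messages differing only in the redacted fields share one cache entry.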
4. Audit & integrity
4.1 Append-only verdict audit (firewall.audit)
- Hash-chained. Every row carries `prev_hash` (= previous row's `row_hash` in the same partition) and `row_hash` (= SHA-256(prev_hash ‖ canonicalJson(row))).
- Append-only at DB level. Postgres rules reject UPDATE/DELETE.
- Offline integrity verification. `AuditVerifierWorker` runs daily at 03:30 Asia/Kabul over the prior day's partition; replays the chain end-to-end; on break, emits `firewall.audit.chain.break.v1` (CRITICAL: PagerDuty + auto-engage Security incident response).
- Cold archive. Daily Parquet+zstd export to MinIO `firewall-audit-archive/{yyyymmdd}.parquet.zst.sig` with HSM signature; Object Lock Compliance mode, 7-year retention.
- Cross-region mirror. JetStream `FIREWALL_AUDIT` mirrored kbl→mzr; leaf-mirrored to `dxb` cold archive. Loss of `kbl` does not lose audit evidence.
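The hash-chain rule and its offline replay can be sketched in a few lines. The example assumes flat rows, a canonical form of key-sorted JSON, hex-encoded hashes concatenated as strings, and an all-zero genesis hash for the first row of a partition; none of these details are fixed by the text above. All names are hypothetical.

```typescript
import { createHash } from "node:crypto";

type AuditRow = Record<string, string | number | boolean | null>;

interface ChainedRow {
  row: AuditRow;
  prevHash: string;
  rowHash: string;
}

// Assumption: the first row of a partition chains from an all-zero hash.
const GENESIS_HASH = "0".repeat(64);

// Assumption: canonical JSON means keys sorted lexicographically (flat rows only).
function canonicalJson(row: AuditRow): string {
  const sorted: AuditRow = {};
  for (const [key, value] of Object.entries(row).sort(([a], [b]) => (a < b ? -1 : 1))) {
    sorted[key] = value;
  }
  return JSON.stringify(sorted);
}

// row_hash = SHA-256(prev_hash || canonicalJson(row)), hex-encoded.
function computeRowHash(prevHash: string, row: AuditRow): string {
  return createHash("sha256").update(prevHash + canonicalJson(row), "utf8").digest("hex");
}

function appendRow(chain: ChainedRow[], row: AuditRow): void {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.rowHash : GENESIS_HASH;
  chain.push({ row, prevHash, rowHash: computeRowHash(prevHash, row) });
}

// Replays the chain; returns the index of the first broken row, or -1 if intact.
function findChainBreak(chain: ChainedRow[]): number {
  let expectedPrev = GENESIS_HASH;
  let index = 0;
  for (const entry of chain) {
    const linkOk = entry.prevHash === expectedPrev;
    const hashOk = entry.rowHash === computeRowHash(entry.prevHash, entry.row);
    if (!linkOk || !hashOk) return index;
    expectedPrev = entry.rowHash;
    index += 1;
  }
  return -1;
}
```

Editing any stored field changes that row's recomputed hash, so a verifier built this way reports the first row at which the stored and recomputed values diverge.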
4.2 Admin-action audit (firewall.blocklist_audit + firewall.outbox)
Every admin REST action emits `firewall.rule.changed.v1` (or `firewall.blocklist.changed.v1`, etc.) with:
- `actorUserId`
- `before`/`after` snapshots (JSONB)
- `ip`, `userAgent`
- `traceId`
- `occurredAt` (UTC µs)
Retention ≥ 13 months for admin events; permanent for blocklist audit (regulator dispute resolution).
4.3 Meta-audit (audit of audit reads)
When regulator-auditor queries /v1/admin/firewall/audit, the query itself is recorded into firewall.access_log. This deters and detects attempts to enumerate sensitive ranges by abusing read access.
5. Fail-closed posture
The firewall is the national perimeter — security is preserved by never letting an un-evaluated PDU through:
- gRPC handler error / timeout → `UNAVAILABLE` → connector behaviour:
  - Transit MT → `submit_sm_resp ESME_RSUBMITFAIL` to peer (silent block; subscriber not impacted)
  - MO → connector writes to local-disk WAL and answers the MNO with `deliver_sm_resp ESME_ROK` (preserves subscriber relationship); replays once the firewall is back
- Postgres unavailable → service degrades to Redis-only verdict cache; new evaluations after 60 s of cache miss → `INTERNAL` → fail-closed
- Vault Transit unavailable → quarantine inserts cannot encrypt → new QUARANTINE verdicts upgrade to BLOCK with `flag=KEK_UNAVAILABLE` (still fail-closed)
- HSM unavailable → federation export postponed; emits `firewall.federation.export.postponed.v1`
- Local LLM unavailable → CLASSIFIER rules skip; non-classifier rules continue (reduced detection coverage but firewall still operational)
Security implication: availability attacks against the firewall cannot weaken policy — they can at worst delay traffic.
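Two of the fail-closed rules above are simple enough to express as pure functions. The sketch below covers the QUARANTINE-to-BLOCK upgrade when Vault Transit is down and the connector's reaction to `UNAVAILABLE`. The verdict set, type names, and action labels are assumptions made for the example; only the SMPP status codes and the `KEK_UNAVAILABLE` flag come from the list above.

```typescript
// Assumed verdict set, taken from verdict names that appear in this document.
type Verdict = "ALLOW" | "FLAG" | "QUARANTINE" | "BLOCK";

interface Decision {
  verdict: Verdict;
  flag?: string;
}

// A hold cannot be encrypted without the KEK, so QUARANTINE is upgraded to BLOCK.
function applyKekAvailability(decision: Decision, vaultTransitUp: boolean): Decision {
  if (decision.verdict === "QUARANTINE" && !vaultTransitUp) {
    return { verdict: "BLOCK", flag: "KEK_UNAVAILABLE" };
  }
  return decision;
}

type Direction = "TRANSIT_MT" | "MO";

// Hypothetical labels for the two connector behaviours described above.
type ConnectorAction =
  | { kind: "REJECT_TO_PEER"; smppStatus: "ESME_RSUBMITFAIL" }
  | { kind: "WAL_AND_ACK"; smppStatus: "ESME_ROK" };

// What a connector does when the firewall answers UNAVAILABLE.
function onFirewallUnavailable(direction: Direction): ConnectorAction {
  return direction === "TRANSIT_MT"
    ? { kind: "REJECT_TO_PEER", smppStatus: "ESME_RSUBMITFAIL" }
    : { kind: "WAL_AND_ACK", smppStatus: "ESME_ROK" };
}
```

Neither branch forwards an unevaluated PDU: transit traffic is rejected, and MO traffic is parked in the WAL until the firewall can evaluate it.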
6. Tenant isolation
- Inbound MO is pre-tenant: there is no tenant ID on the verdict path. Isolation is by `mnoBindId` and `srcMsisdn`.
- Egress DND check is per-tenant (called by routing-engine with `tenantId`); tenant-specific allowlists in `firewall.rate_overrides` provide isolation.
- Per-MNO KEKs ensure that compromise of one MNO's key does not expose another MNO's quarantine holds.
7. Secrets
| Secret | Store | Injected as |
|---|---|---|
| gRPC server SVID + key | SPIRE → tmpfs mount | File mount, hot-reloaded |
| gRPC client SVIDs (in caller pods) | SPIRE | File mount in caller pod |
| Postgres credentials | Vault DB dynamic secret (24h) | Env var (rotated by Vault Agent) |
| Redis credentials | Vault KV | Env var |
| NATS credentials | Vault KV | Env var |
| KEKs for quarantine (per MNO) | Vault Transit (referenced, not exported) | — |
| HSM signer | PKCS#11 token (no plaintext export) | PKCS#11 URI in env |
| Event-signing Ed25519 key | Vault KV | Env var |
| Regulator HSM public key | Vault KV (rotated when regulator rotates) | File mount |
gitleaks and trufflehog pre-commit + CI scans block accidental commits. No secret is ever written to logs, events, or config files.
8. Threat model
| Threat | Mitigation |
|---|---|
| MNO bind compromise → attacker-controlled MO injection | mTLS SVID per connector; SPIFFE ID strict allowlist; per-bind concurrency cap 200 in-flight; rate governor per srcMsisdn; bind heartbeat watchdog (firewall.alert.bind.missing.v1) |
| Admin token leakage → unauthorised rule edits / blocklist tampering | MFA required for tns-admin (auth-service enforced); all admin actions audit-logged with before/after; firewall.rule.changed.v1 fans out to SOC channel; rule mutations require dual approval for action=ALLOW (whitelist) on inbound MO |
| Blocklist injection via federation source poisoning | HSM signature validation on regulator events; federation entries enter probation (QUARANTINE for 24h) until confidenceScore >= 0.8; per-source confidence_count capped (one source can't push score alone) |
| Bloom-filter poisoning | Bloom is rebuilt from authoritative firewall.blocklist_entries table (which is hash-audited); a poisoned Bloom only causes false-positive Postgres reads (no false-negative); rebuild verifies row-count delta < 5% per cycle |
| Rule expression DoS (regex catastrophic backtracking) | re2 engine (linear time, no backtracking); pattern length cap 500 chars; per-rule eval timeout 50 ms; auto-disable on timeout via firewall.rule.degraded.v1; pattern admission ReDoS screen tests against known catastrophic patterns |
| CEL expression sandbox escape | Whitelisted function set; os.system/file IO/network forbidden; AST validation at admission rejects with HTTP 422 RULE_UNSAFE_EXPRESSION |
| Replay of cached verdict to bypass updated rules | Verdict cache TTL ≤ 60 s; on rule version bump, fw:verdict:* keys are invalidated by pattern deletion |
| Audit log tampering at DB level | Postgres rules reject UPDATE/DELETE; hash-chain verified offline daily; replication to mzr provides independent attest |
| Audit log tampering at archive | MinIO Object Lock Compliance mode (7y); HSM-signed Parquet exports; signatures verified by regulator on import |
| Compromised quarantine release endpoint | Dual-approval; both approvers' identities recorded in firewall.quarantine.released.v1; review notes mandatory |
| Lateral movement via Postgres credential | Vault dynamic credentials (24h); least-privilege DB role (no DROP, no superuser); pgaudit logs all credentialed sessions |
| Quarantine PDU exfiltration via /quarantine/{holdId} | RLS enforces noc/tns-admin only; decrypt action audited to firewall.access_log; PDU plaintext never logged; download endpoints are streaming (no response cache) |
| Inference-pipeline manipulation (poisoned classifier output) | Local LLM only; grammar-constrained JSON decode rejects malformed; classifier alone never escalates beyond FLAG; pairing-rule §5.1 in AI_INTEGRATION |
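As a small worked example of the Bloom-filter mitigation in the table, the rebuild guard "row-count delta < 5% per cycle" can be written as a relative-change check. The function name, the treatment of a first build with no previous count, and the use of the previous count as the denominator are all assumptions.

```typescript
const MAX_ROW_COUNT_DELTA = 0.05;

// True when the authoritative row count moved by less than 5% since the last rebuild.
// Assumptions: delta is relative to the previous count; a first build always passes.
function isRebuildDeltaAcceptable(previousCount: number, currentCount: number): boolean {
  if (previousCount === 0) return true;
  const delta = Math.abs(currentCount - previousCount) / previousCount;
  return delta < MAX_ROW_COUNT_DELTA;
}
```

For example, going from 1000 to 1040 rows is a 4% change and passes, while a jump to 1050 (exactly 5%) or a drop to 900 (10%) would hold the rebuild for review.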
9. GDPR & regulatory
- Right to erasure (GDPR Art. 17): the firewall does not own subscriber records. On `auth.user.erased.v1`, the firewall participates as follows:
  - Rows older than the regulatory floor (7 years): in-place redaction of `dst_msisdn`/`src_msisdn` in `firewall.audit` is impossible because of the append-only invariant; instead, future archive partitions are pseudonymised at archive time using a per-tenant pseudonymisation key.
  - Rows within the 7-year regulatory window: no erasure (regulator obligation overrides per ATRA Art. 18).
- Audit evidence window: ≥ 7 years (regulated). Regulator submissions via `firewall.federation.exported.v1` and the `/v1/internal/firewall/blocklist/export` endpoint.
- Data residency: all PII stays in Afghanistan (`kbl` + `mzr` regions). Cold archive in `dxb` (UAE leaf) is encrypted with a key escrowed in an Afghan-operator-controlled HSM partition; the UAE side cannot decrypt without Afghan key release.
- Sub-processor list: none; all infra in-cluster.
10. Security testing
- Contract tests per API_CONTRACTS §7.
- Property-based rule-evaluator tests including known ReDoS patterns, CEL injection attempts, Unicode edge cases.
- ZAP baseline + API scan on every main-branch build.
- Quarterly external pen test scoped to firewall REST + gRPC surface.
- Role-matrix integration test — every endpoint × every role × every PII field — verifies 200/403/redaction behaviour.
- Hash-chain tamper-evidence test in CI: insert tampered row directly via raw SQL → AuditVerifierWorker emits `chain.break` within 1h.
- Federation signature-injection test: corrupt 1 byte of regulator event → service rejects + emits `signature.invalid` event.
- Secret scanning in CI (`gitleaks`); dependency scanning (`osv-scanner`); container scanning (`trivy`); SBOM generated per release.