Compliance Engine — Security Model

Status: populated · Owner: Security + Trust & Safety · Last updated: 2026-04-19
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy

1. Authentication

1.1 gRPC plane — `EvaluateCompliance`

mTLS required. The gRPC server accepts only connections presenting a client certificate signed by the platform CA.
The only authorised client is sms-orchestrator (CN sms-orchestrator, issued by the platform CA). A SAN allowlist pins the exact expected identity; other certs are rejected with UNAUTHENTICATED.
Client certs are mounted from Vault via the Vault Agent Sidecar Injector and rotated every 30 days; the server hot-reloads on file change.
Local dev bypass via GRPC_TLS_ENABLED=false is prohibited in any non-local environment (enforced by a start-up guard that refuses to boot with TLS disabled when NODE_ENV != 'development').
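Here is a minimal TypeScript sketch of the start-up guard and the SAN allowlist check. The function names, the `DNS:sms-orchestrator` allowlist entry, and the SAN string format (the `"DNS:a, DNS:b"` form that Node.js reports as `getPeerCertificate().subjectaltname`) are assumptions for illustration. The sketch does not show how the gRPC server obtains the peer certificate.

```typescript
// Hypothetical names throughout; only the policy (TLS off => development only,
// single allowlisted client identity) comes from the design above.

type Env = Record<string, string | undefined>;

/** Refuses to boot when TLS is disabled outside local development. */
function assertTlsPolicy(env: Env): void {
  const tlsDisabled = env.GRPC_TLS_ENABLED === "false";
  const isLocalDev = env.NODE_ENV === "development";
  if (tlsDisabled && !isLocalDev) {
    throw new Error(
      `GRPC_TLS_ENABLED=false is only allowed with NODE_ENV=development (got ${env.NODE_ENV ?? "unset"})`,
    );
  }
}

// Assumed SAN entry for the single authorised client.
const ALLOWED_CLIENT_SANS: ReadonlySet<string> = new Set(["DNS:sms-orchestrator"]);

/**
 * Accepts a peer only if one of its SAN entries is allowlisted. The input uses
 * the "DNS:a, DNS:b" format that Node.js reports as subjectaltname.
 */
function isAllowedClient(subjectAltName: string | undefined): boolean {
  if (!subjectAltName) return false;
  return subjectAltName
    .split(",")
    .map((entry) => entry.trim())
    .some((entry) => ALLOWED_CLIENT_SANS.has(entry));
}
```

If `assertTlsPolicy(process.env)` runs before the server binds its port, a misconfigured deployment fails at boot instead of quietly serving plaintext.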

1.2 REST plane — admin + tenant portal

Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). Requests without a valid token are rejected at the edge.
Kong forwards X-Tenant-Id, X-Account-Id, X-User-Id, X-Roles headers on the authenticated upstream request. compliance-engine never parses JWTs directly — it trusts Kong's injection.
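For tenant portal endpoints (/v1/portal/compliance/*), compliance-engine runs SET LOCAL app.current_tenant_id = <X-Tenant-Id> inside each request's transaction, so the setting ends with that transaction and cannot leak to another request on a pooled connection. Row-Level Security on hold_queue then enforces tenant isolation at the DB level (belt and braces).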
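`SET LOCAL` does not accept bind parameters. A request-scoped sketch can instead use Postgres's `set_config(name, value, true)`, where the third argument makes the setting transaction-local, exactly like `SET LOCAL`. The `Queryable` interface and the `withTenant` helper below are hypothetical. A node-postgres `PoolClient` exposes a compatible `query(text, values)` method.

```typescript
// Minimal query interface; node-postgres PoolClient#query(text, values) fits it.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

/**
 * Runs `work` in a transaction whose RLS context is pinned to one tenant.
 * set_config(..., true) is transaction-local like SET LOCAL, but unlike
 * SET LOCAL it accepts the tenant ID as a bind parameter.
 */
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Rolling back also discards the tenant setting.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Passing the tenant ID as a parameter, rather than interpolating it into `SET LOCAL`, keeps a crafted `X-Tenant-Id` value from being executed as SQL.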

1.3 IdP-agnostic

Per ADR-0002, the identity provider that authenticated the caller (Keycloak default / tenant OIDC / tenant SAML / Firebase legacy) is irrelevant to compliance-engine. All it ever sees is the platform JWT. The idp claim is captured in audit_log.before/after for forensic purposes only.

2. Authorization (RBAC)

Compliance operations require specific platform roles. Role definitions live in auth-service; compliance-engine enforces scope/role checks at the handler boundary using NestJS Guards.

Role	Capabilities
`platform.compliance.admin`	Full CRUD on rules, rule-sets, blocklists, keyword lists; tenant tier override; bulk-review; view full message body in hold queue; activate default rule set
`platform.compliance.reviewer`	Read hold queue (body redacted); perform single-item `RELEASE` / `REJECT`; read tenant scores
`platform.auditor`	Read-only on audit log, reports, tenant scores, violations; no message-body access; cannot change any state
`platform.support`	Read-only on hold queue (heavily redacted) for L2 support triage
Tenant `sms:compliance:read` scope	Read own tenant's holds, score, violations via `/v1/portal/compliance/*`
Tenant `sms:compliance:appeal` scope	Submit appeals on own held/blocked messages

Enforcement points:

NestJS RoleGuard — runs first, rejects with 403 INSUFFICIENT_SCOPE before handler entry.
Per-handler @RequireRoles(...) decorator — declarative and contract-tested.
Postgres RLS on hold_queue — final defence for tenant isolation.
Body-redaction interceptor — serializes payload.body and payload.to in full only when the caller role is platform.compliance.admin. For every other role, payload.body is replaced with <redacted> and payload.to with the masked number.
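The role check and the body-redaction rule reduce to a few pure functions. This sketch makes several assumptions: `X-Roles` is a comma-separated list, a handler accepts a caller holding *any* of its required roles, and the helper names are hypothetical. The masking keeps `+` plus five digits to match the `+CCNNN***` shape used in events.

```typescript
// Hypothetical payload shape; the real DTOs live in API_CONTRACTS.
interface HoldPayload {
  body: string;
  to: string;
  senderId: string;
}

const ADMIN_ROLE = "platform.compliance.admin";

// Assumes Kong sends X-Roles as a comma-separated list.
function parseRoles(header: string | undefined): Set<string> {
  return new Set((header ?? "").split(",").map((r) => r.trim()).filter(Boolean));
}

// RoleGuard core (assumed "any of" semantics for @RequireRoles).
function hasAnyRole(callerRoles: Set<string>, required: readonly string[]): boolean {
  return required.some((role) => callerRoles.has(role));
}

// Masks an E.164 number as +CCNNN*** ('+' plus five digits); anything else becomes ***.
function maskPhone(e164: string): string {
  return /^\+\d{6,15}$/.test(e164) ? `${e164.slice(0, 6)}***` : "***";
}

// Interceptor core: only admins receive body and the full number.
function redactPayload(payload: HoldPayload, callerRoles: Set<string>): HoldPayload {
  if (callerRoles.has(ADMIN_ROLE)) return payload;
  return { ...payload, body: "<redacted>", to: maskPhone(payload.to) };
}
```

For example, a reviewer reading a hold for `+447946001234` receives `to: "+44794***"` and `body: "<redacted>"`.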

3. Data protection

3.1 PII inventory & classification

Field	Classification	Storage	Transit
`hold_queue.payload.body`	RESTRICTED (message content)	Encrypted at rest via per-tenant KMS envelope (AES-256-GCM; DEK wrapped by tenant-scoped KEK in Vault Transit)	TLS 1.2+ only
`hold_queue.payload.to`	CONFIDENTIAL (phone number)	Not encrypted at field level; relies on DB + disk encryption; API-level masking outside `platform.compliance.admin`	TLS 1.2+
`hold_queue.payload.senderId`	INTERNAL	Plain	TLS 1.2+
`evaluation_log.findings[].evidence`	CONFIDENTIAL (partially redacted)	Plain	TLS 1.2+
`dlr_stats.*`	INTERNAL (aggregates)	Plain	TLS 1.2+
`audit_log.*` (may contain redacted excerpts)	CONFIDENTIAL	Plain	TLS 1.2+

3.2 Encryption keys

Key	Store	Rotation
Per-tenant KEK for `hold_queue.payload`	Vault Transit (`transit/ghasi-compliance-<tenantId>`)	Annual, or on tenant request / incident
DEK for each `hold_queue.payload` row	Wrapped inline, unwrapped per read via Vault Transit	Implicit (DEKs are per-row)
mTLS certs (server + client)	Vault PKI engine	30 days
LLM API tokens (external fallback)	Vault KV	On provider request or quarterly
JWT validation: no keys stored — JWKS pulled from auth-service via Kong	—	N/A
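Per-row envelope encryption with AES-256-GCM can be sketched with Node's `crypto` module. The `KeyWrapper` interface is a hypothetical stand-in for the Vault Transit encrypt/decrypt calls against `transit/ghasi-compliance-<tenantId>`. Binding the tenant ID as GCM additional authenticated data is an added assumption that the key table does not specify.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stand-in for Vault Transit encrypt/decrypt on the tenant KEK.
interface KeyWrapper {
  wrap(tenantId: string, dek: Buffer): Promise<string>;
  unwrap(tenantId: string, wrappedDek: string): Promise<Buffer>;
}

interface SealedPayload {
  wrappedDek: string; // stored inline with the hold_queue row
  iv: string;         // 12-byte GCM nonce, base64
  tag: string;        // 16-byte GCM auth tag, base64
  ciphertext: string; // base64
}

async function sealPayload(kw: KeyWrapper, tenantId: string, plaintext: string): Promise<SealedPayload> {
  const dek = randomBytes(32); // fresh AES-256 key per row
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  cipher.setAAD(Buffer.from(tenantId, "utf8")); // assumption: bind ciphertext to its tenant
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    wrappedDek: await kw.wrap(tenantId, dek),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

async function openPayload(kw: KeyWrapper, tenantId: string, sealed: SealedPayload): Promise<string> {
  const dek = await kw.unwrap(tenantId, sealed.wrappedDek);
  const decipher = createDecipheriv("aes-256-gcm", dek, Buffer.from(sealed.iv, "base64"));
  decipher.setAAD(Buffer.from(tenantId, "utf8"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  // final() throws if the tag does not verify (tampering or wrong tenant).
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

Each row gets a fresh DEK and only the wrapped DEK is stored. Rotating a tenant KEK therefore means rewrapping DEKs, not re-encrypting message bodies.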

3.3 Redaction rules

In events. `to` appears only as `toMasked` (`+CCNNN***`); `body` never appears. In rule evidence strings, the matching span is replaced by `***`, keeping ≥ 4 characters of prefix/suffix context around it.
In logs. The Pino redactor masks `to` and strips `body` entirely. An ESLint rule forbids `logger.info(..., { body })` patterns at PR time.
In REST responses. For every caller other than `platform.compliance.admin`, the body-redaction interceptor replaces `payload.body` and swaps `payload.to` for `toMasked`.
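Evidence redaction can be sketched as a function over the match span. The helper name is hypothetical, and the 4-character window is the minimum the rule allows. Nothing outside the window is kept, so the body itself never reaches the event.

```typescript
/**
 * Hypothetical helper: builds a redacted evidence string for a rule match.
 * The matched span [start, end) becomes "***" and only `context` characters
 * on each side are kept (4 here; the policy requires at least 4).
 */
function redactEvidence(text: string, start: number, end: number, context = 4): string {
  if (start < 0 || end > text.length || start > end) {
    throw new RangeError("match span out of bounds");
  }
  const prefix = text.slice(Math.max(0, start - context), start);
  const suffix = text.slice(end, Math.min(text.length, end + context));
  return `${prefix}***${suffix}`;
}
```

With a regex match `m`, the call is `redactEvidence(text, m.index, m.index + m[0].length)`. For `"Your code is 123456 now"` and a six-digit match, this yields `" is *** now"`.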

3.4 LLM data residency

Local LLM is primary (on-cluster compliance-ai). Message body is sent only after anonymisation (see APPLICATION_LOGIC ANONYMIZE_BODY_BEFORE_AI=true).
External LLM fallback is region-scoped. A per-tenant compliance.ai_fallback_allowed flag (managed in admin-dashboard) gates external calls. Tenants in regulated regions (EU, sovereign-cloud markets) have this flag forced to false; fallbackAction=HOLD applies instead.
No prompt or completion is logged by compliance-engine beyond latency/status metrics. The LLM cache key is sha256(anonymisedBody) so the cache reveals no PII.
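Here is a sketch of the cache-key derivation. The anonymisation patterns are illustrative assumptions: the real `ANONYMIZE_BODY_BEFORE_AI` pre-processor is specified in APPLICATION_LOGIC, not here. The `llm:` key prefix is also an assumption. Because the hash is taken over the anonymised text, messages that differ only in an OTP, amount, or phone number share one cache entry.

```typescript
import { createHash } from "node:crypto";

// Illustrative anonymiser: these patterns are assumptions, not the real
// ANONYMIZE_BODY_BEFORE_AI pre-processor. Order matters: amounts first so
// their digits are not mistaken for OTPs, then phone numbers, then short codes.
const ANONYMISERS: ReadonlyArray<readonly [RegExp, string]> = [
  [/[$€£]\s?\d+(?:[.,]\d+)?/g, "<AMOUNT>"],
  [/\+?\d[\d\s-]{8,}\d/g, "<PHONE>"],
  [/\b\d{4,8}\b/g, "<OTP>"],
];

function anonymise(body: string): string {
  return ANONYMISERS.reduce((text, [pattern, token]) => text.replace(pattern, token), body);
}

// Cache key over the anonymised body only, so the key carries no raw PII.
function llmCacheKey(body: string): string {
  const digest = createHash("sha256").update(anonymise(body), "utf8").digest("hex");
  return `llm:${digest}`; // "llm:" prefix is hypothetical
}
```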

4. Audit

All state changes are recorded in compliance.audit_log with actor, before/after snapshots, IP, user agent, and trace ID. The table is append-only at the database level (Postgres rules block UPDATE and DELETE). Retention is ≥ 13 months, enforced by partition pruning: a partition is dropped only once every row in it is older than the retention window.
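Partition-based retention can be checked with plain month arithmetic. This sketch assumes monthly partitions named `audit_log_YYYY_MM`, a hypothetical naming scheme. A partition becomes eligible for dropping only once its newest possible row is older than 13 months, so the retained window never falls below the stated minimum.

```typescript
// Assumes monthly partitions named audit_log_YYYY_MM (hypothetical naming).
const RETENTION_MONTHS = 13;
const PARTITION_NAME = /^audit_log_(\d{4})_(\d{2})$/;

// Months since January of year 0, so month arithmetic is integer subtraction.
function monthIndex(year: number, month: number): number {
  return year * 12 + (month - 1);
}

/**
 * Returns partitions whose every row is older than the retention window.
 * Month M is droppable only when it ends before (now - 13 months), i.e.
 * M < currentMonth - 13; the partition straddling the cutoff is kept.
 */
function expiredPartitions(partitions: string[], now: Date): string[] {
  const current = monthIndex(now.getUTCFullYear(), now.getUTCMonth() + 1);
  return partitions.filter((name) => {
    const m = PARTITION_NAME.exec(name);
    if (!m) return false; // never touch tables that do not match the scheme
    return monthIndex(Number(m[1]), Number(m[2])) < current - RETENTION_MONTHS;
  });
}
```

On 2026-04-19, for example, `audit_log_2025_03` is kept because its last rows are only about 12.6 months old, while `audit_log_2025_02` becomes eligible.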

Events mirroring audit-relevant state changes are emitted on compliance.audit.v1 (per evaluation) and compliance.rule.changed.v1 (per config change). analytics-service subscribes for long-term archival; admin-dashboard subscribes for live views.

5. Fail-closed posture

Compliance-engine is designed to never let an un-evaluated message through. Operational implications:

gRPC handler errors and timeouts translate to NATS non-ack in sms-orchestrator. The message stays in EVALUATING, retries, and eventually moves to DLQ with reason compliance_unavailable.
AI rule with fallbackAction = HOLD becomes HOLD if the LLM is unavailable.
Redis unavailability: rule/ruleset caches miss and load from DB; RATE_VOLUME rules fail-closed to HOLD.
DB unavailability: INTERNAL returned; orchestrator redelivers.
A circuit-breaker trip on external LLM short-circuits to local LLM; if local LLM also trips, AI-classification rules apply fallbackAction.
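On the orchestrator side, the fail-closed rule amounts to "only a successful evaluation is acked." In this sketch the `MAX_DELIVERIES` threshold and the outcome type are assumptions, and the gRPC status names are a subset of the standard codes.

```typescript
// Subset of standard gRPC status codes relevant here.
type GrpcStatus = "OK" | "DEADLINE_EXCEEDED" | "UNAVAILABLE" | "INTERNAL";

type DeliveryOutcome =
  | { kind: "ack" }
  | { kind: "nak" }
  | { kind: "dlq"; reason: "compliance_unavailable" };

const MAX_DELIVERIES = 5; // assumed redelivery cap before DLQ

/**
 * Fail-closed handling of an EvaluateCompliance result in sms-orchestrator.
 * Only a successful evaluation acks the NATS message. Every error or timeout
 * leaves it in EVALUATING for redelivery, and the final attempt goes to DLQ.
 */
function onEvaluateResult(status: GrpcStatus, deliveryCount: number): DeliveryOutcome {
  if (status === "OK") return { kind: "ack" };
  if (deliveryCount >= MAX_DELIVERIES) return { kind: "dlq", reason: "compliance_unavailable" };
  return { kind: "nak" };
}
```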

Security-wise, fail-closed means availability attacks cannot bypass policy — at worst they delay dispatch, which is acceptable.
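The AI-classification path combines breaker state with the per-tenant gate from §3.4. This sketch follows §3.4 in treating the local LLM as primary. The types and function name are hypothetical, and `fallbackAction` is limited to HOLD or BLOCK here so that the no-model path always fails closed.

```typescript
type FallbackAction = "HOLD" | "BLOCK"; // assumption: only fail-closed actions

interface AiRoutingState {
  localBreakerOpen: boolean;      // breaker on on-cluster compliance-ai
  externalBreakerOpen: boolean;   // breaker on the external provider
  tenantFallbackAllowed: boolean; // compliance.ai_fallback_allowed
  regulatedRegion: boolean;       // EU / sovereign cloud: flag forced to false
}

type AiRoute =
  | { kind: "call"; backend: "local" | "external" }
  | { kind: "fallback"; action: FallbackAction };

function routeAiClassification(state: AiRoutingState, fallbackAction: FallbackAction): AiRoute {
  if (!state.localBreakerOpen) return { kind: "call", backend: "local" };
  // The external path exists only if the tenant allows it and the region is not regulated.
  const externalAllowed = state.tenantFallbackAllowed && !state.regulatedRegion;
  if (externalAllowed && !state.externalBreakerOpen) return { kind: "call", backend: "external" };
  // No usable model: apply the rule's fallbackAction instead of allowing.
  return { kind: "fallback", action: fallbackAction };
}
```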

6. Tenant isolation

Postgres RLS on hold_queue keyed on tenant_id.
ruleset and evaluation caches in Redis are keyed by {tenantId} so one tenant's cache invalidation never affects another.
Per-tenant KEK for body encryption ensures a key compromise is scoped.
platform.compliance.* roles are platform-wide by definition; there is no tenant-scoped equivalent (tenants self-serve via portal endpoints only, with RLS enforcement).
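Here is a hypothetical key layout for the tenant-keyed Redis caches. Wrapping the tenant ID in braces makes it a Redis Cluster hash tag, which places all of one tenant's keys in the same hash slot. The closing brace also stops an invalidation pattern for `acme` from matching the keys of `acme2`. The tenant ID format check is an assumption.

```typescript
// Assumed tenant ID alphabet; braces or colons would corrupt the key layout.
const TENANT_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

/** Common prefix; "{tenantId}" is a Redis Cluster hash tag. */
function tenantPrefix(tenantId: string): string {
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`invalid tenantId: ${JSON.stringify(tenantId)}`);
  }
  return `compliance:{${tenantId}}:`;
}

function rulesetKey(tenantId: string, rulesetId: string): string {
  return `${tenantPrefix(tenantId)}ruleset:${rulesetId}`;
}

function evaluationKey(tenantId: string, requestDigest: string): string {
  return `${tenantPrefix(tenantId)}eval:${requestDigest}`;
}

/** MATCH pattern for SCAN-based invalidation of one tenant's keys only. */
function tenantInvalidationPattern(tenantId: string): string {
  return `${tenantPrefix(tenantId)}*`;
}
```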

7. Secrets

Secret	Store	Injected as
gRPC server cert + key	Vault PKI → K8s Secret (Vault Agent)	File mount
gRPC client cert for sms-orchestrator	Vault PKI → K8s Secret	File mount (in orchestrator pod)
PostgreSQL credentials	Vault DB dynamic secret	Env var
Redis credentials	Vault KV	Env var
NATS credentials	Vault KV	Env var
KMS keys for hold payload encryption	Vault Transit (referenced, not exported)	—
External LLM API keys	Vault KV (per tenant / per region)	Env var

No secret is ever written to logs, events, or config files. Pre-commit gitleaks scan blocks accidental commits.

8. Threat model

Threat	Mitigation
Malicious admin authors a rule that short-circuits BLOCK to ALLOW	Rule mutations are audit-logged with `before`/`after`; `compliance.rule.changed.v1` fans out to the SOC channel; a second-reviewer workflow is in roadmap (EP-CE-04 hardening)
Compromised sms-orchestrator pod floods `EvaluateCompliance`	mTLS + per-cert rate limit + autoscaler; evaluation is cached so replay cost is bounded
Attacker registers a COMPOSITE rule loop to DoS the engine	Cycle detection on save + max depth 5 at eval + hard 10 ms per regex + 450 ms budget
ReDoS via REGEX rule	`re2` engine (linear time) + pattern length cap + eval-time timeout
PII leak via rule `evidence`	Redaction in evaluator; contract tests verify no raw body passes
Review API abuse (unauthorised release)	RBAC + optimistic lock + mandatory `reviewNotes` + full audit of before/after
Replay of LLM cache to infer traffic	Cache keyed by `sha256(anonymisedBody)`; anonymisation strips phone numbers, amounts, OTPs before hashing
External LLM exfiltration of message content	Disabled in regulated regions; otherwise anonymisation pre-processor + per-tenant opt-out
Auditor abuses read access to export PII	`platform.auditor` cannot read raw body; an auditor's own reads are themselves written to the audit log (meta-audit)
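The COMPOSITE-loop mitigation combines a save-time cycle check with an eval-time depth cap. Here is a sketch over a hypothetical `RuleGraph` (rule ID mapped to referenced child rule IDs), using depth-first search with visiting/done marks.

```typescript
// Hypothetical shape: COMPOSITE rule ID -> IDs of the rules it references.
type RuleGraph = Map<string, string[]>;

const MAX_COMPOSITE_DEPTH = 5;

/** Save-time check: returns one cycle as a path (first ID repeated at the end), or null. */
function findCycle(graph: RuleGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (id: string): string[] | null => {
    const s = state.get(id);
    if (s === "done") return null;
    if (s === "visiting") return [...path.slice(path.indexOf(id)), id]; // back edge found
    state.set(id, "visiting");
    path.push(id);
    for (const child of graph.get(id) ?? []) {
      const cycle = visit(child);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  };

  for (const id of Array.from(graph.keys())) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}

/** Eval-time guard: fails once nesting below `id` exceeds the depth cap. */
function assertDepthWithin(graph: RuleGraph, id: string, level = 1): void {
  if (level > MAX_COMPOSITE_DEPTH) {
    throw new Error(`COMPOSITE nesting exceeds ${MAX_COMPOSITE_DEPTH} at rule ${id}`);
  }
  for (const child of graph.get(id) ?? []) assertDepthWithin(graph, child, level + 1);
}
```

The depth guard also terminates on a cyclic graph, so a rule that slipped past save-time validation still cannot recurse indefinitely.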

9. GDPR & regulatory

Right to erasure (GDPR Art. 17): auth.user.erased.v1 is consumed by compliance-engine. The consumer:
- Redacts hold_queue.payload.body and hold_queue.payload.to for the affected user/tenant (sets them to null with a tombstone note).
- Does not delete audit_log or evaluation_log rows; retention for regulatory evidence overrides the erasure right for the data fields required by law. Affected fields are redacted/pseudonymised per regulator guidance.
Data portability: tenants can export their own compliance score history, violations, and held-message metadata (never content) via /v1/portal/compliance/* and the report generator.
Audit evidence window: ≥ 13 months rolling. Regulators receive exports via TENANT_AUDIT reports, role-gated to platform.auditor.
Sub-processor list: external LLMs, when enabled, appear on the platform sub-processor list with DPA in place.

10. Security testing

Contract tests per API_CONTRACTS §6.
Property-based tests on rule evaluators (hypothesis-style), covering known ReDoS patterns, Unicode edge cases, and worst-case composites.
ZAP baseline + API scan run on each main-branch build.
Quarterly penetration test scoped to the compliance-engine REST surface.
Role-matrix integration test — every endpoint × every role — verifies 200/403 behaviour.
Secret scanning in CI (gitleaks); dependency scanning (osv-scanner); container scanning (trivy).
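The role-matrix test can be generated from a table of endpoints instead of being written by hand. The endpoint specs and the "any of the required roles" rule in this sketch are assumptions. In practice the table would come from the `@RequireRoles(...)` metadata or API_CONTRACTS, and each case would be replayed against the running service.

```typescript
// Hypothetical endpoint table; in practice derived from @RequireRoles metadata.
interface EndpointSpec {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string;
  requiredRoles: readonly string[]; // caller needs any one of these
}

interface MatrixCase {
  method: EndpointSpec["method"];
  path: string;
  role: string;
  expectedStatus: 200 | 403;
}

const PLATFORM_ROLES = [
  "platform.compliance.admin",
  "platform.compliance.reviewer",
  "platform.auditor",
  "platform.support",
] as const;

/** Expands every endpoint x every role into one expected-status test case. */
function buildRoleMatrix(endpoints: readonly EndpointSpec[], roles: readonly string[]): MatrixCase[] {
  const cases: MatrixCase[] = [];
  for (const endpoint of endpoints) {
    for (const role of roles) {
      cases.push({
        method: endpoint.method,
        path: endpoint.path,
        role,
        expectedStatus: endpoint.requiredRoles.includes(role) ? 200 : 403,
      });
    }
  }
  return cases;
}
```

Generating the cases means a new endpoint or role extends the matrix automatically, so no combination can be left out of the test by omission.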
