
Compliance Engine — Security Model

Status: populated
Owner: Security + Trust & Safety
Last updated: 2026-04-19
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy

1. Authentication

1.1 gRPC plane — EvaluateCompliance

  • mTLS required. The gRPC server accepts only connections presenting a client certificate signed by the platform CA.
  • The only authorised client is sms-orchestrator (CN sms-orchestrator, issued by the platform CA). A SAN allowlist pins the exact expected identity; other certs are rejected with UNAUTHENTICATED.
  • Client certs are mounted from Vault via the Vault Agent Sidecar Injector and rotated every 30 days; the server hot-reloads on file change.
  • Local dev bypass via GRPC_TLS_ENABLED=false is prohibited in any non-local environment (enforced by a start-up guard that refuses to boot with TLS disabled when NODE_ENV != 'development').
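
The start-up guard described above could look roughly like the following. This is a minimal sketch, not the service's actual code: the function name assertTlsPolicy and the "TLS is on unless explicitly disabled" default are assumptions; only GRPC_TLS_ENABLED and NODE_ENV come from this document.

```typescript
// Hypothetical start-up guard: refuse to boot with TLS disabled outside local development.
type EnvVars = Record<string, string | undefined>;

function assertTlsPolicy(env: EnvVars): void {
  // Assumption: TLS counts as enabled unless the flag is explicitly "false".
  const tlsEnabled = env.GRPC_TLS_ENABLED !== "false";
  const isLocalDev = env.NODE_ENV === "development";
  if (!tlsEnabled && !isLocalDev) {
    const nodeEnv = env.NODE_ENV ?? "<unset>";
    throw new Error(
      `Refusing to start: GRPC_TLS_ENABLED=false is not allowed when NODE_ENV=${nodeEnv}`,
    );
  }
}
```

Calling assertTlsPolicy(process.env) before the gRPC server binds would make a misconfigured deployment fail at boot rather than serve plaintext traffic.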

1.2 REST plane — admin + tenant portal

  • Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). Requests without a valid token are rejected at the edge.
  • Kong forwards X-Tenant-Id, X-Account-Id, X-User-Id, X-Roles headers on the authenticated upstream request. compliance-engine never parses JWTs directly — it trusts Kong's injection.
  • For tenant portal endpoints (/v1/portal/compliance/*), compliance-engine runs SET LOCAL app.current_tenant_id = <X-Tenant-Id> inside each request's transaction (SET LOCAL only lasts until the transaction ends); Row-Level Security on hold_queue then enforces tenant isolation at the DB level (belt and braces).
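
A minimal sketch of the per-request tenant context, under stated assumptions: TxClient is a hypothetical interface (node-postgres clients expose a compatible query method), withTenantContext is an illustrative name, and tenant IDs are assumed to be UUIDs. It uses PostgreSQL's set_config(name, value, is_local) with is_local = true, which has the same transaction-local effect as SET LOCAL but accepts a bind parameter.

```typescript
// Minimal shape of a transaction-capable DB client (hypothetical interface).
interface TxClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Assumption: tenant IDs are UUIDs; anything else is rejected before touching the DB.
const TENANT_ID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

async function withTenantContext<T>(
  client: TxClient,
  tenantId: string,
  work: (c: TxClient) => Promise<T>,
): Promise<T> {
  if (!TENANT_ID_RE.test(tenantId)) throw new Error("invalid X-Tenant-Id");
  await client.query("BEGIN");
  try {
    // Third argument true = local to this transaction, like SET LOCAL.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, a pooled connection cannot carry one tenant's ID into the next request.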

1.3 IdP-agnostic

Per ADR-0002, the identity provider that authenticated the caller (Keycloak default / tenant OIDC / tenant SAML / Firebase legacy) is irrelevant to compliance-engine. All it ever sees is the platform JWT. The idp claim is captured in audit_log.before/after for forensic purposes only.

2. Authorization (RBAC)

Compliance operations require specific platform roles. Role definitions live in auth-service; compliance-engine enforces scope/role checks at the handler boundary using NestJS Guards.

| Role | Capabilities |
|---|---|
| platform.compliance.admin | Full CRUD on rules, rule-sets, blocklists, keyword lists; tenant tier override; bulk-review; view full message body in hold queue; activate default rule set |
| platform.compliance.reviewer | Read hold queue (body redacted); perform single-item RELEASE / REJECT; read tenant scores |
| platform.auditor | Read-only on audit log, reports, tenant scores, violations; no message-body access; cannot change any state |
| platform.support | Read-only on hold queue (heavily redacted) for L2 support triage |
| Tenant sms:compliance:read scope | Read own tenant's holds, score, violations via /v1/portal/compliance/* |
| Tenant sms:compliance:appeal scope | Submit appeals on own held/blocked messages |

Enforcement points:

  1. NestJS RoleGuard — runs first, rejects with 403 INSUFFICIENT_SCOPE before handler entry.
  2. Per-handler @RequireRoles(...) decorator — declarative and contract-tested.
  3. Postgres RLS on hold_queue — final defence for tenant isolation.
  4. Body-redaction interceptor — serializes payload.body and payload.to only when the caller role is platform.compliance.admin; otherwise replaces with <redacted> and the masked number.
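
The role check at the handler boundary can be sketched as a pure function, independent of NestJS. Everything here is illustrative: checkRoles and parseRoles are hypothetical names, and the assumption that X-Roles carries a comma-separated list is not confirmed by this document. Only the 403 INSUFFICIENT_SCOPE outcome and the role names come from the text above.

```typescript
type GuardResult =
  | { allowed: true }
  | { allowed: false; status: 403; code: "INSUFFICIENT_SCOPE" };

// Assumption: Kong forwards roles as a comma-separated X-Roles header.
function parseRoles(header: string | undefined): Set<string> {
  const parts = (header ?? "").split(",").map((r) => r.trim());
  return new Set(parts.filter((r) => r.length > 0));
}

// Allows the request when the caller holds at least one of the required roles.
function checkRoles(rolesHeader: string | undefined, anyOf: readonly string[]): GuardResult {
  const roles = parseRoles(rolesHeader);
  return anyOf.some((r) => roles.has(r))
    ? { allowed: true }
    : { allowed: false, status: 403, code: "INSUFFICIENT_SCOPE" };
}
```

In a NestJS guard, the same function could back canActivate, with the required roles read from the handler's @RequireRoles(...) metadata.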

3. Data protection

3.1 PII inventory & classification

| Field | Classification | Storage | Transit |
|---|---|---|---|
| hold_queue.payload.body | RESTRICTED (message content) | Encrypted at rest via per-tenant KMS envelope (AES-256-GCM; DEK wrapped by tenant-scoped KEK in Vault Transit) | TLS 1.2+ only |
| hold_queue.payload.to | CONFIDENTIAL (phone number) | Not encrypted at field level; relies on DB + disk encryption; API-level masking outside platform.compliance.admin | TLS 1.2+ |
| hold_queue.payload.senderId | INTERNAL | Plain | TLS 1.2+ |
| evaluation_log.findings[].evidence | CONFIDENTIAL (partially redacted) | Plain | TLS 1.2+ |
| dlr_stats.* | INTERNAL (aggregates) | Plain | TLS 1.2+ |
| audit_log.* (may contain redacted excerpts) | CONFIDENTIAL | Plain | TLS 1.2+ |

3.2 Encryption keys

| Key | Store | Rotation |
|---|---|---|
| Per-tenant KEK for hold_queue.payload | Vault Transit (transit/ghasi-compliance-<tenantId>) | Annual, or on tenant request / incident |
| DEK for each hold_queue.payload row | Wrapped inline, unwrapped per read via Vault Transit | Implicit (DEKs are per-row) |
| mTLS certs (server + client) | Vault PKI engine | 30 days |
| LLM API tokens (external fallback) | Vault KV | On provider request or quarterly |
| JWT validation | No keys stored; JWKS pulled from auth-service via Kong | N/A |
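
To make the envelope scheme concrete, here is a minimal sketch using Node's built-in crypto module. It is illustrative only: KeyWrapper stands in for Vault Transit, and localKeyWrapper wraps the DEK with an in-process key purely so the example runs on its own. The names sealBody, openBody, and SealedPayload are hypothetical, as is the storage layout.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedPayload { wrappedDek: Buffer; iv: Buffer; tag: Buffer; ciphertext: Buffer; }

// Stand-in for Vault Transit: wraps and unwraps a DEK under a tenant-scoped KEK.
interface KeyWrapper { wrap(dek: Buffer): Buffer; unwrap(wrapped: Buffer): Buffer; }

function aesGcmEncrypt(key: Buffer, plaintext: Buffer) {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function aesGcmDecrypt(key: Buffer, iv: Buffer, tag: Buffer, ciphertext: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Demo-only wrapper holding the KEK in process; in the design above the KEK never leaves Vault.
function localKeyWrapper(kek: Buffer): KeyWrapper {
  return {
    wrap(dek) {
      const { iv, tag, ciphertext } = aesGcmEncrypt(kek, dek);
      return Buffer.concat([iv, tag, ciphertext]); // 12-byte IV, 16-byte tag, then data
    },
    unwrap(w) {
      return aesGcmDecrypt(kek, w.subarray(0, 12), w.subarray(12, 28), w.subarray(28));
    },
  };
}

function sealBody(body: string, kw: KeyWrapper): SealedPayload {
  const dek = randomBytes(32); // fresh per-row DEK
  const sealed = aesGcmEncrypt(dek, Buffer.from(body, "utf8"));
  return { wrappedDek: kw.wrap(dek), ...sealed };
}

function openBody(p: SealedPayload, kw: KeyWrapper): string {
  const dek = kw.unwrap(p.wrappedDek);
  return aesGcmDecrypt(dek, p.iv, p.tag, p.ciphertext).toString("utf8");
}
```

Because each row carries its own wrapped DEK, rotating the tenant KEK only requires re-wrapping DEKs, not re-encrypting message bodies.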

3.3 Redaction rules

  • In events. The to field appears only as toMasked (+CCNNN***), and body never appears. Rule evidence strings have the matching span replaced by ***, with ≥ 4 characters of prefix/suffix context.
  • In logs. The Pino redactor masks to and never emits body. An ESLint rule rejects logger.info(..., { body }) patterns at PR time.
  • In REST responses. The body-redaction interceptor replaces payload.body with <redacted> and payload.to with toMasked unless the caller holds platform.compliance.admin.
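
The REST redaction rule can be sketched as a pure function. This is a hedged illustration: redactForRoles, maskNumber, and the HoldPayload shape are hypothetical, and the mask treats the +CCNNN*** pattern as "keep the plus sign and the first five digits", which assumes a two-digit country code. A production version would need numbering-plan awareness.

```typescript
const ADMIN_ROLE = "platform.compliance.admin";

// Hypothetical view of the fields named in the PII inventory.
interface HoldPayload { body: string; to: string; senderId: string; }

// Assumption: +CCNNN*** keeps the first five digits and hides the rest.
function maskNumber(e164: string): string {
  const digits = e164.replace(/\D/g, "");
  return `+${digits.slice(0, 5)}***`;
}

// Returns the payload unchanged for admins, and a redacted copy for everyone else.
function redactForRoles(payload: HoldPayload, roles: ReadonlySet<string>): HoldPayload {
  if (roles.has(ADMIN_ROLE)) return { ...payload };
  return { ...payload, body: "<redacted>", to: maskNumber(payload.to) };
}
```

Returning a copy rather than mutating the input keeps the stored payload intact if the same object is reused elsewhere in the request.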

3.4 LLM data residency

  • Local LLM is primary (on-cluster compliance-ai). Message body is sent only after anonymisation (see APPLICATION_LOGIC ANONYMIZE_BODY_BEFORE_AI=true).
  • External LLM fallback is region-scoped. A per-tenant compliance.ai_fallback_allowed flag (managed in admin-dashboard) gates external calls. Tenants in regulated regions (EU, sovereign-cloud markets) have this flag forced to false; fallbackAction=HOLD applies instead.
  • No prompt or completion is logged by compliance-engine beyond latency/status metrics. The LLM cache key is sha256(anonymisedBody) so the cache reveals no PII.
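
A minimal sketch of the cache-key derivation, assuming an anonymiser that replaces phone numbers, amounts, and OTP-like digit runs with placeholders before hashing. The regular expressions and the names anonymiseBody and llmCacheKey are illustrative assumptions; the real anonymiser is specified in APPLICATION_LOGIC, not here.

```typescript
import { createHash } from "node:crypto";

// Illustrative patterns only; order matters (phones before shorter digit runs).
function anonymiseBody(body: string): string {
  return body
    .replace(/\+?\d[\d\s-]{7,}\d/g, "<PHONE>")
    .replace(/\b\d+[.,]\d{2}\b/g, "<AMOUNT>")
    .replace(/\b\d{4,8}\b/g, "<OTP>");
}

// sha256 over the anonymised text, hex-encoded.
function llmCacheKey(body: string): string {
  return createHash("sha256").update(anonymiseBody(body), "utf8").digest("hex");
}
```

Two messages that differ only in an OTP or phone number map to the same key, so the cache both improves hit rate and avoids storing a hash of raw PII.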

4. Audit

All state changes are recorded in compliance.audit_log with actor, before/after snapshots, IP, user agent, and trace ID. The table is append-only at the database level (Postgres rules reject UPDATE and DELETE). Retention ≥ 13 months, enforced by partition pruning.

Events mirroring audit-relevant state changes are emitted on compliance.audit.v1 (per evaluation) and compliance.rule.changed.v1 (per config change). analytics-service subscribes for long-term archival; admin-dashboard subscribes for live views.

5. Fail-closed posture

Compliance-engine is designed to never let an un-evaluated message through. Operational implications:

  • gRPC handler errors and timeouts translate to NATS non-ack in sms-orchestrator. The message stays in EVALUATING, retries, and eventually moves to DLQ with reason compliance_unavailable.
  • An AI rule with fallbackAction = HOLD resolves to HOLD if the LLM is unavailable.
  • Redis unavailability: rule/ruleset caches miss and load from the DB; RATE_VOLUME rules fail closed to HOLD.
  • DB unavailability: the gRPC call returns INTERNAL and the orchestrator redelivers.
  • When the circuit breaker on the external LLM trips, calls are routed to the local LLM; if the local LLM's breaker also trips, AI-classification rules apply their fallbackAction.
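
The LLM routing behaviour can be summarised as a small decision function. This is a sketch under assumptions: it takes the local LLM as primary and the external LLM as an opt-in fallback (per section 3.4), and the names planAiEvaluation, LlmHealth, and AiPlan are hypothetical. The action set ALLOW / HOLD / BLOCK is inferred from actions mentioned elsewhere in this document.

```typescript
type Action = "ALLOW" | "HOLD" | "BLOCK";
type LlmRoute = "LOCAL" | "EXTERNAL";

interface LlmHealth {
  localBreakerOpen: boolean;
  externalBreakerOpen: boolean;
  externalAllowed: boolean; // per-tenant compliance.ai_fallback_allowed
}

type AiPlan =
  | { kind: "CLASSIFY"; route: LlmRoute }
  | { kind: "FALLBACK"; action: Action };

// Picks a healthy, permitted LLM; otherwise falls back to the rule's fallbackAction.
function planAiEvaluation(health: LlmHealth, fallbackAction: Action): AiPlan {
  if (!health.localBreakerOpen) return { kind: "CLASSIFY", route: "LOCAL" };
  if (health.externalAllowed && !health.externalBreakerOpen) {
    return { kind: "CLASSIFY", route: "EXTERNAL" };
  }
  return { kind: "FALLBACK", action: fallbackAction };
}
```

With fallbackAction = HOLD, no combination of breaker states lets a message through unclassified, which is the fail-closed property this section describes.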

Security-wise, fail-closed means availability attacks cannot bypass policy — at worst they delay dispatch, which is acceptable.

6. Tenant isolation

  • Postgres RLS on hold_queue keyed on tenant_id.
  • Ruleset and evaluation caches in Redis are keyed by {tenantId}, so one tenant's cache invalidation never affects another.
  • A per-tenant KEK for body encryption ensures that a key compromise is scoped to a single tenant.
  • platform.compliance.* roles are platform-wide by definition; there is no tenant-scoped equivalent (tenants self-serve via portal endpoints only, with RLS enforcement).

7. Secrets

| Secret | Store | Injected as |
|---|---|---|
| gRPC server cert + key | Vault PKI → K8s Secret (Vault Agent) | File mount |
| gRPC client cert for sms-orchestrator | Vault PKI → K8s Secret | File mount (in orchestrator pod) |
| PostgreSQL credentials | Vault DB dynamic secret | Env var |
| Redis credentials | Vault KV | Env var |
| NATS credentials | Vault KV | Env var |
| KMS keys for hold payload encryption | Vault Transit (referenced, not exported) | |
| External LLM API keys | Vault KV (per tenant / per region) | Env var |

No secret is ever written to logs, events, or config files. A pre-commit gitleaks scan blocks accidental commits of secrets.

8. Threat model

| Threat | Mitigation |
|---|---|
| Malicious admin authors a rule that short-circuits BLOCK to ALLOW | Rule mutations are audit-logged with before/after; compliance.rule.changed.v1 fans out to the SOC channel; a second-reviewer workflow is on the roadmap (EP-CE-04 hardening) |
| Compromised sms-orchestrator pod floods EvaluateCompliance | mTLS + per-cert rate limit + autoscaler; evaluation is cached so replay cost is bounded |
| Attacker registers a COMPOSITE rule loop to DoS the engine | Cycle detection on save + max depth 5 at eval + hard 10 ms per regex + 450 ms budget |
| ReDoS via REGEX rule | re2 engine (linear time) + pattern length cap + eval-time timeout |
| PII leak via rule evidence | Redaction in evaluator; contract tests verify no raw body passes |
| Review API abuse (unauthorised release) | RBAC + optimistic lock + mandatory reviewNotes + full audit of before/after |
| Replay of LLM cache to infer traffic | Cache keyed by sha256(anonymisedBody); anonymisation strips phone numbers, amounts, OTPs before hashing |
| External LLM exfiltration of message content | Disabled in regulated regions; otherwise anonymisation pre-processor + per-tenant opt-out |
| Auditor abuses unrestricted read to export PII | platform.auditor cannot read raw body; an auditor's reads are themselves recorded in the audit log (meta-audit) |
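
The COMPOSITE-rule mitigations (cycle detection on save, depth cap at evaluation) can be sketched as follows. This is an illustrative sketch, not the engine's implementation: RuleGraph, findCycle, and depthOf are hypothetical names, and the graph representation (rule ID mapped to child rule IDs) is an assumption. Only the depth limit of 5 comes from the table above.

```typescript
// Rule ID -> IDs of the child rules it references (hypothetical representation).
type RuleGraph = ReadonlyMap<string, readonly string[]>;

const MAX_DEPTH = 5;

// Depth-first search; returns the first cycle found as a path, or null if the graph is acyclic.
function findCycle(graph: RuleGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];
  const visit = (id: string): string[] | null => {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") return [...path.slice(path.indexOf(id)), id];
    state.set(id, "visiting");
    path.push(id);
    for (const child of graph.get(id) ?? []) {
      const cycle = visit(child);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  };
  for (const id of Array.from(graph.keys())) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}

// Returns the nesting depth below a rule; throws once the depth cap is exceeded.
function depthOf(graph: RuleGraph, id: string, depth = 1): number {
  if (depth > MAX_DEPTH) throw new Error(`composite depth exceeds ${MAX_DEPTH}`);
  const children = graph.get(id) ?? [];
  return children.reduce((max, c) => Math.max(max, depthOf(graph, c, depth + 1)), depth);
}
```

Checking the depth at evaluation time as well as rejecting cycles on save means that a graph which somehow bypassed the save-time check still terminates.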

9. GDPR & regulatory

  • Right to erasure (GDPR Art. 17): auth.user.erased.v1 is consumed by compliance-engine. The consumer:
    • Redacts hold_queue.payload.body and hold_queue.payload.to for the affected user/tenant (sets them to null with a tombstone note).
    • Does not delete audit_log or evaluation_log rows; retention for regulatory evidence overrides the erasure right for the data fields required by law. Affected fields are redacted/pseudonymised per regulator guidance.
  • Data portability: tenants can export their own compliance score history, violations, and held-message metadata (never content) via /v1/portal/compliance/* and the report generator.
  • Audit evidence window: ≥ 13 months rolling. Regulators receive exports via TENANT_AUDIT reports, role-gated to platform.auditor.
  • Sub-processor list: external LLMs, when enabled, appear on the platform sub-processor list with DPA in place.

10. Security testing

  • Contract tests per API_CONTRACTS §6.
  • Property-based tests on rule evaluators (hypothesis-style) including known ReDoS patterns, Unicode edge cases, worst-case composites.
  • ZAP baseline + API scan run on each main-branch build.
  • Quarterly penetration test scoped to the compliance-engine REST surface.
  • Role-matrix integration test — every endpoint × every role — verifies 200/403 behaviour.
  • Secret scanning in CI (gitleaks); dependency scanning (osv-scanner); container scanning (trivy).
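
The role-matrix test can be driven from a generated table of cases. The sketch below only builds the expectations; the endpoint paths in the usage example are hypothetical placeholders, and buildRoleMatrix, EndpointSpec, and MatrixCase are illustrative names rather than parts of the real test suite.

```typescript
interface EndpointSpec { method: string; path: string; allowedRoles: readonly string[]; }
interface MatrixCase { method: string; path: string; role: string; expectedStatus: 200 | 403; }

// Expands every endpoint x every role into one expected-status case.
function buildRoleMatrix(
  endpoints: readonly EndpointSpec[],
  roles: readonly string[],
): MatrixCase[] {
  const cases: MatrixCase[] = [];
  for (const e of endpoints) {
    for (const role of roles) {
      const allowed = e.allowedRoles.indexOf(role) !== -1;
      cases.push({ method: e.method, path: e.path, role, expectedStatus: allowed ? 200 : 403 });
    }
  }
  return cases;
}

// Usage with hypothetical endpoints; real paths live in API_CONTRACTS.
const exampleCases = buildRoleMatrix(
  [
    { method: "POST", path: "/v1/example/rules", allowedRoles: ["platform.compliance.admin"] },
    {
      method: "GET",
      path: "/v1/example/holds",
      allowedRoles: ["platform.compliance.admin", "platform.compliance.reviewer", "platform.support"],
    },
  ],
  ["platform.compliance.admin", "platform.compliance.reviewer", "platform.auditor", "platform.support"],
);
```

Each generated case can then be fed to an integration test that issues the request with the given role and asserts the expected status, so adding a role or an endpoint extends coverage automatically.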