Compliance Engine — Security Model
Status: populated
Owner: Security + Trust & Safety
Last updated: 2026-04-19
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · 13 Security, Compliance, Tenancy
1. Authentication
1.1 gRPC plane — EvaluateCompliance
- mTLS required. The gRPC server accepts only connections presenting a client certificate signed by the platform CA.
- The only authorised client is sms-orchestrator (CN sms-orchestrator, issued by the platform CA). A SAN allowlist pins the exact expected identity; other certs are rejected with UNAUTHENTICATED.
- Client certs are mounted from Vault via the Vault Agent Sidecar Injector and rotated every 30 days; the server hot-reloads on file change.
- Local dev bypass via GRPC_TLS_ENABLED=false is prohibited in any non-local environment (enforced by a start-up guard that refuses to boot with TLS disabled when NODE_ENV != 'development').
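A minimal sketch of the two checks above, assuming the orchestrator's certificate carries a DNS SAN equal to its CN and that the guard receives the environment as a plain object. SAN_ALLOWLIST, isAllowedPeer and assertTlsPolicy are hypothetical names; the subjectAltName argument is the comma-separated string Node's tls getPeerCertificate() exposes as subjectaltname.

```typescript
// Hypothetical names. Assumes the orchestrator cert has the DNS SAN "sms-orchestrator".
const SAN_ALLOWLIST: ReadonlySet<string> = new Set(["DNS:sms-orchestrator"]);

/**
 * subjectAltName is the comma-separated string Node exposes as
 * getPeerCertificate().subjectaltname, e.g. "DNS:a, DNS:b".
 * Exact string match only (no wildcards, no prefixes); the gRPC layer
 * maps a false result to UNAUTHENTICATED.
 */
function isAllowedPeer(subjectAltName: string | undefined): boolean {
  if (!subjectAltName) return false; // no SANs at all: reject
  return subjectAltName
    .split(",")
    .map((entry) => entry.trim())
    .some((entry) => SAN_ALLOWLIST.has(entry));
}

/**
 * Start-up guard: refuse TLS-disabled mode unless NODE_ENV is exactly
 * "development". Called before the server starts, the throw aborts bootstrap.
 */
function assertTlsPolicy(env: Readonly<Record<string, string | undefined>>): void {
  const tlsDisabled = env.GRPC_TLS_ENABLED === "false";
  if (tlsDisabled && env.NODE_ENV !== "development") {
    throw new Error(`GRPC_TLS_ENABLED=false refused: NODE_ENV is ${env.NODE_ENV ?? "unset"}`);
  }
}
```

Calling assertTlsPolicy(process.env) as the first bootstrap step keeps a misconfigured deployment from ever serving plaintext.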
1.2 REST plane — admin + tenant portal
- Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). Requests without a valid token are rejected at the edge.
- Kong forwards X-Tenant-Id, X-Account-Id, X-User-Id and X-Roles headers on the authenticated upstream request. compliance-engine never parses JWTs directly; it trusts Kong's injection.
- For tenant portal endpoints (/v1/portal/compliance/*), compliance-engine runs SET LOCAL app.current_tenant_id = <X-Tenant-Id> inside each request's transaction (SET LOCAL has no effect outside one); Row-Level Security on hold_queue enforces tenant isolation at the DB level (belt and braces).
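Per-request tenant scoping could look like the sketch below. SqlClient and withTenant are hypothetical; any driver exposing query(text, params) fits. It calls set_config('app.current_tenant_id', $1, true), the function form of SET LOCAL, because SET does not accept bind parameters and the header value should never be spliced into SQL text.

```typescript
// Hypothetical stand-in for a driver such as node-postgres.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

/**
 * Run `work` in a transaction whose RLS tenant comes from Kong's X-Tenant-Id.
 * set_config(..., true) is transaction-local like SET LOCAL, so the value is
 * discarded at COMMIT/ROLLBACK and cannot leak to the next pooled request.
 */
async function withTenant<T>(
  client: SqlClient,
  tenantIdHeader: string | undefined,
  work: (client: SqlClient) => Promise<T>,
): Promise<T> {
  if (!tenantIdHeader) {
    // Fail closed: a portal request without X-Tenant-Id never reaches the DB.
    throw new Error("missing X-Tenant-Id");
  }
  await client.query("BEGIN");
  try {
    // The tenant id travels as a bind parameter, never as SQL text.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantIdHeader]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```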
1.3 IdP-agnostic
Per ADR-0002, the identity provider that authenticated the caller (Keycloak default / tenant OIDC / tenant SAML / Firebase legacy) is irrelevant to compliance-engine. All it ever sees is the platform JWT. The idp claim is captured in audit_log.before/after for forensic purposes only.
2. Authorization (RBAC)
Compliance operations require specific platform roles. Role definitions live in auth-service; compliance-engine enforces scope/role checks at the handler boundary using NestJS Guards.
| Role | Capabilities |
|---|---|
| platform.compliance.admin | Full CRUD on rules, rule-sets, blocklists, keyword lists; tenant tier override; bulk-review; view full message body in hold queue; activate default rule set |
| platform.compliance.reviewer | Read hold queue (body redacted); perform single-item RELEASE / REJECT; read tenant scores |
| platform.auditor | Read-only on audit log, reports, tenant scores, violations; no message-body access; cannot change any state |
| platform.support | Read-only on hold queue (heavily redacted) for L2 support triage |
| Tenant sms:compliance:read scope | Read own tenant's holds, score, violations via /v1/portal/compliance/* |
| Tenant sms:compliance:appeal scope | Submit appeals on own held/blocked messages |
Enforcement points:
- NestJS RoleGuard runs first and rejects with 403 INSUFFICIENT_SCOPE before handler entry.
- Per-handler @RequireRoles(...) decorator: declarative and contract-tested.
- Postgres RLS on hold_queue: final defence for tenant isolation.
- Body-redaction interceptor: serializes payload.body and payload.to only when the caller role is platform.compliance.admin; otherwise replaces them with <redacted> and the masked number respectively.
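A framework-free sketch of the RoleGuard decision, so the rule can be read and tested without NestJS. It assumes X-Roles is a comma-separated list and that @RequireRoles(...) means "any one of these roles"; both are assumptions, as are the names checkRoles and parseRolesHeader.

```typescript
// Framework-free sketch; names and header format are assumptions.
type GuardResult =
  | { allowed: true }
  | { allowed: false; status: 403; code: "INSUFFICIENT_SCOPE" };

/** Assumes Kong sends X-Roles as a comma-separated list. */
function parseRolesHeader(header: string | undefined): Set<string> {
  return new Set(
    (header ?? "")
      .split(",")
      .map((role) => role.trim())
      .filter((role) => role.length > 0),
  );
}

/**
 * Any-of semantics: the caller needs at least one role from the handler's
 * @RequireRoles(...). An empty requirement list denies (some() on [] is false),
 * so a handler missing its decorator fails closed.
 */
function checkRoles(rolesHeader: string | undefined, required: readonly string[]): GuardResult {
  const held = parseRolesHeader(rolesHeader);
  if (required.some((role) => held.has(role))) {
    return { allowed: true };
  }
  return { allowed: false, status: 403, code: "INSUFFICIENT_SCOPE" };
}
```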
3. Data protection
3.1 PII inventory & classification
| Field | Classification | Storage | Transit |
|---|---|---|---|
| hold_queue.payload.body | RESTRICTED (message content) | Encrypted at rest via per-tenant KMS envelope (AES-256-GCM; DEK wrapped by tenant-scoped KEK in Vault Transit) | TLS 1.2+ only |
| hold_queue.payload.to | CONFIDENTIAL (phone number) | Not encrypted at field level; relies on DB + disk encryption; masked at the API for every role except platform.compliance.admin | TLS 1.2+ |
| hold_queue.payload.senderId | INTERNAL | Plain | TLS 1.2+ |
| evaluation_log.findings[].evidence | CONFIDENTIAL (partially redacted) | Plain | TLS 1.2+ |
| dlr_stats.* | INTERNAL (aggregates) | Plain | TLS 1.2+ |
| audit_log.* (may contain redacted excerpts) | CONFIDENTIAL | Plain | TLS 1.2+ |
3.2 Encryption keys
| Key | Store | Rotation |
|---|---|---|
| Per-tenant KEK for hold_queue.payload | Vault Transit (transit/ghasi-compliance-<tenantId>) | Annual, or on tenant request / incident |
| DEK for each hold_queue.payload row | Stored wrapped inline with the row; unwrapped per read via Vault Transit | Implicit (DEKs are per-row) |
| mTLS certs (server + client) | Vault PKI engine | 30 days |
| LLM API tokens (external fallback) | Vault KV | On provider request or quarterly |
| JWT validation keys | None stored; Kong pulls the JWKS from auth-service | N/A |
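The per-row envelope scheme in the table can be sketched with node:crypto as below. KeyWrapper is a hypothetical interface standing in for calls to the tenant's Vault Transit key, so the KEK never leaves Vault; encryptBody, decryptBody and the EncryptedBody layout are illustrative. Binding tenant and row ids as GCM additional authenticated data is a design choice added in this sketch, not something stated above.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical wrapper over Vault Transit encrypt/decrypt for the tenant KEK.
interface KeyWrapper {
  wrap(tenantId: string, dek: Buffer): Promise<string>;
  unwrap(tenantId: string, wrappedDek: string): Promise<Buffer>;
}

interface EncryptedBody {
  wrappedDek: string; // stored inline with the row
  iv: string;         // 12-byte GCM nonce, base64
  tag: string;        // 16-byte GCM auth tag, base64
  ciphertext: string; // base64
}

const aad = (tenantId: string, rowId: string) => Buffer.from(`${tenantId}:${rowId}`);

async function encryptBody(kek: KeyWrapper, tenantId: string, rowId: string, body: string): Promise<EncryptedBody> {
  const dek = randomBytes(32); // fresh AES-256 key for this row only
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  cipher.setAAD(aad(tenantId, rowId)); // ciphertext cannot be moved to another row
  const ciphertext = Buffer.concat([cipher.update(body, "utf8"), cipher.final()]);
  return {
    wrappedDek: await kek.wrap(tenantId, dek),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

async function decryptBody(kek: KeyWrapper, tenantId: string, rowId: string, enc: EncryptedBody): Promise<string> {
  const dek = await kek.unwrap(tenantId, enc.wrappedDek); // one Transit call per read
  const decipher = createDecipheriv("aes-256-gcm", dek, Buffer.from(enc.iv, "base64"));
  decipher.setAAD(aad(tenantId, rowId));
  decipher.setAuthTag(Buffer.from(enc.tag, "base64"));
  // final() throws if the tag, AAD or ciphertext was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(enc.ciphertext, "base64")), decipher.final()]).toString("utf8");
}
```

In tests the wrapper can be a local fake; in the service each decryptBody call costs one Transit round trip, matching the per-read unwrap in the table.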
3.3 Redaction rules
- In events. to appears only as toMasked (+CCNNN***); body never appears. Rule evidence strings have the matching span replaced by *** with ≥ 4-char prefix/suffix context.
- In logs. The Pino redactor masks to and forbids body entirely. An ESLint rule forbids logger.info(..., { body }) patterns at PR time.
- In REST responses. The body-redaction interceptor replaces payload.body for every role except platform.compliance.admin; payload.to is replaced with toMasked unless the caller role is platform.compliance.admin.
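A sketch of the masking helpers. Two readings are assumptions: +CCNNN*** is taken as "+" followed by the first five digits, and evidence context is a fixed window of four characters either side of the match. maskPhone, redactEvidence and serializeForRole are hypothetical names; serializeForRole mirrors the interceptor behaviour for payload.body and payload.to.

```typescript
// Hypothetical helpers; the mask format and context width are assumptions.

/** "+447911123456" -> "+44791***": keep "+" and the first five digits. */
function maskPhone(e164: string): string {
  const digits = e164.replace(/\D/g, ""); // drop "+", spaces, dashes
  return `+${digits.slice(0, 5)}***`;
}

/** Replace text[start, end) with *** and keep `context` characters either side. */
function redactEvidence(text: string, start: number, end: number, context = 4): string {
  const prefix = text.slice(Math.max(0, start - context), start);
  const suffix = text.slice(end, end + context);
  return `${prefix}***${suffix}`;
}

interface HoldPayload {
  body: string;
  to: string;
  senderId: string;
}

/** Only platform.compliance.admin sees body and the full number. */
function serializeForRole(p: HoldPayload, roles: ReadonlySet<string>): HoldPayload {
  if (roles.has("platform.compliance.admin")) return p;
  return { ...p, body: "<redacted>", to: maskPhone(p.to) };
}
```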
3.4 LLM data residency
- Local LLM is primary (on-cluster compliance-ai). Message body is sent only after anonymisation (see APPLICATION_LOGIC; ANONYMIZE_BODY_BEFORE_AI=true).
- External LLM fallback is region-scoped. A per-tenant compliance.ai_fallback_allowed flag (managed in admin-dashboard) gates external calls. Tenants in regulated regions (EU, sovereign-cloud markets) have this flag forced to false; fallbackAction=HOLD applies instead.
- No prompt or completion is logged by compliance-engine beyond latency/status metrics. The LLM cache key is sha256(anonymisedBody), so the cache reveals no PII.
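A sketch of anonymise-then-hash for the cache key. The regular expressions are illustrative assumptions covering the categories the threat model names (phone numbers, amounts, OTPs); the real pre-processor is not described in this document. anonymise and llmCacheKey are hypothetical names; createHash is Node's standard node:crypto API.

```typescript
import { createHash } from "node:crypto";

// Illustrative patterns only, applied in order: phones, then amounts, then OTP-like codes.
const ANONYMISE_RULES: Array<[RegExp, string]> = [
  [/\+?\d[\d\s-]{7,}\d/g, "<PHONE>"], // 9+ chars of digits, spaces, dashes
  [/[$€£]\s?\d+(?:[.,]\d+)?|\d+(?:[.,]\d+)?\s?(?:USD|EUR|GBP)/g, "<AMOUNT>"],
  [/\b\d{4,8}\b/g, "<OTP>"], // short standalone numeric codes
];

function anonymise(body: string): string {
  return ANONYMISE_RULES.reduce((text, [pattern, token]) => text.replace(pattern, token), body);
}

/** Cache key = hex SHA-256 of the anonymised body; the raw body is never hashed. */
function llmCacheKey(body: string): string {
  return createHash("sha256").update(anonymise(body), "utf8").digest("hex");
}
```

Because codes collapse to one token, two OTP messages that differ only in the code share a cache entry, and the key carries no trace of the code itself.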
4. Audit
All state changes are recorded in compliance.audit_log with actor, before/after snapshots, IP, user agent, and trace ID. The table is append-only at the database level (Postgres rules reject UPDATE and DELETE). Retention is ≥ 13 months, enforced by dropping whole partitions only once they fall outside that window.
Events mirroring audit-relevant state changes are emitted on compliance.audit.v1 (per evaluation) and compliance.rule.changed.v1 (per config change). analytics-service subscribes for long-term archival; admin-dashboard subscribes for live views.
5. Fail-closed posture
Compliance-engine is designed to never let an un-evaluated message through. Operational implications:
- gRPC handler errors and timeouts translate to a NATS non-ack in sms-orchestrator. The message stays in EVALUATING, is retried, and eventually moves to the DLQ with reason compliance_unavailable.
- An AI rule with fallbackAction = HOLD yields HOLD if the LLM is unavailable.
- Redis unavailability: rule/ruleset caches miss and load from the DB; RATE_VOLUME rules fail closed to HOLD.
- DB unavailability: INTERNAL is returned; the orchestrator redelivers.
- A circuit-breaker trip on the external LLM short-circuits to the local LLM; if the local LLM also trips, AI-classification rules apply their fallbackAction.
Security-wise, fail-closed means availability attacks cannot bypass policy — at worst they delay dispatch, which is acceptable.
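The AI-classification fallback chain can be sketched as a pure decision function. It follows the ordering in 3.4 (local LLM first, external only where compliance.ai_fallback_allowed is true) and assumes fallbackAction is limited to HOLD or BLOCK; chooseAiRoute and its input fields are hypothetical.

```typescript
// Hypothetical types; fallbackAction is assumed to be HOLD or BLOCK only.
type FallbackAction = "HOLD" | "BLOCK";

type AiRoute =
  | { kind: "local" }
  | { kind: "external" }
  | { kind: "fallback"; action: FallbackAction };

interface AiRouteInput {
  localBreakerOpen: boolean;    // breaker on on-cluster compliance-ai
  externalBreakerOpen: boolean; // breaker on the external provider
  aiFallbackAllowed: boolean;   // compliance.ai_fallback_allowed; false in regulated regions
  fallbackAction: FallbackAction;
}

/** No path returns an implicit ALLOW: with no reachable model, fallbackAction decides. */
function chooseAiRoute(input: AiRouteInput): AiRoute {
  if (!input.localBreakerOpen) return { kind: "local" };
  if (input.aiFallbackAllowed && !input.externalBreakerOpen) return { kind: "external" };
  return { kind: "fallback", action: input.fallbackAction };
}
```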
6. Tenant isolation
- Postgres RLS on hold_queue keyed on tenant_id.
- ruleset and evaluation caches in Redis are keyed by {tenantId}, so one tenant's cache invalidation never affects another.
- Per-tenant KEK for body encryption ensures a key compromise is scoped to a single tenant.
- platform.compliance.* roles are platform-wide by definition; there is no tenant-scoped equivalent (tenants self-serve via portal endpoints only, with RLS enforcement).
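A sketch of tenant-scoped cache keys, assuming the {tenantId} notation above is a Redis Cluster hash tag (only the text between the braces is hashed, so each tenant's keys share one slot). The compliance: prefix, the key layout, and the helper names are hypothetical.

```typescript
// Hypothetical key layout: compliance:{<tenantId>}:<kind>:<id>
type CacheKind = "ruleset" | "eval";

// Braces would move the hash tag; * ? [ ] \ are glob metacharacters in MATCH patterns.
const UNSAFE_TENANT_CHARS = /[{}*?[\]\\]/;

function assertSafeTenantId(tenantId: string): void {
  if (tenantId.length === 0 || UNSAFE_TENANT_CHARS.test(tenantId)) {
    throw new Error(`unsafe tenantId for cache key: ${JSON.stringify(tenantId)}`);
  }
}

function cacheKey(tenantId: string, kind: CacheKind, id: string): string {
  assertSafeTenantId(tenantId);
  return `compliance:{${tenantId}}:${kind}:${id}`;
}

/** Glob matching one tenant's keys, e.g. for SCAN ... MATCH during invalidation. */
function tenantKeyPattern(tenantId: string): string {
  assertSafeTenantId(tenantId);
  return `compliance:{${tenantId}}:*`;
}
```

The closing brace also prevents prefix collisions: the pattern for tenant t1 does not match keys for t10.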
7. Secrets
| Secret | Store | Injected as |
|---|---|---|
| gRPC server cert + key | Vault PKI → K8s Secret (Vault Agent) | File mount |
| gRPC client cert for sms-orchestrator | Vault PKI → K8s Secret | File mount (in orchestrator pod) |
| PostgreSQL credentials | Vault DB dynamic secret | Env var |
| Redis credentials | Vault KV | Env var |
| NATS credentials | Vault KV | Env var |
| KMS keys for hold payload encryption | Vault Transit (referenced, not exported) | — |
| External LLM API keys | Vault KV (per tenant / per region) | Env var |
No secret is ever written to logs, events, or config files. Pre-commit gitleaks scan blocks accidental commits.
8. Threat model
| Threat | Mitigation |
|---|---|
| Malicious admin authors a rule that short-circuits BLOCK to ALLOW | Rule mutations are audit-logged with before/after; compliance.rule.changed.v1 fans out to the SOC channel; a second-reviewer workflow is in roadmap (EP-CE-04 hardening) |
| Compromised sms-orchestrator pod floods EvaluateCompliance | mTLS + per-cert rate limit + autoscaler; evaluation is cached so replay cost is bounded |
| Attacker registers a COMPOSITE rule loop to DoS the engine | Cycle detection on save + max depth 5 at eval + hard 10 ms per regex + 450 ms budget |
| ReDoS via REGEX rule | re2 engine (linear time) + pattern length cap + eval-time timeout |
| PII leak via rule evidence | Redaction in evaluator; contract tests verify no raw body passes |
| Review API abuse (unauthorised release) | RBAC + optimistic lock + mandatory reviewNotes + full audit of before/after |
| Replay of LLM cache to infer traffic | Cache keyed by sha256(anonymisedBody); anonymisation strips phone numbers, amounts, OTPs before hashing |
| External LLM exfiltration of message content | Disabled in regulated regions; otherwise anonymisation pre-processor + per-tenant opt-out |
| Auditor abuses unrestricted read to export PII | platform.auditor cannot read raw body; every read an auditor performs is itself written to the audit log (meta-audit) |
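The COMPOSITE-loop mitigation can be sketched as two functions: a depth-first cycle search run on save, and an expansion guard run at evaluation. The graph representation (rule id mapped to referenced rule ids) and counting depth as edges from the root composite are assumptions; findCycle, leafRules and RuleGraph are hypothetical names.

```typescript
const MAX_DEPTH = 5;

/** rule id -> ids of rules a COMPOSITE references; non-composite rules map to []. */
type RuleGraph = ReadonlyMap<string, readonly string[]>;

/** Save-time check: returns one cycle as a path of rule ids, or null if acyclic. */
function findCycle(graph: RuleGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (id: string): string[] | null => {
    const s = state.get(id);
    if (s === "done") return null;
    if (s === "visiting") return [...stack.slice(stack.indexOf(id)), id]; // back edge
    state.set(id, "visiting");
    stack.push(id);
    for (const child of graph.get(id) ?? []) {
      const cycle = visit(child);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(id, "done");
    return null;
  };

  for (const id of graph.keys()) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}

/** Eval-time guard: expand a composite to leaf rules, refusing depth > MAX_DEPTH. */
function leafRules(graph: RuleGraph, id: string, depth = 0): string[] {
  if (depth > MAX_DEPTH) throw new Error(`composite depth exceeds ${MAX_DEPTH} at ${id}`);
  const children = graph.get(id) ?? [];
  if (children.length === 0) return [id];
  return children.flatMap((child) => leafRules(graph, child, depth + 1));
}
```

The eval-time guard is a second layer: if a cycle ever reached storage, expansion would stop at depth 6 with an error instead of recursing forever.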
9. GDPR & regulatory
- Right to erasure (GDPR Art. 17): compliance-engine consumes auth.user.erased.v1. The consumer:
  - Redacts hold_queue.payload.body and hold_queue.payload.to for the affected user/tenant (sets them to null with a tombstone note).
  - Does not delete audit_log or evaluation_log rows; retention of regulatory evidence falls under the legal-obligation exemption in Art. 17(3)(b), which overrides the erasure right for the data fields required by law. Affected fields are redacted/pseudonymised per regulator guidance.
- Data portability: tenants can export their own compliance score history, violations, and held-message metadata (never content) via /v1/portal/compliance/* and the report generator.
- Audit evidence window: ≥ 13 months rolling. Regulators receive exports via TENANT_AUDIT reports, role-gated to platform.auditor.
- Sub-processor list: external LLMs, when enabled, appear on the platform sub-processor list with a DPA in place.
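A sketch of the hold_queue part of the erasure consumer. UserErasedEvent, HoldRow, the submittedByUserId field and eraseHoldRow are hypothetical; the real event and row shapes belong in EVENT_SCHEMAS and DATA_MODEL. audit_log and evaluation_log are intentionally out of scope, matching the retention rule above.

```typescript
// Hypothetical shapes; see EVENT_SCHEMAS / DATA_MODEL for the real ones.
interface UserErasedEvent {
  tenantId: string;
  userId: string;
  erasedAt: string; // ISO-8601 timestamp
}

interface HoldRow {
  tenantId: string;
  submittedByUserId: string;
  payload: { body: string | null; to: string | null; senderId: string };
  tombstone?: string;
}

/**
 * Returns a redacted copy for matching rows and the same object otherwise.
 * Idempotent: replaying the event yields the same result, which suits
 * at-least-once delivery.
 */
function eraseHoldRow(row: HoldRow, ev: UserErasedEvent): HoldRow {
  const affected = row.tenantId === ev.tenantId && row.submittedByUserId === ev.userId;
  if (!affected) return row;
  return {
    ...row,
    payload: { ...row.payload, body: null, to: null }, // senderId is kept
    tombstone: `erased per auth.user.erased.v1 at ${ev.erasedAt}`,
  };
}
```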
10. Security testing
- Contract tests per API_CONTRACTS §6.
- Property-based tests on rule evaluators (hypothesis-style) including known ReDoS patterns, Unicode edge cases, worst-case composites.
- ZAP baseline + API scan run on each main-branch build.
- Quarterly penetration test scoped to the compliance-engine REST surface.
- Role-matrix integration test — every endpoint × every role — verifies 200/403 behaviour.
- Secret scanning in CI (gitleaks); dependency scanning (osv-scanner); container scanning (trivy).