ADR-0003 — Compliance Layer as a first-class architectural tier

Status: Approved
Date: 2026-04-19
Owners: Platform Architecture, Trust & Safety, Security
Related: 01-enterprise-architecture §3.2, §4; compliance-engine SERVICE_OVERVIEW; 13-security-compliance-tenancy

1. Context

An SMS gateway that is careless about compliance is a short-lived SMS gateway. Regulatory fines, carrier de-listings, tenant abuse, and spam complaints all flow through the same chokepoint: outbound messages. Two implementation shapes were on the table:

Option A — Inline checks in sms-orchestrator. Each rule (keyword, regex, sender-id, rate/volume, geo, temporal, DLR-abuse, AI classification, composite) becomes a piece of logic inside the orchestrator. Simple at first; brittle fast. Rule versioning, tenant-specific rule sets, hold queues, manual review UX, tenant scoring, and audit retention all become orchestrator concerns — violating single-responsibility and coupling unrelated scale/latency profiles.

Option B — Compliance as a dedicated architectural tier. A compliance-engine microservice sits between orchestration and routing. Every outbound SMS is evaluated (asynchronously, in the NATS consumer) before routing; a non-ALLOW verdict quarantines the message. Rule authoring, hold queue, tenant scoring, and audit all live in the same bounded context, owned by one team.

2. Decision

Adopt Option B. The Compliance Layer is a first-class architectural tier alongside ingestion (Kong + sms-orchestrator HTTP handler), routing (routing-engine), and transport (smpp-connector):

  1. A dedicated microservice compliance-engine owns all rule evaluation, hold queue, tenant scoring, audit log, and compliance reporting.
  2. It exposes gRPC EvaluateCompliance (P95 ≤ 500 ms) to sms-orchestrator, HTTPS REST to admin-dashboard, and NATS producer/consumer pairs for audit and DLR-stats intake.
  3. The pipeline is fail-closed: if compliance-engine is unavailable, the message remains in EVALUATING and is retried from NATS. It is never released to routing-engine without an explicit ALLOW.
  4. Verdicts are ALLOW / FLAG / HOLD / BLOCK. Non-ALLOW messages never reach a carrier. HOLD routes to an admin review workflow in admin-dashboard; BLOCK is terminal.
  5. The audit log is append-only, partitioned by month, and retained for ≥ 13 months (the regulatory evidence window).
  6. AI classification runs on a local LLM (compliance-ai container) with an external LLM fallback governed by per-tenant data-residency policy.
  7. Tenant notifications of holds/blocks are async via notification-service; tenants never wait on compliance evaluation in the HTTP response.
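
The fail-closed rule in items 3 and 4 can be sketched as a small consumer-side handler. This is an illustrative sketch, not the service's actual code: the `handle` function, the `EngineUnavailable` exception, and every state name other than EVALUATING are assumptions made for the example, and the injected `evaluate` callable stands in for the gRPC EvaluateCompliance call. The decision does not spell out what happens to a FLAG verdict beyond "never reaches a carrier", so the sketch simply parks it in a generic quarantined state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    HOLD = "HOLD"
    BLOCK = "BLOCK"


class EngineUnavailable(Exception):
    """Hypothetical error raised when compliance-engine cannot be reached."""


@dataclass
class Message:
    message_id: str
    state: str = "EVALUATING"


def handle(message: Message, evaluate: Callable[[Message], Verdict]) -> str:
    """Apply one evaluation attempt; return 'ack' or 'retry' for the consumer."""
    try:
        verdict = evaluate(message)
    except EngineUnavailable:
        # Fail-closed: the message stays in EVALUATING and is redelivered.
        return "retry"

    if verdict is Verdict.ALLOW:
        message.state = "RELEASED"      # assumed name: handed to routing-engine
    elif verdict is Verdict.HOLD:
        message.state = "IN_REVIEW"     # assumed name: admin review workflow
    elif verdict is Verdict.BLOCK:
        message.state = "BLOCKED"       # terminal
    else:
        message.state = "QUARANTINED"   # assumed handling for FLAG
    return "ack"
```

The only branch that moves a message toward routing is the explicit ALLOW branch; an outage changes nothing about the message and only asks for redelivery.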

3. Consequences

Positive

  • Clean bounded context with a single owner (Trust & Safety).
  • Regulators and carriers get a coherent evidence trail — we can answer "why was this message blocked?" in one query against compliance.audit_log.
  • The scale and latency profiles of compliance (rule evaluation + LLM) are decoupled from the orchestrator's throughput profile.
  • Adding a new rule type does not touch sms-orchestrator.
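
To make the "one query" claim concrete, the sketch below runs such a lookup against a stand-in table. Only the name compliance.audit_log comes from this ADR; the column names, the `why_blocked` helper, and the sample rows are made up for illustration, and in-memory SQLite is used purely so the example is runnable. The real datastore and schema are defined in the compliance-engine service docs.

```python
import sqlite3

# SQLite stands in for the real datastore; the columns below are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS compliance")
conn.execute(
    """
    CREATE TABLE compliance.audit_log (
        message_id   TEXT NOT NULL,
        tenant_id    TEXT NOT NULL,
        verdict      TEXT NOT NULL,
        rule_id      TEXT NOT NULL,
        rule_version INTEGER NOT NULL,
        evaluated_at TEXT NOT NULL
    )
    """
)

# Made-up sample rows, for illustration only.
rows = [
    ("msg-1", "tenant-a", "ALLOW", "kw-001", 2, "2026-04-19T10:00:00Z"),
    ("msg-2", "tenant-a", "BLOCK", "kw-017", 3, "2026-04-19T10:15:00Z"),
]
conn.executemany("INSERT INTO compliance.audit_log VALUES (?, ?, ?, ?, ?, ?)", rows)


def why_blocked(db: sqlite3.Connection, message_id: str) -> list:
    """Return (rule_id, rule_version, evaluated_at) for each BLOCK on a message."""
    return db.execute(
        """
        SELECT rule_id, rule_version, evaluated_at
        FROM compliance.audit_log
        WHERE message_id = ? AND verdict = 'BLOCK'
        ORDER BY evaluated_at
        """,
        (message_id,),
    ).fetchall()
```

Because the rule identifier and version sit on the same row as the verdict, the evidence for a block does not need to be joined across services.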

Negative / costs

  • One more service to operate, observe, and cover on the pager rotation.
  • Added per-message latency in the evaluation pipeline (async, but still on the path to carrier dispatch).
  • gRPC contract becomes a coupling point: schema evolution must be backwards-compatible.
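
One way the backwards-compatibility constraint might show up in practice: if a newer compliance-engine ever returned a verdict value that an older sms-orchestrator does not know, the caller would still have to behave safely. The sketch below is an assumption about how a caller could handle that, consistent with the fail-closed rule in section 2; the function names and the "UNRECOGNIZED" label are hypothetical and not part of the actual contract.

```python
KNOWN_VERDICTS = {"ALLOW", "FLAG", "HOLD", "BLOCK"}


def normalize(wire_verdict: str) -> str:
    """Map a verdict string from the wire to a known value or 'UNRECOGNIZED'."""
    return wire_verdict if wire_verdict in KNOWN_VERDICTS else "UNRECOGNIZED"


def may_release(wire_verdict: str) -> bool:
    """Only an explicit ALLOW releases a message; unknown values never do."""
    return normalize(wire_verdict) == "ALLOW"
```

Under this reading, adding a verdict on the engine side cannot cause an older caller to release a message it should not.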

Neutral

  • Because evaluation is async in the NATS consumer, the tenant's API experience is unchanged: POST /v1/sms/send returns 202 Accepted in ~50 ms as before.
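
The shape of that request path can be sketched as follows. This is a minimal illustration, assuming an in-process queue as a stand-in for the NATS stream; the `send_sms` function, the `Accepted` type, and the payload fields are hypothetical names for the example, not the orchestrator's real handler.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Accepted:
    status: int
    message_id: str


# Stand-in for the NATS stream that the compliance consumer reads from.
outbound: "queue.Queue[dict]" = queue.Queue()


def send_sms(payload: dict) -> Accepted:
    """Enqueue the message and answer immediately; no compliance call is made here."""
    message_id = str(uuid.uuid4())
    outbound.put({"message_id": message_id, **payload})
    return Accepted(status=202, message_id=message_id)
```

The handler returns before any rule or LLM evaluation happens, which is why the added compliance latency appears on the path to carrier dispatch rather than in the HTTP response.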

4. Alternatives considered

Option | Why rejected
Inline in sms-orchestrator (Option A) | Coupling, single-responsibility violation, scaling conflict
Inline in routing-engine | Would push compliance after routing. Some rules do depend on routing context, but coupling compliance to carrier dispatch is unacceptable
Sidecar library (shared lib) | Ownership unclear; upgrades coupled across services; hold queue / review UX cannot live in a library

5. Implementation

Detailed specs and epic breakdown: compliance-engine service docs (SERVICE_OVERVIEW, APPLICATION_LOGIC, DEPLOYMENT_TOPOLOGY, SYNC_CONTRACT, AI_INTEGRATION, OBSERVABILITY, FAILURE_MODES, TESTING_STRATEGY, MIGRATION_PLAN, SERVICE_READINESS, SERVICE_RISK_REGISTER, _report.md).

6. Status

Approved 2026-04-19. Compliance Layer is baselined into 01-enterprise-architecture (§2 system context, §3 container view, §3.2 container-level compliance view, §4 outbound SMS pipeline with compliance step, §6 NATS topology with compliance.* streams, §7 database ownership with compliance schema, §9 technology stack) and into 03-platform-services (topology, docker-compose inventory, Kubernetes namespace strategy). Requirements tracked as PLT-REQ-026 through PLT-REQ-031 in the traceability matrix.