ADR-0003 — Compliance Layer as a first-class architectural tier
Status: Approved
Date: 2026-04-19
Owners: Platform Architecture, Trust & Safety, Security
Related: 01-enterprise-architecture §3.2, §4, compliance-engine SERVICE_OVERVIEW, 13-security-compliance-tenancy
1. Context
An SMS gateway that is careless about compliance is a short-lived SMS gateway. Regulatory fines, carrier de-listings, tenant abuse, and spam complaints all flow through the same chokepoint: outbound messages. Two implementation shapes were on the table:
Option A — Inline checks in sms-orchestrator. Each rule (keyword, regex, sender-id, rate/volume, geo, temporal, DLR-abuse, AI classification, composite) becomes a piece of logic inside the orchestrator. Simple at first; brittle fast. Rule versioning, tenant-specific rule sets, hold queues, manual review UX, tenant scoring, and audit retention all become orchestrator concerns — violating single-responsibility and coupling unrelated scale/latency profiles.
Option B — Compliance as a dedicated architectural tier. A compliance-engine microservice sits between orchestration and routing. Every outbound SMS is evaluated (asynchronously, in the NATS consumer) before routing; a non-ALLOW verdict quarantines the message. Rule authoring, hold queue, tenant scoring, and audit all live in the same bounded context, owned by one team.
2. Decision
Adopt Option B. The Compliance Layer is a first-class architectural tier alongside ingestion (Kong + sms-orchestrator HTTP handler), routing (routing-engine), and transport (smpp-connector):
- A dedicated microservice `compliance-engine` owns all rule evaluation, hold queue, tenant scoring, audit log, and compliance reporting.
- It exposes gRPC `EvaluateCompliance` (P95 ≤ 500 ms) to `sms-orchestrator`, HTTPS REST to `admin-dashboard`, and NATS producer/consumer pairs for audit and DLR-stats intake.
- The pipeline is fail-closed: if `compliance-engine` is unavailable, the message remains in `EVALUATING` and is retried from NATS. It is never released to `routing-engine` without an explicit `ALLOW`.
- Verdicts are `ALLOW` / `FLAG` / `HOLD` / `BLOCK`. Non-`ALLOW` messages never reach a carrier. `HOLD` routes to an admin review workflow in `admin-dashboard`; `BLOCK` is terminal.
- Audit log is append-only, partitioned by month, retained ≥ 13 months (regulatory evidence window).
- AI classification runs on a local LLM (`compliance-ai` container) with an external LLM fallback governed by per-tenant data-residency policy.
- Tenant notifications of holds/blocks are async via `notification-service`; tenants never wait on compliance evaluation in the HTTP response.
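The fail-closed rule and the four verdicts can be read as a small state mapping. The sketch below is illustrative only: `EVALUATING` and the verdict names come from this ADR, while the other state names (`ROUTABLE`, `QUARANTINED`, `IN_REVIEW`, `BLOCKED`) and the functions `next_state` and `may_route` are hypothetical. In particular, how `FLAG` is handled beyond "does not reach a carrier" is an assumption here.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    HOLD = "HOLD"
    BLOCK = "BLOCK"


class State(Enum):
    EVALUATING = "EVALUATING"    # named in the ADR
    ROUTABLE = "ROUTABLE"        # hypothetical: may be handed to routing-engine
    QUARANTINED = "QUARANTINED"  # hypothetical: FLAG, kept away from carriers
    IN_REVIEW = "IN_REVIEW"      # hypothetical: HOLD, awaiting admin review
    BLOCKED = "BLOCKED"          # hypothetical: BLOCK, terminal


def next_state(verdict: Optional[Verdict]) -> State:
    """Map an evaluation outcome to the message's next state.

    None stands for "compliance-engine did not answer" (timeout, outage).
    """
    if verdict is None:
        # Fail-closed: the message stays put and is retried from NATS.
        return State.EVALUATING
    if verdict is Verdict.ALLOW:
        return State.ROUTABLE
    if verdict is Verdict.HOLD:
        return State.IN_REVIEW
    if verdict is Verdict.BLOCK:
        return State.BLOCKED
    # FLAG: the ADR only says it never reaches a carrier; the rest is assumed.
    return State.QUARANTINED


def may_route(state: State) -> bool:
    """Only an explicit ALLOW ever produces a routable message."""
    return state is State.ROUTABLE
```

The point of the shape is that routing is gated by a positive check for one state, so a missing verdict, an error, or any verdict other than `ALLOW` cannot release a message by accident.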
3. Consequences
Positive
- Clean bounded context with a single owner (Trust & Safety).
- Regulators and carriers get a coherent evidence trail: we can answer "why was this message blocked?" in one query against `compliance.audit_log`.
- The scale and latency profiles of compliance (rule eval + LLM) are decoupled from the orchestrator's throughput profile.
- Adding a new rule type does not touch `sms-orchestrator`.
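To make the "one query" claim concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for `compliance.audit_log`. The column names (`message_id`, `tenant_id`, `verdict`, `rule_id`, `reason`, `evaluated_at`), the sample rows, and the helper `why_blocked` are all hypothetical; the real schema is defined in the compliance-engine service docs, not here.

```python
import sqlite3

# In-memory stand-in for the audit table; the columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_log (
        message_id   TEXT NOT NULL,
        tenant_id    TEXT NOT NULL,
        verdict      TEXT NOT NULL,
        rule_id      TEXT,
        reason       TEXT,
        evaluated_at TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("msg-1", "tenant-a", "BLOCK", "keyword-17",
         "matched blocked keyword", "2026-04-19T10:00:00Z"),
        ("msg-2", "tenant-a", "ALLOW", None, None, "2026-04-19T10:00:01Z"),
    ],
)


def why_blocked(db: sqlite3.Connection, message_id: str) -> list:
    """Return (rule_id, reason, evaluated_at) for every BLOCK on a message."""
    return db.execute(
        "SELECT rule_id, reason, evaluated_at FROM audit_log "
        "WHERE message_id = ? AND verdict = 'BLOCK' "
        "ORDER BY evaluated_at",
        (message_id,),
    ).fetchall()
```

Because every verdict lands in one append-only table owned by one service, the evidence lookup needs no joins across orchestrator or routing data.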
Negative / costs
- One more service to operate, observe, and pager-rotate.
- Added per-message latency in the evaluation pipeline (async, but still on the path to carrier dispatch).
- gRPC contract becomes a coupling point: schema evolution must be backwards-compatible.
Neutral
- Because evaluation is async in the NATS consumer, the tenant's API experience is unchanged: `POST /v1/sms/send` returns `202 Accepted` in ~50 ms as before.
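The split between the HTTP response and the evaluation step can be sketched as follows. This is an assumption-laden illustration, not the orchestrator's code: a `queue.Queue` stands in for the NATS stream, and `handle_send`, `consume_one`, and the message fields are hypothetical names.

```python
import queue
import uuid
from typing import Callable, Dict, Tuple

# In-memory stand-in for the NATS stream between handler and consumer.
pending: "queue.Queue[Dict]" = queue.Queue()


def handle_send(payload: Dict) -> Tuple[int, Dict]:
    """HTTP side: enqueue and answer immediately; no compliance call here."""
    message_id = str(uuid.uuid4())
    pending.put({**payload, "message_id": message_id, "state": "EVALUATING"})
    return 202, {"message_id": message_id}


def consume_one(evaluate: Callable[[Dict], str]) -> Dict:
    """Consumer side: evaluation runs after the 202 has already been sent."""
    msg = pending.get_nowait()
    msg["verdict"] = evaluate(msg)
    return msg
```

For example, `handle_send({"to": "+15550100", "text": "hi"})` returns a 202 status with a generated `message_id` without invoking any evaluator; the verdict only appears once `consume_one` processes the queued message.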
4. Alternatives considered
| Option | Why rejected |
|---|---|
| Inline in sms-orchestrator (Option A) | Coupling, single-responsibility violation, scaling conflict |
| Inline in routing-engine | Would push compliance after routing; some rules depend on routing context, but coupling compliance to carrier dispatch is unacceptable |
| Sidecar library (shared lib) | Ownership unclear; upgrades coupled across services; hold queue / review UX cannot live in a library |
5. Implementation
Detailed specs and epic breakdown: compliance-engine service docs (SERVICE_OVERVIEW, APPLICATION_LOGIC, DEPLOYMENT_TOPOLOGY, SYNC_CONTRACT, AI_INTEGRATION, OBSERVABILITY, FAILURE_MODES, TESTING_STRATEGY, MIGRATION_PLAN, SERVICE_READINESS, SERVICE_RISK_REGISTER, _report.md).
6. Status
Approved 2026-04-19. Compliance Layer is baselined into 01-enterprise-architecture (§2 system context, §3 container view, §3.2 container-level compliance view, §4 outbound SMS pipeline with compliance step, §6 NATS topology with compliance.* streams, §7 database ownership with compliance schema, §9 technology stack) and into 03-platform-services (topology, docker-compose inventory, Kubernetes namespace strategy). Requirements tracked as PLT-REQ-026 through PLT-REQ-031 in the traceability matrix.