compliance-engine — Service Readiness

Status: populated | Last updated: 2026-04-18

This document tracks the readiness criteria for taking compliance-engine from development to production.

1. Code Readiness

Criterion	Status	Notes
Core gRPC handler implemented	☐
All 10 rule type evaluators implemented	☐	KEYWORD, REGEX, SENDER_ID, RECIPIENT, RATE_VOLUME, GEO_RESTRICTION, TEMPORAL, DLR_ABUSE, AI_CLASSIFICATION, COMPOSITE
Hold queue CRUD complete	☐
Tenant scoring worker implemented	☐
Auto-expiry job for hold queue	☐
AI classification integration (Claude + OpenAI)	☐
AI result caching (24 h by body hash)	☐
Evaluation result dedup cache (5 min fingerprint)	☐
REST API for rule/hold/tenant management	☐
Audit log with immutability enforcement	☐
NATS publishers for all 7 event types	☐
DLR consumer for `sms.dlr.inbound`	☐
Budget enforcement (70 ms evaluation cap)	☐
Circuit breaker on AI client	☐
Fail-closed redelivery behaviour verified (no ACK on compliance error)	☐
Distributed lock for cron jobs (multi-replica safe)	☐

Criterion	Target	Status
Unit test coverage	≥ 85% line, ≥ 80% branch	☐
Unit tests for each rule type	10–15 per type	☐
Property-based tests (fast-check)	Evaluation determinism, ALLOW override invariants	☐
Integration tests against real PG + Redis	≥ 50 tests	☐
Contract tests with sms-orchestrator	Pact or gRPC reflection	☐
Load test: 500 RPS sustained, P95 < 80 ms	Passed	☐
Load test: 1000 RPS burst, P99 < 200 ms	Passed	☐
Chaos test: DB unavailable, Redis unavailable, AI unavailable	Graceful degradation verified	☐
Security test: ReDoS patterns rejected at save	Passed	☐
Security test: role escalation attempts blocked	Passed	☐
Security test: cross-tenant hold access blocked	Passed	☐
Audit log immutability test	UPDATE/DELETE rejected	☐

Criterion	Status
All Prometheus metrics emitting correctly	☐
Grafana dashboard `compliance-engine.json` deployed	☐
All alerts configured and tested	☐
Structured JSON logs with PII masking verified	☐
OpenTelemetry tracing propagation verified end-to-end	☐
Log aggregation (Loki) parsing rules validated	☐
Runbook written for each alert	☐

Criterion	Status
mTLS enforced on gRPC port in production	☐
TLS certificates provisioned via cert-manager	☐
NetworkPolicy restricting ingress to sms-orchestrator + admin-dashboard	☐
JWT validation on all REST endpoints	☐
RBAC roles enforced per endpoint	☐
LLM API keys stored in Vault, injected at pod start	☐
Database credentials rotated and in Vault	☐
`ANONYMIZE_BODY_BEFORE_AI=true` in production	☐
RLS policies on `hold_queue` and `evaluation_log` verified	☐
Penetration test completed	☐
Security review signed off by Security team	☐

Criterion	Status
Kubernetes Deployment manifest reviewed	☐
HPA configured with CPU + custom latency metric	☐
PodDisruptionBudget ensures minAvailable = 2	☐
Rolling update strategy tested (no dropped gRPC calls)	☐
Resource requests/limits validated under load	☐
Database connection pool sized appropriately	☐
Redis connection pool sized appropriately	☐
Graceful shutdown drains in-flight requests (SIGTERM handler)	☐
On-call playbook written	☐
Training delivered to compliance reviewer team	☐
Escalation path defined for SUSPENDED tenants	☐

Document	Status
SERVICE_OVERVIEW.md	Complete
DOMAIN_MODEL.md	Complete
APPLICATION_LOGIC.md	Complete
API_CONTRACTS.md	Complete
DATA_MODEL.md	Complete
EVENT_SCHEMAS.md	Complete
SYNC_CONTRACT.md	Complete
SECURITY_MODEL.md	Complete
OBSERVABILITY.md	Complete
FAILURE_MODES.md	Complete
DEPLOYMENT_TOPOLOGY.md	Complete
TESTING_STRATEGY.md	Complete
LOCAL_DEV_SETUP.md	Complete
MIGRATION_PLAN.md	Complete
SERVICE_RISK_REGISTER.md	Complete
AI_INTEGRATION.md	Complete
Runbook for on-call	☐
Compliance reviewer training guide	☐
Rule authoring handbook	☐

Criterion	Status
Legal review of data processing with LLM provider	☐
Data Processing Agreement (DPA) signed with Anthropic / OpenAI	☐
Regulatory consultation with national telecom authority	☐
Initial keyword lists reviewed by Trust & Safety lead	☐
Initial rule sets reviewed by legal counsel	☐
Audit log retention policy agreed (7 years minimum for regulated deployments)	☐
Compliance report format approved by regulatory stakeholders	☐
Tenant opt-out/opt-in flow documented for end users	☐

Production deployment is GO when all of the following are met:

Within 30 days of full enforcement (Phase 3):

False-positive rate audit (target: < 1% of BLOCK verdicts are false positives)
False-negative rate audit via red-team sampling
Review hold queue review turnaround times (SLA: 95% reviewed within 4 hours)
Tenant complaint review — adjust rules if patterns of legitimate messages are being blocked
Cost analysis — AI API spend per 1,000 evaluations
Performance review — any p99 latency regressions?
Adjust HPA thresholds based on observed load patterns