
Compliance Layer — Application Logic

Status: populated | Last updated: 2026-04-18

1. Use Cases


UC-01: EvaluateCompliance (gRPC handler — async pipeline)

Trigger: sms-orchestrator's NATS consumer calls ComplianceService/EvaluateCompliance for every message after it is dequeued from sms.outbound.request.

Input: MessageContext (messageId, tenantId, accountId, to, senderId, body, messageType, segments, encoding, idempotencyKey, metadata)

Output: EvaluateComplianceResponse (evaluationId, verdict, findings[], ruleSetId, evaluationLatencyMs, holdId?)

SLA: P95 ≤ 500 ms (async pipeline — no tenant waiting on HTTP response)

Note: Since the tenant has already received a 202 response, this call is on the platform's internal async path. The 500 ms SLA is an operational budget for throughput planning, not user-perceived latency.

Steps:

  1. Input validation. Reject with INVALID_ARGUMENT if any required field is missing or malformed.

  2. Deduplication check. Compute MessageFingerprint = SHA-256(accountId:senderId:to:body). Attempt GET eval:cache:{fingerprint} from Redis (TTL 5 min). If HIT, return cached verdict immediately.

  3. Load tenant risk state. Attempt GET tenant:risk:{tenantId} from Redis (TTL 60 s). On MISS, query compliance.tenant_compliance_scores for (tenantId). If no score exists yet (new tenant), treat as CLEAR tier.

  4. Load applicable rule sets. Attempt GET ruleset:{tenantId}:{accountId} from Redis (TTL 300 s). On MISS:

    • Load tenant-specific rule set assignments from compliance.tenant_rule_set_assignments
    • Load default rule set (if any is marked is_default = true)
    • Merge into a single ordered rule list (tenant-specific rules take priority over default)
    • Write to Redis cache
  5. Evaluate ALLOW rules first (action = ALLOW, ordered by priority ASC):

    • If any ALLOW rule matches the MessageContext, return verdict = ALLOW immediately.
    • ALLOW rules whitelist trusted sender IDs, approved templates, or specific account segments.
  6. Auto-HOLD for SUSPENDED tenants. If tenant risk tier = SUSPENDED, skip rule evaluation and proceed directly to step 10 with verdict = HOLD and reason = "tenant_suspended".

  7. Evaluate BLOCK and HOLD rules (ordered by priority ASC, BLOCK evaluated before HOLD at same priority):

    • For each active rule, evaluate all conditions against MessageContext (AND logic within a rule)
    • On first BLOCK match: record finding; continue evaluating only FLAG/ALERT rules
    • On first HOLD match (no BLOCK found): record finding; continue evaluating only FLAG/ALERT rules
    • AI_CLASSIFICATION rules: see step 8
    • COMPOSITE rules: resolve child rules recursively (max depth 5; fail-closed on cycle detection)
  8. AI classification (only when AI_CLASSIFICATION rules are present):

    • Compute body_hash = SHA-256(body) (post-anonymisation)
    • Attempt GET ai:cache:{body_hash} from Redis (TTL 24 h)
    • On MISS: call local LLM; receive category → confidence map; cache result
    • Compare each category's confidence against rule's minConfidence threshold
    • If the LLM is unavailable: apply the rule's fallbackAction (HOLD is the default and recommended value, so the rule fails closed)
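  9. Evaluate FLAG and ALERT rules. These run even after a BLOCK or HOLD match; only the ALLOW short-circuit (step 5) and the SUSPENDED auto-HOLD (step 6) skip them. Their findings annotate the result and never override BLOCK or HOLD; a FLAG match only raises an otherwise-ALLOW verdict to FLAG (step 10).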

  10. Determine final verdict:

    • BLOCK if any BLOCK rule matched
    • HOLD if no BLOCK but at least one HOLD rule matched (or tenant is SUSPENDED)
    • FLAG if no BLOCK/HOLD but at least one FLAG rule matched
    • ALLOW otherwise
  11. Side-effects (synchronous; completed before the gRPC response is returned):

    • Write evaluation_log row to PostgreSQL
    • If verdict = HOLD: insert hold_queue row; populate holdId in response
    • Update evaluation result cache: SET eval:cache:{fingerprint} EX 300
  12. Side-effects (asynchronous, fire-and-forget):

    • Publish compliance.audit.v1 event
    • If verdict = HOLD: publish compliance.message.held event
    • If verdict = BLOCK: publish compliance.message.blocked event
    • Increment Prometheus counters
  13. Return EvaluateComplianceResponse to sms-orchestrator.
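Steps 2, 5, 6 and 10 can be condensed into a few pure functions. This is a minimal sketch, not the service's implementation: it assumes Node's built-in crypto module and a hex-encoded digest (the spec does not fix the encoding), it caches only the verdict string (the cached value's shape is also an assumption), and all function names are hypothetical. The precedence it encodes is the one described above: an ALLOW match wins outright, a SUSPENDED tenant is held, and otherwise BLOCK beats HOLD, HOLD beats FLAG, and ALERT-only findings leave the verdict at ALLOW.

```typescript
import { createHash } from "node:crypto";

type Verdict = "ALLOW" | "FLAG" | "HOLD" | "BLOCK";
type RuleAction = "BLOCK" | "HOLD" | "FLAG" | "ALERT";

interface Finding {
  ruleId: string;
  action: RuleAction; // for an AI rule that could not run, the applied fallbackAction
}

// Step 11 writes the cache with EX 300, matching the 5 min TTL read in step 2.
const EVAL_CACHE_TTL_SECONDS = 300;

// Step 2: SHA-256 over "accountId:senderId:to:body" (hex encoding is an assumption).
function messageFingerprint(accountId: string, senderId: string, to: string, body: string): string {
  return createHash("sha256")
    .update(`${accountId}:${senderId}:${to}:${body}`, "utf8")
    .digest("hex");
}

function evalCacheKey(fingerprint: string): string {
  return `eval:cache:${fingerprint}`;
}

// Step 11 cache write as Redis command arguments: SET eval:cache:{fingerprint} <verdict> EX 300
function evalCacheSetCommand(fingerprint: string, verdict: Verdict): string[] {
  return ["SET", evalCacheKey(fingerprint), verdict, "EX", String(EVAL_CACHE_TTL_SECONDS)];
}

// Steps 5, 6 and 10 combined.
// allowMatched: an ALLOW rule matched in step 5 (short-circuits everything else).
// tenantSuspended: risk tier is SUSPENDED, so rules were skipped (step 6 auto-HOLD).
function determineVerdict(allowMatched: boolean, tenantSuspended: boolean, findings: Finding[]): Verdict {
  if (allowMatched) return "ALLOW";
  if (tenantSuspended) return "HOLD";
  if (findings.some((f) => f.action === "BLOCK")) return "BLOCK";
  if (findings.some((f) => f.action === "HOLD")) return "HOLD";
  if (findings.some((f) => f.action === "FLAG")) return "FLAG";
  return "ALLOW"; // no findings, or ALERT-only findings
}
```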

Error codes:

gRPC status | Condition
--- | ---
INVALID_ARGUMENT | Missing required field; malformed UUID or phone number
INTERNAL | Unhandled internal error (logged, not exposed). The NATS consumer treats this as a transient failure and retries.

Fail-closed behaviour: On INTERNAL or gRPC deadline exceeded, the sms-orchestrator NATS consumer does not ACK the message. NATS JetStream redelivers after the ack wait (30 s). After 3 delivery attempts, the message moves to sms.outbound.deadletter with reason compliance_unavailable. The message is never dispatched to a carrier.
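On the orchestrator side, the fail-closed rule reduces to one decision per delivery. JetStream redelivers an un-ACKed message after the ack wait and stops after MaxDeliver attempts, but it does not republish exhausted messages to another subject by itself (it emits a max-deliveries advisory instead), so this sketch assumes the consumer publishes to sms.outbound.deadletter on the final attempt and then ACKs the original. The names are hypothetical, and deliveryCount is assumed to start at 1.

```typescript
type EvaluationOutcome = "OK" | "INTERNAL" | "DEADLINE_EXCEEDED";

type ConsumerAction =
  | { kind: "proceed" } // hand the verdict to the UC-02 handler, which ACKs when done
  | { kind: "leave_unacked" } // JetStream redelivers after the 30 s ack wait
  | { kind: "deadletter"; subject: string; reason: string }; // publish, then ACK the original

const MAX_DELIVER = 3;
const DEADLETTER_SUBJECT = "sms.outbound.deadletter";

function decideConsumerAction(outcome: EvaluationOutcome, deliveryCount: number): ConsumerAction {
  if (outcome === "OK") return { kind: "proceed" };
  // INTERNAL and DEADLINE_EXCEEDED are both treated as transient:
  // the message is never routed and never ACKed before the final attempt.
  if (deliveryCount >= MAX_DELIVER) {
    return { kind: "deadletter", subject: DEADLETTER_SUBJECT, reason: "compliance_unavailable" };
  }
  return { kind: "leave_unacked" };
}
```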


UC-02: Orchestrator Consumer — Message State Handler

Trigger: sms-orchestrator NATS consumer, after receiving a compliance verdict.

Handling by verdict:

Verdict | Orchestrator action | sms_messages status | Tenant notification
--- | --- | --- | ---
ALLOW | Continue to routing-engine | ROUTING | none (success path)
FLAG | Continue to routing-engine with annotation | ROUTING (with flagged: true) | none
BLOCK | Do not route; mark terminal | BLOCKED | "Message blocked" alert via web portal
HOLD | Do not route; await review | ON_HOLD (with holdId) | "Message held for review" alert via web portal

All state transitions write sms.events.status events for the audit trail.
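The verdict table maps onto an exhaustive switch. In this sketch (applyVerdict and StateChange are hypothetical names; publishing sms.events.status is left out), the never-typed default makes the compiler reject any new Verdict value the orchestrator forgets to handle.

```typescript
type Verdict = "ALLOW" | "FLAG" | "HOLD" | "BLOCK";
type MessageStatus = "ROUTING" | "BLOCKED" | "ON_HOLD";

interface StateChange {
  status: MessageStatus; // new sms_messages status
  route: boolean; // hand off to routing-engine?
  flagged: boolean;
  holdId?: string;
  tenantAlert?: string; // web-portal alert text, if any
}

function applyVerdict(verdict: Verdict, holdId?: string): StateChange {
  switch (verdict) {
    case "ALLOW":
      return { status: "ROUTING", route: true, flagged: false };
    case "FLAG":
      return { status: "ROUTING", route: true, flagged: true };
    case "BLOCK":
      return { status: "BLOCKED", route: false, flagged: false, tenantAlert: "Message blocked" };
    case "HOLD":
      // UC-01 step 11 populates holdId whenever the verdict is HOLD.
      if (holdId === undefined) throw new Error("HOLD verdict without holdId");
      return { status: "ON_HOLD", route: false, flagged: false, holdId, tenantAlert: "Message held for review" };
    default: {
      const unhandled: never = verdict; // compile error if a Verdict case is missing
      throw new Error(`unhandled verdict ${String(unhandled)}`);
    }
  }
}
```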


UC-03: ReviewHeldMessage (Admin REST API)

Trigger: Platform admin or auditor invokes POST /compliance/hold-queue/{holdId}/review

Authorization: Role platform.compliance.reviewer or platform.compliance.admin

Input: { action: 'RELEASE' | 'REJECT', notes: string }

Steps:

  1. Load HeldMessage from compliance.hold_queue by holdId. Return 404 if not found.
  2. Verify status = PENDING or REVIEWING. Return 409 Conflict if already reviewed/expired.
  3. Update status to REVIEWING (optimistic lock).
  4. If action = RELEASE:
    • Update sms_messages status: ON_HOLD → ROUTING (with complianceOverride: true, releasedBy, releasedAt)
    • Publish sms.outbound.retry NATS event (sms-orchestrator re-consumes and proceeds directly to routing — compliance re-evaluation is skipped on release unless FORCE_RECHECK_ON_RELEASE=true)
    • Update hold_queue status → REVIEWED_RELEASED
    • Publish compliance.message.released NATS event
  5. If action = REJECT:
    • Update sms_messages status: ON_HOLD → BLOCKED (terminal)
    • Update hold_queue status → REVIEWED_REJECTED
    • Publish compliance.message.rejected NATS event
  6. Write compliance.audit_log entry with before/after state, actor, IP, timestamp.
  7. Decrement hold:queue:{tenantId}:size Redis counter.
  8. notification-service consumes event and pushes alert to tenant's web portal.

Release pathway rationale: Once a human reviewer has approved release, re-evaluating compliance would create a loop (same rules would hold the message again). The reviewer's decision is the authoritative override, recorded in the audit log.
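Steps 1 to 5 of the review, including the optimistic lock, can be sketched against an in-memory stand-in for compliance.hold_queue. The version counter, HoldQueueStore, HttpError and reviewHeldMessage are hypothetical; in PostgreSQL the compare-and-set would be a conditional UPDATE ... WHERE version = $expected, and the sms_messages update, NATS publishes, audit log and Redis decrement are omitted.

```typescript
type HoldStatus = "PENDING" | "REVIEWING" | "REVIEWED_RELEASED" | "REVIEWED_REJECTED" | "EXPIRED";
type ReviewAction = "RELEASE" | "REJECT";

interface HeldMessage {
  holdId: string;
  status: HoldStatus;
  version: number; // hypothetical optimistic-lock column
}

class HttpError extends Error {
  constructor(public readonly status: number, message: string) {
    super(message);
  }
}

// In-memory stand-in for compliance.hold_queue.
class HoldQueueStore {
  private readonly rows = new Map<string, HeldMessage>();

  insert(row: HeldMessage): void {
    this.rows.set(row.holdId, { ...row });
  }

  get(holdId: string): HeldMessage | undefined {
    const row = this.rows.get(holdId);
    return row ? { ...row } : undefined; // return a copy, like reading a DB row
  }

  // Like: UPDATE hold_queue SET status = $3, version = version + 1
  //       WHERE hold_id = $1 AND version = $2
  compareAndSet(holdId: string, expectedVersion: number, status: HoldStatus): boolean {
    const row = this.rows.get(holdId);
    if (!row || row.version !== expectedVersion) return false;
    row.status = status;
    row.version += 1;
    return true;
  }
}

function reviewHeldMessage(store: HoldQueueStore, holdId: string, action: ReviewAction): HoldStatus {
  const row = store.get(holdId); // step 1
  if (!row) throw new HttpError(404, "hold not found");
  if (row.status !== "PENDING" && row.status !== "REVIEWING") {
    throw new HttpError(409, `hold is already ${row.status}`); // step 2
  }
  if (!store.compareAndSet(holdId, row.version, "REVIEWING")) {
    throw new HttpError(409, "concurrent review in progress"); // step 3 lost the race
  }
  const finalStatus: HoldStatus = action === "RELEASE" ? "REVIEWED_RELEASED" : "REVIEWED_REJECTED";
  // Steps 4-5 (sms_messages update, NATS events) would run here.
  if (!store.compareAndSet(holdId, row.version + 1, finalStatus)) {
    throw new HttpError(409, "concurrent review in progress");
  }
  return finalStatus;
}
```

Because every write checks the version it read, two reviewers racing on the same hold can both pass step 2, but only one of them completes the final transition; the other receives 409.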


UC-04: ManageComplianceRule (Admin REST CRUD)

Authorization: Role platform.compliance.admin

Create rule:

  1. Validate rule schema (type-specific config validation, regex compilation check, composite cycle check).
  2. Insert into compliance.rules; insert initial row into compliance.rule_versions.
  3. Invalidate Redis rule set caches for affected tenants.
  4. Publish compliance.rule.changed to NATS.
  5. Write audit log.

Update rule: Increment version; insert new rule_versions snapshot; invalidate caches; publish event; write audit log.

Enable / Disable rule: Set is_active; invalidate caches; publish event; write audit log.

Delete rule: Soft-delete (is_active = false, deleted_at set). Rules referenced by COMPOSITE rules cannot be deleted.
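The composite cycle check on create, like the max-depth and fail-closed cycle handling in UC-01 step 7, is a depth-first search that tracks the rules on the current branch. A sketch under stated assumptions: the candidate rule is inserted into the map before validation, the root rule counts as depth 1, and RuleNode, validateComposite and referencingComposites are hypothetical names. The second helper is the guard behind "Rules referenced by COMPOSITE rules cannot be deleted".

```typescript
interface RuleNode {
  id: string;
  type: string; // e.g. "KEYWORD", "REGEX", "COMPOSITE"
  children: string[]; // child rule ids; empty for non-COMPOSITE rules
}

type CompositeCheck = { ok: true } | { ok: false; reason: string };

const MAX_COMPOSITE_DEPTH = 5; // root rule counts as depth 1 (assumption)

// Depth-first search from the root; onPath holds the ids on the current branch,
// so meeting one of them again means the composite graph has a cycle.
function validateComposite(rootId: string, rules: Map<string, RuleNode>): CompositeCheck {
  const onPath = new Set<string>();

  const visit = (id: string, depth: number): CompositeCheck => {
    if (onPath.has(id)) return { ok: false, reason: `cycle via ${id}` };
    if (depth > MAX_COMPOSITE_DEPTH) return { ok: false, reason: `depth ${depth} exceeds ${MAX_COMPOSITE_DEPTH}` };
    const rule = rules.get(id);
    if (!rule) return { ok: false, reason: `unknown rule ${id}` };
    onPath.add(id);
    for (const childId of rule.children) {
      const result = visit(childId, depth + 1);
      if (!result.ok) return result;
    }
    onPath.delete(id);
    return { ok: true };
  };

  return visit(rootId, 1);
}

// Delete guard: COMPOSITE rules that still reference ruleId.
function referencingComposites(ruleId: string, rules: Map<string, RuleNode>): string[] {
  return Array.from(rules.values())
    .filter((r) => r.type === "COMPOSITE" && r.children.includes(ruleId))
    .map((r) => r.id);
}
```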


UC-05: RecalculateTenantScores (Background Worker — every 15 min)

Trigger: Internal cron (@Cron('*/15 * * * *')) guarded by a distributed Redis lock, so only one replica runs each cycle.

For each tenant with activity in the past 7 days:

  1. Aggregate metrics from evaluation_log, dlr_stats, and operator feedback tables.

  2. Compute score dimensions:

    contentScore = max(0, 25 × (1 - violations_7d / max(messages_sent_7d, 1)))
    volumeScore = max(0, 20 × (1 - rate_limit_violations_7d / max(messages_sent_7d, 1)))
    dlrScore = 20 × dlr_success_rate
    optoutScore = 15 × (1 - min(optout_rate / 0.05, 1))
    complaintScore = 10 × (1 - min(complaint_rate / 0.01, 1))
    tenureScore = min(10, account_age_days / 90)
    overallScore = sum of all dimensions (range 0–100, since the maxima are 25 + 20 + 20 + 15 + 10 + 10)
  3. Determine risk tier from overallScore, using lower bounds so that fractional scores fall into exactly one tier: 80–100 CLEAR · 60 to < 80 MONITOR · 30 to < 60 RESTRICTED · 0 to < 30 SUSPENDED.

  4. Detect tier transitions. On tier change: publish compliance.tenant.tier.changed; notify tenant admin via notification-service; if transition to SUSPENDED, publish compliance.tenant.suspended.

  5. Persist: UPSERT tenant_compliance_scores; INSERT score_history; SET tenant:risk:{tenantId} in Redis EX 900.

  6. Emit metric: compliance_tenant_score{tenant_id} gauge.
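Steps 2 to 4 translate almost directly into code. A sketch, assuming the 7-day aggregates from step 1 are already in the hypothetical TenantMetrics7d shape with rates as fractions in [0, 1]. riskTier uses lower-bound thresholds so a fractional score such as 79.5 lands in exactly one tier, and tierTransitionEvents returns the NATS subjects from step 4; the notification-service call and persistence are omitted.

```typescript
type RiskTier = "CLEAR" | "MONITOR" | "RESTRICTED" | "SUSPENDED";

interface TenantMetrics7d {
  messagesSent: number;
  violations: number;
  rateLimitViolations: number;
  dlrSuccessRate: number; // fraction 0..1
  optoutRate: number; // fraction 0..1
  complaintRate: number; // fraction 0..1
  accountAgeDays: number;
}

// Step 2: six dimensions whose maxima (25 + 20 + 20 + 15 + 10 + 10) sum to 100.
function computeOverallScore(m: TenantMetrics7d): number {
  const sent = Math.max(m.messagesSent, 1); // avoid division by zero
  const contentScore = Math.max(0, 25 * (1 - m.violations / sent));
  const volumeScore = Math.max(0, 20 * (1 - m.rateLimitViolations / sent));
  const dlrScore = 20 * m.dlrSuccessRate;
  const optoutScore = 15 * (1 - Math.min(m.optoutRate / 0.05, 1)); // 0 at >= 5 % opt-outs
  const complaintScore = 10 * (1 - Math.min(m.complaintRate / 0.01, 1)); // 0 at >= 1 % complaints
  const tenureScore = Math.min(10, m.accountAgeDays / 90); // full marks after 900 days
  return contentScore + volumeScore + dlrScore + optoutScore + complaintScore + tenureScore;
}

// Step 3 with lower-bound thresholds, so non-integer scores never fall between bands.
function riskTier(score: number): RiskTier {
  if (score >= 80) return "CLEAR";
  if (score >= 60) return "MONITOR";
  if (score >= 30) return "RESTRICTED";
  return "SUSPENDED";
}

// Step 4: NATS subjects to publish when the tier changes.
function tierTransitionEvents(previous: RiskTier, next: RiskTier): string[] {
  if (previous === next) return [];
  const events = ["compliance.tenant.tier.changed"];
  if (next === "SUSPENDED") events.push("compliance.tenant.suspended");
  return events;
}
```

For example, a tenant with 10 % violations, no rate-limit violations, 90 % DLR success, 2.5 % opt-outs, 0.5 % complaints and a 45-day-old account scores 22.5 + 20 + 18 + 7.5 + 5 + 0.5 = 73.5, which is MONITOR.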


UC-06: ConsumeDeliveryReceiptEvent (NATS Consumer)

Subject: sms.dlr.inbound · Consumer group: compliance-engine-dlr

Steps:

  1. Deserialize DlrEvent.
  2. Map SMPP status codes to canonical DlrStatus.
  3. UPSERT compliance.dlr_stats for (tenantId, accountId) across three windows (1h, 24h, 7d).
  4. ACK NATS message on successful DB write.
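Step 2's mapping can be table-driven. The SMPP side uses the SMPP 3.4 message_state values (as carried, for example, in the receipted_message_state TLV) and the stat: field of the delivery-receipt text. The canonical DlrStatus values and the toDlrStatus name are hypothetical, because this document does not list them, and treating ACCEPTED/ACCEPTD as non-final is an assumption, since carriers use that state inconsistently.

```typescript
type DlrStatus = "DELIVERED" | "PENDING" | "EXPIRED" | "FAILED" | "UNKNOWN";

// SMPP 3.4 message_state values.
const MESSAGE_STATE = new Map<number, DlrStatus>([
  [1, "PENDING"], // ENROUTE
  [2, "DELIVERED"], // DELIVERED
  [3, "EXPIRED"], // EXPIRED
  [4, "FAILED"], // DELETED
  [5, "FAILED"], // UNDELIVERABLE
  [6, "PENDING"], // ACCEPTED (carrier-dependent; treated as non-final here)
  [7, "UNKNOWN"], // UNKNOWN
  [8, "FAILED"], // REJECTED
]);

// "stat:" values in the SMPP 3.4 delivery-receipt text.
const STAT_TEXT = new Map<string, DlrStatus>([
  ["ENROUTE", "PENDING"],
  ["DELIVRD", "DELIVERED"],
  ["EXPIRED", "EXPIRED"],
  ["DELETED", "FAILED"],
  ["UNDELIV", "FAILED"],
  ["ACCEPTD", "PENDING"],
  ["UNKNOWN", "UNKNOWN"],
  ["REJECTD", "FAILED"],
]);

// Prefer the numeric state when present; fall back to parsing the receipt text.
function toDlrStatus(messageState?: number, receiptText?: string): DlrStatus {
  if (messageState !== undefined) {
    const fromState = MESSAGE_STATE.get(messageState);
    if (fromState !== undefined) return fromState;
  }
  const stat = receiptText?.match(/\bstat:([A-Z]+)/)?.[1];
  if (stat !== undefined) {
    const fromText = STAT_TEXT.get(stat);
    if (fromText !== undefined) return fromText;
  }
  return "UNKNOWN";
}
```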

UC-07: GenerateComplianceReport (On-demand + scheduled)

Trigger: POST /compliance/reports or daily cron at 02:00 UTC

Report types:

  • TENANT_RANKING — tenants sorted by compliance score with drill-down links
  • VIOLATION_SUMMARY — violations by rule type, category, and trend (7/30/90 day)
  • HOLD_QUEUE_SUMMARY — pending / released / rejected / expired counts
  • TIER_TRANSITIONS — tenants that changed tier in the period
  • TOP_TRIGGERED_RULES — rules ranked by match count
  • TENANT_AUDIT — full compliance evidence for a single tenant (regulatory export)

Output formats: JSON (API), CSV (export), PDF (future).


2. Rule Evaluation Performance Optimisation

Fast-path checks (no external calls)

Ordered first to enable early termination:

  1. GEO_RESTRICTION — country code from E.164 prefix
  2. TEMPORAL — current time in rule timezone
  3. SENDER_ID simple string checks
  4. KEYWORD exact-match (pre-loaded keyword sets in process memory)

Medium-path checks (Redis only)

  1. RATE_VOLUME — Redis sliding window counters
  2. DLR_ABUSE — Redis projection of compliance.dlr_stats

Slow-path checks (expensive computation or model calls)

  1. REGEX (re2 engine, linear time)
  2. AI_CLASSIFICATION — local LLM call (cache-checked first)
  3. COMPOSITE — recursive child rule evaluation

Budget enforcement

Each EvaluateCompliance call has a 450 ms internal budget (leaving 50 ms margin for gRPC + response serialisation):

  • If budget is exhausted mid-evaluation, remaining slow-path rules are skipped with a FLAG finding (evidence: "skipped_budget_exceeded")
  • Fail-closed applies to budget exhaustion for HOLD-eligible rules: if a HOLD-configured AI rule cannot run due to budget and its fallbackAction is HOLD, the verdict becomes HOLD
  • Budget violations emit compliance_evaluation_budget_exceeded_total metric
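A sketch of the budget rule, assuming synchronous matchers, input rules already sorted by priority, and an injectable clock so the behaviour can be tested; Rule, PathTier, evaluateWithBudget and the fallback_on_budget evidence string are hypothetical. It orders rules fast, then medium, then slow, always runs fast-path and medium-path checks, and once the 450 ms budget is spent replaces each remaining slow-path rule with a skipped_budget_exceeded FLAG finding, adding a HOLD finding when that rule is HOLD-configured with fallbackAction HOLD. It deliberately leaves out the first-match BLOCK/HOLD handling of UC-01 step 7.

```typescript
type PathTier = "FAST" | "MEDIUM" | "SLOW";
type RuleAction = "BLOCK" | "HOLD" | "FLAG" | "ALERT";

interface Rule {
  id: string;
  tier: PathTier;
  action: RuleAction;
  fallbackAction?: RuleAction; // only AI_CLASSIFICATION rules carry one
  matches: () => boolean; // hypothetical synchronous matcher
}

interface Finding {
  ruleId: string;
  action: RuleAction;
  evidence?: string;
}

const INTERNAL_BUDGET_MS = 450; // 500 ms SLA minus the 50 ms gRPC/serialisation margin
const TIER_ORDER: Record<PathTier, number> = { FAST: 0, MEDIUM: 1, SLOW: 2 };

function evaluateWithBudget(
  rules: Rule[],
  now: () => number = Date.now,
): { findings: Finding[]; budgetExceeded: boolean } {
  const deadline = now() + INTERNAL_BUDGET_MS;
  // Array.prototype.sort is stable (ES2019+), so priority order survives within a tier.
  const ordered = [...rules].sort((a, b) => TIER_ORDER[a.tier] - TIER_ORDER[b.tier]);
  const findings: Finding[] = [];
  let budgetExceeded = false;

  for (const rule of ordered) {
    if (rule.tier === "SLOW" && now() >= deadline) {
      budgetExceeded = true; // caller increments compliance_evaluation_budget_exceeded_total
      findings.push({ ruleId: rule.id, action: "FLAG", evidence: "skipped_budget_exceeded" });
      if (rule.action === "HOLD" && rule.fallbackAction === "HOLD") {
        findings.push({ ruleId: rule.id, action: "HOLD", evidence: "fallback_on_budget" }); // fail closed
      }
      continue;
    }
    if (rule.matches()) findings.push({ ruleId: rule.id, action: rule.action });
  }
  return { findings, budgetExceeded };
}
```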

3. Hold Queue Priority Scoring

When a message is placed in the hold queue, its review_priority is computed so auditors see the most important items first:

priority = (tenant_risk_score_inverse × 40) // lower tenant score → higher urgency
+ (rule_severity_weight × 35) // higher-severity categories → higher urgency
+ (volume_spike_indicator × 15) // spike → higher urgency
+ (recency_decay × 10) // newer → higher urgency

Rule severity weights:

  • TERRORISM, PHISHING → weight 10
  • SPAM, FINANCIAL_FRAUD → weight 8
  • ADULT_CONTENT, GAMBLING → weight 6
  • all other categories → weight 4
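The formula leaves the scale of each factor open. A sketch, assuming every factor is normalised to [0, 1] so that priority falls in 0 to 100: tenant_risk_score_inverse = 1 - overallScore / 100, the severity weight is the highest weight among the matched categories (defaulting to 4) divided by 10, the volume spike indicator is clamped to [0, 1], and recency decays with a hypothetical 60-minute half-life. HoldInputs and reviewPriority are hypothetical names.

```typescript
interface HoldInputs {
  tenantScore: number; // overallScore from UC-05, 0..100
  categories: string[]; // categories of the matched HOLD rules
  volumeSpike: number; // spike indicator, clamped to 0..1
  ageMinutes: number; // time since the hold was created
}

const SEVERITY_WEIGHT = new Map<string, number>([
  ["TERRORISM", 10], ["PHISHING", 10],
  ["SPAM", 8], ["FINANCIAL_FRAUD", 8],
  ["ADULT_CONTENT", 6], ["GAMBLING", 6],
]);
const DEFAULT_SEVERITY_WEIGHT = 4;
const RECENCY_HALF_LIFE_MINUTES = 60; // hypothetical

const clamp01 = (x: number): number => Math.min(Math.max(x, 0), 1);

function reviewPriority(h: HoldInputs): number {
  const tenantRiskInverse = 1 - clamp01(h.tenantScore / 100); // lower score -> more urgent
  const weight = h.categories.reduce(
    (w, c) => Math.max(w, SEVERITY_WEIGHT.get(c) ?? DEFAULT_SEVERITY_WEIGHT),
    DEFAULT_SEVERITY_WEIGHT,
  );
  const severity = weight / 10;
  const spike = clamp01(h.volumeSpike);
  const recency = Math.pow(0.5, h.ageMinutes / RECENCY_HALF_LIFE_MINUTES); // 1 for a brand-new hold
  return tenantRiskInverse * 40 + severity * 35 + spike * 15 + recency * 10;
}
```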