Compliance Layer — Deployment Topology
Status: populated | Last updated: 2026-04-18
1. Kubernetes Resources
Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compliance-engine
  namespace: sms-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: compliance-engine
  template:
    metadata:
      labels:
        app: compliance-engine
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3002"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: compliance-engine
          image: ghcr.io/ghasi/compliance-engine:latest
          ports:
            - containerPort: 50052 # gRPC
              name: grpc
            - containerPort: 3002 # HTTP (metrics, health, REST)
              name: http
          env:
            - name: NODE_ENV
              value: production
            - name: LOG_LEVEL
              value: info
            - name: GRPC_PORT
              value: "50052"
            - name: HTTP_PORT
              value: "3002"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: compliance-engine-db-secret
                  key: url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: compliance-engine-redis-secret
                  key: url
            - name: NATS_URL
              valueFrom:
                secretKeyRef:
                  name: nats-credentials
                  key: url
            # AI provider: local LLM primary
            - name: AI_PROVIDER
              value: local
            - name: LOCAL_LLM_URL
              value: http://local-llm-service.sms-platform.svc.cluster.local:8000
            - name: LOCAL_LLM_MODEL
              value: llama-3.1-8b-instruct-awq
            - name: ANONYMIZE_BODY_BEFORE_AI
              value: "true"
            # External LLM failover (optional)
            - name: AI_FAILOVER_PROVIDER
              value: "" # set to 'claude' or 'openai' to enable failover
            # Budget / timing
            - name: EVAL_BUDGET_MS
              value: "450"
            - name: AI_TIMEOUT_MS
              value: "2000"
            - name: HOLD_QUEUE_TTL_HOURS
              value: "24"
            - name: SCORING_INTERVAL_MINUTES
              value: "15"
          envFrom:
            - secretRef:
                name: compliance-engine-vault-secrets
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: tls-certs
              mountPath: /etc/tls
              readOnly: true
      volumes:
        - name: tls-certs
          secret:
            secretName: compliance-engine-tls
```
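The Deployment sets `ANONYMIZE_BODY_BEFORE_AI` to `"true"`, so message bodies are redacted before any model sees them. The redaction rules themselves are not specified in this section; the TypeScript sketch below assumes plain regex masking of email addresses and phone-number-like digit runs, and the names `anonymizeBody` and `anonymizationEnabled` are hypothetical.

```typescript
// Hypothetical sketch: regex-based PII masking applied when ANONYMIZE_BODY_BEFORE_AI is on.
// The patterns are illustrative assumptions, not the engine's actual redaction rules.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Optional leading "+", then 7 or more digits that may be separated by spaces, dots or hyphens.
const PHONE = /\+?\d(?:[\s.-]?\d){6,}/g;

function anonymizeBody(body: string, enabled: boolean): string {
  if (!enabled) return body;
  // Mask emails first so their characters are not re-examined by the phone pattern.
  return body.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Reads the flag as the environment table describes it: default true, only "false" disables it.
function anonymizationEnabled(env: Record<string, string | undefined>): boolean {
  return (env.ANONYMIZE_BODY_BEFORE_AI ?? "true").toLowerCase() !== "false";
}
```

A real redactor would need wider PII coverage (names, postal addresses, account numbers); the sketch only shows where the flag gates the call.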
Local LLM Deployment (separate, GPU-backed)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-llm
  namespace: sms-platform
spec:
  replicas: 2
  selector:
    matchLabels:
      app: local-llm
  template:
    metadata:
      labels:
        app: local-llm
    spec:
      nodeSelector:
        gpu: "true"
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model=casperhansen/llama-3.1-8b-instruct-awq"
            # Serve the model under the name compliance-engine sends in LOCAL_LLM_MODEL;
            # without this flag vLLM uses the full --model value as the model name.
            - "--served-model-name=llama-3.1-8b-instruct-awq"
            - "--quantization=awq"
            - "--max-model-len=4096"
            - "--gpu-memory-utilization=0.85"
          ports:
            - containerPort: 8000
              name: http
          resources:
            requests:
              cpu: 4
              memory: 16Gi
              nvidia.com/gpu: 1
            limits:
              cpu: 8
              memory: 24Gi
              nvidia.com/gpu: 1
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 120 # model load time
            periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: local-llm-service
  namespace: sms-platform
spec:
  selector:
    app: local-llm
  ports:
    - port: 8000
      targetPort: http
  type: ClusterIP
```
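The compliance engine reaches vLLM through `local-llm-service` on port 8000. A minimal client sketch, assuming Node 18+ (global `fetch` and `AbortController`), that `LOCAL_LLM_URL` is a base URL without a path, and vLLM's OpenAI-compatible `/v1/chat/completions` route. `buildChatRequest`, `classifyWithLocalLlm`, `temperature: 0` and `max_tokens: 256` are illustrative choices, not documented settings. The `model` field must equal the name vLLM serves the model under, and the abort timer plays the role of `AI_TIMEOUT_MS`.

```typescript
// Hypothetical client sketch for the vLLM OpenAI-compatible endpoint.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Pure helper: builds the request body. `model` must match vLLM's served model name.
function buildChatRequest(model: string, system: string, body: string) {
  const messages: ChatMessage[] = [
    { role: "system", content: system },
    { role: "user", content: body },
  ];
  return { model, messages, temperature: 0, max_tokens: 256 }; // temperature 0: greedy decoding
}

async function classifyWithLocalLlm(
  baseUrl: string, // e.g. the value of LOCAL_LLM_URL
  model: string, // e.g. the value of LOCAL_LLM_MODEL
  system: string,
  body: string,
  timeoutMs: number, // e.g. the value of AI_TIMEOUT_MS
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs); // abort the HTTP call on timeout
  try {
    const res = await fetch(`${baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildChatRequest(model, system, body)),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`local LLM returned HTTP ${res.status}`);
    const data = await res.json();
    return String(data.choices[0].message.content);
  } finally {
    clearTimeout(timer);
  }
}
```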
Horizontal Pod Autoscaler (compliance-engine)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: compliance-engine-hpa
  namespace: sms-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: compliance-engine
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
    # Pods metrics are read from the custom.metrics.k8s.io API, which requires a
    # metrics adapter (e.g. prometheus-adapter) exposing this Prometheus series.
    - type: Pods
      pods:
        metric:
          name: compliance_evaluation_duration_seconds_p95
        target:
          type: AverageValue
          averageValue: "0.4" # seconds; scale up if P95 approaches 400 ms
```
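With three metrics configured, the HPA computes a replica proposal per metric and acts on the largest. The sketch below reproduces the core formula from the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, including the default 10 % tolerance; it deliberately leaves out unready-pod handling, missing metrics and the scale-down stabilization window. The metric readings are made up for illustration.

```typescript
// Simplified sketch of the HPA v2 replica calculation (per the Kubernetes HPA docs).
interface MetricReading {
  name: string;
  current: number; // average across pods (utilization % or raw value)
  target: number; // averageUtilization or averageValue from the HPA spec
}

const TOLERANCE = 0.1; // controller default for --horizontal-pod-autoscaler-tolerance

function proposeReplicas(currentReplicas: number, m: MetricReading): number {
  const ratio = m.current / m.target;
  if (Math.abs(1 - ratio) <= TOLERANCE) return currentReplicas; // within tolerance: no change
  return Math.ceil(currentReplicas * ratio);
}

function desiredReplicas(currentReplicas: number, metrics: MetricReading[], min: number, max: number): number {
  // Every metric yields a proposal; the HPA keeps the largest, then clamps to [min, max].
  const proposals = metrics.map((m) => proposeReplicas(currentReplicas, m));
  return Math.min(max, Math.max(min, ...proposals));
}

// compliance-engine-hpa targets: CPU 65 %, memory 75 %, P95 0.4 s (illustrative readings)
const readings: MetricReading[] = [
  { name: "cpu", current: 80, target: 65 },
  { name: "memory", current: 60, target: 75 },
  { name: "compliance_evaluation_duration_seconds_p95", current: 0.8, target: 0.4 },
];
const next = desiredReplicas(4, readings, 3, 20); // proposals 5, 4, 8 -> 8
```

Here the latency metric dominates: a P95 at twice the 0.4 s target doubles the replica count, while CPU alone would only have added one pod.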
PodDisruptionBudget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: compliance-engine-pdb
  namespace: sms-platform
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: compliance-engine
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: local-llm-pdb
  namespace: sms-platform
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: local-llm
```
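Both budgets use `minAvailable`, so the number of voluntary evictions (node drains, upgrades) allowed at any moment is the count of healthy pods minus `minAvailable`, floored at zero; Kubernetes reports it as `status.disruptionsAllowed` on the PDB. A short sketch of that arithmetic for the two budgets above (the function name is illustrative):

```typescript
// Sketch: voluntary evictions a minAvailable PodDisruptionBudget permits right now.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// compliance-engine-pdb (minAvailable: 2) at the HPA floor and ceiling
const engineAtMin = allowedDisruptions(3, 2); // 1
const engineAtMax = allowedDisruptions(20, 2); // 18
// local-llm-pdb (minAvailable: 1) with both GPU pods healthy, then with one already down
const llmHealthy = allowedDisruptions(2, 1); // 1
const llmDegraded = allowedDisruptions(1, 1); // 0
```

With only two GPU-backed replicas, `local-llm` tolerates one eviction at a time and none while a pod is already down, so draining the surviving pod's node blocks until its peer is healthy again.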
Services
```yaml
apiVersion: v1
kind: Service
metadata:
  name: compliance-engine-grpc
  namespace: sms-platform
spec:
  selector:
    app: compliance-engine
  ports:
    - name: grpc
      port: 50052
      targetPort: grpc
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: compliance-engine-http
  namespace: sms-platform
spec:
  selector:
    app: compliance-engine
  ports:
    - name: http
      port: 3002
      targetPort: http
  type: ClusterIP
```
NetworkPolicy
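```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: compliance-engine-netpol
  namespace: sms-platform
spec:
  podSelector:
    matchLabels:
      app: compliance-engine
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: sms-orchestrator
      ports:
        - port: 50052
    - from:
        - podSelector:
            matchLabels:
              app: admin-dashboard
      ports:
        - port: 3002
    - from:
        - namespaceSelector:
            matchLabels:
              # Set automatically on every namespace; a plain "name" label is not.
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 3002
  egress:
    # DNS: with Egress restricted, lookups of Service names (e.g. local-llm-service) must be
    # allowed explicitly. Assumes cluster DNS pods in kube-system carry k8s-app: kube-dns.
    - to:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: kube-system }
          podSelector:
            matchLabels: { k8s-app: kube-dns }
      ports: [{ port: 53, protocol: UDP }, { port: 53, protocol: TCP }]
    - to:
        - podSelector:
            matchLabels: { app: postgresql }
      ports: [{ port: 5432 }]
    - to:
        - podSelector:
            matchLabels: { app: redis }
      ports: [{ port: 6379 }]
    - to:
        - podSelector:
            matchLabels: { app: nats }
      ports: [{ port: 4222 }]
    - to:
        - podSelector:
            matchLabels: { app: local-llm }
      ports: [{ port: 8000 }]
    # External LLM egress for AI_FAILOVER_PROVIDER. A NetworkPolicy cannot read env vars,
    # so this rule is always active as written; remove it while failover is disabled.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except: [10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16]
      ports: [{ port: 443 }]
```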
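The `except` list on the last egress rule carves the RFC 1918 private ranges out of `0.0.0.0/0`, so that rule only admits public destinations on port 443; when pod and Service IPs sit in those private ranges, as they usually do, in-cluster traffic must match one of the `podSelector` rules instead. The sketch below checks an IPv4 destination against that rule (port matching omitted); the helper names are hypothetical.

```typescript
// Sketch: does the external-LLM egress rule (0.0.0.0/0 except RFC 1918) admit a destination?
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  // ">>> 0" converts the signed 32-bit result of the bit operations to an unsigned value.
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, lenStr] = cidr.split("/");
  const len = Number(lenStr);
  if (!Number.isInteger(len) || len < 0 || len > 32) throw new Error(`invalid prefix: ${cidr}`);
  // JS shift counts are taken mod 32, so a /0 prefix needs an explicit all-zero mask.
  const mask = len === 0 ? 0 : (~0 << (32 - len)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

const EXTERNAL_LLM_RULE = {
  cidr: "0.0.0.0/0",
  except: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
};

function externalEgressAllows(ip: string): boolean {
  return inCidr(ip, EXTERNAL_LLM_RULE.cidr) && !EXTERNAL_LLM_RULE.except.some((c) => inCidr(ip, c));
}
```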
2. Background Workers
| Worker | Schedule | Description |
|---|---|---|
| TenantScoringWorker | Every 15 min (SCORING_INTERVAL_MINUTES) | Recalculates compliance scores for all active tenants |
| HoldQueueExpiryWorker | Every 5 min | Auto-expires `PENDING` holds whose `auto_expires_at` has passed |
| PartitionMaintenanceWorker | Daily at 03:00 UTC | Creates next month's `evaluation_log` and `score_history` partitions |
| KeywordListReloadWorker | Every 5 min | Reloads keyword sets into process memory when the DB version has changed |
| DlrStatsRollupWorker | Hourly | Rolls up per-window DLR stats and purges expired windows |
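`PartitionMaintenanceWorker` only needs to emit one `CREATE TABLE ... PARTITION OF` statement per table per month. This sketch assumes `evaluation_log` and `score_history` are PostgreSQL tables declared `PARTITION BY RANGE` on a timestamp column and uses a hypothetical `<table>_YYYY_MM` naming scheme; neither detail is specified in this document.

```typescript
// Hypothetical sketch of the DDL PartitionMaintenanceWorker could issue.
// Assumes monthly RANGE partitions on a timestamp column and <table>_YYYY_MM names.
const PARTITIONED_TABLES = ["evaluation_log", "score_history"];

function monthStartUtc(year: number, monthIndex: number): Date {
  return new Date(Date.UTC(year, monthIndex, 1)); // Date.UTC rolls month 12 into January of the next year
}

function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

function nextMonthPartitionDdl(now: Date): string[] {
  const from = monthStartUtc(now.getUTCFullYear(), now.getUTCMonth() + 1); // first day of next month
  const to = monthStartUtc(from.getUTCFullYear(), from.getUTCMonth() + 1); // exclusive upper bound
  const suffix = `${from.getUTCFullYear()}_${String(from.getUTCMonth() + 1).padStart(2, "0")}`;
  return PARTITIONED_TABLES.map(
    (table) =>
      `CREATE TABLE IF NOT EXISTS ${table}_${suffix} PARTITION OF ${table} ` +
      `FOR VALUES FROM ('${isoDate(from)}') TO ('${isoDate(to)}');`,
  );
}
```

Because the job runs daily, next month's partitions exist well before the first row arrives, and `IF NOT EXISTS` makes the repeated runs harmless.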
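Workers use Redis distributed locks for multi-replica safety: before each run, a replica acquires `lock:worker:{name}` with `SET ... NX EX`, so only one replica executes a given worker at a time.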
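A sketch of the lock pattern, assuming an ioredis-style `set(key, value, "EX", seconds, "NX")` call that resolves to `"OK"` when the key was set and `null` otherwise. The `InMemoryRedis` class is a stand-in for illustration and tests only, `runExclusive` is a hypothetical helper, and the lock is released by TTL expiry rather than an explicit delete.

```typescript
// Assumed minimal client shape: SET key value EX seconds NX -> "OK" | null.
interface SetNxExClient {
  set(key: string, value: string, ex: "EX", seconds: number, nx: "NX"): Promise<"OK" | null>;
}

// In-memory stand-in that mimics SET NX EX semantics (illustration only, not a Redis client).
class InMemoryRedis implements SetNxExClient {
  private readonly keys = new Map<string, { value: string; expiresAtMs: number }>();
  private readonly nowMs: () => number;

  constructor(nowMs: () => number = () => Date.now()) {
    this.nowMs = nowMs;
  }

  async set(key: string, value: string, _ex: "EX", seconds: number, _nx: "NX"): Promise<"OK" | null> {
    const held = this.keys.get(key);
    if (held && held.expiresAtMs > this.nowMs()) return null; // NX: refuse while the key is live
    this.keys.set(key, { value, expiresAtMs: this.nowMs() + seconds * 1000 });
    return "OK";
  }
}

// Runs `job` only if this replica wins lock:worker:{name} for this cycle.
async function runExclusive(
  redis: SetNxExClient,
  workerName: string,
  replicaId: string,
  ttlSeconds: number,
  job: () => Promise<void>,
): Promise<boolean> {
  const won = await redis.set(`lock:worker:${workerName}`, replicaId, "EX", ttlSeconds, "NX");
  if (won !== "OK") return false; // another replica holds the lock
  await job();
  return true;
}
```

Choosing a TTL slightly shorter than the schedule interval (for example 840 s for the 15-minute scoring job, an assumed value) means a replica that crashes mid-run cannot block the next cycle; a job that can outlive its TTL would need lock renewal instead.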
3. Infrastructure Dependencies
| Dependency | Version | Topology |
|---|---|---|
| PostgreSQL | 15+ | Primary + read replica |
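| Redis | 7.0+ | Cluster mode; compliance-engine is assigned DB 3, but Redis Cluster supports only database 0, so under Cluster mode its keys must be isolated by prefix (e.g. `lock:worker:*`) rather than by DB number |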
| NATS JetStream | 2.10+ | 3-node cluster; COMPLIANCE_EVENTS and SMS_DLR streams |
| Local LLM | vLLM 0.5+ | Separate deployment with GPU nodes |
| External LLM API (optional) | Claude v1 / OpenAI v1 | External HTTPS |
4. Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| NODE_ENV | Yes | — | production / staging / development |
| GRPC_PORT | No | 50052 | gRPC listener port |
| HTTP_PORT | No | 3002 | HTTP listener port (metrics, health, REST) |
| DATABASE_URL | Yes | — | PostgreSQL connection string |
| REDIS_URL | Yes | — | Redis connection string |
| NATS_URL | Yes | — | NATS server URL |
| NATS_CREDS_PATH | Yes | — | Path to the NATS credentials file |
| AI_PROVIDER | No | local | local / claude / openai / mock |
| LOCAL_LLM_URL | If AI_PROVIDER=local | — | Base URL of the local LLM's OpenAI-compatible endpoint |
| LOCAL_LLM_MODEL | If AI_PROVIDER=local | — | Model name sent in requests; must match the name the local LLM serves |
| AI_FAILOVER_PROVIDER | No | "" (disabled) | claude / openai / "" |
| AI_API_KEY | If an external LLM is used | — | External LLM provider API key (from Vault) |
| AI_MODEL | No | — | External LLM model (e.g., claude-haiku-4-5-20251001) |
| ANONYMIZE_BODY_BEFORE_AI | No | true | Redact PII from message bodies before inference |
| AI_TIMEOUT_MS | No | 2000 | LLM call timeout, in ms |
| EVAL_BUDGET_MS | No | 450 | Internal time budget per evaluation, in ms |
| GRPC_TLS_ENABLED | No | true | Set to false for local development |
| TLS_CERT_PATH | If TLS | — | Path to the server TLS certificate |
| TLS_KEY_PATH | If TLS | — | Path to the server TLS private key |
| TLS_CA_PATH | If TLS | — | Path to the CA bundle for mTLS |
| LOG_LEVEL | No | info | debug / info / warn / error |
| HOLD_QUEUE_TTL_HOURS | No | 24 | Auto-expiry duration for held messages, in hours |
| SCORING_INTERVAL_MINUTES | No | 15 | Tenant scoring cycle interval, in minutes |
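The table implies a startup contract: required values must be present, numeric values fall back to the listed defaults, and some values become required only when a feature is on. A hypothetical loader expressing that contract (function and field names are illustrative, not the engine's actual code):

```typescript
// Hypothetical startup config loader for the variables listed above.
type Env = Record<string, string | undefined>;

function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") throw new Error(`${name} is required`);
  return value;
}

function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback; // documented default
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${name} must be a positive integer, got "${raw}"`);
  return n;
}

function oneOf<T extends string>(env: Env, name: string, allowed: readonly T[], fallback: T): T {
  const raw = env[name] ?? fallback;
  if (!(allowed as readonly string[]).includes(raw)) {
    throw new Error(`${name} must be one of [${allowed.join(", ")}], got "${raw}"`);
  }
  return raw as T;
}

function loadConfig(env: Env) {
  const aiProvider = oneOf(env, "AI_PROVIDER", ["local", "claude", "openai", "mock"] as const, "local");
  const aiFailoverProvider = oneOf(env, "AI_FAILOVER_PROVIDER", ["", "claude", "openai"] as const, "");
  const grpcTlsEnabled = (env.GRPC_TLS_ENABLED ?? "true").toLowerCase() !== "false";
  const usesExternalLlm = aiProvider === "claude" || aiProvider === "openai" || aiFailoverProvider !== "";

  return {
    nodeEnv: required(env, "NODE_ENV"),
    databaseUrl: required(env, "DATABASE_URL"),
    redisUrl: required(env, "REDIS_URL"),
    natsUrl: required(env, "NATS_URL"),
    natsCredsPath: required(env, "NATS_CREDS_PATH"),
    grpcPort: positiveInt(env, "GRPC_PORT", 50052),
    httpPort: positiveInt(env, "HTTP_PORT", 3002),
    aiProvider,
    aiFailoverProvider,
    // Conditionally required values, mirroring the "If ..." entries in the table.
    localLlm: aiProvider === "local"
      ? { url: required(env, "LOCAL_LLM_URL"), model: required(env, "LOCAL_LLM_MODEL") }
      : undefined,
    aiApiKey: usesExternalLlm ? required(env, "AI_API_KEY") : undefined,
    aiTimeoutMs: positiveInt(env, "AI_TIMEOUT_MS", 2000),
    evalBudgetMs: positiveInt(env, "EVAL_BUDGET_MS", 450),
    tls: grpcTlsEnabled
      ? { cert: required(env, "TLS_CERT_PATH"), key: required(env, "TLS_KEY_PATH"), ca: required(env, "TLS_CA_PATH") }
      : undefined,
    // No failure-mode setting: evaluation is always fail-closed.
  };
}
```

Failing at startup turns a missing secret into a pod that never becomes ready, rather than an engine quietly running with incomplete configuration.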
Note: `COMPLIANCE_FAILURE_MODE` is intentionally removed. The Compliance Layer is always fail-closed; this is architectural, not configurable.
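Fail-closed means a timeout or an error never turns into an ALLOW. A sketch of how `EVAL_BUDGET_MS` could be enforced under that rule; the `Verdict` shape, the decision names, and the choice of `BLOCK` (rather than `HOLD`) as the fail-closed outcome are assumptions for illustration.

```typescript
// Hypothetical sketch: enforce the per-evaluation budget and fail closed on timeout or error.
type Decision = "ALLOW" | "BLOCK" | "HOLD";

interface Verdict {
  decision: Decision;
  reason: string;
}

async function evaluateFailClosed(evaluate: () => Promise<Verdict>, budgetMs: number): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const budgetExceeded = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve({ decision: "BLOCK", reason: "eval_budget_exceeded" }), budgetMs);
  });
  try {
    // Whichever settles first wins; a late evaluation result is ignored.
    return await Promise.race([evaluate(), budgetExceeded]);
  } catch (err) {
    // Errors map to a non-ALLOW verdict; there is deliberately no fail-open switch.
    return { decision: "BLOCK", reason: `evaluation_error: ${(err as Error).message}` };
  } finally {
    clearTimeout(timer);
  }
}
```

The losing evaluation keeps running in the background after a timeout; its result is simply discarded.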
5. Deployment Environments
| Environment | compliance-engine replicas | Local LLM | External LLM failover | Notes |
|---|---|---|---|---|
| Production | 3–20 (HPA) | 2 × A10 GPU | Optional, disabled by default | Fail-closed always |
| Staging | 2 | 1 × A10 GPU (shared) | Claude Haiku (disabled by default) | |
| Development | 1 | Ollama (local workstation) or Mock | Mock | No GPU required |
| CI | 1 | Mock | Mock | Deterministic test responses |
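In every environment above, the local LLM (or the mock) is the primary and external failover stays disabled unless explicitly configured. A sketch of how a primary/failover chain could compose with the fail-closed rule, assuming a hypothetical `AiProvider` interface: the failover is tried only when one is configured, and any remaining error propagates so the caller can fail closed.

```typescript
// Hypothetical provider abstraction; "local", "claude", "openai" or "mock" would implement it.
interface AiProvider {
  name: string;
  classify(body: string): Promise<string>;
}

async function classifyWithFailover(
  body: string,
  primary: AiProvider,
  failover: AiProvider | undefined, // undefined when AI_FAILOVER_PROVIDER is ""
): Promise<{ provider: string; label: string }> {
  try {
    return { provider: primary.name, label: await primary.classify(body) };
  } catch (primaryErr) {
    if (!failover) throw primaryErr; // no failover configured: let the caller fail closed
    // A failover error also propagates, so the fail-closed decision stays with the caller.
    return { provider: failover.name, label: await failover.classify(body) };
  }
}
```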