Fraud Intelligence Service — Deployment Topology
Version: 1.0 | Status: Draft | Owner: Trust and Safety + Platform SRE | Last Updated: 2026-04-21 | Companion: LOCAL_DEV_SETUP · FAILURE_MODES · SECURITY_MODEL · docs/architecture/ADR-0004
1. Kubernetes Resources
The service splits into three workloads:
- `fraud-intel-service`: NestJS API + gRPC + REST + NATS consumer + outbox relay. Stateless. 3-10 replicas.
- `fraud-intel-worker`: Python ML pipelines (AIT, SIM-box, OTP-harvest, grey-route, cohort, scoring, tenant-score recompute). KEDA-scaled by NATS lag and cron. 0-20 replicas.
- `triton-fraud-cpu` + `triton-fraud-gpu`: Triton Inference Server for model serving.
Plus the offline training stack:
- Airflow scheduler + workers (training DAGs)
- MLflow tracking server
- GPU training nodes (spot, autoscaled)
1.1 fraud-intel-service Deployment (NestJS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fraud-intel-service
  namespace: sms-platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: fraud-intel-service }
  template:
    metadata:
      labels: { app: fraud-intel-service }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3014"
        prometheus.io/path: "/metrics"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector: { matchLabels: { app: fraud-intel-service } }
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: fraud-intel-service
          image: ghcr.io/ghasi/fraud-intel-service:latest
          ports:
            - { containerPort: 50054, name: grpc }
            - { containerPort: 3014, name: http }
            - { containerPort: 3015, name: http-internal }
          env:
            - { name: NODE_ENV, value: production }
            - { name: LOG_LEVEL, value: info }
            - { name: GRPC_PORT, value: "50054" }
            - { name: HTTP_PORT, value: "3014" }
            - { name: HTTP_INTERNAL_PORT, value: "3015" }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-db-secret, key: url } }
            - name: CLICKHOUSE_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-ch-secret, key: url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-redis-secret, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - { name: NATS_CREDS_PATH, value: /etc/nats/creds.nk }
            - { name: TRITON_GRPC_URL, value: triton-fraud-cpu.sms-platform.svc.cluster.local:8001 }
            - { name: TRITON_GPU_GRPC_URL, value: triton-fraud-gpu.sms-platform.svc.cluster.local:8001 }
            - { name: INFERENCE_PROVIDER, value: triton }
            - { name: ANONYMIZE_BEFORE_INFERENCE, value: "true" }
            - { name: NATIONAL_SALT_PATH, value: /etc/secrets/national-salt }
            - { name: SCORE_CACHE_TTL_S, value: "900" }
            - { name: REGION, value: kbl }
          envFrom:
            - { secretRef: { name: fraud-intel-vault-secrets } }
          resources:
            requests: { cpu: 1000m, memory: 1Gi }
            limits: { cpu: 4000m, memory: 4Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 20
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 15
            periodSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - { name: tls-certs, mountPath: /etc/tls, readOnly: true }
            - { name: nats-creds, mountPath: /etc/nats, readOnly: true }
            - { name: secrets, mountPath: /etc/secrets, readOnly: true }
      volumes:
        - { name: tls-certs, secret: { secretName: fraud-intel-tls } }
        - { name: nats-creds, secret: { secretName: fraud-intel-nats-creds } }
        - { name: secrets, secret: { secretName: fraud-intel-app-secrets } }
1.2 fraud-intel-worker Deployment (Python ML pipelines)
apiVersion: apps/v1
kind: Deployment
metadata: { name: fraud-intel-worker, namespace: sms-platform }
spec:
  replicas: 2
  selector: { matchLabels: { app: fraud-intel-worker } }
  template:
    metadata:
      labels: { app: fraud-intel-worker }
      annotations: { prometheus.io/scrape: "true", prometheus.io/port: "9091" }
    spec:
      nodeSelector: { workload: ml-cpu }
      containers:
        - name: worker
          image: ghcr.io/ghasi/fraud-intel-worker:latest
          env:
            - { name: WORKER_MODE, value: pipelines } # pipelines | streaming | scoring
            - { name: TRITON_GRPC_URL, value: triton-fraud-cpu.sms-platform.svc.cluster.local:8001 }
            - { name: CLICKHOUSE_URL, valueFrom: { secretKeyRef: { name: fraud-intel-ch-secret, key: url } } }
            - { name: PG_URL, valueFrom: { secretKeyRef: { name: fraud-intel-db-secret, key: url } } }
            - { name: REDIS_URL, valueFrom: { secretKeyRef: { name: fraud-intel-redis-secret, key: url } } }
            - { name: NATS_URL, valueFrom: { secretKeyRef: { name: nats-credentials, key: url } } }
            - { name: METRICS_PORT, value: "9091" }
          resources:
            requests: { cpu: 4, memory: 16Gi }
            limits: { cpu: 8, memory: 32Gi }
1.3 KEDA scaler (worker autoscaling on NATS lag)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: fraud-intel-worker-scaler, namespace: sms-platform }
spec:
  scaleTargetRef: { name: fraud-intel-worker }
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: nats.sms-platform.svc:8222
        account: "$G" # assumed default account; set to the real NATS account if one is configured
        stream: SMS_STATUS
        consumer: fraud-ingestor
        lagThreshold: "5000"
    - type: cron
      metadata:
        timezone: Asia/Kabul
        # KEDA's cron scaler treats start/end as the bounds of a window and
        # rejects identical expressions, so end needs its own value here.
        start: "*/5 * * * *"
        end: "*/5 * * * *"
        desiredReplicas: "5"
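To make the lag trigger concrete: KEDA hands consumer lag to the HPA as an external metric with an average-value target, so the requested replica count is roughly ceil(lag / lagThreshold), bounded by the min and max replica counts. The sketch below reproduces only that arithmetic with the values from the ScaledObject above. The helper name is hypothetical, and the cron trigger, HPA tolerance and stabilization windows are deliberately ignored.

```python
import math

# Values copied from the ScaledObject in this section.
LAG_THRESHOLD = 5000
MIN_REPLICAS = 2
MAX_REPLICAS = 20


def replicas_for_lag(total_lag: int) -> int:
    """Approximate worker count for a given fraud-ingestor consumer lag.

    Hypothetical helper: ceil(lag / lagThreshold) clamped to the configured
    bounds. The real decision is made by KEDA and the HPA, not by this code.
    """
    wanted = math.ceil(total_lag / LAG_THRESHOLD)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


for lag in (0, 12_000, 42_000, 1_000_000):
    print(lag, replicas_for_lag(lag))
```

When several triggers are active, the HPA acts on the largest replica count any of them requests, so the cron trigger can only raise this number, never lower it.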
1.4 Triton Inference Server (CPU pool)
apiVersion: apps/v1
kind: Deployment
metadata: { name: triton-fraud-cpu, namespace: sms-platform }
spec:
  replicas: 3
  selector: { matchLabels: { app: triton-fraud-cpu } }
  template:
    metadata:
      labels: { app: triton-fraud-cpu }
      annotations: { prometheus.io/scrape: "true", prometheus.io/port: "8002" }
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.06-py3
          args:
            - tritonserver
            - --model-repository=/models
            - --model-control-mode=poll
            - --repository-poll-secs=30
            - --strict-model-config=false
            - --backend-config=fil,backend_config.cmdline=use_cuda=false
          ports:
            - { containerPort: 8000, name: http }
            - { containerPort: 8001, name: grpc }
            - { containerPort: 8002, name: metrics }
          resources:
            requests: { cpu: 4, memory: 8Gi }
            limits: { cpu: 16, memory: 16Gi }
          volumeMounts:
            - { name: model-repo, mountPath: /models, readOnly: true }
      volumes:
        - { name: model-repo, persistentVolumeClaim: { claimName: triton-model-repo-pvc } }
1.5 Triton Inference Server (GPU pool)
apiVersion: apps/v1
kind: Deployment
metadata: { name: triton-fraud-gpu, namespace: sms-platform }
spec:
  replicas: 2
  # apps/v1 requires a selector and matching pod labels; the PDB in 1.7 selects on the same label
  selector: { matchLabels: { app: triton-fraud-gpu } }
  template:
    metadata:
      labels: { app: triton-fraud-gpu }
    spec:
      nodeSelector: { gpu: t4 }
      tolerations:
        - { key: nvidia.com/gpu, operator: Exists, effect: NoSchedule }
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.06-py3
          args: [ tritonserver, --model-repository=/models, --strict-model-config=false ]
          resources:
            requests: { cpu: 4, memory: 16Gi, nvidia.com/gpu: 1 }
            limits: { cpu: 8, memory: 24Gi, nvidia.com/gpu: 1 }
1.6 HPA for fraud-intel-service
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: fraud-intel-service-hpa, namespace: sms-platform }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: fraud-intel-service }
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 75 } }
    - type: Pods
      pods:
        metric: { name: fraud_score_grpc_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.04" } # scale up if P95 > 40 ms
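For intuition on how the three metrics combine, the HPA evaluates each metric with desired = ceil(current × observed / target), skips changes inside a tolerance band, and then takes the largest proposal. The sketch below applies that rule to the targets from the manifest above. The function name and the observed values are illustrative, and the tolerance of 0.1 is the Kubernetes default, which a cluster may override.

```python
import math

MIN_REPLICAS, MAX_REPLICAS = 3, 10
TOLERANCE = 0.1  # Kubernetes default; an assumption for this cluster

# Targets from the HPA spec: CPU %, memory %, and P95 latency in seconds.
TARGETS = {"cpu": 65.0, "memory": 75.0, "p95_seconds": 0.04}


def desired_replicas(current: int, observed: dict[str, float]) -> int:
    """Largest per-metric proposal, clamped to the HPA bounds."""
    proposals = []
    for name, value in observed.items():
        ratio = value / TARGETS[name]
        if abs(ratio - 1.0) <= TOLERANCE:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    wanted = max(proposals) if proposals else current
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


# CPU at 80% drives the decision even though memory and latency are healthy.
print(desired_replicas(3, {"cpu": 80.0, "memory": 50.0, "p95_seconds": 0.03}))
```

Because the maximum wins, a latency regression alone is enough to scale out even when CPU and memory sit below target.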
1.7 PodDisruptionBudgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: fraud-intel-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: fraud-intel-service } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: triton-fraud-cpu-pdb, namespace: sms-platform }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: triton-fraud-cpu } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: triton-fraud-gpu-pdb, namespace: sms-platform }
spec:
  minAvailable: 1
  selector: { matchLabels: { app: triton-fraud-gpu } }
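A quick way to read these budgets: with `minAvailable`, the number of voluntary evictions permitted at any moment is the count of healthy pods minus `minAvailable`, floored at zero. The sketch below applies that to the three budgets above; the helper name is hypothetical.

```python
# minAvailable values copied from the three PodDisruptionBudgets above.
PDB_MIN_AVAILABLE = {
    "fraud-intel-service": 2,
    "triton-fraud-cpu": 2,
    "triton-fraud-gpu": 1,
}


def allowed_disruptions(app: str, healthy_pods: int) -> int:
    """Voluntary evictions the budget permits right now for this app."""
    return max(0, healthy_pods - PDB_MIN_AVAILABLE[app])


# At baseline replica counts (3 / 3 / 2) each workload tolerates one eviction.
for app, healthy in (("fraud-intel-service", 3),
                     ("triton-fraud-cpu", 3),
                     ("triton-fraud-gpu", 2)):
    print(app, allowed_disruptions(app, healthy))
```

One consequence worth noting: if a GPU pod is already unhealthy, the budget drops to zero and a node drain will block until the pool recovers.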
1.8 Services
apiVersion: v1
kind: Service
metadata: { name: fraud-intel-grpc, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: grpc, port: 50054, targetPort: grpc }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: fraud-intel-http, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: http, port: 3014, targetPort: http }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: fraud-intel-internal, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: http-internal, port: 3015, targetPort: http-internal }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: triton-fraud-cpu, namespace: sms-platform }
spec:
  selector: { app: triton-fraud-cpu }
  ports:
    - { name: http, port: 8000, targetPort: http }
    - { name: grpc, port: 8001, targetPort: grpc }
    - { name: metrics, port: 8002, targetPort: metrics }
1.9 NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: fraud-intel-netpol, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: fraud-intel-service } }
  policyTypes: [Ingress, Egress]
  ingress:
    # gRPC consumers
    - from:
        - { podSelector: { matchLabels: { app: compliance-engine } } }
        - { podSelector: { matchLabels: { app: routing-engine } } }
        - { podSelector: { matchLabels: { app: sender-id-registry-service } } }
        - { podSelector: { matchLabels: { app: noc-dashboard } } }
      ports: [{ port: 50054 }]
    # REST admin (via Kong)
    - from: [{ podSelector: { matchLabels: { app: kong } } }]
      ports: [{ port: 3014 }]
    # Internal mTLS (regulator-portal-service, peer-mno-bridge)
    - from:
        - { podSelector: { matchLabels: { app: regulator-portal-service } } }
        - { podSelector: { matchLabels: { app: peer-mno-bridge } } }
      ports: [{ port: 3015 }]
    # Prometheus
    - from: [{ namespaceSelector: { matchLabels: { name: monitoring } } }]
      ports: [{ port: 3014 }]
  egress:
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432 }]
    - to: [{ podSelector: { matchLabels: { app: clickhouse } } }]
      ports: [{ port: 9000 }, { port: 8123 }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379 }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222 }]
    - to: [{ podSelector: { matchLabels: { app: triton-fraud-cpu } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: triton-fraud-gpu } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: minio } } }]
      ports: [{ port: 9000 }]
    - to: [{ podSelector: { matchLabels: { app: vault } } }]
      ports: [{ port: 8200 }]
    # Egress to regulator SFTP (cloud, IP-allowlisted)
    - to:
        - ipBlock: { cidr: 41.74.0.0/16 } # ATRA SFTP CIDR (placeholder)
      ports: [{ port: 22 }]
    # NO egress to public LLM providers: explicit deny by omission
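"Deny by omission" means that once `Egress` appears in `policyTypes`, any destination not matched by a listed rule is dropped. The sketch below models only the single `ipBlock` rule, using the placeholder CIDR from the policy; the pod-selector rules are not modelled and the function name is hypothetical.

```python
import ipaddress

# Placeholder CIDR copied from the policy; not a confirmed production range.
ATRA_SFTP_CIDR = ipaddress.ip_network("41.74.0.0/16")
SFTP_PORT = 22


def ip_egress_allowed(dest_ip: str, port: int) -> bool:
    """True only if (dest_ip, port) matches the ipBlock egress rule."""
    in_block = ipaddress.ip_address(dest_ip) in ATRA_SFTP_CIDR
    return in_block and port == SFTP_PORT


print(ip_egress_allowed("41.74.12.9", 22))   # inside the block, SFTP port
print(ip_egress_allowed("41.74.12.9", 443))  # right block, wrong port
print(ip_egress_allowed("8.8.8.8", 22))      # outside the block
```

An external LLM endpoint fails this check in the same way as the last two calls: it is neither inside the allowlisted block nor reachable on any listed port.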
2. Background Workers
| Worker | Schedule | Replicas | Description |
|---|---|---|---|
| IngestionConsumer | always-on | KEDA (NATS lag) | Stream firewall.audit.v1, sms.events.status.v1, sms.dlr.inbound.v1, cdr.generated.v1, consent.revoked.v1 → ClickHouse events |
| OtpGrindingStreaming | always-on | 2 | Real-time OTP-grinding aggregator (Redis sorted sets) |
| AitPipeline | */5 * * * * | 1-3 (KEDA cron) | 5-min AIT XGBoost pipeline |
| AitCohortJob | 0 * * * * | 1 | Hourly cohort GraphSAGE |
| SimboxPipeline | */30 * * * * | 1 | 30-min SIM-box detector |
| GreyRoutePipeline | 15 * * * * | 1 | Hourly grey-route |
| OtpHarvestPipeline | */30 * * * * | 1 | 30-min OTP-harvest cohort+revocation |
| TenantScoreRecompute | 0 * * * * | 1 | Hourly score recompute |
| MispFeedExport | 0 4 * * * Asia/Kabul | 1 | Daily MISP/STIX export to MinIO + SFTP |
| MispFeedDecayJob | 0 5 * * * | 1 | Apply daily decay to imported indicators |
| PartitionMaintenance | 0 3 * * * | 1 | Provision next 3 months of Postgres partitions |
| OutboxRelay | always-on (in-process) | per pod | Publishes outbox to NATS |
| CaseStaleScanner | 0 6 * * * | 1 | Auto-close cases > 30 d |
| ModelDriftScanner | 0 1 * * * | 1 | PSI / Wasserstein drift checks; emit alerts |
Scheduled workers take a Redis distributed lock (`SET NX EX` on `fraud:lock:<worker>`) so that only one replica runs a given job at a time.
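The locking pattern can be sketched as follows. This is a minimal illustration, not the service's implementation: `InMemoryRedis` is a stand-in written for this example so the block runs without a server, the TTL is an arbitrary value, and the helper names are hypothetical. With redis-py the same `set(key, token, nx=True, ex=ttl)` call applies, assuming a client created with `decode_responses=True` so that `get` returns strings.

```python
import secrets
import time
from typing import Optional


class InMemoryRedis:
    """Stand-in implementing only the three calls used below."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, float]] = {}

    def set(self, name: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        now = time.monotonic()
        current = self._data.get(name)
        if current is not None and current[1] <= now:
            current = None  # previous holder's key has expired
        if nx and current is not None:
            return None  # mirrors SET NX returning nil when the key exists
        expires = now + ex if ex is not None else float("inf")
        self._data[name] = (value, expires)
        return True

    def get(self, name: str) -> Optional[str]:
        current = self._data.get(name)
        if current is None or current[1] <= time.monotonic():
            return None
        return current[0]

    def delete(self, name: str) -> int:
        return 1 if self._data.pop(name, None) is not None else 0


def acquire_lock(client, worker: str, ttl_s: int) -> Optional[str]:
    """Try to take fraud:lock:<worker>; return an owner token, or None."""
    token = secrets.token_hex(16)
    ok = client.set(f"fraud:lock:{worker}", token, nx=True, ex=ttl_s)
    return token if ok else None


def release_lock(client, worker: str, token: str) -> bool:
    """Release only if this caller still owns the lock.

    get-then-delete is not atomic against a real Redis server; a production
    version would perform the comparison and delete in one Lua script.
    """
    key = f"fraud:lock:{worker}"
    if client.get(key) == token:
        client.delete(key)
        return True
    return False


redis_client = InMemoryRedis()
first = acquire_lock(redis_client, "AitPipeline", ttl_s=300)
second = acquire_lock(redis_client, "AitPipeline", ttl_s=300)
print(first is not None, second is None)
```

The random token matters: without it, a replica whose lock expired mid-run could delete the lock now held by another replica.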
3. Region Affinity
Per ADR-0004:
| Region | Role | Replicas (service / worker / Triton-CPU / Triton-GPU) |
|---|---|---|
| kbl (Kabul) | Primary write region; all pipelines run here; canonical model registry | 3 / 2-20 / 3 / 2 |
| mzr (Mazar-i-Sharif) | Warm standby; reads entity_scores from Postgres replica; Score gRPC serves regional traffic | 2 / 0 (paused) / 2 / 0 |
| Failover RTO | 5 min (manual operator confirmation per ADR-0004 §3.4) | |
| Cross-region NATS bridging | NATS Leaf Node FRAUD_* streams mirror kbl → mzr | Lag P95 ≤ 5 s |
4. Infrastructure Dependencies
| Dependency | Version | Topology | Owner |
|---|---|---|---|
| PostgreSQL | 15+ | Primary + read replica per region; PgBouncer in transaction pool mode | Platform DBA |
| ClickHouse | 23.8+ | 3 shards × 2 replicas; ZooKeeper/Keeper coordination | Platform SRE (data) |
| Redis | 7.0+ | Cluster mode; fraud uses DB 4 | Platform SRE |
| NATS JetStream | 2.10+ | 3-node cluster; dedicated FRAUD_* streams | Platform SRE |
| Triton Inference Server | 24.06+ | CPU pool (3 replicas) + GPU pool (2 × T4) | Platform Engineering |
| MinIO | RELEASE.2024-08-29T+ | 4-node erasure-coded cluster; bucket-policy enforced | Platform SRE |
| Vault | 1.16+ | HA mode; PKI engine for mTLS, KV v2 for secrets, Transit for hashing | Security |
| HSM (PKCS#11) | nCipher nShield (shared) | Isolated partition fraud-intel from sms-firewall | Security |
| Airflow | 2.9+ | KubernetesExecutor on dedicated airflow-fraud namespace | Data Engineering |
| MLflow | 2.13+ | Tracking server + S3 artifact store | Data Engineering |
5. Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| NODE_ENV | Yes | — | production / staging / development |
| GRPC_PORT | No | 50054 | gRPC listener |
| HTTP_PORT | No | 3014 | REST listener |
| HTTP_INTERNAL_PORT | No | 3015 | Internal mTLS listener |
| DATABASE_URL | Yes | — | Postgres connection string |
| CLICKHOUSE_URL | Yes | — | ClickHouse native protocol URL |
| REDIS_URL | Yes | — | Redis (DB 4) |
| NATS_URL | Yes | — | NATS server URL |
| NATS_CREDS_PATH | Yes | — | Path to NATS credentials nkey file |
| TRITON_GRPC_URL | Yes | — | Triton CPU pool gRPC |
| TRITON_GPU_GRPC_URL | Yes | — | Triton GPU pool gRPC |
| INFERENCE_PROVIDER | No | triton | triton / mock only |
| ANONYMIZE_BEFORE_INFERENCE | No | true | Forced true in non-dev |
| NATIONAL_SALT_PATH | Yes | — | File mount of nationalSalt |
| SCORE_CACHE_TTL_S | No | 900 | Redis L1 TTL |
| SCORE_REFRESH_QUEUE | No | fraud:score:refresh:queue | Redis list name |
| EVAL_BUDGET_MS | No | 45 | Score gRPC internal budget |
| REGION | Yes | — | kbl / mzr |
| GRPC_TLS_ENABLED | No | true | Forced true non-dev (start-up guard) |
| TLS_CERT_PATH, TLS_KEY_PATH, TLS_CA_PATH | If TLS | — | mTLS certs |
| LOG_LEVEL | No | info | debug / info / warn / error |
| HSM_PIN_PATH | If feed export | — | File mount of HSM partition PIN |
`INFERENCE_PROVIDER=cloud` (Anthropic/OpenAI) is disallowed: the start-up guard refuses to boot with that value.
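The guard logic can be sketched as below. The production check lives in the NestJS service; Python is used here only for brevity, and the function and exception names are hypothetical. Two assumptions are baked in: "non-dev" is read as any `NODE_ENV` other than `development`, and "forced true" is modelled as refusing to start rather than silently overriding the value.

```python
ALLOWED_PROVIDERS = {"triton", "mock"}
KNOWN_ENVS = {"production", "staging", "development"}


class StartupGuardError(RuntimeError):
    """Raised when the environment must not be allowed to boot."""


def check_startup_env(env: dict[str, str]) -> None:
    """Validate the guard-relevant variables; raise instead of booting."""
    node_env = env.get("NODE_ENV")
    if node_env not in KNOWN_ENVS:
        raise StartupGuardError(f"NODE_ENV is required, got {node_env!r}")

    # Defaults below follow the table: triton, true, true.
    provider = env.get("INFERENCE_PROVIDER", "triton")
    if provider not in ALLOWED_PROVIDERS:
        raise StartupGuardError(f"INFERENCE_PROVIDER={provider} is disallowed")

    if node_env != "development":
        for name in ("ANONYMIZE_BEFORE_INFERENCE", "GRPC_TLS_ENABLED"):
            if env.get(name, "true").lower() != "true":
                raise StartupGuardError(f"{name} must be true outside development")


check_startup_env({"NODE_ENV": "production"})  # defaults pass
try:
    check_startup_env({"NODE_ENV": "production", "INFERENCE_PROVIDER": "cloud"})
except StartupGuardError as exc:
    print("refused:", exc)
```

Failing fast at start-up keeps a misconfigured pod out of rotation entirely, since it never reaches the readiness probe.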
6. Deployment Environments
| Environment | service replicas | worker replicas | Triton CPU | Triton GPU | Notes |
|---|---|---|---|---|---|
| Production (kbl) | 3-10 (HPA) | 2-20 (KEDA) | 3 | 2 × T4 | Full feature set |
| Production (mzr) | 2 | 0 (paused) | 2 | 0 | Score gRPC only; pipelines paused |
| Staging | 2 | 2 | 1 | 1 (shared) | Daily synthetic load |
| Development | 1 | 1 | 1 (CPU only) | mock | Dockerised; no GPU |
| CI | 1 | 1 | mock | mock | Deterministic responses |
7. Image Tagging & CI/CD
- Image: `ghcr.io/ghasi/fraud-intel-service:<git-sha>`
- Helm chart: `charts/fraud-intel-service`, versioned alongside the image.
- Argo CD application: `apps/fraud-intel-service.yaml` with sync wave `4` (after compliance-engine, before NOC dashboard).
- Canary: Argo Rollouts with 10% → 25% → 50% → 100% over 30 min, checking SLO burn rate at each step.
- Rollback: automatic on `FraudScoreP95High` or `FraudScoreUnavailable` firing during canary.
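The canary and rollback rules above amount to a simple gate: advance through the traffic weights and abort as soon as one of the two named alerts fires. The sketch below models only that control flow. Argo Rollouts performs the real analysis; `firing_alerts_at` is a hypothetical hook standing in for an alert query, and step durations are left out because the source only gives the 30-minute total.

```python
from typing import Callable

CANARY_STEPS = [10, 25, 50, 100]  # percent of traffic, from the list above
ROLLBACK_ALERTS = {"FraudScoreP95High", "FraudScoreUnavailable"}


def run_canary(firing_alerts_at: Callable[[int], set[str]]) -> tuple[str, int]:
    """Walk the canary steps.

    Returns ("promoted", 100) if no rollback alert fires, otherwise
    ("rolled_back", weight) with the weight at which the alert was seen.
    """
    for weight in CANARY_STEPS:
        if firing_alerts_at(weight) & ROLLBACK_ALERTS:
            return ("rolled_back", weight)
    return ("promoted", CANARY_STEPS[-1])


# Latency alert appears once the canary carries half of the traffic.
print(run_canary(lambda w: {"FraudScoreP95High"} if w >= 50 else set()))
```

Alerts outside the rollback set do not stop the rollout in this model, which matches the list above naming only two triggers.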
8. Resource Budget Summary
| Component | CPU req | Memory req | GPU | Pods | Total CPU | Total Mem |
|---|---|---|---|---|---|---|
| fraud-intel-service | 1 | 1 Gi | — | 3-10 | 3-10 | 3-10 Gi |
| fraud-intel-worker | 4 | 16 Gi | — | 2-20 | 8-80 | 32-320 Gi |
| triton-fraud-cpu | 4 | 8 Gi | — | 3 | 12 | 24 Gi |
| triton-fraud-gpu | 4 | 16 Gi | 1 × T4 | 2 | 8 | 32 Gi |
| Steady-state (median) | | | | | ~50 vCPU | ~150 Gi |
| Burst | | | | | ~110 vCPU | ~390 Gi |
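The burst row can be cross-checked from the per-pod requests: with every workload at its maximum pod count, the requests sum to 110 vCPU and 386 Gi, which the table rounds to ~390 Gi. The sketch below does that arithmetic and also prints the floor with every workload at its minimum. The steady-state median depends on observed replica counts and cannot be derived from the table alone.

```python
# (cpu_request_vcpu, memory_request_gi, min_pods, max_pods) per the table above
WORKLOADS = {
    "fraud-intel-service": (1, 1, 3, 10),
    "fraud-intel-worker": (4, 16, 2, 20),
    "triton-fraud-cpu": (4, 8, 3, 3),
    "triton-fraud-gpu": (4, 16, 2, 2),
}


def totals(at_max: bool) -> tuple[int, int]:
    """Sum requested (vCPU, Gi) with every workload at min or max pods."""
    cpu = mem = 0
    for cpu_req, mem_req, min_pods, max_pods in WORKLOADS.values():
        pods = max_pods if at_max else min_pods
        cpu += cpu_req * pods
        mem += mem_req * pods
    return cpu, mem


print(totals(at_max=False))  # (31, 91)
print(totals(at_max=True))   # (110, 386)
```

These figures cover requests only; limits are higher (for example 4 vCPU per service pod), so node capacity planning should leave headroom above the burst total.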