Fraud Intelligence Service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Trust and Safety + Platform SRE
Last Updated: 2026-04-21
Companion: LOCAL_DEV_SETUP · FAILURE_MODES · SECURITY_MODEL · docs/architecture/ADR-0004

1. Kubernetes Resources

The service splits into three workloads:

fraud-intel-service: NestJS API (gRPC + REST), NATS consumer, and outbox relay. Stateless. 3-10 replicas (HPA).
fraud-intel-worker: Python ML pipelines (AIT, SIM-box, OTP-harvest, grey-route, cohort, scoring, tenant-score recompute). KEDA-scaled by NATS lag and cron. 2-20 replicas in kbl; paused at 0 in mzr.
triton-fraud-cpu + triton-fraud-gpu: Triton Inference Server pools for model serving.

Plus the offline training stack:

Airflow scheduler + workers (training DAGs)
MLflow tracking server
GPU training nodes (spot, autoscaled)

1.1 `fraud-intel-service` Deployment (NestJS)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fraud-intel-service
  namespace: sms-platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: fraud-intel-service }
  template:
    metadata:
      labels: { app: fraud-intel-service }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3014"
        prometheus.io/path: "/metrics"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector: { matchLabels: { app: fraud-intel-service } }
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: fraud-intel-service
          image: ghcr.io/ghasi/fraud-intel-service:<git-sha>   # pinned per section 7, not :latest
          ports:
            - { containerPort: 50054, name: grpc }
            - { containerPort: 3014,  name: http }
            - { containerPort: 3015,  name: http-internal }
          env:
            - { name: NODE_ENV, value: production }
            - { name: LOG_LEVEL, value: info }
            - { name: GRPC_PORT, value: "50054" }
            - { name: HTTP_PORT, value: "3014" }
            - { name: HTTP_INTERNAL_PORT, value: "3015" }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-db-secret, key: url } }
            - name: CLICKHOUSE_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-ch-secret, key: url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: fraud-intel-redis-secret, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - { name: NATS_CREDS_PATH, value: /etc/nats/creds.nk }
            - { name: TRITON_GRPC_URL, value: triton-fraud-cpu.sms-platform.svc.cluster.local:8001 }
            - { name: TRITON_GPU_GRPC_URL, value: triton-fraud-gpu.sms-platform.svc.cluster.local:8001 }
            - { name: INFERENCE_PROVIDER, value: triton }
            - { name: ANONYMIZE_BEFORE_INFERENCE, value: "true" }
            - { name: NATIONAL_SALT_PATH, value: /etc/secrets/national-salt }
            - { name: SCORE_CACHE_TTL_S, value: "900" }
            - { name: REGION, value: kbl }
          envFrom:
            - { secretRef: { name: fraud-intel-vault-secrets } }
          resources:
            requests: { cpu: 1000m, memory: 1Gi }
            limits:   { cpu: 4000m, memory: 4Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 20
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 15
            periodSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - { name: tls-certs,  mountPath: /etc/tls,     readOnly: true }
            - { name: nats-creds, mountPath: /etc/nats,    readOnly: true }
            - { name: secrets,    mountPath: /etc/secrets, readOnly: true }
      volumes:
        - { name: tls-certs,  secret: { secretName: fraud-intel-tls } }
        - { name: nats-creds, secret: { secretName: fraud-intel-nats-creds } }
        - { name: secrets,    secret: { secretName: fraud-intel-app-secrets } }

1.2 `fraud-intel-worker` Deployment (Python ML pipelines)

apiVersion: apps/v1
kind: Deployment
metadata: { name: fraud-intel-worker, namespace: sms-platform }
spec:
  replicas: 2
  selector: { matchLabels: { app: fraud-intel-worker } }
  template:
    metadata:
      labels: { app: fraud-intel-worker }
      annotations: { prometheus.io/scrape: "true", prometheus.io/port: "9091" }
    spec:
      nodeSelector: { workload: ml-cpu }
      containers:
        - name: worker
          image: ghcr.io/ghasi/fraud-intel-worker:<git-sha>   # assumed to follow the section 7 tagging scheme
          env:
            - { name: WORKER_MODE, value: pipelines }   # pipelines | streaming | scoring
            - { name: TRITON_GRPC_URL, value: triton-fraud-cpu.sms-platform.svc.cluster.local:8001 }
            - { name: CLICKHOUSE_URL, valueFrom: { secretKeyRef: { name: fraud-intel-ch-secret, key: url } } }
            - { name: PG_URL, valueFrom: { secretKeyRef: { name: fraud-intel-db-secret, key: url } } }
            - { name: REDIS_URL, valueFrom: { secretKeyRef: { name: fraud-intel-redis-secret, key: url } } }
            - { name: NATS_URL, valueFrom: { secretKeyRef: { name: nats-credentials, key: url } } }
            - { name: METRICS_PORT, value: "9091" }
          resources:
            requests: { cpu: 4, memory: 16Gi }
            limits:   { cpu: 8, memory: 32Gi }

1.3 KEDA scaler (worker autoscaling on NATS lag)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: fraud-intel-worker-scaler, namespace: sms-platform }
spec:
  scaleTargetRef: { name: fraud-intel-worker }
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
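    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: nats.sms-platform.svc:8222
        account: "$G"              # required by the scaler; "$G" (default account) is an assumption
        stream: SMS_STATUS
        consumer: fraud-ingestor
        lagThreshold: "5000"
    - type: cron
      metadata:
        timezone: Asia/Kabul
        # start and end must differ, otherwise the window is empty;
        # the 2-minute window below is an assumption, size it to the pipeline run time
        start: "*/5 * * * *"
        end:   "2-59/5 * * * *"
        desiredReplicas: "5"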
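
To make the scaling behaviour concrete: KEDA hands each trigger to the HPA as an average-value metric, so the lag trigger asks for roughly ceil(lag / lagThreshold) replicas, the cron trigger asks for desiredReplicas while its window is open, and the largest request wins within the min/max bounds. The sketch below is a simplification under those assumptions (it ignores HPA tolerance, stabilisation windows, and activation thresholds); the function name is illustrative and the defaults are copied from the manifest above.

```python
import math

def desired_worker_replicas(lag, cron_window_open, lag_threshold=5000,
                            cron_desired=5, min_replicas=2, max_replicas=20):
    """Approximate the replica count the scaler would converge on."""
    from_lag = math.ceil(lag / lag_threshold)            # NATS lag trigger
    from_cron = cron_desired if cron_window_open else 0  # cron trigger
    wanted = max(from_lag, from_cron)                    # largest request wins
    return max(min_replicas, min(max_replicas, wanted))  # clamp to the bounds

print(desired_worker_replicas(lag=0, cron_window_open=False))        # 2
print(desired_worker_replicas(lag=42_000, cron_window_open=False))   # 9
print(desired_worker_replicas(lag=1_000, cron_window_open=True))     # 5
print(desired_worker_replicas(lag=500_000, cron_window_open=False))  # 20
```

With these numbers a consumer lag of 42,000 messages requests 9 workers, and the upper bound of 20 is reached once lag exceeds 95,000.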

1.4 Triton Inference Server (CPU pool)

apiVersion: apps/v1
kind: Deployment
metadata: { name: triton-fraud-cpu, namespace: sms-platform }
spec:
  replicas: 3
  selector: { matchLabels: { app: triton-fraud-cpu } }
  template:
    metadata:
      labels: { app: triton-fraud-cpu }
      annotations: { prometheus.io/scrape: "true", prometheus.io/port: "8002" }
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.06-py3
          args:
            - tritonserver
            - --model-repository=/models
            - --model-control-mode=poll
            - --repository-poll-secs=30
            - --strict-model-config=false
            - --backend-config=fil,backend_config.cmdline=use_cuda=false
          ports:
            - { containerPort: 8000, name: http }
            - { containerPort: 8001, name: grpc }
            - { containerPort: 8002, name: metrics }
          resources:
            requests: { cpu: 4,  memory: 8Gi }
            limits:   { cpu: 16, memory: 16Gi }
          volumeMounts:
            - { name: model-repo, mountPath: /models, readOnly: true }
      volumes:
        - { name: model-repo, persistentVolumeClaim: { claimName: triton-model-repo-pvc } }

1.5 Triton Inference Server (GPU pool)

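apiVersion: apps/v1
kind: Deployment
metadata: { name: triton-fraud-gpu, namespace: sms-platform }
spec:
  replicas: 2
  selector: { matchLabels: { app: triton-fraud-gpu } }   # required by apps/v1
  template:
    metadata:
      labels: { app: triton-fraud-gpu }                  # matched by the selector and the PDB in 1.7
    spec:
      nodeSelector: { gpu: t4 }
      tolerations:
        - { key: nvidia.com/gpu, operator: Exists, effect: NoSchedule }
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.06-py3
          args: [ tritonserver, --model-repository=/models, --strict-model-config=false ]
          ports:
            - { containerPort: 8001, name: grpc }        # target of TRITON_GPU_GRPC_URL
          resources:
            requests: { cpu: 4, memory: 16Gi, nvidia.com/gpu: 1 }
            limits:   { cpu: 8, memory: 24Gi, nvidia.com/gpu: 1 }
          volumeMounts:
            - { name: model-repo, mountPath: /models, readOnly: true }
      volumes:
        # Assumption: the GPU pool shares the CPU pool's model repository PVC
        - { name: model-repo, persistentVolumeClaim: { claimName: triton-model-repo-pvc } }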

1.6 HPA for `fraud-intel-service`

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: fraud-intel-service-hpa, namespace: sms-platform }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: fraud-intel-service }
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource: { name: cpu,    target: { type: Utilization, averageUtilization: 65 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 75 } }
    - type: Pods
      pods:
        metric: { name: fraud_score_grpc_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.04" }   # scale up if P95 > 40 ms
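
For readers less familiar with multi-metric HPAs: each metric proposes ceil(currentReplicas × current / target), a proposal within the default 10% tolerance of the target leaves the count unchanged, and the HPA acts on the largest proposal. The sketch below illustrates that documented algorithm only; the function name and sample readings are made up, and stabilisation windows and scaling policies are ignored.

```python
import math

def hpa_desired(current_replicas, metrics, min_replicas=3, max_replicas=10,
                tolerance=0.1):
    """metrics: list of (current_value, target_value), one pair per HPA metric."""
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)  # close enough to target: hold
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # The HPA acts on the largest proposal, clamped to its bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))

# Hypothetical readings at 3 replicas: CPU 80% (target 65), memory 60% (target 75),
# P95 latency 60 ms (target 40 ms). Latency dominates and asks for 5 replicas.
print(hpa_desired(3, [(80, 65), (60, 75), (0.06, 0.04)]))  # 5
```

Because the largest proposal wins, the latency metric can scale the service up even while CPU and memory sit below their targets.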

1.7 PodDisruptionBudgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: fraud-intel-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: fraud-intel-service } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: triton-fraud-cpu-pdb, namespace: sms-platform }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: triton-fraud-cpu } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: triton-fraud-gpu-pdb, namespace: sms-platform }
spec:
  minAvailable: 1
  selector: { matchLabels: { app: triton-fraud-gpu } }

1.8 Services

apiVersion: v1
kind: Service
metadata: { name: fraud-intel-grpc, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: grpc, port: 50054, targetPort: grpc }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: fraud-intel-http, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: http, port: 3014, targetPort: http }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: fraud-intel-internal, namespace: sms-platform }
spec:
  selector: { app: fraud-intel-service }
  ports: [{ name: http-internal, port: 3015, targetPort: http-internal }]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: triton-fraud-cpu, namespace: sms-platform }
spec:
  selector: { app: triton-fraud-cpu }
  ports:
    - { name: http,    port: 8000, targetPort: http }
    - { name: grpc,    port: 8001, targetPort: grpc }
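    - { name: metrics, port: 8002, targetPort: metrics }
---
apiVersion: v1
kind: Service
metadata: { name: triton-fraud-gpu, namespace: sms-platform }   # resolves TRITON_GPU_GRPC_URL
spec:
  selector: { app: triton-fraud-gpu }
  ports:
    - { name: grpc, port: 8001, targetPort: 8001 }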

1.9 NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: fraud-intel-netpol, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: fraud-intel-service } }
  policyTypes: [Ingress, Egress]
  ingress:
    # gRPC consumers
    - from:
        - { podSelector: { matchLabels: { app: compliance-engine } } }
        - { podSelector: { matchLabels: { app: routing-engine } } }
        - { podSelector: { matchLabels: { app: sender-id-registry-service } } }
        - { podSelector: { matchLabels: { app: noc-dashboard } } }
      ports: [{ port: 50054 }]
    # REST admin (via Kong)
    - from: [{ podSelector: { matchLabels: { app: kong } } }]
      ports: [{ port: 3014 }]
    # Internal mTLS (regulator-portal-service, peer-mno-bridge)
    - from:
        - { podSelector: { matchLabels: { app: regulator-portal-service } } }
        - { podSelector: { matchLabels: { app: peer-mno-bridge } } }
      ports: [{ port: 3015 }]
    # Prometheus
    - from: [{ namespaceSelector: { matchLabels: { name: monitoring } } }]
      ports: [{ port: 3014 }]
  egress:
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432 }]
    - to: [{ podSelector: { matchLabels: { app: clickhouse } } }]
      ports: [{ port: 9000 }, { port: 8123 }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379 }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222 }]
    - to: [{ podSelector: { matchLabels: { app: triton-fraud-cpu } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: triton-fraud-gpu } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: minio } } }]
      ports: [{ port: 9000 }]
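    - to: [{ podSelector: { matchLabels: { app: vault } } }]
      ports: [{ port: 8200 }]
    # DNS for the *.svc.cluster.local names used in 1.1 (assumes cluster DNS runs in kube-system
    # and that no namespace-wide policy already allows it)
    - to: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } } }]
      ports: [{ port: 53, protocol: UDP }, { port: 53, protocol: TCP }]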
    # Egress to regulator SFTP (cloud — IP-allowlisted)
    - to:
        - ipBlock: { cidr: 41.74.0.0/16 }   # ATRA SFTP CIDR (placeholder)
      ports: [{ port: 22 }]
    # No egress to public LLM providers: they are absent from this allow-list, so traffic to them is denied by default
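
The ipBlock rule can be reasoned about as a plain CIDR-and-port match. The sketch below models only that one rule (pod-selector rules are not modelled) and uses the placeholder CIDR from the policy; the function name and sample addresses are illustrative.

```python
import ipaddress

# (CIDR, port) pairs allowed by ipBlock rules; the CIDR is the placeholder from the policy.
EGRESS_IP_RULES = [(ipaddress.ip_network("41.74.0.0/16"), 22)]

def ip_egress_allowed(dest_ip, dest_port):
    """Return True if an ipBlock rule permits traffic to dest_ip:dest_port."""
    ip = ipaddress.ip_address(dest_ip)
    return any(ip in net and dest_port == port for net, port in EGRESS_IP_RULES)

print(ip_egress_allowed("41.74.12.9", 22))    # True: inside the CIDR, SFTP port
print(ip_egress_allowed("41.74.12.9", 443))   # False: right CIDR, wrong port
print(ip_egress_allowed("104.18.0.1", 443))   # False: not on the allow-list
```

Any external destination that is not listed, including public LLM endpoints, falls through to the default deny.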

2. Background Workers

Worker	Schedule	Replicas	Description
`IngestionConsumer`	always-on	KEDA (NATS lag)	Stream `firewall.audit.v1`, `sms.events.status.v1`, `sms.dlr.inbound.v1`, `cdr.generated.v1`, `consent.revoked.v1` → ClickHouse `events`
`OtpGrindingStreaming`	always-on	2	Real-time OTP-grinding aggregator (Redis sorted sets)
`AitPipeline`	`*/5 * * * *`	1-3 (KEDA cron)	5-min AIT XGBoost pipeline
`AitCohortJob`	`0 * * * *`	1	Hourly cohort GraphSAGE
`SimboxPipeline`	`*/30 * * * *`	1	30-min SIM-box detector
`GreyRoutePipeline`	`15 * * * *`	1	Hourly grey-route
`OtpHarvestPipeline`	`*/30 * * * *`	1	30-min OTP-harvest cohort+revocation
`TenantScoreRecompute`	`0 * * * *`	1	Hourly score recompute
`MispFeedExport`	`0 4 * * *` Asia/Kabul	1	Daily MISP/STIX export to MinIO + SFTP
`MispFeedDecayJob`	`0 5 * * *`	1	Apply daily decay to imported indicators
`PartitionMaintenance`	`0 3 * * *`	1	Provision next 3 months of Postgres partitions
`OutboxRelay`	always-on (in-process)	per pod	Publishes outbox to NATS
`CaseStaleScanner`	`0 6 * * *`	1	Auto-close cases older than 30 days
`ModelDriftScanner`	`0 1 * * *`	1	PSI / Wasserstein drift checks; emit alerts

Scheduled workers take a Redis distributed lock (SET NX EX on fraud:lock:<worker>) so that only one replica runs a given job at a time.
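
A minimal sketch of that locking pattern follows, assuming a client object that exposes `set(name, value, nx=..., ex=...)` and `eval(script, numkeys, *keys_and_args)` as redis-py does. The helper names, the 300-second TTL, and the token-checked release are illustrative choices, not taken from the worker code; `InMemoryRedis` is a stand-in so the example runs without a Redis server.

```python
import uuid

# Delete the key only if it still holds our token (atomic compare-and-delete).
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
  return redis.call('del', KEYS[1])
end
return 0
"""

def acquire_lock(client, worker, ttl_s=300):
    """Try to take fraud:lock:<worker>; return a token on success, else None."""
    token = uuid.uuid4().hex
    ok = client.set(f"fraud:lock:{worker}", token, nx=True, ex=ttl_s)
    return token if ok else None

def release_lock(client, worker, token):
    """Release the lock only if this token still owns it."""
    return client.eval(RELEASE_SCRIPT, 1, f"fraud:lock:{worker}", token) == 1

class InMemoryRedis:
    """Stand-in client for illustration only; it ignores TTL expiry."""
    def __init__(self):
        self.store = {}
    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.store:
            return None              # mirrors redis-py: None when NX fails
        self.store[name] = value
        return True
    def eval(self, script, numkeys, *args):
        key, token = args[0], args[1]
        if self.store.get(key) == token:
            del self.store[key]
            return 1
        return 0

client = InMemoryRedis()
token = acquire_lock(client, "AitPipeline")
print(token is not None)                           # True: first replica wins
print(acquire_lock(client, "AitPipeline"))         # None: second replica skips the run
print(release_lock(client, "AitPipeline", token))  # True: owner releases
```

The EX expiry bounds how long a crashed replica can hold the lock, and the token check stops a slow replica from deleting a lock that has since expired and been taken by another.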

3. Region Affinity

Per ADR-0004:

Region	Role	Replicas (service / worker / Triton-CPU / Triton-GPU)
kbl (Kabul)	Primary write region; all pipelines run here; canonical model registry	3 / 2-20 / 3 / 2
mzr (Mazar-i-Sharif)	Warm standby; reads `entity_scores` from Postgres replica; Score gRPC serves regional traffic	2 / 0 (paused) / 2 / 0

Failover RTO: 5 min (manual operator confirmation per ADR-0004 §3.4).
Cross-region NATS bridging: a NATS Leaf Node mirrors the `FRAUD_*` streams kbl → mzr, with lag P95 ≤ 5 s.

4. Infrastructure Dependencies

Dependency	Version	Topology	Owner
PostgreSQL	15+	Primary + read replica per region; PgBouncer in transaction pool mode	Platform DBA
ClickHouse	23.8+	3 shards × 2 replicas; ZooKeeper/Keeper coordination	Platform SRE (data)
Redis	7.0+	Cluster mode; fraud uses DB 4	Platform SRE
NATS JetStream	2.10+	3-node cluster; dedicated `FRAUD_*` streams	Platform SRE
Triton Inference Server	24.06+	CPU pool (3 replicas) + GPU pool (2 × T4)	Platform Engineering
MinIO	RELEASE.2024-08-29T+	4-node erasure-coded cluster; bucket-policy enforced	Platform SRE
Vault	1.16+	HA mode; PKI engine for mTLS, KV v2 for secrets, Transit for hashing	Security
HSM (PKCS#11)	nCipher nShield (shared)	Isolated partition `fraud-intel` from `sms-firewall`	Security
Airflow	2.9+	KubernetesExecutor on dedicated `airflow-fraud` namespace	Data Engineering
MLflow	2.13+	Tracking server + S3 artifact store	Data Engineering

5. Environment Variables

Variable	Required	Default	Description
`NODE_ENV`	Yes	—	`production` / `staging` / `development`
`GRPC_PORT`	No	`50054`	gRPC listener
`HTTP_PORT`	No	`3014`	REST listener
`HTTP_INTERNAL_PORT`	No	`3015`	Internal mTLS listener
`DATABASE_URL`	Yes	—	Postgres connection string
`CLICKHOUSE_URL`	Yes	—	ClickHouse native protocol URL
`REDIS_URL`	Yes	—	Redis (DB 4)
`NATS_URL`	Yes	—	NATS server URL
`NATS_CREDS_PATH`	Yes	—	Path to NATS credentials nkey file
`TRITON_GRPC_URL`	Yes	—	Triton CPU pool gRPC
`TRITON_GPU_GRPC_URL`	Yes	—	Triton GPU pool gRPC
`INFERENCE_PROVIDER`	No	`triton`	`triton` / `mock` only
`ANONYMIZE_BEFORE_INFERENCE`	No	`true`	Forced `true` in non-dev
`NATIONAL_SALT_PATH`	Yes	—	File mount of `nationalSalt`
`SCORE_CACHE_TTL_S`	No	`900`	Redis L1 TTL
`SCORE_REFRESH_QUEUE`	No	`fraud:score:refresh:queue`	Redis list name
`EVAL_BUDGET_MS`	No	`45`	Score gRPC internal budget
`REGION`	Yes	—	`kbl` / `mzr`
`GRPC_TLS_ENABLED`	No	`true`	Forced `true` in non-dev (start-up guard)
`TLS_CERT_PATH`, `TLS_KEY_PATH`, `TLS_CA_PATH`	If TLS	—	mTLS certs
`LOG_LEVEL`	No	`info`	`debug` / `info` / `warn` / `error`
`HSM_PIN_PATH`	If feed export	—	File mount of HSM partition PIN

Setting INFERENCE_PROVIDER=cloud (for example Anthropic or OpenAI) is disallowed: the start-up guard refuses to boot with any value other than triton or mock.
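
The guard's rules can be sketched as a single validation function. This is an illustration only: the service itself is NestJS, the sketch is in Python for brevity, and the names `check_startup_config` and `StartupGuardError` are hypothetical. It also assumes that "forced true in non-dev" means the process refuses to start rather than silently overriding the value.

```python
ALLOWED_PROVIDERS = {"triton", "mock"}
KNOWN_ENVS = {"production", "staging", "development"}

class StartupGuardError(RuntimeError):
    """Raised when the configuration must prevent the process from booting."""

def check_startup_config(env):
    """Validate the guarded variables; return the resolved inference provider."""
    node_env = env.get("NODE_ENV")
    if node_env not in KNOWN_ENVS:
        raise StartupGuardError(f"NODE_ENV must be one of {sorted(KNOWN_ENVS)}")
    provider = env.get("INFERENCE_PROVIDER", "triton")
    if provider not in ALLOWED_PROVIDERS:
        raise StartupGuardError(f"INFERENCE_PROVIDER={provider!r} is not allowed")
    if node_env != "development":
        # Assumption: outside development these flags must resolve to "true".
        for flag in ("ANONYMIZE_BEFORE_INFERENCE", "GRPC_TLS_ENABLED"):
            if env.get(flag, "true") != "true":
                raise StartupGuardError(f"{flag} must be 'true' outside development")
    return provider

print(check_startup_config({"NODE_ENV": "production"}))  # triton
try:
    check_startup_config({"NODE_ENV": "production", "INFERENCE_PROVIDER": "cloud"})
except StartupGuardError as err:
    print("refusing to boot:", err)
```

In a real entry point the function would be called once with `os.environ` before any listener is opened.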

6. Deployment Environments

Environment	service replicas	worker replicas	Triton CPU	Triton GPU	Notes
Production (kbl)	3-10 (HPA)	2-20 (KEDA)	3	2 × T4	Full feature set
Production (mzr)	2	0 (paused)	2	0	Score gRPC only; pipelines paused
Staging	2	2	1	1 (shared)	Daily synthetic load
Development	1	1	1 (CPU only)	mock	Dockerised; no GPU
CI	1	1	mock	mock	Deterministic responses

7. Image Tagging & CI/CD

Image: ghcr.io/ghasi/fraud-intel-service:<git-sha>
Helm chart: charts/fraud-intel-service, versioned alongside the image.
Argo CD application: apps/fraud-intel-service.yaml with sync wave 4 (after compliance-engine, before NOC dashboard).
Canary: Argo Rollouts with 10% → 25% → 50% → 100% over 30 min, checking SLO burn rate at each step.
Rollback: automatic on FraudScoreP95High or FraudScoreUnavailable firing during canary.
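
As an illustration of a burn-rate gate between canary steps: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), and a value above 1 means the budget is being spent faster than the SLO allows. The SLO target of 99.9%, the threshold of 1.0, and the function names below are assumptions for the sketch; the actual gates are the FraudScoreP95High and FraudScoreUnavailable alerts.

```python
CANARY_STEPS = [10, 25, 50, 100]   # traffic weights from the rollout plan above

def burn_rate(bad, total, slo_target):
    """Error-budget burn rate: observed error ratio divided by (1 - SLO)."""
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - slo_target)

def next_action(step_index, bad, total, slo_target=0.999, max_burn=1.0):
    """Decide what to do after observing a canary step (hypothetical thresholds)."""
    if burn_rate(bad, total, slo_target) > max_burn:
        return "rollback"
    if step_index + 1 < len(CANARY_STEPS):
        return f"promote to {CANARY_STEPS[step_index + 1]}%"
    return "complete"

print(next_action(0, bad=0, total=1000))   # promote to 25%
print(next_action(1, bad=5, total=1000))   # rollback
print(next_action(3, bad=0, total=1000))   # complete
```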

8. Resource Budget Summary

Component	CPU req	Memory req	GPU	Pods	Total CPU	Total Mem
fraud-intel-service	1	1 Gi	—	3-10	3-10	3-10 Gi
fraud-intel-worker	4	16 Gi	—	2-20	8-80	32-320 Gi
triton-fraud-cpu	4	8 Gi	—	3	12	24 Gi
triton-fraud-gpu	4	16 Gi	1 × T4	2	8	32 Gi
Steady-state (median)					~50 vCPU	~150 Gi
Burst					~110 vCPU	~390 Gi
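
The bounds of this table can be recomputed from the per-pod requests and pod ranges. The short sketch below does so; the variable names are illustrative and the figures are copied from the rows above.

```python
# (CPU request per pod, memory request per pod in Gi, min pods, max pods)
COMPONENTS = {
    "fraud-intel-service": (1, 1, 3, 10),
    "fraud-intel-worker":  (4, 16, 2, 20),
    "triton-fraud-cpu":    (4, 8, 3, 3),
    "triton-fraud-gpu":    (4, 16, 2, 2),
}

def totals(pick):
    """Sum CPU and memory requests using pick(min_pods, max_pods) per component."""
    cpu = sum(c * pick(lo, hi) for c, _, lo, hi in COMPONENTS.values())
    mem = sum(m * pick(lo, hi) for _, m, lo, hi in COMPONENTS.values())
    return cpu, mem

print(totals(min))  # (31, 91)   -> floor: 31 vCPU, 91 Gi
print(totals(max))  # (110, 386) -> ceiling: 110 vCPU, 386 Gi
```

The ceiling matches the Burst row (about 110 vCPU and 390 Gi); the steady-state median sits between the floor and the ceiling, depending on where the autoscalers typically settle.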
