Fraud Intelligence Service — Sync Contract

Version: 1.0 Status: Draft Owner: Trust and Safety Last Updated: 2026-04-21 Companion: API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL

This document defines what other services depend on from fraud-intel-service and what fraud-intel-service depends on from others. It also specifies per-aggregate conflict policy and the outbox/inbox patterns used for cross-service consistency.

1. Consumers of Fraud Intelligence

Service	Interface	Dependency type	SLA expectation
`compliance-engine`	gRPC `FraudIntelService/Score`	Synchronous read in async pipeline	P95 ≤ 50 ms; availability 99.5%
`compliance-engine`	NATS consumer `fraud.tenant_score.updated.v1`, `fraud.detected.otp_grinding.v1`	Async event-driven	Event lag ≤ 60 s
`routing-engine`	gRPC `FraudIntelService/Score`	Synchronous read for high-risk sender check	P95 ≤ 50 ms; availability 99.5%
`sms-firewall-service`	NATS consumer `fraud.detected.simbox.v1`, `.ait_ring.v1`, `.greyroute.v1`, `.simbox_network.v1`	Async event-driven	Event lag ≤ 60 s
`sender-id-registry-service`	NATS consumer `fraud.case.action_dispatched.v1` (filter `SUSPEND_SENDER_ID`)	Async event-driven	Event lag ≤ 60 s
`noc-dashboard`	HTTP REST `/v1/fraud/dashboard/*`, SSE on `fraud.detected.>`	Synchronous + push	P95 ≤ 500 ms; SSE latency ≤ 2 s
`regulator-portal-service`	HTTP REST `/v1/internal/fraud/feed/import` (push); consumer of `fraud.feed.exported.v1` for SFTP mirror	Mixed	Daily batch; alert on > 30 min lag
`analytics-service`	NATS consumer `fraud.audit.v1`, `fraud.detected.>`	Long-term archival	Best-effort

Async contract semantics

The fraud-intel-service is detection-plane, not enforcement-plane: every detection is published as a NATS event consumed by the appropriate enforcement service. Consumers are responsible for their own enforcement state and idempotency. fraud-intel-service does not directly mutate firewall.peer_quarantine, sender_id.lifecycle_state, or any other consumer's data.

The synchronous Score gRPC is fail-closed-with-default: callers treat UNAVAILABLE / DEADLINE_EXCEEDED / INTERNAL as tier = PROBATION (neutral) and proceed. This is the fraud-intel-service's deliberate choice not to cascade outages into a compliance freeze.

2. Dependencies of Fraud Intelligence

Dependency	Interface	Failure mode if unavailable
PostgreSQL `fraud` schema	Read/write SQL via PgBouncer	Service returns `INTERNAL`; readiness probe fails → HPA-replaced; events queue in outbox
ClickHouse `fraud_features` cluster	TCP / native protocol	Stream ingestion buffers to on-disk WAL (1 h × 10K eps); pipelines skip the run with alert; Score gRPC uses Postgres L2 fallback
Redis	GET/SET/ZADD/ZCOUNT/BITOP	L1 cache miss → DB lookup; OTP-grinding streaming aggregator degrades to per-pod state with cross-replica drift
NATS JetStream	Consumer + publish	Outbox absorbs publish failures; consumer durable from last ACK on recovery
Triton Inference Server	gRPC `ModelInfer`	Pipeline falls back to rule-based pattern matchers (`FraudPattern` rows for that category); `fraud.alert.model.unavailable.v1` emitted
MinIO (model artifacts, feed exports)	S3	Model promotion blocked; feed export deferred; alert
Vault Transit / PKI	mTLS cert + signing key access	Cert rotation blocked (existing certs valid until expiry); HSM signing failure → MISP export deferred + CRITICAL alert
HSM (PKCS#11)	Library call	MISP export deferred; alert
`sender-id-registry-service` (lookup)	gRPC `LookupSenderId`	OTP-destination-class set to `GENERIC` (downgrade); detection still proceeds
`consent-ledger-service`	NATS consumer	OTP-harvest pipeline runs without revocation join; downgraded to flag-only output
`cdr-mediation-service`	NATS consumer	Grey-route pipeline runs with reduced fidelity

3. Proto definition

syntax = "proto3";
package ghasi.sms.fraud.v1;
option go_package = "github.com/ghasi/sms-gateway/fraud/v1";
import "google/protobuf/timestamp.proto";
import "google/protobuf/struct.proto";

service FraudIntelService {
  // Hot path. Called by compliance-engine, routing-engine, sender-id-registry.
  rpc Score(ScoreRequest) returns (ScoreResponse);

  // Server-streaming bulk score. Max 1000 entries per request.
  rpc BulkScore(BulkScoreRequest) returns (stream ScoreResponse);

  // Read recent fraud signals for a subject. Used by NOC dashboards.
  rpc GetSignals(GetSignalsRequest) returns (GetSignalsResponse);
}

enum ScoreScope {
  SCORE_SCOPE_UNSPECIFIED = 0;
  TENANT     = 1;
  SENDER_ID  = 2;
  MSISDN     = 3;
  PEER_ASN   = 4;
}

enum FraudTier {
  FRAUD_TIER_UNSPECIFIED = 0;
  SAFE       = 1;
  WATCH      = 2;
  RISKY      = 3;
  HIGH_RISK  = 4;
  PROBATION  = 5;
}

message ScoreRequest {
  ScoreScope scope    = 1;
  string id           = 2;
  string trace_id     = 3;
}

message ScoreResponse {
  string subject_id                              = 1;
  ScoreScope scope                               = 2;
  float score                                    = 3;   // 0..1
  FraudTier tier                                 = 4;
  repeated ContributingFactor contributing_factors = 5;
  string model_id                                = 6;
  string model_version                           = 7;
  google.protobuf.Timestamp computed_at          = 8;
  int32 stale_seconds                            = 9;
  string trace_id                                = 10;
}

message ContributingFactor {
  string category      = 1;
  float weight         = 2;
  string detection_id  = 3;
}

message BulkScoreRequest {
  repeated ScoreRequest entries = 1;
  string trace_id               = 2;
}

message GetSignalsRequest {
  ScoreScope scope                  = 1;
  string id                         = 2;
  google.protobuf.Timestamp since   = 3;
  int32 limit                       = 4;
}

message GetSignalsResponse {
  repeated FraudSignal signals = 1;
  string next_cursor           = 2;
}

message FraudSignal {
  string signal_id                       = 1;
  google.protobuf.Timestamp event_ts     = 2;
  string source_stream                   = 3;
  string category                        = 4;
  google.protobuf.Struct evidence        = 5;
}

4. Per-aggregate conflict policy

Aggregate	Conflict policy	Rationale
`FraudSignal`	append_only	Raw evidence is immutable. Replicas append independently; reads union.
`FraudDetection`	append_only	Detections are immutable findings; revisions create new detections.
`FraudCase`	server_authoritative (kbl-primary)	Workflow state with referential integrity to decisions; only kbl-primary mutates; mzr is read-only replica.
`FraudFeed.Indicators`	idempotent_upsert on (feedId, sourceUuid)	Re-import is the norm; (source, source_uuid) is the natural key; latest write wins per indicator.
`MlModel`, `ModelVersion`	server_authoritative (kbl-primary)	Promotion/rollback must be atomic; mzr replicas read-only and lag by < 30 s.
`EntityScore`	server_authoritative (kbl-primary)	Computed by kbl scoring worker; mzr serves Score gRPC reads with eventual consistency.
`Allowlist`	server_authoritative + two-person rule	Mutations blocked unless `addedBy != approvedBy`.
`AuditLog`	append_only	Regulatory immutability.
`Outbox`	server_authoritative (kbl-primary)	Each region has its own outbox; only kbl publishes events globally to avoid duplicates.

5. Multi-region replication

Per ADR-0004 §3:

kbl = primary write region for all aggregates above. ClickHouse cluster spans both regions for analytical reads but writes are kbl-primary.
mzr = warm standby. Reads entity_scores, detections, signals via replication; serves regional Score gRPC traffic with read-from-replica policy.
Failover. On kbl outage, mzr promotes within 5 minutes (manual operator confirmation per ADR-0004 §3.4 to avoid split-brain). Pipelines pause during failover; backfill runs on recovery.
Cross-region event flow. FRAUD_EVENTS, FRAUD_TENANT_SCORE, FRAUD_MODEL streams are mirrored kbl → mzr via NATS Leaf Node bridging. Mirror lag P95 ≤ 5 s.

6. Outbox pattern

All state-changing transactions in fraud-intel-service write to fraud.outbox in the same Postgres transaction. A relay process polls unpublished rows every 100 ms (in-process scheduler) and publishes to NATS with:

Nats-Msg-Id: {eventId} for JetStream dedup window.
Exponential backoff: 100 ms, 500 ms, 2 s, 10 s, 60 s, then DLQ.
published_at set on first successful ack; row retained 7 days for audit.

This guarantees:

No emitted event without a persisted state change (ACID-bound).
No unpublished state change for more than a few seconds (relay polling).
Idempotent consumers (NATS dedup window 2 min + downstream idempotency keys).

7. Inbox pattern

For consumed events (firewall.audit.v1, sms.dlr.inbound.v1, etc.) we use NATS JetStream durable consumers with AckExplicit. The consumer:

Receives a message and computes inboxKey = sha256(subject + msgId).
Attempts INSERT INTO fraud.inbox (inbox_key, processed_at) VALUES (...). PK violation → already processed → ack and skip.
Processes the message in a Postgres transaction (signal projection + outbox emit if applicable).
ACKs NATS on transaction commit.

This guarantees exactly-once-effective processing for consumed events, even on consumer restart or NATS redelivery.

CREATE TABLE fraud.inbox (
  inbox_key     TEXT PRIMARY KEY,
  source_stream TEXT NOT NULL,
  processed_at  TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (processed_at);
-- Daily partitions; retention 7 days.

8. Schema stability guarantees

gRPC proto

Field	Stability
`ScoreRequest.scope`, `id`, `trace_id`	Stable
`ScoreResponse.score`, `tier`, `model_id`, `model_version`	Stable
`ScoreResponse.contributing_factors`	Stable shape; new categories may appear (additive)
`FraudTier` enum values	Stable; new values additive; consumers must handle `UNSPECIFIED`
`BulkScore` streaming order	Preserves request order

REST API

Routes under /v1/fraud/* maintain backwards compatibility within the major version.
Breaking changes require /v2/fraud/ prefix and 90-day deprecation window.

9. Versioning policy

gRPC package: ghasi.sms.fraud.v1. Breaking change → v2 package, 90-day deprecation, dual-publish during overlap.
REST API: /v1/fraud/* → /v2/fraud/* with Sunset header on /v1 during deprecation.
Event subjects: fraud.<topic>.v1 → .v2 per EVENT_SCHEMAS §7.
Model artifact format versioned in ModelVersion.evaluation_metrics.artifact_format_version.

1. Consumers of Fraud Intelligence​

Async contract semantics​

2. Dependencies of Fraud Intelligence​

3. Proto definition​

4. Per-aggregate conflict policy​

5. Multi-region replication​

6. Outbox pattern​

7. Inbox pattern​

8. Schema stability guarantees​

gRPC proto​

REST API​

9. Versioning policy​