Skip to main content

Fraud Intelligence Service — Sync Contract

Version: 1.0 Status: Draft Owner: Trust and Safety Last Updated: 2026-04-21 Companion: API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL

This document defines what other services depend on from fraud-intel-service and what fraud-intel-service depends on from others. It also specifies per-aggregate conflict policy and the outbox/inbox patterns used for cross-service consistency.


1. Consumers of Fraud Intelligence

ServiceInterfaceDependency typeSLA expectation
compliance-enginegRPC FraudIntelService/ScoreSynchronous read in async pipelineP95 ≤ 50 ms; availability 99.5%
compliance-engineNATS consumer fraud.tenant_score.updated.v1, fraud.detected.otp_grinding.v1Async event-drivenEvent lag ≤ 60 s
routing-enginegRPC FraudIntelService/ScoreSynchronous read for high-risk sender checkP95 ≤ 50 ms; availability 99.5%
sms-firewall-serviceNATS consumer fraud.detected.simbox.v1, .ait_ring.v1, .greyroute.v1, .simbox_network.v1Async event-drivenEvent lag ≤ 60 s
sender-id-registry-serviceNATS consumer fraud.case.action_dispatched.v1 (filter SUSPEND_SENDER_ID)Async event-drivenEvent lag ≤ 60 s
noc-dashboardHTTP REST /v1/fraud/dashboard/*, SSE on fraud.detected.>Synchronous + pushP95 ≤ 500 ms; SSE latency ≤ 2 s
regulator-portal-serviceHTTP REST /v1/internal/fraud/feed/import (push); consumer of fraud.feed.exported.v1 for SFTP mirrorMixedDaily batch; alert on > 30 min lag
analytics-serviceNATS consumer fraud.audit.v1, fraud.detected.>Long-term archivalBest-effort

Async contract semantics

The fraud-intel-service is detection-plane, not enforcement-plane: every detection is published as a NATS event consumed by the appropriate enforcement service. Consumers are responsible for their own enforcement state and idempotency. fraud-intel-service does not directly mutate firewall.peer_quarantine, sender_id.lifecycle_state, or any other consumer's data.

The synchronous Score gRPC is fail-closed-with-default: callers treat UNAVAILABLE / DEADLINE_EXCEEDED / INTERNAL as tier = PROBATION (neutral) and proceed. This is the fraud-intel-service's deliberate choice not to cascade outages into a compliance freeze.


2. Dependencies of Fraud Intelligence

DependencyInterfaceFailure mode if unavailable
PostgreSQL fraud schemaRead/write SQL via PgBouncerService returns INTERNAL; readiness probe fails → HPA-replaced; events queue in outbox
ClickHouse fraud_features clusterTCP / native protocolStream ingestion buffers to on-disk WAL (1 h × 10K eps); pipelines skip the run with alert; Score gRPC uses Postgres L2 fallback
RedisGET/SET/ZADD/ZCOUNT/BITOPL1 cache miss → DB lookup; OTP-grinding streaming aggregator degrades to per-pod state with cross-replica drift
NATS JetStreamConsumer + publishOutbox absorbs publish failures; consumer durable from last ACK on recovery
Triton Inference ServergRPC ModelInferPipeline falls back to rule-based pattern matchers (FraudPattern rows for that category); fraud.alert.model.unavailable.v1 emitted
MinIO (model artifacts, feed exports)S3Model promotion blocked; feed export deferred; alert
Vault Transit / PKImTLS cert + signing key accessCert rotation blocked (existing certs valid until expiry); HSM signing failure → MISP export deferred + CRITICAL alert
HSM (PKCS#11)Library callMISP export deferred; alert
sender-id-registry-service (lookup)gRPC LookupSenderIdOTP-destination-class set to GENERIC (downgrade); detection still proceeds
consent-ledger-serviceNATS consumerOTP-harvest pipeline runs without revocation join; downgraded to flag-only output
cdr-mediation-serviceNATS consumerGrey-route pipeline runs with reduced fidelity

3. Proto definition

syntax = "proto3";
package ghasi.sms.fraud.v1;
option go_package = "github.com/ghasi/sms-gateway/fraud/v1";
import "google/protobuf/timestamp.proto";
import "google/protobuf/struct.proto";

service FraudIntelService {
// Hot path. Called by compliance-engine, routing-engine, sender-id-registry.
rpc Score(ScoreRequest) returns (ScoreResponse);

// Server-streaming bulk score. Max 1000 entries per request.
rpc BulkScore(BulkScoreRequest) returns (stream ScoreResponse);

// Read recent fraud signals for a subject. Used by NOC dashboards.
rpc GetSignals(GetSignalsRequest) returns (GetSignalsResponse);
}

enum ScoreScope {
SCORE_SCOPE_UNSPECIFIED = 0;
TENANT = 1;
SENDER_ID = 2;
MSISDN = 3;
PEER_ASN = 4;
}

enum FraudTier {
FRAUD_TIER_UNSPECIFIED = 0;
SAFE = 1;
WATCH = 2;
RISKY = 3;
HIGH_RISK = 4;
PROBATION = 5;
}

message ScoreRequest {
ScoreScope scope = 1;
string id = 2;
string trace_id = 3;
}

message ScoreResponse {
string subject_id = 1;
ScoreScope scope = 2;
float score = 3; // 0..1
FraudTier tier = 4;
repeated ContributingFactor contributing_factors = 5;
string model_id = 6;
string model_version = 7;
google.protobuf.Timestamp computed_at = 8;
int32 stale_seconds = 9;
string trace_id = 10;
}

message ContributingFactor {
string category = 1;
float weight = 2;
string detection_id = 3;
}

message BulkScoreRequest {
repeated ScoreRequest entries = 1;
string trace_id = 2;
}

message GetSignalsRequest {
ScoreScope scope = 1;
string id = 2;
google.protobuf.Timestamp since = 3;
int32 limit = 4;
}

message GetSignalsResponse {
repeated FraudSignal signals = 1;
string next_cursor = 2;
}

message FraudSignal {
string signal_id = 1;
google.protobuf.Timestamp event_ts = 2;
string source_stream = 3;
string category = 4;
google.protobuf.Struct evidence = 5;
}

4. Per-aggregate conflict policy

AggregateConflict policyRationale
FraudSignalappend_onlyRaw evidence is immutable. Replicas append independently; reads union.
FraudDetectionappend_onlyDetections are immutable findings; revisions create new detections.
FraudCaseserver_authoritative (kbl-primary)Workflow state with referential integrity to decisions; only kbl-primary mutates; mzr is read-only replica.
FraudFeed.Indicatorsidempotent_upsert on (feedId, sourceUuid)Re-import is the norm; (source, source_uuid) is the natural key; latest write wins per indicator.
MlModel, ModelVersionserver_authoritative (kbl-primary)Promotion/rollback must be atomic; mzr replicas read-only and lag by < 30 s.
EntityScoreserver_authoritative (kbl-primary)Computed by kbl scoring worker; mzr serves Score gRPC reads with eventual consistency.
Allowlistserver_authoritative + two-person ruleMutations blocked unless addedBy != approvedBy.
AuditLogappend_onlyRegulatory immutability.
Outboxserver_authoritative (kbl-primary)Each region has its own outbox; only kbl publishes events globally to avoid duplicates.

5. Multi-region replication

Per ADR-0004 §3:

  • kbl = primary write region for all aggregates above. ClickHouse cluster spans both regions for analytical reads but writes are kbl-primary.
  • mzr = warm standby. Reads entity_scores, detections, signals via replication; serves regional Score gRPC traffic with read-from-replica policy.
  • Failover. On kbl outage, mzr promotes within 5 minutes (manual operator confirmation per ADR-0004 §3.4 to avoid split-brain). Pipelines pause during failover; backfill runs on recovery.
  • Cross-region event flow. FRAUD_EVENTS, FRAUD_TENANT_SCORE, FRAUD_MODEL streams are mirrored kbl → mzr via NATS Leaf Node bridging. Mirror lag P95 ≤ 5 s.

6. Outbox pattern

All state-changing transactions in fraud-intel-service write to fraud.outbox in the same Postgres transaction. A relay process polls unpublished rows every 100 ms (in-process scheduler) and publishes to NATS with:

  • Nats-Msg-Id: {eventId} for JetStream dedup window.
  • Exponential backoff: 100 ms, 500 ms, 2 s, 10 s, 60 s, then DLQ.
  • published_at set on first successful ack; row retained 7 days for audit.

This guarantees:

  1. No emitted event without a persisted state change (ACID-bound).
  2. No unpublished state change for more than a few seconds (relay polling).
  3. Idempotent consumers (NATS dedup window 2 min + downstream idempotency keys).

7. Inbox pattern

For consumed events (firewall.audit.v1, sms.dlr.inbound.v1, etc.) we use NATS JetStream durable consumers with AckExplicit. The consumer:

  1. Receives a message and computes inboxKey = sha256(subject + msgId).
  2. Attempts INSERT INTO fraud.inbox (inbox_key, processed_at) VALUES (...). PK violation → already processed → ack and skip.
  3. Processes the message in a Postgres transaction (signal projection + outbox emit if applicable).
  4. ACKs NATS on transaction commit.

This guarantees exactly-once-effective processing for consumed events, even on consumer restart or NATS redelivery.

CREATE TABLE fraud.inbox (
inbox_key TEXT PRIMARY KEY,
source_stream TEXT NOT NULL,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (processed_at);
-- Daily partitions; retention 7 days.

8. Schema stability guarantees

gRPC proto

FieldStability
ScoreRequest.scope, id, trace_idStable
ScoreResponse.score, tier, model_id, model_versionStable
ScoreResponse.contributing_factorsStable shape; new categories may appear (additive)
FraudTier enum valuesStable; new values additive; consumers must handle UNSPECIFIED
BulkScore streaming orderPreserves request order

REST API

  • Routes under /v1/fraud/* maintain backwards compatibility within the major version.
  • Breaking changes require /v2/fraud/ prefix and 90-day deprecation window.

9. Versioning policy

  • gRPC package: ghasi.sms.fraud.v1. Breaking change → v2 package, 90-day deprecation, dual-publish during overlap.
  • REST API: /v1/fraud/*/v2/fraud/* with Sunset header on /v1 during deprecation.
  • Event subjects: fraud.<topic>.v1.v2 per EVENT_SCHEMAS §7.
  • Model artifact format versioned in ModelVersion.evaluation_metrics.artifact_format_version.