Skip to main content

Fraud Intelligence Service — Data Model

Version: 1.0 Status: Draft Owner: Trust and Safety Last Updated: 2026-04-21 Companion: DOMAIN_MODEL · SECURITY_MODEL

The service uses two storage tiers:

  1. PostgreSQL (fraud schema) — Owned exclusively by fraud-intel-service. Holds the registry, case-management workflow, model catalog, MISP indicators, allowlists, audit log, outbox. Online-transactional, low-volume relative to the feature store.
  2. ClickHouse (fraud_features schema) — Owned by fraud-intel-service, operated by SRE. Holds the columnar feature store, raw signals, model predictions, training-set snapshots. High-volume (10 M events/h target), monthly partitioned.

No other service reads or writes either schema directly — integration is via gRPC (Score), HTTP REST (admin), or NATS events.


1. PostgreSQL fraud schema (DDL)

-- =====================================================================
-- ENUM TYPES
-- =====================================================================

CREATE TYPE fraud.category AS ENUM (
'AIT','AIT_RING','SIMBOX','SIMBOX_NETWORK','OTP_HARVEST','OTP_GRINDING',
'GREY_ROUTE','SENDER_ID_ABUSE','DLR_UNIFORMITY','PHISHING','SPAM'
);
CREATE TYPE fraud.subject_scope AS ENUM (
'TENANT','SENDER_ID','MSISDN','MSISDN_BLOCK','PEER_ASN','MSISDN_COHORT'
);
CREATE TYPE fraud.confidence_tier AS ENUM ('LOW','MEDIUM','HIGH');
CREATE TYPE fraud.tier AS ENUM ('SAFE','WATCH','RISKY','HIGH_RISK','PROBATION');
CREATE TYPE fraud.case_status AS ENUM (
'PENDING_REVIEW','IN_REVIEW','CONFIRMED','DISMISSED','REFINE_FEATURES','STALE'
);
CREATE TYPE fraud.case_decision AS ENUM ('CONFIRM_FRAUD','DISMISS','REFINE_FEATURES');
CREATE TYPE fraud.suggested_action AS ENUM (
'BLOCKLIST_MSISDN','QUARANTINE_MSISDN_BLOCK','SUSPEND_SENDER_ID',
'DEPEER_PEER_ASN','THROTTLE_TENANT','BLOCKLIST_COHORT','NO_ACTION'
);
CREATE TYPE fraud.enforcement_status AS ENUM (
'EMITTED','CONSUMED','ENFORCED','SUPPRESSED','EXPIRED'
);
CREATE TYPE fraud.model_version_status AS ENUM (
'REGISTERED','SHADOW','ACTIVE','RETIRED','REJECTED'
);
CREATE TYPE fraud.feed_direction AS ENUM ('IMPORT','EXPORT');
CREATE TYPE fraud.feed_format AS ENUM ('MISP_2_4','STIX_2_1');
CREATE TYPE fraud.indicator_type AS ENUM (
'MSISDN','SENDER_ID','ASN','MSISDN_BLOCK','TEMPLATE_HASH','URL','IP_CIDR'
);

-- =====================================================================
-- SIGNALS (write-through projection of ClickHouse hot path)
-- =====================================================================

CREATE TABLE fraud.signals (
signal_id UUID NOT NULL,
event_ts TIMESTAMPTZ NOT NULL,
source_stream TEXT NOT NULL,
tenant_id UUID,
src_msisdn TEXT,
dst_msisdn TEXT,
sender_id TEXT,
mno_id TEXT,
peer_asn INT,
verdict TEXT,
dlr_status TEXT,
template_hash TEXT,
payload_hash TEXT NOT NULL,
attempt_count INT NOT NULL DEFAULT 1,
is_otp_likely BOOLEAN NOT NULL DEFAULT FALSE,
otp_destination_class TEXT,
ingested_at TIMESTAMPTZ NOT NULL DEFAULT now(),
trace_id TEXT,
PRIMARY KEY (event_ts, signal_id)
) PARTITION BY RANGE (event_ts);
-- Monthly partitions provisioned 3 months ahead by maintenance worker.

CREATE INDEX ix_sig_tenant_ts ON fraud.signals (tenant_id, event_ts DESC);
CREATE INDEX ix_sig_src_msisdn ON fraud.signals (src_msisdn) WHERE src_msisdn IS NOT NULL;
CREATE INDEX ix_sig_dst_msisdn ON fraud.signals (dst_msisdn) WHERE dst_msisdn IS NOT NULL;
CREATE INDEX ix_sig_payload_hash ON fraud.signals (payload_hash, event_ts DESC);

-- =====================================================================
-- DETECTIONS (high-confidence findings)
-- =====================================================================

CREATE TABLE fraud.detections (
detection_id UUID PRIMARY KEY,
category fraud.category NOT NULL,
subject_scope fraud.subject_scope NOT NULL,
subject_id TEXT NOT NULL,
score NUMERIC(4,3) NOT NULL CHECK (score >= 0 AND score <= 1),
confidence_tier fraud.confidence_tier NOT NULL,
evidence JSONB NOT NULL,
ai_provenance JSONB NOT NULL,
window_start TIMESTAMPTZ NOT NULL,
window_end TIMESTAMPTZ NOT NULL,
source_model_id UUID,
source_pipeline TEXT NOT NULL,
enforcement_status fraud.enforcement_status NOT NULL DEFAULT 'EMITTED',
suppression_reason TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ
);
CREATE INDEX ix_det_subject ON fraud.detections (subject_scope, subject_id, created_at DESC);
CREATE INDEX ix_det_category_ts ON fraud.detections (category, created_at DESC);
CREATE INDEX ix_det_active ON fraud.detections (enforcement_status, expires_at)
WHERE enforcement_status NOT IN ('EXPIRED','SUPPRESSED');

-- =====================================================================
-- CASES (HITL workflow)
-- =====================================================================

CREATE TABLE fraud.cases (
case_id UUID PRIMARY KEY,
category fraud.category NOT NULL,
subject_scope fraud.subject_scope NOT NULL,
subject_id TEXT NOT NULL,
score NUMERIC(4,3) NOT NULL CHECK (score >= 0.6 AND score < 0.85),
evidence JSONB NOT NULL,
ai_provenance JSONB NOT NULL,
suggested_action fraud.suggested_action NOT NULL DEFAULT 'NO_ACTION',
status fraud.case_status NOT NULL DEFAULT 'PENDING_REVIEW',
assigned_to UUID,
opened_at TIMESTAMPTZ NOT NULL DEFAULT now(),
opened_by TEXT NOT NULL, -- 'system:auto' or userId
decided_at TIMESTAMPTZ,
decided_by UUID,
decision fraud.case_decision,
reason TEXT,
action_executed BOOLEAN NOT NULL DEFAULT FALSE,
CHECK (decided_by IS NULL OR opened_by != decided_by::text)
);
CREATE INDEX ix_case_status_priority ON fraud.cases (status, opened_at)
WHERE status IN ('PENDING_REVIEW','IN_REVIEW');
CREATE INDEX ix_case_assigned ON fraud.cases (assigned_to) WHERE assigned_to IS NOT NULL;
CREATE INDEX ix_case_subject ON fraud.cases (subject_scope, subject_id);

CREATE TABLE fraud.case_decisions (
decision_id UUID PRIMARY KEY,
case_id UUID NOT NULL REFERENCES fraud.cases(case_id),
decision fraud.case_decision NOT NULL,
reason TEXT NOT NULL CHECK (length(reason) >= 20),
feature_corrections JSONB,
decided_at TIMESTAMPTZ NOT NULL DEFAULT now(),
decided_by UUID NOT NULL
);
CREATE INDEX ix_decisions_case ON fraud.case_decisions (case_id);
CREATE INDEX ix_decisions_decided ON fraud.case_decisions (decided_at);

-- =====================================================================
-- MODEL REGISTRY
-- =====================================================================

CREATE TABLE fraud.models (
model_id UUID PRIMARY KEY,
category fraud.category NOT NULL,
pipeline TEXT NOT NULL, -- XGBOOST / GRAPHSAGE / IFOREST / TRANSFORMER_TEXT
description TEXT,
active_version_id UUID,
shadow_version_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ux_models_active_per_category
ON fraud.models (category, pipeline)
WHERE active_version_id IS NOT NULL;

CREATE TABLE fraud.model_versions (
version_id UUID PRIMARY KEY,
model_id UUID NOT NULL REFERENCES fraud.models(model_id),
version TEXT NOT NULL, -- semver
artifact_uri TEXT NOT NULL, -- s3://...
artifact_sha256 TEXT NOT NULL,
cosign_signature TEXT,
training_set_hash TEXT NOT NULL,
feature_set_hash TEXT NOT NULL,
evaluation_metrics JSONB NOT NULL, -- { auc, f1, precision, recall, brier, fairness:{...} }
model_card_uri TEXT NOT NULL,
status fraud.model_version_status NOT NULL DEFAULT 'REGISTERED',
registered_at TIMESTAMPTZ NOT NULL DEFAULT now(),
registered_by UUID NOT NULL,
promoted_at TIMESTAMPTZ,
promoted_by UUID,
retired_at TIMESTAMPTZ,
UNIQUE (model_id, version)
);
CREATE INDEX ix_mv_model_status ON fraud.model_versions (model_id, status);

CREATE TABLE fraud.model_evaluations (
eval_id UUID PRIMARY KEY,
version_id UUID NOT NULL REFERENCES fraud.model_versions(version_id),
evaluated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
test_set_hash TEXT NOT NULL,
metrics JSONB NOT NULL, -- per-fold and aggregate
evaluation_window TEXT, -- 'shadow_24h' | 'training_holdout'
decision TEXT, -- 'PASS' | 'FAIL' | 'INCONCLUSIVE'
decision_reason TEXT
);
CREATE INDEX ix_meval_version ON fraud.model_evaluations (version_id, evaluated_at DESC);

-- =====================================================================
-- MISP / STIX FEEDS
-- =====================================================================

CREATE TABLE fraud.feeds (
feed_id UUID PRIMARY KEY,
feed_name TEXT NOT NULL UNIQUE,
direction fraud.feed_direction NOT NULL,
format fraud.feed_format NOT NULL,
transport TEXT NOT NULL, -- HTTP_PUSH / SFTP_MIRROR / S3_BUCKET
signature_policy TEXT NOT NULL, -- HSM_REQUIRED / HSM_OPTIONAL / NONE_INTERNAL_ONLY
public_key_ref TEXT, -- vault://path
decay_profile JSONB NOT NULL DEFAULT '{"halfLifeDays":30,"floorWeight":0.05}',
schedule_cron TEXT,
last_sync_at TIMESTAMPTZ,
next_run_at TIMESTAMPTZ,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_by UUID
);

CREATE TABLE fraud.feed_indicators (
indicator_id UUID PRIMARY KEY,
feed_id UUID NOT NULL REFERENCES fraud.feeds(feed_id),
source_uuid TEXT NOT NULL,
type fraud.indicator_type NOT NULL,
value TEXT NOT NULL,
confidence NUMERIC(4,3) NOT NULL DEFAULT 1.000,
decay_factor NUMERIC(4,3) NOT NULL DEFAULT 1.000,
tags TEXT[],
imported_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expired_at TIMESTAMPTZ,
UNIQUE (feed_id, source_uuid)
);
CREATE INDEX ix_fi_value ON fraud.feed_indicators (type, value);
CREATE INDEX ix_fi_active ON fraud.feed_indicators (feed_id, imported_at DESC)
WHERE expired_at IS NULL;

CREATE TABLE fraud.feed_imports (
import_id UUID PRIMARY KEY,
feed_id UUID NOT NULL REFERENCES fraud.feeds(feed_id),
signature_valid BOOLEAN NOT NULL,
signature_key_id TEXT,
added_count INT NOT NULL DEFAULT 0,
updated_count INT NOT NULL DEFAULT 0,
expired_count INT NOT NULL DEFAULT 0,
payload_sha256 TEXT NOT NULL,
imported_at TIMESTAMPTZ NOT NULL DEFAULT now(),
imported_by TEXT
);

-- =====================================================================
-- FEEDBACK & ALLOWLIST
-- =====================================================================

CREATE TABLE fraud.allowlists (
allowlist_id UUID PRIMARY KEY,
scope fraud.subject_scope NOT NULL,
value TEXT NOT NULL,
reason TEXT NOT NULL,
added_by UUID NOT NULL,
approved_by UUID NOT NULL, -- two-person rule
added_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ,
UNIQUE (scope, value)
);

CREATE TABLE fraud.cohort_exclusions (
exclusion_id UUID PRIMARY KEY,
cohort_hash TEXT NOT NULL UNIQUE,
reason TEXT NOT NULL,
added_by UUID NOT NULL,
added_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE fraud.template_allowlist (
template_hash TEXT PRIMARY KEY,
description TEXT NOT NULL,
added_by UUID NOT NULL,
added_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE fraud.feature_imputation_policy (
feature_name TEXT PRIMARY KEY,
strategy TEXT NOT NULL, -- ZERO / MEAN_24H / MEDIAN_7D / CONSTANT
constant_value NUMERIC,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE fraud.otp_patterns (
pattern_id UUID PRIMARY KEY,
language TEXT NOT NULL, -- ISO 639-1
regex TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
version INT NOT NULL DEFAULT 1,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- =====================================================================
-- ENTITY SCORES (Score gRPC L2)
-- =====================================================================

CREATE TABLE fraud.entity_scores (
scope fraud.subject_scope NOT NULL,
subject_id TEXT NOT NULL,
score NUMERIC(4,3) NOT NULL,
tier fraud.tier NOT NULL,
contributing_factors JSONB NOT NULL DEFAULT '[]',
model_versions JSONB NOT NULL DEFAULT '{}',
computed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (scope, subject_id)
);
CREATE INDEX ix_scores_tier ON fraud.entity_scores (tier);
CREATE INDEX ix_scores_computed ON fraud.entity_scores (computed_at);

CREATE TABLE fraud.entity_score_history (
scope fraud.subject_scope NOT NULL,
subject_id TEXT NOT NULL,
computed_at TIMESTAMPTZ NOT NULL,
score NUMERIC(4,3) NOT NULL,
tier fraud.tier NOT NULL,
contributing_factors JSONB NOT NULL,
PRIMARY KEY (computed_at, scope, subject_id)
) PARTITION BY RANGE (computed_at);

-- =====================================================================
-- AUDIT LOG (append-only)
-- =====================================================================

CREATE TYPE fraud.audit_entity AS ENUM (
'CASE','MODEL','MODEL_VERSION','FEED','ALLOWLIST','PATTERN','TENANT_SCORE','OUTBOX'
);
CREATE TYPE fraud.audit_action AS ENUM (
'CREATE','UPDATE','DELETE','DECIDE','PROMOTE','ROLLBACK','IMPORT','EXPORT','SUPPRESS','OVERRIDE'
);

CREATE TABLE fraud.audit_log (
audit_id UUID NOT NULL,
entity_type fraud.audit_entity NOT NULL,
entity_id TEXT NOT NULL,
action fraud.audit_action NOT NULL,
actor_user_id UUID, -- null = system
before JSONB,
after JSONB,
ip INET,
user_agent TEXT,
trace_id TEXT,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (occurred_at, audit_id)
) PARTITION BY RANGE (occurred_at);
CREATE INDEX ix_audit_entity ON fraud.audit_log (entity_type, entity_id, occurred_at DESC);
CREATE INDEX ix_audit_actor ON fraud.audit_log (actor_user_id, occurred_at DESC);

-- =====================================================================
-- OUTBOX
-- =====================================================================

CREATE TABLE fraud.outbox (
event_id UUID PRIMARY KEY,
subject TEXT NOT NULL,
payload JSONB NOT NULL,
published_at TIMESTAMPTZ,
attempts INT NOT NULL DEFAULT 0,
last_error TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ix_outbox_unpublished ON fraud.outbox (created_at) WHERE published_at IS NULL;

2. ClickHouse fraud_features schema (DDL)

-- Distributed cluster `fraud_cluster` (3 shards × 2 replicas)

CREATE TABLE fraud_features.events_local ON CLUSTER fraud_cluster
(
event_ts DateTime64(3, 'UTC'),
signal_id UUID,
source_stream LowCardinality(String),
tenant_id Nullable(UUID),
src_msisdn Nullable(String),
dst_msisdn Nullable(String),
sender_id Nullable(String),
mno_id LowCardinality(Nullable(String)),
peer_asn Nullable(UInt32),
verdict LowCardinality(Nullable(String)),
dlr_status LowCardinality(Nullable(String)),
template_hash Nullable(String),
payload_hash String,
segments UInt8 DEFAULT 1,
attempt_count UInt8 DEFAULT 1,
is_otp_likely UInt8 DEFAULT 0,
otp_destination_class LowCardinality(Nullable(String)),
imported_indicator_msisdn UInt8 DEFAULT 0,
imported_indicator_template UInt8 DEFAULT 0,
ingested_at DateTime64(3,'UTC') DEFAULT now64(3),
trace_id String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/fraud_events', '{replica}')
PARTITION BY toYYYYMM(event_ts)
ORDER BY (tenant_id, event_ts, signal_id)
TTL event_ts + INTERVAL 90 DAY DELETE;

CREATE TABLE fraud_features.events ON CLUSTER fraud_cluster
AS fraud_features.events_local
ENGINE = Distributed('fraud_cluster','fraud_features','events_local', cityHash64(tenant_id));

CREATE TABLE fraud_features.events_dlq
( event_ts DateTime64(3), source_stream LowCardinality(String),
raw String, reject_reason String )
ENGINE = MergeTree() ORDER BY event_ts TTL event_ts + INTERVAL 14 DAY;

-- AIT pipeline outputs
CREATE TABLE fraud_features.ait_window_features ON CLUSTER fraud_cluster
( window_start DateTime,
tenant_id UUID, dst_mno LowCardinality(String), sender_id String,
submit_count UInt64,
dlr_delivered_count UInt64, dlr_failed_count UInt64,
dlr_success_rate Float32,
unique_dst_msisdns UInt64,
mean_segments_per_msg Float32,
entropy_of_dst_prefix Float32,
unique_sender_ids UInt32,
repeated_body_ratio Float32,
peer_asn_diversity UInt16,
cohort_anomaly_score Nullable(Float32) )
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ait_window','{replica}')
PARTITION BY toYYYYMM(window_start)
ORDER BY (window_start, tenant_id, dst_mno, sender_id)
TTL window_start + INTERVAL 60 DAY;

CREATE TABLE fraud_features.ait_predictions ON CLUSTER fraud_cluster
( window_start DateTime, tenant_id UUID, dst_mno String, sender_id String,
score Float32, model_id UUID, model_version String,
shap_top3 String, predicted_at DateTime DEFAULT now() )
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ait_pred','{replica}')
PARTITION BY toYYYYMM(window_start)
ORDER BY (window_start, tenant_id);

CREATE TABLE fraud_features.ait_cohorts ON CLUSTER fraud_cluster
( cohort_hash UInt64,
dst_msisdn_count UInt32,
contributing_tenants UInt16,
contributing_sender_ids UInt16,
cohort_anomaly_score Float32,
first_seen_ts DateTime, last_seen_ts DateTime )
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ait_cohorts','{replica}')
ORDER BY cohort_hash
TTL last_seen_ts + INTERVAL 30 DAY;

CREATE TABLE fraud_features.shadow_predictions ON CLUSTER fraud_cluster
( window_start DateTime, version_id UUID,
subject_scope String, subject_id String,
score Float32, predicted_at DateTime DEFAULT now() )
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/shadow','{replica}')
PARTITION BY toYYYYMM(window_start)
ORDER BY (version_id, window_start, subject_id)
TTL window_start + INTERVAL 30 DAY;

-- DLR baseline
CREATE MATERIALIZED VIEW fraud_features.dlr_baseline_7d
ENGINE = AggregatingMergeTree()
PARTITION BY tenant_id
ORDER BY (tenant_id, dst_mno)
AS SELECT
tenant_id,
mno_id AS dst_mno,
avgState(toFloat32(dlr_status='DELIVERED')) AS mean_state,
varSampState(toFloat32(dlr_status='DELIVERED')) AS var_state
FROM fraud_features.events
WHERE event_ts >= now() - INTERVAL 7 DAY
AND source_stream = 'SMS_DLR'
GROUP BY tenant_id, dst_mno;

3. Redis key namespaces

All keys namespaced fraud:…. Tenant-scoped keys encode {tenantId} for per-tenant invalidation.

KeyTTLPurpose
fraud:score:{scope}:{id}900 s (15 min)L1 cache for Score gRPC
fraud:graph:cohort:{cohortHash}3600 sCohort metadata cache
fraud:simbox:{msisdnBlock}86400 sSIM-box detection cooldown to prevent re-emission
fraud:otp:dst:{msisdnHash}:60s60 s (sliding)Sorted set of OTP timestamps for grinding detector
fraud:throttle:dst:{msisdnHash}21600 s (6 h)Throttle handle published with grinding detection
fraud:imported:msisdn:bloomno TTLBloom filter of imported MISP MSISDN indicators
fraud:lock:ait:5m300 sDistributed lock for AIT pipeline
fraud:lock:cohort:1h3600 sDistributed lock for cohort job
fraud:lock:simbox:30m1800 sDistributed lock for SIM-box pipeline
fraud:lock:greyroute:1h3600 sDistributed lock for grey-route pipeline
fraud:lock:otp_harvest:30m1800 sDistributed lock for OTP-harvest pipeline
fraud:lock:scoring:hourly3600 sDistributed lock for hourly score recompute
fraud:model:active:{category}:{pipeline}60 sCached pointer to active ModelVersion (worker hot-reload)
fraud:feature:imputation:policy300 sFeature imputation policy mirror
fraud:otp:patterns:active300 sOTP regex set version marker
fraud:case:queue:{category}:sizeno TTLCounter of pending cases per category

4. Row-Level Security & append-only protection

4.1 Append-only on signals, audit_log

CREATE RULE signals_no_update AS ON UPDATE TO fraud.signals DO INSTEAD NOTHING;
CREATE RULE signals_no_delete AS ON DELETE TO fraud.signals DO INSTEAD NOTHING;
CREATE RULE audit_log_no_update AS ON UPDATE TO fraud.audit_log DO INSTEAD NOTHING;
CREATE RULE audit_log_no_delete AS ON DELETE TO fraud.audit_log DO INSTEAD NOTHING;

Retention is enforced by partition pruning, not by DELETE.

4.2 RLS on per-tenant scoring read

ALTER TABLE fraud.entity_scores ENABLE ROW LEVEL SECURITY;

CREATE POLICY scores_tenant_read ON fraud.entity_scores
FOR SELECT
USING (
scope = 'TENANT'
AND subject_id = current_setting('app.current_tenant_id', TRUE)
);

CREATE POLICY scores_admin_all ON fraud.entity_scores
USING (current_setting('app.caller_role') IN (
'tns-fraud-analyst','tns-fraud-analyst-lead',
'noc-operator','platform.compliance.admin','platform.auditor'
));

The tenant portal does not currently expose fraud scores directly (cross-tenant data sensitivity); the policy is scaffolding for future opt-in.


5. ID prefixes

PrefixEntity
fs_FraudSignal
fd_FraudDetection
fc_FraudCase
ff_FraudFeed
fi_FeedIndicator
fp_FraudPattern
ml_MlModel
mv_ModelVersion
me_ModelEvaluation
al_AuditLog
aw_AllowlistEntry
ot_OtpPattern
tr_TrainingRun

Internally UUIDv4; prefixes apply to externally exposed IDs in REST responses.


6. PII, encryption & retention

FieldClassProtection
signals.src_msisdn, signals.dst_msisdnCONFIDENTIALDB + disk encryption; API masking; RLS-gated
signals.template_hashINTERNALPlain
events.dst_msisdn (ClickHouse)CONFIDENTIALDisk-level encryption (LUKS); cluster network is intra-VPC only
feed_indicators.value (when MSISDN)CONFIDENTIALSame
cases.evidenceINTERNAL (no body)Plain (analyst-visible)
audit_log.before/afterCONFIDENTIAL (may include MSISDN)Same
model_versions.training_set_hashINTERNALPlain

Encryption. No field-level envelope encryption is required for fraud data (no SMS body is ever stored; MSISDNs are pseudonymous). All data is protected by DB + disk encryption and tenant-isolated network controls.

Retention.

  • fraud.signals: 90 days (Postgres partition pruning) + ClickHouse 90 days hot. Cold archive to S3 with object-lock for 13 months.
  • fraud.detections: 365 days hot, then archived.
  • fraud.cases: 7 years (regulatory dispute window).
  • fraud.audit_log: 13 months minimum, regulator-mandated.
  • fraud.feed_indicators: indefinite (decay-weighted; effective expiry when decay_factor < floor).
  • fraud_features.events_dlq: 14 days.
  • fraud_features.shadow_predictions: 30 days.

7. Migration strategy

  • Partitions are provisioned 3 months ahead by PartitionMaintenanceWorker (daily 03:00 UTC). Missing future partitions trigger FraudPartitionMissing HIGH alert.
  • ClickHouse retention enforced by TTL clauses; verified daily by clickhouse-backup integrity check.
  • Schema changes follow expand/contract; all migrations forward-only, reviewed by Security for any column that holds MSISDN.
  • Cross-region ClickHouse replication uses Replicated*MergeTree over Keeper; kbl primary, mzr warm read replica.

8. Backups

StoreCadenceRetentionRTO / RPO
Postgres fraud schemaContinuous WAL archive to S3 + nightly logical dump35 days WAL, 90 days logicalRTO 30 min, RPO 5 min
ClickHouse fraud_featuresDaily incremental via clickhouse-backup to S330 days incrementalRTO 4 h (acceptable — async pipelines tolerate gap), RPO 24 h
MinIO model artifacts + feed exportsObject versioning + cross-region replicationIndefinite (5y minimum)RTO 1 h, RPO 0