
fraud-intel-service — Service Readiness

Version: 1.0
Status: Draft
Owner: Trust and Safety + ML Ops + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, AI_INTEGRATION.md, FAILURE_MODES.md

Readiness criteria for production deployment. Fraud-intel is fail-open (informational), so the bar focuses on model quality (precision, recall, and F1 per category), drift monitoring, fairness, adversarial robustness, and the downstream integration contracts with firewall, sender-id-registry, and compliance-engine.


1. Code Readiness

| Criterion | Status | Notes |
|---|---|---|
| gRPC FraudIntelService.v1 (Score, GetSignals, BulkScore) | | |
| REST admin: signal browsing, model lifecycle, feed admin, retroactive-scan triggers, dashboard query | | |
| Triton integration for 3 production models (AIT, SIM-box, OTP-harvest) | | |
| Rule-based fallback on Triton circuit-breaker open | | |
| Feature-store integration (Redis + Postgres) with in-process LRU | | |
| PII anonymisation (MSISDN SHA-256 hash) before any inference | | Mandatory per AI_INTEGRATION |
| Budget enforcement: Score 100 ms P99 cap, fallback at 80 ms | | |
| Circuit breaker on Triton with 30 s half-open | | |
| NATS consumers: sms.dlr.inbound, sms.mo.inbound, compliance.audit.v1, firewall.audit.v1, sender.id.suspended.v1 | | |
| MISP feed sync worker (STIX 2.1) with pluggable source adapter | | |
| Model registry (Postgres + S3 artifacts) with immutable versioning | | |
| Training pipeline (Python + Airflow) with quarterly cadence | | |
| Model deployment pipeline with checksum verification + A/B shadow | | |
| Drift detection job (weekly F1 vs. baseline) | | |
| Feedback API (T&S correction loop) with weight-capping | | |
| Per-signal audit row with model version | | |
| mTLS gRPC + SPIRE SVID | | |
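
The circuit-breaker and rule-based-fallback criteria above can be sketched as follows. This is a minimal illustration, not the service's implementation: the class, the `score` helper, and the failure threshold of 5 are hypothetical; only the 30 s half-open interval comes from the criteria. The 80 ms fallback budget is not modelled here.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal three-state breaker: closed -> open -> half-open."""

    def __init__(
        self,
        failure_threshold: int = 5,        # assumed value, not from this document
        half_open_after_s: float = 30.0,   # 30 s half-open, per the criterion
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.half_open_after_s = half_open_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.half_open_after_s:
            return "half-open"
        return "open"

    def record_success(self) -> None:
        # A successful call (including a half-open probe) closes the breaker.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        # Trip on reaching the threshold, or re-open after a failed probe.
        if self._failures >= self.failure_threshold or self.state == "half-open":
            self._opened_at = self._clock()


def score(features, breaker, ml_score, rule_score):
    """Return (score, source); falls back to rules when ML is unavailable."""
    if breaker.state == "open":
        return rule_score(features), "rules"
    try:
        result = ml_score(features)
    except Exception:
        breaker.record_failure()
        return rule_score(features), "rules"
    breaker.record_success()
    return result, "ml"
```

Injecting the clock keeps the half-open transition testable without sleeping for 30 s.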

2. Testing Readiness

| Criterion | Target | Status |
|---|---|---|
| Unit coverage | ≥ 90% line (domain) / ≥ 80% branch | |
| Unit tests for feature extractors | ≥ 30 | |
| Unit tests for score normalisers + category mapping | ≥ 20 | |
| Property-based tests (fast-check): feature determinism | ≥ 10 | |
| Model evaluation: held-out test set per model | 10 k labelled per category | |
| Model evaluation targets (AIT) | precision ≥ 0.92, recall ≥ 0.80 | |
| Model evaluation targets (SIM-box) | precision ≥ 0.88, recall ≥ 0.75 | |
| Model evaluation targets (OTP-harvest) | precision ≥ 0.90, recall ≥ 0.70 | |
| Adversarial test corpus per model | ≥ 500 crafted examples | |
| Fairness audit per model (per-MNO disparate recall ≤ 15%) | Passed | |
| Integration: Score @ 1 000 RPS P99 ≤ 100 ms | Passed | |
| Integration: signal NATS consumers sustain 10 000 events/min | Passed | |
| Integration: MISP feed sync with mock endpoint | Passed | |
| Integration: model deploy + rollback via registry | Passed | |
| Contract test with firewall consumer | Passed | |
| Contract test with sender-id-registry consumer | Passed | |
| Contract test with compliance-engine tenant scoring feed | Passed | |
| Chaos: Triton unavailable → rule-based fallback | Passed | |
| Chaos: Postgres unavailable → read-only degraded | Passed | |
| Chaos: NATS lag → consumer scaling + stale-signal handling | Passed | |
| Chaos: feature store partial outage → LRU fallback | Passed | |
| Security: feedback-loop poisoning resistance (synthetic attack) | Passed | |
| Security: model-artifact tamper detection via checksum | Passed | |
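
The per-model evaluation targets in the table can be checked mechanically from held-out confusion counts. The sketch below is illustrative only: the `Confusion` type and `meets_targets` helper are hypothetical names, while the thresholds are copied from the table.

```python
from dataclasses import dataclass

# (min precision, min recall) per model, taken from the targets table.
TARGETS = {
    "AIT": (0.92, 0.80),
    "SIM-box": (0.88, 0.75),
    "OTP-harvest": (0.90, 0.70),
}


@dataclass(frozen=True)
class Confusion:
    """Held-out test-set counts for the positive (fraud) class."""
    tp: int
    fp: int
    fn: int

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def meets_targets(model: str, counts: Confusion) -> bool:
    """True when both precision and recall reach the model's targets."""
    min_precision, min_recall = TARGETS[model]
    return counts.precision >= min_precision and counts.recall >= min_recall
```

For example, `meets_targets("SIM-box", Confusion(tp=700, fp=100, fn=300))` is false because precision is 0.875, below the 0.88 target, even though recall is adequate.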

3. Observability Readiness

| Criterion | Status |
|---|---|
| All Prometheus metrics emitting (OBSERVABILITY.md §1) | |
| Grafana dashboard fraud-intel-service.json deployed | |
| All alerts configured with runbooks | |
| Structured logs with MSISDN hashing | |
| OTel tracing across Score calls verified | |
| Loki parsing validated | |
| SIEM forwarding of fraud.detected.* verified | |
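
MSISDN hashing for logs and model inputs might look like the sketch below. The function name, the digit-only normalisation, and the optional `pepper` are assumptions for illustration; this document specifies only that MSISDNs are SHA-256 hashed.

```python
import hashlib
import re


def hash_msisdn(msisdn: str, pepper: bytes = b"") -> str:
    """Return a hex SHA-256 digest of a normalised MSISDN.

    Normalisation (an assumption): strip everything except ASCII digits,
    so "+44 7700 900123" and "447700900123" hash identically.
    `pepper` is a hypothetical secret prefix; empty means plain SHA-256.
    """
    digits = re.sub(r"[^0-9]", "", msisdn)
    if not digits:
        raise ValueError("MSISDN contains no digits")
    return hashlib.sha256(pepper + digits.encode("ascii")).hexdigest()
```

Normalising before hashing matters because the same subscriber otherwise maps to different hashes depending on formatting, which would break joins between log lines and signals.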

Alerts Configured

  • FraudScoreLatencyHigh (P95 > 100 ms for 5 min)
  • FraudScoreErrorHigh (> 0.1% 5xx)
  • FraudMlUnavailable (circuit breaker open > 2 min)
  • FraudModelDriftHigh (F1 drop > 5% vs. baseline)
  • FraudFeedSyncStale (no successful sync > 6 h)
  • FraudNatsConsumerLag (> 5 min)
  • FraudSignalStorm (signal ingest > 3× baseline for 5 min)
  • FraudDbUnavailable
  • FraudFeatureStoreUnavailable
  • FraudFeedbackAnomaly (single labeller > 5% of week's corrections)
  • FraudModelArtifactCorrupt (checksum mismatch)
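
As an illustration of one threshold above, the FraudFeedbackAnomaly condition (a single labeller contributing more than 5% of the week's corrections) can be sketched as a share calculation. The function name and input shape are hypothetical; the real alert is presumably evaluated from metrics rather than in application code.

```python
from collections import Counter


def dominant_labellers(
    corrections: list[str], max_share: float = 0.05
) -> dict[str, float]:
    """Return labellers whose share of corrections exceeds max_share.

    `corrections` holds one labeller ID per correction submitted in the
    week under review (a hypothetical input shape).
    """
    total = len(corrections)
    if total == 0:
        return {}
    counts = Counter(corrections)
    return {
        labeller: n / total
        for labeller, n in counts.items()
        if n / total > max_share  # strictly greater, matching "> 5%"
    }
```

A labeller at exactly 5% is not flagged, since the alert is defined with a strict inequality.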

4. Security Readiness

| Criterion | Status |
|---|---|
| mTLS on gRPC + SPIRE SVIDs | |
| NetworkPolicy restricting gRPC ingress to firewall, sender-id-registry, compliance, NOC | |
| Kong JWT on REST admin endpoints | |
| MSISDN hashing verified on every model call path | |
| Model artifact S3 bucket immutable + versioned + object-locked | |
| Model deploy dual-control + checksum verified | |
| Feedback API role-restricted (T&S only; auditable identity) | |
| No cloud-LLM / external-API call with PII | |
| Training-data PII scrubbing verified | |
| Pen test against REST admin + gRPC | |
| Security team sign-off | |
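
Checksum verification at model deploy time could be sketched as below. The choice of SHA-256 and both function names are assumptions; this document says only that artifacts are checksum-verified.

```python
import hashlib
import hmac
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large artifacts stay out of memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream: BinaryIO, expected_hex: str) -> bool:
    """True when the stream's digest matches the registry's recorded checksum.

    `expected_hex` stands in for the value stored in the model registry
    (hypothetical field); comparison is case-insensitive and constant-time.
    """
    actual = sha256_stream(stream)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A mismatch here is the condition that would be expected to raise the FraudModelArtifactCorrupt alert and block the deploy.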

5. Operational Readiness

| Criterion | Status |
|---|---|
| K8s Deployment (3–10 replicas) + HPA on RPS | |
| Triton Deployment (3 replicas, GPU) on np-data node pool | |
| Training Airflow setup on separate node pool (CPU + optional GPU on-demand) | |
| PDB minAvailable: 2 per region | |
| Rolling update: no dropped Score calls under steady 500 RPS | |
| Graceful shutdown (15 s SIGTERM) | |
| Postgres conn pool sized | |
| Redis conn pool sized | |
| Model deployment runbook drafted | |
| Model drift incident runbook drafted | |
| Feed sync failure runbook drafted | |
| Feedback poisoning runbook drafted | |
| On-call: T&S primary, ML Ops secondary, SRE tertiary | |

6. Documentation Readiness

All 16 SERVICE_TEMPLATE docs are at "Complete", along with the runbooks, model cards (see AI_INTEGRATION §12), the feedback-labeller handbook, and the drift-response playbook.


7. Compliance / Regulatory Readiness

| Criterion | Status |
|---|---|
| DPIA authored for signal processing and ML inference | |
| Fairness audit signed off by Trust & Safety lead | |
| Model cards published per model (accuracy, fairness, training data lineage, known limitations) | |
| MISP reciprocal-sharing terms agreed with external parties (if any) | |
| SIEM forwarding of fraud events approved by regulator-portal team | |
| Audit retention policy configured (90 d hot, 7 y cold for detections) | |

8. Go/No-Go Criteria Summary

Production deployment is GO when:

  • All §1 Code Readiness complete.
  • Per-model test-set targets met (precision/recall per §2).
  • Fairness audit: per-MNO disparate recall ≤ 15%.
  • Adversarial corpus: < 2% bypass rate.
  • Load test at 1.5× expected peak RPS (target 1 500 Score RPS) P99 ≤ 120 ms.
  • 14-day shadow mode: signal emission rate matches forecast.
  • Drift monitoring baseline established + alert tested.
  • Chaos drill (ML, DB, NATS, feature store) all degrade correctly.
  • Security + ML Ops + Legal sign-offs.
  • Model cards published + reviewed.
  • Rollback plan validated in staging.
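
The fairness gate above can be made concrete once "disparate recall" is pinned down. The sketch below assumes it means the absolute gap between the highest and lowest per-MNO recall; a relative ratio is another plausible reading, and this document does not say which applies. Function names and the input shape are hypothetical.

```python
def recall_gap(per_mno: dict[str, tuple[int, int]]) -> float:
    """Absolute gap between best and worst per-MNO recall.

    `per_mno` maps an MNO name to (true_positives, false_negatives).
    MNOs with no positive examples are skipped.
    """
    recalls = [
        tp / (tp + fn) for tp, fn in per_mno.values() if (tp + fn) > 0
    ]
    if not recalls:
        raise ValueError("no MNO has positive examples")
    return max(recalls) - min(recalls)


def passes_fairness(
    per_mno: dict[str, tuple[int, int]], max_gap: float = 0.15
) -> bool:
    """True when the recall gap is within the 15% bar (assumed absolute)."""
    return recall_gap(per_mno) <= max_gap
```

With recalls of 0.90 and 0.80 across two MNOs the gap is 0.10 and the gate passes; with 0.90 and 0.70 it is 0.20 and the gate fails.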

9. Post-Launch Review

Within 30 days of launch, review:

  • Detection-to-enforcement latency (signal emission → firewall / sender-id-registry action).
  • Per-model real-world precision vs. held-out test-set precision.
  • False-positive / false-negative rate review with T&S.
  • Feedback-loop volume; label-source distribution.
  • Cost analysis: Triton inference hours, NATS bandwidth, Postgres + S3 storage.
  • Model drift trend review; determine retraining trigger.
  • Signal volume per MNO; check for bias indicators.
  • Downstream integration health: firewall pick-up rate, sender-id-registry reputation deltas traced to fraud signals.

10. Phased Rollout

| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P1: Signals emitted, not enforced | 14 d | fraud.detected.* published; downstream consumers log but don't act on them | FP projection < 2% per category |
| P2: Enforcement (sender-id reputation only) | 7 d | Sender-ID registry honours fraud signals (reputation updates); firewall still observes | No unexpected auto-suspension cluster |
| P3: Full enforcement | Ongoing | Firewall + compliance engine honour fraud signals; NOC dashboards live; feedback loop active | Steady state |

Rollback flags: FRAUD_SIGNAL_EMISSION_ENABLED, FRAUD_ML_ENABLED, FRAUD_FEEDBACK_API_ENABLED.
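
One way the rollback flags might be read at startup is sketched below. The flag names come from the line above; the environment-variable source, the accepted truthy strings, and the default of enabled when a flag is unset are all assumptions.

```python
import os
from typing import Mapping

ROLLBACK_FLAGS = (
    "FRAUD_SIGNAL_EMISSION_ENABLED",
    "FRAUD_ML_ENABLED",
    "FRAUD_FEEDBACK_API_ENABLED",
)

_TRUTHY = {"1", "true", "yes", "on"}


def read_flags(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Parse each rollback flag from `env`; unset flags default to enabled."""
    return {
        name: env.get(name, "true").strip().lower() in _TRUTHY
        for name in ROLLBACK_FLAGS
    }
```

For example, `read_flags({"FRAUD_ML_ENABLED": "false"})` disables ML scoring while leaving signal emission and the feedback API enabled.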