fraud-intel-service — Service Readiness

Version: 1.0
Status: Draft
Owner: Trust and Safety + ML Ops + SRE
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, AI_INTEGRATION.md, FAILURE_MODES.md

This document lists the readiness criteria for production deployment. fraud-intel-service is fail-open: its output is informational, so a scoring failure must not block traffic. The bar therefore focuses on model quality (precision, recall, and F1 per category), drift monitoring, fairness, adversarial robustness, and the downstream integration contracts with the firewall, sender-id-registry, and compliance-engine.

1. Code Readiness

| Criterion | Status | Notes |
|---|---|---|
| gRPC `FraudIntelService.v1` (Score, GetSignals, BulkScore) | ☐ | |
| REST admin: signal browsing, model lifecycle, feed admin, retroactive-scan triggers, dashboard query | ☐ | |
| Triton integration for 3 production models (AIT, SIM-box, OTP-harvest) | ☐ | |
| Rule-based fallback on Triton circuit-breaker open | ☐ | |
| Feature-store integration (Redis + Postgres) with in-process LRU | ☐ | |
| PII anonymisation (MSISDN SHA-256 hash) before any inference | ☐ | Mandatory per AI_INTEGRATION |
| Budget enforcement: Score 100 ms P99 cap, fallback at 80 ms | ☐ | |
| Circuit breaker on Triton with 30 s half-open | ☐ | |
| NATS consumers: `sms.dlr.inbound`, `sms.mo.inbound`, `compliance.audit.v1`, `firewall.audit.v1`, `sender.id.suspended.v1` | ☐ | |
| MISP feed sync worker (STIX 2.1) with pluggable source adapter | ☐ | |
| Model registry (Postgres + S3 artifacts) with immutable versioning | ☐ | |
| Training pipeline (Python + Airflow) with quarterly cadence | ☐ | |
| Model deployment pipeline with checksum verification + A/B shadow | ☐ | |
| Drift detection job (weekly F1 vs. baseline) | ☐ | |
| Feedback API (T&S correction loop) with weight-capping | ☐ | |
| Per-signal audit row with model version | ☐ | |
| mTLS gRPC + SPIRE SVID | ☐ | |
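
The circuit-breaker and rule-based-fallback criteria above can be illustrated with a minimal sketch. It assumes a breaker that opens after a number of consecutive failures and admits a trial call once 30 s have elapsed; the failure threshold, the `score` helper, and the `model_call` / `rule_fallback` callables are hypothetical names for illustration, not the service's actual API. The 80 ms fallback budget is not modelled here.

```python
import time
from typing import Callable, Optional, Tuple


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 5, half_open_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed value, not from the spec
        self.half_open_after_s = half_open_after_s  # 30 s, per the criterion above
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.half_open_after_s:
            return "half-open"
        return "open"

    def record_success(self) -> None:
        # A successful call (including a half-open trial) closes the breaker.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        # The failure count is kept while open, so a failed trial re-opens at once.
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def score(features: dict, breaker: CircuitBreaker,
          model_call: Callable[[dict], float],
          rule_fallback: Callable[[dict], float]) -> Tuple[float, str]:
    """Return (score, source); the rule-based path is used while the breaker is open."""
    if breaker.state() == "open":
        return rule_fallback(features), "rules"
    try:
        result = model_call(features)
    except Exception:
        breaker.record_failure()
        return rule_fallback(features), "rules"
    breaker.record_success()
    return result, "model"
```

Injecting the clock keeps the 30 s half-open window testable without sleeping, which may be useful for the chaos test that takes Triton offline.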

2. Testing Readiness

| Criterion | Target | Status |
|---|---|---|
| Unit coverage | ≥ 90% line (domain) / ≥ 80% branch | ☐ |
| Unit tests for feature extractors | ≥ 30 | ☐ |
| Unit tests for score normalisers + category mapping | ≥ 20 | ☐ |
| Property-based tests (fast-check): feature determinism | ≥ 10 | ☐ |
| Model evaluation: held-out test set per model | 10 k labelled per category | ☐ |
| Model evaluation targets (AIT) | precision ≥ 0.92, recall ≥ 0.80 | ☐ |
| Model evaluation targets (SIM-box) | precision ≥ 0.88, recall ≥ 0.75 | ☐ |
| Model evaluation targets (OTP-harvest) | precision ≥ 0.90, recall ≥ 0.70 | ☐ |
| Adversarial test corpus per model | ≥ 500 crafted examples | ☐ |
| Fairness audit per model (per-MNO disparate recall ≤ 15%) | Passed | ☐ |
| Integration: Score @ 1 000 RPS P99 ≤ 100 ms | Passed | ☐ |
| Integration: signal NATS consumers sustain 10 000 events/min | Passed | ☐ |
| Integration: MISP feed sync with mock endpoint | Passed | ☐ |
| Integration: model deploy + rollback via registry | Passed | ☐ |
| Contract test with firewall consumer | Passed | ☐ |
| Contract test with sender-id-registry consumer | Passed | ☐ |
| Contract test with compliance-engine tenant scoring feed | Passed | ☐ |
| Chaos: Triton unavailable → rule-based fallback | Passed | ☐ |
| Chaos: Postgres unavailable → read-only degraded | Passed | ☐ |
| Chaos: NATS lag → consumer scaling + stale-signal handling | Passed | ☐ |
| Chaos: feature store partial outage → LRU fallback | Passed | ☐ |
| Security: feedback-loop poisoning resistance (synthetic attack) | Passed | ☐ |
| Security: model-artifact tamper detection via checksum | Passed | ☐ |
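
The per-model evaluation targets above reduce to a simple gate over confusion counts from the held-out test set. The following is a sketch only: the thresholds are copied from the table, while the `Confusion` type and the function names are hypothetical.

```python
from dataclasses import dataclass

# (minimum precision, minimum recall) per model, from the targets table above.
TARGETS = {
    "AIT": (0.92, 0.80),
    "SIM-box": (0.88, 0.75),
    "OTP-harvest": (0.90, 0.70),
}


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def meets_targets(model: str, c: Confusion) -> bool:
    """True when both precision and recall reach the model's minimums."""
    min_precision, min_recall = TARGETS[model]
    return precision(c) >= min_precision and recall(c) >= min_recall
```

For example, 850 true positives, 50 false positives, and 150 false negatives give a precision of about 0.944 and a recall of 0.85, which would clear the AIT bar.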

3. Observability Readiness

| Criterion | Status |
|---|---|
| All Prometheus metrics emitting (OBSERVABILITY.md §1) | ☐ |
| Grafana dashboard `fraud-intel-service.json` deployed | ☐ |
| All alerts configured with runbooks | ☐ |
| Structured logs with MSISDN hashing | ☐ |
| OTel tracing across Score calls verified | ☐ |
| Loki parsing validated | ☐ |
| SIEM forwarding of `fraud.detected.*` verified | ☐ |
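
The "structured logs with MSISDN hashing" criterion can be pictured as follows. This is a minimal sketch: the digit-only normalisation step, the `msisdn_hash` field name, and both function names are assumptions for illustration, not the service's logging schema.

```python
import hashlib
import json


def hash_msisdn(msisdn: str) -> str:
    """SHA-256 hex digest of the MSISDN after stripping non-digit characters."""
    # Assumption: "+44 7700 900123" and "447700900123" should hash identically.
    normalised = "".join(ch for ch in msisdn if ch.isdigit())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def log_record(event: str, msisdn: str, **fields) -> str:
    """Build one JSON log line that carries the hash and never the raw number."""
    record = {"event": event, "msisdn_hash": hash_msisdn(msisdn), **fields}
    return json.dumps(record, sort_keys=True)
```

Because the MSISDN number space is small, an unsalted SHA-256 can in principle be reversed by enumeration. Whether a keyed hash is needed is a question for AI_INTEGRATION.md and is not settled by this sketch.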

Alerts Configured

4. Security Readiness

| Criterion | Status |
|---|---|
| mTLS on gRPC + SPIRE SVIDs | ☐ |
| NetworkPolicy restricting gRPC ingress to firewall, sender-id-registry, compliance, NOC | ☐ |
| Kong JWT on REST admin endpoints | ☐ |
| MSISDN hashing verified on every model call path | ☐ |
| Model artifact S3 bucket immutable + versioned + object-locked | ☐ |
| Model deploy dual-control + checksum verified | ☐ |
| Feedback API role-restricted (T&S only; auditable identity) | ☐ |
| No cloud-LLM / external-API call with PII | ☐ |
| Training-data PII scrubbing verified | ☐ |
| Pen test against REST admin + gRPC | ☐ |
| Security team sign-off | ☐ |
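
The "checksum verified" criterion for model deploys might look like the sketch below. It assumes the registry stores a SHA-256 hex digest alongside each artifact; the digest algorithm and the function names are assumptions, since this document does not name them.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large model artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """True only when the on-disk artifact matches the registered digest."""
    return hmac.compare_digest(sha256_file(path), expected_sha256.lower())
```

A deploy step would refuse to load the artifact when `verify_artifact` returns `False`, which is the behaviour the tamper-detection security test is meant to exercise.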

5. Operational Readiness

| Criterion | Status |
|---|---|
| K8s Deployment (3–10 replicas) + HPA on RPS | ☐ |
| Triton Deployment (3 replicas, GPU) on `np-data` node pool | ☐ |
| Training Airflow setup on separate node pool (CPU + optional GPU on-demand) | ☐ |
| PDB `minAvailable: 2` per region | ☐ |
| Rolling update: no dropped Score calls under steady 500 RPS | ☐ |
| Graceful shutdown (15 s SIGTERM) | ☐ |
| Postgres conn pool sized | ☐ |
| Redis conn pool sized | ☐ |
| Model deployment runbook drafted | ☐ |
| Model drift incident runbook drafted | ☐ |
| Feed sync failure runbook drafted | ☐ |
| Feedback poisoning runbook drafted | ☐ |
| On-call: T&S primary, ML Ops secondary, SRE tertiary | ☐ |

6. Documentation Readiness

All 16 SERVICE_TEMPLATE docs must be at "Complete", together with the runbooks, the model cards (see AI_INTEGRATION §12), the feedback-labeller handbook, and the drift-response playbook.

7. Compliance / Regulatory Readiness

| Criterion | Status |
|---|---|
| DPIA authored for signal processing and ML inference | ☐ |
| Fairness audit signed off by Trust & Safety lead | ☐ |
| Model cards published per model (accuracy, fairness, training data lineage, known limitations) | ☐ |
| MISP reciprocal-sharing terms agreed with external parties (if any) | ☐ |
| SIEM forwarding of fraud events approved by regulator-portal team | ☐ |
| Audit retention policy configured (90 d hot, 7 y cold for detections) | ☐ |
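
The fairness audit that needs sign-off here is specified in section 2 as "per-MNO disparate recall ≤ 15%". One possible reading, shown as a sketch, treats that figure as the absolute gap between the best and worst per-MNO recall. That interpretation, like the function names, is an assumption; a relative measure would need a different formula.

```python
from typing import Dict, Tuple


def recall_gap(per_mno: Dict[str, Tuple[int, int]]) -> float:
    """per_mno maps MNO name -> (true positives, false negatives).

    Returns max recall minus min recall across MNOs that have labelled positives.
    """
    recalls = [tp / (tp + fn) for tp, fn in per_mno.values() if tp + fn > 0]
    return max(recalls) - min(recalls) if recalls else 0.0


def passes_fairness(per_mno: Dict[str, Tuple[int, int]], max_gap: float = 0.15) -> bool:
    """True when the recall gap stays within the assumed 15-point bound."""
    return recall_gap(per_mno) <= max_gap
```

With recalls of 0.80 and 0.70 for two operators the gap is 0.10 and the check passes; recalls of 0.90 and 0.60 give a gap of 0.30 and fail it.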

8. Go/No-Go Criteria Summary

Production deployment is GO when:

9. Post-Launch Review

Within 30 days of launch, review:

Detection-to-enforcement latency (signal emission → firewall / sender-id-registry action).
Per-model real-world precision vs. held-out test-set precision.
False-positive / false-negative rate review with T&S.
Feedback-loop volume; label-source distribution.
Cost analysis: Triton inference hours, NATS bandwidth, Postgres + S3 storage.
Model drift trend review; determine retraining trigger.
Signal volume per MNO; check for bias indicators.
Downstream integration health: firewall pick-up rate, sender-id-registry reputation deltas traced to fraud signals.
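
For the drift-trend item above, a retraining trigger could be expressed as a rule over the weekly F1 series. The sketch below is illustrative only: the 0.05 tolerance and the two-consecutive-weeks rule are placeholder values, since the actual trigger is exactly what this review is meant to determine.

```python
from typing import Sequence


def drift_detected(weekly_f1: Sequence[float], baseline_f1: float,
                   max_drop: float = 0.05, consecutive_weeks: int = 2) -> bool:
    """True when each of the most recent weeks sits more than max_drop below baseline."""
    if len(weekly_f1) < consecutive_weeks:
        return False  # not enough history to judge
    recent = weekly_f1[-consecutive_weeks:]
    return all(baseline_f1 - value > max_drop for value in recent)
```

Requiring consecutive weeks is one way to avoid retraining on a single noisy measurement; a single-week rule corresponds to `consecutive_weeks=1`.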

10. Phased Rollout

| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P1: Signals emitted, not enforced | 14 d | `fraud.detected.*` published; downstream consumers log but don't act on them | FP projection < 2% per category |
| P2: Enforcement, sender-id reputation only | 7 d | Sender-ID registry honours fraud signals (reputation updates); firewall still observes | No unexpected auto-suspension cluster |
| P3: Full enforcement | Ongoing | Firewall + compliance engine honour fraud signals; NOC dashboards live; feedback loop active | Steady state |

Rollback flags: FRAUD_SIGNAL_EMISSION_ENABLED, FRAUD_ML_ENABLED, FRAUD_FEEDBACK_API_ENABLED.
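
The P1 exit criterion (FP projection < 2% per category) can be checked mechanically once T&S has reviewed a sample of emitted signals. The sketch assumes the projection is the false-positive share of reviewed signals in each category; that definition, and the rule that a category with no reviewed signals blocks the exit, are assumptions rather than stated policy.

```python
from typing import Dict, Tuple


def p1_exit_ready(reviewed: Dict[str, Tuple[int, int]], max_fp_rate: float = 0.02) -> bool:
    """reviewed maps category -> (false positives, total reviewed signals).

    Every category must have evidence and an FP share strictly below max_fp_rate.
    """
    if not reviewed:
        return False
    for false_positives, total in reviewed.values():
        if total == 0:
            return False  # no evidence for this category yet
        if false_positives / total >= max_fp_rate:
            return False
    return True
```

For example, 1 false positive in 100 reviewed AIT signals (1%) passes, while 2 in 100 (exactly 2%) does not, because the criterion is a strict inequality.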
