fraud-intel-service — Service Risk Register
Version: 1.0
Status: Draft
Owner: Trust and Safety + ML Ops + Security + SRE
Last Updated: 2026-04-21
References: FAILURE_MODES.md, AI_INTEGRATION.md, SECURITY_MODEL.md
Known risks with owners, mitigations, and residual classification. Each risk is scored as Likelihood × Impact, each on a 1–5 scale; residual risk must be Medium or lower for GA.
1. Risk Summary
| ID | Risk | Category | Likelihood | Impact | Pre-mitigation | Residual | Owner |
|---|---|---|---|---|---|---|---|
| FR-RISK-01 | Model false positives flood firewall with block-triggering signals | Correctness / ML | 3 | 4 | High | Medium | T&S + ML Ops |
| FR-RISK-02 | Model drift silently degrades detection | ML quality | 4 | 4 | High | Medium | ML Ops |
| FR-RISK-03 | Feedback-loop poisoning by insider or compromised account | Security | 2 | 4 | Medium | Low | Security |
| FR-RISK-04 | ML model bias against a specific MNO or tenant | ML fairness | 3 | 4 | High | Medium | T&S + ML Ops |
| FR-RISK-05 | Training-data PII leak via model weights | Privacy | 1 | 5 | Medium | Low | Security + Legal |
| FR-RISK-06 | Adversarial evasion — attacker crafts signals to bypass detection | Adversarial | 3 | 3 | Medium | Medium | ML Ops |
| FR-RISK-07 | Fraud feed source (MISP) poisoning | Dependency | 2 | 3 | Medium | Low | Security |
| FR-RISK-08 | Signal storm biases training corpus | Adversarial | 2 | 3 | Medium | Low | ML Ops |
| FR-RISK-09 | Model-artifact tampering in registry | Security | 1 | 5 | Medium | Low | Security |
| FR-RISK-10 | Triton GPU scarcity under burst load | Ops | 2 | 3 | Medium | Low | SRE |
| FR-RISK-11 | GDPR / subject-access on signal history | Legal | 2 | 3 | Medium | Low | Legal |
| FR-RISK-12 | Fail-open posture during outage delays critical detection | Availability | 3 | 3 | Medium | Low | SRE |
| FR-RISK-13 | Downstream coupling: firewall over-reacts to a single model update | Integration | 2 | 4 | Medium | Medium | T&S + SRE |
| FR-RISK-14 | Cross-platform fraud-feed partner disputes published IOCs | Political | 2 | 2 | Low | Low | Regulator Liaison |
| FR-RISK-15 | Stale features cause short-window detection miss (e.g., AIT spike within a minute) | Latency | 3 | 3 | Medium | Medium | ML Ops |
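The Pre-mitigation column can be reproduced from the Likelihood and Impact scores. The sketch below is one banding that agrees with every row in the table; the cut-offs (High at 12 and above, Medium from 5 to 11, Low at 4 and below) are inferred from the rows and are an assumption, since the register does not state them. The function name `risk_band` is hypothetical.

```python
def risk_band(likelihood: int, impact: int) -> str:
    """Map 1-5 likelihood and 1-5 impact scores to a risk band.

    The thresholds are assumptions inferred from the summary table,
    not values stated in the register.
    """
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must each be in 1..5")
    score = likelihood * impact
    if score >= 12:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    # FR-RISK-02: likelihood 4, impact 4
    print(risk_band(4, 4))
    # FR-RISK-14: likelihood 2, impact 2
    print(risk_band(2, 2))
```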
2. Risk Details
FR-RISK-01 — Model false positives flood firewall
A high-confidence false positive from the model triggers firewall BLOCKs at scale.
Mitigation. Shadow-mode rollout for any new model (14 d); automatic rollback if FP rate > baseline + 50% for 10 min; per-tenant whitelist for design-partner banks; human-in-the-loop review on the highest-confidence tier; confidence-threshold tuning per category.
Residual. Medium.
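The automatic-rollback rule in the mitigation above could be evaluated roughly as follows. This is a minimal sketch, not the service's implementation: it assumes "baseline + 50%" means 1.5 times the baseline rate (a relative margin) and that the breach must be continuous for the full 10 minutes. The class name `RollbackTrigger` and its parameters are hypothetical.

```python
from typing import Optional


class RollbackTrigger:
    """Fires once the FP rate stays above the threshold for a full window."""

    def __init__(self, baseline_fp_rate: float,
                 relative_margin: float = 0.5,   # assumption: "+50%" is relative
                 window_s: float = 600.0):       # 10 min
        self.threshold = baseline_fp_rate * (1.0 + relative_margin)
        self.window_s = window_s
        self._breach_started: Optional[float] = None

    def observe(self, now_s: float, fp_rate: float) -> bool:
        """Record one FP-rate sample; return True if rollback should start."""
        if fp_rate <= self.threshold:
            # Any sample at or below the threshold resets the breach timer.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now_s
        return now_s - self._breach_started >= self.window_s
```

Resetting on any healthy sample keeps a brief spike from causing a rollback, at the cost of reacting more slowly to an FP rate that oscillates around the threshold.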
FR-RISK-02 — Model drift
Attacker tactics evolve; model silently loses recall without obvious incident.
Mitigation. Weekly F1 vs. baseline drift monitoring; alert on > 5% drop; quarterly retraining + on-demand retraining on drift alert; A/B shadow of candidate models before switchover; model cards with limitations documented.
Residual. Medium.
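The weekly drift check above amounts to comparing the current F1 against the baseline. The sketch below assumes the "> 5% drop" is relative to the baseline F1 rather than 5 absolute points; the register does not say which. Function names are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def drift_alert(baseline_f1: float, current_f1: float,
                max_relative_drop: float = 0.05) -> bool:
    """True when F1 has fallen by more than max_relative_drop of the baseline.

    Assumption: the 5% threshold is relative to the baseline value.
    """
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return (baseline_f1 - current_f1) / baseline_f1 > max_relative_drop
```

Under this reading, a fall from 0.80 to 0.75 is a 6.25% relative drop and would alert, while a fall to 0.78 would not.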
FR-RISK-03 — Feedback-loop poisoning
Insider / compromised account labels fraudulent traffic as legitimate.
Mitigation. Feedback API role-restricted to T&S staff with auditable identity; feedback weighted lower in training than automatic labelling; training pipeline rejects labels from any single account or IP that contributes > 5% of labels in a week; weekly human review of label-distribution trends.
Residual. Low.
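The label-concentration guard in the mitigation above could look like the following sketch. It assumes one week of labels is available as `(source_id, label)` pairs, where `source_id` is an account or an IP; that data shape and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def over_concentrated_sources(labels: Iterable[Tuple[str, str]],
                              max_share: float = 0.05) -> Set[str]:
    """Return sources that contributed more than max_share of one week's labels.

    labels: (source_id, label) pairs; source_id may be an account or an IP.
    """
    counts = Counter(source for source, _ in labels)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {src for src, n in counts.items() if n / total > max_share}
```

A training pipeline would then drop, or route to human review, every label whose source appears in the returned set.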
FR-RISK-04 — ML fairness (MNO / tenant bias)
Model trained on corpus under-representing legitimate traffic from one MNO.
Mitigation. Balanced training corpus (per-MNO sampling floor); fairness audit in CI (fail if per-MNO disparate recall > 15%); post-launch per-MNO block-rate monitoring; model-card documentation.
Residual. Medium.
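A CI fairness gate of the kind described above might be sketched as follows. The register does not define "disparate recall"; this example assumes it is the gap between the highest and lowest per-MNO recall, measured in absolute terms. The input shape and names are hypothetical.

```python
from typing import Dict, Tuple


def recall_gap(per_mno: Dict[str, Tuple[int, int]]) -> float:
    """Gap between best and worst per-MNO recall.

    per_mno maps an MNO name to (true_positives, false_negatives).
    """
    recalls = []
    for mno, (tp, fn) in per_mno.items():
        if tp + fn == 0:
            raise ValueError(f"no positive examples for {mno}")
        recalls.append(tp / (tp + fn))
    if not recalls:
        raise ValueError("no MNOs supplied")
    return max(recalls) - min(recalls)


def fairness_gate(per_mno: Dict[str, Tuple[int, int]],
                  max_gap: float = 0.15) -> bool:
    """True if the build passes; assumption: 15% is an absolute recall gap."""
    return recall_gap(per_mno) <= max_gap
```

Raising on an MNO with no positive examples is deliberate: a silent skip would hide exactly the under-representation this risk describes.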
FR-RISK-05 — Training-data PII leakage via model weights
Extraction-style attacks on model weights could reconstruct MSISDNs.
Mitigation. MSISDN hashed before any training data use; differential-privacy-style noise in feature engineering where practical; model weights not exposed outside trusted inference infrastructure (Triton inside mesh); model registry access is mTLS + role-restricted.
Residual. Low.
FR-RISK-06 — Adversarial evasion
Attacker crafts input variations that slip past the model.
Mitigation. Adversarial corpus (500+ crafted examples per category) tested in CI; quarterly red-team exercise adding new adversarial patterns; defence-in-depth with rule-based fallback that catches simpler patterns the ML model might miss.
Residual. Medium — adversarial ML is an arms race.
FR-RISK-07 — MISP feed poisoning
External feed delivers malicious IOCs (e.g., legitimate IP ranges labelled as SIM-box).
Mitigation. Feed source on whitelist; feed-import applies rate-limit + anomaly detection; dual-source corroboration for high-impact entries; ability to roll back a feed import.
Residual. Low.
FR-RISK-08 — Signal storm
Adversarial traffic floods signals to bias training.
Mitigation. Per-source / per-tenant rate-limit on signal ingest; training outlier removal (> 3σ); weekly human review of high-volume sources; signal deduplication by content hash.
Residual. Low.
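Two of the mitigations above, deduplication by content hash and > 3σ outlier removal, can be sketched with the standard library. The choice of SHA-256, and the idea that the 3σ filter runs over a single numeric value per record (for example per-source signal volume), are assumptions; the register names neither.

```python
import hashlib
import statistics
from typing import List, Sequence


def dedupe_by_content_hash(payloads: Sequence[bytes]) -> List[bytes]:
    """Keep the first occurrence of each payload, keyed by SHA-256 digest."""
    seen = set()
    unique = []
    for payload in payloads:
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(payload)
    return unique


def drop_outliers(values: Sequence[float], max_sigma: float = 3.0) -> List[float]:
    """Drop values more than max_sigma population standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= max_sigma * sd]
```

A large enough storm shifts the mean and standard deviation themselves, which is why the register pairs this filter with ingest rate-limits and human review instead of relying on it alone.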
FR-RISK-09 — Model-artifact tampering
Compromised registry replaces legitimate artifact with a malicious one.
Mitigation. S3 object-lock + versioning on model bucket; checksum verified at upload and at deploy; model deploy requires dual-control; cross-region replicated bucket; quarterly backup-integrity drill.
Residual. Low.
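The "checksum verified at upload and at deploy" step above can be illustrated with a short sketch. SHA-256 is an assumed choice of checksum, and the function names are hypothetical; the expected digest is taken to be recorded at upload time somewhere the registry cannot rewrite.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """Compare the artifact's digest with the digest recorded at upload.

    hmac.compare_digest gives a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())
```

At deploy time the pipeline would refuse to load any artifact for which `verify_artifact` returns False.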
FR-RISK-10 — Triton GPU scarcity
Burst fraud-detection load exceeds GPU capacity.
Mitigation. HPA based on inference-queue depth; CPU fallback for low-confidence tier (batched); capacity plan with 50% headroom; GPU fleet multi-region.
Residual. Low.
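Assuming HPA here means the Kubernetes Horizontal Pod Autoscaler, its documented core rule is desired = ceil(current replicas × current metric / target metric). The sketch below applies that rule to inference-queue depth per pod; it omits the autoscaler's tolerance band and stabilization window, and the metric name and replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     queue_depth_per_pod: float,
                     target_depth_per_pod: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count from the HPA ratio rule, clamped to [min, max].

    Simplified: no tolerance band and no scale-down stabilization.
    """
    if target_depth_per_pod <= 0:
        raise ValueError("target depth must be positive")
    ratio = queue_depth_per_pod / target_depth_per_pod
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

The clamp at `max_replicas` is where GPU scarcity shows up: once the ceiling is hit, queue depth keeps growing, which is the case the CPU fallback and the 50% headroom are meant to cover.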
FR-RISK-11 — GDPR subject-access
A data subject asks what data fraud-intel holds about them.
Mitigation. Only MSISDN hash retained (pre-computed); response format drafted by Legal; 30-d SLA; deterministic re-hash allows response without retaining raw MSISDN.
Residual. Low.
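The deterministic re-hash described above lets a subject-access request be answered by hashing the requester's MSISDN again and looking up the result. The sketch below assumes a keyed hash (HMAC-SHA256) and digit-only normalisation; the register says only that the hash is deterministic, so both are assumptions. The in-memory `store` dictionary stands in for whatever signal store is actually used.

```python
import hashlib
import hmac
from typing import Dict, List


def msisdn_hash(msisdn: str, key: bytes) -> str:
    """Deterministic keyed hash of an MSISDN after stripping non-digits."""
    normalised = "".join(ch for ch in msisdn if ch.isdigit())
    return hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


def subject_access_lookup(msisdn: str, key: bytes,
                          store: Dict[str, List[str]]) -> List[str]:
    """Return signal records held under the hash of the requester's MSISDN."""
    return store.get(msisdn_hash(msisdn, key), [])
```

A keyed hash is used in the sketch because the MSISDN space is small enough that an unkeyed hash can be reversed by enumeration.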
FR-RISK-12 — Fail-open delays detection
A service outage delays fraud detection for up to the duration of the outage.
Mitigation. Fail-open is intentional (service is informational); downstream consumers (firewall) have rule-based fallback that catches common patterns; re-scan on recovery replays backlog signals.
Residual. Low — accepted posture.
FR-RISK-13 — Downstream coupling
A model update shifts signal patterns in a way that the firewall interprets as a surge of fraud, escalating the BLOCK rate.
Mitigation. Shadow mode + gradual enablement; downstream consumers subscribe to fraud.model.deployed.v1 event and apply conservative thresholds for 24 h post-deploy; observation-mode dashboards track per-model signal emission vs. downstream action.
Residual. Medium.
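On the consumer side, the 24 h conservative-threshold rule above could be sketched as below. The consumer is assumed to record the timestamp of the last fraud.model.deployed.v1 event it received; the two threshold values are purely illustrative and do not come from the register.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def block_threshold(now: datetime,
                    last_deploy: Optional[datetime],
                    normal: float = 0.90,          # illustrative value
                    conservative: float = 0.97,    # illustrative value
                    window: timedelta = timedelta(hours=24)) -> float:
    """Confidence required before a signal may trigger a BLOCK.

    Uses the stricter threshold for `window` after the last model deploy.
    """
    if last_deploy is not None and timedelta(0) <= now - last_deploy < window:
        return conservative
    return normal
```

Raising the required confidence, instead of suspending BLOCKs outright, keeps the highest-confidence detections active during the observation period.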
FR-RISK-14 — Feed-partner dispute
Cross-platform partner disputes an IOC Ghasi shared.
Mitigation. Per-IOC provenance (who contributed, confidence, source signals); conflict-resolution via Regulator Liaison / partner liaison; removal workflow with audit.
Residual. Low.
FR-RISK-15 — Stale features
Fast-moving attack signatures (e.g., AIT spike in 60 s) aren't caught because features are batched hourly.
Mitigation. Real-time streaming feature computation for high-signal categories (AIT): features updated every 10 s via NATS stream processing; tunable latency/accuracy trade-off.
Residual. Medium — real-time features have cost / complexity implications.
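A streaming feature refreshed every 10 s can be approximated with a bucketed sliding window. The sketch below keeps a 60 s event count in 10 s buckets; it is independent of NATS, assumes events arrive in non-decreasing timestamp order, and uses hypothetical names and window sizes.

```python
from collections import deque


class SlidingCount:
    """Event count over the last window_s seconds, held in bucket_s-wide buckets."""

    def __init__(self, window_s: int = 60, bucket_s: int = 10):
        self.bucket_s = bucket_s
        self.n_buckets = window_s // bucket_s
        self.buckets = deque()  # items are [bucket_index, count]

    def _evict(self, current_idx: int) -> None:
        # Drop buckets that have fallen out of the window.
        while self.buckets and self.buckets[0][0] <= current_idx - self.n_buckets:
            self.buckets.popleft()

    def add(self, ts_s: float, n: int = 1) -> None:
        """Record n events at time ts_s (timestamps must not go backwards)."""
        idx = int(ts_s // self.bucket_s)
        if self.buckets and self.buckets[-1][0] == idx:
            self.buckets[-1][1] += n
        else:
            self.buckets.append([idx, n])
        self._evict(idx)

    def value(self, now_s: float) -> int:
        """Current windowed count as of now_s."""
        self._evict(int(now_s // self.bucket_s))
        return sum(count for _, count in self.buckets)
```

Bucket width is the tunable latency/accuracy trade-off mentioned above: narrower buckets track a spike more closely but cost more state and more frequent updates.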
3. Residual-Risk Summary
| Residual | Count | Acceptance |
|---|---|---|
| Low | 9 | Accepted for GA |
| Medium | 6 | Accepted with mitigation commitments and named owners |
| High | 0 | — |
4. Risk Review Cadence
- Weekly during development.
- Monthly post-GA (T&S + ML Ops + Security + SRE).
- Quarterly (model cards, fairness audit, regulator liaison).