SERVICE_RISK_REGISTER — bff-backoffice-service
Sibling: SERVICE_READINESS · FAILURE_MODES · SECURITY_MODEL
Living register. Quarterly review by Frontend Platform tech lead + Desktop Platform tech lead + SRE + Security.
Severity (L×I column = likelihood × impact): L low, M medium, H high, C critical.
Status: open, monitored, accepted, closed.
1. Strategic risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-S-1 | Desktop fleet drift across versions causes feature-flag matrix explosion | H×M | App-version floor; auto-update channel; release coordination calendar | FE Platform + Desktop | monitored | quarterly |
| R-S-2 | Hostile network (corporate proxy strips SSE) creates uneven UX | M×M | Polling fallback; per-device transport preference; tested in chaos drill | FE Platform | monitored | quarterly |
| R-S-3 | Tenant onboarding requires per-tenant Cloud Armor tuning | M×M | Operator-cohort baseline traffic profile; tenant-specific overrides | SRE | open | quarterly |
| R-S-4 | Sharia-compliance regression in AI suggestions surface | L×H | complianceProfile propagation tested; AI surfaces filtered at orchestrator; refusal contract enforced; quarterly review | Compliance + FE | monitored | quarterly |
| R-S-5 | Phase 2 mobile backoffice introduces split-brain risk | M×M | Mobile, if introduced, gets its own BFF or constrained subset; design review required | Architecture | open | annual |
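
The R-S-2 mitigation (polling fallback plus a per-device transport preference) could look roughly like the sketch below. It is illustrative only: `Transport`, `choose_transport`, and the failure threshold of 3 are assumptions, not taken from the service.

```python
from enum import Enum
from typing import Optional


class Transport(Enum):
    SSE = "sse"
    POLLING = "polling"


# Hypothetical threshold: consecutive SSE failures tolerated before falling back.
SSE_FAILURE_THRESHOLD = 3


def choose_transport(preference: Optional[Transport], consecutive_sse_failures: int) -> Transport:
    """Pick the transport for one device.

    A stored per-device preference wins. Without one, SSE is used until it has
    failed SSE_FAILURE_THRESHOLD times in a row, after which polling takes over.
    """
    if preference is not None:
        return preference
    if consecutive_sse_failures >= SSE_FAILURE_THRESHOLD:
        return Transport.POLLING
    return Transport.SSE
```

A device behind a proxy that strips SSE would reach the threshold quickly and settle on polling; persisting that outcome as its preference would avoid re-probing on every start.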
2. Performance & reliability risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-P-1 | Dashboard composer p95 unbounded if a single upstream regresses | H×M | Per-widget deadlines + skeletons; per-upstream SLO tracked | FE Platform | monitored | quarterly |
| R-P-2 | Memorystore session-tier eviction loses sessions during scale event | L×H | noeviction policy; alarm; auto-scale to 6 GiB during peak | SRE | monitored | quarterly |
| R-P-3 | SSE active-connection storm on coordinated reconnect (e.g., post-incident) | M×H | Per-instance conn cap; pre-stop drain; gradual reconnect jitter on desktop side | FE Platform | monitored | quarterly |
| R-P-4 | Sync handshake bottleneck during fleet-wide reconnect | M×H | sync-service rate-limit + jitter; BFF cursor cache absorbs reads; chaos drill | SRE + Sync | monitored | quarterly |
| R-P-5 | Cloud SQL HA failover > 60 s during peak | L×M | DR drill verifies; idempotency keys absorb retries; trended monthly | SRE | accepted | annual |
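
The per-widget deadline mitigation in R-P-1 can be sketched as follows: each widget is fetched concurrently under its own timeout, and a widget whose upstream misses the deadline is replaced by a skeleton. All names here (`load_widget`, `compose_dashboard`, `SKELETON`, the stub upstreams) and the deadline values are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Fetch = Callable[[], Awaitable[dict]]

# Hypothetical placeholder the desktop client would render as a skeleton.
SKELETON = {"state": "skeleton"}


async def load_widget(fetch: Fetch, deadline_s: float) -> dict:
    """Return the upstream payload, or a skeleton if the deadline is missed."""
    try:
        return await asyncio.wait_for(fetch(), timeout=deadline_s)
    except asyncio.TimeoutError:
        return dict(SKELETON)


async def compose_dashboard(widgets: Dict[str, Tuple[Fetch, float]]) -> Dict[str, dict]:
    """Load all widgets concurrently, each under its own deadline."""
    names = list(widgets)
    payloads = await asyncio.gather(
        *(load_widget(fetch, deadline_s) for fetch, deadline_s in widgets.values())
    )
    return dict(zip(names, payloads))


# Hypothetical upstream stubs for illustration.
async def fast_upstream() -> dict:
    return {"state": "ok", "rows": 3}


async def slow_upstream() -> dict:
    await asyncio.sleep(1.0)  # simulates a regressed upstream
    return {"state": "ok", "rows": 9}
```

With this shape, dashboard latency is bounded by the largest widget deadline rather than by the slowest upstream, which is the property the risk row is after.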
3. Security risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-Sec-1 | Device key extraction from OS keychain (rooted device, malware) | L×C | OS keychain isolation; per-device revocation via iam-service; audit alarm on revocation; MFA on sensitive actions | Security + Desktop | monitored | quarterly |
| R-Sec-2 | DPoP replay bypass in BFF | L×C | Replay cache (Memorystore single-use jti); DPoP fuzz suite; pen test | Security + FE | monitored | quarterly |
| R-Sec-3 | MFA bypass on lock revocation | L×C | MFA attestation single-use; consume-then-call ordering; audit; alerts | Security + FE | monitored | quarterly |
| R-Sec-4 | Insider lock-revoke abuse | M×H | MFA gate; full audit; activity-ledger anomaly detector; quarterly review | Security + Compliance | monitored | quarterly |
| R-Sec-5 | Cross-tenant cache leak | L×C | Tenant-scoped keys; nightly synthetic probe; tenant-isolation suite | FE Platform | monitored | quarterly |
| R-Sec-6 | Insider folio adjustment fraud | M×M | MFA gate above threshold; audit; nightly reconciliation | Compliance + FE | monitored | quarterly |
| R-Sec-7 | Force-logout latency too high (revoked operator continues acting) | L×H | Refresh-time backstop; SSE channel monitored; e2e latency tested | Security + FE | monitored | quarterly |
| R-Sec-8 | Audit log gap (event published but ledger row missing) | L×H | Reconciliation job (lock_audit_completeness alert); halt lock proxy on gap | Security + SRE | monitored | quarterly |
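
R-Sec-2 and R-Sec-3 both rest on a single-use primitive: a key may be claimed once within its lifetime. The sketch below uses an in-memory dictionary as a stand-in for the shared cache; on a Redis-compatible store the same claim would typically be one atomic `SET key value NX EX ttl`. `SingleUseStore`, the key prefixes, and the TTL values are assumptions for illustration. The tenant id in the key reflects the tenant-scoped keys listed under R-Sec-5.

```python
import time
from typing import Callable, Dict


class SingleUseStore:
    """In-memory stand-in for a shared cache offering set-if-absent with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def claim(self, key: str, ttl_s: float) -> bool:
        """Return True the first time a key is seen within its TTL, else False."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # already claimed: treat as a replay
        self._expiry[key] = now + ttl_s
        return True


def accept_dpop_proof(store: SingleUseStore, tenant_id: str, jti: str, ttl_s: float = 300.0) -> bool:
    """Reject a DPoP proof whose jti was already seen for this tenant."""
    return store.claim(f"dpop:{tenant_id}:{jti}", ttl_s)


def revoke_lock(store: SingleUseStore, attestation_id: str, call_upstream: Callable[[], str]) -> str:
    """Consume-then-call: burn the MFA attestation before the upstream call."""
    if not store.claim(f"mfa:{attestation_id}", ttl_s=600.0):
        raise PermissionError("MFA attestation already consumed")
    return call_upstream()
```

Consuming before calling means a failed upstream call costs the operator a fresh MFA prompt, but a retried or replayed request can never reuse the attestation.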
4. Compliance & data risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-C-1 | Data subject request (DSR) for operator returns inconsistent results | M×M | tenant-service orchestrates DSR; this BFF documented as ephemeral mirror | Data Steward | monitored | quarterly |
| R-C-2 | EU operator data residency (Memorystore in asia-south1) | M×M | Phase 2 region affinity; operator data short-lived; legal review confirmed | Legal | accepted | annual |
| R-C-3 | Audit log export to BigQuery delayed | L×M | Daily export; nightly reconciliation; SRE alert on lag | SRE | monitored | quarterly |
| R-C-4 | Notes carry inadvertent PII | M×M | Truncated to 200 chars in cold mirror; periodic synthetic PII probe | Data Steward + FE | monitored | quarterly |
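
The two R-C-4 mitigations (truncation to 200 characters and a periodic PII probe) might be sketched as below. The 200-character limit comes from the register; the function names and the two regex patterns are hypothetical and deliberately simple, not a vetted PII detector.

```python
import re
from typing import List

NOTE_MAX_CHARS = 200  # limit stated in R-C-4

# Hypothetical patterns for illustration; a real probe needs a reviewed set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "long_digit_run": re.compile(r"\d{9,}"),
}


def truncate_note(note: str) -> str:
    """Cap a note at NOTE_MAX_CHARS before it is written to the cold mirror."""
    return note[:NOTE_MAX_CHARS]


def probe_pii(note: str) -> List[str]:
    """Return the names of the patterns that match the note, sorted."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(note))
```

Truncation only limits how much text is retained; PII in the first 200 characters survives it, which is why the probe is listed as a separate control.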
5. Operational risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-O-1 | Bus-factor 1 on lock-action audit code path | M×H | Pair-on-call; runbook completeness review | Eng Manager | monitored | annual |
| R-O-2 | Schema drift from upstream service released without contract test | L×H | Pact verification gate; OpenAPI diff gate; SCHEMA_DRIFT alert | Platform Eng | monitored | quarterly |
| R-O-3 | Force-logout drills are rare, so a latency regression could go undetected | M×M | Quarterly chaos drill; SLO tracked monthly | SRE | monitored | quarterly |
| R-O-4 | Auto-update server outage prevents desktop fleet from upgrading | M×M | Independent monitoring; secondary mirror; runbook | Desktop + SRE | monitored | quarterly |
| R-O-5 | App-version-floor change breaks legitimate clients | L×H | Two-step release; advisory before enforcement; rollback runbook; coordination calendar | FE + Desktop | monitored | quarterly |
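
One way to read the two-step release in R-O-5 is as two floors: an advisory floor that only warns, and an enforced floor that blocks. The sketch below assumes dotted numeric versions with the same number of components on both sides; `check_app_version` and the result strings are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '4.10.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def check_app_version(client: str, advisory_floor: str, enforced_floor: str) -> str:
    """Classify a client version as 'block', 'advise', or 'ok'."""
    version = parse_version(client)
    if version < parse_version(enforced_floor):
        return "block"  # below the enforced floor: reject the client
    if version < parse_version(advisory_floor):
        return "advise"  # still allowed, but prompt the operator to update
    return "ok"
```

Under this reading, step one raises only the advisory floor, step two raises the enforced floor to match once the fleet has caught up, and rollback lowers the enforced floor again.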
6. Cost risks
| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-Cost-1 | SSE long-running connections drive instance count | M×M | Per-device 1-conn cap; idle timeout; cost dashboard | SRE | monitored | quarterly |
| R-Cost-2 | Pub/Sub volume from telemetry exceeds budget | M×M | Sample rates per event; cost alarm | SRE | monitored | quarterly |
| R-Cost-3 | Activity ledger storage growth | L×M | 90 d retention + BigQuery export; daily archival | SRE | monitored | quarterly |
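
Per-event sample rates (R-Cost-2) can be applied deterministically by hashing the event identity into a bucket, so the same event always gets the same decision. This is a sketch under assumptions: the event names, the rates, and `should_publish` are hypothetical.

```python
import hashlib

# Hypothetical per-event sample rates; 1.0 means never sampled out.
SAMPLE_RATES = {"widget_rendered": 0.05, "lock_revoked": 1.0}
DEFAULT_RATE = 0.1


def should_publish(event_name: str, event_id: str) -> bool:
    """Decide whether to publish an event, based on a stable hash bucket."""
    rate = SAMPLE_RATES.get(event_name, DEFAULT_RATE)
    digest = hashlib.sha256(f"{event_name}:{event_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate
```

Hash-based sampling keeps retries of the same event consistent, and lowering a rate in `SAMPLE_RATES` reduces volume without touching the publishing code.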
7. Risk acceptance log
| ID | Date accepted | Accepted by | Reason | Re-evaluation |
|---|---|---|---|---|
| R-P-5 | 2026-04-15 | SRE | HA failover within SLA on every drill in last 12 months | 2027-04-15 |
| R-C-2 | 2026-04-15 | Legal | Operator data short-lived; Phase 2 will add region affinity | 2027-04-15 |
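
The acceptance log and the Status columns can drift apart between reviews. A small consistency check, sketched below, would flag accepted risks missing from the log, log entries whose risk is no longer marked accepted, and re-evaluations that have come due. The function name and message wording are hypothetical; the data shapes are assumed to be parsed from the tables above.

```python
from datetime import date
from typing import Dict, List


def acceptance_issues(statuses: Dict[str, str], reevaluate_on: Dict[str, date], today: date) -> List[str]:
    """Cross-check risk statuses against the acceptance log."""
    issues: List[str] = []
    accepted = {risk_id for risk_id, status in statuses.items() if status == "accepted"}
    for risk_id in sorted(accepted):
        due = reevaluate_on.get(risk_id)
        if due is None:
            issues.append(f"{risk_id}: accepted but missing from acceptance log")
        elif due <= today:
            issues.append(f"{risk_id}: re-evaluation due {due.isoformat()}")
    for risk_id in sorted(set(reevaluate_on) - accepted):
        issues.append(f"{risk_id}: in acceptance log but not marked accepted")
    return issues
```

Run against the current register (R-P-5 and R-C-2 accepted, both due 2027-04-15), the check returns no issues for any date before the re-evaluation date.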
8. Review cadence
- Quarterly: Frontend Platform + Desktop Platform + SRE + Security.
- Per major release: any touched row re-rated.
- Per incident: post-mortem owners audit register.