SERVICE_RISK_REGISTER — reporting-service
Sibling: FAILURE_MODES · SERVICE_READINESS · MIGRATION_PLAN
Risks are scored on Likelihood (L) and Impact (I), each on a 1-5 scale; Score = L × I. Every risk has a named owner and a recorded last-review date. The register is reviewed each sprint and after every P1.
1. Active risks
| ID | Risk | L | I | Score | Owner | Mitigation in flight | Trigger / Detection | Last review |
|---|---|---|---|---|---|---|---|---|
| R-REP-001 | Regulatory submission missed past statutory cutoff | 2 | 5 | 10 | Reporting TL | Cron monitor + P1 alert; manual escalation runbook; per-jurisdiction grace window pre-cutoff alert | next_attempt_at < cutoff query | 2026-04-15 |
| R-REP-002 | Cross-tenant data exposure in rendered PDF (filter bug) | 1 | 5 | 5 | Security guild | RLS + integration tests + property-based filter tests; renderer asserts every rendered row matches tenantId | Integration test + access-denied audit spike | 2026-04-15 |
| R-REP-003 | BigQuery slot exhaustion at month-end | 4 | 3 | 12 | Platform SRE | Reservation pricing for analytics; per-template rowCap; smoothing of schedules across the hour; opportunistic queue draining | reporting.upstream_analytics_timeout rate spike | 2026-04-12 |
| R-REP-004 | Renderer Chromium CVE forces emergency upgrade | 3 | 3 | 9 | Reporting TL | Pinned digest image rebuilt nightly; CVE feed subscription; canary on revision before promote | Security advisory + image scanner | 2026-04-08 |
| R-REP-005 | Template versioning conflict during scheduled run | 2 | 4 | 8 | Reporting TL | Snapshot template version at run-start; runs carry templateVersionId; never mutate published versions | Saga test + production audit | 2026-04-15 |
| R-REP-006 | Subscription email exfiltration to attacker domain | 2 | 4 | 8 | Security guild | Per-tenant allow-list of recipient domains; verification step on first email; admin approval for new domains | Recipient domain audit | 2026-04-08 |
| R-REP-007 | OOM during render of pathological dataset | 3 | 3 | 9 | Reporting TL | Streaming + pagination; per-template row caps; Puppeteer page-count cap; pod resource limits | OOM kill metric | 2026-04-15 |
| R-REP-008 | AI hallucination in callout misleads operator | 2 | 4 | 8 | AI guild + Reporting TL | "AI-generated, review before sharing" badge; HITL for drafted templates; callouts must cite fact ids physically present | User report + sample audit | 2026-04-08 |
| R-REP-009 | Cloud Scheduler regional outage misses fires | 2 | 4 | 8 | Platform SRE | Standby Scheduler in an alternate region within the same residency; backfill on recovery; alert on schedule drift | reporting_schedule_drift_seconds p95 | 2026-04-12 |
| R-REP-010 | Object lock misconfiguration allows premature delete | 1 | 5 | 5 | Platform SRE | IaC enforces lock; service account lacks delete; weekly drift scan | IaC drift alert | 2026-04-12 |
| R-REP-011 | Adapter credentials leaked via logs | 1 | 5 | 5 | Security guild | Secret Manager only; logger redactor; gitleaks CI; periodic adapter test in dev | Log scan | 2026-04-08 |
| R-REP-012 | Tenant deletion cascade leaves orphaned regulatory artifacts | 2 | 3 | 6 | Compliance | Anonymize-not-delete for regulatory bucket; explicit hold flag; legal review | Synthetic tenant deletion | 2026-04-15 |
| R-REP-013 | Locale rendering bug breaks RTL layout (Pashto/Arabic) | 3 | 3 | 9 | Frontend + Reporting | Per-locale golden tests; bidi font fallback; visual diff on PR | Golden snapshot diff | 2026-04-08 |
| R-REP-014 | Outbox publisher lag during Pub/Sub incident | 3 | 3 | 9 | Platform SRE | Bounded backlog; pod-side metric; auto-scale up; replay tools | reporting_outbox_lag_seconds | 2026-04-12 |
| R-REP-015 | Long-running Cloud SQL transactions cause lock contention | 3 | 3 | 9 | Reporting TL | Use cases keep tx short; offload heavy reads to read replicas; query timeout | pg_locks watch | 2026-04-15 |
| R-REP-016 | Sync engine pulls stale subscription causing missed delivery | 2 | 3 | 6 | Desktop platform | WebSocket nudge on report.completed.v1; pull on app foreground; signed cursor | Synthetic delivery | 2026-04-08 |
| R-REP-017 | Operator schedules collide and exceed worker pool | 3 | 2 | 6 | Reporting TL | Token-bucket per tenant; cluster-wide soft cap; MELMASTOON.REPORTING.SCHEDULE_RATE_LIMITED graceful back-off | Saturation alert | 2026-04-12 |
| R-REP-018 | Migration drift between regions during canary | 2 | 4 | 8 | Platform SRE | Forward-only additive migrations; deploy gate runs migration first | Cloud Deploy step status | 2026-04-08 |
| R-REP-019 | Excessive AI cost in a tenant burns shared budget | 3 | 2 | 6 | AI guild | Per-tenant budget caps; orchestrator returns BUDGET_EXHAUSTED; alert tenant.owner | Cost counter | 2026-04-08 |
| R-REP-020 | Data residency violation via wrong regional bucket | 1 | 5 | 5 | Platform SRE | IaC pins bucket per region; runtime asserts bucket residency vs tenant | Drift check | 2026-04-12 |
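R-REP-017 names a per-tenant token bucket as its mitigation. The following is a minimal sketch of that technique, not the service's actual implementation: the class names, the `rate` and `capacity` values, and the injectable `clock` are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so the first burst is allowed
        self.updated_at = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        # Credit tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Keeps one independent bucket per tenant (hypothetical helper)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[tenant_id] = bucket
        # A False result is where a caller might return
        # MELMASTOON.REPORTING.SCHEDULE_RATE_LIMITED and back off.
        return bucket.try_acquire()
```

Because each tenant has its own bucket, one tenant exhausting its allowance does not affect another. The cluster-wide soft cap listed alongside it in the register would need a separate shared limit, which this sketch does not cover.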
2. Closed risks
| ID | Risk | Closed when | Closure note |
|---|---|---|---|
| R-REP-C001 | Single-region deployment did not meet residency obligations | 2026-03 | Multi-region per-residency stack shipped; per-region buckets and CMEK |
3. Risk treatment matrix
| Score | Treatment |
|---|---|
| 15-25 | Mitigate now: dedicated workstream, weekly executive review |
| 9-14 | Mitigate next: roadmap item, sprint owner, monthly review |
| 4-8 | Monitor: documented mitigations, quarterly review |
| 1-3 | Accept with light review |
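The scoring rule (Score = L × I) and the bands above can be expressed as a small lookup. This is a sketch for illustration only; the function names and the short band labels are assumptions, not part of any existing tooling.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Score = L x I, with both inputs on the register's 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def treatment(score: int) -> str:
    """Map a score to its band in the risk treatment matrix."""
    if not 1 <= score <= 25:
        raise ValueError(f"score must be between 1 and 25, got {score}")
    if score >= 15:
        return "Mitigate now"
    if score >= 9:
        return "Mitigate next"
    if score >= 4:
        return "Monitor"
    return "Accept"
```

For example, R-REP-003 (L=4, I=3) scores 12 and falls in "Mitigate next", while R-REP-002 (L=1, I=5) scores 5 and falls in "Monitor".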
4. Heat map
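| | I=1 | I=2 | I=3 | I=4 | I=5 |
|---|---|---|---|---|---|
| L=5 | · | · | · | · | · |
| L=4 | · | · | R-003 | · | · |
| L=3 | · | R-017, R-019 | R-004, R-007, R-013, R-014, R-015 | · | · |
| L=2 | · | · | R-012, R-016 | R-005, R-006, R-008, R-009, R-018 | R-001 |
| L=1 | · | · | · | · | R-002, R-010, R-011, R-020 |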
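Each risk's position in the heat map follows directly from its L and I values in section 1, so the grid can be derived rather than maintained by hand. The sketch below assumes the short R-NNN labels used in the heat map; `RISKS`, `build_cells`, and `render_rows` are hypothetical names introduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (likelihood, impact) per risk, transcribed from the L and I columns of section 1.
RISKS: Dict[str, Tuple[int, int]] = {
    "R-001": (2, 5), "R-002": (1, 5), "R-003": (4, 3), "R-004": (3, 3),
    "R-005": (2, 4), "R-006": (2, 4), "R-007": (3, 3), "R-008": (2, 4),
    "R-009": (2, 4), "R-010": (1, 5), "R-011": (1, 5), "R-012": (2, 3),
    "R-013": (3, 3), "R-014": (3, 3), "R-015": (3, 3), "R-016": (2, 3),
    "R-017": (3, 2), "R-018": (2, 4), "R-019": (3, 2), "R-020": (1, 5),
}


def build_cells(risks: Dict[str, Tuple[int, int]]) -> Dict[Tuple[int, int], List[str]]:
    """Group risk IDs by their (likelihood, impact) cell."""
    cells: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for risk_id, position in risks.items():
        cells[position].append(risk_id)
    return {position: sorted(ids) for position, ids in cells.items()}


def render_rows(cells: Dict[Tuple[int, int], List[str]]) -> List[str]:
    """Render one pipe-table row per likelihood, highest first; empty cells show a dot."""
    rows = []
    for likelihood in range(5, 0, -1):
        columns = [
            ", ".join(cells.get((likelihood, impact), [])) or "·"
            for impact in range(1, 6)
        ]
        rows.append(f"| L={likelihood} | " + " | ".join(columns) + " |")
    return rows


if __name__ == "__main__":
    print("\n".join(render_rows(build_cells(RISKS))))
```

Regenerating the grid after each score change keeps the heat map consistent with the active-risks table.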
5. Review cadence & ownership
- Sprint review (every 2 weeks): Reporting TL chairs; revisits scores; closes resolved risks.
- Post-incident: any P1 triggers a same-day risk review; new risks are added and the likelihoods of existing risks are recalibrated.
- Quarterly compliance review: Compliance & legal sign off on R-REP-001, R-REP-006, R-REP-010, R-REP-012, R-REP-020.
Cross-references: FAILURE_MODES, SECURITY_MODEL, SERVICE_READINESS.