
SERVICE_RISK_REGISTER — analytics-service

Sibling: SERVICE_OVERVIEW · SECURITY_MODEL · FAILURE_MODES · SERVICE_READINESS

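Active risks are scored by Likelihood (L, 1–5) and Impact (I, 1–5): Score = L × I, so scores range from 1 to 25. Treatment classes: Accept · Mitigate · Transfer · Eliminate. The letter after "Mitigation" in each risk table marks the class applied (M = Mitigate, E = Eliminate).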


1. Top risks

R-ANL-001 — Cross-tenant data exposure via authorized view bypass

Field | Value
L × I | 2 × 5 = 10
Owner | Security squad lead
Trigger | Misconfigured view, leaked tenant principal, or SQL injection bypassing param binding
Detection | Tenant-isolation integration test; daily access-binding reconciliation; security audit logs
Mitigation (M) | Three-layer isolation (JWT, UDF, authorized view); param-only saved queries; SQL lint forbids tenant_id overrides; reconciliation job vs iam-service
Residual | Low; periodic red-team exercise
Review | Quarterly
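
The daily access-binding reconciliation against iam-service can be pictured as a set comparison. The sketch below is illustrative only: the (principal, tenant_id) pair shape, the function name, and the choice of iam-service as the source of truth are assumptions, not the service's actual job.

```python
def reconcile_bindings(
    view_bindings: set[tuple[str, str]],
    iam_bindings: set[tuple[str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Compare (principal, tenant_id) pairs granted on authorized views
    with the pairs iam-service says should exist (assumed source of truth)."""
    return {
        # Granted on a view but unknown to iam-service: possible exposure.
        "unexpected": sorted(view_bindings - iam_bindings),
        # Known to iam-service but not granted: access is broken, not leaked.
        "missing": sorted(iam_bindings - view_bindings),
    }
```

Any entry under "unexpected" is the case this risk is concerned with and would normally be revoked and investigated rather than only reported.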

R-ANL-002 — BigQuery cost runaway from abusive widgets

Field | Value
L × I | 4 × 4 = 16
Owner | Finance + SRE
Trigger | Author publishes wide-filter widget; backfill triggered by misconfig
Detection | analytics.budget.bytes_used_ratio gauge; cost anomaly alert
Mitigation (M) | Per-query byte cap; per-tenant daily budget; auto-pause snapshot generators at 100%; reservation autoscale ceiling; pre-flight dry-run on save
Residual | Medium during launches
Review | Monthly
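
To show how the per-query byte cap, the per-tenant daily budget, and the 100% auto-pause could interact, here is a minimal sketch. All names are hypothetical, and it assumes the byte estimate comes from the pre-flight dry-run and that the ratio matches what the analytics.budget.bytes_used_ratio gauge reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetDecision:
    allow_query: bool
    pause_snapshot_generators: bool
    bytes_used_ratio: float
    reason: str


def check_budget(
    estimated_bytes: int,
    bytes_used_today: int,
    per_query_cap_bytes: int,
    daily_budget_bytes: int,
) -> BudgetDecision:
    """Gate one query for one tenant (hypothetical helper)."""
    if daily_budget_bytes <= 0:
        raise ValueError("daily_budget_bytes must be positive")
    ratio = bytes_used_today / daily_budget_bytes
    if ratio >= 1.0:
        # Budget fully used: block the query and pause snapshot generators.
        return BudgetDecision(False, True, ratio, "daily budget exhausted")
    if estimated_bytes > per_query_cap_bytes:
        # Dry-run estimate is above the per-query cap: reject at save time.
        return BudgetDecision(False, False, ratio, "per-query byte cap exceeded")
    return BudgetDecision(True, False, ratio, "ok")
```

The reservation autoscale ceiling is not modelled here; it limits capacity rather than bytes scanned and would sit outside this check.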

R-ANL-003 — Schema drift breaks dashboards & metrics

Field | Value
L × I | 3 × 4 = 12
Owner | Analytics platform lead
Trigger | Producer service ships v2 event without coexistence; curated table altered breaking publishers
Detection | DQ schema-drift check; CI schema-drift gate; consumer test failures
Mitigation (M) | _schema_version pin; v1/v2 coexistence; event-contract CI; curated DDL via Terraform with two-phase rename
Residual | Low
Review | Quarterly
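
A schema-drift check of the kind listed under Detection could compare a pinned schema with the observed one. The sketch below is an assumption about how such a check might look: schemas are modelled as plain column-to-type mappings, and treating added columns as non-breaking is an illustrative policy, not one stated in this register.

```python
def schema_drift(pinned: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Diff two column -> type mappings (hypothetical representation)."""
    shared = set(pinned) & set(observed)
    return {
        "missing": sorted(set(pinned) - set(observed)),
        "added": sorted(set(observed) - set(pinned)),
        "retyped": sorted(c for c in shared if pinned[c] != observed[c]),
    }


def is_breaking(drift: dict[str, list[str]]) -> bool:
    """Assumed policy: dropped or retyped columns break consumers,
    purely additive changes do not."""
    return bool(drift["missing"] or drift["retyped"])
```

A CI gate could fail the build when is_breaking returns True, while a DQ job could run the same diff against the live table.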

R-ANL-004 — Critical metric staleness (occupancy / RevPAR)

Field | Value
L × I | 3 × 4 = 12
Owner | Analytics squad lead
Trigger | Pub/Sub backlog, ETL failure, BigQuery slot exhaustion
Detection | CriticalMetricStale SLO alert (P1)
Mitigation (M) | High-frequency cadence (5 min) for critical metrics; dedicated reservation; automatic rerun on transient failures; freshness DQ check
Residual | Low
Review | Monthly
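
The freshness check can be reduced to comparing the age of the last refresh with the 5-minute cadence. In this sketch the tolerance of two missed runs is an assumed threshold, and the function name is hypothetical; the real CriticalMetricStale condition may be defined differently.

```python
from datetime import datetime, timedelta, timezone

CADENCE = timedelta(minutes=5)  # cadence for critical metrics, from the table above
ALLOWED_MISSED_RUNS = 2         # assumption: tolerate up to two cadences of lag


def is_stale(
    last_refreshed_at: datetime,
    now: datetime,
    cadence: timedelta = CADENCE,
    allowed_missed_runs: int = ALLOWED_MISSED_RUNS,
) -> bool:
    """Return True when the metric is older than the tolerated lag."""
    if last_refreshed_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return now - last_refreshed_at > cadence * allowed_missed_runs
```

For example, with the defaults a metric last refreshed 11 minutes ago is reported stale, while one refreshed 9 minutes ago is not.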

R-ANL-005 — Forecast writeback corrupts curated rows

Field | Value
L × I | 2 × 5 = 10
Owner | AI squad + analytics lead
Trigger | Orchestrator ships malformed batch; tenant mismatch; model-version regression
Detection | Per-row tenant + schema validation; ForecastWritebackFail alert; downstream DQ checks
Mitigation (M) | Strict validator; partial-batch error map; MELMASTOON.ANALYTICS.FORECAST_INVALID_* events; idempotent MERGE keys
Residual | Low
Review | Quarterly
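
Per-row validation with a partial-batch error map might look roughly like the following. The field list and error strings are hypothetical stand-ins for the real forecast contract; only the idea (validate each row, keep the good ones, map bad row indexes to reasons) comes from the table above.

```python
from typing import Any

# Hypothetical forecast row contract; the real schema is not given here.
REQUIRED_FIELDS: dict[str, type] = {
    "tenant_id": str,
    "metric": str,
    "forecast_date": str,
    "value": float,
    "model_version": str,
}


def validate_batch(
    rows: list[dict[str, Any]], expected_tenant_id: str
) -> tuple[list[dict[str, Any]], dict[int, list[str]]]:
    """Split a batch into valid rows and an index -> problems error map."""
    valid: list[dict[str, Any]] = []
    errors: dict[int, list[str]] = {}
    for index, row in enumerate(rows):
        problems: list[str] = []
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in row:
                problems.append(f"missing field: {field}")
            elif not isinstance(row[field], expected_type):
                problems.append(f"wrong type for {field}")
        if "tenant_id" in row and row["tenant_id"] != expected_tenant_id:
            problems.append("tenant mismatch")
        if problems:
            errors[index] = problems
        else:
            valid.append(row)
    return valid, errors
```

Only the valid rows would go on to the idempotent MERGE; the error map is what a caller could return to the orchestrator or turn into invalid-forecast events.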

R-ANL-006 — Looker Studio embed token misuse

Field | Value
L × I | 2 × 4 = 8
Owner | Security + analytics
Trigger | Stolen token, forgotten revocation
Detection | Per-tenant token issuance metric; binding reconciliation
Mitigation (M) | KMS-signed JWT, ≤ 60 min TTL; binding-revocation immediate; embed re-validates per page load; audit looker.token_issued
Residual | Low
Review | Quarterly

R-ANL-007 — Pub/Sub sink lag during traffic spikes

Field | Value
L × I | 3 × 3 = 9
Owner | SRE
Trigger | High-season events spike, BigQuery streaming throttling
Detection | oldest_unacked_message_age alert
Mitigation (M) | Sink autoscale; spill-to-GCS when streaming inserts fail; chunked batch loader fallback; capacity load test
Residual | Medium
Review | Monthly
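
The spill-and-fallback path can be sketched with injected callables so that no cloud client is assumed. Everything named here (StreamingInsertError, deliver, chunked) is hypothetical; the real sink presumably wraps the BigQuery and GCS clients and has its own error types.

```python
from typing import Callable, Iterator, Sequence

Row = dict


class StreamingInsertError(Exception):
    """Hypothetical error raised by the streaming-insert wrapper."""


def chunked(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield fixed-size slices for a batch loader to process one at a time."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def deliver(
    rows: Sequence[Row],
    stream_insert: Callable[[Sequence[Row]], None],
    spill: Callable[[Sequence[Row]], None],
) -> str:
    """Try the streaming path first; spill the rows when it fails."""
    try:
        stream_insert(rows)
    except StreamingInsertError:
        spill(rows)  # e.g. write to a bucket for the batch loader to pick up
        return "spilled"
    return "streamed"
```

Keeping the spill target and the loader behind callables also makes the path easy to exercise in a capacity load test with fakes.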

R-ANL-008 — Composer/Workflows DAG silent skip

Field | Value
L × I | 2 × 3 = 6
Owner | Analytics platform
Trigger | DAG paused, IAM regression, region drift
Detection | ETL run heartbeat metric; etl.failed.v1 absence; cron audit
Mitigation (M) | Heartbeat events etl.started.v1 + etl.completed.v1; dashboard alert if no run within cadence
Residual | Low
Review | Quarterly

R-ANL-009 — Saved query SQL injection

Field | Value
L × I | 2 × 5 = 10
Owner | Security
Trigger | Author writes string concatenation; param interpolation overlooked
Detection | Save-time parser blocks; SQL lint CI
Mitigation (M) | Allowlist-only datasets/tables; parameter binding only; integration test with malicious payloads
Residual | Low
Review | Quarterly

R-ANL-010 — Right-to-erasure delay > SLA

Field | Value
L × I | 2 × 4 = 8
Owner | Compliance lead
Trigger | Cascade purge fails on a curated table; backlog of tenant.deleted.v1
Detection | Purge reconciliation report; tenant_purge.duration_seconds SLI
Mitigation (M) | Idempotent purge; partition-aware DELETE/TRUNCATE; scheduled retry with alert; legal-hold override path documented
Residual | Low
Review | Quarterly

R-ANL-011 — AI suggestion quality regresses

Field | Value
L × I | 3 × 2 = 6
Owner | AI squad
Trigger | Model upgrade affects metric explanations or forecasts
Detection | Per-capability success metric; user feedback loop; eval harness
Mitigation (M) | Capability-level off-switch; HITL on writes; canary % rollout per tenant; rollback to previous model version
Residual | Medium
Review | Monthly
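
A per-tenant canary percentage is commonly implemented with a stable hash bucket. The sketch below is one such approach and is not confirmed to be what the service uses; the function names and the capability and version strings are hypothetical.

```python
import hashlib


def in_canary(tenant_id: str, capability: str, rollout_percent: int) -> bool:
    """Deterministically place a tenant in the canary for one capability."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{capability}:{tenant_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent


def model_version_for(
    tenant_id: str,
    capability: str,
    rollout_percent: int,
    candidate: str,
    previous: str,
) -> str:
    """Serve the candidate model only to tenants inside the canary."""
    if in_canary(tenant_id, capability, rollout_percent):
        return candidate
    return previous
```

Because a tenant's bucket never changes, raising the percentage only adds tenants to the canary, and setting it to 0 acts as the rollback to the previous model version for everyone.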

R-ANL-012 — Region/residency violation

Field | Value
L × I | 1 × 5 = 5
Owner | Security + platform
Trigger | Cross-region replication misconfig; misrouted Pub/Sub
Detection | Region tag audits; VPC-SC perimeter denials
Mitigation (E) | Per-region deployments enforced by Terraform; VPC-SC perimeter; release gate verifies dataset region
Residual | Very low
Review | Quarterly

2. Risk treatment matrix

Score | Action
≥ 16 | Executive review; weekly status update
10–15 | Squad-level mitigation plan with owner & due date
6–9 | Tracked; reviewed at retro
≤ 5 | Accepted; reviewed quarterly
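
The scoring rule and the matrix above translate directly into a lookup. This is a small sketch with hypothetical function names; the bands and action strings are taken from the matrix.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Score = L x I, with both factors on the register's 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def treatment_action(score: int) -> str:
    """Map a score to its action band in the treatment matrix."""
    if score >= 16:
        return "Executive review; weekly status update"
    if score >= 10:
        return "Squad-level mitigation plan with owner & due date"
    if score >= 6:
        return "Tracked; reviewed at retro"
    return "Accepted; reviewed quarterly"
```

For instance, R-ANL-002 (4 × 4 = 16) falls in the executive-review band, while R-ANL-012 (1 × 5 = 5) falls in the accepted band.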

3. Lifecycle & cadence

  • New risks added by anyone in the squad via PR.
  • Owner assigned within one sprint.
  • Quarterly review by service owner + security + finance + AI lead.
  • Risks closed only when residual is "accepted" or fully eliminated; closure recorded with date and rationale.

Cross-references: FAILURE_MODES, SECURITY_MODEL, SERVICE_READINESS.