housekeeping-service — SERVICE_RISK_REGISTER
Living register of identified risks. Reviewed monthly by service owner + platform; quarterly with Security + Operations. Each risk has: ID, summary, likelihood, impact, current mitigation, residual risk, owner, target review date.
Likelihood / Impact: L (low) · M (medium) · H (high). Residual = post-mitigation.
R-HK-001 · Cross-tenant data leak via missing app.tenant_id set
| Likelihood | L |
| Impact | H |
| Current mitigation | RLS on every table; TenantContext middleware issues SET LOCAL app.tenant_id per unit of work (UoW); tenant-isolation.spec.ts mandatory; integration suite asserts that cross-tenant access is denied. DB user has no BYPASSRLS. |
| Residual | L |
| Owner | Service owner + Security |
| Next review | quarterly |
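
The per-UoW tenant binding described above can be sketched as follows. This is a minimal illustration, not the service's actual middleware: `Statement` and `wrapInTenantScope` are hypothetical names. It uses PostgreSQL's `set_config(name, value, true)`, which has the same transaction-local effect as `SET LOCAL` but accepts a bind parameter.

```typescript
// Hypothetical statement shape; a real driver would execute these in order
// on a single connection so the setting stays inside one transaction.
interface Statement {
  text: string;
  values: unknown[];
}

// Wraps a unit of work so that app.tenant_id is set before any tenant query
// runs and is discarded automatically at COMMIT or ROLLBACK.
function wrapInTenantScope(tenantId: string, work: Statement[]): Statement[] {
  if (tenantId.trim() === "") {
    // Fail closed: never run tenant queries without a tenant bound.
    throw new Error("tenantId is required");
  }
  return [
    { text: "BEGIN", values: [] },
    // Third argument `true` makes the setting transaction-local.
    { text: "SELECT set_config('app.tenant_id', $1, true)", values: [tenantId] },
    ...work,
    { text: "COMMIT", values: [] },
  ];
}
```

Failing closed on an empty tenant ID matters because an unset `app.tenant_id` is exactly the condition this risk describes.
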
R-HK-002 · Outbox backlog during checkout storm (peak hour)
| Likelihood | M |
| Impact | M |
| Current mitigation | Cloud Run min instances = 2; relay batches 100/tick; Pub/Sub publisher concurrency tuned; alert when 1k or more events stay unpublished for 5 min; load test simulates 10 events/s sustained. |
| Residual | L |
| Owner | Service owner |
| Next review | after each peak season |
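
The "1k unpublished for 5 min" alert condition can be sketched as a sustained-threshold check. The class name and the sampling model (one `observe` call per metric sample) are assumptions for illustration; the real alert presumably lives in the monitoring stack rather than in application code.

```typescript
const BACKLOG_THRESHOLD = 1000;  // unpublished outbox rows
const SUSTAIN_MS = 5 * 60 * 1000; // breach must last this long

class OutboxBacklogAlert {
  // Time of the first sample in the current uninterrupted breach, if any.
  private breachSinceMs: number | null = null;

  // Returns true when the backlog has stayed at or above the threshold
  // for at least SUSTAIN_MS. Any sample below the threshold resets the window.
  observe(unpublishedCount: number, nowMs: number): boolean {
    if (unpublishedCount < BACKLOG_THRESHOLD) {
      this.breachSinceMs = null;
      return false;
    }
    if (this.breachSinceMs === null) {
      this.breachSinceMs = nowMs;
    }
    return nowMs - this.breachSinceMs >= SUSTAIN_MS;
  }
}
```
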
R-HK-003 · Sync conflict storm during shift change-over
| Likelihood | M |
| Impact | M |
| Current mitigation | Per-field conflict policies (server_authoritative for status, lww+diff for assignment, max-of for priority); audit row per conflict; renderer surfaces explicit toasts. |
| Residual | L |
| Owner | Desktop team |
| Next review | semi-annual |
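
The three per-field policies can be illustrated with a small merge function. Only the policy names and their field mapping come from the register; the field names, the `FieldState` shape, and the use of a `conflicts` list as a stand-in for the audit row and diff are assumptions.

```typescript
interface FieldState<T> {
  value: T;
  updatedAt: number; // epoch milliseconds of the last write
}

interface TaskFields {
  status: FieldState<string>;
  assigneeId: FieldState<string | null>;
  priority: FieldState<number>;
}

interface MergeResult {
  merged: TaskFields;
  conflicts: string[]; // names of fields where server and client disagreed
}

function mergeTask(server: TaskFields, client: TaskFields): MergeResult {
  const conflicts: string[] = [];
  if (server.status.value !== client.status.value) conflicts.push("status");
  if (server.assigneeId.value !== client.assigneeId.value) conflicts.push("assigneeId");
  if (server.priority.value !== client.priority.value) conflicts.push("priority");

  return {
    merged: {
      // server_authoritative: the server copy always wins.
      status: server.status,
      // lww: the most recent write wins; ties go to the server.
      assigneeId:
        client.assigneeId.updatedAt > server.assigneeId.updatedAt
          ? client.assigneeId
          : server.assigneeId,
      // max-of: keep whichever side has the higher priority value.
      priority:
        client.priority.value > server.priority.value ? client.priority : server.priority,
    },
    conflicts,
  };
}
```
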
R-HK-004 · AI routing applies bad assignment after staff calls in sick
| Likelihood | M |
| Impact | M |
| Current mitigation | HITL gate default supervisor_approval; auto-apply path requires "no manual edits since generatedAt"; application layer rejects suggestion rows with STAFF_UNAVAILABLE; per-suggestion audit. |
| Residual | L |
| Owner | Service owner + AI Orchestrator team |
| Next review | quarterly |
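
The gating order described above can be sketched as a single decision function. The types and field names are hypothetical, and treating `STAFF_UNAVAILABLE` as a rejection code is one reading of the mitigation; only the default mode, the "no manual edits since generatedAt" rule, and the code name come from the register.

```typescript
interface Suggestion {
  staffId: string;
  generatedAt: number; // epoch milliseconds
}

interface BoardState {
  lastManualEditAt: number | null; // null when the board was never edited by hand
  unavailableStaffIds: Set<string>;
}

type GateMode = "supervisor_approval" | "auto_apply";

type Decision =
  | { action: "reject"; code: "STAFF_UNAVAILABLE" }
  | { action: "needs_supervisor"; reason: string }
  | { action: "auto_apply" };

function decide(
  suggestion: Suggestion,
  board: BoardState,
  mode: GateMode = "supervisor_approval",
): Decision {
  // Unavailable staff are rejected regardless of mode.
  if (board.unavailableStaffIds.has(suggestion.staffId)) {
    return { action: "reject", code: "STAFF_UNAVAILABLE" };
  }
  // Default path: a human approves every suggestion.
  if (mode === "supervisor_approval") {
    return { action: "needs_supervisor", reason: "hitl_default" };
  }
  // Auto-apply is only safe if nobody edited the board after generation.
  if (board.lastManualEditAt !== null && board.lastManualEditAt >= suggestion.generatedAt) {
    return { action: "needs_supervisor", reason: "manual_edit_since_generation" };
  }
  return { action: "auto_apply" };
}
```
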
R-HK-005 · Lost-and-found PII (claimant phone) leaked via logs
| Likelihood | L |
| Impact | M |
| Current mitigation | Allowlist redaction; claimant phone field never logged; Sentry breadcrumbs scrubbed; access to lost_and_found audited. |
| Residual | L |
| Owner | Security |
| Next review | quarterly DPIA |
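
Allowlist redaction means a field is logged only if it is explicitly listed, so a newly added PII column is dropped by default. A minimal sketch, with hypothetical field names:

```typescript
// Hypothetical allowlist; anything not named here never reaches the logger.
const LOGGABLE_FIELDS = new Set(["itemId", "propertyId", "status", "foundAt"]);

function redactForLog(record: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(record)) {
    if (LOGGABLE_FIELDS.has(key)) {
      safe[key] = record[key];
    }
  }
  return safe;
}
```

The contrast with a denylist is the failure mode: forgetting to update an allowlist loses a log field, while forgetting to update a denylist leaks one.
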
R-HK-006 · Partition pruning degrades after a query rewrite
| Likelihood | M |
| Impact | M |
| Current mitigation | partition-pruning.spec.ts checks EXPLAIN plans against a baseline; enforced as a CI gate; slow-query alerts. |
| Residual | L |
| Owner | Service owner |
| Next review | per release |
R-HK-007 · Push subscription auth misconfiguration (OIDC)
| Likelihood | L |
| Impact | H |
| Current mitigation | OIDC verifier strict in prod; integration test (oidc-pubsub-push.spec.ts); IaC for subscription configuration; quarterly drill. |
| Residual | L |
| Owner | Platform |
| Next review | quarterly |
R-HK-008 · Long offline desktop produces stale workflow state on resync
| Likelihood | M |
| Impact | M |
| Current mitigation | Cursor expiration triggers full re-sync; renderer queues operations with backoff; conflict resolution surfaces unrecoverable mismatches as 3-way diff for human resolution. |
| Residual | M |
| Owner | Desktop team |
| Next review | semi-annual |
R-HK-009 · Dependency on staff-service shift events for assignment validity
| Likelihood | M |
| Impact | M |
| Current mitigation | Cached StaffShiftAssignment; live re-check on assignment via StaffShiftPort; on staff-service outage, board falls back to manual mode with warning banner. |
| Residual | L |
| Owner | Service owner |
| Next review | quarterly |
R-HK-010 · Linen low-watermark misconfiguration → spammy alerts
| Likelihood | M |
| Impact | L |
| Current mitigation | Debounce: at most 1 alert per (tenant, property, line) per 60 min; tenant settings UI prevents lowWatermark > onHand * 5. |
| Residual | L |
| Owner | Operations |
| Next review | semi-annual |
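
Both controls above are simple enough to sketch. The class and function names are hypothetical, and the in-memory map stands in for whatever store the service actually uses; the 60-minute window, the key tuple, and the `onHand * 5` bound come from the register.

```typescript
const DEBOUNCE_MS = 60 * 60 * 1000; // 60 minutes

class LowWatermarkAlerter {
  // Last alert time per (tenant, property, line) key.
  private lastSentMs = new Map<string, number>();

  // Returns true if an alert may be sent now, and records it as sent.
  shouldAlert(tenantId: string, propertyId: string, line: string, nowMs: number): boolean {
    const key = `${tenantId}|${propertyId}|${line}`;
    const last = this.lastSentMs.get(key);
    if (last !== undefined && nowMs - last < DEBOUNCE_MS) {
      return false; // still inside the debounce window
    }
    this.lastSentMs.set(key, nowMs);
    return true;
  }
}

// Settings validation: reject a low watermark above five times stock on hand.
function isValidLowWatermark(lowWatermark: number, onHand: number): boolean {
  return lowWatermark <= onHand * 5;
}
```
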
R-HK-011 · room.status_changed.v1 lost → front-desk arrivals board out of date
| Likelihood | L |
| Impact | M |
| Current mitigation | Outbox + at-least-once delivery; consumers idempotent; periodic reconciliation job between this DB and search-aggregation-service projection. |
| Residual | L |
| Owner | Service owner + Search team |
| Next review | quarterly |
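
At-least-once delivery implies duplicates, so the consumer side has to be idempotent. A minimal sketch of one way to do that, assuming each event carries a unique `eventId` and an `occurredAt` timestamp (both field names are assumptions, as is the in-memory storage):

```typescript
interface RoomStatusChanged {
  eventId: string;
  roomId: string;
  status: string;
  occurredAt: number; // epoch milliseconds
}

class ArrivalsProjection {
  private seenEventIds = new Set<string>();
  private rooms = new Map<string, { status: string; occurredAt: number }>();

  // Returns false for a redelivered event; applying it twice has no effect.
  apply(event: RoomStatusChanged): boolean {
    if (this.seenEventIds.has(event.eventId)) {
      return false;
    }
    this.seenEventIds.add(event.eventId);
    const current = this.rooms.get(event.roomId);
    // Ignore out-of-order events older than what is already projected.
    if (current === undefined || event.occurredAt >= current.occurredAt) {
      this.rooms.set(event.roomId, { status: event.status, occurredAt: event.occurredAt });
    }
    return true;
  }

  statusOf(roomId: string): string | undefined {
    return this.rooms.get(roomId)?.status;
  }
}
```

Deduplication handles repeats but not gaps, which is why the register pairs it with a periodic reconciliation job.
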
R-HK-012 · DPIA overdue for new lost-and-found photo retention
| Likelihood | L |
| Impact | M |
| Current mitigation | Photos stored by media-service with 90-day warm + archive lifecycle; DPIA scheduled quarterly. |
| Residual | L |
| Owner | Security + Compliance |
| Next review | quarterly |
R-HK-013 · Cloud SQL HA failover skews now() clock vs application clocks
| Likelihood | L |
| Impact | L |
| Current mitigation | All durations computed from application ClockPort (ntp-synced); Postgres now() only used for created_at defaults. |
| Residual | L |
| Owner | Platform |
| Next review | annual |
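
Computing durations from a single application clock keeps them independent of the database's `now()`. A minimal sketch, assuming `ClockPort` exposes one method (the method name `nowMs` and the `ManualClock` test double are hypothetical):

```typescript
interface ClockPort {
  nowMs(): number; // epoch milliseconds from the application clock
}

// Test double that lets a spec control time explicitly.
class ManualClock implements ClockPort {
  private currentMs: number;

  constructor(startMs: number) {
    this.currentMs = startMs;
  }

  nowMs(): number {
    return this.currentMs;
  }

  advance(ms: number): void {
    this.currentMs += ms;
  }
}

// Both endpoints come from the same clock, so DB clock skew cannot affect the result.
function elapsedMs(clock: ClockPort, startedAtMs: number): number {
  return clock.nowMs() - startedAtMs;
}
```
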
R-HK-014 · Tenant misuse of manual room-status flip to bypass workflow
| Likelihood | M |
| Impact | M |
| Current mitigation | Manual flips require elevated role + reason; audit-flagged; analytics surfaces tenants with > 10% manual flips for ops outreach. |
| Residual | M |
| Owner | Operations |
| Next review | quarterly |
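
The outreach threshold can be sketched as a ratio filter. The register does not say what the 10% is measured against; this sketch assumes it is manual flips as a share of all room-status changes in the reporting period, and the type and function names are hypothetical.

```typescript
interface TenantFlipStats {
  tenantId: string;
  manualFlips: number;
  totalStatusChanges: number;
}

// Returns tenants whose manual-flip share strictly exceeds the threshold.
function tenantsForOutreach(stats: TenantFlipStats[], threshold = 0.1): string[] {
  return stats
    // Tenants with no activity are skipped to avoid dividing by zero.
    .filter((s) => s.totalStatusChanges > 0)
    .filter((s) => s.manualFlips / s.totalStatusChanges > threshold)
    .map((s) => s.tenantId);
}
```
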
Closed risks
| ID | Resolution date | Notes |
|---|---|---|
| (none yet) | | |
Cross-link
- Active failure runbooks: FAILURE_MODES.md.
- Security controls: SECURITY_MODEL.md.
- Production-readiness checklist: SERVICE_READINESS.md.