property-service — SERVICE_RISK_REGISTER
Companion: SERVICE_READINESS · SECURITY_MODEL · FAILURE_MODES · SYNC_CONTRACT · AI_INTEGRATION
This is the binding risk register for property-service. Each risk has an owner, an exposure score (likelihood × impact), the mitigations currently in place, and the residual exposure after mitigation. Any risk with a High residual grade must reference a Linear ticket and, if mitigation requires architectural change, an ADR.
Likelihood (per quarter): L (low, ≤ 5%), M (medium, > 5% to 25%), H (high, > 25%). Impact: tenant / cross-tenant / consumer / regulatory.
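The likelihood bands above can be sketched as a small classifier. This is an illustrative sketch only: the function name `likelihood_band` is hypothetical, and it assumes that exactly 5% falls in L and exactly 25% falls in M, which the band definitions imply but do not state outright.

```python
def likelihood_band(p_per_quarter: float) -> str:
    """Map an estimated per-quarter probability (0.0 to 1.0) to L, M, or H."""
    if not 0.0 <= p_per_quarter <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # L is defined as at most 5% per quarter.
    if p_per_quarter <= 0.05:
        return "L"
    # Assumption: exactly 25% counts as M, because H is defined as > 25%.
    if p_per_quarter <= 0.25:
        return "M"
    return "H"
```

For example, an estimated 10% chance per quarter would be graded M under these assumptions.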
1. Register
| # | Risk | L | Impact | Mitigation | Residual | Owner |
|---|---|---|---|---|---|---|
| R-01 | Cross-tenant data exposure via missing or regressed RLS policy | L | Cross-tenant + regulatory | 4-layer isolation (domain, app, DB RLS with FORCE ROW LEVEL SECURITY, outbox guard); nightly audit job; integration tests per table | Low | Service lead + Security |
| R-02 | Property published with stale or incorrect amenities (regional sensitivity, e.g., missing halal_kitchen) | M | Tenant reputation | HITL accept on AI suggestions; canonical amenity registry; UX shows regional pack hints; operator QA checklist before publish | Medium | Product |
| R-03 | Multi-language field rendering bug (mixed RTL/LTR) | M | Tenant + consumer | First-class i18n in DTOs; bidi fixtures in tests; native-script fields stored separately; visual regression via Playwright on BFF | Medium | Frontend platform |
| R-04 | Geocoding inaccuracy → wrong pin on consumer meta map | M | Consumer | Confidence threshold from geo-service; AI fallback gated on operator opt-in + visual confirmation; map preview before publish | Medium | Service lead |
| R-05 | Photo upload of inappropriate content | L | Consumer + regulatory | Scanner via file-storage-service; photos start in the uploaded state and are gated on a clean verdict; audit + operator-attributed event | Low | Service lead |
| R-06 | AI generates culturally inappropriate description (regional moderation fail) | L | Tenant + consumer + regulatory | Strict moderation in orchestrator with regional rule pack (Pashto/Dari/Persian/Tajik); HITL accept always required; staged-only persistence | Low | AI platform |
| R-07 | Outbox publisher stall causes prolonged downstream drift | L | Tenant + consumer | Lag SLO + alert; backpressure-aware publisher; load-tested 10k backlog drain | Low | SRE |
| R-08 | Pub/Sub regional outage exceeds outbox retention | L | Tenant + consumer | 7-day retention on DLQ; outbox table sized for ≥ 24 h backlog; documented manual replay path; cross-region fallback runbook | Low | SRE |
| R-09 | Sync conflict storm after a buggy desktop release | M | Tenant (operator UX) | lww+diff is bounded by state-machine validation; per-device throttling; conflict-rate alert; rollback playbook for desktop binaries | Medium | Desktop + Service lead |
| R-10 | Sync cursor regression after a service deploy | L | Tenant (offline data) | Canary + auto-rollback; cursor monotonicity test in CI; documented full-reset fallback for clients | Low | SRE |
| R-11 | Long-tail AI quota burn from a single tenant (cost spike) | M | Internal cost | Per-capability daily caps; per-tenant monthly quota; quota dashboard; alert on quota exhaustion spikes | Low | Finance + Service lead |
| R-12 | RoomType / Room status semantic drift between this service, housekeeping-service, and inventory-service | M | Operator confusion + booking errors | Bounded-context contracts published; cross-service E2E tests; periodic invariant review across teams | Medium | Architecture |
| R-13 | Lock device binding mismatch (room ↔ device id) | L | Operator + safety | Single-binding constraint; conflict event rejected; lock-integration owns device truth | Low | Lock-integration team |
| R-14 | GDPR erasure incompleteness for guest-likeness photos | L | Regulatory | containsGuestLikeness=true flagging at upload; subject-erasure consumer archives flagged photos; DPIA on file | Medium | Compliance |
| R-15 | Property archive while hidden reservations remain (consistency) | L | Tenant + guest | Archive precondition checks reservation-service port; archive blocked otherwise; alert if reservation-service port times out | Low | Service lead |
| R-16 | Hot tenant skew (one tenant dominates writes/storage) | M | Internal cost + neighbor noise | Per-tenant cost panel; shared-tenant tier vs dedicated tier (hybrid model); per-tenant rate limits | Medium | Platform |
| R-17 | Schema migration failure in production | L | Tenant (write availability) | Expand → backfill → contract; Cloud Run Job runs migrations pre-traffic; auto-rollback on failure; paired down.sql reviewed at PR | Low | SRE |
| R-18 | Memorystore outage degrades read SLO | L | Tenant + consumer | Cache-miss fall-through; Cloud SQL sized for 3× read load; alert wired | Low | SRE |
| R-19 | OPA bundle stale → policy decisions diverge | L | Operator (denied valid actions) | 30-min freshness check + alert; service holds last-good bundle 1 h; coordinated bundle releases | Low | IAM team |
| R-20 | Vendor lock-in to PostGIS for geo capability | L | Strategic | Geo abstracted behind a port; bbox/nearby tested via the abstraction; alternate provider feasible | Low | Architecture |
| R-21 | AI orchestrator response payload drift | L | Operator (specific capability fails) | Schema validation per capability; alert on schema violations; capability disable flag per tenant | Low | AI platform |
| R-22 | Outbox table growth unbounded if publisher disabled | L | Internal (storage cost) | Publisher health alert + automatic ticket; retention policy after publish (rows older than 30 days archived) | Low | SRE |
| R-23 | Unbounded photo count per property → DOS via storage | L | Internal cost | Per-property photo cap (200) enforced at API; per-tenant total cap; quota events surfaced to billing | Low | Service lead |
| R-24 | Misuse of bulk room create (e.g., 10k rooms) | L | Internal | Hard cap of 200 rooms per request; per-tenant rate limit on bulk; transactional all-or-nothing | Low | Service lead |
| R-25 | Improper handling of tenant.deleted.v1 (loss of data before legal hold) | L | Regulatory | Cascade soft-delete; hard purge deferred until retention window expires; audit row preserved indefinitely | Low | Compliance |
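
As an illustration of the R-24 mitigation, the per-request cap can be checked before any write is attempted, so an oversized request is rejected as a whole. This is a minimal sketch, not the service's actual handler: `validate_bulk_room_create` and `BulkRoomLimitExceeded` are hypothetical names, and the transactional insert itself is out of scope here.

```python
MAX_ROOMS_PER_REQUEST = 200  # hard cap stated in R-24


class BulkRoomLimitExceeded(ValueError):
    """Raised when a bulk create request exceeds the per-request cap."""


def validate_bulk_room_create(rooms: list) -> list:
    """Reject the whole request up front if it exceeds the cap.

    Checking before any write keeps the operation all-or-nothing:
    either every room proceeds to the transactional insert, or none does.
    """
    if len(rooms) > MAX_ROOMS_PER_REQUEST:
        raise BulkRoomLimitExceeded(
            f"{len(rooms)} rooms requested; cap is {MAX_ROOMS_PER_REQUEST}"
        )
    return rooms
```

The same shape would apply to the per-property photo cap in R-23, with the limit swapped for the photo count.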
2. Risk Themes
2.1 Multi-tenancy
R-01, R-16, R-25. Mitigated by a layered isolation model and a budget for tenant-tier upgrades when noise dominates.
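
One way the nightly audit in R-01 could detect a regressed policy is to flag any table where row-level security is not both enabled and forced. The sketch below assumes the audit job has already read `relname`, `relrowsecurity`, and `relforcerowsecurity` from PostgreSQL's `pg_class` catalog into plain mappings; the function name and the sample table names are hypothetical, and the real job may work differently.

```python
from typing import Iterable, Mapping


def tables_missing_forced_rls(tables: Iterable[Mapping[str, object]]) -> list:
    """Return names of tables where RLS is not both enabled and forced.

    Each mapping is assumed to carry three pg_class columns:
    relname, relrowsecurity, relforcerowsecurity.
    """
    return sorted(
        str(t["relname"])
        for t in tables
        if not (t["relrowsecurity"] and t["relforcerowsecurity"])
    )


# Hypothetical catalog snapshot: "rooms" has RLS enabled but not forced.
snapshot = [
    {"relname": "properties", "relrowsecurity": True, "relforcerowsecurity": True},
    {"relname": "rooms", "relrowsecurity": True, "relforcerowsecurity": False},
]
offenders = tables_missing_forced_rls(snapshot)
```

A non-empty result would be the signal to alert, since a table with RLS enabled but not forced is still bypassed by its owner role.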
2.2 Domain accuracy in regional markets
R-02, R-03, R-06. Hotel domain quality is the customer-perceived moat; we accept Medium residual on UX-quality risks because the cost of fully removing them (e.g., per-region human review on every publish) outweighs the benefit.
2.3 Consistency with adjacent services
R-12, R-13, R-15. Mitigated by event contracts and integration tests; periodic cross-team contract reviews are mandatory.
2.4 AI assist as additive surface
R-06, R-11, R-21. The AI surface is intentionally non-essential; failures degrade UX, never block publish.
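
The "degrade, never block" stance combined with the per-capability schema validation in R-21 might look like the sketch below. Everything named here is an assumption for illustration: the capability key `description_suggestion`, its fields, and `accept_ai_payload` are hypothetical, and the orchestrator's real payload shapes are not specified in this register.

```python
from typing import Any, Optional

# Hypothetical per-capability schemas: required field name -> expected type.
CAPABILITY_SCHEMAS = {
    "description_suggestion": {"text": str, "language": str},
}


def accept_ai_payload(capability: str, payload: Any) -> Optional[dict]:
    """Return the payload if it matches the capability's schema, else None.

    Returning None (rather than raising) lets the caller drop the
    suggestion and continue, so a drifted payload never blocks publish.
    """
    schema = CAPABILITY_SCHEMAS.get(capability)
    if schema is None or not isinstance(payload, dict):
        return None
    for field, expected_type in schema.items():
        if not isinstance(payload.get(field), expected_type):
            return None
    return payload
```

A caller would treat `None` as "no suggestion available" and, per R-21, emit a schema-violation alert on that path.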
2.5 Sync robustness
R-09, R-10. Offline desktop is a first-class surface; sync risks are owned jointly by the service and desktop teams with a documented rollback playbook for desktop binaries.
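
The cursor monotonicity test named in R-10 could be built on a check like the following. This is a sketch under stated assumptions: cursors are modeled as integers, equal consecutive values are allowed, and `first_cursor_regression` is a hypothetical helper, not the CI test itself.

```python
from typing import List, Optional


def first_cursor_regression(cursors: List[int]) -> Optional[int]:
    """Return the index of the first cursor lower than its predecessor.

    Returns None when the sequence never moves backwards.
    Assumption: equal consecutive cursors are acceptable (non-decreasing).
    """
    for i in range(1, len(cursors)):
        if cursors[i] < cursors[i - 1]:
            return i
    return None
```

A CI test could record the cursors a client observes across a simulated deploy and assert that this function returns `None`.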
3. Review Cadence
- Quarterly review by service lead + SRE + security; updates land as PRs to this file.
- Post-incident review automatically updates the matching row whenever an incident maps to a register entry; if no row matches, a new one is opened.
- Annual full audit with an external compliance reviewer.
4. Change Log
- 2026-04-22 — Initial register published alongside the v1 service bundle.
Open risks with a High residual grade must list the mitigation owner, target date, and linked ADR. This register is the single source of truth; do not maintain a parallel risk list elsewhere.
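
The High-residual rule lends itself to an automated check on PRs to this file. The sketch below is one possible shape, not an existing tool: `RiskRow` and `missing_high_residual_fields` are hypothetical, the Linear ticket requirement is taken from the introduction, and the ADR is treated as always required for High rows, which may be stricter than intended if only architectural mitigations need one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskRow:
    risk_id: str
    residual: str  # "Low", "Medium", or "High"
    mitigation_owner: Optional[str] = None
    target_date: Optional[str] = None
    linear_ticket: Optional[str] = None
    adr: Optional[str] = None


def missing_high_residual_fields(row: RiskRow) -> list:
    """List the required fields a High-residual row has left empty."""
    if row.residual != "High":
        return []  # the rule only applies to High residual risks
    required = {
        "mitigation_owner": row.mitigation_owner,
        "target_date": row.target_date,
        "linear_ticket": row.linear_ticket,
        "adr": row.adr,  # assumption: always required for High rows
    }
    return [name for name, value in required.items() if not value]
```

An empty list means the row satisfies the rule; anything else names what the PR author still needs to fill in.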