Skip to main content

property-service — SERVICE_READINESS

Companion: SERVICE_OVERVIEW · TESTING_STRATEGY · SECURITY_MODEL · OBSERVABILITY · FAILURE_MODES · DEPLOYMENT_TOPOLOGY

This is the binding production readiness gate for property-service. The service may not receive production traffic until every box below is green. Items are owned per row; reviewers are platform SRE + security + the service team lead.

Format: ☐ open · ☑ green · ⚠ accepted risk (explicit waiver linked).


1. Domain & API

  • ☐ Bounded context boundaries reviewed and signed off by adjacent service teams (pricing-service, inventory-service, housekeeping-service, theme-config-service, search-aggregation-service).
  • ☐ All aggregate invariants enumerated in DOMAIN_MODEL §3 and unit-tested at ≥ 90 % coverage.
  • ☐ All public REST routes documented in API_CONTRACTS with example bodies including multi-language fields.
  • ☐ OpenAPI spec published to the platform schema registry; CI breaking-change diff = 0 unflagged breakages.
  • ☐ Pact provider verifications green for all consumers listed in TESTING_STRATEGY §4.1.
  • ☐ Idempotency-Key behavior verified end-to-end on every mutation route.
  • ☐ Optimistic concurrency (If-Match / version) verified on Property and Room.

2. Events

  • ☐ Every event in EVENT_SCHEMAS has an Avro/JSON schema in events/schemas/ and is registered.
  • ☐ AsyncAPI document published; asyncapi:check green on the latest commit.
  • ☐ Outbox pattern verified: aggregate write + outbox row in same transaction; rollback removes both.
  • ☐ Idempotent consumers verified for all four upstream events.
  • ☐ DLQ topics provisioned and dashboards exist.
  • ☐ Retention class (operational, regulated, audit) declared per topic.

3. Data

  • ☐ All migrations reviewed; every multi-tenant table carries tenant_id + RLS policy + isolation integration test.
  • ☐ Nightly tenant isolation auditor deployed; alert wired to security on-call.
  • ☐ PostGIS extension enabled, GIST index on properties.geo confirmed via EXPLAIN.
  • ☐ Photos: only MediaRef stored locally; bytes live in file-storage-service.
  • ☐ PII fields cataloged + redaction in logs verified.
  • ☐ Backups: Cloud SQL automated backups + 7-day PITR enabled.
  • ☐ Restore drill executed at least once in staging; runbook on file.

4. Security

  • ☐ JWT verification + JWKS rotation tested.
  • ☐ OPA policy bundle deployed and decisions logged with policy_decision_id.
  • ☐ ABAC predicates implemented for every permission in SECURITY_MODEL §2.2.
  • ☐ Service account scoped to least privilege per DEPLOYMENT_TOPOLOGY §4.
  • ☐ Secret Manager wiring tested; no secrets in env.
  • ☐ Cloud SQL CMEK + Memorystore CMEK applied.
  • ☐ TLS 1.3 enforced on all ingress and egress paths; mTLS within mesh.
  • ☐ Audit table append-only enforcement (DB rules) verified.
  • ☐ STRIDE review walked through with security; mitigations recorded.
  • ☐ Static analysis (semgrep, npm audit) clean for high severity.
  • ☐ Dependency SBOM generated and signed.

5. Observability

  • ☐ All required span attributes present (sampled trace inspected).
  • ☐ All required metrics emitted with documented tag cardinality.
  • ☐ Six dashboards (per OBSERVABILITY §5) created and accessible.
  • ☐ All alerts (per OBSERVABILITY §6) wired with runbook links.
  • ☐ Synthetic probes deployed for read, publish, sync, and photo pipelines.
  • ☐ On-call rotation populated in PagerDuty with documented hand-off.
  • ☐ Log redactions verified (no PII, no signed URLs in production logs).

6. Sync (Electron desktop)

  • ☐ Pull endpoint cursor stability verified across deploys.
  • ☐ Push endpoint conflict policies (server_authoritative, lww+diff) implemented per SYNC_CONTRACT.
  • ☐ Idempotency on push verified.
  • ☐ 24-hour offline replay test green.
  • ☐ Bandwidth budget for full catch-up (200-room property) ≤ documented threshold.
  • ☐ Status-transition state machine enforced at apply-time, not push-time.
  • ☐ Per-device backpressure (429 Retry-After) verified.

7. AI

  • ☐ All AI capabilities go through ai-orchestrator-service (no direct provider SDKs).
  • ☐ Suggestions persisted as suggested until HITL acceptance.
  • aiProvenance recorded on every applied AI value.
  • ☐ Moderation gating verified.
  • ☐ Edge/ONNX routing documented; opt-in flag per tenant honored.
  • ☐ Per-tenant quota surface returns clean 429 MELMASTOON.AI.QUOTA_EXHAUSTED.

8. Reliability & DR

  • ☐ DR drill executed (Cloud SQL failover; Pub/Sub regional outage simulation).
  • ☐ Outbox publisher load-tested (10k backlog drain in ≤ 5 min).
  • ☐ Memorystore-down fall-through verified (latency rises but no errors).
  • ☐ Bulk room create (100 rooms) atomic and SLO-respecting.
  • ☐ Cloud Run min instances set to 2; canary + auto-rollback rules in place.

9. Data Lifecycle

  • ☐ Soft-delete + hard-purge policy implemented for properties, rooms, room_types, photos.
  • ☐ GDPR erasure flow for photos containing guest likeness verified end-to-end.
  • ☐ Tenant deletion cascade (auto-unpublish + archive + photo removal) tested.
  • ☐ Audit retention configured per platform compliance baseline.

10. Documentation

  • ☐ All 17 service-bundle docs present and up to date.
  • ☐ Summary doc docs/03-microservices/property-service.md reflects current state.
  • ☐ Runbooks present for every alert in OBSERVABILITY §6.
  • ☐ ADRs filed for any deviation from platform standards (none expected at launch).

11. Cost & Capacity

  • ☐ Monthly cost projection within budget (Cloud Run + Cloud SQL + Pub/Sub + Memorystore + AI orchestrator attribution).
  • ☐ Per-tenant capacity baseline captured; alerts on per-tenant outliers (e.g., one tenant > 50 % of total room count) wired.

Sign-off

RoleNameDateStatus
Service tech lead
Platform SRE
Security
Compliance
Product

A waiver against any item must reference an ADR (docs/architecture/ADR-XXXX-*.md) and a Linear ticket with an owner and a target close date.


Re-readiness review is mandatory after any breaking event (region migration, schema rebuild, AI provider switch, multi-tenancy refactor) and at least annually.