property-service — SERVICE_READINESS
Companion: SERVICE_OVERVIEW · TESTING_STRATEGY · SECURITY_MODEL · OBSERVABILITY · FAILURE_MODES · DEPLOYMENT_TOPOLOGY
This is the binding production readiness gate for property-service. The service may not receive production traffic until every box below is green. Items are owned per row; reviewers are platform SRE + security + the service team lead.
Format: ☐ open · ☑ green · ⚠ accepted risk (explicit waiver linked).
1. Domain & API
- ☐ Bounded context boundaries reviewed and signed off by adjacent service teams (
pricing-service,inventory-service,housekeeping-service,theme-config-service,search-aggregation-service). - ☐ All aggregate invariants enumerated in DOMAIN_MODEL §3 and unit-tested at ≥ 90 % coverage.
- ☐ All public REST routes documented in API_CONTRACTS with example bodies including multi-language fields.
- ☐ OpenAPI spec published to the platform schema registry; CI breaking-change diff = 0 unflagged breakages.
- ☐ Pact provider verifications green for all consumers listed in TESTING_STRATEGY §4.1.
- ☐ Idempotency-Key behavior verified end-to-end on every mutation route.
- ☐ Optimistic concurrency (
If-Match/version) verified onPropertyandRoom.
2. Events
- ☐ Every event in EVENT_SCHEMAS has an Avro/JSON schema in
events/schemas/and is registered. - ☐ AsyncAPI document published;
asyncapi:checkgreen on the latest commit. - ☐ Outbox pattern verified: aggregate write + outbox row in same transaction; rollback removes both.
- ☐ Idempotent consumers verified for all four upstream events.
- ☐ DLQ topics provisioned and dashboards exist.
- ☐ Retention class (
operational,regulated,audit) declared per topic.
3. Data
- ☐ All migrations reviewed; every multi-tenant table carries
tenant_id+ RLS policy + isolation integration test. - ☐ Nightly tenant isolation auditor deployed; alert wired to security on-call.
- ☐ PostGIS extension enabled, GIST index on
properties.geoconfirmed viaEXPLAIN. - ☐ Photos: only
MediaRefstored locally; bytes live infile-storage-service. - ☐ PII fields cataloged + redaction in logs verified.
- ☐ Backups: Cloud SQL automated backups + 7-day PITR enabled.
- ☐ Restore drill executed at least once in staging; runbook on file.
4. Security
- ☐ JWT verification + JWKS rotation tested.
- ☐ OPA policy bundle deployed and decisions logged with
policy_decision_id. - ☐ ABAC predicates implemented for every permission in SECURITY_MODEL §2.2.
- ☐ Service account scoped to least privilege per DEPLOYMENT_TOPOLOGY §4.
- ☐ Secret Manager wiring tested; no secrets in env.
- ☐ Cloud SQL CMEK + Memorystore CMEK applied.
- ☐ TLS 1.3 enforced on all ingress and egress paths; mTLS within mesh.
- ☐ Audit table append-only enforcement (DB rules) verified.
- ☐ STRIDE review walked through with security; mitigations recorded.
- ☐ Static analysis (semgrep, npm audit) clean for high severity.
- ☐ Dependency SBOM generated and signed.
5. Observability
- ☐ All required span attributes present (sampled trace inspected).
- ☐ All required metrics emitted with documented tag cardinality.
- ☐ Six dashboards (per OBSERVABILITY §5) created and accessible.
- ☐ All alerts (per OBSERVABILITY §6) wired with runbook links.
- ☐ Synthetic probes deployed for read, publish, sync, and photo pipelines.
- ☐ On-call rotation populated in PagerDuty with documented hand-off.
- ☐ Log redactions verified (no PII, no signed URLs in production logs).
6. Sync (Electron desktop)
- ☐ Pull endpoint cursor stability verified across deploys.
- ☐ Push endpoint conflict policies (
server_authoritative,lww+diff) implemented per SYNC_CONTRACT. - ☐ Idempotency on push verified.
- ☐ 24-hour offline replay test green.
- ☐ Bandwidth budget for full catch-up (200-room property) ≤ documented threshold.
- ☐ Status-transition state machine enforced at apply-time, not push-time.
- ☐ Per-device backpressure (
429 Retry-After) verified.
7. AI
- ☐ All AI capabilities go through
ai-orchestrator-service(no direct provider SDKs). - ☐ Suggestions persisted as
suggesteduntil HITL acceptance. - ☐
aiProvenancerecorded on every applied AI value. - ☐ Moderation gating verified.
- ☐ Edge/ONNX routing documented; opt-in flag per tenant honored.
- ☐ Per-tenant quota surface returns clean
429 MELMASTOON.AI.QUOTA_EXHAUSTED.
8. Reliability & DR
- ☐ DR drill executed (Cloud SQL failover; Pub/Sub regional outage simulation).
- ☐ Outbox publisher load-tested (10k backlog drain in ≤ 5 min).
- ☐ Memorystore-down fall-through verified (latency rises but no errors).
- ☐ Bulk room create (100 rooms) atomic and SLO-respecting.
- ☐ Cloud Run min instances set to 2; canary + auto-rollback rules in place.
9. Data Lifecycle
- ☐ Soft-delete + hard-purge policy implemented for properties, rooms, room_types, photos.
- ☐ GDPR erasure flow for photos containing guest likeness verified end-to-end.
- ☐ Tenant deletion cascade (auto-unpublish + archive + photo removal) tested.
- ☐ Audit retention configured per platform compliance baseline.
10. Documentation
- ☐ All 17 service-bundle docs present and up to date.
- ☐ Summary doc
docs/03-microservices/property-service.mdreflects current state. - ☐ Runbooks present for every alert in OBSERVABILITY §6.
- ☐ ADRs filed for any deviation from platform standards (none expected at launch).
11. Cost & Capacity
- ☐ Monthly cost projection within budget (Cloud Run + Cloud SQL + Pub/Sub + Memorystore + AI orchestrator attribution).
- ☐ Per-tenant capacity baseline captured; alerts on per-tenant outliers (e.g., one tenant > 50 % of total room count) wired.
Sign-off
| Role | Name | Date | Status |
|---|---|---|---|
| Service tech lead | ☐ | ||
| Platform SRE | ☐ | ||
| Security | ☐ | ||
| Compliance | ☐ | ||
| Product | ☐ |
A waiver against any item must reference an ADR (docs/architecture/ADR-XXXX-*.md) and a Linear ticket with an owner and a target close date.
Re-readiness review is mandatory after any breaking event (region migration, schema rebuild, AI provider switch, multi-tenancy refactor) and at least annually.