
SERVICE_READINESS — reservation-service

Sibling: SERVICE_RISK_REGISTER · TESTING_STRATEGY · OBSERVABILITY

Strategic anchor: standards/SERVICE_TEMPLATE §Service readiness gate · standards/DEFINITION_OF_DONE

A service is ready for production only when every box below is green. Tech lead and SRE both sign off; the tenant-pilot launch is gated on this checklist plus a successful 30-minute, 5%-traffic canary in staging.


1. Documentation completeness

  • All 17 bundle files present and substantive (no stub headings).
  • SERVICE_OVERVIEW reviewed by domain product owner.
  • DOMAIN_MODEL, API_CONTRACTS, EVENT_SCHEMAS reviewed by every consuming service team (inventory, pricing, payment-gateway, lock-integration, notification, billing, housekeeping, analytics, audit, search-aggregation, sync, both BFFs).
  • SECURITY_MODEL reviewed by Security and signed off.
  • SYNC_CONTRACT reviewed by sync/desktop team and tested against sync-service.
  • AI_INTEGRATION reviewed by ai-orchestrator-service team; AIProvenance fields persisted on every AI-touched value.
  • FAILURE_MODES entries each have a runbook in runbooks/reservation/.
  • 03-microservices catalog summary up to date.

2. Code quality

  • ESLint domain-layer import restriction passes (no NestJS, no Drizzle, no I/O imports under src/domain/).
  • tsc --noEmit strict, zero errors.
  • No explicit any, no // @ts-ignore, and no // eslint-disable outside reviewed adapter shims.
  • Public API surface (controllers, ports, DTOs) has TSDoc comments.
  • Commit history is clean and follows Conventional Commits.
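
The domain-layer import restriction can be expressed with ESLint's built-in no-restricted-imports rule, scoped to src/domain/ in a flat config. This is a minimal sketch that assumes ESLint 9 flat config. The banned I/O module list (Node built-ins plus a pg driver) is a hypothetical example, not the project's actual list.

```typescript
// eslint.config.ts (sketch). Assumes ESLint 9 flat config. The typescript-eslint
// parser setup needed to lint .ts files is omitted for brevity.
const IO_MODULES = ["fs", "node:fs", "http", "node:http", "net", "node:net", "pg"]; // hypothetical list

const domainImportRestriction = {
  files: ["src/domain/**/*.ts"],
  rules: {
    "no-restricted-imports": [
      "error",
      {
        // Exact module names: I/O belongs behind a port, implemented in an adapter.
        paths: IO_MODULES.map((name) => ({
          name,
          message: "src/domain/ must stay free of I/O; depend on a port instead.",
        })),
        // Gitignore-style patterns for framework and ORM packages (and subpaths).
        patterns: [
          {
            group: ["@nestjs/*", "drizzle-orm", "drizzle-orm/*"],
            message: "src/domain/ must not depend on NestJS or Drizzle.",
          },
        ],
      },
    ],
  },
};

export default [domainImportRestriction];
```

Because the rule is scoped with files, adapters and controllers outside src/domain/ can still import NestJS and Drizzle freely.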

3. Test coverage and gates

  • Overall coverage ≥ 85% statements; ≥ 80% branches.
  • Domain layer coverage ≥ 95% statements; 100% on the state machine.
  • All unit tests in TESTING_STRATEGY §2 and §3 green.
  • The three mandatory integration tests pass on every PR: tenant-isolation.spec.ts, outbox.spec.ts, inbox.spec.ts.
  • Booking saga happy path + 8 compensation paths (C1–C8) all green.
  • Concurrency suite green (check-in race, cancel-vs-modify, group partial cancel, hold-expiry race).
  • FX snapshot stability test green (including IRR magnitude).
  • Cash-on-arrival flow test green.
  • Pact contracts published to the Pact broker, with consumer and provider verification results recorded there for both sides.
  • OpenAPI snapshot diff: no breaking changes without major version bump.
  • Event schema registry conformance green for every produced subject.
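
If the suite runs on Jest, its coverageThreshold option can enforce the coverage numbers above. It accepts a global block plus per-directory and per-file keys. In this sketch the transform setup (for example ts-jest) is omitted, and the state-machine file path is hypothetical.

```typescript
// jest.config.ts (sketch). A local type keeps this self-contained; in the repo,
// prefer `import type { Config } from "jest"`.
type CoverageThreshold = Record<
  string,
  { statements?: number; branches?: number; functions?: number; lines?: number }
>;

const coverageThreshold: CoverageThreshold = {
  // Applies to files not matched by a path key below.
  global: { statements: 85, branches: 80 },
  // Directory key: aggregate coverage across everything under src/domain/.
  "./src/domain/": { statements: 95, branches: 80 },
  // Hypothetical location of the reservation state machine.
  "./src/domain/reservation/state-machine.ts": {
    statements: 100,
    branches: 100,
    functions: 100,
    lines: 100,
  },
};

const config = {
  testEnvironment: "node",
  collectCoverage: true,
  // Count untested files too, so coverage cannot be inflated by omission.
  collectCoverageFrom: ["src/**/*.ts"],
  coverageThreshold,
};

export default config;
```

Jest subtracts files matched by path keys from the global figure. As a result, the 85/80 global gate covers only the non-domain code. Because the domain keys meet or exceed both global numbers, the combined coverage cannot fall below the checklist targets.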

4. API and event hygiene

  • All endpoints live under /api/v1/reservations/*, versioned in the URI only (no header or media-type versioning).
  • All mutating endpoints accept an Idempotency-Key header and deduplicate retries carrying the same key for 24 h.
  • All mutating endpoints enforce If-Match: "v<n>" for optimistic concurrency control (OCC).
  • All endpoints emit Problem+JSON errors with canonical MELMASTOON.RESERVATION.* codes.
  • All produced subjects follow melmastoon.reservation.<aggregate>.<verb-past-tense>.v1.
  • Every consumed subject has an inbox handler with dedupe and a DLQ binding.
  • Every event envelope carries tenantId and traceparent, plus causationId whenever the event was triggered by another command or event.
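
The Idempotency-Key and If-Match rules above can be sketched as two small checks that run before a write. This sketch makes several assumptions: an in-memory map stands in for the real store, If-Match is treated as mandatory on mutations, and the MELMASTOON.RESERVATION.* code names are hypothetical.

```typescript
// Sketch only: an in-memory store stands in for Postgres/Redis, and the
// MELMASTOON.RESERVATION.* code names below are hypothetical examples.
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24 h dedupe window

interface StoredResponse {
  status: number;
  body: unknown;
  storedAt: number; // epoch ms
}

class IdempotencyStore {
  private readonly entries = new Map<string, StoredResponse>();

  // Scoped by tenant so two tenants may reuse the same client-generated key.
  private key(tenantId: string, idempotencyKey: string): string {
    return `${tenantId}:${idempotencyKey}`;
  }

  // Returns the stored response for a replayed request, or undefined if the
  // key is unknown or at least 24 h old (expired entries are evicted).
  get(tenantId: string, idempotencyKey: string, now: number): StoredResponse | undefined {
    const k = this.key(tenantId, idempotencyKey);
    const hit = this.entries.get(k);
    if (hit === undefined) return undefined;
    if (now - hit.storedAt >= IDEMPOTENCY_TTL_MS) {
      this.entries.delete(k);
      return undefined;
    }
    return hit;
  }

  put(tenantId: string, idempotencyKey: string, status: number, body: unknown, now: number): void {
    this.entries.set(this.key(tenantId, idempotencyKey), { status, body, storedAt: now });
  }
}

// Parses If-Match: "v<n>" (a quoted ETag). Returns null when absent or malformed.
function parseIfMatch(header: string | undefined): number | null {
  const m = /^"v(\d+)"$/.exec(header?.trim() ?? "");
  return m ? Number(m[1]) : null;
}

interface ProblemJson {
  type: string;
  title: string;
  status: number;
  code: string;
}

// Assumes If-Match is mandatory on mutations: 428 if missing, 412 on a stale version.
function checkPrecondition(header: string | undefined, currentVersion: number): ProblemJson | null {
  const expected = parseIfMatch(header);
  if (expected === null) {
    return { type: "about:blank", title: "Precondition Required", status: 428,
      code: "MELMASTOON.RESERVATION.PRECONDITION_REQUIRED" };
  }
  if (expected !== currentVersion) {
    return { type: "about:blank", title: "Precondition Failed", status: 412,
      code: "MELMASTOON.RESERVATION.VERSION_CONFLICT" };
  }
  return null; // versions match: proceed with the write
}
```

A production store would probably also persist a hash of the request body. Reusing a key with a different payload could then be rejected instead of silently replayed. That check is left out here.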

5. Storage and migrations

  • All tables under reservation schema have tenant_id, an RLS policy <table>_tenant_isolation, and a leading tenant_id index.
  • No table has cross-tenant foreign keys.
  • Append-only audit (reservation_modifications) has no UPDATE/DELETE grants to the application role.
  • Outbox + inbox dedupe tables present.
  • Migrations are backwards-compatible (expand/contract): a destructive change never ships in the same release that removes the last writer of the affected column or table.
  • Postgres connection middleware sets app.tenant_id per request and is covered by tenant-isolation.spec.ts.
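
The tenant-isolation items above can be sketched as a DDL helper and a transaction wrapper. The helper emits the <table>_tenant_isolation policy. The wrapper sets app.tenant_id with set_config(..., true), which scopes the value to the current transaction so a pooled connection cannot carry one request's tenant into the next. This sketch assumes tenant_id is a uuid column and a pg-style client; the Queryable interface and helper names are hypothetical.

```typescript
// Hypothetical minimal client shape (node-postgres' Client/PoolClient fit it).
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Table names come from migration code, never from user input.
function tenantIsolationDdl(table: string): string[] {
  // NULLIF maps an unset/empty setting to NULL, so the policy matches no rows.
  const tenant = "NULLIF(current_setting('app.tenant_id', true), '')::uuid";
  return [
    `ALTER TABLE reservation.${table} ENABLE ROW LEVEL SECURITY`,
    // FORCE also applies the policy to the table owner.
    `ALTER TABLE reservation.${table} FORCE ROW LEVEL SECURITY`,
    `CREATE POLICY ${table}_tenant_isolation ON reservation.${table} ` +
      `USING (tenant_id = ${tenant}) WITH CHECK (tenant_id = ${tenant})`,
  ];
}

// Runs `work` inside one transaction with app.tenant_id bound to `tenantId`.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true: the setting reverts at COMMIT or ROLLBACK.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Superusers and roles with BYPASSRLS skip these policies entirely. The application role therefore needs neither attribute for the policies to have any effect.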

6. Security

  • No payment processor or lock vendor SDKs imported (CI dependency-graph guard).
  • Field-level encryption applied to guests.email, guests.phone_e164 with per-tenant DEK.
  • Hash-for-search (email_hash, phone_e164_hash) populated on every write.
  • All secrets via Secret Manager + Workload Identity; no SA keys in deploy artifacts.
  • Security review (security-reviewer) signed off.
  • Threat model entries in SECURITY_MODEL §9 all have mitigations implemented.
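
The field-encryption and hash-for-search items can be sketched with node:crypto, using AES-256-GCM under a per-tenant DEK. Several parts are assumptions: the DEK arrives already unwrapped by KMS (envelope encryption), the stored layout is iv | tag | ciphertext in base64, and the AAD binds each ciphertext to its tenant, column and row. The search hash is assumed to be an HMAC-SHA256 keyed per tenant, since a plain hash of an email address is easy to dictionary-attack.

```typescript
import { createCipheriv, createDecipheriv, createHmac, randomBytes } from "node:crypto";

// Encrypts one field value under a 32-byte DEK. `aad` is a hypothetical
// binding string such as "<tenantId>:guests.email:<guestId>".
function encryptField(dek: Buffer, plaintext: string, aad: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  cipher.setAAD(Buffer.from(aad, "utf8"));
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Stored layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Throws if the ciphertext, tag or AAD was tampered with or swapped.
function decryptField(dek: Buffer, encoded: string, aad: string): string {
  const raw = Buffer.from(encoded, "base64");
  const decipher = createDecipheriv("aes-256-gcm", dek, raw.subarray(0, 12));
  decipher.setAAD(Buffer.from(aad, "utf8"));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}

// Assumed normalization; phone_e164 is already canonical and is hashed as-is.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Deterministic, so equality lookups on email_hash work; keyed, so hashes are
// useless without the tenant's (hypothetical) hash key.
function searchHash(hashKey: Buffer, normalized: string): string {
  return createHmac("sha256", hashKey).update(normalized, "utf8").digest("hex");
}
```

Encryption is randomized by the fresh nonce, so equality search has to go through the separate deterministic hash column.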

7. Observability

  • OpenTelemetry initialized before NestFactory in main.ts (verified by smoke test).
  • tenant_id, trace_id, request_id present on every log record (verified by structured-log lint).
  • All SLIs in OBSERVABILITY §3 emit metrics with the documented tags.
  • Three dashboards (service health, booking funnel, operations) present and reviewed.
  • All alerts in OBSERVABILITY §6 configured with paged routes and named runbooks.
  • Synthetic checks live (POST /quotes, POST /holds → confirm, GET /internal/health).
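
The structured-log lint can be sketched as a pass over newline-delimited JSON that flags any record missing tenant_id, trace_id or request_id. A companion helper pulls trace_id out of the W3C traceparent header. The field names come from the checklist. The NDJSON input format and the helper names are assumptions.

```typescript
const REQUIRED_FIELDS = ["tenant_id", "trace_id", "request_id"] as const;

type LogRecord = Record<string, unknown>;

// Extracts the trace-id from a W3C traceparent header (version-00 layout:
// version-traceid-parentid-flags). Returns null for malformed or invalid values.
function traceIdFromTraceparent(header: string): string | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, parentId] = m;
  // Version ff and all-zero trace or parent ids are invalid per the spec.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return traceId;
}

// Returns one message per violation; an empty array means the output is clean.
function lintLogLines(ndjson: string): string[] {
  const problems: string[] = [];
  ndjson.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      problems.push(`line ${i + 1}: not JSON`);
      return;
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      problems.push(`line ${i + 1}: not a JSON object`);
      return;
    }
    const record = parsed as LogRecord;
    for (const field of REQUIRED_FIELDS) {
      const value = record[field];
      if (typeof value !== "string" || value === "") {
        problems.push(`line ${i + 1}: missing ${field}`);
      }
    }
  });
  return problems;
}
```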

8. Deployment

  • Cloud Run service deployed to me-central1 (primary) and asia-south1 (secondary if region-pinned tenants exist).
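  • Hold-expiry worker deployed as a separate Cloud Run service pinned to a single instance (max instances 1) and triggered by Cloud Scheduler. Cloud Scheduler's cron granularity is one minute, so the 30 s sweep cadence has to come from the worker itself (for example, two sweeps per invocation).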
  • Min 3 instances on the API service; Cloud Run autoscaling bounds (max instances, per-instance concurrency) configured.
  • VPC Service Controls perimeter membership confirmed.
  • Workload Identity mappings verified for both services.
  • Canary deploy: 5% / 30 min in staging completed without alert ladder firing.
  • Rollback rehearsed and recorded.
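
The single-instance pin keeps hold-expiry sweeps from running concurrently in steady state. Two runs may still briefly overlap, for example during a revision rollout or after a retried Scheduler trigger. A guest confirming a hold also races the sweeper, which is the hold-expiry race in the concurrency suite. The sketch below makes the sweep safe regardless by expiring each hold through a version-checked compare-and-set. The Hold shape, its statuses and the in-memory repository are hypothetical.

```typescript
// In Postgres the compare-and-set would be roughly:
//   UPDATE ... SET status = 'expired', version = version + 1
//   WHERE id = $1 AND version = $2 AND status = 'held'
type HoldStatus = "held" | "confirmed" | "expired";

interface Hold {
  id: string;
  status: HoldStatus;
  version: number;
  expiresAt: number; // epoch ms
}

class HoldRepository {
  private readonly holds = new Map<string, Hold>();

  add(hold: Hold): void {
    this.holds.set(hold.id, { ...hold });
  }

  get(id: string): Hold | undefined {
    return this.holds.get(id);
  }

  // Returns snapshots (copies), mirroring a SELECT that may go stale.
  findExpiredCandidates(now: number): Hold[] {
    return [...this.holds.values()]
      .filter((h) => h.status === "held" && h.expiresAt <= now)
      .map((h) => ({ ...h }));
  }

  // Succeeds only if nobody changed the hold since `expectedVersion` was read.
  transition(id: string, expectedVersion: number, from: HoldStatus, to: HoldStatus): boolean {
    const current = this.holds.get(id);
    if (!current || current.version !== expectedVersion || current.status !== from) return false;
    this.holds.set(id, { ...current, status: to, version: current.version + 1 });
    return true;
  }
}

// Returns the ids this run actually expired; holds it lost a race on are skipped.
function sweepExpiredHolds(repo: HoldRepository, now: number): string[] {
  const expired: string[] = [];
  for (const hold of repo.findExpiredCandidates(now)) {
    if (repo.transition(hold.id, hold.version, "held", "expired")) expired.push(hold.id);
  }
  return expired;
}
```

Because a lost compare-and-set is simply skipped, rerunning or overlapping sweeps is harmless, which also makes rollback rehearsals of the worker low-risk.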

9. Desktop / sync

  • sync-service integration test exercises pull and push for reservation, reservation_item, guest, additional_guest, special_request, reservation_modification.
  • Conflict-policy table in SYNC_CONTRACT §2 implemented end-to-end and verified by an offline-then-reconnect drill.
  • Walk-in offline path tested (client-issued rsv_d_ ID → server canonical rsv_ ID mapping).
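
On the server, the walk-in offline path can be sketched as follows. The server mints a canonical rsv_ ID the first time it sees a client-issued rsv_d_ ID. It then reuses that ID on every replay of the same push and rewrites child references to it. The UUID suffix, the record shapes and the item ID format are assumptions.

```typescript
import { randomUUID } from "node:crypto";

interface PushedReservation {
  id: string;
  tenantId: string;
}

interface PushedItem {
  id: string;
  reservationId: string;
}

class DraftIdMapper {
  // Keyed by tenant + draft id so retries of the same push map to one canonical id.
  private readonly canonicalByDraft = new Map<string, string>();

  resolve(tenantId: string, id: string): string {
    if (!id.startsWith("rsv_d_")) return id; // already canonical
    const key = `${tenantId}:${id}`;
    let canonical = this.canonicalByDraft.get(key);
    if (canonical === undefined) {
      // A UUID never contains "_", so the result can never look like a draft id.
      canonical = `rsv_${randomUUID()}`;
      this.canonicalByDraft.set(key, canonical);
    }
    return canonical;
  }
}

// Rewrites the reservation id and every child reference to the canonical id.
function applyWalkInPush(
  mapper: DraftIdMapper,
  reservation: PushedReservation,
  items: PushedItem[],
): { reservation: PushedReservation; items: PushedItem[] } {
  const id = mapper.resolve(reservation.tenantId, reservation.id);
  return {
    reservation: { ...reservation, id },
    items: items.map((item) => ({
      ...item,
      reservationId: mapper.resolve(reservation.tenantId, item.reservationId),
    })),
  };
}
```

In the real service the mapping would presumably live in a table with a unique constraint on tenant and draft ID, written in the same transaction as the insert. The push response would then return the mapping so the desktop client can rewrite its local references.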

10. AI

  • Every AI capability documented in AI_INTEGRATION.
  • AIProvenance persisted on every AI-derived value.
  • Human-in-the-loop (HITL) review for auto_block anomaly verdicts implemented, with every decision traceable in the audit trail.
  • Fallbacks on cloud AI failure (degrading to an edge model or to a no-op) demonstrated under a chaos test.
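
The fallback and HITL items can be sketched as a tiered call: cloud first, then edge, then a no-op. Every result is stamped with provenance, and an auto_block verdict is flagged for human approval instead of being applied. The Verdict values, the provenance fields, the provider interface and the 2 s timeout are hypothetical. AI_INTEGRATION defines the real AIProvenance shape.

```typescript
type Verdict = "allow" | "review" | "auto_block";

interface AIProvenance {
  source: "cloud" | "edge" | "none";
  model: string | null;
  decidedAt: string; // ISO-8601
}

interface AnomalyProvider {
  model: string;
  classify(input: string): Promise<Verdict>;
}

interface AnomalyResult {
  verdict: Verdict | null; // null = no AI opinion (no-op fallback)
  provenance: AIProvenance;
  requiresHumanApproval: boolean; // auto_block is never applied directly
}

// Rejects if `p` does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

async function classifyWithFallback(
  input: string,
  cloud: AnomalyProvider,
  edge: AnomalyProvider | null,
  timeoutMs = 2000,
): Promise<AnomalyResult> {
  const tiers: Array<["cloud" | "edge", AnomalyProvider | null]> = [["cloud", cloud], ["edge", edge]];
  for (const [source, provider] of tiers) {
    if (provider === null) continue;
    try {
      const verdict = await withTimeout(provider.classify(input), timeoutMs);
      return {
        verdict,
        provenance: { source, model: provider.model, decidedAt: new Date().toISOString() },
        requiresHumanApproval: verdict === "auto_block",
      };
    } catch {
      // Error or timeout: fall through to the next tier.
    }
  }
  // No-op: missing AI never blocks a booking, and provenance records that no model decided.
  return {
    verdict: null,
    provenance: { source: "none", model: null, decidedAt: new Date().toISOString() },
    requiresHumanApproval: false,
  };
}
```

A chaos test can then inject a failing or hanging cloud provider and assert that the recorded provenance source is "edge" or "none".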

11. Operational

  • On-call rotation assigned; PagerDuty escalations configured.
  • Tenant-pilot success criteria recorded; first-tenant rollback plan written.
  • Cost dashboards (Cloud Run, Cloud SQL share, Pub/Sub egress, KMS, AI orchestrator spend) include service=reservation filter.
  • Cost guardrails set (budget alerts at 50/75/100/110% of monthly target).
  • Backup / point-in-time restore tested for the reservation schema (RPO 5 min, RTO 30 min).

12. Sign-off

Role                     Name                  Date
Tech lead (PMS core)     __________________    __________
SRE on-call lead         __________________    __________
Security reviewer        __________________    __________
Domain product owner     __________________    __________
AI orchestrator team     __________________    __________
Sync/desktop team        __________________    __________

A green checklist plus all six signatures unlocks prod-ready status. Without all of them, traffic stays at 0% in production.


13. Cross-references