
search-aggregation-service — SERVICE_READINESS

Companion: SERVICE_OVERVIEW · SECURITY_MODEL · OBSERVABILITY · DEPLOYMENT_TOPOLOGY · TESTING_STRATEGY · FAILURE_MODES · SERVICE_RISK_REGISTER · ../../docs/standards/DEFINITION_OF_DONE.md

Sign-off on this checklist is gated by the platform readiness review board (one platform owner + one security reviewer + one SRE). Every ✓ item must be ticked, with its evidence linked, before a region launch.


1. Functional readiness

| # | Check | Owner | Status |
|---|---|---|---|
| F-1 | All 17 service-bundle docs published and current | service owner | |
| F-2 | Domain layer complete: HotelIndexEntry, RateSnapshot, AvailabilityHint, BoostRule, IndexBuild, with all invariants enforced | service owner | |
| F-3 | All commanded use cases implemented (per APPLICATION_LOGIC § Commands) | service owner | |
| F-4 | All consumed events have a registered handler (per EVENT_SCHEMAS § Consumed events) | service owner | |
| F-5 | All published events validate against contracts/asyncapi.yaml in CI | service owner | |
| F-6 | Public REST API matches contracts/openapi.yaml; CI fails on diff | service owner | |
| F-7 | Multi-language search verified for ps, fa, tg, ar, ur, en, ru (per region) | search domain expert | ⏳ launch region only |
| F-8 | Geo search verified at boundaries: 0 km radius, 200 km cap, antimeridian (n/a Phase 1), bbox > 250 000 km² rejected | service owner | |
| F-9 | Currency conversion verified against pricing-service golden FX snapshot | pricing domain expert | |
| F-10 | Region pinning enforced; cross-region requests return only target-region results when strict=true | service owner | |
| F-11 | Click recording → popularity recompute closes the loop (24 h cycle) | service owner | |
| F-12 | Boost-rule lifecycle (draft → active → expired/cancelled) covered with operator UI smoke | platform UX | ⏳ Phase 3 |
| F-13 | Index rebuild from BigQuery archive completes for the launch region within 4 h | SRE | ✅ rehearsed in staging |
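
F-8's boundary cases can be pinned down as a pure validation function that the verification suite exercises directly. A minimal sketch, assuming hypothetical request shapes (`RadiusQuery`, `BboxQuery`) rather than the real contracts/openapi.yaml parameters, a spherical-Earth area formula, and that an over-cap radius is rejected rather than clamped:

```typescript
// Hypothetical request shapes; the real parameters live in contracts/openapi.yaml.
type RadiusQuery = { kind: "radius"; lat: number; lon: number; radiusKm: number };
type BboxQuery = { kind: "bbox"; south: number; west: number; north: number; east: number };
type GeoQuery = RadiusQuery | BboxQuery;
type GeoVerdict = { ok: true } | { ok: false; reason: string };

const EARTH_RADIUS_KM = 6371.0088; // mean Earth radius
const MAX_RADIUS_KM = 200; // "200 km cap" (F-8)
const MAX_BBOX_AREA_KM2 = 250_000; // "bbox > 250 000 km² rejected" (F-8)

const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Area of a lat/lon rectangle on a sphere: R² * Δλ * (sin φ_north - sin φ_south), angles in radians.
function bboxAreaKm2(q: BboxQuery): number {
  const dLon = toRad(q.east - q.west);
  return EARTH_RADIUS_KM ** 2 * dLon * (Math.sin(toRad(q.north)) - Math.sin(toRad(q.south)));
}

function validateGeo(q: GeoQuery): GeoVerdict {
  if (q.kind === "radius") {
    // Assumption: 0 km is accepted; a radius above the cap is rejected rather than clamped.
    if (!Number.isFinite(q.radiusKm) || q.radiusKm < 0) return { ok: false, reason: "radius must be >= 0 km" };
    if (q.radiusKm > MAX_RADIUS_KM) return { ok: false, reason: `radius exceeds ${MAX_RADIUS_KM} km cap` };
    return { ok: true };
  }
  if (q.south > q.north) return { ok: false, reason: "south edge lies north of the north edge" };
  // Antimeridian handling is n/a in Phase 1: west > east means the box crosses it, so reject.
  if (q.west > q.east) return { ok: false, reason: "antimeridian-crossing bbox not supported in Phase 1" };
  if (bboxAreaKm2(q) > MAX_BBOX_AREA_KM2) return { ok: false, reason: "bbox area exceeds 250 000 km²" };
  return { ok: true };
}
```

A 4° × 4° box at the equator (roughly 198 000 km²) passes and a 5° × 5° box (roughly 309 000 km²) fails, which makes a convenient pair of boundary fixtures.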

2. Operational readiness

| # | Check | Owner | Status |
|---|---|---|---|
| O-1 | Cloud Run service deployed in primary + secondary region | SRE | |
| O-2 | Cloud SQL HA enabled + cross-region read replica | SRE | |
| O-3 | OpenSearch cluster (Aiven) sized per DEPLOYMENT_TOPOLOGY § 1 per region | SRE | |
| O-4 | Memorystore Redis HA in each region | SRE | |
| O-5 | Pub/Sub topics + subscriptions provisioned via Terraform; DLQ + retry policy applied | SRE | |
| O-6 | All dashboards live (Cloud Monitoring + Grafana mirror) | SRE | |
| O-7 | All SLO burn-rate alerts wired to PagerDuty (search-aggregation rotation) | SRE | |
| O-8 | All runbooks present in ops/runbooks/search-aggregation-service/ | service owner | |
| O-9 | DR game day rehearsed within last 90 days | SRE | ⏳ first launch will count |
| O-10 | Synthetic checks running in EU + ASIA | SRE | |
| O-11 | Cost guardrails configured (Cloud Run, Pub/Sub, OpenSearch storage) with monthly review | platform finance | |
| O-12 | On-call rotation populated (primary + secondary, follow-the-sun) | engineering manager | |
| O-13 | /healthz and /readyz differentiated and used by Cloud Run probes | service owner | |
| O-14 | Outbox publisher advisory lock prevents duplicate publishing across pods (verified via integration test) | service owner | |
| O-15 | Index swap rehearsed in staging twice without incident | service owner | |
| O-16 | Tenant cascade purge rehearsed for a synthetic deleted tenant in staging within SLO | security + SRE | |
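
O-14 relies on a Postgres advisory lock so that only one pod drains the outbox at a time. A minimal sketch, assuming session-level locks taken with Postgres's `pg_try_advisory_lock` and released with `pg_advisory_unlock`; the `SqlSession` interface, the lock key and `publishIfLeader` are hypothetical names, not the service's code:

```typescript
// Hypothetical minimal driver surface. In production this is ONE dedicated connection,
// because Postgres session-level advisory locks belong to the connection that took them.
interface SqlSession {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical key; every pod must use the same bigint for the outbox publisher role.
const OUTBOX_PUBLISHER_LOCK_KEY = 727_001;

// Non-blocking: resolves to false at once if another session already holds the lock.
async function tryAcquirePublisherLock(session: SqlSession): Promise<boolean> {
  const res = await session.query("SELECT pg_try_advisory_lock($1) AS acquired", [OUTBOX_PUBLISHER_LOCK_KEY]);
  return res.rows[0]?.acquired === true;
}

async function releasePublisherLock(session: SqlSession): Promise<void> {
  await session.query("SELECT pg_advisory_unlock($1)", [OUTBOX_PUBLISHER_LOCK_KEY]);
}

// Runs one publish batch only if this pod holds the lock; resolves to whether it published.
async function publishIfLeader(session: SqlSession, publishBatch: () => Promise<void>): Promise<boolean> {
  if (!(await tryAcquirePublisherLock(session))) return false; // another pod is publishing
  try {
    await publishBatch();
    return true;
  } finally {
    await releasePublisherLock(session); // release even if the batch throws
  }
}
```

The integration test named in O-14 can then open two real connections, call `publishIfLeader` on both concurrently and assert that exactly one batch ran. A transaction-scoped lock (`pg_try_advisory_xact_lock`) is an alternative that releases automatically at commit or rollback, which suits a publisher that selects and marks a batch inside one transaction.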

3. Security & compliance readiness

| # | Check | Owner | Status |
|---|---|---|---|
| S-1 | Field-level allow-list policy implemented and CI-enforced (L1 type, L2 projection, L3 schema, L4 audit) | service owner + security | |
| S-2 | tenant-isolation.spec.ts (inverted) green | service owner | |
| S-3 | outbox.spec.ts and inbox.spec.ts green | service owner | |
| S-4 | Cross-tenant exposure auditor scheduled nightly | security | |
| S-5 | Cursor signing key rotation procedure documented + dry-run | security | |
| S-6 | Cloud Armor WAF + per-IP rate limit configured per SECURITY_MODEL § 7 | security | |
| S-7 | OPA policies for admin routes deployed and tested | security | |
| S-8 | All secrets in Google Secret Manager with rotation schedule | security | |
| S-9 | Container image signed (cosign) and Cloud Run admission policy enforces signature | security | |
| S-10 | SBOM published and CVE scan green (no critical/high) | security | |
| S-11 | Audit topics mirrored to audit-service BigQuery + GCS Object Lock | security | |
| S-12 | Pentest scope items listed in SECURITY_MODEL § 13 covered (annual, due before next launch) | security | ⏳ scheduled |
| S-13 | Privacy review: search_queries retention + anonymization approved | DPO / legal | |
| S-14 | Sanctions list ingestion path validated (via tenant-service suspension flow) | compliance | |
| S-15 | All AI calls go through ai-orchestrator-service; direct provider SDKs absent from dependency tree | security | |
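
S-5's rotation procedure is easiest to dry-run when every cursor names the key that signed it. A minimal sketch, assuming HMAC-SHA256 cursors in a hypothetical `kid.payload.mac` format and a hypothetical `CursorKeyring` shape; the service's real cursor encoding may differ:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical keyring: activeKid signs new cursors; every key in the map still verifies.
interface CursorKeyring {
  activeKid: string;
  keys: Map<string, string>; // kid -> secret (loaded from Secret Manager in production)
}

function mac(kid: string, payload: string, secret: string): string {
  // Bind the kid into the MAC so a cursor cannot be re-labelled with another key id.
  return createHmac("sha256", secret).update(`${kid}.${payload}`).digest("base64url");
}

function encodeCursor(state: object, ring: CursorKeyring): string {
  const secret = ring.keys.get(ring.activeKid);
  if (secret === undefined) throw new Error(`active key ${ring.activeKid} missing from keyring`);
  const payload = Buffer.from(JSON.stringify(state), "utf8").toString("base64url");
  return `${ring.activeKid}.${payload}.${mac(ring.activeKid, payload, secret)}`;
}

// Returns the decoded state, or null for malformed, tampered or retired-key cursors.
function decodeCursor(cursor: string, ring: CursorKeyring): unknown {
  const parts = cursor.split(".");
  if (parts.length !== 3) return null;
  const [kid, payload, tag] = parts;
  const secret = ring.keys.get(kid);
  if (secret === undefined) return null; // key retired or never issued
  const expected = Buffer.from(mac(kid, payload, secret));
  const given = Buffer.from(tag);
  // Constant-time comparison; timingSafeEqual requires equal lengths, so check that first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}
```

Under this scheme a rotation is three steps: add the new key to every pod's keyring as verify-only, switch `activeKid` to it, and remove the old key once the longest-lived cursor signed with it has expired. The dry-run passes if old cursors keep decoding during the overlap and fail cleanly afterwards.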

4. Performance readiness

| # | Check | Owner | Status |
|---|---|---|---|
| P-1 | Search latency p95 < 250 ms at 1 000 RPS (k6 sustained 10 min) | performance | |
| P-2 | Hotel detail latency p95 < 200 ms at 500 RPS | performance | |
| P-3 | Cache hit ratio ≥ 70 % under load profile | performance | |
| P-4 | Projection freshness p95 < 30 s under 50 events/sec ingestion | performance | |
| P-5 | Outbox publish lag p95 < 1 s steady state | performance | |
| P-6 | OpenSearch shard rejection rate < 0.1 % under peak | performance | |
| P-7 | Postgres connection pool saturation < 70 % under peak | performance | |
| P-8 | Memorystore eviction rate < 5 / min under peak | performance | |
| P-9 | Performance baseline pinned in BigQuery; nightly regression check active | performance | |
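
Section 4's thresholds can be evaluated mechanically from a k6 run or from the nightly regression job in P-9. A minimal sketch, assuming nearest-rank percentiles and hypothetical metric names; k6 or BigQuery may compute p95 with a different interpolation, so a gate should stick to one method. The load conditions (RPS, duration, ingestion rate) belong to the load scenario and are not modelled here:

```typescript
// Nearest-rank percentile: the smallest sample with at least p % of samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point drift in the rank.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Limits transcribed from section 4; the metric names are hypothetical.
type Budget = { id: string; metric: string; limit: number; pass: (value: number, limit: number) => boolean };
const below = (v: number, l: number): boolean => v < l;
const atLeast = (v: number, l: number): boolean => v >= l;

const PERF_BUDGETS: Budget[] = [
  { id: "P-1", metric: "search_latency_p95_ms", limit: 250, pass: below },
  { id: "P-2", metric: "hotel_detail_latency_p95_ms", limit: 200, pass: below },
  { id: "P-3", metric: "cache_hit_ratio_pct", limit: 70, pass: atLeast },
  { id: "P-4", metric: "projection_freshness_p95_s", limit: 30, pass: below },
  { id: "P-5", metric: "outbox_publish_lag_p95_s", limit: 1, pass: below },
  { id: "P-6", metric: "opensearch_shard_rejection_pct", limit: 0.1, pass: below },
  { id: "P-7", metric: "pg_pool_saturation_pct", limit: 70, pass: below },
  { id: "P-8", metric: "memorystore_evictions_per_min", limit: 5, pass: below },
];

// Returns the ids of budgets that failed; a missing measurement counts as a failure.
function failedBudgets(measured: Partial<Record<string, number>>): string[] {
  return PERF_BUDGETS.filter((b) => {
    const value = measured[b.metric];
    return value === undefined || !b.pass(value, b.limit);
  }).map((b) => b.id);
}
```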

5. Documentation & onboarding

| # | Check | Owner | Status |
|---|---|---|---|
| D-1 | This bundle present and reviewed | tech writer + service owner | |
| D-2 | OpenAPI doc rendered via melmastoon-docs-portal; consumer-visible endpoints documented | tech writer | |
| D-3 | AsyncAPI doc rendered + linked from 04 Event Architecture | tech writer | |
| D-4 | LOCAL_DEV_SETUP works for a brand-new dev (verified by 2 engineers outside the team within 30 days) | service owner | |
| D-5 | Runbooks tested by an on-call engineer not on the team | SRE | ✅ for top 5 |
| D-6 | Public-facing changelog page set up | service owner | ⏳ at GA |

6. Definition of Ready (per launch region)

Before turning on a new region (e.g. IR after AF+TJ):

  • province_centers seeded for the region with verified geo + city slugs.
  • OpenSearch index melmastoon-search-v<n>-<region> created and joined to the writer alias.
  • At least 100 published properties projected from upstream services.
  • Multi-language acceptance test corpus updated for the region's primary languages.
  • Synthetic checks added for the region.
  • DR game day rerun including the new region's resources.
  • FX snapshot supports the region's currency pair.
  • Pentest delta scope reviewed.
  • Region appears in flags-service allow-list search.region_pinning.allowed_regions.
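
The region Definition of Ready is a conjunction of facts that tooling can gather and evaluate in one place. A minimal sketch, assuming a hypothetical `RegionReadinessSnapshot` populated elsewhere (not shown) and lowercase region codes in index names:

```typescript
// Hypothetical snapshot of facts gathered for one candidate region; field names are illustrative.
interface RegionReadinessSnapshot {
  region: string; // lowercase region code, e.g. "ir" (assumption)
  provinceCentersSeeded: boolean;
  writerAliasIndices: string[]; // indices currently behind the OpenSearch writer alias
  publishedPropertiesProjected: number;
  languageCorpusUpdated: boolean;
  syntheticChecksAdded: boolean;
  drGameDayIncludesRegion: boolean;
  fxPairSupported: boolean;
  pentestDeltaReviewed: boolean;
  allowedRegions: string[]; // value of search.region_pinning.allowed_regions in flags-service
}

// melmastoon-search-v<n>-<region>, assuming <n> is an integer and <region> a lowercase code.
const REGION_INDEX = /^melmastoon-search-v(\d+)-([a-z]+)$/;

// Returns every unmet item; an empty list means the region may be turned on.
function unmetDefinitionOfReady(s: RegionReadinessSnapshot): string[] {
  const unmet: string[] = [];
  const check = (ok: boolean, item: string): void => {
    if (!ok) unmet.push(item);
  };
  check(s.provinceCentersSeeded, "province_centers seeded");
  check(
    s.writerAliasIndices.some((idx) => REGION_INDEX.exec(idx)?.[2] === s.region),
    "region index joined to writer alias",
  );
  check(s.publishedPropertiesProjected >= 100, "at least 100 published properties projected");
  check(s.languageCorpusUpdated, "multi-language corpus updated");
  check(s.syntheticChecksAdded, "synthetic checks added");
  check(s.drGameDayIncludesRegion, "DR game day rerun with region");
  check(s.fxPairSupported, "FX pair supported");
  check(s.pentestDeltaReviewed, "pentest delta reviewed");
  check(s.allowedRegions.includes(s.region), "region in allow-list");
  return unmet;
}
```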

7. Definition of Done (per release)

Per DEFINITION_OF_DONE.md, every release must show:

  • All CI gates green (unit, integration, contract, perf smoke, lint, typecheck, security).
  • Coverage gate green (domain ≥ 95 %, application ≥ 90 %, overall ≥ 85 %).
  • Migration plan approved (if schema changes) + dry-run successful.
  • OpenAPI/AsyncAPI/OpenSearch template diffs reviewed and approved.
  • Allow-list audit passed in CI.
  • No new critical/high CVEs.
  • Image signed; admission policy passes.
  • Progressive rollout completed without an SLO burn-rate alarm.
  • Release notes appended to changelog.
  • Post-deploy smoke + synthetics green for 30 min.
  • On-call notified.
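
The coverage gate has one subtle point: "overall" should be pooled across layers rather than averaged. A minimal sketch, assuming per-layer covered/total line counts (for example taken from a coverage summary) and layer keys named `domain` and `application`:

```typescript
// Line (or statement) counts per layer; the source of these numbers is an assumption.
interface LayerCoverage {
  covered: number;
  total: number;
}

// Thresholds from the Definition of Done; the layer key names are assumptions.
const COVERAGE_MIN_PCT = { domain: 95, application: 90, overall: 85 } as const;

// A layer with nothing instrumented counts as 0 % so missing data cannot pass the gate.
const pct = (c: LayerCoverage): number => (c.total === 0 ? 0 : (100 * c.covered) / c.total);

function coverageGateFailures(layers: Record<string, LayerCoverage>): string[] {
  const failures: string[] = [];
  for (const layer of ["domain", "application"] as const) {
    const c = layers[layer];
    if (c === undefined || pct(c) < COVERAGE_MIN_PCT[layer]) failures.push(layer);
  }
  // Overall is pooled (sum covered / sum total across all layers), not a mean of percentages.
  let covered = 0;
  let total = 0;
  for (const c of Object.values(layers)) {
    covered += c.covered;
    total += c.total;
  }
  if (pct({ covered, total }) < COVERAGE_MIN_PCT.overall) failures.push("overall");
  return failures;
}
```

Averaging per-layer percentages instead would let a large, poorly covered layer hide behind small, fully covered ones.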

8. Sign-offs (template)

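| Role | Name | Date | Notes |
|---|---|---|---|
| Service owner | | | |
| Platform owner | | | |
| Security reviewer | | | |
| SRE on-call lead | | | |
| DPO / legal (privacy) | | | |
| Compliance | | | sanctions / data residency |
| Engineering manager | | | |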

A single ❌ above blocks the region launch.
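
The blocking rule can be applied mechanically before the board meets. A minimal sketch, assuming each checklist row is exported as a hypothetical `ChecklistItem`; whether a ⏳ row may be deferred is left to the review board, so the sketch only lists those rows rather than deciding:

```typescript
// Status glyphs as used in the checklist tables; the item shape is hypothetical.
type CheckStatus = "✅" | "⏳" | "❌";
interface ChecklistItem {
  id: string; // e.g. "O-9"
  status: CheckStatus;
  evidenceUrl?: string;
}

interface LaunchDecision {
  blocked: boolean;
  blockers: string[]; // any entry here blocks the region launch
  pendingForBoard: string[]; // ⏳ rows the review board must explicitly accept or defer
}

function decideRegionLaunch(items: readonly ChecklistItem[]): LaunchDecision {
  const blockers: string[] = [];
  const pendingForBoard: string[] = [];
  for (const item of items) {
    if (item.status === "❌") blockers.push(`${item.id}: failed`);
    else if (item.status === "⏳") pendingForBoard.push(item.id);
    else if (!item.evidenceUrl) blockers.push(`${item.id}: ticked without linked evidence`);
  }
  return { blocked: blockers.length > 0, blockers, pendingForBoard };
}
```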