search-aggregation-service — MIGRATION_PLAN

Companion: DATA_MODEL · DEPLOYMENT_TOPOLOGY · SERVICE_READINESS · TESTING_STRATEGY

This document covers two scopes:

  1. Greenfield bootstrap: bringing the service into existence from zero (Phase 0 → Phase 1).
  2. Ongoing schema and contract evolution: the expand → backfill → contract discipline used for every subsequent change.

Part A — Greenfield bootstrap

A.1 Phase 0 — Infrastructure provisioning

Tracked in infra/terraform/services/search-aggregation/. Steps:

  1. Create GCP project bindings, SAs (search-aggregation@, ci-search-aggregation@, index-builder@), VPC connector, and Secret Manager entries.
  2. Provision Cloud SQL for PostgreSQL 15 (HA) + cross-region replica.
  3. Provision Memorystore Redis 7 (HA) per region.
  4. Provision Aiven OpenSearch (peered VPC) per region.
  5. Create Pub/Sub topics owned by this service (melmastoon.search.*) and subscriptions to upstream topics (property, pricing, inventory, tenant).
  6. Create BigQuery dataset bindings to the platform events_raw.melmastoon_* dataset (read-only).
  7. Create Cloud Build triggers, the Artifact Registry repository, and the cosign key in KMS.
  8. Configure the Cloud Armor policy, the External HTTPS LB target, and the Apigee proxy bound to bff-consumer-service.

A.2 Phase 1 — Schema migrations 0001–0009

Run via node-pg-migrate in CI. Order:

| # | Migration | Notes |
| --- | --- | --- |
| 0001 | init_schema.sql | extensions, schema, sentinel role |
| 0002 | hotel_index_entries.sql | table + indexes + RLS sentinel policy |
| 0003 | rate_snapshots.sql | partitioned by month (parent + first 12 months) |
| 0004 | availability_hints.sql | partitioned by month |
| 0005 | outbox_inbox.sql | outbox + inbox + indexes |
| 0006 | search_queries.sql | partitioned by day (parent + first 30 days) |
| 0007 | click_events.sql | partitioned by day |
| 0008 | province_centers.sql | + seed for AF/TJ/IR |
| 0009 | index_builds.sql | with the partial-unique guard for active builds per region |
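
Migrations 0003 and 0004 create a partitioned parent plus a run of monthly child partitions. As a rough illustration of how the "first 12 months" of DDL could be generated, the Python sketch below builds half-open monthly bounds. The table name search.rate_snapshots, the child naming scheme, and the helper names are assumptions made for illustration; the actual migration files are not shown in this document.

```python
from datetime import date


def month_starts(first: date, count: int) -> list[date]:
    """Return `count` consecutive first-of-month dates, starting at `first`'s month."""
    starts = []
    year, month = first.year, first.month
    for _ in range(count):
        starts.append(date(year, month, 1))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return starts


def monthly_partition_ddl(parent: str, first: date, count: int = 12) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month, bounds [from, to)."""
    bounds = month_starts(first, count + 1)  # one extra date closes the last range
    statements = []
    for lower, upper in zip(bounds, bounds[1:]):
        # Child naming (<parent>_yYYYYmMM) is an assumption of this sketch.
        child = f"{parent}_y{lower.year}m{lower.month:02d}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {child} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
    return statements


ddl = monthly_partition_ddl("search.rate_snapshots", date(2025, 11, 15))
print(ddl[0])
print(ddl[-1])
```

Because each upper bound equals the next partition's lower bound, the ranges neither overlap nor leave gaps, including across the year boundary.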

A.3 Phase 1 — OpenSearch bootstrap

  1. PUT index template melmastoon-search-template (from contracts/opensearch/hotel-index.template.json).
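  2. PUT lifecycle policy melmastoon-search-ilm (on OpenSearch this is an Index State Management, ISM, policy; ILM is the Elasticsearch name for the same idea).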
  3. Create per-region indexes melmastoon-search-v1-{af,tj,ir}.
  4. Bind alias melmastoon-search-current to all three regional indexes.

A.4 Phase 1 — Backfill from upstream

Two backfill paths:

  • Live: subscribers are running, so new events flow naturally. New properties, prices, and inventory all appear within seconds.
  • Historical: for already-published properties (existing tenants), run the index rebuild flow (StartIndexRebuild with sinceTs = epoch). The rebuild consumes the BigQuery archive, writes to a fresh per-region index, and then performs an atomic alias swap.

Both paths use the same ProjectionAllowListPolicy; backfill cannot leak forbidden fields.
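
To make the shared allow-list idea concrete, here is a minimal sketch of a projection step that both the live and historical paths could call before writing to the index. The field names and the function name project are hypothetical; the real ProjectionAllowListPolicy is not shown in this document and may work differently.

```python
# Hypothetical allow-list; the real policy's field set is not documented here.
ALLOWED_FIELDS = frozenset({"property_id", "name", "city", "min_rate"})


def project(source: dict, allowed: frozenset = ALLOWED_FIELDS) -> dict:
    """Copy only allow-listed keys; anything not listed is dropped, never forwarded."""
    return {key: value for key, value in source.items() if key in allowed}


upstream_event = {
    "property_id": "prop_123",
    "name": "Example Hotel",
    "city": "Kabul",
    "owner_phone": "redacted-in-sketch",  # not on the allow-list, so it is dropped
}
print(project(upstream_event))
```

An allow-list fails closed: a field added upstream stays out of the index until someone lists it explicitly, which is what lets backfill and live ingestion share one guarantee.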

A.5 Phase 1 — Cutover

  • Initially, bff-consumer-service calls a stub /api/v1/search/queries returning a static "coming soon" payload (feature flag consumer.meta_search.enabled = false).
  • Once the readiness checklist (per SERVICE_READINESS.md) is green:
    1. Enable search.region_pinning.allowed_regions = ["AF","TJ","IR"].
    2. Flip consumer.meta_search.enabled = true for 1 % traffic.
    3. Monitor SLO burn for 24 h.
    4. Ramp to 100 %.
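
A percentage ramp such as the 1 % step above is commonly implemented with stable hash bucketing, so that a given caller stays in or out of the cohort as the percentage grows. The sketch below shows that technique under assumptions: the function names and the choice of a subject identifier are hypothetical, and the platform's actual feature-flag system may behave differently.

```python
import hashlib


def rollout_bucket(flag: str, subject_id: str) -> float:
    """Map (flag, subject) to a stable bucket in [0, 100) with 0.01 granularity."""
    digest = hashlib.sha256(f"{flag}:{subject_id}".encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def is_enabled(flag: str, subject_id: str, percent: float) -> bool:
    """A subject is in the cohort when its bucket falls below the rollout percentage."""
    return rollout_bucket(flag, subject_id) < percent


FLAG = "consumer.meta_search.enabled"
cohort = [f"user-{i}" for i in range(10_000) if is_enabled(FLAG, f"user-{i}", 1)]
print(f"{len(cohort)} of 10000 sample subjects fall in the 1% cohort")
```

Since the bucket depends only on the flag name and the subject, raising the percentage only ever adds subjects to the cohort, which keeps the 24 h SLO observation window meaningful.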

A.6 Phase 2 (post-launch) — semantic re-rank

Schema changes (migration 0011):

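-- Requires the pgvector extension (vector type and hnsw access method).
ALTER TABLE search.hotel_index_entries
    ADD COLUMN embedding vector(768),
    ADD COLUMN embedding_model text,
    ADD COLUMN embedding_provenance jsonb;

-- Partial index: rows without an embedding yet are left out of the HNSW graph.
CREATE INDEX ix_hie_embedding_hnsw
    ON search.hotel_index_entries
    USING hnsw (embedding vector_cosine_ops)
    WHERE embedding IS NOT NULL;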

OpenSearch: PUT a new template version that adds the embedding knn_vector mapping, and create new indexes melmastoon-search-v2-<region>. The offline embedding job (embed-properties) populates the vectors. A full reindex is required because cosine similarity scores are not comparable across embedding model versions.

A.7 Phase 3 (post-launch) — sponsored slots

Migration 0012:

CREATE TABLE search.sponsored_rankings (
    id text PRIMARY KEY CHECK (id ~ '^spr_[0-9A-HJKMNP-TV-Z]{26}$'),
    tenant_id uuid NOT NULL,
    property_id text NOT NULL,
    campaign jsonb NOT NULL,
    slot int NOT NULL CHECK (slot BETWEEN 1 AND 3),
    active_window tstzrange NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    -- At most one ranking may hold a given slot during any overlapping window.
    -- "slot WITH =" inside a GiST exclusion requires the btree_gist extension.
    EXCLUDE USING gist (slot WITH =, active_window WITH &&)
);
CREATE INDEX ix_spr_property ON search.sponsored_rankings (property_id);
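
The exclusion constraint on (slot, active_window) rejects any new row whose window overlaps an existing row in the same slot. The Python sketch below mirrors that rule so it can be checked before an insert is attempted. It assumes half-open [start, end) windows, which is the default bound type for a two-argument tstzrange; the dictionary keys and function names are hypothetical.

```python
from datetime import datetime, timezone


def windows_overlap(a_start, a_end, b_start, b_end) -> bool:
    """Half-open [start, end) windows overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def conflicting_rankings(existing: list[dict], candidate: dict) -> list[dict]:
    """Return the rows that would cause the exclusion constraint to reject `candidate`."""
    return [
        row
        for row in existing
        if row["slot"] == candidate["slot"]
        and windows_overlap(row["start"], row["end"], candidate["start"], candidate["end"])
    ]


def utc(day: int) -> datetime:
    return datetime(2026, 1, day, tzinfo=timezone.utc)


existing = [{"id": "a", "slot": 1, "start": utc(1), "end": utc(10)}]
overlapping = {"slot": 1, "start": utc(5), "end": utc(15)}
adjacent = {"slot": 1, "start": utc(10), "end": utc(20)}   # starts exactly where "a" ends
other_slot = {"slot": 2, "start": utc(5), "end": utc(15)}

for candidate in (overlapping, adjacent, other_slot):
    print(len(conflicting_rankings(existing, candidate)), "conflict(s)")
```

A pre-check like this only improves error messages; the database constraint remains the actual guard because it also covers concurrent writers.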

API additions documented in API_CONTRACTS.md Phase 3 section.


Part B — Ongoing schema & contract evolution

The platform discipline is expand → backfill → contract, exactly as documented for the rest of the platform. No PR may ship a destructive change in a single release.

B.1 The three phases

| Phase | What it does | Old code state |
| --- | --- | --- |
| Expand | Add new column / table / index / field as nullable or with default; new event topic version published in addition to the old one; new API field optional. | Keeps working unchanged. |
| Backfill | Backfill data into the new column; dual-publish events; consumer code paths begin reading the new shape preferentially with fallback to the old. | Keeps working; new code path active. |
| Contract | Drop old column / topic / field; remove fallback code. | Stopped — must have already migrated. |

Each phase is a separate release. Minimum spacing: 7 days, longer if cross-team coordination is needed.
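
During the backfill phase, readers prefer the new shape and fall back to the old one. A minimal sketch of such a read path follows; the column names display_name and display_name_v2 are hypothetical and stand in for any old/new pair.

```python
def read_display_name(row: dict) -> str:
    """Prefer the new column; fall back to the old one while backfill is incomplete."""
    new_value = row.get("display_name_v2")  # hypothetical new column (expand phase)
    if new_value is not None:
        return new_value
    return row["display_name"]              # hypothetical old column, removed at contract


backfilled = {"display_name": "Old Name", "display_name_v2": "New Name"}
pending = {"display_name": "Old Name", "display_name_v2": None}
print(read_display_name(backfilled))
print(read_display_name(pending))
```

The fallback branch is exactly the code that the contract phase deletes, so it helps to keep it in one place rather than scattered across call sites.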

B.2 Postgres column changes

| Change kind | Expand | Backfill | Contract |
| --- | --- | --- | --- |
| Add column | ALTER TABLE … ADD COLUMN x type NULL | Backfill via nightly-backfill-x job, batched 5 000 rows; verify zero NULLs | ALTER … SET NOT NULL (only after verification) |
| Rename column | ALTER TABLE … ADD COLUMN new … + dual-write trigger | Code migrates to read new; backfill from old | Drop old, drop trigger |
| Change type | New column + dual-write | Code migrates; backfill | Drop old |
| Drop column | (skip) | (skip) | ALTER … DROP COLUMN x after no code reads it for 7 d |
| Add index | CREATE INDEX CONCURRENTLY | none | none |
| Drop index | (skip) | (skip) | DROP INDEX CONCURRENTLY |
| Partition rotation | scheduled monthly via nightly-partition-prune | n/a | n/a |

Migrations are checked into services/search-aggregation-service/migrations/ and tagged expand-…, backfill-…, contract-… in the file name to make the phase obvious.
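
The "Add column" backfill (fixed-size batches, then a zero-NULL check) can be sketched as a loop that updates a bounded number of rows per transaction. The example uses Python's built-in sqlite3 purely so it runs anywhere; the real job targets Postgres, and the table layout, column names (city, city_normalized), and function names are assumptions for illustration.

```python
import sqlite3


def make_demo_db(rows: int) -> sqlite3.Connection:
    """In-memory stand-in for a table that just gained a nullable column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotel_index_entries ("
        "id INTEGER PRIMARY KEY, city TEXT NOT NULL, city_normalized TEXT)"
    )
    conn.executemany(
        "INSERT INTO hotel_index_entries (city) VALUES (?)",
        [(f"City-{i}",) for i in range(rows)],
    )
    conn.commit()
    return conn


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 5000) -> int:
    """Fill the new column batch by batch; returns the number of rows updated."""
    total = 0
    while True:
        # The expression must never yield NULL, otherwise the same rows would be
        # selected again and the loop would not terminate (city is NOT NULL here).
        cursor = conn.execute(
            "UPDATE hotel_index_entries SET city_normalized = lower(city) "
            "WHERE id IN (SELECT id FROM hotel_index_entries "
            "WHERE city_normalized IS NULL ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch
        if cursor.rowcount == 0:
            return total
        total += cursor.rowcount


def remaining_nulls(conn: sqlite3.Connection) -> int:
    """Verification step: must be zero before the contract phase sets NOT NULL."""
    return conn.execute(
        "SELECT count(*) FROM hotel_index_entries WHERE city_normalized IS NULL"
    ).fetchone()[0]


demo = make_demo_db(12)
print(backfill_in_batches(demo, batch_size=5), "rows backfilled")
print(remaining_nulls(demo), "NULLs remaining")
```

Committing after every batch keeps lock duration and replication lag bounded, and the loop is safe to re-run because it only touches rows that are still NULL.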

B.3 OpenSearch index template changes

An index template is applied only when an index is created, and the mapping of an existing field cannot be changed in place. Therefore:

| Change | Procedure |
| --- | --- |
| Add a field | New template version + new per-region index melmastoon-search-v<n+1>-<region>; reindex via IndexBuild; alias swap; old index reaped after 48 h |
| Change analyzer | same as above |
| Remove a field | new template + reindex (template enforces dynamic: strict, so the old field stays in old indexes only) |
| Mapping correction (e.g. keyword → text) | same |

Alias swap is the contract-phase step; old index sticks around for 48 h enabling instant rollback.
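
The swap is atomic because all remove and add actions travel in a single request to the OpenSearch _aliases endpoint. The sketch below only builds that request body; it assumes the alias and index naming used in this document and does not send anything. The function names are hypothetical.

```python
import json

ALIAS = "melmastoon-search-current"
REGIONS = ("af", "tj", "ir")


def index_name(version: int, region: str) -> str:
    return f"melmastoon-search-v{version}-{region}"


def alias_swap_body(old_version: int, new_version: int, regions=REGIONS) -> dict:
    """Body for POST /_aliases: every action in the list is applied atomically."""
    actions = []
    for region in regions:
        actions.append({"remove": {"index": index_name(old_version, region), "alias": ALIAS}})
        actions.append({"add": {"index": index_name(new_version, region), "alias": ALIAS}})
    return {"actions": actions}


print(json.dumps(alias_swap_body(old_version=1, new_version=2), indent=2))
```

Rollback within the 48 h window is the same call with the versions reversed, for example alias_swap_body(old_version=2, new_version=1).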

B.4 Pub/Sub event schema changes

Per 04 Event Architecture:

  • Additive (new optional field): bump event.version minor; consumers must tolerate unknown fields.
  • Breaking (rename, remove, type change): create new topic …v2; producer dual-publishes for ≥ 30 days; consumers migrate; producer removes v1 topic in contract phase.
  • Deprecated topics announced in the platform #deprecations channel and tracked in docs/04-event-driven-architecture.md deprecation table.
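
The "consumers must tolerate unknown fields" rule for additive changes can be illustrated with a parser that keeps only the fields it knows and applies defaults for optional ones. The event shape below (ClickEvent and its fields) is hypothetical; the real schemas are defined in the AsyncAPI documents, not here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClickEvent:
    """Hypothetical consumer-side view of a click event."""
    version: str
    property_id: str
    position: int = 0  # optional field, e.g. added in a minor version bump


def parse_click_event(payload: dict) -> ClickEvent:
    """Ignore fields this consumer does not know; defaults cover missing optionals."""
    known = {field.name for field in fields(ClickEvent)}
    return ClickEvent(**{key: value for key, value in payload.items() if key in known})


older_producer = {"version": "1.0", "property_id": "prop_1"}
newer_producer = {"version": "1.2", "property_id": "prop_1", "position": 3, "new_field": "x"}
print(parse_click_event(older_producer))
print(parse_click_event(newer_producer))
```

A missing required field still raises a TypeError, which is the desired outcome: that kind of change is breaking and belongs on a new topic version.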

This service's published topics:

| Topic | Current major | Deprecation watch |
| --- | --- | --- |
| melmastoon.search.projection.v1 | v1 | none |
| melmastoon.search.click.v1 | v1 | none |
| melmastoon.search.query.v1 | v1 | none |
| melmastoon.search.boost_rule.v1 | v1 | none |
| melmastoon.search.index.v1 | v1 | none |

B.5 REST API changes

Per 05 API Design:

  • Additive (new endpoint, new optional field) → minor version, no /v<N> bump.
  • Breaking (remove field, change semantics) → publish /api/v<N+1>/… and run both for ≥ 90 days (CI fails the contract phase if old version still has measurable callers).
  • Deprecation header: Deprecation: true + Sunset: <date> on responses for 30 d before flipping the kill switch.
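
As an illustration of the deprecation headers, the sketch below builds the header pair for a response. It keeps the Deprecation: true form used in this document and formats Sunset as an HTTP-date; the function name and the idea of attaching the headers in middleware are assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime) -> dict[str, str]:
    """Headers to attach to every response of a deprecated API version."""
    if sunset.tzinfo is None:
        raise ValueError("sunset must be timezone-aware")
    return {
        "Deprecation": "true",
        # HTTP-date in GMT, e.g. "<day>, 01 Mar 2026 00:00:00 GMT"
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
    }


print(deprecation_headers(datetime(2026, 3, 1, tzinfo=timezone.utc)))
```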

B.6 Migration release checklist

Every release that ships a migration must include:

  • Phase tag (expand, backfill, contract) in the PR title.
  • Dry-run the migration on a fresh ephemeral Postgres in CI.
  • Compatibility matrix in the PR body: which previous releases this PR breaks (none, by definition, in expand or backfill).
  • Updated expected.schema.sql snapshot.
  • If event schema changed: AsyncAPI doc updated and producer dual-publish (if breaking).
  • If REST changed: OpenAPI doc updated and Deprecation header set (if breaking).
  • Rollback plan documented (specifically: what to do if the new index/topic/field misbehaves).
  • Approval from platform DBA for any contract-phase column drop or index drop.
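
Parts of this checklist lend themselves to an automated pre-merge check. The sketch below covers two items only: the phase tag in the PR title and the DBA approval rule for contract-phase drops. The "[phase] title" convention and all function names are assumptions; this document does not specify how the tag is written.

```python
import re

# Assumed convention for this sketch: the PR title starts with "[expand]",
# "[backfill]", or "[contract]". The real convention is not documented here.
TITLE_PATTERN = re.compile(r"^\[(expand|backfill|contract)\]\s+\S")


def phase_from_title(title: str):
    """Return the phase tag, or None when the title carries no valid tag."""
    match = TITLE_PATTERN.match(title)
    return match.group(1) if match else None


def needs_dba_approval(phase: str, drops_column_or_index: bool) -> bool:
    """Only contract-phase column or index drops require platform DBA sign-off."""
    return phase == "contract" and drops_column_or_index


for title in ("[expand] add embedding columns", "add embedding columns"):
    print(repr(title), "->", phase_from_title(title))
```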

B.7 Index rebuild as a migration tool

Many search-specific changes require a full reindex rather than (or in addition to) a SQL migration. Trigger: POST /api/v1/search/index:rebuild { regions:[…], sinceTs }. This is a routine procedure rehearsed in staging twice per quarter.

Examples that require rebuild:

  • New analyzer (multilingual change).
  • New embedding model version.
  • New facet field that requires aggregation pre-computation.
  • Allow-list expansion that adds a field the existing index doesn't have.
  • ILM policy change that affects shard layout.

The rebuild itself is non-disruptive: serving traffic continues on the current alias until the atomic swap.
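
For reference, a request body for the rebuild trigger might be assembled as follows. Only the field names regions and sinceTs come from this document; the ISO 8601 encoding of sinceTs, the validation rules, and the function name are assumptions, so check API_CONTRACTS.md for the actual contract.

```python
import json
from datetime import datetime, timezone

ALLOWED_REGIONS = {"AF", "TJ", "IR"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rebuild_request(regions, since: datetime = EPOCH) -> dict:
    """Body for POST /api/v1/search/index:rebuild (timestamp encoding is assumed)."""
    unknown = set(regions) - ALLOWED_REGIONS
    if unknown:
        raise ValueError(f"unknown regions: {sorted(unknown)}")
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    return {
        "regions": sorted(set(regions)),
        "sinceTs": since.astimezone(timezone.utc).isoformat(),
    }


# Full historical rebuild for two regions, starting from the epoch.
print(json.dumps(rebuild_request(["TJ", "AF"])))
```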

B.8 Decommissioning (worst case)

If the service is ever decommissioned:

  1. Disable consumer feature flag (consumer.meta_search.enabled = false); bff-consumer-service stops calling.
  2. Stop subscribers (drain remaining events to DLQ for archival).
  3. Snapshot Cloud SQL and OpenSearch.
  4. Export final BigQuery archive of search events.
  5. Delete topics after 30 d retention.
  6. Delete Cloud Run revisions, then Cloud SQL, then Memorystore, then OpenSearch, then secrets, then SAs.
  7. Archive code under services/_decommissioned/search-aggregation-service/.
  8. Cross-tenant access is revoked at the IAM level by removing all role grants for search-aggregation@.

This decommission path is documented for completeness; the service is on the core track per 03-microservices/README.md and is not slated for retirement.