search-aggregation-service — MIGRATION_PLAN
Companion: DATA_MODEL · DEPLOYMENT_TOPOLOGY · SERVICE_READINESS · TESTING_STRATEGY
This document covers two scopes:
- Greenfield bootstrap — bringing the service into existence from zero (Phase 0 → Phase 1).
- Ongoing schema and contract evolution — the expand → backfill → contract discipline used for every subsequent change.
Part A — Greenfield bootstrap
A.1 Phase 0 — Infrastructure provisioning
Tracked in infra/terraform/services/search-aggregation/. Steps:
- Create GCP project bindings, SAs (search-aggregation@, ci-search-aggregation@, index-builder@), VPC connector, and Secret Manager entries.
- Provision Cloud SQL for PostgreSQL 15 (HA) + cross-region replica.
- Provision Memorystore Redis 7 (HA) per region.
- Provision Aiven OpenSearch (peered VPC) per region.
- Create Pub/Sub topics owned by this service (melmastoon.search.*) and subscriptions to upstream topics (property, pricing, inventory, tenant).
- Create BigQuery dataset bindings to the platform events_raw.melmastoon_* dataset (read-only).
- Cloud Build triggers, Artifact Registry repository, cosign key in KMS.
- Cloud Armor policy, External HTTPS LB target, Apigee proxy bound to bff-consumer-service.
A.2 Phase 1 — Schema migrations 0001–0009
Run via node-pg-migrate in CI. Order:
| # | Migration | Notes |
|---|---|---|
| 0001 | init_schema.sql | extensions, schema, sentinel role |
| 0002 | hotel_index_entries.sql | table + indexes + RLS sentinel policy |
| 0003 | rate_snapshots.sql | partitioned by month (parent + first 12 months) |
| 0004 | availability_hints.sql | partitioned by month |
| 0005 | outbox_inbox.sql | outbox + inbox + indexes |
| 0006 | search_queries.sql | partitioned by day (parent + first 30 days) |
| 0007 | click_events.sql | partitioned by day |
| 0008 | province_centers.sql | + seed for AF/TJ/IR |
| 0009 | index_builds.sql | with the partial-unique guard for active builds per region |
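The "parent + first 12 months" pattern used by migration 0003 can be sketched as a small generator of partition DDL. This is a minimal illustration rather than the actual migration: the parent name search.rate_snapshots and the _yYYYYmMM suffix are assumptions made for the example.

```python
from datetime import date


def month_start(year: int, month: int, offset: int) -> date:
    """Return the first day of the month `offset` months after (year, month)."""
    index = year * 12 + (month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def monthly_partition_ddl(parent: str, first: date, count: int) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    statements = []
    for i in range(count):
        lower = month_start(first.year, first.month, i)
        upper = month_start(first.year, first.month, i + 1)
        # Hypothetical naming scheme: <parent>_yYYYYmMM.
        name = f"{parent}_y{lower:%Y}m{lower:%m}"
        # Range partition bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
    return statements


if __name__ == "__main__":
    for statement in monthly_partition_ddl("search.rate_snapshots", date(2025, 11, 1), 12):
        print(statement)
```

Because the upper bound is exclusive, consecutive partitions share a boundary date without overlapping, including across the year change.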
A.3 Phase 1 — OpenSearch bootstrap
- PUT index template melmastoon-search-template (from contracts/opensearch/hotel-index.template.json).
- PUT ILM policy melmastoon-search-ilm.
- Create per-region indexes melmastoon-search-v1-{af,tj,ir}.
- Bind alias melmastoon-search-current to all three regional indexes.
A.4 Phase 1 — Backfill from upstream
Two backfill paths:
- Live — subscribers are running; new events flow naturally. New properties, prices, inventory all appear within seconds.
- Historical — for already-published properties (existing tenants), run the index rebuild flow (StartIndexRebuild with sinceTs = epoch). This consumes the BigQuery archive, writes to a fresh per-region index, and then performs an atomic alias swap.
Both paths use the same ProjectionAllowListPolicy; backfill cannot leak forbidden fields.
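The reason a shared allow-list prevents leaks is that projection copies only named fields and drops everything else. A minimal sketch of that idea follows; the field names are hypothetical and the real rules live in ProjectionAllowListPolicy, whose interface is not shown in this document.

```python
from typing import Any, Mapping

# Hypothetical allow-list, for illustration only.
ALLOWED_FIELDS = frozenset({"property_id", "name", "province", "min_rate"})


def project(event: Mapping[str, Any], allowed: frozenset[str] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Keep only allow-listed keys; anything else is dropped, not masked."""
    return {key: value for key, value in event.items() if key in allowed}


if __name__ == "__main__":
    upstream = {"property_id": "p1", "name": "Example Hotel", "owner_phone": "redacted"}
    print(project(upstream))
```

Since both the live subscribers and the historical rebuild would call the same function, a field that is absent from the allow-list cannot reach the index through either path.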
A.5 Phase 1 — Cutover
- Initially, bff-consumer-service calls a stub /api/v1/search/queries returning a static "coming soon" payload (feature flag consumer.meta_search.enabled = false).
- Once readiness checklist (per SERVICE_READINESS.md) is green:
  - Enable search.region_pinning.allowed_regions = ["AF","TJ","IR"].
  - Flip consumer.meta_search.enabled = true for 1 % traffic.
  - Monitor SLO burn for 24 h.
  - Ramp to 100 %.
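A percentage ramp such as the 1 % step above is commonly implemented with deterministic bucketing, so that a caller who is inside the 1 % cohort stays inside it as the ramp widens. The sketch below shows that technique under assumptions: how the platform's feature-flag system actually assigns traffic is not described here, and the subject identifier and salt are hypothetical.

```python
import hashlib


def in_rollout(subject_id: str, percent: float, salt: str = "consumer.meta_search.enabled") -> bool:
    """Deterministically place subject_id in one of 10 000 buckets."""
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=1 admits buckets 0..99, percent=100 admits all of them.
    return bucket < round(percent * 100)


if __name__ == "__main__":
    sample = [f"user-{i}" for i in range(10_000)]
    print(sum(in_rollout(s, 1) for s in sample), "of", len(sample), "in the 1 % cohort")
```

Salting by flag name keeps cohorts for different flags independent of each other.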
A.6 Phase 2 (post-launch) — semantic re-rank
Schema changes (migration 0011):
ALTER TABLE search.hotel_index_entries
ADD COLUMN embedding vector(768),
ADD COLUMN embedding_model text,
ADD COLUMN embedding_provenance jsonb;
CREATE INDEX ix_hie_embedding_hnsw
ON search.hotel_index_entries
USING hnsw (embedding vector_cosine_ops)
WHERE embedding IS NOT NULL;
OpenSearch: PUT a new template version with the embedding knn_vector mapping and create new indexes melmastoon-search-v2-<region>. The offline embedding job (embed-properties) populates the vectors. A full reindex is required because cosine similarity isn't comparable across model versions.
A.7 Phase 3 (post-launch) — sponsored slots
Migration 0012:
CREATE TABLE search.sponsored_rankings (
id text PRIMARY KEY CHECK (id ~ '^spr_[0-9A-HJKMNP-TV-Z]{26}$'),
tenant_id uuid NOT NULL,
property_id text NOT NULL,
campaign jsonb NOT NULL,
slot int NOT NULL CHECK (slot BETWEEN 1 AND 3),
active_window tstzrange NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
-- At most one ranking per slot at any instant. Mixing = on an int with &&
-- on a range in a single GiST index requires the btree_gist extension.
EXCLUDE USING gist (slot WITH =, active_window WITH &&)
);
CREATE INDEX ix_spr_property ON search.sponsored_rankings (property_id);
API additions documented in API_CONTRACTS.md Phase 3 section.
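The exclusion constraint in migration 0012 rejects any two rows that share a slot and whose active_window ranges overlap. The check it performs can be sketched in a few lines; the SlotWindow type is hypothetical and assumes the half-open '[)' bounds that the tstzrange constructor uses by default.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SlotWindow(NamedTuple):
    slot: int
    start: datetime  # inclusive
    end: datetime    # exclusive


def conflicts(a: SlotWindow, b: SlotWindow) -> bool:
    """True when the two rows could not coexist under the constraint."""
    return a.slot == b.slot and a.start < b.end and b.start < a.end


if __name__ == "__main__":
    utc = timezone.utc
    june = SlotWindow(1, datetime(2025, 6, 1, tzinfo=utc), datetime(2025, 7, 1, tzinfo=utc))
    july = SlotWindow(1, datetime(2025, 7, 1, tzinfo=utc), datetime(2025, 8, 1, tzinfo=utc))
    print(conflicts(june, july))
```

Back-to-back windows do not conflict because the end bound is exclusive, so one campaign can hand a slot over to the next at the same instant.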
Part B — Ongoing schema & contract evolution
The platform discipline is expand → backfill → contract, exactly as documented for the rest of the platform. No PR may ship a destructive change in a single release.
B.1 The three phases
| Phase | What it does | Old code state |
|---|---|---|
| Expand | Add new column / table / index / field as nullable or with default; new event topic version published in addition to the old one; new API field optional. | Keeps working unchanged. |
| Backfill | Backfill data into the new column; dual-publish events; consumer code paths begin reading the new shape preferentially with fallback to the old. | Keeps working; new code path active. |
| Contract | Drop old column / topic / field; remove fallback code. | Stopped — must have already migrated. |
Each phase is a separate release. Minimum spacing: 7 days, longer if cross-team coordination is needed.
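The 7-day minimum spacing is easy to check mechanically. The following sketch shows one way a release tool might do so; the function and its input shape are hypothetical and not part of the platform tooling described here.

```python
from datetime import date, timedelta

PHASE_ORDER = ("expand", "backfill", "contract")
MIN_SPACING = timedelta(days=7)


def spacing_violations(releases: dict[str, date]) -> list[str]:
    """Return human-readable problems with one change's phase release dates."""
    problems = []
    for earlier, later in zip(PHASE_ORDER, PHASE_ORDER[1:]):
        # Some change kinds skip phases, so only compare phases that exist.
        if earlier in releases and later in releases:
            gap = releases[later] - releases[earlier]
            if gap < MIN_SPACING:
                problems.append(f"{later} ships {gap.days} d after {earlier}; minimum is 7 d")
    return problems


if __name__ == "__main__":
    print(spacing_violations({"expand": date(2025, 3, 3), "backfill": date(2025, 3, 6)}))
```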
B.2 Postgres column changes
| Change kind | Expand | Backfill | Contract |
|---|---|---|---|
| Add column | ALTER TABLE … ADD COLUMN x type NULL | Backfill via nightly-backfill-x job, batched 5 000 rows; verify zero NULLs | ALTER … SET NOT NULL (only after verification) |
| Rename column | ALTER TABLE … ADD COLUMN new … + dual-write trigger | Code migrates to read new; backfill from old | Drop old, drop trigger |
| Change type | New column + dual-write | Code migrates; backfill | Drop old |
| Drop column | (skip) | (skip) | ALTER … DROP COLUMN x after no code reads it for 7 d |
| Add index | CREATE INDEX CONCURRENTLY | none | none |
| Drop index | (skip) | (skip) | DROP INDEX CONCURRENTLY |
| Partition rotation | scheduled monthly via nightly-partition-prune | n/a | n/a |
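The "Add column" row relies on a batched backfill followed by a zero-NULL check. The loop below sketches that shape using an in-memory SQLite database as a stand-in for Postgres so it can run anywhere; the items table and the legacy_x and x columns are hypothetical, and the real nightly-backfill-x job may be structured differently.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 5000) -> int:
    """Fill NULL x values batch by batch; returns the number of batches run.

    Assumes legacy_x is NOT NULL, otherwise a row could never leave the
    "x IS NULL" set and the loop would not terminate.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE items SET x = upper(legacy_x) WHERE id IN "
            "(SELECT id FROM items WHERE x IS NULL ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch keeps locks brief
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


def remaining_nulls(conn: sqlite3.Connection) -> int:
    """Verification query to run before the contract-phase SET NOT NULL."""
    return conn.execute("SELECT count(*) FROM items WHERE x IS NULL").fetchone()[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, legacy_x TEXT NOT NULL, x TEXT)")
    db.executemany("INSERT INTO items (legacy_x) VALUES (?)", [(f"v{i}",) for i in range(12)])
    print(backfill_in_batches(db, batch_size=5), remaining_nulls(db))
```

Selecting on "x IS NULL" makes the job resumable: a rerun after a crash simply picks up the rows that are still empty.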
Migrations are checked into services/search-aggregation-service/migrations/ and tagged expand-…, backfill-…, contract-… in the file name to make phase obvious.
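A file-name convention like this can be enforced with a small lint step. The sketch below assumes a name shape of <sequence>_<phase>-<description>.sql, which is a guess at the convention rather than something this document specifies; the greenfield migrations 0001 to 0009 predate the tagging rule and would need to be exempted.

```python
import re

# Assumed file-name shape: <sequence>_<phase>-<description>.sql
MIGRATION_NAME = re.compile(
    r"^(?P<seq>\d{4})_(?P<phase>expand|backfill|contract)-(?P<slug>[a-z0-9_-]+)\.sql$"
)


def migration_phase(filename: str) -> str:
    """Return the phase tag embedded in a migration file name."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"migration file name lacks a phase tag: {filename!r}")
    return match.group("phase")


if __name__ == "__main__":
    print(migration_phase("0013_expand-add-star-rating.sql"))
```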
B.3 OpenSearch index template changes
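An index template is applied only when an index is created, so editing a template leaves existing indexes untouched, and the mapping of an existing field cannot be changed in place. Therefore: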
| Change | Procedure |
|---|---|
| Add a field | New template version + new per-region index melmastoon-search-v<n+1>-<region>; reindex via IndexBuild; alias swap; old index reaped after 48 h |
| Change analyzer | same as above |
| Remove a field | new template + reindex (template enforces dynamic: strict, so the old field stays in old indexes only) |
| Mapping correction (e.g. keyword → text) | same as above |
The alias swap is the contract-phase step; the old index is kept for 48 h so that rollback is a single alias swap back.
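The swap is atomic because a single POST /_aliases request can carry both the remove and the add action, and OpenSearch applies them together. A minimal sketch of the request body follows; the helper name is hypothetical, and the index names follow the naming pattern used in this document.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> str:
    """JSON body for POST /_aliases; both actions are applied atomically."""
    actions = [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]
    return json.dumps({"actions": actions})


if __name__ == "__main__":
    print(alias_swap_body("melmastoon-search-current", "melmastoon-search-v1-af", "melmastoon-search-v2-af"))
```

Rolling back within the 48 h window is the same call with old_index and new_index exchanged.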
B.4 Pub/Sub event schema changes
- Additive (new optional field): bump event.version minor; consumers must tolerate unknown fields.
- Breaking (rename, remove, type change): create new topic …v2; producer dual-publishes for ≥ 30 days; consumers migrate; producer removes v1 topic in contract phase.
- Deprecated topics are announced in the platform #deprecations channel and tracked in the docs/04-event-driven-architecture.md deprecation table.
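"Tolerate unknown fields" means a consumer reads the fields it knows and ignores the rest, so an additive minor bump on the producer side cannot break it. One way to sketch that is shown below; the ClickEvent fields are invented for illustration and are not the real event schema.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical v1 payload fields, for illustration only.
    event_version: str
    property_id: str
    position: int


def parse_click(payload: Mapping[str, Any]) -> ClickEvent:
    """Read only known fields so additive minor versions do not break parsing."""
    known = {f.name for f in fields(ClickEvent)}
    return ClickEvent(**{k: v for k, v in payload.items() if k in known})


if __name__ == "__main__":
    newer = {"event_version": "1.1", "property_id": "p1", "position": 2, "device": "ios"}
    print(parse_click(newer))
```

A payload that lacks a required field still fails loudly with a TypeError, which is the desired behaviour for a breaking change that was shipped without a new topic.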
This service's published topics:
| Topic | Current major | Deprecation watch |
|---|---|---|
| melmastoon.search.projection.v1 | v1 | none |
| melmastoon.search.click.v1 | v1 | none |
| melmastoon.search.query.v1 | v1 | none |
| melmastoon.search.boost_rule.v1 | v1 | none |
| melmastoon.search.index.v1 | v1 | none |
B.5 REST API changes
Per 05 API Design:
- Additive (new endpoint, new optional field) → minor version, no /v<N> bump.
- Breaking (remove field, change semantics) → publish /api/v<N+1>/… and run both for ≥ 90 days (CI fails the contract phase if old version still has measurable callers).
- Deprecation header: Deprecation: true + Sunset: <date> on responses for 30 d before flipping the kill switch.
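The header pair from the last bullet can be produced with the standard library alone. This sketch assumes the Sunset value is formatted as an HTTP-date, as the Sunset header specification (RFC 8594) requires; the function name is hypothetical, and callers are expected to pass a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime) -> dict[str, str]:
    """Headers to attach to every response served by the old API version."""
    sunset_utc = sunset.astimezone(timezone.utc)
    return {
        "Deprecation": "true",
        # HTTP-date form, e.g. "Fri, 31 Jan 2025 00:00:00 GMT".
        "Sunset": format_datetime(sunset_utc, usegmt=True),
    }


if __name__ == "__main__":
    print(deprecation_headers(datetime(2025, 1, 31, tzinfo=timezone.utc)))
```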
B.6 Migration release checklist
Every release that ships a migration must include:
- Phase tag (expand, backfill, contract) in the PR title.
- Dry-run the migration on a fresh ephemeral Postgres in CI.
- Compatibility matrix in the PR body: which previous releases this PR breaks (none, by definition, in expand or backfill).
- Updated expected.schema.sql snapshot.
- If event schema changed: AsyncAPI doc updated and producer dual-publish (if breaking).
- If REST changed: OpenAPI doc updated and Deprecation header set (if breaking).
- Rollback plan documented (specifically: what to do if the new index/topic/field misbehaves).
- Approval from platform DBA for any contract-phase column drop or index drop.
B.7 Index rebuild as a migration tool
Many search-specific changes require a full reindex rather than (or in addition to) a SQL migration. Trigger: POST /api/v1/search/index:rebuild { regions:[…], sinceTs }. This is a routine procedure rehearsed in staging twice per quarter.
Examples that require rebuild:
- New analyzer (multilingual change).
- New embedding model version.
- New facet field that requires aggregation pre-computation.
- Allow-list expansion that adds a field the existing index doesn't have.
- ILM policy change that affects shard layout.
The rebuild itself is non-disruptive: serving traffic continues on the current alias until the atomic swap.
B.8 Decommissioning (worst case)
If the service is ever decommissioned:
- Disable consumer feature flag (consumer.meta_search.enabled = false); bff-consumer-service stops calling.
- Stop subscribers (drain remaining events to DLQ for archival).
- Snapshot Cloud SQL and OpenSearch.
- Export final BigQuery archive of search events.
- Delete topics after 30 d retention.
- Delete Cloud Run revisions, then Cloud SQL, then Memorystore, then OpenSearch, then secrets, then SAs.
- Archive code under services/_decommissioned/search-aggregation-service/.
- Cross-tenant access is revoked at the IAM level by removing all role grants for search-aggregation@.
This decommission path is documented for completeness; the service is on the core track per 03-microservices/README.md and is not slated for retirement.