Skip to main content

MIGRATION_PLAN — inventory-service

Sibling: DATA_MODEL · DEPLOYMENT_TOPOLOGY · SERVICE_READINESS · TESTING_STRATEGY

Strategic anchor: 02 §11 Data lifecycle · 04 §9 Schema evolution

inventory-service is on the booking critical path; every migration must hold it gently. The rule is expand → backfill → contract, with backwards-compatible intermediate states and zero-downtime cutover. Destructive single-PR migrations are rejected by CI.


1. Migration tooling

LayerToolNotes
Schemanode-pg-migrateSQL files under db/migrations/; one folder per release
BackfillsTypeScript jobs under src/infrastructure/migrations/run via Cloud Run job; chunked and resumable
Drizzle ORMcode-side schemaregenerated after each migration; CI compares Drizzle introspection against Postgres
Event registryJSON Schemas under events/versioned (vN); CI validates compatibility
OpenAPIYAML emitted from controllerssnapshot-diffed in CI

2. Mandatory phases

┌────────────┐ ┌─────────────┐ ┌─────────────┐
│ EXPAND │ ─► │ BACKFILL │ ─► │ CONTRACT │
│ (release N)│ │ (release N+ │ │ (release N+ │
│ │ │ 1) │ │ 2) │
└────────────┘ └─────────────┘ └─────────────┘

EXPAND (release N). Add new columns/tables/indexes/constraints in a way that old code still works. New code writes to both old and new shapes if needed.

BACKFILL (release N+1). Migrate historical data via chunked job. Verify reconciliation. Begin reading from the new shape behind a feature flag; fallback to the old shape on mismatch.

CONTRACT (release N+2). Drop the old shape only after the backfill is verified and the new shape has run for a full release cycle without alerts.

A migration that cannot be split this way is a red flag: justify in the PR description and obtain a tech-lead override.


3. Constraint additions

Adding CHECK or EXCLUDE constraints on hot tables (room_type_inventory_daily, room_allocations) follows:

  1. Add as NOT VALID first.
  2. Run a backfill validating each row; fix any pre-existing violations (manual or via repair script).
  3. ALTER TABLE … VALIDATE CONSTRAINT — only after green validation run on staging.
  4. CI enforces that no EXCLUDE is added against an unindexed range (must use btree_gist).

4. Partition operations

Partitioning by month for room_type_inventory_daily:

OperationWhenMechanism
Create new partitionDaily 02:30 UTC, ahead by 4 monthsinventory-calendar-jobs.rotate-partitions
Detach old partitionAfter 24 months retentionALTER TABLE … DETACH PARTITION then export to gs://melmastoon-cold-inventory/ and DROP TABLE
Default partitionAlways presentcatches writes that fall outside known partitions; alert if it ever receives a row

Partition migrations are idempotent and resumable; failure mid-batch leaves the database in a known state.


5. Data backfills

Backfill jobs follow this pattern:

const cursor = pickCursor("backfill_<name>"); // resumable
for (const chunk of paginate(query, { cursorColumn: "id", batchSize: 1_000 })) {
await pg.transaction(async (tx) => {
for (const row of chunk) await transformAndUpsert(tx, row);
await markCursor(tx, "backfill_<name>", chunk.last.id);
});
await sleep(jitter(50, 200)); // throttle
}
await reconcile("backfill_<name>");

Reconciliation step compares row counts and a sampled diff. The job emits inventory.migration.backfill.progress.v1 for observability.


6. Event versioning

Change kindAction
Additive (new optional field)Bump minor; old consumers ignore unknown fields
Field removed or semantics changedPublish vN+1 topic; dual-publish for one release; consumers migrate; retire vN after consumer attestations
Subject renamedNew topic + dual-publish; deprecate old after 1 release

Schema registry rejects breaking minor bumps. Every produced event carries eventVersion: 1 (incremented monotonically per vN).


7. Release-by-release migration ledger

Each release cuts a folder under db/migrations/<yyyymmdd-release-<n>>/ with:

  • up.sql, down.sql (down only for reversible expand-only changes).
  • README.md describing intent, expand/contract phase, rollback plan, and observability links.
  • verify.sql — a query that proves the migration is complete (e.g., SELECT count(*) FROM … WHERE new_column IS NULL should return 0 after backfill).

Example:

db/migrations/20260615-r17/
up.sql -- ALTER TABLE room_allocations ADD COLUMN guest_count INT;
README.md -- "Expand: add guest_count nullable; backfill from reservation snapshot in r18; contract NOT NULL in r19."
verify.sql -- SELECT count(*) FROM inventory.room_allocations WHERE guest_count IS NULL;

8. Cutover playbook

For any migration touching hot-path tables:

  1. Merge expand PR Monday; ship Tuesday.
  2. Run backfill job in staging; verify reconciliation.
  3. Run backfill in production during low-traffic window (typically 00:00–04:00 region-local).
  4. Monitor RESV-INV-001..014 for 24 h.
  5. Cut contract PR; ship next release.
  6. Update DATA_MODEL to reflect the final shape; remove transitional notes.

Rollback is gcloud run services update-traffic --to-revisions=<prev>=100 — schemas remain expanded (forward-compatible) so the previous code reads/writes correctly.


9. Cross-references