MIGRATION_PLAN — inventory-service
Sibling: DATA_MODEL · DEPLOYMENT_TOPOLOGY · SERVICE_READINESS · TESTING_STRATEGY
Strategic anchor: 02 §11 Data lifecycle · 04 §9 Schema evolution
inventory-service is on the booking critical path; every migration must hold it gently. The rule is expand → backfill → contract, with backwards-compatible intermediate states and zero-downtime cutover. Destructive single-PR migrations are rejected by CI.
1. Migration tooling
| Layer | Tool | Notes |
|---|---|---|
| Schema | node-pg-migrate | SQL files under db/migrations/; one folder per release |
| Backfills | TypeScript jobs under src/infrastructure/migrations/ | run via Cloud Run job; chunked and resumable |
| Drizzle ORM | code-side schema | regenerated after each migration; CI compares Drizzle introspection against Postgres |
| Event registry | JSON Schemas under events/ | versioned (vN); CI validates compatibility |
| OpenAPI | YAML emitted from controllers | snapshot-diffed in CI |
2. Mandatory phases
┌────────────┐ ┌─────────────┐ ┌─────────────┐
│ EXPAND │ ─► │ BACKFILL │ ─► │ CONTRACT │
│ (release N)│ │ (release N+ │ │ (release N+ │
│ │ │ 1) │ │ 2) │
└────────────┘ └─────────────┘ └─────────────┘
EXPAND (release N). Add new columns/tables/indexes/constraints in a way that old code still works. New code writes to both old and new shapes if needed.
BACKFILL (release N+1). Migrate historical data via chunked job. Verify reconciliation. Begin reading from the new shape behind a feature flag; fallback to the old shape on mismatch.
CONTRACT (release N+2). Drop the old shape only after the backfill is verified and the new shape has run for a full release cycle without alerts.
A migration that cannot be split this way is a red flag: justify in the PR description and obtain a tech-lead override.
3. Constraint additions
Adding CHECK or EXCLUDE constraints on hot tables (room_type_inventory_daily, room_allocations) follows:
- Add as
NOT VALIDfirst. - Run a backfill validating each row; fix any pre-existing violations (manual or via repair script).
ALTER TABLE … VALIDATE CONSTRAINT— only after green validation run on staging.- CI enforces that no
EXCLUDEis added against an unindexed range (must usebtree_gist).
4. Partition operations
Partitioning by month for room_type_inventory_daily:
| Operation | When | Mechanism |
|---|---|---|
| Create new partition | Daily 02:30 UTC, ahead by 4 months | inventory-calendar-jobs.rotate-partitions |
| Detach old partition | After 24 months retention | ALTER TABLE … DETACH PARTITION then export to gs://melmastoon-cold-inventory/ and DROP TABLE |
| Default partition | Always present | catches writes that fall outside known partitions; alert if it ever receives a row |
Partition migrations are idempotent and resumable; failure mid-batch leaves the database in a known state.
5. Data backfills
Backfill jobs follow this pattern:
const cursor = pickCursor("backfill_<name>"); // resumable
for (const chunk of paginate(query, { cursorColumn: "id", batchSize: 1_000 })) {
await pg.transaction(async (tx) => {
for (const row of chunk) await transformAndUpsert(tx, row);
await markCursor(tx, "backfill_<name>", chunk.last.id);
});
await sleep(jitter(50, 200)); // throttle
}
await reconcile("backfill_<name>");
Reconciliation step compares row counts and a sampled diff. The job emits inventory.migration.backfill.progress.v1 for observability.
6. Event versioning
| Change kind | Action |
|---|---|
| Additive (new optional field) | Bump minor; old consumers ignore unknown fields |
| Field removed or semantics changed | Publish vN+1 topic; dual-publish for one release; consumers migrate; retire vN after consumer attestations |
| Subject renamed | New topic + dual-publish; deprecate old after 1 release |
Schema registry rejects breaking minor bumps. Every produced event carries eventVersion: 1 (incremented monotonically per vN).
7. Release-by-release migration ledger
Each release cuts a folder under db/migrations/<yyyymmdd-release-<n>>/ with:
up.sql,down.sql(down only for reversible expand-only changes).README.mddescribing intent, expand/contract phase, rollback plan, and observability links.verify.sql— a query that proves the migration is complete (e.g.,SELECT count(*) FROM … WHERE new_column IS NULLshould return 0 after backfill).
Example:
db/migrations/20260615-r17/
up.sql -- ALTER TABLE room_allocations ADD COLUMN guest_count INT;
README.md -- "Expand: add guest_count nullable; backfill from reservation snapshot in r18; contract NOT NULL in r19."
verify.sql -- SELECT count(*) FROM inventory.room_allocations WHERE guest_count IS NULL;
8. Cutover playbook
For any migration touching hot-path tables:
- Merge expand PR Monday; ship Tuesday.
- Run backfill job in staging; verify reconciliation.
- Run backfill in production during low-traffic window (typically 00:00–04:00 region-local).
- Monitor RESV-INV-001..014 for 24 h.
- Cut contract PR; ship next release.
- Update DATA_MODEL to reflect the final shape; remove transitional notes.
Rollback is gcloud run services update-traffic --to-revisions=<prev>=100 — schemas remain expanded (forward-compatible) so the previous code reads/writes correctly.
9. Cross-references
- Final schema: DATA_MODEL
- Release process: DEPLOYMENT_TOPOLOGY §5
- Readiness gate: SERVICE_READINESS §2
- Reservation migration plan: reservation-service MIGRATION_PLAN