maintenance-service · MIGRATION_PLAN
Two flavours of migration: schema (DDL changes via
node-pg-migrate) and data / onboarding (importing legacy maintenance records when a new tenant joins from another PMS / spreadsheet).
1. Schema migration policy
1.1 Tooling
node-pg-migratefor all DDL changes; migrations live inservices/maintenance-service/db/migrations/.- Migration filename:
YYYYMMDDHHmm_<verb>_<subject>.sql.ts(e.g.,202604220900_add_relocation_required_to_work_orders.sql.ts). - One concern per migration. No mixed
ALTER+ data backfill in the same file (data backfills go through a worker; see §1.4).
1.2 Compatibility rules
| Change kind | Allowed in single step? | Notes |
|---|---|---|
| Add nullable column | ✅ | Always with default if needed |
| Add column with non-null default | ⚠️ | Requires DEFAULT to avoid table rewrite on Postgres < 11; we're on 16 so default add is metadata-only |
| Add CHECK constraint | ❌ in single step | Must be NOT VALID first, then VALIDATE CONSTRAINT after backfill |
| Add UNIQUE / regular INDEX | ✅ | Must use CONCURRENTLY |
| Drop column | ❌ | Expand-then-contract: stop reading → stop writing → drop in next release |
| Rename column | ❌ | Expand-then-contract: add new column → dual-write → backfill → switch reads → drop old |
| Change column type | ❌ | Expand-then-contract or via shadow column |
| Drop / change CHECK enum value | ❌ | Add new enum value → migrate writes → migrate reads → remove old in later release |
| Rename table | ❌ | Use a view aliasing the new name during the transition |
1.3 Pre-deploy gate
CI runs:
node-pg-migrate up --dry-runagainst staging.- The new migration applied to a shadow Postgres restored from a recent prod backup.
pnpm test:intruns against the migrated shadow.- If green: pipeline applies the migration to staging Cloud SQL via a Cloud Run Job, then deploys the new revision behind canary traffic.
- Production: same job runs before the new image is rolled out to traffic. A migration that fails halts the pipeline.
1.4 Long-running backfills
Data backfills are never done in a migration file — only schema changes are. Backfills are dedicated workers under services/maintenance-service/jobs/backfills/<id>/, each:
- emits a metric
mnt.backfill.<id>.processed_totaland_remaining, - is restart-safe (resumes from last
cursor), - can be paused via
tools/backfill-control.ts, - writes audit entries.
Once the backfill is complete and verified, a follow-up migration can tighten constraints (e.g., VALIDATE CONSTRAINT, SET NOT NULL).
1.5 Sample expand-contract: adding relocation_required
- Migration
202604220900_add_relocation_required_to_work_orders.sql.ts:ALTER TABLE maintenance.work_ordersADD COLUMN relocation_required boolean NOT NULL DEFAULT false; - Code release: writes set
relocation_requiredcorrectly; reads ignore it. - Code release: reads start using
relocation_required(e.g., for the relocation event). - Verify via metric and dashboards for ≥ 14 days.
- (No contract step needed; the column is here to stay.)
If we later want to remove it: stop reads → stop writes (1 release each) → drop column.
1.6 Rollback policy
- Every migration must be forward-compatible with the previous app version (so canary works).
- True rollback = revert app code; the schema can stay because of expand-contract.
- DOWN scripts exist for emergency only and are reviewed but rarely executed in prod.
2. Data migration: legacy import (tenant onboarding)
2.1 Supported sources
| Source | Format | Notes |
|---|---|---|
Excel .xlsx | one sheet per entity | most common; templates provided |
| CSV | per-entity files | for power users |
| PMS export | JSON | adapter per known PMS (Cloudbeds, Mews — Phase 2; Phase 1 uses the Excel template) |
| Manual UI | n/a | the BFF supports adding assets/vendors directly; not a "migration" path |
2.2 Required entities & order
Order matters because of FK dependencies:
Vendor(no FKs)Asset(FK to property — must already exist inproperty-service)Part(FK to property)PreventiveSchedule(FK to asset)WorkOrder(FK to property/room/asset/vendor) — usually only the open ones at cutover; closed history is optionalMaintenanceTask(FK to WO)PartUsage(FK to WO + part)
2.3 Excel template columns (Phase 1)
- Vendors sheet:
external_id(legacy),display_name,categories(semicolon-joined),contact_name,phone_e164,email,whatsapp_e164,channel_primary,channel_fallback,address_free_text. - Assets sheet:
external_id,property_external_id,room_external_id(optional),class,display_name,model,manufacturer,serial_number,installed_at,last_serviced_at,run_hours,capacity_value,capacity_unit. - Parts sheet:
external_id,property_external_id,part_number,display_name,category,on_hand,reorder_threshold,last_unit_cost_currency,last_unit_cost_amount. - PreventiveSchedules sheet:
external_id,property_external_id,asset_external_id(orasset_class),category,title,cadence_kind,every_days,every_hours,tz,sla_target_minutes,default_severity,default_assignee_kind,default_assignee_external_id,notify_channel,active. - WorkOrders sheet:
external_id,property_external_id,room_external_id,asset_external_id,category,severity,title,description,source,status,assignee_kind,assignee_external_id,created_at_iso. - PartUsages sheet:
external_id,wo_external_id,part_external_id,quantity,unit_cost_currency,unit_cost_amount,recorded_at,recorded_by_external_id.
2.4 Legacy status mapping
| Legacy term | Maps to |
|---|---|
new, reported, submitted | open |
acknowledged, dispatched, assigned | assigned |
working, in-progress, wip | in_progress |
on-hold, paused, awaiting parts | blocked |
done, closed, complete | resolved |
verified, qc-approved, signed-off | verified |
cancelled, void, dropped | cancelled |
| anything else | open + warning row |
Severity mapping is configurable per import (default low/medium/high/critical with synonyms).
2.5 Idempotency
- Every imported row carries a
legacy_external_idstored in a side tablemaintenance.legacy_id_map(tenant_id, entity, legacy_external_id, internal_id). - Re-running the importer with the same input produces zero new rows; only updates if the source has changed (hash compared).
- New rows in source produce new internal rows; deletions in source do not propagate (manual decommission required).
2.6 Dry-run vs commit
tools/import-maintenance.ts modes:
--dry-run(default): producesreport-<runId>.jsonwith counts per entity, validation errors per row, and projected effects (creates / updates / skips). No DB writes.--commit: applies changes inside one transaction per batch (default 500 rows). Failures roll back the batch and continue.--resume <runId>: continues from last successful batch.
2.7 Side effects suppressed during import
- Outbox events suppressed: import runs with
OUTBOX_MODE=suppress; nowork_order.created.v1storm. Instead, a singlemelmastoon.maintenance.tenant.imported.v1event is published at the end with summary counts. - Notifications disabled per import.
- AI capabilities disabled per import (no model bills for legacy data).
- SLA timers still computed but not actively scanned for breach until cutover flag is flipped.
2.8 Cutover
- Last sync of legacy → produce final Excel.
- Dry-run; review
report-<runId>.jsonwith the tenant's GM. - Commit run during a maintenance window.
- Set tenant flag
maintenance.cutoverComplete = trueintenant-service. - Outbox event
tenant.imported.v1triggersanalytics-service,search-aggregation-service, etc., to backfill projections. - Enable SLA scanner for the tenant.
- Flip the BFF to point at
maintenance-servicefor that tenant.
2.9 Reporting
After every commit run, an HTML report is uploaded to gs://melmastoon-import-reports/<tenantId>/<runId>/index.html containing:
- Per-sheet success / failure counts
- Validation errors with cell references
- A diff preview for updated rows
- Total time, batch sizes, total rows
- A signed-off acknowledgement section for the tenant GM
2.10 Rollback
Every commit run records a runId in maintenance.import_runs. Rollback (tools/import-rollback.ts --runId=<id>):
- Deletes all rows created with that
runId(FK cascade respected). - Reverts updates to prior values from the run's snapshot table.
- Does not restore deleted legacy_external_id mappings (those are kept for forward idempotency).
A rollback past cutover requires SRE + DBA approval; we do not silently undo a tenant who has been live.
3. Backfills for new columns / event versions
3.1 Adding a new column with computed value
Pattern: add column nullable → run worker to backfill → tighten to NOT NULL.
Example: adding WorkOrder.estimatedDurationMinutes (Phase 1.1):
- Schema migration: add nullable column.
- Backfill worker: for each closed WO, computes
(resolved_at - created_at)clamped to 0..N; for each open WO, infers from category default. - After 100% backfilled, schema migration tightens to
NOT NULL.
3.2 New event version (e.g., work_order.created.v2)
- Add v2 publishing alongside v1 (dual-publish for ≥ 6 months).
- Consumers migrate independently.
- v1 frozen, then archived.
3.3 Re-emit historical events
For analytics consumers that join late, the importer can re-emit a synthetic work_order.created.v1 (with producedAt = original createdAt, marked replay=true in envelope metadata). Consumers must respect replay=true if they have side effects.
4. Tenant offboarding
When a tenant leaves the platform:
- Export:
tools/export-maintenance.ts --tenantId=<id> --format=excel|json --bucket=gs://...Exports the entire maintenance dataset (WOs, tasks, parts, usages, vendors, assets, schedules, audit summary). - Cool-down period (typically 30 days), during which the data is read-only.
- Erasure:
tools/erase-maintenance.ts --tenantId=<id> --confirm=<token>:- Deletes operational rows (WOs, tasks, parts, usages, schedules).
- Anonymises regulated rows (
vendor_invoice_*retained 7 years per policy with all PII redacted). - Archives event payloads in BigQuery with PII fields nulled.
- Vendor records anonymised (
displayName→[redacted-vnd-<n>], contact fields nulled).
- Audit entry written to
audit-servicewith the export bucket URI and the erasure receipt.
5. Risks specific to migration
| Risk | Mitigation |
|---|---|
Legacy data has duplicate external_id | Importer deduplicates within a sheet; warns; first wins |
Legacy severity values can't map | Importer rejects row; staff adjusts mapping config |
| Large imports timeout single transaction | Batch size 500; per-batch transactions; resume from cursor |
| Backfill worker drifts behind | Metric mnt.backfill.<id>.lag_seconds; alert at 1h |
| Migration applies but schema desync from app | CI gate compares Drizzle/Nest model vs DB introspection; build fails on diff |
6. Linked artefacts
- Schema policies: this doc §1
- Tools:
services/maintenance-service/tools/ - Run history:
maintenance.import_runstable - Reports:
gs://melmastoon-import-reports/ - Risk register entries: R12 (data corruption), R23 (prefix rename frozen)
- Audit destination:
audit-service