tenant-service — MIGRATION_PLAN

Plan for onboarding existing hotel operators (especially chain customers) onto Ghasi Melmastoon. The dominant migration source is spreadsheet-based org structure (Excel/Google Sheets) listing properties, departments, staff, and roles. This document covers schema migrations into tenant-service, plus a CSV importer for chain customers.


1. Migration Surfaces

There are three distinct migration surfaces:

| Surface | Scope | Owner |
|---|---|---|
| Database schema migrations | tenant.* tables: additive, forward-only, deprecation windows | service tech lead |
| Role catalog reconciliation | New permissions/roles in canonical registry → existing tenant role catalogs | service tech lead |
| Customer onboarding migration | Chain customer org-tree + staff list from spreadsheet → tenant aggregates via CSV importer | onboarding eng |

All three follow the same operating discipline: no destructive change without an explicit deprecation window, evidence in CI, and a rollback path.


2. Schema Migration Discipline

  • Tooling: node-pg-migrate, with migration files in services/tenant-service/migrations/NNNN_<slug>.sql. The numeric prefix is sequential and only ever increases.
  • Forward-only: no down migrations in production. Rollback = forward-fix migration.
  • Additive default: new columns are nullable or have defaults; new tables ship with RLS enabled and an explicit policy.
  • Deprecation window: a column slated for removal stays in the schema, treated as read-only by the code, for ≥ 14 days after its last writer was removed; the actual DROP COLUMN runs in a follow-up migration once the window has passed.
  • Backfills: any backfill > 100k rows runs in batches of 1k, with a heartbeat metric (tenant_backfill_progress).
  • CMEK + RLS: every new tenant-scoped table inherits CMEK from the cluster and ships with ENABLE ROW LEVEL SECURITY plus a tenant-scoped policy in the same migration.
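
The batching rule for backfills can be illustrated with a minimal sketch. It assumes an integer primary key and keyset pagination; `runBackfill`, `ProcessBatch`, `Heartbeat`, and `fakeTable` are hypothetical names, not part of tenant-service, and the heartbeat callback stands in for whatever sets the tenant_backfill_progress metric.

```typescript
// Hypothetical shapes: the real backfill would run SQL through the service's DB client.
// processBatch handles up to `limit` rows with id > afterId and returns their ids.
type ProcessBatch = (afterId: number, limit: number) => Promise<number[]>;
type Heartbeat = (rowsDone: number) => void;

async function runBackfill(
  processBatch: ProcessBatch,
  heartbeat: Heartbeat,
  batchSize: number = 1000,
): Promise<number> {
  let lastId = 0;
  let rowsDone = 0;
  for (;;) {
    const ids = await processBatch(lastId, batchSize);
    if (ids.length === 0) break; // nothing left to backfill
    rowsDone += ids.length;
    // Keyset pagination: continue after the highest id seen so far.
    lastId = ids.reduce((max, id) => Math.max(max, id), lastId);
    heartbeat(rowsDone); // e.g. update the tenant_backfill_progress metric
  }
  return rowsDone;
}

// In-memory stand-in for a table, used only to exercise the loop.
function fakeTable(rowCount: number): ProcessBatch {
  const ids = Array.from({ length: rowCount }, (_, i) => i + 1);
  return async (afterId, limit) => ids.filter((id) => id > afterId).slice(0, limit);
}
```

With 2,500 rows and the default batch size, the heartbeat fires after 1,000, 2,000, and 2,500 rows.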

2.1 Migration PR Checklist

  • Migration file added with sequential prefix
  • Up path tested against a snapshot of staging
  • If destructive, deprecation window documented in PR description
  • RLS policy added / unchanged for tenant-scoped tables (auto-checked by pnpm migrate:lint)
  • Two-tenant simulator green after migration
  • Runbook entry added to services/tenant-service/runbooks/migrations.md if backfill > 1 hour
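
The RLS item in the checklist lends itself to an automated check. The sketch below is one way such a lint could work and is not the actual implementation behind pnpm migrate:lint: it scans a migration's SQL for tenant.* tables created without ENABLE ROW LEVEL SECURITY or without a CREATE POLICY. The function names and the regex-based approach are assumptions; a production lint would more likely use a SQL parser.

```typescript
// Collects the first capture group of every match; `pattern` must carry the g flag.
function captureAll(pattern: RegExp, text: string): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const value = match[1];
    if (value !== undefined) found.push(value);
  }
  return found;
}

// Returns one message per tenant.* table created without RLS or without a policy.
function lintMigration(sql: string): string[] {
  const text = sql.replace(/\s+/g, " ").toLowerCase();
  const created = captureAll(/create table (?:if not exists )?(tenant\.[a-z0-9_]+)/g, text);
  const rlsEnabled = captureAll(
    /alter table (?:only )?([a-z0-9_.]+) enable row level security/g,
    text,
  );
  const withPolicy = captureAll(/create policy [a-z0-9_"]+ on ([a-z0-9_.]+)/g, text);
  const problems: string[] = [];
  for (const table of created) {
    if (rlsEnabled.indexOf(table) === -1) {
      problems.push(`${table}: missing ENABLE ROW LEVEL SECURITY`);
    }
    if (withPolicy.indexOf(table) === -1) {
      problems.push(`${table}: missing CREATE POLICY`);
    }
  }
  return problems;
}
```

An empty result means every table created in the file also enables RLS and declares a policy in the same migration.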

3. Role Catalog Reconciliation

Permissions live in tenant.role_permissions. When the canonical platform registry changes, every existing tenant's role catalog needs to converge.

  • Trigger: a PR that adds a permission to domain/permissions.ts must include a paired migration, migrations/NNNN_role_catalog_<slug>.sql, that updates the seed, plus a runtime reconciler invocation.
  • Runtime reconciler: RoleCatalogReconciler runs nightly, opens a drift report (metric role_catalog_drift_total), and emails the platform team if drift > 0 after 48 h.
  • Operator override: pnpm migrate:role-catalog --tenants all applies missing permissions to existing tenants; runs in batches of 100 tenants with a 5-second cool-down.
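
The drift computation and the batched operator override can be sketched as follows. This is a simplified model, not the RoleCatalogReconciler itself: each tenant's catalog is flattened to a list of permission names, and `computeDrift`, `toBatches`, `reconcile`, and the injected `apply` and `sleep` callbacks are hypothetical.

```typescript
// Simplified model: each tenant's catalog is flattened to a list of permission names.
type TenantCatalog = { tenantId: string; permissions: string[] };
type Drift = { tenantId: string; missing: string[] };

// Tenants whose catalog lacks at least one canonical permission.
function computeDrift(canonical: string[], catalogs: TenantCatalog[]): Drift[] {
  return catalogs
    .map((c) => ({
      tenantId: c.tenantId,
      missing: canonical.filter((p) => c.permissions.indexOf(p) === -1),
    }))
    .filter((d) => d.missing.length > 0);
}

function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Applies missing permissions batch by batch; returns the number of batches run.
async function reconcile(
  drift: Drift[],
  apply: (d: Drift) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  batchSize: number = 100,
  coolDownMs: number = 5000,
): Promise<number> {
  const batches = toBatches(drift, batchSize);
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i] ?? [];
    for (const d of batch) await apply(d);
    if (i < batches.length - 1) await sleep(coolDownMs); // cool-down between batches
  }
  return batches.length;
}
```

For 250 drifting tenants this runs three batches (100, 100, 50) with two 5-second cool-downs in between.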

4. Customer Onboarding Migration (CSV Importer)

The chain customer's spreadsheet typically looks like:

chain_name, region, property_name, department, staff_email, staff_name, role
"Asia Hotels","Central","Asia Hotel Kabul","Front Desk","sara@…","Sara Ahmadi","tenant.front_desk"
"Asia Hotels","Central","Asia Hotel Kabul","Front Desk","ali@…","Ali Hashimi","tenant.front_desk"
"Asia Hotels","Central","Asia Hotel Kabul","Management","gm.kabul@…","Mohammad Daud","tenant.gm"
"Asia Hotels","South","Asia Hotel Kandahar","Front Desk","…"

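Rows of this shape could be parsed into typed records along the following lines. This is a minimal sketch, assuming double-quoted fields with `""` as the escape for a literal quote; `splitCsvLine`, `toImportRow`, and `ImportRow` are hypothetical names, and a real importer would likely rely on an established CSV library instead.

```typescript
const COLUMNS = [
  "chain_name", "region", "property_name", "department",
  "staff_email", "staff_name", "role",
] as const;
type Column = (typeof COLUMNS)[number];
type ImportRow = Record<Column, string>;

// Splits one CSV line into fields, honoring double-quoted values.
// Surrounding whitespace is trimmed from every field.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

// Maps fields onto the expected columns; returns null when the count is wrong.
function toImportRow(fields: string[]): ImportRow | null {
  if (fields.length !== COLUMNS.length) return null;
  const row = {} as ImportRow;
  COLUMNS.forEach((col, i) => {
    row[col] = fields[i] ?? "";
  });
  return row;
}
```

A line with fewer than seven fields, such as the truncated last row in the sample, maps to `null` and would surface as a per-row error rather than a partial record.
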
4.1 CSV Importer Flow

  1. Operator (super admin) uploads CSV via POST /api/v1/admin/tenants/{tenantId}/import (multipart). File ≤ 5 MB, ≤ 5,000 rows.
  2. Validator (synchronous, ≤ 30 s):
    • Schema check (columns present, types correct).
    • Email RFC 5321 check.
    • Role exists in tenant role catalog.
    • Returns { valid_rows, errors[] } with row numbers; UI shows per-row error.
  3. Plan stage: Importer computes a plan: the list of org units to create and the list of invitations to send. The plan is returned to the operator for approval; no state changes yet.
  4. Approve & execute: Operator approves; importer runs as a saga (one transaction per "chunk" of 50 rows). Each chunk:
    • Upserts org units (chain → region → property → department).
    • Sends invitations (skips already-active emails).
    • Logs progress to tenant.import_jobs (job_id, tenant_id, total_rows, rows_done, errors[]).
  5. Resumable: If a chunk fails, the job pauses; operator can resume after fix.
  6. Completion: Job marked done; emits melmastoon.tenant.import_completed.v1 (no PII; just counts and tenant_id).
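
The validator and plan stages in the flow above are pure functions of the uploaded rows, which makes them easy to sketch. The code below is illustrative only: the type and function names are hypothetical, row numbers are assumed to be 1-based data rows with the header excluded, org units are represented as " / "-joined paths, and the email check is a loose shape test rather than a full RFC 5321 parser.

```typescript
type ImportRow = {
  chain_name: string; region: string; property_name: string; department: string;
  staff_email: string; staff_name: string; role: string;
};
type RowError = { row: number; message: string };
type ValidationResult = { valid_rows: number; errors: RowError[] };
type Plan = { orgUnits: string[]; invitations: string[] };

// Deliberately loose shape check; not a complete RFC 5321 mailbox parser.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateRows(rows: ImportRow[], roleCatalog: string[]): ValidationResult {
  const errors: RowError[] = [];
  let validRows = 0;
  rows.forEach((r, i) => {
    const row = i + 1; // 1-based data row, header excluded (assumption)
    const before = errors.length;
    if (!r.property_name.trim()) errors.push({ row, message: "property_name is empty" });
    if (!EMAIL_SHAPE.test(r.staff_email)) errors.push({ row, message: "staff_email is malformed" });
    if (roleCatalog.indexOf(r.role) === -1) errors.push({ row, message: `unknown role ${r.role}` });
    if (errors.length === before) validRows++;
  });
  return { valid_rows: validRows, errors };
}

// Computes what would be created; performs no writes.
function buildPlan(rows: ImportRow[], activeEmails: string[]): Plan {
  const orgUnits: string[] = [];
  const invitations: string[] = [];
  const addOnce = (list: string[], value: string): void => {
    if (list.indexOf(value) === -1) list.push(value);
  };
  for (const r of rows) {
    // chain → region → property → department, one org unit per path prefix
    const path = [r.chain_name, r.region, r.property_name, r.department];
    for (let depth = 1; depth <= path.length; depth++) {
      addOnce(orgUnits, path.slice(0, depth).join(" / "));
    }
    const email = r.staff_email.toLowerCase();
    if (activeEmails.indexOf(email) === -1) addOnce(invitations, email); // skip already-active
  }
  return { orgUnits, invitations };
}
```

Keeping both stages free of side effects matches the "no state change yet" guarantee: the operator can inspect the plan, and nothing is written until the execute step.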

4.2 Idempotency

  • Each row keyed by (tenant_id, property_slug, staff_email).
  • Re-running the importer with the same file is a no-op for already-applied rows.
  • Operator can re-run after fixing a row; only the changed row is processed.
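
One way to get these semantics is to store, per row key, a fingerprint of the row content that was last applied. The sketch below rests on several assumptions: property_slug is derived from property_name by a simple slugify, emails are compared case-insensitively, and the in-memory `Map` stands in for whatever persistent store the importer uses. All names are hypothetical.

```typescript
type KeyedRow = {
  property_name: string; staff_email: string;
  department: string; staff_name: string; role: string;
};

// Assumed derivation of property_slug from property_name.
function slugify(name: string): string {
  return name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Idempotency key: (tenant_id, property_slug, staff_email).
function rowKey(tenantId: string, row: KeyedRow): string {
  return [tenantId, slugify(row.property_name), row.staff_email.trim().toLowerCase()].join("|");
}

// Content outside the key; a change here means the row must be re-applied.
function rowFingerprint(row: KeyedRow): string {
  return JSON.stringify([row.department, row.staff_name, row.role]);
}

// Rows that are new, or whose content differs from what was last applied.
function pendingRows(tenantId: string, rows: KeyedRow[], applied: Map<string, string>): KeyedRow[] {
  return rows.filter((row) => applied.get(rowKey(tenantId, row)) !== rowFingerprint(row));
}

function markApplied(tenantId: string, row: KeyedRow, applied: Map<string, string>): void {
  applied.set(rowKey(tenantId, row), rowFingerprint(row));
}
```

Re-running with an unchanged file yields no pending rows, while editing one row's role makes exactly that row pending again.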

4.3 Limits and Safeguards

  • Max 5,000 rows per import; larger files are split server-side.
  • Max 50 invitations per minute per tenant (rate limit on the membership/invitation surface still applies).
  • If invitation send fails (notification-service outage), the row is queued for retry; operator can see queued count in the import job UI.
  • Invitations created by importer have a longer TTL (14 days vs 7 days) since onboarding is in flight.
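
The rate limit and TTL figures above translate into simple arithmetic. The helpers below are a hypothetical sketch for estimating how long a batch of invitations takes to send and when an invitation expires; they assume the limit is consumed evenly and ignore retries.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Minimum whole minutes needed to send `invitationCount` invitations.
function minutesToSend(invitationCount: number, perMinute: number = 50): number {
  return Math.ceil(invitationCount / perMinute);
}

// Importer-created invitations get 14 days; others get the regular 7 days.
function invitationExpiry(createdAt: Date, viaImporter: boolean): Date {
  const ttlDays = viaImporter ? 14 : 7;
  return new Date(createdAt.getTime() + ttlDays * MS_PER_DAY);
}
```

For a full 5,000-row import in which every row needs an invitation, `minutesToSend(5000)` gives 100, so sending takes at least 100 minutes at the per-tenant limit.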

5. Reverse Migration / Tenant Export

For GDPR data subject access requests (DSARs) and for tenants leaving the platform:

  • Export endpoint: POST /api/v1/admin/tenants/{tenantId}/export produces a signed Cloud Storage URL with all tenant-owned data: tenants, tenant_configs, organization_units, memberships, roles, role_assignments, invitations (active only; tokens redacted), billing_contacts, feature_flags, audit_events (last 90 d). Format: gzipped JSON Lines.
  • Retention of export: 7 days, then auto-deleted.
  • Cross-service export: the export request starts a fan-out saga by publishing melmastoon.tenant.export_requested.v1 to all data-owning services (reservation, billing, etc.); each service drops its slice into the same Cloud Storage bucket, and the export job then ZIPs the result.
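
The invitation rule (active only, tokens redacted) and the JSON Lines format can be sketched as below. The record shape, the status values, and the function names are assumptions made for illustration; gzip compression and the signed-URL upload are left out.

```typescript
// Hypothetical shapes; the real invitation model may carry more fields.
type Invitation = { email: string; status: "active" | "accepted" | "expired"; token: string };
type ExportRecord = { table: string; data: Record<string, unknown> };

// Keeps only active invitations and replaces the token with a marker.
function exportInvitations(invitations: Invitation[]): ExportRecord[] {
  return invitations
    .filter((inv) => inv.status === "active")
    .map((inv) => ({
      table: "invitations",
      data: { email: inv.email, status: inv.status, token: "[REDACTED]" },
    }));
}

// JSON Lines: one JSON document per line, each terminated by a newline.
function toJsonLines(records: ExportRecord[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}
```

Because each line is an independent JSON document, slices from different services can be concatenated or streamed without re-parsing the whole export.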

6. Cutover Plan for First Chain Customer

  1. Week 1: Operator uploads anonymized sample CSV; we validate schema; share back per-row report.
  2. Week 2: Operator uploads real CSV in a sandbox tenant; runs end-to-end including invitation acceptance for a small staff cohort (5 people).
  3. Week 3: Production tenant provisioned by super admin; CSV import runs; invitations sent over 3 days (rate-limited).
  4. Week 4: Go-live; the tenant runs Ghasi in parallel with its legacy system; reservations are made via Ghasi and reconciled nightly against the legacy system.
  5. Week 6: Legacy system retired; Ghasi is system-of-record.

A runbook with timeline, contact tree, and rollback (re-enable legacy + suspend tenant) lives at services/tenant-service/runbooks/chain-onboarding.md.