tenant-service — MIGRATION_PLAN
Plan for onboarding existing hotel operators (especially chain customers) onto Ghasi Melmastoon. The dominant migration source is a spreadsheet-based org structure (Excel/Google Sheets) listing properties, departments, staff, and roles. This document covers schema migrations into tenant-service, role catalog reconciliation, and a CSV importer for chain customers.
1. Migration Surfaces
There are three distinct migration surfaces:
| Surface | Scope | Owner |
|---|---|---|
| Database schema migrations | tenant.* tables — additive, forward-only, deprecation windows | service tech lead |
| Role catalog reconciliation | New permissions/roles in canonical registry → existing tenant role catalogs | service tech lead |
| Customer onboarding migration | Chain customer org-tree + staff list from spreadsheet → tenant aggregates via CSV importer | onboarding eng |
All three follow the same operating discipline: no destructive change without an explicit deprecation window, evidence in CI, and a rollback path.
2. Schema Migration Discipline
- Tooling: node-pg-migrate, files in services/tenant-service/migrations/NNNN_<slug>.sql. Numeric prefix is forward-only.
- Forward-only: no down migrations in production. Rollback = forward-fix migration.
- Additive default: new columns are nullable or have defaults; new tables ship with RLS enabled and an explicit policy.
- Deprecation window: a dropped column lives in code (read-only) for ≥ 14 days after the last writer was removed; the actual DROP COLUMN runs after the window in a follow-up migration.
- Backfills: any backfill > 100k rows runs in batches of 1k, with a heartbeat metric (tenant_backfill_progress).
- CMEK + RLS: every new tenant-scoped table inherits CMEK from the cluster and ships with ENABLE ROW LEVEL SECURITY plus a tenant-scoped policy in the same migration.
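The batched-backfill rule can be sketched as a small driver loop. This is only an illustration under stated assumptions: runBatch and heartbeat are hypothetical callbacks (the real backfill would issue SQL and set the tenant_backfill_progress metric), and the function name is not taken from the codebase.

```typescript
// Hypothetical callbacks: a real backfill would run an UPDATE for the given
// window and publish the tenant_backfill_progress metric.
type BackfillBatchRunner = (offset: number, limit: number) => number; // rows updated
type BackfillHeartbeat = (rowsDone: number, totalRows: number) => void;

const BACKFILL_BATCH_SIZE = 1000; // "batches of 1k" from the discipline above

function runBackfill(
  totalRows: number,
  runBatch: BackfillBatchRunner,
  heartbeat: BackfillHeartbeat,
  batchSize: number = BACKFILL_BATCH_SIZE,
): number {
  let rowsDone = 0;
  for (let offset = 0; offset < totalRows; offset += batchSize) {
    // The last batch may be smaller than batchSize.
    const limit = Math.min(batchSize, totalRows - offset);
    rowsDone += runBatch(offset, limit);
    // One heartbeat per batch, so a stalled backfill shows up as a flat metric.
    heartbeat(rowsDone, totalRows);
  }
  return rowsDone;
}
```

Keeping the batch runner and the heartbeat as injected functions makes the loop testable without a database; the actual batching strategy (offset windows versus keyset ranges) is left open here.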
2.1 Migration PR Checklist
- Migration file added with sequential prefix
- Up path tested against a snapshot of staging
- If destructive, deprecation window documented in PR description
- RLS policy added / unchanged for tenant-scoped tables (auto-checked by pnpm migrate:lint)
- Two-tenant simulator green after migration
- Runbook entry added to services/tenant-service/runbooks/migrations.md if backfill > 1 hour
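To make the RLS checklist item concrete, here is a rough sketch of the kind of check a lint step could perform. It is not the actual pnpm migrate:lint implementation, which this document does not describe; lintMigrationRls and its finding shape are hypothetical, and the regex-based matching ignores quoted identifiers and other SQL edge cases.

```typescript
interface RlsLintFinding {
  table: string;
  problem: string;
}

// Flags every tenant.* table created in a migration that does not also enable
// RLS and create a policy on that table in the same file.
function lintMigrationRls(sql: string): RlsLintFinding[] {
  const findings: RlsLintFinding[] = [];
  // Strip line comments, collapse whitespace, and lowercase for simple matching.
  const normalized = sql.replace(/--.*$/gm, "").replace(/\s+/g, " ").toLowerCase();
  const createTable = /create table (?:if not exists )?(tenant\.[a-z_][a-z0-9_]*)/g;
  let match: RegExpExecArray | null;
  while ((match = createTable.exec(normalized)) !== null) {
    const table = match[1];
    if (!normalized.includes(`alter table ${table} enable row level security`)) {
      findings.push({ table, problem: "missing ENABLE ROW LEVEL SECURITY" });
    }
    const policy = new RegExp(`create policy [a-z0-9_]+ on ${table.replace(".", "\\.")}\\b`);
    if (!policy.test(normalized)) {
      findings.push({ table, problem: "missing CREATE POLICY" });
    }
  }
  return findings;
}
```

An empty result means every table created in the file carries both statements; a CI step could fail the PR on any finding.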
3. Role Catalog Reconciliation
Permissions live in tenant.role_permissions. When the canonical platform registry changes, every existing tenant's role catalog needs to converge.
- Trigger: a PR that adds a permission to domain/permissions.ts requires a paired migration migrations/NNNN_role_catalog_<slug>.sql that updates the seed, plus a runtime reconciler invocation.
- Runtime reconciler: RoleCatalogReconciler runs nightly, opens a drift report (metric role_catalog_drift_total), and emails the platform team if drift > 0 after 48 h.
- Operator override: pnpm migrate:role-catalog --tenants all applies missing permissions to existing tenants; runs in batches of 100 tenants with a 5-second cool-down.
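The drift computation and the tenant batching described above might look roughly like the following. This is a sketch under assumptions, not the RoleCatalogReconciler itself: the function names, the in-memory catalog shape, and the permission keys used in the usage note are all hypothetical.

```typescript
type PermissionKey = string;
// Hypothetical in-memory view: tenant id -> permissions currently in its catalog.
type TenantCatalogs = Record<string, PermissionKey[]>;

// Permissions present in the canonical registry but absent from one tenant.
function missingPermissions(canonical: PermissionKey[], tenantCatalog: PermissionKey[]): PermissionKey[] {
  const present = new Set(tenantCatalog);
  return canonical.filter((permission) => !present.has(permission));
}

// Drift report: only tenants with at least one missing permission appear.
function driftByTenant(canonical: PermissionKey[], catalogs: TenantCatalogs): TenantCatalogs {
  const drift: TenantCatalogs = {};
  for (const tenantId of Object.keys(catalogs)) {
    const missing = missingPermissions(canonical, catalogs[tenantId]);
    if (missing.length > 0) drift[tenantId] = missing;
  }
  return drift;
}

// Splits tenant ids into batches (100 per batch for the operator override).
// The 5-second cool-down between batches is left to the caller.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, with a canonical registry of org.read, org.write and staff.invite (made-up keys), a tenant holding only org.read would be reported with the other two as missing, and the size of the drift report is the natural input for role_catalog_drift_total.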
4. Customer Onboarding Migration (CSV Importer)
The chain customer's spreadsheet typically looks like:
chain_name, region, property_name, department, staff_email, staff_name, role
"Asia Hotels","Central","Asia Hotel Kabul","Front Desk","sara@…","Sara Ahmadi","tenant.front_desk"
"Asia Hotels","Central","Asia Hotel Kabul","Front Desk","ali@…","Ali Hashimi","tenant.front_desk"
"Asia Hotels","Central","Asia Hotel Kabul","Management","gm.kabul@…","Mohammad Daud","tenant.gm"
"Asia Hotels","South","Asia Hotel Kandahar","Front Desk","…"
4.1 CSV Importer Flow
- Operator (super admin) uploads CSV via POST /api/v1/admin/tenants/{tenantId}/import (multipart). File ≤ 5 MB, ≤ 5,000 rows.
- Validator (synchronous, ≤ 30 s):
  - Schema check (columns present, types correct).
  - Email RFC 5321 check.
  - Role exists in tenant role catalog.
  - Returns { valid_rows, errors[] } with row numbers; UI shows per-row error.
- Plan stage: Importer computes a plan: the list of org units to create and the list of invitations to send. Returns plan to operator for approval. No state change yet.
- Approve & execute: Operator approves; importer runs as a saga (one transaction per "chunk" of 50 rows). Each chunk:
  - Upserts org units (chain → region → property → department).
  - Sends invitations (skips already-active emails).
  - Logs progress to tenant.import_jobs (job_id, tenant_id, total_rows, rows_done, errors[]).
- Resumable: If a chunk fails, the job pauses; operator can resume after fix.
- Completion: Job marked done; emits melmastoon.tenant.import_completed.v1 (no PII; just counts and tenant_id).
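The validator and plan stages of the flow above can be sketched as two pure functions. Everything here is illustrative: the type and function names are hypothetical, valid_rows is assumed to carry the rows themselves rather than a count, and the email pattern is a deliberately simplified stand-in for a real RFC 5321 check.

```typescript
const REQUIRED_COLUMNS = [
  "chain_name", "region", "property_name", "department",
  "staff_email", "staff_name", "role",
] as const;
type ColumnName = (typeof REQUIRED_COLUMNS)[number];
type StaffRow = Record<ColumnName, string>;

interface RowIssue { row: number; message: string; }
interface ImportValidation { valid_rows: StaffRow[]; errors: RowIssue[]; }
interface ImportPlanSketch {
  orgUnits: string[]; // unique paths, parents before children
  invitations: Array<{ email: string; role: string }>;
}

// Simplified shape check only; NOT a full RFC 5321 mailbox validator.
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateRows(rows: Array<Partial<StaffRow>>, roleCatalog: string[]): ImportValidation {
  const roles = new Set(roleCatalog);
  const validRows: StaffRow[] = [];
  const errors: RowIssue[] = [];
  rows.forEach((raw, index) => {
    const rowNumber = index + 1; // 1-based data row, header excluded
    const missing = REQUIRED_COLUMNS.filter((column) => (raw[column] ?? "").trim() === "");
    if (missing.length > 0) {
      errors.push({ row: rowNumber, message: `missing columns: ${missing.join(", ")}` });
      return;
    }
    const row = raw as StaffRow;
    if (!SIMPLE_EMAIL.test(row.staff_email)) {
      errors.push({ row: rowNumber, message: "invalid staff_email" });
      return;
    }
    if (!roles.has(row.role)) {
      errors.push({ row: rowNumber, message: `unknown role: ${row.role}` });
      return;
    }
    validRows.push(row);
  });
  return { valid_rows: validRows, errors };
}

// Plan stage: computes what would be created, without changing any state.
function buildPlan(rows: StaffRow[]): ImportPlanSketch {
  const orgUnits = new Set<string>();
  const seenEmails = new Set<string>();
  const invitations: Array<{ email: string; role: string }> = [];
  for (const row of rows) {
    const path = [row.chain_name, row.region, row.property_name, row.department];
    // Register chain, chain/region, chain/region/property, and the full path.
    for (let depth = 1; depth <= path.length; depth++) {
      orgUnits.add(path.slice(0, depth).join(" / "));
    }
    if (!seenEmails.has(row.staff_email)) {
      seenEmails.add(row.staff_email);
      invitations.push({ email: row.staff_email, role: row.role });
    }
  }
  return { orgUnits: Array.from(orgUnits), invitations };
}
```

Because both functions are side-effect free, the plan shown to the operator for approval can be recomputed at any time from the validated rows; the chunked saga execution and tenant.import_jobs bookkeeping are intentionally out of scope for this sketch.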
4.2 Idempotency
- Each row keyed by (tenant_id, property_slug, staff_email).
- Re-running the importer with the same file is a no-op for already-applied rows.
- Operator can re-run after fixing a row; only the changed row is processed.
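One way to get this behavior is to store a content fingerprint next to each row key, so that an unchanged row is skipped and an edited row is picked up again. The sketch below assumes exactly that; the fingerprint scheme, the in-memory Map standing in for persisted state, the email normalization, and all names are hypothetical rather than taken from the importer.

```typescript
interface KeyedImportRow {
  tenant_id: string;
  property_slug: string; // assumed to be derived from property_name upstream
  staff_email: string;
  department: string;
  role: string;
}

// Row identity: (tenant_id, property_slug, staff_email).
// Trimming and lowercasing the email is an assumption of this sketch.
function idempotencyKey(row: KeyedImportRow): string {
  return [row.tenant_id, row.property_slug, row.staff_email.trim().toLowerCase()].join("|");
}

// Fingerprint of the non-key fields; a changed fingerprint means "re-process".
function rowFingerprint(row: KeyedImportRow): string {
  return JSON.stringify([row.department, row.role]);
}

// Rows that are new, or whose content changed since they were last applied.
function rowsToProcess(rows: KeyedImportRow[], applied: Map<string, string>): KeyedImportRow[] {
  return rows.filter((row) => applied.get(idempotencyKey(row)) !== rowFingerprint(row));
}

// Records rows as applied; a real importer would persist this per chunk.
function markApplied(rows: KeyedImportRow[], applied: Map<string, string>): void {
  for (const row of rows) {
    applied.set(idempotencyKey(row), rowFingerprint(row));
  }
}
```

With this shape, re-running the same file yields an empty list from rowsToProcess, and fixing one row's role yields exactly that row.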
4.3 Limits and Safeguards
- Max 5,000 rows per import; larger files are split server-side.
- Max 50 invitations per minute per tenant (rate limit on the membership/invitation surface still applies).
- If invitation send fails (notification-service outage), the row is queued for retry; operator can see queued count in the import job UI.
- Invitations created by importer have a longer TTL (14 days vs 7 days) since onboarding is in flight.
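The interaction between the per-tenant rate limit and the retry queue can be illustrated with a fixed-window limiter. This is a sketch only: the window algorithm, the function names, and the synchronous send callback are assumptions, and the real limit is enforced on the membership/invitation surface rather than inside the importer.

```typescript
const IMPORT_INVITATION_TTL_DAYS = 14; // importer-created invitations (vs 7 by default)

// Fixed one-minute window, one limiter instance per tenant.
function createInvitationLimiter(maxPerMinute: number = 50): (nowMs: number) => boolean {
  let windowStartMs = Number.NEGATIVE_INFINITY;
  let sentInWindow = 0;
  return function tryAcquire(nowMs: number): boolean {
    if (nowMs - windowStartMs >= 60000) {
      windowStartMs = nowMs; // start a fresh window
      sentInWindow = 0;
    }
    if (sentInWindow >= maxPerMinute) return false;
    sentInWindow += 1;
    return true;
  };
}

interface DrainResult { sent: string[]; queued: string[]; }

// Tries to send each pending invitation. Anything rate-limited, or whose send
// fails (e.g. a notification-service outage), is queued for a later retry.
function drainInvitations(
  pending: string[],
  nowMs: number,
  tryAcquire: (nowMs: number) => boolean,
  send: (email: string) => boolean, // hypothetical sender; returns false on failure
): DrainResult {
  const sent: string[] = [];
  const queued: string[] = [];
  for (const email of pending) {
    if (tryAcquire(nowMs) && send(email)) sent.push(email);
    else queued.push(email);
  }
  return { sent, queued };
}
```

The length of queued after each pass is the number an import job UI could surface to the operator; calling drainInvitations again with the queued list in a later window retries them.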
5. Reverse Migration / Tenant Export
For GDPR DSAR and for tenants leaving the platform:
- Export endpoint: POST /api/v1/admin/tenants/{tenantId}/export produces a signed Cloud Storage URL with all tenant-owned data: tenants, tenant_configs, organization_units, memberships, roles, role_assignments, invitations (active only; tokens redacted), billing_contacts, feature_flags, audit_events (last 90 d). Format: gzipped JSON Lines.
- Retention of export: 7 days, then auto-deleted.
- Cross-service export: initiates a fan-out saga on melmastoon.tenant.export_requested.v1 to all data-owning services (reservation, billing, etc.); each service drops its slice into the same Cloud Storage bucket; importer ZIPs the result.
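As an illustration of the export format, the sketch below serializes invitations as JSON Lines with the "active only; tokens redacted" rule applied. The field names status and token, the envelope shape, and the function names are assumptions for the example; compression and upload to Cloud Storage are omitted.

```typescript
type ExportRecord = Record<string, unknown>;

// Replaces the (hypothetical) token field so secrets never reach the export.
function redactInvitation(invitation: ExportRecord): ExportRecord {
  const copy: ExportRecord = { ...invitation };
  if ("token" in copy) copy.token = "[REDACTED]";
  return copy;
}

// One JSON object per line; each line names its source table.
function toJsonLines(table: string, records: ExportRecord[]): string {
  return records.map((record) => JSON.stringify({ table, record }) + "\n").join("");
}

// Invitations slice of the export: active only, tokens redacted.
function exportInvitations(invitations: ExportRecord[]): string {
  const active = invitations.filter((invitation) => invitation.status === "active");
  return toJsonLines("invitations", active.map(redactInvitation));
}
```

Writing one self-describing line per record lets slices from different tables, or from different services in the fan-out, be concatenated without re-parsing.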
6. Cutover Plan for First Chain Customer
- Week 1: Operator uploads anonymized sample CSV; we validate schema; share back per-row report.
- Week 2: Operator uploads real CSV in a sandbox tenant; runs end-to-end including invitation acceptance for a small staff cohort (5 people).
- Week 3: Production tenant provisioned by super admin; CSV import runs; invitations sent over 3 days (rate-limited).
- Week 4: Go-live; tenant runs in parallel with their legacy system; reservations are made via Ghasi but reconciled nightly.
- Week 6: Legacy system retired; Ghasi is system-of-record.
A runbook with timeline, contact tree, and rollback (re-enable legacy + suspend tenant) lives at services/tenant-service/runbooks/chain-onboarding.md.