
iam-service — Migration Plan


iam-service launches greenfield in M0 — there is no legacy data to migrate from prior systems for the first tenant. This plan instead covers (a) internal evolution (DB / API / events / crypto rolling forward without breaking running clients) and (b) inbound tenant migrations (M2+) where a hotel switches to Melmastoon from an existing PMS that has its own user accounts.

1. Backward-Compatibility Principles

| Principle | Application |
|---|---|
| Never break a v1 event consumer | Adding fields is OK; removing or renaming a field requires a new vN+1 published side by side |
| Never break a /api/v1 consumer | Same rule; deprecate via the Sunset header before removing |
| DB migrations are forward-compatible | Multi-phase (see §2.1): write to old + new, backfill, switch reads, remove old |
| Crypto upgrades are user-transparent | Rehash on next login; never lock users out |
| Identity is sticky | userId and tenantId never change after creation |

All migrations are gated by SERVICE_READINESS and reviewed by DBRE.

2. Schema Evolution

2.1 DB

| Phase | Action |
|---|---|
| 1. Add | New column / table, NULL-allowed; deploy code that writes to it |
| 2. Backfill | Background job populates historic rows (idempotent, batched) |
| 3. Read | Deploy code that reads the new column; old code still tolerates its absence |
| 4. Constrain | Add NOT NULL / UNIQUE constraints once the backfill is verified |
| 5. Remove old | Drop the legacy column / table after ≥ 1 release with no reads |

Each phase is its own migration file; never combined. Migration tool: node-pg-migrate. Migrations are idempotent (IF NOT EXISTS patterns).
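The backfill phase can be sketched as a keyset-paginated loop that only touches rows whose new column is still NULL, which is what makes re-runs safe. This is a minimal in-memory sketch: the `Row` shape, the `derive` mapping and the batch size are hypothetical, and a real job would issue one batched `UPDATE` per iteration against Postgres instead of mutating an array.

```typescript
// Hypothetical row: `legacyValue` is the old column, `newValue` the column added in phase 1.
interface Row {
  id: number;
  legacyValue: string;
  newValue: string | null; // NULL until written by phase-1 code or backfilled
}

// Hypothetical old -> new mapping, for illustration only.
function derive(legacy: string): string {
  return legacy.trim().toLowerCase();
}

interface BackfillResult {
  updated: number;
  batches: number;
}

// Phase 2 (Backfill): walk rows in id order, batchSize at a time (keyset pagination),
// filling only rows whose new column is still NULL. Rows already written by phase-1
// code, or by an earlier interrupted run, are left alone, so the job is idempotent.
function backfill(rows: Row[], batchSize: number): BackfillResult {
  const ordered = [...rows].sort((a, b) => a.id - b.id);
  let lastId = Number.NEGATIVE_INFINITY;
  let updated = 0;
  let batches = 0;
  for (;;) {
    // Equivalent of: SELECT ... WHERE id > lastId ORDER BY id LIMIT batchSize
    const batch = ordered.filter((r) => r.id > lastId).slice(0, batchSize);
    if (batch.length === 0) break;
    batches += 1;
    for (const row of batch) {
      if (row.newValue === null) {
        row.newValue = derive(row.legacyValue);
        updated += 1;
      }
    }
    lastId = batch[batch.length - 1].id;
  }
  return { updated, batches };
}
```

Because the NULL check is the only progress marker, a job that dies part-way can simply be restarted from the beginning; already-filled rows are skipped.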

2.2 RLS

When introducing tenant-scoping to a previously platform-wide table, run a four-step migration:

  1. Add tenant_id column nullable.
  2. Backfill tenant_id (or leave NULL for platform-only rows).
  3. Enable RLS with (tenant_id = current_setting('app.tenant_id')::uuid OR tenant_id IS NULL).
  4. Tighten policy to drop the OR clause once all rows have tenant_id.
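The four steps map onto plain Postgres DDL. The helper below only renders the statements so each step can ship as its own migration file; the `<table>_tenant_isolation` policy name is a hypothetical convention, and step 2 emits nothing because the backfill runs as a separate batched job.

```typescript
type RlsStep = 1 | 2 | 3 | 4;

// Predicate from step 3 of the plan; app.tenant_id is set per request/transaction.
const TENANT_MATCH = "tenant_id = current_setting('app.tenant_id')::uuid";

// Renders the SQL for one step of the tenant-scoping migration.
// The policy name is a hypothetical naming convention.
function rlsMigrationSql(table: string, step: RlsStep): string[] {
  const policy = `${table}_tenant_isolation`;
  switch (step) {
    case 1:
      // Nullable, so existing platform-wide rows remain valid.
      return [`ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS tenant_id uuid NULL;`];
    case 2:
      // Backfill is a separate batched job, not DDL.
      return [];
    case 3:
      // Transitional policy: tenant rows plus NULL (platform-only) rows are visible.
      return [
        `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
        `CREATE POLICY ${policy} ON ${table} USING (${TENANT_MATCH} OR tenant_id IS NULL);`,
      ];
    case 4:
      // Tightened policy: the OR clause is dropped once every row has tenant_id.
      return [`ALTER POLICY ${policy} ON ${table} USING (${TENANT_MATCH});`];
  }
}
```

One caveat worth keeping in mind: in Postgres the table owner (and any BYPASSRLS role) is not subject to these policies unless FORCE ROW LEVEL SECURITY is also set, so the service should not connect as the owning role.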

2.3 Indexes

Indexes are created with CREATE INDEX CONCURRENTLY to avoid holding a write-blocking lock on the table, and are verified on a read replica before the primary.

2.4 Partition Maintenance

audit_events is partitioned monthly. A cron job keeps partitions created 6 months ahead; partitions older than 90 days are detached and archived to BigQuery.
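The cron job's two decisions (which partitions must exist, which are old enough to detach) reduce to UTC month arithmetic. A minimal sketch, assuming an `audit_events_YYYY_MM` naming scheme, range partitioning on a timestamp column, and that a partition is old enough once its upper bound is at least 90 days in the past; all three are assumptions, not taken from the DATA_MODEL.

```typescript
interface MonthlyPartition {
  name: string; // hypothetical naming scheme: <table>_YYYY_MM
  from: Date; // inclusive lower bound, first instant of the month (UTC)
  to: Date; // exclusive upper bound, first instant of the next month (UTC)
}

const DAY_MS = 24 * 60 * 60 * 1000;

function monthPartition(table: string, year: number, monthIndex: number): MonthlyPartition {
  // Date.UTC normalizes month overflow, e.g. month index 13 of 2026 becomes Feb 2027.
  const from = new Date(Date.UTC(year, monthIndex, 1));
  const to = new Date(Date.UTC(year, monthIndex + 1, 1));
  const mm = String(from.getUTCMonth() + 1).padStart(2, "0");
  return { name: `${table}_${from.getUTCFullYear()}_${mm}`, from, to };
}

// Current month plus the next `ahead` months; creation is idempotent, so re-listing is harmless.
function partitionsToEnsure(table: string, now: Date, ahead = 6): MonthlyPartition[] {
  const result: MonthlyPartition[] = [];
  for (let i = 0; i <= ahead; i++) {
    result.push(monthPartition(table, now.getUTCFullYear(), now.getUTCMonth() + i));
  }
  return result;
}

function createPartitionSql(table: string, p: MonthlyPartition): string {
  return (
    `CREATE TABLE IF NOT EXISTS ${p.name} PARTITION OF ${table} ` +
    `FOR VALUES FROM ('${p.from.toISOString()}') TO ('${p.to.toISOString()}');`
  );
}

// Due for DETACH + archive once even its newest possible row is older than retentionDays.
function dueForArchive(p: MonthlyPartition, now: Date, retentionDays = 90): boolean {
  return p.to.getTime() <= now.getTime() - retentionDays * DAY_MS;
}
```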

3. API Versioning

| Change | Treatment |
|---|---|
| Add new endpoint | Non-breaking; ship in v1 |
| Add optional field to request | Non-breaking |
| Add field to response | Non-breaking; consumers must tolerate unknown fields (per 30-api rule) |
| Make field required | Breaking → new v2 endpoint; Sunset header on v1 |
| Rename / remove field | Breaking → new v2 |
| Make validation rules stricter | Breaking → new v2; v1 keeps its lenient validation |
| Change error code mapping | Breaking → new v2 |

Deprecation timeline: minimum 6 months from Sunset announcement to removal; major changes ≥ 12 months.

Deprecated responses carry these headers:

  • Sunset: Thu, 31 Dec 2026 23:59:59 GMT
  • Deprecation: true
  • Link: <https://docs.melmastoon.cloud/iam/migration/v2>; rel="deprecation"
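Attaching these headers and enforcing the minimum deprecation window can happen in one place. A minimal sketch, assuming a hypothetical `DeprecationPlan` record per deprecated endpoint; `Date.prototype.toUTCString()` already produces the IMF-fixdate form the Sunset header uses, weekday included.

```typescript
interface DeprecationPlan {
  announcedAt: Date; // when the deprecation was announced
  sunsetAt: Date; // planned removal
  major: boolean; // major changes need >= 12 months instead of 6
  migrationGuide: string; // URL placed in the Link header
}

// Calendar-month addition in UTC (day overflow rolls forward into the next month).
function addUtcMonths(d: Date, months: number): Date {
  return new Date(
    Date.UTC(
      d.getUTCFullYear(),
      d.getUTCMonth() + months,
      d.getUTCDate(),
      d.getUTCHours(),
      d.getUTCMinutes(),
      d.getUTCSeconds(),
    ),
  );
}

function deprecationHeaders(plan: DeprecationPlan): Record<string, string> {
  const minMonths = plan.major ? 12 : 6;
  if (plan.sunsetAt.getTime() < addUtcMonths(plan.announcedAt, minMonths).getTime()) {
    throw new Error(`sunset must be at least ${minMonths} months after the announcement`);
  }
  return {
    Sunset: plan.sunsetAt.toUTCString(), // e.g. "Thu, 31 Dec 2026 23:59:59 GMT"
    Deprecation: "true",
    Link: `<${plan.migrationGuide}>; rel="deprecation"`,
  };
}
```

Rejecting a too-early sunset at header-build time means a misconfigured plan fails in CI or at startup rather than silently shortening the window.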

4. JWT Claim Evolution

| Change | Allowed in v1? | Treatment |
|---|---|---|
| Add new optional claim | Yes | Non-breaking; consumers must ignore unknown claims |
| Add new required claim (consumers must read it) | No | New JWT version, negotiated via an iss suffix or the ver claim |
| Change semantics of an existing claim | No | Same as above |
| Remove a claim | No | Same as above |

A ver claim (currently 1) anticipates future versioning. Consumers MUST check ver and reject tokens whose ver they do not support.

Key rotation does not change the claim shape; only the kid header and the signature change.
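On the consumer side, the ver rule and the ignore-unknown-claims rule fit into a single check that runs after the signature has been verified against the JWKS key named by kid. A minimal sketch: the supported-version set and the choice of `sub` and `exp` as the required claims are assumptions for illustration, not the contract in API_CONTRACTS.

```typescript
// Versions this consumer understands; extended only when a new JWT version ships.
const SUPPORTED_TOKEN_VERSIONS: ReadonlySet<number> = new Set([1]);

interface AcceptedClaims {
  sub: string;
  exp: number;
  ver: number;
}

// Runs on an already signature-verified payload (kid lookup + Ed25519 check happen earlier).
function acceptClaims(payload: Record<string, unknown>, nowSeconds: number): AcceptedClaims {
  const ver = payload["ver"];
  // A missing or unknown ver is rejected rather than guessed at.
  if (typeof ver !== "number" || !SUPPORTED_TOKEN_VERSIONS.has(ver)) {
    throw new Error(`unsupported token version: ${String(ver)}`);
  }
  const sub = payload["sub"];
  const exp = payload["exp"];
  if (typeof sub !== "string" || typeof exp !== "number") {
    throw new Error("missing required claim");
  }
  if (exp <= nowSeconds) {
    throw new Error("token expired");
  }
  // Any other claims are ignored, so optional claims can be added within ver 1.
  return { sub, exp, ver };
}
```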

5. Event Versioning

Per platform standard (40-events):

| Action | Treatment |
|---|---|
| Add a field to the event payload | Non-breaking; the field becomes part of vN; consumers must tolerate unknown fields |
| Make an existing field required | Breaking → publish vN+1 alongside vN |
| Change semantics | Breaking → vN+1 |
| Remove a field | Breaking → vN+1 |

Lifecycle:

publish vN ─► publish vN+1 alongside vN ─► consumers migrate ─► stop publishing vN ─► retire topic vN (after retention)

Both versions are published for ≥ 30 days; longer if a major consumer hasn't migrated.

The schema-registry check in CI enforces BACKWARD compatibility.
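The overlap rule reduces to one question the publisher asks on every emit: is it still inside the dual-publish window? A minimal sketch, assuming a hypothetical `VersionWindow` record per event type and an injected `publish` function standing in for the Pub/Sub client; the `consumersMigrated` flag represents the manual "major consumers have migrated" decision.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_OVERLAP_DAYS = 30;

type Publish = (topic: string, payload: object) => void;

interface VersionWindow {
  baseTopic: string; // hypothetical, e.g. "iam.user.registered"
  oldVersion: number; // vN
  newVersion: number; // vN+1
  newSince: Date; // when vN+1 publishing started
  consumersMigrated: boolean; // set once all major consumers read vN+1
}

// vN may stop only when consumers have migrated AND the 30-day minimum has elapsed.
function mayStopOld(w: VersionWindow, now: Date): boolean {
  const overlapDays = (now.getTime() - w.newSince.getTime()) / DAY_MS;
  return w.consumersMigrated && overlapDays >= MIN_OVERLAP_DAYS;
}

// Publishes vN+1 always, and vN while the overlap window is still open.
// Returns the topics that were published to.
function publishVersioned<Old extends object, New extends object>(
  w: VersionWindow,
  now: Date,
  event: Old,
  toNew: (e: Old) => New,
  publish: Publish,
): string[] {
  const published: string[] = [];
  const newTopic = `${w.baseTopic}.v${w.newVersion}`;
  publish(newTopic, toNew(event));
  published.push(newTopic);
  if (!mayStopOld(w, now)) {
    const oldTopic = `${w.baseTopic}.v${w.oldVersion}`;
    publish(oldTopic, event);
    published.push(oldTopic);
  }
  return published;
}
```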

6. Password Hash Migration

Argon2id parameters are reviewed annually against the OWASP Password Storage Cheat Sheet. When the parameters are bumped:

| Step | Action |
|---|---|
| 1 | Bump PASSWORD_HASH_VERSION in config; the credentials.hash_version column already records the version each stored hash uses |
| 2 | New registrations use the new parameters |
| 3 | On every successful login, if hash_version < current → recompute the hash, store it, emit iam.password.rehashed.v1 |
| 4 | Background pwn_audit job optionally rehashes inactive accounts on a monthly cycle (lower priority) |
| 5 | After ≥ 12 months, decide whether to force a reset for residual stale-hashed accounts (rare) |

Algorithm changes (e.g. argon2id → next-gen) follow the same flow with a wider migration window.
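Step 3 is the core of the flow and can be sketched as a wrapper around a successful verification. A minimal sketch: `PasswordHasher` is a hypothetical abstraction over whatever argon2id binding the service uses (each version number maps to one parameter set), `save` and `emit` stand in for the credential repository and the event publisher, and the event payload fields are assumptions.

```typescript
interface StoredCredential {
  userId: string;
  hash: string;
  hashVersion: number; // mirrors credentials.hash_version
}

// Hypothetical wrapper over an argon2id library; `version` selects the parameter set.
interface PasswordHasher {
  hash(password: string, version: number): string;
  verify(password: string, storedHash: string, version: number): boolean;
}

type Save = (cred: StoredCredential) => void;
type Emit = (eventType: string, payload: Record<string, unknown>) => void;

// Returns true on successful login. A stale hash is upgraded only after the password
// has been verified, because that is the only moment the plaintext is available.
function loginWithRehash(
  cred: StoredCredential,
  password: string,
  currentVersion: number,
  hasher: PasswordHasher,
  save: Save,
  emit: Emit,
): boolean {
  if (!hasher.verify(password, cred.hash, cred.hashVersion)) {
    return false;
  }
  if (cred.hashVersion < currentVersion) {
    save({ ...cred, hash: hasher.hash(password, currentVersion), hashVersion: currentVersion });
    emit("iam.password.rehashed.v1", {
      userId: cred.userId,
      fromVersion: cred.hashVersion,
      toVersion: currentVersion,
    });
  }
  return true;
}
```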

7. MFA Factor Migration

| Change | Treatment |
|---|---|
| New factor type added | Backward-compatible; existing factors unaffected |
| TOTP algorithm change (e.g. SHA-1 → SHA-256) | New TOTP enrollments use the new algorithm; existing enrollments remain valid until the user re-enrolls; the UI nudges re-enrollment |
| WebAuthn algorithm bump | New credentials negotiate the algorithm at registration; existing credentials remain valid until they expire or are replaced |
| Recovery code regeneration | Always invalidates the previous bundle; the user must download the new one |

Factor data is never silently mutated — a user must take action to upgrade an existing factor.

8. IdP Configuration Migration

When a tenant rotates its SSO IdP:

| Step | Action |
|---|---|
| 1 | Tenant adds the new IdP via the admin UI; old and new are both active |
| 2 | iam supports multiple IdPs per tenant (see DATA_MODEL) |
| 3 | Tenant communications tell users to log in via the new IdP |
| 4 | After the migration window, the tenant removes the old IdP; iam refuses new logins via it but retains the ExternalIdentity linkage for audit |
| 5 | Optional: re-link existing ExternalIdentity rows from the old provider to the new one (admin tool; requires user-side reconfirmation) |

9. Offline Bundle / Device-Key Migration

| Change | Treatment |
|---|---|
| Bundle format bump | New bundle_version field; client tolerates missing fields; server can serve both formats for 30 d |
| Device key algorithm bump | New device bindings use the new algorithm; existing certs remain valid until expiry; renewal upgrades them |
| Tenant CA rotation | Always overlapped (≥ 7 d); new certs are signed by the new CA; existing certs remain valid until natural expiry; both CA chains are in the client trust store during the overlap |
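The CA rotation row implies a concrete retirement date for the old CA: it has to stay in the client trust store for at least the 7-day overlap and until the last certificate it signed has expired, otherwise existing certs could not remain valid until natural expiry. A minimal sketch of that calculation and the matching trust check; the `DeviceCert` and `TrustedCa` shapes are hypothetical, and chain validation is reduced to an issuer-id lookup.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface DeviceCert {
  serial: string;
  issuerCaId: string;
  notAfter: Date;
}

interface TrustedCa {
  id: string;
  trustedUntil: Date; // when clients drop this CA from their trust store
}

// Earliest safe removal date for the old CA: the later of (new CA activation + overlap)
// and the expiry of the last device cert the old CA signed.
function oldCaRetirementDate(newCaActiveFrom: Date, oldCaCerts: DeviceCert[], overlapDays = 7): Date {
  let latest = newCaActiveFrom.getTime() + overlapDays * DAY_MS;
  for (const cert of oldCaCerts) {
    latest = Math.max(latest, cert.notAfter.getTime());
  }
  return new Date(latest);
}

// Simplified trust decision: cert unexpired and issued by a CA still in the trust store.
// Real validation also verifies the signature chain, which is out of scope here.
function isCertTrusted(cert: DeviceCert, trustStore: TrustedCa[], now: Date): boolean {
  if (now.getTime() > cert.notAfter.getTime()) return false;
  const ca = trustStore.find((c) => c.id === cert.issuerCaId);
  return ca !== undefined && now.getTime() <= ca.trustedUntil.getTime();
}
```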

10. Tenant Onboarding (Greenfield — M0)

Default path. No legacy data.

| Step | Action |
|---|---|
| 1 | tenant-service emits melmastoon.tenant.created.v1 |
| 2 | iam consumes it; provisions the super-tenant-admin user with an email-verification flow |
| 3 | Tenant admin completes verification; sets password + MFA |
| 4 | Tenant admin invites staff via tenant-service (which calls iam to create user records) |
| 5 | Optional: tenant admin configures SSO; iam stores the IdP config |
| 6 | Optional: tenant admin downloads + signs the CA bootstrap for offline desktops |

Time-to-first-user: < 5 min.

11. Tenant Migration from Existing PMS (M2+)

When a hotel switches to Melmastoon from an existing PMS:

11.1 Inputs

| Source | Format |
|---|---|
| User export | CSV with columns: external_id, email, display_name, role, mfa_enabled, last_login_at, created_at |
| Optional: hashed passwords | Only if exported from a compatible argon2id deployment; otherwise users must reset on first login (recommended path) |
| Optional: SSO claim mappings | JSON document |

11.2 Importer flow (scripts/iam-import.ts)

CSV ──► validation ──► dry-run report ──► import (batched)

For each row:

  • Create User (status=invited)
  • Map external role → Melmastoon role (passed to tenant-service event)
  • Insert ExternalIdentity row (provider="legacy:<pms>", subject=external_id)
  • If a compatible hashed password is supplied: insert Credential with hash_version=2 (forces rehash on first login)
  • Otherwise: leave Credential absent; the user receives a magic-link invite
  • Emit `iam.user.registered.v1` (with import_batch_id for traceability)

11.3 First-login UX

| Path A (no password supplied) | Path B (password supplied) |
|---|---|
| User receives an invitation email with a magic link | User logs in with the same password |
| Sets password + MFA via the standard flow | iam transparently rehashes to current params |
| Linked to the existing ExternalIdentity for audit | If breach-list match → forced reset before continuing |

11.4 Cutover

| Step | Action |
|---|---|
| 1 | Import dry-run; review counts + errors |
| 2 | Schedule cutover window |
| 3 | Old PMS frozen (read-only) |
| 4 | Run import (resumable, batched, 100 users/batch) |
| 5 | Send invitations |
| 6 | T+7d: track adoption metric (users who completed first login); send reminders to laggards |
| 7 | T+30d: deactivate any invitee who never logged in (tenant admin decision) |

Idempotent: re-running the import skips users with a matching ExternalIdentity{provider, subject}.
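The per-row rules, the 100-row batches and the ExternalIdentity idempotency key fit together roughly as follows. This is a minimal in-memory sketch, not scripts/iam-import.ts itself: the `ImportState` maps stand in for the database, `newUserId` and `mapRole` are hypothetical injected helpers, the import_batch_id format is an assumption, and supplied hashes are assumed to have passed the compatibility check during validation.

```typescript
interface ImportRow {
  external_id: string;
  email: string;
  display_name: string;
  role: string;
  mfa_enabled: boolean;
  last_login_at: string | null;
  created_at: string;
  password_hash?: string; // only from a compatible argon2id deployment
}

interface ImportState {
  users: Map<string, { email: string; displayName: string; status: "invited" }>; // userId -> user
  externalIdentities: Map<string, string>; // `${provider}|${subject}` -> userId
  credentials: Map<string, { hash: string; hashVersion: number }>; // userId -> credential
  magicLinkInvites: string[]; // userIds that need a magic-link invite
  // Mapped role rides along with the event record here; how it reaches tenant-service is an assumption.
  events: { type: string; userId: string; role: string; importBatchId: string }[];
}

const BATCH_SIZE = 100;
const IMPORTED_HASH_VERSION = 2; // per the plan, hash_version=2 forces a rehash on first login

function importUsers(
  rows: ImportRow[],
  pms: string,
  runId: string,
  state: ImportState,
  newUserId: () => string,
  mapRole: (externalRole: string) => string,
): { created: number; skipped: number; batches: number } {
  const provider = `legacy:${pms}`;
  let created = 0;
  let skipped = 0;
  let batches = 0;
  for (let start = 0; start < rows.length; start += BATCH_SIZE) {
    const importBatchId = `${runId}-${batches}`;
    batches += 1;
    for (const row of rows.slice(start, start + BATCH_SIZE)) {
      const key = `${provider}|${row.external_id}`;
      // Idempotency: a matching ExternalIdentity{provider, subject} means already imported.
      if (state.externalIdentities.has(key)) {
        skipped += 1;
        continue;
      }
      const userId = newUserId();
      state.users.set(userId, { email: row.email, displayName: row.display_name, status: "invited" });
      state.externalIdentities.set(key, userId);
      if (row.password_hash !== undefined) {
        state.credentials.set(userId, { hash: row.password_hash, hashVersion: IMPORTED_HASH_VERSION });
      } else {
        state.magicLinkInvites.push(userId);
      }
      state.events.push({ type: "iam.user.registered.v1", userId, role: mapRole(row.role), importBatchId });
      created += 1;
    }
  }
  return { created, skipped, batches };
}
```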

12. Multi-Region Migration (M2)

Per DEPLOYMENT_TOPOLOGY §5.

| Step | Action |
|---|---|
| 1 | Stand up second region (Cloud SQL primary, Memorystore, KMS, Cloud Run) |
| 2 | Configure logical replication (DB layer); tenant-router routes writes per residency |
| 3 | KMS keyring replicated; same kid aliases |
| 4 | Pub/Sub topics + subscriptions replicated; consumers dedup on eventId |
| 5 | DNS geo-routes new tenants to the nearest region |
| 6 | Existing tenants stay pinned unless a residency change is requested |
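With topics and subscriptions replicated across regions, a consumer can receive the same event more than once, so step 4 relies on remembering recently seen eventIds. A minimal in-process sketch with a TTL-bounded map; the TTL value is an assumption, and a consumer running as several instances would need a shared store (for example the Memorystore instance from step 1) rather than per-process memory.

```typescript
// Remembers eventIds for ttlMs and reports whether an event is new.
class EventDeduper {
  private readonly ttlMs: number;
  private readonly seen = new Map<string, number>(); // eventId -> first-seen time (ms)

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true exactly once per eventId within the TTL window.
  shouldProcess(eventId: string, nowMs: number): boolean {
    this.evictExpired(nowMs);
    if (this.seen.has(eventId)) {
      return false;
    }
    this.seen.set(eventId, nowMs);
    return true;
  }

  private evictExpired(nowMs: number): void {
    // Deleting the current entry while iterating a Map is safe in JavaScript.
    for (const [id, firstSeen] of this.seen) {
      if (nowMs - firstSeen > this.ttlMs) {
        this.seen.delete(id);
      }
    }
  }
}
```

This sketch marks an id as seen before the handler runs, so a crash mid-handler would cause the redelivered copy to be skipped; recording the id only after the handler commits avoids that, at the cost of a small window in which duplicates can slip through.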

Tenant residency change procedure (rare): create export → quiesce writes for tenant → import to target region → switch routing → verify → un-quiesce. Documented in runbooks/iam/tenant-region-move.md.

13. Crypto Algorithm Migration (Future)

If Ed25519 is ever superseded:

| Step | Action |
|---|---|
| 1 | Generate a new keypair in KMS for the new algorithm |
| 2 | Publish it in JWKS alongside the existing key |
| 3 | New JWTs signed with the new algorithm; new kid |
| 4 | Consumers must support both algorithms during the transition |
| 5 | After ≥ 30 d, stop signing with the old algorithm |
| 6 | Old kid retained in JWKS for token lifetime + audit |
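During step 4 a consumer's verifier has to take the algorithm from the JWKS entry that kid names, not from whatever the token header asserts, so a token cannot steer itself onto a different verification path. A minimal sketch: `JwkEntry` is reduced to kid + alg, "EdDSA" is the registered JWS algorithm name for Ed25519 signatures, and "X-NEXT" and the injected `VerifyFn` implementations are placeholders.

```typescript
interface JwkEntry {
  kid: string;
  alg: string; // e.g. "EdDSA" today; the successor algorithm name is a placeholder
}

type VerifyFn = (signingInput: string, signature: string, kid: string) => boolean;

// Picks the verifier for a token: the JWKS entry named by kid decides the algorithm,
// and the header's alg must agree with it (guards against algorithm confusion).
function selectVerifier(
  header: { kid?: unknown; alg?: unknown },
  jwks: JwkEntry[],
  verifiers: Record<string, VerifyFn>,
): VerifyFn {
  if (typeof header.kid !== "string") throw new Error("missing kid");
  const kid = header.kid;
  const entry = jwks.find((k) => k.kid === kid);
  if (entry === undefined) throw new Error(`unknown kid: ${kid}`);
  if (header.alg !== entry.alg) throw new Error("alg does not match JWKS entry");
  const verify = verifiers[entry.alg];
  if (verify === undefined) throw new Error(`unsupported alg: ${entry.alg}`);
  return verify;
}
```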

Same pattern for tenant CA; existing device certs remain valid until natural expiry.

14. Breaking-Change Changelog

Maintained in docs/migrations/iam-changelog.md. Each entry records: date, what changed, why, migration window, removal date, runbook link.

| Date | Change | Status |
|---|---|---|
| (pending M0 launch; no entries) | | |

15. Rollback Strategy

Every migration step has an explicit rollback:

| Phase | Rollback |
|---|---|
| Schema (add column / table) | DROP IF EXISTS; safe because nothing reads the new column in phase 1 |
| Schema (backfill) | Stop the job; partially NULL rows are benign |
| Schema (read switch) | Revert the deploy; reads still tolerate absence |
| Schema (constraint tighten) | DROP CONSTRAINT |
| Schema (drop legacy) | Restore from PITR (last resort) |
| API (new version) | Keep the old version live; revert traffic |
| Event (new version) | Keep the old version publishing; revert consumer config |
| Crypto (param bump) | Old hash versions still verify; the lower-priority rehash continues from the old version |

CI runs forward + rollback migrations in test before merge; rollback failures block the merge.

16. Observability of Migrations

Every migration emits:

  • iam_migration_step_total{name, phase, result} counter
  • iam_migration_backfill_progress{name} gauge (0–100 %)
  • Audit event iam.migration.executed{name, phase, sha} for traceability

Long-running backfills publish a progress record every 60 s; on-call dashboard shows in-flight migrations.

17. Cross-References