Skip to main content

MIGRATION_PLAN — analytics-service

Sibling: DATA_MODEL · EVENT_SCHEMAS · DEPLOYMENT_TOPOLOGY

analytics-service is green-field for Ghasi Melmastoon. This plan covers initial rollout, schema evolution, event evolution, BigQuery layer migrations, coexistence with the legacy Ghasi-edTech analytics, and removal/deprecation playbooks.


1. Initial rollout

1.1 Phasing

PhaseScopeRegionTenants
αInternal demo tenanteurope-west3tnt_internal_demo
βTwo pilot tenants (one EU, one MENA)europe-west32
GAAll tenants in residency-mapped regionsEU → MENA → APACprogressive

Promotion gates between phases require: SLO budgets healthy for 30 d, ≤ 5 critical bugs, SERVICE_READINESS checklist 100 %.

1.2 Feature flags

Flags managed in feature-flags-service, scoped by tenant_id and propertyId:

FlagDefaultPurpose
analytics.queries.enabledoffMaster switch for ad-hoc queries
analytics.dashboards.enabledoffBackoffice dashboards visible
analytics.looker_studio.enabledoffLooker Studio embedding
analytics.ai.metric_explainer.enabledoffAI metric explanation surface
analytics.ai.dashboard_assistant.enabledoffAI-drafted widgets
analytics.forecast.writeback.enabledoffAccept forecast events from orchestrator

Flags are reversible without deploy and observed in OBSERVABILITY dashboards.


2. Coexistence with legacy Ghasi-edTech analytics

The legacy service in Ghasi-edTech predates this platform. Coexistence rules during cutover:

  • Data is not copied from the legacy warehouse. Melmastoon analytics is event-sourced from new platform events from day one.
  • Per-tenant routing is decided by tenant feature flag platform=melmastoon. Until set, dashboards continue to read from the legacy stack.
  • Cross-platform metric definitions are kept name-stable (occupancy_pct, adr, revpar, etc.) so reports and integrations remain compatible.
  • A reconciliation dashboard shows top metrics from both stacks side-by-side during the first 30 d per tenant; deltas > 1 % trigger investigation.
  • Once a tenant is fully cutover, the legacy projection is frozen for 90 d (read-only), then archived.

3. Schema evolution policy (Postgres metadata)

  • Forward-only migrations in db/migrations/ (timestamp-prefixed) with optional reversible companion.
  • Two-phase for renames or destructive type changes:
    1. Add new column / table; dual-write from application code.
    2. Backfill; flip readers; verify; drop old in next release.
  • Index changes use CREATE INDEX CONCURRENTLY; never block writes.
  • RLS policies on every new table — CI gate enforces presence.
  • All migrations applied via Cloud Run one-shot job before traffic shift.

4. Event evolution policy

  • Versioning. Subject suffix .v<n>; never mutate published versions.
  • Adding optional fields: allowed in same major version with defaults at consumer.
  • Removing/renaming fields, type changes: require new .v(n+1) topic; producer dual-publishes during coexistence (≥ 30 d).
  • Subscriber migration: consumers add v(n+1) handler first; switch routing; deprecate v(n) handler; document in CHANGELOG.
  • Sunset criteria: v(n) topic retired only after 30 d of zero traffic + 1 release cycle pass.

@melmastoon/event-contracts is the single source of truth; CI fails on payload drift.


5. BigQuery curated layer migrations

Change kindAllowed?Procedure
Add columnYesALTER TABLE ADD COLUMN; bump _schema_version; consumers tolerate by default
Add partition / clusterYesNew table → backfill → swap label current_v1
Rename column / type changeTwo-phaseAdd new column; dual-write in ETL; flip metric SQL via published version; drop old column next release
New tableYesTerraform-managed; _schema_version=1
Drop tableTwo-phaseMark archived in projection registry; freeze 90 d; remove

_schema_version is a label on the table and a column on each row when it carries semantic meaning. Authorized views always select the published version.


6. Backfill scenarios

ScenarioApproach
New tenant onboardingNo historical data; baseline starts from first event
New metric definitionBackfill via ETL job over events_raw.* from earliest available date (subject to PII retention)
New curated tableBackfill chunked by month; throttled to ≤ 5 % daily byte budget per tenant
Schema repair after a bugTargeted reprocess of specific partition range; emit etl.backfill.completed.v1 (informational)

Backfill jobs run as ETL job kind BACKFILL (DATA_MODEL §4); cost-attributed and rate-limited.


7. Region failover & residency moves

  • Region outage within residency. Cloud SQL HA replica + BigQuery time-travel. Cloud Run revisions re-spin in alternate zone within region. RTO 30 min.
  • Tenant residency change. Manual procedure orchestrated with tenant-service:
    1. Freeze tenant writes in source region (stop accepting events).
    2. Run cascade purge in source region (tenant.deleted.v1).
    3. Reactivate tenant in target region; events flow to new region's pipeline.
    4. Historical analytics for tenant does not carry over (residency-mandated); legal exception requires DPA + tenant.owner approval.

Document each residency move in the data-residency-moves.log with timestamps and operators.


8. Removal / deprecation playbook

  1. Announce deprecation in #analytics and via tenant comm if tenant-visible.
  2. Mark resource as archived (metric, projection, dashboard, widget).
  3. Set sunset date ≥ 30 d from announcement.
  4. Block new references; existing consumers receive deprecation header (Deprecation: true, Sunset: <date>).
  5. On sunset: stop publishing related events; freeze authorized view; export final snapshot for legal/audit retention.
  6. Remove definitions from registry; Terraform plan reflects removal.

9. Open migration items

ItemOwnerDue
Reconciliation dashboard against legacy Ghasi-edTech for top 5 metricsAnalytics squadβ phase
Per-tenant residency move runbook hardeningCompliance + SREGA
Forecast model v2 coexistence (writeback v1 vs v2)AI squadpost-GA
Looker Studio embedded paywall integration with billing-serviceAnalytics + billingpost-GA
Composer → Workflows-only migration evaluationSREpost-GA

10. Versioning & release cadence

  • API versioned via URI (/api/v1/analytics/*); breaking changes ship /v2 after announcement and 90-day overlap.
  • Event versions tracked in @melmastoon/event-contracts releases.
  • Curated table versions tracked in infra/bigquery/CHANGELOG.md.
  • Service follows trunk-based develop with weekly release train; hotfixes off any tag.

Cross-references: DATA_MODEL §4 backfill, EVENT_SCHEMAS §6 versioning, SERVICE_READINESS.