MIGRATION_PLAN — analytics-service
Sibling: DATA_MODEL · EVENT_SCHEMAS · DEPLOYMENT_TOPOLOGY
analytics-service is green-field for Ghasi Melmastoon. This plan covers initial rollout, schema evolution, event evolution, BigQuery layer migrations, coexistence with the legacy Ghasi-edTech analytics, and removal/deprecation playbooks.
1. Initial rollout
1.1 Phasing
| Phase | Scope | Region | Tenants |
|---|---|---|---|
| α | Internal demo tenant | europe-west3 | tnt_internal_demo |
| β | Two pilot tenants (one EU, one MENA) | europe-west3 | 2 |
| GA | All tenants in residency-mapped regions | EU → MENA → APAC | progressive |
Promotion gates between phases require: SLO budgets healthy for 30 d, ≤ 5 critical bugs, SERVICE_READINESS checklist 100 %.
1.2 Feature flags
Flags managed in feature-flags-service, scoped by tenant_id and propertyId:
| Flag | Default | Purpose |
|---|---|---|
analytics.queries.enabled | off | Master switch for ad-hoc queries |
analytics.dashboards.enabled | off | Backoffice dashboards visible |
analytics.looker_studio.enabled | off | Looker Studio embedding |
analytics.ai.metric_explainer.enabled | off | AI metric explanation surface |
analytics.ai.dashboard_assistant.enabled | off | AI-drafted widgets |
analytics.forecast.writeback.enabled | off | Accept forecast events from orchestrator |
Flags are reversible without deploy and observed in OBSERVABILITY dashboards.
2. Coexistence with legacy Ghasi-edTech analytics
The legacy service in Ghasi-edTech predates this platform. Coexistence rules during cutover:
- Data is not copied from the legacy warehouse. Melmastoon analytics is event-sourced from new platform events from day one.
- Per-tenant routing is decided by tenant feature flag
platform=melmastoon. Until set, dashboards continue to read from the legacy stack. - Cross-platform metric definitions are kept name-stable (
occupancy_pct,adr,revpar, etc.) so reports and integrations remain compatible. - A reconciliation dashboard shows top metrics from both stacks side-by-side during the first 30 d per tenant; deltas > 1 % trigger investigation.
- Once a tenant is fully cutover, the legacy projection is frozen for 90 d (read-only), then archived.
3. Schema evolution policy (Postgres metadata)
- Forward-only migrations in
db/migrations/(timestamp-prefixed) with optional reversible companion. - Two-phase for renames or destructive type changes:
- Add new column / table; dual-write from application code.
- Backfill; flip readers; verify; drop old in next release.
- Index changes use
CREATE INDEX CONCURRENTLY; never block writes. - RLS policies on every new table — CI gate enforces presence.
- All migrations applied via Cloud Run one-shot job before traffic shift.
4. Event evolution policy
- Versioning. Subject suffix
.v<n>; never mutate published versions. - Adding optional fields: allowed in same major version with defaults at consumer.
- Removing/renaming fields, type changes: require new
.v(n+1)topic; producer dual-publishes during coexistence (≥ 30 d). - Subscriber migration: consumers add v(n+1) handler first; switch routing; deprecate v(n) handler; document in CHANGELOG.
- Sunset criteria: v(n) topic retired only after 30 d of zero traffic + 1 release cycle pass.
@melmastoon/event-contracts is the single source of truth; CI fails on payload drift.
5. BigQuery curated layer migrations
| Change kind | Allowed? | Procedure |
|---|---|---|
| Add column | Yes | ALTER TABLE ADD COLUMN; bump _schema_version; consumers tolerate by default |
| Add partition / cluster | Yes | New table → backfill → swap label current_v1 |
| Rename column / type change | Two-phase | Add new column; dual-write in ETL; flip metric SQL via published version; drop old column next release |
| New table | Yes | Terraform-managed; _schema_version=1 |
| Drop table | Two-phase | Mark archived in projection registry; freeze 90 d; remove |
_schema_version is a label on the table and a column on each row when it carries semantic meaning. Authorized views always select the published version.
6. Backfill scenarios
| Scenario | Approach |
|---|---|
| New tenant onboarding | No historical data; baseline starts from first event |
| New metric definition | Backfill via ETL job over events_raw.* from earliest available date (subject to PII retention) |
| New curated table | Backfill chunked by month; throttled to ≤ 5 % daily byte budget per tenant |
| Schema repair after a bug | Targeted reprocess of specific partition range; emit etl.backfill.completed.v1 (informational) |
Backfill jobs run as ETL job kind BACKFILL (DATA_MODEL §4); cost-attributed and rate-limited.
7. Region failover & residency moves
- Region outage within residency. Cloud SQL HA replica + BigQuery time-travel. Cloud Run revisions re-spin in alternate zone within region. RTO 30 min.
- Tenant residency change. Manual procedure orchestrated with
tenant-service:- Freeze tenant writes in source region (stop accepting events).
- Run cascade purge in source region (
tenant.deleted.v1). - Reactivate tenant in target region; events flow to new region's pipeline.
- Historical analytics for tenant does not carry over (residency-mandated); legal exception requires DPA + tenant.owner approval.
Document each residency move in the data-residency-moves.log with timestamps and operators.
8. Removal / deprecation playbook
- Announce deprecation in
#analyticsand via tenant comm if tenant-visible. - Mark resource as
archived(metric, projection, dashboard, widget). - Set sunset date ≥ 30 d from announcement.
- Block new references; existing consumers receive deprecation header (
Deprecation: true,Sunset: <date>). - On sunset: stop publishing related events; freeze authorized view; export final snapshot for legal/audit retention.
- Remove definitions from registry; Terraform plan reflects removal.
9. Open migration items
| Item | Owner | Due |
|---|---|---|
Reconciliation dashboard against legacy Ghasi-edTech for top 5 metrics | Analytics squad | β phase |
| Per-tenant residency move runbook hardening | Compliance + SRE | GA |
| Forecast model v2 coexistence (writeback v1 vs v2) | AI squad | post-GA |
Looker Studio embedded paywall integration with billing-service | Analytics + billing | post-GA |
| Composer → Workflows-only migration evaluation | SRE | post-GA |
10. Versioning & release cadence
- API versioned via URI (
/api/v1/analytics/*); breaking changes ship/v2after announcement and 90-day overlap. - Event versions tracked in
@melmastoon/event-contractsreleases. - Curated table versions tracked in
infra/bigquery/CHANGELOG.md. - Service follows trunk-based develop with weekly release train; hotfixes off any tag.
Cross-references: DATA_MODEL §4 backfill, EVENT_SCHEMAS §6 versioning, SERVICE_READINESS.