SERVICE_OVERVIEW — analytics-service

Bundle index: SERVICE_OVERVIEW · DOMAIN_MODEL · APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL · SYNC_CONTRACT · AI_INTEGRATION · SECURITY_MODEL · OBSERVABILITY · TESTING_STRATEGY · DEPLOYMENT_TOPOLOGY · FAILURE_MODES · LOCAL_DEV_SETUP · SERVICE_READINESS · SERVICE_RISK_REGISTER · MIGRATION_PLAN

Strategic anchors: 02 Enterprise Architecture · 04 Event-Driven Architecture · 05 API Design · 06 Data Models · 07 Security/Compliance/Tenancy · 08 AI Architecture


1. Purpose

analytics-service is the read-side analytics platform of Ghasi Melmastoon. It turns the firehose of platform domain events into:

  • Trustworthy facts in BigQuery curated tables (fact_*, dim_*).
  • Defined metrics with frozen SQL semantics (occupancy %, ADR, RevPAR, ALOS, cancellation rate, no-show rate, channel mix, AI-suggestion-acceptance rate).
  • Composable dashboards (KPI tiles, time-series, breakdown, funnel, heatmap) for hotel staff, managers, and tenant admins.
  • Aggregated signals that ai-orchestrator-service ingests (occupancy curves, lead-time distributions, day-of-week and seasonality features) and writes back as forecasts.

It does not render UI; the BFF (bff-backoffice-service) and Looker Studio do. It does not generate documents; reporting-service does. It does not run inference; ai-orchestrator-service does.
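
The headline metrics listed above have widely used hospitality definitions, and a small worked example makes their relationships concrete. The sketch below assumes those standard definitions (occupancy = room nights sold / room nights available, ADR = room revenue / room nights sold, RevPAR = room revenue / room nights available, ALOS = room nights sold / stays). The service's frozen SQL may differ in detail, and every type and field name here is hypothetical.

```typescript
// Totals for one property over one reporting window.
// Field names are illustrative, not the service's actual schema.
interface WindowTotals {
  roomNightsAvailable: number; // rooms available x nights in the window
  roomNightsSold: number;      // occupied room nights
  roomRevenue: number;         // room revenue in a single currency
  stays: number;               // stays counted in the window
}

interface HotelKpis {
  occupancyPct: number;
  adr: number;
  revPar: number;
  alos: number;
}

// Guarded division: returns 0 instead of NaN or Infinity for an empty denominator.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function computeKpis(t: WindowTotals): HotelKpis {
  return {
    occupancyPct: 100 * ratio(t.roomNightsSold, t.roomNightsAvailable),
    adr: ratio(t.roomRevenue, t.roomNightsSold),
    revPar: ratio(t.roomRevenue, t.roomNightsAvailable),
    alos: ratio(t.roomNightsSold, t.stays),
  };
}

const kpis = computeKpis({
  roomNightsAvailable: 1000,
  roomNightsSold: 750,
  roomRevenue: 60000,
  stays: 300,
});
console.log(kpis); // occupancyPct 75, adr 80, revPar 60, alos 2.5
```

Under these definitions RevPAR equals ADR multiplied by the occupancy fraction (80 × 0.75 = 60), which is a handy sanity check when validating a metric definition.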


2. Bounded context

Analytics — Supporting bounded context in the platform context map (02 §3).

This context is public-facing because tenant admins author dashboards and view widgets, and because Looker Studio is a paid power-user surface. Its other primary consumers are internal services.


3. Aggregates owned

| Aggregate | One-line description | Storage |
|---|---|---|
| AnalyticsEvent | Raw event landing in BigQuery (immutable) | BigQuery events_raw.* |
| Projection | Curated table definition (target schema, source query, refresh policy, freshness SLO, version) | Cloud SQL analytics.projections |
| MetricDefinition | Frozen-SQL metric with dimensions, units, allowed filters | Cloud SQL analytics.metric_definitions |
| Dashboard | Tenant-admin-authored composition of widgets | Cloud SQL analytics.dashboards |
| Widget | One visual element bound to a metric or query | Cloud SQL analytics.widgets |
| Query | Saved curated query (named, parameterized) | Cloud SQL analytics.queries |
| ETLJob | Scheduled or on-demand refresh of one or more projections | Cloud SQL analytics.etl_jobs |
| DataQualityCheck | Definition + most recent result for one quality assertion | Cloud SQL analytics.dq_checks, history in BigQuery dq_results |

Detailed structure & invariants in DOMAIN_MODEL.
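
To illustrate what "dimensions, units, allowed filters" can mean for a MetricDefinition, here is a minimal sketch of a definition shape and a request check against it. The field names, the unit values, and the validateRequest helper are assumptions made for illustration; the authoritative structure is the one in DOMAIN_MODEL.

```typescript
// Hypothetical shape of a MetricDefinition; field names are assumptions.
interface MetricDefinition {
  key: string;
  version: number;
  unit: "percent" | "currency" | "nights" | "ratio";
  dimensions: string[];     // dimensions a caller may group by
  allowedFilters: string[]; // filter names a caller may supply
  sql: string;              // frozen SQL text for this version
}

// Hypothetical request against one metric version.
interface MetricRequest {
  metricKey: string;
  version: number;
  groupBy: string[];
  filters: { [filterName: string]: string };
}

// Returns a list of violations; an empty list means the request is acceptable.
function validateRequest(def: MetricDefinition, req: MetricRequest): string[] {
  const errors: string[] = [];
  if (req.metricKey !== def.key || req.version !== def.version) {
    errors.push("metric or version mismatch: " + req.metricKey + "@v" + req.version);
  }
  for (const dim of req.groupBy) {
    if (def.dimensions.indexOf(dim) === -1) errors.push("dimension not allowed: " + dim);
  }
  for (const filterName of Object.keys(req.filters)) {
    if (def.allowedFilters.indexOf(filterName) === -1) {
      errors.push("filter not allowed: " + filterName);
    }
  }
  return errors;
}

const occupancy: MetricDefinition = {
  key: "occupancy_pct",
  version: 1,
  unit: "percent",
  dimensions: ["property_id", "day"],
  allowedFilters: ["property_id", "channel"],
  sql: "-- frozen SQL omitted in this sketch",
};

console.log(
  validateRequest(occupancy, {
    metricKey: "occupancy_pct",
    version: 1,
    groupBy: ["day", "room_type"],
    filters: { channel: "direct" },
  }),
); // [ 'dimension not allowed: room_type' ]
```

Rejecting unknown dimensions and filters before any SQL runs keeps the frozen query text as the only thing that ever reaches BigQuery.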


4. Responsibilities

  1. Event landing. Pub/Sub-to-BigQuery managed subscriptions land every melmastoon.* event into events_raw.<topic_unsuffixed> with envelope + payload columns.
  2. Curated layer ETL. Scheduled jobs (Cloud Workflows + Cloud Run Jobs) MERGE incremental rows from raw into curated tables, which are clustered and partitioned per DATA_MODEL §3.
  3. Metric definitions registry. Versioned, frozen-SQL metrics with explicit dimension/grain.
  4. Query API. Authenticated REST API to run pre-defined queries and read widget data, with byte caps and slot routing.
  5. Dashboard authoring. CRUD on dashboards & widgets; per-tenant scope; sharing via Looker Studio embed tokens.
  6. Data quality. Row-count drift, freshness lag, null rate, distinct-count, business-rule checks; results published as events.
  7. AI pipeline. Publish metric.computed.v1 for occupancy/forecasting features; consume ai.forecast.produced.v1 to write fact_demand_forecast.
  8. Tenant isolation. Per-tenant authorized views; the managed service account never executes raw SQL on behalf of a tenant unless the query is bound to that tenant's authorized view.
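
Three of the data quality checks in item 6 (row-count drift, freshness lag, null rate) reduce to a ratio compared against a threshold. The sketch below shows one possible evaluation of each; the result shape, the function names, and the thresholds in the usage lines are assumptions rather than the service's actual check definitions.

```typescript
type DqStatus = "pass" | "fail";

// Hypothetical result shape for one evaluated check.
interface DqResult {
  check: string;
  status: DqStatus;
  observed: number;
  threshold: number;
}

function result(check: string, observed: number, threshold: number): DqResult {
  return { check, status: observed <= threshold ? "pass" : "fail", observed, threshold };
}

// Row-count drift: relative change of the current count versus a baseline count.
function rowCountDrift(current: number, baseline: number, maxRelativeDrift: number): DqResult {
  const observed =
    baseline === 0 ? (current === 0 ? 0 : Infinity) : Math.abs(current - baseline) / baseline;
  return result("row_count_drift", observed, maxRelativeDrift);
}

// Freshness lag: seconds between the newest landed event and "now".
function freshnessLag(latestEventMs: number, nowMs: number, maxLagSeconds: number): DqResult {
  const observed = Math.max(0, (nowMs - latestEventMs) / 1000);
  return result("freshness_lag_seconds", observed, maxLagSeconds);
}

// Null rate: fraction of null or undefined values in a sampled column.
function nullRate(values: unknown[], maxNullRate: number): DqResult {
  const nulls = values.filter((v) => v === null || v === undefined).length;
  const observed = values.length === 0 ? 0 : nulls / values.length;
  return result("null_rate", observed, maxNullRate);
}

console.log(rowCountDrift(1100, 1000, 0.2).status);           // pass (drift 0.1)
console.log(freshnessLag(0, 600000, 300).status);             // fail (lag 600 s)
console.log(nullRate([1, null, 3, undefined], 0.25).status);  // fail (rate 0.5)
```

In the service these results would be published as events, per item 6; the publishing step is left out here because its event schema is defined elsewhere.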

5. Context map

| Direction | Counterpart | Relationship |
|---|---|---|
| Upstream consumer | every emitting service | Pub/Sub conformist |
| Downstream | reporting-service | Reads curated tables; we expose stable schemas with version pins |
| Downstream | bff-backoffice-service | Customer/Supplier; we are supplier of widget data |
| Downstream | ai-orchestrator-service | Customer/Supplier (bidirectional): we publish signals, consume forecasts |
| Downstream | Looker Studio | Open Host Service via authorized views |
| Peer | tenant-service | Conformist for residency, deletion, region change |
| Peer | audit-service | Anti-corruption layer for DQ alert audit trail |
| Peer | iam-service | Conformist for permissions and JWT |

6. End-to-end pipeline (sketch)

[Pub/Sub topic melmastoon.*]
│
▼
[Pub/Sub-to-BigQuery subscription]
│ (raw envelope + payload JSON)
▼
events_raw.<topic> (partition by ingestion_ts, cluster by tenant_id)
│
│ scheduled (5 min hot, 15 min cold)
▼
[ETLJob: merge_curated_<domain>]
│ reads: events_raw.<…>, dim_*, prior fact_*
│ writes: fact_<domain> (incremental MERGE)
▼
[fact_reservation, fact_payment, fact_housekeeping_task, fact_lock_action, …]
│
├──▶ Query API ──▶ Widget data ──▶ Electron desktop dashboard
├──▶ Authorized View ──▶ Looker Studio
├──▶ reporting-service (read-only)
└──▶ MetricDefinition.compute() ──▶ metric.computed.v1 ──▶ ai-orchestrator


ai.forecast.produced.v1 → fact_demand_forecast
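
The ETLJob step in the pipeline above can be pictured as a generated BigQuery MERGE statement over one ingestion window. The following is a minimal sketch under several assumptions: the MergeSpec shape, the dataset name "curated", the row_key column, and the flat source columns are all hypothetical, and a real source query would extract fields from the payload JSON and return at most one row per key.

```typescript
// Hypothetical description of one curated MERGE; all names are illustrative.
interface MergeSpec {
  targetTable: string; // e.g. "curated.fact_reservation" (dataset name assumed)
  sourceQuery: string; // deterministic SELECT over events_raw, one row per key
  keyColumn: string;   // stable hash key column
  columns: string[];   // non-key columns to upsert
}

// Builds the MERGE text; window bounds stay as query parameters
// (@window_start, @window_end) so the same text can be re-run per window.
function buildMergeSql(spec: MergeSpec): string {
  const allColumns = [spec.keyColumn].concat(spec.columns);
  const updates = spec.columns.map((c) => "T." + c + " = S." + c).join(", ");
  const inserts = allColumns.map((c) => "S." + c).join(", ");
  return [
    "MERGE " + spec.targetTable + " AS T",
    "USING (",
    spec.sourceQuery,
    ") AS S",
    "ON T." + spec.keyColumn + " = S." + spec.keyColumn,
    "WHEN MATCHED THEN UPDATE SET " + updates,
    "WHEN NOT MATCHED THEN INSERT (" + allColumns.join(", ") + ") VALUES (" + inserts + ")",
  ].join("\n");
}

const mergeSql = buildMergeSql({
  targetTable: "curated.fact_reservation",
  sourceQuery:
    "  SELECT row_key, tenant_id, status, nights\n" +
    "  FROM events_raw.reservation\n" +
    "  WHERE ingestion_ts >= @window_start AND ingestion_ts < @window_end",
  keyColumn: "row_key",
  columns: ["tenant_id", "status", "nights"],
});
console.log(mergeSql);
```

Filtering the source on ingestion_ts matches the partitioning shown for events_raw, so each run only scans the partitions covering its window.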

7. Domain invariants

  • Immutable raw. events_raw.* rows are never updated or deleted by analytics jobs except via tenant.deleted purge.
  • Curated is idempotent. Re-running a curated MERGE for the same window must produce the same rows (deterministic source query + stable hash keys).
  • Tenant scoping. Every curated row carries tenant_id. Authorized views always restrict to WHERE tenant_id IN (SELECT tenant_id FROM session_tenant_scope()).
  • Metric reproducibility. A metric value at a given (window, filters, version) must be reproducible from raw events for the retention window.
  • Schema versioning. Curated tables carry _schema_version; consumers pin to a major version; breaking changes ship as a new table (e.g., fact_reservation_v2) with a coexistence window of one quarter.
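
The "curated is idempotent" invariant can be demonstrated with an in-memory stand-in for a curated table: if every row has a stable key and the merge is an upsert by that key, applying the same window twice leaves the table unchanged. This is only a sketch; the row shape is hypothetical, and the composite string key stands in for the stable hash key a real pipeline would compute.

```typescript
// Hypothetical curated row; the real fact_reservation schema lives in DATA_MODEL.
interface ReservationFact {
  tenant_id: string;
  reservation_id: string;
  status: string;
  nights: number;
}

// In-memory stand-in for a curated table, indexed by stable row key.
type FactTable = { [rowKey: string]: ReservationFact };

// Stable key derived only from immutable identifiers (assumption: a real
// pipeline would hash this value rather than use the raw string).
function rowKey(row: ReservationFact): string {
  return row.tenant_id + "|" + row.reservation_id;
}

// Upsert by key: later rows for the same key replace earlier ones, and the
// input table is not mutated.
function mergeWindow(target: FactTable, sourceRows: ReservationFact[]): FactTable {
  const next: FactTable = { ...target };
  for (const row of sourceRows) {
    next[rowKey(row)] = { ...row };
  }
  return next;
}

const windowRows: ReservationFact[] = [
  { tenant_id: "t1", reservation_id: "r1", status: "booked", nights: 2 },
  { tenant_id: "t1", reservation_id: "r2", status: "booked", nights: 1 },
  { tenant_id: "t1", reservation_id: "r1", status: "cancelled", nights: 2 },
];

const firstRun = mergeWindow({}, windowRows);
const secondRun = mergeWindow(firstRun, windowRows);
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // true
console.log(Object.keys(secondRun).length); // 2
```

The property breaks as soon as the key depends on anything unstable, such as load time or row order, which is why the invariant pairs a deterministic source query with stable keys.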

8. Hot read paths

| Path | Latency target | Notes |
|---|---|---|
| Widget data (cached) | p95 ≤ 250 ms | Memorystore Redis cache keyed by (widgetId, paramsHash, asOfWindow) |
| Widget data (cold) | p95 ≤ 800 ms | BigQuery curated read with byte cap |
| Ad-hoc query (curated) | p95 ≤ 3 s | ≤ 1 GB scanned; per-tenant slot reservation |
| Ad-hoc query (raw) | gated to admins | up to 30 s |
| Metric compute (window) | p95 ≤ 5 s | scheduled batch path; ad-hoc allowed for tenant.admin |
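
The cached widget path is keyed by (widgetId, paramsHash, asOfWindow). One way to build such a key is sketched below, assuming that parameters are canonicalized by sorted name before hashing and that asOfWindow is a fixed-width time bucket. The key layout, the 300-second window in the example, and the small FNV-1a-style hash (applied to UTF-16 code units) are illustrative choices, not the service's actual scheme.

```typescript
type ParamValue = string | number | boolean;

// Canonical form: parameter names sorted, so logically equal requests
// serialize identically regardless of property order.
function canonicalize(params: { [paramName: string]: ParamValue }): string {
  return Object.keys(params)
    .sort()
    .map((k) => k + "=" + JSON.stringify(params[k]))
    .join("&");
}

// 32-bit FNV-1a-style hash over UTF-16 code units. The shift-and-add line is
// multiplication by the FNV prime 16777619, reduced modulo 2^32.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash.toString(16);
}

// Time bucket index: all requests inside the same window share a cache entry.
function asOfWindow(nowMs: number, windowSeconds: number): number {
  return Math.floor(nowMs / (windowSeconds * 1000));
}

function widgetCacheKey(
  widgetId: string,
  params: { [paramName: string]: ParamValue },
  nowMs: number,
  windowSeconds: number,
): string {
  return ["widget", widgetId, fnv1a(canonicalize(params)), asOfWindow(nowMs, windowSeconds)].join(":");
}

const keyA = widgetCacheKey("w1", { property: "p9", days: 7 }, 600000, 300);
const keyB = widgetCacheKey("w1", { days: 7, property: "p9" }, 600000, 300);
console.log(keyA === keyB); // true: property order does not change the key
```

Because the window index is part of the key, entries roll over on their own when the window advances; a TTL on the Redis side would then only be needed to reclaim memory.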

9. Cost & scale envelope

| Dimension | Phase 1 |
|---|---|
| Tenants | 1k pilots → 50k SMB hotels |
| Events ingested/day | 1M (early) → 200M (mid) |
| Curated tables | 10 fact + 8 dim |
| BigQuery slots | reservation 200 (baseline), autoscale 800 |
| Cloud SQL (metadata) | 4 vCPU / 16 GiB regional HA |
| Memorystore Redis | 5 GiB tier |
| Per-tenant cost cap | configurable; default 50 USD / month equivalent of slot ms + bytes scanned |
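
The per-tenant cost cap is expressed as a USD equivalent of slot milliseconds plus bytes scanned, so enforcing it needs a conversion from usage to money. The sketch below shows the arithmetic with rates passed in as configuration. The rate values in the example are placeholders chosen for easy arithmetic, not actual BigQuery pricing, and the type and function names are hypothetical.

```typescript
// Usage accumulated for one tenant in the current month.
interface TenantUsage {
  slotMs: number;
  bytesScanned: number;
}

// Conversion rates supplied by configuration (placeholders, not real prices).
interface CostRates {
  usdPerSlotHour: number;
  usdPerTiBScanned: number;
}

const MS_PER_HOUR = 3600000;
const BYTES_PER_TIB = Math.pow(2, 40);

function usageCostUsd(usage: TenantUsage, rates: CostRates): number {
  const slotHours = usage.slotMs / MS_PER_HOUR;
  const tibScanned = usage.bytesScanned / BYTES_PER_TIB;
  return slotHours * rates.usdPerSlotHour + tibScanned * rates.usdPerTiBScanned;
}

// Default cap of 50 USD per month, matching the default in the table above.
function isOverCap(usage: TenantUsage, rates: CostRates, capUsd: number = 50): boolean {
  return usageCostUsd(usage, rates) > capUsd;
}

const placeholderRates: CostRates = { usdPerSlotHour: 0.5, usdPerTiBScanned: 4 };
const monthToDate: TenantUsage = {
  slotMs: 100 * MS_PER_HOUR,       // 100 slot hours
  bytesScanned: 2 * BYTES_PER_TIB, // 2 TiB
};

console.log(usageCostUsd(monthToDate, placeholderRates)); // 58
console.log(isOverCap(monthToDate, placeholderRates));    // true
```

Keeping the rates in configuration rather than in code lets the cap stay meaningful if pricing or the reservation model changes.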

10. Key dependencies

  • NestJS (TypeScript) — API and worker layout per APPLICATION_LOGIC §1.
  • Drizzle — Cloud SQL Postgres for metadata.
  • @google-cloud/bigquery, @google-cloud/pubsub, @google-cloud/storage, @google-cloud/scheduler, @google-cloud/workflows.
  • @google-cloud/dataform (optional) for SQL workflow definitions; otherwise plain SQL files in etl/ executed via the Workflows runner.
  • OpenTelemetry SDK; SigNoz + Cloud Monitoring.
  • Cloud KMS (CMEK) for any small staging buckets we own.
  • Looker Studio as a downstream consumer (no SDK dependency on our side; we expose authorized views).

11. Decision log anchors

  • ADR-0001 Core architecture & tech stack — establishes BigQuery as analytics warehouse and TypeScript/NestJS for application services.
  • ADR-0002 Multi-tenancy model — defines per-tenant authorized view RLS for BigQuery analytics access.
  • ADR-0003 Electron offline-first desktop — defines KPI snapshot pull contract for desktop (SYNC_CONTRACT).

12. Cross-references