Skip to main content

DATA_MODEL — analytics-service

Sibling: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · platform anchor: docs/06 Data Models, docs/standards/NAMING

analytics-service owns two storage surfaces:

  1. Cloud SQL Postgres (analytics schema) — metadata: projection definitions, metric definitions, dashboards, widgets, queries, ETL jobs/runs, DQ checks/results pointers, outbox/inbox.
  2. BigQuery (events_raw.*, analytics_curated.*, dq_results.*) — the warehouse layer.

1. Identity strategy

ULIDs with prefixes per DOMAIN_MODEL §1. Prefixes (prj_, met_, dsh_, wid_, qry_, etl_, etr_, dqc_, dqr_) are added to the canonical registry in docs/standards/NAMING §6.

Curated table rows are keyed by domain natural keys (reservation_id, payment_id, etc.) plus tenant_id and _event_id for idempotent MERGE.


2. Storage placement

StorePurpose
Cloud SQL Postgres analyticsProjection/metric/dashboard metadata, ETL state, DQ checks, outbox, inbox
BigQuery events_raw.*Raw event landing (one table per subject, partitioned by ingestion ts, clustered by tenant_id)
BigQuery analytics_curated.*Curated fact + dim tables (versioned _v<n>)
BigQuery dq_results.*Historical DQ check results (immutable append)
BigQuery tenant_views.*Per-tenant authorized views over curated tables
Memorystore RedisWidget data cache, byte-budget counters, hot metric cache
GCS staging bucketsBackfill exports + Looker Studio embed signing material (key refs only)

3. Cloud SQL DDL (metadata)

3.1 analytics.projections

CREATE TABLE analytics.projections (
id text PRIMARY KEY, -- prj_<ulid>
key text NOT NULL, -- 'fact_reservation'
schema_version int NOT NULL CHECK (schema_version >= 1),
target_dataset text NOT NULL,
target_table text NOT NULL, -- 'fact_reservation_v1'
partitioning jsonb NOT NULL,
clustering text[] NOT NULL,
source_query text NOT NULL, -- MERGE … USING (…) ON …
merge_key text[] NOT NULL,
refresh_policy text NOT NULL CHECK (refresh_policy IN ('incremental','full')),
window_minutes int NOT NULL CHECK (window_minutes >= 5),
freshness jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0, -- OCC

UNIQUE (key, schema_version)
);

3.2 analytics.metric_definitions

CREATE TABLE analytics.metric_definitions (
id text PRIMARY KEY, -- met_<ulid>
tenant_id text, -- NULL = platform-shared
key text NOT NULL, -- 'reservation.occupancy_pct'
metric_version int NOT NULL CHECK (metric_version >= 1),
display jsonb NOT NULL,
unit jsonb NOT NULL,
grain text NOT NULL CHECK (grain IN ('minute','hour','day','week','month','quarter','year')),
dimensions jsonb NOT NULL,
source_tables text[] NOT NULL,
schema_version int NOT NULL,
params jsonb NOT NULL,
sql_template text NOT NULL,
byte_cap jsonb NOT NULL,
freshness jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
UNIQUE (tenant_id, key, metric_version)
);

ALTER TABLE analytics.metric_definitions ENABLE ROW LEVEL SECURITY;
CREATE POLICY metric_definitions_tenant_isolation ON analytics.metric_definitions
USING (tenant_id IS NULL OR tenant_id = current_setting('app.tenant_id', true));

3.3 analytics.dashboards, analytics.widgets, analytics.queries

CREATE TABLE analytics.dashboards (
id text PRIMARY KEY, -- dsh_<ulid>
tenant_id text NOT NULL,
owner_user_id text NOT NULL,
scope text NOT NULL CHECK (scope IN ('tenant','property','private')),
property_id text,
name_i18n jsonb NOT NULL,
description_i18n jsonb,
widget_ids text[] NOT NULL DEFAULT '{}',
shared_with jsonb NOT NULL DEFAULT '[]'::jsonb,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
CHECK (scope <> 'property' OR property_id IS NOT NULL)
);

ALTER TABLE analytics.dashboards ENABLE ROW LEVEL SECURITY;
CREATE POLICY dashboards_tenant_isolation ON analytics.dashboards
USING (tenant_id = current_setting('app.tenant_id', true));

CREATE INDEX ix_dashboards_tenant_owner ON analytics.dashboards (tenant_id, owner_user_id);

CREATE TABLE analytics.widgets (
id text PRIMARY KEY, -- wid_<ulid>
dashboard_id text NOT NULL REFERENCES analytics.dashboards(id),
tenant_id text NOT NULL,
spec jsonb NOT NULL,
position jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0
);
CREATE INDEX ix_widgets_dashboard ON analytics.widgets (dashboard_id);

ALTER TABLE analytics.widgets ENABLE ROW LEVEL SECURITY;
CREATE POLICY widgets_tenant_isolation ON analytics.widgets
USING (tenant_id = current_setting('app.tenant_id', true));

CREATE TABLE analytics.queries (
id text PRIMARY KEY, -- qry_<ulid>
tenant_id text NOT NULL,
owner_user_id text NOT NULL,
name_i18n jsonb NOT NULL,
source_tables text[] NOT NULL,
schema_version int NOT NULL,
sql_template text NOT NULL,
params jsonb NOT NULL,
byte_cap jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0
);

ALTER TABLE analytics.queries ENABLE ROW LEVEL SECURITY;
CREATE POLICY queries_tenant_isolation ON analytics.queries
USING (tenant_id = current_setting('app.tenant_id', true));

3.4 analytics.etl_jobs, analytics.etl_runs

CREATE TABLE analytics.etl_jobs (
id text PRIMARY KEY, -- etl_<ulid>
projection_id text NOT NULL REFERENCES analytics.projections(id),
trigger text NOT NULL CHECK (trigger IN ('cron','on_demand','backfill','event')),
window_from timestamptz NOT NULL,
window_to timestamptz NOT NULL,
scheduled_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
CHECK (window_from < window_to)
);
CREATE INDEX ix_etl_jobs_projection ON analytics.etl_jobs (projection_id, scheduled_at DESC);

CREATE TABLE analytics.etl_runs (
id text PRIMARY KEY, -- etr_<ulid>
job_id text NOT NULL REFERENCES analytics.etl_jobs(id),
status text NOT NULL CHECK (status IN ('queued','running','succeeded','failed','cancelled')),
bytes_scanned bigint NOT NULL DEFAULT 0,
bytes_written bigint NOT NULL DEFAULT 0,
rows_affected bigint NOT NULL DEFAULT 0,
slot_ms bigint NOT NULL DEFAULT 0,
started_at timestamptz,
succeeded_at timestamptz,
failed_at timestamptz,
error_code text,
error_detail text,
retry_of_run_id text,
attempt int NOT NULL DEFAULT 1,
warehouse_job_id text,
version int NOT NULL DEFAULT 0
);
CREATE INDEX ix_etl_runs_job_status ON analytics.etl_runs (job_id, status);
CREATE INDEX ix_etl_runs_status_started ON analytics.etl_runs (status, started_at DESC);

3.5 analytics.dq_checks, analytics.dq_latest_results

CREATE TABLE analytics.dq_checks (
id text PRIMARY KEY, -- dqc_<ulid>
key text NOT NULL UNIQUE,
rule jsonb NOT NULL,
severity text NOT NULL CHECK (severity IN ('info','warn','critical')),
enabled boolean NOT NULL DEFAULT true,
version int NOT NULL DEFAULT 0
);

CREATE TABLE analytics.dq_latest_results (
check_id text PRIMARY KEY REFERENCES analytics.dq_checks(id),
observed_value numeric NOT NULL,
expected_value numeric,
status text NOT NULL CHECK (status IN ('ok','warn','critical','error')),
observed_at timestamptz NOT NULL,
bigquery_job_id text
);

History rows are written to the dq_results BigQuery dataset (§ 5.3) instead of bloating Postgres.

3.6 analytics.tenant_query_budget

CREATE TABLE analytics.tenant_query_budget (
tenant_id text PRIMARY KEY,
daily_limit_bytes bigint NOT NULL,
current_window_start date NOT NULL,
current_window_bytes bigint NOT NULL DEFAULT 0,
updated_at timestamptz NOT NULL DEFAULT now()
);

3.7 analytics.outbox and analytics.inbox_processed

Standard platform shapes.

CREATE TABLE analytics.outbox (
id bigserial PRIMARY KEY,
envelope_id text NOT NULL UNIQUE,
tenant_id text NOT NULL,
subject text NOT NULL,
ordering_key text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
published_at timestamptz,
published_attempts int NOT NULL DEFAULT 0
);
CREATE INDEX ix_outbox_unpublished ON analytics.outbox (created_at) WHERE published_at IS NULL;

CREATE TABLE analytics.inbox_processed (
subject text NOT NULL,
message_id text NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (subject, message_id)
);

4. BigQuery datasets

DatasetPurposeRetention
events_rawRaw event landing per subject730 d for operational, 7 y for operational_7y, 10 y object-locked policy mirrored from GCS for regulated
analytics_curatedCurated fact + dim tables (versioned)rolling 5 y default, configurable per table
dq_resultsDQ result history2 y
tenant_viewsPer-tenant authorized viewsn/a (views)
analytics_stagingBackfill staging tables (TTL 7 d)7 d

5. Curated table BigQuery DDL (selected)

5.1 analytics_curated.fact_reservation_v1

CREATE TABLE `proj.analytics_curated.fact_reservation_v1`
(
tenant_id STRING NOT NULL,
reservation_id STRING NOT NULL,
property_id STRING NOT NULL,
room_type_id STRING NOT NULL,
rate_plan_id STRING NOT NULL,
channel STRING NOT NULL, -- 'direct'|'ota_booking'|'ota_expedia'|'meta'|'walk_in'
guest_country STRING,
arrival_date DATE NOT NULL,
departure_date DATE NOT NULL,
business_date DATE NOT NULL, -- often arrival_date; used for partitioning
nights INT64 NOT NULL,
status STRING NOT NULL, -- 'confirmed'|'cancelled'|'no_show'|'checked_in'|'checked_out'
adults INT64 NOT NULL,
children INT64 NOT NULL,
rooms INT64 NOT NULL,
gross_amount NUMERIC,
taxes_amount NUMERIC,
currency STRING,
lead_time_days INT64,
confirmed_at TIMESTAMP,
cancelled_at TIMESTAMP,
no_show_at TIMESTAMP,
checked_in_at TIMESTAMP,
checked_out_at TIMESTAMP,
_event_id STRING NOT NULL, -- last event applied
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, room_type_id
OPTIONS(
description = "One row per reservation (latest state). Idempotent MERGE keyed on (tenant_id, reservation_id).",
partition_expiration_days = 1825,
require_partition_filter = true
);

5.2 analytics_curated.fact_payment_v1

CREATE TABLE `proj.analytics_curated.fact_payment_v1`
(
tenant_id STRING NOT NULL,
payment_id STRING NOT NULL,
reservation_id STRING,
property_id STRING NOT NULL,
channel STRING,
method STRING NOT NULL, -- 'cash'|'card'|'transfer'|'wallet'|'ota_settlement'
amount NUMERIC NOT NULL,
currency STRING NOT NULL,
fx_to_tenant_ccy NUMERIC,
business_date DATE NOT NULL,
status STRING NOT NULL, -- 'authorized'|'captured'|'refunded'|'failed'|'voided'
refund_of_id STRING,
posted_at TIMESTAMP,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, method
OPTIONS(require_partition_filter = true);

5.3 analytics_curated.fact_housekeeping_task_v1

CREATE TABLE `proj.analytics_curated.fact_housekeeping_task_v1`
(
tenant_id STRING NOT NULL,
task_id STRING NOT NULL,
property_id STRING NOT NULL,
room_id STRING NOT NULL,
task_type STRING NOT NULL, -- 'turn_down'|'check_out_clean'|'maintenance'|'…'
assigned_to_user_id STRING,
status STRING NOT NULL, -- 'queued'|'in_progress'|'done'|'failed'
business_date DATE NOT NULL,
duration_seconds INT64,
started_at TIMESTAMP,
completed_at TIMESTAMP,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, status
OPTIONS(require_partition_filter = true);

5.4 analytics_curated.fact_lock_action_v1

CREATE TABLE `proj.analytics_curated.fact_lock_action_v1`
(
tenant_id STRING NOT NULL,
action_id STRING NOT NULL,
property_id STRING NOT NULL,
room_id STRING,
lock_id STRING NOT NULL,
actor_kind STRING NOT NULL, -- 'guest'|'staff'|'system'
actor_id STRING,
action STRING NOT NULL, -- 'unlock'|'lock'|'denied'|'low_battery'|'tamper'
result STRING NOT NULL, -- 'success'|'failure'
business_date DATE NOT NULL,
occurred_at TIMESTAMP NOT NULL,
latency_ms INT64,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, action;

5.5 Dimensions

CREATE TABLE `proj.analytics_curated.dim_property_v1` (
tenant_id STRING NOT NULL,
property_id STRING NOT NULL,
name STRING,
city STRING,
country STRING,
timezone STRING,
rooms_count INT64,
open_date DATE,
closed_date DATE,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id, property_id;

CREATE TABLE `proj.analytics_curated.dim_room_type_v1` (
tenant_id STRING NOT NULL,
room_type_id STRING NOT NULL,
property_id STRING NOT NULL,
code STRING,
capacity INT64,
base_rate NUMERIC,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id, property_id, room_type_id;

CREATE TABLE `proj.analytics_curated.dim_tenant_v1` (
tenant_id STRING NOT NULL,
display_name STRING,
data_residency STRING,
plan STRING,
created_at TIMESTAMP,
status STRING,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id;

CREATE TABLE `proj.analytics_curated.dim_calendar_v1` (
date DATE NOT NULL,
year INT64 NOT NULL,
quarter INT64 NOT NULL,
month INT64 NOT NULL,
iso_week INT64 NOT NULL,
day_of_week INT64 NOT NULL,
is_weekend BOOL NOT NULL,
is_holiday BOOL NOT NULL,
holiday_name STRING
);

5.6 analytics_curated.fact_demand_forecast_v1

CREATE TABLE `proj.analytics_curated.fact_demand_forecast_v1` (
tenant_id STRING NOT NULL,
property_id STRING NOT NULL,
room_type_id STRING NOT NULL,
business_date DATE NOT NULL,
horizon_days INT64 NOT NULL,
predicted_occupancy_pct NUMERIC,
predicted_adr NUMERIC,
ci80_low_occupancy NUMERIC,
ci80_high_occupancy NUMERIC,
model_id STRING NOT NULL,
model_version STRING NOT NULL,
produced_at TIMESTAMP NOT NULL,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, room_type_id;

6. Frozen metric SQL (canonical examples)

These are the exact SQL bodies stored in metric_definitions.sql_template. The orchestrator binds @tenant_id, @from, @to, etc.

6.1 reservation.occupancy_pct v1

WITH room_nights_available AS (
SELECT d.date AS business_date, p.rooms_count AS rooms_avail
FROM `proj.analytics_curated.dim_calendar_v1` d
CROSS JOIN `proj.analytics_curated.dim_property_v1` p
WHERE p.tenant_id = @tenant_id
AND (@property_id IS NULL OR p.property_id = @property_id)
AND d.date BETWEEN @from AND @to
),
room_nights_sold AS (
SELECT business_date, SUM(rooms) AS sold
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND status IN ('confirmed','checked_in','checked_out')
AND (@property_id IS NULL OR property_id = @property_id)
AND business_date BETWEEN @from AND @to
GROUP BY business_date
)
SELECT a.business_date AS date,
SAFE_DIVIDE(SUM(s.sold), SUM(a.rooms_avail)) * 100 AS value
FROM room_nights_available a
LEFT JOIN room_nights_sold s USING (business_date)
GROUP BY date
ORDER BY date;

6.2 reservation.adr v1

SELECT business_date AS date,
SAFE_DIVIDE(SUM(gross_amount - IFNULL(taxes_amount,0)), SUM(rooms)) AS value
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND status IN ('checked_out','checked_in')
AND (@property_id IS NULL OR property_id = @property_id)
AND business_date BETWEEN @from AND @to
GROUP BY date
ORDER BY date;

6.3 reservation.revpar v1

WITH adr AS ( /* same as 6.2 */ ),
occ AS ( /* same as 6.1 */ )
SELECT a.date,
(adr.value * occ.value / 100) AS value
FROM adr JOIN occ USING (date);

6.4 funnel.meta_to_book_pct v1

SELECT business_date AS date,
SAFE_DIVIDE(
COUNTIF(channel='meta' AND status IN ('confirmed','checked_in','checked_out')),
COUNTIF(channel='meta')
) * 100 AS value
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND business_date BETWEEN @from AND @to
GROUP BY date;

7. Authorized views (RLS)

Per SECURITY_MODEL §3, every curated table is exposed to tenant identities only via authorized views in tenant_views.<tenant_id>.<table> (or a single tenant_views.fact_reservation_v1 view that resolves caller identity). Pattern:

CREATE VIEW `proj.tenant_views.fact_reservation_v1` AS
SELECT *
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = SESSION_USER_TENANT_ID();

SESSION_USER_TENANT_ID() is a UDF that resolves the caller's binding from tenant_views.access_bindings (per §8). The Looker Studio Community Connector uses these views; the Query API binds @tenant_id directly and never trusts caller-provided values.


8. Tenant access bindings

CREATE TABLE `proj.tenant_views.access_bindings` (
google_principal STRING NOT NULL, -- 'user:foo@hotel.com' or 'serviceAccount:looker-studio-…'
tenant_id STRING NOT NULL,
property_ids ARRAY<STRING>,
granted_at TIMESTAMP NOT NULL,
granted_by STRING NOT NULL,
PRIMARY KEY(google_principal, tenant_id) NOT ENFORCED
);

Binding rows are managed by BootstrapTenantAnalyticsUseCase and RebindAuthorizedViewsUseCase. Removed when a user permission is revoked or a tenant is deleted.


9. Indexes & query patterns

Hot queryIndex used
Find due ETL jobsix_etl_jobs_projection + run scan
Latest run per projectionix_etl_runs_status_started
Dashboard list per userix_dashboards_tenant_owner
Widget readPK + dashboard FK
Outbox publish loopix_outbox_unpublished

10. Backups & retention

  • Cloud SQL: daily backup, PITR 14 d.
  • BigQuery: time-travel 7 d (default), table snapshot exports nightly to GCS for catastrophic recovery (last 30 d).
  • events_raw.* mirrors GCS event archive retention class; deletion respects regulated 10 y object-lock when applicable (we never DELETE; we use partition expiration and tenant-scoped purge).
  • DQ results retained 2 y in BigQuery.

11. Migrations

  • Postgres migrations under db/migrations/ (Drizzle), forward-only.
  • BigQuery DDL applied through Terraform (terraform/bigquery/); curated table changes use the _v<n> coexistence pattern from MIGRATION_PLAN.

Cross-references: SECURITY_MODEL §3, APPLICATION_LOGIC §4, docs/06.