DATA_MODEL — analytics-service
Sibling: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · platform anchor: docs/06 Data Models, docs/standards/NAMING
analytics-service owns two storage surfaces:
- Cloud SQL Postgres (
analyticsschema) — metadata: projection definitions, metric definitions, dashboards, widgets, queries, ETL jobs/runs, DQ checks/results pointers, outbox/inbox. - BigQuery (
events_raw.*,analytics_curated.*,dq_results.*) — the warehouse layer.
1. Identity strategy
ULIDs with prefixes per DOMAIN_MODEL §1. Prefixes (prj_, met_, dsh_, wid_, qry_, etl_, etr_, dqc_, dqr_) are added to the canonical registry in docs/standards/NAMING §6.
Curated table rows are keyed by domain natural keys (reservation_id, payment_id, etc.) plus tenant_id and _event_id for idempotent MERGE.
2. Storage placement
| Store | Purpose |
|---|---|
Cloud SQL Postgres analytics | Projection/metric/dashboard metadata, ETL state, DQ checks, outbox, inbox |
BigQuery events_raw.* | Raw event landing (one table per subject, partitioned by ingestion ts, clustered by tenant_id) |
BigQuery analytics_curated.* | Curated fact + dim tables (versioned _v<n>) |
BigQuery dq_results.* | Historical DQ check results (immutable append) |
BigQuery tenant_views.* | Per-tenant authorized views over curated tables |
| Memorystore Redis | Widget data cache, byte-budget counters, hot metric cache |
| GCS staging buckets | Backfill exports + Looker Studio embed signing material (key refs only) |
3. Cloud SQL DDL (metadata)
3.1 analytics.projections
CREATE TABLE analytics.projections (
id text PRIMARY KEY, -- prj_<ulid>
key text NOT NULL, -- 'fact_reservation'
schema_version int NOT NULL CHECK (schema_version >= 1),
target_dataset text NOT NULL,
target_table text NOT NULL, -- 'fact_reservation_v1'
partitioning jsonb NOT NULL,
clustering text[] NOT NULL,
source_query text NOT NULL, -- MERGE … USING (…) ON …
merge_key text[] NOT NULL,
refresh_policy text NOT NULL CHECK (refresh_policy IN ('incremental','full')),
window_minutes int NOT NULL CHECK (window_minutes >= 5),
freshness jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0, -- OCC
UNIQUE (key, schema_version)
);
3.2 analytics.metric_definitions
CREATE TABLE analytics.metric_definitions (
id text PRIMARY KEY, -- met_<ulid>
tenant_id text, -- NULL = platform-shared
key text NOT NULL, -- 'reservation.occupancy_pct'
metric_version int NOT NULL CHECK (metric_version >= 1),
display jsonb NOT NULL,
unit jsonb NOT NULL,
grain text NOT NULL CHECK (grain IN ('minute','hour','day','week','month','quarter','year')),
dimensions jsonb NOT NULL,
source_tables text[] NOT NULL,
schema_version int NOT NULL,
params jsonb NOT NULL,
sql_template text NOT NULL,
byte_cap jsonb NOT NULL,
freshness jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
UNIQUE (tenant_id, key, metric_version)
);
ALTER TABLE analytics.metric_definitions ENABLE ROW LEVEL SECURITY;
CREATE POLICY metric_definitions_tenant_isolation ON analytics.metric_definitions
USING (tenant_id IS NULL OR tenant_id = current_setting('app.tenant_id', true));
3.3 analytics.dashboards, analytics.widgets, analytics.queries
CREATE TABLE analytics.dashboards (
id text PRIMARY KEY, -- dsh_<ulid>
tenant_id text NOT NULL,
owner_user_id text NOT NULL,
scope text NOT NULL CHECK (scope IN ('tenant','property','private')),
property_id text,
name_i18n jsonb NOT NULL,
description_i18n jsonb,
widget_ids text[] NOT NULL DEFAULT '{}',
shared_with jsonb NOT NULL DEFAULT '[]'::jsonb,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
CHECK (scope <> 'property' OR property_id IS NOT NULL)
);
ALTER TABLE analytics.dashboards ENABLE ROW LEVEL SECURITY;
CREATE POLICY dashboards_tenant_isolation ON analytics.dashboards
USING (tenant_id = current_setting('app.tenant_id', true));
CREATE INDEX ix_dashboards_tenant_owner ON analytics.dashboards (tenant_id, owner_user_id);
CREATE TABLE analytics.widgets (
id text PRIMARY KEY, -- wid_<ulid>
dashboard_id text NOT NULL REFERENCES analytics.dashboards(id),
tenant_id text NOT NULL,
spec jsonb NOT NULL,
position jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0
);
CREATE INDEX ix_widgets_dashboard ON analytics.widgets (dashboard_id);
ALTER TABLE analytics.widgets ENABLE ROW LEVEL SECURITY;
CREATE POLICY widgets_tenant_isolation ON analytics.widgets
USING (tenant_id = current_setting('app.tenant_id', true));
CREATE TABLE analytics.queries (
id text PRIMARY KEY, -- qry_<ulid>
tenant_id text NOT NULL,
owner_user_id text NOT NULL,
name_i18n jsonb NOT NULL,
source_tables text[] NOT NULL,
schema_version int NOT NULL,
sql_template text NOT NULL,
params jsonb NOT NULL,
byte_cap jsonb NOT NULL,
archived boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0
);
ALTER TABLE analytics.queries ENABLE ROW LEVEL SECURITY;
CREATE POLICY queries_tenant_isolation ON analytics.queries
USING (tenant_id = current_setting('app.tenant_id', true));
3.4 analytics.etl_jobs, analytics.etl_runs
CREATE TABLE analytics.etl_jobs (
id text PRIMARY KEY, -- etl_<ulid>
projection_id text NOT NULL REFERENCES analytics.projections(id),
trigger text NOT NULL CHECK (trigger IN ('cron','on_demand','backfill','event')),
window_from timestamptz NOT NULL,
window_to timestamptz NOT NULL,
scheduled_at timestamptz NOT NULL DEFAULT now(),
version int NOT NULL DEFAULT 0,
CHECK (window_from < window_to)
);
CREATE INDEX ix_etl_jobs_projection ON analytics.etl_jobs (projection_id, scheduled_at DESC);
CREATE TABLE analytics.etl_runs (
id text PRIMARY KEY, -- etr_<ulid>
job_id text NOT NULL REFERENCES analytics.etl_jobs(id),
status text NOT NULL CHECK (status IN ('queued','running','succeeded','failed','cancelled')),
bytes_scanned bigint NOT NULL DEFAULT 0,
bytes_written bigint NOT NULL DEFAULT 0,
rows_affected bigint NOT NULL DEFAULT 0,
slot_ms bigint NOT NULL DEFAULT 0,
started_at timestamptz,
succeeded_at timestamptz,
failed_at timestamptz,
error_code text,
error_detail text,
retry_of_run_id text,
attempt int NOT NULL DEFAULT 1,
warehouse_job_id text,
version int NOT NULL DEFAULT 0
);
CREATE INDEX ix_etl_runs_job_status ON analytics.etl_runs (job_id, status);
CREATE INDEX ix_etl_runs_status_started ON analytics.etl_runs (status, started_at DESC);
3.5 analytics.dq_checks, analytics.dq_latest_results
CREATE TABLE analytics.dq_checks (
id text PRIMARY KEY, -- dqc_<ulid>
key text NOT NULL UNIQUE,
rule jsonb NOT NULL,
severity text NOT NULL CHECK (severity IN ('info','warn','critical')),
enabled boolean NOT NULL DEFAULT true,
version int NOT NULL DEFAULT 0
);
CREATE TABLE analytics.dq_latest_results (
check_id text PRIMARY KEY REFERENCES analytics.dq_checks(id),
observed_value numeric NOT NULL,
expected_value numeric,
status text NOT NULL CHECK (status IN ('ok','warn','critical','error')),
observed_at timestamptz NOT NULL,
bigquery_job_id text
);
History rows are written to the dq_results BigQuery dataset (§ 5.3) instead of bloating Postgres.
3.6 analytics.tenant_query_budget
CREATE TABLE analytics.tenant_query_budget (
tenant_id text PRIMARY KEY,
daily_limit_bytes bigint NOT NULL,
current_window_start date NOT NULL,
current_window_bytes bigint NOT NULL DEFAULT 0,
updated_at timestamptz NOT NULL DEFAULT now()
);
3.7 analytics.outbox and analytics.inbox_processed
Standard platform shapes.
CREATE TABLE analytics.outbox (
id bigserial PRIMARY KEY,
envelope_id text NOT NULL UNIQUE,
tenant_id text NOT NULL,
subject text NOT NULL,
ordering_key text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
published_at timestamptz,
published_attempts int NOT NULL DEFAULT 0
);
CREATE INDEX ix_outbox_unpublished ON analytics.outbox (created_at) WHERE published_at IS NULL;
CREATE TABLE analytics.inbox_processed (
subject text NOT NULL,
message_id text NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (subject, message_id)
);
4. BigQuery datasets
| Dataset | Purpose | Retention |
|---|---|---|
events_raw | Raw event landing per subject | 730 d for operational, 7 y for operational_7y, 10 y object-locked policy mirrored from GCS for regulated |
analytics_curated | Curated fact + dim tables (versioned) | rolling 5 y default, configurable per table |
dq_results | DQ result history | 2 y |
tenant_views | Per-tenant authorized views | n/a (views) |
analytics_staging | Backfill staging tables (TTL 7 d) | 7 d |
5. Curated table BigQuery DDL (selected)
5.1 analytics_curated.fact_reservation_v1
CREATE TABLE `proj.analytics_curated.fact_reservation_v1`
(
tenant_id STRING NOT NULL,
reservation_id STRING NOT NULL,
property_id STRING NOT NULL,
room_type_id STRING NOT NULL,
rate_plan_id STRING NOT NULL,
channel STRING NOT NULL, -- 'direct'|'ota_booking'|'ota_expedia'|'meta'|'walk_in'
guest_country STRING,
arrival_date DATE NOT NULL,
departure_date DATE NOT NULL,
business_date DATE NOT NULL, -- often arrival_date; used for partitioning
nights INT64 NOT NULL,
status STRING NOT NULL, -- 'confirmed'|'cancelled'|'no_show'|'checked_in'|'checked_out'
adults INT64 NOT NULL,
children INT64 NOT NULL,
rooms INT64 NOT NULL,
gross_amount NUMERIC,
taxes_amount NUMERIC,
currency STRING,
lead_time_days INT64,
confirmed_at TIMESTAMP,
cancelled_at TIMESTAMP,
no_show_at TIMESTAMP,
checked_in_at TIMESTAMP,
checked_out_at TIMESTAMP,
_event_id STRING NOT NULL, -- last event applied
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, room_type_id
OPTIONS(
description = "One row per reservation (latest state). Idempotent MERGE keyed on (tenant_id, reservation_id).",
partition_expiration_days = 1825,
require_partition_filter = true
);
5.2 analytics_curated.fact_payment_v1
CREATE TABLE `proj.analytics_curated.fact_payment_v1`
(
tenant_id STRING NOT NULL,
payment_id STRING NOT NULL,
reservation_id STRING,
property_id STRING NOT NULL,
channel STRING,
method STRING NOT NULL, -- 'cash'|'card'|'transfer'|'wallet'|'ota_settlement'
amount NUMERIC NOT NULL,
currency STRING NOT NULL,
fx_to_tenant_ccy NUMERIC,
business_date DATE NOT NULL,
status STRING NOT NULL, -- 'authorized'|'captured'|'refunded'|'failed'|'voided'
refund_of_id STRING,
posted_at TIMESTAMP,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, method
OPTIONS(require_partition_filter = true);
5.3 analytics_curated.fact_housekeeping_task_v1
CREATE TABLE `proj.analytics_curated.fact_housekeeping_task_v1`
(
tenant_id STRING NOT NULL,
task_id STRING NOT NULL,
property_id STRING NOT NULL,
room_id STRING NOT NULL,
task_type STRING NOT NULL, -- 'turn_down'|'check_out_clean'|'maintenance'|'…'
assigned_to_user_id STRING,
status STRING NOT NULL, -- 'queued'|'in_progress'|'done'|'failed'
business_date DATE NOT NULL,
duration_seconds INT64,
started_at TIMESTAMP,
completed_at TIMESTAMP,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, status
OPTIONS(require_partition_filter = true);
5.4 analytics_curated.fact_lock_action_v1
CREATE TABLE `proj.analytics_curated.fact_lock_action_v1`
(
tenant_id STRING NOT NULL,
action_id STRING NOT NULL,
property_id STRING NOT NULL,
room_id STRING,
lock_id STRING NOT NULL,
actor_kind STRING NOT NULL, -- 'guest'|'staff'|'system'
actor_id STRING,
action STRING NOT NULL, -- 'unlock'|'lock'|'denied'|'low_battery'|'tamper'
result STRING NOT NULL, -- 'success'|'failure'
business_date DATE NOT NULL,
occurred_at TIMESTAMP NOT NULL,
latency_ms INT64,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, action;
5.5 Dimensions
CREATE TABLE `proj.analytics_curated.dim_property_v1` (
tenant_id STRING NOT NULL,
property_id STRING NOT NULL,
name STRING,
city STRING,
country STRING,
timezone STRING,
rooms_count INT64,
open_date DATE,
closed_date DATE,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id, property_id;
CREATE TABLE `proj.analytics_curated.dim_room_type_v1` (
tenant_id STRING NOT NULL,
room_type_id STRING NOT NULL,
property_id STRING NOT NULL,
code STRING,
capacity INT64,
base_rate NUMERIC,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id, property_id, room_type_id;
CREATE TABLE `proj.analytics_curated.dim_tenant_v1` (
tenant_id STRING NOT NULL,
display_name STRING,
data_residency STRING,
plan STRING,
created_at TIMESTAMP,
status STRING,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
) CLUSTER BY tenant_id;
CREATE TABLE `proj.analytics_curated.dim_calendar_v1` (
date DATE NOT NULL,
year INT64 NOT NULL,
quarter INT64 NOT NULL,
month INT64 NOT NULL,
iso_week INT64 NOT NULL,
day_of_week INT64 NOT NULL,
is_weekend BOOL NOT NULL,
is_holiday BOOL NOT NULL,
holiday_name STRING
);
5.6 analytics_curated.fact_demand_forecast_v1
CREATE TABLE `proj.analytics_curated.fact_demand_forecast_v1` (
tenant_id STRING NOT NULL,
property_id STRING NOT NULL,
room_type_id STRING NOT NULL,
business_date DATE NOT NULL,
horizon_days INT64 NOT NULL,
predicted_occupancy_pct NUMERIC,
predicted_adr NUMERIC,
ci80_low_occupancy NUMERIC,
ci80_high_occupancy NUMERIC,
model_id STRING NOT NULL,
model_version STRING NOT NULL,
produced_at TIMESTAMP NOT NULL,
_event_id STRING NOT NULL,
_ingested_at TIMESTAMP NOT NULL,
_schema_version INT64 NOT NULL DEFAULT 1
)
PARTITION BY business_date
CLUSTER BY tenant_id, property_id, room_type_id;
6. Frozen metric SQL (canonical examples)
These are the exact SQL bodies stored in metric_definitions.sql_template. The orchestrator binds @tenant_id, @from, @to, etc.
6.1 reservation.occupancy_pct v1
WITH room_nights_available AS (
SELECT d.date AS business_date, p.rooms_count AS rooms_avail
FROM `proj.analytics_curated.dim_calendar_v1` d
CROSS JOIN `proj.analytics_curated.dim_property_v1` p
WHERE p.tenant_id = @tenant_id
AND (@property_id IS NULL OR p.property_id = @property_id)
AND d.date BETWEEN @from AND @to
),
room_nights_sold AS (
SELECT business_date, SUM(rooms) AS sold
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND status IN ('confirmed','checked_in','checked_out')
AND (@property_id IS NULL OR property_id = @property_id)
AND business_date BETWEEN @from AND @to
GROUP BY business_date
)
SELECT a.business_date AS date,
SAFE_DIVIDE(SUM(s.sold), SUM(a.rooms_avail)) * 100 AS value
FROM room_nights_available a
LEFT JOIN room_nights_sold s USING (business_date)
GROUP BY date
ORDER BY date;
6.2 reservation.adr v1
SELECT business_date AS date,
SAFE_DIVIDE(SUM(gross_amount - IFNULL(taxes_amount,0)), SUM(rooms)) AS value
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND status IN ('checked_out','checked_in')
AND (@property_id IS NULL OR property_id = @property_id)
AND business_date BETWEEN @from AND @to
GROUP BY date
ORDER BY date;
6.3 reservation.revpar v1
WITH adr AS ( /* same as 6.2 */ ),
occ AS ( /* same as 6.1 */ )
SELECT a.date,
(adr.value * occ.value / 100) AS value
FROM adr JOIN occ USING (date);
6.4 funnel.meta_to_book_pct v1
SELECT business_date AS date,
SAFE_DIVIDE(
COUNTIF(channel='meta' AND status IN ('confirmed','checked_in','checked_out')),
COUNTIF(channel='meta')
) * 100 AS value
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = @tenant_id
AND business_date BETWEEN @from AND @to
GROUP BY date;
7. Authorized views (RLS)
Per SECURITY_MODEL §3, every curated table is exposed to tenant identities only via authorized views in tenant_views.<tenant_id>.<table> (or a single tenant_views.fact_reservation_v1 view that resolves caller identity). Pattern:
CREATE VIEW `proj.tenant_views.fact_reservation_v1` AS
SELECT *
FROM `proj.analytics_curated.fact_reservation_v1`
WHERE tenant_id = SESSION_USER_TENANT_ID();
SESSION_USER_TENANT_ID() is a UDF that resolves the caller's binding from tenant_views.access_bindings (per §8). The Looker Studio Community Connector uses these views; the Query API binds @tenant_id directly and never trusts caller-provided values.
8. Tenant access bindings
CREATE TABLE `proj.tenant_views.access_bindings` (
google_principal STRING NOT NULL, -- 'user:foo@hotel.com' or 'serviceAccount:looker-studio-…'
tenant_id STRING NOT NULL,
property_ids ARRAY<STRING>,
granted_at TIMESTAMP NOT NULL,
granted_by STRING NOT NULL,
PRIMARY KEY(google_principal, tenant_id) NOT ENFORCED
);
Binding rows are managed by BootstrapTenantAnalyticsUseCase and RebindAuthorizedViewsUseCase. Removed when a user permission is revoked or a tenant is deleted.
9. Indexes & query patterns
| Hot query | Index used |
|---|---|
| Find due ETL jobs | ix_etl_jobs_projection + run scan |
| Latest run per projection | ix_etl_runs_status_started |
| Dashboard list per user | ix_dashboards_tenant_owner |
| Widget read | PK + dashboard FK |
| Outbox publish loop | ix_outbox_unpublished |
10. Backups & retention
- Cloud SQL: daily backup, PITR 14 d.
- BigQuery: time-travel 7 d (default), table snapshot exports nightly to GCS for catastrophic recovery (last 30 d).
events_raw.*mirrors GCS event archive retention class; deletion respects regulated 10 y object-lock when applicable (we never DELETE; we use partition expiration and tenant-scoped purge).- DQ results retained 2 y in BigQuery.
11. Migrations
- Postgres migrations under
db/migrations/(Drizzle), forward-only. - BigQuery DDL applied through Terraform (
terraform/bigquery/); curated table changes use the_v<n>coexistence pattern from MIGRATION_PLAN.
Cross-references: SECURITY_MODEL §3, APPLICATION_LOGIC §4, docs/06.