DEPLOYMENT_TOPOLOGY — analytics-service
Siblings: SECURITY_MODEL · OBSERVABILITY · platform anchor: docs/02 §10 Deployment
analytics-service ships as four Cloud Run units per region, plus GCP-native orchestration components (Cloud Workflows, Cloud Composer, Cloud Scheduler, Pub/Sub):
| Unit | Purpose | Runtime |
|---|---|---|
| analytics-api | REST surface (queries, dashboards, widgets, metrics, internal sync) | Cloud Run service, Node 20 LTS |
| analytics-pubsub-sink | Pub/Sub → events_raw.* BigQuery sink + DLQ | Cloud Run service (push subscriptions) |
| analytics-etl-worker | Scheduled aggregation + DQ + forecast writeback | Cloud Run Job invoked by Cloud Workflows + Cloud Composer |
| analytics-looker-broker | Looker Studio embed token + binding management | Cloud Run service (internal-only) |
Container base: node:20-alpine + minimal libs; no headless Chromium.
1. Compute & orchestration
| Unit | Min | Max | Concurrency | CPU / Mem | Notes |
|---|---|---|---|---|---|
| analytics-api | 2 | 50 | 80 | 2 / 4 GiB | Stateless; warm pool keeps latency stable |
| analytics-pubsub-sink | 2 | 30 | 100 | 1 / 2 GiB | Push subscription with ack deadline 60 s |
| analytics-etl-worker | n/a | parallel up to 20 jobs | 1 | 4 / 8 GiB | Cloud Run Jobs; per-job timeout 60 min |
| analytics-looker-broker | 1 | 5 | 40 | 0.5 / 1 GiB | Mints embed JWTs |
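The Concurrency column is the per-instance request cap Cloud Run enforces, so Min × Concurrency and Max × Concurrency bound how many requests each service can hold in flight per region. A minimal sketch of that arithmetic, using the numbers from the table; the `UnitScaling` interface and function names are illustrative, the ETL worker is left out because Jobs run one task per instance, and CPU saturation can cap throughput well before the concurrency ceiling is reached.

```typescript
// Illustrative shape for one row of the scaling table (hypothetical names).
interface UnitScaling {
  name: string;
  minInstances: number;
  maxInstances: number;
  concurrency: number; // max concurrent requests per instance
}

// Values copied from the Compute & orchestration table (Cloud Run services only).
const services: UnitScaling[] = [
  { name: "analytics-api", minInstances: 2, maxInstances: 50, concurrency: 80 },
  { name: "analytics-pubsub-sink", minInstances: 2, maxInstances: 30, concurrency: 100 },
  { name: "analytics-looker-broker", minInstances: 1, maxInstances: 5, concurrency: 40 },
];

// Requests the always-warm instances can absorb without a cold start.
function warmCapacity(u: UnitScaling): number {
  return u.minInstances * u.concurrency;
}

// Hard ceiling on in-flight requests once autoscaling hits maxInstances.
function maxInFlight(u: UnitScaling): number {
  return u.maxInstances * u.concurrency;
}

for (const u of services) {
  console.log(`${u.name}: warm ${warmCapacity(u)}, ceiling ${maxInFlight(u)}`);
}
```

For analytics-api this gives 160 requests on the warm pool and a ceiling of 4,000; requests beyond the ceiling queue or are rejected by Cloud Run.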
Orchestration:
- Cloud Workflows drive ETL DAGs (extract → transform → load → DQ → publish). Each step invokes the analytics-etl-worker job with step parameters.
- Cloud Composer (Airflow 2.x) is used only for cross-domain DAGs (e.g., the demand-forecast feature pipeline that spans multiple services); one small environment per region.
- Cloud Scheduler triggers Workflows on cron (per metric definition cadence).
- Pub/Sub push subscriptions for the sink; pull for control-plane events.
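With push delivery, Pub/Sub POSTs a JSON envelope whose `message.data` field is base64-encoded; a success status acks the message, while an error status or no response within the 60 s ack deadline leads to redelivery. A minimal sketch of how the sink could decode that envelope, assuming event payloads are JSON and that the subscription has a dead-letter policy pointing at the DLQ topic (`decodePush`, `statusFor`, and the 204/400 policy are hypothetical, not the sink's confirmed behavior).

```typescript
// Shape of a Pub/Sub push request body (default wrapped delivery).
interface PushEnvelope {
  message: {
    data?: string; // base64-encoded payload
    attributes?: Record<string, string>;
    messageId: string;
    publishTime: string;
  };
  subscription: string;
}

type DecodeResult =
  | { ok: true; messageId: string; event: unknown; attributes: Record<string, string> }
  | { ok: false; reason: string };

// Hypothetical decoder: validates the envelope and parses the JSON payload.
function decodePush(body: unknown): DecodeResult {
  const env = body as Partial<PushEnvelope> | null;
  const msg = env?.message;
  if (!msg || typeof msg.messageId !== "string") {
    return { ok: false, reason: "missing message envelope" };
  }
  if (typeof msg.data !== "string") {
    return { ok: false, reason: "missing data" };
  }
  try {
    const json = Buffer.from(msg.data, "base64").toString("utf8");
    return { ok: true, messageId: msg.messageId, event: JSON.parse(json), attributes: msg.attributes ?? {} };
  } catch {
    return { ok: false, reason: "payload is not valid JSON" };
  }
}

// Hypothetical status policy: 204 acks (after the BigQuery insert succeeds);
// 400 nacks, so the assumed dead-letter policy eventually routes the message to the DLQ.
function statusFor(result: DecodeResult): number {
  return result.ok ? 204 : 400;
}
```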
2. Regions & residency
| Region | Tenants | Cloud SQL | BigQuery dataset region | Composer |
|---|---|---|---|---|
| europe-west3 | EU/MENA tenants | regional HA | EU (multi-region) → optional regional pin | europe-west3 |
| asia-south1 | South Asia (PK/IN tenants where allowed) | regional HA | asia-south1 | asia-south1 |
| me-central1 | GCC | regional HA | me-central1 | me-central1 |
Cross-region replication is forbidden. Each tenant's residency region is decided by tenant-service at tenant creation and recorded there; the full deployment is duplicated in every region rather than shared across regions.
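Because cross-region replication is forbidden and each region runs its own copy of the stack, a regional deployment should only serve tenants whose recorded residency matches its own `REGION`, and should fail closed otherwise. A minimal sketch of that guard, assuming the residency value can be read from tenant-service; `TenantRecord`, `ResidencyViolation`, and `assertServesTenant` are hypothetical names.

```typescript
type Region = "europe-west3" | "asia-south1" | "me-central1";

// Hypothetical shape of the residency record kept by tenant-service.
interface TenantRecord {
  tenantId: string;
  residencyRegion: Region;
}

class ResidencyViolation extends Error {}

// Fail closed: never forward or copy a tenant's request to another region;
// reject it if this deployment is not the tenant's residency region.
function assertServesTenant(deploymentRegion: Region, tenant: TenantRecord): void {
  if (tenant.residencyRegion !== deploymentRegion) {
    throw new ResidencyViolation(
      `tenant ${tenant.tenantId} is pinned to ${tenant.residencyRegion}; this deployment is ${deploymentRegion}`,
    );
  }
}
```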
3. Service accounts (least privilege)
| GSA | Scope |
|---|---|
| analytics-api@… | Cloud SQL Client; BigQuery Data Viewer on analytics_curated.*; BigQuery Job User; Secret Manager accessor (signing key) |
| analytics-sink@… | Pub/Sub Subscriber; BigQuery Data Editor on events_raw.*; KMS Encrypter |
| analytics-etl@… | BigQuery Job User; Data Editor on analytics_curated.* and dq_results.*; Cloud SQL Client (write) |
| analytics-looker@… | KMS Signer (embed key); Cloud SQL Client (read on tenant_views.access_bindings) |
| looker-studio-<tenantId>@… | Per-tenant principal; granted to authorized views only |
Workload Identity binds Kubernetes/Cloud Run identities to GSAs. No JSON service-account keys are ever materialized.
4. Networking
- All units use internal-and-cloud-load-balancing ingress; the only public path is API gateway → BFF.
- Egress to BigQuery, Pub/Sub, Secret Manager, and KMS goes through Private Google Access.
- A VPC-SC perimeter encloses BigQuery, GCS, and Pub/Sub for analytics; ingress from outside the perimeter is denied.
5. Configuration (12-factor)
Env-driven; all secrets via Secret Manager refs. Sample (prod):
NODE_ENV=production
SERVICE_NAME=analytics-service
REGION=europe-west3
DATABASE_URL=__resolved_at_boot__
BIGQUERY_PROJECT=ghasi-melmastoon-prod
BIGQUERY_LOCATION=europe-west3
BIGQUERY_CURATED_DATASET=analytics_curated
BIGQUERY_RAW_DATASET=events_raw
PUBSUB_PROJECT=ghasi-melmastoon-prod
PUBSUB_DLQ_TOPIC=analytics.dlq
DEFAULT_QUERY_BYTE_CAP=1073741824 # 1 GiB
DEFAULT_TENANT_DAILY_BUDGET=53687091200 # 50 GiB
LOOKER_EMBED_KMS_KEY=projects/.../cryptoKeys/melmastoon-analytics-embed-signer
AI_ORCHESTRATOR_BASE_URL=https://ai-orchestrator.internal
AI_ORCHESTRATOR_AUDIENCE=https://ai-orchestrator.internal
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.melmastoon.internal
LOG_LEVEL=info
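Since configuration is purely env-driven, a typed loader that fails fast at boot keeps a missing or malformed variable from surfacing later as a runtime error. A minimal sketch covering some of the variables above, assuming the trailing `# …` notes in the sample are documentation and not part of the values; `loadConfig`, `required`, `positiveInt`, and the `info` fallback for `LOG_LEVEL` are illustrative choices, not the service's confirmed loader.

```typescript
type Env = Record<string, string | undefined>;

const GiB = 1024 ** 3;

// Throws if a required variable is absent or empty.
function required(env: Env, key: string): string {
  const v = env[key];
  if (v === undefined || v === "") throw new Error(`missing required env var ${key}`);
  return v;
}

// Parses a required byte count; rejects non-integers and non-positive values.
function positiveInt(env: Env, key: string): number {
  const raw = required(env, key);
  const n = Number(raw);
  if (!Number.isSafeInteger(n) || n <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Hypothetical loader; secrets such as DATABASE_URL are resolved separately from Secret Manager.
function loadConfig(env: Env) {
  return {
    region: required(env, "REGION"),
    bigqueryProject: required(env, "BIGQUERY_PROJECT"),
    bigqueryLocation: required(env, "BIGQUERY_LOCATION"),
    curatedDataset: required(env, "BIGQUERY_CURATED_DATASET"),
    rawDataset: required(env, "BIGQUERY_RAW_DATASET"),
    dlqTopic: required(env, "PUBSUB_DLQ_TOPIC"),
    queryByteCap: positiveInt(env, "DEFAULT_QUERY_BYTE_CAP"),          // 1 GiB in prod
    tenantDailyBudget: positiveInt(env, "DEFAULT_TENANT_DAILY_BUDGET"), // 50 GiB in prod
    logLevel: env["LOG_LEVEL"] ?? "info",
  };
}
```

In the service itself this would be called once at startup, e.g. `loadConfig(process.env)`, so a bad revision crashes before it receives traffic.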
6. Deploy pipeline
Cloud Deploy targets per region:
build → unit-and-integration-tests → schema-drift-check → artifact-registry-push
↓
deploy:dev (auto) → smoke + load smoke
↓
deploy:stg (auto on develop) → Pact verify + DQ replay + canary 10% / 30 min
↓
deploy:prod-eu (manual approve) → canary 10% / 30 min → 100%
↓
deploy:prod-asia / prod-me (parallel after EU green)
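The promotion order above is a small dependency graph: a stage becomes eligible once every stage it depends on is green, which is what lets prod-asia and prod-me start in parallel after prod-eu. A minimal sketch of that rule; the `pipeline` table mirrors the diagram, while the type and function names are hypothetical and approval or canary settings not shown in the diagram are left unset.

```typescript
type Stage = "dev" | "stg" | "prod-eu" | "prod-asia" | "prod-me";

interface StageSpec {
  needs: Stage[];                               // stages that must be green first
  manualApproval?: boolean;                     // set only where the diagram says so
  canary?: { percent: number; minutes: number };
}

// Mirrors the Cloud Deploy promotion diagram.
const pipeline: Record<Stage, StageSpec> = {
  dev: { needs: [] },
  stg: { needs: ["dev"], canary: { percent: 10, minutes: 30 } },
  "prod-eu": { needs: ["stg"], manualApproval: true, canary: { percent: 10, minutes: 30 } },
  "prod-asia": { needs: ["prod-eu"] },
  "prod-me": { needs: ["prod-eu"] },
};

// Stages not yet green whose dependencies are all green.
function eligible(green: Set<Stage>): Stage[] {
  return (Object.keys(pipeline) as Stage[]).filter(
    (s) => !green.has(s) && pipeline[s].needs.every((d) => green.has(d)),
  );
}
```

For example, `eligible(new Set<Stage>(["dev", "stg", "prod-eu"]))` yields both `prod-asia` and `prod-me`.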
Migrations:
- Postgres migrations run from a one-shot Cloud Run Job before traffic shift.
- BigQuery DDL is applied via Terraform; new tables ship as additive changes, and renames or other breaking changes follow two-phase coexistence (MIGRATION_PLAN).
Rollback: each release is a Cloud Run revision, so rollback flips traffic back to the previous revision in seconds. ETL job rollbacks restore the previous Workflow definition and worker container.
7. Capacity & cost envelope (per-region steady state)
| Resource | Estimate |
|---|---|
| analytics-api | 4 instances avg, 12 peak |
| analytics-pubsub-sink | 6 instances avg, 18 peak |
| analytics-etl-worker | ~120 job runs/day (mixed cadences) |
| Cloud SQL | 2 vCPU / 8 GiB HA |
| BigQuery storage (curated) | ~10 GiB/tenant/year (active), ~3 GiB/tenant/year (long-term) |
| BigQuery slots | reservation 200 baseline + autoscale 200 |
| Composer | small env (3 worker nodes) |
Cost guardrails: per-tenant byte budgets (default 50 GiB/day), reservation autoscale ceiling, snapshot generators auto-paused when budget exceeded (SECURITY_MODEL §9).
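The per-query cap (`DEFAULT_QUERY_BYTE_CAP`, 1 GiB) and the per-tenant daily budget (`DEFAULT_TENANT_DAILY_BUDGET`, 50 GiB) can be checked together before a query is submitted, for instance against a BigQuery dry-run estimate, with the per-query cap also passed as BigQuery's `maximumBytesBilled` job option so an underestimate fails the job instead of overspending. A minimal sketch of that admission logic; `TenantBudget`, `admitQuery`, and `shouldPauseSnapshots` are hypothetical, and where billed bytes are tracked is an assumption.

```typescript
const GiB = 1024 ** 3;

// Hypothetical per-tenant budget state (defaults from the config sample).
interface TenantBudget {
  dailyBudgetBytes: number;  // default 50 GiB/day
  queryByteCapBytes: number; // default 1 GiB per query
  bytesBilledToday: number;  // assumed to be tracked elsewhere
}

type Decision =
  | { allow: true; maximumBytesBilled: number }
  | { allow: false; reason: "query_cap" | "daily_budget" };

// estimatedBytes would come from a dry run of the query.
function admitQuery(b: TenantBudget, estimatedBytes: number): Decision {
  if (estimatedBytes > b.queryByteCapBytes) return { allow: false, reason: "query_cap" };
  if (b.bytesBilledToday + estimatedBytes > b.dailyBudgetBytes) {
    return { allow: false, reason: "daily_budget" };
  }
  // Pass the cap through so BigQuery itself rejects the job if it would bill more.
  return { allow: true, maximumBytesBilled: b.queryByteCapBytes };
}

// Snapshot generators pause once the daily budget is exceeded.
function shouldPauseSnapshots(b: TenantBudget): boolean {
  return b.bytesBilledToday > b.dailyBudgetBytes;
}
```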
8. Disaster recovery
- Postgres: PITR 7 days, daily snapshot 35-day retention; HA replica.
- BigQuery: time-travel 7 days; snapshot tables for curated layer weekly with 90-day retention.
- Workflows / Composer: definitions in IaC (Terraform); restorable by re-applying.
- Pub/Sub: subscription retention 7 days; replay possible from message storage.
- RTO / RPO: RTO 30 min (region failover possible only within residency); RPO 5 min for Postgres, 1 h for curated tables (replayable from raw).
Cross-references: SECURITY_MODEL §3, OBSERVABILITY §10 cost, MIGRATION_PLAN.