Skip to main content

SERVICE_READINESS — analytics-service

Sibling: SERVICE_OVERVIEW · DEPLOYMENT_TOPOLOGY · platform anchor: docs/standards/SERVICE_TEMPLATE

Go/no-go checklist for analytics-service. Each item must be [x] before tenants are routed to this service in a region.


1. Architecture & contracts

  • SERVICE_OVERVIEW reviewed by domain lead and architecture council.
  • DOMAIN_MODEL invariants implemented and unit-tested.
  • APPLICATION_LOGIC ports/adapters separation verified (grep -R "process.env" src/ outside infrastructure/config returns 0 hits).
  • API_CONTRACTS match generated OpenAPI; Pact contracts with bff-backoffice-service verified for all routes.
  • EVENT_SCHEMAS for all published events registered in @melmastoon/event-contracts.
  • Frozen SQL for all published metric definitions exists in infra/bigquery/metrics/*.sql.

2. Storage & data

  • Postgres migrations applied (pnpm migrate:status clean) and schema-drift CI green.
  • BigQuery curated DDL applied via Terraform; _schema_version set per table.
  • All metadata tables have <table>_tenant_isolation RLS policy + tenant-isolation integration test green.
  • Authorized views created in tenant_views.* for every curated table; SESSION_USER_TENANT_ID() UDF deployed.
  • CMEK keys provisioned in target region with rotation policy; deny-by-default IAM verified.
  • tenant_views.access_bindings reconciliation job scheduled and tested.
  • Per-tenant byte budget defaults set; admin override path tested.

3. Security

  • SECURITY_MODEL reviewed by security squad.
  • JWT validation (BFF + internal) enforced on all routes.
  • Cross-tenant integration test (test/integration/tenant-isolation.spec.ts) green.
  • Saved-query parameter binding test green (no string concatenation).
  • Looker Studio embed signing key provisioned in KMS; never exported.
  • PII redaction tests for logs green.
  • gitleaks clean on the release commit.

4. Observability

  • OBSERVABILITY dashboards deployed (Service Health, Pipeline Freshness, Query Economics, Data Quality, AI Usage, Tenant View).
  • All required span attributes emitted (verified via 1 % traffic sample on stg).
  • PagerDuty service analytics-service-<region> configured with on-call rotation.
  • Alerts armed: WidgetQueryErrorRate, CriticalMetricStale, ETLJobFailed, DQCriticalAlert, BigQueryByteBudget80/100, ForecastWritebackFail, PubSubSinkLag.
  • Synthetic checks running (healthz, widget probe, ETL probe, Looker mint).
  • SLO definitions (99.9 / 99.5 / 5-min freshness) wired to error-budget burn alerts.

5. Resilience & performance

  • FAILURE_MODES runbooks present and linked from PagerDuty alerts.
  • Retries + circuit breakers configured for BigQuery and orchestrator clients.
  • DLQ topic analytics.dlq provisioned; replayer tool tested.
  • Load test (TESTING_STRATEGY §6) targets met in stg with 30 % headroom.
  • Chaos game day completed within last 90 days.
  • Cost guardrails: byte budgets, snapshot auto-pause, slot reservation autoscale ceiling.

6. Deployment & ops

  • DEPLOYMENT_TOPOLOGY terraform applied per region with peer review.
  • Cloud Workflows + Composer DAGs deployed; per-job timeouts and retries set.
  • Cloud Run revisions canary-promoted with 30 min observation in stg → prod.
  • Rollback procedure rehearsed (revision flip + Workflow rollback + Terraform plan/apply).
  • Disaster recovery: PITR verified, BQ snapshot restore rehearsed, RTO/RPO documented in DEPLOYMENT_TOPOLOGY §8.

7. Documentation & process

  • LOCAL_DEV_SETUP verified end-to-end by a developer not on the squad.
  • SYNC_CONTRACT reviewed by Electron desktop squad; conflict policies documented per aggregate.
  • AI_INTEGRATION reviewed by AI squad; HITL and off-switch wired.
  • SERVICE_RISK_REGISTER reviewed and signed by tenant ops, security, finance.
  • MIGRATION_PLAN feature flags wired and reversible.
  • On-call rotation defined; primary + secondary identified; runbooks linked.

  • Per-region residency configuration verified (SECURITY_MODEL §10).
  • Right-to-erasure path (tenant.deleted.v1 cascade) verified end-to-end.
  • PII inventory for raw events updated in docs/07 §11.
  • Data Processing Agreement (DPA) addendum reviewed for Looker Studio re-share.
  • Audit anchor inclusion verified (audit events appear in daily Merkle root).

9. AI capability gates

  • All AI capabilities default-off per tenant.
  • Off-switch tested (server-side enforcement).
  • Budget guardrails tested; budget exhaustion path graceful.
  • AI provenance recorded for every AI-affected output (AIProvenance non-null).
  • Forecast writeback validates tenant per row.

10. Sign-offs

RoleNameDate
Service owner
Architecture
Security
Data & analytics
AI
Tenant ops
Finance (cost)
On-call lead

A region is "tenant-ready" only when every checkbox is [x] and all sign-offs collected. Any unchecked box requires a documented exception with owner + expiry.

Cross-references: SERVICE_RISK_REGISTER, MIGRATION_PLAN, DEPLOYMENT_TOPOLOGY.