SERVICE_READINESS — analytics-service
Sibling: SERVICE_OVERVIEW · DEPLOYMENT_TOPOLOGY · platform anchor: docs/standards/SERVICE_TEMPLATE
Go/no-go checklist for analytics-service. Each item must be [x] before tenants are routed to this service in a region.
1. Architecture & contracts
- SERVICE_OVERVIEW reviewed by domain lead and architecture council.
- DOMAIN_MODEL invariants implemented and unit-tested.
- APPLICATION_LOGIC ports/adapters separation verified (
grep -R "process.env" src/outsideinfrastructure/configreturns 0 hits). - API_CONTRACTS match generated OpenAPI; Pact contracts with
bff-backoffice-serviceverified for all routes. - EVENT_SCHEMAS for all published events registered in
@melmastoon/event-contracts. - Frozen SQL for all
publishedmetric definitions exists ininfra/bigquery/metrics/*.sql.
2. Storage & data
- Postgres migrations applied (
pnpm migrate:statusclean) and schema-drift CI green. - BigQuery curated DDL applied via Terraform;
_schema_versionset per table. - All metadata tables have
<table>_tenant_isolationRLS policy + tenant-isolation integration test green. - Authorized views created in
tenant_views.*for every curated table;SESSION_USER_TENANT_ID()UDF deployed. - CMEK keys provisioned in target region with rotation policy; deny-by-default IAM verified.
-
tenant_views.access_bindingsreconciliation job scheduled and tested. - Per-tenant byte budget defaults set; admin override path tested.
3. Security
- SECURITY_MODEL reviewed by security squad.
- JWT validation (BFF + internal) enforced on all routes.
- Cross-tenant integration test (
test/integration/tenant-isolation.spec.ts) green. - Saved-query parameter binding test green (no string concatenation).
- Looker Studio embed signing key provisioned in KMS; never exported.
- PII redaction tests for logs green.
-
gitleaksclean on the release commit.
4. Observability
- OBSERVABILITY dashboards deployed (Service Health, Pipeline Freshness, Query Economics, Data Quality, AI Usage, Tenant View).
- All required span attributes emitted (verified via 1 % traffic sample on stg).
- PagerDuty service
analytics-service-<region>configured with on-call rotation. - Alerts armed: WidgetQueryErrorRate, CriticalMetricStale, ETLJobFailed, DQCriticalAlert, BigQueryByteBudget80/100, ForecastWritebackFail, PubSubSinkLag.
- Synthetic checks running (healthz, widget probe, ETL probe, Looker mint).
- SLO definitions (99.9 / 99.5 / 5-min freshness) wired to error-budget burn alerts.
5. Resilience & performance
- FAILURE_MODES runbooks present and linked from PagerDuty alerts.
- Retries + circuit breakers configured for BigQuery and orchestrator clients.
- DLQ topic
analytics.dlqprovisioned; replayer tool tested. - Load test (TESTING_STRATEGY §6) targets met in stg with 30 % headroom.
- Chaos game day completed within last 90 days.
- Cost guardrails: byte budgets, snapshot auto-pause, slot reservation autoscale ceiling.
6. Deployment & ops
- DEPLOYMENT_TOPOLOGY terraform applied per region with peer review.
- Cloud Workflows + Composer DAGs deployed; per-job timeouts and retries set.
- Cloud Run revisions canary-promoted with 30 min observation in stg → prod.
- Rollback procedure rehearsed (revision flip + Workflow rollback + Terraform plan/apply).
- Disaster recovery: PITR verified, BQ snapshot restore rehearsed, RTO/RPO documented in DEPLOYMENT_TOPOLOGY §8.
7. Documentation & process
- LOCAL_DEV_SETUP verified end-to-end by a developer not on the squad.
- SYNC_CONTRACT reviewed by Electron desktop squad; conflict policies documented per aggregate.
- AI_INTEGRATION reviewed by AI squad; HITL and off-switch wired.
- SERVICE_RISK_REGISTER reviewed and signed by tenant ops, security, finance.
- MIGRATION_PLAN feature flags wired and reversible.
- On-call rotation defined; primary + secondary identified; runbooks linked.
8. Compliance & legal
- Per-region residency configuration verified (SECURITY_MODEL §10).
- Right-to-erasure path (
tenant.deleted.v1cascade) verified end-to-end. - PII inventory for raw events updated in docs/07 §11.
- Data Processing Agreement (DPA) addendum reviewed for Looker Studio re-share.
- Audit anchor inclusion verified (audit events appear in daily Merkle root).
9. AI capability gates
- All AI capabilities default-off per tenant.
- Off-switch tested (server-side enforcement).
- Budget guardrails tested; budget exhaustion path graceful.
- AI provenance recorded for every AI-affected output (
AIProvenancenon-null). - Forecast writeback validates tenant per row.
10. Sign-offs
| Role | Name | Date |
|---|---|---|
| Service owner | ||
| Architecture | ||
| Security | ||
| Data & analytics | ||
| AI | ||
| Tenant ops | ||
| Finance (cost) | ||
| On-call lead |
A region is "tenant-ready" only when every checkbox is [x] and all sign-offs collected. Any unchecked box requires a documented exception with owner + expiry.
Cross-references: SERVICE_RISK_REGISTER, MIGRATION_PLAN, DEPLOYMENT_TOPOLOGY.