TESTING_STRATEGY — analytics-service
Sibling: APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL
Trophy mix: heavy on integration + contract because the surface area is BigQuery + Pub/Sub + Postgres + curated tables.
1. Test pyramid
| Layer | Goal | Coverage target |
|---|---|---|
| Unit (Vitest) | Domain rules, value objects, SQL builder, byte-cap validators | 90 %+ statements on domain/, 80 %+ overall |
| Contract (@melmastoon/event-contracts, OpenAPI, Pact) | Event payloads, REST DTOs, BFF consumer-driven contracts | 100 % published events |
| Integration (Testcontainers + BigQuery emulator) | Real Postgres + Pub/Sub emulator + BigQuery emulator (or sandbox project) | All use cases × happy + 1 unhappy |
| E2E (Playwright + sync-service stub) | Backoffice widget render + Looker probe | Top 10 widgets |
| Load (k6) | Widget query & ETL throughput | SLO budget headroom |
| Chaos (Gremlin/Litmus) | Pub/Sub backlog, BigQuery 503, Postgres failover | Quarterly game day |
2. Unit tests (must)
- domain/MetricDefinition.spec.ts: invariants (required dimensions exist; SQL template includes @tenant_id; archive transitions).
- domain/Projection.spec.ts: partitioning rule; _schema_version required; status FSM.
- domain/Dashboard.spec.ts: widget add/remove invariants; layout overlap check; OCC bumps revision.
- domain/Widget.spec.ts: display config matches metric output type.
- domain/Query.spec.ts: saved query parameter binding only; no concatenation.
- application/ByteCap.spec.ts: dry-run cap enforcement, daily-budget rollover.
- application/IdempotencyKey.spec.ts: same key returns same response within 24 h.
- infrastructure/SqlBuilder.spec.ts: every generated SQL includes tenant_id predicate; param-only binding.
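As an illustration of the IdempotencyKey rule, here is a minimal sketch of what such a spec could exercise. `IdempotencyStore`, `Clock`, and the in-memory map are hypothetical names and shapes, not the service's actual implementation; the point is only that an injected clock lets the 24 h window be tested without sleeping.

```typescript
// Hypothetical in-memory store; names and shape are illustrative only.
type Clock = () => number; // returns epoch milliseconds

const TTL_MS = 24 * 60 * 60 * 1000; // the 24 h window named in the spec list

class IdempotencyStore<T> {
  private readonly entries = new Map<string, { response: T; storedAt: number }>();
  private readonly now: Clock;

  constructor(now: Clock) {
    this.now = now;
  }

  // Runs `handler` at most once per key while the stored entry is fresh.
  execute(key: string, handler: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() - hit.storedAt < TTL_MS) {
      return hit.response; // replay: same key, same response
    }
    const response = handler();
    this.entries.set(key, { response, storedAt: this.now() });
    return response;
  }
}

// Fake clock so the spec controls time instead of sleeping.
let fakeNow = 0;
const idempotencyStore = new IdempotencyStore<number>(() => fakeNow);
let handlerCalls = 0;
const countingHandler = (): number => {
  handlerCalls += 1;
  return handlerCalls;
};

const firstResponse = idempotencyStore.execute("key-1", countingHandler); // handler runs
fakeNow = TTL_MS - 1;
const replayedResponse = idempotencyStore.execute("key-1", countingHandler); // served from the store
fakeNow = TTL_MS;
const responseAfterExpiry = idempotencyStore.execute("key-1", countingHandler); // window elapsed
```

In a Vitest spec the three results would be asserted directly; the last call shows the handler running again once the window has elapsed.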
Property-based (fast-check):
- Dimension intersection with propertyAccess[] always returns a subset.
- Dashboard add/remove n widgets is order-preserving.
- DQ check arithmetic (freshness, row-count threshold) over random ETL run timelines.
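The subset property above can be sketched without any library. The real suite uses fast-check; the hand-rolled seeded loop below only stands in for it so the example has no dependencies. `intersectDimensions` and the dimension pool are hypothetical.

```typescript
// Mulberry32-style seeded PRNG so any failing run can be replayed from its seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical function under test: the dimensions a caller is allowed to see.
function intersectDimensions(requested: string[], propertyAccess: string[]): string[] {
  const allowed = new Set(propertyAccess);
  return requested.filter((d) => allowed.has(d));
}

// Illustrative dimension names, not taken from the data model.
const DIMENSION_POOL = ["region", "property", "channel", "room_type", "rate_plan", "day"];

function randomSubset(rand: () => number): string[] {
  return DIMENSION_POOL.filter(() => rand() < 0.5);
}

// Property: every returned dimension is in both inputs. Returns the number of runs checked.
function checkSubsetProperty(runs: number, seed: number): number {
  const rand = seededRandom(seed);
  for (let i = 0; i < runs; i += 1) {
    const requested = randomSubset(rand);
    const access = randomSubset(rand);
    const result = intersectDimensions(requested, access);
    const ok = result.every((d) => requested.includes(d) && access.includes(d));
    if (!ok) {
      throw new Error(`counterexample (seed ${seed}, run ${i}): ${JSON.stringify({ requested, access, result })}`);
    }
  }
  return runs;
}

const checkedRuns = checkSubsetProperty(500, 42);
```

With fast-check the generator and shrinking come for free; the seed-plus-counterexample message here mimics the part that matters for debugging.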
3. Contract tests
3.1 Event contracts
- For every published event, validate fixtures against JSON schema in @melmastoon/event-contracts/analytics.
- Backwards-compat fixtures live under test/contracts/events/v1/ and are loaded into v2 deserializers.
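A backwards-compat check of this kind might look like the sketch below: a v1 fixture is fed to a v2 deserializer, which must accept it and default any field v1 did not carry. Every field name, the `unit` default, and the fixture contents are assumptions for illustration; the authoritative schema lives in @melmastoon/event-contracts/analytics.

```typescript
// All field names are hypothetical; the real schema is defined in the contracts package.
interface MetricComputedV2 {
  eventId: string;
  tenantId: string;
  metric: string;
  value: number;
  unit: string; // assumed to be a v2-only field
}

function deserializeMetricComputedV2(raw: unknown): MetricComputedV2 {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be an object");
  }
  const r = raw as Record<string, unknown>;
  const str = (field: string): string => {
    const v = r[field];
    if (typeof v !== "string") throw new Error(`missing string field: ${field}`);
    return v;
  };
  const value = r["value"];
  if (typeof value !== "number") throw new Error("missing numeric field: value");
  const unit = r["unit"];
  return {
    eventId: str("eventId"),
    tenantId: str("tenantId"),
    metric: str("metric"),
    value,
    // v1 payloads predate `unit`, so the v2 reader supplies a default.
    unit: typeof unit === "string" ? unit : "count",
  };
}

// Stand-in for a file under test/contracts/events/v1/.
const v1Fixture: unknown = JSON.parse(
  '{"eventId":"evt-1","tenantId":"tnt_a","metric":"occupancy","value":0.82}'
);
const upgraded = deserializeMetricComputedV2(v1Fixture);
```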
3.2 REST contracts
- OpenAPI (spec/openapi.yaml) generated from NestJS controllers; CI fails on drift.
- Pact tests with bff-backoffice-service for POST /widgets/:id:run, POST /queries:run, POST /dashboards. Pact broker version-locks during release.
3.3 BigQuery contracts
- Curated tables ship with _schema_version. Schema drift detector runs nightly comparing live table schema to infra/bigquery/<table>.json.
- Frozen SQL fixtures (test/sql/<metric>.sql.snap) checked in; mutation requires explicit review (these are the metric definitions).
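The drift comparison could be as simple as a field-by-field diff between the checked-in schema and the live one. The sketch below assumes a flat list of `{ name, type, mode }` fields, which is a simplification (nested RECORD fields are ignored), and all column names are made up.

```typescript
// Simplified, hypothetical field shape; nested RECORD fields are not handled.
interface FieldDef {
  name: string;
  type: string;
  mode: "NULLABLE" | "REQUIRED" | "REPEATED";
}

// Returns one human-readable line per difference; an empty array means no drift.
function diffSchema(expected: FieldDef[], live: FieldDef[]): string[] {
  const drift: string[] = [];
  const liveByName = new Map<string, FieldDef>();
  for (const f of live) liveByName.set(f.name, f);

  for (const want of expected) {
    const got = liveByName.get(want.name);
    if (got === undefined) {
      drift.push(`missing column: ${want.name}`);
      continue;
    }
    if (got.type !== want.type) drift.push(`type changed: ${want.name} ${want.type} -> ${got.type}`);
    if (got.mode !== want.mode) drift.push(`mode changed: ${want.name} ${want.mode} -> ${got.mode}`);
    liveByName.delete(want.name);
  }
  // Anything left in the map exists live but not in the checked-in file.
  liveByName.forEach((_f, name) => drift.push(`unexpected column: ${name}`));
  return drift;
}

const checkedIn: FieldDef[] = [
  { name: "tenant_id", type: "STRING", mode: "REQUIRED" },
  { name: "_schema_version", type: "INT64", mode: "REQUIRED" },
  { name: "value", type: "FLOAT64", mode: "NULLABLE" },
];
const liveTable: FieldDef[] = [
  { name: "tenant_id", type: "STRING", mode: "REQUIRED" },
  { name: "value", type: "NUMERIC", mode: "NULLABLE" },
  { name: "debug_note", type: "STRING", mode: "NULLABLE" },
];
const driftReport = diffSchema(checkedIn, liveTable);
```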
4. Integration tests (mandatory three)
test/integration/tenant-isolation.spec.ts — two tenants, A's JWT cannot read B's dashboards/widgets/queries; cross-tenant widget:run returns zero rows even when SQL injection is attempted via params.
test/integration/outbox.spec.ts — publish at-least-once guarantee under simulated Pub/Sub failure: every committed write eventually appears on the topic with stable eventId.
test/integration/inbox.spec.ts — duplicate Pub/Sub deliveries (tenant.deleted.v1, ai.forecast.produced.v1) are processed exactly once; replays after restart converge.
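The exactly-once behaviour the inbox suite checks can be illustrated with an in-memory stand-in. This is a sketch under assumptions: the `Inbox` class is hypothetical, and a real inbox would presumably record the eventId in the same Postgres transaction as the side effect rather than in a `Set`.

```typescript
// In-memory stand-in for an inbox table keyed by eventId (an assumption).
class Inbox {
  private readonly processed = new Set<string>();

  // Returns true when the handler ran, false when the delivery was a duplicate.
  handle(eventId: string, apply: () => void): boolean {
    if (this.processed.has(eventId)) return false;
    apply();
    // Recorded only after success, so a throwing handler is retried on redelivery.
    this.processed.add(eventId);
    return true;
  }
}

const inbox = new Inbox();
let purgeCount = 0;

// The same tenant.deleted.v1 message delivered three times.
const deliveries = ["evt-42", "evt-42", "evt-42"];
const outcomes = deliveries.map((id) =>
  inbox.handle(id, () => {
    purgeCount += 1;
  })
);
```

The integration suite would make the same assertion against the real Pub/Sub emulator and Postgres: three deliveries, one applied side effect.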
Additional integration suites:
- ETL idempotency. Re-running a job for the same window does not double-count; MERGE keys verified.
- Authorized view enforcement. Synthetic Looker token for tenant A cannot query tenant B's curated rows even when the view is queried directly.
- Forecast writeback. Valid envelope merges; tenant mismatch raises ANALYTICS.FORECAST_INVALID_TENANT; partial batch failure surfaces row-level errors.
- Cascade purge. tenant.deleted.v1 removes all metadata + curated rows; verifier query returns 0 rows for tenant.
- Saved query parameter binding. Crafted SQL injection in params is blocked at parser layer.
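For the ETL idempotency suite, the property being verified is that a keyed merge is safe to repeat. The sketch below models a MERGE as an upsert into a map; the key columns (`tenantId`, `metric`, `windowStart`) and row shape are assumptions, not the actual MERGE keys.

```typescript
// Hypothetical curated row; the real MERGE keys are defined by the ETL jobs.
interface CuratedRow {
  tenantId: string;
  metric: string;
  windowStart: string; // ISO timestamp of the aggregation window
  value: number;
}

const mergeKey = (r: CuratedRow): string => [r.tenantId, r.metric, r.windowStart].join("|");

// Models MERGE semantics: matched key overwrites, unmatched key inserts.
function mergeBatch(table: Map<string, CuratedRow>, batch: CuratedRow[]): void {
  for (const row of batch) {
    table.set(mergeKey(row), row);
  }
}

function totalValue(table: Map<string, CuratedRow>): number {
  let sum = 0;
  table.forEach((row) => {
    sum += row.value;
  });
  return sum;
}

const curated = new Map<string, CuratedRow>();
const hourlyBatch: CuratedRow[] = [
  { tenantId: "tnt_a", metric: "bookings", windowStart: "2024-01-01T10:00:00Z", value: 10 },
  { tenantId: "tnt_a", metric: "bookings", windowStart: "2024-01-01T11:00:00Z", value: 5 },
];

mergeBatch(curated, hourlyBatch);
const totalAfterFirstRun = totalValue(curated);
mergeBatch(curated, hourlyBatch); // re-run for the same window
const totalAfterRerun = totalValue(curated);
```

An append-only insert would double the total on the second run; the keyed merge leaves it unchanged, which is what the suite asserts against BigQuery.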
Testcontainers boots Postgres (with RLS), the Pub/Sub emulator, and a GCS emulator. BigQuery uses the dedicated melmastoon-test-bq sandbox project, with per-PR datasets cleaned up via TTL.
5. Saga participation tests
- Forecast pipeline. ETL completion → metric.computed.v1 → orchestrator stub → ai.forecast.produced.v1 → curated row visible. Failure injection in each hop verifies retry/DLQ behavior.
- Tenant lifecycle. tenant.deleted.v1 triggers purge across metadata, curated tables, and authorized view bindings; idempotent re-delivery handled.
6. Performance / load
k6 scenarios in test/load/:
| Scenario | Target |
|---|---|
| Widget query (cached) | 200 RPS, p95 ≤ 200 ms |
| Widget query (uncached) | 50 RPS, p95 ≤ 4 s, byte cap respected |
| ETL hourly batch | 10 M source rows in ≤ 8 min |
| Pub/Sub sink | 5 k events/s sustained, lag p95 ≤ 60 s |
| Forecast writeback | 200 k rows in ≤ 2 min |
Run nightly in stg; results posted to #perf-analytics.
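k6 can enforce p95 targets itself through its thresholds option. For post-processing raw latency samples (for example before posting a summary), the same check can be sketched as below. The nearest-rank percentile definition and the `meetsP95Target` helper are assumptions for illustration, not part of the existing scenarios.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p % of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical helper: does a run meet a "p95 <= limit" target from the table?
function meetsP95Target(latenciesMs: number[], limitMs: number): boolean {
  return percentile(latenciesMs, 95) <= limitMs;
}

// Synthetic latencies of 1..100 ms, so the nearest-rank p95 is the 95th value.
const latenciesMs = Array.from({ length: 100 }, (_unused, i) => i + 1);
const p95Ms = percentile(latenciesMs, 95);
const cachedWidgetOk = meetsP95Target(latenciesMs, 200); // cached widget target: p95 <= 200 ms
```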
7. Chaos / resilience
Quarterly game day:
- Inject 30 % 503 responses on BigQuery for 5 min: verify retry/backoff and graceful errors.
- Drop the Postgres primary: verify failover is transparent to the Pub/Sub consumer.
- Pause the Pub/Sub subscription for 15 min: verify the backlog catches up without data loss.
- Push one tenant to 200 % of its byte budget: verify auto-pause + alert.
- Looker Studio outage: verify graceful 503 to embed clients with cached data.
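The retry/backoff behaviour exercised by the BigQuery 503 injection can be sketched as follows. This is a simplified, synchronous illustration with an injected `sleep` so delays can be observed; the base delay, cap, attempt limit, and the `status` property on the error are all assumptions, and a real client would be asynchronous and would likely add jitter.

```typescript
// Treats only errors carrying status 503 as retryable (assumed error shape).
function isRetryable(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 503;
}

// Exponential backoff: base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

function withRetry<T>(op: () => T, maxAttempts: number, sleep: (ms: number) => void): T {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return op();
    } catch (err) {
      // Give up on non-retryable errors or when attempts are exhausted.
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      sleep(backoffDelayMs(attempt, 100, 5000));
    }
  }
}

// Fault injection: the first two calls fail with 503, the third succeeds.
const observedDelays: number[] = [];
let bigQueryCalls = 0;
const flakyQuery = (): string => {
  bigQueryCalls += 1;
  if (bigQueryCalls <= 2) {
    throw Object.assign(new Error("BigQuery unavailable"), { status: 503 });
  }
  return "rows";
};

const queryResult = withRetry(flakyQuery, 5, (ms) => {
  observedDelays.push(ms);
});
```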
8. Static checks & linting
- eslint with @melmastoon/eslint-config-service (forbids process.env.* outside config/, forbids console.*, requires explicit return types on exported functions).
- tsc --noEmit in strict mode.
- melmastoon-sql-lint: custom rule; every BigQuery template must include tenant_id in the WHERE clause and bind via @tenant_id; CI breaks otherwise.
- gitleaks for secret scanning.
- npm audit --omit=dev + Snyk on PR.
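The tenant_id rule can be approximated with a regular expression, as in the sketch below. This is not how melmastoon-sql-lint is known to work; a regex can be fooled by comments or string literals, so a production rule would more plausibly parse the SQL. The table and column names in the examples are made up.

```typescript
// Naive check: a WHERE clause exists and somewhere after it tenant_id is compared to @tenant_id.
const TENANT_PREDICATE = /\bwhere\b[\s\S]*\btenant_id\b\s*=\s*@tenant_id\b/i;

// Returns a list of problems; an empty list means the template passes.
function lintTemplate(sql: string): string[] {
  const problems: string[] = [];
  if (!TENANT_PREDICATE.test(sql)) {
    problems.push("WHERE clause must filter tenant_id via @tenant_id");
  }
  return problems;
}

const scopedTemplate =
  "SELECT day, SUM(revenue) FROM curated.bookings WHERE tenant_id = @tenant_id GROUP BY day";
const literalTenant = "SELECT * FROM curated.bookings WHERE tenant_id = 'tnt_a'";
const noWhereClause = "SELECT * FROM curated.bookings";

const lintResults = [scopedTemplate, literalTenant, noWhereClause].map(
  (sql) => lintTemplate(sql).length
);
```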
9. Test data & fixtures
- Synthetic tenant tnt_synthetic_<region> with golden curated rows (~50 k) for stable Playwright runs.
- Faker-backed event generators (test/factories/events.ts) emit valid envelopes with deterministic seeds.
- BigQuery sandbox dataset auto-seeded from test/fixtures/bq/*.jsonl per PR run.
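Deterministic seeding is what makes the generated envelopes reproducible across runs. The sketch below swaps Faker for a small hand-rolled linear congruential generator so it stays dependency-free; the envelope fields, the `eu` region suffix, and the fixed base timestamp are assumptions, not the contents of test/factories/events.ts.

```typescript
// Hypothetical envelope shape for illustration only.
interface EventEnvelope {
  eventId: string;
  type: string;
  tenantId: string;
  occurredAt: string;
  payload: { value: number };
}

// Linear congruential generator: same seed, same sequence.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Fixed base time so timestamps do not depend on the wall clock.
const BASE_EPOCH_MS = Date.UTC(2024, 0, 1);

function makeEvents(seed: number, count: number): EventEnvelope[] {
  const rand = lcg(seed);
  const events: EventEnvelope[] = [];
  for (let i = 0; i < count; i += 1) {
    const hex = Math.floor(rand() * 0xffffffff).toString(16);
    events.push({
      eventId: "evt-" + ("00000000" + hex).slice(-8),
      type: "metric.computed.v1",
      tenantId: "tnt_synthetic_eu", // region suffix is an example value
      occurredAt: new Date(BASE_EPOCH_MS + i * 60000).toISOString(),
      payload: { value: Math.round(rand() * 1000) / 10 },
    });
  }
  return events;
}

const runA = JSON.stringify(makeEvents(7, 3));
const runB = JSON.stringify(makeEvents(7, 3));
const runOtherSeed = JSON.stringify(makeEvents(8, 3));
```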
10. CI gates
| Stage | Gate |
|---|---|
| PR | unit + contract + integration + lint + audit + schema-drift |
| Merge to develop | + load smoke (10 % traffic) |
| Release candidate | + chaos smoke + Pact verification + DQ replay |
| Prod deploy | health gate + canary metrics for 30 min before 100 % |
Coverage uploaded to Codecov; budget = 80 % overall, 90 % domain/. Drops > 1 % block PRs.
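The coverage rule combines an absolute budget with a relative drop limit. Codecov evaluates this from its own configuration; the sketch below only spells out the logic, and it assumes "Drops > 1 %" means percentage points against the base branch.

```typescript
// Statement coverage in percent for the two tracked scopes.
interface Coverage {
  overall: number;
  domain: number;
}

const BUDGET: Coverage = { overall: 80, domain: 90 };
const MAX_DROP_POINTS = 1; // assumed reading of "Drops > 1 %": percentage points

// Returns the reasons a PR would be blocked; an empty list means the gate passes.
function coverageGate(base: Coverage, head: Coverage): string[] {
  const failures: string[] = [];
  const scopes: Array<keyof Coverage> = ["overall", "domain"];
  for (const scope of scopes) {
    if (head[scope] < BUDGET[scope]) {
      failures.push(`${scope} below budget: ${head[scope]} < ${BUDGET[scope]}`);
    }
    if (base[scope] - head[scope] > MAX_DROP_POINTS) {
      failures.push(`${scope} dropped by more than ${MAX_DROP_POINTS} point(s)`);
    }
  }
  return failures;
}

const baseCoverage: Coverage = { overall: 85, domain: 93 };
const smallDip = coverageGate(baseCoverage, { overall: 84.5, domain: 92.5 });
const largeDrop = coverageGate(baseCoverage, { overall: 83.5, domain: 93 });
const underBudget = coverageGate(baseCoverage, { overall: 79, domain: 93 });
```

Note that the second case blocks the PR even though 83.5 % is still above the 80 % budget, because the drop exceeds one point.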
Cross-references: APPLICATION_LOGIC, DATA_MODEL §3 RLS, DEPLOYMENT_TOPOLOGY.