
TESTING_STRATEGY — analytics-service

Sibling: APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL

Trophy mix: weighted toward integration and contract tests, because most of the surface area is BigQuery, Pub/Sub, Postgres, and curated tables.


1. Test pyramid

Layer | Goal | Coverage target
--- | --- | ---
Unit (Vitest) | Domain rules, value objects, SQL builder, byte-cap validators | 90 %+ statements on domain/, 80 %+ overall
Contract (@melmastoon/event-contracts, OpenAPI, Pact) | Event payloads, REST DTOs, BFF consumer-driven contracts | 100 % of published events
Integration (Testcontainers + BigQuery emulator) | Real Postgres + Pub/Sub emulator + BigQuery emulator (or sandbox project) | Every use case: happy path + at least 1 unhappy path
E2E (Playwright + sync-service stub) | Backoffice widget render + Looker probe | Top 10 widgets
Load (k6) | Widget query & ETL throughput | SLO budget headroom
Chaos (Gremlin/Litmus) | Pub/Sub backlog, BigQuery 503, Postgres failover | Quarterly game day

2. Unit tests (must)

  • domain/MetricDefinition.spec.ts — invariants: required dimensions exist; SQL template includes @tenant_id; archive transitions.
  • domain/Projection.spec.ts — partitioning rule; _schema_version required; status FSM.
  • domain/Dashboard.spec.ts — widget add/remove invariants; layout overlap check; OCC bumps revision.
  • domain/Widget.spec.ts — display config matches metric output type.
  • domain/Query.spec.ts — saved queries bind parameters only; no string concatenation.
  • application/ByteCap.spec.ts — dry-run cap enforcement, daily-budget rollover.
  • application/IdempotencyKey.spec.ts — same key returns same response within 24 h.
  • infrastructure/SqlBuilder.spec.ts — every generated SQL statement includes a tenant_id predicate; values are bound as parameters only.
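The 24 h replay rule in application/IdempotencyKey.spec.ts can be illustrated with a minimal in-memory store. This is a sketch under assumptions: `IdempotencyStore`, the injected clock, and per-tenant key scoping are hypothetical, and the service presumably persists keys in Postgres rather than in a `Map`.

```typescript
// Hypothetical in-memory idempotency store illustrating the
// "same key returns same response within 24 h" rule.
type Clock = () => number; // epoch milliseconds, injected for deterministic tests

interface StoredResult<T> {
  response: T;
  storedAt: number;
}

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000;

class IdempotencyStore<T> {
  // Keys are scoped per tenant (assumption) so two tenants cannot collide.
  private readonly entries = new Map<string, StoredResult<T>>();

  constructor(private readonly now: Clock) {}

  // Runs `handler` only when no live entry exists for (tenantId, key);
  // otherwise replays the stored response without re-running the handler.
  execute(tenantId: string, key: string, handler: () => T): T {
    const scoped = `${tenantId}:${key}`;
    const hit = this.entries.get(scoped);
    if (hit !== undefined && this.now() - hit.storedAt < IDEMPOTENCY_TTL_MS) {
      return hit.response;
    }
    const response = handler();
    this.entries.set(scoped, { response, storedAt: this.now() });
    return response;
  }
}
```

Injecting the clock keeps the spec deterministic: the test can jump to 23 h and 24 h without sleeping.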

Property-based (fast-check):

  • Intersecting a dimension filter with propertyAccess[] always returns a subset of propertyAccess[].
  • Adding or removing n widgets on a dashboard preserves the relative order of the remaining widgets.
  • DQ check arithmetic (freshness, row-count threshold) over random ETL run timelines.
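The subset property can be checked without dependencies using a seeded generator, which is roughly what a fast-check property does minus shrinking. A sketch under assumptions: `intersectWithPropertyAccess` is a hypothetical helper, it assumes the intersection is between a requested property_id filter and the caller's propertyAccess[], and the real suite would generate inputs with fast-check instead of the hand-rolled mulberry32 PRNG used here.

```typescript
// Hypothetical domain helper: keep only the requested property IDs that the
// caller may see via propertyAccess[]. Order follows the request.
function intersectWithPropertyAccess(requested: string[], propertyAccess: string[]): string[] {
  const allowed = new Set(propertyAccess);
  return requested.filter((id) => allowed.has(id));
}

// mulberry32: tiny seeded PRNG so any failure is reproducible from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const PROPERTY_POOL = ["prop_1", "prop_2", "prop_3", "prop_4", "prop_5", "prop_6"];

function randomSubset(rand: () => number): string[] {
  return PROPERTY_POOL.filter(() => rand() < 0.5);
}

// Property: for random inputs, every returned ID is in propertyAccess[]
// (and in the request). Throws with the seed and run index on failure.
function checkIntersectionIsSubset(runs: number, seed: number): void {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const requested = randomSubset(rand);
    const access = randomSubset(rand);
    for (const id of intersectWithPropertyAccess(requested, access)) {
      if (!access.includes(id) || !requested.includes(id)) {
        throw new Error(`seed ${seed}, run ${i}: ${id} escaped propertyAccess[]`);
      }
    }
  }
}
```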

3. Contract tests

3.1 Event contracts

  • For every published event, validate fixtures against JSON schema in @melmastoon/event-contracts/analytics.
  • Backwards-compat fixtures live under test/contracts/events/v1/ and are loaded into v2 deserializers.
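A minimal sketch of a v2 deserializer that also accepts v1 fixtures. The payload fields, the `unit` field assumed to be added in v2, and its `"count"` default are all hypothetical; the real shapes live in @melmastoon/event-contracts/analytics.

```typescript
// Hypothetical v2 shape of a metric.computed event; v1 is assumed to be the
// same minus `unit`. Real schemas live in @melmastoon/event-contracts/analytics.
interface MetricComputedV2 {
  schemaVersion: 2;
  eventId: string;
  tenantId: string;
  metricId: string;
  value: number;
  unit: string;
}

// Untrusted JSON: every field is unknown until validated.
type RawEnvelope = Partial<
  Record<"schemaVersion" | "eventId" | "tenantId" | "metricId" | "value" | "unit", unknown>
>;

function requireString(data: RawEnvelope, field: "eventId" | "tenantId" | "metricId"): string {
  const v = data[field];
  if (typeof v !== "string" || v === "") throw new Error(`missing ${field}`);
  return v;
}

// v2 reader that must accept every v1 fixture by upcasting it.
function deserializeMetricComputed(raw: string): MetricComputedV2 {
  const data = JSON.parse(raw) as RawEnvelope;
  const base = {
    eventId: requireString(data, "eventId"),
    tenantId: requireString(data, "tenantId"),
    metricId: requireString(data, "metricId"),
  };
  if (typeof data.value !== "number") throw new Error("missing value");
  if (data.schemaVersion === 1) {
    // v1 had no unit: fill an explicit default (assumption).
    return { schemaVersion: 2, ...base, value: data.value, unit: "count" };
  }
  if (data.schemaVersion === 2 && typeof data.unit === "string") {
    return { schemaVersion: 2, ...base, value: data.value, unit: data.unit };
  }
  throw new Error(`unsupported schemaVersion ${String(data.schemaVersion)}`);
}
```

Feeding every file under test/contracts/events/v1/ through the v2 reader turns backward compatibility into a plain assertion: no fixture may throw, and upcast defaults are explicit rather than accidental.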

3.2 REST contracts

  • OpenAPI (spec/openapi.yaml) generated from NestJS controllers; CI fails on drift.
  • Pact tests with bff-backoffice-service cover POST /widgets/:id:run, POST /queries:run, and POST /dashboards. The Pact broker version-locks contracts during a release.

3.3 BigQuery contracts

  • Curated tables ship with _schema_version. A schema-drift detector runs nightly, comparing each live table schema with infra/bigquery/<table>.json.
  • Frozen SQL fixtures (test/sql/<metric>.sql.snap) checked in; mutation requires explicit review (these are the metric definitions).
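The nightly drift check reduces to a recursive diff of two field lists. A sketch under assumptions: field objects follow the BigQuery JSON schema layout (name, type, mode, fields), `diffSchemas` and the drift categories are hypothetical, and fetching the live schema is left out.

```typescript
// Field shape follows the BigQuery JSON schema format (name/type/mode/fields).
interface BqField {
  name: string;
  type: string;
  mode?: string; // NULLABLE when omitted, REQUIRED, or REPEATED
  fields?: BqField[];
}

type Drift =
  | { kind: "missing"; path: string }
  | { kind: "unexpected"; path: string }
  | { kind: "changed"; path: string; expected: string; live: string };

// GoogleSQL names mapped to the legacy names the schema format also accepts.
const TYPE_ALIASES: Partial<Record<string, string>> = {
  INT64: "INTEGER",
  FLOAT64: "FLOAT",
  BOOL: "BOOLEAN",
  STRUCT: "RECORD",
};

function signature(f: BqField): string {
  const t = f.type.toUpperCase();
  return `${TYPE_ALIASES[t] ?? t}/${(f.mode ?? "NULLABLE").toUpperCase()}`;
}

// Hypothetical drift detector: compares the checked-in schema (expected)
// with the live table schema, recursing into RECORD fields.
function diffSchemas(expected: BqField[], live: BqField[], prefix = ""): Drift[] {
  const drifts: Drift[] = [];
  // BigQuery column names are case-insensitive.
  const liveByName = new Map(live.map((f) => [f.name.toLowerCase(), f] as const));
  const seen = new Set<string>();
  for (const exp of expected) {
    const key = exp.name.toLowerCase();
    const path = prefix + exp.name;
    seen.add(key);
    const act = liveByName.get(key);
    if (act === undefined) {
      drifts.push({ kind: "missing", path });
      continue;
    }
    if (signature(exp) !== signature(act)) {
      drifts.push({ kind: "changed", path, expected: signature(exp), live: signature(act) });
    }
    if (exp.fields !== undefined || act.fields !== undefined) {
      drifts.push(...diffSchemas(exp.fields ?? [], act.fields ?? [], `${path}.`));
    }
  }
  for (const f of live) {
    if (!seen.has(f.name.toLowerCase())) drifts.push({ kind: "unexpected", path: prefix + f.name });
  }
  return drifts;
}
```

Names are compared case-insensitively, and type aliases (INT64/INTEGER, FLOAT64/FLOAT, BOOL/BOOLEAN, STRUCT/RECORD) are normalized so a spelling difference alone never counts as drift.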

4. Integration tests (mandatory three)

test/integration/tenant-isolation.spec.ts — two tenants, A's JWT cannot read B's dashboards/widgets/queries; cross-tenant widget:run returns zero rows even when SQL injection is attempted via params.

test/integration/outbox.spec.ts — verifies the at-least-once publish guarantee under simulated Pub/Sub failure: every committed write eventually appears on the topic with a stable eventId.
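The property under test can be modelled in memory: a relay retries pending outbox rows until the publisher accepts them, and the eventId is fixed when the row is committed, so a retry never mints a new id. A sketch under assumptions: `OutboxRelay`, the stop-at-first-failure ordering policy, and the topic name are hypothetical.

```typescript
// Hypothetical in-memory model of the transactional outbox. In the service the
// row would be written in the same Postgres transaction as the business change,
// and eventId is fixed at that point, so every retry re-sends the same eventId.
interface OutboxRow {
  eventId: string;
  topic: string;
  payload: string;
  publishedAt?: number;
}

// Throws when the (simulated) Pub/Sub publish fails.
type Publisher = (topic: string, eventId: string, payload: string) => void;

class OutboxRelay {
  constructor(private readonly rows: OutboxRow[], private readonly publish: Publisher) {}

  // One relay pass over pending rows in commit order. A failed publish leaves
  // the row pending for the next pass; stopping the pass at the first failure
  // keeps commit order (assumed policy).
  tick(now: number): number {
    let sent = 0;
    for (const row of this.rows) {
      if (row.publishedAt !== undefined) continue;
      try {
        this.publish(row.topic, row.eventId, row.payload);
      } catch {
        break;
      }
      row.publishedAt = now; // mark only after the publish call returned
      sent++;
    }
    return sent;
  }

  pending(): number {
    return this.rows.filter((r) => r.publishedAt === undefined).length;
  }
}
```

A crash between a successful publish and the `publishedAt` update makes the relay send that row again with the same eventId, which is exactly the duplicate that consumers must dedupe.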

test/integration/inbox.spec.ts — duplicate Pub/Sub deliveries (tenant.deleted.v1, ai.forecast.produced.v1) are processed exactly once; replays after restart converge.
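Exactly-once processing on top of at-least-once delivery usually comes from an inbox keyed by eventId. A sketch under assumptions: `Inbox` is hypothetical, and its in-memory `Set` stands in for a unique-keyed inbox table that, in the service, would be written in the same transaction as the side effect.

```typescript
// Hypothetical inbox keyed by eventId. The Set stands in for a unique-keyed
// inbox table; in the service the insert would share a transaction with the
// side effect, so a crash cannot record one without the other.
interface InboxMessage {
  eventId: string;
  type: string;
  tenantId: string;
}

class Inbox {
  private readonly processed = new Set<string>();

  // `persisted` simulates reloading the inbox table after a restart.
  constructor(persisted: Iterable<string> = []) {
    for (const id of persisted) this.processed.add(id);
  }

  // Returns true if the effect ran, false if the delivery was a duplicate
  // (which the consumer would still ack).
  handle(msg: InboxMessage, effect: (m: InboxMessage) => void): boolean {
    if (this.processed.has(msg.eventId)) return false;
    effect(msg);
    this.processed.add(msg.eventId);
    return true;
  }

  snapshot(): string[] {
    return [...this.processed];
  }
}
```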

Additional integration suites:

  • ETL idempotency. Re-running a job for the same window does not double-count; MERGE keys verified.
  • Authorized view enforcement. Synthetic Looker token for tenant A cannot query tenant B's curated rows even when the view is queried directly.
  • Forecast writeback. Valid envelope merges; tenant mismatch raises ANALYTICS.FORECAST_INVALID_TENANT; partial batch failure surfaces row-level errors.
  • Cascade purge. tenant.deleted.v1 removes all metadata + curated rows; verifier query returns 0 rows for tenant.
  • Saved query parameter binding. Crafted SQL injection in params is blocked at the parser layer.
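ETL idempotency holds when every write is an upsert on the MERGE key, so re-running a window overwrites rows instead of appending them. A sketch under assumptions: the (tenant, metric, window) key and `mergeInto` are hypothetical, and the `Map` models MERGE's WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT branches.

```typescript
// Hypothetical curated row; (tenantId, metricId, windowStart) is the assumed MERGE key.
interface CuratedRow {
  tenantId: string;
  metricId: string;
  windowStart: string; // ISO hour, e.g. "2024-05-01T10:00:00Z"
  value: number;
}

const mergeKey = (r: CuratedRow): string => `${r.tenantId}|${r.metricId}|${r.windowStart}`;

// Models MERGE ... WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT:
// a re-run replaces rows for its window instead of appending duplicates.
function mergeInto(target: Map<string, CuratedRow>, batch: CuratedRow[]): void {
  for (const row of batch) target.set(mergeKey(row), row);
}

function metricTotal(target: Map<string, CuratedRow>, tenantId: string, metricId: string): number {
  let sum = 0;
  for (const row of target.values()) {
    if (row.tenantId === tenantId && row.metricId === metricId) sum += row.value;
  }
  return sum;
}
```

One caveat carries over to the real statement: BigQuery rejects a MERGE in which a target row matches more than one source row, so a batch should be deduplicated on the key before it is merged.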

Testcontainers boot Postgres (with RLS), Pub/Sub emulator, GCS emulator. BigQuery uses the dedicated melmastoon-test-bq sandbox project with per-PR datasets cleaned up via TTL.


5. Saga participation tests

  • Forecast pipeline. ETL completion → metric.computed.v1 → orchestrator stub → ai.forecast.produced.v1 → curated row visible. Failure injection in each hop verifies retry/DLQ behavior.
  • Tenant lifecycle. tenant.deleted.v1 triggers purge across metadata, curated tables, and authorized view bindings; idempotent re-delivery handled.
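Failure injection per hop comes down to checking where a message ends up after repeated failures. Pub/Sub dead-letter topics receive messages that exceed a subscription's configured maximum number of delivery attempts; the sketch models that routing synchronously. `runHop`, its message shape, and the attempt counts are hypothetical.

```typescript
// Hypothetical in-memory model of one saga hop with Pub/Sub-style dead-lettering:
// a message is redelivered until it succeeds or reaches maxDeliveryAttempts,
// after which it is routed to the dead-letter queue.
interface HopMessage {
  eventId: string;
  type: string;
  deliveryAttempt: number;
}

interface HopOutcome {
  processed: string[];
  deadLettered: string[];
}

function runHop(
  messages: Array<Omit<HopMessage, "deliveryAttempt">>,
  handler: (m: HopMessage) => void, // throws to signal failure (nack)
  maxDeliveryAttempts: number,
): HopOutcome {
  const outcome: HopOutcome = { processed: [], deadLettered: [] };
  for (const base of messages) {
    for (let attempt = 1; attempt <= maxDeliveryAttempts; attempt++) {
      try {
        handler({ ...base, deliveryAttempt: attempt });
        outcome.processed.push(base.eventId);
        break;
      } catch {
        if (attempt === maxDeliveryAttempts) outcome.deadLettered.push(base.eventId);
      }
    }
  }
  return outcome;
}
```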

6. Performance / load

k6 scenarios in test/load/:

Scenario | Target
--- | ---
Widget query (cached) | 200 RPS, p95 ≤ 200 ms
Widget query (uncached) | 50 RPS, p95 ≤ 4 s, byte cap respected
ETL hourly batch | 10 M source rows in ≤ 8 min
Pub/Sub sink | 5 k events/s sustained, lag p95 ≤ 60 s
Forecast writeback | 200 k rows in ≤ 2 min

Run nightly in stg; results posted to #perf-analytics.


7. Chaos / resilience

Quarterly game day:

  1. Inject 503 responses on 30 % of BigQuery calls for 5 min — verify retry/backoff and graceful errors.
  2. Drop Postgres primary — verify failover transparent to Pub/Sub consumer.
  3. Pause Pub/Sub subscription for 15 min — verify backlog catches up without data loss.
  4. Drive one tenant to 200 % of its daily byte budget — verify auto-pause + alert.
  5. Looker Studio outage — verify graceful 503 to embed clients with cached data.
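Game-day step 4 and application/ByteCap.spec.ts both exercise a per-tenant byte guard that decides on a dry-run estimate before a query runs. A sketch under assumptions: `TenantByteBudget`, pausing once usage reaches 100 % of the daily budget, and UTC-day rollover are hypothetical choices, not the service's confirmed policy.

```typescript
// Hypothetical per-tenant byte guard. `estimatedBytes` would come from a BigQuery
// dry run; the caps, the pause-at-100 % rule, and UTC-day rollover are assumptions.
type BudgetDecision = "allow" | "reject_query_cap" | "paused";

class TenantByteBudget {
  private usedToday = 0;
  private day = "";
  private paused = false;

  constructor(
    private readonly perQueryCapBytes: number,
    private readonly dailyBudgetBytes: number,
    private readonly alert: (message: string) => void,
  ) {}

  // `dayKey` is the UTC date ("YYYY-MM-DD"); a new day resets usage and un-pauses.
  check(dayKey: string, estimatedBytes: number): BudgetDecision {
    if (dayKey !== this.day) {
      this.day = dayKey;
      this.usedToday = 0;
      this.paused = false;
    }
    if (this.paused) return "paused";
    if (estimatedBytes > this.perQueryCapBytes) return "reject_query_cap"; // not charged
    this.usedToday += estimatedBytes;
    if (this.usedToday >= this.dailyBudgetBytes) {
      this.paused = true; // later queries today are refused
      const pct = Math.round((100 * this.usedToday) / this.dailyBudgetBytes);
      this.alert(`tenant paused at ${pct} % of daily byte budget`);
    }
    return "allow";
  }
}
```

Offering 200 % of the budget then becomes a simple assertion: only the first 100 % is spent, a single alert fires, and the next UTC day starts clean.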

8. Static checks & linting

  • eslint with @melmastoon/eslint-config-service (forbids process.env.* outside config/, forbids console.*, requires explicit return types on exported functions).
  • tsc --noEmit strict.
  • melmastoon-sql-lint — custom rule: every BigQuery template must include tenant_id in its WHERE clause, bound via @tenant_id; CI fails otherwise.
  • gitleaks for secret scanning.
  • npm audit --omit=dev + Snyk on PR.
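The tenant rule enforced by melmastoon-sql-lint can be approximated with a few regexes. A sketch under assumptions: `lintTenantRule` is hypothetical, a production linter would walk a SQL AST rather than match text, and the `${` interpolation check stands in for whatever templating the service actually uses.

```typescript
// Hypothetical approximation of the melmastoon-sql-lint tenant rule. A real
// linter would walk a SQL AST; these regexes are a heuristic only.
interface LintIssue {
  rule: "tenant-predicate" | "param-binding";
  message: string;
}

// Strings and comments are matched in one left-to-right pass so that "--"
// inside a string literal is not mistaken for a comment. GoogleSQL accepts
// both quote styles for strings and #, --, /* */ for comments.
const STRING_OR_COMMENT = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|--[^\n]*|#[^\n]*|\/\*[\s\S]*?\*\//g;

function stripStringsAndComments(sql: string): string {
  // Literals collapse to '' (so literal comparisons stay detectable); comments vanish.
  return sql.replace(STRING_OR_COMMENT, (m) => (m[0] === "'" || m[0] === '"' ? "''" : " "));
}

function lintTenantRule(sql: string): LintIssue[] {
  const issues: LintIssue[] = [];
  const code = stripStringsAndComments(sql);
  const whereAt = code.search(/\bwhere\b/i);
  const tenantPredicate = /\b(?:\w+\.)?tenant_id\s*=\s*@tenant_id\b/i;
  if (whereAt < 0 || !tenantPredicate.test(code.slice(whereAt))) {
    issues.push({ rule: "tenant-predicate", message: "WHERE clause must contain tenant_id = @tenant_id" });
  }
  if (/\btenant_id\s*=\s*''/i.test(code)) {
    issues.push({ rule: "param-binding", message: "tenant_id compared to a literal; bind @tenant_id" });
  }
  if (code.includes("${")) {
    issues.push({ rule: "param-binding", message: "template interpolation found; bind values as @params" });
  }
  return issues;
}
```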

9. Test data & fixtures

  • Synthetic tenant tnt_synthetic_<region> with golden curated rows (~50 k) for stable Playwright runs.
  • Faker-backed event generators (test/factories/events.ts) emit valid envelopes with deterministic seeds.
  • BigQuery sandbox dataset auto-seeded from test/fixtures/bq/*.jsonl per PR run.

10. CI gates

Stage | Gate
--- | ---
PR | unit + contract + integration + lint + audit + schema-drift
Merge to develop | + load smoke (10 % traffic)
Release candidate | + chaos smoke + Pact verification + DQ replay
Prod deploy | health gate + canary metrics for 30 min before 100 %

Coverage is uploaded to Codecov; the budget is 80 % overall and 90 % on domain/. A drop of more than 1 % blocks the PR.
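The coverage budget can be expressed as a small gate function. A sketch under assumptions: `coverageGate` is hypothetical, and it reads "more than 1 %" as percentage points against the base branch.

```typescript
// Hypothetical CI gate mirroring the Codecov budget: >= 80 % overall,
// >= 90 % on domain/, and no drop of more than 1 percentage point (assumed
// reading of "> 1 %") against the base branch.
interface CoverageReport {
  overall: number; // statement coverage, 0-100
  domain: number; // statement coverage for domain/, 0-100
}

function coverageGate(head: CoverageReport, base: CoverageReport): string[] {
  const failures: string[] = [];
  const floors: Array<[keyof CoverageReport, number]> = [
    ["overall", 80],
    ["domain", 90],
  ];
  for (const [scope, floor] of floors) {
    if (head[scope] < floor) failures.push(`${scope} ${head[scope]} % is below ${floor} %`);
    const drop = base[scope] - head[scope];
    if (drop > 1) failures.push(`${scope} dropped ${drop.toFixed(2)} points`);
  }
  return failures;
}
```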

Cross-references: APPLICATION_LOGIC, DATA_MODEL §3 RLS, DEPLOYMENT_TOPOLOGY.