Testing Standards — Ghasi Melmastoon

Companion: CODING_STANDARDS.md · DEFINITION_OF_DONE.md · docs/11-testing-strategy-qa.md · SERVICE_TEMPLATE.md

This document is the single authoritative test standard. It defines the test pyramid, harness configuration, naming, fixture rules, mocking policy, and coverage targets. Every PR is checked against it, and the code-reviewer and pr-test-analyzer subagents enforce it.

If this document and a .mdc rule disagree, this document wins. Update both in the same commit.

1. Test pyramid (the contract)

Layer	Runtime	Speed budget	Volume	Tooling
Unit	In-process, no I/O	< 50 ms each, < 60 s total per package	70–80 % of all tests	Vitest
Integration	Testcontainers (Postgres, Redis, Pub/Sub emulator, OpenSearch)	< 2 s each, < 8 min total per service	15–25 %	Vitest + `@testcontainers/postgresql`
Contract	In-process; verifies a frozen schema	< 100 ms each	1 per producer-consumer pair	Pact (HTTP) + AJV / Zod (events)
E2E	Real browser, real device, real DB	< 30 s each, < 15 min per app	One per W-NN workflow + critical happy-path of every story flagged `e2e`	Playwright (web/desktop) · Detox (mobile)
Performance	Real browser/runtime; budget gated	< 2 min per scenario	One per critical screen + endpoint	Lighthouse-CI (web) · Reassure (mobile) · k6 (services)
A11y	Real browser; rule-set enforced	< 5 s each	Layered into every E2E	`axe-playwright` · `@axe-core/react`

An inverted pyramid is forbidden. A PR that ships only E2E coverage of a new use case will fail review unless the use case is genuinely untestable below the E2E layer (rare; document with an ADR).

2. Coverage targets (CI-enforced via Vitest `coverage.thresholds`)

Code area	Statements	Branches	Functions	Lines	Mutation (Stryker, changed files)
`domain/` aggregates	95	90	95	95	≥ 75 % killed
`domain/` value objects	100	95	100	100	≥ 85 % killed
`domain/` domain services	90	85	90	90	≥ 70 % killed
`application/` use cases	90	85	90	90	—
`infrastructure/` adapters	80	70	80	80	—
`presentation/` controllers	80	70	80	80	—
Shared packages (`@ghasi/*`)	90	85	90	90	≥ 80 % killed
Frontend components (logic-bearing)	85	75	85	85	—

If a PR drops any metric below its threshold, CI blocks the merge. Coverage exclusions live in vitest.config.ts per package and require a comment with rationale.
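As a rough sketch, the per-area rows of the table could be expressed as glob-keyed entries for Vitest's `coverage.thresholds` option, which accepts per-glob overrides. The glob paths below are assumptions about the source layout, not the confirmed structure of any service, and the mutation column is out of scope here because Stryker is configured separately.

```typescript
// Sketch only: the glob keys are assumed paths, not a confirmed repo layout.
type Threshold = { statements: number; branches: number; functions: number; lines: number };

const t = (statements: number, branches: number, functions: number, lines: number): Threshold =>
  ({ statements, branches, functions, lines });

// Values copied from the coverage table above.
const coverageThresholds: Record<string, Threshold> = {
  'src/domain/aggregates/**': t(95, 90, 95, 95),
  'src/domain/value-objects/**': t(100, 95, 100, 100),
  'src/domain/services/**': t(90, 85, 90, 90),
  'src/application/**': t(90, 85, 90, 90),
  'src/infrastructure/**': t(80, 70, 80, 80),
  'src/presentation/**': t(80, 70, 80, 80),
};

// In vitest.config.ts this object would be passed as:
//   defineConfig({ test: { coverage: { thresholds: coverageThresholds } } })
```

Keeping the numbers in one exported object makes a threshold change a one-line diff, which is easy to tie to the ADR that §20 requires.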

3. Mandatory test files per service (forever-passing)

Every service ships and keeps these passing:

Path	What it proves
`test/integration/tenant-isolation.spec.ts`	A query against tenant A returns zero rows for tenant B; RLS is on for every multi-tenant table.
`test/integration/outbox.spec.ts`	A successful aggregate write enqueues exactly one outbox row per declared event; failure rolls back both.
`test/integration/inbox.spec.ts`	Re-delivering the same Pub/Sub message produces a single side effect (idempotent apply).
`test/integration/idempotency.spec.ts`	Two writes with the same `Idempotency-Key` return the same response and only one persisted side effect.
`test/integration/error-codes.spec.ts`	Every controller path under failure returns a canonical `MELMASTOON.<DOMAIN>.<CODE>`.
`test/contract/openapi.spec.ts`	Generated `openapi.json` matches the committed `openapi.json` (no drift).

scaffold-service generates these as smoke tests; the team fills them with real assertions per service.
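To make the `idempotency.spec.ts` guarantee concrete, here is a minimal in-memory sketch of the behavior that spec is expected to prove. `IdempotentHandler` and `HandlerResponse` are hypothetical names for illustration; a real service would persist keys in its database rather than in a `Map`.

```typescript
// Hypothetical in-memory stand-in; a real service persists idempotency keys.
type HandlerResponse = { status: number; body: string };

class IdempotentHandler {
  private readonly seen = new Map<string, HandlerResponse>();
  sideEffects = 0;

  handle(idempotencyKey: string, payload: string): HandlerResponse {
    const cached = this.seen.get(idempotencyKey);
    if (cached !== undefined) return cached; // replay: same response, no new write

    this.sideEffects += 1; // the single persisted side effect
    const response: HandlerResponse = { status: 201, body: `created:${payload}` };
    this.seen.set(idempotencyKey, response);
    return response;
  }
}

// Two writes with the same key: one side effect, identical response.
const handler = new IdempotentHandler();
const first = handler.handle('key-1', 'reservation-42');
const second = handler.handle('key-1', 'reservation-42');
```

The real spec would make the same two assertions against the HTTP layer: equal responses and exactly one persisted row.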

4. File naming, layout, and execution

<service>/
└── test/
    ├── unit/                             # mirrors src/ paths
    │   ├── domain/
    │   ├── application/
    │   ├── infrastructure/
    │   └── presentation/
    ├── integration/                      # mandatory + per-feature
    │   └── <feature>.integration.spec.ts
    ├── contract/                         # Pact + JSON-Schema
    │   ├── <consumer>.pact.spec.ts
    │   └── <event>.schema.spec.ts
    ├── e2e/                              # Playwright/Detox
    │   └── <workflow-id>.e2e.spec.ts
    └── perf/
        └── <scenario>.perf.spec.ts

File suffix	Runner / project
`*.spec.ts`	Vitest unit project
`*.integration.spec.ts`	Vitest integration project (Testcontainers)
`*.pact.spec.ts` / `*.schema.spec.ts`	Vitest contract project
`*.e2e.spec.ts`	Playwright / Detox project (not Vitest)
`*.perf.spec.ts`	Lighthouse-CI / Reassure / k6 project

Vitest projects are split via vitest.workspace.ts so pnpm test runs only unit by default; CI runs the integration / contract / e2e / perf matrices separately.
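Every suffix in the table also ends in `.spec.ts`, so the unit project has to exclude the more specific suffixes. The sketch below shows that precedence as a plain function; `projectFor` is a hypothetical helper, not part of vitest.workspace.ts, and it follows the suffix table only.

```typescript
type Project = 'unit' | 'integration' | 'contract' | 'e2e' | 'perf';

// Most specific suffixes first: every entry also ends in `.spec.ts`.
const suffixToProject: ReadonlyArray<[string, Project]> = [
  ['.integration.spec.ts', 'integration'],
  ['.pact.spec.ts', 'contract'],
  ['.schema.spec.ts', 'contract'],
  ['.e2e.spec.ts', 'e2e'],
  ['.perf.spec.ts', 'perf'],
  ['.spec.ts', 'unit'],
];

// Returns the project that owns a file, or undefined for non-spec files.
function projectFor(filePath: string): Project | undefined {
  for (const [suffix, project] of suffixToProject) {
    if (filePath.endsWith(suffix)) return project;
  }
  return undefined;
}
```

In a workspace file the same ordering would typically appear as an `include` glob for `*.spec.ts` plus `exclude` globs for the other suffixes on the unit project.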

5. Test naming

Spec names use behavior-first wording — never function names.

describe('Reservation', () => {
  describe('confirm()', () => {
    it('rejects confirmation when no inventory hold is attached', () => { ... });
    it('emits melmastoon.reservation.booking.confirmed.v1 on success', () => { ... });
    it('throws MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED when inventory is exhausted', () => { ... });
    it('is idempotent under repeated calls with the same Idempotency-Key', () => { ... });
  });
});

One assertion concept per it (multiple expect lines are fine if they prove the same thing).
Each it title reads as a sentence in plain English. No "should" prefix (it's noise).
Failure-mode tests start with rejects, throws, returns, emits, is, does not.

6. Fixtures, factories, and builders

No literal object soup in tests. Use builders from @ghasi/test-utils or per-service test/builders/.
Builders follow the aReservation().withTenant(t).withCheckIn(d).build() pattern; default to a valid, minimal aggregate.
Database seeding uses @ghasi/test-utils/seed (not raw SQL inside a spec).
HTTP request bodies use Zod-derived sample factories (sampleConfirmReservationCommand({ overrides })).
No shared mutable state across specs. Each spec gets a fresh tenant + fresh DB schema (pg_tmp template DB pattern).
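A minimal sketch of the builder pattern described above. `ReservationProps` and its fields are hypothetical; the real builders in @ghasi/test-utils would return the actual aggregate.

```typescript
// Hypothetical shape; the real Reservation aggregate lives in the domain layer.
interface ReservationProps {
  tenantId: string;
  checkIn: Date;
  nights: number;
}

class ReservationBuilder {
  // Defaults describe a valid, minimal reservation so specs override only what matters.
  private props: ReservationProps = {
    tenantId: 'tenant-default',
    checkIn: new Date('2030-01-01T00:00:00.000Z'),
    nights: 1,
  };

  withTenant(tenantId: string): this {
    this.props = { ...this.props, tenantId };
    return this;
  }

  withCheckIn(checkIn: Date): this {
    this.props = { ...this.props, checkIn };
    return this;
  }

  build(): ReservationProps {
    // Copy on build so two results never share mutable state.
    return { ...this.props, checkIn: new Date(this.props.checkIn.getTime()) };
  }
}

const aReservation = (): ReservationBuilder => new ReservationBuilder();

const reservation = aReservation().withTenant('tenant-a').build();
```

A spec that cares only about tenancy sets only the tenant; every other field stays at a valid default, which keeps the test body focused on the behavior under test.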

7. Mocking policy

Want to test	Approach
Domain logic	No mocks. Construct real aggregates + value objects.
Use case	Mock ports (the interfaces the use case depends on). Use `@ghasi/test-utils/port-mocks`. Never mock the domain.
Adapter	Real third party via Testcontainer or vendor sandbox (Stripe test, Pub/Sub emulator). Mocks only when sandbox doesn't exist (document why).
Controller	Spin up a Nest test module with the use case mocked. Or use the integration project to drive end-to-end.
Frontend hook	Mock the typed BFF client at the function boundary; use MSW for full HTTP simulation.
React component	Render with React Testing Library; query by role + accessible name. Snapshot tests are forbidden except for visual regression (Chromatic).

Forbidden:

vi.mock(...) (or jest.mock(...)) of the System Under Test.
Mocking Date, Math.random, crypto.randomUUID directly. Inject Clock, RNG, IdFactory ports per CODING_STANDARDS.md §11.
Mocking the database driver (use Testcontainers).
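The "Use case" row can be illustrated with a small sketch: the use case runs real logic while only its port is replaced. `InventoryPort` and `ConfirmReservation` are illustrative names, and the hand-written fakes stand in for whatever @ghasi/test-utils/port-mocks actually provides.

```typescript
// Port: the interface the use case depends on (illustrative name).
interface InventoryPort {
  hasAvailability(roomId: string): boolean;
}

// Use case under test: real logic, unaware of the adapter behind the port.
class ConfirmReservation {
  constructor(private readonly inventory: InventoryPort) {}

  execute(roomId: string): 'confirmed' {
    if (!this.inventory.hasAvailability(roomId)) {
      throw new Error('MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED');
    }
    return 'confirmed';
  }
}

// Hand-written fakes standing in for helpers from @ghasi/test-utils/port-mocks.
const soldOut: InventoryPort = { hasAvailability: () => false };
const available: InventoryPort = { hasAvailability: () => true };
```

The fake replaces the boundary, never `ConfirmReservation` itself, so the test still exercises the code it claims to cover.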

8. Determinism

Inject Clock, RNG, IdFactory everywhere they are used. Tests bind FakeClock, SeededRNG, DeterministicIdFactory.
No real network calls in unit tests. The HTTP test helper throws if any unmocked call escapes.
All async tests use await; no setTimeout-based "wait and hope". Use waitFor from RTL or vi.runAllTimersAsync.
Snapshot files for serialized aggregates / event payloads live alongside the spec; reviewers diff them.
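A possible shape for the three test doubles named above. The interfaces are assumptions (the real ports are defined per CODING_STANDARDS.md §11), and the RNG uses mulberry32, a small seeded generator that is suitable for tests but not for cryptography.

```typescript
// Assumed port shapes; the real ones are defined in CODING_STANDARDS.md §11.
interface Clock { now(): Date; }
interface IdFactory { next(): string; }
interface RNG { next(): number; } // float in [0, 1)

class FakeClock implements Clock {
  constructor(private current: Date) {}
  now(): Date { return new Date(this.current.getTime()); }
  // Time moves only when the test says so.
  advance(ms: number): void { this.current = new Date(this.current.getTime() + ms); }
}

class DeterministicIdFactory implements IdFactory {
  private counter = 0;
  next(): string {
    this.counter += 1;
    return `id-${this.counter}`;
  }
}

// mulberry32: same seed, same sequence.
class SeededRNG implements RNG {
  constructor(private state: number) {}
  next(): number {
    this.state = (this.state + 0x6d2b79f5) | 0;
    let t = Math.imul(this.state ^ (this.state >>> 15), 1 | this.state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
}
```

With these bound in a spec, a timestamp, an ID, or a random choice in a serialized snapshot stays identical between runs, which is what makes the snapshot diffs reviewable.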

9. Integration tests with Testcontainers

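import { afterAll, beforeAll } from 'vitest';
import { startPostgres, startPubSubEmulator } from '@ghasi/test-utils/containers';
// Assumption: the container types are exported alongside the start helpers.
import type { StartedPostgresContainer, StartedPubSubEmulator } from '@ghasi/test-utils/containers';
// runMigrations is the service's own migration runner; import it from wherever the service defines it.

let pg: StartedPostgresContainer | undefined;
let pubsub: StartedPubSubEmulator | undefined;

// Start containers once per spec file, not once per test.
beforeAll(async () => {
  pg = await startPostgres({ image: 'postgres:16-alpine' });
  pubsub = await startPubSubEmulator();
  await runMigrations(pg.getConnectionUri());
});

// Optional chaining keeps teardown from throwing if startup failed partway.
afterAll(async () => {
  await pubsub?.stop();
  await pg?.stop();
});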

One container per spec file (not per it); reuse via beforeAll.
Each spec gets a clean schema or a BEGIN; … ROLLBACK; transaction.
Containers run on CI in parallel only when isolated by port + name.
A failing integration spec must save container logs to test-results/<service>/<spec>/<container>.log for triage.
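The BEGIN; … ROLLBACK; option can be wrapped in a small helper so every spec cleans up even when an assertion throws. This is a sketch under assumptions: `SqlClient` is a minimal hypothetical interface, and `RecordingClient` exists only to show the statement order without a database.

```typescript
// Minimal client shape assumed for the sketch; a real spec would pass a database client.
interface SqlClient {
  query(sql: string): Promise<unknown>;
}

// Runs `body` inside a transaction that is always rolled back,
// so no rows outlive the spec, even if `body` throws.
async function withRollback<T>(
  client: SqlClient,
  body: (tx: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query('BEGIN');
  try {
    return await body(client);
  } finally {
    await client.query('ROLLBACK');
  }
}

// Recording fake that captures statements in the order they were issued.
class RecordingClient implements SqlClient {
  readonly statements: string[] = [];

  async query(sql: string): Promise<unknown> {
    this.statements.push(sql);
    return undefined;
  }
}
```

Code under test that opens its own transactions will not compose with this wrapper; for those cases the clean-schema option is the safer choice.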

10. Contract tests

Boundary	Standard
BFF → service (HTTP)	Pact consumer test in the BFF; provider verification in the service CI job. The Pact broker is the source of truth.
Service → service (HTTP, rare)	Same Pact pattern.
Service → service (events)	JSON-Schema in `event-schemas/melmastoon/<service>/<aggregate>/<event>/v<N>.json`. Producer test asserts emitted payload validates; consumer test asserts incoming payload validates.
Frontend → BFF	OpenAPI-generated client in `@ghasi/api-clients`; client + server share the schema.

Schema breakage = major version bump on the route or event subject. CI OpenAPI diff gate blocks otherwise.
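For event contracts, a producer or consumer test first has to locate the schema file. The sketch below assumes that event subjects follow the pattern melmastoon.<service>.<aggregate>.<event>.v<N>, matching the subject used in §5; that mapping is an inference from the path template above, and `schemaPathFor` is a hypothetical helper.

```typescript
// Assumes subjects look like melmastoon.<service>.<aggregate>.<event>.v<N>.
function schemaPathFor(subject: string): string {
  const match = /^melmastoon\.([a-z0-9-]+)\.([a-z0-9-]+)\.([a-z0-9-]+)\.v(\d+)$/.exec(subject);
  if (match === null) {
    throw new Error(`Unrecognized event subject: ${subject}`);
  }
  const [, service, aggregate, eventName, version] = match;
  return `event-schemas/melmastoon/${service}/${aggregate}/${eventName}/v${version}.json`;
}
```

Because the version is part of the path, a breaking change lands in a new `v<N>.json` file and the previous schema stays frozen for existing consumers.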

11. End-to-end tests

App	Tooling	Project layout
`web-meta`	Playwright	`apps/web-meta/test/e2e/<W-NN>-<slug>.e2e.spec.ts`
`web-tenant-booking`	Playwright	`apps/web-tenant-booking/test/e2e/...`
`desktop-backoffice`	Playwright (Electron driver)	`apps/desktop-backoffice/test/e2e/...`
`mobile`	Detox	`apps/mobile/e2e/...`

Rules:

Every workflow W-NN from docs/frontend/05-frontend-workflows.md ships with at least one happy-path E2E.
Every workflow that has an offline branch ships an offline-path E2E (network blocked at the test driver layer).
Every E2E ends with an axe.run() assertion (zero serious + critical violations).
Every E2E captures a screenshot + trace on failure into test-results/.
Flaky E2Es are quarantined within 24 h (move to e2e/quarantine/ with a Jira ticket; auto-fail if not unquarantined within 14 days).
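The "zero serious + critical violations" rule amounts to a filter over the violations an axe run reports. The sketch below models only the two fields the gate needs; `Violation` is a trimmed-down assumption of the axe result shape and `blockingViolations` is a hypothetical helper.

```typescript
// Subset of the axe result shape that matters for the gate (assumed, trimmed down).
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface Violation {
  id: string;
  impact: Impact | null;
}

// Keeps only the violations that must fail the E2E.
function blockingViolations(violations: readonly Violation[]): Violation[] {
  return violations.filter((v) => v.impact === 'serious' || v.impact === 'critical');
}
```

An E2E would then assert that the returned array is empty; minor and moderate findings can still be logged without failing the run.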

12. Accessibility tests

Component layer (@ghasi/ui-melmastoon): every component has an axe test in its Storybook story (@storybook/addon-a11y).
Page layer: every Playwright E2E runs injectAxe() + checkA11y() on the entry screen and after every state transition that changes the DOM materially.
Mobile: Detox a11y pass (device.disableSynchronization aware) on every screen entered.
The full WCAG 2.2 AA matrix lives in docs/frontend/16-accessibility-checklist.md (added in Wave 1) and maps to axe rule IDs.

13. Performance tests

Surface	Tool	Budget source
Web (`web-meta`, `web-tenant-booking`)	Lighthouse-CI with budgets per route	`docs/frontend/04-frontend-design-guidelines.md §11`
Mobile	Reassure (render counts + JS thread time)	Same
Desktop renderer	Lighthouse in Electron mode	Same
Services	k6 (RPS, p95, p99 per endpoint)	`services/<name>/SERVICE_OVERVIEW.md` SLOs
Sync	Custom harness (`@ghasi/test-utils/sync`) measuring outbox flush + conflict resolution	`services/<name>/SYNC_CONTRACT.md`

A PR that worsens any budget by more than 5 % requires a justified ADR; one that worsens it by more than 10 % fails CI.
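As a worked example of the two thresholds, the sketch below classifies a change for a "lower is better" metric such as p95 latency or bundle size. That metric direction is an assumption, and `classifyRegression` is a hypothetical helper rather than part of any CI tool.

```typescript
type Verdict = 'pass' | 'needs-adr' | 'fail';

// Assumes a "lower is better" metric, e.g. p95 latency in ms or bundle size in bytes.
function classifyRegression(baseline: number, current: number): Verdict {
  if (baseline <= 0) throw new Error('baseline must be positive');
  const changePct = ((current - baseline) / baseline) * 100;
  if (changePct > 10) return 'fail'; // worse by more than 10 %
  if (changePct > 5) return 'needs-adr'; // worse by more than 5 %
  return 'pass'; // improvement or small regression
}
```

With a 200 ms baseline, 206 ms (3 % worse) passes, 214 ms (7 % worse) needs an ADR, and 230 ms (15 % worse) fails.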

14. AI / prompt regression tests

Every prompt in ai-orchestrator-service/prompts/<id>/v<n>.md has a golden eval in test/ai-eval/<id>.eval.ts:
- Fixed seed.
- 10–50 representative inputs.
- Pass/fail criteria (string match, JSON schema, numeric threshold, BLEU/ROUGE for free text).
Eval runs on every PR that touches the prompt file or the orchestrator's routing.
A new model version is canary'd against the current production version's eval; rollout requires ≥ 95 % parity on critical prompts.
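The document does not define how parity is computed. One plausible reading, shown here strictly as an assumption, is the share of eval cases the production model passes that the candidate also passes. `EvalResult`, `parity`, and `meetsRolloutBar` are hypothetical names.

```typescript
interface EvalResult {
  caseId: string;
  passed: boolean;
}

// Assumed definition: of the cases production passes, the fraction the candidate also passes.
function parity(production: readonly EvalResult[], candidate: readonly EvalResult[]): number {
  const candidatePassed = new Set(candidate.filter((r) => r.passed).map((r) => r.caseId));
  const productionPassed = production.filter((r) => r.passed);
  if (productionPassed.length === 0) return 1; // nothing to regress against
  const kept = productionPassed.filter((r) => candidatePassed.has(r.caseId)).length;
  return kept / productionPassed.length;
}

// Rollout bar from the rule above: at least 95 % parity.
const meetsRolloutBar = (value: number): boolean => value >= 0.95;
```

Under this reading a candidate cannot buy parity by passing cases that production already fails; only regressions on currently passing cases lower the score.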

15. Tenant-isolation test (the canary)

Every service includes this exact test (or a near-clone), enforced by the code-reviewer agent:

it('returns no rows for tenant B when seeded for tenant A', async () => {
  await seed.aReservation().withTenant(tenantA).build();
  await runAs(tenantB, async (ctx) => {
    const rows = await reservationsRepo.findAll(ctx);
    expect(rows).toHaveLength(0);
  });
});

If RLS is misconfigured this is the first thing to fail. Never delete it.

16. Continuous Integration matrix

Job	Trigger	Surface	Blocks merge
`lint`	Every PR	All	Yes
`typecheck`	Every PR	All	Yes
`unit`	Every PR	Per package via Turbo filter	Yes
`integration`	Every PR	Affected services only	Yes
`contract`	Every PR	Affected services + BFFs	Yes
`openapi-diff`	Every PR	Services with API changes	Yes (unless major bump approved)
`pact-verify`	Every PR	BFF-touching services	Yes
`e2e-web-meta`	Every PR touching `apps/web-meta` or its services	Web	Yes
`e2e-web-tenant-booking`	Same as above	Web	Yes
`e2e-desktop`	Every PR touching `apps/desktop-backoffice` or related services	Desktop	Yes
`e2e-mobile`	Every PR touching `apps/mobile`	Mobile	Yes
`lighthouse-ci`	Every PR touching `apps/web-*`	Web	Yes (budgets)
`axe`	Layered into all e2e jobs	All UI	Yes (zero serious/critical)
`security-osv`	Every PR	All	Yes (no high/critical)
`secret-scan`	Every PR	All	Yes
`mutation` (Stryker)	Daily, plus on `domain/` changes	Per package	Warning, not block (becomes block in Wave 3)

17. Local developer workflow

pnpm install
pnpm typecheck         # full repo
pnpm lint              # full repo
pnpm test              # unit only (fast)
pnpm test:integration  # uses Testcontainers; requires Docker
pnpm test:contract     # Pact + schema
pnpm --filter @ghasi/service-reservation test  # one package

Before opening a PR, the agent or developer runs pnpm verify, which expands to lint && typecheck && test && test:integration --filter=changed && openapi-diff.

18. Anti-patterns (auto-flagged)

Snapshot tests on UI markup (use Chromatic visual regression instead).
any in spec files.
Mocking the SUT (the system under test).
Time-based waits (setTimeout(... )).
Order-dependent specs (any spec that depends on the prior spec's leftovers).
Spec files > 500 LOC (decompose by feature).
Single spec asserting > 5 expectations (split).
Testing private methods directly (test through public API).
expect(true).toBe(true) placeholders left in main.
Disabled specs (it.skip, describe.skip) without a Jira ticket in the comment.
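The last rule is simple enough to check mechanically. The sketch below flags skipped specs with no ticket key in a comment on the same or the preceding line; the key pattern (uppercase project prefix, dash, number) and the `GM-123` example are assumptions, and `unticketedSkips` is a hypothetical helper, not the actual auto-flagging tool.

```typescript
const SKIP = /\b(?:it|describe|test)\.skip\b/;
// A line comment containing something shaped like a Jira key, e.g. "// GM-123".
const TICKET_COMMENT = /\/\/.*\b[A-Z][A-Z0-9]+-\d+\b/;

// Returns 1-based line numbers of skips with no ticket on the same or preceding line.
function unticketedSkips(source: string): number[] {
  const lines = source.split('\n');
  const offenders: number[] = [];
  lines.forEach((line, i) => {
    if (!SKIP.test(line)) return;
    const previous = lines[i - 1] ?? '';
    if (!TICKET_COMMENT.test(line) && !TICKET_COMMENT.test(previous)) {
      offenders.push(i + 1);
    }
  });
  return offenders;
}
```

A line-based regex is deliberately crude: it misses block comments and skips built through variables, so it suits a fast pre-commit hint more than a final gate.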

19. Cross-references

docs/11-testing-strategy-qa.md — long-form rationale and per-domain test heuristics.
.cursor/rules/80-testing.mdc — short rules pack loaded by AI tools.
SERVICE_TEMPLATE.md — per-service skeleton (test paths included).
DEFINITION_OF_DONE.md — the merge checklist.

20. Versioning of this document

Same governance as CODING_STANDARDS.md §19: label testing-standards, sign-off per surface, ADR if loosening, changelog entry. Coverage threshold changes always require an ADR.
