
Testing Standards — Ghasi Melmastoon

Companion: CODING_STANDARDS.md · DEFINITION_OF_DONE.md · docs/11-testing-strategy-qa.md · SERVICE_TEMPLATE.md

This document is the single, authoritative test standard. It defines the test pyramid, harness configuration, naming, fixture rules, mocking policy, and coverage targets. Every PR is checked against it; the code-reviewer and pr-test-analyzer subagents enforce it.

If this document and a .mdc rule disagree, this document wins. Update both in the same commit.


1. Test pyramid (the contract)

| Layer | Runtime | Speed budget | Volume | Tooling |
| --- | --- | --- | --- | --- |
| Unit | In-process, no I/O | < 50 ms each, < 60 s total per package | 70–80 % of all tests | Vitest |
| Integration | Testcontainers (Postgres, Redis, Pub/Sub emulator, OpenSearch) | < 2 s each, < 8 min total per service | 15–25 % | Vitest + @testcontainers/postgresql |
| Contract | In-process; verifies a frozen schema | < 100 ms each | 1 per producer-consumer pair | Pact (HTTP) + AJV / Zod (events) |
| E2E | Real browser, real device, real DB | < 30 s each, < 15 min per app | One per W-NN workflow + critical happy-path of every story flagged e2e | Playwright (web/desktop) · Detox (mobile) |
| Performance | Real browser/runtime; budget gated | < 2 min per scenario | One per critical screen + endpoint | Lighthouse-CI (web) · Reassure (mobile) · k6 (services) |
| A11y | Real browser; rule-set enforced | < 5 s each | Layered into every E2E | axe-playwright · @axe-core/react |

Inverted pyramid is forbidden. A PR that ships only E2E coverage of a new use case will fail review unless the use case is genuinely untestable below the E2E layer (rare; document with an ADR).


2. Coverage targets (CI-enforced via Vitest coverage.thresholds)

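| Code area | Statements (%) | Branches (%) | Functions (%) | Lines (%) | Mutation (Stryker, changed files) |
| --- | --- | --- | --- | --- | --- |
| domain/ aggregates | 95 | 90 | 95 | 95 | ≥ 75 % killed |
| domain/ value objects | 100 | 95 | 100 | 100 | ≥ 85 % killed |
| domain/ domain services | 90 | 85 | 90 | 90 | ≥ 70 % killed |
| application/ use cases | 90 | 85 | 90 | 90 | |
| infrastructure/ adapters | 80 | 70 | 80 | 80 | |
| presentation/ controllers | 80 | 70 | 80 | 80 | |
| Shared packages (@ghasi/*) | 90 | 85 | 90 | 90 | ≥ 80 % killed |
| Frontend components (logic-bearing) | 85 | 75 | 85 | 85 | |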

A PR drops below threshold? CI blocks merge. Coverage exclusions live in vitest.config.ts per package and require a comment with rationale.
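
As a minimal sketch of how two rows of the table could be encoded: Vitest's coverage.thresholds option accepts glob keys for per-path thresholds, so an object like coverageThresholds below could be passed as test.coverage.thresholds in a package's vitest.config.ts. The glob paths are assumptions about the src/ layout, and failingMetrics is a hypothetical helper that only illustrates the comparison the CI gate performs.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
type Thresholds = Record<Metric, number>;

// Percent thresholds copied from the table; the glob keys are hypothetical
// and must be adjusted to the package's real directory layout.
const coverageThresholds: Record<string, Thresholds> = {
  "src/domain/value-objects/**": { statements: 100, branches: 95, functions: 100, lines: 100 },
  "src/infrastructure/**": { statements: 80, branches: 70, functions: 80, lines: 80 },
};

// Returns the metrics that fall below their threshold.
// An empty array means the gate would pass.
function failingMetrics(actual: Thresholds, required: Thresholds): Metric[] {
  const metrics: Metric[] = ["statements", "branches", "functions", "lines"];
  return metrics.filter((metric) => actual[metric] < required[metric]);
}
```

Keeping the numbers in one object per package makes a threshold change show up as a single reviewable diff.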


3. Mandatory test files per service (forever-passing)

Every service ships and keeps these passing:

| Path | What it proves |
| --- | --- |
| test/integration/tenant-isolation.spec.ts | A query against tenant A returns zero rows for tenant B; RLS is on for every multi-tenant table. |
| test/integration/outbox.spec.ts | A successful aggregate write enqueues exactly one outbox row per declared event; failure rolls back both. |
| test/integration/inbox.spec.ts | Re-delivering the same Pub/Sub message produces a single side effect (idempotent apply). |
| test/integration/idempotency.spec.ts | Two writes with the same Idempotency-Key return the same response and only one persisted side effect. |
| test/integration/error-codes.spec.ts | Every controller path under failure returns a canonical MELMASTOON.<DOMAIN>.<CODE>. |
| test/contract/openapi.spec.ts | Generated openapi.json matches the committed openapi.json (no drift). |

scaffold-service generates these as smoke tests; the team fills them with real assertions per service.
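
The property behind idempotency.spec.ts can be illustrated with an in-memory model. This is only a sketch: IdempotentHandler, its fields, and the response shape are hypothetical, and a real service would store the key and response in Postgres inside the same transaction as the write.

```typescript
interface HandlerResponse {
  status: number;
  body: string;
}

// In-memory stand-in for the idempotency store and the persisted side effects.
class IdempotentHandler {
  private readonly responses = new Map<string, HandlerResponse>();
  readonly persisted: string[] = [];

  handle(idempotencyKey: string, payload: string): HandlerResponse {
    const cached = this.responses.get(idempotencyKey);
    if (cached !== undefined) {
      return cached; // replay: same response, no second side effect
    }
    this.persisted.push(payload);
    const response: HandlerResponse = { status: 201, body: `created:${payload}` };
    this.responses.set(idempotencyKey, response);
    return response;
  }
}
```

A real spec would make the same two assertions against the HTTP endpoint and the database: identical responses, exactly one persisted row.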


4. File naming, layout, and execution

<service>/
└── test/
    ├── unit/                # mirrors src/ paths
    │   ├── domain/
    │   ├── application/
    │   ├── infrastructure/
    │   └── presentation/
    ├── integration/         # mandatory + per-feature
    │   └── <feature>.integration.spec.ts
    ├── contract/            # Pact + JSON-Schema
    │   ├── <consumer>.pact.spec.ts
    │   └── <event>.schema.spec.ts
    ├── e2e/                 # Playwright/Detox
    │   └── <workflow-id>.e2e.spec.ts
    └── perf/
        └── <scenario>.perf.spec.ts

| File suffix | Runner / project |
| --- | --- |
| *.spec.ts | Vitest unit project |
| *.integration.spec.ts | Vitest integration project (Testcontainers) |
| *.pact.spec.ts / *.schema.spec.ts | Vitest contract project |
| *.e2e.spec.ts | Playwright / Detox project (not Vitest) |
| *.perf.spec.ts | Lighthouse-CI / Reassure / k6 project |

Vitest projects are split via vitest.workspace.ts so pnpm test runs only unit by default; CI runs the integration / contract / e2e / perf matrices separately.
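
One detail of the suffix scheme is easy to miss: every suffix in the table also ends in .spec.ts, so the unit project has to exclude the more specific ones. The sketch below shows the resolution order with a hypothetical projectFor helper; in vitest.workspace.ts the same effect would presumably be achieved with include and exclude globs per project.

```typescript
type Project = "unit" | "integration" | "contract" | "e2e" | "perf";

// Most specific suffixes first; ".spec.ts" must come last because
// every other suffix in this list also ends with it.
const suffixToProject: ReadonlyArray<readonly [string, Project]> = [
  [".integration.spec.ts", "integration"],
  [".pact.spec.ts", "contract"],
  [".schema.spec.ts", "contract"],
  [".e2e.spec.ts", "e2e"],
  [".perf.spec.ts", "perf"],
  [".spec.ts", "unit"],
];

// Returns the project a file belongs to, or undefined for non-spec files.
function projectFor(fileName: string): Project | undefined {
  for (const [suffix, project] of suffixToProject) {
    if (fileName.endsWith(suffix)) {
      return project;
    }
  }
  return undefined;
}
```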


5. Test naming

Spec titles (the it strings) use behavior-first wording, never function names. A describe block may name the unit under test.

import { describe, it } from 'vitest';

describe('Reservation', () => {
  describe('confirm()', () => {
    it('rejects confirmation when no inventory hold is attached', () => { /* ... */ });
    it('emits melmastoon.reservation.booking.confirmed.v1 on success', () => { /* ... */ });
    it('throws MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED when inventory is exhausted', () => { /* ... */ });
    it('is idempotent under repeated calls with the same Idempotency-Key', () => { /* ... */ });
  });
});

  • One assertion concept per it (multiple expect lines are fine if they prove the same thing).
  • it reads as a sentence in plain English. No "should" prefix (it's noise).
  • Failure-mode tests start with rejects, throws, returns, emits, is, does not.
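
The naming rules above lend themselves to a mechanical check. The following is a rough sketch of such a check, not the rule the review agents actually run; titleProblems and its messages are hypothetical.

```typescript
// Returns a list of naming problems for one `it` title (empty means acceptable).
function titleProblems(title: string): string[] {
  const problems: string[] = [];
  const trimmed = title.trim();

  // "should ..." adds no information to the sentence.
  if (/^should\b/i.test(trimmed)) {
    problems.push('drop the "should" prefix');
  }
  // A lone identifier, optionally followed by "()", names a function rather than a behavior.
  if (/^[A-Za-z_$][\w$]*(\(\))?$/.test(trimmed)) {
    problems.push("describe the behavior instead of naming a function");
  }
  return problems;
}
```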

6. Fixtures, factories, and builders

  • No literal object soup in tests. Use builders from @ghasi/test-utils or per-service test/builders/.
  • Builders follow the aReservation().withTenant(t).withCheckIn(d).build() pattern; default to a valid, minimal aggregate.
  • Database seeding uses @ghasi/test-utils/seed (not raw SQL inside a spec).
  • HTTP request bodies use Zod-derived sample factories (sampleConfirmReservationCommand({ overrides })).
  • No shared mutable state across specs. Each spec gets a fresh tenant + fresh DB schema (pg_tmp template DB pattern).
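
A minimal sketch of the builder pattern named above. The Reservation fields and default values are hypothetical; the builders in @ghasi/test-utils may look different. The point is that the default is already valid, so a spec states only what it cares about.

```typescript
// Hypothetical, simplified shape; the real aggregate has more fields.
interface Reservation {
  id: string;
  tenantId: string;
  checkIn: string; // ISO date
  checkOut: string; // ISO date
  status: "pending" | "confirmed";
}

class ReservationBuilder {
  // Defaults describe a valid, minimal reservation.
  private props: Reservation = {
    id: "res-1",
    tenantId: "tenant-default",
    checkIn: "2025-01-10",
    checkOut: "2025-01-12",
    status: "pending",
  };

  withTenant(tenantId: string): this {
    this.props = { ...this.props, tenantId };
    return this;
  }

  withCheckIn(checkIn: string): this {
    this.props = { ...this.props, checkIn };
    return this;
  }

  // Returns a copy so that built objects never share state with the builder.
  build(): Reservation {
    return { ...this.props };
  }
}

const aReservation = (): ReservationBuilder => new ReservationBuilder();
```

A spec would then read aReservation().withTenant("tenant-a").withCheckIn("2025-03-01").build().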

7. Mocking policy

| Want to test | Approach |
| --- | --- |
| Domain logic | No mocks. Construct real aggregates + value objects. |
| Use case | Mock ports (the interfaces the use case depends on). Use @ghasi/test-utils/port-mocks. Never mock the domain. |
| Adapter | Real third party via Testcontainer or vendor sandbox (Stripe test, Pub/Sub emulator). Mocks only when sandbox doesn't exist (document why). |
| Controller | Spin up a Nest test module with the use case mocked. Or use the integration project to drive end-to-end. |
| Frontend hook | Mock the typed BFF client at the function boundary; use MSW for full HTTP simulation. |
| React component | Render with React Testing Library; query by role + accessible name. Snapshot tests are forbidden except for visual regression (Chromatic). |

Forbidden:

  • vi.mock(...) (or jest.mock(...)) of the System Under Test.
  • Mocking Date, Math.random, crypto.randomUUID directly. Inject Clock, RNG, IdFactory ports per CODING_STANDARDS.md §11.
  • Mocking the database driver (use Testcontainers).
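
To illustrate the use-case row of the table: the port is replaced by a fake, while the domain object stays real. All names below (InventoryPort, ConfirmReservation, inventoryWith) are hypothetical stand-ins, and the mocks shipped in @ghasi/test-utils/port-mocks may be shaped differently.

```typescript
// Port: the interface the use case depends on (hypothetical shape).
interface InventoryPort {
  hasAvailability(roomTypeId: string): boolean;
}

// Real domain object; it is constructed, never mocked.
class Reservation {
  status: "pending" | "confirmed" = "pending";
  readonly roomTypeId: string;

  constructor(roomTypeId: string) {
    this.roomTypeId = roomTypeId;
  }

  confirm(): void {
    this.status = "confirmed";
  }
}

// Use case under test: depends only on the port, not on an adapter.
class ConfirmReservation {
  private readonly inventory: InventoryPort;

  constructor(inventory: InventoryPort) {
    this.inventory = inventory;
  }

  execute(reservation: Reservation): void {
    if (!this.inventory.hasAvailability(reservation.roomTypeId)) {
      throw new Error("MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED");
    }
    reservation.confirm();
  }
}

// Hand-rolled port fake with a fixed answer.
const inventoryWith = (available: boolean): InventoryPort => ({
  hasAvailability: () => available,
});
```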

8. Determinism

  • Inject Clock, RNG, IdFactory everywhere they are used. Tests bind FakeClock, SeededRNG, DeterministicIdFactory.
  • No real network calls in unit tests. The HTTP test helper throws if any unmocked call escapes.
  • All async tests use await; no setTimeout-based "wait and hope". Use waitFor from RTL or vi.runAllTimersAsync.
  • Snapshot files for serialized aggregates / event payloads live alongside the spec; reviewers diff them.
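
The three test bindings named above could look roughly like this. The port method names (now, next, nextId) are assumptions; CODING_STANDARDS.md §11 defines the real interfaces. SeededRNG uses mulberry32 purely as an example of a small seedable generator.

```typescript
// Port shapes are assumptions made for this sketch.
interface Clock { now(): Date; }
interface RNG { next(): number; } // uniform in [0, 1)
interface IdFactory { nextId(): string; }

class FakeClock implements Clock {
  private currentMs: number;
  constructor(startIso: string) {
    this.currentMs = Date.parse(startIso);
  }
  now(): Date {
    return new Date(this.currentMs);
  }
  // Time moves only when the test says so.
  advance(ms: number): void {
    this.currentMs += ms;
  }
}

class SeededRNG implements RNG {
  private state: number;
  constructor(seed: number) {
    this.state = seed >>> 0;
  }
  // mulberry32 step: same seed, same sequence. Not cryptographically secure.
  next(): number {
    this.state = (this.state + 0x6d2b79f5) | 0;
    let t = Math.imul(this.state ^ (this.state >>> 15), 1 | this.state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
}

class DeterministicIdFactory implements IdFactory {
  private counter = 0;
  private readonly prefix: string;
  constructor(prefix: string) {
    this.prefix = prefix;
  }
  nextId(): string {
    this.counter += 1;
    return `${this.prefix}-${this.counter}`;
  }
}
```

Because production code only sees the Clock, RNG, and IdFactory interfaces, swapping these in requires no patching of Date, Math.random, or crypto.randomUUID.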

9. Integration tests with Testcontainers

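import { afterAll, beforeAll } from 'vitest';
import { startPostgres, startPubSubEmulator } from '@ghasi/test-utils/containers';
// Assumption: the container types are exported alongside the start helpers.
import type { StartedPostgresContainer, StartedPubSubEmulator } from '@ghasi/test-utils/containers';
// runMigrations comes from the service's own migration runner (import omitted here).

let pg: StartedPostgresContainer | undefined;
let pubsub: StartedPubSubEmulator | undefined;

// Start each container once per spec file, then migrate the fresh database.
beforeAll(async () => {
  pg = await startPostgres({ image: 'postgres:16-alpine' });
  pubsub = await startPubSubEmulator();
  await runMigrations(pg.getConnectionUri());
});

// Optional chaining: beforeAll may have failed before a container started.
afterAll(async () => {
  await pg?.stop();
  await pubsub?.stop();
});

  • One container per spec file (not per it); reuse via beforeAll.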
  • Each spec gets a clean schema or a BEGIN; … ROLLBACK; transaction.
  • Containers run on CI in parallel only when isolated by port + name.
  • A failing integration spec must save container logs to test-results/<service>/<spec>/<container>.log for triage.
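
The BEGIN; … ROLLBACK; option can be wrapped in a small helper so that no spec forgets the rollback. This is a sketch with a hypothetical withRollback function; Queryable is a minimal interface that any client exposing query(sql), such as a node-postgres Client, would satisfy structurally.

```typescript
// Minimal client surface needed by the helper.
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Runs `body` inside a transaction that is always rolled back,
// whether the body resolves or throws, so nothing leaks between specs.
async function withRollback<T>(
  client: Queryable,
  body: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    return await body(client);
  } finally {
    await client.query("ROLLBACK");
  }
}
```

The helper only isolates work done on the one connection passed in; code under test that opens its own connection needs the clean-schema option instead.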

10. Contract tests

| Boundary | Standard |
| --- | --- |
| BFF → service (HTTP) | Pact consumer test in the BFF; provider verification in the service CI job. The Pact broker is the source of truth. |
| Service → service (HTTP, rare) | Same Pact pattern. |
| Service → service (events) | JSON-Schema in event-schemas/melmastoon/<service>/<aggregate>/<event>/v<N>.json. Producer test asserts emitted payload validates; consumer test asserts incoming payload validates. |
| Frontend → BFF | OpenAPI-generated client in @ghasi/api-clients; client + server share the schema. |

Schema breakage = major version bump on the route or event subject. CI OpenAPI diff gate blocks otherwise.
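
A producer or consumer test first has to locate the schema for an event. The sketch below derives the path from an event subject, assuming the subject segments map one-to-one onto <service>/<aggregate>/<event>/v<N> (as melmastoon.reservation.booking.confirmed.v1 suggests). That mapping and the schemaPathForSubject helper are assumptions, not something this standard specifies.

```typescript
// Maps "melmastoon.<service>.<aggregate>.<event>.v<N>" to its schema file path.
function schemaPathForSubject(subject: string): string {
  const match = /^melmastoon\.([a-z0-9-]+)\.([a-z0-9-]+)\.([a-z0-9-]+)\.v(\d+)$/.exec(subject);
  if (match === null) {
    throw new Error(`Unrecognized event subject: ${subject}`);
  }
  const [, service, aggregate, event, version] = match;
  return `event-schemas/melmastoon/${service}/${aggregate}/${event}/v${version}.json`;
}
```

Because the version is part of both the subject and the path, a breaking change naturally lands in a new v<N>.json file rather than overwriting the frozen one.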


11. End-to-end tests

| App | Tooling | Project layout |
| --- | --- | --- |
| web-meta | Playwright | apps/web-meta/test/e2e/<W-NN>-<slug>.e2e.spec.ts |
| web-tenant-booking | Playwright | apps/web-tenant-booking/test/e2e/... |
| desktop-backoffice | Playwright (Electron driver) | apps/desktop-backoffice/test/e2e/... |
| mobile | Detox | apps/mobile/e2e/... |

Rules:

  • Every workflow W-NN from docs/frontend/05-frontend-workflows.md ships with at least one happy-path E2E.
  • Every workflow that has an offline branch ships an offline-path E2E (network blocked at the test driver layer).
  • Every E2E ends with an axe.run() assertion (zero serious + critical violations).
  • Every E2E captures a screenshot + trace on failure into test-results/.
  • Flaky E2Es are quarantined within 24 h (move to e2e/quarantine/ with a Jira ticket; auto-fail if not unquarantined within 14 days).
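
The 14-day quarantine limit is simple to automate. A minimal sketch, assuming the quarantine date is recorded somewhere a CI script can read it; quarantineExpired is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True once a spec has been quarantined for longer than `maxDays`.
function quarantineExpired(quarantinedAt: Date, now: Date, maxDays: number = 14): boolean {
  return now.getTime() - quarantinedAt.getTime() > maxDays * DAY_MS;
}
```

A CI step could call this for every file under e2e/quarantine/ and fail the job when it returns true.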

12. Accessibility tests

  • Component layer (@ghasi/ui-melmastoon): every component has an axe test in its Storybook story (@storybook/addon-a11y).
  • Page layer: every Playwright E2E runs injectAxe() + checkA11y() on the entry screen and after every state transition that changes the DOM materially.
  • Mobile: Detox a11y pass (device.disableSynchronization aware) on every screen entered.
  • The full WCAG 2.2 AA matrix lives in docs/frontend/16-accessibility-checklist.md (added in Wave 1) and maps to axe rule IDs.
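
The "zero serious + critical" gate amounts to filtering axe results by impact. The sketch below works on the violations array that axe-core returns, where each entry carries an id and an impact of minor, moderate, serious, or critical; blockingViolations is a hypothetical helper, and only the fields it needs are typed here.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical" | null;

// Subset of an axe-core violation entry used by this check.
interface Violation {
  id: string;
  impact?: Impact;
}

// Keeps only the violations that should fail the E2E run.
function blockingViolations(violations: Violation[]): Violation[] {
  return violations.filter((v) => v.impact === "serious" || v.impact === "critical");
}
```

An E2E would then assert that blockingViolations(results.violations) is empty and print the remaining ids on failure.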

13. Performance tests

| Surface | Tool | Budget source |
| --- | --- | --- |
| Web (web-meta, web-tenant-booking) | Lighthouse-CI with budgets per route | docs/frontend/04-frontend-design-guidelines.md §11 |
| Mobile | Reassure (render counts + JS thread time) | Same |
| Desktop renderer | Lighthouse in Electron mode | Same |
| Services | k6 (RPS, p95, p99 per endpoint) | services/<name>/SERVICE_OVERVIEW.md SLOs |
| Sync | Custom harness (@ghasi/test-utils/sync) measuring outbox flush + conflict resolution | services/<name>/SYNC_CONTRACT.md |

A PR that worsens any budget by more than 10 % fails CI; a regression of more than 5 % and up to 10 % requires a justified ADR.
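
The two regression limits can be expressed as one classification. This sketch assumes the metric is "lower is better" (latency, render count) and that the change is measured relative to the previous baseline; both are assumptions, and classifyRegression is a hypothetical helper.

```typescript
type BudgetVerdict = "pass" | "needs-adr" | "fail";

// Classifies a metric change where a higher value is worse.
function classifyRegression(baseline: number, current: number): BudgetVerdict {
  const changePct = ((current - baseline) / baseline) * 100;
  if (changePct > 10) return "fail";
  if (changePct > 5) return "needs-adr";
  return "pass";
}
```

For example, a p95 that moves from 200 ms to 214 ms is a 7 % regression and would need an ADR, while 230 ms (15 %) would fail outright.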


14. AI / prompt regression tests

  • Every prompt in ai-orchestrator-service/prompts/<id>/v<n>.md has a golden eval in test/ai-eval/<id>.eval.ts:
    • Fixed seed.
    • 10–50 representative inputs.
    • Pass/fail criteria (string match, JSON schema, numeric threshold, BLEU/ROUGE for free text).
  • Eval runs on every PR that touches the prompt file or the orchestrator's routing.
  • A new model version is canary'd against the current production version's eval; rollout requires ≥ 95 % parity on critical prompts.
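
The standard does not define how parity is computed. One possible reading, sketched here as an assumption, is the share of golden inputs passed by the production model that the candidate also passes. EvalResult and parity are hypothetical names.

```typescript
interface EvalResult {
  inputId: string;
  passed: boolean;
}

// Fraction of production-passing inputs that the candidate also passes.
// Returns 1 when production passes nothing, since there is nothing to regress.
function parity(production: EvalResult[], candidate: EvalResult[]): number {
  const candidatePassed = new Set(
    candidate.filter((r) => r.passed).map((r) => r.inputId),
  );
  const baseline = production.filter((r) => r.passed);
  if (baseline.length === 0) return 1;
  const kept = baseline.filter((r) => candidatePassed.has(r.inputId)).length;
  return kept / baseline.length;
}
```

Under this reading, a rollout gate would check parity(production, candidate) >= 0.95 for each critical prompt.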

15. Tenant-isolation test (the canary)

Every service includes this exact test (or a near-clone), enforced by the code-reviewer agent:

it('returns no rows for tenant B when seeded for tenant A', async () => {
  await seed.aReservation().withTenant(tenantA).build();
  await runAs(tenantB, async (ctx) => {
    const rows = await reservationsRepo.findAll(ctx);
    expect(rows).toHaveLength(0);
  });
});

If RLS is misconfigured this is the first thing to fail. Never delete it.


16. Continuous Integration matrix

| Job | Trigger | Surface | Blocks merge |
| --- | --- | --- | --- |
| lint | Every PR | All | Yes |
| typecheck | Every PR | All | Yes |
| unit | Every PR | Per package via Turbo filter | Yes |
| integration | Every PR | Affected services only | Yes |
| contract | Every PR | Affected services + BFFs | Yes |
| openapi-diff | Every PR | Services with API changes | Yes (unless major bump approved) |
| pact-verify | Every PR | BFF-touching services | Yes |
| e2e-web-meta | Every PR touching apps/web-meta or its services | Web | Yes |
| e2e-web-tenant-booking | Same as above | Web | Yes |
| e2e-desktop | Every PR touching apps/desktop-backoffice or related services | Desktop | Yes |
| e2e-mobile | Every PR touching apps/mobile | Mobile | Yes |
| lighthouse-ci | Every PR touching apps/web-* | Web | Yes (budgets) |
| axe | Layered into all e2e jobs | All UI | Yes (zero serious/critical) |
| security-osv | Every PR | All | Yes (no high/critical) |
| secret-scan | Every PR | All | Yes |
| mutation (Stryker) | Daily, plus on domain/ changes | Per package | Warning, not block (becomes block in Wave 3) |

17. Local developer workflow

pnpm install
pnpm typecheck # full repo
pnpm lint # full repo
pnpm test # unit only (fast)
pnpm test:integration # uses Testcontainers; requires Docker
pnpm test:contract # Pact + schema
pnpm --filter @ghasi/service-reservation test # one package

Before opening a PR, the agent / dev runs pnpm verify, which expands to lint && typecheck && test && test:integration --filter=changed && openapi-diff.


18. Anti-patterns (auto-flagged)

  • Snapshot tests on UI markup (use Chromatic visual regression instead).
  • any in spec files.
  • Mocking the SUT (the system under test).
  • Time-based waits (setTimeout(... )).
  • Order-dependent specs (any spec that depends on the prior spec's leftovers).
  • Spec files > 500 LOC (decompose by feature).
  • Single spec asserting > 5 expectations (split).
  • Testing private methods directly (test through public API).
  • expect(true).toBe(true) placeholders left in main.
  • Disabled specs (it.skip, describe.skip) without a Jira ticket in the comment.
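
Several of these anti-patterns can be caught by a plain text scan. As one example, the sketch below flags skipped specs that carry no ticket reference. It assumes the ticket key looks like ABC-123 and sits on the same line or the line directly above; both conventions, and the skipsWithoutTicket helper, are assumptions rather than the rule the tooling actually applies.

```typescript
// Returns the 1-based line numbers of skipped specs with no ticket key nearby.
function skipsWithoutTicket(source: string): number[] {
  const ticket = /\b[A-Z][A-Z0-9]+-\d+\b/; // e.g. "GM-123" (assumed key format)
  const skip = /\b(?:it|describe|test)\.skip\s*\(/;
  const lines = source.split("\n");
  const offending: number[] = [];

  lines.forEach((line, index) => {
    if (!skip.test(line)) return;
    const previous = index > 0 ? lines[index - 1] ?? "" : "";
    if (!ticket.test(line) && !ticket.test(previous)) {
      offending.push(index + 1);
    }
  });
  return offending;
}
```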

19. Cross-references


20. Versioning of this document

Same governance as CODING_STANDARDS.md §19: label testing-standards, sign-off per surface, ADR if loosening, changelog entry. Coverage threshold changes always require an ADR.