Testing Standards — Ghasi Melmastoon
Companion: CODING_STANDARDS.md · DEFINITION_OF_DONE.md · docs/11-testing-strategy-qa.md · SERVICE_TEMPLATE.md
This document is the single, authoritative test standard for every PR. It defines the test pyramid, harness configuration, naming, fixture rules, mocking policy, and coverage targets. Every PR is checked against it. The code-reviewer and pr-test-analyzer subagents enforce it.
If this document and a .mdc rule disagree, this document wins. Update both in the same commit.
1. Test pyramid (the contract)
| Layer | Runtime | Speed budget | Volume | Tooling |
|---|---|---|---|---|
| Unit | In-process, no I/O | < 50 ms each, < 60 s total per package | 70–80 % of all tests | Vitest |
| Integration | Testcontainers (Postgres, Redis, Pub/Sub emulator, OpenSearch) | < 2 s each, < 8 min total per service | 15–25 % | Vitest + @testcontainers/postgresql |
| Contract | In-process; verifies a frozen schema | < 100 ms each | 1 per producer-consumer pair | Pact (HTTP) + AJV / Zod (events) |
| E2E | Real browser, real device, real DB | < 30 s each, < 15 min per app | One per W-NN workflow + critical happy-path of every story flagged e2e | Playwright (web/desktop) · Detox (mobile) |
| Performance | Real browser/runtime; budget gated | < 2 min per scenario | One per critical screen + endpoint | Lighthouse-CI (web) · Reassure (mobile) · k6 (services) |
| A11y | Real browser; rule-set enforced | < 5 s each | Layered into every E2E | axe-playwright · @axe-core/react |
Inverted pyramid is forbidden. A PR that ships only E2E coverage of a new use case will fail review unless the use case is genuinely untestable below the E2E layer (rare; document with an ADR).
2. Coverage targets (CI-enforced via Vitest coverage.thresholds)
| Code area | Statements | Branches | Functions | Lines | Mutation (Stryker, changed files) |
|---|---|---|---|---|---|
| domain/ aggregates | 95 | 90 | 95 | 95 | ≥ 75 % killed |
| domain/ value objects | 100 | 95 | 100 | 100 | ≥ 85 % killed |
| domain/ domain services | 90 | 85 | 90 | 90 | ≥ 70 % killed |
| application/ use cases | 90 | 85 | 90 | 90 | n/a |
| infrastructure/ adapters | 80 | 70 | 80 | 80 | n/a |
| presentation/ controllers | 80 | 70 | 80 | 80 | n/a |
| Shared packages (@ghasi/*) | 90 | 85 | 90 | 90 | ≥ 80 % killed |
| Frontend components (logic-bearing) | 85 | 75 | 85 | 85 | n/a |
If a PR drops any metric below its threshold, CI blocks the merge. Coverage exclusions live in each package's vitest.config.ts and require a comment stating the rationale.
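The service-layer rows of the table could be captured as per-glob floors along the following lines. This is a minimal sketch, not the repository's actual configuration: the glob keys assume a `src/domain/aggregates/`-style layout that this document does not specify, and `CoverageFloor` and `metricsBelowFloor` are hypothetical names introduced only for illustration.

```typescript
// Per-area coverage floors mirroring the table in this section.
// The glob keys are hypothetical; adjust them to the package's real layout.
type CoverageFloor = {
  statements: number;
  branches: number;
  functions: number;
  lines: number;
};

const coverageThresholds = {
  'src/domain/aggregates/**': { statements: 95, branches: 90, functions: 95, lines: 95 },
  'src/domain/value-objects/**': { statements: 100, branches: 95, functions: 100, lines: 100 },
  'src/domain/services/**': { statements: 90, branches: 85, functions: 90, lines: 90 },
  'src/application/**': { statements: 90, branches: 85, functions: 90, lines: 90 },
  'src/infrastructure/**': { statements: 80, branches: 70, functions: 80, lines: 80 },
  'src/presentation/**': { statements: 80, branches: 70, functions: 80, lines: 80 },
};

// Returns the names of the metrics in `actual` that fall below `floor`.
function metricsBelowFloor(actual: CoverageFloor, floor: CoverageFloor): string[] {
  return (Object.keys(floor) as (keyof CoverageFloor)[]).filter(
    (metric) => actual[metric] < floor[metric],
  );
}
```

In a real package an object of this shape would most likely be supplied as `coverage.thresholds` in `vitest.config.ts`, leaving the comparison itself to Vitest; the helper above only makes the gate's logic explicit.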
3. Mandatory test files per service (forever-passing)
Every service ships and keeps these passing:
| Path | What it proves |
|---|---|
| test/integration/tenant-isolation.spec.ts | A query against tenant A returns zero rows for tenant B; RLS is on for every multi-tenant table. |
| test/integration/outbox.spec.ts | A successful aggregate write enqueues exactly one outbox row per declared event; failure rolls back both. |
| test/integration/inbox.spec.ts | Re-delivering the same Pub/Sub message produces a single side effect (idempotent apply). |
| test/integration/idempotency.spec.ts | Two writes with the same Idempotency-Key return the same response and only one persisted side effect. |
| test/integration/error-codes.spec.ts | Every controller path under failure returns a canonical MELMASTOON.<DOMAIN>.<CODE>. |
| test/contract/openapi.spec.ts | Generated openapi.json matches the committed openapi.json (no drift). |
scaffold-service generates these as smoke tests; the team fills them with real assertions per service.
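To make the idempotency row concrete, the behavior it demands can be modeled in memory as below. This is only a sketch under stated assumptions: `IdempotentConfirmHandler`, `ConfirmResult`, and the counter standing in for a database write are hypothetical, and a real service would persist the key-to-response mapping rather than hold it in a `Map`.

```typescript
// In-memory model of Idempotency-Key handling (all names are hypothetical).
type ConfirmResult = { reservationId: string; status: 'confirmed' };

class IdempotentConfirmHandler {
  private readonly responsesByKey = new Map<string, ConfirmResult>();
  // Stands in for the number of rows actually written to the database.
  public persistedWrites = 0;

  handle(idempotencyKey: string, reservationId: string): ConfirmResult {
    const cached = this.responsesByKey.get(idempotencyKey);
    if (cached !== undefined) {
      return cached; // replay: same response, no second side effect
    }
    this.persistedWrites += 1;
    const response: ConfirmResult = { reservationId, status: 'confirmed' };
    this.responsesByKey.set(idempotencyKey, response);
    return response;
  }
}
```

A spec for this model would call `handle` twice with the same key and assert both that the two responses are identical and that `persistedWrites` is 1, which is the same pair of assertions the mandatory `idempotency.spec.ts` makes against the real service.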
4. File naming, layout, and execution
<service>/
└── test/
    ├── unit/                      # mirrors src/ paths
    │   ├── domain/
    │   ├── application/
    │   ├── infrastructure/
    │   └── presentation/
    ├── integration/               # mandatory + per-feature
    │   └── <feature>.integration.spec.ts
    ├── contract/                  # Pact + JSON-Schema
    │   ├── <consumer>.pact.spec.ts
    │   └── <event>.schema.spec.ts
    ├── e2e/                       # Playwright/Detox
    │   └── <workflow-id>.e2e.spec.ts
    └── perf/
        └── <scenario>.perf.spec.ts
| File suffix | Runner / project |
|---|---|
| *.spec.ts | Vitest unit project |
| *.integration.spec.ts | Vitest integration project (Testcontainers) |
| *.pact.spec.ts / *.schema.spec.ts | Vitest contract project |
| *.e2e.spec.ts | Playwright / Detox project (not Vitest) |
| *.perf.spec.ts | Lighthouse-CI / Reassure / k6 project |
Vitest projects are split via vitest.workspace.ts so pnpm test runs only unit by default; CI runs the integration / contract / e2e / perf matrices separately.
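Because every specialised suffix also ends in `.spec.ts`, the order in which suffixes are matched matters. The sketch below shows one way the suffix table could be resolved; `projectFor` and `suffixToProject` are hypothetical helpers for illustration, not part of the repository's workspace configuration.

```typescript
type TestProject = 'unit' | 'integration' | 'contract' | 'e2e' | 'perf';

// Most specific suffixes first: every integration file also ends in ".spec.ts",
// so the plain ".spec.ts" rule has to come last.
const suffixToProject: ReadonlyArray<[string, TestProject]> = [
  ['.integration.spec.ts', 'integration'],
  ['.pact.spec.ts', 'contract'],
  ['.schema.spec.ts', 'contract'],
  ['.e2e.spec.ts', 'e2e'],
  ['.perf.spec.ts', 'perf'],
  ['.spec.ts', 'unit'],
];

// Returns the project that owns a file, or undefined for non-spec files.
function projectFor(filePath: string): TestProject | undefined {
  for (const [suffix, project] of suffixToProject) {
    if (filePath.endsWith(suffix)) {
      return project;
    }
  }
  return undefined;
}
```

Under a suffix-only rule like this one, a file such as `test/integration/outbox.spec.ts` would resolve to the unit project, so the real project globs presumably also key on the directory.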
5. Test naming
Spec names (the `it` titles) use behavior-first wording, never function names; a method name may appear only as a `describe` grouping label.
describe('Reservation', () => {
  describe('confirm()', () => {
    it('rejects confirmation when no inventory hold is attached', () => { /* ... */ });
    it('emits melmastoon.reservation.booking.confirmed.v1 on success', () => { /* ... */ });
    it('throws MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED when inventory is exhausted', () => { /* ... */ });
    it('is idempotent under repeated calls with the same Idempotency-Key', () => { /* ... */ });
  });
});
- One assertion concept per `it` (multiple `expect` lines are fine if they prove the same thing).
- `it` reads as a sentence in plain English. No "should" prefix (it's noise).
- Failure-mode tests start with `rejects`, `throws`, `returns`, `emits`, `is`, `does not`.
6. Fixtures, factories, and builders
- No literal object soup in tests. Use builders from `@ghasi/test-utils` or per-service `test/builders/`.
- Builders follow the `aReservation().withTenant(t).withCheckIn(d).build()` pattern; default to a valid, minimal aggregate.
- Database seeding uses `@ghasi/test-utils/seed` (not raw SQL inside a spec).
- HTTP request bodies use Zod-derived sample factories (`sampleConfirmReservationCommand({ overrides })`).
- No shared mutable state across specs. Each spec gets a fresh tenant + fresh DB schema (`pg_tmp` template DB pattern).
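The builder pattern named in this section might look roughly like the following. It is a sketch only: `ReservationProps` and its fields are hypothetical stand-ins for the real aggregate, and the actual builders in `@ghasi/test-utils` may differ in shape.

```typescript
// Hypothetical shape; the real Reservation aggregate lives in the service's domain layer.
interface ReservationProps {
  tenantId: string;
  checkIn: string; // ISO date
  checkOut: string; // ISO date
  guests: number;
}

class ReservationBuilder {
  // Defaults describe a valid, minimal reservation so a test only overrides what it cares about.
  private props: ReservationProps = {
    tenantId: 'tenant-default',
    checkIn: '2030-01-10',
    checkOut: '2030-01-12',
    guests: 1,
  };

  withTenant(tenantId: string): this {
    this.props = { ...this.props, tenantId };
    return this;
  }

  withCheckIn(checkIn: string): this {
    this.props = { ...this.props, checkIn };
    return this;
  }

  build(): ReservationProps {
    return { ...this.props }; // copy, so built objects never share state
  }
}

// Entry point matching the aReservation()... call style.
const aReservation = (): ReservationBuilder => new ReservationBuilder();
```

A spec then reads as `aReservation().withTenant('tenant-a').build()`, stating only the one property the test is about while every other field stays at a valid default.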
7. Mocking policy
| Want to test | Approach |
|---|---|
| Domain logic | No mocks. Construct real aggregates + value objects. |
| Use case | Mock ports (the interfaces the use case depends on). Use @ghasi/test-utils/port-mocks. Never mock the domain. |
| Adapter | Real third party via Testcontainer or vendor sandbox (Stripe test, Pub/Sub emulator). Mocks only when sandbox doesn't exist (document why). |
| Controller | Spin up a Nest test module with the use case mocked. Or use the integration project to drive end-to-end. |
| Frontend hook | Mock the typed BFF client at the function boundary; use MSW for full HTTP simulation. |
| React component | Render with React Testing Library; query by role + accessible name. Snapshot tests are forbidden except for visual regression (Chromatic). |
Forbidden:
- `vi.mock(...)` (or `jest.mock(...)`) of the System Under Test.
- Mocking `Date`, `Math.random`, `crypto.randomUUID` directly. Inject `Clock`, `RNG`, `IdFactory` ports per CODING_STANDARDS.md §11.
- Mocking the database driver (use Testcontainers).
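The "mock ports, never the domain" rule for use cases can be illustrated with hand-rolled fakes. This is a sketch under assumptions: `InventoryPort`, `EventPublisherPort`, `ConfirmReservation`, and the fakes are hypothetical names, and the helpers in `@ghasi/test-utils/port-mocks` may offer a different API.

```typescript
// Ports the use case depends on (hypothetical names and shapes).
interface InventoryPort {
  hasAvailability(reservationId: string): boolean;
}
interface EventPublisherPort {
  publish(subject: string, payload: { reservationId: string }): void;
}

// The use case under test is real code; only its ports are replaced.
class ConfirmReservation {
  private readonly inventory: InventoryPort;
  private readonly events: EventPublisherPort;

  constructor(inventory: InventoryPort, events: EventPublisherPort) {
    this.inventory = inventory;
    this.events = events;
  }

  execute(reservationId: string): void {
    if (!this.inventory.hasAvailability(reservationId)) {
      throw new Error('MELMASTOON.RESERVATION.OVERBOOKING_BLOCKED');
    }
    this.events.publish('melmastoon.reservation.booking.confirmed.v1', { reservationId });
  }
}

// Fakes at the port boundary: one records calls, two return canned answers.
class RecordingPublisher implements EventPublisherPort {
  readonly published: string[] = [];
  publish(subject: string): void {
    this.published.push(subject);
  }
}
const inventoryAvailable: InventoryPort = { hasAvailability: () => true };
const inventoryExhausted: InventoryPort = { hasAvailability: () => false };
```

Because the fakes implement the port interfaces, the compiler flags any drift between a port and its test double, which a module-level mock of the System Under Test would hide.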
8. Determinism
- Inject `Clock`, `RNG`, `IdFactory` everywhere they are used. Tests bind `FakeClock`, `SeededRNG`, `DeterministicIdFactory`.
- No real network calls in unit tests. The HTTP test helper throws if any unmocked call escapes.
- All async tests use `await`; no `setTimeout`-based "wait and hope". Use `waitFor` from RTL or `vi.runAllTimersAsync`.
- Snapshot files for serialized aggregates / event payloads live alongside the spec; reviewers diff them.
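The three deterministic bindings named in this section could be as small as the sketch below. The port names come from this document, but their method signatures (`now`, `next`, `nextId`) are assumptions, and the generator shown is the public-domain mulberry32 algorithm, chosen here only for illustration.

```typescript
// Port shapes are assumed; CODING_STANDARDS.md §11 holds the real definitions.
interface Clock { now(): Date; }
interface RNG { next(): number; } // float in [0, 1)
interface IdFactory { nextId(): string; }

class FakeClock implements Clock {
  private current: Date;
  constructor(startIso: string) {
    this.current = new Date(startIso);
  }
  now(): Date {
    return new Date(this.current.getTime()); // copy, so callers cannot mutate the clock
  }
  advance(ms: number): void {
    this.current = new Date(this.current.getTime() + ms);
  }
}

class SeededRNG implements RNG {
  private state: number;
  constructor(seed: number) {
    this.state = seed | 0;
  }
  // mulberry32: the same seed always yields the same sequence.
  next(): number {
    this.state = (this.state + 0x6d2b79f5) | 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
}

class DeterministicIdFactory implements IdFactory {
  private counter = 0;
  private readonly prefix: string;
  constructor(prefix: string = 'id') {
    this.prefix = prefix;
  }
  nextId(): string {
    this.counter += 1;
    return `${this.prefix}-${this.counter}`;
  }
}
```

With these bound in a test, time only moves when the spec calls `advance`, random draws repeat across runs, and generated IDs are predictable enough to assert on directly.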
9. Integration tests with Testcontainers
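import { afterAll, beforeAll } from 'vitest';
// The Started* types are assumed to be exported next to the start helpers.
import {
  startPostgres,
  startPubSubEmulator,
  type StartedPostgresContainer,
  type StartedPubSubEmulator,
} from '@ghasi/test-utils/containers';
// runMigrations is the service's own migration runner; import it from wherever the service defines it.

let pg: StartedPostgresContainer;
let pubsub: StartedPubSubEmulator;

beforeAll(async () => {
  pg = await startPostgres({ image: 'postgres:16-alpine' });
  pubsub = await startPubSubEmulator();
  await runMigrations(pg.getConnectionUri());
});

afterAll(async () => {
  await pg.stop();
  await pubsub.stop();
});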
- One container per spec file (not per `it`); reuse via `beforeAll`.
- Each spec gets a clean schema or a `BEGIN; … ROLLBACK;` transaction.
- Containers run on CI in parallel only when isolated by port + name.
- A failing integration spec must save container logs to `test-results/<service>/<spec>/<container>.log` for triage.
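The transaction-per-spec option from the list above can be wrapped in a small helper. This is a sketch under assumptions: `SqlClient` and `withRollback` are hypothetical names, and the client interface is reduced to a single async `query` method of the kind drivers such as node-postgres expose.

```typescript
// Minimal client shape (hypothetical); real drivers offer a compatible async query method.
interface SqlClient {
  query(sql: string): Promise<unknown>;
}

// Runs `body` inside a transaction that is always rolled back,
// so nothing a spec writes is visible to the next spec.
async function withRollback<T>(
  client: SqlClient,
  body: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query('BEGIN');
  try {
    return await body(client);
  } finally {
    // Runs on success and on failure; the original error still propagates.
    await client.query('ROLLBACK');
  }
}
```

The `finally` block is the important design choice: a spec that throws midway still leaves the database clean, which keeps a single failure from cascading into order-dependent failures in later specs.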
10. Contract tests
| Boundary | Standard |
|---|---|
| BFF → service (HTTP) | Pact consumer test in the BFF; provider verification in the service CI job. The Pact broker is the source of truth. |
| Service → service (HTTP, rare) | Same Pact pattern. |
| Service → service (events) | JSON-Schema in event-schemas/melmastoon/<service>/<aggregate>/<event>/v<N>.json. Producer test asserts emitted payload validates; consumer test asserts incoming payload validates. |
| Frontend → BFF | OpenAPI-generated client in @ghasi/api-clients; client + server share the schema. |
Schema breakage = major version bump on the route or event subject. CI OpenAPI diff gate blocks otherwise.
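For event contracts, a producer or consumer test first has to locate the schema file for a given subject. The sketch below assumes subjects follow `melmastoon.<service>.<aggregate>.<event>.v<N>`, which is inferred from the example subject and the schema path pattern in this document rather than stated outright; `schemaPathForSubject` is a hypothetical helper.

```typescript
// Maps an event subject to its JSON-Schema file.
// Assumed subject grammar: melmastoon.<service>.<aggregate>.<event>.v<N>
function schemaPathForSubject(subject: string): string {
  const match = /^melmastoon\.([a-z0-9-]+)\.([a-z0-9-]+)\.([a-z0-9-]+)\.v(\d+)$/.exec(subject);
  if (match === null) {
    throw new Error(`Unrecognised event subject: ${subject}`);
  }
  const [, service, aggregate, eventName, version] = match;
  return `event-schemas/melmastoon/${service}/${aggregate}/${eventName}/v${version}.json`;
}
```

Under that assumption, `melmastoon.reservation.booking.confirmed.v1` resolves to `event-schemas/melmastoon/reservation/booking/confirmed/v1.json`, and a breaking change lands in a new `v2.json` beside it instead of editing the frozen file.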
11. End-to-end tests
| App | Tooling | Project layout |
|---|---|---|
| web-meta | Playwright | apps/web-meta/test/e2e/<W-NN>-<slug>.e2e.spec.ts |
| web-tenant-booking | Playwright | apps/web-tenant-booking/test/e2e/... |
| desktop-backoffice | Playwright (Electron driver) | apps/desktop-backoffice/test/e2e/... |
| mobile | Detox | apps/mobile/e2e/... |
Rules:
- Every workflow `W-NN` from `docs/frontend/05-frontend-workflows.md` ships with at least one happy-path E2E.
- Every workflow that has an offline branch ships an offline-path E2E (network blocked at the test driver layer).
- Every E2E ends with an `axe.run()` assertion (zero serious + critical violations).
- Every E2E captures a screenshot + trace on failure into `test-results/`.
- Flaky E2Es are quarantined within 24 h (move to `e2e/quarantine/` with a Jira ticket; auto-fail if not unquarantined within 14 days).
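The "zero serious + critical violations" gate amounts to filtering axe results by impact. The following sketch models only the subset of the axe-core result shape that the gate needs; `AxeViolation`, `blockingViolations`, and `assertNoBlockingViolations` are hypothetical names, and the repository's E2E helpers may wire this differently.

```typescript
// Subset of an axe-core violation that matters for the gate.
type AxeImpact = 'minor' | 'moderate' | 'serious' | 'critical' | null;
interface AxeViolation {
  id: string;
  impact?: AxeImpact;
}

// Keeps only the violations that block a merge.
function blockingViolations(violations: AxeViolation[]): AxeViolation[] {
  return violations.filter((v) => v.impact === 'serious' || v.impact === 'critical');
}

// Throws with the offending rule IDs so the CI log names what to fix.
function assertNoBlockingViolations(violations: AxeViolation[]): void {
  const blocking = blockingViolations(violations);
  if (blocking.length > 0) {
    const ids = blocking.map((v) => `${v.id} (${v.impact})`).join(', ');
    throw new Error(`Accessibility gate failed: ${ids}`);
  }
}
```

Minor and moderate findings pass this gate by design; they still appear in the axe report, so they can be tracked without blocking the PR.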
12. Accessibility tests
- Component layer (`@ghasi/ui-melmastoon`): every component has an `axe` test in its Storybook story (`@storybook/addon-a11y`).
- Page layer: every Playwright E2E runs `injectAxe()` + `checkA11y()` on the entry screen and after every state transition that changes the DOM materially.
- Mobile: Detox a11y pass (`device.disableSynchronization` aware) on every screen entered.
- The full WCAG 2.2 AA matrix lives in `docs/frontend/16-accessibility-checklist.md` (added in Wave 1) and maps to `axe` rule IDs.
13. Performance tests
| Surface | Tool | Budget source |
|---|---|---|
| Web (web-meta, web-tenant-booking) | Lighthouse-CI with budgets per route | docs/frontend/04-frontend-design-guidelines.md §11 |
| Mobile | Reassure (render counts + JS thread time) | Same |
| Desktop renderer | Lighthouse in Electron mode | Same |
| Services | k6 (RPS, p95, p99 per endpoint) | services/<name>/SERVICE_OVERVIEW.md SLOs |
| Sync | Custom harness (@ghasi/test-utils/sync) measuring outbox flush + conflict resolution | services/<name>/SYNC_CONTRACT.md |
A PR that worsens any budget by more than 10 % fails CI; a regression of more than 5 % (up to 10 %) requires a justified ADR.
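The two thresholds translate into a three-way verdict. The sketch below assumes "lower is better" metrics (latency, bundle size, render count) and measures the regression relative to the recorded baseline; both are assumptions, since this document does not say how the percentage is computed, and `classifyRegression` is a hypothetical name.

```typescript
type BudgetVerdict = 'pass' | 'needs-adr' | 'fail';

// Classifies a change in a lower-is-better metric against its baseline.
function classifyRegression(baseline: number, current: number): BudgetVerdict {
  if (baseline <= 0) {
    throw new RangeError('baseline must be positive');
  }
  const regressionPct = ((current - baseline) / baseline) * 100;
  if (regressionPct > 10) {
    return 'fail';
  }
  if (regressionPct > 5) {
    return 'needs-adr';
  }
  return 'pass'; // includes improvements, where regressionPct is negative
}
```

For example, a p95 that moves from 200 ms to 225 ms is a 12.5 % regression and fails, while a move to 214 ms (7 %) passes only with an ADR attached.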
14. AI / prompt regression tests
- Every prompt in `ai-orchestrator-service/prompts/<id>/v<n>.md` has a golden eval in `test/ai-eval/<id>.eval.ts`:
  - Fixed seed.
  - 10–50 representative inputs.
  - Pass/fail criteria (string match, JSON schema, numeric threshold, BLEU/ROUGE for free text).
- Eval runs on every PR that touches the prompt file or the orchestrator's routing.
- A new model version is canary'd against the current production version's eval; rollout requires ≥ 95 % parity on critical prompts.
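The 95 % parity gate needs a definition of parity, which this document leaves open. The sketch below adopts one plausible reading, labeled here as an assumption: parity is the share of golden inputs passed by the production model that the candidate also passes. `parity` and `canRollOut` are hypothetical names.

```typescript
// Each array holds one pass/fail verdict per golden input, in the same order.
// Parity (assumed definition): of the inputs production passes, the fraction the candidate also passes.
function parity(production: boolean[], candidate: boolean[]): number {
  if (production.length !== candidate.length) {
    throw new RangeError('both runs must cover the same golden inputs');
  }
  let productionPasses = 0;
  let retained = 0;
  production.forEach((passed, i) => {
    if (passed) {
      productionPasses += 1;
      if (candidate[i]) {
        retained += 1;
      }
    }
  });
  // With nothing to regress from, treat parity as complete.
  return productionPasses === 0 ? 1 : retained / productionPasses;
}

// Applies the rollout gate; the default threshold is the 95 % from this section.
function canRollOut(production: boolean[], candidate: boolean[], threshold: number = 0.95): boolean {
  return parity(production, candidate) >= threshold;
}
```

Under this reading a candidate is never penalized for passing inputs that production fails; a team that wants strict verdict agreement instead would compare the two arrays element by element.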
15. Tenant-isolation test (the canary)
Every service includes this exact test (or a near-clone), enforced by the code-reviewer agent:
it('returns no rows for tenant B when seeded for tenant A', async () => {
  await seed.aReservation().withTenant(tenantA).build();
  await runAs(tenantB, async (ctx) => {
    const rows = await reservationsRepo.findAll(ctx);
    expect(rows).toHaveLength(0);
  });
});
If RLS is misconfigured this is the first thing to fail. Never delete it.
16. Continuous Integration matrix
| Job | Trigger | Surface | Blocks merge |
|---|---|---|---|
| lint | Every PR | All | Yes |
| typecheck | Every PR | All | Yes |
| unit | Every PR | Per package via Turbo filter | Yes |
| integration | Every PR | Affected services only | Yes |
| contract | Every PR | Affected services + BFFs | Yes |
| openapi-diff | Every PR | Services with API changes | Yes (unless major bump approved) |
| pact-verify | Every PR | BFF-touching services | Yes |
| e2e-web-meta | Every PR touching apps/web-meta or its services | Web | Yes |
| e2e-web-tenant-booking | Same as above | Web | Yes |
| e2e-desktop | Every PR touching apps/desktop-backoffice or related services | Desktop | Yes |
| e2e-mobile | Every PR touching apps/mobile | Mobile | Yes |
| lighthouse-ci | Every PR touching apps/web-* | Web | Yes (budgets) |
| axe | Layered into all e2e jobs | All UI | Yes (zero serious/critical) |
| security-osv | Every PR | All | Yes (no high/critical) |
| secret-scan | Every PR | All | Yes |
| mutation (Stryker) | Daily, plus on domain/ changes | Per package | Warning, not block (becomes block in Wave 3) |
17. Local developer workflow
pnpm install
pnpm typecheck # full repo
pnpm lint # full repo
pnpm test # unit only (fast)
pnpm test:integration # uses Testcontainers; requires Docker
pnpm test:contract # Pact + schema
pnpm --filter @ghasi/service-reservation test # one package
Before opening a PR, the agent / dev runs pnpm verify which is lint && typecheck && test && test:integration --filter=changed && openapi-diff.
18. Anti-patterns (auto-flagged)
- Snapshot tests on UI markup (use Chromatic visual regression instead).
- `any` in spec files.
- Mocking the SUT (the system under test).
- Time-based waits (`setTimeout(...)`).
- Spec files > 500 LOC (decompose by feature).
- Single spec asserting > 5 expectations (split).
- Testing private methods directly (test through public API).
- `expect(true).toBe(true)` placeholders left in main.
- Disabled specs (`it.skip`, `describe.skip`) without a Jira ticket in the comment.
19. Cross-references
- docs/11-testing-strategy-qa.md — long-form rationale and per-domain test heuristics.
- .cursor/rules/80-testing.mdc — short rules pack loaded by AI tools.
- SERVICE_TEMPLATE.md — per-service skeleton (test paths included).
- DEFINITION_OF_DONE.md — the merge checklist.
20. Versioning of this document
Same governance as CODING_STANDARDS.md §19: label testing-standards, sign-off per surface, ADR if loosening, changelog entry. Coverage threshold changes always require an ADR.