maintenance-service · TESTING_STRATEGY

Test pyramid: unit (fast, no I/O) → integration (Postgres + Pub/Sub emulator) → contract (Pact + JSON Schema) → E2E (real Cloud Run staging) → chaos / low-bandwidth scenarios. CI gate: branch coverage of at least 95% on domain/, 90% on application/, 80% on controllers/ and 70% on infrastructure/ (see §9). Mandatory specs (per SERVICE_TEMPLATE.md) are listed in §3.1.

1. Unit tests — domain layer

Located under domain/__tests__/. No I/O, no wall-clock reads, no randomness: tests use a frozen now and a deterministic id factory.

1.1 State machine

All allowed transitions per matrix in DOMAIN_MODEL.md §3 are exercised with positive cases.
Every disallowed transition is exercised and asserts MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION.
verified and cancelled are terminal: every command on them is rejected with WORK_ORDER_TERMINAL.
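
A minimal sketch of how the three rules above can be driven from one table, so that every from/to pair is exercised. The status names other than draft, resolved, verified and cancelled, and the matrix itself, are assumptions for illustration; the authoritative matrix is the one in DOMAIN_MODEL.md §3.

```typescript
// Hypothetical statuses and matrix; the real ones live in DOMAIN_MODEL.md §3.
type Status = "draft" | "open" | "in_progress" | "resolved" | "verified" | "cancelled";
type TransitionResult = { ok: true; status: Status } | { ok: false; code: string };

const STATUSES: Status[] = ["draft", "open", "in_progress", "resolved", "verified", "cancelled"];
const TERMINAL: Status[] = ["verified", "cancelled"];

// Assumed allowed transitions, for illustration only.
const ALLOWED: Record<Status, Status[]> = {
  draft: ["open", "cancelled"],
  open: ["in_progress", "cancelled"],
  in_progress: ["resolved", "cancelled"],
  resolved: ["verified", "in_progress"],
  verified: [],
  cancelled: [],
};

function transition(from: Status, to: Status): TransitionResult {
  // Terminal states reject every command before the matrix is consulted.
  if (TERMINAL.includes(from)) return { ok: false, code: "WORK_ORDER_TERMINAL" };
  if (!ALLOWED[from].includes(to)) {
    return { ok: false, code: "MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION" };
  }
  return { ok: true, status: to };
}

// Walk the full from × to grid and tally outcomes by result code.
function sweepMatrix(): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const from of STATUSES) {
    for (const to of STATUSES) {
      const result = transition(from, to);
      const key = result.ok ? "ok" : result.code;
      tally[key] = (tally[key] ?? 0) + 1;
    }
  }
  return tally;
}
```

Sweeping the whole grid rather than listing cases by hand means a status added to the matrix later is covered without anyone remembering to write the negative cases.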

1.2 Invariants

#	Test
2	`severity = critical` without `assetId/roomId` → `SEVERITY_REQUIRES_TARGET`
3	Auto-OOO event emitted only when severity ∈ {high, critical} AND target is room/room-attached
4	Second open WO on same `(asset, category)` → `DUPLICATE_OPEN_WORK_ORDER` unless `allowDuplicate`
5	Cost line in non-base currency → `COST_CURRENCY_MISMATCH`
6	`Vendor.channelPreference.primary = 'call_only'` → `VENDOR_CHANNEL_MISMATCH` on automated notify (without manual ack)
8	`PartUsage.quantity > Part.onHand` → `PART_OUT_OF_STOCK` and emits `WorkOrderBlocked`
9	`nextDueAt` strictly monotonic on completion (regression rejected)
10	Schedule firing twice in same hour-bucket → no second draft WO
11	Mutation on `verified`/`cancelled` → `WORK_ORDER_TERMINAL`
12	Stale OCC version → `OCC_CONFLICT`
13	`relocationRequired` only set when overlap and severity high+

1.3 Cost rollup

Property-based tests (fast-check):

costRollup always equals Σ costLines.amount for any sequence of mutations.
costRollup.currency = tenant.baseCurrency always.
Adding a PartUsage adds exactly one kind: 'part' cost line.
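
The first and third properties can be sketched as below. To keep the example dependency-free it swaps fast-check for a small seeded generator loop; in the real suite the loop body would become the predicate of a fast-check property. The WorkOrder shape, the mutation helpers and the use of integer minor units are assumptions, not the service's actual model.

```typescript
type Kind = "part" | "labour";
interface CostLine { id: number; kind: Kind; amountMinor: number }
interface WorkOrder { lines: CostLine[]; rollupMinor: number; nextId: number }

// Hypothetical mutations; amounts are integer minor units so sums stay exact.
function addLine(wo: WorkOrder, kind: Kind, amountMinor: number): void {
  wo.lines.push({ id: wo.nextId, kind, amountMinor });
  wo.nextId += 1;
  wo.rollupMinor += amountMinor;
}

function addPartUsage(wo: WorkOrder, unitCostMinor: number, quantity: number): void {
  addLine(wo, "part", unitCostMinor * quantity);
}

function removeOldestLine(wo: WorkOrder): void {
  const removed = wo.lines.shift();
  if (removed) wo.rollupMinor -= removed.amountMinor;
}

// Linear congruential generator, so a failing sequence can be replayed from its seed.
function seeded(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const countParts = (wo: WorkOrder): number => wo.lines.filter((l) => l.kind === "part").length;

// Applies `steps` random mutations and checks the properties after each one.
function rollupHolds(seed: number, steps: number): boolean {
  const rand = seeded(seed);
  const wo: WorkOrder = { lines: [], rollupMinor: 0, nextId: 1 };
  for (let i = 0; i < steps; i++) {
    const partsBefore = countParts(wo);
    const pick = Math.floor(rand() * 3);
    if (pick === 0) addLine(wo, "labour", Math.floor(rand() * 10_000));
    else if (pick === 1) removeOldestLine(wo);
    else addPartUsage(wo, Math.floor(rand() * 500), 1 + Math.floor(rand() * 5));

    const sum = wo.lines.reduce((acc, l) => acc + l.amountMinor, 0);
    if (sum !== wo.rollupMinor) return false;
    if (pick === 2 && countParts(wo) !== partsBefore + 1) return false;
  }
  return true;
}
```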

1.4 SLA timer

slaTimer.dueAt = startedAt + targetMinutes exactly.
evaluateSlaBreach is idempotent within the same minute (does not double-increment breachCount).
A WO transitioned to resolved no longer breaches.
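
A sketch of the three SLA rules with the clock passed in as a value. The timer shape, the lastBreachMinute field and the strict "now > dueAt" comparison are assumptions; the minute-bucket check is one possible way to get the same-minute idempotency described above.

```typescript
const MINUTE_MS = 60_000;

// Hypothetical timer shape; times are epoch milliseconds.
interface SlaTimer {
  startedAt: number;
  targetMinutes: number;
  dueAt: number;
  breachCount: number;
  lastBreachMinute: number | null;
}

function startSla(startedAt: number, targetMinutes: number): SlaTimer {
  return {
    startedAt,
    targetMinutes,
    dueAt: startedAt + targetMinutes * MINUTE_MS,
    breachCount: 0,
    lastBreachMinute: null,
  };
}

// Pure function of (timer, status, now): no clock is read inside.
function evaluateSlaBreach(timer: SlaTimer, status: "open" | "resolved", now: number): SlaTimer {
  if (status === "resolved" || now <= timer.dueAt) return timer;
  const minute = Math.floor(now / MINUTE_MS);
  // Same minute bucket as the last recorded breach: leave the count unchanged.
  if (timer.lastBreachMinute === minute) return timer;
  return { ...timer, breachCount: timer.breachCount + 1, lastBreachMinute: minute };
}
```

Because now is an argument, the test advances time by passing a different number instead of sleeping.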

1.5 Preventive cadence

nextPreventiveDueAt:

kind: 'time' adds everyDays in tenant tz, DST-safe.
kind: 'run_hours' requires runHoursDelta > 0; otherwise next due unchanged.
kind: 'composite' returns the earlier of time-based and run-hours-based projections.

2. Unit tests — application layer (use cases)

Located under application/__tests__/. Ports stubbed with hand-written fakes (typed, behaviour-verifiable). One use case = one spec file.

Each use case test asserts:

Authorization branch (allowed and denied roles).
Happy path returns expected DTO and emits expected events to outbox port.
Each failure mode of dependencies (PART_OUT_OF_STOCK, OCC_CONFLICT, IamClient denies) returns the right error code.
Side effects (notification dispatch) are best-effort: notification port failure does not fail the use case.
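
The authorization and best-effort assertions might look like the following with hand-written fakes. The use case name, the port shapes, the role names, the "FORBIDDEN" code and the "WorkOrderAssigned" event are all hypothetical stand-ins chosen for the sketch.

```typescript
interface OutboxEvent { type: string; workOrderId: string }
interface Deps {
  iam: { can(role: string, action: string): boolean };
  outbox: { append(event: OutboxEvent): void };
  notifier: { dispatch(workOrderId: string): Promise<void> };
}
type UseCaseResult = { ok: true; workOrderId: string } | { ok: false; code: string };

// Hypothetical use case: authorize, record the event, then notify best-effort.
async function assignVendor(
  deps: Deps,
  input: { role: string; workOrderId: string },
): Promise<UseCaseResult> {
  if (!deps.iam.can(input.role, "work_order.assign")) {
    return { ok: false, code: "FORBIDDEN" };
  }
  deps.outbox.append({ type: "WorkOrderAssigned", workOrderId: input.workOrderId });
  try {
    await deps.notifier.dispatch(input.workOrderId);
  } catch {
    // Best-effort side effect: a notification failure must not fail the use case.
  }
  return { ok: true, workOrderId: input.workOrderId };
}

// Fakes record what happened so the spec can assert on behaviour, not on logs.
function makeFakes(allowedRoles: string[]) {
  const events: OutboxEvent[] = [];
  const attempts: string[] = [];
  const deps: Deps = {
    iam: { can: (role) => allowedRoles.includes(role) },
    outbox: { append: (event) => { events.push(event); } },
    notifier: {
      dispatch: async (workOrderId) => {
        attempts.push(workOrderId);
        throw new Error("channel down");
      },
    },
  };
  return { deps, events, attempts };
}
```

A spec built on this would call assignVendor once with an allowed role and once with a denied one, then assert on the returned result, events and attempts.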

3. Integration tests

Located under integration/. Uses Testcontainers: Postgres 16, Pub/Sub emulator, Memorystore-compatible Redis. Real outbox + relay running.

3.1 Mandatory specs (per `SERVICE_TEMPLATE.md`)

`tenant-isolation.spec.ts`

Two tenants with overlapping ULIDs in scope.
Insert WOs / Assets / Vendors / Parts / Schedules under tenant A.
Connect as tenant B → assert empty results from every list endpoint.
Direct REST call as tenant B for tenant A's WO id → 404.
Direct DB query without SET LOCAL app.tenant_id → returns nothing (RLS deny by default).
Assert the same isolation on /api/v1/sync/maintenance/pull.

`outbox.spec.ts`

A successful CreateWorkOrderUseCase writes the WO row AND the outbox event in one transaction.
Forcing the Pub/Sub publish to fail leaves the outbox row unpublished; the relay later succeeds.
The outbox row carries the same tenantId and correlationId as the request.
Crashing between WO write and Pub/Sub publish does not cause divergence (verified by injecting a process.exit between commit and publish).
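
The behaviour outbox.spec.ts checks can be illustrated with an in-memory stand-in. In the real spec the state is Postgres and the publisher is the Pub/Sub emulator; here the Db shape, the event name work_order.created.v1 and the boolean publish callback are assumptions.

```typescript
interface OutboxRow {
  eventType: string;
  tenantId: string;
  correlationId: string;
  published: boolean;
}
interface Db { workOrders: string[]; outbox: OutboxRow[] }

// Builds the next state as one value, standing in for a single transaction:
// the WO row and the outbox row are written together or not at all.
function createWorkOrder(
  db: Db,
  cmd: { workOrderId: string; tenantId: string; correlationId: string },
): Db {
  return {
    workOrders: [...db.workOrders, cmd.workOrderId],
    outbox: [
      ...db.outbox,
      {
        eventType: "work_order.created.v1", // hypothetical event name
        tenantId: cmd.tenantId,
        correlationId: cmd.correlationId,
        published: false,
      },
    ],
  };
}

// One relay pass: publish every unpublished row and mark it only on success.
// `publish` returns false when the broker is unreachable.
function relayTick(db: Db, publish: (row: OutboxRow) => boolean): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    if (publish(row)) {
      row.published = true;
      sent += 1;
    }
  }
  return sent;
}
```

A failed publish leaves the row untouched, so a later relay pass (or a restarted process) picks it up; that is the property the process.exit injection verifies against the real stack.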

`inbox.spec.ts`

Replaying the same Pub/Sub messageId twice on mnt.in.housekeeping.maintenance_required produces exactly one WO.
Replaying with a different messageId but same logical content produces a second WO (or collapses per invariant #4 — both verified).
Inbox dedupe table grows monotonically; pruning preserves correctness.
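
A minimal sketch of the dedupe rule, with a Set standing in for the inbox table. The message shape and the generated WO ids are hypothetical, and the invariant #4 collapse path and pruning are left out.

```typescript
interface Message { messageId: string; roomId: string }

class InboxHandler {
  // Stands in for the inbox dedupe table, keyed by Pub/Sub messageId.
  private readonly seen = new Set<string>();
  readonly workOrders: string[] = [];

  handle(msg: Message): "processed" | "duplicate" {
    if (this.seen.has(msg.messageId)) return "duplicate";
    this.seen.add(msg.messageId);
    this.workOrders.push(`wo_${msg.roomId}_${this.workOrders.length + 1}`);
    return "processed";
  }
}
```

Dedupe is keyed on messageId only, which is why a redelivery is dropped while a new message with the same logical content still reaches the domain layer.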

3.2 State machine integration

For every transition: end-to-end through the controller, with a real DB and outbox, asserting the resulting row state, version increment, and exactly the expected outbox event(s).

3.3 Auto-OOO choreography

Publish housekeeping.room.maintenance_required.v1 with severity hint high.
Assert: WO created with causedRoomBlock=true and work_order.room_blocked.v1 published.
Publish property.room.taken_out_of_order.v1 for that room → WO inbox handler links it.
Publish WorkOrderVerified → outbox publishes room_release_request.

3.4 Relocation choreography

Pre-seed reservation-service projection with a confirmed reservation overlapping a room.
Create high-severity WO on that room.
Assert: work_order.relocation_required.v1 outbox event fires with the right reservation id.

3.5 Preventive scheduler

Create a schedule with nextDueAt = now − 1 min.
Run the scheduler tick.
Assert one draft WO created, one preventive.due.v1 published, nextDueAt advanced.
Run the scheduler tick again within the same hour-bucket → no new WO; assert dedupe row hit.
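
The hour-bucket dedupe in the last step can be sketched as follows. Bucketing in UTC and the "scheduleId:bucket" key format are assumptions, and advancing nextDueAt is left out so the dedupe path stays visible.

```typescript
// UTC hour bucket, e.g. "2024-01-01T10" (assumed format).
function hourBucket(epochMs: number): string {
  return new Date(epochMs).toISOString().slice(0, 13);
}

class SchedulerSketch {
  // Stands in for the dedupe table; one row per (schedule, hour bucket).
  private readonly fired = new Set<string>();
  readonly drafts: string[] = [];

  fire(scheduleId: string, now: number): "created" | "deduped" {
    const key = `${scheduleId}:${hourBucket(now)}`;
    if (this.fired.has(key)) return "deduped";
    this.fired.add(key);
    this.drafts.push(key);
    return "created";
  }
}
```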

3.6 SLA breach scanner

Create WO with sla_due_at in the past.
Run the scanner.
Assert WorkOrderSlaBreached event published; sla_breached_at set; breach_count = 1.
Re-run the scanner within the same minute → no duplicate event.

3.7 Vendor reminder & escalation

Create WO assigned to vendor with channelPreference = whatsapp, no acknowledgement.
Advance time past vendorReminderMinutes.
Run reminder worker → assert one notification dispatched.
Repeat past escalation threshold → assert WorkOrderEscalated published.

3.8 PartUsage atomic decrement

Concurrently issue two recordPartUsage calls each consuming Part.onHand exactly.
Assert exactly one succeeds; the other returns PART_OUT_OF_STOCK; resulting onHand is correct.
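
The shape of this test, sketched against an in-memory stock. The check-and-decrement is written as one synchronous step, which plays the role a conditional update would play in the database; that mapping, the class and the function signature are assumptions about the infrastructure layer.

```typescript
type UsageResult = { ok: true } | { ok: false; code: "PART_OUT_OF_STOCK" };

class PartStock {
  private onHand: number;

  constructor(onHand: number) {
    this.onHand = onHand;
  }

  // Check and decrement in one step, with no await in between.
  tryConsume(quantity: number): boolean {
    if (this.onHand < quantity) return false;
    this.onHand -= quantity;
    return true;
  }

  get remaining(): number {
    return this.onHand;
  }
}

async function recordPartUsage(stock: PartStock, quantity: number): Promise<UsageResult> {
  // Yield once so two concurrent calls interleave before touching the stock.
  await Promise.resolve();
  return stock.tryConsume(quantity)
    ? { ok: true }
    : { ok: false, code: "PART_OUT_OF_STOCK" };
}

// Issues two calls that each consume the full stock and summarizes the outcome.
async function raceTwoUsages(onHand: number): Promise<{ successes: number; remaining: number }> {
  const stock = new PartStock(onHand);
  const results = await Promise.all([
    recordPartUsage(stock, onHand),
    recordPartUsage(stock, onHand),
  ]);
  return { successes: results.filter((r) => r.ok).length, remaining: stock.remaining };
}
```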

3.9 Vendor invoice → folio choreography

Record vendor invoice on a resolved WO → assert vendor.invoice_recorded.v1 published.
Simulate billing.vendor_invoice.posted.v1 consumed → assert vendor_invoice_posted_to_folio = true.

3.10 Sync push command flow

Push WorkOrder.Resolve with stale expectedVersion → server returns OCC_CONFLICT.
Push twice with same commandId → second returns the first's stored result (idempotent).
Push WorkOrder.Resolve with parts that exceed local-stale Part.onHand → server PART_OUT_OF_STOCK.
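
The first two cases can be sketched with an in-memory handler that stores one result per commandId. The class, the command shape and the choice to store rejected results as well as applied ones are assumptions.

```typescript
type PushResult =
  | { status: "applied"; version: number }
  | { status: "rejected"; code: string };

interface PushCommand { commandId: string; workOrderId: string; expectedVersion: number }

class SyncPushSketch {
  private readonly results = new Map<string, PushResult>();
  private readonly versions = new Map<string, number>();

  seed(workOrderId: string, version: number): void {
    this.versions.set(workOrderId, version);
  }

  push(cmd: PushCommand): PushResult {
    // Idempotency: a repeated commandId returns the stored result untouched.
    const stored = this.results.get(cmd.commandId);
    if (stored) return stored;

    const current = this.versions.get(cmd.workOrderId) ?? 0;
    let result: PushResult;
    if (cmd.expectedVersion !== current) {
      result = { status: "rejected", code: "OCC_CONFLICT" };
    } else {
      result = { status: "applied", version: current + 1 };
      this.versions.set(cmd.workOrderId, current + 1);
    }
    this.results.set(cmd.commandId, result);
    return result;
  }
}
```

The idempotency lookup comes before the version check, so a retried command does not turn into an OCC_CONFLICT just because its own first attempt already bumped the version.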

4. Contract tests

4.1 Pact (consumer-driven)

Consumers we publish for: property-service, reservation-service, housekeeping-service, notification-service, billing-service, analytics-service, audit-service, bff-backoffice-service, sync-service.

Each consumer publishes the events it expects from us to the Pact broker; our CI verifies that every event we publish satisfies every consumer's expectation.

We are a consumer of: housekeeping-service, lock-integration-service, property-service, reservation-service, staff-service, tenant-service, billing-service. Our Pact files describe what we expect from those services.

4.2 OpenAPI contract

pnpm test:openapi regenerates the OpenAPI spec from NestJS decorators and asserts no diff with the committed openapi/v1.yaml.
A second job runs openapi-diff against the previously published version and fails on breaking changes.

4.3 Event schema golden tests

For each event type, a JSON sample lives in events/__tests__/golden/.
The TS interface, the JSON Schema, and the golden sample are kept in lock-step.

5. End-to-end (E2E) tests

Run nightly on staging Cloud Run + Cloud SQL:

Provision a fresh test tenant with seed data.
Run a representative scenario:
- Auto-create WO from housekeeping flag.
- Auto-OOO publishes; property-service flips room.
- Assign vendor (test vendor with mock notification destination).
- Resolve with parts.
- Verify; OOO released.
Inspect: events landed in expected topics; analytics-service ingested cost; sync pull from a synthetic device returns the WO.

Tools: Playwright (BFF), @melmastoon/test-harness (event harness), gcloud pubsub for assertions.

6. Chaos / low-bandwidth tests

Pub/Sub partition outage: simulate broker unreachability; verify outbox grows, no events lost; on recovery, all publish.
Cloud SQL failover: trigger the staging replica failover; verify connection recovery within 30 s and no data loss.
AI orchestrator timeout: verify all use cases fail-soft.
High latency push: throttle the device to 50 KB/s with 1.5 s RTT; assert sync push completes within 60 s for a 100-command batch.
Clock skew: device clock 30 min ahead → assert server warns and accepts; device clock 30 min behind → same.

7. Performance / load

Scenario	Target
`GET /work-orders` (filtered, page 50)	p95 ≤ 400 ms at 200 RPS, 50 tenants concurrent
`POST /work-orders` (no AI)	p95 ≤ 600 ms at 50 RPS sustained
Outbox relay	publish 1,000 events / s with p99 lag < 5 s
Preventive scheduler	process 5,000 due schedules per minute
Sync push batch (100 commands)	p95 ≤ 3 s server time

Locust scripts in perf/ invoked via GitHub Actions on every PR touching application/ or infrastructure/.

8. Test data builders

Builders in test/builders/ (TypeScript, pure). Examples:

```typescript
const wo = aWorkOrder()
  .withTenant('tnt_test_a')
  .withProperty('prop_test_kabul_1')
  .withRoom('room_test_204')
  .withCategory('hvac')
  .withSeverity('high')
  .withSource('guest_complaint')
  .build();
```

Builders are pure constructors; no I/O. Only used to build inputs — never to bypass use cases.
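
One way a builder behind aWorkOrder() could be written, as a sketch. The WorkOrderInput fields, the default values and the severity values other than high and critical are assumptions; the real builders in test/builders/ may differ.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical input shape for a create-work-order use case.
interface WorkOrderInput {
  tenantId: string;
  propertyId: string;
  roomId: string | null;
  category: string;
  severity: Severity;
  source: string;
}

class WorkOrderBuilder {
  // Assumed defaults, so a spec only states the fields it cares about.
  private props: WorkOrderInput = {
    tenantId: "tnt_default",
    propertyId: "prop_default",
    roomId: null,
    category: "general",
    severity: "low",
    source: "staff_report",
  };

  private set(patch: Partial<WorkOrderInput>): this {
    this.props = { ...this.props, ...patch };
    return this;
  }

  withTenant(tenantId: string): this { return this.set({ tenantId }); }
  withProperty(propertyId: string): this { return this.set({ propertyId }); }
  withRoom(roomId: string): this { return this.set({ roomId }); }
  withCategory(category: string): this { return this.set({ category }); }
  withSeverity(severity: Severity): this { return this.set({ severity }); }
  withSource(source: string): this { return this.set({ source }); }

  // Returns a copy so later builder calls cannot mutate an already built value.
  build(): WorkOrderInput {
    return { ...this.props };
  }
}

const aWorkOrder = (): WorkOrderBuilder => new WorkOrderBuilder();
```

The result is a plain input object; it still has to go through the use case under test rather than being written to the database directly.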

9. Coverage gates (CI)

Layer	Branch coverage minimum
`domain/`	95%
`application/` (use cases + workers)	90%
`infrastructure/` (adapters)	70%
`controllers/`	80%
Mandatory specs present	enforced by lint rule

Below threshold = build fails.
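
If the runner is Jest (an assumption; this document does not name it), the table above could map onto per-path entries under the coverageThreshold key of the Jest config. The ./src/... paths are hypothetical.

```typescript
// Value for the `coverageThreshold` key of a Jest config (assumed runner).
// Paths are hypothetical; percentages are the branch minimums from the table.
const coverageThreshold = {
  "./src/domain/": { branches: 95 },
  "./src/application/": { branches: 90 },
  "./src/infrastructure/": { branches: 70 },
  "./src/controllers/": { branches: 80 },
};
```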

10. Anti-patterns (will not pass review)

❌ Tests that hit a real network or production GCP.
❌ Tests that import from infrastructure/ while pretending to be unit tests.
❌ Sleeping in tests (use injected clock).
❌ Asserting only on side-effect logs (assert on observable state and emitted events).
❌ Skipping mandatory specs.
❌ Marking a flaky test it.skip without an issue ref + owner.
