maintenance-service · TESTING_STRATEGY

Test pyramid: unit (fast, no I/O) → integration (Postgres + Pub/Sub emulator) → contract (Pact + JSON Schema) → E2E (real Cloud Run staging) → chaos / low-bandwidth scenarios. CI gate: branch coverage of at least 95% on domain/, 90% on application/, and 70% on infrastructure/ (full thresholds in §9). Mandatory specs (per SERVICE_TEMPLATE.md) are listed in §3.1.

1. Unit tests — domain layer

Located under domain/__tests__/. No I/O, no clock, no randomness. Use a frozen now and a deterministic id factory.

1.1 State machine

  • All allowed transitions per matrix in DOMAIN_MODEL.md §3 are exercised with positive cases.
  • Every disallowed transition is exercised and asserts MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION.
  • verified and cancelled are terminal: every command on them is rejected with WORK_ORDER_TERMINAL.
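
One way to keep the positive, negative, and terminal cases exhaustive is to drive them from a single table and walk the full from × to grid. The sketch below assumes a hypothetical status set and matrix (only draft, resolved, verified, and cancelled appear in this document; the authoritative matrix is in DOMAIN_MODEL.md §3), and it returns result objects rather than throwing, which is also an assumption.

```typescript
// Hypothetical statuses and matrix for illustration only;
// the authoritative matrix lives in DOMAIN_MODEL.md §3.
type Status = "draft" | "open" | "in_progress" | "resolved" | "verified" | "cancelled";

const STATUSES: Status[] = ["draft", "open", "in_progress", "resolved", "verified", "cancelled"];
const TERMINAL: Status[] = ["verified", "cancelled"];
const ALLOWED: Record<Status, Status[]> = {
  draft: ["open", "cancelled"],
  open: ["in_progress", "cancelled"],
  in_progress: ["resolved", "cancelled"],
  resolved: ["verified", "in_progress"],
  verified: [],
  cancelled: [],
};

type TransitionResult = { ok: true; status: Status } | { ok: false; code: string };

function transition(from: Status, to: Status): TransitionResult {
  // Terminal states reject every command before the matrix is consulted.
  if (TERMINAL.includes(from)) return { ok: false, code: "WORK_ORDER_TERMINAL" };
  if (!ALLOWED[from].includes(to)) {
    return { ok: false, code: "MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION" };
  }
  return { ok: true, status: to };
}

// Enumerate the full from × to grid so that no pair is silently left untested.
function transitionGrid(): { from: Status; to: Status; outcome: string }[] {
  const rows: { from: Status; to: Status; outcome: string }[] = [];
  for (const from of STATUSES) {
    for (const to of STATUSES) {
      const result = transition(from, to);
      rows.push({ from, to, outcome: result.ok ? "ok" : result.code });
    }
  }
  return rows;
}
```

In a real spec the expected outcome per pair would be written out by hand from the matrix, so that a change to the production table cannot silently change the expectation.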

1.2 Invariants

| #  | Test |
|----|------|
| 2  | severity = critical without assetId/roomId → SEVERITY_REQUIRES_TARGET |
| 3  | Auto-OOO event emitted only when severity ∈ {high, critical} AND target is room/room-attached |
| 4  | Second open WO on same (asset, category) → DUPLICATE_OPEN_WORK_ORDER unless allowDuplicate |
| 5  | Cost line in non-base currency → COST_CURRENCY_MISMATCH |
| 6  | Vendor.channelPreference.primary = 'call_only' → VENDOR_CHANNEL_MISMATCH on automated notify (without manual ack) |
| 8  | PartUsage.quantity > Part.onHand → PART_OUT_OF_STOCK and emits WorkOrderBlocked |
| 9  | nextDueAt strictly monotonic on completion (regression rejected) |
| 10 | Schedule firing twice in same hour-bucket → no second draft WO |
| 11 | Mutation on verified/cancelled → WORK_ORDER_TERMINAL |
| 12 | Stale OCC version → OCC_CONFLICT |
| 13 | relocationRequired only set when overlap and severity high+ |

1.3 Cost rollup

Property-based tests (fast-check):

  • costRollup always equals Σ costLines.amount for any sequence of mutations.
  • costRollup.currency = tenant.baseCurrency always.
  • Adding a PartUsage adds exactly one kind: 'part' cost line.

1.4 SLA timer

  • slaTimer.dueAt = startedAt + targetMinutes exactly.
  • evaluateSlaBreach is idempotent within the same minute (does not double-increment breachCount).
  • A WO transitioned to resolved no longer breaches.
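
A minimal sketch of these three assertions, with the clock passed in as a plain Date so the test never sleeps. The field names (lastBreachMinute in particular) and the choice to dedupe per minute bucket are assumptions; the text only states that evaluation is idempotent within the same minute.

```typescript
// Field names are hypothetical; only dueAt, targetMinutes and breachCount come from the text.
type SlaWorkOrder = {
  status: "in_progress" | "resolved";
  startedAt: Date;
  targetMinutes: number;
  breachCount: number;
  lastBreachMinute: number | null;
};

const MINUTE_MS = 60 * 1000;

// dueAt = startedAt + targetMinutes, exactly.
function slaDueAt(wo: SlaWorkOrder): Date {
  return new Date(wo.startedAt.getTime() + wo.targetMinutes * MINUTE_MS);
}

function evaluateSlaBreach(wo: SlaWorkOrder, now: Date): SlaWorkOrder {
  // A resolved work order no longer breaches.
  if (wo.status === "resolved") return wo;
  if (now.getTime() <= slaDueAt(wo).getTime()) return wo;
  // Assumed dedupe rule: at most one increment per minute bucket.
  const minuteBucket = Math.floor(now.getTime() / MINUTE_MS);
  if (wo.lastBreachMinute === minuteBucket) return wo;
  return { ...wo, breachCount: wo.breachCount + 1, lastBreachMinute: minuteBucket };
}
```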

1.5 Preventive cadence

nextPreventiveDueAt:

  • kind: 'time' adds everyDays in tenant tz, DST-safe.
  • kind: 'run_hours' requires runHoursDelta > 0; otherwise next due unchanged.
  • kind: 'composite' returns the earlier of time-based and run-hours-based projections.
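
The sketch below illustrates one possible shape for these rules. It is not the service's implementation: the CadenceInput shape is hypothetical, the run-hours projection is passed in because its formula is not given here, and the composite fallback to the time projection when runHoursDelta is not positive is an assumption. DST safety is obtained by doing the day arithmetic on the tenant's wall-clock time (via Intl.DateTimeFormat) rather than on UTC milliseconds; nonexistent or ambiguous local times are not handled.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Wall-clock time of an instant in `timeZone`, encoded as if it were UTC.
function wallClockAsUtcMs(instantMs: number, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(new Date(instantMs));
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value ?? "0");
  // `% 24` guards against engines that render midnight as hour 24.
  return Date.UTC(get("year"), get("month") - 1, get("day"), get("hour") % 24, get("minute"), get("second"));
}

function tzOffsetMs(instantMs: number, timeZone: string): number {
  const wholeSecond = Math.floor(instantMs / 1000) * 1000;
  return wallClockAsUtcMs(wholeSecond, timeZone) - wholeSecond;
}

// Adds calendar days on the local wall clock, then maps back to an instant.
function addDaysInTz(from: Date, days: number, timeZone: string): Date {
  const startOffset = tzOffsetMs(from.getTime(), timeZone);
  const targetWallClock = from.getTime() + startOffset + days * DAY_MS;
  // Two passes: guess with the old offset, then correct with the offset at the guess.
  const guess = targetWallClock - startOffset;
  return new Date(targetWallClock - tzOffsetMs(guess, timeZone));
}

type CadenceInput = {
  kind: "time" | "run_hours" | "composite";
  completedAt: Date;
  currentDueAt: Date;
  timeZone: string;
  everyDays: number;
  runHoursDelta: number;
  runHoursProjection: Date; // supplied by the caller; its formula is not given in the text
};

function nextPreventiveDueAt(input: CadenceInput): Date {
  const timeDue = addDaysInTz(input.completedAt, input.everyDays, input.timeZone);
  if (input.kind === "time") return timeDue;
  const runHoursDue = input.runHoursDelta > 0 ? input.runHoursProjection : null;
  // run_hours: without a positive delta the next due date is unchanged.
  if (input.kind === "run_hours") return runHoursDue ?? input.currentDueAt;
  // composite: the earlier of the two projections.
  if (runHoursDue !== null && runHoursDue.getTime() < timeDue.getTime()) return runHoursDue;
  return timeDue;
}
```

A useful DST case is a one-day cadence across a spring-forward boundary: the local time of day should stay the same while the UTC instant shifts by an hour.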

2. Unit tests — application layer (use cases)

Located under application/__tests__/. Ports stubbed with hand-written fakes (typed, behaviour-verifiable). One use case = one spec file.

Each use case test asserts:

  • Authorization branch (allowed and denied roles).
  • Happy path returns expected DTO and emits expected events to outbox port.
  • Each failure mode of dependencies (PART_OUT_OF_STOCK, OCC_CONFLICT, IamClient denies) returns the right error code.
  • Side effects (notification dispatch) are best-effort: notification port failure does not fail the use case.
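
As a rough illustration of the hand-written-fake style, the sketch below wires a use case to three fakes and checks the authorization and best-effort branches. The port interfaces, the "FORBIDDEN" code, the role names, and the work_order.created.v1 event name are hypothetical, and everything is kept synchronous for brevity where real ports would return promises.

```typescript
// Port, role and event names are hypothetical; real ports would be async.
type OutboxEvent = { type: string; workOrderId: string };
interface IamPort { isAllowed(role: string, action: string): boolean }
interface OutboxPort { append(event: OutboxEvent): void }
interface NotificationPort { dispatch(workOrderId: string): void }

class FakeIam implements IamPort {
  private readonly allowedRoles: string[];
  constructor(allowedRoles: string[]) { this.allowedRoles = allowedRoles; }
  isAllowed(role: string): boolean { return this.allowedRoles.includes(role); }
}

// Records appended events so the test can assert on them directly.
class FakeOutbox implements OutboxPort {
  readonly events: OutboxEvent[] = [];
  append(event: OutboxEvent): void { this.events.push(event); }
}

// Always fails, to exercise the best-effort branch.
class FailingNotifications implements NotificationPort {
  attempts = 0;
  dispatch(): void {
    this.attempts += 1;
    throw new Error("notification channel down");
  }
}

type CreateResult = { ok: true; workOrderId: string } | { ok: false; code: string };

function createWorkOrder(
  deps: { iam: IamPort; outbox: OutboxPort; notifications: NotificationPort },
  command: { role: string; workOrderId: string },
): CreateResult {
  if (!deps.iam.isAllowed(command.role, "work_order.create")) {
    return { ok: false, code: "FORBIDDEN" };
  }
  deps.outbox.append({ type: "work_order.created.v1", workOrderId: command.workOrderId });
  try {
    deps.notifications.dispatch(command.workOrderId);
  } catch {
    // Best-effort side effect: a notification failure must not fail the use case.
  }
  return { ok: true, workOrderId: command.workOrderId };
}
```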

3. Integration tests

Located under integration/. Uses Testcontainers: Postgres 16, Pub/Sub emulator, Memorystore-compatible Redis. Real outbox + relay running.

3.1 Mandatory specs (per SERVICE_TEMPLATE.md)

tenant-isolation.spec.ts

  • Two tenants with overlapping ULIDs in scope.
  • Insert WOs / Assets / Vendors / Parts / Schedules under tenant A.
  • Connect as tenant B → assert empty results from every list endpoint.
  • Direct REST call as tenant B for tenant A's WO id → 404.
  • Direct DB query without SET LOCAL app.tenant_id → returns nothing (RLS deny by default).
  • Asserts the same on /api/v1/sync/maintenance/pull.

outbox.spec.ts

  • A successful CreateWorkOrderUseCase writes the WO row AND the outbox event in one transaction.
  • Forcing the Pub/Sub publish to fail leaves the outbox row unpublished; the relay later succeeds.
  • The outbox row carries the same tenantId and correlationId as the request.
  • Crashing between WO write and Pub/Sub publish does not cause divergence (verified by injecting a process.exit between commit and publish).
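
The behaviour these bullets describe can be modelled in memory before it is exercised against Testcontainers. The sketch below is only an illustration of the outbox pattern, assuming a hypothetical row shape and event name; the single createWorkOrder method stands in for one database transaction, and the publisher is a plain function that throws on failure.

```typescript
// Row shapes and the event name are assumptions for illustration.
type OutboxRow = { id: number; tenantId: string; correlationId: string; type: string; published: boolean };
type WorkOrderRow = { id: string; tenantId: string };
type Publisher = (row: OutboxRow) => void; // throws when the broker is unreachable

class InMemoryStore {
  readonly workOrders: WorkOrderRow[] = [];
  readonly outbox: OutboxRow[] = [];

  // Stand-in for one DB transaction: the WO row and the outbox row are written together.
  createWorkOrder(workOrderId: string, tenantId: string, correlationId: string): void {
    this.workOrders.push({ id: workOrderId, tenantId });
    this.outbox.push({
      id: this.outbox.length + 1,
      tenantId,
      correlationId,
      type: "work_order.created.v1",
      published: false,
    });
  }
}

// One relay pass: publish unpublished rows in order and stop at the first failure,
// leaving the failed row (and everything after it) for the next pass.
function relayOnce(store: InMemoryStore, publish: Publisher): number {
  let publishedCount = 0;
  for (const row of store.outbox) {
    if (row.published) continue;
    try {
      publish(row);
    } catch {
      break;
    }
    row.published = true;
    publishedCount += 1;
  }
  return publishedCount;
}
```

Because the row is marked published only after the publisher returns, a crash between commit and publish leaves an unpublished row that the next relay pass picks up, at the cost of possible duplicate delivery.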

inbox.spec.ts

  • Replaying the same Pub/Sub messageId twice on mnt.in.housekeeping.maintenance_required produces exactly one WO.
  • Replaying with a different messageId but same logical content produces a second WO (or collapses per invariant #4 — both verified).
  • Inbox dedupe table grows monotonically; pruning preserves correctness.
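
The dedupe rule reduces to "apply the handler at most once per messageId". A minimal in-memory sketch follows; the class and method names are hypothetical, and in the real service the seen-set would presumably be a table with a unique key written in the same transaction as the handler's effects.

```typescript
// In-memory stand-in for the inbox dedupe table (names are hypothetical).
class Inbox {
  private readonly seen = new Set<string>();

  // Returns true when the handler ran, false when the message was a replay.
  handle(messageId: string, apply: () => void): boolean {
    if (this.seen.has(messageId)) return false;
    apply();
    // Recorded only after the handler succeeded, so a failed handler can be retried.
    this.seen.add(messageId);
    return true;
  }

  size(): number {
    return this.seen.size;
  }
}
```

Dedupe here is keyed on messageId only, so a second message with identical logical content still reaches the handler; collapsing that case is left to the domain invariant on duplicate open work orders.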

3.2 State machine integration

For every transition: end-to-end through the controller, with a real DB and outbox, asserting the resulting row state, version increment, and exactly the expected outbox event(s).

3.3 Auto-OOO choreography

  • Publish housekeeping.room.maintenance_required.v1 with severity hint high.
  • Assert: WO created with causedRoomBlock=true and work_order.room_blocked.v1 published.
  • Publish property.room.taken_out_of_order.v1 for that room → WO inbox handler links it.
  • Publish WorkOrderVerified → outbox publishes room_release_request.

3.4 Relocation choreography

  • Pre-seed reservation-service projection with a confirmed reservation overlapping a room.
  • Create high-severity WO on that room.
  • Assert: work_order.relocation_required.v1 outbox event fires with the right reservation id.

3.5 Preventive scheduler

  • Create a schedule with nextDueAt = now − 1 min.
  • Run the scheduler tick.
  • Assert one draft WO created, one preventive.due.v1 published, nextDueAt advanced.
  • Run the scheduler tick again within the same hour-bucket → no new WO; assert dedupe row hit.
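
A possible shape for the hour-bucket dedupe is sketched below. The key format (schedule id plus a UTC hour bucket), the in-memory Set standing in for the dedupe table, and advancing nextDueAt by whole UTC days are all assumptions made to keep the example short.

```typescript
type Schedule = { id: string; nextDueAt: Date; everyDays: number };
type TickResult = { draftsCreated: string[]; dedupeHits: number };

const DAY_IN_MS = 24 * 60 * 60 * 1000;

// Hour bucket in UTC, e.g. "2024-05-01T10" (bucketing in UTC is an assumption).
function hourBucket(at: Date): string {
  return at.toISOString().slice(0, 13);
}

// `firedKeys` stands in for the dedupe table.
function schedulerTick(schedules: Schedule[], now: Date, firedKeys: Set<string>): TickResult {
  const result: TickResult = { draftsCreated: [], dedupeHits: 0 };
  for (const schedule of schedules) {
    if (schedule.nextDueAt.getTime() > now.getTime()) continue;
    const key = `${schedule.id}:${hourBucket(now)}`;
    if (firedKeys.has(key)) {
      // Already fired in this hour bucket: record the hit and create nothing.
      result.dedupeHits += 1;
      continue;
    }
    firedKeys.add(key);
    result.draftsCreated.push(schedule.id);
    schedule.nextDueAt = new Date(now.getTime() + schedule.everyDays * DAY_IN_MS);
  }
  return result;
}
```

To exercise the dedupe path, the schedule has to look due again on the second tick, for example by simulating a second worker that still holds the old nextDueAt.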

3.6 SLA breach scanner

  • Create WO with sla_due_at in the past.
  • Run the scanner.
  • Assert WorkOrderSlaBreached event published; sla_breached_at set; breach_count = 1.
  • Re-run the scanner within the same minute → no duplicate event.

3.7 Vendor reminder & escalation

  • Create WO assigned to vendor with channelPreference = whatsapp, no acknowledgement.
  • Advance time past vendorReminderMinutes.
  • Run reminder worker → assert one notification dispatched.
  • Repeat past escalation threshold → assert WorkOrderEscalated published.

3.8 PartUsage atomic decrement

  • Concurrently issue two recordPartUsage calls each consuming Part.onHand exactly.
  • Assert exactly one succeeds; the other returns PART_OUT_OF_STOCK; resulting onHand is correct.
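
The property under test is that the stock check and the decrement happen as one step. The sketch below models that step in memory; the SQL in the comment, including table and column names, is an assumption about how such a decrement is commonly written, not a statement of this service's schema.

```typescript
type PartRow = { id: string; onHand: number };
type UsageResult = { ok: true; onHand: number } | { ok: false; code: "PART_OUT_OF_STOCK" };

// Models a single conditional UPDATE such as (names are assumptions):
//   UPDATE parts SET on_hand = on_hand - $qty WHERE id = $id AND on_hand >= $qty
// Zero affected rows maps to PART_OUT_OF_STOCK.
function recordPartUsage(part: PartRow, quantity: number): UsageResult {
  if (part.onHand < quantity) return { ok: false, code: "PART_OUT_OF_STOCK" };
  part.onHand -= quantity;
  return { ok: true, onHand: part.onHand };
}
```

With two callers each requesting the full onHand, whichever runs first succeeds and leaves zero, so the second fails the guard. A read-then-write split across two statements would let both pass the check, which is the race the integration test is meant to catch.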

3.9 Vendor invoice → folio choreography

  • Record vendor invoice on a resolved WO → assert vendor.invoice_recorded.v1 published.
  • Simulate billing.vendor_invoice.posted.v1 consumed → assert vendor_invoice_posted_to_folio = true.

3.10 Sync push command flow

  • Push WorkOrder.Resolve with stale expectedVersion → server returns OCC_CONFLICT.
  • Push twice with same commandId → second returns the first's stored result (idempotent).
  • Push WorkOrder.Resolve with parts that exceed local-stale Part.onHand → server PART_OUT_OF_STOCK.
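
The first two bullets combine optimistic concurrency with command-level idempotency. A minimal sketch under stated assumptions: the handler class, the command and result shapes, and the choice to store rejected results as well as applied ones are all hypothetical.

```typescript
type PushCommand = { commandId: string; workOrderId: string; expectedVersion: number };
type PushResult =
  | { status: "applied"; version: number }
  | { status: "rejected"; code: "OCC_CONFLICT" };

class SyncPushHandler {
  private readonly results = new Map<string, PushResult>();
  private readonly versions = new Map<string, number>();

  constructor(initialVersions: Record<string, number>) {
    for (const id of Object.keys(initialVersions)) {
      this.versions.set(id, initialVersions[id]);
    }
  }

  push(command: PushCommand): PushResult {
    // Idempotency: a repeated commandId returns the stored result without re-applying.
    const stored = this.results.get(command.commandId);
    if (stored) return stored;

    const current = this.versions.get(command.workOrderId) ?? 0;
    let result: PushResult;
    if (command.expectedVersion !== current) {
      result = { status: "rejected", code: "OCC_CONFLICT" };
    } else {
      this.versions.set(command.workOrderId, current + 1);
      result = { status: "applied", version: current + 1 };
    }
    this.results.set(command.commandId, result);
    return result;
  }

  versionOf(workOrderId: string): number {
    return this.versions.get(workOrderId) ?? 0;
  }
}
```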

4. Contract tests

4.1 Pact (consumer-driven)

Consumers we publish for: property-service, reservation-service, housekeeping-service, notification-service, billing-service, analytics-service, audit-service, bff-backoffice-service, sync-service.

Each consumer publishes the events it expects from us to its Pact broker; our CI verifies that every event we publish matches every consumer's expectations.

We are a consumer of: housekeeping-service, lock-integration-service, property-service, reservation-service, staff-service, tenant-service, billing-service. Our Pact files describe what we expect from those services.

4.2 OpenAPI contract

  • pnpm test:openapi regenerates the OpenAPI spec from NestJS decorators and asserts no diff with the committed openapi/v1.yaml.
  • A second job runs openapi-diff against the previously published version and fails on breaking changes.

4.3 Event schema golden tests

  • For each event type, a JSON sample lives in events/__tests__/golden/.
  • The TS interface, the JSON Schema, and the golden sample are kept in lock-step.

5. End-to-end (E2E) tests

Run nightly on staging Cloud Run + Cloud SQL:

  1. Provision a fresh test tenant with seed data.
  2. Run a representative scenario:
    • Auto-create WO from housekeeping flag.
    • Auto-OOO publishes; property-service flips room.
    • Assign vendor (test vendor with mock notification destination).
    • Resolve with parts.
    • Verify; OOO released.
  3. Inspect: events landed in expected topics; analytics-service ingested cost; sync pull from a synthetic device returns the WO.

Tools: Playwright (BFF), @melmastoon/test-harness (event harness), gcloud pubsub for assertions.

6. Chaos / low-bandwidth tests

  • Pub/Sub partition outage: simulate broker unreachability; verify outbox grows, no events lost; on recovery, all publish.
  • Cloud SQL failover: trigger the staging replica failover; verify connection recovery within 30 s and no data loss.
  • AI orchestrator timeout: verify all use cases fail-soft.
  • High latency push: throttle the device to 50 KB/s with 1.5 s RTT; assert sync push completes within 60 s for a 100-command batch.
  • Clock skew: device clock 30 min ahead → assert server warns and accepts; device clock 30 min behind → same.

7. Performance / load

| Scenario | Target |
|----------|--------|
| GET /work-orders (filtered, page 50) | p95 ≤ 400 ms at 200 RPS, 50 tenants concurrent |
| POST /work-orders (no AI) | p95 ≤ 600 ms at 50 RPS sustained |
| Outbox relay | publish 1,000 events/s with p99 lag < 5 s |
| Preventive scheduler | process 5,000 due schedules per minute |
| Sync push batch (100 commands) | p95 ≤ 3 s server time |

Locust scripts in perf/ invoked via GitHub Actions on every PR touching application/ or infrastructure/.

8. Test data builders

Builders in test/builders/ (TypeScript, pure). Examples:

const wo = aWorkOrder()
  .withTenant('tnt_test_a')
  .withProperty('prop_test_kabul_1')
  .withRoom('room_test_204')
  .withCategory('hvac')
  .withSeverity('high')
  .withSource('guest_complaint')
  .build();

Builders are pure constructors; no I/O. Only used to build inputs — never to bypass use cases.
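
For readers new to the pattern, a builder with this fluent surface could look roughly like the sketch below. Only the method names come from the example above; the WorkOrderInput shape, the default values, and the 'low' and 'medium' severity levels are assumptions. Each with* call returns a new builder, so a shared base builder can be reused across tests without leaking state.

```typescript
// Shape, defaults and the low/medium severities are hypothetical.
type WorkOrderInput = {
  tenantId: string;
  propertyId: string;
  roomId: string | null;
  category: string;
  severity: "low" | "medium" | "high" | "critical";
  source: string;
};

class WorkOrderBuilder {
  private readonly props: WorkOrderInput;

  constructor(props: WorkOrderInput) {
    this.props = props;
  }

  // Every with* method returns a fresh builder instead of mutating this one.
  private patch(change: Partial<WorkOrderInput>): WorkOrderBuilder {
    return new WorkOrderBuilder({ ...this.props, ...change });
  }

  withTenant(tenantId: string): WorkOrderBuilder { return this.patch({ tenantId }); }
  withProperty(propertyId: string): WorkOrderBuilder { return this.patch({ propertyId }); }
  withRoom(roomId: string): WorkOrderBuilder { return this.patch({ roomId }); }
  withCategory(category: string): WorkOrderBuilder { return this.patch({ category }); }
  withSeverity(severity: WorkOrderInput["severity"]): WorkOrderBuilder { return this.patch({ severity }); }
  withSource(source: string): WorkOrderBuilder { return this.patch({ source }); }

  // Returns a copy so callers cannot mutate the builder's internal state.
  build(): WorkOrderInput {
    return { ...this.props };
  }
}

function aWorkOrder(): WorkOrderBuilder {
  return new WorkOrderBuilder({
    tenantId: "tnt_default",
    propertyId: "prop_default",
    roomId: null,
    category: "general",
    severity: "low",
    source: "staff_report",
  });
}
```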

9. Coverage gates (CI)

| Layer | Branch coverage minimum |
|-------|-------------------------|
| domain/ | 95% |
| application/ (use cases + workers) | 90% |
| infrastructure/ (adapters) | 70% |
| controllers/ | 80% |
| Mandatory specs present | enforced by lint rule |

Below threshold = build fails.

10. Anti-patterns (will not pass review)

  • ❌ Tests that hit a real network or production GCP.
  • ❌ Tests that import from infrastructure/ while pretending to be unit tests.
  • ❌ Sleeping in tests (use injected clock).
  • ❌ Asserting only on side-effect logs (assert on observable state and emitted events).
  • ❌ Skipping mandatory specs.
  • ❌ Marking a flaky test it.skip without an issue ref + owner.