maintenance-service · TESTING_STRATEGY
Test pyramid: unit (fast, no I/O) → integration (Postgres + Pub/Sub emulator) → contract (Pact + JSON Schema) → E2E (real Cloud Run staging) → chaos / low-bandwidth scenarios. CI gate: 90%+ branch coverage on `domain/` and `application/`, 70%+ on `infrastructure/`. Mandatory specs (per `SERVICE_TEMPLATE.md`) are listed in §3.1.
1. Unit tests — domain layer
Located under domain/__tests__/. No I/O, no clock, no random. Use frozen now, deterministic id factory.
1.1 State machine
- All allowed transitions per the matrix in `DOMAIN_MODEL.md` §3 are exercised with positive cases.
- Every disallowed transition is exercised and asserts `MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION`.
- `verified` and `cancelled` are terminal: every command on them is rejected with `WORK_ORDER_TERMINAL`.
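One way to cover every cell of the matrix is to enumerate all (from, to) pairs instead of hand-picking cases. The sketch below is illustrative only: the status names other than `verified` and `cancelled`, the edges in `MATRIX`, and the `transition` stand-in are assumptions, not the real rules from `DOMAIN_MODEL.md` §3.

```typescript
type Status = "draft" | "open" | "in_progress" | "resolved" | "verified" | "cancelled";
type TransitionResult = { ok: true; status: Status } | { ok: false; code: string };

const INVALID = "MELMASTOON.MAINTENANCE.INVALID_STATUS_TRANSITION";
const TERMINAL = "WORK_ORDER_TERMINAL";

// Stand-in for the domain function under test (hypothetical edges).
function transition(from: Status, to: Status): TransitionResult {
  if (from === "verified" || from === "cancelled") return { ok: false, code: TERMINAL };
  const edges = [
    "draft>open", "draft>cancelled", "open>in_progress", "open>cancelled",
    "in_progress>resolved", "in_progress>cancelled", "resolved>verified", "resolved>in_progress",
  ];
  return edges.indexOf(`${from}>${to}`) !== -1
    ? { ok: true, status: to }
    : { ok: false, code: INVALID };
}

// The matrix as a spec would transcribe it from the domain model (hypothetical values).
const MATRIX: Record<Status, Status[]> = {
  draft: ["open", "cancelled"],
  open: ["in_progress", "cancelled"],
  in_progress: ["resolved", "cancelled"],
  resolved: ["verified", "in_progress"],
  verified: [],
  cancelled: [],
};

// Walk every pair and collect the cells where behaviour and matrix disagree.
function checkMatrix(): string[] {
  const statuses = Object.keys(MATRIX) as Status[];
  const mismatches: string[] = [];
  for (const from of statuses) {
    for (const to of statuses) {
      const terminal = from === "verified" || from === "cancelled";
      const expected = terminal ? TERMINAL : MATRIX[from].indexOf(to) !== -1 ? "ok" : INVALID;
      const result = transition(from, to);
      const actual = result.ok ? "ok" : result.code;
      if (actual !== expected) mismatches.push(`${from} -> ${to}: expected ${expected}, got ${actual}`);
    }
  }
  return mismatches;
}

const matrixMismatches = checkMatrix();
```

Keeping the matrix in the spec separate from the implementation means a change to either one without the other shows up as a named mismatch.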
1.2 Invariants
| # | Test |
|---|---|
| 2 | severity = critical without assetId/roomId → SEVERITY_REQUIRES_TARGET |
| 3 | Auto-OOO event emitted only when severity ∈ {high, critical} AND target is room/room-attached |
| 4 | Second open WO on same (asset, category) → DUPLICATE_OPEN_WORK_ORDER unless allowDuplicate |
| 5 | Cost line in non-base currency → COST_CURRENCY_MISMATCH |
| 6 | Vendor.channelPreference.primary = 'call_only' → VENDOR_CHANNEL_MISMATCH on automated notify (without manual ack) |
| 8 | PartUsage.quantity > Part.onHand → PART_OUT_OF_STOCK and emits WorkOrderBlocked |
| 9 | nextDueAt strictly monotonic on completion (regression rejected) |
| 10 | Schedule firing twice in same hour-bucket → no second draft WO |
| 11 | Mutation on verified/cancelled → WORK_ORDER_TERMINAL |
| 12 | Stale OCC version → OCC_CONFLICT |
| 13 | relocationRequired only set when overlap and severity high+ |
1.3 Cost rollup
Property-based tests (fast-check):
- `costRollup` always equals `Σ costLines.amount` for any sequence of mutations.
- `costRollup.currency = tenant.baseCurrency` always.
- Adding a `PartUsage` adds exactly one `kind: 'part'` cost line.
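The first property can be sketched without any dependency. The block below swaps fast-check for a small seeded generator so it stays self-contained; in the real spec, fast-check arbitraries would take its place. The `CostLine` and `WorkOrderCosts` shapes, the `labour` kind, and the helper names are assumptions for illustration.

```typescript
// Hypothetical shapes; only costRollup, costLines.amount and kind: 'part' come from the text.
type CostLine = { kind: "part" | "labour"; amount: number };
type WorkOrderCosts = { costLines: CostLine[]; costRollup: number };

function addCostLine(wo: WorkOrderCosts, line: CostLine): WorkOrderCosts {
  return { costLines: wo.costLines.concat([line]), costRollup: wo.costRollup + line.amount };
}

function removeCostLine(wo: WorkOrderCosts, index: number): WorkOrderCosts {
  const removed = wo.costLines[index];
  return {
    costLines: wo.costLines.filter((_, i) => i !== index),
    costRollup: wo.costRollup - removed.amount,
  };
}

// Seeded linear congruential generator: the same seed always replays the same sequence.
function makeRng(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) state += 2147483646;
  return () => {
    state = (state * 48271) % 2147483647;
    return (state - 1) / 2147483646; // value in [0, 1)
  };
}

// Applies a random sequence of mutations and checks the invariant after every step.
function rollupHolds(seed: number, steps: number): boolean {
  const rng = makeRng(seed);
  let wo: WorkOrderCosts = { costLines: [], costRollup: 0 };
  for (let i = 0; i < steps; i++) {
    if (wo.costLines.length > 0 && rng() < 0.3) {
      wo = removeCostLine(wo, Math.floor(rng() * wo.costLines.length));
    } else {
      // Integer minor units keep the sums exact.
      const kind: CostLine["kind"] = rng() < 0.5 ? "part" : "labour";
      wo = addCostLine(wo, { kind, amount: Math.floor(rng() * 10000) });
    }
    const sum = wo.costLines.reduce((acc, line) => acc + line.amount, 0);
    if (sum !== wo.costRollup) return false;
  }
  return true;
}
```

A fixed seed per run keeps the test deterministic, which matches the "no random" rule for the domain layer.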
1.4 SLA timer
- `slaTimer.dueAt = startedAt + targetMinutes` exactly.
- `evaluateSlaBreach` is idempotent within the same minute (does not double-increment `breachCount`).
- A WO transitioned to `resolved` no longer breaches.
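A minimal sketch of how these three assertions might look against a pure function. The state shape, the `lastBreachMinute` field, and the choice of "strictly after `dueAt`" as the breach condition are assumptions; only `dueAt`, `startedAt`, `targetMinutes`, `evaluateSlaBreach`, `breachCount` and `resolved` come from the text.

```typescript
type SlaState = {
  status: "in_progress" | "resolved";
  startedAt: number; // epoch milliseconds, supplied by the frozen clock
  targetMinutes: number;
  breachCount: number;
  lastBreachMinute: number | null; // hypothetical field used for per-minute idempotency
};

const MINUTE_MS = 60000;

function slaDueAt(wo: SlaState): number {
  return wo.startedAt + wo.targetMinutes * MINUTE_MS;
}

// Pure: takes `now` as an argument instead of reading the system clock.
function evaluateSlaBreach(wo: SlaState, now: number): SlaState {
  if (wo.status === "resolved") return wo; // resolved work orders no longer breach
  if (now <= slaDueAt(wo)) return wo; // assumption: breach only strictly after dueAt
  const minute = Math.floor(now / MINUTE_MS);
  if (wo.lastBreachMinute === minute) return wo; // already counted in this minute
  return { ...wo, breachCount: wo.breachCount + 1, lastBreachMinute: minute };
}

const slaStart = Date.UTC(2025, 0, 1, 10, 0, 0);
const openWo: SlaState = {
  status: "in_progress",
  startedAt: slaStart,
  targetMinutes: 30,
  breachCount: 0,
  lastBreachMinute: null,
};

// Two evaluations inside the same minute (10:31:05 and 10:31:50).
const afterFirst = evaluateSlaBreach(openWo, slaStart + 31 * MINUTE_MS + 5000);
const afterSecond = evaluateSlaBreach(afterFirst, slaStart + 31 * MINUTE_MS + 50000);
```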
1.5 Preventive cadence
`nextPreventiveDueAt`:
- `kind: 'time'` adds `everyDays` in tenant tz, DST-safe.
- `kind: 'run_hours'` requires `runHoursDelta > 0`; otherwise next due unchanged.
- `kind: 'composite'` returns the earlier of time-based and run-hours-based projections.
2. Unit tests — application layer (use cases)
Located under application/__tests__/. Ports stubbed with hand-written fakes (typed, behaviour-verifiable). One use case = one spec file.
Each use case test asserts:
- Authorization branch (allowed and denied roles).
- Happy path returns expected DTO and emits expected events to outbox port.
- Each failure mode of dependencies (`PART_OUT_OF_STOCK`, `OCC_CONFLICT`, `IamClient` denies) returns the right error code.
- Side effects (notification dispatch) are best-effort: notification port failure does not fail the use case.
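As an illustration of hand-written fakes and the best-effort rule, here is a reduced use case. The port interfaces, role names, the `FORBIDDEN` code, and the `WorkOrderCreated` event name are all hypothetical, and the ports are synchronous only to keep the sketch short; real ports would most likely return promises.

```typescript
// Hypothetical ports; names and shapes are assumptions for this sketch.
type OutboxEvent = { type: string; workOrderId: string };
interface OutboxPort { append(event: OutboxEvent): void; }
interface NotificationPort { notify(workOrderId: string): void; }

// Fakes record what happened so the spec can assert on behaviour, not on logs.
class FakeOutbox implements OutboxPort {
  readonly events: OutboxEvent[] = [];
  append(event: OutboxEvent): void { this.events.push(event); }
}

class FailingNotifications implements NotificationPort {
  attempts = 0;
  notify(_workOrderId: string): void {
    this.attempts++;
    throw new Error("channel down");
  }
}

type Role = "maintenance_manager" | "front_desk";
type CreateResult = { ok: true; workOrderId: string } | { ok: false; code: string };

function createWorkOrder(
  role: Role,
  workOrderId: string,
  outbox: OutboxPort,
  notifications: NotificationPort,
): CreateResult {
  if (role !== "maintenance_manager") return { ok: false, code: "FORBIDDEN" };
  outbox.append({ type: "WorkOrderCreated", workOrderId });
  try {
    notifications.notify(workOrderId);
  } catch (err) {
    // Best-effort side effect: a failed notification must not fail the use case.
  }
  return { ok: true, workOrderId };
}

const fakeOutbox = new FakeOutbox();
const failingNotifications = new FailingNotifications();
const allowedResult = createWorkOrder("maintenance_manager", "wo_1", fakeOutbox, failingNotifications);
const deniedResult = createWorkOrder("front_desk", "wo_2", fakeOutbox, failingNotifications);
```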
3. Integration tests
Located under integration/. Uses Testcontainers: Postgres 16, Pub/Sub emulator, Memorystore-compatible Redis. Real outbox + relay running.
3.1 Mandatory specs (per SERVICE_TEMPLATE.md)
tenant-isolation.spec.ts
- Two tenants with overlapping ULIDs in scope.
- Insert WOs / Assets / Vendors / Parts / Schedules under tenant A.
- Connect as tenant B → assert empty results from every list endpoint.
- Direct REST call as tenant B for tenant A's WO id → 404.
- Direct DB query without `SET LOCAL app.tenant_id` → returns nothing (RLS deny by default).
- Asserts the same on `/api/v1/sync/maintenance/pull`.
outbox.spec.ts
- A successful `CreateWorkOrderUseCase` writes the WO row AND the outbox event in one transaction.
- Forcing the Pub/Sub publish to fail leaves the outbox row unpublished; the relay later succeeds.
- The outbox row carries the same `tenantId` and `correlationId` as the request.
- Crashing between WO write and Pub/Sub publish does not cause divergence (verified by injecting a `process.exit` between commit and publish).
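The behaviour these assertions pin down can be modelled in memory. This is a sketch of the outbox pattern in general, not of the service's actual relay: the `Db` shape, `transact`, `relayOnce`, and the `work_order.created.v1` event name are assumptions.

```typescript
type OutboxRow = { id: number; type: string; tenantId: string; correlationId: string; published: boolean };
type Db = { workOrders: string[]; outbox: OutboxRow[] };

// Copy-on-write transaction: changes land only if `fn` finishes without throwing.
function transact(db: Db, fn: (draft: Db) => void): void {
  const draft: Db = { workOrders: db.workOrders.slice(), outbox: db.outbox.map((r) => ({ ...r })) };
  fn(draft);
  db.workOrders = draft.workOrders;
  db.outbox = draft.outbox;
}

// The work order and its event are written together, so neither exists without the other.
function createWorkOrderRow(db: Db, workOrderId: string, tenantId: string, correlationId: string): void {
  transact(db, (draft) => {
    draft.workOrders.push(workOrderId);
    draft.outbox.push({
      id: draft.outbox.length + 1,
      type: "work_order.created.v1", // hypothetical event name
      tenantId,
      correlationId,
      published: false,
    });
  });
}

// One relay pass: a row is marked published only after `publish` returns normally.
function relayOnce(db: Db, publish: (row: OutboxRow) => void): number {
  let publishedCount = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    try {
      publish(row);
      row.published = true;
      publishedCount++;
    } catch (err) {
      break; // leave this row and later ones for the next pass, preserving order
    }
  }
  return publishedCount;
}

const outboxDb: Db = { workOrders: [], outbox: [] };
createWorkOrderRow(outboxDb, "wo_1", "tnt_test_a", "corr_1");

const firstPass = relayOnce(outboxDb, () => { throw new Error("Pub/Sub unreachable"); });
const delivered: OutboxRow[] = [];
const secondPass = relayOnce(outboxDb, (row) => { delivered.push({ ...row }); });
```

Because the publish happens after the commit and is retried from the stored row, a crash between the two leaves an unpublished row rather than a lost event.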
inbox.spec.ts
- Replaying the same Pub/Sub `messageId` twice on `mnt.in.housekeeping.maintenance_required` produces exactly one WO.
- Replaying with a different `messageId` but the same logical content produces a second WO (or collapses per invariant #4; both outcomes are verified).
- Inbox dedupe table grows monotonically; pruning preserves correctness.
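The dedupe rule in the first two bullets can be sketched as a handler keyed on `messageId`. The `InboxHandler` class, the message shape, and the generated work order ids are hypothetical; in the real service the dedupe insert and the WO write would presumably share one transaction.

```typescript
type InboxMessage = { messageId: string; roomId: string };

class InboxHandler {
  // Stands in for the inbox dedupe table: one entry per processed messageId.
  private seen: Record<string, true> = {};
  readonly workOrders: string[] = [];

  handle(message: InboxMessage): "processed" | "duplicate" {
    if (this.seen[message.messageId] === true) return "duplicate";
    this.seen[message.messageId] = true;
    this.workOrders.push(`wo_${message.roomId}_${this.workOrders.length + 1}`);
    return "processed";
  }
}

const inbox = new InboxHandler();
const firstDelivery = inbox.handle({ messageId: "m1", roomId: "room_test_204" });
const redelivery = inbox.handle({ messageId: "m1", roomId: "room_test_204" });
// Same logical content under a new messageId is treated as a new message by this layer;
// collapsing it is the job of invariant #4, which this sketch does not model.
const sameContentNewId = inbox.handle({ messageId: "m2", roomId: "room_test_204" });
```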
3.2 State machine integration
For every transition: end-to-end through the controller, with a real DB and outbox, asserting the resulting row state, version increment, and exactly the expected outbox event(s).
3.3 Auto-OOO choreography
- Publish `housekeeping.room.maintenance_required.v1` with severity hint `high`.
- Assert: WO created with `causedRoomBlock=true` and `work_order.room_blocked.v1` published.
- Publish `property.room.taken_out_of_order.v1` for that room → WO inbox handler links it.
- Publish `WorkOrderVerified` → outbox publishes `room_release_request`.
3.4 Relocation choreography
- Pre-seed `reservation-service` projection with a confirmed reservation overlapping a room.
- Create high-severity WO on that room.
- Assert: `work_order.relocation_required.v1` outbox event fires with the right reservation id.
3.5 Preventive scheduler
- Create a schedule with `nextDueAt` = now − 1 min.
- Run the scheduler tick.
- Assert one draft WO created, one `preventive.due.v1` published, `nextDueAt` advanced.
- Run the scheduler tick again within the same hour-bucket → no new WO; assert dedupe row hit.
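A sketch of the hour-bucket dedupe that the last step relies on. The key format, the `Scheduler` class, and advancing `nextDueAt` by `everyDays` in plain UTC milliseconds are simplifying assumptions (the real cadence is tenant-tz and DST-aware). The second tick is fed the stale schedule on purpose, to imitate a retry where the `nextDueAt` update was not yet visible, since that is the case the dedupe row protects against.

```typescript
const HOUR_MS = 3600000;

type Schedule = { id: string; nextDueAt: number; everyDays: number };
type TickResult = { outcome: "not_due" | "fired" | "deduped"; schedule: Schedule };

// Hypothetical dedupe key: schedule id plus the UTC hour the tick falls into.
function hourBucket(scheduleId: string, now: number): string {
  return `${scheduleId}:${Math.floor(now / HOUR_MS)}`;
}

class Scheduler {
  private fired: Record<string, true> = {}; // stands in for the dedupe table
  readonly drafts: string[] = [];

  tick(schedule: Schedule, now: number): TickResult {
    if (schedule.nextDueAt > now) return { outcome: "not_due", schedule };
    const key = hourBucket(schedule.id, now);
    if (this.fired[key] === true) return { outcome: "deduped", schedule };
    this.fired[key] = true;
    this.drafts.push(key); // one draft work order per (schedule, hour)
    const advanced = { ...schedule, nextDueAt: schedule.nextDueAt + schedule.everyDays * 24 * HOUR_MS };
    return { outcome: "fired", schedule: advanced };
  }
}

const tickNow = Date.UTC(2025, 0, 1, 10, 30, 0);
const dueSchedule: Schedule = { id: "sch_1", nextDueAt: tickNow - 60000, everyDays: 30 };

const scheduler = new Scheduler();
const firstTick = scheduler.tick(dueSchedule, tickNow);
const retryTick = scheduler.tick(dueSchedule, tickNow + 5 * 60000); // 10:35, same hour-bucket
```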
3.6 SLA breach scanner
- Create WO with `sla_due_at` in the past.
- Run the scanner.
- Assert `WorkOrderSlaBreached` event published; `sla_breached_at` set; `breach_count` = 1.
- Re-run the scanner within the same minute → no duplicate event.
3.7 Vendor reminder & escalation
- Create WO assigned to vendor with `channelPreference = whatsapp`, no acknowledgement.
- Advance time past `vendorReminderMinutes`.
- Run reminder worker → assert one notification dispatched.
- Repeat past escalation threshold → assert `WorkOrderEscalated` published.
3.8 PartUsage atomic decrement
- Concurrently issue two `recordPartUsage` calls, each consuming exactly `Part.onHand`.
- Assert exactly one succeeds; the other returns `PART_OUT_OF_STOCK`; the resulting `onHand` is 0, never negative.
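The property under test is that the stock check and the decrement happen as one step against the current value, not against a value the caller read earlier. The sketch below models that step in memory; the SQL in the comment and its column names are an assumption about how a repository might implement it, not the service's actual query.

```typescript
type Part = { id: string; onHand: number };
type UsageResult = { ok: true } | { ok: false; code: "PART_OUT_OF_STOCK" };

// Mirrors a conditional update such as (hypothetical table and columns):
//   UPDATE parts SET on_hand = on_hand - $qty WHERE id = $id AND on_hand >= $qty
// Zero affected rows maps to PART_OUT_OF_STOCK.
function recordPartUsage(part: Part, quantity: number): UsageResult {
  if (part.onHand < quantity) return { ok: false, code: "PART_OUT_OF_STOCK" };
  part.onHand -= quantity;
  return { ok: true };
}

const filter: Part = { id: "part_filter", onHand: 3 };

// Both callers saw onHand = 3 before either one wrote.
const staleReadA = filter.onHand;
const staleReadB = filter.onHand;

// Each then tries to consume the full quantity it saw.
const usageA = recordPartUsage(filter, staleReadA);
const usageB = recordPartUsage(filter, staleReadB);
```

In the integration spec the two calls run on separate connections; the in-memory version only shows why the second one has to lose.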
3.9 Vendor invoice → folio choreography
- Record vendor invoice on a resolved WO → assert `vendor.invoice_recorded.v1` published.
- Simulate `billing.vendor_invoice.posted.v1` consumed → assert `vendor_invoice_posted_to_folio = true`.
3.10 Sync push command flow
- Push `WorkOrder.Resolve` with stale `expectedVersion` → server returns `OCC_CONFLICT`.
- Push twice with same `commandId` → second returns the first's stored result (idempotent).
- Push `WorkOrder.Resolve` with parts that exceed local-stale `Part.onHand` → server returns `PART_OUT_OF_STOCK`.
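The first two cases combine an optimistic version check with a per-command result store. A minimal in-memory sketch follows; the `SyncPushHandler` class and the command shape are hypothetical, and only `commandId`, `expectedVersion` and `OCC_CONFLICT` come from the text.

```typescript
type PushCommand = { commandId: string; workOrderId: string; expectedVersion: number };
type PushResult = { ok: true; version: number } | { ok: false; code: "OCC_CONFLICT" };

class SyncPushHandler {
  private results: Record<string, PushResult> = {}; // stored result per commandId
  private versions: Record<string, number>;

  constructor(versions: Record<string, number>) {
    this.versions = versions;
  }

  push(command: PushCommand): PushResult {
    // Idempotency first: a repeated commandId replays the stored result untouched.
    const stored = this.results[command.commandId];
    if (stored !== undefined) return stored;

    const current = this.versions[command.workOrderId];
    let result: PushResult;
    if (current !== command.expectedVersion) {
      result = { ok: false, code: "OCC_CONFLICT" };
    } else {
      this.versions[command.workOrderId] = current + 1;
      result = { ok: true, version: current + 1 };
    }
    this.results[command.commandId] = result;
    return result;
  }

  versionOf(workOrderId: string): number {
    return this.versions[workOrderId];
  }
}

const pushHandler = new SyncPushHandler({ wo_1: 1 });
const firstPush = pushHandler.push({ commandId: "cmd_1", workOrderId: "wo_1", expectedVersion: 1 });
const repeatedPush = pushHandler.push({ commandId: "cmd_1", workOrderId: "wo_1", expectedVersion: 1 });
const stalePush = pushHandler.push({ commandId: "cmd_2", workOrderId: "wo_1", expectedVersion: 1 });
```

Checking the result store before the version is what keeps a retried command from turning into a spurious `OCC_CONFLICT` after its own first attempt bumped the version.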
4. Contract tests
4.1 Pact (consumer-driven)
Consumers we publish for: property-service, reservation-service, housekeeping-service, notification-service, billing-service, analytics-service, audit-service, bff-backoffice-service, sync-service.
Each consumer publishes their expected events from us in their Pact broker; our CI verifies every event we publish matches every consumer's expectation.
We are a consumer of: housekeeping-service, lock-integration-service, property-service, reservation-service, staff-service, tenant-service, billing-service. Our Pact files describe what we expect from those services.
4.2 OpenAPI contract
- `pnpm test:openapi` regenerates the OpenAPI spec from NestJS decorators and asserts no diff with the committed `openapi/v1.yaml`.
- A second job runs `openapi-diff` against the previously published version and fails on breaking changes.
4.3 Event schema golden tests
- For each event type, a JSON sample lives in `events/__tests__/golden/`.
- The TS interface, the JSON Schema, and the golden sample are kept in lock-step.
5. End-to-end (E2E) tests
Run nightly on staging Cloud Run + Cloud SQL:
- Provision a fresh test tenant with seed data.
- Run a representative scenario:
  - Auto-create WO from housekeeping flag.
  - Auto-OOO publishes; property-service flips room.
  - Assign vendor (test vendor with mock notification destination).
  - Resolve with parts.
  - Verify; OOO released.
- Inspect: events landed in expected topics; analytics-service ingested cost; sync pull from a synthetic device returns the WO.
Tools: Playwright (BFF), @melmastoon/test-harness (event harness), gcloud pubsub for assertions.
6. Chaos / low-bandwidth tests
- Pub/Sub partition outage: simulate broker unreachability; verify outbox grows, no events lost; on recovery, all publish.
- Cloud SQL failover: trigger the staging replica failover; verify connection recovery within 30 s and no data loss.
- AI orchestrator timeout: verify all use cases fail-soft.
- High-latency push: throttle the device to 50 KB/s with 1.5 s RTT; assert sync push completes within 60 s for a 100-command batch.
- Clock skew: device clock 30 min ahead → assert server warns and accepts; 30 min behind → same.
7. Performance / load
| Scenario | Target |
|---|---|
| `GET /work-orders` (filtered, page 50) | p95 ≤ 400 ms at 200 RPS, 50 tenants concurrent |
| `POST /work-orders` (no AI) | p95 ≤ 600 ms at 50 RPS sustained |
| Outbox relay | publish 1,000 events / s with p99 lag < 5 s |
| Preventive scheduler | process 5,000 due schedules per minute |
| Sync push batch (100 commands) | p95 ≤ 3 s server time |
Locust scripts in perf/ invoked via GitHub Actions on every PR touching application/ or infrastructure/.
8. Test data builders
Builders in test/builders/ (TypeScript, pure). Examples:
```typescript
const wo = aWorkOrder()
  .withTenant('tnt_test_a')
  .withProperty('prop_test_kabul_1')
  .withRoom('room_test_204')
  .withCategory('hvac')
  .withSeverity('high')
  .withSource('guest_complaint')
  .build();
```
Builders are pure constructors; no I/O. Only used to build inputs — never to bypass use cases.
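For orientation, a builder with this fluent surface could be shaped roughly as follows. This is a sketch under assumptions: the `WorkOrderInput` fields, the default values, and the immutable copy-per-call design are illustrative and may differ from what lives in `test/builders/`.

```typescript
// Hypothetical input shape; field names are inferred from the with* methods.
type WorkOrderInput = {
  tenantId: string;
  propertyId: string;
  roomId: string | null;
  category: string;
  severity: string;
  source: string;
};

class WorkOrderBuilder {
  private readonly fields: WorkOrderInput;

  constructor(fields: WorkOrderInput) {
    this.fields = fields;
  }

  // Each with* call returns a new builder, so a shared base builder is never mutated.
  withTenant(tenantId: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, tenantId }); }
  withProperty(propertyId: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, propertyId }); }
  withRoom(roomId: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, roomId }); }
  withCategory(category: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, category }); }
  withSeverity(severity: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, severity }); }
  withSource(source: string): WorkOrderBuilder { return new WorkOrderBuilder({ ...this.fields, source }); }

  // Pure: returns a plain copy, performs no I/O and touches no use case.
  build(): WorkOrderInput { return { ...this.fields }; }
}

// Entry point with placeholder defaults (assumed values) so specs override only what matters.
function aWorkOrder(): WorkOrderBuilder {
  return new WorkOrderBuilder({
    tenantId: "tnt_default",
    propertyId: "prop_default",
    roomId: null,
    category: "general",
    severity: "low",
    source: "staff_report",
  });
}

const baseBuilder = aWorkOrder().withTenant("tnt_test_a");
const highSeverity = baseBuilder.withSeverity("high").withRoom("room_test_204").build();
const untouchedBase = baseBuilder.build();
```

Returning a fresh builder from every call lets several specs branch off one base without leaking state between them.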
9. Coverage gates (CI)
| Layer | Branch coverage minimum |
|---|---|
| `domain/` | 95% |
| `application/` (use cases + workers) | 90% |
| `infrastructure/` (adapters) | 70% |
| `controllers/` | 80% |
| Mandatory specs present | enforced by lint rule |
Below threshold = build fails.
10. Anti-patterns (will not pass review)
- ❌ Tests that hit a real network or production GCP.
- ❌ Tests that import from `infrastructure/` while pretending to be unit tests.
- ❌ Sleeping in tests (use injected clock).
- ❌ Asserting only on side-effect logs (assert on observable state and emitted events).
- ❌ Skipping mandatory specs.
- ❌ Marking a flaky test `it.skip` without an issue ref + owner.