11 — Testing & QA Strategy

Companion: 01 Product Overview · 02 Enterprise Architecture · 03 Microservices Catalog · 04 Event-Driven Architecture · 05 API Design · 06 Data Models · 07 Security & Tenancy · 08 AI Architecture · 09 Lock & Key Integration · 10 Payments · Frontend Web/Mobile · Frontend Desktop (Electron) · Definition of Done · 07 Epics & Stories

Stack reminder (non-negotiable). Desktop is Electron (Node 20 main + Chromium renderer + Vite + React + better-sqlite3 + ONNX Runtime Node). Cloud is Google Cloud Platform (Cloud Run, Cloud SQL, Pub/Sub, Memorystore, Cloud Storage, Secret Manager, Vertex AI, BigQuery). There is no Tauri, no AWS, no Azure. Substitutions require an explicit unanimous ADR.

This document is the canonical, implementation-grade testing strategy for Ghasi Melmastoon. Every section is enforced by CI, runbooks, or scheduled jobs. Skipping a category requires an ADR — never an opinion.

1. Goal & Quality Bar

Testing is the load-bearing structure that lets a small team ship a multi-tenant, AI-first, offline-first hotel SaaS into low-resource markets without losing reservations, double-charging cards, leaking guest data, or stranding a guest at a locked door. The quality bar is therefore not "tests pass" — it is evidence that the system behaves correctly under all the operating modes from docs/01-product-overview.md §9, across all 22 services, on every supported surface and locale.

1.1 Coverage targets (merge-blocking)

Surface / domain	Unit line	Unit branch	Mutation	Integration	Notes
Domain aggregates (every service)	≥ 95%	≥ 95%	≥ 75%	n/a	`/domain/**`; pure TS, no I/O
Value objects	100%	100%	≥ 85%	n/a	Branded IDs, money, FX, AIProvenance, Locale
Domain services	≥ 90%	≥ 90%	≥ 70%	required	Policies, specifications
Application use-cases	≥ 85%	≥ 85%	≥ 65%	required	Commands, queries, sagas
Infrastructure adapters	≥ 80%	≥ 75%	≥ 60%	required	Repos, vendor adapters
payment-gateway-service	100%	100%	≥ 90%	required + sandbox	Money never wrong
lock-integration-service	100%	100%	≥ 90%	required + vendor-mock	Locked doors are loud failures
iam-service	100%	100%	≥ 90%	required + security	AuthZ bypass = breach
Sync logic (`bff-backoffice-service` + per-service sync ports)	100%	100%	≥ 90%	required + chaos	Lost mutations = lost revenue
Frontend components	≥ 80%	≥ 75%	n/a	n/a	Unit + component
BFFs	≥ 85%	≥ 80%	≥ 65%	required	Aggregation correctness

1.2 Journey & contract coverage (merge-blocking)

Every P0 journey (J-01 … J-12, J-19, J-20 — see docs/journeys/01-core-user-journeys.md §2) has an automated end-to-end test that runs on every PR (smoke) and nightly (full happy + failure paths).
Every offline-capable flow has an offline E2E test exercising at least one of: 1h offline, 24h offline, conflict simulation, outbox flush ordering.
Every service-to-service edge in the context map (see docs/03-microservices/README.md) has a Pact contract published and verified on every provider PR.
Every event in the schema registry has a producer conformance test and at least one consumer projection test.
Every saga (booking, cancellation, check-in, check-out, date-change, listing-publish, AI-HITL) has happy-path, every-compensation, idempotent-retry, and partial-failure-injection tests.

1.3 Quality philosophy

Every production incident is a missing test. Every missing test is a missed risk conversation. Every flake left enabled is a slow-motion outage.

Four guiding stances:

Tests are executable specifications — domain tests read like the ubiquitous language; AC clauses from 07-epics-and-user-stories.md map 1:1 to a Given/When/Then.
Shift-left and shift-right equally — pre-merge gates catch regressions; production observability + synthetic monitors catch emergent behaviour; both feed the backlog.
AI is a first-class testable surface — non-determinism does not excuse non-testing; it raises the bar (regression suites, eval harness, structured-output assertions, cost guardrails).
Offline is a tier-0 invariant — any test plan that only works online is incomplete and merge-blocked.

1.4 Scope

In-scope: all 22 services in docs/03-microservices/README.md; the four user-facing surfaces (Consumer Web, Consumer Mobile, Tenant Booking Web/Mobile, Electron Desktop) and the platform Control Plane; all event producers/consumers and projections; the desktop sync engine and ONNX edge AI worker; all infra-as-code (Terraform, Helm).

Out-of-scope: third-party vendor SaaS internals (we test our contracts with them, not their systems); manufacturer hardware-level qualification of partner lock encoders.

2. Test Pyramid

The classic pyramid is insufficient for an event-driven, offline-first, AI-native system. Ghasi Melmastoon uses an extended pyramid with orthogonal axes:

                          ┌──────────────────────────┐
                          │   Chaos / Replay / DR    │   weekly (staging) · monthly (prod)
                          ├──────────────────────────┤
                          │  E2E Journeys (J-01..22) │   ~250 tests · < 18 min smoke
                          ├──────────────────────────┤
                          │ Contract (Pact + events) │   ~600 tests · < 3 min
                          ├──────────────────────────┤
                          │  Integration (Testcont.) │   ~2,200 tests · < 12 min
                          ├──────────────────────────┤
                          │       Unit (domain)      │   ~14,000+ tests · < 4 min parallel
                          └──────────────────────────┘

  Orthogonal axes (apply across multiple tiers):
   ┌─────────┬────────────┬──────────┬─────────┬──────┬────────┬──────────────┐
   │  A11y   │  Security  │  AI eval │ Offline │ Perf │  Sync  │ Localisation │
   └─────────┴────────────┴──────────┴─────────┴──────┴────────┴──────────────┘

2.1 Volume targets

Tier	Approx count target	Wall-clock budget	Frequency
Unit	12 k – 18 k	< 4 min (parallel)	every commit
Integration	1.5 k – 2.5 k	< 12 min	every PR
Contract	400 – 700	< 3 min	every PR
E2E (smoke)	≤ 80 critical	< 18 min	every PR
E2E (full)	200 – 250	< 60 min	nightly
Offline	25 – 40	< 25 min	nightly
Performance (k6)	30+ scenarios	n/a (scheduled run)	nightly + pre-release
Chaos	15+ experiments	n/a (scheduled run)	weekly (staging) / monthly (prod)
AI eval	1 k+ prompts × models	n/a (scheduled run)	on prompt change + nightly
A11y (axe)	every changed surface	< 90 s per surface	every PR
Visual regression	every component story + key pages	< 5 min	every PR
Security (SAST/DAST/SCA/secrets)	continuous	< 8 min PR + nightly DAST	every PR + nightly

2.2 Why this shape

Domain unit tests are cheap, deterministic, and fast — they should dominate by count.
Integration tests prove our infra wiring (RLS, outbox, Postgres queries). They carry the most value per test, but they are slower, so they do not dominate by count.
Contract tests are mandatory because the system is event-driven and BFF-fronted; no other layer catches the same class of regression.
E2E proves journeys, but is too slow and brittle to be the safety net — it is the last line of defence.
Orthogonal axes (A11y, AI eval, Offline) cut across all tiers because they are correctness concerns, not test categories.

3. Test Types — Definitions

Type	What it proves	Tooling	Boundary
Unit	A function, value object, or aggregate behaves to spec in isolation. Zero I/O; no dependence on wall-clock time unless it is injected.	Vitest (default), Jest where required by tooling	`/domain/`, `/lib/`
Component	A React component renders, handles input, and emits events to spec.	Vitest + Testing Library (React) + jsdom	`/components/**`
Integration	A service module talks to real Postgres + Redis + Pub/Sub emulator and writes/reads correctly.	Vitest + Testcontainers (Postgres 16, Redis 7, Pub/Sub emulator, MinIO/Cloud Storage emulator)	`/application/`, `/infrastructure/`
Contract	A consumer of an HTTP API or event will not break a producer (or vice versa).	Pact (HTTP), Schema Registry conformance (events)	service ⇄ service
E2E (web/mobile)	A user can complete a journey through real surfaces against a deployed environment.	Playwright (web), Detox (mobile), shared journey IDs J-NN	full surface
E2E (desktop, Electron)	A staff member can run a journey through the Electron app, online and offline.	Playwright Electron + spectron-style harness	Electron renderer + main
Offline	An offline scenario produces correct mutations and reconciles correctly on sync.	Playwright Electron + custom sync harness + Toxiproxy	Electron + bff-backoffice
Performance	Throughput and latency meet SLOs under realistic load.	k6 + custom scenarios; Lighthouse for web vitals; React Native perf tools	API + frontends
Security	Code, dependencies, and runtime surfaces have no known critical/high vulnerabilities.	Snyk (SCA), gitleaks (secrets), CodeQL (SAST), OWASP ZAP (DAST), pen-test pre-launch	every layer
A11y	Surfaces meet WCAG 2.2 AA across locales and assistive tech.	axe-core (axe-playwright + axe-react), NVDA smoke (Windows for Electron)	every surface
Visual regression	Pixel-diff per component story and key page in LTR + RTL.	Chromatic + Storybook	every component + key pages
AI eval	An AI capability meets a behaviour band (correctness, safety, format, cost).	YAML test cases run by the AI eval harness against `ai-orchestrator-service`	AI gateway
Chaos	The system degrades and recovers under network/pod/peripheral faults.	Pumba (containers), Toxiproxy (network), custom Kubernetes/Cloud Run fault injection	infra
Mutation	Tests actually catch defects (kill mutants), not just exercise lines.	Stryker	`/domain/`, `/lib/`
Compliance	Tax, audit-log, residency, and erasure invariants hold per jurisdiction.	Custom test packs per jurisdiction	`pricing/billing/reporting/iam`
Localisation	RTL/LTR + locale-aware formatting + numerals work end-to-end.	Cross-locale test runner + axe + visual regression	every frontend
Synthetic monitor	Production keeps being correct from outside, not just from inside.	Synthetic probes (Cloud Monitoring) + dedicated test tenant	production

4. Tooling Stack

Concern	Tool	Why it is the chosen tool
TypeScript unit	Vitest	Native ESM, Vite-aligned, fast watch, parity with desktop renderer build
Where Vitest cannot be used	Jest	Used only in legacy and React Native packages where the ecosystem requires it; not adopted in new non-RN packages
Component / DOM	Testing Library (React, React Native)	Behaviour-first, no implementation-detail snapshots
Web E2E	Playwright	Multi-browser, trace + video + screenshot, native parallelism
Desktop E2E	Playwright Electron	Drives the real Electron app; can intercept main + renderer
Mobile E2E	Detox (RN)	Reliable on physical + emulator; gray-box harness for RN
Performance	k6	TypeScript-friendly, scenario-based, easy to integrate into CI
Frontend perf	Lighthouse CI + RUM	Enforced budgets per page on tenant booking + consumer surfaces
Contract — HTTP	Pact (consumer + provider, broker hosted)	Consumer-driven contracts; broker integrates with PR gate
Contract — events	Schema Registry conformance + golden samples	Backward-compatibility enforced
API smoke	Postman / Newman	Hand-curated smoke for shared APIs in staging
DAST	OWASP ZAP	Free, scriptable, fits CI; baselines + delta scans
SCA	Snyk	Dependencies, container images
SAST	CodeQL	First-party, GitHub-hosted, broad TS coverage
Secrets	gitleaks	Pre-commit + CI
A11y	axe-core (axe-playwright + axe-react)	WCAG 2.2 AA rules; integrated into Playwright
Screen reader smoke	NVDA on Windows for Electron; VoiceOver macOS smoke	Manual smoke per release wave
Visual regression	Chromatic + Storybook	RTL + LTR stories; review queue with team approval
Chaos network	Pumba + Toxiproxy	Container + network-level fault injection
Mutation	Stryker	TS-native, CI-friendly
AI eval harness	YAML cases + custom runner against `ai-orchestrator-service`	Provenance-aware, cost-aware, regression-aware
Coverage	c8 (V8) + Vitest report	Native, per-package thresholds

All tooling is wrapped behind workspace npm scripts (pnpm test:unit, pnpm test:integration, …) and CI workflows. Engineers do not interact with tools directly.

5. Per-Service Test Pattern

Every backend service repository follows the same test layout. Deviations require an ADR.

services/<service-name>/
├── src/
│   ├── presentation/
│   ├── application/
│   ├── domain/
│   └── infrastructure/
└── test/
    ├── unit/                       # mirrors src/, fast, no I/O
    │   ├── domain/
    │   └── application/
    ├── integration/                # Testcontainers; Postgres + Pub/Sub emulator
    │   ├── outbox.spec.ts          # mandatory
    │   ├── inbox.spec.ts           # mandatory
    │   ├── tenant-isolation.spec.ts # mandatory
    │   ├── idempotency.spec.ts     # mandatory
    │   └── <use-case>.spec.ts
    ├── contract/                   # Pact provider + consumer
    │   ├── provider.spec.ts
    │   └── consumer.spec.ts
    ├── e2e/                        # Playwright API runner against staging
    │   └── <journey>.spec.ts
    └── fixtures/
        ├── builders/               # aReservation().with...().build() style
        └── seeds/

5.1 What is required per service

Unit tests for every aggregate, every value object, every domain service.
Integration tests for every use-case (command/query handler) using Testcontainers for Postgres 16, Redis 7, Pub/Sub emulator (gcr.io/google.com/cloudsdktool/cloud-sdk with gcloud beta emulators pubsub start), MinIO for Cloud Storage, and Mailpit for email send capture where applicable.
Contract tests for every API the service consumes (Pact consumer) and every API/event the service produces (Pact provider + schema registry conformance).
E2E tests for every journey the service participates in, run via Playwright API runner against the deployed staging environment.
Coverage thresholds enforced at the package level in CI per the table in §1.1.

5.2 Mandatory test files (all services)

File	Purpose
`tenant-isolation.spec.ts`	Two tenants seeded with identical entity IDs; cross-tenant reads must return zero rows; direct ID access must be blocked by RLS.
`outbox.spec.ts`	Outbox is transactional with the aggregate write; flusher publishes once; restart after kill produces no duplicates.
`inbox.spec.ts`	Consumer dedupes by message ID; out-of-order delivery is handled per-aggregate; replays are idempotent.
`idempotency.spec.ts`	A duplicate idempotency key returns the original response within the TTL; the same key with a mismatched payload returns a conflict.
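The `idempotency.spec.ts` contract can be sketched with an in-memory store. The sketch makes two assumptions: the key maps to a payload hash plus the stored response, and the TTL is measured against an injected clock. `IdempotencyStore`, `handle`, and the `conflict` outcome are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

type Outcome<R> =
  | { kind: 'fresh'; response: R }
  | { kind: 'replayed'; response: R }
  | { kind: 'conflict' };

interface Entry<R> {
  payloadHash: string;
  response: R;
  expiresAt: number;
}

// Hypothetical in-memory store; a real service would back this with Postgres or Memorystore.
class IdempotencyStore<R> {
  private entries = new Map<string, Entry<R>>();
  constructor(private readonly ttlMs: number, private readonly now: () => number) {}

  handle(key: string, payload: unknown, execute: () => R): Outcome<R> {
    // Assumes payloads are serialised with a canonical key order before hashing.
    const payloadHash = createHash('sha256').update(JSON.stringify(payload)).digest('hex');
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > this.now()) {
      // Same key within TTL: replay only for an identical payload, otherwise report a conflict.
      return existing.payloadHash === payloadHash
        ? { kind: 'replayed', response: existing.response }
        : { kind: 'conflict' };
    }
    const response = execute();
    this.entries.set(key, { payloadHash, response, expiresAt: this.now() + this.ttlMs });
    return { kind: 'fresh', response };
  }
}
```

A spec built on this asserts three things: `execute` runs once per key within the TTL, a replay returns the first response, and a changed payload under the same key maps to the API's conflict status.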

5.3 Coverage thresholds in CI

vitest.config.ts per package contains:

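import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'lcov', 'html'],
      thresholds: {
        // Per-layer floors from §1.1; payment, lock, IAM and sync packages raise these to 100.
        'src/domain/**': { lines: 95, branches: 95, functions: 95, statements: 95 },
        'src/application/**': { lines: 85, branches: 85, functions: 85, statements: 85 },
        'src/infrastructure/**': { lines: 80, branches: 75, functions: 80, statements: 80 },
      },
      exclude: ['**/*.d.ts', '**/migrations/**', '**/openapi.json'],
    },
  },
});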

CI fails the job on any threshold miss. Mutation thresholds are enforced separately by Stryker.

5.4 Builders, fakes, and clocks

Data builders live in test/fixtures/builders/ (e.g., aReservation().forTenant(tA).withRooms(2).withPolicy('flex24').build()).
Time is injected via a Clock port; tests use FakeClock at a fixed ISO-8601 instant.
IDs via IdFactory; tests use SeededIdFactory for determinism.
Money as bigint micro-units everywhere; MoneyVO ensures no float math.
AI via the AIClient port; tests inject a RecordingAIClient with golden responses.
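The sketch below shows three of these seams. It assumes port shapes the document names but does not define: the `Clock` and `IdFactory` interfaces, the `now`/`advance`/`next` method names, the ID format, and the `splitEvenly` money helper are all illustrative.

```typescript
// Illustrative port; the real Clock interface is not specified in this document.
interface Clock {
  now(): Date;
}

// Test clock pinned to a fixed ISO-8601 instant; time moves only when a test advances it.
class FakeClock implements Clock {
  private current: number;
  constructor(isoInstant: string) {
    this.current = Date.parse(isoInstant);
    if (Number.isNaN(this.current)) throw new Error(`Invalid instant: ${isoInstant}`);
  }
  now(): Date {
    return new Date(this.current);
  }
  advance(ms: number): void {
    this.current += ms;
  }
}

interface IdFactory {
  next(prefix: string): string;
}

// Deterministic IDs: the same seed and call order yield the same IDs on every run.
class SeededIdFactory implements IdFactory {
  private counter = 0;
  constructor(private readonly seed: string) {}
  next(prefix: string): string {
    this.counter += 1;
    return `${prefix}_${this.seed}_${this.counter.toString().padStart(6, '0')}`;
  }
}

// Money stays in bigint micro-units; the remainder is spread one micro-unit at a time
// so the parts always sum exactly to the total (no float rounding).
function splitEvenly(totalMicro: bigint, parts: number): bigint[] {
  const n = BigInt(parts);
  const base = totalMicro / n;
  const remainder = totalMicro % n;
  return Array.from({ length: parts }, (_, i) => (BigInt(i) < remainder ? base + 1n : base));
}
```

Builders such as aReservation() would receive these through constructor injection. Two runs of the same test then produce identical timestamps, IDs, and amounts.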

6. Multi-Tenant Isolation Tests (mandatory)

Multi-tenancy is enforced at three layers (docs/02-enterprise-architecture.md) — domain, DB (Postgres RLS), and API (request context). Every layer must be tested.

6.1 Required test set per service

Test	What it proves
`cross_tenant_read_blocked`	A query as Tenant A returns zero rows from Tenant B's table even when filters are absent.
`cross_tenant_mutation_blocked`	An update of Tenant B's row as Tenant A is rejected by RLS, never silently dropped.
`direct_id_access_blocked`	A GET by primary ID belonging to Tenant B as Tenant A returns 404 (not 403, to avoid existence leaks).
`tenant_context_required`	Every endpoint rejects requests missing `X-Tenant-Id` or with a JWT `tids` mismatch.
`bulk_admin_requires_global_scope`	Bulk endpoints require `X-Tenant-Scope: global` and platform-admin claim.
`event_payload_carries_tenantid`	Every event published carries `tenantId` matching the producing aggregate's tenant.
`consumer_drops_mismatched_tenant`	A consumer receiving an event from another tenant context does not project.
`cross_tenant_search_only_via_search_aggregation`	Only `search-aggregation-service` may read across tenants; only over `cross_tenant_searchable=true` fields.
`rls_bypass_attempt_logged`	Setting `set role` or `SET LOCAL row_security = off` is forbidden in app code; tests assert the absence at the migration layer; if attempted at runtime, an audit alarm fires.
`service_to_service_propagation`	Service A → Service B propagates `tenant_id` in headers and trace context; B rejects mismatch.

6.2 Why not skip — ever

Tenant isolation is the single highest-blast-radius defect class in this product. The seven-figure-incident risk justifies the marginal CI minute. Disabling these tests requires a risk-accepted ADR co-signed by Security and the service owner.

6.3 Sample isolation test (excerpt)

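// Helper import paths are illustrative; they follow the test/fixtures layout in §5.
import { describe, it, expect } from 'vitest';
import { aTenant } from '../fixtures/builders';
import { seedReservation } from '../fixtures/seeds';
import { api } from '../fixtures/api-client';
import { ReservationId } from '../../src/domain/reservation-id';

describe('reservation-service · tenant isolation', () => {
  it('blocks cross-tenant read with identical aggregate IDs', async () => {
    const tenantA = aTenant().build();
    const tenantB = aTenant().build();
    const sharedId = ReservationId.generate();
    await seedReservation({ id: sharedId, tenantId: tenantA.id });
    await seedReservation({ id: sharedId, tenantId: tenantB.id });

    // Tenant A sees only its own row, even though Tenant B holds a row with the same ID.
    const resA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}`);
    expect(resA.status).toBe(200);
    expect(resA.body.tenantId).toBe(tenantA.id);

    // Explicitly requesting Tenant B's copy while authenticated as Tenant A must look like "not found".
    const resBAsA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}?explicitTenant=${tenantB.id}`);
    expect(resBAsA.status).toBe(404); // never 403, never 200
  });
});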

7. Saga Tests

Sagas are the integration-correctness backbone. Each named saga in docs/04-event-driven-architecture.md has a dedicated suite.

7.1 Sagas in scope

Saga	Participants	Compensation chain
SAGA-MEL-BOOKING-CONFIRM	reservation → inventory(hold) → pricing(quote) → payment-gateway(intent) → reservation(confirm) → lock-integration(issue@check-in) → notification	release hold · refund · revoke key
SAGA-MEL-CANCELLATION	reservation → billing(refund-eligibility) → payment-gateway(refund) → inventory(release) → lock-integration(revoke) → notification	reverse refund (rare) · re-allocate
SAGA-MEL-CHECKIN	reservation(check-in) → lock-integration(issue) → billing(open-folio) → notification	revoke key on rollback
SAGA-MEL-CHECKOUT	reservation(check-out) → lock-integration(revoke) → housekeeping(queue-turnover) → billing(close-folio) → notification	re-open folio · cancel turnover
SAGA-MEL-DATE-CHANGE	reservation(modify) → inventory(re-hold) → pricing(re-quote) → billing(adjust) → lock-integration(update) → notification	restore prior dates · refund delta
SAGA-MEL-LISTING-PUBLISH	property → theme-config → pricing → search-aggregation → bff-consumer (cache invalidation)	unpublish on partial failure
SAGA-MEL-AI-HITL	originating service → ai-orchestrator(propose+provenance) → human review → originating service(commit with decisionId)	discard proposal · audit

7.2 Required test cases per saga

For every saga, four classes of tests are mandatory:

Happy path. Each step completes; the final state is reachable; the full chain of events is published in order; the final aggregate carries the expected snapshot.
Per-step compensation. Inject a failure at step N for each N; the compensation chain runs in reverse; the system returns to a consistent state; the original idempotency key is honoured for retries.
Idempotent retry. Redeliver each step's event; downstream consumers do not double-apply; outbox publishers do not double-publish.
Partial failure injection. Combine two faults in different steps (e.g., payment succeeds but lock issuance flakes twice then succeeds); ensure the saga reaches eventual success or a deterministic terminal failure.

7.3 Saga test harness

A shared @ghasi/saga-harness package wraps the Pub/Sub emulator, Postgres, a fake clock, and injectable per-step failure switches. A test reads:

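import { it } from 'vitest';
import { runSaga } from '@ghasi/saga-harness';

it('compensates SAGA-MEL-BOOKING-CONFIRM when payment capture fails', async () => {
  // runSaga is assumed to reject if the final state or the ordered event list differs.
  await runSaga('SAGA-MEL-BOOKING-CONFIRM', {
    givenInventoryAvailable: true,
    givenQuoteValid: true,
    whenPaymentFailsAtStep: 'capture',
    thenStateAfterCompensation: 'cancelled_by_compensation',
    thenEventsPublished: [
      'reservation.held.v1',
      'payment.intent.created.v1',
      'payment.failed.v1',
      'inventory.hold.released.v1',
      'reservation.confirm_failed.v1',
    ],
  });
});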

8. Sync & Offline Tests (Electron desktop)

Sync correctness is the second-highest-blast-radius defect class. Offline tests are first-class CI citizens, not ad-hoc.

8.1 Required scenario coverage

ID	Scenario	Expected behaviour
O-01	Desktop offline 1h, 5 reservations modified, then reconnect	All 5 mutations reach server in original order; no duplicates; outbox empty after flush
O-02	Desktop offline 24h, 50 housekeeping updates + 10 walk-ins + 3 cash-on-arrival check-ins, then reconnect	All applied; any conflicts surfaced according to the per-aggregate policy; sync completes within 5 minutes on a 1 Mbps link
O-03	Desktop offline 7d (max grace)	App still usable for read + mutate; clear UI banner at 5d/6d/7d boundaries; on reconnect, sync proceeds in pages
O-04	Conflict simulation: rate plan changed both server-side and offline desktop-side	Per-aggregate policy applied; rate-plan policy is "server wins on monetary fields, last-writer-wins on display name"; operator notified for material conflicts
O-05	Conflict: same reservation date-changed by two devices offline	First flusher wins; second device receives `409 conflict`, surfaces the diff, lets operator decide
O-06	Outbox flush ordering	Events flush FIFO; a failing event blocks its dependents but not unrelated outbox queues
O-07	Idempotency on retry	Same outbox row resubmitted; server's inbox dedupes; only one application
O-08	Encrypted SQLite integrity	Power-loss mid-transaction; on restart, WAL replays cleanly; no partial commits
O-09	Lock issuance offline (vendor cached creds)	Credential created locally, card encoded, issuance event staged; on reconnect, central record reconciles; cache age limits respected
O-10	Cash-on-arrival check-in offline	Check-in proceeds; cash drawer reflects deposit; reservation transitions; on reconnect, all events reach server in correct order
O-11	Conflict on housekeeping assignment	LWW per policy; the loser's UI updates within 60 s
O-12	Sync paused under low bandwidth	Throttle activates at ≤ 256 kbps; large media deferred; small mutations still flush
O-13	Sync recovery shows accurate progress	Operator UI shows "X of Y synced", queue depth, elapsed time
O-14	Operator forces re-sync	Forced re-sync rebuilds local state cleanly; no data loss; takes ≤ N minutes for the configured tenant size
O-15	Migration mid-flight	An app update with schema migration runs cleanly; existing outbox preserved across migration; rollback supported
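The O-04 rate-plan policy ("server wins on monetary fields, last-writer-wins on display name") can be written as a pure merge function. That makes the conflict unit-testable before the end-to-end scenario runs. The `RatePlanVersion` shape, its field names, and the `monetaryDiverged` flag are assumptions for illustration.

```typescript
interface RatePlanVersion {
  displayName: string;
  displayNameUpdatedAt: number; // epoch ms of the last edit to displayName
  baseRateMicro: bigint;        // monetary field
  taxRateBps: number;           // monetary field (basis points)
}

interface MergeResult {
  merged: RatePlanVersion;
  monetaryDiverged: boolean; // candidate for the "material conflict" operator notice
}

// Hypothetical resolver: monetary fields always take the server value; displayName takes
// whichever side edited it last (ties go to the server). A production resolver would compare
// against the common ancestor (three-way) to know whether the desktop actually edited a field.
function mergeRatePlan(server: RatePlanVersion, desktop: RatePlanVersion): MergeResult {
  const desktopNameWins = desktop.displayNameUpdatedAt > server.displayNameUpdatedAt;
  const monetaryDiverged =
    server.baseRateMicro !== desktop.baseRateMicro || server.taxRateBps !== desktop.taxRateBps;
  return {
    merged: {
      displayName: desktopNameWins ? desktop.displayName : server.displayName,
      displayNameUpdatedAt: Math.max(server.displayNameUpdatedAt, desktop.displayNameUpdatedAt),
      baseRateMicro: server.baseRateMicro,
      taxRateBps: server.taxRateBps,
    },
    monetaryDiverged,
  };
}
```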

8.2 Tooling

Playwright Electron drives the real Electron app: BrowserWindow is intercepted to drive renderer interactions, app.relaunch() is exercised, and webContents.session is examined for storage state.
Toxiproxy sits in front of the sync endpoint to inject latency, packet loss, and bandwidth caps.
Per-test Postgres + Pub/Sub emulator via Testcontainers acts as the cloud side.
Custom sync harness seeds the desktop SQLite with a known state and runs assertion helpers (expectOutboxEmpty(), expectAggregateState(), expectConflictRecorded()).
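The O-06 ordering rule that the harness asserts can be sketched as a flush loop: rows go out FIFO, and a failing row blocks only its own queue. The row shape, the `queueKey` grouping (assumed here to be the aggregate ID), and the `send` callback are hypothetical.

```typescript
interface OutboxRow {
  seq: number;      // monotonically increasing local sequence
  queueKey: string; // rows sharing a key are ordered dependents (assumed: aggregate ID)
  payload: string;
}

interface FlushReport {
  sent: number[];    // seqs delivered, in delivery order
  blocked: number[]; // seqs held back by a failure in their queue
}

// Hypothetical flusher: global FIFO by seq, but a failure stalls only the failing row's queue.
function flushOutbox(rows: OutboxRow[], send: (row: OutboxRow) => boolean): FlushReport {
  const report: FlushReport = { sent: [], blocked: [] };
  const stalledQueues = new Set<string>();
  for (const row of [...rows].sort((a, b) => a.seq - b.seq)) {
    if (stalledQueues.has(row.queueKey)) {
      report.blocked.push(row.seq); // a dependent must never overtake its failed predecessor
      continue;
    }
    if (send(row)) {
      report.sent.push(row.seq);
    } else {
      stalledQueues.add(row.queueKey);
      report.blocked.push(row.seq);
    }
  }
  return report;
}
```

With reservation A's first row failing, A's later row stays queued. Reservation B's rows still flush, which is the "blocks its dependents but not unrelated outbox queues" expectation.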

8.3 SQLCipher integrity tests

KDF (PBKDF2-SHA512 ≥ 256k iterations) verified per release.
DB key derivation: OS keychain fragment + JWT-bound device fragment; tampering with either renders the DB unreadable.
Power-loss simulation via kill -9 on the Electron main process during a transaction; on restart, WAL replays cleanly.
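The KDF and tamper checks could be exercised with `node:crypto` as sketched below. The sketch length-prefixes the keychain and device fragments, concatenates them, and feeds the result to PBKDF2-HMAC-SHA512 with 256,000 iterations (SQLCipher 4's default KDF settings). That combination scheme is an assumption, and `deriveDbKey` is a hypothetical helper. The document does not say whether the app derives the raw key itself or passes a passphrase to SQLCipher's internal KDF.

```typescript
import { pbkdf2Sync, timingSafeEqual } from 'node:crypto';

const KDF_ITERATIONS = 256_000; // the ≥ 256k floor from §8.3
const KEY_BYTES = 32;           // 256-bit key

// Length-prefix each fragment so ("ab", "c") and ("a", "bc") cannot collide.
function encodeFragments(keychainFragment: Buffer, deviceFragment: Buffer): Buffer {
  const prefix = (b: Buffer): Buffer => {
    const len = Buffer.alloc(4);
    len.writeUInt32BE(b.length, 0);
    return Buffer.concat([len, b]);
  };
  return Buffer.concat([prefix(keychainFragment), prefix(deviceFragment)]);
}

// Hypothetical helper: derive the database key from both fragments plus a per-install salt.
function deriveDbKey(keychainFragment: Buffer, deviceFragment: Buffer, salt: Buffer): Buffer {
  const input = encodeFragments(keychainFragment, deviceFragment);
  return pbkdf2Sync(input, salt, KDF_ITERATIONS, KEY_BYTES, 'sha512');
}

// Constant-time comparison so the check itself does not leak key bytes via timing.
function keysMatch(a: Buffer, b: Buffer): boolean {
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Changing a single bit in either fragment yields a different key, which is how the suite models "tampering with either renders the DB unreadable".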

8.4 Auto-update tamper tests

The signed update manifest is fetched and its signature is verified against the embedded public key.
Tampered manifests are rejected (test fixture flips a byte; assert refusal).
Staged rollouts: a fixture pretends to be a different staging cohort; assert correct cohort assignment.
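The byte-flip fixture can be sketched with Node's built-in Ed25519 support. Ed25519 is an assumption, since the document does not name the signature algorithm. The manifest fields (including the `sha512` placeholder value) are illustrative, and the real app would embed the public key at build time rather than generate a key pair in the test.

```typescript
import { generateKeyPairSync, sign, verify, type KeyObject } from 'node:crypto';

// True only if the manifest bytes match the signature under the embedded public key.
function isManifestTrusted(manifest: Buffer, signature: Buffer, embeddedPublicKey: KeyObject): boolean {
  // For Ed25519, Node requires the algorithm argument to be null.
  return verify(null, manifest, embeddedPublicKey, signature);
}

// Fixture: sign a manifest, then flip one byte and confirm the updater would refuse it.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const manifest = Buffer.from(JSON.stringify({ version: '1.2.3', cohort: 'staging-a', sha512: 'abc123' }));
const signature = sign(null, manifest, privateKey);

const tampered = Buffer.from(manifest);
tampered[10] ^= 0xff; // corrupt a single byte

const acceptsGenuine = isManifestTrusted(manifest, signature, publicKey);
const acceptsTampered = isManifestTrusted(tampered, signature, publicKey);
```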

9. Lock Integration Tests

Lock integration is the most failure-mode-rich subsystem because it spans cloud, desktop, USB/serial peripherals, and remote vendor APIs.

9.1 Vendor mocks

Every vendor adapter (TTLock, Salto, Assa Abloy, generic Wiegand) ships with a vendor-mock that is contract-faithful to the real SDK. The mock supports failure-injection switches:

vendor down (timeout, 5xx)
vendor refuses (4xx)
vendor accepts but no-op (silent failure)
vendor responds slowly (latency injection)
vendor sends out-of-order callbacks
vendor returns invalid signature on callback
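A vendor mock with failure-injection switches might look like the sketch below. The `LockVendorAdapter` interface, the mode names, and the result shape are hypothetical; a real mock mirrors the specific vendor SDK's signatures. Only the issue-call switches are shown. Timeouts, out-of-order callbacks, and bad callback signatures would be modelled on the transport and callback paths instead.

```typescript
type FailureMode = 'none' | 'down' | 'refuses' | 'silent_noop' | 'slow';

interface IssueRequest { roomId: string; validFrom: Date; validUntil: Date; }
type IssueResult =
  | { ok: true; credentialId: string }
  | { ok: false; status: number; reason: string };

interface LockVendorAdapter {
  issueCredential(req: IssueRequest): Promise<IssueResult>;
}

class MockLockVendor implements LockVendorAdapter {
  mode: FailureMode = 'none';
  latencyMs = 0;
  readonly credentials = new Map<string, IssueRequest>(); // what the "vendor" actually stored
  private counter = 0;

  async issueCredential(req: IssueRequest): Promise<IssueResult> {
    if (this.mode === 'slow') await new Promise((r) => setTimeout(r, this.latencyMs));
    if (this.mode === 'down') return { ok: false, status: 503, reason: 'vendor unavailable' };
    if (this.mode === 'refuses') return { ok: false, status: 422, reason: 'vendor rejected request' };
    const credentialId = `mock-cred-${++this.counter}`;
    // silent_noop: report success but never persist, so callers must verify by reading back.
    if (this.mode !== 'silent_noop') this.credentials.set(credentialId, req);
    return { ok: true, credentialId };
  }
}
```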

9.2 Generic Wiegand mock encoder

For the generic Wiegand path, a USB/serial mock encoder runs in CI as a Node process exposing /dev/ttyUSBmock. Tests:

Encode card with valid credential
Encode fails (no card present)
Encode fails (card error response)
Re-encode after failure
Card revoke

9.3 Required scenario coverage

ID	Scenario	Expected
L-01	Issue mobile-key online, vendor up	Credential created, invite SMS dispatched, key reachable on guest's phone
L-02	Issue mobile-key with vendor down	Issuance queues, retry with backoff; UI prompts fallback PIN; queue drains on vendor recovery
L-03	Encode RFID card online	Card encoded, credential persisted, audit recorded
L-04	Encode RFID card offline (cached creds)	Card encoded against cached creds, issuance event staged, syncs on reconnect
L-05	Cached creds expired (> 24h)	Offline issuance refused; clear UI prompt to come online
L-06	Lost-key revoke	Credential transitions to `revoked`; new credential issued; audit captures both
L-07	Mobile-key invite delivery test	SMS or WhatsApp delivered; if both fail, fallback to email; if all fail, alert
L-08	Clock-skew tolerance	Lock device clock ±5 minutes from server; credential still valid; outside window → reject with clear error
L-09	Vendor anomaly (door opened after revoke)	Anomaly detected by AI; alert raised; door logged with severity
L-10	Vendor secret rotation	Rotation runs; previous secret valid until cutover; failed rotation alerts
L-11	Adapter swap (TTLock → Salto)	Existing reservations re-issue keys via new vendor on next check-in; no double-credentials
L-12	Concurrent issuance for same room	Optimistic lock; second wins or both safely serialised per policy
L-13	Issuance during reservation modification	Atomic update of validUntil; rollback on partial failure
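L-08's tolerance rule can be captured as a pure check. The sketch assumes the lock device evaluates validity against its own clock and that the ±5 minutes applies at both ends of the validity window. `isCredentialValidAt` and its distinct rejection reasons (which back the "clear error" requirement) are illustrative.

```typescript
const SKEW_TOLERANCE_MS = 5 * 60 * 1000; // ±5 minutes, per L-08

type Validity = 'valid' | 'not_yet_valid' | 'expired';

// Evaluate a credential window against the device clock, widened by the allowed skew on both ends.
function isCredentialValidAt(validFrom: Date, validUntil: Date, deviceNow: Date): Validity {
  const t = deviceNow.getTime();
  if (t < validFrom.getTime() - SKEW_TOLERANCE_MS) return 'not_yet_valid';
  if (t > validUntil.getTime() + SKEW_TOLERANCE_MS) return 'expired';
  return 'valid';
}
```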

10. Payment Tests

Payments are the third-highest-blast-radius surface (after IAM and locks). Tests cover correctness, idempotency, PCI scope, and reconciliation.

10.1 Adapter test matrix

Adapter	Sandbox provider	Required scenarios
Stripe	Stripe test mode	success · 3DS challenge · capture · partial refund · full refund · chargeback simulation · webhook idempotency · webhook signature failure · FX-snapshot freeze
PayPal	PayPal sandbox	success · cancel mid-flow · webhook idempotency · refund
Cash-on-arrival	local mock	deposit at confirm · arrival capture · no-show → policy charge · partial deposit
MFS (JazzCash, Easypaisa, Fawry, M-Pesa)	provider sandbox or mock	initiate · OTP confirm · callback signature · failure-rail-down

10.2 Required scenarios (cross-adapter)

ID	Scenario	Expected
P-01	Card payment end-to-end with 3DS	Capture succeeds, reservation `held → confirmed`, FX snapshot frozen
P-02	Webhook delivered twice	Only one state transition; idempotency-key check passes
P-03	Webhook signature invalid	Rejected with 401; security alert fires
P-04	Payment fails (declined)	Reservation rolls back to held with extended TTL; UI reflects
P-05	Partial refund	Folio shows credit and remaining capture
P-06	Full refund + lock revoke	Refund succeeds, key revoked, reservation cancelled
P-07	Chargeback simulation	Chargeback aggregate created; folio flagged; evidence upload works
P-08	Cash-on-arrival happy path	Reservation confirmed_pending_payment; check-in captures cash; folio reflects
P-09	FX-snapshot freeze	Display currency USD, settlement AFN; folio records both with snapshot timestamp
P-10	PCI scope assertion	No PAN appears in any non-PCI service log; assertion at log-shipper level fails build if violated
P-11	EOD cash drawer reconciliation	Expected vs counted vs variance; variance > X% requires manager approval; events published
P-12	Provider rotation	Switch tenant from Stripe-A to Stripe-B; new bookings route to B; old bookings refundable on A

10.3 PCI scope verification

A nightly job greps all non-PCI service logs in a sandbox window for PAN-shaped strings and known test card numbers. Any match fails the job and pages security.
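The PAN sweep could pair a digit-run pattern with a Luhn checksum, which cuts false positives on order IDs and phone numbers. The pattern used here is an assumption, not the job's actual rule set: 13 to 19 digits, optionally separated by single spaces or hyphens.

```typescript
// Luhn checksum: from the rightmost digit, double every second digit and subtract 9 if > 9.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Candidate PANs: 13-19 digits, optionally grouped with single spaces or hyphens.
const PAN_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

function findPanLikeStrings(logLine: string): string[] {
  const hits: string[] = [];
  for (const match of logLine.matchAll(PAN_CANDIDATE)) {
    const digits = match[0].replace(/[ -]/g, '');
    if (digits.length >= 13 && digits.length <= 19 && passesLuhn(digits)) hits.push(match[0]);
  }
  return hits;
}
```

Well-known test card numbers such as 4242 4242 4242 4242 pass Luhn, so they are caught by the same check.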

11. AI Eval Harness

The AI eval harness is the only acceptable way to gate AI quality. It runs nightly in staging and on every change to a prompt or model in CI.

11.1 What the harness does

Loads YAML test cases per AI capability.
Runs each case against ai-orchestrator-service in the appropriate environment (staging by default; CI for prompt-changed PRs).
Asserts response shape (structured output), behaviour bands (safety, refusal, helpfulness), and cost guardrails.
Records aiProvenance for every run.
Compares against the latest accepted baseline; significant regression blocks merge or halts the nightly run.

11.2 YAML test case shape

capability: pricing.suggest_rate
version: v3
description: >
  Suggest a daily rate for a property given recent bookings, market context,
  and a known seasonality flag.
inputs:
  property_id: "prop-staging-001"
  date_range: "2026-06-15..2026-06-22"
  baseline_rate_micro_usd: 75000000
  occupancy_last_14d: 0.62
  seasonality: "tourist_high"
expectations:
  output_schema_valid: true       # must parse against pricing.suggest_rate.v3
  must_include_fields:
    - "suggested_rate_micro_usd"
    - "rationale"
    - "expected_uplift_pct"
  numeric_bounds:
    suggested_rate_micro_usd:
      min: 60000000
      max: 200000000
    expected_uplift_pct:
      min: -25
      max: 80
  rationale_quality:
    min_words: 20
    must_mention_one_of: ["occupancy", "seasonality", "demand"]
  safety:
    must_not_contain_pii: true
    must_not_contain_political: true
provenance:
  must_attach: true
  required_fields: ["model", "version", "promptId", "traceId"]
cost_budget_micro_usd: 2500
hitl_required: true
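One way the harness could apply an `expectations` block to a response is sketched below, covering only the field checks used in the example above. It assumes the YAML and the model output have already been parsed into plain objects. Schema validation, PII checks, and political-content checks are left out, and all type and function names are hypothetical.

```typescript
interface Bounds { min: number; max: number; }

interface Expectations {
  must_include_fields: string[];
  numeric_bounds: Record<string, Bounds>;
  rationale_quality: { min_words: number; must_mention_one_of: string[] };
}

interface EvalCase {
  expectations: Expectations;
  cost_budget_micro_usd: number;
}

// Returns human-readable failures; an empty list means the case passed.
function evaluateCase(c: EvalCase, output: Record<string, unknown>, costMicroUsd: number): string[] {
  const failures: string[] = [];
  for (const field of c.expectations.must_include_fields) {
    if (!(field in output)) failures.push(`missing field: ${field}`);
  }
  for (const [field, { min, max }] of Object.entries(c.expectations.numeric_bounds)) {
    const value = output[field];
    if (typeof value !== 'number' || value < min || value > max) {
      failures.push(`${field} out of bounds [${min}, ${max}]: ${String(value)}`);
    }
  }
  const rationale = typeof output.rationale === 'string' ? output.rationale : '';
  const { min_words, must_mention_one_of } = c.expectations.rationale_quality;
  if (rationale.trim().split(/\s+/).filter(Boolean).length < min_words) failures.push('rationale too short');
  const lower = rationale.toLowerCase();
  if (!must_mention_one_of.some((term) => lower.includes(term.toLowerCase()))) {
    failures.push('rationale mentions none of the required terms');
  }
  if (costMicroUsd > c.cost_budget_micro_usd) failures.push(`cost ${costMicroUsd} exceeds budget`);
  return failures;
}
```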

11.3 Required AI capability evals (each as one or more YAML files)

Capability	Files	Notes
Pricing rate suggestion	`pricing.suggest_rate.*.yaml`	One per market (AF, TJ, IR, PK, EG)
Demand forecast	`forecast.occupancy.*.yaml`	14d + 90d horizons
Housekeeping order	`housekeeping.order.*.yaml`	Edge ONNX + cloud equivalence
Anomaly detection — bookings	`anomaly.booking.*.yaml`	False-positive ceiling
Anomaly detection — payments	`anomaly.payment.*.yaml`	Fraud signals
Anomaly detection — locks	`anomaly.lock.*.yaml`	Door-event sequences
Upsell recommendation	`upsell.recommend.*.yaml`	Per persona
Guest message draft	`message.draft.*.yaml`	Per locale (PS, FA, AR, EN, FR, TG), per context
Translation hint	`translate.hint.*.yaml`	PS↔EN, FA↔EN, AR↔EN
Theme contrast adjust	`theme.contrast.*.yaml`	A11y constraint

11.4 Cost guardrails

Each capability has a cost_budget_micro_usd per call.
Per-tenant daily budget enforced by the orchestrator; exceeding the budget degrades to lower-cost models.
The eval harness asserts cost stays within band; cost regression > 20% blocks merge.
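The > 20% merge block reduces to a relative-change check against the accepted baseline. A minimal sketch, assuming cost is compared as the mean micro-USD per call across an eval run; the function names are illustrative.

```typescript
const MAX_COST_REGRESSION = 0.2; // more than 20% over baseline blocks merge

// Mean cost per call in micro-USD across an eval run.
function meanCost(costsMicroUsd: number[]): number {
  if (costsMicroUsd.length === 0) throw new Error('no eval runs recorded');
  return costsMicroUsd.reduce((sum, c) => sum + c, 0) / costsMicroUsd.length;
}

function costRegressionBlocksMerge(baselineMean: number, candidateCosts: number[]): boolean {
  if (baselineMean <= 0) throw new Error('baseline must be positive');
  const candidateMean = meanCost(candidateCosts);
  return (candidateMean - baselineMean) / baselineMean > MAX_COST_REGRESSION;
}
```

With a 2,000 micro-USD baseline, a mean of 2,400 (exactly +20%) passes. A mean of 2,450 (+22.5%) blocks.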

11.5 Provenance metadata tests

Every successful AI response in the harness must carry aiProvenance = { model, version, promptId, traceId, reviewedBy?, reviewedAt?, local }.
Tests assert presence and well-formedness; missing fields fail the case.
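The presence and well-formedness check is sketched below against the field list above. The format rules are assumptions about what "well-formed" means, not a published schema:

- required fields are non-empty strings;
- `reviewedBy` and `reviewedAt` appear together;
- `reviewedAt` parses via `Date.parse`, a loose stand-in for strict ISO-8601 validation;
- `local` is a boolean, read here as marking edge ONNX output.

```typescript
interface AIProvenance {
  model: string;
  version: string;
  promptId: string;
  traceId: string;
  reviewedBy?: string;
  reviewedAt?: string;
  local: boolean; // assumed: true when produced by the edge ONNX worker
}

// Returns the list of problems; an empty list means the provenance block is acceptable.
function validateProvenance(p: Partial<Record<keyof AIProvenance, unknown>>): string[] {
  const problems: string[] = [];
  for (const key of ['model', 'version', 'promptId', 'traceId'] as const) {
    if (typeof p[key] !== 'string' || (p[key] as string).trim() === '') problems.push(`${key} missing or empty`);
  }
  if (typeof p.local !== 'boolean') problems.push('local must be a boolean');
  const hasReviewer = p.reviewedBy !== undefined;
  const hasReviewedAt = p.reviewedAt !== undefined;
  if (hasReviewer !== hasReviewedAt) problems.push('reviewedBy and reviewedAt must appear together');
  if (hasReviewedAt && (typeof p.reviewedAt !== 'string' || Number.isNaN(Date.parse(p.reviewedAt)))) {
    problems.push('reviewedAt is not a parseable timestamp');
  }
  return problems;
}
```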

11.6 HITL gate enforcement tests

For every capability marked hitl_required: true, a test asserts that committing the originating action without a decisionId fails with MELMASTOON.AI.HITL_BYPASS.
A test asserts that committing with a decisionId succeeds.
A test asserts that the orchestrator audit log captures the decision with {decisionId, decisionBy, timestamp}.
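The three HITL assertions imply a guard at the commit boundary. A minimal sketch, assuming decisions are looked up by ID in an in-memory map. `commitWithHitl`, `HitlBypassError`, and the audit array are hypothetical; the error code string comes from the text above.

```typescript
interface Decision { decisionId: string; decisionBy: string; timestamp: string; }

class HitlBypassError extends Error {
  readonly code = 'MELMASTOON.AI.HITL_BYPASS';
  constructor() {
    super('AI-proposed action committed without a human decision');
  }
}

const auditLog: Decision[] = [];

// Commit an AI-proposed action only when a known human decision is supplied.
function commitWithHitl<T>(
  hitlRequired: boolean,
  decisions: Map<string, Decision>,
  decisionId: string | undefined,
  apply: () => T,
): T {
  if (hitlRequired) {
    const decision = decisionId ? decisions.get(decisionId) : undefined;
    if (!decision) throw new HitlBypassError(); // missing or unknown decisionId
    auditLog.push({ ...decision }); // records decisionId, decisionBy, timestamp
  }
  return apply();
}
```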

11.7 Edge-vs-cloud equivalence

For every capability offered both ways (e.g., housekeeping order, basic forecasting, anomaly heuristics), an equivalence test asserts the edge ONNX result is within a configured tolerance of the cloud Vertex result on a fixed corpus. Drift beyond tolerance triggers a retraining backlog item.
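A minimal sketch of the tolerance comparison for capabilities that emit a numeric score per input (forecasts, anomaly scores). The absolute tolerance and the optional share of cases allowed to drift are assumed configuration knobs; ranked outputs such as housekeeping order would need a rank-based comparison instead.

```typescript
// One row of the fixed corpus: the same input scored by the edge ONNX model and by Vertex.
interface EquivalenceCase {
  id: string;
  edge: number;
  cloud: number;
}

interface EquivalenceResult {
  passed: boolean;
  maxAbsDiff: number;
  outOfTolerance: string[]; // case ids whose edge/cloud gap exceeds the tolerance
}

export function checkEquivalence(
  corpus: EquivalenceCase[],
  absTolerance: number,
  maxOutOfToleranceShare = 0, // assumption: by default no case may drift
): EquivalenceResult {
  if (corpus.length === 0) throw new Error("empty corpus proves nothing");
  const outOfTolerance: string[] = [];
  let maxAbsDiff = 0;
  for (const c of corpus) {
    const diff = Math.abs(c.edge - c.cloud);
    maxAbsDiff = Math.max(maxAbsDiff, diff);
    if (diff > absTolerance) outOfTolerance.push(c.id);
  }
  const share = outOfTolerance.length / corpus.length;
  return { passed: share <= maxOutOfToleranceShare, maxAbsDiff, outOfTolerance };
}
```

Returning the drifting case ids, not just a boolean, gives the retraining backlog item something concrete to attach.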

11.8 Regression detection

Each capability's nightly run produces a metric (e.g., F1 for anomaly classifiers, RMSE for forecasts, BLEU/ROUGE for messages, contrast pass/fail for theme adjustments).
Metrics are stored; significant regression versus rolling baseline (configurable) opens an automated issue, blocks new prompt promotion, and notifies the owning team.
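One way to read "significant regression versus rolling baseline" is a relative drop against the mean of the last N nightly runs, with the direction depending on the metric (higher F1 is better, lower RMSE is better). The sketch below assumes that reading; the window size and threshold are the configurable parts, and all names are hypothetical.

```typescript
type Direction = "higherIsBetter" | "lowerIsBetter"; // e.g. F1 vs RMSE

interface RegressionConfig {
  window: number;          // number of previous nightly runs in the baseline
  maxRelativeDrop: number; // e.g. 0.05 means "more than 5% worse than baseline"
  direction: Direction;
}

export function isSignificantRegression(
  history: number[], // previous nightly values, oldest first
  current: number,
  cfg: RegressionConfig,
): boolean {
  const recent = history.slice(-cfg.window);
  if (recent.length < cfg.window) return false; // not enough history to judge yet
  const baseline = recent.reduce((s, v) => s + v, 0) / recent.length;
  if (baseline === 0) return false; // relative change is undefined; left to manual review
  const relativeChange = (current - baseline) / Math.abs(baseline);
  return cfg.direction === "higherIsBetter"
    ? relativeChange < -cfg.maxRelativeDrop
    : relativeChange > cfg.maxRelativeDrop;
}
```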

12. Frontend Test Strategy

Web, mobile, and desktop frontends each have a tailored stack while sharing the design system, i18n bundle, and AI provenance contract.

12.1 Web (Next.js — Consumer + Tenant Booking)

Unit & component: Vitest + Testing Library (React) + jsdom. Coverage ≥ 80% lines on app/, components/, hooks/, services/.
E2E: Playwright. Run on every PR (smoke) and nightly (full).
Visual regression: Chromatic against Storybook stories; every component story must ship LTR and RTL variants.
A11y: axe-playwright on every E2E and per-story axe-storybook in CI.
Cross-locale: Per-locale snapshots for at least Pashto, Dari, Persian, Arabic, English, Tajik on key screens.
Performance: Lighthouse CI per page with budgets enforced.

12.2 Mobile (React Native — Consumer + Tenant Booking + self-check-in)

Unit & component: Jest + Testing Library (RN). Coverage ≥ 80%.
E2E: Detox in CI on Android emulators (API 30+) and iOS simulators (iOS 16+).
Cross-locale: Same locales as web.
A11y: Native a11y APIs covered; manual VoiceOver/TalkBack smoke per release.
Performance: RN profiler smoke; cold-start budget per platform.

12.3 Desktop (Electron — backoffice)

Unit & component: Vitest + Testing Library (React) + jsdom for the renderer.
E2E: Playwright Electron driving the real packaged app.
Offline: dedicated Playwright Electron suites under test/e2e/offline/; see §8.
NVDA smoke (Windows): scripted screen-reader smoke per release wave for the front-desk core flows.
CSP & contextBridge: unit tests assert the window.melmastoon surface is exhaustive and that no other globals leak; CSP violation tests load malicious payloads and assert the renderer rejects them.

12.4 Visual regression (cross-surface)

Chromatic stories per component in LTR + RTL × at least three locales.
Story library shared across web and Electron renderer where components are common.
Manual review queue; tolerance thresholds tuned per component.

12.5 Cross-locale tests

Surface	Required locales tested in CI
Consumer Web	ps, fa, ar, en, tjk, fr
Consumer Mobile	ps, fa, ar, en
Tenant Booking Web	ps, fa, ar, en, tjk, fr
Tenant Booking Mobile	ps, fa, ar, en
Electron Desktop	ps, fa, ar, en, tjk
Notifications	per template, per locale

RTL screenshot tests are mandatory on every UI PR; the Definition of Done explicitly enforces this.

13. Performance Testing

13.1 k6 scenarios per BFF

Scenario	BFF	Goal	Pass criteria
Search storm	bff-consumer-service	Burst-of-search load on meta layer	p95 < 1500 ms · err rate < 0.5% at 100 RPS sustained
Booking burst	bff-tenant-booking-service	Concurrent booking flows during a sale	p95 < 1200 ms on quote · < 800 ms on confirm step (excluding 3DS) at 50 RPS
Sync flood	bff-backoffice-service	Many desktops reconnecting after a regional outage	p95 sync push < 2 s at 200 concurrent desktops; conflict rate within tolerance
AI gateway storm	ai-orchestrator-service	Surge in AI calls	p95 < 3 s · cost within budget; circuit-breakers engage on provider 5xx surge
Inventory hot key	inventory-service	Peak demand on one room type	No oversell · throughput within target
Webhook storm	payment-gateway-service	Provider replays many webhooks	All processed exactly once · idempotency-key check passes
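In k6 itself, pass criteria like these are normally declared as `thresholds` (for example `http_req_duration: ['p(95)<1500']` and `http_req_failed: ['rate<0.005']` for the search storm). The sketch below evaluates the same two criteria offline from exported raw samples, which can be useful when re-checking a stored run; it uses the nearest-rank percentile, which may differ slightly from k6's own percentile computation. The names are hypothetical.

```typescript
interface Sample {
  durationMs: number;
  failed: boolean;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
export function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

export function meetsCriteria(
  samples: Sample[],
  p95MaxMs: number,
  maxErrorRate: number,
): { p95: number; errorRate: number; passed: boolean } {
  const p95 = percentile(samples.map((s) => s.durationMs), 95);
  const errorRate = samples.filter((s) => s.failed).length / samples.length;
  // Strict comparisons match the table's "p95 < X" and "err rate < Y" wording.
  return { p95, errorRate, passed: p95 < p95MaxMs && errorRate < maxErrorRate };
}
```

For the search storm row this would be called as `meetsCriteria(samples, 1500, 0.005)`.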

13.2 SLO targets (per-service summary; the full table lives in each service's `OBSERVABILITY.md`)

Service	p50 latency	p95 latency	Availability	Notes
iam-service	50 ms	200 ms	99.9%	Authn endpoints sub-200 ms
reservation-service	80 ms	300 ms (read) / 600 ms (write)	99.9%	Saga end-to-end < 1.5 s p95
payment-gateway-service	100 ms	800 ms	99.95%	Excludes 3DS challenge; webhook p95 < 200 ms
lock-integration-service	100 ms	500 ms (issue)	99.9%	Vendor calls p95 budget per vendor SLA
bff-tenant-booking-service	70 ms	400 ms	99.9%	Booking funnel pages
bff-consumer-service	70 ms	400 ms	99.9%	Meta search pages
bff-backoffice-service	80 ms	400 ms (read) / 800 ms (sync)	99.9%	Sync push < 2 s p95
search-aggregation-service	50 ms	300 ms	99.9%	Index reads
ai-orchestrator-service	200 ms	3000 ms	99.5%	AI provider latency dominant

13.3 Frontend performance budgets

Page	LCP	INP	CLS	JS gzip
Consumer meta search list	≤ 2.5 s	≤ 200 ms	≤ 0.1	≤ 180 KB
Tenant booking landing	≤ 2.0 s	≤ 200 ms	≤ 0.1	≤ 200 KB
Tenant booking checkout	≤ 2.0 s	≤ 200 ms	≤ 0.05	≤ 220 KB
Electron desktop dashboard	≤ 1.5 s (cold) · ≤ 400 ms (warm)	≤ 100 ms	≤ 0.05	n/a

Budgets enforced by Lighthouse CI on web and the Electron renderer; regressions block merge.

13.4 Soak tests (24h)

Run nightly against staging.
24h sustained mixed-load profile (search + booking + sync + AI + webhook); assert no memory growth, no leak in the desktop sync worker, no Pub/Sub lag.
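"No memory growth" is easier to assert with a fitted trend than by comparing the first and last sample, because garbage collection makes heap usage saw-tooth. A minimal sketch, assuming heap samples are collected periodically from the desktop sync worker or a service process; the growth threshold is illustrative only.

```typescript
interface HeapSample {
  tHours: number; // hours since soak start
  heapMb: number;
}

// Ordinary least-squares slope of heap usage over time, in MB per hour.
export function heapSlopeMbPerHour(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  const meanT = samples.reduce((s, x) => s + x.tHours, 0) / n;
  const meanH = samples.reduce((s, x) => s + x.heapMb, 0) / n;
  let num = 0;
  let den = 0;
  for (const { tHours, heapMb } of samples) {
    num += (tHours - meanT) * (heapMb - meanH);
    den += (tHours - meanT) ** 2;
  }
  if (den === 0) throw new Error("samples must span more than one instant");
  return num / den;
}

// Assumption: illustrative threshold; tune per process.
export const MAX_HEAP_GROWTH_MB_PER_HOUR = 1;

export function hasMemoryGrowth(samples: HeapSample[]): boolean {
  return heapSlopeMbPerHour(samples) > MAX_HEAP_GROWTH_MB_PER_HOUR;
}
```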

13.5 Load profile per release wave

Wave	Tenants	Concurrent desktops	Concurrent booking flows	AI RPS	Notes
R1	50	200	50	5	Validate staging-scale viability
R2	250	1,000	200	25	AI-native operations on
R3	1,000	5,000	1,000	100	Globalisation stage

14. Security Testing

14.1 Categories & tooling

Category	Tool	Trigger
SAST	CodeQL	every PR
SCA	Snyk	every PR + nightly
Secrets	gitleaks	pre-commit + CI
DAST	OWASP ZAP	nightly against staging
Container scan	Trivy + Snyk Container	on image build
Pen-test	external firm	pre-launch + per major release
Threat-model review	manual	per ADR introducing new boundary

14.2 Electron-specific security tests

Test	What it proves
`contextBridge_surface_enumeration`	Only `window.melmastoon.*` is exposed to the renderer; no other globals leak; `ipcRenderer` and `require` are absent
`csp_violation`	A malicious script injection attempt is refused by the CSP; the violation is reported to the audit log
`nodeIntegration_off_assertion`	`nodeIntegration: false`, `contextIsolation: true`, `sandbox: true`, `webSecurity: true` on every BrowserWindow
`sqlcipher_kdf`	Key derivation uses ≥ 256k iterations of PBKDF2-SHA512; tampering with iterations fails the check
`auto_update_signature_tamper`	A fixture flips a byte in the update manifest; the updater refuses to install
`keytar_secrets`	Refresh token + DB key fragment never appear in plaintext on disk
`deep_link_validation`	`melmastoon://` deep links are validated against an allowlist; arbitrary file URIs refused
`serialport_access_scoped`	Serial/HID access is mediated through the typed bridge; renderer cannot enumerate devices
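A minimal sketch of the check behind `deep_link_validation`, assuming deep links take the form `melmastoon://<route>/<id>` with no query string or fragment. The route names and ID pattern are hypothetical; the parsing relies on the standard WHATWG `URL` class available in Node 20 and Chromium.

```typescript
// Hypothetical allowlist: route (URL host) -> permitted path shape.
const ALLOWED_ROUTES = new Map<string, RegExp>([
  ["reservation", /^\/[A-Za-z0-9_-]{1,64}$/],
  ["checkin", /^\/[A-Za-z0-9_-]{1,64}$/],
]);

export function isAllowedDeepLink(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is refused, never routed on a best-effort basis
  }
  if (url.protocol !== "melmastoon:") return false; // rejects file:, http:, and other schemes
  if (url.username || url.password || url.port) return false;
  if (url.search !== "" || url.hash !== "") return false;
  const pathPattern = ALLOWED_ROUTES.get(url.hostname);
  return pathPattern !== undefined && pathPattern.test(url.pathname);
}
```

Using a `Map` instead of a plain object avoids prototype keys such as `__proto__` being treated as routes.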

14.3 Auth & authorization tests

AuthN: token rotation, refresh family invalidation on reuse, MFA enrolment + enforcement, SSO callback validation, device binding cert lifecycle.
AuthZ: per-role matrix tested across every service (positive + negative); tenant scope enforced on every endpoint; ABAC rules tested where present.
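"Refresh family invalidation on reuse" follows the rotation-with-reuse-detection pattern from the OAuth 2.0 security best-practice guidance: each refresh is single-use, and presenting an already-rotated token revokes every token descended from the same login. A minimal in-memory model that tests can target, assuming opaque tokens; the class and method names are hypothetical and the counter-based tokens stand in for random values.

```typescript
interface Family {
  current: string; // the only refresh token currently valid in this family
  revoked: boolean;
}

export class RefreshTokenStore {
  private readonly families = new Map<string, Family>();
  private readonly tokenToFamily = new Map<string, string>();
  private counter = 0;

  private mint(familyId: string): string {
    const token = `${familyId}.${++this.counter}`; // stand-in for an opaque random token
    this.tokenToFamily.set(token, familyId);
    return token;
  }

  // Called at login: starts a new family.
  issue(familyId: string): string {
    const token = this.mint(familyId);
    this.families.set(familyId, { current: token, revoked: false });
    return token;
  }

  rotate(presented: string): string {
    const familyId = this.tokenToFamily.get(presented);
    const family = familyId === undefined ? undefined : this.families.get(familyId);
    if (familyId === undefined || family === undefined) throw new Error("unknown token");
    if (family.revoked) throw new Error("family revoked");
    if (family.current !== presented) {
      family.revoked = true; // a rotated token came back: assume theft and kill the family
      throw new Error("refresh token reuse detected");
    }
    family.current = this.mint(familyId);
    return family.current;
  }
}
```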

14.4 Penetration testing

External pen-test pre-Wave R1 launch and every major release thereafter.
Scope: all surfaces (Consumer Web, Tenant Booking Web, Mobile, Electron Desktop, Control Plane), all public APIs, IAM, payment, lock, sync.
Findings tracked to closure; blocking severity = critical or high.

15. Compliance Testing

15.1 Tax computation matrix

For each supported jurisdiction (AF, TJ, IR, PK, EG, AE, SA, OM, BH, NP, BD, KE, …), a unit test pack asserts:

VAT/GST applied at correct rate.
Tourism levy applied where required.
Tax-exempt cases (diplomatic, gov-issued, tax-free zones) honoured on proof.
Snapshot-at-confirm: changes after confirm do not retroactively alter the folio.
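A minimal sketch of the snapshot-at-confirm property on bigint micro-units. The assumptions, all illustrative: rates are expressed in basis points, every rule applies to the net amount (no tax-on-tax compounding), rounding is half-up, and exemptions are passed in as already-verified rule codes. Function and field names are hypothetical.

```typescript
interface TaxRule {
  code: string;   // e.g. "VAT", "TOURISM_LEVY"
  rateBps: bigint; // basis points: 1000n = 10%
}

interface ConfirmedFolioTax {
  rules: readonly TaxRule[]; // copy of the rules in force at confirm
  lines: { code: string; amountMicro: bigint }[];
}

// Round half up; only defined here for non-negative amounts.
function applyBps(amountMicro: bigint, rateBps: bigint): bigint {
  if (amountMicro < 0n) throw new Error("negative amount");
  return (amountMicro * rateBps + 5_000n) / 10_000n;
}

export function confirmTaxes(
  netMicro: bigint,
  rulesInForce: TaxRule[],
  exemptCodes: ReadonlySet<string> = new Set(), // codes waived on verified proof
): ConfirmedFolioTax {
  const rules = rulesInForce.map((r) => ({ ...r })); // detached from the live rate table
  const lines = rules
    .filter((r) => !exemptCodes.has(r.code))
    .map((r) => ({ code: r.code, amountMicro: applyBps(netMicro, r.rateBps) }));
  return { rules: Object.freeze(rules), lines };
}
```

A later change to the live rate table mutates `rulesInForce`, not the confirmed folio's copy, which is exactly what the snapshot test asserts.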

15.2 Audit log integrity

The audit log subscribes to every state-changing event.
Daily Merkle anchoring job writes a root and stores it.
Test asserts inclusion proofs work for entries days/weeks back.
Tampering with a stored entry breaks proofs.
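The inclusion-proof and tamper assertions can be illustrated with a small SHA-256 Merkle tree built on `node:crypto`. This is a sketch, not the anchoring job's implementation: the `0x00` leaf and `0x01` node prefixes are borrowed from RFC 6962 (Certificate Transparency) for domain separation, while promoting an unpaired node to the next level is a simplification that does not reproduce the RFC 6962 tree shape.

```typescript
import { createHash } from "node:crypto";

type ProofStep = { sibling: Buffer; siblingOnLeft: boolean };

const leafHash = (entry: string): Buffer =>
  createHash("sha256").update(Buffer.from([0x00])).update(entry, "utf8").digest();
const nodeHash = (left: Buffer, right: Buffer): Buffer =>
  createHash("sha256").update(Buffer.from([0x01])).update(left).update(right).digest();

// All tree levels, leaves first; an unpaired node is promoted unchanged.
function buildLevels(entries: string[]): Buffer[][] {
  if (entries.length === 0) throw new Error("empty log");
  const levels: Buffer[][] = [entries.map(leafHash)];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? nodeHash(prev[i], prev[i + 1]) : prev[i]);
    }
    levels.push(next);
  }
  return levels;
}

export function merkleRoot(entries: string[]): Buffer {
  const levels = buildLevels(entries);
  return levels[levels.length - 1][0];
}

export function inclusionProof(entries: string[], index: number): ProofStep[] {
  const levels = buildLevels(entries);
  const proof: ProofStep[] = [];
  let i = index;
  for (let lvl = 0; lvl < levels.length - 1; lvl++) {
    const siblingIndex = i % 2 === 0 ? i + 1 : i - 1;
    if (siblingIndex < levels[lvl].length) {
      proof.push({ sibling: levels[lvl][siblingIndex], siblingOnLeft: i % 2 === 1 });
    } // no sibling: the node was promoted, so this level contributes no step
    i = Math.floor(i / 2);
  }
  return proof;
}

export function verifyInclusion(entry: string, proof: ProofStep[], root: Buffer): boolean {
  let h = leafHash(entry);
  for (const step of proof) {
    h = step.siblingOnLeft ? nodeHash(step.sibling, h) : nodeHash(h, step.sibling);
  }
  return h.equals(root);
}
```

Only the daily root needs to be anchored; an entry from weeks back is proven with its proof path against that day's stored root.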

15.3 Data residency tests

Cross-region writes are rejected.
Background jobs respect tenant region.
Reporting and analytics aggregations stay within region (or are regional-fanned).

15.4 GDPR-style erasure (Phase 2)

Erasure request flow tested end-to-end against a sandbox tenant.
PII fields tombstoned; legal/financial records remain (with reference IDs only).
Erasure certificate generated and verifiable.

16. Accessibility Testing

16.1 Target & tooling

Target: WCAG 2.2 AA on every user-facing surface, every locale.
Tooling: axe-core via axe-playwright (every E2E) and axe-storybook (every component story).
Manual smoke: NVDA on Windows for the Electron desktop core flows; VoiceOver on macOS for the Tenant Booking web on Safari; per release wave.

16.2 Per-surface checklist

Surface	A11y CI checks	Manual smoke
Consumer Web	axe-playwright on E2E suite + axe-storybook	per release wave
Tenant Booking Web	axe-playwright on E2E + axe-storybook	per release wave
Consumer Mobile	RN a11y test props + Detox a11y assertions	per release wave
Electron Desktop	axe-playwright (renderer)	NVDA per release wave
Notifications (email)	template-level lint (alt text, link text, contrast)	per release wave

16.3 Locale-aware a11y

Reading order verified in RTL locales (Pashto, Dari, Persian, Arabic).
Screen reader pronunciation smoke for at least Pashto + Dari content blocks (NVDA + eSpeak voice).
Mixed-direction text bidi rendering verified.

17. Internationalization Testing

17.1 Required coverage

Every UI surface tested in at least Pashto + Dari + English in CI; additional locales (Persian, Arabic, French, Tajik) tested per release wave.
Bidi text rendering tested with mixed PS/EN strings, AR/EN strings, and bidi-isolating UI fragments.
Locale-aware date formats tested per calendar (Gregorian + Hijri + Solar Hijri presentational variants).
Number formatting tested per locale (decimal separator, grouping, percent).
Currency formatting tested per currency (symbol, position, grouping).
Numeral systems tested: Latin numerals across all locales by default; Arabic-Indic supported on display where tenant prefers; inputs always Latin numerals for safety.
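The calendar and numeral cases above map onto the standard `Intl` API via Unicode locale extensions (`-u-ca-` for calendar, `-u-nu-` for numbering system). A minimal sketch with hypothetical helper names; the choice of `islamic-umalqura` as the Hijri variant and the idea of folding Arabic-Indic input digits to Latin (one way to enforce Latin-only inputs) are assumptions. Note that Persian and Pashto conventionally use the extended Arabic-Indic digits (`arabext`).

```typescript
type CalendarId = "gregory" | "persian" | "islamic-umalqura";
type Numerals = "latn" | "arab" | "arabext";

// Year of `date` in the given calendar, forced to Latin digits.
export function calendarYear(date: Date, calendar: CalendarId): number {
  const fmt = new Intl.DateTimeFormat(`en-u-ca-${calendar}-nu-latn`, { year: "numeric", timeZone: "UTC" });
  const year = fmt.formatToParts(date).find((p) => p.type === "year");
  if (year === undefined) throw new Error(`no year part for ${calendar}`);
  return Number(year.value);
}

// Display only: Latin by default, Arabic-Indic when the tenant opts in.
// Assumes `locale` carries no Unicode extension of its own.
export function formatDisplayNumber(value: number, locale: string, numerals: Numerals = "latn"): string {
  return new Intl.NumberFormat(`${locale}-u-nu-${numerals}`).format(value);
}

// Input: fold Arabic-Indic (U+0660 to U+0669) and extended Arabic-Indic
// (U+06F0 to U+06F9) digits to Latin before parsing.
export function toLatinDigits(input: string): string {
  return input.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, (d) => {
    const code = d.charCodeAt(0);
    return String(code >= 0x06f0 ? code - 0x06f0 : code - 0x0660);
  });
}
```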

17.2 RTL screenshot regression

Every component story has a `storyName--rtl` variant.
Chromatic regression catches mirroring defects.
Logical CSS only (padding-inline, margin-block, inset-inline-start); per-PR lint rejects padding-left/right and margin-left/right.
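The lint rule can be expressed declaratively with Stylelint's `property-disallowed-list` rule; the regex sketch below only illustrates what the check catches. It is hypothetical, scans plain CSS text, and deliberately ignores physical `left`/`right` offsets, which the rule above does not mention.

```typescript
// Physical left/right box properties that logical CSS replaces.
const PHYSICAL_PROPERTY = /(^|[\s;{])((?:padding|margin)-(?:left|right))\s*:/g;

export function findPhysicalProperties(css: string): string[] {
  const hits: string[] = [];
  for (const m of css.matchAll(PHYSICAL_PROPERTY)) hits.push(m[2]);
  return hits;
}
```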

17.3 Translation completeness gate

A CI job parses extracted strings against per-locale bundles; missing keys for active locales emit warnings; missing keys for tenant-default locales fail the job.
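A minimal sketch of the gate's decision logic, assuming string extraction has already produced a key list and each locale bundle is a flat key-to-string map. Treating an empty translation as missing is an assumption; the function name is hypothetical.

```typescript
type Bundle = Record<string, string>;

interface GateResult {
  warnings: string[];
  errors: string[];
  passed: boolean;
}

export function checkTranslations(
  extractedKeys: string[],
  bundles: Record<string, Bundle>, // locale -> key -> translation
  activeLocales: string[],
  tenantDefaultLocales: ReadonlySet<string>,
): GateResult {
  const warnings: string[] = [];
  const errors: string[] = [];
  // Tenant-default locales are checked even if absent from the active list.
  for (const locale of new Set([...activeLocales, ...tenantDefaultLocales])) {
    const bundle = bundles[locale] ?? {};
    const missing = extractedKeys.filter(
      (k) => !Object.prototype.hasOwnProperty.call(bundle, k) || bundle[k].trim() === "",
    );
    if (missing.length === 0) continue;
    const message = `${locale}: missing ${missing.join(", ")}`;
    // Missing keys in a tenant-default locale fail the job; elsewhere they only warn.
    (tenantDefaultLocales.has(locale) ? errors : warnings).push(message);
  }
  return { warnings, errors, passed: errors.length === 0 };
}
```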

18. Test Data Strategy

18.1 `@ghasi/test-data` synthetic tenant generator

A monorepo package generates realistic tenants (50+ seeded for staging) deterministically from a seed.
Outputs:
- Tenant + property hierarchy (single-property, multi-property, chain operator).
- Rooms with realistic room types and counts (10 / 30 / 80 / 200 room properties).
- Rate plans across BAR, weekly, government, corporate, non-refundable.
- Reservations across held / confirmed / checked_in / checked_out / cancelled / no_show with realistic distributions.
- Folios with charges, payments, refunds matching reservations.
- Housekeeping tasks, maintenance tickets, key credentials.
- Multi-locale content blocks (ps, fa, ar, en, tjk).

18.2 Determinism + safety

Seed-based; the same seed produces the same tenant pack; safe to recreate.
PII-free. All names, phones, emails, IDs are synthetic; never derived from real users.
Sandbox prefixes (e2e-<runId>-…) for ephemeral test tenants; janitor jobs purge within 1 hour.
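Determinism comes from deriving every random choice from a seeded PRNG rather than `Math.random()`. A minimal slice of what such a generator might look like, assuming the small, well-known mulberry32 PRNG; `generateTenant` and its output shape are hypothetical and much smaller than the real pack.

```typescript
// mulberry32: a small 32-bit PRNG; the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const ROOM_COUNTS = [10, 30, 80, 200];

interface SyntheticTenant {
  tenantId: string;
  roomCount: number;
  guests: { name: string; phone: string }[];
}

export function generateTenant(seed: number, index: number, runId?: string): SyntheticTenant {
  const rand = mulberry32(seed * 1_000_003 + index); // independent stream per tenant
  const int = (maxExclusive: number) => Math.floor(rand() * maxExclusive);
  const guests = Array.from({ length: 3 }, () => ({
    name: `Synthetic Guest ${int(10_000)}`,
    // No E.164 country code starts with 0, so "+000..." cannot reach a real phone.
    phone: `+000${String(int(10_000_000)).padStart(7, "0")}`,
  }));
  const prefix = runId === undefined ? "" : `e2e-${runId}-`; // ephemeral sandbox prefix
  return {
    tenantId: `${prefix}tenant-${seed}-${index}`,
    roomCount: ROOM_COUNTS[int(ROOM_COUNTS.length)],
    guests,
  };
}
```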

18.3 Where to use which dataset

Environment	Dataset
Local dev	small synthetic pack (3 tenants, 30 rooms total)
CI ephemeral	per-PR pack created on demand
Staging	50+ tenants seeded on rebuild; refreshed weekly
Pre-production	mirror of staging plus a staged-canary tenant
Production	no synthetic tenants except dedicated synthetic-monitor sandbox tenant (heavily isolated)

18.4 Data refresh policy

Staging refresh weekly, with the option to restore on demand.
Per-PR ephemeral environments seeded fresh on environment provisioning.
Production never receives synthetic data outside the sandbox tenant.

19. CI/CD Integration

19.1 Per-service pipeline

lint → typecheck → unit (with coverage) → integration (Testcontainers) → contract (Pact) →
coverage gate → mutation (changed files) → image build → vulnerability scan

Failures in any step block merge. The coverage gate applies per-package, per-layer thresholds (see §1.1).
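A minimal sketch of a per-layer gate using the generic §1.1 thresholds. Inferring the layer from the file path (and the `value-objects/` and `services/` subfolder names) is an assumption; per-service overrides such as the 100% bars for payment-gateway-service and lock-integration-service are left out.

```typescript
interface Threshold { line: number; branch: number }

// Assumption: layer inferred from path; first matching rule wins, so specific rules come first.
const LAYER_RULES: { layer: string; pattern: RegExp; threshold: Threshold }[] = [
  { layer: "value-objects", pattern: /\/domain\/value-objects\//, threshold: { line: 100, branch: 100 } },
  { layer: "domain-services", pattern: /\/domain\/services\//, threshold: { line: 90, branch: 90 } },
  { layer: "domain", pattern: /\/domain\//, threshold: { line: 95, branch: 95 } },
  { layer: "application", pattern: /\/application\//, threshold: { line: 85, branch: 85 } },
  { layer: "infrastructure", pattern: /\/infrastructure\//, threshold: { line: 80, branch: 75 } },
];

interface FileCoverage {
  path: string;
  linesCovered: number;
  linesTotal: number;
  branchesCovered: number;
  branchesTotal: number;
}

// Returns human-readable failures; an empty array means the gate passes.
export function coverageGate(files: FileCoverage[]): string[] {
  const totals = new Map<string, { lc: number; lt: number; bc: number; bt: number; t: Threshold }>();
  for (const f of files) {
    const rule = LAYER_RULES.find((r) => r.pattern.test(f.path));
    if (!rule) continue; // unclassified files are outside this sketch's gate
    const acc = totals.get(rule.layer) ?? { lc: 0, lt: 0, bc: 0, bt: 0, t: rule.threshold };
    acc.lc += f.linesCovered;
    acc.lt += f.linesTotal;
    acc.bc += f.branchesCovered;
    acc.bt += f.branchesTotal;
    totals.set(rule.layer, acc);
  }
  const failures: string[] = [];
  for (const [layer, a] of totals) {
    const linePct = a.lt === 0 ? 100 : (100 * a.lc) / a.lt;
    const branchPct = a.bt === 0 ? 100 : (100 * a.bc) / a.bt;
    if (linePct < a.t.line) failures.push(`${layer}: lines ${linePct.toFixed(1)}% < ${a.t.line}%`);
    if (branchPct < a.t.branch) failures.push(`${layer}: branches ${branchPct.toFixed(1)}% < ${a.t.branch}%`);
  }
  return failures;
}
```

Aggregating per layer before comparing avoids failing the build on a single tiny file while still holding each layer to its bar.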

19.2 Per-frontend pipeline

lint → typecheck → unit → component → axe-storybook → visual (Chromatic) → e2e (smoke) →
RUM bundle size assertion

Frontend E2E runs against an ephemeral preview environment provisioned by Cloud Run + per-PR namespacing.

19.3 PR-touched service deeper run

A PR changing service X triggers:
- Full integration suite for X.
- Pact verification for X as provider against current consumer contracts.
- Pact publication for X as consumer.
- E2E smoke covering journeys X participates in.
- Mutation testing on changed /domain/** files.

19.4 Nightly pipeline

full E2E (web + mobile + desktop, including offline) →
soak (24h k6 mixed load) →
AI eval suite (against staging ai-orchestrator) →
DAST scan (OWASP ZAP) →
SCA full + container vuln scan →
Chromatic full library →
synthetic monitor results aggregation

A failure rolls up to an auto-issue in the on-call queue.

19.5 Release pipeline

Image promotion staging → pre-prod → prod.
Canary 5% for 30 minutes; auto-rollback on SLO regression.
Manual approval gate for prod beyond canary (until automation matures).

19.6 Branch strategy & test gates

Branch	Required gates
Feature branches	unit + component + lint + typecheck (locally + on push)
PR open	full per-service + per-frontend pipeline above
`main` merged	nightly suites + image promotion to staging
`release/*`	canary rollout + smoke + release-wave gate

20. Test Environments

Env	Purpose	Backed by	Data	Notes
local	Engineer development & quick iteration	Docker Compose with Postgres, Redis, Pub/Sub emulator, MinIO; or Cloud Code	small synthetic	Engineers run unit + integration locally; opt-in to E2E
CI ephemeral	Per-PR isolated test environment	Cloud Run namespaced + Cloud SQL temporary instances	per-PR seed	Lifecycle managed by CI; teardown after merge or 24h idle
staging	Full production-like, shared	GCP project: ghasi-staging	seeded synthetic, refreshed weekly	Where E2E full + soak + AI eval + DAST run nightly
pre-production	Canary + final QA	GCP project: ghasi-preprod	mirror of staging + canary tenant	Used for release validation
production	Live	GCP project: ghasi-prod	real tenants	Synthetic monitors run; no destructive tests

20.1 Environment safety

Every non-prod environment has a banner on every surface ("STAGING" / "CI" / "PREPROD").
Environments are siloed: secrets, DBs, buckets are namespaced.
Production must not be used for any test other than synthetic monitors.

20.2 Test data isolation

Each test creates and cleans up its own data; no shared state across tests.
Sandbox tenants prefixed e2e-<runId>- are purged hourly by a janitor job.

21. Synthetic Monitoring

Synthetic monitoring proves that production keeps working from outside, not just inside.

21.1 Probes

Probe	Frequency	What it tests
Uptime	every 1 min from 5 regions	Public health endpoints respond 200
Booking-flow synthetic (web)	every 5 min	Sandbox tenant; full booking flow with sandbox payment
Booking-flow synthetic (mobile)	every 30 min	RN test rig executes booking against sandbox
Sync API synthetic	every 5 min	Run a fake desktop client through pull + push round-trip
AI gateway synthetic	every 10 min	Send a known prompt; assert structured output, latency, cost
Lock vendor synthetic	every 15 min	Issue + revoke a credential against a vendor sandbox
Payment webhook synthetic	every 15 min	Replay a known webhook into the staging endpoint; assert state

21.2 Alerting & routing

All synthetic probe failures route to PagerDuty with a 2-cycle threshold (avoid single-flake noise).
The on-call has a runbook linked from every alert.
Synthetics run against a dedicated synthetic-monitor sandbox tenant that is isolated from real tenants.
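A "2-cycle threshold" can be read as "page only after two consecutive failed probe cycles". A minimal sketch of that reading, with a hypothetical class name; it pages once per failure streak rather than on every failing cycle.

```typescript
export class ConsecutiveFailureGate {
  private streak = 0;
  private readonly threshold: number;

  constructor(threshold = 2) {
    this.threshold = threshold;
  }

  // Returns true exactly when the failure streak reaches the threshold.
  record(success: boolean): boolean {
    this.streak = success ? 0 : this.streak + 1;
    return this.streak === this.threshold;
  }
}
```

A single failed cycle followed by a success resets the streak and never pages.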

22. Release Quality Gates

Each release wave (R1, R2, R3) has hard quality gates. Missing any gate blocks promotion.

22.1 Per-wave gate matrix

Gate	R1	R2	R3
Coverage thresholds met	✓	✓	✓
All P0 E2E green for 7 nights	✓	✓	✓
Offline E2E (1h, 24h) green	✓	✓	✓
Performance baseline within budget	✓	✓	✓
Pen-test pass (no critical/high)	✓	✓	✓
Multi-tenant isolation tests green on every service	✓	✓	✓
Saga compensation tests green	✓	✓	✓
AI eval baseline within tolerance	n/a (R1 limited AI)	✓	✓
Accessibility audit complete (axe + NVDA + manual)	✓	✓	✓
Localisation test complete for active locales	✓	✓	✓
24h soak test passes	✓	✓	✓
Backup/restore drill within last 90d	✓	✓	✓
Lock vendor adapter tests pass for in-scope vendors	TTLock + Wiegand	+ Salto	+ Assa Abloy
Payment adapter tests pass for in-scope rails	Stripe + cash + PayPal	+ MFS	+ region-specific rails
Multi-region failover drill	n/a	n/a	✓

22.2 Sign-off

Engineering Manager + Tech Lead + Security + SRE sign off per release wave.
Sign-off references the gate matrix; missing checks block release.

23. Bug Triage & SLAs

23.1 Severity definitions

Severity	Definition	Examples
S0 — Critical	Service down, data loss, security breach, charge-without-reservation, locked-out guest at door, multi-tenant data leak	global IAM outage, payment double-charge, key revocation fails and leaves a revoked credential active
S1 — High	Major journey broken for many tenants; significant degradation	booking confirm fails for tenants in region X, sync conflict storm
S2 — Medium	Limited journey impact; workaround exists	UI bug on a single page, AI suggestion incorrect for rare case
S3 — Low	Cosmetic, minor inconvenience, internal-only	typo, low-impact log noise

23.2 SLA targets

Severity	Response time	Resolution target
S0	15 minutes	4 hours
S1	1 hour	24 hours
S2	1 business day	1 week
S3	1 week	scheduled

23.3 Blocker policy

S0 blocks any release; promotion paused until resolved or risk-accepted (only by Eng Manager + Security).
S1 blocks release-wave promotion.
S2 bugs are tracked into the next sprint; they can ship if they do not cause a regression.
S3 bugs are tracked in the backlog.

23.4 Regression policy

Any bug traced to a regression in the last 30 days requires a regression test in the fixing PR.
Regression count per release wave is a tracked metric; targets in OBSERVABILITY.md.

24. Test Ownership

Test ownership maps to responsibility, not bureaucracy.

Surface / asset	Owner	Examples
Service unit + integration + contract tests	Service team	iam-service team owns iam unit, integration, Pact tests
Multi-tenant isolation tests per service	Service team	enforced by CI gate; the service owner cannot merge without them
Saga tests	Owning saga's primary service team (reservation-service for booking saga)	reservation-service team owns booking-saga harness
Sync engine tests	bff-backoffice-service team + Platform	shared ownership; service teams contribute aggregate-specific cases
AI eval harness	Platform AI team	owns the harness; service teams own per-capability YAML cases
Performance	Platform / SRE	k6 scenarios + budgets; service teams contribute SLO refinements
Security (SAST/DAST/SCA/secrets)	Security team	tooling + policies + pen-test coordination
E2E + journey tests	Product / QA + service teams	journey ownership is split by primary persona
Accessibility	Frontend Platform + service teams	tooling + per-component a11y
Localisation	Frontend Platform + content team	bundle correctness + per-locale review
Synthetic monitors	SRE	operates the synthetic sandbox tenant; routes alerts
Test data generator (@ghasi/test-data)	Platform	maintains generators + seeded packs

24.1 RACI summary

Asset	R	A	C	I
Service unit/integration tests	Service team	Tech Lead	QA	Eng Manager
Pact provider/consumer	Service team	Tech Lead	adjacent service teams	Architects
Saga tests	Saga primary service team	Tech Lead	participating teams	Architects
AI eval YAMLs	Service team	AI Platform Lead	AI capability owner	Compliance
Performance	SRE	SRE Lead	Service teams	Eng Manager
Security scans	Security	Security Lead	Service teams	Eng Manager + Compliance
E2E & journeys	QA + Product	Product Lead	Service teams	Eng Manager
A11y	Frontend Platform	Frontend Lead	Service teams	Compliance
Synthetic monitors	SRE	SRE Lead	Service teams	Eng Manager

25. Anti-Patterns

The following are forbidden in this codebase. Each is called out because it is common, real, and damaging in production hotel platforms.

25.1 Flaky tests left enabled

A flaky test is a bug. Quarantine it within 24 h with a linked issue; fix or delete within one sprint.
Quarantined tests run in a separate non-blocking suite; persistent quarantine triggers a review.
"Sometimes-passes" is not green. CI treats flakes as failures.

25.2 A single shared dataset across tests

Shared mutable state is a coupling that hides bugs and makes failures non-reproducible.
Every test creates and cleans up its own data; per-test transactional rollback is the default for SQL integration tests.
Sandbox tenants are scoped per test run with a janitor purging within 1 hour.

25.3 Mocking your own production code

Test doubles are for boundaries (vendor SDKs, network, time, randomness).
Do not mock your own application services or domain code; that tests the mock, not the system.
Pact and Testcontainers cover the boundaries you might otherwise be tempted to mock.

25.4 Testing UI implementation details

React Testing Library queries by accessibility role and label, not by class name or component internals.
Snapshot tests are forbidden for non-trivial UIs (they go stale and get rubber-stamped).
Visual regression covers what snapshots pretend to.

25.5 Skipping tenant-isolation tests because "it works"

"It works in staging" is not evidence; the next migration may flip a flag.
Tenant isolation tests are mandatory on every service that touches tenant data; CI fails the job if missing.
A multi-tenant data leak is a company-ending incident in this market segment.

25.6 Skipping offline tests because "the cloud will be up"

The cloud will not be up. The thesis of the product is that the operator continues regardless.
Offline tests are mandatory for the Electron desktop on every offline-capable flow.
A regression that breaks offline behaviour is treated as S0 because it kills the product's defining promise.

25.7 AI without provenance

Every AI artifact carries aiProvenance; missing-provenance commits fail CI via a static check on the AI client surface.
Every irreversible AI action passes through HITL with a decisionId; bypass attempts fail loudly with MELMASTOON.AI.HITL_BYPASS.

25.8 PII in logs / events

Logs and events carry IDs, not PII (no guest names, no PANs, no key credential codes, no JWTs, no lock-pairing secrets).
A static scanner runs on the log shipper; matches block deploys.
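A minimal sketch of the kind of pattern matching such a scanner performs on a log line. The patterns are illustrative and not exhaustive (they do not cover guest names, key codes, or pairing secrets); a Luhn checksum keeps long numeric IDs from being reported as card numbers. All names are hypothetical.

```typescript
const JWT = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/;
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const DIGIT_RUN = /\b\d(?:[ -]?\d){12,18}\b/g; // 13 to 19 digits, optional space/hyphen separators

// Luhn checksum over a digits-only string.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Returns the kinds of PII found; an empty array means the line is clean.
export function findPii(line: string): string[] {
  const kinds: string[] = [];
  if (JWT.test(line)) kinds.push("jwt");
  if (EMAIL.test(line)) kinds.push("email");
  for (const m of line.matchAll(DIGIT_RUN)) {
    if (luhnValid(m[0].replace(/[ -]/g, ""))) {
      kinds.push("pan");
      break;
    }
  }
  return kinds;
}
```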

25.9 Untyped IDs

All aggregate IDs are branded types (TenantId, ReservationId, KeyCredentialId, …).
Raw string IDs are forbidden in the domain and application layers; the eslint-no-string-id rule enforces this.
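A common way to build branded types in TypeScript is an intersection with a phantom property keyed by a `unique symbol`: IDs stay plain strings at runtime but become mutually incompatible at compile time. A minimal sketch; the ID format and the constructor and helper names are assumptions.

```typescript
declare const brand: unique symbol; // type-only; never exists at runtime
type Brand<T, B extends string> = T & { readonly [brand]: B };

export type TenantId = Brand<string, "TenantId">;
export type ReservationId = Brand<string, "ReservationId">;

const ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/; // assumption: illustrative format

// Constructors are the only place a raw string becomes an ID.
export function tenantId(raw: string): TenantId {
  if (!ID_PATTERN.test(raw)) throw new Error(`invalid TenantId: ${raw}`);
  return raw as TenantId;
}

export function reservationId(raw: string): ReservationId {
  if (!ID_PATTERN.test(raw)) throw new Error(`invalid ReservationId: ${raw}`);
  return raw as ReservationId;
}

export function reservationKey(tenant: TenantId, reservation: ReservationId): string {
  return `${tenant}/${reservation}`;
}

// reservationKey(reservationId("r1"), tenantId("t1")); // compile error: arguments swapped
```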

25.10 Money as float

All money is bigint micro-units; columns suffixed _micro.
Float arithmetic on money is forbidden; MoneyVO is the only allowed surface.
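Floats fail here because IEEE-754 doubles cannot represent most decimal fractions exactly (`0.1 + 0.2 !== 0.3`). A minimal sketch of what a bigint-backed money value object might look like; the class name, the three-letter currency check, and the remainder-to-first-parts allocation rule are assumptions, not the actual MoneyVO.

```typescript
export class Money {
  readonly amountMicro: bigint;
  readonly currency: string; // ISO 4217 code, e.g. "USD", "AFN"

  private constructor(amountMicro: bigint, currency: string) {
    this.amountMicro = amountMicro;
    this.currency = currency;
  }

  static ofMicro(amountMicro: bigint, currency: string): Money {
    if (!/^[A-Z]{3}$/.test(currency)) throw new Error(`invalid currency: ${currency}`);
    return new Money(amountMicro, currency);
  }

  add(other: Money): Money {
    this.assertSameCurrency(other);
    return new Money(this.amountMicro + other.amountMicro, this.currency);
  }

  subtract(other: Money): Money {
    this.assertSameCurrency(other);
    return new Money(this.amountMicro - other.amountMicro, this.currency);
  }

  // Split into n parts that sum exactly to the original; the first parts absorb the remainder.
  allocate(n: number): Money[] {
    if (!Number.isInteger(n) || n <= 0) throw new Error("n must be a positive integer");
    if (this.amountMicro < 0n) throw new Error("allocate expects a non-negative amount");
    const parts = BigInt(n);
    const base = this.amountMicro / parts; // bigint division truncates
    const remainder = this.amountMicro % parts;
    return Array.from({ length: n }, (_, i) =>
      new Money(base + (BigInt(i) < remainder ? 1n : 0n), this.currency));
  }

  private assertSameCurrency(other: Money): void {
    if (other.currency !== this.currency) {
      throw new Error(`currency mismatch: ${this.currency} vs ${other.currency}`);
    }
  }
}
```

`allocate` matters for folio splits: dividing 10 USD three ways must still sum to exactly 10 USD.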

25.11 last-write-wins on monetary or inventory state

Conflict resolution per aggregate is declared in services/<svc>/SYNC_CONTRACT.md.
Monetary or inventory state is never LWW; deterministic policies (server-wins, operator-decides, additive-only) apply.
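A minimal sketch of how declared policies might be checked and applied, assuming the `SYNC_CONTRACT.md` declarations are also available as data. The aggregate names, the `monetaryOrInventory` flag, and the policy map are hypothetical; the additive-only merge shows why offline folio charges can be reconciled without either side overwriting the other.

```typescript
type Policy = "server-wins" | "operator-decides" | "additive-only" | "last-write-wins";

interface AggregatePolicy {
  policy: Policy;
  monetaryOrInventory: boolean;
}

// Hypothetical declarations mirroring what a SYNC_CONTRACT.md might state.
const SYNC_POLICIES: Record<string, AggregatePolicy> = {
  folio: { policy: "additive-only", monetaryOrInventory: true },
  inventory: { policy: "server-wins", monetaryOrInventory: true },
  guestNote: { policy: "last-write-wins", monetaryOrInventory: false },
};

// CI-style check: monetary or inventory aggregates must never declare LWW.
export function lwwViolations(policies: Record<string, AggregatePolicy> = SYNC_POLICIES): string[] {
  return Object.entries(policies)
    .filter(([, p]) => p.monetaryOrInventory && p.policy === "last-write-wins")
    .map(([name]) => name);
}

interface FolioLine {
  id: string;
  amountMicro: bigint;
}

// additive-only: union by line id; lines are appended, never overwritten or deleted.
export function mergeAdditive(server: FolioLine[], client: FolioLine[]): FolioLine[] {
  const byId = new Map(server.map((l) => [l.id, l] as const));
  for (const line of client) {
    // Only new ids are appended; for an existing id the server copy is kept
    // (surfacing a conflicting client copy for review is out of scope here).
    if (!byId.has(line.id)) byId.set(line.id, line);
  }
  return [...byId.values()];
}
```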

25.12 Direct vendor SDK imports outside the owning service

openai, @google-cloud/vertexai, anthropic are forbidden outside ai-orchestrator-service.
TTLock / Salto / Assa Abloy SDKs are forbidden outside lock-integration-service.
Stripe / PayPal / MFS SDKs are forbidden outside payment-gateway-service.
A static check on imports fails CI on violation.
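A minimal sketch of the import check, assuming source files live under `services/<svc>/`. The AI package names and `stripe` come from the rule above; the angle-bracketed lock and payment entries are placeholders for the real SDK package names, which this document does not state. The regex scan is a simplification of what an AST-based lint rule would do.

```typescript
// Service -> vendor packages only that service may import.
const OWNED_IMPORTS: Record<string, string[]> = {
  "ai-orchestrator-service": ["openai", "@google-cloud/vertexai", "anthropic"],
  "lock-integration-service": ["<ttlock-sdk>", "<salto-sdk>", "<assa-abloy-sdk>"], // placeholders
  "payment-gateway-service": ["stripe", "<paypal-sdk>", "<mfs-sdk>"], // placeholders except stripe
};

// Catches `... from "x"`, `import "x"`, `import("x")`, and `require("x")`.
const IMPORT_SPECIFIER = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)['"]([^'"]+)['"]/g;

// True when `specifier` is `pkg` itself or one of its subpaths (e.g. `pkg/resources`).
function isPackage(specifier: string, pkg: string): boolean {
  return specifier === pkg || specifier.startsWith(`${pkg}/`);
}

export function boundaryViolations(filePath: string, source: string): string[] {
  const owner = /services\/([^/]+)\//.exec(filePath)?.[1];
  const violations: string[] = [];
  for (const m of source.matchAll(IMPORT_SPECIFIER)) {
    const spec = m[1];
    for (const [service, pkgs] of Object.entries(OWNED_IMPORTS)) {
      if (service !== owner && pkgs.some((p) => isPackage(spec, p))) {
        violations.push(`${filePath}: ${spec} is reserved for ${service}`);
      }
    }
  }
  return violations;
}
```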

25.13 Cross-service DB joins

Every service owns its data. No cross-service DB reads.
Read models are projected from events.
Static analysis of Postgres role grants asserts no cross-schema grants exist.

25.14 ".only", ".skip", "console.log", "debugger" in committed tests

Pre-commit hook + CI grep catches and blocks.
A skipped test must be removed or quarantined with a tracking issue.
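A minimal sketch of the CI grep, assuming Jest/Vitest-style `describe`/`it`/`test` globals. It reports line numbers so the failure message points straight at the offending call; the function name is hypothetical.

```typescript
const FORBIDDEN: { name: string; pattern: RegExp }[] = [
  { name: ".only", pattern: /\b(?:describe|it|test)\.only\s*\(/ },
  { name: ".skip", pattern: /\b(?:describe|it|test)\.skip\s*\(/ },
  { name: "console.log", pattern: /\bconsole\.log\s*\(/ },
  { name: "debugger", pattern: /\bdebugger\b/ },
];

// Returns one hit per forbidden construct per line (1-based line numbers).
export function forbiddenInTest(source: string): { line: number; name: string }[] {
  const hits: { line: number; name: string }[] = [];
  source.split("\n").forEach((text, i) => {
    for (const { name, pattern } of FORBIDDEN) {
      if (pattern.test(text)) hits.push({ line: i + 1, name });
    }
  });
  return hits;
}
```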

26. Cross-References

Per-service TESTING_STRATEGY.md deep-doc lives at services/<svc>/TESTING_STRATEGY.md — service-specific tooling, harnesses, and edge cases live there. This document is the strategy; the per-service docs are the implementation.
Every story in 07-epics-and-user-stories.md declares Test types; this document defines what each type means and how it runs.
Definition of Done standards/DEFINITION_OF_DONE.md ties test gates to merge gates; both documents stay in lockstep.
Observability spec docs/observability/01-observability.md defines SLOs referenced from §13 and the synthetic monitor topology referenced from §21.
AI architecture docs/08-ai-architecture.md defines the AI provenance contract and the orchestrator surface that the AI eval harness in §11 exercises.
Lock & Key integration docs/09-lock-and-key-integration.md defines the vendor adapter contract that the tests in §9 verify.
Payments docs/10-payments-architecture.md defines the payment adapter contract that the tests in §10 verify.
Desktop spec docs/frontend/desktop/06-desktop-app-specification.md defines the Electron architecture that the tests in §8 + §14.2 + §12.3 verify.
