11 — Testing & QA Strategy
Companion: 01 Product Overview · 02 Enterprise Architecture · 03 Microservices Catalog · 04 Event-Driven Architecture · 05 API Design · 06 Data Models · 07 Security & Tenancy · 08 AI Architecture · 09 Lock & Key Integration · 10 Payments · Frontend Web/Mobile · Frontend Desktop (Electron) · Definition of Done · 07 Epics & Stories
Stack reminder (non-negotiable). Desktop is Electron (Node 20 main + Chromium renderer + Vite + React + better-sqlite3 + ONNX Runtime Node). Cloud is Google Cloud Platform (Cloud Run, Cloud SQL, Pub/Sub, Memorystore, Cloud Storage, Secret Manager, Vertex AI, BigQuery). There is no Tauri, no AWS, no Azure. Substitutions require an explicit unanimous ADR.
This document is the canonical, implementation-grade testing strategy for Ghasi Melmastoon. Every section is enforced by CI, runbooks, or scheduled jobs. Skipping a category requires an ADR — never an opinion.
1. Goal & Quality Bar
Testing is the load-bearing structure that lets a small team ship a multi-tenant, AI-first, offline-first hotel SaaS into low-resource markets without losing reservations, double-charging cards, leaking guest data, or stranding a guest at a locked door. The quality bar is therefore not "tests pass" — it is evidence that the system behaves correctly under all the operating modes from docs/01-product-overview.md §9, across all 22 services, on every supported surface and locale.
1.1 Coverage targets (merge-blocking)
| Surface / domain | Unit line | Unit branch | Mutation | Integration | Notes |
|---|---|---|---|---|---|
| Domain aggregates (every service) | ≥ 95% | ≥ 95% | ≥ 75% | n/a | /domain/**; pure TS, no I/O |
| Value objects | 100% | 100% | ≥ 85% | n/a | Branded IDs, money, FX, AIProvenance, Locale |
| Domain services | ≥ 90% | ≥ 90% | ≥ 70% | required | Policies, specifications |
| Application use-cases | ≥ 85% | ≥ 85% | ≥ 65% | required | Commands, queries, sagas |
| Infrastructure adapters | ≥ 80% | ≥ 75% | ≥ 60% | required | Repos, vendor adapters |
| payment-gateway-service | 100% | 100% | ≥ 90% | required + sandbox | Money never wrong |
| lock-integration-service | 100% | 100% | ≥ 90% | required + vendor-mock | Locked doors are loud failures |
| iam-service | 100% | 100% | ≥ 90% | required + security | AuthZ bypass = breach |
| Sync logic (bff-backoffice-service + per-service sync ports) | 100% | 100% | ≥ 90% | required + chaos | Lost mutations = lost revenue |
| Frontend components | ≥ 80% | ≥ 75% | n/a | n/a | Unit + component |
| BFFs | ≥ 85% | ≥ 80% | ≥ 65% | required | Aggregation correctness |
1.2 Journey & contract coverage (merge-blocking)
- Every P0 journey (J-01 … J-12, J-19, J-20; see `docs/journeys/01-core-user-journeys.md` §2) has an automated end-to-end test that runs on every PR (smoke) and nightly (full happy + failure paths).
- Every offline-capable flow has an offline E2E test exercising at least one of: 1h offline, 24h offline, conflict simulation, outbox flush ordering.
- Every service-to-service edge in the context map (see `docs/03-microservices/README.md`) has a Pact contract published and verified on every provider PR.
- Every event in the schema registry has a producer conformance test and at least one consumer projection test.
- Every saga (booking, cancellation, check-in, check-out, date-change, listing-publish, AI-HITL) has happy-path, every-compensation, idempotent-retry, and partial-failure-injection tests.
1.3 Quality philosophy
Every production incident is a missing test. Every missing test is a missed risk conversation. Every flake left enabled is a slow-motion outage.
Four guiding stances:
- Tests are executable specifications: domain tests read like the ubiquitous language; AC clauses from `07-epics-and-user-stories.md` map 1:1 to a Given/When/Then.
- Shift-left and shift-right equally: pre-merge gates catch regressions; production observability + synthetic monitors catch emergent behaviour; both feed the backlog.
- AI is a first-class testable surface: non-determinism does not excuse non-testing; it raises the bar (regression suites, eval harness, structured-output assertions, cost guardrails).
- Offline is a tier-0 invariant: any test plan that only works online is incomplete and merge-blocked.
1.4 Scope
In-scope: all 22 services in docs/03-microservices/README.md; the four user-facing surfaces (Consumer Web, Consumer Mobile, Tenant Booking Web/Mobile, Electron Desktop) and the platform Control Plane; all event producers/consumers and projections; the desktop sync engine and ONNX edge AI worker; all infra-as-code (Terraform, Helm).
Out-of-scope: third-party vendor SaaS internals (we test our contracts with them, not their systems); manufacturer hardware-level qualification of partner lock encoders.
2. Test Pyramid
The classic pyramid is insufficient for an event-driven, offline-first, AI-native system. Ghasi Melmastoon uses an extended pyramid with orthogonal axes:
```
┌──────────────────────────┐
│ Chaos / Replay / DR      │  weekly staging · monthly prod
├──────────────────────────┤
│ E2E Journeys (J-01..22)  │  ~250 tests · < 18 min smoke
├──────────────────────────┤
│ Contract (Pact + events) │  ~600 tests · < 3 min
├──────────────────────────┤
│ Integration (Testcont.)  │  ~2,200 tests · < 12 min
├──────────────────────────┤
│ Unit (domain)            │  ~14,000+ tests · < 4 min parallel
└──────────────────────────┘
```

Orthogonal axes (apply across multiple tiers):

```
┌─────────┬────────────┬──────────┬─────────┬──────┬────────┬──────────────┐
│ A11y    │ Security   │ AI eval  │ Offline │ Perf │ Sync   │ Localisation │
└─────────┴────────────┴──────────┴─────────┴──────┴────────┴──────────────┘
```
2.1 Volume targets
| Tier | Approx count target | Wall-clock budget | Frequency |
|---|---|---|---|
| Unit | 12 k – 18 k | < 4 min (parallel) | every commit |
| Integration | 1.5 k – 2.5 k | < 12 min | every PR |
| Contract | 400 – 700 | < 3 min | every PR |
| E2E (smoke) | ≤ 80 critical | < 18 min | every PR |
| E2E (full) | 200 – 250 | < 60 min | nightly |
| Offline | 25 – 40 | < 25 min | nightly |
| Performance (k6) | 30+ scenarios | nightly | nightly + pre-release |
| Chaos | 15+ experiments | weekly staging / monthly prod | weekly / monthly |
| AI eval | 1 k+ prompts × models | on prompt change + nightly | continuous |
| A11y (axe) | every changed surface | < 90 s per surface | every PR |
| Visual regression | every component story + key pages | < 5 min | every PR |
| Security (SAST/DAST/SCA/secrets) | continuous | < 8 min PR + nightly DAST | every PR + nightly |
2.2 Why this shape
- Domain unit tests are cheap, deterministic, and fast — they should dominate by count.
- Integration tests prove our infra wiring (RLS, outbox, Postgres queries). Each one carries more value than a unit test, but they are slower, so they do not dominate by count.
- Contract tests are mandatory because the system is event-driven and BFF-fronted; no other layer catches the same class of regression.
- E2E proves journeys, but is too slow and brittle to be the safety net — it is the last line of defence.
- Orthogonal axes (A11y, AI eval, Offline) cut across all tiers because they are correctness concerns, not test categories.
3. Test Types — Definitions
| Type | What it proves | Tooling | Boundary |
|---|---|---|---|
| Unit | A function, value object, or aggregate behaves to spec in isolation. Zero I/O, zero time-dep unless injected. | Vitest (default), Jest where required by tooling | /domain/**, /lib/** |
| Component | A React component renders, handles input, and emits events to spec. | Vitest + Testing Library (React) + jsdom | /components/** |
| Integration | A service module talks to real Postgres + Redis + Pub/Sub emulator and writes/reads correctly. | Vitest + Testcontainers (Postgres 16, Redis 7, Pub/Sub emulator, MinIO/Cloud Storage emulator) | /application/**, /infrastructure/** |
| Contract | A consumer of an HTTP API or event will not break a producer (or vice versa). | Pact (HTTP), Schema Registry conformance (events) | service ⇄ service |
| E2E (web/mobile) | A user can complete a journey through real surfaces against a deployed environment. | Playwright (web), Detox (mobile), shared journey IDs J-NN | full surface |
| E2E (desktop, Electron) | A staff member can run a journey through the Electron app, online and offline. | Playwright Electron + spectron-style harness | Electron renderer + main |
| Offline | An offline scenario produces correct mutations and reconciles correctly on sync. | Playwright Electron + custom sync harness + Toxiproxy | Electron + bff-backoffice |
| Performance | Throughput and latency meet SLOs under realistic load. | k6 + custom scenarios; Lighthouse for web vitals; React Native perf tools | API + frontends |
| Security | Code, dependencies, and runtime surfaces have no known critical/high vulnerabilities. | Snyk (SCA), gitleaks (secrets), CodeQL (SAST), OWASP ZAP (DAST), pen-test pre-launch | every layer |
| A11y | Surfaces meet WCAG 2.2 AA across locales and assistive tech. | axe-core (axe-playwright + axe-react), NVDA smoke (Windows for Electron) | every surface |
| Visual regression | Pixel-diff per component story and key page in LTR + RTL. | Chromatic + Storybook | every component + key pages |
| AI eval | An AI capability meets a behaviour band (correctness, safety, format, cost). | YAML test cases run by the AI eval harness against ai-orchestrator-service | AI gateway |
| Chaos | The system degrades and recovers under network/pod/peripheral faults. | Pumba (containers), Toxiproxy (network), custom Kubernetes/Cloud Run fault injection | infra |
| Mutation | Tests actually catch defects (kill mutants), not just exercise lines. | Stryker | /domain/**, /lib/** |
| Compliance | Tax, audit-log, residency, and erasure invariants hold per jurisdiction. | Custom test packs per jurisdiction | pricing/billing/reporting/iam |
| Localisation | RTL/LTR + locale-aware formatting + numerals work end-to-end. | Cross-locale test runner + axe + visual regression | every frontend |
| Synthetic monitor | Production keeps being correct from outside, not just from inside. | Synthetic probes (Cloud Monitoring) + dedicated test tenant | production |
4. Tooling Stack
| Concern | Tool | Why it is the chosen tool |
|---|---|---|
| TypeScript unit | Vitest | Native ESM, Vite-aligned, fast watch, parity with desktop renderer build |
| Where Vitest can't | Jest | Used only in legacy or React Native packages whose ecosystem requires it; new packages do not adopt Jest |
| Component / DOM | Testing Library (React, React Native) | Behaviour-first, no implementation-detail snapshots |
| Web E2E | Playwright | Multi-browser, trace + video + screenshot, native parallelism |
| Desktop E2E | Playwright Electron | Drives the real Electron app; can intercept main + renderer |
| Mobile E2E | Detox (RN) | Reliable on physical + emulator; gray-box harness for RN |
| Performance | k6 | TypeScript-friendly, scenario-based, easy to integrate into CI |
| Frontend perf | Lighthouse CI + RUM | Enforced budgets per page on tenant booking + consumer surfaces |
| Contract — HTTP | Pact (consumer + provider, broker hosted) | Consumer-driven contracts; broker integrates with PR gate |
| Contract — events | Schema Registry conformance + golden samples | Backward-compatibility enforced |
| API smoke | Postman / Newman | Hand-curated smoke for shared APIs in staging |
| DAST | OWASP ZAP | Free, scriptable, fits CI; baselines + delta scans |
| SCA | Snyk | Dependencies, container images |
| SAST | CodeQL | First-party, GitHub-hosted, broad TS coverage |
| Secrets | gitleaks | Pre-commit + CI |
| A11y | axe-core (axe-playwright + axe-react) | WCAG 2.2 AA rules; integrated into Playwright |
| Screen reader smoke | NVDA on Windows for Electron; VoiceOver macOS smoke | Manual smoke per release wave |
| Visual regression | Chromatic + Storybook | RTL + LTR stories; review queue with team approval |
| Chaos network | Pumba + Toxiproxy | Container + network-level fault injection |
| Mutation | Stryker | TS-native, CI-friendly |
| AI eval harness | YAML cases + custom runner against ai-orchestrator-service | Provenance-aware, cost-aware, regression-aware |
| Coverage | c8 (V8) + Vitest report | Native, per-package thresholds |
All tooling is wrapped behind workspace npm scripts (`pnpm test:unit`, `pnpm test:integration`, …) and CI workflows. Engineers do not interact with tools directly.
5. Per-Service Test Pattern
Every backend service repository follows the same test layout. Deviations require ADR.
```
services/<service-name>/
├── src/
│   ├── presentation/
│   ├── application/
│   ├── domain/
│   └── infrastructure/
└── test/
    ├── unit/                           # mirrors src/, fast, no I/O
    │   ├── domain/
    │   └── application/
    ├── integration/                    # Testcontainers; Postgres + Pub/Sub emulator
    │   ├── outbox.spec.ts              # mandatory
    │   ├── inbox.spec.ts               # mandatory
    │   ├── tenant-isolation.spec.ts    # mandatory
    │   ├── idempotency.spec.ts         # mandatory
    │   └── <use-case>.spec.ts
    ├── contract/                       # Pact provider + consumer
    │   ├── provider.spec.ts
    │   └── consumer.spec.ts
    ├── e2e/                            # Playwright API runner against staging
    │   └── <journey>.spec.ts
    └── fixtures/
        ├── builders/                   # aReservation().with...().build() style
        └── seeds/
```
5.1 What is required per service
- Unit tests for every aggregate, every value object, every domain service.
- Integration tests for every use-case (command/query handler) using Testcontainers for Postgres 16, Redis 7, the Pub/Sub emulator (`gcr.io/google.com/cloudsdktool/cloud-sdk` with `gcloud beta emulators pubsub start`), MinIO for Cloud Storage, and Mailpit for email send capture where applicable.
- Contract tests for every API the service consumes (Pact consumer) and every API/event the service produces (Pact provider + schema registry conformance).
- E2E tests for every journey the service participates in, run via Playwright API runner against the deployed staging environment.
- Coverage thresholds enforced at the package level in CI per the table in §1.1.
5.2 Mandatory test files (all services)
| File | Purpose |
|---|---|
| `tenant-isolation.spec.ts` | Two tenants seeded with identical entity IDs; cross-tenant reads must return zero rows; direct ID access must be blocked by RLS. |
| `outbox.spec.ts` | Outbox is transactional with the aggregate write; flusher publishes once; restart after kill produces no duplicates. |
| `inbox.spec.ts` | Consumer dedupes by message ID; out-of-order delivery is handled per-aggregate; replays are idempotent. |
| `idempotency.spec.ts` | Duplicate idempotency-key returns the original response within the TTL; mismatched payload returns conflict. |
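The `idempotency.spec.ts` contract is easier to reason about against a small model. The sketch below assumes a hypothetical `IdempotencyStore` with three properties. It fingerprints the request body with SHA-256 over `JSON.stringify`, so callers must serialise with a stable key order. It keeps the first response for a TTL. It reads time from an injected `now()` function. A real service would persist entries in Postgres or Memorystore rather than a `Map`.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical in-memory model of the idempotency contract.
type StoredResponse = { status: number; body: unknown };
type Entry = { payloadHash: string; response: StoredResponse; expiresAt: number };

type IdempotencyResult =
  | { kind: 'executed'; response: StoredResponse }
  | { kind: 'replayed'; response: StoredResponse }
  | { kind: 'conflict' };

class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();

  constructor(private readonly ttlMs: number, private readonly now: () => number) {}

  execute(key: string, payload: unknown, handler: () => StoredResponse): IdempotencyResult {
    const payloadHash = createHash('sha256').update(JSON.stringify(payload)).digest('hex');
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > this.now()) {
      // Same key and same payload within the TTL: replay without re-running the handler.
      if (existing.payloadHash === payloadHash) return { kind: 'replayed', response: existing.response };
      // Same key with a different payload: the caller reused the key, so report a conflict.
      return { kind: 'conflict' };
    }
    // First sight of the key, or the previous entry expired: run and remember.
    const response = handler();
    this.entries.set(key, { payloadHash, response, expiresAt: this.now() + this.ttlMs });
    return { kind: 'executed', response };
  }
}
```

A spec against the real endpoint follows the same script:

1. Send the same key and body twice.
2. Send the same key with a different body.
3. Send it again after advancing the fake clock past the TTL.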
5.3 Coverage thresholds in CI
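`vitest.config.ts` in each package contains:

```ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8', // V8-native coverage, per the tooling table
      reporter: ['text', 'lcov', 'html'],
      // Glob-keyed thresholds: each layer gets the bar from §1.1.
      thresholds: {
        'src/domain/**': { lines: 95, branches: 95, functions: 95, statements: 95 },
        'src/application/**': { lines: 85, branches: 85, functions: 85, statements: 85 },
        'src/infrastructure/**': { lines: 80, branches: 75, functions: 80, statements: 80 },
      },
      exclude: ['**/*.d.ts', '**/migrations/**', '**/openapi.json'],
    },
  },
});
```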
CI fails the job on any miss. Mutation thresholds are enforced separately by Stryker.
5.4 Builders, fakes, and clocks
- Data builders in `__builders__/` (e.g., `aReservation().forTenant(tA).withRooms(2).withPolicy('flex24').build()`).
- Time is injected via a `Clock` port; tests use `FakeClock` at a fixed ISO-8601 instant.
- IDs via `IdFactory`; tests use `SeededIdFactory` for determinism.
- Money as `bigint` micro-units everywhere; `MoneyVO` ensures no float math.
- AI via the `AIClient` port; tests inject a `RecordingAIClient` with golden responses.
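The ports above can be very small. This is a minimal sketch. Only the type names come from this section. The method names (`now()`, `advance()`, `next()`, `ofMicro()`, `allocate()`) and the seed-plus-counter ID format are hypothetical. `allocate()` shows why micro-units matter: splitting 100 USD three ways must not lose a micro-unit to rounding.

```typescript
// Hypothetical port shapes; only the type names come from the list above.
interface Clock { now(): Date; }

class FakeClock implements Clock {
  private current: number;
  constructor(isoInstant: string) { this.current = Date.parse(isoInstant); }
  now(): Date { return new Date(this.current); }
  advance(ms: number): void { this.current += ms; }
}

interface IdFactory { next(prefix: string): string; }

// Same seed gives the same ID sequence on every run.
class SeededIdFactory implements IdFactory {
  private counter = 0;
  constructor(private readonly seed: string) {}
  next(prefix: string): string {
    this.counter += 1;
    return `${prefix}-${this.seed}-${String(this.counter).padStart(6, '0')}`;
  }
}

// Money in bigint micro-units (1 currency unit = 1_000_000 micro-units); no floats.
class MoneyVO {
  private constructor(readonly micro: bigint, readonly currency: string) {}

  static ofMicro(micro: bigint, currency: string): MoneyVO { return new MoneyVO(micro, currency); }

  add(other: MoneyVO): MoneyVO {
    if (other.currency !== this.currency) throw new Error('currency mismatch');
    return new MoneyVO(this.micro + other.micro, this.currency);
  }

  // Splits into n parts that sum exactly to the original amount; the remainder
  // goes one micro-unit at a time to the first parts. Assumes a non-negative amount.
  allocate(n: number): MoneyVO[] {
    if (!Number.isInteger(n) || n <= 0) throw new Error('n must be a positive integer');
    if (this.micro < 0n) throw new Error('allocate assumes a non-negative amount');
    const parts = BigInt(n);
    const base = this.micro / parts;
    const remainder = this.micro % parts;
    return Array.from({ length: n }, (_, i) =>
      new MoneyVO(base + (BigInt(i) < remainder ? 1n : 0n), this.currency));
  }
}
```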
6. Multi-Tenant Isolation Tests (mandatory)
Multi-tenancy is enforced at three layers (docs/02-enterprise-architecture.md) — domain, DB (Postgres RLS), and API (request context). Every layer must be tested.
6.1 Required test set per service
| Test | What it proves |
|---|---|
cross_tenant_read_blocked | A query as Tenant A returns zero rows from Tenant B's table even when filters are absent. |
cross_tenant_mutation_blocked | An update of Tenant B's row as Tenant A is rejected by RLS, never silently dropped. |
direct_id_access_blocked | A GET by primary ID belonging to Tenant B as Tenant A returns 404 (not 403, to avoid existence leaks). |
tenant_context_required | Every endpoint rejects requests missing X-Tenant-Id or with a JWT tids mismatch. |
bulk_admin_requires_global_scope | Bulk endpoints require X-Tenant-Scope: global and platform-admin claim. |
event_payload_carries_tenantid | Every event published carries tenantId matching the producing aggregate's tenant. |
consumer_drops_mismatched_tenant | A consumer receiving an event from another tenant context does not project. |
cross_tenant_search_only_via_search_aggregation | Only search-aggregation-service may read across tenants; only over cross_tenant_searchable=true fields. |
rls_bypass_attempt_logged | Setting set role or SET LOCAL row_security = off is forbidden in app code; tests assert the absence at the migration layer; if attempted at runtime, an audit alarm fires. |
service_to_service_propagation | Service A → Service B propagates tenant_id in headers and trace context; B rejects mismatch. |
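For `rls_bypass_attempt_logged`, the migration-layer assertion can be a plain text scan. This minimal sketch assumes migrations are available as a map of filename to SQL. The hypothetical `findRlsBypass` helper flags two statements from the table: `SET ROLE` and `row_security = off`. It also flags `DISABLE ROW LEVEL SECURITY` and `BYPASSRLS`. Those two are assumed additions: other ways Postgres lets code sidestep RLS. The scan strips `--` comments but not `/* */` blocks.

```typescript
// Hypothetical migration-layer guard. The first two rules come from the table;
// the last two are assumed additions covering other Postgres ways to sidestep RLS.
const FORBIDDEN: ReadonlyArray<{ rule: string; pattern: RegExp }> = [
  { rule: 'set-role', pattern: /\bset\s+(local\s+|session\s+)?role\b/i },
  { rule: 'row-security-off', pattern: /\brow_security\s*(=|to)\s*'?off'?/i },
  { rule: 'disable-rls', pattern: /\bdisable\s+row\s+level\s+security\b/i },
  { rule: 'bypassrls', pattern: /\bbypassrls\b/i },
];

type Finding = { file: string; line: number; rule: string };

// Drops `--` line comments in place, keeping newlines so line numbers stay accurate.
function stripLineComments(sql: string): string {
  return sql.replace(/--.*$/gm, '');
}

function findRlsBypass(migrations: Record<string, string>): Finding[] {
  const findings: Finding[] = [];
  for (const [file, sql] of Object.entries(migrations)) {
    stripLineComments(sql).split('\n').forEach((text, i) => {
      for (const { rule, pattern } of FORBIDDEN) {
        if (pattern.test(text)) findings.push({ file, line: i + 1, rule });
      }
    });
  }
  return findings;
}
```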
6.2 Why not skip — ever
Tenant isolation is the single highest-blast-radius defect class in this product. The seven-figure-incident risk justifies the marginal CI minute. Disabling these tests requires a risk-accepted ADR co-signed by Security and the service owner.
6.3 Sample isolation test (excerpt)
```ts
import { describe, it, expect } from 'vitest';
// aTenant, ReservationId, seedReservation and api come from the service's test fixtures.

describe('reservation-service · tenant isolation', () => {
  it('blocks cross-tenant read with identical aggregate IDs', async () => {
    const tenantA = aTenant().build();
    const tenantB = aTenant().build();
    const sharedId = ReservationId.generate();
    // Same primary key in both tenants; RLS must keep the rows apart.
    await seedReservation({ id: sharedId, tenantId: tenantA.id });
    await seedReservation({ id: sharedId, tenantId: tenantB.id });

    const resA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}`);
    expect(resA.status).toBe(200);
    expect(resA.body.tenantId).toBe(tenantA.id);

    // Tenant A tries to target Tenant B's row explicitly.
    const resBAsA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}?explicitTenant=${tenantB.id}`);
    expect(resBAsA.status).toBe(404); // never 403, never 200
  });
});
```
7. Saga Tests
Sagas are the integration-correctness backbone. Each named saga in docs/04-event-driven-architecture.md has a dedicated suite.
7.1 Sagas in scope
| Saga | Participants | Compensation chain |
|---|---|---|
| SAGA-MEL-BOOKING-CONFIRM | reservation → inventory(hold) → pricing(quote) → payment-gateway(intent) → reservation(confirm) → lock-integration(issue@check-in) → notification | release hold · refund · revoke key |
| SAGA-MEL-CANCELLATION | reservation → billing(refund-eligibility) → payment-gateway(refund) → inventory(release) → lock-integration(revoke) → notification | reverse refund (rare) · re-allocate |
| SAGA-MEL-CHECKIN | reservation(check-in) → lock-integration(issue) → billing(open-folio) → notification | revoke key on rollback |
| SAGA-MEL-CHECKOUT | reservation(check-out) → lock-integration(revoke) → housekeeping(queue-turnover) → billing(close-folio) → notification | re-open folio · cancel turnover |
| SAGA-MEL-DATE-CHANGE | reservation(modify) → inventory(re-hold) → pricing(re-quote) → billing(adjust) → lock-integration(update) → notification | restore prior dates · refund delta |
| SAGA-MEL-LISTING-PUBLISH | property → theme-config → pricing → search-aggregation → bff-consumer (cache invalidation) | unpublish on partial failure |
| SAGA-MEL-AI-HITL | originating service → ai-orchestrator(propose+provenance) → human review → originating service(commit with decisionId) | discard proposal · audit |
7.2 Required test cases per saga
For every saga, four classes of tests are mandatory:
- Happy path. Each step completes; the final state is reachable; the full chain of events is published in order; the final aggregate carries the expected snapshot.
- Per-step compensation. Inject a failure at step N for each N; the compensation chain runs in reverse; the system returns to a consistent state; the original idempotency key is honoured for retries.
- Idempotent retry. Redeliver each step's event; downstream consumers do not double-apply; outbox publishers do not double-publish.
- Partial failure injection. Combine two faults in different steps (e.g., payment succeeds but lock issuance flakes twice then succeeds); ensure the saga reaches eventual success or a deterministic terminal failure.
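The per-step compensation rule is mechanical enough to enumerate. The sketch below is a synchronous, in-memory simplification: the real `@ghasi/saga-harness` runs over Pub/Sub and Postgres. `runSteps` and `assertEveryCompensation` are hypothetical names. The sketch fails each step in turn and checks that exactly the earlier steps are compensated, in reverse order.

```typescript
// Simplified saga model: steps run in order; on failure, completed steps
// are compensated in reverse. The failed step itself never committed.
type Step = { name: string; run: () => void; compensate: () => void };
type SagaOutcome = { status: 'completed' | 'compensated'; failedAt?: string; log: string[] };

function runSteps(steps: Step[], failAt?: string): SagaOutcome {
  const log: string[] = [];
  const done: Step[] = [];
  for (const step of steps) {
    try {
      if (step.name === failAt) throw new Error(`injected failure at ${step.name}`);
      step.run();
      log.push(`run:${step.name}`);
      done.push(step);
    } catch {
      for (const prior of [...done].reverse()) {
        prior.compensate();
        log.push(`compensate:${prior.name}`);
      }
      return { status: 'compensated', failedAt: step.name, log };
    }
  }
  return { status: 'completed', log };
}

// "Inject a failure at step N for each N": one run per step, each must undo
// exactly the preceding steps, newest first.
function assertEveryCompensation(makeSteps: () => Step[]): void {
  const names = makeSteps().map((s) => s.name);
  names.forEach((name, n) => {
    const outcome = runSteps(makeSteps(), name);
    const expected = names.slice(0, n).reverse().map((s) => `compensate:${s}`);
    const actual = outcome.log.filter((entry) => entry.startsWith('compensate:'));
    if (outcome.status !== 'compensated' || JSON.stringify(actual) !== JSON.stringify(expected)) {
      throw new Error(`compensation mismatch when failing at ${name}`);
    }
  });
}
```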
7.3 Saga test harness
A shared `@ghasi/saga-harness` package wraps the Pub/Sub emulator, Postgres, a fake clock, and injectable per-step failure switches. A test reads:

```ts
import { it } from 'vitest';
import { runSaga } from '@ghasi/saga-harness';

it('compensates SAGA-MEL-BOOKING-CONFIRM when payment capture fails', async () => {
  await runSaga('SAGA-MEL-BOOKING-CONFIRM', {
    givenInventoryAvailable: true,
    givenQuoteValid: true,
    whenPaymentFailsAtStep: 'capture',
    thenStateAfterCompensation: 'cancelled_by_compensation',
    thenEventsPublished: [
      'reservation.held.v1',
      'payment.intent.created.v1',
      'payment.failed.v1',
      'inventory.hold.released.v1',
      'reservation.confirm_failed.v1',
    ],
  });
});
```
8. Sync & Offline Tests (Electron desktop)
Sync correctness is the second-highest-blast-radius defect class. Offline tests are first-class CI citizens, not ad-hoc.
8.1 Required scenario coverage
| ID | Scenario | Expected behaviour |
|---|---|---|
| O-01 | Desktop offline 1h, 5 reservations modified, then reconnect | All 5 mutations reach server in original order; no duplicates; outbox empty after flush |
| O-02 | Desktop offline 24h, 50 housekeeping updates + 10 walk-ins + 3 cash-on-arrival check-ins, then reconnect | All applied; any conflicts surfaced per the per-aggregate policy; sync completes within 5 minutes on a 1 Mbps link |
| O-03 | Desktop offline 7d (max grace) | App still usable for read + mutate; clear UI banner at 5d/6d/7d boundaries; on reconnect, sync proceeds in pages |
| O-04 | Conflict simulation: rate plan changed both server-side and offline desktop-side | Per-aggregate policy applied; rate-plan policy is "server wins on monetary fields, last-writer-wins on display name"; operator notified for material conflicts |
| O-05 | Conflict: same reservation date-changed by two devices offline | First flusher wins; second device receives 409 conflict, surfaces the diff, lets operator decide |
| O-06 | Outbox flush ordering | Events flush FIFO; a failing event blocks its dependents but not unrelated outbox queues |
| O-07 | Idempotency on retry | Same outbox row resubmitted; server's inbox dedupes; only one application |
| O-08 | Encrypted SQLite integrity | Power-loss mid-transaction; on restart, WAL replays cleanly; no partial commits |
| O-09 | Lock issuance offline (vendor cached creds) | Credential created locally, card encoded, issuance event staged; on reconnect, central record reconciles; cache age limits respected |
| O-10 | Cash-on-arrival check-in offline | Check-in proceeds; cash drawer reflects deposit; reservation transitions; on reconnect, all events reach server in correct order |
| O-11 | Conflict on housekeeping assignment | LWW per policy; the loser's UI updates within 60 s |
| O-12 | Sync throttled under low bandwidth | Throttle activates at ≤ 256 kbps; large media deferred; small mutations still flush |
| O-13 | Sync recovery shows accurate progress | Operator UI shows "X of Y synced", queue depth, elapsed time |
| O-14 | Operator forces re-sync | Forced re-sync rebuilds local state cleanly; no data loss; takes ≤ N minutes for the configured tenant size |
| O-15 | Migration mid-flight | An app update with schema migration runs cleanly; existing outbox preserved across migration; rollback supported |
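O-04's policy can be pinned down as a pure merge function that the offline suite asserts against. This minimal sketch assumes a hypothetical `RatePlan` shape that records a timestamp for the last display-name edit. On a timestamp tie it keeps the server value; that tie-break is an assumption.

```typescript
// Hypothetical rate-plan shape: just enough fields to express the O-04 policy.
type RatePlan = {
  baseRateMicro: bigint;
  depositMicro: bigint;
  displayName: string;
  displayNameUpdatedAt: number; // epoch ms of the last display-name edit
};

type MergeResult = { merged: RatePlan; materialConflict: boolean };

// Server wins on monetary fields; last writer wins on the display name.
// Any divergence in a monetary field is material and should notify the operator.
function mergeRatePlan(server: RatePlan, offline: RatePlan): MergeResult {
  const materialConflict =
    server.baseRateMicro !== offline.baseRateMicro || server.depositMicro !== offline.depositMicro;
  const offlineNewer = offline.displayNameUpdatedAt > server.displayNameUpdatedAt;
  return {
    merged: {
      baseRateMicro: server.baseRateMicro,
      depositMicro: server.depositMicro,
      displayName: offlineNewer ? offline.displayName : server.displayName,
      displayNameUpdatedAt: Math.max(server.displayNameUpdatedAt, offline.displayNameUpdatedAt),
    },
    materialConflict,
  };
}
```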
8.2 Tooling
- Playwright Electron drives the real Electron app: `BrowserWindow` is intercepted to drive renderer interactions, `app.relaunch()` is exercised, and `webContents.session` is examined for storage state.
- Toxiproxy sits in front of the sync endpoint to inject latency, packet loss, and bandwidth caps.
- Per-test Postgres + Pub/Sub emulator via Testcontainers acts as the cloud side.
- A custom sync harness seeds the desktop SQLite with a known state and runs assertion helpers (`expectOutboxEmpty()`, `expectAggregateState()`, `expectConflictRecorded()`).
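O-06 and O-07 together describe the core flush contract that the sync harness checks. This minimal sketch assumes a hypothetical `OutboxRow`. Each row has a global `seq` and a `queueKey`, which groups rows that must stay ordered (for example, one aggregate's rows). The `expectOutboxEmpty` signature shown here is an assumption, not the harness's real API.

```typescript
// Hypothetical outbox row: queueKey groups rows that must flush in order.
type OutboxRow = { id: string; queueKey: string; seq: number; payload: string };

// Flushes each queue in seq order. The first failure in a queue stops that queue
// (later rows depend on it); other queues keep flushing. Returns unsent rows.
function flushOutbox(rows: OutboxRow[], send: (row: OutboxRow) => boolean): OutboxRow[] {
  const queues = new Map<string, OutboxRow[]>();
  for (const row of [...rows].sort((a, b) => a.seq - b.seq)) {
    const queue = queues.get(row.queueKey) ?? [];
    queue.push(row);
    queues.set(row.queueKey, queue);
  }
  const remaining: OutboxRow[] = [];
  for (const queue of queues.values()) {
    // findIndex stops at the first failed send, so later rows are never sent.
    const failedAt = queue.findIndex((row) => !send(row));
    if (failedAt >= 0) remaining.push(...queue.slice(failedAt));
  }
  return remaining.sort((a, b) => a.seq - b.seq);
}

// Server-side inbox: dedupes by row id, so a resubmitted row applies once (O-07).
class Inbox {
  private readonly seen = new Set<string>();
  readonly applied: string[] = [];
  receive(row: OutboxRow): boolean {
    if (!this.seen.has(row.id)) {
      this.seen.add(row.id);
      this.applied.push(row.id);
    }
    return true; // acknowledged whether applied or deduped
  }
}

function expectOutboxEmpty(remaining: OutboxRow[]): void {
  if (remaining.length > 0) throw new Error(`outbox not empty: ${remaining.map((r) => r.id).join(', ')}`);
}
```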
8.3 SQLCipher integrity tests
- KDF (PBKDF2-SHA512 ≥ 256k iterations) verified per release.
- DB key derivation: OS keychain fragment + JWT-bound device fragment; tampering with either renders the DB unreadable.
- Power-loss simulation via `kill -9` on the Electron main process during a transaction; on restart, WAL replays cleanly.
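One way to make the two-fragment key derivation testable is a pure function over the fragments. This minimal sketch assumes the app length-prefixes and concatenates the OS-keychain and device fragments. It then stretches the result with PBKDF2-SHA512 via Node's `crypto.pbkdf2Sync`. The combination scheme and the `deriveDbKey` name are hypothetical. The 256,000-iteration floor matches this section and SQLCipher 4's default `kdf_iter`.

```typescript
import { pbkdf2Sync } from 'node:crypto';

const MIN_KDF_ITERATIONS = 256_000; // floor stated in this section

// Length-prefixing keeps ("ab", "c") and ("a", "bc") from producing the same material.
function lengthPrefixed(fragment: Buffer): Buffer {
  const len = Buffer.alloc(4);
  len.writeUInt32BE(fragment.length, 0);
  return Buffer.concat([len, fragment]);
}

// Hypothetical scheme: combine both fragments, then stretch into a 32-byte key.
// Changing either fragment, or the salt, yields an unrelated key.
function deriveDbKey(
  keychainFragment: Buffer,
  deviceFragment: Buffer,
  salt: Buffer,
  iterations = MIN_KDF_ITERATIONS,
): Buffer {
  if (iterations < MIN_KDF_ITERATIONS) {
    throw new Error(`KDF iterations ${iterations} below floor ${MIN_KDF_ITERATIONS}`);
  }
  const material = Buffer.concat([lengthPrefixed(keychainFragment), lengthPrefixed(deviceFragment)]);
  return pbkdf2Sync(material, salt, iterations, 32, 'sha512');
}
```

A release check can call `deriveDbKey` with a lowered iteration count and assert that it throws. The tamper test flips one byte of either fragment and asserts that the database no longer opens.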
8.4 Auto-update tamper tests
- The signed update manifest is fetched, its signature verified against the embedded public key.
- Tampered manifests are rejected (test fixture flips a byte; assert refusal).
- Staged rollouts: a fixture pretends to be a different staging cohort; assert correct cohort assignment.
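The tamper and cohort tests reduce to two small checks. This minimal sketch rests on two assumptions:

- The manifest carries a detached Ed25519 signature over its raw bytes, checked with Node's `crypto.verify` (which takes `null` as the algorithm for Ed25519).
- Cohorts are assigned by hashing the device ID into 100 buckets.

The real updater's signature scheme and cohort rule may differ. `verifyManifest`, `makeFixtureSigner`, and `inRollout` are hypothetical names.

```typescript
import { createHash, generateKeyPairSync, sign, verify, type KeyObject } from 'node:crypto';

// Hypothetical manifest check against the public key embedded in the app.
function verifyManifest(manifest: Buffer, signature: Buffer, embeddedPublicKey: KeyObject): boolean {
  // Node expects `null` as the digest algorithm for Ed25519 keys.
  return verify(null, manifest, embeddedPublicKey, signature);
}

// CI fixture: a throwaway key pair, so tests never touch the release signing key.
function makeFixtureSigner(): { publicKey: KeyObject; signManifest: (m: Buffer) => Buffer } {
  const { publicKey, privateKey } = generateKeyPairSync('ed25519');
  return { publicKey, signManifest: (m) => sign(null, m, privateKey) };
}

// Hypothetical cohort rule: stable bucket in [0, 100) from the device ID.
function inRollout(deviceId: string, rolloutPercent: number): boolean {
  const bucket = createHash('sha256').update(deviceId).digest().readUInt32BE(0) % 100;
  return bucket < rolloutPercent;
}
```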
9. Lock Integration Tests
Lock integration is the most failure-mode-rich subsystem because it spans cloud, desktop, USB/serial peripherals, and remote vendor APIs.
9.1 Vendor mocks
Every vendor adapter (TTLock, Salto, Assa Abloy, generic Wiegand) ships with a vendor-mock that is contract-faithful to the real SDK. The mock supports failure-injection switches:
- vendor down (timeout, 5xx)
- vendor refuses (4xx)
- vendor accepts but no-op (silent failure)
- vendor responds slowly (latency injection)
- vendor sends out-of-order callbacks
- vendor returns invalid signature on callback
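A vendor mock with these switches can stay small. This minimal sketch assumes a hypothetical `VendorMock` that runs in one mode at a time. It models 'down' as a 503. It models 'slow' as a reported latency for the test's fake timers to absorb, rather than a real sleep. The silent no-op case is why the mock exposes a read-back: an adapter that trusts a 2xx alone will not catch it.

```typescript
// Hypothetical vendor mock; mode names mirror the failure-injection switches above.
type FailureMode = 'none' | 'down' | 'refuses' | 'silent_noop' | 'slow' | 'out_of_order' | 'bad_signature';
type IssueResponse = { status: number; credentialId?: string; latencyMs: number };
type VendorCallback = { credentialId: string; seq: number; signatureValid: boolean };

class VendorMock {
  private readonly issued: string[] = [];
  private counter = 0;

  constructor(private readonly mode: FailureMode = 'none', private readonly slowLatencyMs = 2_000) {}

  issueCredential(roomId: string): IssueResponse {
    const latencyMs = this.mode === 'slow' ? this.slowLatencyMs : 0;
    if (this.mode === 'down') return { status: 503, latencyMs };
    if (this.mode === 'refuses') return { status: 422, latencyMs };
    this.counter += 1;
    const credentialId = `cred-${roomId}-${this.counter}`;
    // 'silent_noop' acknowledges the request but never persists the credential.
    if (this.mode !== 'silent_noop') this.issued.push(credentialId);
    return { status: 201, credentialId, latencyMs };
  }

  // Read-back so adapters can confirm issuance instead of trusting the 2xx.
  hasCredential(credentialId: string): boolean {
    return this.issued.includes(credentialId);
  }

  // Callbacks the vendor would send, reordered or unsigned depending on mode.
  drainCallbacks(): VendorCallback[] {
    const callbacks = this.issued.map((credentialId, i) => ({
      credentialId,
      seq: i + 1,
      signatureValid: this.mode !== 'bad_signature',
    }));
    return this.mode === 'out_of_order' ? callbacks.reverse() : callbacks;
  }
}
```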
9.2 Generic Wiegand mock encoder
For the generic Wiegand path, a USB/serial mock encoder runs in CI as a Node process exposing /dev/ttyUSBmock. Tests:
- Encode card with valid credential
- Encode fails (no card present)
- Encode fails (card error response)
- Re-encode after failure
- Card revoke
9.3 Required scenario coverage
| ID | Scenario | Expected |
|---|---|---|
| L-01 | Issue mobile-key online, vendor up | Credential created, invite SMS dispatched, key reachable on guest's phone |
| L-02 | Issue mobile-key with vendor down | Issuance queues, retry with backoff; UI prompts fallback PIN; queue drains on vendor recovery |
| L-03 | Encode RFID card online | Card encoded, credential persisted, audit recorded |
| L-04 | Encode RFID card offline (cached creds) | Card encoded against cached creds, issuance event staged, syncs on reconnect |
| L-05 | Cached creds expired (> 24h) | Offline issuance refused; clear UI prompt to come online |
| L-06 | Lost-key revoke | Credential transitions to revoked; new credential issued; audit captures both |
| L-07 | Mobile-key invite delivery test | SMS or WhatsApp delivered; if both fail, fallback to email; if all fail, alert |
| L-08 | Clock-skew tolerance | Lock device clock ±5 minutes from server; credential still valid; outside window → reject with clear error |
| L-09 | Vendor anomaly (door opened after revoke) | Anomaly detected by AI; alert raised; door logged with severity |
| L-10 | Vendor secret rotation | Rotation runs; previous secret valid until cutover; failed rotation alerts |
| L-11 | Adapter swap (TTLock → Salto) | Existing reservations re-issue keys via new vendor on next check-in; no double-credentials |
| L-12 | Concurrent issuance for same room | Optimistic lock; second wins or both safely serialised per policy |
| L-13 | Issuance during reservation modification | Atomic update of validUntil; rollback on partial failure |
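L-05 and L-08 rest on two time rules that the suite can encode as an expected-behaviour oracle. This minimal sketch makes two assumptions. First, the harness knows both the lock's clock and the server's. Second, skew tolerance is applied by widening the credential window by the same 5 minutes. `checkAtLock` and `canIssueOffline` are hypothetical names.

```typescript
const MAX_CLOCK_SKEW_MS = 5 * 60_000;     // L-08: lock clock may be ±5 minutes off
const MAX_CACHE_AGE_MS = 24 * 3_600_000;  // L-05: cached creds older than 24h are refused

// Credential validity window in server time (epoch ms).
type Credential = { validFrom: number; validUntil: number };
type LockCheck = { ok: true } | { ok: false; reason: 'CLOCK_SKEW' | 'NOT_YET_VALID' | 'EXPIRED' };

// Oracle: reject if the lock clock drifted beyond tolerance; otherwise accept
// credentials whose window, widened by the tolerance, contains the lock's time.
function checkAtLock(cred: Credential, lockNow: number, serverNow: number): LockCheck {
  if (Math.abs(lockNow - serverNow) > MAX_CLOCK_SKEW_MS) return { ok: false, reason: 'CLOCK_SKEW' };
  if (lockNow < cred.validFrom - MAX_CLOCK_SKEW_MS) return { ok: false, reason: 'NOT_YET_VALID' };
  if (lockNow > cred.validUntil + MAX_CLOCK_SKEW_MS) return { ok: false, reason: 'EXPIRED' };
  return { ok: true };
}

// Offline issuance is allowed only while the vendor credential cache is fresh enough.
function canIssueOffline(cacheFetchedAt: number, now: number): boolean {
  return now - cacheFetchedAt <= MAX_CACHE_AGE_MS;
}
```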
10. Payment Tests
Payments are the third-highest-blast-radius surface (after IAM and locks). Tests cover correctness, idempotency, PCI scope, and reconciliation.
10.1 Adapter test matrix
| Adapter | Sandbox provider | Required scenarios |
|---|---|---|
| Stripe | Stripe test mode | success · 3DS challenge · capture · partial refund · full refund · chargeback simulation · webhook idempotency · webhook signature failure · FX-snapshot freeze |
| PayPal | PayPal sandbox | success · cancel mid-flow · webhook idempotency · refund |
| Cash-on-arrival | local mock | deposit at confirm · arrival capture · no-show → policy charge · partial deposit |
| MFS (JazzCash, Easypaisa, Fawry, M-Pesa) | provider sandbox or mock | initiate · OTP confirm · callback signature · failure-rail-down |
10.2 Required scenarios (cross-adapter)
| ID | Scenario | Expected |
|---|---|---|
| P-01 | Card payment end-to-end with 3DS | Capture succeeds, reservation held → confirmed, FX snapshot frozen |
| P-02 | Webhook delivered twice | Only one state transition; idempotency-key check passes |
| P-03 | Webhook signature invalid | Rejected with 401; security alert fires |
| P-04 | Payment fails (declined) | Reservation rolls back to held with extended TTL; UI reflects |
| P-05 | Partial refund | Folio shows credit and remaining capture |
| P-06 | Full refund + lock revoke | Refund succeeds, key revoked, reservation cancelled |
| P-07 | Chargeback simulation | Chargeback aggregate created; folio flagged; evidence upload works |
| P-08 | Cash-on-arrival happy path | Reservation confirmed_pending_payment; check-in captures cash; folio reflects |
| P-09 | FX-snapshot freeze | Display currency USD, settlement AFN; folio records both with snapshot timestamp |
| P-10 | PCI scope assertion | No PAN appears in any non-PCI service log; assertion at log-shipper level fails build if violated |
| P-11 | EOD cash drawer reconciliation | Expected vs counted vs variance; variance > X% requires manager approval; events published |
| P-12 | Provider rotation | Switch tenant from Stripe-A to Stripe-B; new bookings route to B; old bookings refundable on A |
10.3 PCI scope verification
A nightly job greps all non-PCI service logs in a sandbox window for PAN-shaped strings and known test card numbers. Any match fails the job and pages security.
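The PAN grep is more precise with a Luhn check, because most runs of 13 to 19 digits in logs are not card numbers. This minimal sketch processes one log line at a time and treats single spaces or dashes as separators. `findPans` is a hypothetical name. Numeric IDs that happen to pass Luhn will still be flagged, which is the safe direction for this job.

```typescript
// PAN-shaped candidates: 13 to 19 digits, optionally split by single spaces or dashes.
const PAN_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum: double every second digit from the right, subtract 9 if > 9, sum mod 10.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Returns normalised digit strings for every Luhn-valid candidate in the line.
function findPans(logLine: string): string[] {
  const hits: string[] = [];
  for (const match of logLine.matchAll(PAN_CANDIDATE)) {
    const digits = match[0].replace(/[ -]/g, '');
    if (luhnValid(digits)) hits.push(digits);
  }
  return hits;
}
```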
11. AI Eval Harness
The AI eval harness is the only acceptable way to gate AI quality. It runs nightly in staging and on every change to a prompt or model in CI.
11.1 What the harness does
- Loads YAML test cases per AI capability.
- Runs each case against `ai-orchestrator-service` in the appropriate environment (staging by default; CI for prompt-changed PRs).
- Asserts response shape (structured output), behaviour bands (safety, refusal, helpfulness), and cost guardrails.
- Records `aiProvenance` for every run.
- Compares against the latest accepted baseline; significant regression blocks merge or halts the nightly run.
11.2 YAML test case shape
```yaml
capability: pricing.suggest_rate
version: v3
description: >
  Suggest a daily rate for a property given recent bookings, market context,
  and a known seasonality flag.
inputs:
  property_id: "prop-staging-001"
  date_range: "2026-06-15..2026-06-22"
  baseline_rate_micro_usd: 75000000
  occupancy_last_14d: 0.62
  seasonality: "tourist_high"
expectations:
  output_schema_valid: true # must parse against pricing.suggest_rate.v3
  must_include_fields:
    - "suggested_rate_micro_usd"
    - "rationale"
    - "expected_uplift_pct"
  numeric_bounds:
    suggested_rate_micro_usd:
      min: 60000000
      max: 200000000
    expected_uplift_pct:
      min: -25
      max: 80
  rationale_quality:
    min_words: 20
    must_mention_one_of: ["occupancy", "seasonality", "demand"]
  safety:
    must_not_contain_pii: true
    must_not_contain_political: true
  provenance:
    must_attach: true
    required_fields: ["model", "version", "promptId", "traceId"]
cost_budget_micro_usd: 2500
hitl_required: true
```
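Most of these expectations are mechanical to check once the YAML is parsed. This minimal sketch assumes the case has already been loaded into the hypothetical `Expectations` shape below. YAML parsing, schema validation, and the PII/political safety classifiers are out of scope. The sketch also assumes the harness reports the per-call cost alongside the output. The cost budget is passed separately because it sits at the capability level.

```typescript
// Hypothetical TypeScript mirror of the parsed `expectations` block.
type Expectations = {
  must_include_fields: string[];
  numeric_bounds: Record<string, { min: number; max: number }>;
  rationale_quality: { min_words: number; must_mention_one_of: string[] };
  provenance: { required_fields: string[] };
};

type AIRun = {
  output: Record<string, unknown>;
  aiProvenance?: Record<string, unknown>;
  costMicroUsd: number;
};

// Returns every failure found; an empty array means the case passes.
function evaluateCase(run: AIRun, exp: Expectations, costBudgetMicroUsd: number): string[] {
  const failures: string[] = [];
  for (const field of exp.must_include_fields) {
    if (!(field in run.output)) failures.push(`missing field: ${field}`);
  }
  for (const [field, bounds] of Object.entries(exp.numeric_bounds)) {
    const value = run.output[field];
    if (typeof value !== 'number' || value < bounds.min || value > bounds.max) {
      failures.push(`out of bounds: ${field}`);
    }
  }
  const rawRationale = run.output['rationale'];
  const rationale = typeof rawRationale === 'string' ? rawRationale : '';
  const words = rationale.split(/\s+/).filter((w) => w.length > 0);
  if (words.length < exp.rationale_quality.min_words) failures.push('rationale too short');
  const lower = rationale.toLowerCase();
  if (!exp.rationale_quality.must_mention_one_of.some((topic) => lower.includes(topic))) {
    failures.push('rationale misses every required topic');
  }
  const provenance = run.aiProvenance ?? {};
  for (const field of exp.provenance.required_fields) {
    const value = provenance[field];
    if (typeof value !== 'string' || value.length === 0) failures.push(`provenance missing: ${field}`);
  }
  if (run.costMicroUsd > costBudgetMicroUsd) failures.push('over cost budget');
  return failures;
}
```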
11.3 Required AI capability evals (each as one or more YAML files)
| Capability | Files | Notes |
|---|---|---|
| Pricing rate suggestion | pricing.suggest_rate.*.yaml | One per market (AF, TJ, IR, PK, EG) |
| Demand forecast | forecast.occupancy.*.yaml | 14d + 90d horizons |
| Housekeeping order | housekeeping.order.*.yaml | Edge ONNX + cloud equivalence |
| Anomaly detection — bookings | anomaly.booking.*.yaml | False-positive ceiling |
| Anomaly detection — payments | anomaly.payment.*.yaml | Fraud signals |
| Anomaly detection — locks | anomaly.lock.*.yaml | Door-event sequences |
| Upsell recommendation | upsell.recommend.*.yaml | Per persona |
| Guest message draft | message.draft.*.yaml | Per locale (PS, FA, AR, EN, FR, TJK), per context |
| Translation hint | translate.hint.*.yaml | PS↔EN, FA↔EN, AR↔EN |
| Theme contrast adjust | theme.contrast.*.yaml | A11y constraint |
11.4 Cost guardrails
- Each capability has a `cost_budget_micro_usd` per call.
- A per-tenant daily budget is enforced by the orchestrator; exceeding the band degrades to lower-cost models.
- The eval harness asserts cost stays within band; cost regression > 20% blocks merge.
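The 20% cost-regression rule compares mean cost per call between the accepted baseline and the candidate. This minimal sketch assumes per-call costs in integer micro-USD. Cross-multiplying keeps the comparison in integers. A cost increase of exactly 20% passes, because the rule says "> 20%". The function names are hypothetical.

```typescript
// mean(candidate) > mean(baseline) * (1 + maxIncreasePct / 100), rewritten as
// sum(c) * nBaseline * 100 > sum(b) * nCandidate * (100 + maxIncreasePct).
function costRegressionBlocksMerge(
  baselineMicroUsd: number[],
  candidateMicroUsd: number[],
  maxIncreasePct = 20,
): boolean {
  if (baselineMicroUsd.length === 0 || candidateMicroUsd.length === 0) {
    throw new Error('need cost samples for both baseline and candidate');
  }
  const sum = (xs: number[]) => xs.reduce((acc, x) => acc + x, 0);
  return (
    sum(candidateMicroUsd) * baselineMicroUsd.length * 100 >
    sum(baselineMicroUsd) * candidateMicroUsd.length * (100 + maxIncreasePct)
  );
}

// Per-call budget from the case file (cost_budget_micro_usd).
function withinPerCallBudget(costsMicroUsd: number[], budgetMicroUsd: number): boolean {
  return costsMicroUsd.every((c) => c <= budgetMicroUsd);
}
```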
11.5 Provenance metadata tests
- Every successful AI response in the harness must carry `aiProvenance = { model, version, promptId, traceId, reviewedBy?, reviewedAt?, local }`.
- Tests assert presence and well-formedness; missing fields fail the case.
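A sketch of the presence and well-formedness check, taking the field list from the contract above. Several details are assumptions rather than part of the contract: the meaning of `local` (edge-produced), the timestamp format of `reviewedAt`, and the rule that `reviewedBy`/`reviewedAt` must appear together. `checkProvenance` is a hypothetical name.

```typescript
// Shape taken from the aiProvenance contract listed above.
interface AiProvenance {
  model: string;
  version: string;
  promptId: string;
  traceId: string;
  reviewedBy?: string;
  reviewedAt?: string; // assumed ISO-8601 timestamp
  local: boolean; // assumed: true when produced by the edge ONNX runtime
}

const REQUIRED_PROVENANCE = ["model", "version", "promptId", "traceId"] as const;

// Returns the problems found; an empty array means the provenance is well-formed.
function checkProvenance(p: unknown): string[] {
  if (typeof p !== "object" || p === null) return ["aiProvenance missing"];
  const rec = p as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of REQUIRED_PROVENANCE) {
    const v = rec[key];
    if (typeof v !== "string" || v.length === 0) problems.push(`${key} missing or empty`);
  }
  if (typeof rec["local"] !== "boolean") problems.push("local must be boolean");
  // Assumed rule: review metadata travels together.
  if ((rec["reviewedBy"] === undefined) !== (rec["reviewedAt"] === undefined)) {
    problems.push("reviewedBy and reviewedAt must be set together");
  }
  if (typeof rec["reviewedAt"] === "string" && Number.isNaN(Date.parse(rec["reviewedAt"]))) {
    problems.push("reviewedAt is not a valid timestamp");
  }
  return problems;
}
```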
11.6 HITL gate enforcement tests
- For every capability marked `hitl_required: true`, a test asserts that committing the originating action without a `decisionId` fails with `MELMASTOON.AI.HITL_BYPASS`.
- A test asserts that committing with a `decisionId` succeeds.
- A test asserts that the orchestrator audit log captures the decision with `{decisionId, decisionBy, timestamp}`.
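The three assertions above target a guard roughly like the following sketch. The error code and the audit fields come from this document. `HitlBypassError`, `commitAiAction`, and the in-memory audit array are hypothetical, and requiring `decisionBy` alongside `decisionId` is an assumption made so the audit entry can be complete.

```typescript
// Error code taken from this document; class and function names are hypothetical.
class HitlBypassError extends Error {
  readonly code = "MELMASTOON.AI.HITL_BYPASS";
}

interface AuditEntry { decisionId: string; decisionBy: string; timestamp: string }

interface CommitRequest {
  capability: string;
  hitlRequired: boolean;
  decisionId?: string;
  decisionBy?: string; // assumed to be required together with decisionId
}

// Refuses to commit when HITL is required but no human decision is attached;
// otherwise records the decision in the audit log before applying the action.
function commitAiAction(req: CommitRequest, auditLog: AuditEntry[], now: () => Date): void {
  if (req.hitlRequired && (!req.decisionId || !req.decisionBy)) {
    throw new HitlBypassError(`HITL decision required for ${req.capability}`);
  }
  if (req.decisionId && req.decisionBy) {
    auditLog.push({ decisionId: req.decisionId, decisionBy: req.decisionBy, timestamp: now().toISOString() });
  }
  // ...apply the originating action here (omitted in this sketch).
}
```

Injecting `now` keeps the audit timestamp deterministic under test.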
11.7 Edge-vs-cloud equivalence
For every capability offered both ways (e.g., housekeeping order, basic forecasting, anomaly heuristics), an equivalence test asserts that the edge ONNX result is within a configured tolerance of the cloud Vertex AI result on a fixed corpus. Drift beyond tolerance triggers a retraining backlog item.
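A sketch of the tolerance check, assuming each corpus item's output from both runtimes can be reduced to a numeric vector, such as forecast values. Ranking outputs like housekeeping order would need a rank-correlation metric instead. The two-part tolerance (per-value absolute difference plus an allowed fraction of drifting items) and `checkEquivalence` are assumptions.

```typescript
// One fixed-corpus item scored by both runtimes (values assumed numeric).
interface PairedResult { id: string; edge: number[]; cloud: number[] }

interface Tolerance {
  maxAbsDiff: number; // per-value absolute tolerance
  maxOutOfTolerance: number; // fraction of corpus items allowed to exceed it
}

// Returns the ids of drifting items and whether the corpus as a whole passes.
function checkEquivalence(results: PairedResult[], tol: Tolerance): { drifted: string[]; pass: boolean } {
  const drifted: string[] = [];
  for (const r of results) {
    if (r.edge.length !== r.cloud.length) {
      drifted.push(r.id); // a shape mismatch always counts as drift
      continue;
    }
    const worst = r.edge.reduce((m, v, i) => Math.max(m, Math.abs(v - r.cloud[i])), 0);
    if (worst > tol.maxAbsDiff) drifted.push(r.id);
  }
  const fraction = results.length === 0 ? 0 : drifted.length / results.length;
  return { drifted, pass: fraction <= tol.maxOutOfTolerance };
}
```

The `drifted` ids are what a retraining backlog item would reference.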
11.8 Regression detection
- Each capability's nightly run produces a metric (e.g., F1 for anomaly classifiers, RMSE for forecasts, BLEU/ROUGE for messages, contrast pass/fail for theme adjustments).
- Metrics are stored; a regression beyond a configurable threshold relative to the rolling baseline opens an automated issue, blocks new prompt promotion, and notifies the owning team.
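A sketch of the nightly regression check. The sketch makes three assumptions: the rolling baseline is the mean of the last few nightly values, the threshold is relative, and each metric declares whether higher or lower is better (F1 should not fall; RMSE should not rise). The 7-run window and 5% threshold are placeholder defaults, not values from this document.

```typescript
// Metric direction: F1 should not drop, RMSE should not rise.
type Direction = "higher_is_better" | "lower_is_better";

// Rolling baseline = mean of the last `window` nightly values.
function rollingBaseline(history: number[], window: number): number {
  const recent = history.slice(-window);
  if (recent.length === 0) throw new Error("no history");
  return recent.reduce((a, b) => a + b, 0) / recent.length;
}

// Flags a regression when tonight's metric is worse than the baseline by more
// than `relThreshold` (0.05 = 5%). Window and threshold defaults are placeholders.
function isRegression(
  history: number[],
  tonight: number,
  dir: Direction,
  window = 7,
  relThreshold = 0.05,
): boolean {
  const base = rollingBaseline(history, window);
  const change = (tonight - base) / Math.abs(base || 1);
  return dir === "higher_is_better" ? change < -relThreshold : change > relThreshold;
}
```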
12. Frontend Test Strategy
Web, mobile, and desktop frontends each have a tailored stack while sharing the design system, i18n bundle, and AI provenance contract.
12.1 Web (Next.js — Consumer + Tenant Booking)
- Unit & component: Vitest + Testing Library (React) + jsdom. Coverage ≥ 80% lines on `app/`, `components/`, `hooks/`, `services/`.
- E2E: Playwright. Run on every PR (smoke) and nightly (full).
- Visual regression: Chromatic against Storybook stories; every component story must ship LTR + RTL variants.
- A11y: axe-playwright on every E2E and per-story axe-storybook in CI.
- Cross-locale: Per-locale snapshots for at least Pashto, Dari, Persian, Arabic, English, Tajik on key screens.
- Performance: Lighthouse CI per page with budgets enforced.
12.2 Mobile (React Native — Consumer + Tenant Booking + self-check-in)
- Unit & component: Jest + Testing Library (RN). Coverage ≥ 80%.
- E2E: Detox in CI on Android emulators (API 30+) and iOS simulators (iOS 16+).
- Cross-locale: Same locales as web.
- A11y: Native a11y APIs covered; manual VoiceOver/TalkBack smoke per release.
- Performance: RN profiler smoke; cold-start budget per platform.
12.3 Desktop (Electron — backoffice)
- Unit & component: Vitest + Testing Library (React) + jsdom for the renderer.
- E2E: Playwright Electron driving the real packaged app.
- Offline: dedicated Playwright Electron suites under `test/e2e/offline/`; see §8.
- NVDA smoke (Windows): scripted screen-reader smoke per release wave for the front-desk core flows.
- CSP & contextBridge: unit tests assert that the `window.melmastoon` surface is exhaustive and that no other globals leak; CSP violation tests load malicious payloads and assert that the renderer rejects them.
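The contextBridge assertions reduce to set comparisons once the global names are collected. In a Playwright Electron test, collection could be done with `page.evaluate(() => Object.getOwnPropertyNames(window))` on the window returned by `electronApp.firstWindow()`, with a baseline captured from a blank page. The helpers below are hypothetical sketches, and the bridge keys in the usage are made up.

```typescript
// Anything on the renderer's window that is neither in the blank-page
// baseline nor the single allowed bridge key counts as a leak.
function findLeakedGlobals(
  rendererGlobals: string[],
  baselineGlobals: string[],
  allowed: string[] = ["melmastoon"],
): string[] {
  const known = new Set([...baselineGlobals, ...allowed]);
  return rendererGlobals.filter((name) => !known.has(name)).sort();
}

// The bridge must expose exactly the declared API: nothing missing, nothing extra.
function diffBridgeSurface(actual: string[], declared: string[]): { missing: string[]; extra: string[] } {
  const a = new Set(actual);
  const d = new Set(declared);
  return {
    missing: declared.filter((k) => !a.has(k)).sort(),
    extra: actual.filter((k) => !d.has(k)).sort(),
  };
}

// Names this document requires to be absent from the renderer.
const FORBIDDEN_RENDERER_GLOBALS = ["require", "ipcRenderer"];
function forbiddenPresent(rendererGlobals: string[]): string[] {
  return FORBIDDEN_RENDERER_GLOBALS.filter((g) => rendererGlobals.includes(g));
}
```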
12.4 Visual regression (cross-surface)
- Chromatic stories per component in LTR + RTL × at least three locales.
- Story library shared across web and Electron renderer where components are common.
- Manual review queue; tolerance thresholds tuned per component.
12.5 Cross-locale tests
| Surface | Required locales tested in CI |
|---|---|
| Consumer Web | ps, fa, ar, en, tjk, fr |
| Consumer Mobile | ps, fa, ar, en |
| Tenant Booking Web | ps, fa, ar, en, tjk, fr |
| Tenant Booking Mobile | ps, fa, ar, en |
| Electron Desktop | ps, fa, ar, en, tjk |
| Notifications | per template, per locale |
RTL screenshot tests are mandatory on every UI PR; the Definition of Done explicitly enforces this.
13. Performance Testing
13.1 k6 scenarios per BFF
| Scenario | BFF | Goal | Pass criteria |
|---|---|---|---|
| Search storm | bff-consumer-service | Burst-of-search load on meta layer | p95 < 1500 ms · err rate < 0.5% at 100 RPS sustained |
| Booking burst | bff-tenant-booking-service | Concurrent booking flows during a sale | p95 < 1200 ms on quote · < 800 ms on confirm step (excluding 3DS) at 50 RPS |
| Sync flood | bff-backoffice-service | Many desktops reconnecting after a regional outage | p95 sync push < 2 s at 200 concurrent desktops; conflict rate within tolerance |
| AI gateway storm | ai-orchestrator-service | Surge in AI calls | p95 < 3 s · cost within budget; circuit-breakers engage on provider 5xx surge |
| Inventory hot key | inventory-service | Peak demand on one room type | No oversell · throughput within target |
| Webhook storm | payment-gateway-service | Provider replays many webhooks | All processed exactly once · idempotency-key check passes |
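Inside the k6 scripts, these pass criteria map naturally onto k6 `thresholds` such as `http_req_duration: ['p(95)<1500']` and `http_req_failed: ['rate<0.005']`. For re-checking exported results outside k6, a sketch like the following could apply. It uses the nearest-rank percentile, which can differ slightly from k6's own percentile calculation. `percentile` and `passes` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

interface ScenarioRun { latenciesMs: number[]; requests: number; errors: number }
interface Criteria { p95Ms: number; maxErrorRate: number }

// Mirrors the "p95 < X ms · err rate < Y%" criteria in the table above.
function passes(run: ScenarioRun, c: Criteria): boolean {
  const p95 = percentile(run.latenciesMs, 95);
  const errorRate = run.requests === 0 ? 0 : run.errors / run.requests;
  return p95 < c.p95Ms && errorRate < c.maxErrorRate;
}
```

For the search storm, `passes(run, { p95Ms: 1500, maxErrorRate: 0.005 })` expresses "p95 < 1500 ms · err rate < 0.5%".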
13.2 SLO targets (per-service summary; the full table lives in each service's OBSERVABILITY.md)
| Service | p50 latency | p95 latency | Availability | Notes |
|---|---|---|---|---|
| iam-service | 50 ms | 200 ms | 99.9% | Authn endpoints sub-200 ms |
| reservation-service | 80 ms | 300 ms (read) / 600 ms (write) | 99.9% | Saga end-to-end < 1.5 s p95 |
| payment-gateway-service | 100 ms | 800 ms | 99.95% | Excludes 3DS challenge; webhook p95 < 200 ms |
| lock-integration-service | 100 ms | 500 ms (issue) | 99.9% | Vendor calls p95 budget per vendor SLA |
| bff-tenant-booking-service | 70 ms | 400 ms | 99.9% | Booking funnel pages |
| bff-consumer-service | 70 ms | 400 ms | 99.9% | Meta search pages |
| bff-backoffice-service | 80 ms | 400 ms (read) / 800 ms (sync) | 99.9% | Sync push < 2 s p95 |
| search-aggregation-service | 50 ms | 300 ms | 99.9% | Index reads |
| ai-orchestrator-service | 200 ms | 3000 ms | 99.5% | AI provider latency dominant |
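The availability column translates into an error budget. The sketch below assumes a 30-day window and expresses availability in basis points (99.9% = 9990), so the budget arithmetic stays exact. Under those assumptions, 99.9% allows 2,592 s (43.2 min) of downtime, 99.95% allows 1,296 s (21.6 min), and 99.5% allows 12,960 s (216 min). The function names are hypothetical.

```typescript
// Allowed downtime in seconds for an availability target over a window.
// Availability is in basis points so the arithmetic stays exact.
function errorBudgetSeconds(availabilityBps: number, windowDays = 30): number {
  const windowSeconds = windowDays * 24 * 60 * 60;
  return (windowSeconds * (10_000 - availabilityBps)) / 10_000;
}

// Burn rate: how fast the budget is being consumed. A value above 1 means the
// budget runs out before the window ends if the current error rate persists.
function burnRate(badSeconds: number, elapsedSeconds: number, availabilityBps: number): number {
  const allowedFraction = (10_000 - availabilityBps) / 10_000;
  return badSeconds / elapsedSeconds / allowedFraction;
}
```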
13.3 Frontend performance budgets
| Page | LCP | INP | CLS | JS gzip |
|---|---|---|---|---|
| Consumer meta search list | ≤ 2.5 s | ≤ 200 ms | ≤ 0.1 | ≤ 180 KB |
| Tenant booking landing | ≤ 2.0 s | ≤ 200 ms | ≤ 0.1 | ≤ 200 KB |
| Tenant booking checkout | ≤ 2.0 s | ≤ 200 ms | ≤ 0.05 | ≤ 220 KB |
| Electron desktop dashboard | ≤ 1.5 s (cold) · ≤ 400 ms (warm) | ≤ 100 ms | ≤ 0.05 | n/a |
Budgets enforced by Lighthouse CI on web and the Electron renderer; regressions block merge.
13.4 Soak tests (24h)
- Run nightly against staging.
- 24h sustained mixed-load profile (search + booking + sync + AI + webhook); assert no memory growth, no leak in the desktop sync worker, no Pub/Sub lag.
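"No memory growth" needs an operational definition. One option, sketched here, fits a least-squares slope to periodic heap samples from the sync worker and fails the soak if the trend exceeds a threshold. In Node, the samples could come from `process.memoryUsage().heapUsed`. The 1 MB/hour threshold and the function names are assumptions.

```typescript
interface HeapSample { tHours: number; heapMb: number }

// Ordinary least-squares slope of heap size over time, in MB per hour.
function heapSlopeMbPerHour(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  const meanT = samples.reduce((s, x) => s + x.tHours, 0) / n;
  const meanH = samples.reduce((s, x) => s + x.heapMb, 0) / n;
  let num = 0;
  let den = 0;
  for (const { tHours, heapMb } of samples) {
    num += (tHours - meanT) * (heapMb - meanH);
    den += (tHours - meanT) ** 2;
  }
  if (den === 0) throw new Error("samples must span time");
  return num / den;
}

// Hypothetical gate: a sustained upward trend above the threshold fails the soak.
function leaks(samples: HeapSample[], maxSlopeMbPerHour = 1): boolean {
  return heapSlopeMbPerHour(samples) > maxSlopeMbPerHour;
}
```

A slope tolerates the sawtooth pattern that garbage collection produces, whereas a naive "last sample vs first sample" check would flag it.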
13.5 Load profile per release wave
| Wave | Tenants | Concurrent desktops | Concurrent booking flows | AI RPS | Notes |
|---|---|---|---|---|---|
| R1 | 50 | 200 | 50 | 5 | Validate staging-scale viability |
| R2 | 250 | 1,000 | 200 | 25 | AI-native operations on |
| R3 | 1,000 | 5,000 | 1,000 | 100 | Globalisation stage |
14. Security Testing
14.1 Categories & tooling
| Category | Tool | Trigger |
|---|---|---|
| SAST | CodeQL | every PR |
| SCA | Snyk | every PR + nightly |
| Secrets | gitleaks | pre-commit + CI |
| DAST | OWASP ZAP | nightly against staging |
| Container scan | Trivy + Snyk Container | on image build |
| Pen-test | external firm | pre-launch + per major release |
| Threat-model review | manual | per ADR introducing new boundary |
14.2 Electron-specific security tests
| Test | What it proves |
|---|---|
| `contextBridge_surface_enumeration` | Only `window.melmastoon.*` is exposed to the renderer; no other globals leak; `ipcRenderer` and `require` are absent |
| `csp_violation` | A malicious script injection attempt is refused by the CSP; the violation is reported to the audit log |
| `nodeIntegration_off_assertion` | `nodeIntegration: false`, `contextIsolation: true`, `sandbox: true`, `webSecurity: true` on every `BrowserWindow` |
| `sqlcipher_kdf` | Key derivation uses ≥ 256k iterations of PBKDF2-SHA512; tampering with the iteration count fails the check |
| `auto_update_signature_tamper` | A fixture flips a byte in the update manifest; the updater refuses to install |
| `keytar_secrets` | The refresh token and DB key fragment never appear in plaintext on disk |
| `deep_link_validation` | `melmastoon://` deep links are validated against an allowlist; arbitrary file URIs are refused |
| `serialport_access_scoped` | Serial/HID access is mediated through the typed bridge; the renderer cannot enumerate devices |
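Two of the rows above, `nodeIntegration_off_assertion` and `deep_link_validation`, can be sketched as pure checks. The sketch assumes every `BrowserWindow` is created through one factory whose `webPreferences` the test can inspect, and it treats a flag left to Electron's default as a failure. The allowlisted deep-link hosts are made-up examples, and parsing relies on the WHATWG `URL` class available in Node.

```typescript
// Subset of Electron's BrowserWindow webPreferences covered by the hardening test.
interface WebPrefs {
  nodeIntegration?: boolean;
  contextIsolation?: boolean;
  sandbox?: boolean;
  webSecurity?: boolean;
}

const REQUIRED_PREFS: Required<WebPrefs> = {
  nodeIntegration: false,
  contextIsolation: true,
  sandbox: true,
  webSecurity: true,
};

// Returns the flags that are wrong or unset. Unset counts as wrong (an assumed
// policy) so a future change in Electron's defaults cannot slip through.
function insecurePrefs(prefs: WebPrefs): string[] {
  return (Object.keys(REQUIRED_PREFS) as (keyof WebPrefs)[]).filter(
    (k) => prefs[k] !== REQUIRED_PREFS[k],
  );
}

// Only the melmastoon: scheme with an allowlisted host passes.
const ALLOWED_DEEP_LINK_HOSTS = new Set(["reservation", "checkin"]); // hypothetical routes

function isAllowedDeepLink(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is refused
  }
  return url.protocol === "melmastoon:" && ALLOWED_DEEP_LINK_HOSTS.has(url.hostname);
}
```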
14.3 Auth & authorization tests
- AuthN: token rotation, refresh family invalidation on reuse, MFA enrolment + enforcement, SSO callback validation, device binding cert lifecycle.
- AuthZ: per-role matrix tested across every service (positive + negative); tenant scope enforced on every endpoint; ABAC rules tested where present.
14.4 Penetration testing
- External pen-test pre-Wave R1 launch and every major release thereafter.
- Scope: all surfaces (Consumer Web, Tenant Booking Web, Mobile, Electron Desktop, Control Plane), all public APIs, IAM, payment, lock, sync.
- Findings tracked to closure; blocking severity = critical or high.
15. Compliance Testing
15.1 Tax computation matrix
For each supported jurisdiction (AF, TJ, IR, PK, EG, AE, SA, OM, BH, NP, BD, KE, …), a unit test pack asserts:
- VAT/GST applied at correct rate.
- Tourism levy applied where required.
- Tax-exempt cases (diplomatic, gov-issued, tax-free zones) honoured on proof.
- Snapshot-at-confirm: changes after confirm do not retroactively alter the folio.
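A sketch of the pattern this test pack exercises, under four stated assumptions: amounts are `bigint` micro-units, rates are basis points, rounding is half-up, and the tourism levy is modeled as a percentage of net (real levies may be flat per night). Taxes are computed once at confirm and stored with the rule version, so later rule changes cannot reach back into the folio. Exemption proof is assumed to be verified upstream. All names are hypothetical.

```typescript
// Rates are basis points (1000 bps = 10%); money is bigint micro-units.
interface TaxRule { vatBps: bigint; tourismLevyBps: bigint }

// Integer percentage with round-half-up (for non-negative amounts).
function applyBps(amountMicro: bigint, bps: bigint): bigint {
  return (amountMicro * bps + 5_000n) / 10_000n;
}

interface FolioTaxSnapshot { vatMicro: bigint; levyMicro: bigint; ruleVersion: string }

// Computed once, at confirm time. The folio keeps this snapshot, and later
// rule changes are never re-applied to it.
function snapshotAtConfirm(
  netMicro: bigint,
  rule: TaxRule,
  ruleVersion: string,
  exemptVerified: boolean,
): FolioTaxSnapshot {
  if (exemptVerified) return { vatMicro: 0n, levyMicro: 0n, ruleVersion };
  return {
    vatMicro: applyBps(netMicro, rule.vatBps),
    levyMicro: applyBps(netMicro, rule.tourismLevyBps),
    ruleVersion,
  };
}
```

With a net of 75,000,000 micro-units (75.00), a 10% VAT, and a 2% levy, the snapshot holds 7,500,000 and 1,500,000 micro-units.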
15.2 Audit log integrity
- The audit log subscribes to every state-changing event.
- A daily Merkle anchoring job computes a root over the log and stores it.
- Test asserts inclusion proofs work for entries days/weeks back.
- Tampering with a stored entry breaks proofs.
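A compact sketch of the inclusion-proof property these tests check. It uses SHA-256 from `node:crypto` with RFC 6962-style domain separation (0x00 for leaves, 0x01 for interior nodes). The tree shape, which promotes an odd trailing node, is a simplification and not the RFC 6962 layout. The function names are hypothetical, not the audit service's actual API.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();
// Distinct prefixes stop an interior node from being passed off as a leaf.
const hashLeaf = (entry: string): Buffer => sha256(Buffer.concat([Buffer.from([0]), Buffer.from(entry, "utf8")]));
const hashNode = (l: Buffer, r: Buffer): Buffer => sha256(Buffer.concat([Buffer.from([1]), l, r]));

// Builds every level bottom-up; an odd trailing node is promoted unchanged.
function buildLevels(entries: string[]): Buffer[][] {
  if (entries.length === 0) throw new Error("empty log");
  const levels: Buffer[][] = [entries.map(hashLeaf)];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? hashNode(prev[i], prev[i + 1]) : prev[i]);
    }
    levels.push(next);
  }
  return levels;
}

type ProofStep = { sibling: Buffer; siblingOnLeft: boolean };

// Collects the sibling hashes from leaf to root; promoted nodes have no sibling.
function inclusionProof(levels: Buffer[][], index: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let i = index;
  for (let d = 0; d < levels.length - 1; d++) {
    const sib = i % 2 === 0 ? i + 1 : i - 1;
    if (sib < levels[d].length) proof.push({ sibling: levels[d][sib], siblingOnLeft: sib < i });
    i = Math.floor(i / 2);
  }
  return proof;
}

// Recomputes the root from an entry and its proof and compares it to the anchor.
function verifyInclusion(entry: string, proof: ProofStep[], root: Buffer): boolean {
  let h = hashLeaf(entry);
  for (const step of proof) {
    h = step.siblingOnLeft ? hashNode(step.sibling, h) : hashNode(h, step.sibling);
  }
  return h.equals(root);
}
```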
15.3 Data residency tests
- Cross-region writes are rejected.
- Background jobs respect tenant region.
- Reporting and analytics aggregations stay within region (or fan out per region).
15.4 GDPR-style erasure (Phase 2)
- Erasure request flow tested end-to-end against a sandbox tenant.
- PII fields tombstoned; legal/financial records remain (with reference IDs only).
- Erasure certificate generated and verifiable.
16. Accessibility Testing
16.1 Target & tooling
- Target: WCAG 2.2 AA on every user-facing surface, every locale.
- Tooling: axe-core via axe-playwright (every E2E) and axe-storybook (every component story).
- Manual smoke: NVDA on Windows for the Electron desktop core flows; VoiceOver on macOS for the Tenant Booking web on Safari; per release wave.
16.2 Per-surface checklist
| Surface | A11y CI checks | Manual smoke |
|---|---|---|
| Consumer Web | axe-playwright on E2E suite + axe-storybook | per release wave |
| Tenant Booking Web | axe-playwright on E2E + axe-storybook | per release wave |
| Consumer Mobile | RN a11y test props + Detox a11y assertions | per release wave |
| Electron Desktop | axe-playwright (renderer) | NVDA per release wave |
| Notifications (email) | template-level lint (alt text, link text, contrast) | per release wave |
16.3 Locale-aware a11y
- Reading order verified in RTL locales (Pashto, Dari, Persian, Arabic).
- Screen reader pronunciation smoke for at least Pashto + Dari content blocks (NVDA + eSpeak voice).
- Mixed-direction text bidi rendering verified.
17. Internationalization Testing
17.1 Required coverage
- Every UI surface tested in at least Pashto + Dari + English in CI; additional locales (Persian, Arabic, French, Tajik) tested per release wave.
- Bidi text rendering tested with mixed PS/EN strings, AR/EN strings, and bidi-isolating UI fragments.
- Locale-aware date formats tested per calendar (Gregorian + Hijri + Solar Hijri presentational variants).
- Number formatting tested per locale (decimal separator, grouping, percent).
- Currency formatting tested per currency (symbol, position, grouping).
- Numeral systems tested: Latin numerals across all locales by default; Arabic-Indic supported on display where tenant prefers; inputs always Latin numerals for safety.
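The numeral rules can be tested against two small helpers. The sketch maps Arabic-Indic (U+0660 to U+0669) and Extended Arabic-Indic / Persian (U+06F0 to U+06F9) digits to Latin before any input is parsed. On the display side, it uses the BCP 47 `-u-nu-latn` extension to force Latin digits unless the tenant opted into native numerals. Both helper names are hypothetical, and `formatCount` assumes a plain language tag without existing Unicode extensions.

```typescript
// Input side: normalize Arabic-Indic and Persian digits to Latin 0-9.
function toLatinDigits(input: string): string {
  return input.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    const base = code >= 0x06f0 ? 0x06f0 : 0x0660;
    return String(code - base);
  });
}

// Display side: Latin digits by default; the locale's native numbering
// system only when the tenant prefers it.
function formatCount(n: number, locale: string, preferNative: boolean): string {
  const tag = preferNative ? locale : `${locale}-u-nu-latn`;
  return new Intl.NumberFormat(tag).format(n);
}
```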
17.2 RTL screenshot regression
- Every component story has a `storyName—rtl` variant.
- Chromatic regression catches mirroring defects.
- Logical CSS only (`padding-inline`, `margin-block`, `inset-inline-start`); a per-PR lint rejects `padding-left/right` and `margin-left/right`.
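A hypothetical sketch of the per-PR lint over raw CSS text. In practice, stylelint's `property-disallowed-list` rule can express the same policy. The regex deliberately ignores logical properties such as `padding-inline-start`.

```typescript
// Physical-direction properties that break RTL mirroring. Logical equivalents
// are padding-inline-*, margin-inline-*, inset-inline-*.
const PHYSICAL_PROPS = /(^|[\s;{])(padding|margin)-(left|right)\s*:/g;

interface LintHit { line: number; property: string }

// Reports every physical padding/margin declaration with its 1-based line number.
function findPhysicalProps(css: string): LintHit[] {
  const hits: LintHit[] = [];
  css.split("\n").forEach((text, i) => {
    for (const m of text.matchAll(PHYSICAL_PROPS)) {
      hits.push({ line: i + 1, property: `${m[2]}-${m[3]}` });
    }
  });
  return hits;
}
```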
17.3 Translation completeness gate
- A CI job parses extracted strings against per-locale bundles; missing keys for active locales emit warnings; missing keys for tenant-default locales fail the job.
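The gate's two severities can be sketched as follows. Missing keys in an active locale produce a warning, while missing keys in a tenant-default locale fail the job. Treating an empty or whitespace-only translation as missing is an assumption, and `checkCompleteness` is a hypothetical name.

```typescript
type Bundle = Record<string, string>;

interface GateResult { warnings: string[]; errors: string[] }

function checkCompleteness(
  extractedKeys: string[],
  bundles: Record<string, Bundle>,
  activeLocales: string[],
  tenantDefaultLocales: Set<string>,
): GateResult {
  const result: GateResult = { warnings: [], errors: [] };
  for (const locale of activeLocales) {
    const bundle: Record<string, unknown> = bundles[locale] ?? {};
    // A key counts as missing if absent, non-string, or blank (assumed rule).
    const missing = extractedKeys.filter((k) => {
      const v = bundle[k];
      return typeof v !== "string" || v.trim() === "";
    });
    if (missing.length === 0) continue;
    const msg = `${locale}: ${missing.length} missing (${missing.join(", ")})`;
    (tenantDefaultLocales.has(locale) ? result.errors : result.warnings).push(msg);
  }
  return result;
}
```

The job would fail when `errors` is non-empty and only annotate the PR for `warnings`.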
18. Test Data Strategy
18.1 @ghasi/test-data synthetic tenant generator
- A monorepo package generates realistic tenants (50+ seeded for staging) deterministically from a seed.
- Outputs:
- Tenant + property hierarchy (single-property, multi-property, chain operator).
- Rooms with realistic room types and counts (10 / 30 / 80 / 200 room properties).
- Rate plans across BAR, weekly, government, corporate, non-refundable.
- Reservations across `held / confirmed / checked_in / checked_out / cancelled / no_show` with realistic distributions.
- Folios with charges, payments, and refunds matching reservations.
- Housekeeping tasks, maintenance tickets, key credentials.
- Multi-locale content blocks (ps, fa, ar, en, tjk).
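Determinism comes from drawing every random choice from a single seeded PRNG in a fixed order. The sketch below uses mulberry32, a small well-known 32-bit generator, and covers only the tenant/property/room-count layer. The property counts per tenant shape and the ID format are illustrative assumptions, not the actual `@ghasi/test-data` API.

```typescript
// mulberry32: a small 32-bit seeded PRNG; the same seed gives the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const ROOM_COUNTS = [10, 30, 80, 200] as const;
type Shape = "single-property" | "multi-property" | "chain";
const SHAPES: readonly Shape[] = ["single-property", "multi-property", "chain"];

interface SyntheticProperty { id: string; rooms: number }
interface SyntheticTenant { id: string; shape: Shape; properties: SyntheticProperty[] }

function generateTenants(seed: number, count: number): SyntheticTenant[] {
  const rand = mulberry32(seed);
  function pick<T>(xs: readonly T[]): T {
    return xs[Math.floor(rand() * xs.length)];
  }
  const tenants: SyntheticTenant[] = [];
  for (let i = 0; i < count; i++) {
    const shape = pick(SHAPES);
    // Property counts per shape are illustrative assumptions.
    const nProps =
      shape === "single-property" ? 1 : shape === "multi-property" ? 2 + Math.floor(rand() * 3) : 5 + Math.floor(rand() * 6);
    const id = `tenant-s${seed}-${i}`; // synthetic, PII-free identifier
    const properties = Array.from({ length: nProps }, (_, j) => ({ id: `${id}-prop-${j}`, rooms: pick(ROOM_COUNTS) }));
    tenants.push({ id, shape, properties });
  }
  return tenants;
}
```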
18.2 Determinism + safety
- Seed-based; the same seed produces the same tenant pack; safe to recreate.
- PII-free. All names, phones, emails, IDs are synthetic; never derived from real users.
- Sandbox prefixes (`e2e-<runId>-…`) for ephemeral test tenants; janitor jobs purge them within 1 hour.
18.3 Where to use which dataset
| Environment | Dataset |
|---|---|
| Local dev | small synthetic pack (3 tenants, 30 rooms total) |
| CI ephemeral | per-PR pack created on demand |
| Staging | 50+ tenants seeded on rebuild; refreshed weekly |
| Pre-production | mirror of staging plus a staged-canary tenant |
| Production | no synthetic tenants except dedicated synthetic-monitor sandbox tenant (heavily isolated) |
18.4 Data refresh policy
- Staging refresh weekly, with the option to restore on demand.
- Per-PR ephemeral environments seeded fresh on environment provisioning.
- Production never receives synthetic data outside the sandbox tenant.
19. CI/CD Integration
19.1 Per-service pipeline
lint → typecheck → unit (with coverage) → integration (Testcontainers) → contract (Pact) →
coverage gate → mutation (changed files) → image build → vulnerability scan
Failures in any step block merge. The coverage gate applies per-package, per-layer thresholds (see §1.1).
19.2 Per-frontend pipeline
lint → typecheck → unit → component → axe-storybook → visual (Chromatic) → e2e (smoke) →
RUM bundle size assertion
Frontend E2E runs against an ephemeral preview environment provisioned on Cloud Run with per-PR namespacing.
19.3 PR-touched service deeper run
- A PR changing service X triggers:
- Full integration suite for X.
- Pact verification for X as provider against current consumer contracts.
- Pact publication for X as consumer.
- E2E smoke covering journeys X participates in.
- Mutation testing on changed `/domain/**` files.
19.4 Nightly pipeline
full E2E (web + mobile + desktop, including offline) →
soak (24h k6 mixed load) →
AI eval suite (against staging ai-orchestrator) →
DAST scan (OWASP ZAP) →
SCA full + container vuln scan →
Chromatic full library →
synthetic monitor results aggregation
A failure rolls up to an auto-issue in the on-call queue.
19.5 Release pipeline
- Image promotion staging → pre-prod → prod.
- Canary 5% for 30 minutes; auto-rollback on SLO regression.
- Manual approval gate for prod beyond canary (until automation matures).
19.6 Branch strategy & test gates
| Branch | Required gates |
|---|---|
| Feature branches | unit + component + lint + typecheck (locally + on push) |
| PR open | full per-service + per-frontend pipeline above |
| `main` merged | nightly suites + image promotion to staging |
| `release/*` | canary rollout + smoke + release-wave gate |
20. Test Environments
| Env | Purpose | Backed by | Data | Notes |
|---|---|---|---|---|
| local | Engineer development & quick iteration | Docker Compose with Postgres, Redis, Pub/Sub emulator, MinIO; or Cloud Code | small synthetic | Engineers run unit + integration locally; opt-in to E2E |
| CI ephemeral | Per-PR isolated test environment | Cloud Run namespaced + Cloud SQL temporary instances | per-PR seed | Lifecycle managed by CI; teardown after merge or 24h idle |
| staging | Full production-like, shared | GCP project: ghasi-staging | seeded synthetic, refreshed weekly | Where E2E full + soak + AI eval + DAST run nightly |
| pre-production | Canary + final QA | GCP project: ghasi-preprod | mirror of staging + canary tenant | Used for release validation |
| production | Live | GCP project: ghasi-prod | real tenants | Synthetic monitors run; no destructive tests |
20.1 Environment safety
- Every non-prod environment has a banner on every surface ("STAGING" / "CI" / "PREPROD").
- Environments are siloed: secrets, DBs, buckets are namespaced.
- Production must not be used for any test other than synthetic monitors.
20.2 Test data isolation
- Each test creates and cleans up its own data; no shared state across tests.
- Sandbox tenants prefixed `e2e-<runId>-` are purged hourly by a janitor job.
21. Synthetic Monitoring
Synthetic monitoring proves that production keeps working from outside, not just inside.
21.1 Probes
| Probe | Frequency | What it tests |
|---|---|---|
| Uptime | every 1 min from 5 regions | Public health endpoints respond 200 |
| Booking-flow synthetic (web) | every 5 min | Sandbox tenant; full booking flow with sandbox payment |
| Booking-flow synthetic (mobile) | every 30 min | RN test rig executes booking against sandbox |
| Sync API synthetic | every 5 min | Run a fake desktop client through pull + push round-trip |
| AI gateway synthetic | every 10 min | Send a known prompt; assert structured output, latency, cost |
| Lock vendor synthetic | every 15 min | Issue + revoke a credential against a vendor sandbox |
| Payment webhook synthetic | every 15 min | Replay a known webhook into the staging endpoint; assert state |
21.2 Alerting & routing
- All synthetic probe failures route to PagerDuty with a 2-cycle threshold, so a single flaky run does not page.
- The on-call has a runbook linked from every alert.
- Synthetics run against a dedicated synthetic-monitor sandbox tenant that is isolated from real tenants.
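Reading the 2-cycle threshold as "page after two consecutive failed cycles of the same probe", the routing rule is a trailing-streak count over results ordered oldest first. The sketch below follows that reading; `shouldPage` is a hypothetical name.

```typescript
// `results` is ordered oldest first; true = probe cycle passed.
// Pages only when the most recent `threshold` cycles all failed.
function shouldPage(results: boolean[], threshold = 2): boolean {
  let consecutiveFailures = 0;
  for (const ok of results) {
    consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
  }
  return consecutiveFailures >= threshold;
}
```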
22. Release Quality Gates
Each release wave (R1, R2, R3) has hard quality gates. Missing any gate blocks promotion.
22.1 Per-wave gate matrix
| Gate | R1 | R2 | R3 |
|---|---|---|---|
| Coverage thresholds met | ✓ | ✓ | ✓ |
| All P0 E2E green for 7 nights | ✓ | ✓ | ✓ |
| Offline E2E (1h, 24h) green | ✓ | ✓ | ✓ |
| Performance baseline within budget | ✓ | ✓ | ✓ |
| Pen-test pass (no critical/high) | ✓ | ✓ | ✓ |
| Multi-tenant isolation tests green on every service | ✓ | ✓ | ✓ |
| Saga compensation tests green | ✓ | ✓ | ✓ |
| AI eval baseline within tolerance | n/a (R1 limited AI) | ✓ | ✓ |
| Accessibility audit complete (axe + NVDA + manual) | ✓ | ✓ | ✓ |
| Localisation test complete for active locales | ✓ | ✓ | ✓ |
| 24h soak test passes | ✓ | ✓ | ✓ |
| Backup/restore drill within last 90d | ✓ | ✓ | ✓ |
| Lock vendor adapter tests pass for in-scope vendors | TTLock + Wiegand | + Salto | + Assa Abloy |
| Payment adapter tests pass for in-scope rails | Stripe + cash + PayPal | + MFS | + region-specific rails |
| Multi-region failover drill | n/a | n/a | ✓ |
22.2 Sign-off
- Engineering Manager + Tech Lead + Security + SRE sign off per release wave.
- Sign-off references the gate matrix; missing checks block release.
23. Bug Triage & SLAs
23.1 Severity definitions
| Severity | Definition | Examples |
|---|---|---|
| S0 — Critical | Service down, data loss, security breach, charge-without-reservation, locked-out guest at door, multi-tenant data leak | global IAM outage, payment double-charge, key revoke fails for revoked credential |
| S1 — High | Major journey broken for many tenants; significant degradation | booking confirm fails for tenants in region X, sync conflict storm |
| S2 — Medium | Limited journey impact; workaround exists | UI bug on a single page, AI suggestion incorrect for rare case |
| S3 — Low | Cosmetic, minor inconvenience, internal-only | typo, low-impact log noise |
23.2 SLA targets
| Severity | Response time | Resolution target |
|---|---|---|
| S0 | 15 minutes | 4 hours |
| S1 | 1 hour | 24 hours |
| S2 | 1 business day | 1 week |
| S3 | 1 week | scheduled |
23.3 Blocker policy
- S0 blocks any release; promotion paused until resolved or risk-accepted (only by Eng Manager + Security).
- S1 blocks release-wave promotion.
- S2 issues are scheduled into the next sprint; a release can ship with them if they do not cause a regression.
- S3 issues are tracked in the backlog.
23.4 Regression policy
- Any bug traced to a regression in the last 30 days requires a regression test in the fixing PR.
- Regression count per release wave is a tracked metric; targets in OBSERVABILITY.md.
24. Test Ownership
Test ownership maps to responsibility, not bureaucracy.
| Surface / asset | Owner | Examples |
|---|---|---|
| Service unit + integration + contract tests | Service team | iam-service team owns iam unit, integration, Pact tests |
| Multi-tenant isolation tests per service | Service team | enforced by CI gate; the service owner cannot merge without them |
| Saga tests | Owning saga's primary service team (reservation-service for booking saga) | reservation-service team owns booking-saga harness |
| Sync engine tests | bff-backoffice-service team + Platform | shared ownership; service teams contribute aggregate-specific cases |
| AI eval harness | Platform AI team | owns the harness; service teams own per-capability YAML cases |
| Performance | Platform / SRE | k6 scenarios + budgets; service teams contribute SLO refinements |
| Security (SAST/DAST/SCA/secrets) | Security team | tooling + policies + pen-test coordination |
| E2E + journey tests | Product / QA + service teams | journey ownership is split by primary persona |
| Accessibility | Frontend Platform + service teams | tooling + per-component a11y |
| Localisation | Frontend Platform + content team | bundle correctness + per-locale review |
| Synthetic monitors | SRE | operates the synthetic sandbox tenant; routes alerts |
| Test data generator (@ghasi/test-data) | Platform | maintains generators + seeded packs |
24.1 RACI summary
| Asset | R | A | C | I |
|---|---|---|---|---|
| Service unit/integration tests | Service team | Tech Lead | QA | Eng Manager |
| Pact provider/consumer | Service team | Tech Lead | adjacent service teams | Architects |
| Saga tests | Saga primary service team | Tech Lead | participating teams | Architects |
| AI eval YAMLs | Service team | AI Platform Lead | AI capability owner | Compliance |
| Performance | SRE | SRE Lead | Service teams | Eng Manager |
| Security scans | Security | Security Lead | Service teams | Eng Manager + Compliance |
| E2E & journeys | QA + Product | Product Lead | Service teams | Eng Manager |
| A11y | Frontend Platform | Frontend Lead | Service teams | Compliance |
| Synthetic monitors | SRE | SRE Lead | Service teams | Eng Manager |
25. Anti-Patterns
The following are forbidden in this codebase. Each is listed because it is common, real, and damaging in production hotel platforms.
25.1 Flaky tests left enabled
- A flaky test is a bug. Quarantine it within 24 h with a linked issue; fix or delete within one sprint.
- Quarantined tests run in a separate non-blocking suite; persistent quarantine triggers a review.
- "Sometimes-passes" is not green. CI treats flakes as failures.
25.2 A single shared dataset across tests
- Shared mutable state is a coupling that hides bugs and makes failures non-reproducible.
- Every test creates and cleans up its own data; per-test transactional rollback is the default for SQL integration tests.
- Sandbox tenants are scoped per test run with a janitor purging within 1 hour.
25.3 Mocking your own production code
- Test doubles are for boundaries (vendor SDKs, network, time, randomness).
- Do not mock your own application services or domain code; that tests the mock, not the system.
- Pact and Testcontainers cover the boundaries you might otherwise be tempted to mock.
25.4 Testing UI implementation details
- React Testing Library queries by accessibility role and label, not by class name or component internals.
- Snapshot tests are forbidden for non-trivial UIs (they go stale and get rubber-stamped).
- Visual regression covers what snapshots pretend to.
25.5 Skipping tenant-isolation tests because "it works"
- "It works in staging" is not evidence; the next migration may flip a flag.
- Tenant isolation tests are mandatory on every service that touches tenant data; CI fails the job if missing.
- A multi-tenant data leak is a company-ending incident in this market segment.
25.6 Skipping offline tests because "the cloud will be up"
- The cloud will not be up. The thesis of the product is that the operator continues regardless.
- Offline tests are mandatory for the Electron desktop on every offline-capable flow.
- A regression that breaks offline behaviour is treated as S0 because it kills the product's defining promise.
25.7 AI without provenance
- Every AI artifact carries `aiProvenance`; missing-provenance commits fail CI via a static check on the AI client surface.
- Every irreversible AI action passes through HITL with a `decisionId`; bypass attempts fail loudly with `MELMASTOON.AI.HITL_BYPASS`.
25.8 PII in logs / events
- Logs and events carry IDs, not PII (no guest names, no PANs, no key credential codes, no JWTs, no lock-pairing secrets).
- A static scanner runs on the log shipper; matches block deploys.
25.9 Untyped IDs
- All aggregate IDs are branded types (`TenantId`, `ReservationId`, `KeyCredentialId`, …).
- Raw `string` IDs are forbidden in domain and application layers; the `eslint-no-string-id` rule enforces this.
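One common way to implement branded IDs in TypeScript is an intersection with a phantom property keyed by a `unique symbol`. This makes two structurally identical strings incompatible at compile time with no runtime cost. The sketch is illustrative: the project's actual definitions may differ, and the validation pattern in the constructors is a placeholder assumption.

```typescript
// Phantom brand: exists only in the type system, never at runtime.
declare const brand: unique symbol;
type Brand<T, B extends string> = T & { readonly [brand]: B };

type TenantId = Brand<string, "TenantId">;
type ReservationId = Brand<string, "ReservationId">;

// Constructors are the only place a raw string becomes a branded ID.
// The allowed-character pattern is a placeholder assumption.
function tenantId(raw: string): TenantId {
  if (!/^[a-z0-9-]+$/.test(raw)) throw new Error(`invalid TenantId: ${raw}`);
  return raw as TenantId;
}

function reservationId(raw: string): ReservationId {
  if (!/^[a-z0-9-]+$/.test(raw)) throw new Error(`invalid ReservationId: ${raw}`);
  return raw as ReservationId;
}

function reservationKey(tenant: TenantId, reservation: ReservationId): string {
  return `${tenant}/${reservation}`;
}

// reservationKey(reservationId("r-1"), tenantId("t-1"));
// ^ compile error: the arguments are swapped, which a plain `string` would allow.
```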
25.10 Money as float
- All money is `bigint` micro-units; columns are suffixed `_micro`.
- Float arithmetic on money is forbidden; `MoneyVO` is the only allowed surface.
25.11 Last-write-wins (LWW) on monetary or inventory state
- Conflict resolution per aggregate is declared in `services/<svc>/SYNC_CONTRACT.md`.
- Monetary or inventory state is never LWW; deterministic policies (server-wins, operator-decides, additive-only) apply.
25.12 Direct vendor SDK imports outside the owning service
- `openai`, `@google-cloud/vertexai`, and `anthropic` are forbidden outside `ai-orchestrator-service`.
- TTLock / Salto / Assa Abloy SDKs are forbidden outside `lock-integration-service`.
- Stripe / PayPal / MFS SDKs are forbidden outside `payment-gateway-service`.
- A static check on imports fails CI on violation.
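A regex-based sketch of the import check. It covers static `import ... from`, bare `import "x"`, dynamic `import()`, and `require()`, and it matches subpath imports by package root. The AI package names come from the rule above. The `stripe` entry assumes Stripe's npm package name; lock and other payment vendor package names are omitted because this document does not name them. A production check would more likely walk the TypeScript AST than use a regex.

```typescript
// Package -> the only service allowed to import it.
const SDK_OWNERS = new Map<string, string>([
  ["openai", "ai-orchestrator-service"],
  ["@google-cloud/vertexai", "ai-orchestrator-service"],
  ["anthropic", "ai-orchestrator-service"],
  ["stripe", "payment-gateway-service"], // assumed npm package name
]);

// Captures the module specifier of from-clauses, bare/dynamic imports, and require().
const IMPORT_RE = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)['"]([^'"]+)['"]/g;

function checkImports(service: string, file: string, source: string): string[] {
  const violations: string[] = [];
  for (const m of source.matchAll(IMPORT_RE)) {
    const spec = m[1];
    // Reduce "openai/resources" or "@scope/pkg/sub" to the package root.
    const pkg = spec.startsWith("@") ? spec.split("/").slice(0, 2).join("/") : spec.split("/")[0];
    const owner = SDK_OWNERS.get(pkg);
    if (owner !== undefined && owner !== service) {
      violations.push(`${file}: ${pkg} may only be imported by ${owner}`);
    }
  }
  return violations;
}
```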
25.13 Cross-service DB joins
- Every service owns its data. No cross-service DB reads.
- Read models are projected from events.
- Static analysis of Postgres role grants asserts no cross-schema grants exist.
25.14 ".only", ".skip", "console.log", "debugger" in committed tests
- Pre-commit hook + CI grep catches and blocks.
- A skipped test must be removed or quarantined with a tracking issue.
26. Cross-References
- The per-service `TESTING_STRATEGY.md` deep-doc lives at `services/<svc>/TESTING_STRATEGY.md`; service-specific tooling, harnesses, and edge cases live there. This document is the strategy; the per-service docs are the implementation.
- Every story in `07-epics-and-user-stories.md` declares `Test types`; this document defines what each type means and how it runs.
- Definition of Done (`standards/DEFINITION_OF_DONE.md`) ties test gates to merge gates; both documents stay in lockstep.
- Observability spec (`docs/observability/01-observability.md`) defines the SLOs referenced from §13 and the synthetic monitor topology referenced from §21.
- AI architecture (`docs/08-ai-architecture.md`) defines the AI provenance contract and the orchestrator surface that the AI eval harness in §11 exercises.
- Lock & Key integration (`docs/09-lock-and-key-integration.md`) defines the vendor adapter contract that the tests in §9 verify.
- Payments (`docs/10-payments-architecture.md`) defines the payment adapter contract that the tests in §10 verify.
- Desktop spec (`docs/frontend/desktop/06-desktop-app-specification.md`) defines the Electron architecture that the tests in §8 + §14.2 + §12.3 verify.