
11 — Testing & QA Strategy

Companion: 01 Product Overview · 02 Enterprise Architecture · 03 Microservices Catalog · 04 Event-Driven Architecture · 05 API Design · 06 Data Models · 07 Security & Tenancy · 08 AI Architecture · 09 Lock & Key Integration · 10 Payments · Frontend Web/Mobile · Frontend Desktop (Electron) · Definition of Done · 07 Epics & Stories

Stack reminder (non-negotiable). Desktop is Electron (Node 20 main + Chromium renderer + Vite + React + better-sqlite3 + ONNX Runtime Node). Cloud is Google Cloud Platform (Cloud Run, Cloud SQL, Pub/Sub, Memorystore, Cloud Storage, Secret Manager, Vertex AI, BigQuery). There is no Tauri, no AWS, no Azure. Substitutions require an explicit unanimous ADR.

This document is the canonical, implementation-grade testing strategy for Ghasi Melmastoon. Every section is enforced by CI, runbooks, or scheduled jobs. Skipping a category requires an ADR — never an opinion.


1. Goal & Quality Bar

Testing is the load-bearing structure that lets a small team ship a multi-tenant, AI-first, offline-first hotel SaaS into low-resource markets without losing reservations, double-charging cards, leaking guest data, or stranding a guest at a locked door. The quality bar is therefore not "tests pass" — it is evidence that the system behaves correctly under all the operating modes from docs/01-product-overview.md §9, across all 22 services, on every supported surface and locale.

1.1 Coverage targets (merge-blocking)

| Surface / domain | Unit line | Unit branch | Mutation | Integration | Notes |
|---|---|---|---|---|---|
| Domain aggregates (every service) | ≥ 95% | ≥ 95% | ≥ 75% | n/a | /domain/**; pure TS, no I/O |
| Value objects | 100% | 100% | ≥ 85% | n/a | Branded IDs, money, FX, AIProvenance, Locale |
| Domain services | ≥ 90% | ≥ 90% | ≥ 70% | required | Policies, specifications |
| Application use-cases | ≥ 85% | ≥ 85% | ≥ 65% | required | Commands, queries, sagas |
| Infrastructure adapters | ≥ 80% | ≥ 75% | ≥ 60% | required | Repos, vendor adapters |
| payment-gateway-service | 100% | 100% | ≥ 90% | required + sandbox | Money never wrong |
| lock-integration-service | 100% | 100% | ≥ 90% | required + vendor-mock | Locked doors are loud failures |
| iam-service | 100% | 100% | ≥ 90% | required + security | AuthZ bypass = breach |
| Sync logic (bff-backoffice-service + per-service sync ports) | 100% | 100% | ≥ 90% | required + chaos | Lost mutations = lost revenue |
| Frontend components | ≥ 80% | ≥ 75% | n/a | n/a | Unit + component |
| BFFs | ≥ 85% | ≥ 80% | ≥ 65% | required | Aggregation correctness |

1.2 Journey & contract coverage (merge-blocking)

  • Every P0 journey (J-01 … J-12, J-19, J-20 — see docs/journeys/01-core-user-journeys.md §2) has an automated end-to-end test that runs on every PR (smoke) and nightly (full happy + failure paths).
  • Every offline-capable flow has an offline E2E test exercising at least one of: 1h offline, 24h offline, conflict simulation, outbox flush ordering.
  • Every service-to-service edge in the context map (see docs/03-microservices/README.md) has a Pact contract published and verified on every provider PR.
  • Every event in the schema registry has a producer conformance test and at least one consumer projection test.
  • Every saga (booking, cancellation, check-in, check-out, date-change, listing-publish, AI-HITL) has happy-path, every-compensation, idempotent-retry, and partial-failure-injection tests.

1.3 Quality philosophy

Every production incident is a missing test. Every missing test is a missed risk conversation. Every flake left enabled is a slow-motion outage.

Four guiding stances:

  1. Tests are executable specifications — domain tests read like the ubiquitous language; AC clauses from 07-epics-and-user-stories.md map 1:1 to a Given/When/Then.
  2. Shift-left and shift-right equally — pre-merge gates catch regressions; production observability + synthetic monitors catch emergent behaviour; both feed the backlog.
  3. AI is a first-class testable surface — non-determinism does not excuse non-testing; it raises the bar (regression suites, eval harness, structured-output assertions, cost guardrails).
  4. Offline is a tier-0 invariant — any test plan that only works online is incomplete and merge-blocked.

1.4 Scope

In-scope: all 22 services in docs/03-microservices/README.md; the four user-facing surfaces (Consumer Web, Consumer Mobile, Tenant Booking Web/Mobile, Electron Desktop) and the platform Control Plane; all event producers/consumers and projections; the desktop sync engine and ONNX edge AI worker; all infra-as-code (Terraform, Helm).

Out-of-scope: third-party vendor SaaS internals (we test our contracts with them, not their systems); manufacturer hardware-level qualification of partner lock encoders.


2. Test Pyramid

The classic pyramid is insufficient for an event-driven, offline-first, AI-native system. Ghasi Melmastoon uses an extended pyramid with orthogonal axes:

┌──────────────────────────┐
│ Chaos / Replay / DR      │ weekly staging · monthly prod
├──────────────────────────┤
│ E2E Journeys (J-01..22)  │ ~250 tests · < 18 min smoke
├──────────────────────────┤
│ Contract (Pact + events) │ ~600 tests · < 3 min
├──────────────────────────┤
│ Integration (Testcont.)  │ ~2,200 tests · < 12 min
├──────────────────────────┤
│ Unit (domain)            │ ~14,000+ tests · < 4 min parallel
└──────────────────────────┘

Orthogonal axes (apply across multiple tiers):
┌─────────┬────────────┬──────────┬─────────┬──────┬────────┬──────────────┐
│ A11y    │ Security   │ AI eval  │ Offline │ Perf │ Sync   │ Localisation │
└─────────┴────────────┴──────────┴─────────┴──────┴────────┴──────────────┘

2.1 Volume targets

| Tier | Approx count target | Wall-clock budget | Frequency |
|---|---|---|---|
| Unit | 12 k – 18 k | < 4 min (parallel) | every commit |
| Integration | 1.5 k – 2.5 k | < 12 min | every PR |
| Contract | 400 – 700 | < 3 min | every PR |
| E2E (smoke) | ≤ 80 critical | < 18 min | every PR |
| E2E (full) | 200 – 250 | < 60 min | nightly |
| Offline | 25 – 40 | < 25 min | nightly |
| Performance (k6) | 30+ scenarios | nightly | nightly + pre-release |
| Chaos | 15+ experiments | weekly staging / monthly prod | weekly / monthly |
| AI eval | 1 k+ prompts × models | on prompt change + nightly | continuous |
| A11y (axe) | every changed surface | < 90 s per surface | every PR |
| Visual regression | every component story + key pages | < 5 min | every PR |
| Security (SAST/DAST/SCA/secrets) | continuous | < 8 min PR + nightly DAST | every PR + nightly |

2.2 Why this shape

  • Domain unit tests are cheap, deterministic, and fast — they should dominate by count.
  • Integration tests prove our infra wiring (RLS, outbox, Postgres queries) and dominate by value per test but not by count (slower).
  • Contract tests are mandatory because the system is event-driven and BFF-fronted; no other layer catches the same class of regression.
  • E2E proves journeys, but is too slow and brittle to be the safety net — it is the last line of defence.
  • Orthogonal axes (A11y, AI eval, Offline) cut across all tiers because they are correctness concerns, not test categories.

3. Test Types — Definitions

| Type | What it proves | Tooling | Boundary |
|---|---|---|---|
| Unit | A function, value object, or aggregate behaves to spec in isolation. Zero I/O, zero time-dep unless injected. | Vitest (default), Jest where required by tooling | /domain/**, /lib/** |
| Component | A React component renders, handles input, and emits events to spec. | Vitest + Testing Library (React) + jsdom | /components/** |
| Integration | A service module talks to real Postgres + Redis + Pub/Sub emulator and writes/reads correctly. | Vitest + Testcontainers (Postgres 16, Redis 7, Pub/Sub emulator, MinIO/Cloud Storage emulator) | /application/**, /infrastructure/** |
| Contract | A change to a producer of an HTTP API or event will not break its consumers, and consumers depend only on what the producer actually provides. | Pact (HTTP), Schema Registry conformance (events) | service ⇄ service |
| E2E (web/mobile) | A user can complete a journey through real surfaces against a deployed environment. | Playwright (web), Detox (mobile), shared journey IDs J-NN | full surface |
| E2E (desktop, Electron) | A staff member can run a journey through the Electron app, online and offline. | Playwright Electron + spectron-style harness | Electron renderer + main |
| Offline | An offline scenario produces correct mutations and reconciles correctly on sync. | Playwright Electron + custom sync harness + Toxiproxy | Electron + bff-backoffice |
| Performance | Throughput and latency meet SLOs under realistic load. | k6 + custom scenarios; Lighthouse for web vitals; React Native perf tools | API + frontends |
| Security | Code, dependencies, and runtime surfaces have no known critical/high vulnerabilities. | Snyk (SCA), gitleaks (secrets), CodeQL (SAST), OWASP ZAP (DAST), pen-test pre-launch | every layer |
| A11y | Surfaces meet WCAG 2.2 AA across locales and assistive tech. | axe-core (axe-playwright + axe-react), NVDA smoke (Windows for Electron) | every surface |
| Visual regression | Pixel-diff per component story and key page in LTR + RTL. | Chromatic + Storybook | every component + key pages |
| AI eval | An AI capability meets a behaviour band (correctness, safety, format, cost). | YAML test cases run by the AI eval harness against ai-orchestrator-service | AI gateway |
| Chaos | The system degrades and recovers under network/pod/peripheral faults. | Pumba (containers), Toxiproxy (network), custom Kubernetes/CR fault injection | infra |
| Mutation | Tests actually catch defects (kill mutants), not just exercise lines. | Stryker | /domain/**, /lib/** |
| Compliance | Tax, audit-log, residency, and erasure invariants hold per jurisdiction. | Custom test packs per jurisdiction | pricing/billing/reporting/iam |
| Localisation | RTL/LTR + locale-aware formatting + numerals work end-to-end. | Cross-locale test runner + axe + visual regression | every frontend |
| Synthetic monitor | Production keeps being correct from outside, not just from inside. | Synthetic probes (Cloud Monitoring) + dedicated test tenant | production |

4. Tooling Stack

| Concern | Tool | Why it is the chosen tool |
|---|---|---|
| TypeScript unit | Vitest | Native ESM, Vite-aligned, fast watch, parity with desktop renderer build |
| Where Vitest can't | Jest | Used in legacy or RN packages where ecosystem requires it; no new code added |
| Component / DOM | Testing Library (React, React Native) | Behaviour-first, no implementation-detail snapshots |
| Web E2E | Playwright | Multi-browser, trace + video + screenshot, native parallelism |
| Desktop E2E | Playwright Electron | Drives the real Electron app; can intercept main + renderer |
| Mobile E2E | Detox (RN) | Reliable on physical + emulator; gray-box harness for RN |
| Performance | k6 | TypeScript-friendly, scenario-based, easy to integrate into CI |
| Frontend perf | Lighthouse CI + RUM | Enforced budgets per page on tenant booking + consumer surfaces |
| Contract (HTTP) | Pact (consumer + provider, broker hosted) | Consumer-driven contracts; broker integrates with PR gate |
| Contract (events) | Schema Registry conformance + golden samples | Backward-compatibility enforced |
| API smoke | Postman / Newman | Hand-curated smoke for shared APIs in staging |
| DAST | OWASP ZAP | Free, scriptable, fits CI; baselines + delta scans |
| SCA | Snyk | Dependencies, container images |
| SAST | CodeQL | First-party, GitHub-hosted, broad TS coverage |
| Secrets | gitleaks | Pre-commit + CI |
| A11y | axe-core (axe-playwright + axe-react) | WCAG 2.2 AA rules; integrated into Playwright |
| Screen reader smoke | NVDA on Windows for Electron; VoiceOver macOS smoke | Manual smoke per release wave |
| Visual regression | Chromatic + Storybook | RTL + LTR stories; review queue with team approval |
| Chaos network | Pumba + Toxiproxy | Container + network-level fault injection |
| Mutation | Stryker | TS-native, CI-friendly |
| AI eval harness | YAML cases + custom runner against ai-orchestrator-service | Provenance-aware, cost-aware, regression-aware |
| Coverage | c8 (V8) + Vitest report | Native, per-package thresholds |

All tooling is wrapped behind workspace npm scripts (pnpm test:unit, pnpm test:integration, …) and CI workflows. Engineers do not interact with tools directly.


5. Per-Service Test Pattern

Every backend service repository follows the same test layout. Deviations require ADR.

services/<service-name>/
├── src/
│   ├── presentation/
│   ├── application/
│   ├── domain/
│   └── infrastructure/
└── test/
    ├── unit/                             # mirrors src/, fast, no I/O
    │   ├── domain/
    │   └── application/
    ├── integration/                      # Testcontainers; Postgres + Pub/Sub emulator
    │   ├── outbox.spec.ts                # mandatory
    │   ├── inbox.spec.ts                 # mandatory
    │   ├── tenant-isolation.spec.ts      # mandatory
    │   └── <use-case>.spec.ts
    ├── contract/                         # Pact provider + consumer
    │   ├── provider.spec.ts
    │   └── consumer.spec.ts
    ├── e2e/                              # Playwright API runner against staging
    │   └── <journey>.spec.ts
    └── fixtures/
        ├── builders/                     # aReservation().with...().build() style
        └── seeds/

5.1 What is required per service

  • Unit tests for every aggregate, every value object, every domain service.
  • Integration tests for every use-case (command/query handler) using Testcontainers for Postgres 16, Redis 7, Pub/Sub emulator (gcr.io/google.com/cloudsdktool/cloud-sdk with gcloud beta emulators pubsub start), MinIO for Cloud Storage, and Mailpit for email send capture where applicable.
  • Contract tests for every API the service consumes (Pact consumer) and every API/event the service produces (Pact provider + schema registry conformance).
  • E2E tests for every journey the service participates in, run via Playwright API runner against the deployed staging environment.
  • Coverage thresholds enforced at the package level in CI per the table in §1.1.

5.2 Mandatory test files (all services)

| File | Purpose |
|---|---|
| tenant-isolation.spec.ts | Two tenants seeded with identical entity IDs; cross-tenant reads must return zero rows; direct ID access must be blocked by RLS. |
| outbox.spec.ts | Outbox is transactional with the aggregate write; flusher publishes once; restart after kill produces no duplicates. |
| inbox.spec.ts | Consumer dedupes by message ID; out-of-order delivery is handled per-aggregate; replays are idempotent. |
| idempotency.spec.ts | Duplicate idempotency-key returns the original response within the TTL; mismatched payload returns conflict. |
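
The behaviour that idempotency.spec.ts pins down can be sketched with an in-memory store. This is a minimal illustration, not the service implementation: IdempotencyStore, its execute method, and the use of JSON.stringify as a payload fingerprint are all assumptions made for the example.

```typescript
// Hypothetical in-memory idempotency store (illustrative names only).
type StoredEntry = { fingerprint: string; response: string; expiresAt: number };

type Outcome =
  | { kind: "fresh"; response: string }
  | { kind: "replayed"; response: string }
  | { kind: "conflict" };

class IdempotencyStore {
  private readonly entries = new Map<string, StoredEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  execute(key: string, payload: unknown, nowMs: number, handler: () => string): Outcome {
    // Assumption: a real service would use a stable digest, not JSON.stringify.
    const fingerprint = JSON.stringify(payload);
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > nowMs) {
      // Same key inside the TTL: replay the original response or flag a conflict.
      return hit.fingerprint === fingerprint
        ? { kind: "replayed", response: hit.response }
        : { kind: "conflict" };
    }
    const response = handler();
    this.entries.set(key, { fingerprint, response, expiresAt: nowMs + this.ttlMs });
    return { kind: "fresh", response };
  }
}
```

Passing the current time in as a parameter keeps the sketch compatible with the injected-clock approach the document requires for deterministic tests.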

5.3 Coverage thresholds in CI

vitest.config.ts per package contains:

coverage: {
  reporter: ['text', 'lcov', 'html'],
  thresholds: {
    'src/domain/**': { lines: 95, branches: 95, functions: 95, statements: 95 },
    'src/application/**': { lines: 85, branches: 85, functions: 85, statements: 85 },
    'src/infrastructure/**': { lines: 80, branches: 75, functions: 80, statements: 80 },
  },
  exclude: ['**/*.d.ts', '**/migrations/**', '**/openapi.json'],
}

CI fails the job on any miss. Mutation thresholds are enforced separately by Stryker.

5.4 Builders, fakes, and clocks

  • Data builders in __builders__/ (e.g., aReservation().forTenant(tA).withRooms(2).withPolicy('flex24').build()).
  • Time is injected via a Clock port; tests use FakeClock at a fixed ISO-8601 instant.
  • IDs via IdFactory; tests use SeededIdFactory for determinism.
  • Money as bigint micro-units everywhere; MoneyVO ensures no float math.
  • AI via the AIClient port; tests inject a RecordingAIClient with golden responses.
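
A rough sketch of how these pieces fit together in a test is shown here. The class shapes (FakeClock, SeededIdFactory, ReservationBuilder) are hypothetical stand-ins for the ports and builders named above, and the nightly rate is an arbitrary illustrative figure.

```typescript
// Hypothetical stand-ins for the Clock and IdFactory ports named above.
interface Clock {
  now(): Date;
}

class FakeClock implements Clock {
  private readonly fixed: Date;
  constructor(iso: string) {
    this.fixed = new Date(iso);
  }
  now(): Date {
    return new Date(this.fixed.getTime()); // fresh copy so callers cannot mutate the instant
  }
}

class SeededIdFactory {
  private counter = 0;
  private readonly seed: string;
  constructor(seed: string) {
    this.seed = seed;
  }
  next(): string {
    this.counter += 1;
    return `${this.seed}-${this.counter}`;
  }
}

type Reservation = {
  id: string;
  tenantId: string;
  rooms: number;
  totalMicros: bigint; // money in micro-units, never a float
  createdAt: string;
};

class ReservationBuilder {
  private tenantId = "tenant-default";
  private rooms = 1;
  private readonly nightlyMicros = 75_000_000n; // illustrative rate only
  private readonly ids: SeededIdFactory;
  private readonly clock: Clock;

  constructor(ids: SeededIdFactory, clock: Clock) {
    this.ids = ids;
    this.clock = clock;
  }
  forTenant(tenantId: string): this {
    this.tenantId = tenantId;
    return this;
  }
  withRooms(rooms: number): this {
    this.rooms = rooms;
    return this;
  }
  build(): Reservation {
    return {
      id: this.ids.next(),
      tenantId: this.tenantId,
      rooms: this.rooms,
      totalMicros: this.nightlyMicros * BigInt(this.rooms),
      createdAt: this.clock.now().toISOString(),
    };
  }
}
```

Because the clock and ID source are both injected, two runs with the same seed and instant yield identical aggregates, which is what makes snapshot-style assertions stable.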

6. Multi-Tenant Isolation Tests (mandatory)

Multi-tenancy is enforced at three layers (docs/02-enterprise-architecture.md) — domain, DB (Postgres RLS), and API (request context). Every layer must be tested.

6.1 Required test set per service

| Test | What it proves |
|---|---|
| cross_tenant_read_blocked | A query as Tenant A returns zero rows from Tenant B's table even when filters are absent. |
| cross_tenant_mutation_blocked | An update of Tenant B's row as Tenant A is rejected by RLS, never silently dropped. |
| direct_id_access_blocked | A GET by primary ID belonging to Tenant B as Tenant A returns 404 (not 403, to avoid existence leaks). |
| tenant_context_required | Every endpoint rejects requests missing X-Tenant-Id or with a JWT tids mismatch. |
| bulk_admin_requires_global_scope | Bulk endpoints require X-Tenant-Scope: global and platform-admin claim. |
| event_payload_carries_tenantid | Every event published carries tenantId matching the producing aggregate's tenant. |
| consumer_drops_mismatched_tenant | A consumer receiving an event from another tenant context does not project. |
| cross_tenant_search_only_via_search_aggregation | Only search-aggregation-service may read across tenants; only over cross_tenant_searchable=true fields. |
| rls_bypass_attempt_logged | Using SET ROLE or SET LOCAL row_security = off is forbidden in app code; tests assert its absence at the migration layer; if attempted at runtime, an audit alarm fires. |
| service_to_service_propagation | Service A → Service B propagates tenant_id in headers and trace context; B rejects mismatch. |
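
Two of the checks above (tenant_context_required and consumer_drops_mismatched_tenant) reduce to small pure functions that are easy to unit test. The sketch below assumes that the JWT tids claim is an array of tenant IDs and that header names arrive lower-cased; both are assumptions, and the function names are hypothetical.

```typescript
type TenantCheck =
  | { ok: true; tenantId: string }
  | { ok: false; reason: "missing_tenant_header" | "tenant_not_in_token" };

// Assumption: `tids` lists the tenant IDs the caller's token is valid for.
function resolveTenantContext(
  headers: Record<string, string | undefined>,
  claims: { tids: string[] },
): TenantCheck {
  const tenantId = headers["x-tenant-id"];
  if (!tenantId) return { ok: false, reason: "missing_tenant_header" };
  if (!claims.tids.includes(tenantId)) return { ok: false, reason: "tenant_not_in_token" };
  return { ok: true, tenantId };
}

type TenantEvent = { type: string; tenantId: string };

// A consumer projects only events that belong to the tenant context it runs in.
function shouldProject(event: TenantEvent, consumerTenantId: string): boolean {
  return event.tenantId === consumerTenantId;
}
```

Returning a reason code rather than an HTTP status keeps the mapping to response codes (for example 404 rather than 403 on direct ID access) in the presentation layer.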

6.2 Why not skip — ever

Tenant isolation is the single highest-blast-radius defect class in this product. The seven-figure-incident risk justifies the marginal CI minute. Disabling these tests requires a risk-accepted ADR co-signed by Security and the service owner.

6.3 Sample isolation test (excerpt)

describe('reservation-service · tenant isolation', () => {
  it('blocks cross-tenant read with identical aggregate IDs', async () => {
    const tenantA = aTenant().build();
    const tenantB = aTenant().build();
    const sharedId = ReservationId.generate();
    await seedReservation({ id: sharedId, tenantId: tenantA.id });
    await seedReservation({ id: sharedId, tenantId: tenantB.id });

    const resA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}`);
    expect(resA.body.tenantId).toBe(tenantA.id);

    const resBAsA = await api.as(tenantA).get(`/api/v1/reservations/${sharedId}?explicitTenant=${tenantB.id}`);
    expect(resBAsA.status).toBe(404); // never 403, never 200
  });
});

7. Saga Tests

Sagas are the integration-correctness backbone. Each named saga in docs/04-event-driven-architecture.md has a dedicated suite.

7.1 Sagas in scope

| Saga | Participants | Compensation chain |
|---|---|---|
| SAGA-MEL-BOOKING-CONFIRM | reservation → inventory(hold) → pricing(quote) → payment-gateway(intent) → reservation(confirm) → lock-integration(issue@check-in) → notification | release hold · refund · revoke key |
| SAGA-MEL-CANCELLATION | reservation → billing(refund-eligibility) → payment-gateway(refund) → inventory(release) → lock-integration(revoke) → notification | reverse refund (rare) · re-allocate |
| SAGA-MEL-CHECKIN | reservation(check-in) → lock-integration(issue) → billing(open-folio) → notification | revoke key on rollback |
| SAGA-MEL-CHECKOUT | reservation(check-out) → lock-integration(revoke) → housekeeping(queue-turnover) → billing(close-folio) → notification | re-open folio · cancel turnover |
| SAGA-MEL-DATE-CHANGE | reservation(modify) → inventory(re-hold) → pricing(re-quote) → billing(adjust) → lock-integration(update) → notification | restore prior dates · refund delta |
| SAGA-MEL-LISTING-PUBLISH | property → theme-config → pricing → search-aggregation → bff-consumer (cache invalidation) | unpublish on partial failure |
| SAGA-MEL-AI-HITL | originating service → ai-orchestrator(propose+provenance) → human review → originating service(commit with decisionId) | discard proposal · audit |

7.2 Required test cases per saga

For every saga, four classes of tests are mandatory:

  1. Happy path. Each step completes; the final state is reachable; the full chain of events is published in order; the final aggregate carries the expected snapshot.
  2. Per-step compensation. Inject a failure at step N for each N; the compensation chain runs in reverse; the system returns to a consistent state; the original idempotency key is honoured for retries.
  3. Idempotent retry. Redeliver each step's event; downstream consumers do not double-apply; outbox publishers do not double-publish.
  4. Partial failure injection. Combine two faults in different steps (e.g., payment succeeds but lock issuance flakes twice then succeeds); ensure the saga reaches eventual success or a deterministic terminal failure.
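
The per-step compensation class (item 2) can be illustrated with a toy synchronous orchestrator. This is only a sketch of the reverse-order compensation idea; runSagaSteps and stepsFailingAt are hypothetical helpers and do not describe the real saga runtime or the shared harness.

```typescript
type Step = { name: string; run: () => void; compensate: () => void };

type SagaResult = { status: "completed" | "compensated"; log: string[] };

// Runs steps in order; on the first failure, compensates completed steps in reverse.
function runSagaSteps(steps: Step[]): SagaResult {
  const log: string[] = [];
  const completed: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      log.push(`run:${step.name}`);
      completed.push(step);
    } catch {
      log.push(`fail:${step.name}`);
      for (const prior of completed.reverse()) {
        prior.compensate();
        log.push(`undo:${prior.name}`);
      }
      return { status: "compensated", log };
    }
  }
  return { status: "completed", log };
}

// Test helper: builds a chain of named steps where the step at index `failAt` throws.
function stepsFailingAt(names: string[], failAt: number): Step[] {
  return names.map((name, index) => ({
    name,
    run: () => {
      if (index === failAt) throw new Error(`injected failure at ${name}`);
    },
    compensate: () => {},
  }));
}
```

Looping failAt over every index gives the "inject a failure at step N for each N" coverage with a single parameterised test.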

7.3 Saga test harness

A shared @ghasi/saga-harness package wraps Pub/Sub emulator + Postgres + a fake-clock + injectable per-step failure switches. A test reads:

runSaga('SAGA-MEL-BOOKING-CONFIRM', {
  givenInventoryAvailable: true,
  givenQuoteValid: true,
  whenPaymentFailsAtStep: 'capture',
  thenStateAfterCompensation: 'cancelled_by_compensation',
  thenEventsPublished: [
    'reservation.held.v1',
    'payment.intent.created.v1',
    'payment.failed.v1',
    'inventory.hold.released.v1',
    'reservation.confirm_failed.v1',
  ],
});

8. Sync & Offline Tests (Electron desktop)

Sync correctness is the second-highest-blast-radius defect class. Offline tests are first-class CI citizens, not ad-hoc.

8.1 Required scenario coverage

| ID | Scenario | Expected behaviour |
|---|---|---|
| O-01 | Desktop offline 1h, 5 reservations modified, then reconnect | All 5 mutations reach server in original order; no duplicates; outbox empty after flush |
| O-02 | Desktop offline 24h, 50 housekeeping updates + 10 walk-ins + 3 cash-on-arrival check-ins, then reconnect | All applied; any conflicts surfaced according to the per-aggregate policy; sync completes within 5 minutes on a 1 Mbps link |
| O-03 | Desktop offline 7d (max grace) | App still usable for read + mutate; clear UI banner at 5d/6d/7d boundaries; on reconnect, sync proceeds in pages |
| O-04 | Conflict simulation: rate plan changed both server-side and offline desktop-side | Per-aggregate policy applied; rate-plan policy is "server wins on monetary fields, last-writer-wins on display name"; operator notified for material conflicts |
| O-05 | Conflict: same reservation date-changed by two devices offline | First flusher wins; second device receives 409 conflict, surfaces the diff, lets operator decide |
| O-06 | Outbox flush ordering | Events flush FIFO; a failing event blocks its dependents but not unrelated outbox queues |
| O-07 | Idempotency on retry | Same outbox row resubmitted; server's inbox dedupes; only one application |
| O-08 | Encrypted SQLite integrity | Power-loss mid-transaction; on restart, WAL replays cleanly; no partial commits |
| O-09 | Lock issuance offline (vendor cached creds) | Credential created locally, card encoded, issuance event staged; on reconnect, central record reconciles; cache age limits respected |
| O-10 | Cash-on-arrival check-in offline | Check-in proceeds; cash drawer reflects deposit; reservation transitions; on reconnect, all events reach server in correct order |
| O-11 | Conflict on housekeeping assignment | LWW per policy; the loser's UI updates within 60 s |
| O-12 | Sync paused under low bandwidth | Throttle activates ≤ 256 kbps; large media deferred; small mutations still flush |
| O-13 | Sync recovery shows accurate progress | Operator UI shows "X of Y synced", queue depth, elapsed time |
| O-14 | Operator forces re-sync | Forced re-sync rebuilds local state cleanly; no data loss; takes ≤ N minutes for the configured tenant size |
| O-15 | Migration mid-flight | An app update with schema migration runs cleanly; existing outbox preserved across migration; rollback supported |
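
Scenario O-06 is the easiest of these to get subtly wrong, so a small model of the expected ordering may help. The sketch assumes that "dependents" means later rows in the same queue (for instance one queue per aggregate); that interpretation, and the flushOutbox function itself, are assumptions rather than the sync engine's actual design.

```typescript
type OutboxRow = { id: number; queue: string; payload: string };

type FlushResult = { sent: number[]; pending: number[] };

// Flushes rows in FIFO order. A failed row blocks later rows in the same queue only.
function flushOutbox(rows: OutboxRow[], send: (row: OutboxRow) => boolean): FlushResult {
  const blockedQueues = new Set<string>();
  const sent: number[] = [];
  const pending: number[] = [];
  const ordered = [...rows].sort((a, b) => a.id - b.id);
  for (const row of ordered) {
    if (blockedQueues.has(row.queue)) {
      pending.push(row.id); // a predecessor in this queue failed; keep order intact
      continue;
    }
    if (send(row)) {
      sent.push(row.id);
    } else {
      blockedQueues.add(row.queue);
      pending.push(row.id);
    }
  }
  return { sent, pending };
}
```

An O-06 test would then assert both halves: the failing queue retains its rows in original order, and the unrelated queue drains completely.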

8.2 Tooling

  • Playwright Electron drives the real Electron app; BrowserWindow is intercepted to drive renderer interactions, app.relaunch() is exercised, webContents.session is examined for storage state.
  • Toxiproxy sits in front of the sync endpoint to inject latency, packet loss, bandwidth caps.
  • Per-test Postgres + Pub/Sub emulator via Testcontainers acts as the cloud side.
  • Custom sync harness seeds the desktop SQLite with a known state and runs assertion helpers (expectOutboxEmpty(), expectAggregateState(), expectConflictRecorded()).

8.3 SQLCipher integrity tests

  • KDF (PBKDF2-SHA512 ≥ 256k iterations) verified per release.
  • DB key derivation: OS keychain fragment + JWT-bound device fragment; tampering with either renders the DB unreadable.
  • Power-loss simulation via kill -9 on the Electron main process during a transaction; on restart, WAL replays cleanly.

8.4 Auto-update tamper tests

  • The signed update manifest is fetched, its signature verified against the embedded public key.
  • Tampered manifests are rejected (test fixture flips a byte; assert refusal).
  • Staged rollouts: a fixture pretends to be a different staging cohort; assert correct cohort assignment.

9. Lock Integration Tests

Lock integration is the most failure-mode-rich subsystem because it spans cloud, desktop, USB/serial peripherals, and remote vendor APIs.

9.1 Vendor mocks

Every vendor adapter (TTLock, Salto, Assa Abloy, generic Wiegand) ships with a vendor-mock that is contract-faithful to the real SDK. The mock supports failure-injection switches:

  • vendor down (timeout, 5xx)
  • vendor refuses (4xx)
  • vendor accepts but no-op (silent failure)
  • vendor responds slowly (latency injection)
  • vendor sends out-of-order callbacks
  • vendor returns invalid signature on callback

9.2 Generic Wiegand mock encoder

For the generic Wiegand path, a USB/serial mock encoder runs in CI as a Node process exposing /dev/ttyUSBmock. Tests:

  • Encode card with valid credential
  • Encode fails (no card present)
  • Encode fails (card error response)
  • Re-encode after failure
  • Card revoke

9.3 Required scenario coverage

| ID | Scenario | Expected |
|---|---|---|
| L-01 | Issue mobile-key online, vendor up | Credential created, invite SMS dispatched, key reachable on guest's phone |
| L-02 | Issue mobile-key with vendor down | Issuance queues, retry with backoff; UI prompts fallback PIN; queue drains on vendor recovery |
| L-03 | Encode RFID card online | Card encoded, credential persisted, audit recorded |
| L-04 | Encode RFID card offline (cached creds) | Card encoded against cached creds, issuance event staged, syncs on reconnect |
| L-05 | Cached creds expired (> 24h) | Offline issuance refused; clear UI prompt to come online |
| L-06 | Lost-key revoke | Credential transitions to revoked; new credential issued; audit captures both |
| L-07 | Mobile-key invite delivery test | SMS or WhatsApp delivered; if both fail, fallback to email; if all fail, alert |
| L-08 | Clock-skew tolerance | Lock device clock ±5 minutes from server; credential still valid; outside window → reject with clear error |
| L-09 | Vendor anomaly (door opened after revoke) | Anomaly detected by AI; alert raised; door logged with severity |
| L-10 | Vendor secret rotation | Rotation runs; previous secret valid until cutover; failed rotation alerts |
| L-11 | Adapter swap (TTLock → Salto) | Existing reservations re-issue keys via new vendor on next check-in; no double-credentials |
| L-12 | Concurrent issuance for same room | Optimistic lock; second wins or both safely serialised per policy |
| L-13 | Issuance during reservation modification | Atomic update of validUntil; rollback on partial failure |
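
For L-08, one plausible reading is that the validity window is widened by the skew tolerance on both ends. The sketch below encodes that reading; the function name, the result codes, and the decision to apply the tolerance symmetrically are assumptions, not the documented lock behaviour.

```typescript
const SKEW_TOLERANCE_MS = 5 * 60 * 1000; // ±5 minutes, per L-08

type Validity = { validFromMs: number; validUntilMs: number };

type CredentialCheck = "valid" | "not_yet_valid" | "expired";

// lockClockMs is the time as reported by the (possibly skewed) lock device.
function checkCredential(
  lockClockMs: number,
  validity: Validity,
  toleranceMs: number = SKEW_TOLERANCE_MS,
): CredentialCheck {
  if (lockClockMs < validity.validFromMs - toleranceMs) return "not_yet_valid";
  if (lockClockMs > validity.validUntilMs + toleranceMs) return "expired";
  return "valid";
}
```

Distinct result codes for the two rejection cases make it straightforward to assert the "clear error" part of the expectation.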

10. Payment Tests

Payments are the third-highest-blast-radius surface (after IAM and locks). Tests cover correctness, idempotency, PCI scope, and reconciliation.

10.1 Adapter test matrix

| Adapter | Sandbox provider | Required scenarios |
|---|---|---|
| Stripe | Stripe test mode | success · 3DS challenge · capture · partial refund · full refund · chargeback simulation · webhook idempotency · webhook signature failure · FX-snapshot freeze |
| PayPal | PayPal sandbox | success · cancel mid-flow · webhook idempotency · refund |
| Cash-on-arrival | local mock | deposit at confirm · arrival capture · no-show → policy charge · partial deposit |
| MFS (JazzCash, Easypaisa, Fawry, M-Pesa) | provider sandbox or mock | initiate · OTP confirm · callback signature · failure-rail-down |

10.2 Required scenarios (cross-adapter)

| ID | Scenario | Expected |
|---|---|---|
| P-01 | Card payment end-to-end with 3DS | Capture succeeds, reservation held → confirmed, FX snapshot frozen |
| P-02 | Webhook delivered twice | Only one state transition; idempotency-key check passes |
| P-03 | Webhook signature invalid | Rejected with 401; security alert fires |
| P-04 | Payment fails (declined) | Reservation rolls back to held with extended TTL; UI reflects |
| P-05 | Partial refund | Folio shows credit and remaining capture |
| P-06 | Full refund + lock revoke | Refund succeeds, key revoked, reservation cancelled |
| P-07 | Chargeback simulation | Chargeback aggregate created; folio flagged; evidence upload works |
| P-08 | Cash-on-arrival happy path | Reservation confirmed_pending_payment; check-in captures cash; folio reflects |
| P-09 | FX-snapshot freeze | Display currency USD, settlement AFN; folio records both with snapshot timestamp |
| P-10 | PCI scope assertion | No PAN appears in any non-PCI service log; assertion at log-shipper level fails build if violated |
| P-11 | EOD cash drawer reconciliation | Expected vs counted vs variance; variance > X% requires manager approval; events published |
| P-12 | Provider rotation | Switch tenant from Stripe-A to Stripe-B; new bookings route to B; old bookings refundable on A |

10.3 PCI scope verification

A nightly job greps all non-PCI service logs in a sandbox window for PAN-shaped strings and known test card numbers. Any match fails the job and pages security.
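
A "PAN-shaped string" check is commonly built from a digit-run pattern plus the Luhn checksum, which filters out most order numbers and timestamps. The sketch below shows that combination; the regular expression, the 13 to 19 digit range, and the function names are assumptions about how such a scanner might look, not the actual nightly job.

```typescript
// Luhn checksum over a string of ASCII digits.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2; // double every second digit, counting from the right
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Finds 13 to 19 digit runs (optionally separated by single spaces or dashes) that pass Luhn.
function findPanCandidates(logLine: string): string[] {
  const matches: string[] = logLine.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? [];
  return matches.map((m) => m.replace(/[ -]/g, "")).filter(passesLuhn);
}
```

For example, a log line containing the widely published test number 4242 4242 4242 4242 is flagged, while a 16-digit order number that fails the checksum is ignored.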


11. AI Eval Harness

The AI eval harness is the only acceptable way to gate AI quality. It runs nightly in staging and on every change to a prompt or model in CI.

11.1 What the harness does

  • Loads YAML test cases per AI capability.
  • Runs each case against ai-orchestrator-service in the appropriate environment (staging by default; CI for prompt-changed PRs).
  • Asserts response shape (structured output), behaviour bands (safety, refusal, helpfulness), and cost guardrails.
  • Records aiProvenance for every run.
  • Compares against the latest accepted baseline; significant regression blocks merge or halts the nightly run.

11.2 YAML test case shape

capability: pricing.suggest_rate
version: v3
description: >
  Suggest a daily rate for a property given recent bookings, market context,
  and a known seasonality flag.
inputs:
  property_id: "prop-staging-001"
  date_range: "2026-06-15..2026-06-22"
  baseline_rate_micro_usd: 75000000
  occupancy_last_14d: 0.62
  seasonality: "tourist_high"
expectations:
  output_schema_valid: true # must parse against pricing.suggest_rate.v3
  must_include_fields:
    - "suggested_rate_micro_usd"
    - "rationale"
    - "expected_uplift_pct"
  numeric_bounds:
    suggested_rate_micro_usd:
      min: 60000000
      max: 200000000
    expected_uplift_pct:
      min: -25
      max: 80
  rationale_quality:
    min_words: 20
    must_mention_one_of: ["occupancy", "seasonality", "demand"]
  safety:
    must_not_contain_pii: true
    must_not_contain_political: true
  provenance:
    must_attach: true
    required_fields: ["model", "version", "promptId", "traceId"]
cost_budget_micro_usd: 2500
hitl_required: true
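
To make the must_include_fields and numeric_bounds expectations concrete, here is a minimal sketch of how a runner might evaluate them against an already-parsed model response. The checkCase function and its failure-message format are hypothetical; YAML parsing and the remaining expectation types are left out.

```typescript
type Bounds = Record<string, { min: number; max: number }>;

type Expectations = {
  must_include_fields: string[];
  numeric_bounds: Bounds;
};

// Returns a list of human-readable failures; an empty list means the case passed.
function checkCase(output: Record<string, unknown>, expectations: Expectations): string[] {
  const failures: string[] = [];
  for (const field of expectations.must_include_fields) {
    if (!(field in output)) failures.push(`missing field: ${field}`);
  }
  for (const [field, { min, max }] of Object.entries(expectations.numeric_bounds)) {
    const value = output[field];
    if (typeof value !== "number" || value < min || value > max) {
      failures.push(`out of bounds: ${field}`);
    }
  }
  return failures;
}
```

Collecting every failure instead of stopping at the first gives a more useful report when a prompt change breaks several expectations at once.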

11.3 Required AI capability evals (each as one or more YAML files)

| Capability | Files | Notes |
|---|---|---|
| Pricing rate suggestion | pricing.suggest_rate.*.yaml | One per market (AF, TJ, IR, PK, EG) |
| Demand forecast | forecast.occupancy.*.yaml | 14d + 90d horizons |
| Housekeeping order | housekeeping.order.*.yaml | Edge ONNX + cloud equivalence |
| Anomaly detection (bookings) | anomaly.booking.*.yaml | False-positive ceiling |
| Anomaly detection (payments) | anomaly.payment.*.yaml | Fraud signals |
| Anomaly detection (locks) | anomaly.lock.*.yaml | Door-event sequences |
| Upsell recommendation | upsell.recommend.*.yaml | Per persona |
| Guest message draft | message.draft.*.yaml | Per locale (PS, FA, AR, EN, FR, TJK), per context |
| Translation hint | translate.hint.*.yaml | PS↔EN, FA↔EN, AR↔EN |
| Theme contrast adjust | theme.contrast.*.yaml | A11y constraint |

11.4 Cost guardrails

  • Each capability has a cost_budget_micro_usd per call.
  • A per-tenant daily budget is enforced by the orchestrator; exceeding the band degrades requests to lower-cost models.
  • The eval harness asserts cost stays within band; cost regression > 20% blocks merge.
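
The two cost checks can be expressed as one gate. The sketch assumes that "regression" is measured relative to the last accepted baseline cost for the same case; that baseline definition and the costGate name are assumptions. Integer arithmetic is used so that the 20% boundary is not blurred by floating-point rounding.

```typescript
type CostVerdict = "pass" | "block_over_budget" | "block_regression";

// All amounts are integer micro-USD per call.
function costGate(baselineMicroUsd: number, currentMicroUsd: number, budgetMicroUsd: number): CostVerdict {
  if (currentMicroUsd > budgetMicroUsd) return "block_over_budget";
  // (current - baseline) / baseline > 20%  <=>  (current - baseline) * 100 > 20 * baseline
  const regressed = (currentMicroUsd - baselineMicroUsd) * 100 > 20 * baselineMicroUsd;
  return regressed ? "block_regression" : "pass";
}
```

With a baseline of 2000 and a budget of 2500 micro-USD, a call costing 2300 passes (15% up), 2400 passes (exactly 20%), 2450 is blocked as a regression (22.5% up), and 2600 is blocked for exceeding the budget.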

11.5 Provenance metadata tests

  • Every successful AI response in the harness must carry aiProvenance = { model, version, promptId, traceId, reviewedBy?, reviewedAt?, local }.
  • Tests assert presence and well-formedness; missing fields fail the case.

11.6 HITL gate enforcement tests

  • For every capability marked hitl_required: true, a test asserts that committing the originating action without a decisionId fails with MELMASTOON.AI.HITL_BYPASS.
  • A test asserts that committing with a decisionId succeeds.
  • A test asserts that the orchestrator audit log captures the decision with {decisionId, decisionBy, timestamp}.
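
The three assertions above can be modelled against a small commit guard. Only the error code MELMASTOON.AI.HITL_BYPASS and the audit fields come from this document; HitlBypassError, commitProposal, and the in-memory audit array are hypothetical shapes used for illustration.

```typescript
class HitlBypassError extends Error {
  readonly code = "MELMASTOON.AI.HITL_BYPASS";
}

type Proposal = { capability: string; hitlRequired: boolean };

type Decision = { decisionId: string; decisionBy: string; timestamp: string };

// Commits an AI proposal. HITL-gated capabilities must carry a human decision.
function commitProposal(proposal: Proposal, decision: Decision | undefined, auditLog: Decision[]): "committed" {
  if (proposal.hitlRequired && !decision) {
    throw new HitlBypassError(`${proposal.capability} requires a decisionId`);
  }
  if (decision) auditLog.push(decision); // audit trail captures who decided and when
  return "committed";
}
```

A test for a gated capability then covers three paths: commit without a decision (expects the bypass code), commit with a decision (expects success), and inspection of the audit log entry.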

11.7 Edge-vs-cloud equivalence

For every capability offered both ways (e.g., housekeeping order, basic forecasting, anomaly heuristics), an equivalence test asserts the edge ONNX result is within a configured tolerance of the cloud Vertex result on a fixed corpus. Drift beyond tolerance triggers a retraining backlog item.
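
One simple way to express "within a configured tolerance" is the maximum absolute difference over the fixed corpus. This is only one possible metric, chosen here for illustration; the function names are hypothetical, and the real harness may use a per-capability measure instead.

```typescript
// Largest absolute gap between edge and cloud outputs over the same corpus.
function maxAbsDiff(edge: number[], cloud: number[]): number {
  if (edge.length !== cloud.length) throw new Error("corpus size mismatch");
  return edge.reduce((worst, value, i) => Math.max(worst, Math.abs(value - cloud[i])), 0);
}

function withinTolerance(edge: number[], cloud: number[], tolerance: number): boolean {
  return maxAbsDiff(edge, cloud) <= tolerance;
}
```

A maximum-based check fails on a single badly divergent sample, which suits safety-relevant outputs; a mean-based check would be more forgiving of isolated outliers.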

11.8 Regression detection

  • Each capability's nightly run produces a metric (e.g., F1 for anomaly classifiers, RMSE for forecasts, BLEU/ROUGE for messages, contrast pass/fail for theme adjustments).
  • Metrics are stored; significant regression versus rolling baseline (configurable) opens an automated issue, blocks new prompt promotion, and notifies the owning team.

12. Frontend Test Strategy

Web, mobile, and desktop frontends each have a tailored stack while sharing the design system, i18n bundle, and AI provenance contract.

12.1 Web (Next.js — Consumer + Tenant Booking)

  • Unit & component: Vitest + Testing Library (React) + jsdom. Coverage ≥ 80% lines on app/, components/, hooks/, services/.
  • E2E: Playwright. Run on every PR (smoke) and nightly (full).
  • Visual regression: Chromatic against Storybook stories; every component story must ship LTR + RTL variants.
  • A11y: axe-playwright on every E2E and per-story axe-storybook in CI.
  • Cross-locale: Per-locale snapshots for at least Pashto, Dari, Persian, Arabic, English, Tajik on key screens.
  • Performance: Lighthouse CI per page with budgets enforced.

12.2 Mobile (React Native — Consumer + Tenant Booking + self-check-in)

  • Unit & component: Jest + Testing Library (RN). Coverage ≥ 80%.
  • E2E: Detox on emulators in CI (Android API 30+, iOS 16+).
  • Cross-locale: Same locales as web.
  • A11y: Native a11y APIs covered; manual VoiceOver/TalkBack smoke per release.
  • Performance: RN profiler smoke; cold-start budget per platform.

12.3 Desktop (Electron — backoffice)

  • Unit & component: Vitest + Testing Library (React) + jsdom for the renderer.
  • E2E: Playwright Electron driving the real packaged app.
  • Offline: dedicated Playwright Electron suites under test/e2e/offline/; see §8.
  • NVDA smoke (Windows): scripted screen-reader smoke per release wave for the front-desk core flows.
  • CSP & contextBridge: unit tests assert the window.melmastoon surface is exhaustive and that no other globals leak; CSP violation tests load malicious payloads and assert the renderer rejects them.
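The surface-exhaustiveness check can be sketched as a pure function over a window-like object, which a Playwright Electron test could feed with values read from the renderer. The member names in `DECLARED_BRIDGE_API` are hypothetical; the forbidden globals are the two named in §14.2.

```typescript
// Hypothetical declared bridge surface; the real list belongs with the preload script.
const DECLARED_BRIDGE_API = ["sync", "printer", "keys"];
const FORBIDDEN_GLOBALS = ["require", "ipcRenderer"];

// Reports every mismatch between the declared and the actually exposed surface.
function bridgeSurfaceProblems(win: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const bridge = win["melmastoon"];
  const exposed = typeof bridge === "object" && bridge !== null ? Object.keys(bridge) : [];
  for (const name of exposed) {
    if (DECLARED_BRIDGE_API.indexOf(name) === -1) problems.push(`undeclared bridge member: ${name}`);
  }
  for (const name of DECLARED_BRIDGE_API) {
    if (exposed.indexOf(name) === -1) problems.push(`missing bridge member: ${name}`);
  }
  for (const name of FORBIDDEN_GLOBALS) {
    if (name in win) problems.push(`forbidden global present: ${name}`);
  }
  return problems;
}
```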

12.4 Visual regression (cross-surface)

  • Chromatic stories per component in LTR + RTL × at least three locales.
  • Story library shared across web and Electron renderer where components are common.
  • Manual review queue; tolerance thresholds tuned per component.

12.5 Cross-locale tests

| Surface | Required locales tested in CI |
| --- | --- |
| Consumer Web | ps, fa, ar, en, tjk, fr |
| Consumer Mobile | ps, fa, ar, en |
| Tenant Booking Web | ps, fa, ar, en, tjk, fr |
| Tenant Booking Mobile | ps, fa, ar, en |
| Electron Desktop | ps, fa, ar, en, tjk |
| Notifications | per template, per locale |

RTL screenshot tests are mandatory on every UI PR; the Definition of Done explicitly enforces this.


13. Performance Testing

13.1 k6 scenarios per BFF

| Scenario | BFF / service | Goal | Pass criteria |
| --- | --- | --- | --- |
| Search storm | bff-consumer-service | Burst-of-search load on meta layer | p95 < 1500 ms · err rate < 0.5% at 100 RPS sustained |
| Booking burst | bff-tenant-booking-service | Concurrent booking flows during a sale | p95 < 1200 ms on quote · < 800 ms on confirm step (excluding 3DS) at 50 RPS |
| Sync flood | bff-backoffice-service | Many desktops reconnecting after a regional outage | p95 sync push < 2 s at 200 concurrent desktops; conflict rate within tolerance |
| AI gateway storm | ai-orchestrator-service | Surge in AI calls | p95 < 3 s · cost within budget; circuit-breakers engage on provider 5xx surge |
| Inventory hot key | inventory-service | Peak demand on one room type | No oversell · throughput within target |
| Webhook storm | payment-gateway-service | Provider replays many webhooks | All processed exactly once · idempotency-key check passes |
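To make the pass criteria concrete, the sketch below evaluates the "Search storm" row against recorded samples. k6 can enforce such criteria itself through its thresholds option; this standalone version only illustrates the arithmetic. The nearest-rank percentile method and the `RunResult` shape are assumptions.

```typescript
// Nearest-rank percentile over recorded latencies in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical summary of one load-test run.
interface RunResult { latenciesMs: number[]; errors: number; requests: number; }

// "Search storm" criteria: p95 < 1500 ms and error rate < 0.5%.
function searchStormPasses(run: RunResult): boolean {
  const p95 = percentile(run.latenciesMs, 95);
  const errorRate = run.errors / run.requests;
  return p95 < 1500 && errorRate < 0.005;
}
```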

13.2 SLO targets (per-service summary; full per-service table in OBSERVABILITY.md)

| Service | p50 latency | p95 latency | Availability | Notes |
| --- | --- | --- | --- | --- |
| iam-service | 50 ms | 200 ms | 99.9% | Authn endpoints sub-200 ms |
| reservation-service | 80 ms | 300 ms (read) / 600 ms (write) | 99.9% | Saga end-to-end < 1.5 s p95 |
| payment-gateway-service | 100 ms | 800 ms | 99.95% | Excludes 3DS challenge; webhook p95 < 200 ms |
| lock-integration-service | 100 ms | 500 ms (issue) | 99.9% | Vendor calls p95 budget per vendor SLA |
| bff-tenant-booking-service | 70 ms | 400 ms | 99.9% | Booking funnel pages |
| bff-consumer-service | 70 ms | 400 ms | 99.9% | Meta search pages |
| bff-backoffice-service | 80 ms | 400 ms (read) / 800 ms (sync) | 99.9% | Sync push < 2 s p95 |
| search-aggregation-service | 50 ms | 300 ms | 99.9% | Index reads |
| ai-orchestrator-service | 200 ms | 3000 ms | 99.5% | AI provider latency dominant |

13.3 Frontend performance budgets

| Page | LCP | INP | CLS | JS gzip |
| --- | --- | --- | --- | --- |
| Consumer meta search list | ≤ 2.5 s | ≤ 200 ms | ≤ 0.1 | ≤ 180 KB |
| Tenant booking landing | ≤ 2.0 s | ≤ 200 ms | ≤ 0.1 | ≤ 200 KB |
| Tenant booking checkout | ≤ 2.0 s | ≤ 200 ms | ≤ 0.05 | ≤ 220 KB |
| Electron desktop dashboard | ≤ 1.5 s (cold) · ≤ 400 ms (warm) | ≤ 100 ms | ≤ 0.05 | n/a |

Budgets are enforced by Lighthouse CI on web and on the Electron renderer; regressions block merge.

13.4 Soak tests (24h)

  • Run nightly against staging.
  • 24h sustained mixed-load profile (search + booking + sync + AI + webhook); assert no unbounded memory growth, no leak in the desktop sync worker, and no sustained Pub/Sub lag.

13.5 Load profile per release wave

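| Wave | Tenants | Concurrent desktops | Concurrent booking flows | AI RPS | Notes |
| --- | --- | --- | --- | --- | --- |
| R1 | 50 | 200 | 50 | 5 | Validate staging-scale viability |
| R2 | 250 | 1,000 | 200 | 25 | AI-native operations on |
| R3 | 1,000 | 5,000 | 1,000 | 100 | Globalisation stage |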

14. Security Testing

14.1 Categories & tooling

| Category | Tool | Trigger |
| --- | --- | --- |
| SAST | CodeQL | every PR |
| SCA | Snyk | every PR + nightly |
| Secrets | gitleaks | pre-commit + CI |
| DAST | OWASP ZAP | nightly against staging |
| Container scan | Trivy + Snyk Container | on image build |
| Pen-test | external firm | pre-launch + per major release |
| Threat-model review | manual | per ADR introducing new boundary |

14.2 Electron-specific security tests

| Test | What it proves |
| --- | --- |
| contextBridge_surface_enumeration | Only window.melmastoon.* is exposed to the renderer; no other globals leak; ipcRenderer and require are absent |
| csp_violation | A malicious script injection attempt is refused by the CSP; the violation is reported to the audit log |
| nodeIntegration_off_assertion | nodeIntegration: false, contextIsolation: true, sandbox: true, webSecurity: true on every BrowserWindow |
| sqlcipher_kdf | Key derivation uses ≥ 256k iterations of PBKDF2-SHA512; tampering with iterations fails the check |
| auto_update_signature_tamper | A fixture flips a byte in the update manifest; the updater refuses to install |
| keytar_secrets | Refresh token + DB key fragment never appear in plaintext on disk |
| deep_link_validation | melmastoon:// deep links are validated against an allowlist; arbitrary file URIs refused |
| serialport_access_scoped | Serial/HID access is mediated through the typed bridge; renderer cannot enumerate devices |
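As one example from the table, the deep_link_validation case could target a validator along these lines. It relies only on the standard URL class; the allowlist entries and the choice to match on the URL host are assumptions, since the document does not describe the deep-link format.

```typescript
// Hypothetical allowlist of deep-link targets; the real list is not given in this document.
const ALLOWED_TARGETS = ["reservation", "folio", "housekeeping"];

function isAllowedDeepLink(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL at all
  }
  if (url.protocol !== "melmastoon:") return false; // refuses file:, http:, javascript:, ...
  if (url.username !== "" || url.password !== "") return false; // no embedded credentials
  return ALLOWED_TARGETS.indexOf(url.hostname) !== -1;
}
```

The test would then feed both allowed links and hostile inputs (file URIs, unknown targets, malformed strings) and assert the boolean result.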

14.3 Auth & authorization tests

  • AuthN: token rotation, refresh family invalidation on reuse, MFA enrolment + enforcement, SSO callback validation, device binding cert lifecycle.
  • AuthZ: per-role matrix tested across every service (positive + negative); tenant scope enforced on every endpoint; ABAC rules tested where present.

14.4 Penetration testing

  • External pen-test pre-Wave R1 launch and every major release thereafter.
  • Scope: all surfaces (Consumer Web, Tenant Booking Web, Mobile, Electron Desktop, Control Plane), all public APIs, IAM, payment, lock, sync.
  • Findings tracked to closure; blocking severity = critical or high.

15. Compliance Testing

15.1 Tax computation matrix

For each supported jurisdiction (AF, TJ, IR, PK, EG, AE, SA, OM, BH, NP, BD, KE, …), a unit test pack asserts:

  • VAT/GST applied at correct rate.
  • Tourism levy applied where required.
  • Tax-exempt cases (diplomatic, gov-issued, tax-free zones) honoured on proof.
  • Snapshot-at-confirm: changes after confirm do not retroactively alter the folio.
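The snapshot-at-confirm rule can be illustrated with a small sketch in bigint micro-units. The rate and levy values used with it are made up for illustration and are not real jurisdiction rates; the truncating rounding and all type and function names are assumptions.

```typescript
// VAT rate in basis points plus a flat tourism levy in micro-units.
interface TaxRule { vatBps: number; tourismLevyMicro: bigint; }
interface TaxSnapshot { rule: TaxRule; netMicro: bigint; taxMicro: bigint; }

function computeTaxMicro(netMicro: bigint, rule: TaxRule, exempt: boolean): bigint {
  if (exempt) return BigInt(0);
  // Integer division truncates; the real rounding policy is not stated here.
  const vat = (netMicro * BigInt(rule.vatBps)) / BigInt(10000);
  return vat + rule.tourismLevyMicro;
}

// At confirm time the rule is copied, so later rule changes cannot alter the folio.
function snapshotAtConfirm(netMicro: bigint, rule: TaxRule, exempt = false): TaxSnapshot {
  const frozen: TaxRule = { vatBps: rule.vatBps, tourismLevyMicro: rule.tourismLevyMicro };
  return { rule: frozen, netMicro, taxMicro: computeTaxMicro(netMicro, frozen, exempt) };
}
```

A unit test can mutate the live rule after calling `snapshotAtConfirm` and assert that the snapshot's `taxMicro` and `rule` are unchanged.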

15.2 Audit log integrity

  • Every state-changing event is recorded in the audit log.
  • Daily Merkle anchoring job writes a root and stores it.
  • Test asserts inclusion proofs work for entries days/weeks back.
  • Tampering with a stored entry breaks proofs.
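The inclusion-proof and tamper assertions can be understood through a minimal Merkle tree sketch. The document does not specify the tree construction; SHA-256, the "leaf:" and "node:" prefixes, and pairing an odd node with itself are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string): string => createHash("sha256").update(data).digest("hex");

interface ProofStep { sibling: string; siblingOnLeft: boolean; }

// Builds every level of the tree, leaves first; the last level holds the root.
function buildLevels(entries: string[]): string[][] {
  let level = entries.map((e) => sha256("leaf:" + e));
  const levels = [level];
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = i + 1 < level.length ? level[i + 1] : level[i]; // odd node pairs with itself
      next.push(sha256("node:" + level[i] + right));
    }
    levels.push(next);
    level = next;
  }
  return levels;
}

// Collects the sibling hash at each level on the path from leaf to root.
function proofFor(levels: string[][], index: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let i = index;
  for (let d = 0; d < levels.length - 1; d++) {
    const level = levels[d];
    const siblingIndex = i % 2 === 0 ? Math.min(i + 1, level.length - 1) : i - 1;
    proof.push({ sibling: level[siblingIndex], siblingOnLeft: i % 2 === 1 });
    i = Math.floor(i / 2);
  }
  return proof;
}

// Recomputes the root from the entry and its proof and compares it to the anchored root.
function verifyInclusion(entry: string, proof: ProofStep[], root: string): boolean {
  let hash = sha256("leaf:" + entry);
  for (const step of proof) {
    hash = step.siblingOnLeft
      ? sha256("node:" + step.sibling + hash)
      : sha256("node:" + hash + step.sibling);
  }
  return hash === root;
}
```

In this sketch, changing a stored entry changes its leaf hash, so the recomputed root no longer matches the anchored one and the proof fails.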

15.3 Data residency tests

  • Cross-region writes are rejected.
  • Background jobs respect tenant region.
  • Reporting and analytics aggregations stay within region (or are regional-fanned).

15.4 GDPR-style erasure (Phase 2)

  • Erasure request flow tested end-to-end against a sandbox tenant.
  • PII fields tombstoned; legal/financial records remain (with reference IDs only).
  • Erasure certificate generated and verifiable.

16. Accessibility Testing

16.1 Target & tooling

  • Target: WCAG 2.2 AA on every user-facing surface, every locale.
  • Tooling: axe-core via axe-playwright (every E2E) and axe-storybook (every component story).
  • Manual smoke: NVDA on Windows for the Electron desktop core flows; VoiceOver on macOS for the Tenant Booking web on Safari; per release wave.

16.2 Per-surface checklist

SurfaceA11y CI checksManual smoke
Consumer Webaxe-playwright on E2E suite + axe-storybookper release wave
Tenant Booking Webaxe-playwright on E2E + axe-storybookper release wave
Consumer MobileRN a11y test props + Detox a11y assertionsper release wave
Electron Desktopaxe-playwright (renderer)NVDA per release wave
Notifications (email)template-level lint (alt text, link text, contrast)per release wave

16.3 Locale-aware a11y

  • Reading order verified in RTL locales (Pashto, Dari, Persian, Arabic).
  • Screen reader pronunciation smoke for at least Pashto + Dari content blocks (NVDA + Espeak voice).
  • Mixed-direction text bidi rendering verified.

17. Internationalization Testing

17.1 Required coverage

  • Every UI surface is tested in at least Pashto + Dari + English in CI; additional locales (Persian, Arabic, French, Tajik) are tested per release wave.
  • Bidi text rendering tested with mixed PS/EN strings, AR/EN strings, and bidi-isolating UI fragments.
  • Locale-aware date formats tested per calendar (Gregorian + Hijri + Solar Hijri presentational variants).
  • Number formatting tested per locale (decimal separator, grouping, percent).
  • Currency formatting tested per currency (symbol, position, grouping).
  • Numeral systems tested: Latin numerals across all locales by default; Arabic-Indic supported on display where tenant prefers; inputs always Latin numerals for safety.
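One way to keep inputs in Latin numerals is to normalise digits at the input boundary, as in the sketch below. Whether the product normalises or rejects non-Latin digits is not stated in this document, so treat the approach and the `toLatinDigits` name as assumptions.

```typescript
// Maps Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9)
// digits to Latin digits and leaves every other character untouched.
function toLatinDigits(input: string): string {
  return input.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    const base = code >= 0x06f0 ? 0x06f0 : 0x0660;
    return String(code - base);
  });
}
```

A numeral-system test could assert that a field typed with Arabic-Indic or Persian digits is stored with the same value in Latin digits.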

17.2 RTL screenshot regression

  • Every component story has a storyName--rtl variant.
  • Chromatic regression catches mirroring defects.
  • Logical CSS only (padding-inline, margin-block, inset-inline-start); per-PR lint rejects padding-left/right and margin-left/right.
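The per-PR lint on physical properties could be approximated by a line scanner such as the one below. This is a simplified sketch using a regular expression; a real setup would more likely rely on a dedicated CSS linter, and `physicalCssViolations` is a hypothetical name.

```typescript
// Matches the physical properties named in the rule above when used as declarations.
const PHYSICAL_PROPERTY = /(^|[;{\s])(padding|margin)-(left|right)\s*:/;

// Returns the 1-based line numbers that declare a physical padding or margin.
function physicalCssViolations(css: string): number[] {
  const bad: number[] = [];
  css.split("\n").forEach((line, i) => {
    if (PHYSICAL_PROPERTY.test(line)) bad.push(i + 1);
  });
  return bad;
}
```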

17.3 Translation completeness gate

  • A CI job parses extracted strings against per-locale bundles; missing keys for active locales emit warnings; missing keys for tenant-default locales fail the job.
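The warn-versus-fail split described above can be sketched as follows. The `Bundle` and `GateResult` shapes and the `translationGate` name are hypothetical; only the rule itself (warn for active locales, fail for tenant-default locales) comes from the text.

```typescript
type Bundle = Record<string, string>;

interface GateResult { warnings: string[]; failures: string[]; }

// bundles: locale -> translation bundle, for active locales only.
function translationGate(
  extractedKeys: string[],
  bundles: Record<string, Bundle>,
  tenantDefaultLocales: string[],
): GateResult {
  const result: GateResult = { warnings: [], failures: [] };
  for (const locale of Object.keys(bundles)) {
    const isDefault = tenantDefaultLocales.indexOf(locale) !== -1;
    for (const key of extractedKeys) {
      if (!(key in bundles[locale])) {
        (isDefault ? result.failures : result.warnings).push(`${locale}: ${key}`);
      }
    }
  }
  return result;
}
```

The CI job would exit non-zero only when `failures` is non-empty and print `warnings` otherwise.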

18. Test Data Strategy

18.1 @ghasi/test-data synthetic tenant generator

  • A monorepo package generates realistic tenants (50+ seeded for staging) deterministically from a seed.
  • Outputs:
    • Tenant + property hierarchy (single-property, multi-property, chain operator).
    • Rooms with realistic room types and counts (10 / 30 / 80 / 200 room properties).
    • Rate plans across BAR, weekly, government, corporate, non-refundable.
    • Reservations across held / confirmed / checked_in / checked_out / cancelled / no_show with realistic distributions.
    • Folios with charges, payments, refunds matching reservations.
    • Housekeeping tasks, maintenance tickets, key credentials.
    • Multi-locale content blocks (ps, fa, ar, en, tjk).

18.2 Determinism + safety

  • Seed-based; the same seed produces the same tenant pack; safe to recreate.
  • PII-free. All names, phones, emails, IDs are synthetic; never derived from real users.
  • Sandbox prefixes (e2e-<runId>-…) for ephemeral test tenants; janitor jobs purge within 1 hour.
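Seed-based determinism can be sketched with a small seeded pseudo-random generator. The generator below is a mulberry32-style PRNG chosen for brevity; the actual @ghasi/test-data implementation is not shown in this document, and `generateTenants` and `SyntheticTenant` are hypothetical.

```typescript
// Small seeded PRNG (mulberry32-style): same seed, same sequence.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SyntheticTenant { id: string; roomCount: number; locale: string; }

const ROOM_COUNTS = [10, 30, 80, 200]; // property sizes listed in §18.1
const LOCALES = ["ps", "fa", "ar", "en", "tjk"];

// Every value is drawn from the seeded generator, never from Math.random or real data.
function generateTenants(seed: number, count: number, runId: string): SyntheticTenant[] {
  const rand = seededRandom(seed);
  function pick<T>(items: T[]): T {
    return items[Math.floor(rand() * items.length)];
  }
  const tenants: SyntheticTenant[] = [];
  for (let i = 0; i < count; i++) {
    tenants.push({ id: `e2e-${runId}-tenant-${i}`, roomCount: pick(ROOM_COUNTS), locale: pick(LOCALES) });
  }
  return tenants;
}
```

Because the output depends only on the seed and arguments, a failing run can be reproduced by re-running with the same seed.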

18.3 Where to use which dataset

| Environment | Dataset |
| --- | --- |
| Local dev | small synthetic pack (3 tenants, 30 rooms total) |
| CI ephemeral | per-PR pack created on demand |
| Staging | 50+ tenants seeded on rebuild; refreshed weekly |
| Pre-production | mirror of staging plus a staged-canary tenant |
| Production | no synthetic tenants except dedicated synthetic-monitor sandbox tenant (heavily isolated) |

18.4 Data refresh policy

  • Staging refresh weekly, with the option to restore on demand.
  • Per-PR ephemeral environments seeded fresh on environment provisioning.
  • Production never receives synthetic data outside the sandbox tenant.

19. CI/CD Integration

19.1 Per-service pipeline

lint → typecheck → unit (with coverage) → integration (Testcontainers) → contract (Pact) →
coverage gate → mutation (changed files) → image build → vulnerability scan

A failure in any step blocks merge. The coverage gate applies per-package, per-layer thresholds (see §1.1).

19.2 Per-frontend pipeline

lint → typecheck → unit → component → axe-storybook → visual (Chromatic) → e2e (smoke) →
RUM bundle size assertion

Frontend E2E runs against an ephemeral preview environment provisioned on Cloud Run with per-PR namespacing.

19.3 PR-touched service deeper run

  • A PR changing service X triggers:
    • Full integration suite for X.
    • Pact verification for X as provider against current consumer contracts.
    • Pact publication for X as consumer.
    • E2E smoke covering journeys X participates in.
    • Mutation testing on changed /domain/** files.

19.4 Nightly pipeline

full E2E (web + mobile + desktop, including offline) →
soak (24h k6 mixed load) →
AI eval suite (against staging ai-orchestrator) →
DAST scan (OWASP ZAP) →
SCA full + container vuln scan →
Chromatic full library →
synthetic monitor results aggregation

Any failure automatically opens an issue in the on-call queue.

19.5 Release pipeline

  • Image promotion staging → pre-prod → prod.
  • Canary 5% for 30 minutes; auto-rollback on SLO regression.
  • Manual approval gate for prod beyond canary (until automation matures).

19.6 Branch strategy & test gates

| Branch | Required gates |
| --- | --- |
| Feature branches | unit + component + lint + typecheck (locally + on push) |
| PR open | full per-service + per-frontend pipeline above |
| main merged | nightly suites + image promotion to staging |
| release/* | canary rollout + smoke + release-wave gate |

20. Test Environments

| Env | Purpose | Backed by | Data | Notes |
| --- | --- | --- | --- | --- |
| local | Engineer development & quick iteration | Docker Compose with Postgres, Redis, Pub/Sub emulator, MinIO; or Cloud Code | small synthetic | Engineers run unit + integration locally; opt-in to E2E |
| CI ephemeral | Per-PR isolated test environment | Cloud Run namespaced + Cloud SQL temporary instances | per-PR seed | Lifecycle managed by CI; teardown after merge or 24h idle |
| staging | Full production-like, shared | GCP project: ghasi-staging | seeded synthetic, refreshed weekly | Where E2E full + soak + AI eval + DAST run nightly |
| pre-production | Canary + final QA | GCP project: ghasi-preprod | mirror of staging + canary tenant | Used for release validation |
| production | Live | GCP project: ghasi-prod | real tenants | Synthetic monitors run; no destructive tests |

20.1 Environment safety

  • Every non-prod environment has a banner on every surface ("STAGING" / "CI" / "PREPROD").
  • Environments are siloed: secrets, DBs, buckets are namespaced.
  • No test other than synthetic monitors may run against production.

20.2 Test data isolation

  • Each test creates and cleans up its own data; no shared state across tests.
  • Sandbox tenants prefixed e2e-<runId>- are purged hourly by a janitor job.

21. Synthetic Monitoring

Synthetic monitoring proves that production keeps working from outside, not just inside.

21.1 Probes

| Probe | Frequency | What it tests |
| --- | --- | --- |
| Uptime | every 1 min from 5 regions | Public health endpoints respond 200 |
| Booking-flow synthetic (web) | every 5 min | Sandbox tenant; full booking flow with sandbox payment |
| Booking-flow synthetic (mobile) | every 30 min | RN test rig executes booking against sandbox |
| Sync API synthetic | every 5 min | Run a fake desktop client through pull + push round-trip |
| AI gateway synthetic | every 10 min | Send a known prompt; assert structured output, latency, cost |
| Lock vendor synthetic | every 15 min | Issue + revoke a credential against a vendor sandbox |
| Payment webhook synthetic | every 15 min | Replay a known webhook into the staging endpoint; assert state |

21.2 Alerting & routing

  • All synthetic probe failures route to PagerDuty with a 2-cycle threshold (avoid single-flake noise).
  • The on-call has a runbook linked from every alert.
  • Synthetics run against a dedicated synthetic-monitor sandbox tenant that is isolated from real tenants.
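The 2-cycle threshold can be read as "page only after two consecutive failed cycles of the same probe". A minimal sketch under that reading follows; paging once when the threshold is first reached (rather than on every later failure) is an assumption, and `ProbeAlerter` is a hypothetical name.

```typescript
// Tracks consecutive failures per probe and reports when paging should fire.
class ProbeAlerter {
  private readonly threshold: number;
  private readonly failures: Record<string, number> = {};

  constructor(threshold = 2) {
    this.threshold = threshold;
  }

  // Returns true when this result should page the on-call.
  record(probe: string, ok: boolean): boolean {
    if (ok) {
      this.failures[probe] = 0; // any success resets the streak
      return false;
    }
    const streak = (this.failures[probe] || 0) + 1;
    this.failures[probe] = streak;
    return streak === this.threshold;
  }
}
```

A single flaky cycle followed by a success never pages, which is the noise the threshold is meant to avoid.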

22. Release Quality Gates

Each release wave (R1, R2, R3) has hard quality gates. Missing any gate blocks promotion.

22.1 Per-wave gate matrix

GateR1R2R3
Coverage thresholds met
All P0 E2E green for 7 nights
Offline E2E (1h, 24h) green
Performance baseline within budget
Pen-test pass (no critical/high)
Multi-tenant isolation tests green on every service
Saga compensation tests green
AI eval baseline within tolerancen/a (R1 limited AI)
Accessibility audit complete (axe + NVDA + manual)
Localisation test complete for active locales
24h soak test passes
Backup/restore drill within last 90d
Lock vendor adapter tests pass for in-scope vendorsTTLock + Wiegand+ Salto+ Assa Abloy
Payment adapter tests pass for in-scope railsStripe + cash + PayPal+ MFS+ region-specific rails
Multi-region failover drilln/an/a

22.2 Sign-off

  • Engineering Manager + Tech Lead + Security + SRE sign off per release wave.
  • Sign-off references the gate matrix; missing checks block release.

23. Bug Triage & SLAs

23.1 Severity definitions

| Severity | Definition | Examples |
| --- | --- | --- |
| S0 (Critical) | Service down, data loss, security breach, charge-without-reservation, locked-out guest at door, multi-tenant data leak | global IAM outage, payment double-charge, key revoke fails for revoked credential |
| S1 (High) | Major journey broken for many tenants; significant degradation | booking confirm fails for tenants in region X, sync conflict storm |
| S2 (Medium) | Limited journey impact; workaround exists | UI bug on a single page, AI suggestion incorrect for rare case |
| S3 (Low) | Cosmetic, minor inconvenience, internal-only | typo, low-impact log noise |

23.2 SLA targets

| Severity | Response time | Resolution target |
| --- | --- | --- |
| S0 | 15 minutes | 4 hours |
| S1 | 1 hour | 24 hours |
| S2 | 1 business day | 1 week |
| S3 | 1 week | scheduled |

23.3 Blocker policy

  • S0 blocks any release; promotion paused until resolved or risk-accepted (only by Eng Manager + Security).
  • S1 blocks release-wave promotion.
  • S2 bugs are tracked into the next sprint; a release can ship with them if they are not regressions.
  • S3 bugs are tracked into the backlog.

23.4 Regression policy

  • Any bug traced to a regression in the last 30 days requires a regression test in the fixing PR.
  • Regression count per release wave is a tracked metric; targets in OBSERVABILITY.md.

24. Test Ownership

Test ownership maps to responsibility, not bureaucracy.

| Surface / asset | Owner | Examples |
| --- | --- | --- |
| Service unit + integration + contract tests | Service team | iam-service team owns iam unit, integration, Pact tests |
| Multi-tenant isolation tests per service | Service team | enforced by CI gate; service owner can't merge without |
| Saga tests | Owning saga's primary service team (reservation-service for booking saga) | reservation-service team owns booking-saga harness |
| Sync engine tests | bff-backoffice-service team + Platform | shared ownership; service teams contribute aggregate-specific cases |
| AI eval harness | Platform AI team | owns the harness; service teams own per-capability YAML cases |
| Performance | Platform / SRE | k6 scenarios + budgets; service teams contribute SLO refinements |
| Security (SAST/DAST/SCA/secrets) | Security team | tooling + policies + pen-test coordination |
| E2E + journey tests | Product / QA + service teams | journey ownership is split by primary persona |
| Accessibility | Frontend Platform + service teams | tooling + per-component a11y |
| Localisation | Frontend Platform + content team | bundle correctness + per-locale review |
| Synthetic monitors | SRE | operates the synthetic sandbox tenant; routes alerts |
| Test data generator (@ghasi/test-data) | Platform | maintains generators + seeded packs |

24.1 RACI summary

| Asset | R | A | C | I |
| --- | --- | --- | --- | --- |
| Service unit/integration tests | Service team | Tech Lead | QA | Eng Manager |
| Pact provider/consumer | Service team | Tech Lead | adjacent service teams | Architects |
| Saga tests | Saga primary service team | Tech Lead | participating teams | Architects |
| AI eval YAMLs | Service team | AI Platform Lead | AI capability owner | Compliance |
| Performance | SRE | SRE Lead | Service teams | Eng Manager |
| Security scans | Security | Security Lead | Service teams | Eng Manager + Compliance |
| E2E & journeys | QA + Product | Product Lead | Service teams | Eng Manager |
| A11y | Frontend Platform | Frontend Lead | Service teams | Compliance |
| Synthetic monitors | SRE | SRE Lead | Service teams | Eng Manager |

25. Anti-Patterns

The following are forbidden in this codebase. They are listed here because they are common, real, and damaging in production hotel platforms.

25.1 Flaky tests left enabled

  • A flaky test is a bug. Quarantine it within 24 h with a linked issue; fix or delete within one sprint.
  • Quarantined tests run in a separate non-blocking suite; persistent quarantine triggers a review.
  • "Sometimes-passes" is not green. CI treats flakes as failures.

25.2 A single shared dataset across tests

  • Shared mutable state is a coupling that hides bugs and makes failures non-reproducible.
  • Every test creates and cleans up its own data; per-test transactional rollback is the default for SQL integration tests.
  • Sandbox tenants are scoped per test run with a janitor purging within 1 hour.

25.3 Mocking your own production code

  • Test doubles are for boundaries (vendor SDKs, network, time, randomness).
  • Do not mock your own application services or domain code; that tests the mock, not the system.
  • Pact and Testcontainers cover the boundaries you might otherwise be tempted to mock.

25.4 Testing UI implementation details

  • React Testing Library queries by accessibility role and label, not by class name or component internals.
  • Snapshot tests are forbidden for non-trivial UIs (they go stale and get rubber-stamped).
  • Visual regression covers what snapshots pretend to.

25.5 Skipping tenant-isolation tests because "it works"

  • "It works in staging" is not evidence; the next migration may flip a flag.
  • Tenant isolation tests are mandatory on every service that touches tenant data; CI fails the job if missing.
  • A multi-tenant data leak is a company-ending incident in this market segment.

25.6 Skipping offline tests because "the cloud will be up"

  • The cloud will not be up. The thesis of the product is that the operator continues regardless.
  • Offline tests are mandatory for the Electron desktop on every offline-capable flow.
  • A regression that breaks offline behaviour is treated as S0 because it kills the product's defining promise.

25.7 AI without provenance

  • Every AI artifact carries aiProvenance; missing-provenance commits fail CI via a static check on the AI client surface.
  • Every irreversible AI action passes through HITL with a decisionId; bypass attempts fail loudly with MELMASTOON.AI.HITL_BYPASS.

25.8 PII in logs / events

  • Logs and events carry IDs, not PII (no guest names, no PANs, no key credential codes, no JWTs, no lock-pairing secrets).
  • A static scanner runs on the log shipper; matches block deploys.
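A scanner of this kind might combine pattern matching with a checksum to limit false positives. The sketch below covers only two of the listed categories (PANs and JWTs); the patterns are heuristics chosen for illustration, and `piiFindings` is a hypothetical name.

```typescript
// Luhn checksum, used to cut false positives when looking for card numbers (PANs).
function passesLuhn(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Heuristics (assumptions): three base64url segments starting with "eyJ" for JWTs,
// 13 to 19 consecutive digits for PAN candidates.
const JWT_PATTERN = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/;
const PAN_CANDIDATE = /\b\d{13,19}\b/g;

// Returns the categories of sensitive data detected in one log line.
function piiFindings(logLine: string): string[] {
  const findings: string[] = [];
  if (JWT_PATTERN.test(logLine)) findings.push("jwt");
  const candidates = logLine.match(PAN_CANDIDATE) || [];
  if (candidates.some((c) => passesLuhn(c))) findings.push("pan");
  return findings;
}
```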

25.9 Untyped IDs

  • All aggregate IDs are branded types (TenantId, ReservationId, KeyCredentialId, …).
  • Raw string IDs are forbidden in domain and application layers; the eslint-no-string-id rule enforces this.
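For readers unfamiliar with the pattern, a branded type can be sketched as follows. The `Brand` helper, the factory functions, and the `tnt_` / `res_` ID formats are hypothetical; only the type names TenantId and ReservationId come from this document.

```typescript
// A brand is a compile-time-only tag; at runtime the value is still a plain string.
type Brand<T, B extends string> = T & { readonly __brand: B };

type TenantId = Brand<string, "TenantId">;
type ReservationId = Brand<string, "ReservationId">;

// Factories are the only place where a raw string becomes a branded ID.
function toTenantId(raw: string): TenantId {
  if (!/^tnt_[a-z0-9]+$/.test(raw)) throw new Error(`invalid TenantId: ${raw}`);
  return raw as TenantId;
}

function toReservationId(raw: string): ReservationId {
  if (!/^res_[a-z0-9]+$/.test(raw)) throw new Error(`invalid ReservationId: ${raw}`);
  return raw as ReservationId;
}

function reservationKey(tenant: TenantId, reservation: ReservationId): string {
  return `${tenant}/${reservation}`;
}

// reservationKey(toReservationId("res_1"), toTenantId("tnt_1")) would not compile:
// the arguments are swapped, and the brands turn that into a type error.
```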

25.10 Money as float

  • All money is bigint micro-units; columns suffixed _micro.
  • Float arithmetic on money is forbidden; MoneyVO is the only allowed surface.
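The reason for the rule is visible in one line of JavaScript: 0.1 + 0.2 is not equal to 0.3 in floating point. The sketch below shows how a value object over bigint micro-units avoids this. It is a stand-in for illustration only; the real MoneyVO API is not shown in this document, and the class and method names here are hypothetical.

```typescript
// Minimal money value object: amounts are bigint micro-units (1 unit = 1,000,000 micro).
class Money {
  readonly amountMicro: bigint;
  readonly currency: string;

  constructor(amountMicro: bigint, currency: string) {
    this.amountMicro = amountMicro;
    this.currency = currency;
  }

  // Parses a decimal string without ever going through a float.
  static fromDecimalString(value: string, currency: string): Money {
    const match = /^(-?)(\d+)(?:\.(\d{1,6}))?$/.exec(value);
    if (match === null) throw new Error(`invalid amount: ${value}`);
    const fraction = (match[3] || "").padEnd(6, "0");
    const micro = BigInt(match[2]) * BigInt(1000000) + BigInt(fraction);
    return new Money(match[1] === "-" ? -micro : micro, currency);
  }

  add(other: Money): Money {
    if (other.currency !== this.currency) throw new Error("currency mismatch");
    return new Money(this.amountMicro + other.amountMicro, this.currency);
  }
}
```

With this sketch, adding "0.1" and "0.2" yields exactly 300,000 micro-units.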

25.11 Last-write-wins on monetary or inventory state

  • Conflict resolution per aggregate is declared in services/<svc>/SYNC_CONTRACT.md.
  • Monetary or inventory state is never LWW; deterministic policies (server-wins, operator-decides, additive-only) apply.

25.12 Direct vendor SDK imports outside the owning service

  • openai, @google-cloud/vertexai, anthropic are forbidden outside ai-orchestrator-service.
  • TTLock / Salto / Assa Abloy SDKs are forbidden outside lock-integration-service.
  • Stripe / PayPal / MFS SDKs are forbidden outside payment-gateway-service.
  • A static check on imports fails CI on violation.
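A static import check along these lines could be as simple as the sketch below. It is limited to the three AI packages that the rules above name explicitly, since the lock and payment SDK package names are not given; the regular expression is a simplification of real import parsing, and `forbiddenImports` is a hypothetical name.

```typescript
// Package -> owning service, limited to the package names spelled out above.
const OWNED_PACKAGES: Record<string, string> = {
  "openai": "ai-orchestrator-service",
  "@google-cloud/vertexai": "ai-orchestrator-service",
  "anthropic": "ai-orchestrator-service",
};

// Captures the module specifier of static imports, dynamic imports, and require calls.
const IMPORT_PATTERN = /(?:from\s+|require\(\s*|import\(\s*|import\s+)["']([^"']+)["']/g;

// Returns the forbidden specifiers found in `source` for code living in `service`.
function forbiddenImports(service: string, source: string): string[] {
  const found: string[] = [];
  IMPORT_PATTERN.lastIndex = 0;
  let match: RegExpExecArray | null = IMPORT_PATTERN.exec(source);
  while (match !== null) {
    const spec = match[1];
    for (const pkg of Object.keys(OWNED_PACKAGES)) {
      const hit = spec === pkg || spec.indexOf(pkg + "/") === 0;
      if (hit && OWNED_PACKAGES[pkg] !== service) found.push(spec);
    }
    match = IMPORT_PATTERN.exec(source);
  }
  return found;
}
```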

25.13 Cross-service DB joins

  • Every service owns its data. No cross-service DB reads.
  • Read models are projected from events.
  • Static analysis of Postgres role grants asserts no cross-schema grants exist.

25.14 ".only", ".skip", "console.log", "debugger" in committed tests

  • A pre-commit hook and a CI grep catch these and block the commit or merge.
  • A skipped test must be removed or quarantined with a tracking issue.

26. Cross-References