
02 — Enterprise Architecture

Companion: 01 Product Overview · 03 Microservices Catalog · 04 Event-Driven Architecture · 05 API Design · 07 Security & Tenancy · 08 AI Architecture · 09 Lock & Key Integration · 12 Desktop Spec · ADR-0001 Core Stack · ADR-0002 Multi-Tenancy · ADR-0003 Electron Offline-First · ADR-0004 Lock Abstraction

This document is the canonical enterprise-architecture view of Ghasi Melmastoon. It defines business architecture, application layering, bounded contexts, microservice topology, BFF strategy, multi-tenancy, the event backbone, the offline-sync model, AI placement, GCP reference topology, security posture, resilience and cost posture, compliance, and the anti-patterns we explicitly forbid. All downstream documents (per-service bundles, ADRs, frontend, journeys, testing) defer to this document for strategic architecture statements.


1. Business Architecture

1.1 Stakeholders & Personas

| Side | Persona | Primary Goals |
| --- | --- | --- |
| Guest | Independent Traveler | Find a room near a place, in budget, on dates; book without leaving the platform; pay how the local market allows |
| Guest | Business Traveler | Book repeatedly with the same tenant; folio receipt; smooth check-in even on poor network |
| Tenant | Hotel Owner | Onboard property fast; control branding; see revenue truthfully across channels and currencies |
| Tenant | General Manager | Daily KPIs; AI insights; staff productivity; intervention on at-risk reservations |
| Tenant | Front Desk | Check-in / out; walk-in booking; key issuance; folio updates even when offline |
| Tenant | Housekeeping | Receive room assignments; flip status; raise maintenance tickets |
| Tenant | Maintenance | Triage tickets; close work orders with parts/labor |
| Tenant | Finance / Admin | Reconcile cash + card + MFS; invoices; tax exports; payouts |
| Tenant | Chain Operator | Cross-property views; central rate strategy; consolidated reporting |
| Internal | Platform Admin | Tenant lifecycle; billing reconciliation; abuse handling |
| Internal | Compliance Officer | Audit logs; AI provenance review; data subject requests; lock-credential isolation audits |
| Partner | Lock Vendor | Pluggable adapter implementing the LockPort; no platform code changes per vendor |
| Partner | Payment Provider | Pluggable adapter via payment-gateway-service; cash, card, MFS, PayPal, Visa |

1.2 Value Streams

The platform is organized around four value streams. Each is a sequence of capabilities; each capability is owned by one or two bounded contexts.

  1. Discover → Book (Guest stream) — meta-search, filter, compare, map, select, render tenant theme, choose room/rate, capture details, pay (PayPal / Visa / Debit / cash-on-arrival / MFS), confirm.
  2. Operate → Sync → Optimize (Backoffice stream) — open shift, work the day from the Electron desktop app (offline if needed), check guests in, dispatch housekeeping, reconcile folios, sync deltas back to the cloud, surface AI insights for the next shift.
  3. Configure → Theme → Launch (Tenant onboarding stream) — create tenant, attach property, define rooms / rate plans / policies, customize theme tokens, choose layout presets, content blocks, locales (RTL/LTR), payment + lock vendors, go live.
  4. Audit → Comply (Platform stream) — capture immutable audit log, surface AI provenance, run data-residency proofs, fulfill GDPR DSARs, isolate lock credentials, isolate PCI scope, anchor tamper evidence daily.

1.3 Capability Map (top-level)

┌───────────────────────────────────────────────────────────────────────────────┐
│ Identity & Access │ Tenant & Property Mgmt │ Catalog & Discovery │ Pricing │
│ Reservations │ Inventory & Availability│ Housekeeping │ Mainten. │
│ Folio & Billing │ Payments (multi-rail) │ Lock & Key │ Comms │
│ Theming & Content │ Search Aggregation │ AI Orchestration │ Sync │
│ Reporting & BI │ File / Media │ Notifications │ Audit │
└───────────────────────────────────────────────────────────────────────────────┘

Every capability maps to one owning context (no shared aggregates) and is delivered by one owning microservice (no shared schemas). Cross-capability work happens through events, never through cross-service DB joins.


2. Application Architecture (Clean / Hexagonal)

Every backend microservice follows the same Clean / Hexagonal layering. Frontend apps mirror it with analogous layers.

┌──────────────────────────── Presentation ─────────────────────────────────┐
│ NestJS Controllers · WebSocket / SSE handlers · Sync HTTP handlers │
│ Problem+JSON error shaping · OpenAPI emission │
└──────────────┬────────────────────────────────────────────────────────────┘

┌──────────────▼ Application ───────────────────────────────────────────────┐
│ Use-Cases · Command / Query handlers · DTOs · Mappers │
│ Ports: AIClient · EventPublisher · Clock · IdGenerator · LockPort │
│ PaymentPort · NotificationPort · SyncClient · IdentityResolver │
└──────────────┬────────────────────────────────────────────────────────────┘

┌──────────────▼ Domain (pure TypeScript) ──────────────────────────────────┐
│ Aggregates · Entities · Value Objects · Domain Events · Domain Services │
│ Invariants enforced here; zero framework imports; zero I/O │
└──────────────┬────────────────────────────────────────────────────────────┘

┌──────────────▼ Infrastructure ────────────────────────────────────────────┐
│ Postgres repos (Drizzle / pg) · Pub/Sub adapter · Cloud Storage adapter │
│ Redis (Memorystore) cache · Outbox publisher · Saga engine │
│ Vendor adapters: TTLock · Salto · Assa Abloy · Generic Wiegand │
│ Vendor adapters: PayPal · Stripe · MFS · Cash · Vertex AI · ONNX Runtime │
└───────────────────────────────────────────────────────────────────────────┘

Strict dependency rule — outer layers depend inward only:

  • Presentation → Application
  • Application → Domain
  • Infrastructure → Application (for ports) and Domain (for types only)
  • Domain depends on nothing.

This is enforced in CI by dependency-graph analysis. Any import of NestJS, the Pub/Sub client, pg, axios, or a vendor SDK from inside /domain fails the build, as does any use of fetch or process.env.
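The rule can be illustrated with a small checker. This is a minimal sketch, not the project's actual CI tooling: the deny-list entries, the `checkDomainFile` name, and the regex-based import scan are all assumptions made for illustration, and a real pipeline would more likely rely on a dependency-graph tool that parses the TypeScript AST.

```typescript
// Hypothetical deny-list for files under /domain; the entries mirror the rule above.
const FORBIDDEN_IMPORTS: readonly RegExp[] = [
  /^@nestjs\//,
  /^@google-cloud\/pubsub$/,
  /^pg$/,
  /^axios$/,
];

// Runtime globals the domain layer must not touch.
const FORBIDDEN_GLOBALS: readonly RegExp[] = [/\bfetch\s*\(/, /\bprocess\.env\b/];

interface Violation {
  file: string;
  reason: string;
}

// Extracts module specifiers from `import ... from '...'` and `require('...')` forms.
// Side-effect imports (`import 'x'`) are not covered by this simplified scan.
function importSpecifiers(source: string): string[] {
  const pattern = /(?:from\s+|require\(\s*)['"]([^'"]+)['"]/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Returns every violation found in one file; files outside /domain/ are ignored.
function checkDomainFile(file: string, source: string): Violation[] {
  if (!file.includes('/domain/')) return [];
  const violations: Violation[] = [];
  for (const spec of importSpecifiers(source)) {
    if (FORBIDDEN_IMPORTS.some((re) => re.test(spec))) {
      violations.push({ file, reason: `forbidden import '${spec}'` });
    }
  }
  for (const re of FORBIDDEN_GLOBALS) {
    if (re.test(source)) {
      violations.push({ file, reason: `forbidden global matching ${re.source}` });
    }
  }
  return violations;
}
```

A build step would run such a check over every file under `src/domain/` and fail when the returned list is non-empty.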

Per-service module structure (every service identical):

services/<service-name>/
├── src/
│   ├── presentation/        # controllers, ws, sse, health
│   ├── application/
│   │   ├── ports/           # interfaces only
│   │   ├── use-cases/
│   │   └── dto/
│   ├── domain/              # pure TS, no framework imports
│   │   ├── aggregates/
│   │   ├── entities/
│   │   ├── value-objects/
│   │   ├── events/
│   │   └── services/
│   └── infrastructure/
│       ├── adapters/        # vendor SDKs live here, only here
│       ├── repositories/
│       ├── outbox/
│       ├── pubsub/
│       ├── http-clients/
│       └── config/
├── test/                    # unit (domain), integration (testcontainers), contract (pact)
├── openapi.json             # emitted from controllers
└── module.ts                # NestJS DI wiring (composition root)

The frontend mirror (Next.js, React Native, Electron renderer):

app/ → presentation (routes, components)
hooks/ + services/ → application (use-cases, queries, mutations)
lib/domain/ → pure TS domain models — no React, no fetch
lib/adapters/ → HTTP clients, IndexedDB / SQLite, sync, AI client

3. Bounded Contexts

Ghasi Melmastoon is decomposed into 20 bounded contexts grouped by domain class. The full DDD treatment (aggregates, invariants, ubiquitous language, conflict policies) lives in the per-service deep docs.

3.1 Context List

| # | Context | One-line purpose |
| --- | --- | --- |
| 1 | Identity & Access | Authenticate users; issue + rotate JWTs; SSO (OIDC/SAML); WebAuthn; device binding |
| 2 | Tenant | Tenant lifecycle, plan, settings, RBAC roles, memberships, data residency pin |
| 3 | Property | Properties, room types, rooms, amenities, geo, photos, policies |
| 4 | Reservation | The authoritative booking aggregate (state machine, holds, modifications, cancellations) |
| 5 | Inventory | Per-room-per-night availability and allocation; oversell guards |
| 6 | Pricing | Rate plans, BAR / weekly / corporate, dynamic price hints, currency conversion |
| 7 | Housekeeping | Tasks, assignments, room status transitions, SLAs |
| 8 | Maintenance | Work orders, tickets, parts/labor, escalation |
| 9 | Folio & Billing | Folio ledger per reservation; charges, taxes, refunds; immutable financial entries |
| 10 | Payments | Multi-rail payment orchestration: PayPal, Visa/Debit, cash-on-arrival, MFS |
| 11 | Lock & Key | Vendor-agnostic key lifecycle (issue / update / revoke / suspend) over LockPort |
| 12 | Notification & Communication | Email / SMS / push / WhatsApp via pluggable adapters; preference management |
| 13 | Theme & Content | Tenant branding tokens, layout presets, content blocks, RTL/LTR defaults |
| 14 | Search & Discovery | Cross-tenant meta-search index (the only legitimate cross-tenant read path) |
| 15 | AI Services | The single AI gateway: cloud (Vertex AI) + edge (ONNX Runtime); prompts; provenance; HITL |
| 16 | File & Media | Image upload, optimization, signed URLs, virus scan, invoice PDFs |
| 17 | Reporting | Templated reports, exports, scheduled deliveries |
| 18 | Analytics | Aggregations, dashboards, occupancy, revenue, channel mix, forecasts |
| 19 | Audit & Compliance | Immutable append-only log; daily Merkle anchoring; DSAR fulfilment orchestration |
| 20 | Offline Sync | The single sync protocol consumed by the Electron desktop app |

Domain class: Core (Reservation, Inventory, Pricing, Lock & Key, AI Services, Offline Sync, Search), Supporting (Property, Housekeeping, Maintenance, Folio & Billing, Theme & Content, Reporting, Analytics), Generic (Identity, Tenant, Payments, Notification, File & Media, Audit).

3.2 Context Map (relationships)

Every cross-context relationship is one of: SK (Shared Kernel — value objects only), CS (Customer/Supplier), CF (Conformist), ACL (Anti-Corruption Layer), OHS (Open Host Service), PL (Published Language), PA (Partnership). No "direct dependency" outside this taxonomy.

┌──────────────┐
│ Tenant │ SK: TenantId, OrgUnitId
└──────┬───────┘

┌───────────────┐ │ ┌──────────────────┐
│ Identity │─── CF ──┼── CF ───│ Notification │
│ (JWT, SSO) │ │ └──────────────────┘
└───────┬───────┘ │
│ │
┌──────┴───────┐ ┌────▼────────┐ ┌──────────────────┐
│ Property │ │ Theme & │ │ File & Media │
│ (CS → Inv) │ │ Content │ │ (CS via PL) │
└──────┬───────┘ │ (PL) │ └──────────────────┘
│ └────┬────────┘
┌──────▼────────┐ │
│ Inventory │◄───────┴── consumed by Tenant Booking + Search
│ (CS → Resv) │
└──────┬────────┘

┌──────▼────────┐ ┌──────────────┐ ┌──────────────────┐
│ Reservation │◄───│ Pricing │ │ AI Services │
│ (Core + │ CS │ (CS → Resv) │ │ (OHS + PL via │
│ saga hub) │ └──────────────┘ │ AIProvenance) │
└──┬─────┬───┬──┘ └──────────────────┘
│ │ │ ▲
┌───────▼┐ ┌─▼─────────┐ ┌──────────────┐ │
│ Folio │ │ Lock & │ │ Housekeeping │ ── OHS ───┘
│ (CS, │ │ Key │ │ + Maintenance│
│ ACL) │ │ (OHS, PL) │ └──────┬───────┘
└────┬───┘ └─────┬─────┘ │
│ │ │
┌────▼─────┐ ┌───▼────────────┐ │
│ Payments │ │ Vendor adapters│ │
│ (ACL) │ │ TTLock / Salto │ │
└──────────┘ │ AssaAbloy / │ │
│ Generic │ │
└────────────────┘ │

┌──────────────────┐ ┌──────────▼────────────┐
│ Search & Disco │◄──┤ Reservation + │ projection only
│ (cross-tenant │CS │ Inventory + Pricing │
│ read model) │ └───────────────────────┘
└──────────────────┘

┌──────────────────┐ ┌──────────▼────────────┐
│ Reporting + │◄──┤ All write contexts │ projection only
│ Analytics (CS) │ └───────────────────────┘
└──────────────────┘

┌──────────────────┐ │
│ Offline Sync │── OHS + CF ───┘ one protocol, every replicable context conforms
│ (cross-cutting) │
└──────────────────┘

┌──────────────────┐ │
│ Audit & │── append-only consumer of every domain event
│ Compliance │
└──────────────────┘

Key relationship notes:

  • External lock vendors → Lock & Key is an ACL: vendor SDK types never leak past the adapter. Internal shape is a single KeyCredential value object regardless of vendor.
  • External payment processors → Payments is an ACL for the same reason.
  • Vertex AI / ONNX Runtime → AI Services is an ACL; downstream services see only AICompletion, Embedding, AIArtifact, and AIProvenance.
  • Search & Discovery is the only context that legitimately reads across tenants, and it does so via projected read models — never through cross-tenant joins.
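The anti-corruption boundary described in these notes can be sketched as a port plus one adapter. Everything vendor-shaped here is hypothetical: `FakeVendorClient`, `FakeVendorPasscode`, and the fields chosen for `KeyCredential` are assumptions for illustration, not the real LockPort contract or any vendor's SDK. The port is synchronous only to keep the sketch short.

```typescript
// Internal value object: the only key shape the rest of the platform sees.
// The field names are assumptions; the source only names the type.
interface KeyCredential {
  readonly reservationId: string;
  readonly roomId: string;
  readonly validFrom: Date;
  readonly validUntil: Date;
  readonly status: 'active' | 'suspended' | 'revoked';
}

// Port owned by the application layer (simplified to a single operation).
interface LockPort {
  issueKey(reservationId: string, roomId: string, from: Date, until: Date): KeyCredential;
}

// Hypothetical vendor response shape; a real SDK will differ.
interface FakeVendorPasscode {
  lock_id: string;
  start_ms: number;
  end_ms: number;
  state: 0 | 1;
}

// Hypothetical vendor client standing in for a real SDK.
class FakeVendorClient {
  createPasscode(lockId: string, startMs: number, endMs: number): FakeVendorPasscode {
    return { lock_id: lockId, start_ms: startMs, end_ms: endMs, state: 1 };
  }
}

// Adapter: the only place where vendor types are visible.
class FakeVendorLockAdapter implements LockPort {
  private readonly client: FakeVendorClient;

  constructor(client: FakeVendorClient) {
    this.client = client;
  }

  issueKey(reservationId: string, roomId: string, from: Date, until: Date): KeyCredential {
    const raw = this.client.createPasscode(roomId, from.getTime(), until.getTime());
    // Translate at the boundary; nothing vendor-shaped is returned.
    return {
      reservationId,
      roomId: raw.lock_id,
      validFrom: new Date(raw.start_ms),
      validUntil: new Date(raw.end_ms),
      status: raw.state === 1 ? 'active' : 'suspended',
    };
  }
}
```

Swapping vendors then means writing another class that implements `LockPort`; callers that depend only on the port and on `KeyCredential` are unaffected.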

4. Microservices Topology

23 microservices, each owning one bounded context (or a slice of one). All NestJS, all TypeScript, all on Cloud Run. Grouped here by capability cluster.

4.1 Identity & Tenancy

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| iam-service | Users, sessions, credentials, JWT/JWKS, MFA, WebAuthn, device bindings | tenant.created.v1 | iam.user.created.v1, iam.session.issued.v1, iam.device.bound.v1 |
| tenant-service | Tenant, plan, settings, OrgUnit, Role, Membership, residency pin | iam.user.created.v1 | tenant.created.v1, tenant.role.granted.v1, tenant.settings.changed.v1 |

4.2 Property

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| property-service | Property, RoomType, Room, Amenity, geo, photos, policies | tenant.created.v1 | property.created.v1, property.room.added.v1, property.room.archived.v1 |

4.3 Reservations & Inventory

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| reservation-service | Reservation aggregate, holds, state machine, modifications | inventory.allocated.v1, payment.captured.v1, lock.key.issued.v1 | reservation.held.v1, reservation.confirmed.v1, reservation.cancelled.v1, reservation.checkout.v1, reservation.dates_changed.v1 |
| inventory-service | Per-room-per-night availability; allocation ledger | property.room.added.v1, reservation.confirmed.v1, reservation.cancelled.v1 | inventory.allocated.v1, inventory.released.v1, inventory.oversell_blocked.v1 |

4.4 Pricing

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| pricing-service | Rate plans, calendars, currency conversion, dynamic price hints | property.created.v1, ai.pricing_hint.completed.v1 | pricing.rate_plan.changed.v1, pricing.calendar.updated.v1 |

4.5 Operations

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| housekeeping-service | Tasks, assignments, room-status flow, SLAs | reservation.checkout.v1, property.room.added.v1 | housekeeping.task.assigned.v1, housekeeping.room.status_changed.v1 |
| maintenance-service | Work orders, parts, labor, escalation | housekeeping.task.flagged.v1 | maintenance.ticket.opened.v1, maintenance.ticket.closed.v1 |

4.6 Finance

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| billing-service | Folio, charges, taxes, refunds, invoices | reservation.confirmed.v1, payment.captured.v1, payment.refunded.v1 | folio.charge.posted.v1, folio.refund.posted.v1, invoice.issued.v1 |
| payment-gateway-service | Payment intents; PayPal / Visa / cash-on-arrival / MFS adapters | folio.charge.posted.v1 | payment.intent.created.v1, payment.captured.v1, payment.refunded.v1, payment.failed.v1 |

4.7 Communication

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| notification-service | Templates, preferences, multi-channel send (email / SMS / push / WhatsApp) | reservation.confirmed.v1, payment.failed.v1, lock.key.issued.v1 | notification.delivered.v1, notification.bounced.v1 |

4.8 AI

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| ai-orchestrator-service | The single AI gateway: completion, embedding, vision, moderation, TTS; cloud (Vertex AI) + edge dispatch; provenance; HITL records | (called via port AIClient by every service) | ai.gateway.call.completed.v1, ai.pricing_hint.completed.v1, ai.anomaly.detected.v1, ai.inference.local.completed.v1 |

4.9 Theming

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| theme-config-service | Branding tokens, layout presets, content blocks, locale defaults, RTL/LTR | tenant.created.v1, tenant.settings.changed.v1 | theme.preset.published.v1, theme.tokens.changed.v1 |

4.10 Lock

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| lock-integration-service | The LockPort; vendor adapters (TTLock, Salto, Assa Abloy, Generic Wiegand); key lifecycle saga | reservation.confirmed.v1, reservation.cancelled.v1, reservation.checkout.v1, reservation.dates_changed.v1 | lock.key.issued.v1, lock.key.updated.v1, lock.key.revoked.v1, lock.key.suspended.v1, lock.vendor.error.v1 |

4.11 Search

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| search-aggregation-service | Cross-tenant searchable read model; geo + availability + price; pgvector for semantic search | property.created.v1, inventory.allocated.v1, pricing.calendar.updated.v1 | search.index.refreshed.v1 |

4.12 Cross-Cutting

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| file-storage-service | Upload, optimization, signed URLs, virus scan, PDF render | (called via port) | file.uploaded.v1, file.scanned.v1 |
| reporting-service | Templated reports, scheduled exports | (consumes everything via projections) | report.generated.v1 |
| analytics-service | Aggregations, dashboards, occupancy, revenue, channel mix | (consumes everything via projections) | analytics.snapshot.computed.v1 |
| audit-service | Immutable append-only audit log; Merkle daily anchor; DSAR orchestration | (consumes every *.v1) | audit.dsar.fulfilled.v1, audit.merkle.anchored.v1 |
| sync-service | The single `/sync/v1/pull` and `/sync/v1/push` protocol; conflict resolution; outbox cursors | (consumes every replicable event) | sync.cursor.advanced.v1 |

4.13 BFFs (Backends-for-Frontend)

| Service | Owns | Consumes | Produces |
| --- | --- | --- | --- |
| bff-consumer-service | Meta layer aggregation; map-friendly bounding-box queries; cross-tenant pricing rollups | search.index.refreshed.v1 (read) | none (read-only) |
| bff-tenant-booking-service | Per-tenant booking funnel: room/rate/dates/guests/payment; theme injection | theme.tokens.changed.v1 (read) | bff.booking.intent.captured.v1 |
| bff-backoffice-service | Electron desktop session shape; sync façade; AI surface; dashboard composition | sync.cursor.advanced.v1 (read) | bff.backoffice.session.opened.v1 |

Topology total: 2 + 1 + 2 + 1 + 2 + 2 + 1 + 1 + 1 + 1 + 1 + 5 + 3 = 23 services.


5. Three BFFs

We deliberately ship three Backends-for-Frontend rather than one. Each surface has different audience semantics, different latency budgets, different security posture, and different data shapes; collapsing them into one BFF either over-fetches or under-protects.

| BFF | Audience | Auth surface | Hot read shape | Why it cannot be merged |
| --- | --- | --- | --- | --- |
| bff-consumer-service | Anonymous + low-trust guests on web/mobile | Anonymous + soft session | Cross-tenant geo-bounded availability + min/max price + amenity facets | Cross-tenant by design; aggressive cache; never sees a tenant JWT; never touches lock/payment internals |
| bff-tenant-booking-service | Guest scoped to one tenant; in-funnel | Anonymous + booking session token; payment-scoped JWT at checkout | Per-tenant theme + room/rate/availability + funnel state | Tenant-scoped cache; payment redirect orchestration; theme hot-injection; one tenant per request |
| bff-backoffice-service | Authenticated staff; long-lived; offline-first | Tenant JWT + device binding + role+attribute checks | Per-staff dashboard; sync façade; AI surface; lock control | Strong auth; full RBAC/ABAC; sync protocol surface; never exposed to anonymous traffic; talks to lock-integration-service |

A single BFF would either (a) leak privileged shapes to anonymous consumers, (b) bloat anonymous responses with staff-only fields, or (c) force one cache policy on three incompatible workloads. Three BFFs let each one be tuned, rate-limited, and scaled independently on Cloud Run, and let edge security policy at Kong/Cloud Armor differ per surface.


6. Multi-Tenancy Architecture

Multi-tenancy is enforced at four layers, so that a breach of any single layer is caught by the next (defense in depth).

| Layer | Mechanism | Failure mode if breached |
| --- | --- | --- |
| API edge | JWT carries tenant_id; gateway sets RequestContext.tenant; deny if absent on tenant-scoped routes | Request rejected at gateway |
| Application | Use-cases require TenantId parameter; cross-tenant references rejected at constructor | DomainError.CrossTenant thrown before persistence |
| Domain | TenantId is a value object on every aggregate root; aggregates refuse construction without it | Domain layer refuses to materialize |
| Database | tenant_id column + Postgres Row-Level Security policies; PgBouncer-init sets app.tenant_id; service accounts cannot bypass except for declared ops jobs | Even a bug in app code cannot leak rows |

6.1 Hybrid Tenancy Model (canonical — see ADR-0002)

| Service | Model | Why |
| --- | --- | --- |
| Most services (21 of 23) | Shared schema + tenant_id + RLS | Operationally sane at 1000+ tenants; cheap; standard tooling |
| billing-service | Schema-per-tenant | Easier financial audit; clean tenant export/delete on offboarding; per-tenant retention pins |
| payment-gateway-service | Schema-per-tenant | PCI scope minimization per tenant; vendor token isolation; simpler regulatory per-tenant proofs |

Cross-tenant queries go only through search-aggregation-service, which holds projected read models (no PII, no payment details). No other service has cross-tenant read paths.

6.2 Tenant Context Propagation

JWT ──► Kong (validates signature, copies `tenant_id` into X-Tenant-Id)
──► NestJS RequestContext middleware (binds AsyncLocalStorage)
──► Use-case receives TenantId param explicitly (not implicit)
──► Repository sets `app.tenant_id` on Postgres connection (via PgBouncer init or session_authorization)
──► RLS policies filter every row
──► Outbox writer validates payload `tenantId` matches request context before commit
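Two links of this chain, the explicit `TenantId` parameter and the outbox writer's final check, can be sketched in plain TypeScript. This is an illustration under assumptions: `CrossTenantError`, `InMemoryOutbox`, `tenantFromHeaders`, and `confirmReservation` are hypothetical names, and the AsyncLocalStorage binding and the Postgres `app.tenant_id` step are deliberately left out.

```typescript
// Value object: construction fails on empty input, so a missing tenant never reaches a use-case.
class TenantId {
  readonly value: string;

  private constructor(value: string) {
    this.value = value;
  }

  static of(raw: string | undefined): TenantId {
    if (raw === undefined || raw.trim() === '') {
      throw new Error('TenantId is required');
    }
    return new TenantId(raw.trim());
  }
}

// Hypothetical error type standing in for the cross-tenant domain error.
class CrossTenantError extends Error {}

interface OutboxRecord {
  topic: string;
  payload: { tenantId: string; [key: string]: unknown };
}

// In-memory stand-in for the outbox table.
class InMemoryOutbox {
  readonly rows: OutboxRecord[] = [];

  // Rejects any payload whose tenantId differs from the request's tenant.
  write(context: TenantId, record: OutboxRecord): void {
    if (record.payload.tenantId !== context.value) {
      throw new CrossTenantError(
        `payload tenant ${record.payload.tenantId} does not match request tenant ${context.value}`,
      );
    }
    this.rows.push(record);
  }
}

// Edge step: the gateway-provided header becomes a TenantId or the request fails.
function tenantFromHeaders(headers: Record<string, string | undefined>): TenantId {
  return TenantId.of(headers['x-tenant-id']);
}

// Use-case step: the tenant is an explicit parameter, never read from ambient state.
function confirmReservation(tenant: TenantId, reservationId: string, outbox: InMemoryOutbox): void {
  outbox.write(tenant, {
    topic: 'reservation.confirmed.v1',
    payload: { tenantId: tenant.value, reservationId },
  });
}
```

Passing the tenant explicitly keeps the use-case testable without any request middleware, and the outbox check acts as a last guard before an event leaves the service.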

6.3 Cross-Tenant Safe Paths

The only legitimate cross-tenant reads are:

  1. search-aggregation-service — projects from property-service, inventory-service, pricing-service events; serves the consumer meta layer.
  2. Platform-admin reports under explicit elevation (audit logged, one-time tokens).
  3. audit-service — append-only by design; reads gated by compliance role and DSAR scope.

Everything else — including chain operators — uses tenant-scoped APIs and joins client-side under explicit per-tenant tokens.


7. Event-Driven Backbone

7.1 Backbone

GCP Pub/Sub is the messaging substrate. Subject pattern: {service}.{aggregate}.{event}.v{N}, where the aggregate segment is omitted when it would repeat the leading segment. Examples:

reservation.confirmed.v1
inventory.allocated.v1
payment.captured.v1
lock.key.issued.v1
ai.gateway.call.completed.v1
sync.cursor.advanced.v1

7.2 Transactional Outbox

Every write that produces an event uses the outbox pattern:

BEGIN;
-- the state change and the outbox row commit or roll back together
UPDATE reservation SET state = 'confirmed' WHERE ...;
INSERT INTO outbox (id, topic, payload, headers, created_at) VALUES (...);
COMMIT;

A separate publisher process drains the outbox to Pub/Sub. This guarantees at-least-once delivery; consumers are required to be idempotent (keyed by eventId + content hash).
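The drain loop and the consumer-side deduplication can be sketched in memory. This is an illustration only: `drainOutbox`, `IdempotentConsumer`, and the FNV-1a helper are hypothetical names, the non-cryptographic hash merely stands in for whatever content hash the platform uses, and a real publisher would read rows from Postgres and publish to Pub/Sub.

```typescript
interface OutboxRow {
  id: string; // used as the eventId
  topic: string;
  payload: string; // serialized event body
  publishedAt: number | null;
}

// Non-cryptographic FNV-1a hash, used here only as a stand-in for a content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Consumer that processes each (eventId, content hash) pair at most once.
class IdempotentConsumer {
  private readonly seen = new Set<string>();
  handled = 0;

  // Returns true when the message was processed, false when it was a duplicate.
  receive(eventId: string, payload: string): boolean {
    const key = `${eventId}:${fnv1a(payload)}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.handled += 1;
    return true;
  }
}

// Publishes every unpublished row, then marks it. A crash between the publish and
// the mark re-sends the row on the next pass, which is why delivery is at-least-once.
function drainOutbox(rows: OutboxRow[], publish: (row: OutboxRow) => void, now: number): number {
  let sent = 0;
  for (const row of rows) {
    if (row.publishedAt !== null) continue;
    publish(row);
    row.publishedAt = now;
    sent += 1;
  }
  return sent;
}
```

If the publisher crashes after publishing but before marking a row, the next drain sends the same event again and the consumer discards it, so the side effect still happens once.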

7.3 Saga: Booking → Inventory → Payment → Key Issuance

The booking saga is the most-trafficked multi-service orchestration in the platform.

[Guest hits Pay]


bff-tenant-booking-service ──► reservation-service

│ (1) reservation.held.v1 (TTL 10 min)

inventory-service ──► inventory.allocated.v1

│ (2) charge intent

payment-gateway-service ──► payment.captured.v1 / payment.failed.v1

├─ on captured ──► reservation.confirmed.v1
│ │
│ ▼
│ lock-integration-service ──► lock.key.issued.v1
│ │
│ ▼
│ notification-service ──► email/SMS confirmation + key

└─ on failed ──► inventory.released.v1 + reservation.cancelled.v1

Compensation paths are first-class. Every forward step has a declared compensation (release inventory, void payment intent, revoke key). The orchestrator (reservation-service) tracks saga state in its own DB; saga state is never stored in another service.
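The forward-step and compensation pairing can be sketched as a tiny orchestrator. This is a simplified illustration, not the reservation-service implementation: `SagaStep` and `runSaga` are hypothetical names, steps are synchronous, and saga state lives in memory rather than in the orchestrator's database.

```typescript
interface SagaStep {
  name: string;
  forward: () => void; // throws to signal failure
  compensate: () => void; // undoes a completed forward step
}

interface SagaResult {
  status: 'completed' | 'compensated';
  log: string[];
}

// Runs steps in order. On the first failure, compensates the completed steps
// in reverse order and stops.
function runSaga(steps: SagaStep[]): SagaResult {
  const done: SagaStep[] = [];
  const log: string[] = [];
  for (const step of steps) {
    try {
      step.forward();
      done.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      for (const prior of done.reverse()) {
        prior.compensate();
        log.push(`undo:${prior.name}`);
      }
      return { status: 'compensated', log };
    }
  }
  return { status: 'completed', log };
}
```

With steps named hold, allocate, capture, and issue-key, a failure in capture yields a log of done:hold, done:allocate, failed:capture, undo:allocate, undo:hold, which mirrors the failed-payment branch in the diagram.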

7.4 Dead-Letter & Replay

Every Pub/Sub subscription has a DLQ. Messages are moved to the DLQ once the configured retries are exhausted (exponential backoff with jitter, max 5 attempts). The DLQ is monitored, and a replay tool re-publishes messages after the underlying fault is fixed. Consumer idempotency keys make replay safe.
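The retry arithmetic can be sketched as follows. In practice Pub/Sub subscriptions carry their own retry and dead-letter settings, so this only illustrates the policy's shape. The 200 ms base, the 10 s cap, the choice of full jitter, and the function names are assumptions; only the five-attempt limit comes from the text.

```typescript
const MAX_ATTEMPTS = 5; // from the retry policy described above

// Full-jitter backoff: a random delay in [0, min(cap, base * 2^(attempt - 1))).
// `random` is injectable so the calculation can be tested deterministically.
function retryDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * ceiling);
}

type Disposition = { action: 'retry'; delayMs: number } | { action: 'dead-letter' };

// Decides what happens after the given delivery attempt (1-based) has failed.
function onDeliveryFailure(attempt: number, random: () => number = Math.random): Disposition {
  if (attempt >= MAX_ATTEMPTS) return { action: 'dead-letter' };
  return { action: 'retry', delayMs: retryDelayMs(attempt, 200, 10000, random) };
}
```

Jitter spreads retries from many consumers over time so that a recovering dependency is not hit by synchronized bursts.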


8. Sync & Offline Architecture

The Electron desktop app is offline-first, not "offline-tolerant". The sync engine is owned by sync-service and consumed via /sync/v1/pull|push. (Web/mobile guest surfaces are online-first with PWA browse cache; only the desktop app is true offline-first.)

8.1 Protocol Surface

POST /sync/v1/pull
Headers: Authorization, X-Tenant-Id, X-Device-Id, X-Sync-Cursor
Body: { scopes: ['reservation','room','folio_draft', ...], maxBatch: 500 }
Response: { deltas: [...], nextCursor: '...', heartbeat: ... }

POST /sync/v1/push
Headers: Authorization, X-Tenant-Id, X-Device-Id, Idempotency-Key
Body: { mutations: [{ clientMutationId, aggregate, op, payload, baseVersion, vectorClock }] }
Response: { results: [{ clientMutationId, status, serverState?, conflict? }] }
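How a push endpoint might use clientMutationId and baseVersion can be sketched with an in-memory handler. The types follow the field names in the protocol surface above, but `aggregateId`, `serverVersion`, the status values, and the single optimistic version check are assumptions. The real service applies a per-aggregate conflict policy and persists results durably.

```typescript
interface Mutation {
  clientMutationId: string;
  aggregate: string;
  aggregateId: string; // assumption: the protocol sketch does not name this field
  op: string;
  payload: Record<string, unknown>;
  baseVersion: number;
}

interface PushResult {
  clientMutationId: string;
  status: 'applied' | 'conflict';
  serverVersion: number;
}

class PushHandler {
  private readonly versions = new Map<string, number>();
  private readonly results = new Map<string, PushResult>();

  push(mutations: Mutation[]): PushResult[] {
    return mutations.map((m) => this.apply(m));
  }

  private apply(m: Mutation): PushResult {
    // A resend returns the stored result, so retries never apply a mutation twice.
    const previous = this.results.get(m.clientMutationId);
    if (previous !== undefined) return previous;

    const key = `${m.aggregate}:${m.aggregateId}`;
    const current = this.versions.get(key) ?? 0;
    let result: PushResult;
    if (m.baseVersion !== current) {
      // The client edited a stale copy; report the server's version instead of applying.
      result = { clientMutationId: m.clientMutationId, status: 'conflict', serverVersion: current };
    } else {
      this.versions.set(key, current + 1);
      result = { clientMutationId: m.clientMutationId, status: 'applied', serverVersion: current + 1 };
    }
    this.results.set(m.clientMutationId, result);
    return result;
  }
}
```

Because results are remembered per clientMutationId, a device that loses the network mid-push can resend the whole batch without double-applying anything.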

8.2 Per-Aggregate Conflict Policy

Every replicable aggregate declares exactly one policy. The matrix below is canonical for Melmastoon; new aggregates must be added here.

| Aggregate | Policy | Why |
| --- | --- | --- |
| Reservation (state) | server_authoritative | Booking state is server-issued |
| Reservation (notes) | lww by updatedAt | Free-text notes rarely conflict |
| Folio charges | append_only | Money never overwrites; only adds |
| Folio payments | append_only | Same |
| Inventory | server_authoritative | Allocation must come from authority; client cannot decide |
| Room.status | max-of by status priority (clean < dirty < OOO < OOS), then by timestamp | Worse status wins; protects guest experience |
| HousekeepingTask.assignment | lww by updatedAt | Assignment churn is fine |
| HousekeepingTask.completion | append_only | Audit trail |
| MaintenanceTicket events | append_only | Audit |
| KeyCredential lifecycle | server_authoritative | Lock vendor + saga is the source |
| Guest profile | lww by updatedAt | Multi-device staff edits |
| StaffSchedule | server_authoritative | Manager-issued |
| Notification.preferences | lww | User-set |
| AI insight ack | append_only | Append-only acknowledgement log |

Money and inventory never use last-write-wins. This is non-negotiable.
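The four policy families in the matrix can be sketched as pure resolver functions. The function names, the `Versioned` wrapper, and the tie-breaking choice (the server copy wins on equal timestamps) are assumptions for illustration. The status priority order is taken from the Room.status row.

```typescript
type RoomStatus = 'clean' | 'dirty' | 'OOO' | 'OOS';

// Higher number means worse status, following clean < dirty < OOO < OOS.
const STATUS_PRIORITY: Record<RoomStatus, number> = { clean: 0, dirty: 1, OOO: 2, OOS: 3 };

interface Versioned<T> {
  value: T;
  updatedAt: number; // server-assigned timestamp in ms
}

// lww: the later write wins; ties keep the server copy (an assumption).
function resolveLww<T>(server: Versioned<T>, client: Versioned<T>): Versioned<T> {
  return client.updatedAt > server.updatedAt ? client : server;
}

// server_authoritative: the client copy is always discarded.
function resolveServerAuthoritative<T>(server: Versioned<T>, _client: Versioned<T>): Versioned<T> {
  return server;
}

// append_only: union of entries by id; existing entries are never overwritten.
function resolveAppendOnly<T extends { id: string }>(server: T[], client: T[]): T[] {
  const known = new Set(server.map((entry) => entry.id));
  return [...server, ...client.filter((entry) => !known.has(entry.id))];
}

// max-of: the worse status wins; equal statuses fall back to the later timestamp.
function resolveRoomStatus(
  server: Versioned<RoomStatus>,
  client: Versioned<RoomStatus>,
): Versioned<RoomStatus> {
  const serverRank = STATUS_PRIORITY[server.value];
  const clientRank = STATUS_PRIORITY[client.value];
  if (clientRank !== serverRank) return clientRank > serverRank ? client : server;
  return client.updatedAt > server.updatedAt ? client : server;
}
```

For example, a newer "clean" from one device does not override an older "dirty" already on the server, because status priority is checked before timestamps.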

8.3 Device Binding & Cursors

  • Every device pairs with the platform via iam-service; pairing produces a DeviceId and a device-bound key (stored in OS keychain via keytar).
  • Sync cursors are scoped to (tenantId, userId, deviceId, scope) and persisted in Firestore (cheap, regional, low-ops).
  • Outbox cursors per device are stored in Firestore as well; the desktop app's local SQLite outbox is the source of truth pre-sync, Firestore is the post-sync record.

8.4 Failure Modes

| Failure | Behavior |
| --- | --- |
| Network drop during pull | Cursor unchanged; resume on next attempt |
| Network drop during push | Idempotency key + clientMutationId make resend safe |
| Conflict | Recorded in sync.conflicts; surfaced in UI for human-resolvable cases (notes, profile); auto-resolved per policy otherwise |
| Device clock skew | Server overrides timestamps for lww decisions; vector clocks for ordering |
| Stale device (revoked or re-paired) | Pull returns 401 with re-pair instructions; local DB stays read-only until re-paired |

9. AI Architecture

ai-orchestrator-service is the single AI gateway. No service calls Vertex AI, OpenAI, or any model directly. Every service depends on ports/AIClient, whose default adapter routes to ai-orchestrator-service.

9.1 Routing

caller (any service or BFF)
│ AIClient.complete({prompt, model, capability, tenantId, ...})

ai-orchestrator-service

├─ pre-call: moderation, PII redaction, budget check, prompt-version pin

├─ route decision:
│ - cloud (Vertex AI) for heavy LLM, vision, embeddings, TTS
│ - edge (ONNX Runtime Node on the desktop main process) for offline / low-latency / sensitive-data

├─ post-call: moderation on output, provenance stamping, cost recording

└─ emit: ai.gateway.call.completed.v1 (with full provenance)
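The route decision can be sketched as a pure function. This is a simplified reading of the flow above, not the orchestrator's actual logic: the capability identifiers, the `RouteRequest` fields, and the "degraded" outcome are assumptions, and moderation, redaction, and provenance stamping are omitted.

```typescript
interface RouteRequest {
  capability: string;
  online: boolean;
  estimatedCostMicroUsd: number;
  tenantBudgetRemainingMicroUsd: number;
}

type Route =
  | { target: 'edge' }
  | { target: 'cloud' }
  | { target: 'degraded'; reason: string };

// Hypothetical identifiers for capabilities that have a local ONNX model.
const EDGE_CAPABLE = new Set<string>([
  'housekeeping-order',
  'anomaly',
  'demand-smoothing',
  'image-quality',
]);

// Edge first when a local model exists; otherwise cloud, subject to connectivity and budget.
function decideRoute(request: RouteRequest): Route {
  if (EDGE_CAPABLE.has(request.capability)) return { target: 'edge' };
  if (!request.online) {
    return { target: 'degraded', reason: 'offline and no edge model for this capability' };
  }
  if (request.estimatedCostMicroUsd > request.tenantBudgetRemainingMicroUsd) {
    return { target: 'degraded', reason: 'tenant AI budget exhausted' };
  }
  return { target: 'cloud' };
}
```

Returning an explicit degraded outcome, rather than throwing, lets callers fall back to non-AI defaults.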

9.2 Edge Inference on Electron

The Electron desktop app ships with a small set of ONNX models executed by ONNX Runtime Node in the main process (Node 20). The renderer never holds model bytes; it requests inference through the preload-exposed window.melmastoon.ai.infer(...) IPC bridge. Models are signed; a tampered model fails signature verification and is not loaded.

Edge-eligible capabilities:

  • Housekeeping order optimization (small TSP-like model)
  • Anomaly heuristics (booking, payment, lock)
  • Demand smoothing for next 7 days
  • Image quality scoring for property photo upload

Edge inference still emits ai.inference.local.completed.v1 to the local outbox; on next sync the event is replayed for audit.

9.3 Provenance & HITL

Every AI artifact persists with aiProvenance:

interface AIProvenance {
  model: string;            // e.g. 'gemini-1.5-pro' or 'melmastoon-edge-anomaly-v3.onnx'
  version?: string;
  promptId?: string;
  promptVersion?: SemVer;
  traceId: string;          // W3C traceparent
  decisionId?: string;      // ties to HITL acceptance
  local: boolean;           // true if edge
  generatedAt: ISODate;
  reviewedBy?: UserId;
  reviewedAt?: ISODate;
  cost?: { microUSD: number; tokens?: { in: number; out: number } };
  safety: { input: SafetyVerdict; output: SafetyVerdict };
  cacheHit: boolean;
}

HITL gates are required for any irreversible or guest-facing AI action: pricing publish, anomaly-driven cancellation, AI-suggested overbook resolution, AI-drafted guest communication. The action stays in draft_ai state until a human with authority promotes it.
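The draft_ai gate can be sketched as a small state transition. The `AiAction` shape, the `approved` state name, and the `canApprove` flag are assumptions for illustration. The source specifies only that the action remains in draft_ai until a human with authority promotes it.

```typescript
type ActionState = 'draft_ai' | 'approved';

type ActionKind =
  | 'pricing_publish'
  | 'anomaly_cancellation'
  | 'overbook_resolution'
  | 'guest_communication';

interface AiAction {
  id: string;
  kind: ActionKind;
  state: ActionState;
  reviewedBy?: string;
  reviewedAt?: string; // ISO 8601 timestamp
}

interface Reviewer {
  userId: string;
  canApprove: boolean; // stand-in for the real RBAC/ABAC check
}

// Every AI-produced action starts as a draft.
function draftAction(id: string, kind: ActionKind): AiAction {
  return { id, kind, state: 'draft_ai' };
}

// Promotion requires a draft and a reviewer with authority; it records who approved and when.
function promote(action: AiAction, reviewer: Reviewer, now: Date): AiAction {
  if (action.state !== 'draft_ai') {
    throw new Error(`action ${action.id} is not awaiting review`);
  }
  if (!reviewer.canApprove) {
    throw new Error(`user ${reviewer.userId} lacks authority to approve`);
  }
  return { ...action, state: 'approved', reviewedBy: reviewer.userId, reviewedAt: now.toISOString() };
}

// Downstream code executes only approved actions.
function canExecute(action: AiAction): boolean {
  return action.state === 'approved';
}
```

The reviewer and timestamp recorded on promotion correspond to the reviewedBy and reviewedAt fields of the provenance record.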


10. GCP Reference Topology

Internet

┌──────────────────▼──────────────────┐
│ Cloud Load Balancer + Cloud Armor │ WAF, DDoS, geo-rules
└──────────────────┬──────────────────┘

┌──────────────────▼──────────────────┐
│ Kong Gateway │ TLS, JWT, rate limit, headers
└──────────────────┬──────────────────┘

┌─────────────────┬──────────┴──────────┬─────────────────┐
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Next.js │ │ bff-consumer │ │ bff-tenant- │ │ bff-back- │
│ SSR │ │ │ │ booking │ │ office │
│ (Cloud │ │ (Cloud Run) │ │ (Cloud Run) │ │ (Cloud Run) │
│ Run) │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
└──────────┘ │ │ │
└─────────┬──────────┴──────┬───────────┘
▼ ▼
┌─────────────────────────────────┐
│ Cloud Run microservices x 20 │
│ (NestJS, autoscaled, scale-to- │
│ zero on low-traffic services) │
└────┬────────┬────────┬──────────┘
│ │ │
┌──────────────────┘ │ └─────────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌────────────────┐ ┌─────────────────┐
│ Cloud SQL │ │ Pub/Sub │ │ Cloud Storage │
│ (Postgres, │ │ (topics + │ │ (media, PDFs, │
│ per-svc │ │ DLQ + sub.) │ │ theme assets) │
│ schemas; │ └────────────────┘ └─────────────────┘
│ RLS) │
└─────────────┘ │ │
│ │ │
│ │ │
┌─────────────┐ ┌────────────────┐ ┌─────────────────┐
│ Memorystore │ │ Firestore │ │ Vertex AI │
│ (Redis; │ │ (sync cursors, │ │ (LLM, vision, │
│ cache, │ │ outbox marks, │ │ embeddings, │
│ rate- │ │ device pair) │ │ TTS) called │
│ limit) │ └────────────────┘ │ ONLY by ai- │
└─────────────┘ │ orchestrator │
└─────────────────┘

┌─────────────────────────────────┐
│ Secret Manager (all secrets) │
│ KMS (envelope keys)│
│ Cloud Logging / Monitoring / │
│ Trace (OTel everywhere) │
│ Artifact Registry (images) │
└─────────────────────────────────┘

Notes:

  • Cloud Run gives us scale-to-zero for low-traffic services (maintenance-service, reporting-service, audit-service).
  • pgvector runs as an extension inside Cloud SQL; no separate vector DB.
  • All east-west service traffic stays inside the VPC; only Kong is on the public LB.
  • Secret Manager fronts every secret; rotated per security policy. No secrets in env files committed to git, no secrets in client bundles.

11. Security Posture

| Layer | Control |
| --- | --- |
| Transport | TLS 1.3 everywhere; HSTS at edge; mTLS between Cloud Run services where supported, otherwise short-lived service-account JWTs |
| AuthN | JWT (15-min access, 30-day rotating refresh); OIDC/SAML SSO for chain operators; WebAuthn opt-in; magic link for low-trust onboarding |
| AuthZ | RBAC (coarse) + ABAC (fine: tenantId, propertyId, role, data_residency) |
| Tenant isolation | RLS on every table; PgBouncer init sets app.tenant_id; cross-tenant references rejected at domain layer |
| At-rest encryption | AES-256 via Google-managed KMS by default; per-tenant CMEK on Plus plan; per-device key for desktop SQLite |
| Secrets | Secret Manager only; rotated; never in client bundles; never in code; access audited |
| Lock credentials | Isolated namespace inside lock-integration-service; encrypted at rest with a separate KMS key; never logged; never exposed via any other service's API |
| Payment data | PCI scope minimized to payment-gateway-service; tokens only past the boundary; PAN never persisted; schema-per-tenant audit isolation |
| Desktop | OS keychain via keytar for refresh token + device key; SQLite encrypted with device-derived key; nodeIntegration: false, contextIsolation: true, signed installer + signed updates via electron-updater |
| Audit | Immutable append-only log (audit-service); daily Merkle root anchored externally for tamper evidence |
| AI | Pre/post moderation; PII redaction before any cloud-bound payload; per-tenant AI budget; HITL gates for irreversible actions |

Full detail in 07 Security & Tenancy.


12. Resilience & Performance

| Concern | Mechanism |
| --- | --- |
| Service-to-service failure | Circuit breaker per port; retry with exponential backoff + jitter; fail-fast budget per request |
| Backpressure | Pub/Sub buffers absorb spikes; consumer concurrency capped per service; flow-control tokens for sync push |
| Hot-read latency | Memorystore (Redis) for tenant settings, theme tokens, room/rate snapshots; per-tenant cache namespaces |
| Edge cache | Cloud CDN for static + tenant theme assets; signed URLs for media |
| Booking funnel | Idempotency keys on every mutation; reservation hold TTL 10 min; saga compensations always declared |
| Search | pgvector + GIN indexes on hot facets; bounding-box geo queries; per-tenant cardinality cap on facets |
| Desktop sync | Per-aggregate conflict policy; batch size cap; resumable cursors; exponential backoff on 5xx |
| Graceful degradation | Booking flow keeps working with cash-on-arrival when payment-gateway-service is degraded; check-in keeps working with offline key issuance queue when lock-integration-service is degraded; AI features degrade silently to non-AI defaults when ai-orchestrator-service is degraded |
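
Retry with exponential backoff and jitter appears twice in the table (service-to-service calls and desktop sync). A minimal sketch of the technique follows, using the "full jitter" variant where the delay is drawn uniformly between zero and a capped exponential ceiling. The function names, the option names, and the choice of full jitter are assumptions made for illustration; the source does not specify which jitter strategy the platform uses.

```typescript
// Hypothetical options; real values would come from per-service configuration.
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number;  // hard cap on the ceiling
}

// Full jitter: a uniform random delay in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `op` up to `maxAttempts` times, sleeping between failed attempts.
// The last error is rethrown once the attempt budget is spent.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  random: () => number = Math.random,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, opts, random));
      }
    }
  }
  throw lastError;
}
```

In practice a caller would also check whether the failure is retryable (for example a 5xx rather than a 4xx) and stop early when the per-request fail-fast budget is exhausted; both are left out here to keep the sketch short.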

13. Cost Posture

The platform must be cheap enough for an 8-room independent guesthouse to afford. Every architectural choice is cost-conscious.

| Lever | Choice |
| --- | --- |
| Compute | Cloud Run with scale-to-zero on low-traffic services; min-instances only on hot-path services (reservation-service, bff-tenant-booking-service, iam-service) |
| Database | Cloud SQL right-sized; HA only on hot services; per-service schema avoids monolith bloat |
| Cache | Memorystore Basic tier (no HA premium) on most services; Standard tier (HA) only where staleness is unacceptable |
| Messaging | Pub/Sub with batch publish; consumer ack deadline tuned per service; DLQ retention 7 days default |
| Storage | Cloud Storage Standard for hot media; Nearline for invoices > 90 days; Coldline for compliance archives |
| Vector | pgvector inside Cloud SQL (no separate vector DB billing) |
| AI | Tiered routing in ai-orchestrator-service: edge first when feasible; cheaper Vertex AI models first; per-tenant budget caps; per-feature quotas; soft-degrade before hard-stop |
| Image | Image optimization on upload (file-storage-service); WebP / AVIF; multi-resolution variants generated once |
| Network | Low-bandwidth payload contracts: gzip + brotli; field-level projection on hot endpoints; sync deltas, not full snapshots |
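
The AI row describes an ordering of decisions: edge first, cheaper models before expensive ones, quotas and budget caps, and a soft degrade before a hard stop. The sketch below shows one possible reading of that ordering. Everything in it is an assumption for illustration: the tier names, the `RoutingInput` fields, the `softLimitRatio` threshold, and the rule that edge inference is not charged against the cloud budget are not taken from the source.

```typescript
// Hypothetical tiers, from cheapest to most expensive.
type Tier = "edge" | "cloud_small" | "cloud_large";

type Route =
  | { kind: "serve"; tier: Tier; degraded: boolean }
  | { kind: "fallback"; reason: string }; // caller uses the non-AI default

interface RoutingInput {
  edgeFeasible: boolean;     // can the request run on-device?
  wantsLargeModel: boolean;  // the feature asked to escalate past the cheap model
  tenantSpent: number;       // spend so far in the current budget period
  tenantBudget: number;      // per-tenant cap for the period
  featureCallsToday: number;
  featureDailyQuota: number; // per-feature quota
  softLimitRatio: number;    // fraction of budget at which soft-degrade starts
}

function routeAiRequest(input: RoutingInput): Route {
  // Edge first when feasible (assumed here not to consume cloud budget).
  if (input.edgeFeasible) {
    return { kind: "serve", tier: "edge", degraded: false };
  }
  // Hard stops: per-feature quota, then per-tenant budget cap.
  if (input.featureCallsToday >= input.featureDailyQuota) {
    return { kind: "fallback", reason: "feature_quota_exhausted" };
  }
  if (input.tenantSpent >= input.tenantBudget) {
    return { kind: "fallback", reason: "tenant_budget_exhausted" };
  }
  // Soft degrade: past the soft limit, large-model requests drop to the small model.
  const overSoftLimit = input.tenantSpent >= input.tenantBudget * input.softLimitRatio;
  if (input.wantsLargeModel && !overSoftLimit) {
    return { kind: "serve", tier: "cloud_large", degraded: false };
  }
  return {
    kind: "serve",
    tier: "cloud_small",
    degraded: input.wantsLargeModel && overSoftLimit,
  };
}
```

Returning an explicit `fallback` route, rather than throwing, keeps the hard stop consistent with the rule that AI features degrade to non-AI defaults instead of failing the user's flow.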

14. Compliance & Data Residency

| Area | Posture |
| --- | --- |
| GDPR | Lawful basis recorded per processing activity; DSAR (export, erasure, portability) implemented as a saga via audit-service gdpr.subject_request.received.v1 |
| Data residency | Per-tenant residency pin in tenant-service; default region preference: me-central1 / closest GCP region serving Afghanistan; cross-region replication only with explicit opt-in |
| Audit immutability | audit-service is append-only; daily Merkle root anchored to a public timestamp authority |
| Lock credentials | Isolated KMS key; isolated DB namespace; isolated logs (none); isolated incident-response procedure |
| Payment data | PCI scope limited to payment-gateway-service; schema-per-tenant; PAN never persisted (tokens only); annual scope review |
| AI | Per-feature risk classification; HITL where required; provenance retained 7 years |
| Accessibility | WCAG 2.2 AA on all guest + tenant surfaces; backoffice meets AA on critical operational paths |
| Local regulation | Tax engine pluggable per jurisdiction; invoice numbering compliant per market (e.g., sequential per Iranian or Afghan rules where applicable) |
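
The audit-immutability row relies on a daily Merkle root: a single hash that commits to every audit entry written that day, so that changing any entry changes the root. A minimal sketch of the computation follows. The source specifies neither the hash function nor the tree layout, so SHA-256, the leaf/node prefix bytes, and the rule of promoting an unpaired node unchanged are all assumptions chosen for illustration. It uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Distinct prefixes keep a leaf hash from being confused with an interior node.
const LEAF_PREFIX = Buffer.from([0x00]);
const NODE_PREFIX = Buffer.from([0x01]);

// Computes a hex Merkle root over the serialized audit entries, in order.
function merkleRoot(entries: string[]): string {
  let level: Buffer[] = entries.map((entry) =>
    sha256(Buffer.concat([LEAF_PREFIX, Buffer.from(entry, "utf8")])),
  );
  if (level.length === 0) {
    // Assumed convention for an empty day: hash of the empty input.
    return sha256(Buffer.alloc(0)).toString("hex");
  }
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 < level.length) {
        next.push(sha256(Buffer.concat([NODE_PREFIX, level[i], level[i + 1]])));
      } else {
        next.push(level[i]); // unpaired node is promoted unchanged
      }
    }
    level = next;
  }
  return level[0].toString("hex");
}
```

Anchoring would then consist of submitting only the root to the external timestamp authority; the entries themselves stay inside audit-service.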

15. Anti-patterns We Explicitly Avoid

| Anti-pattern | Why it's banned | What we do instead |
| --- | --- | --- |
| Synchronous cross-service chains > 2 hops | Couples availability; hides cascading failure | Async via Pub/Sub + saga; orchestrator in the owning context |
| Shared database across services | Couples deploys; blurs ownership; defeats RLS | Per-service schema; events project read models |
| Last-write-wins for money or inventory | Silent data loss; impossible to audit | append_only for money; server_authoritative for inventory; max-of for status |
| Vendor-locked lock APIs | Strands tenants on one hardware vendor | Single LockPort; vendor adapters live in infrastructure (see ADR-0004) |
| Vendor-locked AI APIs | Same trap, plus model risk | All AI through ai-orchestrator-service; provider rotation behind the gateway |
| Electron with nodeIntegration: true | Renderer can require('child_process'); XSS becomes RCE | nodeIntegration: false, contextIsolation: true, narrow contextBridge only (see ADR-0003) |
| Secrets in client bundles | Public by definition | Secret Manager only; clients receive short-lived tokens, never raw credentials |
| Cross-tenant joins anywhere except search-aggregation-service | One bug = one breach | RLS at DB; cross-tenant reads only via projected read models |
| AI calls outside ai-orchestrator-service | Bypasses moderation, provenance, budget | All AI via ports/AIClient; CI dependency check forbids vendor SDK imports outside ai-orchestrator-service |
| DB-per-tenant for the entire estate | 1000+ databases is operationally untenable | Hybrid: shared schema + RLS for most; schema-per-tenant only for billing-service and payment-gateway-service |
| Frontend that talks directly to microservices | Couples client to internal topology; defeats BFF | Three BFFs; clients only call BFFs and the sync surface |
| Free-text RTL/LTR detection at runtime per-component | Inconsistent direction; layout bugs | Locale → direction at app shell; logical CSS properties only |
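
The alternative to last-write-wins names three merge policies: append_only for money, server_authoritative for inventory, and max-of for status. The sketch below illustrates what each could look like when a desktop client syncs against the server. The entry shape, the function names, and the status ordering are hypothetical; the source names the policies but does not define their data models.

```typescript
// append_only (money): entries are only ever added, never overwritten.
// An entry whose id already exists is ignored, so the earlier record stands.
function mergeAppendOnly<T extends { id: string }>(server: T[], client: T[]): T[] {
  const seen = new Set(server.map((entry) => entry.id));
  const merged = [...server];
  for (const entry of client) {
    if (!seen.has(entry.id)) {
      seen.add(entry.id);
      merged.push(entry);
    }
  }
  return merged;
}

// server_authoritative (inventory): the client's value is discarded on conflict.
function mergeServerAuthoritative<T>(server: T, _client: T): T {
  return server;
}

// max-of (status): the value further along a fixed lifecycle wins, so a stale
// device can never move a reservation backwards. The ordering is an assumption.
const statusOrder = ["held", "confirmed", "checked_in", "checked_out"] as const;
type ReservationStatus = (typeof statusOrder)[number];

function mergeMaxOf(server: ReservationStatus, client: ReservationStatus): ReservationStatus {
  return statusOrder.indexOf(client) > statusOrder.indexOf(server) ? client : server;
}
```

All three functions are deterministic and independent of arrival time, which is the property last-write-wins lacks: the outcome does not depend on which replica happened to sync last.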

Cross-references: per-service deep docs live in services/<service-name>/. ADRs live in docs/architecture/. The next document in the strategic set is 04 Event-Driven Architecture.