ADR 0001: Core architecture & tech stack (Ghasi Melmastoon)

Status

Accepted — 2026-04-22.

Context

Ghasi Melmastoon must ship four user-facing surfaces (consumer meta web, consumer mobile, tenant-branded booking web/mobile, AI-first offline-first backoffice desktop) plus 22 backend services, in a market that demands cost-discipline, low-bandwidth tolerance, RTL/LTR parity, cash-heavy payments, and pluggable lock hardware. The team is small, hires from a JS/TS-heavy talent pool in our target geographies, and must be productive immediately with AI coding tools (Cursor, Copilot, Claude). We need one tech-stack decision that simultaneously protects velocity, operability, and architectural cleanliness for the next 5 years.

This ADR locks in the foundational technology choices on which every other ADR and per-service spec depends. Subsequent ADRs (multi-tenancy model, Electron offline-first, lock-integration abstraction) extend this baseline.

Design forces:

One language across the estate to maximize cross-team mobility, code review, shared libraries (@ghasi/ui-melmastoon, @ghasi/domain-money, @ghasi/sync-client), and AI-tool effectiveness.
Clean Architecture / DDD to keep the domain layer pure and replaceable, and to scale the codebase past 22 services without service-fragmentation chaos.
Event-driven to enable offline sync, audit, replay, and cross-service decoupling.
GCP-first because Vertex AI is co-located, GCP cost is competitive at our scale, and our target region has acceptable GCP latency.
Three BFFs because anonymous-consumer, in-funnel-guest, and authenticated-staff surfaces have incompatible auth, cache, and shape requirements.
Electron for desktop because the staff-facing line-of-business app needs offline-first SQLite, ONNX edge inference, OS keychain integration, signed installers, vendor SDKs (locks, peripherals) shipped as Node bindings, and a hiring profile that aligns with the rest of the team.

Decision

We adopt the following stack and architectural pattern across the entire platform.

1. Language

TypeScript everywhere. Backend (NestJS), frontend web (Next.js), frontend mobile (React Native), frontend desktop (Electron renderer + main), shared packages, infrastructure tooling (CDKTF / Pulumi), CI scripts. No polyglot. Domain layers are pure TS with zero framework imports.

2. Backend

NestJS microservices, one per bounded context (22 services).
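Clean / Hexagonal Architecture in every service, with four layers (presentation, application, domain, infrastructure) and a strict inward-only dependency rule: presentation and infrastructure depend on application and domain, never the reverse. Domain is framework-free.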
Drizzle (or pg directly for hot paths) for Postgres access. No ORM in the domain layer; repositories live in infrastructure and return domain entities.
Zod for DTO validation at the presentation/application boundary; never inside the domain.
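
The layering above can be illustrated with a minimal sketch. All names here (`Reservation`, `ReservationRepository`, `ConfirmReservation`, `InMemoryReservationRepository`) are hypothetical and not taken from any service spec. The in-memory adapter stands in for a real infrastructure repository, which would presumably wrap Drizzle or pg.

```typescript
// domain: pure TypeScript, no framework or database driver imports.
type ReservationStatus = "pending" | "confirmed";

class Reservation {
  readonly id: string;
  readonly tenantId: string;
  private status: ReservationStatus;

  constructor(id: string, tenantId: string, status: ReservationStatus = "pending") {
    this.id = id;
    this.tenantId = tenantId;
    this.status = status;
  }

  getStatus(): ReservationStatus {
    return this.status;
  }

  // The business rule lives in the entity, not in a controller or repository.
  confirm(): void {
    if (this.status === "confirmed") {
      throw new Error(`Reservation ${this.id} is already confirmed`);
    }
    this.status = "confirmed";
  }
}

// domain: the port the application layer depends on.
interface ReservationRepository {
  findById(tenantId: string, id: string): Promise<Reservation | undefined>;
  save(reservation: Reservation): Promise<void>;
}

// application: the use case knows only the port, never the adapter.
class ConfirmReservation {
  private readonly repo: ReservationRepository;

  constructor(repo: ReservationRepository) {
    this.repo = repo;
  }

  async execute(tenantId: string, id: string): Promise<Reservation> {
    const reservation = await this.repo.findById(tenantId, id);
    if (!reservation) throw new Error(`Reservation ${id} not found`);
    reservation.confirm();
    await this.repo.save(reservation);
    return reservation;
  }
}

// infrastructure: adapter implementing the port and returning domain entities.
class InMemoryReservationRepository implements ReservationRepository {
  private readonly rows = new Map<string, Reservation>();

  async findById(tenantId: string, id: string): Promise<Reservation | undefined> {
    return this.rows.get(`${tenantId}:${id}`);
  }

  async save(reservation: Reservation): Promise<void> {
    this.rows.set(`${reservation.tenantId}:${reservation.id}`, reservation);
  }
}
```

Because `ConfirmReservation` receives its repository through the constructor, swapping the in-memory adapter for a Postgres-backed one should not require any change to the domain or application code.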

3. Frontend (web)

Next.js (App Router) + TailwindCSS + React Query + Zustand for the consumer meta layer and the tenant booking experience.
PWA shell for browse caching on consumer surfaces (not full offline — only browse cache + last-results cache).
Per-tenant theming applied at runtime via theme-config-service (CSS variables + token bundle, no per-tenant build).
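
As a rough illustration of runtime theming without a per-tenant build, the sketch below turns a token bundle into a CSS custom-property block. The `ThemeTokens` shape and both function names are assumptions; the actual theme-config-service contract is not defined in this ADR.

```typescript
// Hypothetical token bundle: flat map of token name to CSS value.
type ThemeTokens = Record<string, string>;

// Convert dotted or camelCase token names into a CSS custom property name,
// e.g. "color.primaryText" -> "--color-primary-text".
function toCssVarName(token: string): string {
  const kebab = token
    .replace(/\./g, "-")
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .toLowerCase();
  return `--${kebab}`;
}

// Render the tokens as one CSS rule that can be injected at runtime.
function renderThemeCss(tokens: ThemeTokens, selector: string = ":root"): string {
  const declarations = Object.entries(tokens).map(([name, value]) => {
    // Tenant-supplied values are untrusted; reject anything that could
    // terminate the declaration or open a new rule.
    if (/[;{}]/.test(value)) {
      throw new Error(`Unsafe value for theme token "${name}"`);
    }
    return `  ${toCssVarName(name)}: ${value};`;
  });
  return `${selector} {\n${declarations.join("\n")}\n}`;
}
```

For example, `renderThemeCss({ "color.primary": "#0a7" })` yields a `:root` rule that sets `--color-primary`, which shared components can then read through `var(--color-primary)`.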

4. Frontend (mobile)

React Native (single consumer app, multi-tenant aware). Shares @ghasi/ui-melmastoon design tokens and i18n bundle with web.
Native modules only for: payments SDKs, push notifications, deep links, geo, secure-store.

5. Frontend (desktop backoffice)

Electron (Node 20 main process + Chromium renderer) + Vite + React + better-sqlite3 local store + ONNX Runtime Node for offline AI inference + electron-builder + electron-updater + keytar for OS keychain.
Strict process model: main process owns OS, lock SDKs, keychain, AI inference, sync worker; renderer is React-only with nodeIntegration: false, contextIsolation: true; preload exposes a narrow typed window.melmastoon via contextBridge.
Detail and rationale in ADR-0003.
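
The renderer rules above lend themselves to an automated check. The following is only a sketch of what such a check might look like: `WebPreferencesLike` mirrors two fields of Electron's `webPreferences`, while `findHardeningViolations` and the choice to require both flags to be set explicitly are assumptions of this example, not the CI rule defined in ADR-0003.

```typescript
// Subset of Electron's BrowserWindow webPreferences relevant to this ADR.
interface WebPreferencesLike {
  nodeIntegration?: boolean;
  contextIsolation?: boolean;
}

// Return a human-readable violation for every rule the preferences break.
// This sketch requires explicit values rather than relying on defaults.
function findHardeningViolations(prefs: WebPreferencesLike): string[] {
  const violations: string[] = [];
  if (prefs.nodeIntegration !== false) {
    violations.push("nodeIntegration must be explicitly set to false");
  }
  if (prefs.contextIsolation !== true) {
    violations.push("contextIsolation must be explicitly set to true");
  }
  return violations;
}
```

A CI step could call this against the preferences object exported by the main process and fail the build when the returned array is non-empty.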

6. Cloud

Google Cloud Platform. Cloud Run for compute, Cloud SQL (Postgres) for OLTP, Pub/Sub for messaging, Cloud Storage for media + invoices + theme assets, Memorystore (Redis) for cache + rate-limit, Firestore for sync cursors + outbox marks + device-pair state, Vertex AI for cloud LLM/embeddings/vision/TTS, Secret Manager for secrets, KMS for envelope encryption, Cloud Logging/Monitoring/Trace for OTel.
Kong as the north-south gateway (TLS, JWT validation, rate limit, header propagation).
pgvector as a Postgres extension; no separate vector DB.

7. Architecture pattern

Event-driven microservices on Pub/Sub. Subject pattern {service}.{aggregate}.{event}.v{N}. Transactional outbox per service. At-least-once delivery; idempotent consumers.
Saga orchestration for multi-service flows (notably booking → inventory → payment → key issuance), with declared compensations.
No synchronous cross-service chain longer than 2 hops.
Three BFFs — bff-consumer-service, bff-tenant-booking-service, bff-backoffice-service — because the surfaces have incompatible auth, cache, and shape requirements (full justification in 02 Enterprise Architecture §5).
Hybrid multi-tenancy: shared schema + tenant_id + Postgres RLS for most services; schema-per-tenant for billing-service and payment-gateway-service (detail in ADR-0002).
Single AI gateway (ai-orchestrator-service) is the only egress to Vertex AI or any external model provider; edge inference on Electron via ONNX Runtime Node.
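
To make "saga orchestration with declared compensations" concrete, here is a minimal in-process sketch. `SagaStep`, `SagaResult`, and `runSaga` are hypothetical names; a production orchestrator would presumably persist saga state and drive steps through Pub/Sub events rather than direct calls.

```typescript
// Each step declares both its forward action and its compensation.
interface SagaStep<C> {
  name: string;
  execute(ctx: C): Promise<void>;
  compensate(ctx: C): Promise<void>;
}

interface SagaResult {
  ok: boolean;
  completed: string[];
  compensated: string[];
  failedStep?: string;
}

// Run steps in order. On the first failure, compensate the steps that
// already completed, in reverse order, and report what happened.
async function runSaga<C>(steps: SagaStep<C>[], ctx: C): Promise<SagaResult> {
  const done: SagaStep<C>[] = [];
  for (const step of steps) {
    try {
      await step.execute(ctx);
      done.push(step);
    } catch {
      const compensated: string[] = [];
      for (const previous of [...done].reverse()) {
        // A failing compensation propagates to the caller in this sketch;
        // a real orchestrator would retry or escalate instead.
        await previous.compensate(ctx);
        compensated.push(previous.name);
      }
      return {
        ok: false,
        completed: done.map((s) => s.name),
        compensated,
        failedStep: step.name,
      };
    }
  }
  return { ok: true, completed: done.map((s) => s.name), compensated: [] };
}
```

For the booking flow named above, a payment failure would trigger the inventory compensation first and the booking compensation second, and key issuance would never run.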

Alternatives Considered

Alternative	Why rejected
Polyglot stack (Go for hot services, Python for AI, TS for frontend)	Inflates onboarding cost; fragments shared libraries; hurts AI-tool effectiveness across the estate; small team cannot afford context-switching tax. The 5-10% perf upside on hot services does not justify the org cost.
Tauri desktop instead of Electron	Lock vendors (TTLock, Salto, Assa Abloy) ship Node bindings, not Rust crates. `better-sqlite3` and ONNX Runtime Node are first-class Node ecosystem. `electron-builder` + `electron-updater` give one-click signed installers across Windows/macOS/Linux that hotel IT can deploy with no extra toolchain. Hiring profile in target markets favors JS/TS over Rust. The Tauri binary-size win is real but does not move the needle for a staff-installed line-of-business app. Detail in ADR-0003.
Single BFF for all surfaces	The three surfaces have incompatible auth (anonymous vs. tenant-scoped vs. authenticated-staff), incompatible cache policy (cross-tenant aggressive vs. tenant-namespaced vs. user-scoped), and incompatible response shapes (denormalized listings vs. funnel state vs. operational dashboard). Collapsing them either over-fetches or under-protects.
AWS instead of GCP	Vertex AI proximity matters for our AI-first thesis; GCP cost is competitive at our scale; Cloud Run's scale-to-zero is a better fit for our long-tail services than Fargate. AWS is not strictly worse but offers no advantage that justifies switching.
Firebase-only backend	Operational complexity at our service count, combined with multi-tenancy and RLS requirements; Firestore alone cannot do the relational + RLS work that 22 services need. Firestore stays in scope only for sync cursors, outbox marks, and device-pair state.
Monolith with modules	Defeats the offline-first sync surface (a monolith cannot expose per-aggregate sync semantics cleanly), defeats per-service deploy cadence, and locks the team into one runtime decision for 5 years. The 22-service split mirrors team ownership and bounded contexts.
GraphQL on the wire	REST + BFF gives us better cache headers, simpler edge tooling at Kong, and a smaller dependency surface. GraphQL is not banned for internal exploration but is not the contract.
Custom ORM in domain	Defeats Clean Architecture; couples domain to schema. We use repositories in infrastructure and pure entities in domain.

Consequences

Positive

One language across the team; one shared library set; one AI-tool prompt context.
Clean Architecture lets us swap NestJS, Postgres, Pub/Sub, or Vertex AI without touching the domain — each is one ADR away from being replaceable.
Three BFFs let each surface scale, cache, and authenticate independently.
Event-driven backbone gives us audit, replay, offline sync, and decoupling for free.
Electron desktop has the mature signed-installer + auto-update story that hotel IT departments expect, plus access to the Node ecosystem of lock-vendor SDKs.
GCP + Vertex AI + Cloud Run scale-to-zero is cost-aligned with our target market.

Negative

22 services is a lot of operational surface for a small team; we mitigate via the per-service template (docs/standards/SERVICE_TEMPLATE.md), shared CI, and shared infrastructure modules.
Electron binaries are larger than Tauri; we accept this — desktop install size is not the constraint, network reliability is.
Hybrid tenancy (shared schema with RLS for most services, schema-per-tenant for finance services) means two tenant-isolation strategies to operate; documented in ADR-0002.
A TypeScript-only stack forecloses certain performance optimizations; we accept this and re-evaluate per service if a hot path proves it cannot meet its SLO in TS.
Vendor-rotation cost for AI is real; we mitigate by funneling all AI through ai-orchestrator-service so rotation is a one-place change.

Risk register

Risk	Likelihood	Impact	Mitigation
TS hot-path performance shortfall on a specific service	Low	Medium	Per-service SLO; if breached, escalate to ADR for that single service rather than estate-wide language change
Cloud Run cold-start hurting hot-path UX	Medium	Medium	`min-instances` on hot services (`reservation-service`, `bff-tenant-booking-service`, `iam-service`); scale-to-zero only on long-tail
Vendor-rotation pain on AI provider	Medium	Medium	All AI through `ai-orchestrator-service`; rotation is a one-place adapter swap
Vendor-rotation pain on lock vendor	Medium	High	`LockPort` abstraction (see ADR-0004)
GCP region outage	Low	High	Daily Cloud SQL backups; quarterly DR drill; per-tenant residency pin
Electron security regression	Medium	High	Renderer hardening enforced in CI (see ADR-0003)
Pub/Sub at-least-once → duplicate events	High	Low	Consumer-side idempotency keyed by `eventId` + content hash; required, not optional
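
The duplicate-event mitigation in the last row can be sketched as follows. `EventEnvelope`, `dedupKey`, and `IdempotentConsumer` are hypothetical names, and the in-memory `Set` is a stand-in for a durable store; in a real service the dedup record would presumably be written in the same transaction as the handler's side effects.

```typescript
import { createHash } from "node:crypto";

// Hypothetical envelope; the real event schema lives elsewhere.
interface EventEnvelope {
  eventId: string;
  subject: string;
  payload: Record<string, unknown>;
}

// Key = eventId + SHA-256 of the payload. JSON.stringify is not a canonical
// serialization (key order matters), so a real implementation would need a
// stable encoding before hashing.
function dedupKey(event: EventEnvelope): string {
  const hash = createHash("sha256")
    .update(JSON.stringify(event.payload))
    .digest("hex");
  return `${event.eventId}:${hash}`;
}

class IdempotentConsumer {
  // Stand-in for a durable store of processed keys.
  private readonly seen = new Set<string>();
  private readonly handler: (event: EventEnvelope) => void;

  constructor(handler: (event: EventEnvelope) => void) {
    this.handler = handler;
  }

  // Returns true if the event was handled, false if skipped as a duplicate.
  handle(event: EventEnvelope): boolean {
    const key = dedupKey(event);
    if (this.seen.has(key)) return false;
    this.handler(event);
    // Record the key only after success so a failed attempt is retried
    // on redelivery.
    this.seen.add(key);
    return true;
  }
}
```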

Operational expectations

Per-service SLO is documented in each service's SERVICE_READINESS.md. Default targets: 99.9% availability, p95 < 250ms for hot reads, p95 < 800ms for hot writes, sync push p95 < 1.5s for batches ≤ 100 mutations.
Per-service runbook is required for production sign-off; lives in services/<service-name>/RUNBOOK.md.
OTel everywhere — every service emits traces, metrics, and logs; traceparent propagated through HTTP, Pub/Sub headers, and SSE.
Schema registry — every Pub/Sub subject has a JSON Schema in packages/event-schemas/; CI enforces presence and backward compatibility.
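
A CI presence check needs a way to recognize valid subjects. The sketch below parses and formats names following the {service}.{aggregate}.{event}.v{N} pattern from the Decision section. The allowed character set per segment (lowercase letters, digits, hyphens) and the rule that versions start at 1 are assumptions of this example, as are the names `SubjectParts`, `parseSubject`, and `formatSubject`.

```typescript
interface SubjectParts {
  service: string;
  aggregate: string;
  event: string;
  version: number;
}

// Assumed segment grammar: lowercase start, then lowercase, digits, hyphens.
const SUBJECT_RE =
  /^([a-z][a-z0-9-]*)\.([a-z][a-z0-9-]*)\.([a-z][a-z0-9-]*)\.v([1-9][0-9]*)$/;

// Returns the parsed parts, or undefined when the subject is malformed.
function parseSubject(subject: string): SubjectParts | undefined {
  const match = SUBJECT_RE.exec(subject);
  if (!match) return undefined;
  const [, service = "", aggregate = "", event = "", version = ""] = match;
  return { service, aggregate, event, version: Number(version) };
}

// Builds a subject string and rejects parts that violate the grammar.
function formatSubject(parts: SubjectParts): string {
  const subject = `${parts.service}.${parts.aggregate}.${parts.event}.v${parts.version}`;
  if (!parseSubject(subject)) throw new Error(`Invalid subject: ${subject}`);
  return subject;
}
```

With a parser like this, CI could derive the expected schema file name from each subject and fail when it is missing from packages/event-schemas/.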

Compliance

Every service must follow the layered structure under services/<service-name>/src/{presentation,application,domain,infrastructure}.
Every service must emit OpenAPI to services/<service-name>/openapi.json; CI enforces presence.
Every service must declare its conflict policy per replicable aggregate; CI enforces presence in SYNC_CONTRACT.md.
Every backend mutation endpoint must accept Idempotency-Key.
No service may import a vendor AI SDK except ai-orchestrator-service; CI enforces.
No service may import a lock vendor SDK except lock-integration-service; CI enforces.
The Electron desktop app must ship with nodeIntegration: false, contextIsolation: true, signed installer, signed auto-update; CI enforces via electron-builder config lint.
All cross-context references in code must use the TenantId value object; CI enforces via type lint.
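
The two SDK import rules above could be enforced by a check along these lines. This is a sketch only: `ImportRule`, `DEFAULT_RULES`, and `findForbiddenImports` are hypothetical, and the package prefixes are illustrative placeholders rather than the actual list of vendor SDKs.

```typescript
interface ImportRule {
  description: string;
  packagePrefixes: string[]; // exact package name, or a scope ending in "/"
  allowedService: string;
}

// Illustrative rules; the package names are placeholders for this sketch.
const DEFAULT_RULES: ImportRule[] = [
  {
    description: "vendor AI SDK",
    packagePrefixes: ["@google-cloud/vertexai"],
    allowedService: "ai-orchestrator-service",
  },
  {
    description: "lock vendor SDK",
    packagePrefixes: ["@example-lock-vendor/"],
    allowedService: "lock-integration-service",
  },
];

// Given a service name and the import specifiers found in its source,
// return one message per import that breaks a rule.
function findForbiddenImports(
  service: string,
  importSpecifiers: string[],
  rules: ImportRule[] = DEFAULT_RULES,
): string[] {
  const violations: string[] = [];
  for (const specifier of importSpecifiers) {
    for (const rule of rules) {
      const matches = rule.packagePrefixes.some((prefix) => {
        const scope = prefix.endsWith("/") ? prefix : `${prefix}/`;
        return specifier === prefix || specifier.startsWith(scope);
      });
      if (matches && service !== rule.allowedService) {
        violations.push(
          `${service}: ${rule.description} import "${specifier}" is only allowed in ${rule.allowedService}`,
        );
      }
    }
  }
  return violations;
}
```

Collecting the import specifiers themselves (for example from a lint rule or the TypeScript compiler API) is left out of this sketch.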
