ADR 0002: Multi-tenancy model (Ghasi Melmastoon)

Status

Accepted — 2026-04-22.

Context

Ghasi Melmastoon is a multi-tenant hotel SaaS that must serve everything from a 12-room independent guesthouse to a 30-property regional chain — at the same time, on the same infrastructure, without leaking one tenant's data into another's response. We expect 1000+ tenants within 24 months and 10,000+ at the long horizon. Some tenants (chains) will demand stronger isolation guarantees, especially around finance and payments, where audit, regulatory scope, and offboarding (data export + erasure) become first-order requirements.

We have to choose one authoritative tenancy model that the entire estate conforms to, with documented exceptions where a stricter model is justified. This ADR locks that choice in and is referenced by every per-service spec.

Design forces:

Operability — a small SRE team cannot manage thousands of databases or thousands of schemas without breaking under migration toil.
Cost — per-tenant DB or per-tenant schema multiplied by 1000+ tenants is prohibitive at our target price point.
Security — RLS or stronger isolation must hold even when application code has bugs.
Auditability — finance and payments must support per-tenant export, per-tenant erasure, per-tenant retention pin, and per-tenant PCI scope review with minimum operational overhead.
Cross-tenant search — the consumer meta layer must read across tenants for discovery, but only via projected read models, never via cross-tenant joins.
Migration ergonomics — a tenant must be able to upgrade isolation tier without an application code change.

Decision

We adopt a hybrid multi-tenancy model:

1. Default for all services: shared schema + `tenant_id` + Postgres RLS

Every table in every service has tenant_id UUID NOT NULL as a first-class column with an index appropriate to the access pattern.
Postgres Row-Level Security is enabled on every table with policy USING (tenant_id = current_setting('app.tenant_id')::uuid).
Application sets app.tenant_id per request from the JWT claim, on the connection, before any query runs.
Connection pooling: PgBouncer in transaction mode with a connection-init that sets app.tenant_id on every checkout. Service accounts cannot bypass RLS except for declared ops jobs that explicitly elevate (audited).
Domain layer also enforces tenant invariants: every aggregate root carries tenantId: TenantId, every cross-aggregate reference is checked at constructor time, every repository method takes tenantId as an explicit parameter.
Outbox writers validate that the event payload's tenantId matches the request context before commit.

This applies to 20 of 22 services, including: iam-service, tenant-service, property-service, reservation-service, inventory-service, pricing-service, housekeeping-service, maintenance-service, notification-service, lock-integration-service, theme-config-service, file-storage-service, reporting-service, analytics-service, audit-service, ai-orchestrator-service, search-aggregation-service, and the three BFFs.

2. Exception: schema-per-tenant for `billing-service` and `payment-gateway-service`

These two services hold financial state and payment artifacts. They use schema-per-tenant isolation:

One Postgres schema per tenant inside the same database cluster: tenant_<uuid>_billing, tenant_<uuid>_payments.
Schemas are provisioned on tenant creation (saga in tenant-service) and torn down on tenant offboarding (DSAR-erase saga in audit-service).
The application sets search_path per request from the JWT claim. RLS is still enabled inside each schema as defense in depth, but the primary isolation boundary is the schema itself.
This gives us:
- Cleaner financial audit — auditors can query a single schema without RLS context-setting and be sure they see only one tenant.
- Simpler tenant export/delete on offboarding — pg_dump one schema, drop one schema; no row-by-row delete with the attendant index churn or risk of leftover rows.
- Per-tenant PCI scope — a tenant whose payment data lives in its own schema is a defensible PCI boundary; cross-tenant payment-data leakage is structurally impossible inside Postgres.
- Per-tenant retention pins — a tenant on a 7-year retention plan and another on 1-year plan can coexist without per-row retention bookkeeping.

3. Cross-tenant queries: only via `search-aggregation-service`

The consumer meta layer must read across tenants. It does so only through search-aggregation-service, which holds projected read models built from property.*, inventory.*, and pricing.* events. The projection contains:

No PII.
No payment data.
No lock credentials.
No internal financial detail (only public-facing prices).

No other service in the estate has cross-tenant read paths. Chain operators with multiple tenants get aggregated views by querying each tenant context client-side under per-tenant tokens.

4. Migration path between tiers

If a tenant grows enough to demand isolation beyond shared-schema-with-RLS for a service that defaults to shared schema, we promote them to a dedicated schema (or in the extreme case, a dedicated database) without an application-code change. The application's repository layer receives the schema name from tenant-service settings; the contract with the domain layer does not change. This ADR explicitly preserves that future option.

Alternatives Considered

Alternative	Why rejected
Full schema-per-tenant for the entire estate	At 1000+ tenants, 22 services × 1000+ schemas = 22,000+ schemas to migrate, monitor, vacuum, and back up. Migration toil dominates the SRE roadmap. We retain the pattern only where the financial-audit benefit exceeds the operational cost.
DB-per-tenant for the entire estate	Same problem, an order of magnitude worse. Cost (Cloud SQL instances), connection-pool fragmentation, monitoring sprawl, and DR complexity are all linear in tenant count. We do not adopt this; we keep it as a future option for one-off enterprise contracts.
Shared schema with no RLS, application-only enforcement	One missed `WHERE tenant_id = ?` clause in any of 22 services becomes a tenant breach. RLS is the second line of defense that turns an application bug from a breach into a query that returns zero rows. We do not ship without RLS.
Shared schema with RLS for finance services too	Workable but loses the audit, offboarding, and PCI-scope wins. Auditors and regulators are easier to satisfy with structural isolation than with RLS-policy proofs. The marginal cost (two extra schemas per tenant) is small and concentrated in two services.
Logical replication of finance to a per-tenant DB	Adds a new failure mode (replication lag) to the most sensitive data. We prefer keeping the authoritative store schema-isolated rather than replication-isolated.
Tenant-aware proxy (e.g. Citus / partitioned tables)	Adds infrastructure dependency without solving the audit/offboarding/PCI questions. We prefer the simpler PgBouncer-init + RLS pattern for the shared tier.

Consequences

Positive

One canonical model across most of the estate (shared + RLS) keeps SRE and developer mental model simple.
Two services (billing-service, payment-gateway-service) with schema-per-tenant give us defensible audit, offboarding, and PCI posture without paying that operational cost across the whole platform.
Cross-tenant reads are structurally limited to one service (search-aggregation-service) — one place to audit, one place to harden.
Migration to stricter isolation per tenant is possible without application changes.

Negative

Two operational tenancy patterns to maintain (shared+RLS and schema-per-tenant). Mitigated by isolating the schema-per-tenant pattern to two services with shared tooling.
Schema-per-tenant for finance means tenant-creation and tenant-deletion are sagas, not single-row writes. Mitigated by the existing tenant-lifecycle saga in tenant-service.
RLS adds a per-query overhead and forces every connection to set app.tenant_id correctly. Mitigated by PgBouncer-init and per-service integration tests that assert RLS is honored.
Per-tenant Postgres role + grant management for the schema-per-tenant tier requires automation; we already need this for tenant lifecycle.

Operational notes

PgBouncer connection-init. Every Cloud SQL pool runs an init script:
```
SET app.tenant_id = '<uuid from JWT claim>';
```
for the shared-tier services, and:
```
SET search_path = 'tenant_<uuid>_billing','public';
```
for billing-service and payment-gateway-service. The init runs on every connection checkout in transaction mode.

RLS policy template (used in every shared-tier table):

ALTER TABLE <name> ENABLE ROW LEVEL SECURITY;
ALTER TABLE <name> FORCE ROW LEVEL SECURITY;
CREATE POLICY <name>_tenant_isolation ON <name>
  USING (tenant_id = current_setting('app.tenant_id')::uuid)
  WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);

The FORCE ROW LEVEL SECURITY clause closes the table-owner bypass; the only legitimate bypass is via a separately credentialed ops-role used for declared maintenance jobs.

Schema-per-tenant provisioning saga (tenant-service): on tenant.created.v1, the saga executes templated migrations (tenant_<uuid>_billing, tenant_<uuid>_payments) and emits tenant.finance_schemas.provisioned.v1. billing-service and payment-gateway-service start serving the tenant only after this event.
Schema-per-tenant offboarding saga (DSAR erasure, owned by audit-service): on confirmed erasure request, the saga pg_dumps the tenant's finance schemas to encrypted Cloud Storage (proof of erasure window), then drops the schemas, then emits tenant.finance_schemas.dropped.v1. Shared-tier services receive the same erasure event and run scoped row-level deletes under a service account that elevates app.tenant_id to the target tenant.

Tenant tier escalation matrix

From	To	Trigger	Mechanism
Shared schema (default)	Dedicated schema (per service)	Enterprise contract demand or audit constraint	`tenant-service` settings flip a tier flag; per-service repos route by tier; data is migrated row-by-row in a one-shot saga
Dedicated schema	Dedicated database	Regulatory or geographic isolation requirement	`tenant-service` issues a tier-3 flag; per-service infra provisions the new Cloud SQL instance; logical replication seeds it; cutover with read-only window

The application contract — tenantId parameter on every repository method — is invariant across tiers. No domain code changes when a tenant moves between tiers.

Compliance

Every table in every shared-schema service must declare tenant_id UUID NOT NULL and an RLS policy. CI lint enforces.
Every repository method must accept tenantId as an explicit parameter; no implicit ALS-only repositories. CI enforces via type lint.
Every PgBouncer pool must run the connection-init that sets app.tenant_id. Infra CI enforces.
billing-service and payment-gateway-service migrations must be schema-aware (templated per tenant); migration tooling enforces.
search-aggregation-service is the only service permitted to query across tenant_id boundaries. Code review checklist enforces; periodic dependency-graph audit re-verifies.
Tenant lifecycle saga in tenant-service must provision finance schemas atomically with tenant creation. Integration tests assert.
Tenant offboarding saga must drop finance schemas as part of the GDPR erasure flow. Integration tests assert.

Status​

Context​

Decision​

1. Default for all services: shared schema + tenant_id + Postgres RLS​

2. Exception: schema-per-tenant for billing-service and payment-gateway-service​

3. Cross-tenant queries: only via search-aggregation-service​

4. Migration path between tiers​

Alternatives Considered​

Consequences​

Positive​

Negative​

Operational notes​

Tenant tier escalation matrix​

Compliance​

References​