Skip to main content

ADR 0002: Multi-tenancy model (Ghasi Melmastoon)

Status

Accepted — 2026-04-22.

Context

Ghasi Melmastoon is a multi-tenant hotel SaaS that must serve everything from a 12-room independent guesthouse to a 30-property regional chain — at the same time, on the same infrastructure, without leaking one tenant's data into another's response. We expect 1000+ tenants within 24 months and 10,000+ at the long horizon. Some tenants (chains) will demand stronger isolation guarantees, especially around finance and payments, where audit, regulatory scope, and offboarding (data export + erasure) become first-order requirements.

We have to choose one authoritative tenancy model that the entire estate conforms to, with documented exceptions where a stricter model is justified. This ADR locks that choice in and is referenced by every per-service spec.

Design forces:

  • Operability — a small SRE team cannot manage thousands of databases or thousands of schemas without breaking under migration toil.
  • Cost — per-tenant DB or per-tenant schema multiplied by 1000+ tenants is prohibitive at our target price point.
  • Security — RLS or stronger isolation must hold even when application code has bugs.
  • Auditability — finance and payments must support per-tenant export, per-tenant erasure, per-tenant retention pin, and per-tenant PCI scope review with minimum operational overhead.
  • Cross-tenant search — the consumer meta layer must read across tenants for discovery, but only via projected read models, never via cross-tenant joins.
  • Migration ergonomics — a tenant must be able to upgrade isolation tier without an application code change.

Decision

We adopt a hybrid multi-tenancy model:

1. Default for all services: shared schema + tenant_id + Postgres RLS

  • Every table in every service has tenant_id UUID NOT NULL as a first-class column with an index appropriate to the access pattern.
  • Postgres Row-Level Security is enabled on every table with policy USING (tenant_id = current_setting('app.tenant_id')::uuid).
  • Application sets app.tenant_id per request from the JWT claim, on the connection, before any query runs.
  • Connection pooling: PgBouncer in transaction mode with a connection-init that sets app.tenant_id on every checkout. Service accounts cannot bypass RLS except for declared ops jobs that explicitly elevate (audited).
  • Domain layer also enforces tenant invariants: every aggregate root carries tenantId: TenantId, every cross-aggregate reference is checked at constructor time, every repository method takes tenantId as an explicit parameter.
  • Outbox writers validate that the event payload's tenantId matches the request context before commit.

This applies to 20 of 22 services, including: iam-service, tenant-service, property-service, reservation-service, inventory-service, pricing-service, housekeeping-service, maintenance-service, notification-service, lock-integration-service, theme-config-service, file-storage-service, reporting-service, analytics-service, audit-service, ai-orchestrator-service, search-aggregation-service, and the three BFFs.

2. Exception: schema-per-tenant for billing-service and payment-gateway-service

These two services hold financial state and payment artifacts. They use schema-per-tenant isolation:

  • One Postgres schema per tenant inside the same database cluster: tenant_<uuid>_billing, tenant_<uuid>_payments.
  • Schemas are provisioned on tenant creation (saga in tenant-service) and torn down on tenant offboarding (DSAR-erase saga in audit-service).
  • The application sets search_path per request from the JWT claim. RLS is still enabled inside each schema as defense in depth, but the primary isolation boundary is the schema itself.
  • This gives us:
    • Cleaner financial audit — auditors can query a single schema without RLS context-setting and be sure they see only one tenant.
    • Simpler tenant export/delete on offboardingpg_dump one schema, drop one schema; no row-by-row delete with the attendant index churn or risk of leftover rows.
    • Per-tenant PCI scope — a tenant whose payment data lives in its own schema is a defensible PCI boundary; cross-tenant payment-data leakage is structurally impossible inside Postgres.
    • Per-tenant retention pins — a tenant on a 7-year retention plan and another on 1-year plan can coexist without per-row retention bookkeeping.

3. Cross-tenant queries: only via search-aggregation-service

The consumer meta layer must read across tenants. It does so only through search-aggregation-service, which holds projected read models built from property.*, inventory.*, and pricing.* events. The projection contains:

  • No PII.
  • No payment data.
  • No lock credentials.
  • No internal financial detail (only public-facing prices).

No other service in the estate has cross-tenant read paths. Chain operators with multiple tenants get aggregated views by querying each tenant context client-side under per-tenant tokens.

4. Migration path between tiers

If a tenant grows enough to demand isolation beyond shared-schema-with-RLS for a service that defaults to shared schema, we promote them to a dedicated schema (or in the extreme case, a dedicated database) without an application-code change. The application's repository layer receives the schema name from tenant-service settings; the contract with the domain layer does not change. This ADR explicitly preserves that future option.

Alternatives Considered

AlternativeWhy rejected
Full schema-per-tenant for the entire estateAt 1000+ tenants, 22 services × 1000+ schemas = 22,000+ schemas to migrate, monitor, vacuum, and back up. Migration toil dominates the SRE roadmap. We retain the pattern only where the financial-audit benefit exceeds the operational cost.
DB-per-tenant for the entire estateSame problem, an order of magnitude worse. Cost (Cloud SQL instances), connection-pool fragmentation, monitoring sprawl, and DR complexity are all linear in tenant count. We do not adopt this; we keep it as a future option for one-off enterprise contracts.
Shared schema with no RLS, application-only enforcementOne missed WHERE tenant_id = ? clause in any of 22 services becomes a tenant breach. RLS is the second line of defense that turns an application bug from a breach into a query that returns zero rows. We do not ship without RLS.
Shared schema with RLS for finance services tooWorkable but loses the audit, offboarding, and PCI-scope wins. Auditors and regulators are easier to satisfy with structural isolation than with RLS-policy proofs. The marginal cost (two extra schemas per tenant) is small and concentrated in two services.
Logical replication of finance to a per-tenant DBAdds a new failure mode (replication lag) to the most sensitive data. We prefer keeping the authoritative store schema-isolated rather than replication-isolated.
Tenant-aware proxy (e.g. Citus / partitioned tables)Adds infrastructure dependency without solving the audit/offboarding/PCI questions. We prefer the simpler PgBouncer-init + RLS pattern for the shared tier.

Consequences

Positive

  • One canonical model across most of the estate (shared + RLS) keeps SRE and developer mental model simple.
  • Two services (billing-service, payment-gateway-service) with schema-per-tenant give us defensible audit, offboarding, and PCI posture without paying that operational cost across the whole platform.
  • Cross-tenant reads are structurally limited to one service (search-aggregation-service) — one place to audit, one place to harden.
  • Migration to stricter isolation per tenant is possible without application changes.

Negative

  • Two operational tenancy patterns to maintain (shared+RLS and schema-per-tenant). Mitigated by isolating the schema-per-tenant pattern to two services with shared tooling.
  • Schema-per-tenant for finance means tenant-creation and tenant-deletion are sagas, not single-row writes. Mitigated by the existing tenant-lifecycle saga in tenant-service.
  • RLS adds a per-query overhead and forces every connection to set app.tenant_id correctly. Mitigated by PgBouncer-init and per-service integration tests that assert RLS is honored.
  • Per-tenant Postgres role + grant management for the schema-per-tenant tier requires automation; we already need this for tenant lifecycle.

Operational notes

  • PgBouncer connection-init. Every Cloud SQL pool runs an init script:

    SET app.tenant_id = '<uuid from JWT claim>';

    for the shared-tier services, and:

    SET search_path = 'tenant_<uuid>_billing','public';

    for billing-service and payment-gateway-service. The init runs on every connection checkout in transaction mode.

  • RLS policy template (used in every shared-tier table):

    ALTER TABLE <name> ENABLE ROW LEVEL SECURITY;
    ALTER TABLE <name> FORCE ROW LEVEL SECURITY;
    CREATE POLICY <name>_tenant_isolation ON <name>
    USING (tenant_id = current_setting('app.tenant_id')::uuid)
    WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);

    The FORCE ROW LEVEL SECURITY clause closes the table-owner bypass; the only legitimate bypass is via a separately credentialed ops-role used for declared maintenance jobs.

  • Schema-per-tenant provisioning saga (tenant-service): on tenant.created.v1, the saga executes templated migrations (tenant_<uuid>_billing, tenant_<uuid>_payments) and emits tenant.finance_schemas.provisioned.v1. billing-service and payment-gateway-service start serving the tenant only after this event.

  • Schema-per-tenant offboarding saga (DSAR erasure, owned by audit-service): on confirmed erasure request, the saga pg_dumps the tenant's finance schemas to encrypted Cloud Storage (proof of erasure window), then drops the schemas, then emits tenant.finance_schemas.dropped.v1. Shared-tier services receive the same erasure event and run scoped row-level deletes under a service account that elevates app.tenant_id to the target tenant.

Tenant tier escalation matrix

FromToTriggerMechanism
Shared schema (default)Dedicated schema (per service)Enterprise contract demand or audit constrainttenant-service settings flip a tier flag; per-service repos route by tier; data is migrated row-by-row in a one-shot saga
Dedicated schemaDedicated databaseRegulatory or geographic isolation requirementtenant-service issues a tier-3 flag; per-service infra provisions the new Cloud SQL instance; logical replication seeds it; cutover with read-only window

The application contract — tenantId parameter on every repository method — is invariant across tiers. No domain code changes when a tenant moves between tiers.

Compliance

  • Every table in every shared-schema service must declare tenant_id UUID NOT NULL and an RLS policy. CI lint enforces.
  • Every repository method must accept tenantId as an explicit parameter; no implicit ALS-only repositories. CI enforces via type lint.
  • Every PgBouncer pool must run the connection-init that sets app.tenant_id. Infra CI enforces.
  • billing-service and payment-gateway-service migrations must be schema-aware (templated per tenant); migration tooling enforces.
  • search-aggregation-service is the only service permitted to query across tenant_id boundaries. Code review checklist enforces; periodic dependency-graph audit re-verifies.
  • Tenant lifecycle saga in tenant-service must provision finance schemas atomically with tenant creation. Integration tests assert.
  • Tenant offboarding saga must drop finance schemas as part of the GDPR erasure flow. Integration tests assert.

References