ADR 0002: Multi-tenancy model (Ghasi Melmastoon)
Status
Accepted — 2026-04-22.
Context
Ghasi Melmastoon is a multi-tenant hotel SaaS that must serve everything from a 12-room independent guesthouse to a 30-property regional chain — at the same time, on the same infrastructure, without leaking one tenant's data into another's response. We expect 1000+ tenants within 24 months and 10,000+ at the long horizon. Some tenants (chains) will demand stronger isolation guarantees, especially around finance and payments, where audit, regulatory scope, and offboarding (data export + erasure) become first-order requirements.
We have to choose one authoritative tenancy model that the entire estate conforms to, with documented exceptions where a stricter model is justified. This ADR locks that choice in and is referenced by every per-service spec.
Design forces:
- Operability — a small SRE team cannot manage thousands of databases or thousands of schemas without breaking under migration toil.
- Cost — per-tenant DB or per-tenant schema multiplied by 1000+ tenants is prohibitive at our target price point.
- Security — RLS or stronger isolation must hold even when application code has bugs.
- Auditability — finance and payments must support per-tenant export, per-tenant erasure, per-tenant retention pin, and per-tenant PCI scope review with minimum operational overhead.
- Cross-tenant search — the consumer meta layer must read across tenants for discovery, but only via projected read models, never via cross-tenant joins.
- Migration ergonomics — a tenant must be able to upgrade isolation tier without an application code change.
Decision
We adopt a hybrid multi-tenancy model:
1. Default for all services: shared schema + tenant_id + Postgres RLS
- Every table in every service has
tenant_id UUID NOT NULLas a first-class column with an index appropriate to the access pattern. - Postgres Row-Level Security is enabled on every table with policy
USING (tenant_id = current_setting('app.tenant_id')::uuid). - Application sets
app.tenant_idper request from the JWT claim, on the connection, before any query runs. - Connection pooling: PgBouncer in transaction mode with a connection-init that sets
app.tenant_idon every checkout. Service accounts cannot bypass RLS except for declared ops jobs that explicitly elevate (audited). - Domain layer also enforces tenant invariants: every aggregate root carries
tenantId: TenantId, every cross-aggregate reference is checked at constructor time, every repository method takestenantIdas an explicit parameter. - Outbox writers validate that the event payload's
tenantIdmatches the request context before commit.
This applies to 20 of 22 services, including: iam-service, tenant-service, property-service, reservation-service, inventory-service, pricing-service, housekeeping-service, maintenance-service, notification-service, lock-integration-service, theme-config-service, file-storage-service, reporting-service, analytics-service, audit-service, ai-orchestrator-service, search-aggregation-service, and the three BFFs.
2. Exception: schema-per-tenant for billing-service and payment-gateway-service
These two services hold financial state and payment artifacts. They use schema-per-tenant isolation:
- One Postgres schema per tenant inside the same database cluster:
tenant_<uuid>_billing,tenant_<uuid>_payments. - Schemas are provisioned on tenant creation (saga in
tenant-service) and torn down on tenant offboarding (DSAR-erase saga inaudit-service). - The application sets
search_pathper request from the JWT claim. RLS is still enabled inside each schema as defense in depth, but the primary isolation boundary is the schema itself. - This gives us:
- Cleaner financial audit — auditors can query a single schema without RLS context-setting and be sure they see only one tenant.
- Simpler tenant export/delete on offboarding —
pg_dumpone schema, drop one schema; no row-by-row delete with the attendant index churn or risk of leftover rows. - Per-tenant PCI scope — a tenant whose payment data lives in its own schema is a defensible PCI boundary; cross-tenant payment-data leakage is structurally impossible inside Postgres.
- Per-tenant retention pins — a tenant on a 7-year retention plan and another on 1-year plan can coexist without per-row retention bookkeeping.
3. Cross-tenant queries: only via search-aggregation-service
The consumer meta layer must read across tenants. It does so only through search-aggregation-service, which holds projected read models built from property.*, inventory.*, and pricing.* events. The projection contains:
- No PII.
- No payment data.
- No lock credentials.
- No internal financial detail (only public-facing prices).
No other service in the estate has cross-tenant read paths. Chain operators with multiple tenants get aggregated views by querying each tenant context client-side under per-tenant tokens.
4. Migration path between tiers
If a tenant grows enough to demand isolation beyond shared-schema-with-RLS for a service that defaults to shared schema, we promote them to a dedicated schema (or in the extreme case, a dedicated database) without an application-code change. The application's repository layer receives the schema name from tenant-service settings; the contract with the domain layer does not change. This ADR explicitly preserves that future option.
Alternatives Considered
| Alternative | Why rejected |
|---|---|
| Full schema-per-tenant for the entire estate | At 1000+ tenants, 22 services × 1000+ schemas = 22,000+ schemas to migrate, monitor, vacuum, and back up. Migration toil dominates the SRE roadmap. We retain the pattern only where the financial-audit benefit exceeds the operational cost. |
| DB-per-tenant for the entire estate | Same problem, an order of magnitude worse. Cost (Cloud SQL instances), connection-pool fragmentation, monitoring sprawl, and DR complexity are all linear in tenant count. We do not adopt this; we keep it as a future option for one-off enterprise contracts. |
| Shared schema with no RLS, application-only enforcement | One missed WHERE tenant_id = ? clause in any of 22 services becomes a tenant breach. RLS is the second line of defense that turns an application bug from a breach into a query that returns zero rows. We do not ship without RLS. |
| Shared schema with RLS for finance services too | Workable but loses the audit, offboarding, and PCI-scope wins. Auditors and regulators are easier to satisfy with structural isolation than with RLS-policy proofs. The marginal cost (two extra schemas per tenant) is small and concentrated in two services. |
| Logical replication of finance to a per-tenant DB | Adds a new failure mode (replication lag) to the most sensitive data. We prefer keeping the authoritative store schema-isolated rather than replication-isolated. |
| Tenant-aware proxy (e.g. Citus / partitioned tables) | Adds infrastructure dependency without solving the audit/offboarding/PCI questions. We prefer the simpler PgBouncer-init + RLS pattern for the shared tier. |
Consequences
Positive
- One canonical model across most of the estate (shared + RLS) keeps SRE and developer mental model simple.
- Two services (
billing-service,payment-gateway-service) with schema-per-tenant give us defensible audit, offboarding, and PCI posture without paying that operational cost across the whole platform. - Cross-tenant reads are structurally limited to one service (
search-aggregation-service) — one place to audit, one place to harden. - Migration to stricter isolation per tenant is possible without application changes.
Negative
- Two operational tenancy patterns to maintain (shared+RLS and schema-per-tenant). Mitigated by isolating the schema-per-tenant pattern to two services with shared tooling.
- Schema-per-tenant for finance means tenant-creation and tenant-deletion are sagas, not single-row writes. Mitigated by the existing tenant-lifecycle saga in
tenant-service. - RLS adds a per-query overhead and forces every connection to set
app.tenant_idcorrectly. Mitigated by PgBouncer-init and per-service integration tests that assert RLS is honored. - Per-tenant Postgres role + grant management for the schema-per-tenant tier requires automation; we already need this for tenant lifecycle.
Operational notes
-
PgBouncer connection-init. Every Cloud SQL pool runs an init script:
SET app.tenant_id = '<uuid from JWT claim>';for the shared-tier services, and:
SET search_path = 'tenant_<uuid>_billing','public';for
billing-serviceandpayment-gateway-service. The init runs on every connection checkout in transaction mode. -
RLS policy template (used in every shared-tier table):
ALTER TABLE <name> ENABLE ROW LEVEL SECURITY;ALTER TABLE <name> FORCE ROW LEVEL SECURITY;CREATE POLICY <name>_tenant_isolation ON <name>USING (tenant_id = current_setting('app.tenant_id')::uuid)WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);The
FORCE ROW LEVEL SECURITYclause closes the table-owner bypass; the only legitimate bypass is via a separately credentialed ops-role used for declared maintenance jobs. -
Schema-per-tenant provisioning saga (
tenant-service): ontenant.created.v1, the saga executes templated migrations (tenant_<uuid>_billing,tenant_<uuid>_payments) and emitstenant.finance_schemas.provisioned.v1.billing-serviceandpayment-gateway-servicestart serving the tenant only after this event. -
Schema-per-tenant offboarding saga (DSAR erasure, owned by
audit-service): on confirmed erasure request, the sagapg_dumps the tenant's finance schemas to encrypted Cloud Storage (proof of erasure window), then drops the schemas, then emitstenant.finance_schemas.dropped.v1. Shared-tier services receive the same erasure event and run scoped row-level deletes under a service account that elevatesapp.tenant_idto the target tenant.
Tenant tier escalation matrix
| From | To | Trigger | Mechanism |
|---|---|---|---|
| Shared schema (default) | Dedicated schema (per service) | Enterprise contract demand or audit constraint | tenant-service settings flip a tier flag; per-service repos route by tier; data is migrated row-by-row in a one-shot saga |
| Dedicated schema | Dedicated database | Regulatory or geographic isolation requirement | tenant-service issues a tier-3 flag; per-service infra provisions the new Cloud SQL instance; logical replication seeds it; cutover with read-only window |
The application contract — tenantId parameter on every repository method — is invariant across tiers. No domain code changes when a tenant moves between tiers.
Compliance
- Every table in every shared-schema service must declare
tenant_id UUID NOT NULLand an RLS policy. CI lint enforces. - Every repository method must accept
tenantIdas an explicit parameter; no implicit ALS-only repositories. CI enforces via type lint. - Every PgBouncer pool must run the connection-init that sets
app.tenant_id. Infra CI enforces. billing-serviceandpayment-gateway-servicemigrations must be schema-aware (templated per tenant); migration tooling enforces.search-aggregation-serviceis the only service permitted to query acrosstenant_idboundaries. Code review checklist enforces; periodic dependency-graph audit re-verifies.- Tenant lifecycle saga in
tenant-servicemust provision finance schemas atomically with tenant creation. Integration tests assert. - Tenant offboarding saga must drop finance schemas as part of the GDPR erasure flow. Integration tests assert.