file-storage-service — DATA_MODEL
Companion: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · 06 Data Models · ADR-0002 Multi-Tenancy · standards/NAMING.md
Storage is Cloud SQL Postgres 16 (regional HA, europe-west4 primary; cross-region read replica in europe-west1 for the retention sweeper) for all metadata, audit, and policy rows. The blob store is Google Cloud Storage: three platform buckets (melmastoon-media-prod, melmastoon-private-prod, melmastoon-archive-prod) with mandatory tenants/{tenantId}/... key prefix. CDN edge cache is Google Cloud CDN in front of melmastoon-media-prod only. Metadata lives in the single platform schema file_storage; tenant isolation is enforced by RLS on tenant_id and by the domain ObjectKey invariant — file storage is not per-tenant-schema (different from billing-service); a single shared schema lets us run one set of cross-tenant background sweepers efficiently.
PgBouncer is in transaction pooling mode. Every request handler calls SET LOCAL app.tenant_id = '<tnt_…>' immediately after checkout; every RLS policy reads current_setting('app.tenant_id', true). The mandatory tenant-isolation.spec.ts integration test proves that omission of the SET fails the policy (read returns zero rows; write raises 42501 insufficient_privilege).
1. ID prefixes (additions to NAMING.md)
| Prefix | Aggregate / table |
|---|---|
med_ | file_objects (existing in NAMING) |
bkt_ | buckets (logical) |
ups_ | upload_sessions |
var_ | variants |
scn_ | scan_results |
ret_ | retention_policies |
grt_ | access_grants |
ers_ | erasure_requests |
hld_ | retention_holds |
qta_ | quotas (one per tenant; primary key tenantId) |
All ULIDs (Crockford 26-char). Money is irrelevant to this service. bytes columns are bigint (signed); GCS objects are capped at 5 TiB so a bigint is comfortable. SHA-256 stored as bytea (32 bytes) with a text accessor for hex.
2. Schema layout (file_storage)
CREATE SCHEMA IF NOT EXISTS file_storage;
SET search_path TO file_storage, public;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Every table below has:
tenant_id text NOT NULL(thetnt_…ULID).created_at timestamptz NOT NULL DEFAULT now()andupdated_at timestamptz NOT NULL DEFAULT now()where mutable.version int NOT NULL DEFAULT 1on every aggregate root for optimistic concurrency.- An RLS policy named
<table>_tenant_isolationthat readscurrent_setting('app.tenant_id', true). - A composite index leading with
tenant_id.
2.1 buckets (logical bucket registry — platform-scoped)
CREATE TABLE buckets (
id text PRIMARY KEY CHECK (id LIKE 'bkt_%'),
name text NOT NULL UNIQUE, -- 'media' | 'private' | 'archive'
gcs_bucket text NOT NULL, -- 'melmastoon-media-prod'
data_class text NOT NULL CHECK (data_class IN ('public_media','private','archive')),
cdn_enabled boolean NOT NULL DEFAULT false,
cmek_key_resource text, -- projects/.../cryptoKeys/...
default_retention text NOT NULL, -- retention_policies.name
created_at timestamptz NOT NULL DEFAULT now()
);
-- platform-only table; RLS deliberately disabled. Read-only at runtime; rows seeded by terraform.
Seeded rows:
id | name | gcs_bucket | data_class | cdn_enabled | default_retention |
|---|---|---|---|---|---|
bkt_media | media | melmastoon-media-prod | public_media | true | default |
bkt_private | private | melmastoon-private-prod | private | false | default |
bkt_archive | archive | melmastoon-archive-prod | archive | false | tax_compliance |
2.2 file_objects (the central aggregate)
CREATE TABLE file_objects (
id text PRIMARY KEY CHECK (id LIKE 'med_%'),
tenant_id text NOT NULL CHECK (tenant_id LIKE 'tnt_%'),
bucket_id text NOT NULL REFERENCES buckets(id) ON DELETE RESTRICT,
scope text NOT NULL CHECK (scope IN (
'property_photo','tenant_logo','theme_asset',
'invoice_pdf','receipt_scan','guest_id_scan',
'vendor_lock_report','notification_attachment','misc')),
data_class text NOT NULL CHECK (data_class IN ('public_media','private','archive')),
object_key text NOT NULL, -- 'tenants/tnt_../scope/.../med_..ext'
status text NOT NULL CHECK (status IN (
'initiated','uploaded','scanning','ready',
'quarantined','archived','purged')),
content_type text NOT NULL,
bytes bigint CHECK (bytes IS NULL OR bytes >= 0),
sha256 bytea, -- 32 bytes; null until confirm
retention_policy_name text NOT NULL REFERENCES retention_policies(name) DEFERRABLE INITIALLY DEFERRED,
owner_actor jsonb NOT NULL, -- { userId | systemId, kind }
owner_scope_refs jsonb NOT NULL DEFAULT '{}'::jsonb, -- { propertyId, photoSlot } | { reservationId, guestId } | { invoiceId } | ...
ai_provenance jsonb, -- AIProvenance VO when AI-enriched
alt_text jsonb, -- I18nString { default, values }
tags text[] NOT NULL DEFAULT '{}',
alias_of_file_id text REFERENCES file_objects(id) ON DELETE SET NULL,
hard_delete_after timestamptz, -- computed at upload from policy
archived_at timestamptz,
purged_at timestamptz,
quarantined_at timestamptz,
legal_hold_until timestamptz, -- override; sweeper respects
version int NOT NULL DEFAULT 1,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (tenant_id, object_key)
);
CREATE INDEX file_objects_tenant_status_idx ON file_objects (tenant_id, status);
CREATE INDEX file_objects_tenant_scope_status_idx ON file_objects (tenant_id, scope, status);
CREATE INDEX file_objects_tenant_sha_idx ON file_objects (tenant_id, scope, sha256) WHERE sha256 IS NOT NULL AND status = 'ready';
CREATE INDEX file_objects_owner_property_idx ON file_objects ((owner_scope_refs->>'propertyId')) WHERE owner_scope_refs ? 'propertyId';
CREATE INDEX file_objects_owner_guest_idx ON file_objects ((owner_scope_refs->>'guestId')) WHERE owner_scope_refs ? 'guestId';
CREATE INDEX file_objects_owner_reservation_idx ON file_objects ((owner_scope_refs->>'reservationId')) WHERE owner_scope_refs ? 'reservationId';
CREATE INDEX file_objects_owner_invoice_idx ON file_objects ((owner_scope_refs->>'invoiceId')) WHERE owner_scope_refs ? 'invoiceId';
CREATE INDEX file_objects_hard_delete_after_idx ON file_objects (hard_delete_after) WHERE status IN ('archived','quarantined');
CREATE INDEX file_objects_tags_gin_idx ON file_objects USING gin (tags);
ALTER TABLE file_objects ENABLE ROW LEVEL SECURITY;
CREATE POLICY file_objects_tenant_isolation ON file_objects
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));
-- Cross-tenant alias guard: aliasOfFileId must be in same tenant.
CREATE OR REPLACE FUNCTION file_objects_alias_same_tenant_check() RETURNS trigger AS $$
BEGIN
IF NEW.alias_of_file_id IS NOT NULL THEN
PERFORM 1 FROM file_objects WHERE id = NEW.alias_of_file_id AND tenant_id = NEW.tenant_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'cross-tenant alias forbidden' USING ERRCODE = '23514';
END IF;
END IF;
RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER trg_file_objects_alias_check BEFORE INSERT OR UPDATE OF alias_of_file_id
ON file_objects FOR EACH ROW EXECUTE FUNCTION file_objects_alias_same_tenant_check();
-- ObjectKey prefix integrity: enforce that object_key starts with 'tenants/<tenantId>/'.
ALTER TABLE file_objects ADD CONSTRAINT file_objects_prefix_chk
CHECK (object_key LIKE 'tenants/' || tenant_id || '/%');
2.3 upload_sessions
CREATE TABLE upload_sessions (
id text PRIMARY KEY CHECK (id LIKE 'ups_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
object_key text NOT NULL,
signed_url text NOT NULL,
resumable_session_uri text, -- only set for resumable uploads
status text NOT NULL CHECK (status IN ('open','completed','aborted','expired')),
resumable boolean NOT NULL DEFAULT false,
chunk_size_bytes int NOT NULL DEFAULT 0,
bytes_received bigint NOT NULL DEFAULT 0,
reserved_bytes bigint NOT NULL DEFAULT 0, -- speculative quota reservation
declared_bytes bigint NOT NULL,
expires_at timestamptz NOT NULL,
closed_at timestamptz,
closed_reason text,
idempotency_key text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (tenant_id, idempotency_key)
);
CREATE INDEX upload_sessions_open_expiry_idx ON upload_sessions (expires_at) WHERE status = 'open';
CREATE INDEX upload_sessions_tenant_status_idx ON upload_sessions (tenant_id, status);
ALTER TABLE upload_sessions ENABLE ROW LEVEL SECURITY;
CREATE POLICY upload_sessions_tenant_isolation ON upload_sessions
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));
2.4 variants
CREATE TABLE variants (
id text PRIMARY KEY CHECK (id LIKE 'var_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
preset text NOT NULL CHECK (preset IN (
'thumb','hero','full',
'avif_thumb','avif_hero','avif_full',
'hls_720p','hls_480p','poster')),
object_key text NOT NULL,
content_type text NOT NULL,
bytes bigint NOT NULL CHECK (bytes >= 0),
width_px int,
height_px int,
duration_ms int, -- video only
status text NOT NULL CHECK (status IN ('pending','ready','failed')),
failure_reason text,
attempts int NOT NULL DEFAULT 0,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (file_object_id, preset)
);
CREATE INDEX variants_tenant_file_idx ON variants (tenant_id, file_object_id);
CREATE INDEX variants_failed_idx ON variants (status) WHERE status = 'failed';
ALTER TABLE variants ENABLE ROW LEVEL SECURITY;
CREATE POLICY variants_tenant_isolation ON variants
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));
2.5 scan_results
CREATE TABLE scan_results (
id text PRIMARY KEY CHECK (id LIKE 'scn_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
scanner text NOT NULL CHECK (scanner IN ('clamav','cloud_dlp')),
verdict text NOT NULL CHECK (verdict IN ('passed','failed','inconclusive')),
threats text[] NOT NULL DEFAULT '{}',
engine_version text NOT NULL,
definitions_version text,
scanned_at timestamptz NOT NULL,
raw_response jsonb, -- redacted; full response retained for forensics 30d
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX scan_results_tenant_file_idx ON scan_results (tenant_id, file_object_id, scanned_at DESC);
CREATE INDEX scan_results_failed_idx ON scan_results (tenant_id, verdict) WHERE verdict IN ('failed','inconclusive');
ALTER TABLE scan_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY scan_results_tenant_isolation ON scan_results
USING (tenant_id = current_setting('app.tenant_id', true));
2.6 access_grants (signed-URL audit)
CREATE TABLE access_grants (
id text PRIMARY KEY CHECK (id LIKE 'grt_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
variant_preset text, -- null = original
actor jsonb NOT NULL,
purpose text NOT NULL CHECK (purpose IN ('view','download','attach','embed')),
caller_ip inet,
caller_user_agent text,
signature_fingerprint text NOT NULL, -- sha256 of GCS Signature query param
issued_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
revoked_at timestamptz,
revoked_reason text
) PARTITION BY RANGE (issued_at);
-- Monthly partitions; created in advance by partition-cron job.
CREATE INDEX access_grants_tenant_file_issued_idx ON access_grants (tenant_id, file_object_id, issued_at DESC);
CREATE INDEX access_grants_fingerprint_idx ON access_grants (signature_fingerprint);
ALTER TABLE access_grants ENABLE ROW LEVEL SECURITY;
CREATE POLICY access_grants_tenant_isolation ON access_grants
USING (tenant_id = current_setting('app.tenant_id', true));
The audit table is partitioned monthly because issueDownloadUrl is the single highest-volume call in the service (one row per signed URL). Old partitions (> 13 months) are dropped after export to BigQuery (see OBSERVABILITY §6).
2.7 retention_policies
CREATE TABLE retention_policies (
id text PRIMARY KEY CHECK (id LIKE 'ret_%'),
tenant_id text, -- null = platform default
name text NOT NULL, -- 'pii_id_scan', 'tax_compliance', ...
retention_class text NOT NULL CHECK (retention_class IN ('operational','regulated','audit')),
min_retention_days int NOT NULL CHECK (min_retention_days >= 0),
max_retention_days int CHECK (max_retention_days IS NULL OR max_retention_days >= min_retention_days),
redaction_after_days int,
hard_delete_after_days int NOT NULL,
jurisdiction char(2), -- ISO 3166-1 alpha-2 or null = global
active boolean NOT NULL DEFAULT true,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE NULLS NOT DISTINCT (tenant_id, name, jurisdiction)
);
CREATE INDEX retention_policies_lookup_idx ON retention_policies (tenant_id, name, jurisdiction) WHERE active;
ALTER TABLE retention_policies ENABLE ROW LEVEL SECURITY;
CREATE POLICY retention_policies_visibility ON retention_policies
USING (tenant_id IS NULL OR tenant_id = current_setting('app.tenant_id', true));
Seeded platform defaults (loaded by migration 0010_seed_retention_policies.sql):
name | class | min | max | redaction_after | hard_delete_after | jurisdiction |
|---|---|---|---|---|---|---|
default | operational | 0 | null | null | 90 | null |
pii_id_scan | regulated | 0 | 30 | 30 | 30 | AF |
pii_id_scan | regulated | 30 | 90 | 90 | 90 | null (global default) |
tax_compliance | regulated | 2555 | null | null | 2555 | null (≈ 7 y) |
vendor_lock_report | operational | 0 | 365 | null | 365 | null |
theme_asset | operational | 0 | null | null | 30 | null (only on supersede) |
invoice_pdf | regulated | 2555 | null | null | 2555 | null |
short_lived_attachment | operational | 0 | 30 | null | 30 | null |
2.8 retention_holds
CREATE TABLE retention_holds (
id text PRIMARY KEY CHECK (id LIKE 'hld_%'),
tenant_id text NOT NULL,
file_object_id text REFERENCES file_objects(id) ON DELETE CASCADE,
scope_kind text NOT NULL CHECK (scope_kind IN ('file','guest','reservation','property','tenant','legal_case')),
scope_ref jsonb NOT NULL,
reason text NOT NULL,
hold_until timestamptz NOT NULL,
created_by jsonb NOT NULL,
released_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX retention_holds_tenant_file_idx ON retention_holds (tenant_id, file_object_id);
CREATE INDEX retention_holds_release_due_idx ON retention_holds (hold_until) WHERE released_at IS NULL;
ALTER TABLE retention_holds ENABLE ROW LEVEL SECURITY;
CREATE POLICY retention_holds_tenant_isolation ON retention_holds
USING (tenant_id = current_setting('app.tenant_id', true));
2.9 quotas and quota_counters
CREATE TABLE quotas (
tenant_id text PRIMARY KEY CHECK (tenant_id LIKE 'tnt_%'),
cap_bytes bigint NOT NULL,
cap_objects bigint NOT NULL,
by_scope_caps jsonb NOT NULL DEFAULT '{}'::jsonb, -- optional per-scope caps
warned_80_at timestamptz,
warned_95_at timestamptz,
blocked_at timestamptz,
plan_code text NOT NULL,
updated_at timestamptz NOT NULL DEFAULT now()
);
ALTER TABLE quotas ENABLE ROW LEVEL SECURITY;
CREATE POLICY quotas_tenant_isolation ON quotas
USING (tenant_id = current_setting('app.tenant_id', true));
CREATE TABLE quota_counters (
tenant_id text NOT NULL,
scope text NOT NULL,
bytes_used bigint NOT NULL DEFAULT 0,
objects_used bigint NOT NULL DEFAULT 0,
bytes_reserved bigint NOT NULL DEFAULT 0, -- speculative reservations from open sessions
updated_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (tenant_id, scope)
);
ALTER TABLE quota_counters ENABLE ROW LEVEL SECURITY;
CREATE POLICY quota_counters_tenant_isolation ON quota_counters
USING (tenant_id = current_setting('app.tenant_id', true));
Counter updates use single-shot CAS:
UPDATE quota_counters
SET bytes_used = bytes_used + $1,
objects_used = objects_used + 1,
bytes_reserved = bytes_reserved - $1,
updated_at = now()
WHERE tenant_id = $2 AND scope = $3
RETURNING bytes_used, objects_used;
The aggregate roll-up (SELECT sum(bytes_used) FROM quota_counters WHERE tenant_id = ?) is checked before any reservation; the WITH CHECK clause bytes_used + ? <= cap_bytes is enforced at the application layer because cap_bytes lives in the parent quotas row and we want the operation to fail with the canonical MELMASTOON.FILE.QUOTA_EXCEEDED rather than a generic constraint violation.
2.10 erasure_requests
CREATE TABLE erasure_requests (
id text PRIMARY KEY CHECK (id LIKE 'ers_%'),
tenant_id text NOT NULL,
scope_kind text NOT NULL CHECK (scope_kind IN ('guest','tenant','reservation','property')),
scope_ref jsonb NOT NULL,
requested_by jsonb NOT NULL,
reason text NOT NULL,
ticket_id text,
status text NOT NULL CHECK (status IN ('queued','in_progress','completed','partial','failed')),
matched_objects int NOT NULL DEFAULT 0,
purged_objects int NOT NULL DEFAULT 0,
deferred_objects int NOT NULL DEFAULT 0,
purged_bytes bigint NOT NULL DEFAULT 0,
certificate_sha256 text,
certificate_signer text,
errors jsonb NOT NULL DEFAULT '[]'::jsonb,
requested_at timestamptz NOT NULL DEFAULT now(),
completed_at timestamptz,
idempotency_key text NOT NULL,
UNIQUE (tenant_id, idempotency_key)
);
CREATE INDEX erasure_requests_tenant_status_idx ON erasure_requests (tenant_id, status);
ALTER TABLE erasure_requests ENABLE ROW LEVEL SECURITY;
CREATE POLICY erasure_requests_tenant_isolation ON erasure_requests
USING (tenant_id = current_setting('app.tenant_id', true));
CREATE TABLE erasure_request_items (
erasure_request_id text NOT NULL REFERENCES erasure_requests(id) ON DELETE CASCADE,
tenant_id text NOT NULL,
file_object_id text NOT NULL,
outcome text NOT NULL CHECK (outcome IN ('purged','deferred','failed')),
reason text,
released_at timestamptz,
PRIMARY KEY (erasure_request_id, file_object_id)
);
ALTER TABLE erasure_request_items ENABLE ROW LEVEL SECURITY;
CREATE POLICY erasure_request_items_tenant_isolation ON erasure_request_items
USING (tenant_id = current_setting('app.tenant_id', true));
2.11 outbox and inbox
CREATE TABLE outbox (
id bigserial PRIMARY KEY,
tenant_id text NOT NULL,
event_id text NOT NULL UNIQUE, -- evt_<ULID>
event_type text NOT NULL,
topic text NOT NULL,
ordering_key text NOT NULL,
payload jsonb NOT NULL,
attributes jsonb NOT NULL, -- envelope attributes
occurred_at timestamptz NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
published_at timestamptz,
attempts int NOT NULL DEFAULT 0,
last_error text
);
CREATE INDEX outbox_unpublished_idx ON outbox (created_at) WHERE published_at IS NULL;
CREATE TABLE inbox (
message_id text PRIMARY KEY, -- Pub/Sub message id
tenant_id text NOT NULL,
event_id text NOT NULL,
event_type text NOT NULL,
handler text NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),
applied_at timestamptz,
UNIQUE (event_id, handler)
);
CREATE INDEX inbox_tenant_idx ON inbox (tenant_id, received_at DESC);
outbox and inbox are not RLS-enabled; the relay/consumer workers operate as the platform service account and need to scan across tenants. Each row carries tenant_id for downstream filtering.
2.12 idempotency_records
CREATE TABLE idempotency_records (
tenant_id text NOT NULL,
route_hash text NOT NULL, -- sha256(method + path)
idempotency_key text NOT NULL,
request_hash text NOT NULL, -- sha256(canonical body)
response_status int NOT NULL,
response_body bytea NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
PRIMARY KEY (tenant_id, route_hash, idempotency_key)
);
CREATE INDEX idempotency_records_expiry_idx ON idempotency_records (expires_at);
ALTER TABLE idempotency_records ENABLE ROW LEVEL SECURITY;
CREATE POLICY idempotency_records_tenant_isolation ON idempotency_records
USING (tenant_id = current_setting('app.tenant_id', true));
A per-hour cron purges expired rows. TTL = 24 h.
2.13 signed_url_blacklist (Redis-mirrored)
A small Postgres backstop for revoked signature fingerprints — used at startup to seed the Redis ZSET so that revocations survive a Redis flush.
CREATE TABLE signed_url_blacklist (
signature_fingerprint text PRIMARY KEY,
tenant_id text NOT NULL,
file_object_id text NOT NULL,
revoked_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL, -- = grant.expires_at
reason text NOT NULL
);
CREATE INDEX signed_url_blacklist_expiry_idx ON signed_url_blacklist (expires_at);
3. TypeScript types (kysely / contract types)
// infrastructure/postgres/schema.ts
import type { Generated, ColumnType } from 'kysely';
import type { TenantId, FileObjectId, BucketId, UploadSessionId, VariantId,
ScanResultId, RetentionPolicyId, AccessGrantId, ErasureRequestId,
RetentionHoldId, AIProvenance, I18nString, ActorRef } from '@ghasi/contracts-melmastoon';
export interface FileObjectsTable {
id: FileObjectId;
tenant_id: TenantId;
bucket_id: BucketId;
scope: Scope;
data_class: DataClass;
object_key: string;
status: FileStatus;
content_type: string;
bytes: ColumnType<bigint | null, bigint | null, bigint | null>;
sha256: ColumnType<Buffer | null, Buffer | null, Buffer | null>;
retention_policy_name: string;
owner_actor: ActorRef;
owner_scope_refs: Record<string, string>;
ai_provenance: AIProvenance | null;
alt_text: I18nString | null;
tags: string[];
alias_of_file_id: FileObjectId | null;
hard_delete_after: ColumnType<Date | null, Date | null, Date | null>;
archived_at: ColumnType<Date | null, Date | null, Date | null>;
purged_at: ColumnType<Date | null, Date | null, Date | null>;
quarantined_at: ColumnType<Date | null, Date | null, Date | null>;
legal_hold_until: ColumnType<Date | null, Date | null, Date | null>;
version: ColumnType<number, number, number>;
created_at: ColumnType<Date, never, never>;
updated_at: ColumnType<Date, Date, Date>;
}
export type Scope = 'property_photo' | 'tenant_logo' | 'theme_asset'
| 'invoice_pdf' | 'receipt_scan' | 'guest_id_scan'
| 'vendor_lock_report' | 'notification_attachment' | 'misc';
export type DataClass = 'public_media' | 'private' | 'archive';
export type FileStatus = 'initiated' | 'uploaded' | 'scanning' | 'ready'
| 'quarantined' | 'archived' | 'purged';
4. Migrations
Migrations live under services/file-storage-service/db/migrations/ and are executed by node-pg-migrate in the pre-deploy Helm hook.
| # | File | Notes |
|---|---|---|
| 0001 | init_schema.sql | CREATE SCHEMA file_storage; pgcrypto + pg_trgm extensions |
| 0002 | buckets.sql | logical buckets table + seed rows |
| 0003 | retention_policies.sql | table + seed of platform defaults |
| 0004 | file_objects.sql | central aggregate + RLS + alias trigger + prefix CHECK |
| 0005 | upload_sessions.sql | + per-tenant idempotency unique |
| 0006 | variants.sql | + (file_object_id, preset) unique |
| 0007 | scan_results.sql | |
| 0008 | access_grants.sql | partitioned monthly; partition-cron runs in deploy job |
| 0009 | retention_holds.sql | |
| 0010 | quotas.sql | + quota_counters + per-tenant defaults trigger |
| 0011 | erasure_requests.sql | + items child |
| 0012 | outbox_inbox.sql | unindexed RLS by design |
| 0013 | idempotency_records.sql | |
| 0014 | signed_url_blacklist.sql | |
| 0015 | seed_retention_policies.sql | platform retention defaults |
| 0016 | seed_buckets.sql | bkt_media / bkt_private / bkt_archive |
Per-service migration ordinals reset to 0001. The CI gate migration-check enforces (a) every new table has tenant_id, (b) every multi-tenant table has an <table>_tenant_isolation policy, (c) no DROP COLUMN without an ADR.
5. RLS, prefix isolation, and the ObjectKey invariant
Three layers of defense for cross-tenant prevention; an integration test covers each:
- Domain (
ObjectKey.toGcsKey) always emitstenants/<tenantId>/.... Unit-tested. - Postgres (
file_objects_prefix_chkCHECK + RLS) rejects rows whoseobject_keydoesn't start withtenants/<tenantId>/. Integration-tested by attempting an insert with a forged key. - GCS IAM — the service account holds
roles/storage.objectAdminon the buckets but the buckets carry no public IAM grants anduniformBucketLevelAccess=true. Signed URLs are issued by the service-account key (or workload identity) and cryptographically bind the path. A test spec attempts to access a known object key from a different tenant context and asserts that (a) RLS hides the metadata row and (b) the issuance logic refuses to construct anObjectKeyfor a differenttenant_id.
6. Performance posture
| Operation | Index used | p95 target |
|---|---|---|
Initiate upload (insert file_objects + upload_sessions + outbox) | per-tenant unique on idempotency_key; prefix CHECK eval | < 80 ms (DB) |
Confirm upload (UPDATE file_objects + dedupe lookup) | file_objects_tenant_sha_idx partial | < 60 ms (DB) |
Issue download URL (insert access_grants) | partition-pruned write | < 20 ms (DB) |
| Get metadata | file_objects_pkey | < 5 ms (DB) |
| Quota query | aggregate over quota_counters (≤ 9 rows / tenant) | < 10 ms (DB) |
| Sweeper (retention) | file_objects_hard_delete_after_idx + retention_holds_release_due_idx | scans 100 k rows / s |
| Erasure by guest | file_objects_owner_guest_idx GIN-on-jsonb-path | < 200 ms for 1 k matches |
7. Capacity sizing (year-2 plan)
| Table | Rows / tenant | Tenants | Total | Hot fraction |
|---|---|---|---|---|
file_objects | ~50 000 | 5 000 | 250 M | 5 % |
variants | ~200 000 | 5 000 | 1 B | 5 % |
access_grants | ~10 M / month / tenant peak | 5 000 | 50 B / month | last 30 d (partitioned) |
outbox | ~2 events / file | 5 000 | published-and-pruned in 24 h | working set < 100 k |
access_grants partitioning + monthly drop is mandatory to keep the table footprint bounded.
8. Backup & PITR
- Automated daily backup retained for 35 days (Cloud SQL automated backups).
- Point-in-time recovery for the last 7 days (binlog-equivalent WAL retention).
- A cross-region read replica is promoted on regional outage; data loss budget ≤ 5 min (SLO).
- GCS buckets carry versioning with a 30-day lifecycle delete; soft-deleted bytes are recoverable for that window.
9. Sync to BigQuery (analytics)
A Datastream job replicates file_objects, variants, access_grants, quota_counters to the analytics_file_storage dataset for the SLO/usage dashboards (latency p95 ≤ 5 min). PII columns (alt_text, owner_scope_refs.guestId) are dropped at the Datastream transform before landing.
10. References
- DOMAIN_MODEL for invariants enforced in code.
- SECURITY_MODEL §3 for the prefix invariant test surface.
- docs/06-data-models.md for the platform-wide conventions on RLS, money, and naming.