Skip to main content

file-storage-service — DATA_MODEL

Companion: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · 06 Data Models · ADR-0002 Multi-Tenancy · standards/NAMING.md

Storage is Cloud SQL Postgres 16 (regional HA, europe-west4 primary; cross-region read replica in europe-west1 for the retention sweeper) for all metadata, audit, and policy rows. The blob store is Google Cloud Storage: three platform buckets (melmastoon-media-prod, melmastoon-private-prod, melmastoon-archive-prod) with mandatory tenants/{tenantId}/... key prefix. CDN edge cache is Google Cloud CDN in front of melmastoon-media-prod only. Metadata lives in the single platform schema file_storage; tenant isolation is enforced by RLS on tenant_id and by the domain ObjectKey invariant — file storage is not per-tenant-schema (different from billing-service); a single shared schema lets us run one set of cross-tenant background sweepers efficiently.

PgBouncer is in transaction pooling mode. Every request handler calls SET LOCAL app.tenant_id = '<tnt_…>' immediately after checkout; every RLS policy reads current_setting('app.tenant_id', true). The mandatory tenant-isolation.spec.ts integration test proves that omission of the SET fails the policy (read returns zero rows; write raises 42501 insufficient_privilege).

1. ID prefixes (additions to NAMING.md)

PrefixAggregate / table
med_file_objects (existing in NAMING)
bkt_buckets (logical)
ups_upload_sessions
var_variants
scn_scan_results
ret_retention_policies
grt_access_grants
ers_erasure_requests
hld_retention_holds
qta_quotas (one per tenant; primary key tenantId)

All ULIDs (Crockford 26-char). Money is irrelevant to this service. bytes columns are bigint (signed); GCS objects are capped at 5 TiB so a bigint is comfortable. SHA-256 stored as bytea (32 bytes) with a text accessor for hex.

2. Schema layout (file_storage)

CREATE SCHEMA IF NOT EXISTS file_storage;
SET search_path TO file_storage, public;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

Every table below has:

  • tenant_id text NOT NULL (the tnt_… ULID).
  • created_at timestamptz NOT NULL DEFAULT now() and updated_at timestamptz NOT NULL DEFAULT now() where mutable.
  • version int NOT NULL DEFAULT 1 on every aggregate root for optimistic concurrency.
  • An RLS policy named <table>_tenant_isolation that reads current_setting('app.tenant_id', true).
  • A composite index leading with tenant_id.

2.1 buckets (logical bucket registry — platform-scoped)

CREATE TABLE buckets (
id text PRIMARY KEY CHECK (id LIKE 'bkt_%'),
name text NOT NULL UNIQUE, -- 'media' | 'private' | 'archive'
gcs_bucket text NOT NULL, -- 'melmastoon-media-prod'
data_class text NOT NULL CHECK (data_class IN ('public_media','private','archive')),
cdn_enabled boolean NOT NULL DEFAULT false,
cmek_key_resource text, -- projects/.../cryptoKeys/...
default_retention text NOT NULL, -- retention_policies.name
created_at timestamptz NOT NULL DEFAULT now()
);
-- platform-only table; RLS deliberately disabled. Read-only at runtime; rows seeded by terraform.

Seeded rows:

idnamegcs_bucketdata_classcdn_enableddefault_retention
bkt_mediamediamelmastoon-media-prodpublic_mediatruedefault
bkt_privateprivatemelmastoon-private-prodprivatefalsedefault
bkt_archivearchivemelmastoon-archive-prodarchivefalsetax_compliance

2.2 file_objects (the central aggregate)

CREATE TABLE file_objects (
id text PRIMARY KEY CHECK (id LIKE 'med_%'),
tenant_id text NOT NULL CHECK (tenant_id LIKE 'tnt_%'),
bucket_id text NOT NULL REFERENCES buckets(id) ON DELETE RESTRICT,
scope text NOT NULL CHECK (scope IN (
'property_photo','tenant_logo','theme_asset',
'invoice_pdf','receipt_scan','guest_id_scan',
'vendor_lock_report','notification_attachment','misc')),
data_class text NOT NULL CHECK (data_class IN ('public_media','private','archive')),
object_key text NOT NULL, -- 'tenants/tnt_../scope/.../med_..ext'
status text NOT NULL CHECK (status IN (
'initiated','uploaded','scanning','ready',
'quarantined','archived','purged')),
content_type text NOT NULL,
bytes bigint CHECK (bytes IS NULL OR bytes >= 0),
sha256 bytea, -- 32 bytes; null until confirm
retention_policy_name text NOT NULL REFERENCES retention_policies(name) DEFERRABLE INITIALLY DEFERRED,
owner_actor jsonb NOT NULL, -- { userId | systemId, kind }
owner_scope_refs jsonb NOT NULL DEFAULT '{}'::jsonb, -- { propertyId, photoSlot } | { reservationId, guestId } | { invoiceId } | ...
ai_provenance jsonb, -- AIProvenance VO when AI-enriched
alt_text jsonb, -- I18nString { default, values }
tags text[] NOT NULL DEFAULT '{}',
alias_of_file_id text REFERENCES file_objects(id) ON DELETE SET NULL,
hard_delete_after timestamptz, -- computed at upload from policy
archived_at timestamptz,
purged_at timestamptz,
quarantined_at timestamptz,
legal_hold_until timestamptz, -- override; sweeper respects
version int NOT NULL DEFAULT 1,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (tenant_id, object_key)
);

CREATE INDEX file_objects_tenant_status_idx ON file_objects (tenant_id, status);
CREATE INDEX file_objects_tenant_scope_status_idx ON file_objects (tenant_id, scope, status);
CREATE INDEX file_objects_tenant_sha_idx ON file_objects (tenant_id, scope, sha256) WHERE sha256 IS NOT NULL AND status = 'ready';
CREATE INDEX file_objects_owner_property_idx ON file_objects ((owner_scope_refs->>'propertyId')) WHERE owner_scope_refs ? 'propertyId';
CREATE INDEX file_objects_owner_guest_idx ON file_objects ((owner_scope_refs->>'guestId')) WHERE owner_scope_refs ? 'guestId';
CREATE INDEX file_objects_owner_reservation_idx ON file_objects ((owner_scope_refs->>'reservationId')) WHERE owner_scope_refs ? 'reservationId';
CREATE INDEX file_objects_owner_invoice_idx ON file_objects ((owner_scope_refs->>'invoiceId')) WHERE owner_scope_refs ? 'invoiceId';
CREATE INDEX file_objects_hard_delete_after_idx ON file_objects (hard_delete_after) WHERE status IN ('archived','quarantined');
CREATE INDEX file_objects_tags_gin_idx ON file_objects USING gin (tags);

ALTER TABLE file_objects ENABLE ROW LEVEL SECURITY;
CREATE POLICY file_objects_tenant_isolation ON file_objects
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));

-- Cross-tenant alias guard: aliasOfFileId must be in same tenant.
CREATE OR REPLACE FUNCTION file_objects_alias_same_tenant_check() RETURNS trigger AS $$
BEGIN
IF NEW.alias_of_file_id IS NOT NULL THEN
PERFORM 1 FROM file_objects WHERE id = NEW.alias_of_file_id AND tenant_id = NEW.tenant_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'cross-tenant alias forbidden' USING ERRCODE = '23514';
END IF;
END IF;
RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER trg_file_objects_alias_check BEFORE INSERT OR UPDATE OF alias_of_file_id
ON file_objects FOR EACH ROW EXECUTE FUNCTION file_objects_alias_same_tenant_check();

-- ObjectKey prefix integrity: enforce that object_key starts with 'tenants/<tenantId>/'.
ALTER TABLE file_objects ADD CONSTRAINT file_objects_prefix_chk
CHECK (object_key LIKE 'tenants/' || tenant_id || '/%');

2.3 upload_sessions

CREATE TABLE upload_sessions (
id text PRIMARY KEY CHECK (id LIKE 'ups_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
object_key text NOT NULL,
signed_url text NOT NULL,
resumable_session_uri text, -- only set for resumable uploads
status text NOT NULL CHECK (status IN ('open','completed','aborted','expired')),
resumable boolean NOT NULL DEFAULT false,
chunk_size_bytes int NOT NULL DEFAULT 0,
bytes_received bigint NOT NULL DEFAULT 0,
reserved_bytes bigint NOT NULL DEFAULT 0, -- speculative quota reservation
declared_bytes bigint NOT NULL,
expires_at timestamptz NOT NULL,
closed_at timestamptz,
closed_reason text,
idempotency_key text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (tenant_id, idempotency_key)
);
CREATE INDEX upload_sessions_open_expiry_idx ON upload_sessions (expires_at) WHERE status = 'open';
CREATE INDEX upload_sessions_tenant_status_idx ON upload_sessions (tenant_id, status);
ALTER TABLE upload_sessions ENABLE ROW LEVEL SECURITY;
CREATE POLICY upload_sessions_tenant_isolation ON upload_sessions
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));

2.4 variants

CREATE TABLE variants (
id text PRIMARY KEY CHECK (id LIKE 'var_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
preset text NOT NULL CHECK (preset IN (
'thumb','hero','full',
'avif_thumb','avif_hero','avif_full',
'hls_720p','hls_480p','poster')),
object_key text NOT NULL,
content_type text NOT NULL,
bytes bigint NOT NULL CHECK (bytes >= 0),
width_px int,
height_px int,
duration_ms int, -- video only
status text NOT NULL CHECK (status IN ('pending','ready','failed')),
failure_reason text,
attempts int NOT NULL DEFAULT 0,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (file_object_id, preset)
);
CREATE INDEX variants_tenant_file_idx ON variants (tenant_id, file_object_id);
CREATE INDEX variants_failed_idx ON variants (status) WHERE status = 'failed';
ALTER TABLE variants ENABLE ROW LEVEL SECURITY;
CREATE POLICY variants_tenant_isolation ON variants
USING (tenant_id = current_setting('app.tenant_id', true))
WITH CHECK (tenant_id = current_setting('app.tenant_id', true));

2.5 scan_results

CREATE TABLE scan_results (
id text PRIMARY KEY CHECK (id LIKE 'scn_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
scanner text NOT NULL CHECK (scanner IN ('clamav','cloud_dlp')),
verdict text NOT NULL CHECK (verdict IN ('passed','failed','inconclusive')),
threats text[] NOT NULL DEFAULT '{}',
engine_version text NOT NULL,
definitions_version text,
scanned_at timestamptz NOT NULL,
raw_response jsonb, -- redacted; full response retained for forensics 30d
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX scan_results_tenant_file_idx ON scan_results (tenant_id, file_object_id, scanned_at DESC);
CREATE INDEX scan_results_failed_idx ON scan_results (tenant_id, verdict) WHERE verdict IN ('failed','inconclusive');
ALTER TABLE scan_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY scan_results_tenant_isolation ON scan_results
USING (tenant_id = current_setting('app.tenant_id', true));

2.6 access_grants (signed-URL audit)

CREATE TABLE access_grants (
id text PRIMARY KEY CHECK (id LIKE 'grt_%'),
tenant_id text NOT NULL,
file_object_id text NOT NULL REFERENCES file_objects(id) ON DELETE CASCADE,
variant_preset text, -- null = original
actor jsonb NOT NULL,
purpose text NOT NULL CHECK (purpose IN ('view','download','attach','embed')),
caller_ip inet,
caller_user_agent text,
signature_fingerprint text NOT NULL, -- sha256 of GCS Signature query param
issued_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
revoked_at timestamptz,
revoked_reason text
) PARTITION BY RANGE (issued_at);

-- Monthly partitions; created in advance by partition-cron job.
CREATE INDEX access_grants_tenant_file_issued_idx ON access_grants (tenant_id, file_object_id, issued_at DESC);
CREATE INDEX access_grants_fingerprint_idx ON access_grants (signature_fingerprint);
ALTER TABLE access_grants ENABLE ROW LEVEL SECURITY;
CREATE POLICY access_grants_tenant_isolation ON access_grants
USING (tenant_id = current_setting('app.tenant_id', true));

The audit table is partitioned monthly because issueDownloadUrl is the single highest-volume call in the service (one row per signed URL). Old partitions (> 13 months) are dropped after export to BigQuery (see OBSERVABILITY §6).

2.7 retention_policies

CREATE TABLE retention_policies (
id text PRIMARY KEY CHECK (id LIKE 'ret_%'),
tenant_id text, -- null = platform default
name text NOT NULL, -- 'pii_id_scan', 'tax_compliance', ...
retention_class text NOT NULL CHECK (retention_class IN ('operational','regulated','audit')),
min_retention_days int NOT NULL CHECK (min_retention_days >= 0),
max_retention_days int CHECK (max_retention_days IS NULL OR max_retention_days >= min_retention_days),
redaction_after_days int,
hard_delete_after_days int NOT NULL,
jurisdiction char(2), -- ISO 3166-1 alpha-2 or null = global
active boolean NOT NULL DEFAULT true,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
UNIQUE NULLS NOT DISTINCT (tenant_id, name, jurisdiction)
);
CREATE INDEX retention_policies_lookup_idx ON retention_policies (tenant_id, name, jurisdiction) WHERE active;
ALTER TABLE retention_policies ENABLE ROW LEVEL SECURITY;
CREATE POLICY retention_policies_visibility ON retention_policies
USING (tenant_id IS NULL OR tenant_id = current_setting('app.tenant_id', true));

Seeded platform defaults (loaded by migration 0010_seed_retention_policies.sql):

nameclassminmaxredaction_afterhard_delete_afterjurisdiction
defaultoperational0nullnull90null
pii_id_scanregulated0303030AF
pii_id_scanregulated30909090null (global default)
tax_complianceregulated2555nullnull2555null (≈ 7 y)
vendor_lock_reportoperational0365null365null
theme_assetoperational0nullnull30null (only on supersede)
invoice_pdfregulated2555nullnull2555null
short_lived_attachmentoperational030null30null

2.8 retention_holds

CREATE TABLE retention_holds (
id text PRIMARY KEY CHECK (id LIKE 'hld_%'),
tenant_id text NOT NULL,
file_object_id text REFERENCES file_objects(id) ON DELETE CASCADE,
scope_kind text NOT NULL CHECK (scope_kind IN ('file','guest','reservation','property','tenant','legal_case')),
scope_ref jsonb NOT NULL,
reason text NOT NULL,
hold_until timestamptz NOT NULL,
created_by jsonb NOT NULL,
released_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX retention_holds_tenant_file_idx ON retention_holds (tenant_id, file_object_id);
CREATE INDEX retention_holds_release_due_idx ON retention_holds (hold_until) WHERE released_at IS NULL;
ALTER TABLE retention_holds ENABLE ROW LEVEL SECURITY;
CREATE POLICY retention_holds_tenant_isolation ON retention_holds
USING (tenant_id = current_setting('app.tenant_id', true));

2.9 quotas and quota_counters

CREATE TABLE quotas (
tenant_id text PRIMARY KEY CHECK (tenant_id LIKE 'tnt_%'),
cap_bytes bigint NOT NULL,
cap_objects bigint NOT NULL,
by_scope_caps jsonb NOT NULL DEFAULT '{}'::jsonb, -- optional per-scope caps
warned_80_at timestamptz,
warned_95_at timestamptz,
blocked_at timestamptz,
plan_code text NOT NULL,
updated_at timestamptz NOT NULL DEFAULT now()
);
ALTER TABLE quotas ENABLE ROW LEVEL SECURITY;
CREATE POLICY quotas_tenant_isolation ON quotas
USING (tenant_id = current_setting('app.tenant_id', true));

CREATE TABLE quota_counters (
tenant_id text NOT NULL,
scope text NOT NULL,
bytes_used bigint NOT NULL DEFAULT 0,
objects_used bigint NOT NULL DEFAULT 0,
bytes_reserved bigint NOT NULL DEFAULT 0, -- speculative reservations from open sessions
updated_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (tenant_id, scope)
);
ALTER TABLE quota_counters ENABLE ROW LEVEL SECURITY;
CREATE POLICY quota_counters_tenant_isolation ON quota_counters
USING (tenant_id = current_setting('app.tenant_id', true));

Counter updates use single-shot CAS:

UPDATE quota_counters
SET bytes_used = bytes_used + $1,
objects_used = objects_used + 1,
bytes_reserved = bytes_reserved - $1,
updated_at = now()
WHERE tenant_id = $2 AND scope = $3
RETURNING bytes_used, objects_used;

The aggregate roll-up (SELECT sum(bytes_used) FROM quota_counters WHERE tenant_id = ?) is checked before any reservation; the WITH CHECK clause bytes_used + ? <= cap_bytes is enforced at the application layer because cap_bytes lives in the parent quotas row and we want the operation to fail with the canonical MELMASTOON.FILE.QUOTA_EXCEEDED rather than a generic constraint violation.

2.10 erasure_requests

CREATE TABLE erasure_requests (
id text PRIMARY KEY CHECK (id LIKE 'ers_%'),
tenant_id text NOT NULL,
scope_kind text NOT NULL CHECK (scope_kind IN ('guest','tenant','reservation','property')),
scope_ref jsonb NOT NULL,
requested_by jsonb NOT NULL,
reason text NOT NULL,
ticket_id text,
status text NOT NULL CHECK (status IN ('queued','in_progress','completed','partial','failed')),
matched_objects int NOT NULL DEFAULT 0,
purged_objects int NOT NULL DEFAULT 0,
deferred_objects int NOT NULL DEFAULT 0,
purged_bytes bigint NOT NULL DEFAULT 0,
certificate_sha256 text,
certificate_signer text,
errors jsonb NOT NULL DEFAULT '[]'::jsonb,
requested_at timestamptz NOT NULL DEFAULT now(),
completed_at timestamptz,
idempotency_key text NOT NULL,
UNIQUE (tenant_id, idempotency_key)
);
CREATE INDEX erasure_requests_tenant_status_idx ON erasure_requests (tenant_id, status);
ALTER TABLE erasure_requests ENABLE ROW LEVEL SECURITY;
CREATE POLICY erasure_requests_tenant_isolation ON erasure_requests
USING (tenant_id = current_setting('app.tenant_id', true));

CREATE TABLE erasure_request_items (
erasure_request_id text NOT NULL REFERENCES erasure_requests(id) ON DELETE CASCADE,
tenant_id text NOT NULL,
file_object_id text NOT NULL,
outcome text NOT NULL CHECK (outcome IN ('purged','deferred','failed')),
reason text,
released_at timestamptz,
PRIMARY KEY (erasure_request_id, file_object_id)
);
ALTER TABLE erasure_request_items ENABLE ROW LEVEL SECURITY;
CREATE POLICY erasure_request_items_tenant_isolation ON erasure_request_items
USING (tenant_id = current_setting('app.tenant_id', true));

2.11 outbox and inbox

CREATE TABLE outbox (
id bigserial PRIMARY KEY,
tenant_id text NOT NULL,
event_id text NOT NULL UNIQUE, -- evt_<ULID>
event_type text NOT NULL,
topic text NOT NULL,
ordering_key text NOT NULL,
payload jsonb NOT NULL,
attributes jsonb NOT NULL, -- envelope attributes
occurred_at timestamptz NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
published_at timestamptz,
attempts int NOT NULL DEFAULT 0,
last_error text
);
CREATE INDEX outbox_unpublished_idx ON outbox (created_at) WHERE published_at IS NULL;

CREATE TABLE inbox (
message_id text PRIMARY KEY, -- Pub/Sub message id
tenant_id text NOT NULL,
event_id text NOT NULL,
event_type text NOT NULL,
handler text NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),
applied_at timestamptz,
UNIQUE (event_id, handler)
);
CREATE INDEX inbox_tenant_idx ON inbox (tenant_id, received_at DESC);

outbox and inbox are not RLS-enabled; the relay/consumer workers operate as the platform service account and need to scan across tenants. Each row carries tenant_id for downstream filtering.

2.12 idempotency_records

CREATE TABLE idempotency_records (
tenant_id text NOT NULL,
route_hash text NOT NULL, -- sha256(method + path)
idempotency_key text NOT NULL,
request_hash text NOT NULL, -- sha256(canonical body)
response_status int NOT NULL,
response_body bytea NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
PRIMARY KEY (tenant_id, route_hash, idempotency_key)
);
CREATE INDEX idempotency_records_expiry_idx ON idempotency_records (expires_at);
ALTER TABLE idempotency_records ENABLE ROW LEVEL SECURITY;
CREATE POLICY idempotency_records_tenant_isolation ON idempotency_records
USING (tenant_id = current_setting('app.tenant_id', true));

A per-hour cron purges expired rows. TTL = 24 h.

2.13 signed_url_blacklist (Redis-mirrored)

A small Postgres backstop for revoked signature fingerprints — used at startup to seed the Redis ZSET so that revocations survive a Redis flush.

CREATE TABLE signed_url_blacklist (
signature_fingerprint text PRIMARY KEY,
tenant_id text NOT NULL,
file_object_id text NOT NULL,
revoked_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL, -- = grant.expires_at
reason text NOT NULL
);
CREATE INDEX signed_url_blacklist_expiry_idx ON signed_url_blacklist (expires_at);

3. TypeScript types (kysely / contract types)

// infrastructure/postgres/schema.ts
import type { Generated, ColumnType } from 'kysely';
import type { TenantId, FileObjectId, BucketId, UploadSessionId, VariantId,
ScanResultId, RetentionPolicyId, AccessGrantId, ErasureRequestId,
RetentionHoldId, AIProvenance, I18nString, ActorRef } from '@ghasi/contracts-melmastoon';

export interface FileObjectsTable {
id: FileObjectId;
tenant_id: TenantId;
bucket_id: BucketId;
scope: Scope;
data_class: DataClass;
object_key: string;
status: FileStatus;
content_type: string;
bytes: ColumnType<bigint | null, bigint | null, bigint | null>;
sha256: ColumnType<Buffer | null, Buffer | null, Buffer | null>;
retention_policy_name: string;
owner_actor: ActorRef;
owner_scope_refs: Record<string, string>;
ai_provenance: AIProvenance | null;
alt_text: I18nString | null;
tags: string[];
alias_of_file_id: FileObjectId | null;
hard_delete_after: ColumnType<Date | null, Date | null, Date | null>;
archived_at: ColumnType<Date | null, Date | null, Date | null>;
purged_at: ColumnType<Date | null, Date | null, Date | null>;
quarantined_at: ColumnType<Date | null, Date | null, Date | null>;
legal_hold_until: ColumnType<Date | null, Date | null, Date | null>;
version: ColumnType<number, number, number>;
created_at: ColumnType<Date, never, never>;
updated_at: ColumnType<Date, Date, Date>;
}

export type Scope = 'property_photo' | 'tenant_logo' | 'theme_asset'
| 'invoice_pdf' | 'receipt_scan' | 'guest_id_scan'
| 'vendor_lock_report' | 'notification_attachment' | 'misc';
export type DataClass = 'public_media' | 'private' | 'archive';
export type FileStatus = 'initiated' | 'uploaded' | 'scanning' | 'ready'
| 'quarantined' | 'archived' | 'purged';

4. Migrations

Migrations live under services/file-storage-service/db/migrations/ and are executed by node-pg-migrate in the pre-deploy Helm hook.

#FileNotes
0001init_schema.sqlCREATE SCHEMA file_storage; pgcrypto + pg_trgm extensions
0002buckets.sqllogical buckets table + seed rows
0003retention_policies.sqltable + seed of platform defaults
0004file_objects.sqlcentral aggregate + RLS + alias trigger + prefix CHECK
0005upload_sessions.sql+ per-tenant idempotency unique
0006variants.sql+ (file_object_id, preset) unique
0007scan_results.sql
0008access_grants.sqlpartitioned monthly; partition-cron runs in deploy job
0009retention_holds.sql
0010quotas.sql+ quota_counters + per-tenant defaults trigger
0011erasure_requests.sql+ items child
0012outbox_inbox.sqlunindexed RLS by design
0013idempotency_records.sql
0014signed_url_blacklist.sql
0015seed_retention_policies.sqlplatform retention defaults
0016seed_buckets.sqlbkt_media / bkt_private / bkt_archive

Per-service migration ordinals reset to 0001. The CI gate migration-check enforces (a) every new table has tenant_id, (b) every multi-tenant table has an <table>_tenant_isolation policy, (c) no DROP COLUMN without an ADR.

5. RLS, prefix isolation, and the ObjectKey invariant

Three layers of defense for cross-tenant prevention; an integration test covers each:

  1. Domain (ObjectKey.toGcsKey) always emits tenants/<tenantId>/.... Unit-tested.
  2. Postgres (file_objects_prefix_chk CHECK + RLS) rejects rows whose object_key doesn't start with tenants/<tenantId>/. Integration-tested by attempting an insert with a forged key.
  3. GCS IAM — the service account holds roles/storage.objectAdmin on the buckets but the buckets carry no public IAM grants and uniformBucketLevelAccess=true. Signed URLs are issued by the service-account key (or workload identity) and cryptographically bind the path. A test spec attempts to access a known object key from a different tenant context and asserts that (a) RLS hides the metadata row and (b) the issuance logic refuses to construct an ObjectKey for a different tenant_id.

6. Performance posture

OperationIndex usedp95 target
Initiate upload (insert file_objects + upload_sessions + outbox)per-tenant unique on idempotency_key; prefix CHECK eval< 80 ms (DB)
Confirm upload (UPDATE file_objects + dedupe lookup)file_objects_tenant_sha_idx partial< 60 ms (DB)
Issue download URL (insert access_grants)partition-pruned write< 20 ms (DB)
Get metadatafile_objects_pkey< 5 ms (DB)
Quota queryaggregate over quota_counters (≤ 9 rows / tenant)< 10 ms (DB)
Sweeper (retention)file_objects_hard_delete_after_idx + retention_holds_release_due_idxscans 100 k rows / s
Erasure by guestfile_objects_owner_guest_idx GIN-on-jsonb-path< 200 ms for 1 k matches

7. Capacity sizing (year-2 plan)

TableRows / tenantTenantsTotalHot fraction
file_objects~50 0005 000250 M5 %
variants~200 0005 0001 B5 %
access_grants~10 M / month / tenant peak5 00050 B / monthlast 30 d (partitioned)
outbox~2 events / file5 000published-and-pruned in 24 hworking set < 100 k

access_grants partitioning + monthly drop is mandatory to keep the table footprint bounded.

8. Backup & PITR

  • Automated daily backup retained for 35 days (Cloud SQL automated backups).
  • Point-in-time recovery for the last 7 days (binlog-equivalent WAL retention).
  • A cross-region read replica is promoted on regional outage; data loss budget ≤ 5 min (SLO).
  • GCS buckets carry versioning with a 30-day lifecycle delete; soft-deleted bytes are recoverable for that window.

9. Sync to BigQuery (analytics)

A Datastream job replicates file_objects, variants, access_grants, quota_counters to the analytics_file_storage dataset for the SLO/usage dashboards (latency p95 ≤ 5 min). PII columns (alt_text, owner_scope_refs.guestId) are dropped at the Datastream transform before landing.

10. References