Data Model
:::info Source
Sourced from services/catalog-service/DATA_MODEL.md in the documentation repo.
:::
Companion: DOMAIN_MODEL · ../../docs/12-data-models.md · ../../docs/13-security-compliance-tenancy.md
1. Database
- Engine: Postgres 15.
- Schema:
catalog. - Tenancy: Row-Level Security (RLS) enforced via
SET LOCAL app.current_tenant. Platform-admin role bypasses RLS (explicitBYPASSRLS). - Migrations: Flyway-style,
migrations/V__*.sql, forward-only. Rollback = new forward fix.
2. Tables (DDL)
2.1 catalog.courses
CREATE TABLE catalog.courses (
id TEXT PRIMARY KEY, -- crs_<ULID>
tenant_id TEXT NOT NULL,
slug TEXT NOT NULL,
title JSONB NOT NULL, -- I18nString
description JSONB NOT NULL DEFAULT '{}'::jsonb,
authors JSONB NOT NULL DEFAULT '[]'::jsonb,
cover JSONB, -- MediaRef
default_locale TEXT NOT NULL,
visibility TEXT NOT NULL CHECK (visibility IN ('private','org','marketplace','public')),
tags TEXT[] NOT NULL DEFAULT '{}',
status TEXT NOT NULL CHECK (status IN ('active','archived')),
latest_version_id TEXT,
version_count INTEGER NOT NULL DEFAULT 0,
etag TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_courses_tenant_slug UNIQUE (tenant_id, slug)
);
CREATE INDEX idx_courses_tenant_status_visibility ON catalog.courses (tenant_id, status, visibility);
CREATE INDEX idx_courses_tags ON catalog.courses USING GIN (tags);
CREATE INDEX idx_courses_title_gin ON catalog.courses USING GIN ((title) jsonb_path_ops);
ALTER TABLE catalog.courses ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_courses_tenant ON catalog.courses
USING (tenant_id = current_setting('app.current_tenant', true)
OR visibility IN ('marketplace','public'));
2.2 catalog.course_versions
CREATE TABLE catalog.course_versions (
id TEXT PRIMARY KEY,
course_id TEXT NOT NULL REFERENCES catalog.courses(id) ON DELETE RESTRICT,
tenant_id TEXT NOT NULL,
version_label TEXT NOT NULL,
published_at TIMESTAMPTZ NOT NULL,
published_by TEXT NOT NULL,
duration_minutes INTEGER NOT NULL,
estimated_reading_minutes INTEGER,
module_summaries JSONB NOT NULL DEFAULT '[]'::jsonb,
locales TEXT[] NOT NULL,
play_package JSONB NOT NULL, -- { playPackageId, sha256, format }
changelog JSONB,
status TEXT NOT NULL CHECK (status IN ('published','deprecated','withdrawn')),
deprecated_at TIMESTAMPTZ,
withdrawn_at TIMESTAMPTZ,
withdrawn_reason TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_cv_course_label UNIQUE (course_id, version_label)
);
CREATE INDEX idx_cv_course_status ON catalog.course_versions (course_id, status, published_at DESC);
CREATE INDEX idx_cv_tenant_status ON catalog.course_versions (tenant_id, status, published_at DESC);
CREATE INDEX idx_cv_locales ON catalog.course_versions USING GIN (locales);
CREATE INDEX idx_cv_play_package_id ON catalog.course_versions ((play_package->>'playPackageId'));
ALTER TABLE catalog.course_versions ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_cv_tenant ON catalog.course_versions
USING (tenant_id = current_setting('app.current_tenant', true)
OR EXISTS (
SELECT 1 FROM catalog.courses c
WHERE c.id = course_id AND c.visibility IN ('marketplace','public')
));
2.3 catalog.taxonomies
CREATE TABLE catalog.taxonomies (
id TEXT PRIMARY KEY, -- tax_<ULID>
tenant_id TEXT, -- NULL = global
namespace TEXT NOT NULL,
title JSONB NOT NULL,
tree JSONB NOT NULL, -- materialised paths
version INTEGER NOT NULL DEFAULT 1,
updated_by TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_tax_tenant_ns UNIQUE (tenant_id, namespace)
);
CREATE INDEX idx_taxonomies_tenant ON catalog.taxonomies (tenant_id, namespace);
ALTER TABLE catalog.taxonomies ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_tax_read ON catalog.taxonomies
USING (tenant_id IS NULL OR tenant_id = current_setting('app.current_tenant', true));
2.4 catalog.course_taxonomy (join)
CREATE TABLE catalog.course_taxonomy (
course_id TEXT NOT NULL REFERENCES catalog.courses(id) ON DELETE CASCADE,
taxonomy_id TEXT NOT NULL REFERENCES catalog.taxonomies(id) ON DELETE RESTRICT,
node_path TEXT NOT NULL,
PRIMARY KEY (course_id, taxonomy_id, node_path)
);
CREATE INDEX idx_ct_tax_path ON catalog.course_taxonomy (taxonomy_id, node_path);
2.5 catalog.outbox
CREATE TABLE catalog.outbox (
outbox_id TEXT PRIMARY KEY, -- ULID
subject TEXT NOT NULL,
envelope JSONB NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
attempts INTEGER NOT NULL DEFAULT 0,
next_attempt TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ
);
CREATE INDEX idx_outbox_unpublished ON catalog.outbox (next_attempt) WHERE published_at IS NULL;
2.6 catalog.inbox
CREATE TABLE catalog.inbox (
event_id TEXT PRIMARY KEY,
subject TEXT NOT NULL,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
2.7 Audit / history
CREATE TABLE catalog.course_audit (
id BIGSERIAL PRIMARY KEY,
course_id TEXT NOT NULL,
at TIMESTAMPTZ NOT NULL DEFAULT now(),
actor TEXT NOT NULL,
action TEXT NOT NULL,
diff JSONB NOT NULL,
trace_id TEXT,
tenant_id TEXT NOT NULL
);
CREATE INDEX idx_audit_course_at ON catalog.course_audit (course_id, at DESC);
3. Indexes & Query Patterns
| Query | Index used |
|---|---|
| Browse by tenant + status + visibility | idx_courses_tenant_status_visibility |
| Tag search | idx_courses_tags (GIN) |
| Substring title (weak) | idx_courses_title_gin |
| Versions of a course, newest first | idx_cv_course_status |
| Locale filter | idx_cv_locales (GIN) |
| Taxonomy node facet | idx_ct_tax_path |
4. Storage Estimates
| Table | Rows / tenant / yr | Growth | Notes |
|---|---|---|---|
| courses | 500–5000 | linear | metadata only |
| course_versions | 3–5× courses | linear | immutable; most active versions |
| course_taxonomy | 5× courses | linear | |
| taxonomies | 1–3 | flat | tree is ~ 100 KB max |
| outbox | high-churn | steady-state < 10 MB | truncated post-publish |
| inbox | high-churn | steady-state < 50 MB | VACUUM daily |
| course_audit | 5× course_versions | append-only | partition by month at > 10 M rows |
5. RLS Policies (summary)
| Table | Read | Write |
|---|---|---|
courses | own tenant OR visibility ∈ {marketplace,public} | own tenant + catalog.course.edit |
course_versions | via course visibility | system only (no user-writable except state) |
taxonomies | own tenant OR global | own tenant + catalog.taxonomy.edit; global = platform-admin |
course_taxonomy | inherit from courses | own tenant |
outbox / inbox | service role | service role |
course_audit | platform-admin + own tenant | append-only via trigger |
6. Triggers
trg_courses_etag: on INSERT/UPDATE, setetag = encode(gen_random_bytes(12),'hex'),updated_at = now().trg_audit_courses: row-level AFTER UPDATE, writescourse_auditdiff.trg_course_versions_immutable: BEFORE UPDATE — reject changes to immutable columns (all except status/deprecated_at/withdrawn_at/withdrawn_reason).
7. Encryption & PII
- No PII at rest beyond
authors[].displayName(already-public author info). - All connections TLS 1.3.
- Backups use AWS KMS CMK per-region.
- No tenant-scoped DEK required for catalog (no sensitive assets; media lives elsewhere).
8. Archival & Retention
| Data | Retention |
|---|---|
courses | lifetime of tenant + 13 mo post-deletion (soft-deleted) |
course_versions | same, plus 7 y if referenced by certification records |
outbox rows | 30 d after published_at, then purged |
inbox rows | 30 d, then purged |
course_audit | 7 y (regulated) |
9. Read Model (CQRS-lite)
Primary: direct reads from courses + course_versions (small, indexed).
Materialised view catalog.v_course_summary:
CREATE MATERIALIZED VIEW catalog.v_course_summary AS
SELECT
c.id, c.tenant_id, c.slug, c.title, c.visibility, c.status,
c.latest_version_id,
lv.version_label AS latest_version_label,
lv.duration_minutes AS latest_duration,
lv.locales AS latest_locales,
c.version_count, c.etag, c.updated_at
FROM catalog.courses c
LEFT JOIN catalog.course_versions lv ON lv.id = c.latest_version_id;
CREATE UNIQUE INDEX ON catalog.v_course_summary (id);
Refreshed concurrently on publish/metadata events (job latency < 2 s).
10. Connection Pooling & Tuning
- Pool: pgbouncer, transaction mode, 100 connections.
statement_timeout = 5sfor HTTP handlers.idle_in_transaction_session_timeout = 30s.work_mem = 16MB.
11. Seed Data
Global taxonomy tax_ghasi_subjects with namespace ghasi:subjects seeded via migrations/V007_seed_subjects.sql. Contains: Science, Technology, Engineering, Mathematics, Arts, Humanities, Languages, Business, Health, Compliance.
12. Cross-Service References (FK-by-event, not FK-by-DB)
| Field | Owner service | Validated via |
|---|---|---|
tenant_id | tenant-service | cached registry |
authors[].userId | identity-service | soft check (null-tolerant) |
play_package.playPackageId | content-service | event payload SHA |
cover.assetId | media-service | cached resolve call |
Catalog never performs cross-schema joins. All cross-service data is projected via events.