Skip to main content

Data Model

:::info Source Sourced from services/catalog-service/DATA_MODEL.md in the documentation repo. :::

Companion: DOMAIN_MODEL · ../../docs/12-data-models.md · ../../docs/13-security-compliance-tenancy.md

1. Database

  • Engine: Postgres 15.
  • Schema: catalog.
  • Tenancy: Row-Level Security (RLS) enforced via SET LOCAL app.current_tenant. Platform-admin role bypasses RLS (explicit BYPASSRLS).
  • Migrations: Flyway-style, migrations/V__*.sql, forward-only. Rollback = new forward fix.

2. Tables (DDL)

2.1 catalog.courses

CREATE TABLE catalog.courses (
id TEXT PRIMARY KEY, -- crs_<ULID>
tenant_id TEXT NOT NULL,
slug TEXT NOT NULL,
title JSONB NOT NULL, -- I18nString
description JSONB NOT NULL DEFAULT '{}'::jsonb,
authors JSONB NOT NULL DEFAULT '[]'::jsonb,
cover JSONB, -- MediaRef
default_locale TEXT NOT NULL,
visibility TEXT NOT NULL CHECK (visibility IN ('private','org','marketplace','public')),
tags TEXT[] NOT NULL DEFAULT '{}',
status TEXT NOT NULL CHECK (status IN ('active','archived')),
latest_version_id TEXT,
version_count INTEGER NOT NULL DEFAULT 0,
etag TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_courses_tenant_slug UNIQUE (tenant_id, slug)
);
CREATE INDEX idx_courses_tenant_status_visibility ON catalog.courses (tenant_id, status, visibility);
CREATE INDEX idx_courses_tags ON catalog.courses USING GIN (tags);
CREATE INDEX idx_courses_title_gin ON catalog.courses USING GIN ((title) jsonb_path_ops);
ALTER TABLE catalog.courses ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_courses_tenant ON catalog.courses
USING (tenant_id = current_setting('app.current_tenant', true)
OR visibility IN ('marketplace','public'));

2.2 catalog.course_versions

CREATE TABLE catalog.course_versions (
id TEXT PRIMARY KEY,
course_id TEXT NOT NULL REFERENCES catalog.courses(id) ON DELETE RESTRICT,
tenant_id TEXT NOT NULL,
version_label TEXT NOT NULL,
published_at TIMESTAMPTZ NOT NULL,
published_by TEXT NOT NULL,
duration_minutes INTEGER NOT NULL,
estimated_reading_minutes INTEGER,
module_summaries JSONB NOT NULL DEFAULT '[]'::jsonb,
locales TEXT[] NOT NULL,
play_package JSONB NOT NULL, -- { playPackageId, sha256, format }
changelog JSONB,
status TEXT NOT NULL CHECK (status IN ('published','deprecated','withdrawn')),
deprecated_at TIMESTAMPTZ,
withdrawn_at TIMESTAMPTZ,
withdrawn_reason TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_cv_course_label UNIQUE (course_id, version_label)
);
CREATE INDEX idx_cv_course_status ON catalog.course_versions (course_id, status, published_at DESC);
CREATE INDEX idx_cv_tenant_status ON catalog.course_versions (tenant_id, status, published_at DESC);
CREATE INDEX idx_cv_locales ON catalog.course_versions USING GIN (locales);
CREATE INDEX idx_cv_play_package_id ON catalog.course_versions ((play_package->>'playPackageId'));
ALTER TABLE catalog.course_versions ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_cv_tenant ON catalog.course_versions
USING (tenant_id = current_setting('app.current_tenant', true)
OR EXISTS (
SELECT 1 FROM catalog.courses c
WHERE c.id = course_id AND c.visibility IN ('marketplace','public')
));

2.3 catalog.taxonomies

CREATE TABLE catalog.taxonomies (
id TEXT PRIMARY KEY, -- tax_<ULID>
tenant_id TEXT, -- NULL = global
namespace TEXT NOT NULL,
title JSONB NOT NULL,
tree JSONB NOT NULL, -- materialised paths
version INTEGER NOT NULL DEFAULT 1,
updated_by TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_tax_tenant_ns UNIQUE (tenant_id, namespace)
);
CREATE INDEX idx_taxonomies_tenant ON catalog.taxonomies (tenant_id, namespace);
ALTER TABLE catalog.taxonomies ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_tax_read ON catalog.taxonomies
USING (tenant_id IS NULL OR tenant_id = current_setting('app.current_tenant', true));

2.4 catalog.course_taxonomy (join)

CREATE TABLE catalog.course_taxonomy (
course_id TEXT NOT NULL REFERENCES catalog.courses(id) ON DELETE CASCADE,
taxonomy_id TEXT NOT NULL REFERENCES catalog.taxonomies(id) ON DELETE RESTRICT,
node_path TEXT NOT NULL,
PRIMARY KEY (course_id, taxonomy_id, node_path)
);
CREATE INDEX idx_ct_tax_path ON catalog.course_taxonomy (taxonomy_id, node_path);

2.5 catalog.outbox

CREATE TABLE catalog.outbox (
outbox_id TEXT PRIMARY KEY, -- ULID
subject TEXT NOT NULL,
envelope JSONB NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
attempts INTEGER NOT NULL DEFAULT 0,
next_attempt TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ
);
CREATE INDEX idx_outbox_unpublished ON catalog.outbox (next_attempt) WHERE published_at IS NULL;

2.6 catalog.inbox

CREATE TABLE catalog.inbox (
event_id TEXT PRIMARY KEY,
subject TEXT NOT NULL,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

2.7 Audit / history

CREATE TABLE catalog.course_audit (
id BIGSERIAL PRIMARY KEY,
course_id TEXT NOT NULL,
at TIMESTAMPTZ NOT NULL DEFAULT now(),
actor TEXT NOT NULL,
action TEXT NOT NULL,
diff JSONB NOT NULL,
trace_id TEXT,
tenant_id TEXT NOT NULL
);
CREATE INDEX idx_audit_course_at ON catalog.course_audit (course_id, at DESC);

3. Indexes & Query Patterns

QueryIndex used
Browse by tenant + status + visibilityidx_courses_tenant_status_visibility
Tag searchidx_courses_tags (GIN)
Substring title (weak)idx_courses_title_gin
Versions of a course, newest firstidx_cv_course_status
Locale filteridx_cv_locales (GIN)
Taxonomy node facetidx_ct_tax_path

4. Storage Estimates

TableRows / tenant / yrGrowthNotes
courses500–5000linearmetadata only
course_versions3–5× courseslinearimmutable; most active versions
course_taxonomy5× courseslinear
taxonomies1–3flattree is ~ 100 KB max
outboxhigh-churnsteady-state < 10 MBtruncated post-publish
inboxhigh-churnsteady-state < 50 MBVACUUM daily
course_audit5× course_versionsappend-onlypartition by month at > 10 M rows

5. RLS Policies (summary)

TableReadWrite
coursesown tenant OR visibility ∈ {marketplace,public}own tenant + catalog.course.edit
course_versionsvia course visibilitysystem only (no user-writable except state)
taxonomiesown tenant OR globalown tenant + catalog.taxonomy.edit; global = platform-admin
course_taxonomyinherit from coursesown tenant
outbox / inboxservice roleservice role
course_auditplatform-admin + own tenantappend-only via trigger

6. Triggers

  • trg_courses_etag: on INSERT/UPDATE, set etag = encode(gen_random_bytes(12),'hex'), updated_at = now().
  • trg_audit_courses: row-level AFTER UPDATE, writes course_audit diff.
  • trg_course_versions_immutable: BEFORE UPDATE — reject changes to immutable columns (all except status/deprecated_at/withdrawn_at/withdrawn_reason).

7. Encryption & PII

  • No PII at rest beyond authors[].displayName (already-public author info).
  • All connections TLS 1.3.
  • Backups use AWS KMS CMK per-region.
  • No tenant-scoped DEK required for catalog (no sensitive assets; media lives elsewhere).

8. Archival & Retention

DataRetention
courseslifetime of tenant + 13 mo post-deletion (soft-deleted)
course_versionssame, plus 7 y if referenced by certification records
outbox rows30 d after published_at, then purged
inbox rows30 d, then purged
course_audit7 y (regulated)

9. Read Model (CQRS-lite)

Primary: direct reads from courses + course_versions (small, indexed).

Materialised view catalog.v_course_summary:

CREATE MATERIALIZED VIEW catalog.v_course_summary AS
SELECT
c.id, c.tenant_id, c.slug, c.title, c.visibility, c.status,
c.latest_version_id,
lv.version_label AS latest_version_label,
lv.duration_minutes AS latest_duration,
lv.locales AS latest_locales,
c.version_count, c.etag, c.updated_at
FROM catalog.courses c
LEFT JOIN catalog.course_versions lv ON lv.id = c.latest_version_id;
CREATE UNIQUE INDEX ON catalog.v_course_summary (id);

Refreshed concurrently on publish/metadata events (job latency < 2 s).

10. Connection Pooling & Tuning

  • Pool: pgbouncer, transaction mode, 100 connections.
  • statement_timeout = 5s for HTTP handlers.
  • idle_in_transaction_session_timeout = 30s.
  • work_mem = 16MB.

11. Seed Data

Global taxonomy tax_ghasi_subjects with namespace ghasi:subjects seeded via migrations/V007_seed_subjects.sql. Contains: Science, Technology, Engineering, Mathematics, Arts, Humanities, Languages, Business, Health, Compliance.

12. Cross-Service References (FK-by-event, not FK-by-DB)

FieldOwner serviceValidated via
tenant_idtenant-servicecached registry
authors[].userIdidentity-servicesoft check (null-tolerant)
play_package.playPackageIdcontent-serviceevent payload SHA
cover.assetIdmedia-servicecached resolve call

Catalog never performs cross-schema joins. All cross-service data is projected via events.