Skip to main content

Data Model

:::info Source Sourced from services/content-service/DATA_MODEL.md in the documentation repo. :::

Companion: 12 Data Models · DOMAIN_MODEL · 13 Security

1. Database

  • Engine: PostgreSQL 16+
  • Schema name: content
  • Connection pool: pgbouncer (transaction mode) → Postgres primary + read replicas
  • RLS: Enabled on every table. app.tenant_id session variable set per request.
  • Backup: Daily full + WAL continuous; PITR window 30 days; cross-region replica for DR.

2. Entity Relationship Diagram (text)

play_packages ─── 1..* bundles ─── 0..1 license_envelopes (embedded)

├── 1..* format_artifacts (json column)

└── 1..* play_package_assets (denormalized for query)

scorm_imports (standalone, produces play_packages on success)
tamper_reports (references bundles)
outbox (transactional outbox)
inbox (consumer idempotency)
idempotency_keys (HTTP idempotency)

3. Table: play_packages

CREATE TABLE content.play_packages (
id text PRIMARY KEY, -- ppk_<ULID>
tenant_id uuid NOT NULL,
course_id text NOT NULL,
course_version_id text NOT NULL,
locale text NOT NULL,

manifest jsonb NOT NULL, -- PackageManifest
assets jsonb NOT NULL, -- AssetReference[]
formats jsonb NOT NULL DEFAULT '{}', -- FormatArtifacts

hash text NOT NULL, -- sha256:...
signature text NOT NULL, -- JWS
signature_kid text NOT NULL,

built_at timestamptz,
built_from_draft_version integer,
built_from_commit_hash text,

status text NOT NULL
CHECK (status IN ('building', 'built', 'revoked')),
revoked_at timestamptz,
revoked_by text,
revoke_reason text,

total_size_bytes bigint NOT NULL DEFAULT 0,
duration_minutes integer NOT NULL DEFAULT 0,

created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version integer NOT NULL DEFAULT 1, -- optimistic concurrency

CONSTRAINT play_packages_version_unique
UNIQUE (tenant_id, course_version_id, locale)
WHERE status != 'revoked' DEFERRABLE INITIALLY DEFERRED
);

-- Row-level security
ALTER TABLE content.play_packages ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON content.play_packages
USING (tenant_id = current_setting('app.tenant_id')::uuid)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);

-- Indexes
CREATE INDEX play_packages_tenant_course_idx
ON content.play_packages (tenant_id, course_id, locale)
WHERE status = 'built';

CREATE INDEX play_packages_tenant_status_idx
ON content.play_packages (tenant_id, status, built_at DESC);

CREATE INDEX play_packages_course_version_idx
ON content.play_packages (course_version_id);

CREATE INDEX play_packages_building_stuck_idx
ON content.play_packages (created_at)
WHERE status = 'building';

-- Trigger: auto-update updated_at + version
CREATE TRIGGER play_packages_update_meta
BEFORE UPDATE ON content.play_packages
FOR EACH ROW EXECUTE FUNCTION content.set_updated_meta();

Notes

  • manifest and assets are stored as JSONB for flexibility + indexed query via GIN when needed.
  • Unique constraint prevents two non-revoked packages for same (tenant, version, locale).
  • Garbage collection job removes packages stuck in building > 1 hour.

4. Table: bundles

CREATE TABLE content.bundles (
id text PRIMARY KEY, -- bun_<ULID>
tenant_id uuid NOT NULL,
play_package_id text NOT NULL
REFERENCES content.play_packages(id) ON DELETE RESTRICT,

enrollment_id text NOT NULL,
user_id uuid NOT NULL,
device_id uuid NOT NULL,

url text NOT NULL, -- S3/R2 key
sha256 text NOT NULL, -- sha256:...
signature text NOT NULL, -- JWS
size_bytes bigint NOT NULL,

encryption_alg text NOT NULL DEFAULT 'AES-256-GCM',
encryption_kid text NOT NULL,

license_envelope jsonb NOT NULL, -- signed LicenseEnvelope

built_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,

status text NOT NULL
CHECK (status IN ('available', 'revoked')),
revoked_at timestamptz,
revoke_reason text,

created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);

-- RLS
ALTER TABLE content.bundles ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON content.bundles
USING (tenant_id = current_setting('app.tenant_id')::uuid)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);

-- Indexes
CREATE INDEX bundles_tenant_package_idx
ON content.bundles (tenant_id, play_package_id, status);

CREATE INDEX bundles_user_device_idx
ON content.bundles (tenant_id, user_id, device_id, status);

CREATE INDEX bundles_enrollment_idx
ON content.bundles (tenant_id, enrollment_id, status);

CREATE INDEX bundles_expires_idx
ON content.bundles (expires_at)
WHERE status = 'available';

-- Unique active bundle per (package, enrollment, device)
CREATE UNIQUE INDEX bundles_active_triple_idx
ON content.bundles (play_package_id, enrollment_id, device_id)
WHERE status = 'available';

5. Table: tamper_reports

CREATE TABLE content.tamper_reports (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id uuid NOT NULL,
bundle_id text NOT NULL
REFERENCES content.bundles(id) ON DELETE RESTRICT,

user_id uuid NOT NULL,
device_id uuid NOT NULL,

expected_hash text NOT NULL,
actual_hash text NOT NULL,
location_in_bundle text,
device_fingerprint text,
player_version text,

reported_at timestamptz NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),

auto_revoked boolean NOT NULL DEFAULT false
);

ALTER TABLE content.tamper_reports ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.tamper_reports
USING (tenant_id = current_setting('app.tenant_id')::uuid);

CREATE INDEX tamper_reports_bundle_idx
ON content.tamper_reports (bundle_id, reported_at DESC);

CREATE INDEX tamper_reports_device_idx
ON content.tamper_reports (tenant_id, device_id, reported_at DESC);

Retention: 7 years (audit class).

6. Table: scorm_imports

CREATE TABLE content.scorm_imports (
id text PRIMARY KEY, -- imp_<ULID>
tenant_id uuid NOT NULL,

uploaded_by uuid NOT NULL,
source_filename text NOT NULL,
source_size_bytes bigint NOT NULL,
source_sha256 text NOT NULL,
upload_url text NOT NULL, -- staging S3 location

target_course_id text,
locale text NOT NULL,

scorm_version text, -- '1.2' | '2004' | unknown
manifest_xml text, -- extracted imsmanifest.xml

status text NOT NULL
CHECK (status IN ('uploaded', 'validating', 'scanning', 'ingesting', 'building', 'completed', 'failed')),

stages jsonb NOT NULL DEFAULT '[]',
errors jsonb NOT NULL DEFAULT '[]',

play_package_id text REFERENCES content.play_packages(id),

created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
completed_at timestamptz
);

ALTER TABLE content.scorm_imports ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.scorm_imports
USING (tenant_id = current_setting('app.tenant_id')::uuid);

CREATE INDEX scorm_imports_tenant_status_idx
ON content.scorm_imports (tenant_id, status, created_at DESC);

7. Table: outbox (Transactional Outbox)

CREATE TABLE content.outbox (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
occurred_at timestamptz NOT NULL DEFAULT now(),
tenant_id uuid NOT NULL,
topic text NOT NULL, -- e.g., 'content.play_package.built.v1'
envelope jsonb NOT NULL, -- full EventEnvelope
published_at timestamptz,
attempts int NOT NULL DEFAULT 0,
last_error text,
partition_key text NOT NULL
);

CREATE INDEX outbox_unpublished_idx
ON content.outbox (occurred_at)
WHERE published_at IS NULL;

CREATE INDEX outbox_partition_idx
ON content.outbox (partition_key, occurred_at);

No RLS on outbox — publisher worker runs with elevated role. Tenant isolation enforced via envelope content and downstream consumer scoping.

8. Table: inbox (Consumer Idempotency)

CREATE TABLE content.inbox (
event_id text PRIMARY KEY, -- ULID from envelope
subject text NOT NULL,
tenant_id uuid NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
result text NOT NULL CHECK (result IN ('ok', 'skipped', 'failed'))
);

CREATE INDEX inbox_subject_idx
ON content.inbox (subject, processed_at DESC);

CREATE INDEX inbox_tenant_idx
ON content.inbox (tenant_id, processed_at DESC);

-- Retention: 90 days (redelivery window)
CREATE INDEX inbox_gc_idx ON content.inbox (processed_at);

9. Table: idempotency_keys (HTTP)

CREATE TABLE content.idempotency_keys (
tenant_id uuid NOT NULL,
key text NOT NULL,
request_hash text NOT NULL, -- sha256(method+path+body)
response_body jsonb,
response_code int,
created_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
PRIMARY KEY (tenant_id, key)
);

CREATE INDEX idempotency_expires_idx
ON content.idempotency_keys (expires_at);

-- TTL: 24 hours

10. Table: tenant_signing_keys_cache

Local cache of tenant public keys for signature verification (keys are in KMS, public portions cached here for performance).

CREATE TABLE content.tenant_signing_keys_cache (
tenant_id uuid NOT NULL,
kid text NOT NULL,
algorithm text NOT NULL, -- 'EdDSA' | 'ES256'
public_key text NOT NULL, -- PEM
activated_at timestamptz NOT NULL,
rotated_at timestamptz,
PRIMARY KEY (tenant_id, kid)
);

CREATE INDEX tenant_signing_keys_active_idx
ON content.tenant_signing_keys_cache (tenant_id)
WHERE rotated_at IS NULL;

11. S3/R2 Storage Layout

Bundles and exports are stored in object storage with a strict per-tenant prefix:

s3://ghasi-content-{region}/
tenants/
{tenantId}/
play-packages/
{playPackageId}/
manifest.json (immutable; cached by CDN)
bundles/
{bundleId}.bin (encrypted blob; ~200MB-5GB typical)
exports/
scorm-1_2/
{courseVersionId}-{locale}.zip
scorm-2004/
{courseVersionId}-{locale}.zip
html5/
{courseVersionId}-{locale}.zip
xapi/
{courseVersionId}-{locale}.zip
imports/
staging/
{importId}.zip (ephemeral; deleted after processing)

Bucket Policy

  • deny all cross-tenant prefix access (bucket policy matches JWT tid).
  • Signed URLs scoped to specific object + caller identity + short TTL (15 min).
  • Server-side encryption with tenant-level CMEK where required by tenant contract.

12. Trigger Functions

12.1 set_updated_meta

CREATE OR REPLACE FUNCTION content.set_updated_meta()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NEW.updated_at = now();
NEW.version = OLD.version + 1;
RETURN NEW;
END;
$$;

12.2 cascade_revocation

Triggered when a PlayPackage moves to revoked:

CREATE OR REPLACE FUNCTION content.cascade_revocation()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF NEW.status = 'revoked' AND OLD.status != 'revoked' THEN
UPDATE content.bundles
SET status = 'revoked',
revoked_at = NEW.revoked_at,
revoke_reason = 'package_revoked',
updated_at = now()
WHERE play_package_id = NEW.id
AND status = 'available';

-- Outbox events for each cascaded bundle
INSERT INTO content.outbox (tenant_id, topic, envelope, partition_key)
SELECT b.tenant_id,
'content.play_package.bundle.revoked.v1',
content.build_envelope(b.id, 'bundle_revoked', ...),
b.id
FROM content.bundles b
WHERE b.play_package_id = NEW.id
AND b.status = 'revoked'
AND b.revoked_at = NEW.revoked_at;
END IF;
RETURN NEW;
END;
$$;

CREATE TRIGGER play_packages_cascade_revocation
AFTER UPDATE ON content.play_packages
FOR EACH ROW EXECUTE FUNCTION content.cascade_revocation();

13. Migrations

Managed via Flyway (or equivalent). Migration files in services/content-service/db/migrations/:

V001__initial_schema.sql -- play_packages, bundles, outbox, inbox
V002__scorm_imports.sql
V003__tamper_reports.sql
V004__tenant_signing_keys_cache.sql
V005__idempotency_keys.sql
V006__cascade_revocation_trigger.sql
V007__indexes_optimization.sql

Migrations are forward-only. Rollback achieved via forward migrations (e.g., V010__revert_V009.sql).

See MIGRATION_PLAN for rollout strategy.

14. Query Patterns

14.1 Hot Path: Get PlayPackage

SELECT id, manifest, assets, formats, hash, signature, status
FROM content.play_packages
WHERE tenant_id = current_setting('app.tenant_id')::uuid
AND id = $1;

Expected p95: < 5ms (primary key lookup, manifest JSONB served as-is).

14.2 Get Bundle by (Enrollment, Device)

SELECT id, url, sha256, signature, license_envelope, expires_at
FROM content.bundles
WHERE tenant_id = current_setting('app.tenant_id')::uuid
AND enrollment_id = $1
AND device_id = $2
AND status = 'available'
LIMIT 1;

14.3 List Bundles for Revocation (by Package)

SELECT id, user_id, device_id
FROM content.bundles
WHERE play_package_id = $1
AND status = 'available';

15. Capacity Planning

MetricAssumptionSizing
PlayPackages per tenant100–10KLow bytes (< 50MB total across all rows with JSONB)
Bundles per tenant1K–1M~2KB per row + S3 blobs
Tamper reportsRare; < 1/1000 bundlesNegligible
SCORM imports< 100/tenant/yearNegligible
Outbox depth< 100K unpublished< 1GB typical

Postgres Sizing

  • db.m7g.2xlarge baseline (8 vCPU, 32 GB RAM)
  • Scaled vertically for Tier-1 tenants; sharded horizontally by tenant_id if single-tenant exceeds 100K bundles.

S3 Sizing

  • Bundle blobs: ~500MB average → 500TB per 1M bundles
  • Lifecycle policy: bundles moved to Glacier Deep Archive after 90 days of inactivity (with wake-up cost warning to sync-service).

16. Data Residency

Residency enforced at multiple layers:

  • Tenant config specifies residency (us, eu, me, ap).
  • Database: separate Postgres cluster per region; content-service instances pinned to region.
  • S3: region-specific buckets; bucket policy rejects cross-region writes.
  • Events: dataResidency field in envelope; stream routing honors residency.

When a tenant is under legal hold (e.g., litigation):

  • Rows marked with legal_hold: true in a separate legal_holds table (global).
  • Deletion operations (including GDPR erasure) skipped for held data.
  • Audit entry on every read of held data.

18. Archival

  • PlayPackages: retained 7 years after revocation (audit class).
  • Bundles: encrypted blobs deleted 30 days after revocation; metadata retained 1 year.
  • SCORM imports: source zips deleted after 90 days; metadata retained 1 year.
  • Tamper reports: 7 years (security audit).