Data Model
:::info Source
Sourced from services/content-service/DATA_MODEL.md in the documentation repo.
:::
Companion: 12 Data Models · DOMAIN_MODEL · 13 Security
1. Database
- Engine: PostgreSQL 16+
- Schema name:
content - Connection pool: pgbouncer (transaction mode) → Postgres primary + read replicas
- RLS: Enabled on every table.
app.tenant_idsession variable set per request. - Backup: Daily full + WAL continuous; PITR window 30 days; cross-region replica for DR.
2. Entity Relationship Diagram (text)
play_packages ─── 1..* bundles ─── 0..1 license_envelopes (embedded)
│
├── 1..* format_artifacts (json column)
│
└── 1..* play_package_assets (denormalized for query)
scorm_imports (standalone, produces play_packages on success)
tamper_reports (references bundles)
outbox (transactional outbox)
inbox (consumer idempotency)
idempotency_keys (HTTP idempotency)
3. Table: play_packages
CREATE TABLE content.play_packages (
id text PRIMARY KEY, -- ppk_<ULID>
tenant_id uuid NOT NULL,
course_id text NOT NULL,
course_version_id text NOT NULL,
locale text NOT NULL,
manifest jsonb NOT NULL, -- PackageManifest
assets jsonb NOT NULL, -- AssetReference[]
formats jsonb NOT NULL DEFAULT '{}', -- FormatArtifacts
hash text NOT NULL, -- sha256:...
signature text NOT NULL, -- JWS
signature_kid text NOT NULL,
built_at timestamptz,
built_from_draft_version integer,
built_from_commit_hash text,
status text NOT NULL
CHECK (status IN ('building', 'built', 'revoked')),
revoked_at timestamptz,
revoked_by text,
revoke_reason text,
total_size_bytes bigint NOT NULL DEFAULT 0,
duration_minutes integer NOT NULL DEFAULT 0,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
version integer NOT NULL DEFAULT 1, -- optimistic concurrency
CONSTRAINT play_packages_version_unique
UNIQUE (tenant_id, course_version_id, locale)
WHERE status != 'revoked' DEFERRABLE INITIALLY DEFERRED
);
-- Row-level security
ALTER TABLE content.play_packages ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.play_packages
USING (tenant_id = current_setting('app.tenant_id')::uuid)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);
-- Indexes
CREATE INDEX play_packages_tenant_course_idx
ON content.play_packages (tenant_id, course_id, locale)
WHERE status = 'built';
CREATE INDEX play_packages_tenant_status_idx
ON content.play_packages (tenant_id, status, built_at DESC);
CREATE INDEX play_packages_course_version_idx
ON content.play_packages (course_version_id);
CREATE INDEX play_packages_building_stuck_idx
ON content.play_packages (created_at)
WHERE status = 'building';
-- Trigger: auto-update updated_at + version
CREATE TRIGGER play_packages_update_meta
BEFORE UPDATE ON content.play_packages
FOR EACH ROW EXECUTE FUNCTION content.set_updated_meta();
Notes
manifestandassetsare stored as JSONB for flexibility + indexed query via GIN when needed.- Unique constraint prevents two non-revoked packages for same (tenant, version, locale).
- Garbage collection job removes packages stuck in
building> 1 hour.
4. Table: bundles
CREATE TABLE content.bundles (
id text PRIMARY KEY, -- bun_<ULID>
tenant_id uuid NOT NULL,
play_package_id text NOT NULL
REFERENCES content.play_packages(id) ON DELETE RESTRICT,
enrollment_id text NOT NULL,
user_id uuid NOT NULL,
device_id uuid NOT NULL,
url text NOT NULL, -- S3/R2 key
sha256 text NOT NULL, -- sha256:...
signature text NOT NULL, -- JWS
size_bytes bigint NOT NULL,
encryption_alg text NOT NULL DEFAULT 'AES-256-GCM',
encryption_kid text NOT NULL,
license_envelope jsonb NOT NULL, -- signed LicenseEnvelope
built_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
status text NOT NULL
CHECK (status IN ('available', 'revoked')),
revoked_at timestamptz,
revoke_reason text,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
-- RLS
ALTER TABLE content.bundles ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.bundles
USING (tenant_id = current_setting('app.tenant_id')::uuid)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::uuid);
-- Indexes
CREATE INDEX bundles_tenant_package_idx
ON content.bundles (tenant_id, play_package_id, status);
CREATE INDEX bundles_user_device_idx
ON content.bundles (tenant_id, user_id, device_id, status);
CREATE INDEX bundles_enrollment_idx
ON content.bundles (tenant_id, enrollment_id, status);
CREATE INDEX bundles_expires_idx
ON content.bundles (expires_at)
WHERE status = 'available';
-- Unique active bundle per (package, enrollment, device)
CREATE UNIQUE INDEX bundles_active_triple_idx
ON content.bundles (play_package_id, enrollment_id, device_id)
WHERE status = 'available';
5. Table: tamper_reports
CREATE TABLE content.tamper_reports (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id uuid NOT NULL,
bundle_id text NOT NULL
REFERENCES content.bundles(id) ON DELETE RESTRICT,
user_id uuid NOT NULL,
device_id uuid NOT NULL,
expected_hash text NOT NULL,
actual_hash text NOT NULL,
location_in_bundle text,
device_fingerprint text,
player_version text,
reported_at timestamptz NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),
auto_revoked boolean NOT NULL DEFAULT false
);
ALTER TABLE content.tamper_reports ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.tamper_reports
USING (tenant_id = current_setting('app.tenant_id')::uuid);
CREATE INDEX tamper_reports_bundle_idx
ON content.tamper_reports (bundle_id, reported_at DESC);
CREATE INDEX tamper_reports_device_idx
ON content.tamper_reports (tenant_id, device_id, reported_at DESC);
Retention: 7 years (audit class).
6. Table: scorm_imports
CREATE TABLE content.scorm_imports (
id text PRIMARY KEY, -- imp_<ULID>
tenant_id uuid NOT NULL,
uploaded_by uuid NOT NULL,
source_filename text NOT NULL,
source_size_bytes bigint NOT NULL,
source_sha256 text NOT NULL,
upload_url text NOT NULL, -- staging S3 location
target_course_id text,
locale text NOT NULL,
scorm_version text, -- '1.2' | '2004' | unknown
manifest_xml text, -- extracted imsmanifest.xml
status text NOT NULL
CHECK (status IN ('uploaded', 'validating', 'scanning', 'ingesting', 'building', 'completed', 'failed')),
stages jsonb NOT NULL DEFAULT '[]',
errors jsonb NOT NULL DEFAULT '[]',
play_package_id text REFERENCES content.play_packages(id),
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
completed_at timestamptz
);
ALTER TABLE content.scorm_imports ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON content.scorm_imports
USING (tenant_id = current_setting('app.tenant_id')::uuid);
CREATE INDEX scorm_imports_tenant_status_idx
ON content.scorm_imports (tenant_id, status, created_at DESC);
7. Table: outbox (Transactional Outbox)
CREATE TABLE content.outbox (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
occurred_at timestamptz NOT NULL DEFAULT now(),
tenant_id uuid NOT NULL,
topic text NOT NULL, -- e.g., 'content.play_package.built.v1'
envelope jsonb NOT NULL, -- full EventEnvelope
published_at timestamptz,
attempts int NOT NULL DEFAULT 0,
last_error text,
partition_key text NOT NULL
);
CREATE INDEX outbox_unpublished_idx
ON content.outbox (occurred_at)
WHERE published_at IS NULL;
CREATE INDEX outbox_partition_idx
ON content.outbox (partition_key, occurred_at);
No RLS on outbox — publisher worker runs with elevated role. Tenant isolation enforced via envelope content and downstream consumer scoping.
8. Table: inbox (Consumer Idempotency)
CREATE TABLE content.inbox (
event_id text PRIMARY KEY, -- ULID from envelope
subject text NOT NULL,
tenant_id uuid NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
result text NOT NULL CHECK (result IN ('ok', 'skipped', 'failed'))
);
CREATE INDEX inbox_subject_idx
ON content.inbox (subject, processed_at DESC);
CREATE INDEX inbox_tenant_idx
ON content.inbox (tenant_id, processed_at DESC);
-- Retention: 90 days (redelivery window)
CREATE INDEX inbox_gc_idx ON content.inbox (processed_at);
9. Table: idempotency_keys (HTTP)
CREATE TABLE content.idempotency_keys (
tenant_id uuid NOT NULL,
key text NOT NULL,
request_hash text NOT NULL, -- sha256(method+path+body)
response_body jsonb,
response_code int,
created_at timestamptz NOT NULL DEFAULT now(),
expires_at timestamptz NOT NULL,
PRIMARY KEY (tenant_id, key)
);
CREATE INDEX idempotency_expires_idx
ON content.idempotency_keys (expires_at);
-- TTL: 24 hours
10. Table: tenant_signing_keys_cache
Local cache of tenant public keys for signature verification (keys are in KMS, public portions cached here for performance).
CREATE TABLE content.tenant_signing_keys_cache (
tenant_id uuid NOT NULL,
kid text NOT NULL,
algorithm text NOT NULL, -- 'EdDSA' | 'ES256'
public_key text NOT NULL, -- PEM
activated_at timestamptz NOT NULL,
rotated_at timestamptz,
PRIMARY KEY (tenant_id, kid)
);
CREATE INDEX tenant_signing_keys_active_idx
ON content.tenant_signing_keys_cache (tenant_id)
WHERE rotated_at IS NULL;
11. S3/R2 Storage Layout
Bundles and exports are stored in object storage with a strict per-tenant prefix:
s3://ghasi-content-{region}/
tenants/
{tenantId}/
play-packages/
{playPackageId}/
manifest.json (immutable; cached by CDN)
bundles/
{bundleId}.bin (encrypted blob; ~200MB-5GB typical)
exports/
scorm-1_2/
{courseVersionId}-{locale}.zip
scorm-2004/
{courseVersionId}-{locale}.zip
html5/
{courseVersionId}-{locale}.zip
xapi/
{courseVersionId}-{locale}.zip
imports/
staging/
{importId}.zip (ephemeral; deleted after processing)
Bucket Policy
denyall cross-tenant prefix access (bucket policy matches JWTtid).- Signed URLs scoped to specific object + caller identity + short TTL (15 min).
- Server-side encryption with tenant-level CMEK where required by tenant contract.
12. Trigger Functions
12.1 set_updated_meta
CREATE OR REPLACE FUNCTION content.set_updated_meta()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NEW.updated_at = now();
NEW.version = OLD.version + 1;
RETURN NEW;
END;
$$;
12.2 cascade_revocation
Triggered when a PlayPackage moves to revoked:
CREATE OR REPLACE FUNCTION content.cascade_revocation()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF NEW.status = 'revoked' AND OLD.status != 'revoked' THEN
UPDATE content.bundles
SET status = 'revoked',
revoked_at = NEW.revoked_at,
revoke_reason = 'package_revoked',
updated_at = now()
WHERE play_package_id = NEW.id
AND status = 'available';
-- Outbox events for each cascaded bundle
INSERT INTO content.outbox (tenant_id, topic, envelope, partition_key)
SELECT b.tenant_id,
'content.play_package.bundle.revoked.v1',
content.build_envelope(b.id, 'bundle_revoked', ...),
b.id
FROM content.bundles b
WHERE b.play_package_id = NEW.id
AND b.status = 'revoked'
AND b.revoked_at = NEW.revoked_at;
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER play_packages_cascade_revocation
AFTER UPDATE ON content.play_packages
FOR EACH ROW EXECUTE FUNCTION content.cascade_revocation();
13. Migrations
Managed via Flyway (or equivalent). Migration files in services/content-service/db/migrations/:
V001__initial_schema.sql -- play_packages, bundles, outbox, inbox
V002__scorm_imports.sql
V003__tamper_reports.sql
V004__tenant_signing_keys_cache.sql
V005__idempotency_keys.sql
V006__cascade_revocation_trigger.sql
V007__indexes_optimization.sql
Migrations are forward-only. Rollback achieved via forward migrations (e.g., V010__revert_V009.sql).
See MIGRATION_PLAN for rollout strategy.
14. Query Patterns
14.1 Hot Path: Get PlayPackage
SELECT id, manifest, assets, formats, hash, signature, status
FROM content.play_packages
WHERE tenant_id = current_setting('app.tenant_id')::uuid
AND id = $1;
Expected p95: < 5ms (primary key lookup, manifest JSONB served as-is).
14.2 Get Bundle by (Enrollment, Device)
SELECT id, url, sha256, signature, license_envelope, expires_at
FROM content.bundles
WHERE tenant_id = current_setting('app.tenant_id')::uuid
AND enrollment_id = $1
AND device_id = $2
AND status = 'available'
LIMIT 1;
14.3 List Bundles for Revocation (by Package)
SELECT id, user_id, device_id
FROM content.bundles
WHERE play_package_id = $1
AND status = 'available';
15. Capacity Planning
| Metric | Assumption | Sizing |
|---|---|---|
| PlayPackages per tenant | 100–10K | Low bytes (< 50MB total across all rows with JSONB) |
| Bundles per tenant | 1K–1M | ~2KB per row + S3 blobs |
| Tamper reports | Rare; < 1/1000 bundles | Negligible |
| SCORM imports | < 100/tenant/year | Negligible |
| Outbox depth | < 100K unpublished | < 1GB typical |
Postgres Sizing
- db.m7g.2xlarge baseline (8 vCPU, 32 GB RAM)
- Scaled vertically for Tier-1 tenants; sharded horizontally by
tenant_idif single-tenant exceeds 100K bundles.
S3 Sizing
- Bundle blobs: ~500MB average → 500TB per 1M bundles
- Lifecycle policy: bundles moved to Glacier Deep Archive after 90 days of inactivity (with wake-up cost warning to sync-service).
16. Data Residency
Residency enforced at multiple layers:
- Tenant config specifies residency (
us,eu,me,ap). - Database: separate Postgres cluster per region; content-service instances pinned to region.
- S3: region-specific buckets; bucket policy rejects cross-region writes.
- Events:
dataResidencyfield in envelope; stream routing honors residency.
17. Legal Hold
When a tenant is under legal hold (e.g., litigation):
- Rows marked with
legal_hold: truein a separatelegal_holdstable (global). - Deletion operations (including GDPR erasure) skipped for held data.
- Audit entry on every read of held data.
18. Archival
- PlayPackages: retained 7 years after revocation (audit class).
- Bundles: encrypted blobs deleted 30 days after revocation; metadata retained 1 year.
- SCORM imports: source zips deleted after 90 days; metadata retained 1 year.
- Tamper reports: 7 years (security audit).