Data Model
:::info Source
Sourced from services/progress-service/DATA_MODEL.md in the documentation repo.
:::
1. Database
Postgres 16 schema progress. Partitioned tables (monthly + tenant hash) for statements.
2. Tables
2.1 attempts
CREATE TABLE progress.attempts (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
user_id ULID NOT NULL,
enrollment_id ULID NOT NULL,
course_version_id ULID NOT NULL,
attempt_number INT NOT NULL,
started_at TIMESTAMPTZ NOT NULL,
ended_at TIMESTAMPTZ,
outcome TEXT CHECK (outcome IN ('passed','failed','incomplete','abandoned')),
score NUMERIC(5,4),
duration_seconds INT,
state TEXT NOT NULL DEFAULT 'open' CHECK (state IN ('open','closed')),
UNIQUE (tenant_id, enrollment_id, attempt_number)
);
CREATE INDEX attempts_by_user ON progress.attempts (tenant_id, user_id, started_at DESC);
CREATE INDEX attempts_by_enrollment ON progress.attempts (tenant_id, enrollment_id, attempt_number DESC);
CREATE INDEX attempts_open ON progress.attempts (tenant_id, state) WHERE state = 'open';
2.2 statements (partitioned monthly × tenant hash)
CREATE TABLE progress.statements (
id ULID NOT NULL,
tenant_id UUID NOT NULL,
attempt_id ULID NOT NULL,
user_id ULID NOT NULL,
verb_id TEXT NOT NULL,
object_id TEXT NOT NULL,
actor JSONB NOT NULL,
object JSONB NOT NULL,
result JSONB,
context JSONB,
timestamp TIMESTAMPTZ NOT NULL,
stored TIMESTAMPTZ NOT NULL DEFAULT now(),
authority JSONB NOT NULL,
cmi5_session TEXT,
cmi5_registration TEXT,
PRIMARY KEY (id, tenant_id)
) PARTITION BY RANGE (timestamp);
-- monthly partitions created by pg_partman; tenant hash sub-partitions for hot tenants
CREATE INDEX statements_attempt ON progress.statements (tenant_id, attempt_id, timestamp DESC);
CREATE INDEX statements_user_verb ON progress.statements (tenant_id, user_id, verb_id, timestamp DESC);
CREATE INDEX statements_verb ON progress.statements (tenant_id, verb_id, timestamp DESC);
CREATE INDEX statements_registration ON progress.statements ((context->>'registration')) WHERE context->>'registration' IS NOT NULL;
2.3 completion_records
CREATE TABLE progress.completion_records (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
user_id ULID NOT NULL,
enrollment_id ULID NOT NULL,
course_version_id ULID NOT NULL,
attempt_id ULID NOT NULL,
completed_at TIMESTAMPTZ NOT NULL,
score NUMERIC(5,4) NOT NULL,
passed BOOLEAN NOT NULL,
evidence_statement_ids ULID[] NOT NULL,
UNIQUE (tenant_id, attempt_id)
);
CREATE INDEX completion_by_user ON progress.completion_records (tenant_id, user_id, completed_at DESC);
CREATE INDEX completion_by_enrollment ON progress.completion_records (tenant_id, enrollment_id);
2.4 outbox, inbox, sagas
Standard platform patterns.
2.5 activity_state (xAPI State API)
CREATE TABLE progress.activity_state (
tenant_id UUID NOT NULL,
activity_id TEXT NOT NULL,
agent TEXT NOT NULL,
registration TEXT NOT NULL,
state_id TEXT NOT NULL,
content_type TEXT NOT NULL,
body BYTEA NOT NULL,
etag TEXT NOT NULL,
updated_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (tenant_id, activity_id, agent, registration, state_id)
);
3. Indexing Strategy
- Primary query path:
(tenant_id, enrollment_id, timestamp)for transcripts. - Verb queries:
(tenant_id, verb_id, timestamp). - Actor queries:
(tenant_id, user_id, verb_id, timestamp). - Registration queries: functional index on
context->>'registration'.
4. RLS
ALTER TABLE progress.attempts ENABLE ROW LEVEL SECURITY;
CREATE POLICY attempts_tenant_iso ON progress.attempts
USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- same for statements, completion_records, activity_state
5. Partitioning
statementspartitioned monthly bytimestamp.- Hot months (current + last 2): hash sub-partitioned by tenant (16 ways).
- Old partitions: compressed + read-only; moved to cold tier after 180 days.
- Retention: hot 180 d, cold 7 years (regulated class).
6. Migration Strategy
- Add column: nullable + backfill.
- Add index
CONCURRENTLY. - Retire old partitions:
DETACH PARTITION+ archive to S3 Parquet. - Schema changes require replay test (can rebuild projections from event log).
7. Performance Considerations
- Insert throughput: 10k stmt/sec/region sustained.
- Query optimization:
- Transcript queries use covering indexes.
- Analytics reads go through replicas.
- Complex aggregates run in analytics-service (ClickHouse), not LRS.
- Write amplification: event emit on every statement; outbox batched.
- Batch inserts for offline replay (COPY protocol).
8. Backup & Retention
- WAL archiving per region.
- Daily snapshots; restore tested quarterly.
- Retention per
retentionClass:- Operational attempts/statements: 180 days hot.
- Regulated (compliance-relevant): 7 years cold archive.
- Legal hold overrides GDPR erasure for tax/legal.
9. Data Classification
- Statement
actor+contextmay contain PHI (healthcare tenants). Mark tables Restricted. bodyin activity_state may contain quiz responses → Confidential.- Redact PII from logs; full PII only in audit log.