Slice Risk Register
:::info Source
Sourced from docs/roadmap/slice-risk-register.md in the documentation repo.
:::
Execution-layer companion to ROADMAP.md and 14 Risks & Trade-offs.
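Risks are grouped by slice (S0–S6), so each risk is discussed on the same schedule as the work most likely to trigger it. Each row gives an ID, the risk, its severity (S1 critical · S2 high · S3 medium · S4 low), impact, mitigation, owner, and dependencies. The cross-cutting table omits dependencies. Severity codes and slice IDs share the S prefix but are unrelated: S1-R2 belongs to slice S1, and its Sev column holds its severity.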
S0 — Platform Foundation
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S0-R1 | Tenant isolation regression | S1 | Cross-tenant data leak; contract terminations | Two-tenant CI suite; mandatory code review on any RLS policy change; pen-test; RLS bypass tests | Platform + Security | Postgres RLS framework; JWT + RequestContext |
| S0-R2 | Event envelope drift | S2 | Services fall out of sync; Pact breakage; refactor avalanche | Envelope frozen + schema registry CI gate; ADR for any change | Platform | Schema registry |
| S0-R3 | AI gateway port contract churn | S2 | 19 services must be refactored | AIClient port frozen; additive-only versioning rule; adapter abstraction | AI Services | AI adapter tests |
| S0-R4 | Sync protocol churn | S2 | Every client rebuilds; offline bundles invalidated | /sync/v1/ frozen; additive-only | Sync + Platform | Sync protocol ADR |
| S0-R5 | KMS mis-configuration | S1 | Loss of data confidentiality; DR failure | KMS key hierarchy + rotation design reviewed by Security; DR drill | Security + Platform | KMS vendor selection |
| S0-R6 | OpenTelemetry overhead | S3 | Unexpected latency or cost | Sampling; async exporters; dashboards for OTel health | SRE | — |
| S0-R7 | Two-tenant test suite gaps | S1 | Silent leaks slip past CI | Matrix-test every endpoint; property-based tests | Platform + Security | CI infra |
| S0-R8 | Over-building the foundation | S2 | Time-to-M1 slips | Strict M0 scope doc; weekly backlog review | PM + Platform lead | — |
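The additive-only rule behind S0-R2, S0-R3 and S0-R4 can be checked mechanically in CI. The sketch below is a minimal illustration that assumes a hypothetical flat schema format, in which each field name maps to a type name and a required flag. `checkAdditiveOnly`, `FieldSpec` and the example envelope fields are illustrative names. They are not the schema registry's actual API.

```typescript
// Hypothetical flat schema: field name -> type name + whether producers must send it.
interface FieldSpec {
  type: string;
  required: boolean;
}
type Schema = Record<string, FieldSpec>;

const hasField = (schema: Schema, name: string): boolean =>
  Object.prototype.hasOwnProperty.call(schema, name);

// Returns every violation of the additive-only rule; an empty array means the
// new version is safe to merge. Only new *optional* fields count as additive.
function checkAdditiveOnly(previous: Schema, next: Schema): string[] {
  const violations: string[] = [];
  for (const name of Object.keys(previous)) {
    if (!hasField(next, name)) {
      violations.push(`removed field "${name}"`);
      continue;
    }
    const before = previous[name];
    const after = next[name];
    if (after.type !== before.type) {
      violations.push(`changed type of "${name}" from ${before.type} to ${after.type}`);
    }
    if (after.required !== before.required) {
      violations.push(`changed requiredness of "${name}"`);
    }
  }
  for (const name of Object.keys(next)) {
    if (!hasField(previous, name) && next[name].required) {
      violations.push(`added required field "${name}"`);
    }
  }
  return violations;
}

// Example: adding an optional field passes the gate.
const envelopeV1: Schema = {
  eventId: { type: "string", required: true },
  tenantId: { type: "string", required: true },
};
const envelopeV2: Schema = { ...envelopeV1, traceId: { type: "string", required: false } };
console.log(checkAdditiveOnly(envelopeV1, envelopeV2)); // []
```

The sketch treats any change in requiredness as breaking, in either direction. Making a field required breaks existing producers, and relaxing a required field can break consumers that rely on it.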
S1 — Minimal Learner (M1)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S1-R1 | Offline bundle tamper/device-binding bug | S1 | Content piracy; license bypass | AES-256-GCM with per-device key derivation; JWS signing; tamper CI fixtures; bundle chaos tests | Content + Security | KMS per tenant; device cert |
| S1-R2 | AI tutor hallucination at learner surface | S1 | Wrong answers in compliance training; regulatory exposure | RAG over lesson context; refusal UX; citations to source lesson blocks; quarterly accuracy eval | AI + Learning | Prompt registry; eval harness |
| S1-R3 | Local model quality gap | S2 | Offline UX feels degraded | "Local model" badge; cloud-refresh CTA; quality eval per release | AI + Mobile Platform | Local-inference SDK |
| S1-R4 | PlayPackage schema late freeze | S2 | Player + Content-Packaging diverge | Freeze before M1 sprint 1; shared TS types | Content-Packaging + Learner | Block schema |
| S1-R5 | License envelope expiry UX ambiguity | S2 | Learners blocked without explanation | Clear countdown UX; proactive refresh on sync | Learner + Design | Sync service |
| S1-R6 | Statement outbox overflow on long offline periods | S2 | Lost statements | Chunked push; client-side caps; backpressure UX | Sync + Progress | IndexedDB quotas |
| S1-R7 | Multi-device cursor resolution bug | S2 | Learner confused about progress | max(cursor) reconciliation + tests; audit each reconciliation | Learner + Sync | Vector clock |
| S1-R8 | Accessibility regressions on player | S2 | WCAG 2.2 AA failure | axe in CI; manual NVDA + VoiceOver per release; reduced-motion toggle | Design + Learner FE | — |
| S1-R9 | Capacitor ↔ web parity gaps | S3 | Bugs appear only on mobile | Shared E2E fixtures; device farm tests | Mobile Platform + QA | Device farm |
| S1-R10 | Design partners insufficient diversity | S3 | Missed feedback from regulated/remote users | Curate partner cohort (regulated, field, multilingual) | PM + Sales | — |
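S1-R7's max(cursor) reconciliation, with an audit record for each reconciliation, can be sketched as follows. This minimal sketch assumes a hypothetical cursor that holds a single integer position per lesson per device. It does not model the vector-clock dependency listed in the table.

```typescript
// Hypothetical cursor: the furthest position one device has reached in one lesson.
interface DeviceCursor {
  deviceId: string;
  lessonId: string;
  position: number; // e.g. index of the furthest completed block
}

// One audit record per lesson reported by more than one device.
interface ReconciliationAudit {
  lessonId: string;
  chosenDeviceId: string;
  chosenPosition: number;
  discardedPositions: number[];
}

// max(cursor) reconciliation: for each lesson the furthest position wins, so
// synced progress never moves backwards. Ties keep the first cursor seen.
function reconcileCursors(cursors: DeviceCursor[]): {
  merged: Record<string, number>;
  audit: ReconciliationAudit[];
} {
  const byLesson: Record<string, DeviceCursor[]> = {};
  for (const c of cursors) {
    if (!Object.prototype.hasOwnProperty.call(byLesson, c.lessonId)) byLesson[c.lessonId] = [];
    byLesson[c.lessonId].push(c);
  }

  const merged: Record<string, number> = {};
  const audit: ReconciliationAudit[] = [];
  for (const lessonId of Object.keys(byLesson)) {
    const group = byLesson[lessonId];
    const chosen = group.reduce((best, c) => (c.position > best.position ? c : best));
    merged[lessonId] = chosen.position;
    if (group.length > 1) {
      audit.push({
        lessonId,
        chosenDeviceId: chosen.deviceId,
        chosenPosition: chosen.position,
        discardedPositions: group.filter((c) => c !== chosen).map((c) => c.position),
      });
    }
  }
  return { merged, audit };
}
```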
S2 — Authoring MVP + AI Co-Author MVP (M2 first half)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S2-R1 | Publish saga half-failures | S1 | Orphan CourseVersions, broken catalog | Explicit compensations; chaos tests at every step; admin queue; saga state machine tests | Platform + Authoring + Content | Saga infra |
| S2-R2 | Block registry rushed | S1 | Block kind shape churn + rework | Block schema RFC + freeze at M2 start; new kinds additive only | Authoring + Architecture | Block schema ADR |
| S2-R3 | AI co-author accept rate low | S2 | Low adoption, wasted AI spend | Prompt regression gate at 50 % accept rate; user-research cadence | AI + Authoring | Eval harness |
| S2-R4 | Provenance UI complexity | S3 | Admins ignore AI transparency | Progressive disclosure in UI; badge always visible | Design + Authoring FE | — |
| S2-R5 | Media pipeline bottleneck on transcode | S2 | Slow author feedback loop | Worker pool; backpressure UX; inline low-res preview | Media | — |
| S2-R6 | Customer content shape surprises | S3 | Real content breaks block validators | Partner beta with real content before freeze | Authoring + PM | — |
| S2-R7 | Publish saga retries exhaust AI budget | S3 | Unexpected AI cost spike | Idempotent AI calls; cache by prompt-hash; retry caps | AI + Authoring | — |
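S2-R1 calls for explicit compensations and an admin queue. Below is a minimal saga runner that assumes hypothetical step objects, each pairing a forward action with its compensation. If a step fails, the runner undoes the completed steps in reverse order. Any compensation that itself fails is reported for the admin queue rather than dropped. Real saga infrastructure would also persist saga state between steps, and this sketch omits that.

```typescript
// Hypothetical saga step: a forward action paired with its compensation.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  compensate: () => Promise<void>;
}

type SagaResult =
  | { status: "completed" }
  | { status: "compensated"; failedStep: string }
  | { status: "needs_admin"; failedStep: string; failedCompensations: string[] };

// Runs steps in order. On the first failure, compensates the steps that already
// succeeded, newest first. Failed compensations are collected for the admin
// queue instead of being silently swallowed.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      const failedCompensations: string[] = [];
      for (const done of completed.slice().reverse()) {
        try {
          await done.compensate();
        } catch {
          failedCompensations.push(done.name);
        }
      }
      return failedCompensations.length === 0
        ? { status: "compensated", failedStep: step.name }
        : { status: "needs_admin", failedStep: step.name, failedCompensations };
    }
  }
  return { status: "completed" };
}
```

For example, take hypothetical steps `createVersion`, `packageContent` and `updateCatalog`. If `updateCatalog` fails, the runner undoes `packageContent` and then `createVersion`, and it resolves to `{ status: "compensated", failedStep: "updateCatalog" }`.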
S3 — Marketplace MVP (M2 second half)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S3-R1 | Payment compliance gaps | S1 | PCI incident; processor termination | Tokenized cards only; PCI scope minimized; processor-abstract ACL | Commerce + Security | Processor sandbox |
| S3-R2 | Refund edge cases leak seats | S2 | Provider disputes; partial refunds wrong | Refund policy DSL + unit-tested matrix; refund-after-seat-consumed rule | Commerce + Legal | — |
| S3-R3 | SCORM 1.2 conformance regression | S2 | Third-party LMSs reject exported zips | SCORM Cloud run in CI on every build; fixture courses | Content-Packaging | SCORM Cloud account |
| S3-R4 | Webhook replay storms from customers | S3 | DLQ + alert fatigue | Backoff + DLQ + dashboards; per-subscription limits | Comms + SRE | — |
| S3-R5 | Marketplace low-quality listings at launch | S2 | Brand damage | AI moderation + human review; provider onboarding standards | Commerce + AI | Moderation pipeline |
| S3-R6 | Purchase saga split-brain with licensing | S1 | Payment without license or license without payment | Idempotent saga + compensations + reconciliation job | Commerce + Platform | — |
| S3-R7 | Public certificate-verification endpoint abused for scraping | S3 | Data harvest | Rate limit + bot mitigation + verification-token TTL scheme | Certification + Security | — |
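S3-R4's mitigation pairs backoff with a DLQ. One common way to combine them is capped exponential backoff with the "full jitter" variant, sketched below under stated assumptions. The policy numbers are hypothetical and per-subscription, and `nextDeliveryAction` is an illustrative name. The random source is injected so the function stays deterministic in tests.

```typescript
// Hypothetical per-subscription retry policy; the numbers are illustrative.
interface RetryPolicy {
  baseDelayMs: number; // upper bound for the first retry delay
  maxDelayMs: number;  // cap on any single delay
  maxAttempts: number; // deliveries allowed before the event is dead-lettered
}

type NextAction = { kind: "retry"; delayMs: number } | { kind: "dead_letter" };

// Decides what happens after a failed webhook delivery. `failedAttempts` counts
// attempts so far, including the one that just failed (so it starts at 1).
// Capped exponential backoff with full jitter: the delay is drawn uniformly from
// [0, min(maxDelayMs, baseDelayMs * 2^(failedAttempts - 1))), so subscriptions
// that fail together do not retry in lockstep.
function nextDeliveryAction(
  failedAttempts: number,
  policy: RetryPolicy,
  random: () => number = Math.random, // injectable for deterministic tests
): NextAction {
  if (failedAttempts >= policy.maxAttempts) return { kind: "dead_letter" };
  const ceiling = Math.min(policy.maxDelayMs, policy.baseDelayMs * Math.pow(2, failedAttempts - 1));
  return { kind: "retry", delayMs: Math.floor(random() * ceiling) };
}
```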
S4 — Compliance + Enterprise (M3)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S4-R1 | RRULE + timezone correctness | S2 | Wrong due dates; compliance failures | 1 000-fixture suite incl. DST transitions + leap-year dates; TZ matrix tests | Enterprise | RRULE engine |
| S4-R2 | SAML edge cases per IdP | S2 | Enterprise deals stall | Test Okta, Azure AD, Google, custom ADFS, Auth0 | Enterprise + Platform | IdP test accounts |
| S4-R3 | ABAC policy complexity breeds mis-grants | S1 | Data leak within tenant | Policy linter; sample-data tests; UI shows plain-language policy | Platform | ABAC DSL |
| S4-R4 | AI grading fairness | S1 | Discrimination claims | Bias eval; human override; EU AI Act high-risk docs; external audit | AI + Compliance | Eval corpus |
| S4-R5 | PDF→course conversion quality varies | S2 | Authors reject AI output | Confidence thresholds; chunk-level accept/reject; fallback to outline-only | AI + Authoring | — |
| S4-R6 | Recurrence storm (many tenants activate on the same day) | S2 | Notification burst + queue overload | Jittered materialization; batch send; backpressure | Enterprise + Comms | — |
| S4-R7 | GDPR erasure saga drift | S1 | Erasure incomplete; regulator risk | Every service declares participation; CI gate; saga replay tests | Platform + Compliance | GDPR saga contract |
| S4-R8 | SCORM 2004 + xAPI conformance misses | S2 | Regulated market rejections | ADL suite in CI; cmi5 profile tests | Content-Packaging | ADL LRS |
| S4-R9 | Enterprise procurement delays | S2 | Revenue slips | SOC 2 Type I + DPA + BAA templates ready; reference customers | Enterprise + Legal | SOC 2 auditor |
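For S4-R6, one way to jitter materialization is to give each tenant a stable offset inside a spread window. The offset is derived from a hash of the tenant ID. The register does not specify an approach, so this sketch assumes that one. It uses FNV-1a as a stand-in, non-cryptographic hash, and the window length is a hypothetical parameter.

```typescript
// FNV-1a, 32-bit, over UTF-16 code units: a small non-cryptographic hash, used
// here only to spread tenants across a window (not as a security control).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime mod 2^32
  }
  return hash >>> 0;
}

// Returns when a tenant's recurrence materialization should run: a stable offset
// in [0, windowMs) after the scheduled instant. Same tenant + same schedule gives
// the same result, so a rerun does not move the job.
function jitteredRunAt(scheduledAtMs: number, tenantId: string, windowMs: number): number {
  if (!Number.isInteger(windowMs) || windowMs <= 0) {
    throw new RangeError("windowMs must be a positive integer");
  }
  return scheduledAtMs + (fnv1a32(tenantId) % windowMs);
}
```

A deterministic offset means a retried or replayed materialization run lands on the same instant, which keeps reruns idempotent. A random offset would spread load equally well, but it would have to be stored to get the same property.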
S5 — Full Authoring + Offline Authoring (M4)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S5-R1 | Offline authoring conflict UX | S1 | Data loss perception | Pre-merge backup; side-by-side diff; AI merge suggestion; 30-day backup retention | Authoring + Sync | Conflict UI |
| S5-R2 | Yjs doc corruption | S2 | Collab session lost | Periodic snapshots; replay from event log; conflict repair tooling | Authoring | Yjs persistence |
| S5-R3 | Live-collab latency across regions | S2 | UX feels laggy | Regional WS endpoints; presence throttle; awareness compression | Authoring + SRE | — |
| S5-R4 | AI image/TTS content-safety + copyright | S2 | Legal exposure | Content-safety pipeline; provenance on every asset; copyright-risk classifier | Media + AI + Legal | — |
| S5-R5 | LTI 1.3 interop quirks | S2 | Embedding deals stall | LTI conformance tests; partner sandbox | Enterprise + Tenant | LTI tooling |
| S5-R6 | Block taxonomy bloat | S2 | Editor UX complexity | Governance board; usage telemetry; quarterly prune | Authoring + Design | — |
| S5-R7 | Hybrid search ranker quality | S2 | Low relevance | Eval with user-judged pairs; A/B ranker rollout | Data/AI + Search | — |
| S5-R8 | AI translation errors on regulated terminology | S2 | Legal risk | Per-tenant glossaries; reviewer required; legal-language flag | AI + Authoring | Glossary tooling |
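S5-R8 combines per-tenant glossaries with a mandatory reviewer and a legal-language flag. Below is a minimal pre-review check that assumes a hypothetical glossary format mapping each source term to its approved target rendering. It lists the glossary terms whose approved rendering is missing from the translation and raises the legal-language flag. The accept decision stays with the human reviewer. Matching is plain case-insensitive substring search, which is a simplification.

```typescript
// Hypothetical per-tenant glossary entry: a source-language term, the approved
// target-language rendering, and whether it is regulated/legal wording.
interface GlossaryEntry {
  source: string;
  approvedTarget: string;
  legal: boolean;
}

// Findings handed to the human reviewer; this check does not replace review.
interface GlossaryFindings {
  missingTerms: string[];  // source terms whose approved rendering is absent
  legalLanguage: boolean;  // true if any legal glossary term appears in the source
}

// Plain case-insensitive substring matching: a simplification that ignores
// inflection and word boundaries.
function checkGlossary(
  sourceText: string,
  translatedText: string,
  glossary: GlossaryEntry[],
): GlossaryFindings {
  const src = sourceText.toLowerCase();
  const dst = translatedText.toLowerCase();
  const used = glossary.filter((e) => src.indexOf(e.source.toLowerCase()) !== -1);
  return {
    missingTerms: used
      .filter((e) => dst.indexOf(e.approvedTarget.toLowerCase()) === -1)
      .map((e) => e.source),
    legalLanguage: used.some((e) => e.legal),
  };
}
```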
S6 — Scale + Advanced Insight + Mobile (M5)
| ID | Risk | Sev | Impact | Mitigation | Owner | Dependency |
|---|---|---|---|---|---|---|
| S6-R1 | Multi-region data residency migration bugs | S1 | Data loss or cross-region leakage | Rehearsals on production-size fixture; checksum verification; rollback path; saga tests | Platform + SRE + all services | Residency saga |
| S6-R2 | HIPAA provider allowlist enforcement | S1 | BAA non-compliance | Tenant-tagged routing; CI gate on provider list; audit export | AI Services + Compliance | BAA contracts |
| S6-R3 | Mobile native regressions from platform updates | S2 | App-store rejection | Device farm; beta channel; staged rollout | Mobile Platform + QA | Device farm |
| S6-R4 | Marketplace abuse at scale | S2 | Brand damage; fraud loss | AI moderation v2; provider deposits; fraud-signal monitoring | Commerce + Security + AI | — |
| S6-R5 | White-label CSP scoping bugs | S2 | XSS across tenants | Per-tenant CSP + nonce; isolated subdomain + cookie scoping | Platform + Security | — |
| S6-R6 | Pressure to ship breaking changes in the developer SDK | S2 | Integrator churn | Semver strictness; deprecation policy; communication channels | DevEx + PM | SDK governance |
| S6-R7 | At-risk prediction model bias | S1 | Unfair interventions | Quarterly bias eval; feature exclusion list; human-only override; opt-out | Data/AI + Compliance | Eval corpus |
| S6-R8 | ISO 27001 certification scope mismatch | S2 | Audit fail | Control mapping exercise early; internal audit pass | Compliance + SRE | Auditor |
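S6-R2's tenant-tagged routing can be sketched as a provider filter that fails closed. The tag value `"hipaa"`, the provider IDs and `selectProvider` are hypothetical. In practice the allowlist would presumably be derived from the signed BAA contracts and guarded by the CI gate on the provider list that the table mentions.

```typescript
// Hypothetical tenant context carried with each AI request.
interface TenantContext {
  tenantId: string;
  tags: string[]; // e.g. ["hipaa"] for tenants covered by a BAA
}

// Returns the first candidate provider this tenant may use. HIPAA-tagged tenants
// are restricted to providers on the BAA allowlist; if none qualifies the call
// fails closed instead of falling back to an uncovered provider.
function selectProvider(
  tenant: TenantContext,
  candidates: string[],
  baaAllowlist: ReadonlySet<string>,
): string {
  const hipaa = tenant.tags.indexOf("hipaa") !== -1;
  const permitted = hipaa ? candidates.filter((p) => baaAllowlist.has(p)) : candidates;
  if (permitted.length === 0) {
    throw new Error(`no permitted AI provider for tenant ${tenant.tenantId}`);
  }
  return permitted[0];
}
```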
Slice-Independent / Cross-Cutting Risks
| ID | Risk | Sev | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| X-R1 | AI cost runaway | S1 | Surprise bills | Per-tenant budgets + soft-degrade + hard-stop + alerts | AI Services + Finance |
| X-R2 | Over-eager AI defaults reduce trust | S2 | Users distrust product | Default OFF per tenant; per-feature opt-in; transparent provenance | AI + Design |
| X-R3 | Schema drift across services | S2 | Pact breakage | Schema registry; CI gate; weekly producer review | Platform |
| X-R4 | Solo on-call burnout | S2 | Incident response quality drops | Rotation; buddy system; weekly post-incident reviews | SRE |
| X-R5 | Regional compliance surprises | S2 | Launch blockers | Legal-review per geo before launch | Legal + PM |
| X-R6 | Pilot feedback overwrites roadmap | S3 | Scope creep | PM triage; feedback lands in backlog with slice assignment | PM |
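X-R1 names three layers of protection: per-tenant budgets that soft-degrade, then hard-stop, with alerts along the way. Below is a minimal sketch of the per-request decision. It assumes hypothetical thresholds expressed as fractions of a monthly USD limit. What "degraded" means, such as a cheaper model or a shorter context, is left to the caller.

```typescript
// Hypothetical per-tenant budget; the register names the tiers, not the numbers.
interface TenantBudget {
  monthlyLimitUsd: number;
  softDegradeAt: number; // fraction of the limit, e.g. 0.8
  alertAt: number[];     // fractions that trigger an alert when first crossed
}

interface BudgetDecision {
  mode: "normal" | "degraded" | "blocked";
  alertsToSend: number[]; // thresholds this request would newly cross
}

// Decides how to handle one AI request given month-to-date spend and the
// request's estimated cost. "blocked" is the hard stop: the request is not sent,
// spend does not change, and no threshold alerts fire (the caller should raise
// its own hard-stop alert).
function evaluateBudget(spentUsd: number, estimatedCostUsd: number, budget: TenantBudget): BudgetDecision {
  const before = spentUsd / budget.monthlyLimitUsd;
  const after = (spentUsd + estimatedCostUsd) / budget.monthlyLimitUsd;
  if (after > 1) return { mode: "blocked", alertsToSend: [] };
  const alertsToSend = budget.alertAt.filter((t) => before < t && after >= t);
  return { mode: after >= budget.softDegradeAt ? "degraded" : "normal", alertsToSend };
}
```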
Governance
- Weekly risk review: each team owner presents new, changed, or closed risks.
- Quarterly architecture risk review: top 10 cross-cutting risks reviewed by CTO + architecture.
- Every S1 (critical) risk has a named owner, a due date for mitigation, and a verification plan.
- An S1 or S2 risk counts as closed only when its mitigation has shipped, its metric(s) are monitored, and a post-mitigation verification test is documented.
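The two governance rules above, one for S1 risks and one for closing S1/S2 risks, are mechanical enough to lint in whatever tooling holds the register. This minimal sketch assumes a hypothetical `RiskEntry` record with illustrative field names. The register sets no closure criteria for S3/S4 risks, so the sketch does not gate them.

```typescript
// Hypothetical register entry; field names are illustrative.
type Severity = "S1" | "S2" | "S3" | "S4";

interface RiskEntry {
  id: string;
  severity: Severity;
  owner?: string;
  mitigationDueDate?: string;   // ISO date
  verificationPlan?: string;
  mitigationShipped: boolean;
  monitoredMetrics: string[];
  verificationTestDoc?: string; // reference to the documented post-mitigation test
}

// Rule: every S1 risk needs a named owner, a mitigation due date, and a verification plan.
function missingS1Fields(risk: RiskEntry): string[] {
  if (risk.severity !== "S1") return [];
  const missing: string[] = [];
  if (!risk.owner) missing.push("owner");
  if (!risk.mitigationDueDate) missing.push("mitigationDueDate");
  if (!risk.verificationPlan) missing.push("verificationPlan");
  return missing;
}

// Rule: an S1/S2 risk closes only with mitigation shipped, at least one monitored
// metric, and a documented verification test. S3/S4 are not gated by this sketch.
function canClose(risk: RiskEntry): boolean {
  if (risk.severity !== "S1" && risk.severity !== "S2") return true;
  return risk.mitigationShipped && risk.monitoredMetrics.length > 0 && !!risk.verificationTestDoc;
}
```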