
Failure Modes

Source: services/catalog-service/FAILURE_MODES.md in the documentation repo.

1. Scenarios

1.1 Duplicate CourseVersion Registration

  • Mitigation: unique constraint on (courseId, versionLabel); inserts are idempotent, so a repeated registration is a no-op.
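A minimal sketch of the idempotent insert, assuming a hypothetical `course_versions` table with snake_case columns. It uses Python's built-in `sqlite3` as a stand-in for Postgres; the `INSERT ... ON CONFLICT (...) DO NOTHING` form shown is accepted by both Postgres and SQLite 3.24+.

```python
import sqlite3

# Hypothetical schema: the UNIQUE constraint on (course_id, version_label)
# is what rules out duplicate registrations at the storage layer.
SCHEMA = """
CREATE TABLE IF NOT EXISTS course_versions (
    id            INTEGER PRIMARY KEY,
    course_id     TEXT NOT NULL,
    version_label TEXT NOT NULL,
    UNIQUE (course_id, version_label)
)
"""


def register_course_version(conn: sqlite3.Connection, course_id: str, version_label: str) -> bool:
    """Insert a CourseVersion row.

    Returns True if a new row was created, False if the
    (course_id, version_label) pair already existed.
    """
    cur = conn.execute(
        "INSERT INTO course_versions (course_id, version_label) VALUES (?, ?) "
        "ON CONFLICT (course_id, version_label) DO NOTHING",
        (course_id, version_label),
    )
    conn.commit()
    # rowcount is 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1
```

Because the database enforces uniqueness, two concurrent registrations of the same pair cannot both succeed, even if neither sees the other in an application-level pre-check.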

1.2 Taxonomy Tree Corruption

  • Mitigation: enforce tree invariants on every write; a nightly integrity job re-checks the whole tree.
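A sketch of both layers, assuming (hypothetically) that the taxonomy is stored as a map from node id to parent id, with `None` marking the root. The write-time check rejects a re-parent that would create a cycle. The full scan, which a nightly job could run, reports a wrong root count, dangling parents, and cycles. The invariants chosen here are illustrative, not the service's actual rule set.

```python
from typing import Dict, List, Optional

# Hypothetical representation: node id -> parent id (None for the root).
Tree = Dict[str, Optional[str]]


def would_create_cycle(tree: Tree, node_id: str, new_parent: Optional[str]) -> bool:
    """Write-time check: would re-parenting node_id under new_parent
    make node_id one of its own ancestors?"""
    current = new_parent
    seen = set()
    while current is not None:
        if current == node_id:
            return True
        if current in seen:  # an existing cycle upstream; treat as unsafe
            return True
        seen.add(current)
        current = tree.get(current)
    return False


def integrity_violations(tree: Tree) -> List[str]:
    """Full scan (e.g. a nightly job): exactly one root, every parent
    exists, and no node loops back on itself when walking up."""
    problems: List[str] = []
    roots = [n for n, p in tree.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for node, parent in tree.items():
        if parent is not None and parent not in tree:
            problems.append(f"{node}: missing parent {parent}")
    for node in tree:
        seen = set()
        current: Optional[str] = node
        while current is not None and current in tree:
            if current in seen:
                problems.append(f"{node}: cycle detected")
                break
            seen.add(current)
            current = tree[current]
    return problems
```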

1.3 Missing PlayPackage Reference

  • Cause: race between the content and catalog services.
  • Mitigation: validate that playPackageRef resolves before accepting it; if it does not yet resolve, retry with backoff.
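A sketch of "validate, then retry with backoff". The `resolve` callable is a hypothetical lookup that returns `None` while the PlayPackage is not yet visible. The attempt count, delays, and the choice of exponential backoff with full jitter are illustrative assumptions. `sleep` is injectable so the loop can be exercised without real waits.

```python
import random
import time
from typing import Callable, Optional


class UnresolvedPlayPackage(Exception):
    """Raised when playPackageRef still cannot be resolved after all attempts."""


def resolve_with_backoff(
    resolve: Callable[[str], Optional[dict]],  # hypothetical lookup function
    play_package_ref: str,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try to resolve play_package_ref; on a miss, wait and try again."""
    for attempt in range(max_attempts):
        package = resolve(play_package_ref)
        if package is not None:
            return package
        if attempt < max_attempts - 1:
            # Full jitter: random delay in [0, min(cap, base * 2^attempt)].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise UnresolvedPlayPackage(play_package_ref)
```

Jitter spreads retries from many catalog writers that hit the same race at once, so they do not all re-query content in lockstep.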

1.4 Slug Collision

  • Mitigation: UNIQUE constraint on the slug; the user sees an error with suggested alternative slugs.
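One way to produce the suggested alternatives, assuming a hypothetical numeric-suffix scheme (`base-2`, `base-3`, ...). The suggestions are only hints: another writer can still claim one first, so the UNIQUE constraint remains the final arbiter.

```python
from typing import Iterable, List


def suggest_slugs(base: str, taken: Iterable[str], count: int = 3) -> List[str]:
    """Return `count` free alternatives of the form base-2, base-3, ...
    skipping any that already appear in `taken`."""
    taken_set = set(taken)
    suggestions: List[str] = []
    n = 2
    while len(suggestions) < count:
        candidate = f"{base}-{n}"
        if candidate not in taken_set:
            suggestions.append(candidate)
        n += 1
    return suggestions
```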

1.5 Withdrawal Cascade Delay

  • Cause: the search and marketplace consumers lag in processing the withdraw event.
  • Mitigation: propagation is event-driven and typically completes in under 60 s; in the worst case, the withdraw event is explicitly re-published.
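A sketch of how a worst-case re-publish could be triggered. It assumes, hypothetically, that each consumer records when it has applied a withdrawal (`acked_by`) and that a periodic job compares elapsed time against the typical 60 s window. The source does not say how lag is detected, so the ack mechanism is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Consumers named in the source; the ack tracking below is hypothetical.
CONSUMERS = frozenset({"search", "marketplace"})


@dataclass
class Withdrawal:
    course_id: str
    published_at: float                              # epoch seconds
    acked_by: Set[str] = field(default_factory=set)  # consumers that applied it


def courses_to_republish(withdrawals: List[Withdrawal], now: float, deadline_s: float = 60.0) -> List[str]:
    """Course ids whose withdraw event has not been applied by every
    consumer within deadline_s seconds of publication."""
    return [
        w.course_id
        for w in withdrawals
        if now - w.published_at > deadline_s and not CONSUMERS <= w.acked_by
    ]
```

Re-publishing is only safe because consumers treat the withdraw event idempotently; a second copy must leave the same final state.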

2. Retry / Backoff

| Operation | Max retries | Backoff |
|---|---|---|
| Postgres write | 3 | 10–200 ms |
| Outbox | infinite | exponential, capped at 5 min |
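The two policies can be modelled as data, as in the sketch below. It assumes that "Max" counts retries, reads `10ms–200ms` as the first delay and the ceiling of an exponential (doubling) schedule, and uses a placeholder 1 s base for the outbox, whose starting delay the table does not give. All three are interpretations, not the service's configuration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: Optional[int]  # None means retry forever
    base_s: float               # delay before the first retry
    cap_s: float                # upper bound on any single delay

    def delays(self) -> Iterator[float]:
        """Yield the wait before each retry: base * 2^n, clamped to cap."""
        n = 0
        while self.max_retries is None or n < self.max_retries:
            yield min(self.cap_s, self.base_s * 2 ** n)
            n += 1


# Values taken from the table; OUTBOX.base_s = 1.0 is a placeholder.
POSTGRES_WRITE = RetryPolicy(max_retries=3, base_s=0.010, cap_s=0.200)
OUTBOX = RetryPolicy(max_retries=None, base_s=1.0, cap_s=300.0)
```

Under this reading, the Postgres schedule is 10, 20, 40 ms, so the 200 ms ceiling only comes into play if the retry count or growth factor is raised. The outbox generator never ends, which matches "infinite": the relay keeps retrying at most every 5 minutes until delivery succeeds.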

3. Fallbacks

  • If the CDN cache is stale, read directly from Postgres.
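A sketch of the fallback, assuming staleness is judged by the age of the cached copy against a hypothetical `max_age_s`. `cdn_get` and `db_get` are hypothetical stand-ins for the CDN lookup and the Postgres query. A missing CDN entry is treated the same as a stale one.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedEntry:
    body: dict
    fetched_at: float  # epoch seconds when the CDN copy was produced


def read_course(
    slug: str,
    cdn_get: Callable[[str], Optional[CachedEntry]],  # hypothetical CDN lookup
    db_get: Callable[[str], dict],                    # hypothetical Postgres read
    max_age_s: float = 300.0,
    now: Optional[float] = None,
) -> dict:
    """Serve the CDN copy when it is fresh; otherwise read Postgres directly."""
    now = time.time() if now is None else now
    entry = cdn_get(slug)
    if entry is not None and now - entry.fetched_at <= max_age_s:
        return entry.body
    return db_get(slug)
```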

4. Chaos Experiments

  • Duplicate publish event → exactly one CourseVersion is registered.
  • Out-of-order events → the same final state regardless of arrival order (idempotent handling).
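The two expected outcomes can be checked against a reducer like the one below. It assumes, hypothetically, that every event carries a unique `event_id` (shared by redeliveries) and a per-course `sequence` number. Duplicates are dropped by id, and the highest sequence wins regardless of arrival order. This last-writer-wins-by-sequence rule is one common technique, not necessarily the catalog's.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class CatalogEvent:
    event_id: str   # hypothetical: unique per event, repeated on redelivery
    course_id: str
    sequence: int   # hypothetical: increases per course with each change
    status: str     # e.g. "published" or "withdrawn"


def apply_events(events: Iterable[CatalogEvent]) -> Dict[str, str]:
    """Fold events into course_id -> status, ignoring duplicate deliveries
    and keeping the highest-sequence status for each course."""
    seen: Set[str] = set()
    latest: Dict[str, Tuple[int, str]] = {}
    for e in events:
        if e.event_id in seen:
            continue  # duplicate delivery: no effect
        seen.add(e.event_id)
        current = latest.get(e.course_id)
        if current is None or e.sequence > current[0]:
            latest[e.course_id] = (e.sequence, e.status)
    return {course: status for course, (_, status) in latest.items()}
```

A chaos run can feed the same event list in shuffled order with duplicates injected and assert that the resulting map never changes.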