Skip to main content

Testing

:::info Source Sourced from services/assessment-service/TESTING_STRATEGY.md in the documentation repo. :::

1. Coverage Targets

Domain 95% line / 98% branch / 80% mutation. Integration 80% line.

2. Unit Tests

  • GradingRule evaluation: pass/fail thresholds, weighted scoring.
  • QuizBank invariants: unique question IDs, at least one correct option per MCQ.
  • BranchingScenario invariants: DAG (no cycles), max depth 50, reachable terminal nodes.
  • AttemptResult integrity hash computation; roundtrip verification.
  • RubricCriterion AI vs human agreement threshold.
  • Randomization seed determinism: same (attemptId, poolId) → same order.

3. Integration Tests

  • Postgres + NATS + mock AI gateway.
  • Flow: create quiz → serve randomized → submit responses → score → emit event.
  • Flow: AI-generate question → review → accept → publish.
  • Flow: AI rubric grade → below confidence → routes to human reviewer.
  • Flow: branching scenario → 5-level deep navigation → reach terminal → score computed.

4. Contract Tests

  • assessment → progress: assessment.attempt_result.scored.v1 Pact.
  • assessment ← authoring: consumes authoring.block.added.v1 for QuizBlockRef.
  • OpenAPI diff in CI.

5. E2E

  • J-08: author creates quiz + AI-assisted → publishes → learner takes → scored → progress event.
  • J-09: branching scenario with 5 paths → each path exercised.
  • J-10: AI rubric grading → human override → learner appeal → resolution.

6. Load Tests

  • 10k concurrent quiz-serve requests → p95 < 200ms.
  • 1k/sec scoring throughput.
  • AI rubric grading: 100 concurrent; queue stable.

7. Chaos

  • Fail AI gateway → local model fallback for question generation; degraded UX for rubric grading.
  • Corrupt answer key blob → detected on decrypt; attempt fails with diagnostic.
  • Kill pod mid-scoring → resume via idempotency.

8. Offline Tests

  • Bundle with encrypted quiz + answer key mounts offline.
  • Learner takes quiz offline; scoring local; result queued.
  • Reconnect → result synced; server validates integrity hash.
  • Tampered bundle → cannot decrypt; scoring fails offline; event queued.

9. AI Safety Tests

  • Red-team corpus: prompt injection in lesson content → generator refuses.
  • PII in response → redacted before grading.
  • Bias eval: grade demographic-parity on sample corpus.
  • Regression: every prompt version bumps require eval pass.

10. Security Tests

  • Cross-tenant quiz access attempt → 403.
  • Answer key in response payload → CI grep catches.
  • Replay submission → idempotent.
  • Attempt integrity hash tampered → rejected.

11. Branching DAG Tests

  • Random-scenario property tests: any generated scenario must be a valid DAG.
  • Cycle detection test fixtures.

12. CI Gates

  • Unit + integration green.
  • AI prompt eval pass.
  • Two-tenant iso green.
  • Mutation ≥ 80% on domain.
  • OpenAPI diff + Pact verified.