Testing

:::info Source
Sourced from services/assessment-service/TESTING_STRATEGY.md in the documentation repo.
:::

1. Coverage Targets

Domain: 95% line coverage, 98% branch coverage, 80% mutation score. Integration: 80% line coverage.

2. Unit Tests

GradingRule evaluation: pass/fail thresholds, weighted scoring.
QuizBank invariants: unique question IDs, at least one correct option per MCQ.
BranchingScenario invariants: DAG (no cycles), max depth 50, reachable terminal nodes.
AttemptResult integrity hash computation; roundtrip verification.
RubricCriterion AI vs human agreement threshold.
Randomization seed determinism: same (attemptId, poolId) → same order.
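
The seed-determinism check above can be sketched as follows. This is a minimal illustration, not the service's implementation: `seeded_order` and the SHA-256 seed derivation are assumptions made for the example. The only requirement taken from the list is that the same (attemptId, poolId) pair yields the same order.

```python
import hashlib
import random


def seeded_order(attempt_id: str, pool_id: str, question_ids: list[str]) -> list[str]:
    """Return question_ids in an order fully determined by (attempt_id, pool_id)."""
    # Hypothetical seed derivation: hash the pair so the seed is stable across
    # processes (Python's built-in hash() is salted per process for strings).
    digest = hashlib.sha256(f"{attempt_id}:{pool_id}".encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on the caller's input order.
    ordered = sorted(question_ids)
    rng.shuffle(ordered)
    return ordered


def test_same_pair_gives_same_order() -> None:
    ids = [f"q{i}" for i in range(20)]
    first = seeded_order("attempt-1", "pool-A", ids)
    second = seeded_order("attempt-1", "pool-A", list(reversed(ids)))
    assert first == second
    # The result is a permutation: no question is lost or duplicated.
    assert sorted(first) == sorted(ids)
```

Deriving the seed from a cryptographic digest rather than `hash()` is what keeps the order reproducible after a restart or on another pod.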

3. Integration Tests

Environment: Postgres + NATS + mock AI gateway.
Flow: create quiz → serve randomized → submit responses → score → emit event.
Flow: AI-generate question → review → accept → publish.
Flow: AI rubric grade → below confidence → routes to human reviewer.
Flow: branching scenario → 5-level deep navigation → reach terminal → score computed.

4. Contract Tests

assessment → progress: assessment.attempt_result.scored.v1 Pact.
assessment ← authoring: consumes authoring.block.added.v1 for QuizBlockRef.
OpenAPI diff in CI.

5. E2E

J-08: author creates quiz + AI-assisted → publishes → learner takes → scored → progress event.
J-09: branching scenario with 5 paths → each path exercised.
J-10: AI rubric grading → human override → learner appeal → resolution.

6. Load Tests

10k concurrent quiz-serve requests → p95 < 200ms.
1k/sec scoring throughput.
AI rubric grading: 100 concurrent; queue stable.
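
A latency target such as p95 < 200ms depends on how the percentile is computed. The sketch below assumes the nearest-rank definition; load-testing tools may interpolate instead, so treat the function names and the definition as illustrative.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the given latency samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_quiz_serve_target(latencies_ms: list[float], budget_ms: float = 200.0) -> bool:
    """True when the p95 latency is strictly below the budget."""
    return p95(latencies_ms) < budget_ms
```

With 20 samples the rank is 19, so a single slow outlier does not fail the gate but two do.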

7. Chaos

Fail AI gateway → local model fallback for question generation; degraded UX for rubric grading.
Corrupt answer key blob → detected on decrypt; attempt fails with diagnostic.
Kill pod mid-scoring → resume via idempotency.
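
Resuming via idempotency can be illustrated with a store keyed by an idempotency key. This is a sketch under assumptions: `ScoreStore` is a hypothetical in-memory stand-in for a durable table, and the pod kill is simulated by an exception raised before the result is recorded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScoreStore:
    """In-memory stand-in for a durable table keyed by idempotency key."""

    results: dict[str, int] = field(default_factory=dict)

    def score_once(self, key: str, compute: Callable[[], int]) -> int:
        # A recorded result wins: retries and replays never recompute.
        if key in self.results:
            return self.results[key]
        value = compute()  # may raise if the worker dies mid-scoring
        self.results[key] = value
        return value


def test_resume_after_crash() -> None:
    store = ScoreStore()
    calls = {"n": 0}

    def flaky_score() -> int:
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("pod killed mid-scoring")
        return 87

    try:
        store.score_once("attempt-1", flaky_score)
    except RuntimeError:
        pass
    assert "attempt-1" not in store.results              # nothing half-written
    assert store.score_once("attempt-1", flaky_score) == 87  # retry completes
    assert store.score_once("attempt-1", flaky_score) == 87  # replay is a no-op
    assert calls["n"] == 2
```

In a real deployment the lookup and the write would need to be atomic in the database; the dictionary here only shows the observable behaviour the chaos test expects.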

8. Offline Tests

Bundle with encrypted quiz + answer key mounts offline.
Learner takes quiz offline; scoring local; result queued.
Reconnect → result synced; server validates integrity hash.
Tampered bundle → cannot decrypt; scoring fails offline; event queued.
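
Server-side validation of the integrity hash on sync might look like the sketch below. The source does not name the hash construction, so HMAC-SHA256 over canonical JSON is an assumption, as are the function names and the shape of the result dictionary.

```python
import hashlib
import hmac
import json


def integrity_hash(result: dict, key: bytes) -> str:
    """Hypothetical integrity hash: HMAC-SHA256 over canonical JSON."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify(result: dict, claimed_hash: str, key: bytes) -> bool:
    """Constant-time comparison of the recomputed hash with the claimed one."""
    return hmac.compare_digest(integrity_hash(result, key), claimed_hash)


def test_roundtrip_and_tamper() -> None:
    key = b"test-only-key"
    result = {"attemptId": "a-1", "score": 80, "answers": ["b", "d"]}
    claimed = integrity_hash(result, key)
    assert verify(result, claimed, key)          # roundtrip succeeds
    tampered = {**result, "score": 100}
    assert not verify(tampered, claimed, key)    # edited score is rejected
```

Canonicalizing before hashing matters for the offline case, where client and server may serialize the same result with different key order.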

9. AI Safety Tests

Red-team corpus: prompt injection in lesson content → generator refuses.
PII in response → redacted before grading.
Bias eval: demographic parity of grades on a sample corpus.
Regression: every prompt version bump requires an eval pass.

10. Security Tests

Cross-tenant quiz access attempt → 403.
Answer key in response payload → CI grep catches.
Replay submission → idempotent.
Attempt integrity hash tampered → rejected.
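
The answer-key leak check is described above as a CI grep. A structural check on parsed response payloads could complement it, as sketched here. The field names in `FORBIDDEN_FIELDS` are hypothetical placeholders, not the service's actual schema.

```python
from typing import Any

# Hypothetical field names that must never appear in a learner-facing response.
FORBIDDEN_FIELDS = frozenset({"answerKey", "correctOptionIds", "isCorrect"})


def find_leaks(payload: Any, path: str = "$") -> list[str]:
    """Return the JSON paths of every forbidden field found in the payload."""
    leaks: list[str] = []
    if isinstance(payload, dict):
        for name, value in payload.items():
            child = f"{path}.{name}"
            if name in FORBIDDEN_FIELDS:
                leaks.append(child)
            leaks.extend(find_leaks(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            leaks.extend(find_leaks(item, f"{path}[{index}]"))
    return leaks


def test_leak_is_reported_with_path() -> None:
    leaky = {"questions": [{"id": "q1", "options": [{"id": "a", "isCorrect": True}]}]}
    clean = {"questions": [{"id": "q1", "options": [{"id": "a"}]}]}
    assert find_leaks(leaky) == ["$.questions[0].options[0].isCorrect"]
    assert find_leaks(clean) == []
```

Reporting the path rather than a boolean makes a failing CI run point directly at the serializer that leaked the field.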

11. Branching DAG Tests

Random-scenario property tests: any generated scenario must be a valid DAG.
Cycle detection test fixtures.
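
The property test and cycle fixtures can be sketched with the standard library alone. Assumptions in this example: a scenario is an adjacency mapping, depth means the longest path from the start node counted in edges, and "reachable terminal nodes" means every node without outgoing edges is reachable from the start. `validate_scenario` and `random_scenario` are hypothetical names; a property-testing library could replace the seeded loop.

```python
import random
from collections import deque

MAX_DEPTH = 50  # limit taken from the BranchingScenario invariants


def validate_scenario(edges: dict[str, list[str]], start: str) -> list[str]:
    """Return the violated invariants; an empty list means the scenario is valid."""
    nodes = {start} | set(edges) | {t for targets in edges.values() for t in targets}
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    # Kahn's algorithm: a full topological order exists only if there is no cycle.
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    topo: list[str] = []
    while queue:
        node = queue.popleft()
        topo.append(node)
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    if len(topo) != len(nodes):
        return ["cycle"]
    # Longest path from start, relaxing edges in topological order.
    depth = {start: 0}
    for node in topo:
        if node in depth:
            for target in edges.get(node, []):
                depth[target] = max(depth.get(target, 0), depth[node] + 1)
    problems: list[str] = []
    if max(depth.values()) > MAX_DEPTH:
        problems.append("max depth exceeded")
    terminals = [node for node in nodes if not edges.get(node)]
    if any(node not in depth for node in terminals):
        problems.append("unreachable terminal")
    return problems


def random_scenario(rng: random.Random) -> dict[str, list[str]]:
    """Generate a scenario whose edges only point to later nodes (acyclic by construction)."""
    size = rng.randint(2, 30)
    names = [f"n{i}" for i in range(size)]
    edges: dict[str, list[str]] = {name: [] for name in names}
    for i in range(size - 1):
        later = names[i + 1:]
        edges[names[i]] = rng.sample(later, k=rng.randint(1, min(3, len(later))))
    return edges


def test_generated_scenarios_are_valid() -> None:
    for seed in range(200):
        assert validate_scenario(random_scenario(random.Random(seed)), "n0") == []


def test_cycle_fixture_is_rejected() -> None:
    assert validate_scenario({"a": ["b"], "b": ["c"], "c": ["a"]}, "a") == ["cycle"]
```

Seeding each case keeps failures reproducible: a failing seed can be replayed directly as a regression fixture.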

12. CI Gates

Unit + integration green.
AI prompt eval pass.
Two-tenant isolation green.
Mutation ≥ 80% on domain.
OpenAPI diff + Pact verified.
