AI Integration

:::info Source
Sourced from services/assessment-service/AI_INTEGRATION.md in the documentation repo.
:::

1. AI Capabilities

| Capability | Prompt ID | Classification |
| --- | --- | --- |
| Generate quiz question from lesson content | assessment.quiz.generate | Limited-risk |
| Generate distractors (wrong answers) | assessment.quiz.distractors | Limited-risk |
| Rubric grading (open-ended responses) | assessment.rubric.grade | High-risk (EU AI Act); requires human override |
| Branching scenario path suggestion | assessment.scenario.next | Limited-risk |
| Adaptive difficulty (M5+) | assessment.adaptive.select | Limited-risk |

All calls go through the AIClient port (F09). Every generated artifact carries an AIProvenance record.

2. Rubric Grading (High-Risk)

Per EU AI Act:

  • Explicit human review path: AI grade shown; human reviewer confirms/overrides; override rate tracked per instructor.
  • Confidence threshold: any AI grade with confidence below 0.85 → mandatory human review.
  • Appeal: learners can dispute grade; routes to human reviewer with SLA.
  • Bias monitoring: quarterly eval of demographic parity and equalized odds on a sample of consenting learners.
  • Post-market monitoring: accuracy eval vs human-graded ground truth.
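
The review rules above could be sketched as a small routing function. This is only an illustration: the `AiGrade` and `ReviewDecision` shapes, the route names, and both function names are hypothetical; only the 0.85 threshold, the appeal path, and the per-instructor override rate come from the list above.

```typescript
// Hypothetical shapes; not taken from the assessment-service code.
interface AiGrade {
  score: number;      // rubric score proposed by the model
  confidence: number; // model confidence in [0, 1]
}

type ReviewRoute = "mandatory-human-review" | "human-confirmation";

// Threshold stated in the list above.
const CONFIDENCE_THRESHOLD = 0.85;

// Low-confidence grades and learner appeals always go to a human reviewer.
// Every other grade is still shown to a reviewer, who confirms or overrides it.
function routeGrade(grade: AiGrade, learnerAppealed: boolean = false): ReviewRoute {
  if (learnerAppealed || grade.confidence < CONFIDENCE_THRESHOLD) {
    return "mandatory-human-review";
  }
  return "human-confirmation";
}

interface ReviewDecision {
  instructorId: string;
  overridden: boolean; // true when the reviewer changed the AI grade
}

// Share of an instructor's reviews in which the AI grade was overridden.
function overrideRate(decisions: ReviewDecision[], instructorId: string): number {
  const own = decisions.filter((d) => d.instructorId === instructorId);
  if (own.length === 0) {
    return 0;
  }
  return own.filter((d) => d.overridden).length / own.length;
}
```

A grade at exactly 0.85 confidence is treated as passing the threshold here, since the rule is stated as "< 0.85".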

3. Prompts & Eval

  • assessment.quiz.generate: eval suite of 500 question generations × 5 subjects, human-rated for accuracy, clarity, and difficulty calibration.
  • assessment.quiz.distractors: eval on plausibility and non-overlapping correctness.
  • assessment.rubric.grade: eval on inter-rater reliability with human graders, Cohen's kappa ≥ 0.85.
  • assessment.scenario.next: eval on path coherence and learner-objective alignment.

Eval sets stored in ai-gateway; run on every prompt version bump.
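
For reference, the Cohen's kappa target for assessment.rubric.grade can be computed as below. This is a minimal sketch of the standard unweighted statistic, kappa = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. The function name and the use of string labels for grades are assumptions; the actual eval harness in ai-gateway may differ, for example by using a weighted kappa for ordinal rubric levels.

```typescript
// Unweighted Cohen's kappa for two raters labelling the same items.
// Labels are treated as nominal categories (an assumption of this sketch).
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error("Both raters must label the same non-empty set of items");
  }
  const n = raterA.length;
  const countsA: Record<string, number> = {};
  const countsB: Record<string, number> = {};
  let agreements = 0;

  for (let i = 0; i < n; i++) {
    const a = raterA[i];
    const b = raterB[i];
    if (a === b) {
      agreements++;
    }
    countsA[a] = (countsA[a] ?? 0) + 1;
    countsB[b] = (countsB[b] ?? 0) + 1;
  }

  // p_o: fraction of items on which the raters agree.
  const observed = agreements / n;

  // p_e: chance agreement from each rater's marginal label frequencies.
  let expected = 0;
  for (const label of Object.keys(countsA)) {
    expected += (countsA[label] / n) * ((countsB[label] ?? 0) / n);
  }

  if (expected === 1) {
    return 1; // both raters used a single identical label throughout
  }
  return (observed - expected) / (1 - expected);
}
```

With human grades `["A", "A", "B", "B"]` and AI grades `["A", "A", "B", "A"]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5, well short of the 0.85 target.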

4. Provenance

type UserId = string; // assumption: opaque user identifier defined elsewhere

interface AIProvenance {
  model: "gpt-4o" | "claude-opus-4-6" | "local-llama-3-8b";
  promptId: string;      // e.g. "assessment.quiz.generate"
  promptVersion: string; // e.g. "1.3.0"
  traceId: string;
  decisionId: string;    // HITL acceptance record (for generated questions)
  local: boolean;        // e.g. false
  generatedAt: string;
  reviewedBy: UserId;    // author who accepted the generated question
  reviewedAt: string;
  cost: { microUSD: number; tokens: { in: number; out: number } };
}

5. Safety

  • Pre-call: moderation (sexual, violence, hate, self-harm, illegal); PII redaction.
  • Post-call: output schema validation (question + 4 options + correct index + explanation).
  • Refusal: shown to author with reason; fallback to manual authoring.
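
The post-call check above (question + 4 options + correct index + explanation) might look like the following. This is a sketch under assumptions: the field names `question`, `options`, `correctIndex`, and `explanation`, the zero-based index, and the function name are all hypothetical, since the source does not show the actual schema.

```typescript
// Validates a raw model output against the assumed quiz-question shape.
// Returns a list of problems; an empty list means the output is acceptable.
function validateGeneratedQuestion(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["output is not an object"];
  }
  const candidate = raw as Record<string, unknown>;
  const errors: string[] = [];

  const question = candidate["question"];
  if (typeof question !== "string" || question.trim() === "") {
    errors.push("question must be a non-empty string");
  }

  const options = candidate["options"];
  if (
    !Array.isArray(options) ||
    options.length !== 4 ||
    !options.every((o) => typeof o === "string" && o.trim() !== "")
  ) {
    errors.push("options must be exactly 4 non-empty strings");
  }

  // Assumption: correctIndex is zero-based, so valid values are 0 to 3.
  const correctIndex = candidate["correctIndex"];
  if (
    typeof correctIndex !== "number" ||
    !Number.isInteger(correctIndex) ||
    correctIndex < 0 ||
    correctIndex > 3
  ) {
    errors.push("correctIndex must be an integer from 0 to 3");
  }

  const explanation = candidate["explanation"];
  if (typeof explanation !== "string" || explanation.trim() === "") {
    errors.push("explanation must be a non-empty string");
  }

  return errors;
}
```

Returning every problem at once, rather than stopping at the first, would let the author see the full reason a generation was rejected before falling back to manual authoring.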

6. Cost Controls

  • Per-tenant AI budget enforced by ai-gateway.
  • Quiz generation: ~500 tokens in, ~300 out per question, so 1000 questions ≈ 800k tokens in total.
  • Caching: same lesson-content hash + prompt version → cached result.
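
The token arithmetic and the caching rule above can be sketched as follows. The function names, the key format, and the in-memory `Map` are assumptions made for illustration; the real cache presumably lives in ai-gateway, and including the prompt ID in the key is an extra assumption beyond the two inputs the text names.

```typescript
// Approximate per-question token counts, taken from the estimate above.
const TOKENS_IN_PER_QUESTION = 500;
const TOKENS_OUT_PER_QUESTION = 300;

// Total tokens (input + output) for a batch of generated questions.
function estimateTokens(questionCount: number): number {
  return questionCount * (TOKENS_IN_PER_QUESTION + TOKENS_OUT_PER_QUESTION);
}

// Hypothetical key format: prompt ID and version plus the lesson-content hash.
function cacheKey(promptId: string, promptVersion: string, lessonContentHash: string): string {
  return `${promptId}@${promptVersion}:${lessonContentHash}`;
}

// Returns the cached result for a key, calling generate() only on a miss.
function getOrGenerate(
  cache: Map<string, string>,
  key: string,
  generate: () => string
): string {
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }
  const fresh = generate();
  cache.set(key, fresh);
  return fresh;
}
```

Here `estimateTokens(1000)` returns 800000, matching the 800k figure. Because the prompt version is part of the key, a version bump naturally misses the cache and triggers a fresh generation.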

7. Local vs Cloud

  • Question generation: cloud preferred (quality); local fallback when budget exhausted.
  • Rubric grading: cloud only (high-risk, quality paramount).
  • Branching suggestion: local acceptable.
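
One way to read the routing policy above is as a single decision function. This is a hedged sketch: the function name, the boolean budget flag, and the `null` return are hypothetical. Two behaviours are assumptions not stated in the source: rubric grading returns no provider when the cloud budget is exhausted, and branching suggestions default to local.

```typescript
type PromptId =
  | "assessment.quiz.generate"
  | "assessment.rubric.grade"
  | "assessment.scenario.next";

type Provider = "cloud" | "local";

// Returns null when no AI provider may be used and the work should go to a human.
function chooseProvider(promptId: PromptId, cloudBudgetExhausted: boolean): Provider | null {
  switch (promptId) {
    case "assessment.quiz.generate":
      // Cloud preferred for quality; local is the fallback once the budget is spent.
      return cloudBudgetExhausted ? "local" : "cloud";
    case "assessment.rubric.grade":
      // Cloud only. Assumption: with no budget left, AI grading is skipped entirely.
      return cloudBudgetExhausted ? null : "cloud";
    case "assessment.scenario.next":
      // Assumption: "local acceptable" is read here as "default to local".
      return "local";
  }
}
```

Keeping the high-risk rubric path free of any local fallback makes the "cloud only" rule visible in one place rather than scattered across callers.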

8. Right to Explanation

  • AI-graded responses include explanation field: rationale + cited rubric criteria + AI confidence.
  • Displayed to learner in gradebook; learner can trigger human review.

9. Data Privacy

  • Learner response text redacted of PII before sending to AI grader.
  • Provider configured with noTrain = true; verified via integration test.
  • HIPAA tenants: restricted to on-premise/EU providers with BAA.