AI Integration
:::info Source
Sourced from services/assessment-service/AI_INTEGRATION.md in the documentation repo.
:::
1. AI Capabilities
| Capability | Prompt ID | Classification |
|---|---|---|
| Generate quiz question from lesson content | assessment.quiz.generate | Limited-risk |
| Generate distractors (wrong answers) | assessment.quiz.distractors | Limited-risk |
| Rubric grading (open-ended responses) | assessment.rubric.grade | High-risk (EU AI Act) — requires human override |
| Branching scenario path suggestion | assessment.scenario.next | Limited-risk |
| Adaptive difficulty (M5+) | assessment.adaptive.select | Limited-risk |
All calls go through the AIClient port (F09). Every generated artifact carries an AIProvenance record.
2. Rubric Grading (High-Risk)
Per EU AI Act:
- Explicit human review path: AI grade shown; human reviewer confirms/overrides; override rate tracked per instructor.
- Confidence threshold: AI grade < 0.85 confidence → mandatory human review.
- Appeal: learners can dispute grade; routes to human reviewer with SLA.
- Bias monitoring: quarterly eval on demographic-parity + equalized-odds on consenting sample.
- Post-market monitoring: accuracy eval vs human-graded ground truth.
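The confidence threshold and appeal rules above can be sketched as a small routing function. This is a minimal illustration, not the service's actual code: the type names, route labels, and `overrideRate` helper are hypothetical, and only the 0.85 threshold comes from the list above. It assumes that grades at or above the threshold still go to a reviewer for confirmation, as the human review path suggests.

```typescript
// Only the 0.85 threshold is taken from the document; all names are hypothetical.
const CONFIDENCE_THRESHOLD = 0.85;

type ReviewRoute = "mandatory_review" | "standard_review";

interface AiGrade {
  score: number;      // rubric score proposed by the model
  confidence: number; // model confidence, expected in [0, 1]
}

function routeGrade(grade: AiGrade, learnerAppealed: boolean = false): ReviewRoute {
  const c = grade.confidence;
  // A malformed confidence value is treated as low confidence rather than trusted.
  if (!Number.isFinite(c) || c < 0 || c > 1) {
    return "mandatory_review";
  }
  // Appeals and low-confidence grades always go to a human reviewer.
  if (learnerAppealed || c < CONFIDENCE_THRESHOLD) {
    return "mandatory_review";
  }
  // Assumption: the grade is still shown to a reviewer who confirms or overrides it.
  return "standard_review";
}

// Share of AI grades that a given instructor overrode (true = overridden).
function overrideRate(overridden: readonly boolean[]): number {
  if (overridden.length === 0) return 0;
  return overridden.filter(Boolean).length / overridden.length;
}
```

Treating an out-of-range confidence as low confidence is a conservative choice for a high-risk capability; a real implementation might instead reject the response outright.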
3. Prompts & Eval
- assessment.quiz.generate: eval suite of 500 question generations × 5 subjects, human-rated for accuracy, clarity, and difficulty calibration.
- assessment.quiz.distractors: eval on plausibility and non-overlapping correctness.
- assessment.rubric.grade: eval on inter-rater reliability with human graders, Cohen's kappa ≥ 0.85.
- assessment.scenario.next: eval on path coherence and learner-objective alignment.
Eval sets stored in ai-gateway; run on every prompt version bump.
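For the rubric-grading bar, Cohen's kappa compares observed agreement between two raters with the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below computes it for categorical grades. The function names are hypothetical, and how the real eval suite in ai-gateway computes the metric is not stated in this document.

```typescript
// Cohen's kappa for two raters assigning categorical labels to the same items.
function cohensKappa(rater1: readonly string[], rater2: readonly string[]): number {
  if (rater1.length !== rater2.length || rater1.length === 0) {
    throw new Error("Rating arrays must be non-empty and of equal length");
  }
  const n = rater1.length;
  const counts1 = new Map<string, number>();
  const counts2 = new Map<string, number>();
  let agreements = 0;

  rater1.forEach((label1, i) => {
    const label2 = rater2[i] as string;
    if (label1 === label2) agreements++;
    counts1.set(label1, (counts1.get(label1) ?? 0) + 1);
    counts2.set(label2, (counts2.get(label2) ?? 0) + 1);
  });

  const observed = agreements / n;
  // Chance agreement: sum over labels of the product of each rater's label frequency.
  let expected = 0;
  counts1.forEach((count1, label) => {
    expected += (count1 / n) * ((counts2.get(label) ?? 0) / n);
  });

  // Both raters used one identical label for everything: agreement is total.
  if (expected === 1) return 1;
  return (observed - expected) / (1 - expected);
}

// Hypothetical gate using the 0.85 bar stated for assessment.rubric.grade.
function meetsReliabilityBar(aiGrades: readonly string[], humanGrades: readonly string[]): boolean {
  return cohensKappa(aiGrades, humanGrades) >= 0.85;
}
```

This unweighted form treats every disagreement equally. For ordinal rubric scores a weighted kappa may be a better fit, but the document does not say which variant is used.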
4. Provenance
{
model: "gpt-4o" | "claude-opus-4-6" | "local-llama-3-8b",
promptId: "assessment.quiz.generate",
promptVersion: "1.3.0",
traceId: "...",
decisionId: "...", // HITL acceptance record (for generated questions)
local: false,
generatedAt: "...",
reviewedBy: UserId, // author who accepted the generated question
reviewedAt: "...",
cost: { microUSD, tokens: { in, out } }
}
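The shape above can be written as a TypeScript interface. This is a sketch under stated assumptions: `UserId` is aliased to `string` for illustration, timestamps are assumed to be ISO 8601 strings, and the review fields are marked optional on the assumption that they are empty until a human accepts the artifact. The two helper functions are hypothetical.

```typescript
type UserId = string; // assumption: the real UserId type is defined elsewhere

type ModelId = "gpt-4o" | "claude-opus-4-6" | "local-llama-3-8b";

interface AIProvenance {
  model: ModelId;
  promptId: string;       // e.g. "assessment.quiz.generate"
  promptVersion: string;  // e.g. "1.3.0"
  traceId: string;
  decisionId?: string;    // HITL acceptance record (for generated questions)
  local: boolean;
  generatedAt: string;    // assumed ISO 8601 timestamp
  reviewedBy?: UserId;    // author who accepted the generated question
  reviewedAt?: string;    // assumed ISO 8601 timestamp
  cost: { microUSD: number; tokens: { in: number; out: number } };
}

// True once a human reviewer and review time are both recorded.
function isHumanReviewed(p: AIProvenance): boolean {
  return p.reviewedBy !== undefined && p.reviewedAt !== undefined;
}

// Total tokens consumed by the call that produced the artifact.
function totalTokens(p: AIProvenance): number {
  return p.cost.tokens.in + p.cost.tokens.out;
}
```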
5. Safety
- Pre-call: moderation (sexual, violence, hate, self-harm, illegal); PII redaction.
- Post-call: output schema validation (question + 4 options + correct index + explanation).
- Refusal: shown to author with reason; fallback to manual authoring.
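The post-call check could look roughly like the validator below. It is a hand-rolled sketch with no schema library; the field names (`question`, `options`, `correctIndex`, `explanation`) are assumptions, since the document lists only the required parts and not the actual schema.

```typescript
// Field names are assumptions; the source only lists the required parts.
interface GeneratedQuestion {
  question: string;
  options: string[];    // exactly 4 entries
  correctIndex: number; // 0..3
  explanation: string;
}

type ValidationResult =
  | { ok: true; value: GeneratedQuestion }
  | { ok: false; errors: string[] };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.trim().length > 0;
}

function validateQuestion(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["output is not an object"] };
  }
  const r = raw as Record<string, unknown>;
  const errors: string[] = [];
  const question = r["question"];
  const options = r["options"];
  const correctIndex = r["correctIndex"];
  const explanation = r["explanation"];

  if (!isNonEmptyString(question)) {
    errors.push("question must be a non-empty string");
  }
  if (!Array.isArray(options) || options.length !== 4 || !options.every(isNonEmptyString)) {
    errors.push("options must contain exactly 4 non-empty strings");
  }
  if (
    typeof correctIndex !== "number" ||
    !Number.isInteger(correctIndex) ||
    correctIndex < 0 ||
    correctIndex > 3
  ) {
    errors.push("correctIndex must be an integer from 0 to 3");
  }
  if (!isNonEmptyString(explanation)) {
    errors.push("explanation must be a non-empty string");
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: raw as GeneratedQuestion };
}
```

Collecting every error instead of stopping at the first gives the author a complete reason if the output is rejected and they fall back to manual authoring.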
6. Cost Controls
- Per-tenant AI budget enforced by ai-gateway.
- Quiz generation: ~500 tokens in, ~300 out per question. 1000 questions ≈ 800k tokens.
- Caching: same lesson-content hash + prompt version → cached result.
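The token arithmetic and cache rule above can be sketched as follows: 1000 questions × (500 + 300) tokens = 800,000 tokens. The key format and the `GenerationCache` class are hypothetical, and the cache is synchronous and in-memory purely for illustration. Including the prompt ID in the key is also an assumption, since the document names only the content hash and prompt version.

```typescript
// Per-question estimates taken from the figures above.
const TOKENS_IN_PER_QUESTION = 500;
const TOKENS_OUT_PER_QUESTION = 300;

function estimateTokens(questionCount: number): { in: number; out: number; total: number } {
  if (!Number.isInteger(questionCount) || questionCount < 0) {
    throw new Error("questionCount must be a non-negative integer");
  }
  const tokensIn = questionCount * TOKENS_IN_PER_QUESTION;
  const tokensOut = questionCount * TOKENS_OUT_PER_QUESTION;
  return { in: tokensIn, out: tokensOut, total: tokensIn + tokensOut };
}

// Hypothetical key format: the prompt ID is added so different prompts never collide.
function cacheKey(promptId: string, promptVersion: string, lessonContentHash: string): string {
  return [promptId, promptVersion, lessonContentHash].join(":");
}

// Minimal in-memory cache; a real deployment would likely use a shared store.
class GenerationCache<T> {
  private readonly store = new Map<string, T>();

  getOrGenerate(key: string, generate: () => T): T {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit;
    const value = generate();
    this.store.set(key, value);
    return value;
  }
}
```

Because the prompt version is part of the key, bumping a prompt version naturally invalidates results produced by the previous version.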
7. Local vs Cloud
- Question generation: cloud preferred (quality); local fallback when budget exhausted.
- Rubric grading: cloud only (high-risk, quality paramount).
- Branching suggestion: local acceptable.
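One way to read the routing rules above is the decision function below. Several points are assumptions, not stated policy: rubric grading is refused when the cloud budget is exhausted (the document says only "cloud only"), and branching suggestions default to local because local is "acceptable". The type and function names are hypothetical.

```typescript
type RoutedPrompt =
  | "assessment.quiz.generate"
  | "assessment.rubric.grade"
  | "assessment.scenario.next";

type RoutingDecision =
  | { target: "cloud" }
  | { target: "local" }
  | { target: "none"; reason: string };

function chooseTarget(promptId: RoutedPrompt, cloudBudgetExhausted: boolean): RoutingDecision {
  switch (promptId) {
    case "assessment.quiz.generate":
      // Cloud preferred for quality; local fallback when the budget is exhausted.
      return cloudBudgetExhausted ? { target: "local" } : { target: "cloud" };
    case "assessment.rubric.grade":
      // Cloud only. Assumption: with no budget the call is refused, never downgraded.
      return cloudBudgetExhausted
        ? { target: "none", reason: "rubric grading is cloud-only and the budget is exhausted" }
        : { target: "cloud" };
    case "assessment.scenario.next":
      // Assumption: local is used by default because it is acceptable for this prompt.
      return { target: "local" };
  }
}
```

Refusing rather than downgrading keeps the high-risk grading path on the higher-quality models; responses would then presumably wait for human grading.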
8. Right to Explanation
- AI-graded responses include an explanation field: rationale + cited rubric criteria + AI confidence.
- Displayed to learner in gradebook; learner can trigger human review.
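As an illustration, the explanation payload might be modelled and rendered for the gradebook as below. The field names and display format are assumptions; the document specifies only that the field holds a rationale, cited rubric criteria, and the AI confidence.

```typescript
// Hypothetical shape for the explanation field attached to an AI-graded response.
interface GradeExplanation {
  rationale: string;
  citedCriteria: string[]; // identifiers or names of the rubric criteria cited
  confidence: number;      // AI confidence in [0, 1]
}

// Formats the explanation as plain text for display to the learner.
function renderExplanation(e: GradeExplanation): string {
  const percent = Math.round(e.confidence * 100);
  const criteria = e.citedCriteria.length > 0 ? e.citedCriteria.join(", ") : "none cited";
  return `${e.rationale}\nRubric criteria: ${criteria}\nAI confidence: ${percent}%`;
}
```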
9. Data Privacy
- Learner response text redacted of PII before sending to AI grader.
- Provider configured with noTrain = true; verified via integration test.
- HIPAA tenants: restricted to on-premise/EU providers with BAA.
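A deliberately simplified sketch of the redaction step is shown below. It covers only email addresses and phone-like digit runs with regular expressions. The patterns, placeholder tokens, and function name are assumptions, and the service's actual PII detection is not described in this document and is likely to be broader (names, addresses, student IDs).

```typescript
// Simplified patterns for illustration only; they will miss many PII forms.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

// Replaces matches with fixed placeholders before the text is sent to the AI grader.
function redactPii(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(PHONE_PATTERN, "[PHONE]");
}
```

For example, `redactPii("Contact me at jane.doe@example.com or +1 (555) 123-4567.")` returns `"Contact me at [EMAIL] or [PHONE]."`. Fixed placeholders keep the sentence structure intact, so the grader can still assess the response.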