AI Integration

:::info Source
Sourced from services/assessment-service/AI_INTEGRATION.md in the documentation repo.
:::

1. AI Capabilities

| Capability | Prompt ID | Classification |
| --- | --- | --- |
| Generate quiz question from lesson content | `assessment.quiz.generate` | Limited-risk |
| Generate distractors (wrong answers) | `assessment.quiz.distractors` | Limited-risk |
| Rubric grading (open-ended responses) | `assessment.rubric.grade` | High-risk (EU AI Act); requires human override |
| Branching scenario path suggestion | `assessment.scenario.next` | Limited-risk |
| Adaptive difficulty (M5+) | `assessment.adaptive.select` | Limited-risk |

All calls go through the AIClient port (F09). All AI-generated artifacts carry an AIProvenance record.

2. Rubric Grading (High-Risk)

Per the EU AI Act:

Explicit human review path: the AI grade is shown, a human reviewer confirms or overrides it, and the override rate is tracked per instructor.
Confidence threshold: any AI grade with confidence below 0.85 goes to mandatory human review.
Appeal: learners can dispute a grade; the dispute routes to a human reviewer with an SLA.
Bias monitoring: quarterly evaluation of demographic parity and equalized odds on a consenting sample.
Post-market monitoring: accuracy evaluation against human-graded ground truth.
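
The confidence threshold and override-rate tracking described above can be sketched as follows. This is a minimal illustration, not the service's actual code: `AiGrade`, `ReviewDecision`, `requiresMandatoryReview`, and `overrideRates` are hypothetical names, and only the 0.85 threshold comes from this page.

```typescript
// Hypothetical shapes for illustration; not taken from the service code.
interface AiGrade {
  score: number;      // rubric score proposed by the model
  confidence: number; // model confidence, assumed to lie in [0, 1]
}

// Threshold stated in this section: below 0.85, human review is mandatory.
const REVIEW_THRESHOLD = 0.85;

function requiresMandatoryReview(grade: AiGrade): boolean {
  const c = grade.confidence;
  // Assumption: a NaN or out-of-range confidence is untrusted, so force review.
  if (!(c >= 0 && c <= 1)) return true;
  return c < REVIEW_THRESHOLD;
}

interface ReviewDecision {
  instructorId: string;
  overridden: boolean; // true when the reviewer changed the AI grade
}

// Override rate per instructor = overridden reviews / total reviews.
function overrideRates(decisions: ReviewDecision[]): Map<string, number> {
  const totals = new Map<string, { total: number; overridden: number }>();
  for (const d of decisions) {
    const t = totals.get(d.instructorId) ?? { total: 0, overridden: 0 };
    t.total += 1;
    if (d.overridden) t.overridden += 1;
    totals.set(d.instructorId, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, id) => rates.set(id, t.overridden / t.total));
  return rates;
}
```

Forcing review on a malformed confidence value is a design choice of this sketch: for a high-risk capability it seems safer to fail toward the human reviewer than toward the model.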

3. Prompts & Eval

`assessment.quiz.generate`: eval suite of 500 question generations × 5 subjects, human-rated for accuracy, clarity, and difficulty calibration.
`assessment.quiz.distractors`: eval on plausibility and non-overlapping correctness.
`assessment.rubric.grade`: eval on inter-rater reliability with human graders, Cohen's kappa ≥ 0.85.
`assessment.scenario.next`: eval on path coherence and learner-objective alignment.

Eval sets are stored in ai-gateway and run on every prompt version bump.
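
For the rubric-grading bar, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes the unweighted form over categorical grades. Whether the real eval uses unweighted or weighted kappa is not stated here, so treat that choice, and the names `cohensKappa` and `meetsReliabilityBar`, as assumptions.

```typescript
// Unweighted Cohen's kappa for two raters labelling the same items.
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error("Both raters must label the same non-empty set of items");
  }
  const n = raterA.length;
  const countsA = new Map<string, number>();
  const countsB = new Map<string, number>();
  let agreements = 0;
  for (let i = 0; i < n; i++) {
    if (raterA[i] === raterB[i]) agreements += 1;
    countsA.set(raterA[i], (countsA.get(raterA[i]) ?? 0) + 1);
    countsB.set(raterB[i], (countsB.get(raterB[i]) ?? 0) + 1);
  }
  const observed = agreements / n; // p_o
  let expected = 0;                // p_e: sum over labels of P_A(label) * P_B(label)
  countsA.forEach((countA, label) => {
    expected += (countA / n) * ((countsB.get(label) ?? 0) / n);
  });
  // Degenerate case: both raters used one identical label for every item.
  if (expected === 1) return 1;
  return (observed - expected) / (1 - expected);
}

// Bar from this page: kappa against human graders must be at least 0.85.
function meetsReliabilityBar(kappa: number): boolean {
  return kappa >= 0.85;
}
```

As a worked case, with AI grades `pass, pass, fail, fail` and human grades `pass, pass, fail, pass`, p_o = 0.75 and p_e = 0.5 × 0.75 + 0.5 × 0.25 = 0.5, so kappa = 0.5, which would fail the 0.85 bar.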

4. Provenance

{
  model: "gpt-4o" | "claude-opus-4-6" | "local-llama-3-8b",
  promptId: "assessment.quiz.generate",
  promptVersion: "1.3.0",
  traceId: "...",
  decisionId: "...",           // HITL acceptance record (for generated questions)
  local: false,
  generatedAt: "...",
  reviewedBy: UserId,          // author who accepted the generated question
  reviewedAt: "...",
  cost: { microUSD, tokens: { in, out } }
}
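
The shape above could be expressed as a TypeScript type roughly as follows. This is a sketch under assumptions: `UserId` is modelled as a plain string, timestamps are assumed to be ISO 8601 strings, and the review fields are assumed to be absent until an author accepts the artifact. `markReviewed` is a hypothetical helper, not part of the service.

```typescript
type UserId = string; // assumption: the real UserId type may be stricter

interface AIProvenance {
  model: "gpt-4o" | "claude-opus-4-6" | "local-llama-3-8b";
  promptId: string;
  promptVersion: string;
  traceId: string;
  decisionId?: string;  // HITL acceptance record; assumed unset before review
  local: boolean;
  generatedAt: string;  // assumed ISO 8601
  reviewedBy?: UserId;  // author who accepted the generated question
  reviewedAt?: string;  // assumed ISO 8601
  cost: { microUSD: number; tokens: { in: number; out: number } };
}

// Returns a copy with the review fields filled in; the input is not mutated.
function markReviewed(
  p: AIProvenance,
  decisionId: string,
  reviewer: UserId,
  at: Date
): AIProvenance {
  return { ...p, decisionId, reviewedBy: reviewer, reviewedAt: at.toISOString() };
}

// Example record with placeholder values (IDs and cost are illustrative only).
const draft: AIProvenance = {
  model: "gpt-4o",
  promptId: "assessment.quiz.generate",
  promptVersion: "1.3.0",
  traceId: "trace-placeholder",
  local: false,
  generatedAt: new Date(0).toISOString(),
  cost: { microUSD: 0, tokens: { in: 500, out: 300 } },
};
```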

5. Safety

Pre-call: moderation (sexual, violence, hate, self-harm, illegal); PII redaction.
Post-call: output schema validation (question + 4 options + correct index + explanation).
Refusal: shown to author with reason; fallback to manual authoring.
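
The post-call schema check (question, 4 options, correct index, explanation) might look like the sketch below. The field names `question`, `options`, `correctIndex`, and `explanation` are assumptions made for illustration; the actual output schema is not shown on this page.

```typescript
// Hypothetical field names; only the overall shape comes from this page.
interface QuizQuestion {
  question: string;
  options: [string, string, string, string];
  correctIndex: number; // 0-based index into options
  explanation: string;
}

function isNonEmptyString(x: unknown): x is string {
  return typeof x === "string" && x.trim() !== "";
}

// Validates untrusted model output; throws with a reason on the first failure.
function validateQuizQuestion(raw: unknown): QuizQuestion {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("output is not an object");
  }
  const o = raw as Record<string, unknown>;
  const question = o["question"];
  const options = o["options"];
  const correctIndex = o["correctIndex"];
  const explanation = o["explanation"];

  if (!isNonEmptyString(question)) throw new Error("missing question");
  if (!Array.isArray(options) || options.length !== 4 || !options.every(isNonEmptyString)) {
    throw new Error("expected exactly 4 non-empty options");
  }
  if (
    typeof correctIndex !== "number" ||
    !Number.isInteger(correctIndex) ||
    correctIndex < 0 ||
    correctIndex > 3
  ) {
    throw new Error("correctIndex must be an integer from 0 to 3");
  }
  if (!isNonEmptyString(explanation)) throw new Error("missing explanation");

  return {
    question,
    options: options as [string, string, string, string],
    correctIndex,
    explanation,
  };
}
```

A validation failure here would plausibly be handled like a refusal: show the author the reason and fall back to manual authoring.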

6. Cost Controls

Per-tenant AI budget enforced by ai-gateway.
Quiz generation: ~500 tokens in, ~300 out per question. 1000 questions ≈ 800k tokens.
Caching: same lesson-content hash + prompt version → cached result.
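
The token arithmetic and the cache rule above can be sketched as follows. The per-question figures come from this section; `estimateTokens`, `cacheKey`, and `GenerationCache` are hypothetical names, and including the prompt ID in the cache key is an assumption added so that different prompts never share an entry.

```typescript
// Approximate per-question figures from this section.
const TOKENS_IN_PER_QUESTION = 500;
const TOKENS_OUT_PER_QUESTION = 300;

function estimateTokens(questionCount: number): { input: number; output: number; total: number } {
  const input = questionCount * TOKENS_IN_PER_QUESTION;
  const output = questionCount * TOKENS_OUT_PER_QUESTION;
  return { input, output, total: input + output };
}

// Same lesson-content hash + prompt version => same key => cached result.
function cacheKey(promptId: string, promptVersion: string, contentHash: string): string {
  return [promptId, promptVersion, contentHash].join("|");
}

// Minimal in-memory cache; a real deployment would likely use a shared store.
class GenerationCache<T> {
  private readonly entries = new Map<string, T>();

  getOrGenerate(key: string, generate: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;
    const value = generate(); // only called on a cache miss
    this.entries.set(key, value);
    return value;
  }
}
```

For 1000 questions, `estimateTokens(1000)` gives 500,000 input and 300,000 output tokens, 800,000 in total, matching the estimate above.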

7. Local vs Cloud

Question generation: cloud preferred (quality); local fallback when budget exhausted.
Rubric grading: cloud only (high-risk, quality paramount).
Branching suggestion: local acceptable.
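
These routing rules can be written as a small lookup. This is a sketch only: `allowedTargets` is a hypothetical function, the ordering for branching suggestions is arbitrary because this page only says local is acceptable, and what happens to rubric grading when the budget is exhausted is not specified here, so the sketch does not model it.

```typescript
type Target = "cloud" | "local";

// Returns the targets a call may use, most preferred first.
function allowedTargets(promptId: string, budgetExhausted: boolean): Target[] {
  switch (promptId) {
    case "assessment.quiz.generate":
      // Cloud preferred for quality; local only as the budget fallback.
      return budgetExhausted ? ["local"] : ["cloud"];
    case "assessment.rubric.grade":
      // High-risk: cloud only, never a local fallback.
      return ["cloud"];
    case "assessment.scenario.next":
      // Either target is acceptable; the order here is an arbitrary choice.
      return ["cloud", "local"];
    default:
      // Other prompt IDs have no routing rule documented in this section.
      throw new Error(`No routing rule documented for ${promptId}`);
  }
}
```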

8. Right to Explanation

AI-graded responses include an explanation field: rationale, cited rubric criteria, and AI confidence.
The explanation is displayed to the learner in the gradebook; the learner can trigger human review.
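
One possible shape for the explanation field is sketched below. The property names and the `formatExplanation` helper are assumptions for illustration; this page only states that the field carries a rationale, the cited rubric criteria, and the AI confidence.

```typescript
// Hypothetical property names; only the three components come from this page.
interface GradeExplanation {
  rationale: string;
  citedCriteria: string[]; // rubric criteria the grade relies on
  confidence: number;      // assumed to lie in [0, 1]
}

// Renders a one-line summary suitable for a gradebook entry.
function formatExplanation(e: GradeExplanation): string {
  const percent = Math.round(e.confidence * 100);
  return `${e.rationale} (criteria: ${e.citedCriteria.join(", ")}; AI confidence: ${percent}%)`;
}
```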

9. Data Privacy

Learner response text redacted of PII before sending to AI grader.
Provider configured with noTrain = true; verified via integration test.
HIPAA tenants: restricted to on-premise/EU providers with BAA.
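
As a rough illustration of redacting learner text before it reaches the grader, the sketch below masks email addresses and phone-like digit runs with regular expressions. This is deliberately simplistic and is not the service's redaction logic: `redactPii` and both patterns are assumptions, names and addresses are not detected, and the phone pattern can over-match long numeric answers. A production pipeline would more likely rely on a dedicated PII detector.

```typescript
// Simplistic patterns for illustration only.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// A digit, then at least 7 digits/separators, then a digit (optionally led by "+").
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

// Replaces matches with fixed placeholders so the grader never sees the raw values.
function redactPii(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(PHONE_PATTERN, "[PHONE]");
}
```

For example, `redactPii("Contact jane.doe@example.com or +1 (555) 123-4567 today.")` yields `"Contact [EMAIL] or [PHONE] today."`, while short numbers such as `42` are left untouched.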
