Observability
:::info Source
Sourced from services/assessment-service/OBSERVABILITY.md in the documentation repo.
:::
1. Logs
Events (names after the first omit the shared assessment prefix): assessment.quiz.served, .response.submitted, .attempt.scored, .ai.question.generated, .ai.rubric.graded, .grade.overridden, .appeal.submitted, .scenario.navigated.
Attributes: quiz_bank_id, attempt_id, question_id, scaled_score, passed, scoring_duration_ms. Redact response content before logging.
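
As an illustration of the attribute list and the redaction rule, here is a minimal sketch of building a structured log line. The function name build_log_record, the REDACTED_KEYS set, the "responses" field name, and the sample values are all assumptions for illustration, not the service's actual logging code. The event name assumes the elided prefix expands to assessment.attempt.scored.

```python
import json

# Hypothetical field names that may carry response content.
REDACTED_KEYS = {"response", "responses"}

# Attributes listed in the Logs section.
ALLOWED_ATTRS = {
    "quiz_bank_id", "attempt_id", "question_id",
    "scaled_score", "passed", "scoring_duration_ms",
}

def build_log_record(event: str, attrs: dict) -> str:
    """Return a JSON log line that keeps the documented attributes,
    masks response content, and drops any other key."""
    record = {"event": event}
    for key, value in attrs.items():
        if key in REDACTED_KEYS:
            record[key] = "[REDACTED]"
        elif key in ALLOWED_ATTRS:
            record[key] = value
    return json.dumps(record, sort_keys=True)

# Made-up sample values.
line = build_log_record(
    "assessment.attempt.scored",
    {
        "attempt_id": "a-123",
        "scaled_score": 0.82,
        "passed": True,
        "scoring_duration_ms": 140,
        "responses": ["B", "free text answer"],
    },
)
print(line)
```

Using an allow-list rather than a deny-list alone means an attribute added later stays out of the logs until someone adds it deliberately.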
2. Metrics
RED
- assessment_api_requests_total{endpoint,status}: counter
- assessment_api_duration_seconds{endpoint}: histogram
Domain
- assessment_attempts_scored_total{outcome}: counter
- assessment_scoring_duration_seconds: histogram (target p95 < 500ms)
- assessment_ai_grade_confidence: histogram
- assessment_ai_grade_override_rate{instructor_id_hash}: gauge
- assessment_appeal_rate: gauge
- assessment_questions_generated_total{accepted:bool}: counter
- assessment_answer_key_decrypt_latency_seconds: histogram
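
The p95 target on the scoring-duration histogram can be checked from bucket counts. The sketch below estimates a quantile from cumulative buckets by linear interpolation inside the bucket that contains the target rank, an approach similar in spirit to what Prometheus-style histograms use. The function name estimate_quantile, the bucket boundaries, and the counts are hypothetical.

```python
def estimate_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate quantile q from cumulative histogram buckets.

    buckets: (upper_bound_seconds, cumulative_count) pairs sorted by bound.
    Interpolates linearly inside the bucket containing the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Made-up cumulative counts for assessment_scoring_duration_seconds.
scoring_buckets = [(0.1, 600), (0.25, 900), (0.5, 980), (1.0, 1000)]

p95 = estimate_quantile(0.95, scoring_buckets)
meets_target = p95 < 0.5  # 500ms target from the metric list
print(f"p95 = {p95:.3f}s, meets target: {meets_target}")
```

The estimate is only as precise as the bucket layout, so a bucket boundary at or near 0.5s makes the comparison against the target more meaningful.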
3. Traces
Spans: assessment.score_attempt, assessment.grade_rubric_with_ai, assessment.generate_question, assessment.branch.navigate.
4. Dashboards
- Scoring throughput + latency.
- AI grading: confidence distribution, override rate, appeal rate.
- Quiz authoring: AI accept-rate per author.
- Branching scenario: completion rate by scenario, path distribution.
5. Alerts
| Alert | Threshold | Severity |
|---|---|---|
| scoring-latency-high | p95 > 2s | P2 |
| ai-override-rate-spike | > 30% override for a prompt version | P2 |
| appeal-rate-spike | > 5% appeals for a quiz | P3 |
| answer-key-decrypt-failure | > 1/min | P1 |
| dlq-non-empty | any | P2 |
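
To show how the thresholds in the table compose, here is a minimal sketch that encodes each row as a rule and evaluates it against a metrics snapshot. The AlertRule type, the snapshot keys, and the sample values are hypothetical. A real deployment would more likely express these rules in the alerting system's own rule format.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    fires: Callable[[dict], bool]

# Thresholds copied from the alerts table; snapshot keys are hypothetical.
RULES = [
    AlertRule("scoring-latency-high", "P2",
              lambda m: m["scoring_p95_seconds"] > 2.0),
    AlertRule("ai-override-rate-spike", "P2",
              lambda m: m["override_rate_for_prompt_version"] > 0.30),
    AlertRule("appeal-rate-spike", "P3",
              lambda m: m["appeal_rate_for_quiz"] > 0.05),
    AlertRule("answer-key-decrypt-failure", "P1",
              lambda m: m["decrypt_failures_per_min"] > 1),
    AlertRule("dlq-non-empty", "P2",
              lambda m: m["dlq_depth"] > 0),
]

def firing_alerts(snapshot: dict) -> list[tuple[str, str]]:
    """Return (alert name, severity) for every rule whose threshold is exceeded."""
    return [(rule.name, rule.severity) for rule in RULES if rule.fires(snapshot)]

# Made-up snapshot values.
snapshot = {
    "scoring_p95_seconds": 0.4,
    "override_rate_for_prompt_version": 0.35,
    "appeal_rate_for_quiz": 0.02,
    "decrypt_failures_per_min": 0,
    "dlq_depth": 3,
}
print(firing_alerts(snapshot))
```

Note that the scoring-latency alert fires at p95 > 2s, well above the 500ms target, so the target can be missed for some time before anyone is paged.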
6. SLOs
| SLI | Target |
|---|---|
| Quiz serve p95 | < 200ms |
| Scoring p95 | < 500ms |
| AI grade p95 | < 3s |
| Scoring success rate | 99.99% |
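
A 99.99% scoring success rate leaves an error budget of 0.01% of attempts. The sketch below works through that arithmetic with exact fractions. The function names and the attempt volumes are hypothetical.

```python
from fractions import Fraction

# 99.99% scoring success rate, from the SLO table.
SCORING_SUCCESS_TARGET = Fraction(9999, 10000)

def allowed_failures(total_attempts: int,
                     target: Fraction = SCORING_SUCCESS_TARGET) -> int:
    """Largest number of failed attempts that still meets the target."""
    # int() truncates, which floors the non-negative budget.
    return int(total_attempts * (1 - target))

def budget_remaining(total_attempts: int, failed_attempts: int) -> int:
    """Failures still available before the target is breached (negative if breached)."""
    return allowed_failures(total_attempts) - failed_attempts

# Made-up volumes.
print(allowed_failures(1_000_000))      # 100
print(allowed_failures(25_000))         # 2
print(budget_remaining(1_000_000, 40))  # 60
```

At low volume the budget rounds down quickly: with fewer than 10,000 scored attempts in a window, a single failure already breaches the target.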
7. RUM
- Quiz page LCP < 1.5s online; < 600ms offline.
- Response submission INP < 200ms.