Scheduling Service — Observability
Status: populated | Owner: TBD | Last updated: 2026-04-17 | Companion: Service Template · 12 observability
1. SLIs and SLOs
| SLI | SLO Target | Metric |
|---|---|---|
| Availability search p95 latency | < 1 000 ms | scheduling.slot.availability.duration_p95 |
| Appointment create p95 latency | < 500 ms | scheduling.appointment.create.duration_p95 |
| Reminder dispatch success rate | ≥ 99% | scheduling.reminder.dispatch_success_rate |
| Service availability | ≥ 99.9% (30-day) | Uptime probe |
| Outbox publish success rate | ≥ 99.5% | scheduling.outbox.publish_success_rate |
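
The targets above translate into concrete error budgets. The following is a minimal sketch of that arithmetic; the function names `error_budget_minutes` and `allowed_failures` are illustrative and not part of the service.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Downtime (in minutes) an availability SLO tolerates over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def allowed_failures(slo_percent: float, total_events: int) -> int:
    """Failed events a success-rate SLO tolerates out of total_events."""
    # Round first to absorb floating-point noise, then truncate to whole events.
    return int(round(total_events * (100 - slo_percent) / 100, 6))


def budget_remaining(slo_percent: float, total_events: int, failed_events: int) -> int:
    """Failures still available before the SLO is breached (negative if breached)."""
    return allowed_failures(slo_percent, total_events) - failed_events
```

For example, the 99.9% availability target over 30 days leaves roughly 43.2 minutes of downtime, and a 99% reminder dispatch target tolerates 100 failed dispatches per 10 000 attempts.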
2. OpenTelemetry Instrumentation
| Signal | Key names |
|---|---|
| Traces | scheduling.bookAppointment, scheduling.searchAvailability, scheduling.cancelAppointment, scheduling.dispatchReminder |
| Metrics | scheduling_appointments_created_total, scheduling_cancellations_total, scheduling_noshows_total, scheduling_reminders_sent_total, scheduling_outbox_lag_seconds |
| Logs | Structured JSON; appointmentId, tenantId, actorId; no PHI in log message bodies |
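
The logging row above could be implemented along these lines. This is a sketch using only Python's standard `logging` module; the `JsonFormatter` class and the `ALLOWED_FIELDS` allowlist are assumptions for illustration, not the service's actual logging setup.

```python
import json
import logging

# Context fields named in the table; anything else passed via `extra` is dropped.
ALLOWED_FIELDS = ("appointmentId", "tenantId", "actorId")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Copy only allowlisted context fields so stray attributes are not emitted.
        for field in ALLOWED_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload)


def build_logger(name: str = "scheduling") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger().info("appointment booked", extra={"appointmentId": "apt-123", "tenantId": "t-9", "actorId": "u-42"})` would emit one JSON line. The allowlist only filters context fields; keeping PHI out of the message string itself remains the caller's responsibility.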
3. Dashboards
| Dashboard | Panels |
|---|---|
| Scheduling Overview | Booking rate, cancellation rate, no-show rate, waitlist size |
| Performance | p50/p95/p99 for availability search, booking |
| Reminder Pipeline | Dispatch rate, retry rate, failure rate by channel |
| Event Health | Outbox lag, publish success/failure |
4. Alerts
| Alert | Threshold | Severity | Runbook |
|---|---|---|---|
| Availability search p95 > 1 000 ms | 5-min sustained | Warning | runbooks/scheduling-slow-search.md |
| Service error rate > 1% | 5-min window | Critical | runbooks/scheduling-error-spike.md |
| Reminder dispatch failure rate > 5% | 10-min window | Warning | runbooks/scheduling-reminder-failure.md |
| Outbox lag > 30 s | Any time | Warning | runbooks/scheduling-outbox-lag.md |
| Pod crash loop | 2 restarts / 10 min | Critical | runbooks/scheduling-pod-crash.md |
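
The table distinguishes "sustained" thresholds from windowed rates. The sketch below shows one possible reading of each, assuming "sustained" means every sample in the window breaches the threshold. The function names and the sample representation are hypothetical; in practice these rules would normally live in the alerting backend rather than in application code.

```python
from typing import Iterable, Tuple

# (unix timestamp in seconds, observed value)
Sample = Tuple[float, float]


def sustained_breach(
    samples: Iterable[Sample],
    threshold: float,
    window_seconds: float,
    now: float,
) -> bool:
    """True if every sample in the trailing window exceeds the threshold.

    An empty window never fires, so missing data does not raise an alert here.
    """
    recent = [value for ts, value in samples if now - window_seconds <= ts <= now]
    return bool(recent) and all(value > threshold for value in recent)


def error_rate_breach(errors: int, total: int, max_rate: float = 0.01) -> bool:
    """True if the error rate over a window strictly exceeds max_rate."""
    if total == 0:
        return False
    return errors / total > max_rate
```

Under this reading, the availability search alert fires only when all p95 samples from the last 300 seconds are above 1 000 ms, while a single dip below the threshold resets it.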
5. Health Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe |
| GET /health/ready | Readiness probe: DB and NATS connected |
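
As an illustration of the two probes, here is a minimal sketch using Python's standard `http.server`. The `check_db` and `check_nats` functions are hypothetical placeholders for real connectivity checks, and the response bodies and port 8080 are assumptions; the source specifies only the paths and their purpose.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_db() -> bool:
    """Hypothetical placeholder: replace with a real database ping."""
    return True


def check_nats() -> bool:
    """Hypothetical placeholder: replace with a real NATS connection check."""
    return True


def liveness():
    """The process is up; no dependencies are consulted."""
    return 200, {"status": "alive"}


def readiness(db_ok: bool, nats_ok: bool):
    """Ready only when both the DB and NATS are connected; otherwise 503."""
    ready = db_ok and nats_ok
    body = {
        "status": "ready" if ready else "not_ready",
        "checks": {"db": db_ok, "nats": nats_ok},
    }
    return (200 if ready else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            status, body = liveness()
        elif self.path == "/health/ready":
            status, body = readiness(check_db(), check_nats())
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the health server; the port is an assumption."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keeping liveness free of dependency checks means a DB or NATS outage takes the pod out of rotation through the readiness probe without triggering a restart.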