
Registration Service — Observability

Status: populated | Owner: TBD | Last updated: 2026-04-17 | Companion: Service Template · 12 observability

1. SLIs and SLOs

| SLI | Target (SLO) | Measurement |
|---|---|---|
| Patient search p95 latency | < 1 000 ms | Kong latency histogram (registration.patient.search.duration_p95) |
| Patient create p95 latency | < 500 ms | registration.patient.create.duration_p95 |
| MPI scoring p95 duration | < 2 000 ms | registration.mpi.score.duration_p95 |
| Service availability | ≥ 99.9% (30-day rolling) | Uptime probe (registration.health) |
| Event publish success rate | ≥ 99.5% | Outbox relay metric (registration.outbox.publish_success_rate) |
| Encounter status transition p95 duration | < 200 ms | registration.encounter.transition.duration_p95 |
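As a worked example of what the availability target permits, the sketch below converts an SLO fraction and a rolling window into an error budget. It assumes availability is time-based, i.e. the fraction of the window in which the registration.health probe reports the service as up; the function names are hypothetical helpers, not part of the service. For the 99.9% / 30-day target the budget is 30 × 24 × 60 min × 0.001 = 43.2 min, or 43 min 12 s of allowed downtime.

```python
from datetime import timedelta

# Assumption: availability is time-based (share of the window in which the
# registration.health uptime probe reports the service as up).
THIRTY_DAYS = timedelta(days=30)


def error_budget(slo: float, window: timedelta = THIRTY_DAYS) -> timedelta:
    """Allowed downtime in `window` for an availability SLO given as a fraction."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1, e.g. 0.999")
    # timedelta * float is rounded to whole microseconds by the stdlib.
    return window * (1.0 - slo)


def budget_remaining(slo: float, observed_downtime: timedelta,
                     window: timedelta = THIRTY_DAYS) -> timedelta:
    """Budget left in the window; a negative result means the SLO is breached."""
    return error_budget(slo, window) - observed_downtime
```

For example, 50 minutes of probe-reported downtime in the window leaves budget_remaining(0.999, timedelta(minutes=50)) at minus 6 min 48 s, meaning the availability SLO has been missed.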

2. OpenTelemetry Instrumentation

| Signal | Key span / metric names and conventions |
|---|---|
| Traces | registration.registerPatient, registration.searchPatients, registration.mergePatient, registration.mpiScore |
| Metrics | registration_patient_creates_total, registration_patient_searches_total, registration_mpi_duplicates_total, registration_merges_total, registration_outbox_lag_seconds |
| Logs | Structured JSON; patientId, tenantId and actorId included on every log line; PII excluded from logs |
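A minimal sketch of the log convention using Python's standard logging module, assuming an allow-list approach to PII: only the three correlation fields are copied from a record's `extra` attributes, so anything else passed in (for example a hypothetical dateOfBirth) never reaches the output. The formatter name and the ts, level, logger and msg keys are illustrative assumptions. The allow-list does not inspect the message text itself, so messages still have to be written without PII.

```python
import json
import logging

# Correlation fields this page requires on every registration log line.
CORRELATION_FIELDS = ("patientId", "tenantId", "actorId")


class RegistrationJsonFormatter(logging.Formatter):
    """Emit one JSON object per record, copying only allow-listed extra fields.

    Any other attribute passed via `extra=` (e.g. a patient's date of birth)
    is ignored, which is how this sketch keeps PII out of the output.
    """

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        for field in CORRELATION_FIELDS:
            # Emit null instead of omitting the key, so every line has the same shape.
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


# Usage: attach the formatter to a handler on the service logger.
logger = logging.getLogger("registration")
_handler = logging.StreamHandler()
_handler.setFormatter(RegistrationJsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)

# dateOfBirth is passed deliberately to show that it is dropped from the output.
logger.info("patient created", extra={
    "patientId": "p-123", "tenantId": "t-1", "actorId": "u-9",
    "dateOfBirth": "1970-01-01",
})
```

Keys passed through `extra` must not collide with built-in LogRecord attributes such as `name` or `msg`; the logging module raises a KeyError if they do.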

3. Dashboards

| Dashboard | Key panels |
|---|---|
| Registration Overview | Create rate, search rate, MPI duplicate rate, merge rate |
| Performance | p50/p95/p99 latency for create, search and MPI scoring |
| MPI Health | Duplicate detection rate, score distribution histogram |
| Event Health | Outbox lag, publish success/failure rate |
| Error Rate | 4xx/5xx by endpoint; DUPLICATE_DETECTED rate |
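The Performance panel's p50/p95/p99 values, like the Kong latency histogram behind the search SLI, may be derived from bucketed histograms rather than raw samples. The sketch below estimates a quantile from cumulative (upper bound, count) buckets by linear interpolation inside the matching bucket, the same general approach as Prometheus's histogram_quantile(). It assumes Prometheus-style cumulative buckets ending in +Inf and non-negative values, and it does not reproduce every edge case of that function; the bucket bounds used below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: Sequence[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets sorted by upper bound.

    The last bucket must have an upper bound of +Inf. Observations inside a
    bucket are assumed to be spread uniformly, so the result is linearly
    interpolated between the bucket's lower and upper bounds.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in this interval
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # assumes latencies are non-negative
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile lies above the highest finite bound: clamp to it.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound  # unreachable for well-formed cumulative buckets
```

With hypothetical millisecond buckets [(100, 50), (250, 80), (500, 95), (1000, 99), (inf, 100)], p90 has rank 90, falls in the 250 to 500 ms bucket and is estimated as 250 + 250 × (90 − 80) / 15 ≈ 416.7 ms. The estimate is only as precise as the bucket layout, so bucket bounds near the SLO thresholds (200, 500, 1 000 and 2 000 ms) make the percentile panels more trustworthy.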

4. Alerts

| Alert | Condition / window | Severity | Runbook |
|---|---|---|---|
| Patient search p95 > 1 000 ms | Sustained for 5 min | Warning | runbooks/registration-slow-search.md |
| Service error rate > 1% | 5-min window | Critical | runbooks/registration-error-spike.md |
| Outbox lag > 30 s | Immediately (no sustain window) | Warning | runbooks/registration-outbox-lag.md |
| MPI false-positive rate spike | > 2× the 7-day baseline | Warning | runbooks/registration-mpi-calibration.md |
| Pod crash loop | 2 restarts within 10 min | Critical | runbooks/registration-pod-crash.md |
| DB connection pool exhaustion | > 80% of the pool in use | Warning | runbooks/registration-db-pool.md |
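The alert table combines sustained-threshold rules (for example search p95 above 1 000 ms for 5 minutes) with a relative rule (MPI false-positive rate above 2× its 7-day baseline). The sketch below shows one way to evaluate both kinds of condition over sampled metric values. The Sample type and both helper names are hypothetical, and treating gaps between samples as a continuing breach is an assumption of this sketch, not something the page specifies.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    t: float      # sample timestamp, seconds
    value: float  # metric value at that time


def sustained_breach(samples: Sequence[Sample], threshold: float,
                     window_s: float, now: float) -> bool:
    """True once the value has stayed above `threshold` for at least `window_s`.

    Any sample at or below the threshold resets the timer, similar in spirit
    to the `for:` duration on a Prometheus alerting rule. Gaps between samples
    are treated as a continuing breach (an assumption of this sketch).
    """
    breach_start: Optional[float] = None
    for s in sorted(samples, key=lambda x: x.t):
        if s.t > now:
            break  # ignore samples from the future
        if s.value > threshold:
            if breach_start is None:
                breach_start = s.t
        else:
            breach_start = None
    return breach_start is not None and now - breach_start >= window_s


def spike_over_baseline(current: float, baseline: Sequence[float],
                        factor: float = 2.0) -> bool:
    """True if `current` exceeds `factor` times the mean of the baseline samples."""
    if not baseline:
        return False  # no 7-day history yet: stay quiet rather than guess
    return current > factor * mean(baseline)
```

For the slow-search alert this would be called as sustained_breach(p95_samples, 1000.0, 300, now) with p95 values in milliseconds; for the MPI alert, the baseline would be the false-positive rates observed over the previous 7 days.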

5. Health Endpoints

| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe: the service process is alive |
| GET /health/ready | Readiness probe: DB and NATS connections are established |
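A minimal sketch of the two probes as a standard-library WSGI app, assuming the service can expose each dependency check as a zero-argument callable. The make_health_app name, the "db" and "nats" keys and the JSON response shape are hypothetical. Liveness deliberately performs no dependency checks, while readiness returns 503 with per-check results whenever a dependency is down.

```python
import json
from typing import Callable, Dict, Iterable

Check = Callable[[], bool]


def make_health_app(checks: Dict[str, Check]):
    """Build a WSGI app serving GET /health/live and GET /health/ready.

    `checks` maps a dependency name (e.g. "db", "nats") to a callable that
    returns True when that dependency is connected.
    """

    def run_checks() -> Dict[str, bool]:
        results = {}
        for name, check in checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                # A check that raises counts as a failed dependency.
                results[name] = False
        return results

    def app(environ, start_response) -> Iterable[bytes]:
        method = environ.get("REQUEST_METHOD", "GET")
        path = environ.get("PATH_INFO", "")
        if method != "GET":
            status, body = "405 Method Not Allowed", {"error": "method not allowed"}
        elif path == "/health/live":
            # Liveness: the process is up and serving; dependencies are not checked.
            status, body = "200 OK", {"status": "alive"}
        elif path == "/health/ready":
            results = run_checks()
            ok = all(results.values())
            status = "200 OK" if ok else "503 Service Unavailable"
            body = {"status": "ready" if ok else "not ready", "checks": results}
        else:
            status, body = "404 Not Found", {"error": "not found"}
        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    return app
```

For local testing, wsgiref.simple_server.make_server("", 8080, app).serve_forever() serves the app with the standard library alone. Keeping dependency checks out of liveness means a DB or NATS outage takes pods out of load balancing through the readiness probe instead of triggering restarts that would not fix the outage.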