Registration Service — Observability
Status: populated | Owner: TBD | Last updated: 2026-04-17 | Companion: Service Template · 12 observability
1. SLIs and SLOs
| SLI | Target (SLO) | Measurement |
|---|---|---|
| Patient search p95 latency | < 1 000 ms | Kong latency histogram registration.patient.search.duration_p95 |
| Patient create p95 latency | < 500 ms | registration.patient.create.duration_p95 |
| MPI scoring p95 duration | < 2 000 ms | registration.mpi.score.duration_p95 |
| Service availability | ≥ 99.9% (30-day rolling) | Uptime probe registration.health |
| Event publish success rate | ≥ 99.5% | Outbox relay metric registration.outbox.publish_success_rate |
| Encounter status transition p95 | < 200 ms | registration.encounter.transition.duration_p95 |
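As a worked example of what the availability target implies, a 99.9% SLO over a 30-day rolling window leaves roughly 43.2 minutes of allowed downtime. The sketch below shows that arithmetic. The function names `error_budget_minutes` and `budget_remaining` are illustrative only and are not part of the service.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a rolling window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60  # 30 days -> 43 200 minutes
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative once the SLO is breached."""
    budget = error_budget_minutes(slo, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # 99.9% over 30 days: about 43.2 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of downtime, about 75% of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))
```

The same calculation applies to the 99.5% event publish target if failed publishes are counted against total publishes instead of minutes against the window.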
2. OpenTelemetry Instrumentation
| Signal | Span names, metrics, and log fields |
|---|---|
| Traces | registration.registerPatient, registration.searchPatients, registration.mergePatient, registration.mpiScore |
| Metrics | registration_patient_creates_total, registration_patient_searches_total, registration_mpi_duplicates_total, registration_merges_total, registration_outbox_lag_seconds |
| Logs | Structured JSON; every log line includes patientId, tenantId, and actorId; PII is excluded from logs |
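One way to meet the logging row above is an allow-list formatter: only named context fields are serialized, so any other attribute attached to a log record is dropped. This is a minimal sketch using Python's standard `logging` module. The class name `JsonFormatter` and the output keys `level`, `logger`, and `message` are assumptions, not the service's actual log schema.

```python
import json
import logging

# Context fields the table requires on every log line.
CONTEXT_FIELDS = ("patientId", "tenantId", "actorId")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, serializing only allow-listed fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        # Anything outside CONTEXT_FIELDS is ignored, which keeps stray
        # PII attributes out of the serialized line.
        for field in CONTEXT_FIELDS:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


def build_logger(name: str = "registration") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger().info("patient created", extra={"patientId": "p-1", "tenantId": "t-1", "actorId": "a-1"})` would produce a single JSON line. The allow-list does not inspect the message text itself, so callers still need to keep PII out of the message string.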
3. Dashboards
| Dashboard | Key panels |
|---|---|
| Registration Overview | Create rate, search rate, MPI duplicate rate, merge rate |
| Performance | p50/p95/p99 for create, search, MPI |
| MPI Health | Duplicate detection rate, score distribution histogram |
| Event Health | Outbox lag, publish success/failure rate |
| Error Rate | 4xx/5xx by endpoint; DUPLICATE_DETECTED rate |
4. Alerts
| Alert | Threshold / window | Severity | Runbook |
|---|---|---|---|
| Patient search p95 > 1 000 ms | 5-min sustained | Warning | runbooks/registration-slow-search.md |
| Service error rate > 1% | 5-min window | Critical | runbooks/registration-error-spike.md |
| Outbox lag > 30 s | Any time | Warning | runbooks/registration-outbox-lag.md |
| MPI false-positive rate spike | > 2× baseline (7-day) | Warning | runbooks/registration-mpi-calibration.md |
| Pod crash loop | 2 restarts / 10 min | Critical | runbooks/registration-pod-crash.md |
| DB connection pool exhaustion | > 80% used | Warning | runbooks/registration-db-pool.md |
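The alert table uses two rule shapes: a threshold that must hold for a sustained window (for example search p95 above 1 000 ms for 5 minutes) and a spike relative to a baseline (for example more than 2× the 7-day baseline). The sketch below illustrates both under stated assumptions: "sustained" is read as every sample in the window breaching the threshold, and the baseline is taken as the mean of the supplied samples. Neither interpretation is confirmed by this document, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def sustained_breach(samples: Sequence[float], threshold: float) -> bool:
    """True when every sample in the window exceeds the threshold.

    `samples` is assumed to hold one value per scrape interval across the
    alert window; an empty window never fires.
    """
    return len(samples) > 0 and all(value > threshold for value in samples)


def baseline_spike(current: float, baseline_samples: Sequence[float],
                   factor: float = 2.0) -> bool:
    """True when the current value exceeds `factor` times the baseline mean."""
    if not baseline_samples:
        return False  # no baseline yet, so nothing to compare against
    return current > factor * mean(baseline_samples)


if __name__ == "__main__":
    # Five one-minute p95 samples (ms), all above 1 000 ms: the alert fires.
    print(sustained_breach([1100, 1200, 1050, 1010, 1300], 1000))
    # A single sample back under the threshold resets the condition.
    print(sustained_breach([1100, 900, 1050, 1010, 1300], 1000))
    # Current rate 0.09 against a 7-day baseline of 0.04: more than 2x.
    print(baseline_spike(0.09, [0.04] * 7))
```

In practice these rules would normally be expressed in the alerting system's own rule language; the Python form only makes the intended semantics explicit.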
5. Health Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe: service alive |
| GET /health/ready | Readiness probe: DB and NATS connected |
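The split between the two probes can be sketched as follows: liveness answers without touching dependencies, while readiness runs one check per dependency and fails if any check fails. This is a minimal illustration using Python's standard `http.server`. The status codes (200 and 503), the response body shape, and the check names `db` and `nats` are assumptions; the document does not specify them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def _passes(check: Check) -> bool:
    """Run a dependency check; an exception counts as a failure."""
    try:
        return bool(check())
    except Exception:
        return False


def health_response(path: str, readiness_checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Map a request path to an (HTTP status, JSON body) pair."""
    if path == "/health/live":
        # Liveness never consults dependencies.
        return 200, {"status": "alive"}
    if path == "/health/ready":
        results = {name: _passes(check) for name, check in readiness_checks.items()}
        ready = all(results.values())
        return (200 if ready else 503), {
            "status": "ready" if ready else "not_ready",
            "checks": results,
        }
    return 404, {"status": "not_found"}


def make_handler(readiness_checks: Dict[str, Check]):
    """Build a request handler class bound to the given readiness checks."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = health_response(self.path, readiness_checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


def serve(readiness_checks: Dict[str, Check], port: int = 8080) -> None:
    """Serve the health endpoints until interrupted (port is a placeholder)."""
    HTTPServer(("127.0.0.1", port), make_handler(readiness_checks)).serve_forever()
```

Calling `serve({"db": check_db, "nats": check_nats})` with real connectivity checks would expose both endpoints. Keeping liveness independent of the DB and NATS means a dependency outage takes the pod out of rotation without restarting it.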