Provider Directory Service — Observability

Status: populated | Owner: TBD | Last updated: 2026-04-17

1. SLIs

| SLI | Definition |
|---|---|
| search_latency_p95_ms | P95 of GET /practitioners?q= |
| privilege_check_p99_ms | P99 of /internal/.../privileges |
| credential_expiry_publish_lag | Time from scheduled window to NATS publish |
| endpoint_healthcheck_success_rate | Successful probes / total |
| outbox_lag_seconds | |
| service_availability | |
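
The percentile and ratio SLIs above can be illustrated with a small calculation. This is a minimal sketch only: it assumes raw latency samples are available for one evaluation window and uses the nearest-rank percentile method, which the source does not specify. The sample values and the function names `percentile_nearest_rank` and `success_rate` are hypothetical.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples in window")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def success_rate(successful, total):
    """Successful probes / total; None when no probes ran in the window."""
    return None if total == 0 else successful / total

# Hypothetical latencies (ms) for GET /practitioners?q= over one window.
search_latencies_ms = [120, 95, 430, 210, 88, 640, 150, 175, 300, 110]
search_latency_p95_ms = percentile_nearest_rank(search_latencies_ms, 95)

# Hypothetical probe counts for endpoint_healthcheck_success_rate.
endpoint_healthcheck_success_rate = success_rate(successful=9, total=10)
```

With only ten samples the P95 is simply the largest value, which is one reason a percentile SLI is usually evaluated over a window with many more requests.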

2. SLOs

| SLO | Target |
|---|---|
| search_latency_p95 | ≤ 500 ms |
| privilege_check_p99 | ≤ 30 ms |
| outbox_lag p99 | ≤ 10 s |
| credential_expiry publish | within 15 min of scheduled time |
| availability | ≥ 99.9% monthly |
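
To make the availability target concrete, the error budget it implies can be worked out directly. This sketch assumes "monthly" means a 30-day window and that availability is measured in wall-clock minutes; neither is stated in the source. The function names are hypothetical.

```python
def error_budget_minutes(slo_target, window_days):
    """Allowed downtime in minutes for an availability target over a window."""
    window_minutes = window_days * 24 * 60
    return (1 - slo_target) * window_minutes

def budget_remaining_minutes(slo_target, window_days, downtime_minutes):
    """Error budget left after the downtime already observed in the window."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes

# 99.9% over an assumed 30-day month: 0.001 * 43,200 min, about 43.2 minutes.
budget_30d = error_budget_minutes(0.999, 30)

# After a hypothetical 20-minute outage, roughly 23.2 minutes remain.
remaining = budget_remaining_minutes(0.999, 30, downtime_minutes=20)
```

A request-based definition (failed requests / total requests) would give a different budget; the arithmetic is the same with request counts in place of minutes.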

3. Metrics

| Metric | Type | Labels |
|---|---|---|
| provider_directory_http_request_duration_seconds | histogram | route, status, tenant |
| provider_directory_search_hits_total | counter | tenant |
| provider_directory_credentials_expiring_total | gauge | days_ahead, tenant |
| provider_directory_endpoint_health | gauge | endpoint_id, status |
| provider_directory_outbox_lag_seconds | gauge | |
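
Latency percentiles are typically estimated from a histogram such as provider_directory_http_request_duration_seconds rather than from raw samples. The sketch below shows one common approach, linear interpolation within cumulative buckets, similar in spirit to what Prometheus-style backends do. The bucket boundaries and counts are hypothetical, since the source does not list them, and `quantile_from_buckets` is an illustrative helper rather than part of any library.

```python
import math

def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile (0..1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    bound, ending with a float('inf') bucket that holds the total count.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations, no estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                # Cannot interpolate here; fall back to the last finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative buckets for one route/tenant (bounds in seconds).
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (float("inf"), 100)]
p90_seconds = quantile_from_buckets(0.90, buckets)
p90_ms = p90_seconds * 1000  # the SLIs in this document are expressed in ms
```

Here the 90th percentile falls in the 0.25 to 0.5 s bucket. The estimate can never be more precise than the bucket width, so bucket boundaries near the SLO targets (30 ms, 500 ms) matter more than the total number of buckets.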

4. Dashboards

| Dashboard | Panels |
|---|---|
| Provider Dir — Hot Path | privilege check p99, search p95, QPS |
| Provider Dir — Credentials | expiring 60/30/7, expired today, expiry notification lag |
| Provider Dir — Endpoint Health | success rate, error rate per endpoint |

5. Alerts

| Alert | Threshold | Action |
|---|---|---|
| privilege_check_p99 > 100ms | 5m | page |
| outbox_lag > 30s | 10m | page |
| credential_expiry job failed 2x | | page |
| endpoint_health fail rate > 50% | 15m | warn |
| search_latency_p95 > 1s | 10m | warn |
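
One plausible reading of the table above is that the duration (5m, 10m, 15m) is how long the condition must hold continuously before the action is taken. That interpretation is an assumption, not something the source states. Under it, the evaluation logic can be sketched as follows; `AlertRule` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertRule:
    name: str
    threshold: float      # fire when the observed value exceeds this
    for_seconds: float    # how long the breach must persist (assumed meaning)
    action: str           # "page" or "warn"
    breach_started_at: Optional[float] = None

    def observe(self, value: float, now: float) -> Optional[str]:
        """Feed one sample; return the action once the breach has persisted."""
        if value <= self.threshold:
            self.breach_started_at = None  # condition cleared, reset the timer
            return None
        if self.breach_started_at is None:
            self.breach_started_at = now   # first sample of a new breach
        if now - self.breach_started_at >= self.for_seconds:
            return self.action
        return None

# privilege_check_p99 > 100ms, held for 5 minutes, pages.
rule = AlertRule("privilege_check_p99", threshold=100.0,
                 for_seconds=300, action="page")
first = rule.observe(120.0, now=0)     # breach starts, nothing fires yet
second = rule.observe(130.0, now=200)  # still inside the 5-minute hold
third = rule.observe(125.0, now=300)   # breach has lasted 300 s, action fires
```

Requiring the breach to persist avoids paging on a single slow scrape, at the cost of delaying detection by the hold duration.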

6. Tracing

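Spans (all under the provider_directory prefix):

  • provider_directory.practitioner.search
  • provider_directory.credential.lifecycle
  • provider_directory.role.assign
  • provider_directory.privilege.resolve
  • provider_directory.endpoint.healthcheck
  • provider_directory.outbox.publish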
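
As a rough illustration of how the span names above might be used, the sketch below expands them into full names and times a block of work under one of them. It assumes the shortened names share the provider_directory prefix. The `span` context manager and `recorded_spans` list are hypothetical stand-ins for a real tracing SDK, which would also propagate trace context and export spans.

```python
import time
from contextlib import contextmanager

SPAN_PREFIX = "provider_directory"
SPAN_SUFFIXES = [
    "practitioner.search",
    "credential.lifecycle",
    "role.assign",
    "privilege.resolve",
    "endpoint.healthcheck",
    "outbox.publish",
]
# Full span names, e.g. "provider_directory.privilege.resolve".
SPAN_NAMES = [f"{SPAN_PREFIX}.{suffix}" for suffix in SPAN_SUFFIXES]

recorded_spans = []  # (name, duration_seconds) tuples; in-memory stand-in

@contextmanager
def span(name):
    """Time a block of work under a known span name."""
    if name not in SPAN_NAMES:
        raise ValueError(f"unknown span name: {name}")
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the span even if the wrapped block raised.
        recorded_spans.append((name, time.perf_counter() - start))

with span("provider_directory.privilege.resolve"):
    pass  # the privilege lookup would run here
```

Rejecting unknown names keeps the set of spans aligned with this list, so dashboards and alerts keyed on span name do not drift silently.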

7. Runbooks

  • provider-dir-expiring-cred-not-notified.md
  • provider-dir-endpoint-health-down.md
  • provider-dir-search-slow.md