Provider Directory Service — Observability
Status: populated
Owner: TBD
Last updated: 2026-04-17
1. SLIs
| SLI | Definition |
|---|---|
| search_latency_p95_ms | P95 latency of GET /practitioners?q= |
| privilege_check_p99_ms | P99 latency of /internal/.../privileges |
| credential_expiry_publish_lag | Time from the scheduled notification window to the NATS publish |
| endpoint_healthcheck_success_rate | Successful endpoint probes / total probes |
| outbox_lag_seconds | — |
| service_availability | — |
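The latency SLIs are percentiles over request samples, and the health-check SLI is a plain ratio. Below is a minimal sketch of both. It assumes raw per-request latencies in milliseconds are available; in practice the percentiles would more likely be estimated from the request-duration histogram. The nearest-rank method and the function names `nearest_rank_percentile` and `success_rate` are illustrative choices, not something this spec prescribes.

```python
import math


def nearest_rank_percentile(samples_ms: list[float], pct: float) -> float:
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method.

    Illustrative only: assumes raw latency samples are at hand.
    """
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest sample with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def success_rate(successful: int, total: int) -> float:
    """endpoint_healthcheck_success_rate: successful probes / total probes."""
    if total == 0:
        # No probes in the window: report "undefined" instead of guessing a value.
        raise ValueError("no probes in window")
    return successful / total
```

For search_latency_p95_ms, a window of samples would be passed with `pct=95`; for privilege_check_p99_ms, with `pct=99`.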
2. SLOs
| SLO | Target |
|---|---|
| search_latency_p95 | ≤ 500 ms |
| privilege_check_p99 | ≤ 30 ms |
| outbox_lag p99 | ≤ 10 s |
| credential_expiry publish | within 15 min of scheduled time |
| availability | ≥ 99.9% monthly |
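The 99.9% monthly availability target implies an error budget of 0.1% of the month: 43.2 minutes for a 30-day month and 44.64 minutes for a 31-day month. The sketch below computes that budget and checks the 15-minute credential-expiry publish target. The function names and the choice of window length are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def error_budget(target: float, window: timedelta) -> timedelta:
    """Allowed unavailable time in `window` for an availability target such as 0.999."""
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    # timedelta * float is rounded to the nearest microsecond.
    return window * (1 - target)


def budget_remaining(target: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Error budget left after `downtime`; a negative result means the SLO is breached."""
    return error_budget(target, window) - downtime


def publish_within_slo(scheduled: datetime, published: datetime,
                       max_lag: timedelta = timedelta(minutes=15)) -> bool:
    """credential_expiry publish SLO: NATS publish no later than 15 min after the scheduled time."""
    return published - scheduled <= max_lag
```

For example, 40 minutes of downtime in a 30-day month leaves 3 minutes 12 seconds of budget.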
3. Metrics
| Metric | Type | Labels |
|---|---|---|
| provider_directory_http_request_duration_seconds | histogram | route, status, tenant |
| provider_directory_search_hits_total | counter | tenant |
| provider_directory_credentials_expiring_total | gauge | days_ahead, tenant |
| provider_directory_endpoint_health | gauge | endpoint_id, status |
| provider_directory_outbox_lag_seconds | gauge | — |
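The p95/p99 latency SLIs would normally be derived from the `provider_directory_http_request_duration_seconds` histogram rather than from raw samples. If the service uses Prometheus, a query such as `histogram_quantile(0.95, sum by (le) (rate(provider_directory_http_request_duration_seconds_bucket[5m])))` does this. The estimate comes from linear interpolation inside cumulative `le` buckets. The sketch below is a simplified re-implementation of that interpolation idea, not Prometheus's exact code. It assumes non-negative latencies, so the first bucket's lower edge is taken as 0. The bucket layout in the example is hypothetical.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.

    buckets: (upper_bound_seconds, cumulative_count), sorted by upper bound and
    ending with (math.inf, total), in the style of Prometheus `le` buckets.
    Assumes the first bucket's lower edge is 0 (latencies are non-negative).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_upper, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return prev_upper
            # Interpolate linearly between this bucket's lower and upper edges.
            in_bucket = count - prev_count
            return prev_upper + (upper - prev_upper) * (rank - prev_count) / in_bucket
        prev_upper, prev_count = upper, count
    raise AssertionError("unreachable: the +Inf bucket always holds the total")
```

The estimate is only as fine as the bucket boundaries. A bucket edge at or near each SLO and alert threshold (for example 0.03, 0.1, 0.5 and 1.0 s) keeps those comparisons meaningful.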
4. Dashboards
| Dashboard | Panels |
|---|---|
| Provider Dir — Hot Path | privilege check p99, search p95, QPS |
| Provider Dir — Credentials | expiring within 60/30/7 days, expired today, expiry notification lag |
| Provider Dir — Endpoint Health | success rate, error rate per endpoint |
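The Credentials dashboard's "expiring within 60/30/7 days" and "expired today" panels correspond to the `days_ahead` label on `provider_directory_credentials_expiring_total`. The sketch below shows one way to compute those counts. It assumes the windows are cumulative, so a credential expiring in 5 days counts toward 7, 30 and 60. It also assumes a credential expiring today is reported only as "expired today". Both boundary choices, and the plain list of expiry dates used as input, are assumptions for illustration.

```python
from datetime import date, timedelta

WINDOWS_DAYS = (60, 30, 7)  # matches the dashboard's 60/30/7 panels


def expiring_counts(expiries: list[date], today: date) -> dict[int, int]:
    """Count credentials expiring within each window (cumulative, excludes today)."""
    counts = {}
    for days in WINDOWS_DAYS:
        horizon = today + timedelta(days=days)
        # Assumption: windows are (today, today + days], so today's expiries are excluded.
        counts[days] = sum(1 for d in expiries if today < d <= horizon)
    return counts


def expired_today(expiries: list[date], today: date) -> int:
    """Credentials whose expiry date is today."""
    return sum(1 for d in expiries if d == today)
```

Each entry of `expiring_counts` would then be exported as one gauge series, for example with `days_ahead="30"`, per tenant.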
5. Alerts
| Alert condition | Sustained for | Action |
|---|---|---|
| privilege_check_p99 > 100 ms | 5m | page |
| outbox_lag > 30 s | 10m | page |
| credential_expiry job fails twice | — | page |
| endpoint_health fail rate > 50% | 15m | warn |
| search_latency_p95 > 1 s | 10m | warn |
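The "Sustained for" column means an alert fires only after its condition has held on every evaluation for at least that long. Any evaluation where the condition is false resets the timer. This mirrors the pending-then-firing behaviour of Prometheus-style `for` clauses. The sketch below models that rule over a series of evaluations. The `ForRule` type, the 60-second evaluation interval in the example, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForRule:
    threshold: float    # condition is "value > threshold"
    for_seconds: float  # how long the condition must hold before firing


def firing_at(rule: ForRule, samples: list[tuple[float, float]]) -> list[float]:
    """Return evaluation timestamps at which the alert is firing.

    samples: (unix_seconds, value) pairs in evaluation order.
    """
    pending_since: Optional[float] = None
    firing = []
    for ts, value in samples:
        if value > rule.threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true: alert is pending
            if ts - pending_since >= rule.for_seconds:
                firing.append(ts)
        else:
            pending_since = None  # condition broke: reset the pending timer
    return firing


# Example: privilege_check_p99 > 100 ms for 5m, evaluated every 60 s (hypothetical data).
privilege_rule = ForRule(threshold=100.0, for_seconds=300)
```

With p99 values of 120 to 170 ms from t=0 to t=300, then 90 ms at t=360, the alert fires once at t=300. The dip below the threshold at t=360 resets it.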
6. Tracing
Spans: provider_directory.practitioner.search, provider_directory.credential.lifecycle, provider_directory.role.assign, provider_directory.privilege.resolve, provider_directory.endpoint.healthcheck, provider_directory.outbox.publish.
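The span list above implies a fixed naming scheme under the `provider_directory.` prefix. The spec does not say which tracing library the service uses; OpenTelemetry would be a typical choice. To stay dependency-free, the sketch below uses a stdlib-only stand-in. It enforces the documented span names and records parent/child nesting and durations. The `span` helper, the `finished` list and the nesting shown in the usage note are hypothetical, for illustration only.

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Optional

SPAN_PREFIX = "provider_directory."
KNOWN_SPANS = {
    "practitioner.search", "credential.lifecycle", "role.assign",
    "privilege.resolve", "endpoint.healthcheck", "outbox.publish",
}

# Name of the currently open span in this context (None at top level).
_current: ContextVar[Optional[str]] = ContextVar("current_span", default=None)
# Finished spans, appended in completion order (inner spans finish first).
finished: list[dict] = []


@contextmanager
def span(suffix: str):
    """Open a span named provider_directory.<suffix>; rejects undocumented names."""
    if suffix not in KNOWN_SPANS:
        raise ValueError(f"undocumented span: {suffix}")
    name = SPAN_PREFIX + suffix
    parent = _current.get()
    token = _current.set(name)
    start = time.perf_counter()
    try:
        yield name
    finally:
        finished.append({"name": name, "parent": parent,
                         "duration_s": time.perf_counter() - start})
        _current.reset(token)
```

For example, `with span("credential.lifecycle"):` wrapping `with span("outbox.publish"):` records the publish span with `provider_directory.credential.lifecycle` as its parent. Rejecting unknown suffixes keeps dashboards and trace queries aligned with the span list.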
7. Runbooks
provider-dir-expiring-cred-not-notified.md
provider-dir-endpoint-health-down.md
provider-dir-search-slow.md