Medication Service — Observability
Status: populated
Owner: TBD
Last updated: 2026-04-17
Companion: Service Template
1. SLIs
| SLI | Source |
|---|---|
| Prescription sign latency | HTTP p50/p95/p99 of POST /medications/{id}/sign |
| Drug KB check latency | Span medication.kb.check duration |
| Dispense latency | HTTP p95 of POST /dispenses |
| Inbound gateway event consume lag | NATS consumer lag metric |
| Outbox relay delivery age | Oldest undelivered event age |
| Safety-check block rate | ratio (blocking alerts fired / signs attempted) |
| Override rate | ratio (overrides / blocking alerts) |
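
The two ratio SLIs above can be sketched as follows. This is a minimal illustration, not the service's implementation: the function names and the choice to return `None` when there is no denominator are assumptions.

```python
from typing import Dict, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None  # assumption: "no data" is reported instead of 0 or 1
    return numerator / denominator


def safety_slis(signs_attempted: int, blocking_alerts: int, overrides: int) -> Dict[str, Optional[float]]:
    """Compute the two ratio SLIs from raw counts taken over the same window."""
    return {
        # blocking alerts fired / signs attempted
        "block_rate": safe_ratio(blocking_alerts, signs_attempted),
        # overrides / blocking alerts
        "override_rate": safe_ratio(overrides, blocking_alerts),
    }
```

With 200 sign attempts, 20 blocking alerts and 5 overrides, the block rate is 10% and the override rate is 25%.
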
2. SLOs
| SLO | Target | Window |
|---|---|---|
| Sign p95 latency | ≤ 1500 ms | 30d |
| Drug KB check p95 | ≤ 700 ms | 30d |
| Dispense p95 | ≤ 1200 ms | 30d |
| Inbound event consume lag p95 | ≤ 5 s | 30d |
| Outbox oldest event age p95 | ≤ 10 s | 30d |
| API availability | 99.9% | 30d |
| Inventory decrement error rate | ≤ 0.1% | 30d |
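
As a worked example of what the 99.9% / 30d availability target implies, the sketch below computes the error budget. It assumes availability may be measured either by time or by request count; the document does not say which, and the function names are illustrative.

```python
def error_budget_minutes(target: float, window_days: int) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    return (1.0 - target) * window_days * 24 * 60


def budget_remaining(target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - target) * total_requests
    if allowed_failures == 0:
        return 0.0  # no traffic or a 100% target leaves no budget to spend
    return 1.0 - failed_requests / allowed_failures
```

For a 99.9% target over 30 days the budget is about 43.2 minutes. On a request basis, 250 failures out of 1,000,000 requests leaves roughly 75% of the budget.
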
3. Metrics (OpenTelemetry)
| Metric | Type | Labels |
|---|---|---|
| medication_sign_total | counter | outcome, blocking_alerts, overrides, tenant |
| medication_sign_duration_seconds | histogram | tenant |
| medication_dispense_total | counter | outcome, is_partial, is_controlled, tenant |
| medication_dispense_duration_seconds | histogram | tenant |
| medication_kb_check_duration_seconds | histogram | check_type |
| medication_alert_overridden_total | counter | alert_type, severity, tenant |
| medication_stock_low_total | gauge | node_id, tenant |
| medication_expiry_horizon_total | gauge | days_until_expiry_bucket |
| medication_outbox_undelivered | gauge | subject |
| medication_cs_dispense_total | counter | schedule, tenant |
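
The p95 SLOs are evaluated against the histogram metrics above. The sketch below shows one common way to estimate a quantile from cumulative bucket counts, using linear interpolation within the matching bucket. The bucket boundaries and counts are made up for illustration, and the metrics backend actually in use may compute quantiles differently.

```python
from typing import Sequence, Tuple


def estimate_quantile(q: float, buckets: Sequence[Tuple[float, int]]) -> float:
    """Estimate the q-quantile from (upper_bound_seconds, cumulative_count) pairs.

    Buckets must be sorted by upper bound and have finite bounds.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        raise ValueError("no observations")
    rank = q * buckets[-1][1]  # position of the target observation
    prev_bound, prev_count = 0.0, 0
    for upper, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return upper
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, cumulative
    return buckets[-1][0]


# Hypothetical medication_sign_duration_seconds buckets for one tenant.
sign_buckets = [(0.5, 60), (1.0, 90), (1.5, 98), (3.0, 100)]
sign_p95 = estimate_quantile(0.95, sign_buckets)
```

Here `sign_p95` comes out at about 1.31 s, which would sit inside the 1500 ms sign latency target. The estimate is only as precise as the bucket layout, so boundaries near the SLO thresholds matter.
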
4. Traces
Mandatory spans: sign.use_case, dispense.use_case, kb.check, inventory.reserve, gateway.post_dispense, outbox.publish.
correlation_id is propagated end to end: EHR → medication-service → gateway → pharmacy events.
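
A minimal sketch of correlation ID propagation inside one service hop, using only the standard library. The header name `X-Correlation-Id`, the fallback to a generated UUID, and both function names are assumptions; the document does not specify the transport-level carrier.

```python
import contextvars
import uuid
from typing import Dict, Mapping, Optional

CORRELATION_HEADER = "X-Correlation-Id"  # hypothetical header name

_correlation_id: contextvars.ContextVar = contextvars.ContextVar(
    "correlation_id", default=None
)


def accept_inbound(headers: Mapping[str, str]) -> str:
    """Adopt the caller's correlation ID, or mint one if none was sent."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {key.lower(): value for key, value in headers.items()}
    cid = lowered.get(CORRELATION_HEADER.lower()) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


def outbound_headers(extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build headers for a downstream call or published event."""
    headers = dict(extra or {})
    cid = _correlation_id.get()
    if cid is not None:
        headers[CORRELATION_HEADER] = cid
    return headers
```

Calling `accept_inbound` at the request boundary and `outbound_headers` before each gateway post or outbox publish keeps the same ID on every hop without threading it through function arguments.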
5. Logs
Structured JSON. No PHI in logs: IDs only. Required fields: tenantId, actorId, correlationId, prescriptionId, dispenseId, spanId.
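
One way to enforce the "IDs only" rule is an allowlist formatter that emits only the required fields and drops everything else attached to a log record. The sketch below uses Python's standard `logging` module. The class name is hypothetical, and emitting `null` for an ID that does not apply (for example `dispenseId` on a sign event) is an assumption.

```python
import json
import logging

REQUIRED_FIELDS = (
    "tenantId", "actorId", "correlationId",
    "prescriptionId", "dispenseId", "spanId",
)


class IdOnlyJsonFormatter(logging.Formatter):
    """Render a record as JSON containing only allowlisted ID fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in REQUIRED_FIELDS:
            # Fields passed via `extra=` become attributes on the record.
            payload[field] = getattr(record, field, None)
        # Anything else in `extra` (e.g. a patient name) is never serialized.
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(IdOnlyJsonFormatter())
    logger = logging.getLogger("medication-service")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("prescription signed", extra={"tenantId": "t-1", "prescriptionId": "rx-9"})
```

The allowlist does not inspect the message string itself, so messages still need to be written without PHI.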
6. Dashboards
| Dashboard | Key panels |
|---|---|
| Medication overview | sign/dispense rate, p95 latency, override rate, blocking-alert rate |
| Pharmacy fulfillment | queue depth, dispense throughput, partial-dispense rate, return rate |
| Inventory | low-stock + expiry-horizon counts by node, recall alerts |
| Controlled-substance | CS dispense volume, counter-sign latency, disclosure-accounting export age |
| Gateway interop | inbound event lag, outbound dispense post retries/DLQ |
7. Alerts
| Alert (condition and threshold) | Severity | Action |
|---|---|---|
| Sign latency p95 > 3000ms for 10m | P2 | page |
| KB check failure rate > 2% for 5m | P1 | page |
| Outbox age > 60s for 5m | P1 | page |
| Inventory decrement error rate > 1% for 5m | P2 | page |
| DLQ depth > 100 | P2 | page |
| Consumer lag > 30s for 10m | P2 | page |
| Override rate > 30% for 1h | P3 | ticket (clinical governance review) |
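
Most alerts above only fire when the condition holds continuously for a stated duration ("for 10m"). The sketch below illustrates that semantics over a series of timestamped samples. It is an illustration of the rule shape, not the alerting system's evaluator, and the sample data is invented.

```python
from typing import Iterable, Optional, Tuple


def fires(samples: Iterable[Tuple[float, float]], threshold: float, hold_seconds: float) -> bool:
    """Return True if value > threshold held continuously for hold_seconds.

    samples: (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_started: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp  # start of the current breach
            if timestamp - breach_started >= hold_seconds:
                return True
        else:
            breach_started = None  # any recovery resets the timer
    return False


# Sign latency p95 in ms, sampled once a minute for 10 minutes.
sustained = [(minute * 60.0, 3200.0) for minute in range(11)]
# Same series with a single recovery at minute 5.
dipped = [(ts, 2500.0 if ts == 300.0 else value) for ts, value in sustained]
```

`fires(sustained, 3000, 600)` is true, while `fires(dipped, 3000, 600)` is false because the dip restarts the 10-minute clock. Requiring a hold period keeps brief spikes from paging.
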
8. Runbook References
runbooks/medication-kb-unavailable.md
runbooks/pharmacy-queue-backlog.md
runbooks/outbox-relay-stall.md
runbooks/inventory-decrement-anomaly.md
runbooks/controlled-substance-counter-sign-backlog.md