
Medication Service — Observability

Status: populated · Owner: TBD · Last updated: 2026-04-17 · Companion: Service Template

1. SLIs

| SLI | Source |
|---|---|
| Prescription sign latency | HTTP p50/p95/p99 of POST /medications/{id}/sign |
| Drug KB check latency | Span medication.kb.check duration |
| Dispense latency | HTTP p95 of POST /dispenses |
| Inbound gateway event consume lag | NATS consumer lag metric |
| Outbox relay delivery age | Oldest undelivered event age |
| Safety-check block rate | Ratio (blocking alerts fired / signs attempted) |
| Override rate | Ratio (overrides / blocking alerts) |
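
The two ratio SLIs above can be sketched as follows. This is a minimal illustration, not the service's implementation: the function names are hypothetical, and returning `None` for an empty denominator (so that "no data" is not reported as a 0% rate) is an assumption.

```python
from typing import Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when there is nothing to divide by."""
    if denominator == 0:
        return None  # assumption: "no data" is reported as None rather than 0.0
    return numerator / denominator


def safety_sli_snapshot(signs_attempted: int, blocking_alerts: int, overrides: int) -> dict:
    """Compute both safety ratios from raw counts taken over the same window."""
    return {
        # blocking alerts fired / signs attempted
        "block_rate": ratio(blocking_alerts, signs_attempted),
        # overrides / blocking alerts
        "override_rate": ratio(overrides, blocking_alerts),
    }


snapshot = safety_sli_snapshot(signs_attempted=400, blocking_alerts=20, overrides=5)
empty = safety_sli_snapshot(signs_attempted=0, blocking_alerts=0, overrides=0)
print(snapshot, empty)
```

Note that the override rate is measured against blocking alerts, not against sign attempts, so it can be high even when the block rate is low.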

2. SLOs

| SLO | Target | Window |
|---|---|---|
| Sign p95 latency | ≤ 1500 ms | 30d |
| Drug KB check p95 | ≤ 700 ms | 30d |
| Dispense p95 | ≤ 1200 ms | 30d |
| Inbound event consume lag p95 | ≤ 5 s | 30d |
| Outbox oldest event age p95 | ≤ 10 s | 30d |
| API availability | 99.9% | 30d |
| Inventory decrement error rate | ≤ 0.1% | 30d |
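
To make the availability target concrete, the sketch below converts a target and window into an error budget. The document does not say whether availability is measured by time or by request count, so both variants are shown as assumptions; the function names and the request counts in the example are hypothetical.

```python
from datetime import timedelta


def time_error_budget(target: float, window: timedelta) -> timedelta:
    """Allowed unavailable time in the window, assuming time-based availability."""
    return window * (1.0 - target)


def request_budget_consumed(target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget already used (1.0 means exhausted)."""
    allowed_failures = total_requests * (1.0 - target)
    if allowed_failures == 0:
        return 0.0
    return failed_requests / allowed_failures


# 99.9% over 30 days leaves roughly 43 minutes of budget.
budget = time_error_budget(0.999, timedelta(days=30))

# Hypothetical counts: 250 failures out of 1,000,000 requests uses about a quarter of the budget.
consumed = request_budget_consumed(0.999, total_requests=1_000_000, failed_requests=250)
print(budget, consumed)
```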

3. Metrics (OpenTelemetry)

| Metric | Type | Labels |
|---|---|---|
| medication_sign_total | counter | outcome, blocking_alerts, overrides, tenant |
| medication_sign_duration_seconds | histogram | tenant |
| medication_dispense_total | counter | outcome, is_partial, is_controlled, tenant |
| medication_dispense_duration_seconds | histogram | tenant |
| medication_kb_check_duration_seconds | histogram | check_type |
| medication_alert_overridden_total | counter | alert_type, severity, tenant |
| medication_stock_low_total | gauge | node_id, tenant |
| medication_expiry_horizon_total | gauge | days_until_expiry_bucket |
| medication_outbox_undelivered | gauge | subject |
| medication_cs_dispense_total | counter | schedule, tenant |
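
The p95 SLOs are read from the duration histograms, which store bucket counts rather than raw samples. The sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank, the same general idea as Prometheus's `histogram_quantile`. The bucket bounds and counts are hypothetical; the document does not specify the histogram configuration or the query backend.

```python
from typing import Sequence


def estimate_quantile(q: float, bounds: Sequence[float],
                      cumulative: Sequence[int], total: int) -> float:
    """Estimate quantile q from cumulative bucket counts.

    bounds[i] is the upper bound of bucket i (seconds, ascending) and
    cumulative[i] is the number of observations <= bounds[i]. `total` may
    exceed cumulative[-1] if some observations fell above the last bound.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if total == 0:
        raise ValueError("no observations")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in zip(bounds, cumulative):
        if count >= rank:
            if count == lower_count:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    # The rank lies above the highest finite bound; that bound is the best estimate.
    return bounds[-1]


# Hypothetical buckets for medication_sign_duration_seconds.
bounds = [0.25, 0.5, 1.0, 1.5, 3.0]
cumulative = [400, 700, 900, 980, 1000]
p95 = estimate_quantile(0.95, bounds, cumulative, total=1000)
print(f"estimated sign p95: {p95:.4f} s")
```

Because the estimate is interpolated, its accuracy depends on having a bucket boundary close to each SLO target (for example 0.7 s, 1.2 s, and 1.5 s).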

4. Traces

Mandatory spans: sign.use_case, dispense.use_case, kb.check, inventory.reserve, gateway.post_dispense, outbox.publish. The correlation_id must be propagated end to end: EHR → medication-service → gateway → pharmacy events.
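
One way to keep the correlation_id intact across hops is to capture it at the inbound edge and re-attach it to every outbound call or event. The sketch below assumes a header named `X-Correlation-Id` and exact-case header keys; both are illustrative assumptions, as are the function names. The document does not specify the carrier or the propagation mechanism.

```python
import contextvars
import uuid
from typing import Optional

CORRELATION_HEADER = "X-Correlation-Id"  # hypothetical header name

# Holds the correlation_id for the request currently being handled.
_correlation_id: contextvars.ContextVar = contextvars.ContextVar(
    "correlation_id", default=None
)


def accept_inbound(headers: dict) -> str:
    """Reuse the caller's correlation_id, or mint one if the caller sent none."""
    cid = headers.get(CORRELATION_HEADER) or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def outbound_headers(extra: Optional[dict] = None) -> dict:
    """Build headers for a downstream call or event, carrying the current correlation_id."""
    headers = dict(extra or {})
    cid = _correlation_id.get()
    if cid is not None:
        headers[CORRELATION_HEADER] = cid
    return headers


accept_inbound({CORRELATION_HEADER: "abc-123"})
forwarded = outbound_headers({"Content-Type": "application/json"})
print(forwarded)
```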

5. Logs

Logs are structured JSON. Logs must not contain PHI; record IDs only. Required fields: tenantId, actorId, correlationId, prescriptionId, dispenseId, spanId.
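
A minimal sketch of a formatter that emits the required fields, using only the Python standard library. The class name, logger name, and ID values are hypothetical, and emitting `null` for a field that does not apply (for example dispenseId while signing) is an assumption. A formatter like this cannot detect PHI inside the message text, so messages still need to be written as fixed, ID-free strings.

```python
import io
import json
import logging

REQUIRED_FIELDS = (
    "tenantId", "actorId", "correlationId",
    "prescriptionId", "dispenseId", "spanId",
)


class JsonIdFormatter(logging.Formatter):
    """Render each record as one JSON object containing only the listed fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in REQUIRED_FIELDS:
            # Values passed via `extra=` become attributes on the record.
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonIdFormatter())

logger = logging.getLogger("medication-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "prescription signed",
    extra={
        "tenantId": "t-1", "actorId": "u-42", "correlationId": "c-9",
        "prescriptionId": "rx-7", "spanId": "s-3",
    },
)
print(stream.getvalue())
```

Any key passed in `extra` that is not in `REQUIRED_FIELDS` is dropped from the output, which acts as a simple allowlist.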

6. Dashboards

| Dashboard | Key panels |
|---|---|
| Medication overview | sign/dispense rate, p95 latency, override rate, blocking-alert rate |
| Pharmacy fulfillment | queue depth, dispense throughput, partial-dispense rate, return rate |
| Inventory | low-stock + expiry-horizon counts by node, recall alerts |
| Controlled-substance | CS dispense volume, counter-sign latency, disclosure-accounting export age |
| Gateway interop | inbound event lag, outbound dispense post retries/DLQ |

7. Alerts

| Alert | Threshold | Severity |
|---|---|---|
| Sign latency p95 | > 3000 ms for 10m | P2, page |
| KB check failure rate | > 2% for 5m | P1, page |
| Outbox age | > 60 s for 5m | P1, page |
| Inventory decrement error rate | > 1% for 5m | P2, page |
| DLQ depth | > 100 | P2, page |
| Consumer lag | > 30 s for 10m | P2, page |
| Override rate | > 30% for 1h | P3, ticket (clinical governance review) |
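
Most thresholds above carry a "for" duration, meaning the condition has to hold continuously before the alert fires. The sketch below shows that behaviour under the assumption that any sample at or below the threshold resets the timer. The document does not name the alerting engine, and the class name and sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class SustainedThreshold:
    """Fires once a value has stayed above `threshold` for at least `hold`."""

    def __init__(self, threshold: float, hold: timedelta):
        self.threshold = threshold
        self.hold = hold
        self._breach_started: Optional[datetime] = None

    def observe(self, value: float, at: datetime) -> bool:
        """Record one sample and return True if the alert should be firing."""
        if value <= self.threshold:
            self._breach_started = None  # a recovery resets the timer
            return False
        if self._breach_started is None:
            self._breach_started = at
        return at - self._breach_started >= self.hold


# "Sign latency p95 > 3000ms for 10m", sampled every 5 minutes.
rule = SustainedThreshold(threshold=3000.0, hold=timedelta(minutes=10))
t0 = datetime(2026, 4, 17, 9, 0)
results = [
    rule.observe(3200.0, t0),                          # breach starts
    rule.observe(3400.0, t0 + timedelta(minutes=5)),   # still within the hold period
    rule.observe(3100.0, t0 + timedelta(minutes=10)),  # sustained for 10m: fires
    rule.observe(2500.0, t0 + timedelta(minutes=15)),  # recovered: clears
]
print(results)
```

The hold period keeps a single slow scrape from paging on-call, at the cost of delaying a genuine page by the same duration.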

8. Runbook References

  • runbooks/medication-kb-unavailable.md
  • runbooks/pharmacy-queue-backlog.md
  • runbooks/outbox-relay-stall.md
  • runbooks/inventory-decrement-anomaly.md
  • runbooks/controlled-substance-counter-sign-backlog.md