Orders Service — Observability
Status: populated; Owner: TBD; Last updated: 2026-04-18; Companion: Service Template
1. SLIs and SLOs
| SLI | SLO target | Measurement window |
|---|---|---|
| orders.create.availability | ≥ 99.5% of POST /v1/orders requests succeed (2xx) | 30-day rolling |
| orders.create.latency | p95 order creation < 800 ms (includes CDS check) | 30-day rolling |
| orders.activate.latency | p95 order activation < 500 ms | 30-day rolling |
| orders.query.latency | p95 order list query < 600 ms | 30-day rolling |
| cds.check.latency | p95 CDS check < 300 ms | 30-day rolling |
| outbox.delivery.lag | p99 outbox events delivered to NATS within 30 s | 30-day rolling |
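
The availability target implies an error budget: at 99.5%, up to 0.5% of POST /v1/orders requests in the 30-day window may fail. The following is a minimal sketch of that arithmetic. The function name and the request counts are hypothetical, and it assumes a request counts as failed whenever it does not return 2xx.

```python
def error_budget_remaining(total_requests: int, failed_requests: int,
                           slo_target: float = 0.995) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    # Number of failures the SLO tolerates for this much traffic.
    allowed_failures = total_requests * (1.0 - slo_target)
    return 1.0 - failed_requests / allowed_failures
```

For example, 250 failures out of 200,000 requests leaves roughly 75% of the budget, since the target tolerates about 1,000 failures at that volume.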
2. OpenTelemetry Spans
| Span name | Attributes | Purpose |
|---|---|---|
| orders.create | tenantId, patientId, orderType, hasCdsAlerts | Trace order creation |
| orders.activate | tenantId, orderId, orderType, cdsAlertCount | Trace activation with CDS resolution |
| orders.cds.allergy_check | tenantId, patientId, allergenCode | Trace allergy CDS check |
| orders.cds.drug_interaction | tenantId, patientId, medicationCode | Trace DDI check |
| orders.cds.duplicate | tenantId, patientId, orderCode | Trace duplicate check |
| orders.outbox.relay | tenantId, subject, eventId | Trace outbox relay |
| orders.referral.create | tenantId, patientId, referToSpecialty | Trace referral creation |
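
One way to keep instrumentation aligned with this table is to check span attributes against it in tests. The sketch below is illustrative only: the mapping copies the table, while `REQUIRED_SPAN_ATTRIBUTES` and `missing_span_attributes` are hypothetical names, not part of the service or of the OpenTelemetry API.

```python
# Required attributes per span name, copied from the table above.
REQUIRED_SPAN_ATTRIBUTES = {
    "orders.create": {"tenantId", "patientId", "orderType", "hasCdsAlerts"},
    "orders.activate": {"tenantId", "orderId", "orderType", "cdsAlertCount"},
    "orders.cds.allergy_check": {"tenantId", "patientId", "allergenCode"},
    "orders.cds.drug_interaction": {"tenantId", "patientId", "medicationCode"},
    "orders.cds.duplicate": {"tenantId", "patientId", "orderCode"},
    "orders.outbox.relay": {"tenantId", "subject", "eventId"},
    "orders.referral.create": {"tenantId", "patientId", "referToSpecialty"},
}


def missing_span_attributes(span_name: str, attributes: dict) -> list:
    """Return the sorted names of required attributes absent from a span."""
    if span_name not in REQUIRED_SPAN_ATTRIBUTES:
        raise KeyError(f"unknown span name: {span_name}")
    return sorted(REQUIRED_SPAN_ATTRIBUTES[span_name] - set(attributes))
```

A test harness could call this on each exported span and fail when the returned list is non-empty.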
3. Metrics (Prometheus)
| Metric | Type | Labels |
|---|---|---|
| orders_created_total | Counter | tenant_id, order_type, priority |
| orders_activated_total | Counter | tenant_id, order_type |
| orders_cancelled_total | Counter | tenant_id, order_type |
| orders_cds_hard_stop_total | Counter | tenant_id, rule_id |
| orders_cds_warning_total | Counter | tenant_id, rule_id |
| orders_cds_check_duration_seconds | Histogram | tenant_id, check_type |
| orders_referrals_pending_gauge | Gauge | tenant_id, facility_id |
| orders_outbox_pending_gauge | Gauge | tenant_id |
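
To make the label sets concrete, the sketch below renders one sample line in the Prometheus text exposition format. In practice a Prometheus client library would produce this output; `render_sample` is a hypothetical helper, and the label values used in the example are invented.

```python
def _escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value) -> str:
    """Render a single sample as: name{label="value",...} value"""
    label_text = ",".join(
        f'{key}="{_escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{name}{{{label_text}}} {value}"


# Hypothetical label values, for illustration only.
line = render_sample(
    "orders_created_total",
    {"tenant_id": "t1", "order_type": "lab", "priority": "stat"},
    3,
)
```

Every distinct combination of label values becomes its own time series, which is worth keeping in mind for labels such as tenant_id and rule_id.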
4. Structured Logs
All logs are emitted as JSON with these mandatory fields: level, timestamp, traceId, spanId, tenantId, and service (always "orders-service").
Key log events:
- ORDER_CREATED: info
- ORDER_ACTIVATED: info
- ORDER_CANCELLED: info
- CDS_HARD_STOP_BLOCKED: warn (activation blocked)
- CDS_WARNING_ACKNOWLEDGED: info, with reason
- OUTBOX_RELAY_FAILED: error
- ALLERGY_CACHE_UPDATED: info
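
A log line carrying the mandatory fields might be assembled as sketched below. This is not the service's logger: `build_log_line` is a hypothetical helper, and storing the event name under an `event` key is an assumption, since the field that holds it is not specified here.

```python
import json
from datetime import datetime, timezone

MANDATORY_FIELDS = ("level", "timestamp", "traceId", "spanId", "tenantId", "service")


def build_log_line(event: str, level: str, trace_id: str, span_id: str,
                   tenant_id: str, **extra) -> str:
    """Serialize one structured log record as a JSON string."""
    clash = set(extra) & (set(MANDATORY_FIELDS) | {"event"})
    if clash:
        raise ValueError(f"extra fields collide with reserved fields: {sorted(clash)}")
    record = {
        "level": level,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "traceId": trace_id,
        "spanId": span_id,
        "tenantId": tenant_id,
        "service": "orders-service",
        "event": event,  # assumed field name for the log event
    }
    record.update(extra)  # e.g. reason=... for CDS_WARNING_ACKNOWLEDGED
    return json.dumps(record)
```

For instance, `build_log_line("CDS_WARNING_ACKNOWLEDGED", "info", trace_id, span_id, tenant_id, reason="...")` attaches the reason alongside the mandatory fields.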
5. Dashboards
| Dashboard | Key panels |
|---|---|
| Orders Overview | Orders by type/status, creation rate, cancellation rate |
| CDS Activity | Hard-stop rate, warning rate, override rate by rule |
| Referral Tracking | Pending referrals, overdue referrals, acceptance rate |
| Service Health | API latency p50/p95/p99, error rate, outbox depth |
| Order Set Usage | Most-used order sets, instantiation frequency |
6. Alerts
| Alert | Threshold | Runbook |
|---|---|---|
| OrdersApiErrorRate | > 1% 5xx over 5 min | Check DB connectivity, RLS policy |
| OrdersCdsCheckTimeout | CDS check p95 > 1s for 5 min | Check terminology-service health |
| OrdersOutboxDepthHigh | Outbox pending > 200 for > 5 min | Check NATS connectivity, relay worker |
| OrdersCdsHardStopSpike | Hard-stop rate increases > 3× baseline | Possible new drug/allergy combination alert; review rules |
| OrdersReferralOverdue | Referrals pending > 72h without scheduling | Alert clinical operations |
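
The threshold logic behind two of these alerts can be sketched as plain functions. Real alerts would normally be written as rules in the alerting system; the function names are hypothetical, and both the way the baseline rate is derived and the choice not to fire without a baseline are assumptions.

```python
def api_error_rate_firing(total_requests: int, server_errors: int,
                          threshold: float = 0.01) -> bool:
    """OrdersApiErrorRate: more than 1% of requests in the window returned 5xx."""
    if total_requests == 0:
        return False  # no traffic, so no error rate to evaluate
    return server_errors / total_requests > threshold


def hard_stop_spike_firing(current_rate: float, baseline_rate: float,
                           multiplier: float = 3.0) -> bool:
    """OrdersCdsHardStopSpike: hard-stop rate exceeds 3x the baseline rate."""
    if baseline_rate <= 0:
        return False  # assumption: without a baseline the alert stays silent
    return current_rate > multiplier * baseline_rate
```

Both comparisons are strict, matching the ">" in the table: exactly 1% errors, or exactly 3× baseline, does not fire.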
7. Health Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe |
| GET /health/ready | Readiness probe: DB, NATS, Redis, CDS connectivity |
| GET /health/startup | Startup probe: migrations complete, allergy cache seeded |
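
A readiness handler typically runs each dependency check and reports ready only when all of them pass. The sketch below shows one possible aggregation. The `readiness` function, the check names, and the response shape are assumptions; returning 503 on failure follows the common convention for HTTP probes.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check and return (HTTP status, per-check results)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A check that raises (e.g. connection refused) counts as failed.
            results[name] = "fail"
    status = 200 if all(state == "ok" for state in results.values()) else 503
    return status, results
```

In use, `checks` would map names such as "db", "nats", "redis", and "cds" to functions that ping each dependency, ideally with short timeouts so the probe itself stays fast.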