Patient Chart Service — Observability

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 12 observability-telemetry

1. SLIs and SLOs

| SLI | Measurement | SLO target | Window |
|---|---|---|---|
| Chart read availability | 1 - (5xx responses on GET `/v1/chart/*`, `/v1/problems/*`, `/v1/allergies/*`, `/v1/vitals/*`, `/v1/clinical-notes/*`) / (total requests to those endpoints) | 99.9 % | 30-day rolling |
| Chart write availability | 1 - (5xx responses on POST/PUT/PATCH chart write endpoints) / (total requests to those endpoints) | 99.9 % | 30-day rolling |
| Problem list read latency (P95) | `patient_chart_http_request_duration_ms{operation="list_problems"}` | < 500 ms | 5-min window |
| Note sign latency (P95) | `patient_chart_http_request_duration_ms{operation="sign_note"}` | < 1 000 ms | 5-min window |
| Vitals record latency (P95) | `patient_chart_http_request_duration_ms{operation="record_vitals"}` | < 500 ms | 5-min window |
| Allergy advisory latency (P95) | `patient_chart_http_request_duration_ms{operation="allergy_advisory"}` | < 300 ms | 5-min window |
| NATS outbox lag | `patient_chart_outbox_unpublished_rows` | Never ≥ 100 rows for longer than 30 s | Continuous |
| Tenant isolation | Automated `tenant-isolation.spec.ts` in CI | 100 % pass | Every deploy |
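
To make the availability targets concrete, the following is a minimal sketch of the arithmetic behind a 99.9 % SLO over a 30-day window. The function names and the request counts in the example are illustrative assumptions, not part of the service.

```typescript
// Availability SLI as defined in the table: 1 - (5xx responses / total requests).
// A window with no traffic is treated as fully available (an assumption of this sketch).
function availability(total: number, serverErrors: number): number {
  if (total === 0) return 1;
  return 1 - serverErrors / total;
}

// Fraction of the error budget consumed in the window.
// The budget is 1 - sloTarget, e.g. 0.001 for a 99.9 % SLO.
function budgetConsumed(total: number, serverErrors: number, sloTarget: number): number {
  if (total === 0) return 0;
  const budget = 1 - sloTarget;
  return serverErrors / total / budget;
}

// Minutes of full outage the same SLO would allow under a time-based reading.
function allowedDowntimeMinutes(sloTarget: number, windowDays: number): number {
  return windowDays * 24 * 60 * (1 - sloTarget);
}

// Hypothetical example: 2 000 000 chart reads in 30 days, 1 200 of them 5xx.
const sli = availability(2_000_000, 1_200); // about 0.9994, above the 0.999 target
const used = budgetConsumed(2_000_000, 1_200, 0.999); // about 0.6 of the budget
const downtime = allowedDowntimeMinutes(0.999, 30); // about 43.2 minutes
```

In this example the service meets the target with roughly 40 % of its error budget left for the window.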

2. Key metrics (Prometheus)

| Metric | Type | Labels | Description |
|---|---|---|---|
| `patient_chart_http_request_duration_ms` | Histogram | operation, status_code, tenant_id | HTTP endpoint latency |
| `patient_chart_http_request_total` | Counter | operation, status_code | Request count |
| `patient_chart_domain_event_published_total` | Counter | event_type | Events published to NATS |
| `patient_chart_outbox_unpublished_rows` | Gauge | (none) | Pending outbox rows |
| `patient_chart_problem_created_total` | Counter | tenant_id, clinical_status | Problem creation rate |
| `patient_chart_allergy_created_total` | Counter | tenant_id, category | Allergy creation rate |
| `patient_chart_vitals_recorded_total` | Counter | tenant_id | VitalsSet creation rate |
| `patient_chart_vitals_abnormal_total` | Counter | tenant_id, code, severity | Abnormal vitals flagged |
| `patient_chart_note_signed_total` | Counter | tenant_id, note_type | Signed notes |
| `patient_chart_note_ai_accepted_total` | Counter | tenant_id | AI-assist chunks accepted |
| `patient_chart_breakglass_invoked_total` | Counter | tenant_id | Break-glass events |
| `patient_chart_db_query_duration_ms` | Histogram | query_name | DB query latency |
| `patient_chart_downstream_http_duration_ms` | Histogram | dependency, status_code | Fan-out call latency |
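
The P95 latency SLIs are read from histogram buckets rather than raw samples. As a rough illustration, the sketch below estimates a quantile from cumulative buckets by linear interpolation, in the spirit of Prometheus `histogram_quantile`. It is a simplification, and the bucket bounds and counts are made-up example values, not the service's actual configuration.

```typescript
// One cumulative histogram bucket: number of observations <= upperBoundMs.
interface Bucket {
  upperBoundMs: number; // use Infinity for the +Inf bucket
  cumulativeCount: number;
}

// Estimate quantile q (0..1) from cumulative buckets sorted by upper bound,
// interpolating linearly inside the bucket that contains the target rank.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1 || buckets.length === 0) return NaN;
  const total = buckets[buckets.length - 1].cumulativeCount;
  if (total === 0) return NaN;
  const rank = q * total;
  const idx = buckets.findIndex((b) => b.cumulativeCount >= rank);
  const bucket = buckets[idx];
  const lowerBound = idx === 0 ? 0 : buckets[idx - 1].upperBoundMs;
  const lowerCount = idx === 0 ? 0 : buckets[idx - 1].cumulativeCount;
  // Target rank falls in the +Inf bucket: report the highest finite bound.
  if (!Number.isFinite(bucket.upperBoundMs)) return lowerBound;
  const inBucket = bucket.cumulativeCount - lowerCount;
  if (inBucket === 0) return bucket.upperBoundMs;
  const fraction = (rank - lowerCount) / inBucket;
  return lowerBound + (bucket.upperBoundMs - lowerBound) * fraction;
}

// Hypothetical 5-minute snapshot for operation="list_problems".
const listProblemsBuckets: Bucket[] = [
  { upperBoundMs: 100, cumulativeCount: 50 },
  { upperBoundMs: 250, cumulativeCount: 80 },
  { upperBoundMs: 500, cumulativeCount: 96 },
  { upperBoundMs: 1000, cumulativeCount: 100 },
  { upperBoundMs: Infinity, cumulativeCount: 100 },
];
const p95 = estimateQuantile(0.95, listProblemsBuckets); // 484.375, under the 500 ms target
```

Because the estimate depends on where the bucket boundaries sit, it helps to place a boundary at or near each SLO threshold (300, 500 and 1 000 ms here).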

3. Distributed tracing (OpenTelemetry)

All handlers emit OTEL spans. Key span hierarchy per request:

HTTP /v1/problems (POST)
  ├─ patient_chart.policy_check
  ├─ patient_chart.add_problem (use case)
  │    ├─ patient_chart.terminology_lookup
  │    ├─ patient_chart.db.problem.insert
  │    └─ patient_chart.outbox.write
  └─ patient_chart.event.publish

Span attributes (never include PHI):

  • tenant_id, patient_id, aggregate_type, aggregate_id, operation, correlation_id

OTEL exporter: OTLP → Grafana Tempo.
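
One way to enforce the "never include PHI" rule is to filter candidate attributes through an allowlist before they reach a span. The sketch below assumes that approach. The helper name `safeSpanAttributes` and the sample values are hypothetical, and only the attribute keys come from the list above.

```typescript
type AttributeValue = string | number | boolean;

// Attribute keys listed in this section as safe to attach to spans.
const ALLOWED_SPAN_ATTRIBUTES: ReadonlySet<string> = new Set([
  "tenant_id",
  "patient_id",
  "aggregate_type",
  "aggregate_id",
  "operation",
  "correlation_id",
]);

// Return a copy containing only allowlisted keys; every other key is dropped.
function safeSpanAttributes(
  candidate: Record<string, AttributeValue>,
): Record<string, AttributeValue> {
  const safe: Record<string, AttributeValue> = {};
  for (const [key, value] of Object.entries(candidate)) {
    if (ALLOWED_SPAN_ATTRIBUTES.has(key)) safe[key] = value;
  }
  return safe;
}

const attrs = safeSpanAttributes({
  tenant_id: "tnt_001", // hypothetical ID value
  patient_id: "pat_123",
  operation: "add_problem",
  patient_name: "Jane Doe", // hypothetical PHI field: dropped by the filter
});
```

The filtered object could then be passed to the span, for example via `span.setAttributes(attrs)` from the OpenTelemetry API. An allowlist fails closed: a new field stays out of traces until someone adds it deliberately.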

4. Structured logging

Log format: JSON via pino. Mandatory fields:

| Field | Description |
|---|---|
| level | info / warn / error |
| service | patient-chart-service |
| traceId | OTEL trace id |
| spanId | OTEL span id |
| tenantId | Current tenant (from JWT) |
| correlationId | Request correlation |
| operation | Handler name |
| msg | Human-readable message |

PHI-safe logging: Patient names, dates of birth, and free-text clinical content MUST NOT appear in log output. IDs (pat_*, prb_*, etc.) are permitted.
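
As an illustration of the mandatory fields and the PHI rule together, here is a small sketch that assembles a log record and redacts known PHI-bearing keys. It does not use pino itself. The function `buildLogRecord`, the key names in `PHI_KEYS`, and the sample values are all assumptions made for the example.

```typescript
type Level = "info" | "warn" | "error";

interface RequestContext {
  traceId: string;
  spanId: string;
  tenantId: string;
  correlationId: string;
  operation: string;
}

// Hypothetical names of fields that could carry PHI; a real list would come from the data model.
const PHI_KEYS: ReadonlySet<string> = new Set(["patientName", "dateOfBirth", "noteText"]);

function buildLogRecord(
  level: Level,
  msg: string,
  ctx: RequestContext,
  extra: Record<string, unknown> = {},
): Record<string, unknown> {
  const cleaned: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(extra)) {
    cleaned[key] = PHI_KEYS.has(key) ? "[REDACTED]" : value;
  }
  // Mandatory fields are spread last so extra keys cannot overwrite them.
  return { ...cleaned, level, service: "patient-chart-service", ...ctx, msg };
}

const record = buildLogRecord(
  "info",
  "problem added",
  {
    traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
    spanId: "00f067aa0ba902b7",
    tenantId: "tnt_001",
    correlationId: "corr_789",
    operation: "add_problem",
  },
  { problemId: "prb_42", patientName: "Jane Doe" },
);
const line = JSON.stringify(record); // one JSON object per log line
```

With pino, a similar effect is usually achieved through its `redact` option. Note that key-based redaction only covers known field names, so free-text content still has to be kept out of `msg` by convention and review.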

5. Dashboards

| Dashboard | Location | Key panels |
|---|---|---|
| Patient Chart — Overview | Grafana ghasi/patient-chart | Request rate, error rate, P95 latency per operation, outbox lag |
| Patient Chart — Clinical Activity | Grafana ghasi/patient-chart-clinical | Problems/allergies/vitals/notes created per hour, abnormal vitals rate, break-glass events |
| Patient Chart — Downstream Deps | Grafana ghasi/patient-chart-deps | Latency and error rates for each downstream dependency |
| Patient Chart — SLO Burn | Grafana ghasi/patient-chart-slo | Multi-window burn rate for availability and latency SLOs |
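
For readers unfamiliar with burn rate, the sketch below shows the basic calculation behind a multi-window burn-rate panel. The 14.4 threshold and the 1 h / 5 min window pairing are a common convention for 30-day SLOs and are assumed here for illustration; the actual dashboard configuration is not specified in this document.

```typescript
// Burn rate: observed error ratio divided by the error budget (1 - SLO target).
// A burn rate of 1 spends the budget exactly over the SLO window.
function burnRate(errorRatio: number, sloTarget: number): number {
  return errorRatio / (1 - sloTarget);
}

// Multi-window check: flag only when both the long and the short window exceed
// the threshold, so an incident that has already recovered stops alerting quickly.
function multiWindowBurning(
  longWindowErrorRatio: number,
  shortWindowErrorRatio: number,
  sloTarget: number,
  threshold: number,
): boolean {
  return (
    burnRate(longWindowErrorRatio, sloTarget) > threshold &&
    burnRate(shortWindowErrorRatio, sloTarget) > threshold
  );
}

// A sustained 1 % error rate against a 99.9 % SLO burns at roughly 10x,
// which would exhaust a 30-day budget in about 3 days.
const onePercent = burnRate(0.01, 0.999);

// Hypothetical readings: 2 % errors over the last hour and over the last 5 minutes.
const stillBurning = multiWindowBurning(0.02, 0.02, 0.999, 14.4);
// Same hour, but the last 5 minutes have recovered to 0.1 % errors.
const recovered = multiWindowBurning(0.02, 0.001, 0.999, 14.4);
```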

6. Alerts

| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| ChartHighErrorRate | Error rate > 1 % for > 5 min | Critical | /runbooks/patient-chart/high-error-rate |
| ChartP95LatencyHigh | P95 > 1.5 s for > 10 min | Warning | /runbooks/patient-chart/high-latency |
| ChartOutboxLag | Outbox unpublished > 100 rows for > 2 min | Warning | /runbooks/patient-chart/outbox-lag |
| ChartDBConnectionError | DB connection error rate > 0 for > 1 min | Critical | /runbooks/patient-chart/db-failure |
| ChartBreakGlassSpike | Break-glass events > 10 in 1 min for same tenant | Warning | /runbooks/patient-chart/breakglass-spike |
| ChartAbnormalVitalsUnreviewed | `patient_chart_vitals_abnormal_total` high rate, no cosign | Info | /runbooks/patient-chart/abnormal-vitals |
| ChartTenantIsolationTestFailed | CI `tenant-isolation.spec.ts` failed in last deploy | Critical | Block deploy |
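
Most conditions above take the form "above a threshold for longer than some duration". The sketch below shows one way such a sustained condition can be evaluated over a series of samples. It is a simplified stand-in for the alerting engine's own `for` handling, and the sample data and function name are hypothetical.

```typescript
interface Sample {
  timestampMs: number;
  value: number;
}

// Returns true when the value has been above the threshold in every sample
// from at least `forMs` before `nowMs` up to the newest sample.
// Samples must be sorted by ascending timestamp.
function firesFor(samples: Sample[], threshold: number, forMs: number, nowMs: number): boolean {
  const windowStart = nowMs - forMs;
  let breachStart: number | null = null;
  // Walk back from the newest sample until the condition stops holding.
  for (let i = samples.length - 1; i >= 0; i--) {
    if (samples[i].value > threshold) {
      breachStart = samples[i].timestampMs;
    } else {
      break;
    }
  }
  return breachStart !== null && breachStart <= windowStart;
}

// Hypothetical outbox gauge readings taken every 30 s.
const sustained: Sample[] = [120, 130, 150, 140, 160].map((value, i) => ({
  timestampMs: i * 30_000,
  value,
}));
const dipped: Sample[] = [90, 130, 150, 140, 160].map((value, i) => ({
  timestampMs: i * 30_000,
  value,
}));

// ChartOutboxLag-style check: more than 100 rows for 2 minutes.
const alertA = firesFor(sustained, 100, 120_000, 120_000); // true: above 100 for the full 2 min
const alertB = firesFor(dipped, 100, 120_000, 120_000); // false: breach began only 90 s ago
```

Requiring the condition to hold continuously keeps short spikes, such as a brief outbox backlog during a deploy, from paging anyone.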

7. On-call runbook index

| Runbook | Trigger |
|---|---|
| /runbooks/patient-chart/high-error-rate.md | ChartHighErrorRate |
| /runbooks/patient-chart/db-failure.md | ChartDBConnectionError |
| /runbooks/patient-chart/outbox-lag.md | ChartOutboxLag |
| /runbooks/patient-chart/breakglass-spike.md | ChartBreakGlassSpike |
| /runbooks/patient-chart/migration-failure.md | Migration job exit code ≠ 0 |