
Interop Service — Observability

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD


1. SLIs and SLOs

| SLI | Target SLO | Measurement |
| --- | --- | --- |
| FHIR read p95 latency | < 500 ms (gateway overhead, excluding owning service) | http_request_duration_seconds p95 |
| FHIR write p95 latency | < 1 s (gateway + owning service combined) | Same histogram |
| HL7 v2 ACK delivery | < 2 s from message receipt to ACK | interop_hl7_ack_duration_seconds p95 |
| HL7 v2 processing success rate | ≥ 99% | interop_hl7_processed_total{outcome="success"} / total |
| FHIR gateway availability | ≥ 99.9% | Error rate < 0.1% over 5 min |
| Bulk export completion rate | ≥ 99% of jobs complete within 4 hrs | interop_bulk_export_duration_seconds p99 |
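
The latency SLOs above are read off histogram buckets, not raw samples. As a minimal sketch of how a p95 can be estimated from cumulative buckets, the Python below interpolates linearly inside the bucket that contains the target rank, in the spirit of Prometheus's `histogram_quantile`. The function names, bucket bounds, and counts are illustrative assumptions, not values taken from the service.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) sorted by bound,
    ending with a +Inf bucket. Layout and values are assumed for illustration.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations, so no estimate
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # fall back to the last known finite bound.
                return prev_bound
            # Linear interpolation inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def meets_latency_slo(buckets, target_seconds, q=0.95):
    """True when the estimated quantile is strictly below the target."""
    return histogram_quantile(q, buckets) < target_seconds


# Hypothetical FHIR read buckets: 100 requests in the window.
read_buckets = [(0.1, 50), (0.25, 80), (0.5, 96), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, read_buckets))
print(meets_latency_slo(read_buckets, target_seconds=0.5))
```

Because the estimate is interpolated, its accuracy depends on bucket boundaries. Placing a boundary at each SLO threshold (0.5 s, 1 s, 2 s) keeps the pass/fail decision exact at the threshold itself.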

2. Key Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| interop_fhir_requests_total | Counter | resource_type, operation, tenant_id, outcome | FHIR gateway requests |
| interop_fhir_routing_duration_seconds | Histogram | resource_type, service | Time to proxy to owning service |
| interop_hl7_messages_total | Counter | connector_id, message_type, direction, tenant_id | HL7 messages processed |
| interop_hl7_ack_duration_seconds | Histogram | connector_id, tenant_id | ACK delivery time |
| interop_hl7_failed_total | Counter | connector_id, message_type, tenant_id | Failed messages |
| interop_hl7_dead_lettered_total | Counter | connector_id, tenant_id | Dead-lettered messages |
| interop_bulk_export_jobs_total | Counter | status, tenant_id | Export jobs |
| interop_connector_status | Gauge | connector_id, tenant_id | 1=active, 0=inactive |
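
To illustrate how a labelled counter from the table above behaves, here is a minimal stdlib-only sketch. It is not the service's instrumentation code: the `LabeledCounter` class and the label values (`lab-01`, `ADT_A01`, `t1`) are hypothetical, and a real deployment would more likely use a Prometheus client library. Only the metric name and label names come from the table.

```python
from collections import defaultdict


class LabeledCounter:
    """A tiny in-memory counter keyed by a fixed set of label names."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Reject missing or unexpected labels so every series has the same shape.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}")
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] += amount

    def render(self):
        """Render series in a Prometheus-style text layout (no value escaping)."""
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{name}="{val}"' for name, val in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


hl7_messages = LabeledCounter(
    "interop_hl7_messages_total",
    ["connector_id", "message_type", "direction", "tenant_id"],
)
# Hypothetical label values for two inbound messages on one connector.
for _ in range(2):
    hl7_messages.inc(
        connector_id="lab-01",
        message_type="ADT_A01",
        direction="inbound",
        tenant_id="t1",
    )
print(hl7_messages.render())
```

Each distinct combination of label values becomes its own time series, so labels such as connector_id and tenant_id should stay bounded in cardinality.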

3. Dashboards

| Dashboard | Key panels |
| --- | --- |
| FHIR Gateway | Requests/min by resource type, routing latency, error rate by owning service |
| HL7 v2 Integration | Messages/hr by connector, processing success rate, DLQ count, ACK latency |
| Connector Health | Per-connector status, last message time, error rate |
| Bulk Export | Active jobs, completion rate, export file sizes |

4. Alerts

| Alert | Condition | Severity | Runbook |
| --- | --- | --- | --- |
| FHIR gateway error rate | Error rate > 1% sustained for 5 min | P2 | runbooks/interop-fhir-errors.md |
| HL7 DLQ growing | interop_hl7_dead_lettered_total rate > 0 | P2 | runbooks/interop-hl7-dlq.md |
| Connector offline | interop_connector_status == 0 for active connector | P2 | runbooks/interop-connector-down.md |
| Owning service routing failure | Target service 5xx rate > 5% | P2 | runbooks/interop-routing-failure.md |
| Bulk export stuck | Job in in-progress state for > 4 hours | P3 | runbooks/interop-bulk-export-stuck.md |
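
Two of the conditions above are time-based: the error-rate alert must hold for 5 minutes, and the bulk export alert depends on how long a job has been running. The sketch below shows one way such checks could be evaluated. It is an illustration under assumptions: the sample layout, the job fields (`id`, `status`, `started_at`), and the function names are hypothetical, and in practice the alerting system would normally evaluate these rules itself.

```python
from datetime import datetime, timedelta, timezone


def is_sustained_breach(samples, threshold, window):
    """True if the newest unbroken run of samples above threshold spans the window.

    samples: list of (timestamp, value) pairs, in any order.
    """
    if not samples:
        return False
    ordered = sorted(samples)
    newest_ts = ordered[-1][0]
    run_start = None
    # Walk back from the newest sample until the condition stops holding.
    for ts, value in reversed(ordered):
        if value <= threshold:
            break
        run_start = ts
    return run_start is not None and newest_ts - run_start >= window


def stuck_jobs(jobs, now, limit=timedelta(hours=4)):
    """Return ids of jobs still in-progress for longer than the limit."""
    return [
        job["id"]
        for job in jobs
        if job["status"] == "in-progress" and now - job["started_at"] > limit
    ]


now = datetime(2026, 4, 18, 12, 0, tzinfo=timezone.utc)
# Hypothetical error-rate samples, one per minute, all at 2%.
error_rates = [(now - timedelta(minutes=m), 0.02) for m in range(6)]
print(is_sustained_breach(error_rates, threshold=0.01, window=timedelta(minutes=5)))

# Hypothetical export jobs.
jobs = [
    {"id": "a", "status": "in-progress", "started_at": now - timedelta(hours=5)},
    {"id": "b", "status": "in-progress", "started_at": now - timedelta(hours=1)},
    {"id": "c", "status": "completed", "started_at": now - timedelta(hours=6)},
]
print(stuck_jobs(jobs, now))
```

Requiring the breach to be continuous keeps a single bad scrape from paging anyone, at the cost of delaying the alert by the length of the window.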