Interop Service — Observability
Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD
1. SLIs and SLOs
| SLI | Target SLO | Measurement |
|---|---|---|
| FHIR read p95 latency | < 500 ms (gateway overhead, excluding owning service) | http_request_duration_seconds p95 |
| FHIR write p95 latency | < 1 s (gateway + owning service combined) | http_request_duration_seconds p95 (same histogram as FHIR read) |
| HL7 v2 ACK delivery | < 2 s from message receipt to ACK | interop_hl7_ack_duration_seconds p95 |
| HL7 v2 processing success rate | ≥ 99% | interop_hl7_processed_total{outcome="success"} / interop_hl7_processed_total (all outcomes) |
| FHIR gateway availability | ≥ 99.9% | Error rate < 0.1% over 5 min |
| Bulk export completion rate | ≥ 99% of jobs complete within 4 hours | interop_bulk_export_duration_seconds p99 ≤ 4 hours |
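
The latency SLIs above are read as percentiles from histograms. The sketch below shows one way a percentile can be estimated from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank, similar in spirit to what Prometheus's `histogram_quantile` does. The function name, bucket bounds, and counts are hypothetical and are not taken from the service's actual configuration.

```python
from bisect import bisect_left
from typing import Sequence


def estimate_quantile(
    q: float,
    upper_bounds: Sequence[float],
    cumulative_counts: Sequence[int],
) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    upper_bounds are the finite bucket bounds in ascending order (seconds).
    cumulative_counts[i] is the number of observations <= upper_bounds[i].
    Simplification: observations above the last bound are not modelled.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if len(upper_bounds) != len(cumulative_counts) or not upper_bounds:
        raise ValueError("bounds and counts must be non-empty and equal length")
    total = cumulative_counts[-1]
    if total == 0:
        raise ValueError("histogram has no observations")

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(cumulative_counts, rank)
    lower_bound = upper_bounds[i - 1] if i > 0 else 0.0
    lower_count = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - lower_count
    if in_bucket == 0:
        return upper_bounds[i]
    # Assume observations are spread evenly inside the bucket.
    fraction = (rank - lower_count) / in_bucket
    return lower_bound + (upper_bounds[i] - lower_bound) * fraction


# Hypothetical bucket data for FHIR read latency (seconds).
bounds = [0.1, 0.25, 0.5, 1.0]
counts = [50, 80, 96, 100]
p95 = estimate_quantile(0.95, bounds, counts)
meets_read_slo = p95 < 0.5  # FHIR read target: p95 < 500 ms
```

Because the estimate interpolates within a bucket, its precision depends on how closely the bucket bounds bracket the SLO thresholds (500 ms, 1 s, 2 s).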
2. Key Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| interop_fhir_requests_total | Counter | resource_type, operation, tenant_id, outcome | FHIR gateway requests |
| interop_fhir_routing_duration_seconds | Histogram | resource_type, service | Time to proxy to owning service |
| interop_hl7_messages_total | Counter | connector_id, message_type, direction, tenant_id | HL7 messages processed |
| interop_hl7_ack_duration_seconds | Histogram | connector_id, tenant_id | ACK delivery time |
| interop_hl7_failed_total | Counter | connector_id, message_type, tenant_id | Failed messages |
| interop_hl7_dead_lettered_total | Counter | connector_id, tenant_id | Dead-lettered messages |
| interop_bulk_export_jobs_total | Counter | status, tenant_id | Export jobs |
| interop_connector_status | Gauge | connector_id, tenant_id | 1=active, 0=inactive |
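
To illustrate how the label sets in the table above constrain instrumentation, here is a minimal, standard-library-only sketch of a labelled counter that rejects increments with missing or unexpected labels and renders samples in a Prometheus-style text form. The `LabeledCounter` class and the sample label values are hypothetical; a real service would more likely use an established metrics client library, and this sketch does not escape special characters in label values.

```python
from collections import defaultdict
from typing import Dict, Tuple


class LabeledCounter:
    """Minimal in-memory counter keyed by a fixed, ordered set of label names."""

    def __init__(self, name: str, label_names: Tuple[str, ...]) -> None:
        self.name = name
        self.label_names = label_names
        self._values: Dict[Tuple[str, ...], float] = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        """Increase the series identified by `labels` by a non-negative amount."""
        if set(labels) != set(self.label_names):
            raise ValueError(
                f"{self.name} expects labels {self.label_names}, got {tuple(labels)}"
            )
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self) -> str:
        """Render all series as text lines, one per label combination."""
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


# Label names follow the table; the label values are made up for illustration.
dead_lettered = LabeledCounter(
    "interop_hl7_dead_lettered_total", ("connector_id", "tenant_id")
)
dead_lettered.inc(connector_id="conn-1", tenant_id="tenant-a")
dead_lettered.inc(2, connector_id="conn-1", tenant_id="tenant-a")
rendered = dead_lettered.render()
```

Fixing the label names at construction time keeps every series of a metric comparable, which the per-connector and per-tenant dashboard panels rely on.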
3. Dashboards
| Dashboard | Key panels |
|---|---|
| FHIR Gateway | Requests/min by resource type, routing latency, error rate by owning service |
| HL7 v2 Integration | Messages/hr by connector, processing success rate, DLQ count, ACK latency |
| Connector Health | Per-connector status, last message time, error rate |
| Bulk Export | Active jobs, completion rate, export file sizes |
4. Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| FHIR gateway error rate | Error rate > 1% sustained 5 min | P2 | runbooks/interop-fhir-errors.md |
| HL7 DLQ growing | interop_hl7_dead_lettered_total rate > 0 | P2 | runbooks/interop-hl7-dlq.md |
| Connector offline | interop_connector_status == 0 for a connector expected to be active | P2 | runbooks/interop-connector-down.md |
| Owning service routing failure | Target service 5xx rate > 5% | P2 | runbooks/interop-routing-failure.md |
| Bulk export stuck | Job in in-progress state for > 4 hours | P3 | runbooks/interop-bulk-export-stuck.md |
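
Several conditions above depend on a threshold being breached for a sustained period (for example, error rate > 1% sustained 5 min). The sketch below shows one way such a check could be evaluated over timestamped samples. The function name, sampling interval, and sample values are assumptions for illustration; in practice the alerting system's own rule engine would normally perform this evaluation.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, observed value)


def sustained_above(
    samples: Sequence[Sample], threshold: float, window_s: float
) -> bool:
    """Return True if every sample in the trailing window exceeds threshold.

    samples must be sorted by timestamp. The window ends at the newest
    sample. If the history does not reach back a full window, the
    condition is treated as not yet sustained.
    """
    if not samples:
        return False
    newest = samples[-1][0]
    window_start = newest - window_s
    if samples[0][0] > window_start:
        return False  # not enough history to cover the whole window
    in_window = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in in_window)


# Hypothetical error-rate samples taken once a minute (fraction of requests).
error_rates = [(t * 60.0, 0.02) for t in range(6)]  # t = 0 s .. 300 s
fires = sustained_above(error_rates, threshold=0.01, window_s=300.0)

# A single dip below the threshold inside the window resets the condition.
with_dip = list(error_rates)
with_dip[3] = (180.0, 0.005)
fires_with_dip = sustained_above(with_dip, threshold=0.01, window_s=300.0)
```

Requiring the full window to be covered avoids firing on a freshly started gateway that has only reported one or two elevated samples.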