
Interop Service — Failure Modes

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD


1. Failure Catalog

| # | Failure | User impact | Detection | Mitigation |
|---|---|---|---|---|
| F-01 | Owning service unreachable (FHIR routing) | FHIR reads/writes fail for that resource type | 5xx spike per owning service; alert P2 | Return 503 with an OperationOutcome; circuit breaker per owning service; alert SRE |
| F-02 | PostgreSQL unavailable | All API calls fail; HL7 ACKs are not sent (messages are lost if the MLLP connection drops) | Health probe fails | Fail over to replica; reconnect MLLP after DB recovery; alert P1 |
| F-03 | NATS unavailable | Outbound HL7 events not triggered; interop events not published | Outbox lag alert | Outbox accumulates; events replayed on reconnect |
| F-04 | MLLP client cert expired (outbound) | Outbound HL7 v2 messages rejected by the external system | TLS handshake errors; connector error-rate spike | Pre-rotation alert 30 days before expiry; hot-swap cert via connector update |
| F-05 | HL7 v2 parse error (unknown segment) | Message stored but not processed; sent to DLQ | `interop_hl7_dead_lettered_total` | Store raw message; alert integration admin; reprocess manually after mapping fix |
| F-06 | ABAC service unavailable | Patient-linked FHIR reads blocked | `abac_check_failures` spike; alert | Circuit breaker; deny by default while ABAC is unavailable (fail-safe) |
| F-07 | Bulk export partial failure | NDJSON export incomplete | Job status reports partial; errors listed in manifest | Client re-triggers export; partial files flagged in manifest |
| F-08 | Profile validation blocks a valid resource | External partner writes rejected unexpectedly | `interop_profile_validation_failures` | Configurable validation mode: `error` (block) vs `warn` (log only); admin override |
| F-09 | Redis cache miss (CapabilityStatement) | CapabilityStatement regenerated on every request | Latency spike on `GET /metadata` | Fall back to in-memory cached value; rebuild from routing table |
| F-10 | Connector port conflict (duplicate port) | MLLP listener fails to start | Startup error log | Check port uniqueness on connector activation; return error to admin |
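F-01 pairs a per-service circuit breaker with a 503 OperationOutcome. The Python sketch below shows one way the two could fit together; the thresholds, the `route` helper, and the use of `ConnectionError` as the failure signal are assumptions, and the breaker is not thread-safe. The FHIR issue code `transient` is one reasonable choice for a temporary outage.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical thresholds: the catalog does not specify values.
FAILURE_THRESHOLD = 5   # consecutive failures before the breaker opens
RESET_TIMEOUT_S = 30.0  # seconds the breaker stays open before a trial call


@dataclass
class CircuitBreaker:
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: pass calls through
        # Open: reject until the timeout elapses, then allow a trial call (half-open).
        return now - self.opened_at >= RESET_TIMEOUT_S

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= FAILURE_THRESHOLD:
            self.opened_at = now  # (re)open the breaker


def operation_outcome_503(service: str) -> Tuple[int, dict]:
    """FHIR OperationOutcome returned while the owning service is unreachable."""
    return 503, {
        "resourceType": "OperationOutcome",
        "issue": [{
            "severity": "error",
            "code": "transient",
            "diagnostics": f"Owning service '{service}' is unavailable",
        }],
    }


# One breaker per owning service, so one outage does not block routing to others.
breakers: Dict[str, CircuitBreaker] = {}


def route(service: str, call: Callable[[], dict],
          now: Callable[[], float] = time.monotonic) -> Tuple[int, dict]:
    breaker = breakers.setdefault(service, CircuitBreaker())
    if not breaker.allow(now()):
        return operation_outcome_503(service)  # fail fast without calling the service
    try:
        result = call()
    except ConnectionError:
        breaker.record_failure(now())
        return operation_outcome_503(service)
    breaker.record_success()
    return 200, result
```

Keeping a breaker per owning service means an outage in one service only returns 503 for the resource types that service owns, which matches the F-01 user impact.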
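F-02 suggests that an inbound HL7 v2 message is acknowledged only after it has been persisted. A minimal sketch of that persist-then-ACK ordering over MLLP framing (start byte 0x0B, trailer 0x1C 0x0D) follows; the `persist` callback, the abbreviated ACK header (no timestamp), and version `2.5` are illustrative assumptions rather than the service's actual configuration.

```python
from typing import Callable, Optional

START_BLOCK = b"\x0b"    # MLLP <VT>
END_BLOCK = b"\x1c\x0d"  # MLLP <FS><CR>


def frame(message: str) -> bytes:
    return START_BLOCK + message.encode("utf-8") + END_BLOCK  # assumes UTF-8 payloads


def unframe(data: bytes) -> str:
    if not (data.startswith(START_BLOCK) and data.endswith(END_BLOCK)):
        raise ValueError("not a complete MLLP frame")
    return data[len(START_BLOCK):-len(END_BLOCK)].decode("utf-8")


def control_id(message: str) -> str:
    # str.split puts MSH-2 at index 1, so MSH-10 (message control ID) is index 9.
    return message.split("\r")[0].split("|")[9]


def handle(data: bytes, persist: Callable[[str], None]) -> Optional[bytes]:
    message = unframe(data)
    try:
        persist(message)  # durable write; stands in for the PostgreSQL insert
    except ConnectionError:
        # F-02: no ACK while the database is down. Recovery depends on the
        # sender retransmitting; if the connection drops, the message may be lost.
        return None
    cid = control_id(message)
    ack = "\r".join([
        "MSH|^~\\&|INTEROP||||||ACK|ACK" + cid + "|P|2.5",  # abbreviated header
        "MSA|AA|" + cid,  # AA = application accept, echoing the original control ID
    ])
    return frame(ack)
```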
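F-03 relies on an outbox: events are recorded locally and relayed to NATS, so a NATS outage only grows the backlog. The sketch below models the relay with an in-memory list and a `publish` callback that raises `ConnectionError` when NATS is down. In the real service the outbox would presumably be a PostgreSQL table written in the same transaction as the state change; that, and the subject names in the comments, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class OutboxRelay:
    # Hypothetical publish callback, e.g. a thin wrapper over a NATS client.
    publish: Callable[[str, bytes], None]
    pending: List[Tuple[str, bytes]] = field(default_factory=list)

    def enqueue(self, subject: str, payload: bytes) -> None:
        # Stands in for inserting an outbox row alongside the business write.
        self.pending.append((subject, payload))

    def lag(self) -> int:
        return len(self.pending)  # value the outbox lag alert would watch

    def drain(self) -> int:
        """Publish pending events in order; stop at the first failure."""
        sent = 0
        while self.pending:
            subject, payload = self.pending[0]
            try:
                self.publish(subject, payload)
            except ConnectionError:
                break  # NATS unavailable: keep the event for the next drain
            self.pending.pop(0)  # remove only after a successful publish
            sent += 1
        return sent
```

Delivery is at-least-once: if the process dies after a successful publish but before the entry is removed, the event is sent again on the next drain, so consumers should be idempotent.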
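The F-04 pre-rotation alert reduces to comparing a certificate's notAfter date against a 30-day window. This sketch uses the standard library's `ssl.cert_time_to_seconds`, which parses the OpenSSL date text that Python's `ssl` module uses (for example in `getpeercert()` results); how the service obtains the notAfter string for its own client certificate is left open.

```python
import ssl
import time
from typing import Optional

PRE_ROTATION_DAYS = 30  # alert window from F-04
SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: float) -> float:
    # not_after looks like "Jun  1 12:00:00 2026 GMT" (OpenSSL text format).
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def needs_rotation_alert(not_after: str, now: Optional[float] = None) -> bool:
    if now is None:
        now = time.time()
    # Also true once the certificate has already expired (negative days left).
    return days_until_expiry(not_after, now) <= PRE_ROTATION_DAYS
```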
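For F-05, a message containing a segment the mappings do not recognize is kept raw and dead-lettered rather than dropped. The sketch assumes a hypothetical allow-list of segment IDs and uses a plain integer in place of the `interop_hl7_dead_lettered_total` metric; `reprocess` stands in for the manual reprocess step after a mapping fix.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical allow-list; a real deployment would derive it from its mappings.
KNOWN_SEGMENTS: Set[str] = {"MSH", "EVN", "PID", "PV1", "ORC", "OBR", "OBX"}


@dataclass
class Inbox:
    processed: List[str] = field(default_factory=list)
    dead_letter: List[dict] = field(default_factory=list)
    dead_lettered_total: int = 0  # stands in for interop_hl7_dead_lettered_total

    def receive(self, raw: str) -> bool:
        segments = [s for s in raw.split("\r") if s]
        unknown = sorted({s[:3] for s in segments} - KNOWN_SEGMENTS)
        if unknown:
            # Keep the raw text so it can be replayed once the mapping is fixed.
            self.dead_letter.append(
                {"raw": raw, "reason": f"unknown segment(s): {', '.join(unknown)}"})
            self.dead_lettered_total += 1
            return False
        self.processed.append(raw)
        return True

    def reprocess(self) -> int:
        """Retry every dead-lettered message; return how many now succeed."""
        retry, self.dead_letter = self.dead_letter, []
        return sum(self.receive(item["raw"]) for item in retry)
```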
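F-06's deny-by-default rule means every path other than an explicit permit ends in a denial. A sketch, assuming a hypothetical `abac_check` callable and a `breaker_open` flag supplied by whatever circuit breaker wraps the ABAC client:

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"


def authorize(abac_check: Callable[[str, str], bool], subject: str,
              patient_id: str, breaker_open: bool = False) -> Decision:
    # Breaker open: ABAC is considered unavailable, so deny without calling it.
    if breaker_open:
        return Decision.DENY
    try:
        allowed = abac_check(subject, patient_id)
    except (ConnectionError, TimeoutError):
        return Decision.DENY  # errors never fall through to a permit
    return Decision.PERMIT if allowed else Decision.DENY
```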
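For F-07, the FHIR Bulk Data completion manifest already carries `output` and `error` arrays. The catalog does not say how partial files are flagged, so this sketch marks them with a hypothetical `extension.partial` key; the per-file result dictionaries and the `should_retrigger` rule are also assumptions.

```python
from typing import Dict, List


def build_manifest(request_url: str, transaction_time: str,
                   files: List[Dict]) -> dict:
    """files: hypothetical per-file results, e.g.
    {"type": "Patient", "url": "...", "count": 10, "complete": True}"""
    output, error = [], []
    for f in files:
        entry = {"type": f["type"], "url": f["url"], "count": f["count"]}
        if not f["complete"]:
            entry["extension"] = {"partial": True}  # hypothetical F-07 flag
        output.append(entry)
        if f.get("error_url"):
            # Bulk Data error files are NDJSON of OperationOutcome resources.
            error.append({"type": "OperationOutcome", "url": f["error_url"]})
    return {
        "transactionTime": transaction_time,
        "request": request_url,
        "requiresAccessToken": True,
        "output": output,
        "error": error,
    }


def should_retrigger(manifest: dict) -> bool:
    # Client-side rule: re-run the export if any file is partial or errors exist.
    partial = any(o.get("extension", {}).get("partial") for o in manifest["output"])
    return partial or bool(manifest["error"])
```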
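F-08's two validation modes can be expressed as a small decision function. In this sketch `validate` is a hypothetical callback returning a list of issue strings, and the admin override is modeled as a set of profile URLs (matched against `meta.profile`) that are downgraded to warn-only; both are assumptions.

```python
import logging
from enum import Enum
from typing import AbstractSet, Callable, List

log = logging.getLogger("interop.validation")


class ValidationMode(Enum):
    ERROR = "error"  # block the write
    WARN = "warn"    # log the issues and accept the write


def accept_write(resource: dict, validate: Callable[[dict], List[str]],
                 mode: ValidationMode,
                 overrides: AbstractSet[str] = frozenset()) -> bool:
    issues = validate(resource)
    if not issues:
        return True
    # Hypothetical admin override: listed profile URLs are treated as warn-only.
    profiles = set(resource.get("meta", {}).get("profile", []))
    if mode is ValidationMode.WARN or profiles & overrides:
        log.warning("profile validation issues (write accepted): %s", issues)
        return True
    return False
```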
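F-09's fallback order (Redis, then the in-memory copy, then a rebuild from the routing table) can be sketched as follows. `redis_get` is a hypothetical wrapper around a Redis GET, the cache key is invented, and the rebuilt CapabilityStatement is abbreviated: a real one also needs elements such as `date`, `fhirVersion`, and `format`.

```python
import json
from typing import Callable, Dict, Optional


class CapabilityCache:
    def __init__(self, redis_get: Callable[[str], Optional[str]],
                 routing_table: Dict[str, str]):
        self.redis_get = redis_get          # returns None on a cache miss
        self.routing_table = routing_table  # resource type -> owning service
        self.local: Optional[dict] = None   # in-memory fallback copy

    def rebuild(self) -> dict:
        # Abbreviated statement listing the routed resource types.
        return {
            "resourceType": "CapabilityStatement",
            "status": "active",
            "kind": "instance",
            "rest": [{"mode": "server",
                      "resource": [{"type": t} for t in sorted(self.routing_table)]}],
        }

    def get(self) -> dict:
        try:
            cached = self.redis_get("interop:capability-statement")
        except ConnectionError:
            cached = None  # treat Redis being down like a miss
        if cached is not None:
            self.local = json.loads(cached)
            return self.local
        if self.local is None:
            self.local = self.rebuild()  # rebuild once, then serve the copy
        return self.local
```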
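The F-10 check can run against the registry of active connectors before any MLLP listener binds its port. A sketch with a hypothetical `ConnectorRegistry`; it only catches conflicts between connectors it knows about, so a port held by an unrelated process would still surface as a bind error at startup.

```python
from dataclasses import dataclass, field
from typing import Dict


class PortConflictError(Exception):
    """Returned to the admin when a connector's port is already taken."""


@dataclass
class ConnectorRegistry:
    active_ports: Dict[int, str] = field(default_factory=dict)  # port -> connector id

    def activate(self, connector_id: str, port: int) -> None:
        owner = self.active_ports.get(port)
        if owner is not None and owner != connector_id:
            # Reject before the listener tries to bind, with an actionable message.
            raise PortConflictError(f"port {port} already used by connector {owner}")
        self.active_ports[port] = connector_id  # re-activation is idempotent

    def deactivate(self, connector_id: str) -> None:
        for port, owner in list(self.active_ports.items()):
            if owner == connector_id:
                del self.active_ports[port]
```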