Laboratory Service — Failure Modes

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD


1. Failure Catalog

| # | Failure | User impact | Detection | Mitigation |
|---|---------|-------------|-----------|------------|
| F-01 | PostgreSQL unavailable | All reads/writes fail; lab workflow halted | Health probe db.ping() fails; alert fires | Replica failover; read-only mode displays last cached worklist |
| F-02 | NATS unavailable | Events not published; critical alerts delayed | Outbox lag alert; NATS health check fails | Outbox accumulates; events published on reconnect; max retry |
| F-03 | FHIR gateway unreachable | Result release fails; chart not updated | lab_fhir_publish_total{outcome="error"} spike | Retry via outbox with exponential backoff; alert on lag > 5 min |
| F-04 | Terminology service unavailable | LOINC validation skipped | Timeout from terminology client | Proceed with terminology_validated=false; re-validate on reconnect |
| F-05 | Critical alert event not consumed | Critical value goes unnotified | No matching ack within escalation window | Escalation timer republishes; on-call alert via separate channel |
| F-06 | Duplicate order event | Duplicate accession created | Idempotency key check on accession table | UNIQUE(tenant_id, order_id) constraint; inbox dedup |
| F-07 | Analyzer import parse error | Result not entered; technologist must enter manually | LAB_ANALYZER_PARSE_ERROR alert; raw payload stored | Store raw message; notify lab admin; manual fallback |
| F-08 | Optimistic lock conflict on result | 409 CONFLICT returned to client | HTTP 409 spike | UI retries with fresh version; technologist re-enters |
| F-09 | Result release partial failure (some FHIR publishes fail) | Some observations missing from chart | Partial release error event; outbox items remain unpublished | Per-observation outbox entries retry independently |
| F-10 | Keycloak unavailable | All authenticated requests fail | Health check; 401/503 rate spike | Local JWT cache for 60 s; requests fail gracefully after |
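
The outbox mitigations for F-03 and F-09 (exponential backoff, with each observation retried independently) can be sketched as follows. This is a minimal illustration, not the service's actual implementation: `OutboxEntry`, `drain_outbox`, `backoff_delay`, and the base and cap values are all hypothetical, and the `publish` callable stands in for whatever client talks to the FHIR gateway.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed values; the catalog does not specify the backoff parameters.
BASE_DELAY_S = 1.0
MAX_DELAY_S = 300.0  # capped at 5 min, the lag threshold that F-03 alerts on


def backoff_delay(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


@dataclass
class OutboxEntry:
    """One outbox row per observation, so a failure delays only that observation."""
    observation_id: str
    attempts: int = 0
    next_attempt_at: float = 0.0
    published: bool = False


def drain_outbox(entries: List[OutboxEntry],
                 publish: Callable[[str], None],
                 now: float) -> None:
    """Try every due entry once; failed entries are rescheduled individually."""
    for entry in entries:
        if entry.published or entry.next_attempt_at > now:
            continue  # already delivered, or still backing off
        try:
            publish(entry.observation_id)
            entry.published = True
        except Exception:
            # Leave the entry in the outbox and push its next attempt out.
            entry.attempts += 1
            entry.next_attempt_at = now + backoff_delay(entry.attempts)
```

Because each entry carries its own attempt counter and schedule, a partial release (F-09) leaves only the failed observations pending while the rest reach the chart. A production version would likely add jitter to the delay and persist the entries in the same transaction as the result write.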

2. Degraded-Mode Behavior

| Scenario | Behavior |
|----------|----------|
| NATS down | Lab workflow continues; events queued in outbox; critical alerts delayed |
| Terminology service down | Catalog searches use local DB; validation flagged deferred |
| FHIR gateway down | Accession/result entry continues; release queued; chart delayed |
| Full connectivity loss (facility node) | Offline mode: accessions and result entry continue; sync on reconnect |
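
As an illustration of the "validation flagged deferred" behavior (and of F-04's `terminology_validated=false` mitigation), the sketch below accepts a result when the terminology client times out and re-validates it later. The names `ResultRecord`, `record_result`, and `revalidate_deferred` are hypothetical, the `validate` callable stands in for the real terminology client, and returning rejected records for manual review is an assumption about what should happen when a deferred code turns out to be invalid.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResultRecord:
    loinc_code: str
    terminology_validated: bool = False


def record_result(loinc_code: str,
                  validate: Callable[[str], bool]) -> ResultRecord:
    """Validate when the service answers; on timeout, accept and flag as deferred."""
    record = ResultRecord(loinc_code)
    try:
        if not validate(loinc_code):
            # The service answered and rejected the code: a real error.
            raise ValueError(f"unknown LOINC code: {loinc_code}")
        record.terminology_validated = True
    except TimeoutError:
        # Service unreachable: keep the workflow moving, validation deferred.
        record.terminology_validated = False
    return record


def revalidate_deferred(records: List[ResultRecord],
                        validate: Callable[[str], bool]) -> List[ResultRecord]:
    """Run after reconnect; returns records whose codes failed validation.

    A TimeoutError from `validate` propagates, so the caller can retry later.
    """
    rejected = []
    for record in records:
        if record.terminology_validated:
            continue
        if validate(record.loinc_code):
            record.terminology_validated = True
        else:
            rejected.append(record)  # assumed: surfaced for manual review
    return rejected
```

The key distinction is between "the service said no" (fail the request) and "the service did not answer" (degrade and defer); only the second should set the deferred flag.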