Registration Service — Failure Modes

Status: populated | Owner: TBD | Last updated: 2026-04-17 | Companion: Service Template

1. Failure Catalog

| ID | Failure | User Impact | Detection | Mitigation |
|---|---|---|---|---|
| FM-REG-01 | PostgreSQL primary unavailable | All registration operations fail (503) | Healthcheck readiness probe fails; alert on pod not ready | Replica promotion; circuit breaker on DB port; K8s pod restart |
| FM-REG-02 | NATS JetStream unavailable | Patient creates/updates succeed but events not published (data persisted; events queued in outbox) | Outbox lag metric > threshold | Outbox relay retries with exponential backoff; events published when NATS recovers |
| FM-REG-03 | MPI scoring slow / timeout | Patient create degrades: MPI check times out after configured threshold | registration.mpi.score.duration_p95 alert | Fail-safe: if MPI times out, create proceeds with a warning log + NATS duplicate-review event |
| FM-REG-04 | Redis (idempotency store) unavailable | Idempotent retries may create duplicates for the duration | Redis healthcheck failure | Service degrades gracefully: proceeds without idempotency cache; logs warning; MPI acts as safety net |
| FM-REG-05 | Keycloak JWKS endpoint unavailable | All authenticated requests fail (401) | Auth guard exception spike | Short-term JWKS cache (5 min) in JWT guard; retry backoff on JWKS refresh |
| FM-REG-06 | Portrait object store unavailable | Portrait upload/download fails; core registration unaffected | 503 on portrait endpoints | Return 503 with PORTRAIT_STORAGE_UNAVAILABLE; core create/update proceeds without portrait |
| FM-REG-07 | config-service unavailable | Required-field config cannot be retrieved; fall back to default config | Config client error metric | Cached config with 30-min TTL; fall back to empty required-fields list (accept all creates) |
| FM-REG-08 | Disk space exhaustion (DB) | Writes fail with Postgres error | Disk usage alert | Pre-emptive alerting at 75%; automatic archiving of inactive encounter records |
| FM-REG-09 | Optimistic lock storm | Multiple clients update same patient concurrently; most get 409 | High rate of OPTIMISTIC_LOCK_CONFLICT errors | UI must implement reload-and-retry pattern; alert if 409 rate > 5% sustained |
| FM-REG-10 | MPI false-positive spike | Valid patients blocked by duplicate detection | registration_mpi_duplicates_total spike | Configurable threshold; admin can temporarily lower sensitivity; runbook for MPI recalibration |
| FM-REG-11 | Unmerge of unknown-source merge | Unmerge rejected (UNMERGE_INVALID_STATE) for legacy merges lacking tracking data | User-visible 400 error | Supervisor workflow for manual correction; audit trail for escalation |
| FM-REG-12 | HL7 ADT inbound message parse failure | ADT event not processed; patient not updated | interop-service dead-letter queue | Dead-letter queue alert; manual reprocessing runbook |
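
The FM-REG-02 mitigation (outbox relay retrying with exponential backoff) can be sketched as below. This is a minimal illustration, not the service's actual relay: the names `backoff_delays` and `relay_outbox`, the in-memory list standing in for the outbox table, the `publish` callable, and the delay parameters (0.5 s base, factor 2, 30 s cap, 5 attempts) are all assumptions. Python is used only for illustration; the catalog does not state the service's implementation language.

```python
import time
from typing import Callable, Iterator, List


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... with each delay capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def relay_outbox(
    pending: List[dict],
    publish: Callable[[dict], None],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Publish pending outbox rows in order; return the rows that are still unpublished.

    `publish` is a hypothetical callable that raises ConnectionError while the
    broker is unreachable. Rows are never dropped: if the attempts for the
    oldest row are exhausted, the relay stops and leaves the rest for the next run.
    """
    remaining = list(pending)
    while remaining:
        event = remaining[0]
        delays = backoff_delays()  # backoff restarts for each row
        for attempt in range(1, max_attempts + 1):
            try:
                publish(event)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    return remaining  # broker still down; keep rows queued
                sleep(next(delays))
        remaining.pop(0)  # published successfully
    return remaining
```

Stopping at the first row that cannot be published preserves event order, at the cost of head-of-line blocking; the length of the returned list corresponds to what an outbox lag metric would report.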
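
The FM-REG-03 fail-safe (create proceeds when the MPI check times out, with a warning log and a duplicate-review event) might look roughly like the following sketch. Everything named here is hypothetical: `create_patient`, the `score_mpi`, `persist`, and `emit_event` callables, the returned status strings, the 2-second default timeout, and the 0.9 duplicate threshold. The blocking branch reflects the catalog's note under FM-REG-10 that duplicate detection can block a create.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

log = logging.getLogger("registration")
_pool = ThreadPoolExecutor(max_workers=4)


def create_patient(
    patient: dict,
    score_mpi: Callable[[dict], float],
    persist: Callable[[dict], None],
    emit_event: Callable[[str, dict], None],
    timeout_s: float = 2.0,            # assumed value for the "configured threshold"
    duplicate_threshold: float = 0.9,  # assumed duplicate score cut-off
) -> str:
    """Create a patient, treating an MPI timeout as 'unknown' rather than as a failure."""
    future = _pool.submit(score_mpi, patient)
    try:
        score = future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running is not interrupted
        log.warning("MPI scoring timed out after %.2fs; creating patient anyway", timeout_s)
        persist(patient)
        emit_event("duplicate-review", patient)  # flag the record for later review
        return "created_pending_review"
    if score >= duplicate_threshold:
        return "blocked_duplicate"
    persist(patient)
    return "created"
```

Only the timeout is swallowed in this sketch; any other exception raised by `score_mpi` propagates to the caller, so a broken scorer is not silently treated as a slow one.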
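
For FM-REG-07, the combination of a 30-minute cached config and a fallback to an empty required-fields list could be sketched as follows. The class name `RequiredFieldsConfig`, the `fetch` callable, and the example field names are assumptions. The sketch also assumes one particular reading of the table: a cached value is served until its TTL expires, and the empty list is returned only when the cache has expired and config-service is still unreachable.

```python
import time
from typing import Callable, List, Optional


class RequiredFieldsConfig:
    """Caches the required-fields list and degrades to 'no required fields' on outage."""

    def __init__(
        self,
        fetch: Callable[[], List[str]],   # hypothetical config-service client call
        ttl_s: float = 30 * 60,           # 30-minute TTL, as stated in the catalog
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get(self) -> List[str]:
        now = self._clock()
        # Serve from cache while the entry is younger than the TTL.
        if self._value is not None and now - self._fetched_at < self._ttl_s:
            return self._value
        try:
            self._value = self._fetch()
            self._fetched_at = now
            return self._value
        except ConnectionError:
            # config-service unreachable and nothing fresh in cache:
            # an empty list means every create is accepted.
            self._value = None
            return []
```

A short usage example with a stubbed fetch function:

```python
cfg = RequiredFieldsConfig(fetch=lambda: ["family_name", "birth_date"])
required = cfg.get()  # fetched once, then served from cache for 30 minutes
```

Because the fallback accepts all creates, records written during an outage may be missing fields that are normally required; serving the stale cached value instead would be a stricter alternative.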