Ghasi e-Prescribing Gateway Service — Failure Modes

Status: populated · Owner: TBD · Last updated: 2026-04-18

Companion: Service Template · 03 platform-services · 02 DDD

Failure Catalog

| ID | Failure | User impact | Detection | Mitigation |
|---|---|---|---|---|
| FM-EPRX-001 | Postgres unavailable | All MR/MD create and read operations fail with 503 (GETs can still be served from the read replica if it is available) | Health check fails; alert EprescribingServiceDown | Circuit breaker; retry; read replica for GETs; idempotency keys make retries safe once Postgres is restored |
| FM-EPRX-002 | NATS unavailable | Prescriptions and dispenses still persist, but their events are not published; no immediate user impact, though downstream consumers receive events late | Outbox pending-count alert (EprescribingOutboxStuck) | Outbox rows accumulate safely; the relay retries once NATS recovers |
| FM-EPRX-003 | Subscription notification endpoint down | Pharmacy or EHR misses the notification; a manual or pull (polling) workaround is needed | DLQ depth alert EprescribingDLQDepthHigh; delivery_failed event | Retry with backoff (5 attempts, 15 min max); enqueue to DLQ; on-call manual replay tool (AC-RX-005) |
| FM-EPRX-004 | IG validator unavailable (Zod/HAPI) | Profile validation cannot run. In strict mode, writes are rejected with 503 VALIDATOR_UNAVAILABLE; in permissive mode, they are accepted without validation and with a warning | Health endpoint reports validator: degraded; alert | Configurable degraded mode per tenant policy: block writes (strict) or accept with warning (permissive) |
| FM-EPRX-005 | Pharmacy routing resolver unavailable | Target pharmacy cannot be determined; the MR is blocked | HTTP timeout on the provider-directory call; logged at ERROR | Cache the last known routing table (TTL: 5 min); fall back to the default pharmacy if one is configured |
| FM-EPRX-006 | ETag conflict storm (concurrent updates) | Concurrent writers receive 412 Precondition Failed and must re-read the resource (fresh ETag) before retrying | Spike in eprescribing_etag_conflicts_total | Return 412 to the losing caller; client retries with exponential backoff; no data loss |
| FM-EPRX-007 | JWT validation failure (Keycloak down) | Once the cached JWKS expires, all authenticated requests fail with 401 | Spike in 401 errors; health alert | Cached JWKS (5 min TTL); circuit breaker on Keycloak |
| FM-EPRX-008 | Idempotency store unavailable (Postgres/Redis) | Risk of duplicate MR/MD creation on network retry | Phase 1 (Postgres store): failure cascades and is detected as FM-EPRX-001; Phase 2 (Redis store): alert on Redis degradation | Phase 1: handled as FM-EPRX-001; Phase 2: on Redis failure, fall back to the Postgres idempotency check |
| FM-EPRX-009 | Row-level security (RLS) policy regression | Cross-tenant MR/MD exposure | Adversarial test fails in CI; security audit | CI gate: tenant-isolation.spec is mandatory; RLS policies are reviewed on every migration |
| FM-EPRX-010 | Subscription DLQ grows without being processed | Pharmacy/EHR data goes stale; prescription status diverges | EprescribingDLQDepthHigh alert persists | On-call runbook: investigate the endpoint, repair, and replay; escalate to the tenant admin |
| FM-EPRX-011 | Outbox relay crashes mid-publish | An event may be delayed until the relay restarts, or published twice if the crash falls after publish but before the outbox row is marked sent; no event is lost, because pending rows are retried | Relay restart time; NATS duplicate detection | NATS deduplication on message ID; consumer idempotency (BR-RX-003) |
| FM-EPRX-012 | Rate limit hit by a legitimate high-volume tenant | Tenant receives 429; prescriptions are delayed | Rate limit metric spike | Per-tenant rate limit configuration; emergency queue option per tenant policy (BR-RX-004) |
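The circuit breaker named in FM-EPRX-001 (and, for Keycloak, in FM-EPRX-007) can be illustrated as a small state machine. This is a minimal synchronous sketch under assumed parameters: the failure threshold, the cooldown, and the `CIRCUIT_OPEN` error are hypothetical, not values or names taken from the service. A real gateway would wrap async Postgres calls and export the breaker state as a metric.

```typescript
// Minimal circuit breaker sketch (hypothetical; thresholds and names are illustrative).
// closed: calls pass through; `failureThreshold` consecutive failures -> open.
// open: calls are rejected immediately until `cooldownMs` elapses -> half-open.
// half-open: a trial call is allowed; success -> closed, failure -> open again.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown over: let a trial call through
    }
    return this.state;
  }

  call<T>(fn: () => T): T {
    if (this.currentState() === "open") {
      // Fail fast; the HTTP layer would map this to 503 without touching Postgres.
      throw new Error("CIRCUIT_OPEN");
    }
    try {
      const result = fn();
      this.consecutiveFailures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.consecutiveFailures += 1;
      if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Injecting the clock (`now`) keeps the cooldown testable; in production it would default to `Date.now`.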
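FM-EPRX-003 specifies retries with backoff (5 attempts, 15 min max) followed by a DLQ enqueue. The table does not say whether 15 minutes caps a single delay or the whole retry window; the sketch below assumes the latter, and also assumes a 60 s base delay, which makes the four doubling waits sum to exactly 15 minutes. `send`, `sleep`, and `enqueueDlq` are hypothetical injected callbacks so the policy can be exercised without real HTTP or timers.

```typescript
// Retry-then-DLQ sketch for subscription notifications (hypothetical names and base delay).
// Assumption: "15 min max" is read as a cap on the total wait across all retries.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 60_000; // assumed; 60 s doubling gives 1 + 2 + 4 + 8 = 15 min of waits
const MAX_TOTAL_WAIT_MS = 15 * 60_000;

// Waits between consecutive attempts: base, 2*base, 4*base, ..., trimmed to the total budget.
function backoffDelays(attempts: number, baseMs: number, maxTotalMs: number): number[] {
  const delays: number[] = [];
  let total = 0;
  for (let i = 0; i < attempts - 1; i++) {
    const delay = Math.min(baseMs * 2 ** i, maxTotalMs - total);
    if (delay <= 0) break; // budget exhausted: stop scheduling retries
    delays.push(delay);
    total += delay;
  }
  return delays;
}

interface DeliveryOutcome {
  delivered: boolean;
  attempts: number;
  deadLettered: boolean;
}

function deliverWithRetry(
  send: () => boolean, // one POST to the subscriber endpoint; true on a 2xx response
  sleep: (ms: number) => void, // injected so the policy can be tested without waiting
  enqueueDlq: () => void, // persist to the DLQ and emit delivery_failed
): DeliveryOutcome {
  const delays = backoffDelays(MAX_ATTEMPTS, BASE_DELAY_MS, MAX_TOTAL_WAIT_MS);
  const totalAttempts = delays.length + 1;
  for (let attempt = 1; attempt <= totalAttempts; attempt++) {
    if (send()) return { delivered: true, attempts: attempt, deadLettered: false };
    if (attempt < totalAttempts) sleep(delays[attempt - 1]);
  }
  enqueueDlq(); // every attempt failed: hand off to the DLQ for manual replay
  return { delivered: false, attempts: totalAttempts, deadLettered: true };
}
```

In the service, a persisted next-attempt timestamp would likely serve better than an in-process sleep, so that pending retries survive a restart.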
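The degraded-mode behavior in FM-EPRX-004 reduces to a small decision function. In this sketch the validator's result is modeled as a hypothetical `ValidatorOutcome` union, and the 422 status and `PROFILE_VALIDATION_FAILED` code for resources that fail validation are assumptions; `VALIDATOR_UNAVAILABLE` and the strict/permissive split come from the table. The sketch also assumes that only a validator outage depends on tenant policy, while a resource the validator actually rejects is refused in either mode.

```typescript
// Hypothetical per-tenant degraded-mode decision for FM-EPRX-004.
type ValidatorMode = "strict" | "permissive";

// What the Zod/HAPI validation call yielded; "unavailable" means it could not be reached.
type ValidatorOutcome =
  | { status: "valid" }
  | { status: "invalid"; issues: string[] }
  | { status: "unavailable" };

type WriteDecision =
  | { action: "accept"; warning?: string }
  | { action: "reject"; httpStatus: 422 | 503; code: string };

function decideWrite(outcome: ValidatorOutcome, mode: ValidatorMode): WriteDecision {
  if (outcome.status === "valid") return { action: "accept" };
  if (outcome.status === "invalid") {
    // Assumed: profile errors are rejected regardless of mode.
    return { action: "reject", httpStatus: 422, code: "PROFILE_VALIDATION_FAILED" };
  }
  // Validator unreachable: the tenant's policy decides.
  return mode === "strict"
    ? { action: "reject", httpStatus: 503, code: "VALIDATOR_UNAVAILABLE" }
    : { action: "accept", warning: "profile validation skipped: validator unavailable" };
}
```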
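For FM-EPRX-005, one reading of "cache the last known routing table (TTL: 5 min)" is a read-through cache: fresh entries are served without calling the provider directory, and once an entry expires and the directory cannot answer, the configured default pharmacy is used, or the MR is blocked if none is configured. The `PharmacyRouter` class, the `routingKey` concept, and the injected `lookupDirectory` callback are hypothetical.

```typescript
// Read-through routing cache sketch for FM-EPRX-005 (hypothetical names).
const ROUTING_TTL_MS = 5 * 60_000;

interface CachedRoute {
  pharmacyId: string;
  fetchedAt: number;
}

class PharmacyRouter {
  private readonly cache = new Map<string, CachedRoute>();
  private readonly lookupDirectory: (routingKey: string) => string | null; // null = timeout/unreachable
  private readonly defaultPharmacyId: string | null;
  private readonly now: () => number;

  constructor(
    lookupDirectory: (routingKey: string) => string | null,
    defaultPharmacyId: string | null,
    now: () => number = Date.now,
  ) {
    this.lookupDirectory = lookupDirectory;
    this.defaultPharmacyId = defaultPharmacyId;
    this.now = now;
  }

  // Returns the target pharmacy id, or null when the MR must be blocked.
  resolve(routingKey: string): string | null {
    const cached = this.cache.get(routingKey);
    if (cached !== undefined && this.now() - cached.fetchedAt < ROUTING_TTL_MS) {
      return cached.pharmacyId; // fresh entry: no directory call needed
    }
    const fromDirectory = this.lookupDirectory(routingKey);
    if (fromDirectory !== null) {
      this.cache.set(routingKey, { pharmacyId: fromDirectory, fetchedAt: this.now() });
      return fromDirectory;
    }
    return this.defaultPharmacyId; // directory down and cache expired or empty
  }
}
```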
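FM-EPRX-006 relies on optimistic concurrency: each write carries the ETag the caller last read, and a stale ETag gets 412 instead of silently overwriting newer data. The in-memory sketch below shows both the server-side check and the client-side retry; `VersionedStore`, `updateWithRetry`, the retry count, and the base delay are hypothetical. ETags use the weak `W/"<version>"` form conventional for FHIR resource versions.

```typescript
// In-memory sketch of ETag-based optimistic concurrency for FM-EPRX-006 (hypothetical names).
interface Versioned<T> {
  etag: string;
  value: T;
}

type UpdateResult<T> = { status: 200; record: Versioned<T> } | { status: 412 };

const etagFor = (version: number): string => `W/"${version}"`;

class VersionedStore<T> {
  private readonly records = new Map<string, { version: number; value: T }>();

  create(id: string, value: T): void {
    this.records.set(id, { version: 1, value });
  }

  read(id: string): Versioned<T> {
    const r = this.records.get(id);
    if (r === undefined) throw new Error(`not found: ${id}`);
    return { etag: etagFor(r.version), value: r.value };
  }

  // Conditional write (If-Match): succeeds only against the current version.
  update(id: string, ifMatch: string, value: T): UpdateResult<T> {
    const current = this.records.get(id);
    if (current === undefined) throw new Error(`not found: ${id}`);
    if (etagFor(current.version) !== ifMatch) return { status: 412 }; // lost the race
    const next = { version: current.version + 1, value };
    this.records.set(id, next);
    return { status: 200, record: { etag: etagFor(next.version), value } };
  }
}

// Client side: on 412, back off (exponential with full jitter), re-read, and re-apply the change.
function updateWithRetry<T>(
  store: VersionedStore<T>,
  id: string,
  change: (current: T) => T,
  sleep: (ms: number) => void,
  maxAttempts = 4,
  baseDelayMs = 100,
): UpdateResult<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = store.read(id); // refresh value and ETag before every attempt
    const result = store.update(id, current.etag, change(current.value));
    if (result.status === 200) return result;
    if (attempt < maxAttempts - 1) sleep(Math.random() * baseDelayMs * 2 ** attempt);
  }
  return { status: 412 }; // still conflicting: surface 412 to the caller
}
```

Full jitter (a random delay up to the exponential cap) spreads competing retries out, so a conflict storm is less likely to resynchronize on the next attempt.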
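The Phase 2 design in FM-EPRX-008 (Redis first, Postgres as fallback) can be sketched as a two-tier idempotency check. The interface and class names below are hypothetical, the stores are synchronous in-memory stand-ins, and the sketch assumes Postgres always holds the durable key record, so neither the fallback path nor a Redis eviction can produce a duplicate. In a real service the Postgres record would be written in the same transaction as the MR/MD insert and backed by a unique constraint; this sketch does not model that.

```typescript
// Two-tier idempotency sketch for FM-EPRX-008 Phase 2 (hypothetical names).
interface IdempotencyStore {
  lookup(key: string): string | null; // stored resource id, or null if unseen; throws if unreachable
  record(key: string, resourceId: string): void;
}

class MemoryStore implements IdempotencyStore {
  private readonly data = new Map<string, string>();
  available = true; // flip to false to simulate an outage

  lookup(key: string): string | null {
    if (!this.available) throw new Error("store unavailable");
    return this.data.get(key) ?? null;
  }

  record(key: string, resourceId: string): void {
    if (!this.available) throw new Error("store unavailable");
    this.data.set(key, resourceId);
  }
}

class TieredIdempotency {
  redisDegraded = false; // would feed the Redis degradation alert
  private readonly redis: IdempotencyStore;
  private readonly postgres: IdempotencyStore;

  constructor(redis: IdempotencyStore, postgres: IdempotencyStore) {
    this.redis = redis;
    this.postgres = postgres;
  }

  // Returns the existing resource id for a replayed key, or creates exactly one new resource.
  createOnce(key: string, create: () => string): string {
    const existing = this.lookup(key);
    if (existing !== null) return existing;
    const id = create();
    this.postgres.record(key, id); // durable record; a Postgres outage here is FM-EPRX-001
    try {
      this.redis.record(key, id); // best-effort fast path
    } catch {
      this.redisDegraded = true;
    }
    return id;
  }

  private lookup(key: string): string | null {
    try {
      const hit = this.redis.lookup(key);
      if (hit !== null) return hit;
    } catch {
      this.redisDegraded = true;
    }
    return this.postgres.lookup(key); // fallback; also covers keys evicted from Redis
  }
}
```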
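The outbox behavior behind FM-EPRX-002 and FM-EPRX-011 is at-least-once: a relay that crashes after publishing but before marking the row sent will publish the same row again on restart. NATS JetStream can drop such repeats server-side when the publisher sets the `Nats-Msg-Id` header, but only within the stream's duplicate window, so consumers still need their own idempotency (BR-RX-003). The sketch below uses the outbox row id as the message id; `relayPass`, `IdempotentConsumer`, and the in-memory `Set` are hypothetical stand-ins for the relay and for a persisted processed-message table.

```typescript
// Outbox relay + idempotent consumer sketch for FM-EPRX-002/011 (hypothetical names).
interface OutboxRow {
  id: string; // doubles as the message id, so a republished row carries the same id
  subject: string;
  payload: string;
  sentAt: number | null; // null = still pending
}

// One relay pass. If the process crashes after publish() but before sentAt is set,
// the row stays pending and the next pass republishes it: at-least-once delivery.
function relayPass(
  rows: OutboxRow[],
  publish: (msgId: string, subject: string, payload: string) => void,
  now: () => number,
): void {
  for (const row of rows) {
    if (row.sentAt !== null) continue;
    publish(row.id, row.subject, row.payload);
    row.sentAt = now();
  }
}

// Consumer-side deduplication: remember processed message ids and skip repeats.
// A real consumer would persist `seen` in the same transaction as its side effect.
class IdempotentConsumer {
  private readonly seen = new Set<string>();
  readonly applied: string[] = [];

  handle(msgId: string, payload: string): boolean {
    if (this.seen.has(msgId)) return false; // duplicate delivery: already applied
    this.seen.add(msgId);
    this.applied.push(payload);
    return true;
  }
}
```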
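A per-tenant rate limit for FM-EPRX-012 is commonly implemented as a token bucket keyed by tenant. The table does not name an algorithm, so this sketch assumes that model; `TenantRateLimiter`, the limit values, and the `Retry-After` calculation are illustrative. The emergency-queue option from BR-RX-004 is not modeled.

```typescript
// Per-tenant token-bucket sketch for FM-EPRX-012 (hypothetical names and numbers).
interface TenantLimit {
  ratePerSec: number; // sustained requests per second
  burst: number; // bucket capacity
}

type LimitDecision = { allowed: true } | { allowed: false; status: 429; retryAfterSec: number };

class TenantRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly limits: Map<string, TenantLimit>;
  private readonly defaultLimit: TenantLimit;
  private readonly now: () => number;

  constructor(limits: Map<string, TenantLimit>, defaultLimit: TenantLimit, now: () => number = Date.now) {
    this.limits = limits;
    this.defaultLimit = defaultLimit;
    this.now = now;
  }

  check(tenantId: string): LimitDecision {
    const limit = this.limits.get(tenantId) ?? this.defaultLimit; // per-tenant override, else default
    const t = this.now();
    const bucket = this.buckets.get(tenantId) ?? { tokens: limit.burst, updatedAt: t };
    // Refill in proportion to elapsed time, never beyond the burst capacity.
    bucket.tokens = Math.min(limit.burst, bucket.tokens + ((t - bucket.updatedAt) / 1000) * limit.ratePerSec);
    bucket.updatedAt = t;
    this.buckets.set(tenantId, bucket);
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true };
    }
    // Seconds until one whole token is available; returned as Retry-After with the 429.
    return { allowed: false, status: 429, retryAfterSec: Math.ceil((1 - bucket.tokens) / limit.ratePerSec) };
  }
}
```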