Patient Portal Service — Failure Modes

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD

1. Failure Catalog

#	Failure	User impact	Detection	Mitigation
F-01	PostgreSQL unavailable	Portal login fails; account reads fail	Health check probe fails; DB error count in `portal.upstream.calls.total` spikes	Pod restarts; Postgres HA failover (streaming replication); circuit breaker opens after 3 consecutive failures
F-02	Keycloak patient realm unreachable	Login impossible; all JWT validation fails; all portal endpoints return 401	Health check JWT validation probe fails; login failure rate spikes	Patient realm HA (Keycloak active-passive); JWKS cached for 5 min to cover brief outages
F-03	registration-service unavailable	Patient summary / demographics unavailable	Upstream adapter circuit breaker triggers	Return cached last-known Patient resource (Redis, 5 min stale TTL); display degraded banner
F-04	laboratory-service unavailable	Lab results section empty	Circuit breaker open; upstream error counter	Return cached last-known results bundle; display "Results may not be current" message
F-05	scheduling-service unavailable	Appointment list unavailable; new appointment requests fail	Circuit breaker; upstream error counter	Cache last-known appointment list (Redis); queue appointment requests for retry via outbox
F-06	radiology-service unavailable	Imaging results section empty	Circuit breaker; upstream error counter	Cached results + degraded banner
F-07	claims-service unavailable	Coverage and EOB unavailable	Circuit breaker	Cached claims data + degraded banner
F-08	ai-gateway-service unavailable	Navigation assistant returns 503	Circuit breaker; feature-specific health flag	Graceful degradation: navigation assistant disabled; core portal fully operational
F-09	Redis cache unavailable	Increased latency to upstream services; all requests bypass cache	Redis health probe fails	Fall through to upstream services; performance SLO breach alert fires; no data loss
F-10	NATS JetStream unavailable	Outbox events not delivered; push notifications delayed	Outbox relay worker fails to publish; `outbox_unpublished_count` rises	Outbox rows persist in PostgreSQL; events are delivered when NATS recovers (at-least-once delivery)
F-11	Push notification gateway (FCM/APNs) failure	Mobile push notifications not delivered	FCM/APNs adapter error rate spikes	Silent failure (push is best-effort); patient can still see results by opening portal
F-12	Export job worker crash mid-export	Export job stuck in `in_progress`	Job status age-out monitor; `expjob` stays `in_progress` for more than 10 min	Kubernetes restarts the pod; the job is idempotent and resumes from its last checkpoint; jobs that fail are set to `failed` with `errorDetail`
F-13	Upstream returns unreleased result	Risk of premature result disclosure	Release policy enforcement in BFF code path; unit tests cover policy	Server-side policy check (`?releasePolicy=patient-visible`) is mandatory on every upstream call; result excluded if policy not met
F-14	Proxy delegation record stale / expired	Risk of proxy access continuing past expiry	`validTo` checked on every request	`validTo` evaluated server-side on every request; cron job sets `status = expired` automatically; immediate revocation supported
F-15	Downstream tenant isolation breach	Patient sees another tenant's data	RLS policy check failure; audit alert	PostgreSQL RLS policy on all tables; `app.tenant_id` set from JWT `tid` claim before every query; audit-service anomaly detection
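
The "circuit breaker opens after 3 consecutive failures" mitigation in F-01 can be illustrated with a minimal sketch. It is not the service's actual implementation: the class name, the `reset_after_s` cool-down (30 s here), and the injectable `clock` are assumptions made for illustration. Only the threshold of 3 comes from the catalog.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold: int = 3,
                 reset_after_s: float = 30.0,  # assumed cool-down, not from the catalog
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed, one trial call is let through.
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, fn: Callable[[], T]) -> T:
        if self.is_open:
            raise CircuitOpenError("upstream calls suspended")
        try:
            result = fn()
        except Exception:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self.failure_threshold:
                # Open (or re-open after a failed trial call).
                self._opened_at = self._clock()
            raise
        # Any success resets the counter and closes the breaker.
        self._consecutive_failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, callers receive `CircuitOpenError` immediately instead of waiting on a failing dependency, which gives the caller a fast signal to switch to a cached or degraded response.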

2. Degradation Mode Summary

The portal implements graceful degradation: upstream failures cause sections to show cached or empty states with user-facing banners, rather than failing the entire portal session. Only authentication failures (F-02) and database failures (F-01) result in complete portal unavailability.
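
As a rough sketch of this per-section behaviour, the following assumes each portal section has a live fetch function and a cache reader. A failure in one section is contained and does not affect the others. The `Section` type, the function names, and the "temporarily unavailable" banner text are hypothetical; "Results may not be current" is the message named in F-04.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Section:
    data: Optional[Any]
    status: str            # "live", "cached", or "unavailable"
    banner: Optional[str]  # user-facing message when degraded


def load_section(fetch_live: Callable[[], Any],
                 read_cached: Callable[[], Optional[Any]]) -> Section:
    """Try the upstream first, then the cache, then an empty state."""
    try:
        return Section(fetch_live(), "live", None)
    except Exception:
        # Upstream failed (or its breaker is open): fall back to the cache.
        try:
            cached = read_cached()
        except Exception:
            cached = None  # the cache may be down as well
        if cached is not None:
            return Section(cached, "cached", "Results may not be current")
        return Section(None, "unavailable",
                       "This section is temporarily unavailable")


Loaders = Tuple[Callable[[], Any], Callable[[], Optional[Any]]]


def build_dashboard(sections: Dict[str, Loaders]) -> Dict[str, Section]:
    """Assemble all sections; one failing upstream never fails the page."""
    return {name: load_section(live, cached)
            for name, (live, cached) in sections.items()}
```

Returning a status and banner per section, rather than raising, lets the front end render healthy sections normally and mark only the degraded ones.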

Upstream	Degraded behaviour
registration-service down	Patient summary shows cached demographics + warning
laboratory-service / radiology-service down	Results section shows cached results + "Results may not be current" message
scheduling-service down	Appointments show cached list; new booking disabled with message
ai-gateway-service down	Navigation assistant disabled; rest of portal unaffected
Redis down	All requests served live from upstream; latency increases
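
The "Redis down" row corresponds to a read path in which cache errors are treated as cache misses. The sketch below shows one way to write that. The `Cache` protocol, `read_through`, and the `on_cache_error` hook are hypothetical names. The 300-second default mirrors the 5 min TTL mentioned in F-03 and is assumed here to apply generally.

```python
from typing import Callable, Optional, Protocol


class Cache(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str, ttl_s: int) -> None: ...


def read_through(cache: Cache,
                 key: str,
                 fetch_upstream: Callable[[], str],
                 ttl_s: int = 300,  # assumed default, based on the 5 min TTL in F-03
                 on_cache_error: Callable[[Exception], None] = lambda exc: None,
                 ) -> str:
    """Serve from cache when possible; treat any cache error as a miss."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception as exc:
        # Cache unavailable: record it and fall through to the upstream.
        on_cache_error(exc)

    value = fetch_upstream()

    try:
        cache.set(key, value, ttl_s)
    except Exception as exc:
        # A failed write must not fail the request; the data is still returned.
        on_cache_error(exc)
    return value
```

The `on_cache_error` hook is where a metric or log entry could be emitted so that the latency alert described in F-09 has a corresponding signal.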
