Care Plan Service — Failure Modes

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · 03 platform-services · 02 DDD

Failure Catalog

| ID | Failure | User impact | Detection | Mitigation |
|---|---|---|---|---|
| FM-CP-001 | Postgres unavailable | All reads and writes fail with 503 | Health check /health/ready fails; alert CarePlanServiceDown | Retry with exponential backoff; circuit breaker; read replica fallback for FHIR reads |
| FM-CP-002 | NATS JetStream unavailable | Domain events accumulate in outbox; no immediate user impact | Outbox pending count alert CarePlanOutboxLagHigh | Outbox relay retries; events delivered once NATS recovers |
| FM-CP-003 | Outbox relay stuck | Events not published > 15 min; downstream services stale | Alert CarePlanOutboxStuck on outbox age | Manual replay trigger; on-call investigation of relay process |
| FM-CP-004 | Terminology service unavailable | Coding validation skipped (degraded mode) | HTTP timeout on terminology call; logged with WARN | Graceful degradation: accept request but skip coding validation; alert operator |
| FM-CP-005 | Provider directory unavailable | Care team practitioner validation skipped | HTTP timeout; logged WARN | Accept care team update; async validation job checks later |
| FM-CP-006 | JWT validation failure (Keycloak down) | All authenticated requests fail with 401 | Health check; spike in 401 errors | Cached JWKS (short TTL); circuit breaker on Keycloak calls |
| FM-CP-007 | Concurrent version conflict storm | Multiple clients retrying simultaneously; each gets 409 | High version_conflicts_total metric | Inform users; no data loss; retry-after hint in 409 response |
| FM-CP-008 | Large care plan with many goals/activities | Slow reads > SLO threshold | p95 latency alert | Pagination on goals/activities lists; lazy load sub-resources |
| FM-CP-009 | RLS policy misconfiguration | Cross-tenant data leakage | RLS integration test fails in CI; adversarial test | CI gate: tenant-isolation spec must pass; RLS policy reviewed in security audit |
| FM-CP-010 | Module entitlement check failure | All writes blocked even for licensed tenants | 403 MODULE_NOT_LICENSED errors | Cache entitlement checks; fallback to allow if entitlement service unreachable (configurable) |
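
Two mitigations in the catalog (FM-CP-001 and FM-CP-006) name a circuit breaker, and FM-CP-001 also names retry with exponential backoff. The following is a minimal, single-threaded Python sketch of both patterns, offered as an illustration and not as the service's actual implementation. The names `CircuitBreaker`, `CircuitOpenError`, and `backoff_delays`, along with the threshold, timeout, and delay values, are hypothetical and do not come from the catalog.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 3,          # assumed value, tune per dependency
        reset_timeout_s: float = 30.0,       # assumed value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the reset timeout has elapsed, calls are allowed through
        # again as a trial (the "half-open" state).
        return self._clock() - self._opened_at < self.reset_timeout_s

    def call(self, fn: Callable[[], T]) -> T:
        if self.is_open:
            raise CircuitOpenError("dependency circuit is open")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open after a failed trial call).
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delays(attempts: int, base_s: float = 0.1, cap_s: float = 5.0) -> List[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * (2 ** i)) for i in range(attempts)]
```

A caller would typically wrap the Postgres or Keycloak call in `breaker.call(...)`, sleep for each delay from `backoff_delays(...)` between attempts, and stop retrying as soon as `CircuitOpenError` is raised. Production code would usually add random jitter to the delays and make the breaker thread-safe; both are omitted here to keep the sketch short.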