Claims Service — Failure Modes
Status: populated; Owner: TBD; Last updated: 2026-04-18; Companion: SERVICE_OVERVIEW · Service Template · 02 DDD
Failure Mode Register
| ID | Failure Mode | Likelihood | Impact | Detection | Mitigation |
|---|---|---|---|---|---|
| FM-CLM-001 | Clearinghouse/payer API unavailable during claim submission | Medium | High | Submission adapter returns 5xx; circuit breaker opens | Retry with exponential backoff (3 attempts); claim stored as ready for manual resubmission; alert SRE |
| FM-CLM-002 | X12 837 generation produces malformed EDI | Low | High | Clearinghouse returns 999 rejection; test suite catches EDI structure errors | EDI golden fixture tests in CI; structured validation before send; fallback to payer REST adapter if available |
| FM-CLM-003 | ERA (835) ingest fails mid-processing: partial allocations applied | Low | Critical | ERA processing job fails; partial paid states detected | ERA processing is transactional: all allocations applied or none; idempotency key prevents re-application; dead-letter to manual review queue |
| FM-CLM-004 | Duplicate ERA submitted: same remittance processed twice | Low | High | `era_idempotency` integration test; remittance ID uniqueness constraint | `REMITTANCE_ALREADY_APPLIED` 409 error; duplicate ERA rejected at ingest; alert for investigation |
| FM-CLM-005 | Terminology-service unavailable: coding validation blocked | Medium | Medium | terminology-service health check fails; HTTP 503 from upstream | Circuit breaker with configurable fallback: permissive mode allows claim with warning flag; strict mode blocks submission |
| FM-CLM-006 | RLS misconfiguration exposes cross-tenant claim data | Very Low | Critical | CI tenant isolation gate (mandatory); adversarial cross-tenant integration test | RLS policies reviewed on every migration; separate DB role per service; quarterly security audit |
| FM-CLM-007 | Outbox relay stops: events backlog indefinitely | Low | High | `outbox_lag_seconds` Prometheus alert (> 120s) | Relay worker auto-restart via Kubernetes; manual replay via admin tool; lag visible on Outbox Health dashboard |
| FM-CLM-008 | Payer credential rotation not reflected: submissions rejected | Low | Medium | Submission returns 401/403 from payer API; circuit breaker | Credentials stored in Vault with rotation hooks; operator runbook for credential refresh; submission adapter surfaces auth errors distinctly |
| FM-CLM-009 | Version conflict storm during concurrent claim updates | Low | Low | 409 rate visible in API metrics | Optimistic locking with 409 + current body; clients retry with backoff; rate monitored on Claims Pipeline dashboard |
| FM-CLM-010 | NATS JetStream unavailable: events not published | Low | Medium | Health check `/health/ready` fails NATS probe; outbox relay fails | Transactional outbox ensures events are not lost; relay retries on reconnect; alert SRE if lag > 2 minutes |
| FM-CLM-011 | Patient portal serves stale EOB after remittance applied | Low | Low | EOB read lag visible in portal; not a data integrity issue | Eventual consistency by design; portal consumer processes `claims.remittance.applied.v1` event; lag typically < 1s |
| FM-CLM-012 | Claim assembled with invalid diagnosis pointer (line item → ICD-10 index out of range) | Low | Medium | Scrubbing validation catches at `validate()` step | `CLAIM_VALIDATION_FAILED` error with field-level errors; scrubbing unit tests in CI cover pointer range checks |
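
The FM-CLM-001 mitigation (three attempts with exponential backoff, then parking the claim for manual resubmission) could look roughly like the sketch below. It is illustrative only: `ClearinghouseUnavailable`, `submit_with_backoff`, the 0.5 s base delay, and the returned status strings are assumptions made for this example, not names taken from the service.

```python
import time
from typing import Callable


class ClearinghouseUnavailable(Exception):
    """Hypothetical error raised when the submission adapter sees a 5xx."""


def submit_with_backoff(
    submit: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,  # assumed value; the register does not specify delays
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try `submit` up to `max_attempts` times, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            reference = submit()
            return {"status": "submitted", "reference": reference, "attempts": attempt}
        except ClearinghouseUnavailable:
            # Wait only if another attempt will follow: 0.5 s, then 1.0 s.
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    # All attempts failed: park the claim instead of dropping it.
    return {
        "status": "ready_for_manual_resubmission",
        "reference": None,
        "attempts": max_attempts,
    }
```

Injecting `sleep` keeps the function testable without real waits. Alerting SRE and the circuit breaker are deliberately left out of this sketch.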
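
FM-CLM-003 and FM-CLM-004 both rest on two properties: an ERA is applied in a single transaction, and the remittance ID is unique. The following sketch shows one way to get both, using SQLite from the Python standard library purely for illustration. The table names, column names, and the `apply_era` helper are hypothetical; the production schema and database engine are not described in this register.

```python
import sqlite3


class RemittanceAlreadyApplied(Exception):
    """Would map to the REMITTANCE_ALREADY_APPLIED 409 response."""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the primary key acts as the idempotency key.
    conn.executescript(
        """
        CREATE TABLE remittance (remittance_id TEXT PRIMARY KEY);
        CREATE TABLE allocation (
            remittance_id TEXT NOT NULL,
            claim_id TEXT NOT NULL,
            amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
        );
        """
    )


def apply_era(conn: sqlite3.Connection, remittance_id: str,
              allocations: list[tuple[str, int]]) -> None:
    """Apply every allocation of one ERA, or none of them."""
    with conn:  # commits on success, rolls back if any statement raises
        try:
            conn.execute(
                "INSERT INTO remittance (remittance_id) VALUES (?)",
                (remittance_id,),
            )
        except sqlite3.IntegrityError as exc:
            # Uniqueness violation: this remittance was already applied.
            raise RemittanceAlreadyApplied(remittance_id) from exc
        conn.executemany(
            "INSERT INTO allocation (remittance_id, claim_id, amount_cents) "
            "VALUES (?, ?, ?)",
            [(remittance_id, claim_id, cents) for claim_id, cents in allocations],
        )
```

Because the remittance row and its allocations share one transaction, a failure partway through leaves no trace, and the same ERA can be retried later without tripping the duplicate check.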
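
For FM-CLM-009, "optimistic locking with 409 + current body" can be pictured with a small in-memory model. This is a sketch under assumptions: the `Claim` fields, the `ClaimStore` class, and the `(status_code, body)` return shape are invented for the example and stand in for the real persistence and HTTP layers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Claim:
    claim_id: str
    version: int
    status: str


class ClaimStore:
    """Hypothetical in-memory stand-in for the claims table."""

    def __init__(self) -> None:
        self._claims: dict[str, Claim] = {}

    def put(self, claim: Claim) -> None:
        self._claims[claim.claim_id] = claim

    def update_status(self, claim_id: str, expected_version: int,
                      status: str) -> tuple[int, Claim]:
        current = self._claims[claim_id]
        if current.version != expected_version:
            # Stale write: return 409 plus the current body so the client
            # can merge its change and retry without an extra read.
            return 409, current
        updated = replace(current, version=current.version + 1, status=status)
        self._claims[claim_id] = updated
        return 200, updated
```

A client that receives 409 reads the version from the returned body, reapplies its change, and retries with backoff, which is the behaviour the register expects from callers.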
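
The pointer range check behind FM-CLM-012 is small enough to sketch directly. The example assumes that diagnosis pointers are 1-based indexes into the claim's diagnosis code list and that line items are plain dictionaries with a `diagnosis_pointers` key. Both are assumptions for illustration, as are the `FieldError` type and the function name; the actual `validate()` step may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldError:
    field: str
    message: str


def validate_diagnosis_pointers(diagnosis_codes: list[str],
                                line_items: list[dict]) -> list[FieldError]:
    """Return one field-level error per pointer that is out of range."""
    errors: list[FieldError] = []
    for i, item in enumerate(line_items):
        for j, pointer in enumerate(item.get("diagnosis_pointers", [])):
            # Assumed convention: pointers are 1-based, so 0 is invalid too.
            if not 1 <= pointer <= len(diagnosis_codes):
                errors.append(
                    FieldError(
                        field=f"line_items[{i}].diagnosis_pointers[{j}]",
                        message=(
                            f"pointer {pointer} is outside "
                            f"1..{len(diagnosis_codes)}"
                        ),
                    )
                )
    return errors
```

A non-empty result would be surfaced as `CLAIM_VALIDATION_FAILED` with these entries as the field-level errors.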