Immunizations Service — Failure Modes
Status: populated | Owner: TBD | Last updated: 2026-04-18 | Companion: Service Template
Failure Mode Register
| ID | Failure | Detection | Mitigation | Recovery |
|---|---|---|---|---|
| FM-IMM-01 | PostgreSQL unavailable | Readiness probe fails; 503 on all requests | Connection pool retry (3×, exponential backoff); health probe gates traffic | Restore DB; pod restarts; connection pool reconnects automatically |
| FM-IMM-02 | NATS JetStream unavailable | Outbox relay logs errors; IMMUNIZATIONS_OUTBOX_DEPTH_HIGH alert fires | Outbox pattern buffers events in DB; no event loss during NATS downtime | Restore NATS; outbox relay drains backlog |
| FM-IMM-03 | Redis unavailable | BullMQ logs connection errors; forecast refresh queue stalls | Forecast refresh falls back to synchronous execution for new records (degraded mode) | Restore Redis; queue workers reconnect; backlog drains |
| FM-IMM-04 | EPI schedule service unavailable | GET /health/startup fails if schedule not loaded; forecast refresh errors | EPI schedule cached in-memory at startup; TTL 24 hours | Serve from cache until the EPI schedule service is restored; alert if the cache TTL expires without a refresh |
| FM-IMM-05 | Forecast refresh worker crash | BullMQ job stuck in active state; FORECAST_STALENESS alert fires | Job TTL causes re-queue after 60s; max 3 retries | Restart worker pod; BullMQ retries incomplete jobs |
| FM-IMM-06 | National registry unreachable | Registry sync job reaches failed status after retries; alert fires | Sync retried with exponential backoff (max 5 retries, ~4h total); local recording unaffected | Registry restored; manual sync trigger available via admin API |
| FM-IMM-07 | Outbox relay stuck | IMMUNIZATIONS_OUTBOX_DEPTH_HIGH alert; events not reaching downstream | At-least-once relay; event consumers are idempotent, so redelivery after a restart is safe | Restart relay worker; backlog is delivered at-least-once |
| FM-IMM-08 | Patient deceased flag not received | Forecast still generated for deceased patient | Subscribe to REGISTRATION.patient.vital-status-changed; flag processed asynchronously | Once event arrives, forecast suppressed and defaulter outreach halted |
| FM-IMM-09 | Duplicate immunization record | Two offline devices record same dose for same patient | clientMutationId deduplication; second submission returns 409 | Vaccination officer reviews; correction endpoint available |
| FM-IMM-10 | Wrong patient assigned to record | Clinical error during recording | Correction endpoint (PUT /v1/immunizations/:id/correction) marks the original as entered-in-error; new record created for correct patient | Audit trail preserved; clinical review required |
| FM-IMM-11 | Coverage materialized view stale | Coverage dashboard shows old data | View refreshed hourly by cron and on each record-creation event | Next cron run catches up; manual REFRESH MATERIALIZED VIEW CONCURRENTLY available |
| FM-IMM-12 | Contraindication check bypassed | Override submitted without CLINICIAN role | Role guard enforced on override field; audit log captures bypass attempt | Revoke token; review audit log; correct record if needed |
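
FM-IMM-01 and FM-IMM-06 both rely on retry with exponential backoff, so a small sketch may help make the retry budgets concrete. The register specifies only the retry counts (3 and 5) and the approximate 4-hour total for registry sync. Everything else below is an assumption for illustration: the helper names `backoffDelaysMs` and `retryWithBackoff`, the doubling factor, the 200 ms base for the database, and the 8-minute base for the registry. This is not the service's actual implementation.

```typescript
// Illustrative sketch only: names and base delays are assumptions, not service code.

/** Delays before each retry, doubling every time: base, 2*base, 4*base, ... */
function backoffDelaysMs(baseMs: number, maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

/**
 * Runs `op` once, then retries up to `maxRetries` times with exponential backoff.
 * `sleep` is injectable so the schedule can be checked without real waiting.
 */
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  baseMs: number,
  maxRetries: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Retries exhausted: surface the failure so the caller can alert.
      if (attempt >= maxRetries) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}

// FM-IMM-01: 3 retries; an assumed 200 ms base gives waits of 200, 400, 800 ms.
const dbDelays = backoffDelaysMs(200, 3);

// FM-IMM-06: 5 retries; an assumed 8-minute base gives 8 + 16 + 32 + 64 + 128
// = 248 minutes of waiting, which is consistent with the "~4h total" in the register.
const registryDelays = backoffDelaysMs(8 * 60_000, 5);
const registryTotalMinutes = registryDelays.reduce((a, b) => a + b, 0) / 60_000;
```

With doubling, the total wait is base × (2^n − 1), so a 5-retry budget of roughly 4 hours implies a base delay of about 8 minutes. If the real schedule uses a different factor or adds jitter, the base would differ.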