| FM-01 | FHIR gateway unavailable | Synchronous generation returns 503; async jobs fail at binding step; patients cannot get new documents | HTTP 5xx rate on FHIR calls; render job failure rate alert | Circuit breaker; retry async jobs with exponential backoff; surface error to clinician; mark job failed after 3 retries |
| FM-02 | Object storage (S3/MinIO) unavailable | Document storage fails; generation returns 503; downloads unavailable | Storage PUT/GET error metric; alert | Fail generation with structured error; cached PDFs on client may still be viewable; page on-call immediately |
| FM-03 | ClamAV unavailable | Uploads fail; no documents can be ingested via upload path | ClamAV connection error; GET /health/ready returns not ready | Reject uploads with 503 until ClamAV recovers; never skip the scan |
| FM-04 | PDF renderer timeout | Single document generation exceeds 5 s SLO; 504 returned | Generation duration histogram; timeout counter | Retry via async render job; investigate template complexity; alert if p95 > 8 s |
| FM-05 | PostgreSQL unavailable | All APIs fail; render workers cannot update job status | DB connection error; readiness probe fails | K8s removes pods from LB; fail over by promoting the read replica to primary; page on-call |
| FM-06 | NATS unavailable | Events not published; audit gap; render workers may lose job coordination | NATS connection error; outbox relay failure alert | Outbox pattern: events queued in DB; relay retries on recovery; audit gap flagged |
| FM-07 | Render worker crash mid-job | Job stuck in running status; PDF not delivered | Job age in running status > 5 min alert | Watchdog process marks stale running jobs as failed after 5 min; client retries |
| FM-08 | Virus scanner false positive | Legitimate document quarantined; upload fails with VIRUS_DETECTED | User complaint; quarantine rate anomaly alert | Operator reviews quarantine bucket; manual re-scan with updated definitions; resubmit if clean |
| FM-09 | FHIR binding missing data | PDF generation fails with 422 BINDING_RESOLUTION_FAILED | 422 rate on generate endpoint | Return binding path in error; clinician resolves data gap in FHIR; retry generation |
| FM-10 | Presigned URL expiry | User clicks download link after TTL; 403 from object storage | Support ticket; 403 rate on object storage | Refresh presigned URL on next document list / download API call; short TTL by design |
| FM-11 | Template version not found at generation time | Generation fails; clinician cannot produce document | 422 rate; alert if > 1 % | Validate templateVersionId at request intake; surface clear error to clinician |
| FM-12 | config-service unavailable | Tenant branding tokens not resolved; PDF uses default theme | HTTP 5xx on config call | Fall back to default platform design tokens and generate the PDF with them; log warning |
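
The FM-01 mitigation (retry async jobs with exponential backoff, mark the job failed after 3 retries) could look roughly like the sketch below. It is illustrative only: `RenderJob`, `GatewayUnavailable`, `run_with_backoff`, the 1 s base delay, and the `queued` / `succeeded` status names are assumptions, not part of this design. Only the retry limit and the `running` / `failed` statuses come from the table.

```python
import time
from dataclasses import dataclass

MAX_RETRIES = 3      # from FM-01: mark job failed after 3 retries
BASE_DELAY_S = 1.0   # assumed base delay; the table does not specify one


class GatewayUnavailable(Exception):
    """Hypothetical error raised when a FHIR gateway call returns 5xx."""


@dataclass
class RenderJob:
    job_id: str
    status: str = "queued"   # assumed initial status name
    attempts: int = 0


def run_with_backoff(job, bind, sleep=time.sleep):
    """Run bind(job); on gateway failure retry with doubling delays.

    Makes one initial attempt plus up to MAX_RETRIES retries, then
    marks the job failed so it does not stay in 'running' forever.
    """
    job.status = "running"
    for retry in range(MAX_RETRIES + 1):
        job.attempts += 1
        try:
            bind(job)
        except GatewayUnavailable:
            if retry == MAX_RETRIES:
                job.status = "failed"
                return job
            # Delays grow as 1 s, 2 s, 4 s with the assumed base delay.
            sleep(BASE_DELAY_S * 2 ** retry)
        else:
            job.status = "succeeded"   # assumed terminal status name
            return job
    return job
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production worker would likely add jitter to the delay and sit behind the circuit breaker also listed for FM-01, neither of which is shown here.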