Claims Service — Observability

Status: populated | Owner: TBD | Last updated: 2026-04-18 | Companion: SERVICE_OVERVIEW · Service Template · 02 DDD

SLIs and SLOs

| SLI | SLO | Measurement Window |
| --- | --- | --- |
| Claim assembly API availability | ≥ 99.9% | 30-day rolling |
| Claim assembly p95 latency | < 1500ms | 1-hour |
| Claim submission success rate | ≥ 99% (excluding payer-side errors) | 7-day rolling |
| Eligibility check p95 latency | < 3000ms (includes payer round-trip) | 1-hour |
| ERA ingestion processing time | < 60s from receipt to allocations applied | per-file |
| Outbox relay lag | < 30s p99 | 1-hour |
| FHIR EOB read p95 latency | < 500ms | 1-hour |
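
To make the availability target concrete, the sketch below converts an SLO into an error budget. It assumes availability is measured as time (minutes of unavailability within the window); if the service measures it per request, the same arithmetic applies to request counts instead. The helper names `errorBudgetMinutes` and `budgetRemaining` are hypothetical and not part of the claims-service.

```typescript
// Hypothetical helpers, for illustration only.
// Assumption: availability is time-based (minutes of unavailability per window).
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  if (sloTarget <= 0 || sloTarget > 1) {
    throw new RangeError("sloTarget must be in (0, 1]");
  }
  const windowMinutes = windowDays * 24 * 60;
  return (1 - sloTarget) * windowMinutes;
}

// Fraction of the error budget still unspent after `badMinutes` of unavailability.
// Clamped at 0 once the budget is exhausted.
function budgetRemaining(sloTarget: number, windowDays: number, badMinutes: number): number {
  const budget = errorBudgetMinutes(sloTarget, windowDays);
  if (budget === 0) return badMinutes === 0 ? 1 : 0;
  return Math.max(0, 1 - badMinutes / budget);
}

// Claim assembly API availability: 99.9% over a 30-day rolling window.
const assemblyBudget = errorBudgetMinutes(0.999, 30);
console.log(assemblyBudget.toFixed(1)); // "43.2"
```

Under these assumptions, the 99.9% target leaves about 43.2 minutes of unavailability per 30-day window before the SLO is breached.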

OpenTelemetry Instrumentation

The claims-service is instrumented with the OpenTelemetry SDK for Node.js and emits:

  • Traces: HTTP requests, DB queries, EDI adapter calls, payer API calls
  • Metrics: Counter/histogram/gauge via OTEL SDK → Prometheus scrape
  • Logs: Structured JSON logs → OpenTelemetry log bridge → Loki

All spans include tenant_id, claim_id (where applicable), and correlation_id attributes.
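
A minimal sketch of how the shared span attributes might be assembled follows. The attribute keys come from the sentence above; the `ClaimsSpanContext` type and `buildSpanAttributes` helper are hypothetical and only illustrate the "where applicable" rule for claim_id.

```typescript
// Hypothetical request context; the field names are assumptions for illustration.
interface ClaimsSpanContext {
  tenantId: string;
  correlationId: string;
  claimId?: string; // only set when the operation concerns a specific claim
}

// Builds the attribute set attached to every span.
// claim_id is omitted entirely when no claim is in scope, rather than set to "".
function buildSpanAttributes(ctx: ClaimsSpanContext): Record<string, string> {
  const attributes: Record<string, string> = {
    tenant_id: ctx.tenantId,
    correlation_id: ctx.correlationId,
  };
  if (ctx.claimId !== undefined) {
    attributes["claim_id"] = ctx.claimId;
  }
  return attributes;
}

// An eligibility check has no claim yet; a claim assembly call does.
const eligibilityAttrs = buildSpanAttributes({
  tenantId: "tenant-a",
  correlationId: "corr-123",
});
const assemblyAttrs = buildSpanAttributes({
  tenantId: "tenant-a",
  correlationId: "corr-123",
  claimId: "claim-42",
});
```

With the OpenTelemetry JavaScript API, a record like this could be passed to `span.setAttributes(...)`; where that call would live in the claims-service is not specified here.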

Key Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| claims_assembled_total | Counter | tenant_id, channel | Total claims assembled |
| claims_submitted_total | Counter | tenant_id, channel, status | Claim submissions (success/failure) |
| claims_denied_total | Counter | tenant_id, denial_code | Claims denied by payer |
| claims_paid_total | Counter | tenant_id | Claims paid in full |
| eligibility_check_duration_seconds | Histogram | tenant_id, channel | Eligibility inquiry latency |
| era_processing_duration_seconds | Histogram | tenant_id | ERA ingestion to allocation applied |
| outbox_lag_seconds | Gauge | tenant_id | Oldest unpublished outbox record age |
| submission_adapter_errors_total | Counter | tenant_id, adapter, error_code | Adapter-level errors |
| coverage_active_count | Gauge | tenant_id | Active coverage records per tenant |
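
As an illustration of the gauge semantics, the sketch below derives a per-tenant value for outbox_lag_seconds from a set of outbox rows. The `OutboxRecord` shape and the `outboxLagSeconds` function are assumptions made for this example; the actual outbox schema and relay code are not described in this document.

```typescript
// Hypothetical outbox row shape, for illustration only.
interface OutboxRecord {
  tenantId: string;
  createdAt: Date;
  publishedAt: Date | null; // null until the relay has published the record
}

// Returns, per tenant, the age in seconds of the oldest unpublished record.
// Tenants whose records are all published report 0.
function outboxLagSeconds(records: OutboxRecord[], now: Date): Map<string, number> {
  const lagByTenant = new Map<string, number>();
  for (const record of records) {
    if (!lagByTenant.has(record.tenantId)) {
      lagByTenant.set(record.tenantId, 0);
    }
    if (record.publishedAt !== null) continue;
    const ageSeconds = Math.max(0, (now.getTime() - record.createdAt.getTime()) / 1000);
    const current = lagByTenant.get(record.tenantId) ?? 0;
    lagByTenant.set(record.tenantId, Math.max(current, ageSeconds));
  }
  return lagByTenant;
}

const now = new Date("2026-04-18T12:00:00Z");
const lag = outboxLagSeconds(
  [
    { tenantId: "tenant-a", createdAt: new Date("2026-04-18T11:59:15Z"), publishedAt: null },
    { tenantId: "tenant-a", createdAt: new Date("2026-04-18T11:59:50Z"), publishedAt: null },
    { tenantId: "tenant-b", createdAt: new Date("2026-04-18T11:58:00Z"), publishedAt: new Date("2026-04-18T11:58:02Z") },
  ],
  now,
);
console.log(lag.get("tenant-a"), lag.get("tenant-b")); // 45 0
```

Reporting the oldest unpublished record, rather than an average, means a single stuck event is visible in the gauge even while newer events are still being relayed.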

Dashboards

| Dashboard | Purpose |
| --- | --- |
| Claims Pipeline | Assembly rate, submission rate, denial rate, paid rate by tenant |
| ERA Processing | ERA ingestion throughput, processing time, allocations per ERA |
| Eligibility | Check volume, latency by payer/channel, error rate |
| Outbox Health | Relay lag, unpublished event count, relay throughput |
| Payer Adapter | Per-adapter error rates, latency, circuit breaker status |

Alerts

| Alert | Condition | Severity | Runbook |
| --- | --- | --- | --- |
| High claim denial rate | denial_rate > 15% over 1 hour for any tenant | Warning | runbooks/claims-high-denial-rate.md |
| Submission adapter failures | adapter_errors > 10 in 5 minutes | Critical | runbooks/claims-adapter-failure.md |
| ERA processing timeout | ERA not processed within 120s of receipt | Warning | runbooks/claims-era-timeout.md |
| Outbox relay lag spike | outbox_lag > 120s | Warning | runbooks/claims-outbox-lag.md |
| Eligibility check SLO breach | p95 > 3000ms for 10 minutes | Warning | runbooks/claims-eligibility-slow.md |
| Payer circuit open | circuit breaker open for any payer adapter | Critical | runbooks/claims-payer-circuit-open.md |
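
The denial-rate condition can be sketched as a per-tenant check over one-hour counter increases. This document does not define denial_rate, so the sketch assumes it is the increase in claims_denied_total divided by the increase in claims_submitted_total over the same hour. The `TenantHourlyCounts` type and `tenantsBreachingDenialRate` function are hypothetical.

```typescript
// Hypothetical input: per-tenant counter increases over the last hour.
interface TenantHourlyCounts {
  tenantId: string;
  submitted: number; // increase in claims_submitted_total (assumed denominator)
  denied: number;    // increase in claims_denied_total
}

const DENIAL_RATE_THRESHOLD = 0.15; // "> 15%" from the alert condition

// Returns the tenants whose denial rate strictly exceeds the threshold.
// Tenants with no submissions in the window are skipped to avoid dividing by zero.
function tenantsBreachingDenialRate(counts: TenantHourlyCounts[]): string[] {
  return counts
    .filter((c) => c.submitted > 0 && c.denied / c.submitted > DENIAL_RATE_THRESHOLD)
    .map((c) => c.tenantId);
}

const breaching = tenantsBreachingDenialRate([
  { tenantId: "tenant-a", submitted: 200, denied: 40 }, // 20%: breaches
  { tenantId: "tenant-b", submitted: 100, denied: 15 }, // exactly 15%: does not breach
  { tenantId: "tenant-c", submitted: 0, denied: 0 },    // no traffic: skipped
]);
console.log(breaching); // [ 'tenant-a' ]
```

Evaluating per tenant matches the "for any tenant" wording: one tenant crossing the threshold is enough to fire, even if the aggregate rate is low.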

Health Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /health/live | Kubernetes liveness probe: returns 200 if process is alive |
| GET /health/ready | Readiness probe: checks DB connection, NATS connection, adapter connectivity |
| GET /health/startup | Startup probe: confirms migrations have run |
| GET /metrics | Prometheus metrics scrape endpoint |
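
The readiness probe combines several dependency checks into one response. The sketch below shows one way to fold the individual results into a status code. It assumes the endpoint answers 503 when any dependency is unavailable, which is a common convention but is not stated in this document. The `summarizeReadiness` function and its types are hypothetical.

```typescript
// The three dependencies named for GET /health/ready.
type DependencyName = "db" | "nats" | "adapters";

interface ReadinessResult {
  statusCode: 200 | 503; // 503 on failure is an assumption, not from the source
  failing: DependencyName[];
}

// Folds already-collected check results into a single readiness response.
// Running the checks themselves (DB ping, NATS status, adapter probes) is out of scope.
function summarizeReadiness(checks: Record<DependencyName, boolean>): ReadinessResult {
  const failing = (Object.keys(checks) as DependencyName[]).filter((name) => !checks[name]);
  return { statusCode: failing.length === 0 ? 200 : 503, failing };
}

const healthy = summarizeReadiness({ db: true, nats: true, adapters: true });
const degraded = summarizeReadiness({ db: true, nats: false, adapters: true });
console.log(healthy.statusCode, degraded.statusCode, degraded.failing); // 200 503 [ 'nats' ]
```

Returning the list of failing dependencies alongside the status code keeps the probe response useful for debugging, while Kubernetes only needs the status code.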