
Immunizations Service — Observability

Status: populated · Owner: TBD · Last updated: 2026-04-18 · Companion: Service Template

1. SLIs and SLOs

| SLI | SLO target | Measurement window |
|---|---|---|
| immunization.record.availability | ≥ 99.5% of POST /v1/immunizations requests succeed (2xx) | 30-day rolling |
| immunization.record.latency | p95 record API latency < 500 ms | 30-day rolling |
| forecast.latency | p95 forecast retrieval latency < 2,000 ms | 30-day rolling |
| forecast.freshness | ≥ 99% of forecasts refreshed within 5 min of the triggering event | 30-day rolling |
| registry.sync.success | ≥ 95% of sync jobs complete within 1 hour | 7-day rolling |
| defaulter.query.latency | p95 GET /v1/immunizations/defaulters latency < 1,000 ms | 30-day rolling |
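The availability and latency targets above reduce to simple arithmetic over request samples. The sketch below shows one way to check the record API SLOs and compute the remaining error budget. It assumes raw request samples are available; the `RequestSample` shape and all helper names are hypothetical, and the nearest-rank p95 it uses can differ slightly from the interpolated quantiles a Prometheus histogram would report.

```typescript
// Hypothetical SLO check for POST /v1/immunizations, assuming raw request samples.
// In production these figures would more likely come from metrics or traces.

interface RequestSample {
  status: number;    // HTTP status returned by the record API
  latencyMs: number; // server-side latency in milliseconds
}

// Targets taken from the SLO table above.
const AVAILABILITY_TARGET = 0.995;
const LATENCY_P95_TARGET_MS = 500;

// Share of requests that returned 2xx. Per the SLI definition, any non-2xx
// response (4xx included) counts against availability. Empty window = compliant.
function availability(samples: RequestSample[]): number {
  if (samples.length === 0) return 1;
  const ok = samples.filter((s) => s.status >= 200 && s.status < 300).length;
  return ok / samples.length;
}

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty set");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of the error budget left: 1 = untouched, 0 = exhausted, < 0 = SLO breached.
function errorBudgetRemaining(observedAvailability: number, target: number): number {
  const allowedFailure = 1 - target;
  const actualFailure = 1 - observedAvailability;
  return (allowedFailure - actualFailure) / allowedFailure;
}

function recordApiMeetsSlos(samples: RequestSample[]): { availabilityOk: boolean; latencyOk: boolean } {
  return {
    availabilityOk: availability(samples) >= AVAILABILITY_TARGET,
    latencyOk:
      samples.length === 0 ||
      percentile(samples.map((s) => s.latencyMs), 95) < LATENCY_P95_TARGET_MS,
  };
}
```

With a 99.5% target the budget is 0.5% of requests, so an observed availability of 99.9% leaves (0.5 - 0.1) / 0.5 = 80% of the budget unspent.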

2. OpenTelemetry Spans

| Span name | Attributes | Purpose |
|---|---|---|
| immunizations.record.create | tenantId, patientId, vaccineCode, doseNumber | Trace the record creation path |
| immunizations.forecast.refresh | tenantId, patientId, recommendationCount | Trace forecast computation |
| immunizations.registry.sync | tenantId, jobId, recordCount | Trace the registry sync job |
| immunizations.outbox.relay | tenantId, subject, eventId | Trace the outbox relay |
| immunizations.contraindication.check | tenantId, patientId, vaccineCode | Trace the contraindication guard |
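One way to keep span names and attributes consistent with the table above is to encode them as a type, so a missing or misspelled attribute fails at compile time. This is a hypothetical, dependency-free sketch: `withSpan`, `SpanAttributeMap`, and the in-memory `recorded` array are illustrative stand-ins for a real tracer and exporter. In a service using `@opentelemetry/api`, the same attribute objects would instead be passed to `tracer.startActiveSpan(name, { attributes }, fn)`.

```typescript
// Hypothetical typed contract for the spans listed above. The `recorded` array
// stands in for an OpenTelemetry exporter so the sketch runs without dependencies.

type AttributeValue = string | number;

interface SpanAttributeMap {
  "immunizations.record.create": { tenantId: string; patientId: string; vaccineCode: string; doseNumber: number };
  "immunizations.forecast.refresh": { tenantId: string; patientId: string; recommendationCount: number };
  "immunizations.registry.sync": { tenantId: string; jobId: string; recordCount: number };
  "immunizations.outbox.relay": { tenantId: string; subject: string; eventId: string };
  "immunizations.contraindication.check": { tenantId: string; patientId: string; vaccineCode: string };
}

type SpanName = keyof SpanAttributeMap;

interface FinishedSpan {
  name: SpanName;
  attributes: Record<string, AttributeValue>;
  status: "ok" | "error";
  durationMs: number;
}

const recorded: FinishedSpan[] = [];

// Runs `fn` inside a span; the compiler rejects missing or misspelled attributes.
function withSpan<N extends SpanName, T>(spanName: N, attributes: SpanAttributeMap[N], fn: () => T): T {
  const startedAt = Date.now();
  let outcome: "ok" | "error" = "ok";
  try {
    return fn();
  } catch (err) {
    outcome = "error"; // a real tracer would also record the exception on the span
    throw err;
  } finally {
    recorded.push({
      name: spanName,
      attributes: { ...attributes },
      status: outcome,
      durationMs: Date.now() - startedAt,
    });
  }
}

// Example: trace a record creation (all identifiers are made up).
const recordId = withSpan(
  "immunizations.record.create",
  { tenantId: "tenant-a", patientId: "patient-1", vaccineCode: "BCG", doseNumber: 1 },
  () => "record-123",
);
```

The `finally` block records the span whether `fn` returns or throws, mirroring the rule that every started span must be ended.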

3. Metrics (Prometheus)

| Metric | Type | Labels |
|---|---|---|
| immunizations_records_created_total | Counter | tenant_id, vaccine_code, status |
| immunizations_refusals_total | Counter | tenant_id, vaccine_code, refusal_reason |
| immunizations_forecast_refresh_duration_seconds | Histogram | tenant_id |
| immunizations_registry_sync_jobs_total | Counter | tenant_id, status |
| immunizations_defaulters_gauge | Gauge | tenant_id, facility_id, vaccine_code |
| immunizations_outbox_pending_gauge | Gauge | tenant_id |
| immunizations_coverage_percent_gauge | Gauge | tenant_id, facility_id, vaccine_code, dose_number |
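Each distinct combination of label values becomes its own time series. The sketch below is a hypothetical, dependency-free counter that renders the Prometheus text exposition format, to show how the labels above map to series; a real service would more likely use a client library such as prom-client. The `"success"` status value in the example is made up, since the allowed values for `status` are not defined here.

```typescript
// Hypothetical labelled counter rendering the Prometheus text exposition format.

type Labels = Record<string, string>;

// The exposition format escapes backslash, double quote and newline in label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class LabeledCounter {
  // One series per distinct label combination, keyed by its rendered label set.
  private readonly series = new Map<string, number>();

  constructor(
    readonly name: string,
    readonly help: string,
    readonly labelNames: string[],
  ) {}

  inc(labels: Labels, amount = 1): void {
    if (amount < 0) throw new Error("counters can only go up");
    for (const key of Object.keys(labels)) {
      if (this.labelNames.indexOf(key) === -1) throw new Error(`unknown label: ${key}`);
    }
    // Render labels in declaration order so the same combination maps to one key.
    const rendered = this.labelNames
      .map((n) => {
        const v = labels[n];
        if (v === undefined) throw new Error(`missing label: ${n}`);
        return `${n}="${escapeLabelValue(v)}"`;
      })
      .join(",");
    this.series.set(rendered, (this.series.get(rendered) ?? 0) + amount);
  }

  expose(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.series.forEach((value, rendered) => lines.push(`${this.name}{${rendered}} ${value}`));
    return lines.join("\n");
  }
}

// Example usage; the "success" status value is made up for illustration.
const recordsCreated = new LabeledCounter(
  "immunizations_records_created_total",
  "Immunization records created",
  ["tenant_id", "vaccine_code", "status"],
);
recordsCreated.inc({ tenant_id: "tenant-a", vaccine_code: "BCG", status: "success" });
```

Because every new label value creates a new series, this is presumably why the table keeps per-patient identifiers such as patientId out of metric labels and confines them to span attributes.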

4. Structured Logs

All logs are emitted as JSON and must include these fields: level, timestamp, traceId, spanId, tenantId, and service (always "immunizations-service").

Key log events:

  • IMMUNIZATION_RECORDED: info
  • IMMUNIZATION_REFUSED: info
  • CONTRAINDICATION_BLOCKED: warn (a vaccine was blocked by a contraindication)
  • FORECAST_REFRESH_FAILED: error
  • REGISTRY_SYNC_FAILED: error, includes retryCount
  • OUTBOX_RELAY_FAILED: error
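A small builder can guarantee that every line carries the mandatory fields and the level listed for each key event. In this hypothetical sketch, the `event` field name, `buildLogLine`, and the check that REGISTRY_SYNC_FAILED carries a numeric `retryCount` are assumptions drawn from the list above, not an existing API; a real service would probably hand the same object to its JSON logger rather than return a string.

```typescript
// Hypothetical JSON log-line builder that always emits the mandatory fields.

type Level = "info" | "warn" | "error";

type LogEvent =
  | "IMMUNIZATION_RECORDED"
  | "IMMUNIZATION_REFUSED"
  | "CONTRAINDICATION_BLOCKED"
  | "FORECAST_REFRESH_FAILED"
  | "REGISTRY_SYNC_FAILED"
  | "OUTBOX_RELAY_FAILED";

// Level for each key event, taken from the list above.
const EVENT_LEVEL: Record<LogEvent, Level> = {
  IMMUNIZATION_RECORDED: "info",
  IMMUNIZATION_REFUSED: "info",
  CONTRAINDICATION_BLOCKED: "warn",
  FORECAST_REFRESH_FAILED: "error",
  REGISTRY_SYNC_FAILED: "error",
  OUTBOX_RELAY_FAILED: "error",
};

interface TraceContext {
  traceId: string;
  spanId: string;
  tenantId: string;
}

function buildLogLine(
  logEvent: LogEvent,
  ctx: TraceContext,
  extra: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  if (logEvent === "REGISTRY_SYNC_FAILED" && typeof extra.retryCount !== "number") {
    throw new Error("REGISTRY_SYNC_FAILED must include a numeric retryCount");
  }
  // Mandatory fields are written after the spread so event-specific fields cannot overwrite them.
  return JSON.stringify({
    ...extra,
    event: logEvent,
    level: EVENT_LEVEL[logEvent],
    timestamp: now.toISOString(),
    traceId: ctx.traceId,
    spanId: ctx.spanId,
    tenantId: ctx.tenantId,
    service: "immunizations-service",
  });
}
```

Passing `now` explicitly keeps the output deterministic in tests; callers normally rely on the default.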

5. Dashboards

| Dashboard | Key panels |
|---|---|
| Immunizations Overview | Records created per day, refusals, and corrections, by vaccine code |
| Forecast Health | Forecast refresh lag, overdue patient count, defaulter trend |
| Registry Sync | Sync job success rate, last sync time, failed job count |
| Coverage Analytics | Coverage by antigen, facility, and age group; trend over time |
| Service Health | Latency p50/p95/p99, error rate, outbox depth |

6. Alerts

| Alert | Threshold | Runbook |
|---|---|---|
| ImmunizationsRecordApiErrorRate | 5xx rate > 1% over 5 min | Check DB connectivity and RLS policy |
| ImmunizationsForecastStaleness | Forecast lag > 10 min | Check BullMQ worker health and Redis connectivity |
| ImmunizationsRegistrySyncFailed | Any sync job fails after 3 retries | Check interop-service and national registry reachability |
| ImmunizationsOutboxDepthHigh | Outbox pending > 100 for > 5 min | Check NATS connectivity and the relay worker |
| ImmunizationsCoverageDropped | Coverage metric drops > 10% within 24 h | Investigate recording gaps and data import issues |
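The thresholds above can be prototyped against raw samples before they are written as alerting rules. This sketch covers ImmunizationsRecordApiErrorRate and ImmunizationsOutboxDepthHigh only. The sample shapes and function names are hypothetical, and `breachedFor` merely approximates a Prometheus `for:` clause (it has no staleness handling, so an old breaching sample with nothing newer still counts).

```typescript
// Hypothetical evaluation of two alert conditions over raw samples.

interface GaugeSample { atMs: number; value: number; }
interface ResponseSample { atMs: number; status: number; }

const MINUTE_MS = 60_000;

// Share of 5xx responses among requests in the trailing window.
function errorRate(responses: ResponseSample[], windowMs: number, nowMs: number): number {
  const inWindow = responses.filter((r) => r.atMs > nowMs - windowMs && r.atMs <= nowMs);
  if (inWindow.length === 0) return 0;
  const serverErrors = inWindow.filter((r) => r.status >= 500 && r.status < 600).length;
  return serverErrors / inWindow.length;
}

// True when the gauge has stayed above `threshold` continuously for longer than `forMs`.
function breachedFor(samples: GaugeSample[], threshold: number, forMs: number, nowMs: number): boolean {
  const ordered = samples.filter((s) => s.atMs <= nowMs).sort((a, b) => a.atMs - b.atMs);
  let breachStart: number | undefined;
  for (const s of ordered) {
    if (s.value > threshold) {
      if (breachStart === undefined) breachStart = s.atMs;
    } else {
      breachStart = undefined; // any sample at or below the threshold resets the clock
    }
  }
  return breachStart !== undefined && nowMs - breachStart > forMs;
}

// ImmunizationsRecordApiErrorRate: 5xx rate > 1% over 5 min.
function recordApiErrorRateAlert(responses: ResponseSample[], nowMs: number): boolean {
  return errorRate(responses, 5 * MINUTE_MS, nowMs) > 0.01;
}

// ImmunizationsOutboxDepthHigh: outbox pending > 100 for > 5 min.
function outboxDepthHighAlert(pending: GaugeSample[], nowMs: number): boolean {
  return breachedFor(pending, 100, 5 * MINUTE_MS, nowMs);
}
```

Because a single sample at or below 100 resets the clock, a queue that briefly drains and then refills does not fire ImmunizationsOutboxDepthHigh until it has stayed high for another full 5 minutes.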

7. Health Endpoints

| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe: the service process is alive |
| GET /health/ready | Readiness probe: DB, NATS, and Redis are connected |
| GET /health/startup | Startup probe: EPI schedule loaded and migrations complete |
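The three probes differ mainly in which dependencies they check. The sketch below assumes each dependency exposes an async boolean check; `CheckFn`, the helper names, the 1 s default timeout, and the 200/503 status mapping are assumptions rather than part of this spec. Liveness deliberately checks nothing, so an outage in the DB, NATS, or Redis marks the instance unready in an orchestrator such as Kubernetes instead of getting it restarted.

```typescript
// Hypothetical probe handlers; real checks would ping the DB, NATS and Redis clients.

type CheckFn = () => Promise<boolean>;

interface ProbeResult { ok: boolean; checks: Record<string, boolean>; }

function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`check timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run all checks in parallel; a thrown error or a timeout counts as a failed check.
async function runChecks(checks: Record<string, CheckFn>, timeoutMs = 1000): Promise<ProbeResult> {
  const names = Object.keys(checks);
  const results = await Promise.all(
    names.map(async (checkName) => {
      try {
        return await withTimeout(checks[checkName](), timeoutMs);
      } catch {
        return false;
      }
    }),
  );
  const byName: Record<string, boolean> = {};
  names.forEach((checkName, i) => { byName[checkName] = results[i]; });
  return { ok: results.every((r) => r), checks: byName };
}

// GET /health/live: no dependency checks.
function liveness(): ProbeResult {
  return { ok: true, checks: {} };
}

// GET /health/ready: DB, NATS and Redis must all be reachable.
function readiness(deps: { db: CheckFn; nats: CheckFn; redis: CheckFn }, timeoutMs = 1000): Promise<ProbeResult> {
  return runChecks({ db: deps.db, nats: deps.nats, redis: deps.redis }, timeoutMs);
}

// GET /health/startup: EPI schedule loaded and migrations complete.
function startup(deps: { epiScheduleLoaded: CheckFn; migrationsComplete: CheckFn }): Promise<ProbeResult> {
  return runChecks({ epiScheduleLoaded: deps.epiScheduleLoaded, migrationsComplete: deps.migrationsComplete });
}

// Map a probe result to an HTTP status code (assumed convention: 200 or 503).
function httpStatus(result: ProbeResult): number {
  return result.ok ? 200 : 503;
}
```

Returning the per-check map alongside the overall flag lets the response body show which dependency failed.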