Immunizations Service — Observability
Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template
1. SLIs and SLOs
| SLI | SLO target | Measurement window |
|---|---|---|
| immunization.record.availability | ≥ 99.5% of POST /v1/immunizations requests succeed (2xx) | 30-day rolling |
| immunization.record.latency | p95 record API latency < 500 ms | 30-day rolling |
| forecast.latency | p95 forecast retrieval < 2 000 ms | 30-day rolling |
| forecast.freshness | ≥ 99% of forecasts refreshed within 5 min of triggering event | 30-day rolling |
| registry.sync.success | ≥ 95% of sync jobs complete within 1 hour | 7-day rolling |
| defaulter.query.latency | p95 GET /v1/immunizations/defaulters < 1 000 ms | 30-day rolling |
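Ratio-style SLOs in this table can be checked mechanically from two counts per window. The sketch below is illustrative only: the `WindowCounts` shape and the `evaluateSlo` function are hypothetical, and the sketch assumes the good and total counts have already been aggregated over the 30-day or 7-day window by the metrics backend. It reports the observed SLI and the share of the error budget (`1 - target`) that is left.

```typescript
// Aggregated counts for one SLO window (hypothetical shape).
interface WindowCounts {
  total: number; // eligible events in the window, e.g. POST /v1/immunizations requests
  good: number;  // events that met the SLI condition, e.g. 2xx responses
}

interface SloReport {
  sli: number;             // observed good/total ratio, 0..1
  target: number;          // SLO target as a ratio, e.g. 0.995 for 99.5%
  met: boolean;            // true when sli >= target
  budgetRemaining: number; // 1 = untouched, 0 = exhausted, < 0 = overspent
}

// Error budget = (1 - target) * total, i.e. how many bad events the SLO tolerates.
function evaluateSlo(counts: WindowCounts, target: number): SloReport {
  if (counts.total <= 0) {
    // No traffic in the window: nothing can have violated the SLO.
    return { sli: 1, target, met: true, budgetRemaining: 1 };
  }
  const sli = counts.good / counts.total;
  const allowedBad = (1 - target) * counts.total;
  const actualBad = counts.total - counts.good;
  let budgetRemaining: number;
  if (allowedBad === 0) {
    budgetRemaining = actualBad === 0 ? 1 : -Infinity; // a 100% target has no budget
  } else {
    budgetRemaining = 1 - actualBad / allowedBad;
  }
  return { sli, target, met: sli >= target, budgetRemaining };
}
```

The same function fits other ratio SLIs such as forecast.freshness, with `good` counting forecasts refreshed within 5 minutes of their triggering event.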
2. OpenTelemetry Spans
| Span name | Attributes | Purpose |
|---|---|---|
| immunizations.record.create | tenantId, patientId, vaccineCode, doseNumber | Trace record creation path |
| immunizations.forecast.refresh | tenantId, patientId, recommendationCount | Trace forecast computation |
| immunizations.registry.sync | tenantId, jobId, recordCount | Trace registry sync job |
| immunizations.outbox.relay | tenantId, subject, eventId | Trace outbox relay |
| immunizations.contraindication.check | tenantId, patientId, vaccineCode | Trace contraindication guard |
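Keeping span attributes aligned with this table is easier when the mapping lives in one place. The sketch below copies the span names and attribute keys from the table into a typed catalogue; the `spanAttributes` helper is hypothetical. Its output is a plain attribute record that could be passed as the `attributes` option of `tracer.startActiveSpan(name, { attributes }, fn)` from `@opentelemetry/api`, which is not imported here so the sketch stays dependency-free.

```typescript
// Span names and attribute keys copied from the span table.
const SPAN_ATTRIBUTES = {
  "immunizations.record.create": ["tenantId", "patientId", "vaccineCode", "doseNumber"],
  "immunizations.forecast.refresh": ["tenantId", "patientId", "recommendationCount"],
  "immunizations.registry.sync": ["tenantId", "jobId", "recordCount"],
  "immunizations.outbox.relay": ["tenantId", "subject", "eventId"],
  "immunizations.contraindication.check": ["tenantId", "patientId", "vaccineCode"],
} as const;

type SpanName = keyof typeof SPAN_ATTRIBUTES;
type AttributeValue = string | number | boolean;

// Copies only the documented attributes for `name` out of `source`.
// Throws when a documented attribute is missing; silently drops undeclared keys.
function spanAttributes(
  name: SpanName,
  source: Record<string, AttributeValue | undefined>,
): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {};
  for (const key of SPAN_ATTRIBUTES[name]) {
    const value = source[key];
    if (value === undefined) {
      throw new Error(`span ${name} is missing required attribute ${key}`);
    }
    attrs[key] = value;
  }
  return attrs;
}
```

Dropping undeclared keys keeps ad hoc fields out of traces, and throwing on a missing key surfaces instrumentation gaps early.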
3. Metrics (Prometheus)
| Metric | Type | Labels |
|---|---|---|
| immunizations_records_created_total | Counter | tenant_id, vaccine_code, status |
| immunizations_refusals_total | Counter | tenant_id, vaccine_code, refusal_reason |
| immunizations_forecast_refresh_duration_seconds | Histogram | tenant_id |
| immunizations_registry_sync_jobs_total | Counter | tenant_id, status |
| immunizations_defaulters_gauge | Gauge | tenant_id, facility_id, vaccine_code |
| immunizations_outbox_pending_gauge | Gauge | tenant_id |
| immunizations_coverage_percent_gauge | Gauge | tenant_id, facility_id, vaccine_code, dose_number |
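To show how one of these counters appears on a /metrics scrape, here is a minimal sketch of the Prometheus text exposition format for a labeled counter. The `LabeledCounter` class is hypothetical and only illustrates the metric and label layout from the table; a production service would normally use a client library such as prom-client instead.

```typescript
type Labels = Record<string, string>;

// Escapes a label value for the Prometheus text exposition format:
// backslash, double quote and line feed must be backslash-escaped.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Hypothetical minimal counter that keeps one series per label combination.
class LabeledCounter {
  readonly name: string;
  readonly help: string;
  readonly labelNames: string[];
  private readonly series = new Map<string, { labels: Labels; value: number }>();

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  inc(labels: Labels, by = 1): void {
    if (by < 0) throw new Error("counters can only increase");
    // Series key: declared label values joined with a NUL separator.
    const key = this.labelNames.map((n) => labels[n] ?? "").join("\u0000");
    const entry = this.series.get(key) ?? { labels: { ...labels }, value: 0 };
    entry.value += by;
    this.series.set(key, entry);
  }

  // Renders HELP and TYPE lines, then one sample line per series.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.series.forEach(({ labels, value }) => {
      const labelText = this.labelNames
        .map((n) => `${n}="${escapeLabelValue(labels[n] ?? "")}"`)
        .join(",");
      lines.push(`${this.name}{${labelText}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Every distinct label combination becomes its own time series, which is why the table keeps labels to tenant, facility, vaccine, dose, and status rather than per-patient identifiers.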
4. Structured Logs
All logs use JSON format with mandatory fields: level, timestamp, traceId, spanId, tenantId, service: "immunizations-service".
Key log events:
- IMMUNIZATION_RECORDED: info
- IMMUNIZATION_REFUSED: info
- CONTRAINDICATION_BLOCKED: warn (vaccine blocked by contraindication)
- FORECAST_REFRESH_FAILED: error
- REGISTRY_SYNC_FAILED: error, with retryCount
- OUTBOX_RELAY_FAILED: error
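A sketch of a log helper that always emits the mandatory fields and the level listed for each key event. The `logLine` function, the `event` field name used to carry the event code, and the check that REGISTRY_SYNC_FAILED includes a numeric retryCount are illustrative assumptions drawn from the list above, not a documented logger API.

```typescript
type Level = "info" | "warn" | "error";

type LogEvent =
  | "IMMUNIZATION_RECORDED"
  | "IMMUNIZATION_REFUSED"
  | "CONTRAINDICATION_BLOCKED"
  | "FORECAST_REFRESH_FAILED"
  | "REGISTRY_SYNC_FAILED"
  | "OUTBOX_RELAY_FAILED";

// Levels copied from the key log events list.
const EVENT_LEVELS: Record<LogEvent, Level> = {
  IMMUNIZATION_RECORDED: "info",
  IMMUNIZATION_REFUSED: "info",
  CONTRAINDICATION_BLOCKED: "warn",
  FORECAST_REFRESH_FAILED: "error",
  REGISTRY_SYNC_FAILED: "error",
  OUTBOX_RELAY_FAILED: "error",
};

interface LogContext {
  traceId: string;
  spanId: string;
  tenantId: string;
}

// Serializes one JSON log line. Event-specific fields are spread first so they
// can never overwrite the mandatory fields. The "event" key name is an assumption.
function logLine(
  event: LogEvent,
  ctx: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  if (event === "REGISTRY_SYNC_FAILED" && typeof fields["retryCount"] !== "number") {
    throw new Error("REGISTRY_SYNC_FAILED requires a numeric retryCount");
  }
  return JSON.stringify({
    ...fields,
    event,
    level: EVENT_LEVELS[event],
    timestamp: now.toISOString(),
    traceId: ctx.traceId,
    spanId: ctx.spanId,
    tenantId: ctx.tenantId,
    service: "immunizations-service",
  });
}
```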
5. Dashboards
| Dashboard | Key panels |
|---|---|
| Immunizations Overview | Records created per day, refusals, corrections, by vaccine code |
| Forecast Health | Forecast refresh lag, overdue patient count, defaulter trend |
| Registry Sync | Sync job success rate, last sync time, failed job count |
| Coverage Analytics | Coverage by antigen, facility, age group; trend over time |
| Service Health | Latency p50/p95/p99, error rate, outbox depth |
6. Alerts
| Alert | Threshold | Runbook |
|---|---|---|
| ImmunizationsRecordApiErrorRate | > 1% 5xx over 5 min | Check DB connectivity, RLS policy |
| ImmunizationsForecastStaleness | Forecast lag > 10 min | Check BullMQ worker health, Redis connectivity |
| ImmunizationsRegistrySyncFailed | Any sync job fails after 3 retries | Check interop-service, national registry reachability |
| ImmunizationsOutboxDepthHigh | Outbox pending > 100 for > 5 min | Check NATS connectivity, relay worker |
| ImmunizationsCoverageDropped | Coverage metric drops > 10% in 24h | Investigate recording gaps, data import issues |
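The "for > 5 min" part of ImmunizationsOutboxDepthHigh means the condition must hold continuously, not merely on average. Below is a hypothetical sketch of that evaluation over scraped gauge samples, assuming samples arrive sorted by time. In Prometheus this would normally be expressed with an alerting rule's `for` clause rather than in application code; the sketch only makes the semantics concrete.

```typescript
interface Sample {
  timestampMs: number; // scrape time, epoch milliseconds
  value: number;       // e.g. immunizations_outbox_pending_gauge
}

// Returns true when the latest run of consecutive samples above `threshold`
// has lasted strictly longer than `forMs` ("pending > 100 for > 5 min").
// Assumes `samples` is sorted by timestamp in ascending order.
function sustainedAbove(samples: Sample[], threshold: number, forMs: number): boolean {
  let breachStart: number | null = null;
  let lastTs: number | null = null;
  for (const s of samples) {
    if (s.value > threshold) {
      if (breachStart === null) breachStart = s.timestampMs;
    } else {
      breachStart = null; // any sample at or below the threshold resets the run
    }
    lastTs = s.timestampMs;
  }
  if (breachStart === null || lastTs === null) return false;
  return lastTs - breachStart > forMs;
}
```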
7. Health Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/live | Liveness probe: service process alive |
| GET /health/ready | Readiness probe: DB, NATS, and Redis connected |
| GET /health/startup | Startup probe: EPI schedule loaded, migrations complete |
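The readiness probe depends on three connections. A minimal sketch of how a handler might aggregate them, assuming each dependency exposes some async ping; the `Check` type, the `readiness` function, and the 2-second default timeout are hypothetical. It returns 200 only when every check passes and 503 otherwise; Kubernetes HTTP probes treat any status outside 200 to 399 as a failure.

```typescript
interface Check {
  name: string;             // e.g. "db", "nats", "redis"
  run: () => Promise<void>; // resolves when the dependency answers, rejects otherwise
}

interface CheckResult {
  name: string;
  ok: boolean;
  detail?: string;
}

// Rejects if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs every check in parallel; 200 only if all pass, 503 otherwise.
async function readiness(
  checks: Check[],
  timeoutMs = 2000,
): Promise<{ status: number; body: { status: "ready" | "not_ready"; checks: CheckResult[] } }> {
  const results = await Promise.all(
    checks.map(async (check): Promise<CheckResult> => {
      try {
        await withTimeout(check.run(), timeoutMs);
        return { name: check.name, ok: true };
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return { name: check.name, ok: false, detail };
      }
    }),
  );
  const allOk = results.every((r) => r.ok);
  return { status: allOk ? 200 : 503, body: { status: allOk ? "ready" : "not_ready", checks: results } };
}
```

The same evaluator could back GET /health/startup with checks for the EPI schedule and migrations, while GET /health/live, which per the table only reports that the process is alive, would not run dependency checks.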