Virtual Care Service — Observability
Status: populated | Owner: TBD | Last updated: 2026-04-18 | Companion: Service Template · 12 observability-telemetry
1. SLIs and SLOs
| SLI | SLO | Measurement |
|---|---|---|
| Session creation success rate | ≥ 99% over a 30-day window | vcare_session_create_total{status="success"} / vcare_session_create_total |
| Session creation latency (p95) | ≤ 3000 ms | http_request_duration_ms{route="/sessions",method="POST"} |
| Join token issuance latency (p95) | ≤ 500 ms | http_request_duration_ms{route="/join-token"} |
| Session end → Encounter creation latency (p95) | ≤ 10 s | vcare_encounter_create_latency_seconds |
| Video backend health check success rate | ≥ 99.5% | vcare_backend_health_check_success |
| API availability | ≥ 99.9% | Success rate on all endpoints |
| Fallback initiation rate | — monitored; alert if > 5% | vcare_fallback_initiated_total / vcare_sessions_total |
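
The ratio and p95 targets above can be checked mechanically. The following is a minimal Python sketch, not the service's actual tooling: `RatioSli`, `meets_slo`, `error_budget_remaining`, and `p95_nearest_rank` are hypothetical names, and the sample counts are made up for illustration. Only the 99% and 3000 ms targets come from the table. Nearest-rank is just one percentile definition; a histogram-backed metric would normally estimate p95 from bucket counts instead.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RatioSli:
    """A ratio SLI: good events divided by total events over a window."""
    good: int
    total: int

    def ratio(self) -> float:
        # Assumption: an empty window counts as compliant, so idle periods do not alert.
        return 1.0 if self.total == 0 else self.good / self.total


def meets_slo(sli: RatioSli, target: float) -> bool:
    """True when the observed ratio is at or above the SLO target."""
    return sli.ratio() >= target


def error_budget_remaining(sli: RatioSli, target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - target) * sli.total
    failures = sli.total - sli.good
    if allowed_failures == 0:
        return 1.0 if failures == 0 else float("-inf")
    return 1.0 - failures / allowed_failures


def p95_nearest_rank(samples_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


# Illustrative numbers only: 9,950 successful session creations out of 10,000.
window = RatioSli(good=9950, total=10000)
session_slo_ok = meets_slo(window, target=0.99)
budget_left = error_budget_remaining(window, target=0.99)

# Illustrative latency samples in milliseconds, checked against the 3000 ms target.
latency_ok = p95_nearest_rank([120, 340, 560, 980, 2900]) <= 3000
```

With these made-up numbers, half of the 30-day error budget is spent: 50 failures against roughly 100 allowed.
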
2. Key Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| vcare_sessions_total | Counter | tenant_id, status, video_backend | Session lifecycle outcomes |
| vcare_session_duration_seconds | Histogram | tenant_id, video_backend | Actual session duration |
| vcare_waiting_room_wait_seconds | Histogram | tenant_id | Time from patient join to admit |
| vcare_participants_per_session | Histogram | tenant_id | Participant count distribution |
| vcare_fallback_initiated_total | Counter | tenant_id, reason | Fallback (video→async) activations |
| vcare_backend_health_check_latency_ms | Histogram | backend | Jitsi health check round-trip |
| vcare_encounter_create_latency_seconds | Histogram | tenant_id | FHIR Encounter creation time post-session |
| vcare_token_validation_failures_total | Counter | tenant_id, reason | Invalid/expired join tokens |
| vcare_outbox_lag_seconds | Gauge | (none) | Age of oldest unpublished outbox message |
| vcare_consent_gate_blocks_total | Counter | tenant_id, gate_type | Sessions blocked by consent gate |
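
To make the label sets above concrete, here is a small in-memory sketch of how labeled counters partition into series and how the fallback rate falls out of two of them. It is an illustration only: `LabeledCounter` is a hypothetical stand-in for whatever metrics client the service uses, and the label values (`completed`, `failed`, `low_bandwidth`, the tenant IDs) are assumptions, not values defined in this document.

```python
from collections import Counter
from typing import Tuple


class LabeledCounter:
    """In-memory stand-in for a labeled, monotonically increasing counter."""

    def __init__(self, name: str, label_names: Tuple[str, ...]) -> None:
        self.name = name
        self.label_names = label_names
        self._values: Counter = Counter()

    def inc(self, amount: int = 1, **labels: str) -> None:
        # Reject missing or unexpected labels so every series has the same shape.
        if set(labels) != set(self.label_names):
            raise ValueError(f"{self.name} expects labels {self.label_names}")
        if amount < 0:
            raise ValueError("counters only go up")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] += amount

    def total(self, **match: str) -> int:
        """Sum all series whose labels match the given subset of label values."""
        index = {name: i for i, name in enumerate(self.label_names)}
        return sum(
            value
            for key, value in self._values.items()
            if all(key[index[name]] == wanted for name, wanted in match.items())
        )


# Metric names and label names are taken from the table; label values are hypothetical.
sessions = LabeledCounter("vcare_sessions_total", ("tenant_id", "status", "video_backend"))
fallbacks = LabeledCounter("vcare_fallback_initiated_total", ("tenant_id", "reason"))

sessions.inc(tenant_id="tenant-a", status="completed", video_backend="jitsi")
sessions.inc(tenant_id="tenant-a", status="completed", video_backend="jitsi")
sessions.inc(tenant_id="tenant-a", status="failed", video_backend="jitsi")
sessions.inc(tenant_id="tenant-b", status="completed", video_backend="jitsi")
fallbacks.inc(tenant_id="tenant-a", reason="low_bandwidth")

# Fallback initiation rate across all tenants: fallbacks divided by sessions.
fallback_rate = fallbacks.total() / sessions.total()
```

Every distinct combination of label values creates its own series, so labels such as `reason` and `gate_type` are best kept to a small, fixed set of values.
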
3. Traces
| Span | Key attributes |
|---|---|
| vcare.session.create | tenant_id, video_backend, has_appointment |
| vcare.video_provider.health_check | backend, latency_ms, healthy |
| vcare.video_provider.create_room | backend, room_name |
| vcare.session.end | session_id, duration_seconds, participant_count |
| vcare.fhir.encounter_create | session_id, encounter_id, latency_ms |
| vcare.join_token.issue | session_id, role |
| vcare.join_token.validate | session_id, valid, failure_reason |
| vcare.ai.transcribe | session_id, duration_s, model_id |
| vcare.fallback.initiate | session_id, reason |
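
As a rough illustration of how one of these spans might be populated, the sketch below wraps a token check in a `vcare.join_token.validate` span and sets the three attributes listed in the table. The tracer is a hypothetical in-memory stand-in (`start_span`, `FINISHED_SPANS`), not a real tracing SDK, and the expiry-only validation logic and the `"expired"` reason value are assumptions made for the example.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Span:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    duration_ms: float = 0.0


FINISHED_SPANS: List[Span] = []  # hypothetical in-memory exporter


@contextmanager
def start_span(name: str, **attributes: Any) -> Iterator[Span]:
    """Record a span's attributes and wall-clock duration, even if the body raises."""
    span = Span(name=name, attributes=dict(attributes))
    started = time.monotonic()
    try:
        yield span
    finally:
        span.duration_ms = (time.monotonic() - started) * 1000.0
        FINISHED_SPANS.append(span)


def validate_join_token(session_id: str, expires_at: float, now: float) -> bool:
    """Hypothetical expiry check wrapped in the vcare.join_token.validate span."""
    with start_span("vcare.join_token.validate", session_id=session_id) as span:
        valid = now < expires_at
        span.attributes["valid"] = valid
        span.attributes["failure_reason"] = None if valid else "expired"
        return valid


# A token that expired at t=100 is rejected at t=200, and the span records why.
accepted = validate_join_token("session-123", expires_at=100.0, now=200.0)
last_span = FINISHED_SPANS[-1]
```

Setting `failure_reason` on the span lets a trace for a rejected join be matched against the `reason` label on `vcare_token_validation_failures_total`.
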
4. Dashboards
| Dashboard | Key panels |
|---|---|
| Virtual Care Operations | Active sessions by tenant, session status funnel, fallback rate, backend health |
| Session Lifecycle | Creation latency, wait-room time, session duration distribution |
| Connectivity & Fallback | Fallback rate trend, fallback by reason, bandwidth degradation events |
| FHIR Integration | Encounter creation latency, success/failure rate |
| Security | Consent gate blocks, token validation failures, cross-tenant violations |
5. Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| Video backend unhealthy | Health check fails for 3 consecutive minutes | Critical | runbooks/vcare-backend-health.md |
| Session creation success rate drops | < 99% for 5 min | Critical | runbooks/vcare-session-failures.md |
| Fallback rate spike | > 10% of sessions for 5 min | Warning | runbooks/vcare-fallback-spike.md |
| FHIR Encounter creation lag | Encounter not created within 30 s of session end | Warning | runbooks/vcare-encounter-lag.md |
| Outbox lag | Oldest unpublished message > 5 min | Warning | runbooks/vcare-outbox-lag.md |
| Token validation failure rate | > 5% for 5 min | Warning | runbooks/vcare-token-failures.md |
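
Most of the conditions above share one shape: a threshold that must stay breached for a set duration before the alert fires. The sketch below shows that logic in Python. It assumes one evaluated sample per minute, and the helper name `sustained` and the sample series are hypothetical. In practice the alerting system would evaluate this, not application code.

```python
from typing import Callable, Sequence


def sustained(
    samples: Sequence[float],
    breached: Callable[[float], bool],
    for_samples: int,
) -> bool:
    """True when the most recent `for_samples` samples all breach the condition."""
    if for_samples <= 0 or len(samples) < for_samples:
        return False  # not enough history yet, so do not fire
    return all(breached(sample) for sample in samples[-for_samples:])


# Assumption: one sample per minute, oldest first.
health_checks = [1.0, 1.0, 0.0, 0.0, 0.0]            # 1.0 = healthy, 0.0 = failed
fallback_rates = [0.02, 0.12, 0.11, 0.15, 0.13, 0.20]
success_rates = [0.97, 0.98, 0.995, 0.98, 0.97]

# Video backend unhealthy: health check fails for 3 consecutive minutes.
backend_unhealthy = sustained(health_checks, lambda ok: ok == 0.0, for_samples=3)

# Fallback rate spike: > 10% of sessions for 5 min.
fallback_spike = sustained(fallback_rates, lambda rate: rate > 0.10, for_samples=5)

# Session creation success rate drops: < 99% for 5 min.
creation_failing = sustained(success_rates, lambda rate: rate < 0.99, for_samples=5)
```

In the last series a single recovered minute (0.995) resets the window, so the alert does not fire even though four of the five samples are below target.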