
Observability

:::info Source
Sourced from services/notification-service/OBSERVABILITY.md in the documentation repo.
:::

1. Logs

Events: notification.queued|sending|sent|delivered|failed|bounced|suppressed, notification.template.created|updated, notification.digest.sent, notification.webhook.received.
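A minimal sketch of how one of the lifecycle events might be emitted as a single structured JSON log line, using only Python's standard library. The field names (`event`, `channel`, `notification_id`) and the helpers `format_event` and `log_event` are assumptions for illustration; the source only lists the event names.

```python
import json
import logging

# Lifecycle event names from the list above, with the pipe shorthand expanded.
LIFECYCLE_EVENTS = {
    f"notification.{state}"
    for state in ("queued", "sending", "sent", "delivered",
                  "failed", "bounced", "suppressed")
}

logger = logging.getLogger("notification-service")


def format_event(event: str, **fields: object) -> str:
    """Serialize one lifecycle event as a JSON line (hypothetical schema)."""
    if event not in LIFECYCLE_EVENTS:
        raise ValueError(f"unknown lifecycle event: {event}")
    return json.dumps({"event": event, **fields}, sort_keys=True)


def log_event(event: str, **fields: object) -> None:
    """Write the serialized event at INFO level."""
    logger.info(format_event(event, **fields))
```

Rejecting unknown event names at the call site keeps the set of events searchable by exact match, which is the main reason to name them up front.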

2. Metrics

RED

  • notif_api_requests_total{endpoint,status} counter
  • notif_api_duration_seconds{endpoint} histogram

Domain

  • notif_sends_total{channel,template,outcome} counter
  • notif_delivery_duration_seconds{channel} histogram
  • notif_bounce_rate{channel} gauge
  • notif_open_rate{channel,template} gauge
  • notif_click_rate{channel,template} gauge
  • notif_suppression_total{reason} counter
  • notif_webhook_events_total{provider,kind} counter

Cost

  • notif_provider_cost_micro_usd_total{channel,tenant_id} counter
  • notif_ai_cost_micro_usd_total{tenant_id} counter
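The `_micro_usd_total` suffix suggests that cost is accumulated as integer millionths of a dollar, which keeps the counter monotonic and avoids floating-point drift on very small per-message prices. A minimal sketch under that assumption; the `CostCounter` class is an in-memory stand-in, and the price used in the usage note is hypothetical rather than taken from any provider.

```python
from collections import defaultdict
from decimal import Decimal

MICRO = 1_000_000  # micro-USD per USD


def usd_to_micro(usd: str) -> int:
    """Convert a decimal USD string to integer micro-USD without float error."""
    return int(Decimal(usd) * MICRO)


class CostCounter:
    """In-memory stand-in for a counter labelled by (channel, tenant_id)."""

    def __init__(self) -> None:
        self._totals = defaultdict(int)

    def inc(self, channel: str, tenant_id: str, micro_usd: int) -> None:
        if micro_usd < 0:
            raise ValueError("counters only go up")
        self._totals[(channel, tenant_id)] += micro_usd

    def total_usd(self, channel: str, tenant_id: str) -> Decimal:
        """Accumulated cost for one label pair, converted back to USD."""
        return Decimal(self._totals[(channel, tenant_id)]) / MICRO
```

For example, two SMS sends at a hypothetical price of 0.0075 USD each would be recorded as two increments of 7500, and `total_usd` would report 0.015.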

3. Traces

Spans: notif.send.email, notif.send.sms, notif.send.push, notif.template.render, notif.digest.batch.

4. Dashboards

  • Send volume by channel + template.
  • Delivery rate + bounce.
  • Open/click for email.
  • Provider cost per tenant.

5. Alerts

| Alert | Threshold | Severity |
| --- | --- | --- |
| bounce-rate-high | > 5% daily | P2 |
| send-failure-spike | > 3% in 10 min | P2 |
| webhook-lag | > 30s p99 | P2 |
| ai-budget-exhausted | tenant 100% | P3 |
| sms-toll-fraud-suspected | unusual dest | P1 |
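As a sketch of how a ratio-based threshold such as `send-failure-spike` could be evaluated, the function below checks failed sends against all sends in one window. The input shape, the `"failed"` outcome label, and the function name are assumptions; in practice this logic would more likely live in the alerting system's rule language than in application code.

```python
from collections import Counter

# Threshold, window, and severity as listed for send-failure-spike.
SEND_FAILURE_SPIKE = {"threshold": 0.03, "window_min": 10, "severity": "P2"}


def send_failure_spike(outcomes: Counter) -> bool:
    """Return True when failed sends exceed 3% of all sends in the window.

    `outcomes` maps an outcome label (for example "sent" or "failed") to the
    number of sends with that outcome in the last 10 minutes (assumed input).
    """
    total = sum(outcomes.values())
    if total == 0:
        return False  # no traffic in the window, nothing to alert on
    return outcomes["failed"] / total > SEND_FAILURE_SPIKE["threshold"]
```

The comparison is strict, matching the "> 3%" wording: a window with exactly 3% failures does not fire.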

6. SLOs

| SLI | Target |
| --- | --- |
| Queue-to-send p95 | < 30s |
| Email delivery p95 | < 2 min |
| SMS delivery p95 | < 30s |
| Push delivery p95 | < 10s |
| API availability | 99.9% |
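To make these targets concrete, here is a short sketch of the arithmetic behind them: the downtime allowed by an availability target, and a p95 check against a latency target. The 30-day window and the nearest-rank percentile method are assumptions; the source does not state the measurement window or how percentiles are computed.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target."""
    return (1 - target) * window_days * 24 * 60


def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def meets_target(samples: list, target_seconds: float) -> bool:
    """True when the p95 of the samples is strictly below the target."""
    return p95(samples) < target_seconds
```

Under the assumed 30-day window, a 99.9% availability target leaves roughly 43.2 minutes of error budget. A latency SLI would be checked with, for example, `meets_target(push_latencies, 10.0)` for the push delivery target, where `push_latencies` is a hypothetical list of delivery times in seconds.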