Observability

:::info Source
Sourced from services/ai-gateway-service/OBSERVABILITY.md in the documentation repo.
:::

1. Logs

Events: ai.completion.started|finished|refused, ai.safety.*, ai.budget.debit|alert|exhausted, ai.model.routed|fallback, ai.cache.hit|miss|store, ai.prompt.version_published, ai.injection.detected, ai.pii.redacted, ai.provider.error.

Attrs: prompt_id, prompt_version, model_id, cost_micro_usd, tokens, latency_ms, trace_id, decision_id, cache_key, safety_action. Redact: raw input/output (stored separately in audit tier).
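
As a rough illustration of the redaction rule, the sketch below builds a log line from an allow-list of the attributes named above, so raw input/output cannot leak into it. The function name `build_log_event`, the JSON encoding, the `input` key, and the sample values are assumptions for illustration, not the service's actual logging code.

```python
import json

# Attribute names taken from the Attrs list; anything else is dropped.
ALLOWED_ATTRS = {
    "prompt_id", "prompt_version", "model_id", "cost_micro_usd", "tokens",
    "latency_ms", "trace_id", "decision_id", "cache_key", "safety_action",
}


def build_log_event(event: str, **attrs) -> str:
    """Serialize one log event, keeping only allow-listed attributes.

    Raw input/output is redacted by omission: keys such as "input"
    (a hypothetical name) are not on the allow-list.
    """
    safe = {k: v for k, v in attrs.items() if k in ALLOWED_ATTRS}
    return json.dumps({"event": event, **safe}, sort_keys=True)


line = build_log_event(
    "ai.completion.finished",
    prompt_id="tutor.hint",   # hypothetical value
    model_id="model-a",       # hypothetical value
    latency_ms=412,
    input="raw user text",    # dropped: not an allowed attribute
)
print(line)
```

An allow-list is used here rather than a deny-list so that a newly added field stays out of the logs until it is explicitly approved.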

2. Metrics

RED

  • ai_api_requests_total{endpoint,status} counter
  • ai_api_duration_seconds{endpoint} histogram

Domain

  • ai_completions_total{prompt_id,model_id,status} counter
  • ai_first_token_duration_seconds{prompt_id,model_id} histogram
  • ai_total_duration_seconds{prompt_id,model_id} histogram
  • ai_tokens_total{direction=in|out,model_id} counter
  • ai_cost_micro_usd_total{tenant_id,prompt_id,model_id} counter
  • ai_cache_hit_ratio{prompt_id} gauge
  • ai_refusals_total{reason} counter
  • ai_safety_violations_total{category,action} counter
  • ai_injection_detected_total counter
  • ai_pii_redactions_total{kind} counter
  • ai_budget_used_pct{tenant_id,period} gauge
  • ai_embedding_operations_total{model_id} counter
  • ai_knn_search_duration_seconds histogram
  • ai_provider_fallback_total{from,to} counter
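
The two gauges in this list are derived values. One plausible way to compute them is sketched below, assuming `ai_cache_hit_ratio` is hits divided by total lookups and `ai_budget_used_pct` is spend divided by the period budget, both of which are assumptions about definitions the source does not spell out. The function and parameter names are hypothetical.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that hit; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def budget_used_pct(spent_micro_usd: int, budget_micro_usd: int) -> float:
    """Percentage of a tenant's period budget already spent."""
    if budget_micro_usd <= 0:
        raise ValueError("budget must be positive")
    return 100.0 * spent_micro_usd / budget_micro_usd


# Example: 3 hits and 1 miss; 2.5 USD spent of a 10 USD budget.
print(cache_hit_ratio(3, 1))
print(budget_used_pct(2_500_000, 10_000_000))
```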

3. Traces

Spans: ai.complete, ai.safety.input, ai.safety.output, ai.provider.call{vendor,model}, ai.cache.lookup, ai.budget.debit, ai.embed, ai.knn.
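
To make the span list concrete, here is a minimal standard-library sketch of how these spans might nest within one request. The parent/child layout (everything under `ai.complete`) and the `span` helper are assumptions for illustration; a real deployment would use its tracing SDK rather than this toy recorder.

```python
import time
from contextlib import contextmanager

_stack = []     # names of currently open spans
finished = []   # completed span records, in completion order


@contextmanager
def span(name, **attrs):
    """Record a span with its parent, attributes, and duration."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        finished.append({
            "name": name,
            "parent": parent,
            "attrs": attrs,
            "duration_s": time.perf_counter() - start,
        })


# Assumed request flow: safety check, cache lookup, provider call, safety check.
with span("ai.complete"):
    with span("ai.safety.input"):
        pass
    with span("ai.cache.lookup"):
        pass
    with span("ai.provider.call", vendor="vendor-a", model="model-a"):  # hypothetical values
        pass
    with span("ai.safety.output"):
        pass
```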

4. Dashboards

  • Completions: rate, p95 first-token, refusal rate.
  • Safety: block rate by category; injection detections.
  • Cost: spend per prompt × model × tenant; projected vs budget.
  • Cache: hit ratio per prompt.
  • Provider health: fallback count; latency per vendor.
  • Bias monitoring (quarterly review board).

5. Alerts

| Alert | Threshold | Severity |
| --- | --- | --- |
| completion-failure-rate | > 2% in 10 min | P1 |
| first-token-slow | p95 > 2s | P2 |
| cache-hit-drop | < 30% for a prompt | P3 |
| budget-exhaustion | any tenant 100% | P2 |
| safety-violation-spike | > 1% in 10 min | P2 |
| injection-spike | > 5/min | P2 |
| provider-fallback-spike | > 10% fallback for 10 min | P2 |
| ai-cost-burn | > 1.5× projected daily | P2 |
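
The alert thresholds can be read as a small rule table. The sketch below encodes them and evaluates a metrics snapshot against them; the snapshot key names (such as `failure_rate_10m`) and the choice to express percentages as fractions are assumptions, and windowing is taken to be done upstream by whatever produces the snapshot.

```python
import operator

# (alert name, snapshot key, comparison, threshold, severity)
# Snapshot keys are hypothetical; rates are fractions (0.02 means 2%).
ALERT_RULES = [
    ("completion-failure-rate", "failure_rate_10m", operator.gt, 0.02, "P1"),
    ("first-token-slow", "first_token_p95_s", operator.gt, 2.0, "P2"),
    ("cache-hit-drop", "cache_hit_ratio", operator.lt, 0.30, "P3"),
    ("budget-exhaustion", "budget_used_pct", operator.ge, 100.0, "P2"),
    ("safety-violation-spike", "safety_violation_rate_10m", operator.gt, 0.01, "P2"),
    ("injection-spike", "injections_per_min", operator.gt, 5, "P2"),
    ("provider-fallback-spike", "fallback_rate_10m", operator.gt, 0.10, "P2"),
    ("ai-cost-burn", "cost_vs_projected_daily", operator.gt, 1.5, "P2"),
]


def firing_alerts(snapshot: dict) -> list:
    """Return (alert, severity) pairs whose threshold is crossed.

    Metrics missing from the snapshot are skipped rather than treated as zero.
    """
    return [
        (name, severity)
        for name, key, compare, threshold, severity in ALERT_RULES
        if key in snapshot and compare(snapshot[key], threshold)
    ]


snapshot = {"failure_rate_10m": 0.03, "cache_hit_ratio": 0.5, "injections_per_min": 5}
print(firing_alerts(snapshot))
```

In this example only the failure-rate rule fires: the cache hit ratio is above 30%, and 5 injections per minute does not exceed the strict `> 5/min` threshold.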

6. SLOs

| SLI | Target |
| --- | --- |
| Completion first-token p95 | < 600ms |
| Embed p95 | < 300ms |
| Cache hit p95 | < 50ms |
| Availability | 99.9% (degraded OK) |
| Budget enforcement latency | < 10ms |
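
For the three p95 latency targets, a check might look like the sketch below. It uses the nearest-rank percentile definition, which is an assumption: the source does not say how p95 is computed, and a histogram-based backend will estimate it differently. The SLI keys and function names are hypothetical.

```python
# p95 latency targets in milliseconds, from the SLO table.
SLO_TARGETS_MS = {
    "completion_first_token": 600,
    "embed": 300,
    "cache_hit": 50,
}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def slo_met(sli: str, samples_ms) -> bool:
    """True when the observed p95 is strictly below the target."""
    return p95(samples_ms) < SLO_TARGETS_MS[sli]


# With 20 samples the nearest-rank p95 is the 19th smallest value.
one_outlier = [200] * 19 + [900]
two_outliers = [200] * 18 + [900] * 2
print(slo_met("completion_first_token", one_outlier))
print(slo_met("completion_first_token", two_outliers))
```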

7. Business Metrics

  • Daily AI cost per tenant.
  • Cost per feature (tutor, co-author, etc.).
  • Cost per completion trend.
  • Refusal reason breakdown.
  • Bias scorecard (quarterly).
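
As one example of how these could be derived, daily AI cost per tenant is a simple roll-up of cost records. The sketch below assumes each record is a `(day, tenant_id, cost_micro_usd)` tuple, a hypothetical shape; the real source of these records (for instance `ai.budget.debit` events or the `ai_cost_micro_usd_total` counter) is not specified here.

```python
from collections import defaultdict


def daily_cost_per_tenant(debits):
    """Sum cost in micro-USD per (day, tenant_id) pair."""
    totals = defaultdict(int)
    for day, tenant_id, cost_micro_usd in debits:
        totals[(day, tenant_id)] += cost_micro_usd
    return dict(totals)


# Hypothetical sample records.
debits = [
    ("2024-01-01", "tenant-a", 1200),
    ("2024-01-01", "tenant-a", 800),
    ("2024-01-01", "tenant-b", 500),
    ("2024-01-02", "tenant-a", 300),
]
print(daily_cost_per_tenant(debits))
```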