Observability
:::info Source
Sourced from services/ai-gateway-service/OBSERVABILITY.md in the documentation repo.
:::
1. Logs
Events: ai.completion.started|finished|refused, ai.safety.*, ai.budget.debit|alert|exhausted, ai.model.routed|fallback, ai.cache.hit|miss|store, ai.prompt.version_published, ai.injection.detected, ai.pii.redacted, ai.provider.error.
Attrs: prompt_id, prompt_version, model_id, cost_micro_usd, tokens, latency_ms, trace_id, decision_id, cache_key, safety_action. Redacted from logs: raw prompt input and model output (stored separately in the audit tier).
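A minimal sketch of one structured log line, assuming JSON output and an allowlist built from the attribute list above. The field names `input` and `output` for raw content are hypothetical; the point is that they never reach the log stream and would be written to the audit tier instead.

```python
import json
import time

# Allowlisted attributes, taken from the Attrs list above.
ALLOWED_ATTRS = {
    "prompt_id", "prompt_version", "model_id", "cost_micro_usd", "tokens",
    "latency_ms", "trace_id", "decision_id", "cache_key", "safety_action",
}

# Hypothetical names for raw content; these belong in the audit tier only.
RAW_FIELDS = {"input", "output"}


def build_log_record(event: str, **attrs) -> str:
    """Return one JSON log line for an ai.* event with raw content dropped."""
    record = {"event": event, "ts": time.time()}
    for key, value in attrs.items():
        if key in RAW_FIELDS:
            continue  # redacted: raw prompt input / model output
        if key in ALLOWED_ATTRS:
            record[key] = value  # unknown keys are silently dropped
    return json.dumps(record, sort_keys=True)
```

Using an allowlist rather than a denylist means a newly added attribute stays out of the logs until someone deliberately approves it.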
2. Metrics
RED
| Metric | Labels | Type |
|---|---|---|
| ai_api_requests_total | endpoint, status | counter |
| ai_api_duration_seconds | endpoint | histogram |
Domain
| Metric | Labels | Type |
|---|---|---|
| ai_completions_total | prompt_id, model_id, status | counter |
| ai_first_token_duration_seconds | prompt_id, model_id | histogram |
| ai_total_duration_seconds | prompt_id, model_id | histogram |
| ai_tokens_total | direction=in\|out, model_id | counter |
| ai_cost_micro_usd_total | tenant_id, prompt_id, model_id | counter |
| ai_cache_hit_ratio | prompt_id | gauge |
| ai_refusals_total | reason | counter |
| ai_safety_violations_total | category, action | counter |
| ai_injection_detected_total | (none) | counter |
| ai_pii_redactions_total | kind | counter |
| ai_budget_used_pct | tenant_id, period | gauge |
| ai_embedding_operations_total | model_id | counter |
| ai_knn_search_duration_seconds | (none) | histogram |
| ai_provider_fallback_total | from, to | counter |
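ai_cost_micro_usd_total counts integer micro-dollars. The sketch below shows one way a per-completion cost could be derived from token counts and accumulated under the metric's labels. It assumes per-million-token pricing; the price table, the model name `model-a`, and the round-up policy are all hypothetical, since this document does not say how cost is computed.

```python
from collections import Counter

# Hypothetical prices in micro-USD per 1,000,000 tokens, keyed by (model_id, direction).
PRICE_MICRO_USD_PER_MTOK = {
    ("model-a", "in"): 3_000_000,    # $3.00 per million input tokens
    ("model-a", "out"): 15_000_000,  # $15.00 per million output tokens
}

# In-process stand-in for ai_cost_micro_usd_total{tenant_id,prompt_id,model_id}.
cost_micro_usd_total = Counter()


def completion_cost_micro_usd(model_id: str, tokens_in: int, tokens_out: int) -> int:
    """Cost of one completion in whole micro-USD, rounded up."""
    total = (
        tokens_in * PRICE_MICRO_USD_PER_MTOK[(model_id, "in")]
        + tokens_out * PRICE_MICRO_USD_PER_MTOK[(model_id, "out")]
    )
    # Integer ceiling division: no float drift, and partial micro-dollars are not lost.
    return -(-total // 1_000_000)


def record_completion(tenant_id: str, prompt_id: str, model_id: str,
                      tokens_in: int, tokens_out: int) -> int:
    """Compute the cost and add it to the labelled counter."""
    cost = completion_cost_micro_usd(model_id, tokens_in, tokens_out)
    cost_micro_usd_total[(tenant_id, prompt_id, model_id)] += cost
    return cost
```

For example, 1,000 input and 500 output tokens on `model-a` cost $0.003 + $0.0075 = $0.0105, i.e. 10,500 micro-USD. Keeping the counter in integers lets per-tenant and per-prompt sums add up exactly.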
3. Traces
Spans: ai.complete, ai.safety.input, ai.safety.output, ai.provider.call{vendor,model}, ai.cache.lookup, ai.budget.debit, ai.embed, ai.knn.
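The span list does not specify nesting or order. A plausible shape, assumed here, is one ai.complete root span with the other spans as children. The stdlib sketch below only records that parent/child structure; a real service would use a tracing SDK such as OpenTelemetry, which also propagates context across threads (this global stack does not). The vendor and model values are hypothetical, and ai.embed / ai.knn are omitted.

```python
import time
from contextlib import contextmanager

finished_spans = []  # each entry: name, parent, attrs, duration_s
_active = []         # stack of currently open span names (single-threaded only)


@contextmanager
def span(name: str, **attrs):
    """Record a span whose parent is the innermost open span."""
    parent = _active[-1] if _active else None
    _active.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _active.pop()
        finished_spans.append({
            "name": name,
            "parent": parent,
            "attrs": attrs,
            "duration_s": time.perf_counter() - start,
        })


def complete(vendor: str, model: str) -> None:
    """Assumed order of child spans inside one completion."""
    with span("ai.complete"):
        with span("ai.safety.input"):
            pass  # input safety checks
        with span("ai.cache.lookup"):
            pass  # cache miss assumed
        with span("ai.budget.debit"):
            pass  # debit tenant budget
        with span("ai.provider.call", vendor=vendor, model=model):
            pass  # provider request
        with span("ai.safety.output"):
            pass  # output safety checks
```

Child spans finish before their parent, so ai.complete is the last entry recorded.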
4. Dashboards
- Completions: rate, p95 first-token latency, refusal rate.
- Safety: block rate by category; injection detections.
- Cost: spend per prompt × model × tenant; projected vs budget.
- Cache: hit ratio per prompt.
- Provider health: fallback count; latency per vendor.
- Bias monitoring (quarterly review board).
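The per-prompt cache hit ratio on the Cache dashboard is hits / (hits + misses). The sketch assumes the counts of ai.cache.hit and ai.cache.miss events per prompt_id over the dashboard window are already aggregated. The `hit_ratio_per_prompt` helper is hypothetical.

```python
def hit_ratio_per_prompt(hits: dict[str, int], misses: dict[str, int]) -> dict[str, float]:
    """Hit ratio per prompt_id; prompts with no lookups in the window are omitted."""
    ratios = {}
    for prompt_id in hits.keys() | misses.keys():
        h = hits.get(prompt_id, 0)
        m = misses.get(prompt_id, 0)
        total = h + m
        if total:  # skip prompts with zero lookups to avoid dividing by zero
            ratios[prompt_id] = h / total
    return ratios
```

A prompt whose ratio falls below 0.30 would be the case that the cache-hit-drop alert is meant to catch.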
5. Alerts
| Alert | Threshold | Severity |
|---|---|---|
| completion-failure-rate | > 2% in 10 min | P1 |
| first-token-slow | p95 > 2s | P2 |
| cache-hit-drop | < 30% for a prompt | P3 |
| budget-exhaustion | any tenant at 100% of budget | P2 |
| safety-violation-spike | > 1% in 10 min | P2 |
| injection-spike | > 5/min | P2 |
| provider-fallback-spike | > 10% fallback for 10 min | P2 |
| ai-cost-burn | > 1.5× projected daily spend | P2 |
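The rate-based thresholds can be evaluated from counts aggregated over a 10-minute window. This sketch makes several labelled simplifications: the `WindowStats` type is hypothetical, safety violations and fallbacks are taken as fractions of completions, "> 10% fallback for 10 min" is treated as a ratio over the window rather than a sustained condition, and "> 5/min" injections is averaged over the window. Only a subset of the table is covered.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical aggregate of counters over the evaluation window."""
    completions: int
    failures: int
    safety_violations: int
    fallbacks: int
    injection_detections: int
    window_minutes: float = 10.0


def firing_alerts(stats: WindowStats, budget_used_pct: dict[str, float]) -> list[str]:
    """Return names of alerts from the table above whose thresholds are exceeded."""
    fired = []
    if stats.completions:  # ratio alerts need a non-empty denominator
        if stats.failures / stats.completions > 0.02:
            fired.append("completion-failure-rate")
        if stats.safety_violations / stats.completions > 0.01:
            fired.append("safety-violation-spike")
        if stats.fallbacks / stats.completions > 0.10:
            fired.append("provider-fallback-spike")
    # Averaged per-minute rate; a per-minute check would be stricter.
    if stats.injection_detections / stats.window_minutes > 5:
        fired.append("injection-spike")
    # budget_used_pct mirrors the ai_budget_used_pct gauge, keyed by tenant_id.
    if any(pct >= 100 for pct in budget_used_pct.values()):
        fired.append("budget-exhaustion")
    return fired
```

For instance, 30 failures out of 1,000 completions (3%) trips completion-failure-rate, while 50 fallbacks (5%) does not trip provider-fallback-spike.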
6. SLOs
| SLI | Target |
|---|---|
| Completion first-token p95 | < 600ms |
| Embed p95 | < 300ms |
| Cache-hit latency p95 | < 50ms |
| Availability | 99.9% (degraded responses count as available) |
| Budget enforcement latency | < 10ms |
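Two pieces of arithmetic behind the SLO table, as a sketch. The latency check uses the nearest-rank p95 over raw samples; in production p95 is usually estimated from histogram buckets (e.g. PromQL `histogram_quantile`), so results can differ slightly. The error-budget calculation assumes a 30-day window, which this document does not state.

```python
# p95 latency targets from the SLO table, in milliseconds.
SLO_P95_MS = {"first_token": 600, "embed": 300, "cache_hit": 50}


def p95_nearest_rank(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of raw latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def meets_latency_slo(sli: str, samples_ms: list[float]) -> bool:
    """True if the observed p95 is strictly below the target."""
    return p95_nearest_rank(samples_ms) < SLO_P95_MS[sli]


def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Allowed unavailable minutes for an availability target over the window."""
    return (1 - target) * window_days * 24 * 60
```

At 99.9% over an assumed 30-day window, the error budget is 0.001 × 43,200 minutes, about 43.2 minutes of unavailability.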
7. Business Metrics
- Daily AI cost per tenant.
- Cost per feature (tutor, co-author, etc.).
- Cost per completion trend.
- Refusal reason breakdown.
- Bias scorecard (quarterly).
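The cost-based business metrics above are roll-ups of per-completion cost. This sketch assumes one row per completion with day, tenant_id, prompt_id, and cost in micro-USD. The mapping from prompt_id prefix to feature (`tutor.`, `coauthor.`) is hypothetical, since this document does not say how prompts are grouped into features.

```python
from collections import defaultdict

# Hypothetical prompt_id prefix -> product feature mapping.
FEATURE_BY_PREFIX = {"tutor.": "tutor", "coauthor.": "co-author"}


def feature_for(prompt_id: str) -> str:
    """Map a prompt_id to a feature name, defaulting to "other"."""
    for prefix, feature in FEATURE_BY_PREFIX.items():
        if prompt_id.startswith(prefix):
            return feature
    return "other"


def daily_rollup(rows):
    """rows: iterable of (day, tenant_id, prompt_id, cost_micro_usd), one per completion.

    Returns (cost per (day, tenant), cost per (day, feature), mean cost per completion per day).
    """
    per_tenant = defaultdict(int)
    per_feature = defaultdict(int)
    day_cost = defaultdict(int)
    day_count = defaultdict(int)
    for day, tenant_id, prompt_id, cost in rows:
        per_tenant[(day, tenant_id)] += cost
        per_feature[(day, feature_for(prompt_id))] += cost
        day_cost[day] += cost
        day_count[day] += 1
    cost_per_completion = {d: day_cost[d] / day_count[d] for d in day_cost}
    return dict(per_tenant), dict(per_feature), cost_per_completion
```

Plotting `cost_per_completion` by day gives the cost-per-completion trend.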