Overview
:::info Source
Sourced from services/ai-gateway-service/SERVICE_OVERVIEW.md in the documentation repo.
:::
1. Purpose
All AI calls across the platform are routed through this service. It provides the prompt registry, model routing, the safety pipeline, per-tenant budget enforcement, provenance tracking, output caching and audit logging.
2. Bounded Context
AI Services (core domain; a competitive differentiator). Ubiquitous language (UL): Prompt, PromptVersion, Model, AICompletion, Embedding, AIArtifact, SafetyVerdict, AIBudget, AIAuditEntry, PromptInjectionSignal.
3. Responsibilities
- Single AIClient port for all services (freeze point F09, frozen at end of M0).
- Prompt registry (platform + per-tenant prompts).
- Model routing: local → small cloud → large cloud; cost optimization.
- Safety: pre-call moderation, PII redaction, prompt-injection shield; post-call validation.
- Provenance: every artifact carries AIProvenance VO.
- Per-tenant budgets; hard/soft caps.
- Embeddings: per-tenant vector partitions.
- Audit log (regulated class, 7y retention).
- EU AI Act compliance: risk classification, bias monitoring, right-to-explanation.
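The model-routing bullet can be illustrated with a small Python sketch. It assumes each model is described by a tier, a capability score, a context limit and a cost, and it picks the lowest-tier model that fits the request, cheapest within a tier. `ModelOption`, `route`, the model names and every number are hypothetical; the actual policy is the one frozen as F32.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Routing tiers in escalation order: local, small cloud, large cloud."""
    LOCAL = 0
    SMALL_CLOUD = 1
    LARGE_CLOUD = 2


@dataclass(frozen=True)
class ModelOption:
    # Hypothetical model descriptor; names and numbers are illustrative only.
    name: str
    tier: Tier
    max_context_tokens: int
    capability: int            # hypothetical quality score, higher is better
    cost_per_1k_tokens: float  # hypothetical price


def route(options: list[ModelOption], required_capability: int,
          prompt_tokens: int) -> ModelOption:
    """Pick the lowest-tier model that fits the request, cheapest within a tier."""
    eligible = [
        m for m in options
        if m.capability >= required_capability
        and m.max_context_tokens >= prompt_tokens
    ]
    if not eligible:
        raise LookupError("no model satisfies the request")
    # Tuples compare element-wise: tier first, then cost.
    return min(eligible, key=lambda m: (m.tier, m.cost_per_1k_tokens))


# Hypothetical catalog for illustration.
catalog = [
    ModelOption("local-llama", Tier.LOCAL, 8_000, 1, 0.0),
    ModelOption("small-cloud", Tier.SMALL_CLOUD, 32_000, 2, 0.5),
    ModelOption("large-cloud", Tier.LARGE_CLOUD, 128_000, 3, 5.0),
]

print(route(catalog, required_capability=1, prompt_tokens=2_000).name)   # local-llama
print(route(catalog, required_capability=2, prompt_tokens=2_000).name)   # small-cloud
print(route(catalog, required_capability=1, prompt_tokens=50_000).name)  # large-cloud
```

Ordering by `(tier, cost)` means a local model wins whenever it qualifies, which mirrors the local, small cloud, large cloud escalation; a policy that also weighs latency or measured quality would need a different sort key.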
4. Non-Responsibilities
- Does not embed business logic (authoring, assessment, delivery own that).
- Does not store domain data (services own their data).
- Does not train models (inference only, against hosted or local pre-trained models).
5. Dependencies
- Upstream: all services (callers via AIClient port).
- Downstream: analytics (AI audit firehose), content-service (when AI generates content artifacts).
- External: OpenAI, Anthropic, Google (Gemini), Azure OpenAI, Mistral, local Llama variants, Whisper, replicate.com for specialty models.
6. Slices & Milestones
- S0 (M0): gateway + safety pipeline + prompt registry + AIClient port.
- S1 (M1): AI tutor + local inference.
- S2 (M2): AI co-author.
- S4 (M3): full model routing + cost controls.
- S6 (M5): AI insights v2; predictive; on-device privacy-preserving.
7. Architectural Freeze Points
- F04 — AIProvenance VO (M0 end)
- F09 — AIClient port (M0 end)
- F10 — Prompt registry format (M0 end)
- F18 — Safety verdict schema (M1 start)
- F32 — Model routing policy (M3 start)
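As an illustration of F04, an immutable provenance value object might look like the Python sketch below. The field names and validation rules are assumptions chosen for illustration; the frozen F04 format and the F18 safety verdict schema define the real shapes.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIProvenance:
    """Immutable provenance value object attached to an AI artifact.

    Field names are hypothetical; the real shape is the one frozen as F04.
    """
    tenant_id: str
    prompt_id: str
    prompt_version: int
    provider: str
    model: str
    safety_verdict: str   # placeholder; the real schema is frozen as F18
    created_at: datetime

    def __post_init__(self) -> None:
        # Reject incomplete records so no artifact carries partial provenance.
        if not (self.tenant_id and self.prompt_id and self.provider and self.model):
            raise ValueError("tenant, prompt, provider and model are required")
        if self.prompt_version < 1:
            raise ValueError("prompt_version must be >= 1")
        if self.created_at.tzinfo is None:
            raise ValueError("created_at must be timezone-aware")


when = datetime(2024, 1, 1, tzinfo=timezone.utc)
a = AIProvenance("tenant-1", "tutor.hint", 3, "local", "local-llama", "pass", when)
b = AIProvenance("tenant-1", "tutor.hint", 3, "local", "local-llama", "pass", when)
print(a == b)  # True: value objects compare by value

try:
    a.model = "other-model"  # frozen: mutation is rejected
except FrozenInstanceError:
    print("immutable")
```

Making the object frozen gives value equality and hashing for free, so provenance records can be compared, deduplicated or used as dictionary keys without risk of later mutation.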
8. Key Invariants
- Every AI call produces a provenance record (no untracked AI artifacts).
- Tenant-scoped budgets enforced before any provider call.
- Safety pipeline is mandatory; cannot be bypassed.
- Tenant-scoped embeddings; never cross-tenant.
- No training on tenant data (verified at integration test layer).
- HIPAA tenants: calls restricted to an allowlist of providers, each with a verified Business Associate Agreement (BAA).
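The ordering these invariants imply (budget and allowlist checks, then mandatory pre-call safety, then the provider call, then post-call validation and a provenance record) can be sketched in Python. `Gateway`, `Tenant`, the exception classes, the flat per-call cost, the provider names and the toy safety checks are all hypothetical stand-ins, not the service's real AIClient implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(Exception):
    pass


class ProviderNotAllowed(Exception):
    pass


class SafetyRejected(Exception):
    pass


@dataclass
class Tenant:
    tenant_id: str
    budget_remaining: float  # hypothetical budget units
    hipaa: bool = False


@dataclass
class Gateway:
    providers: dict[str, Callable[[str], str]]  # provider name -> completion fn
    hipaa_allowlist: frozenset[str]             # providers with a verified BAA
    cost_per_call: float = 1.0                  # hypothetical flat cost estimate
    provenance_log: list[dict] = field(default_factory=list)

    def _pre_call_safety(self, prompt: str) -> None:
        # Toy stand-in for moderation, PII redaction and the injection shield.
        if "ignore previous instructions" in prompt.lower():
            raise SafetyRejected("prompt-injection signal")

    def _post_call_verdict(self, output: str) -> str:
        # Toy stand-in for post-call validation.
        return "pass" if output.strip() else "reject"

    def complete(self, tenant: Tenant, provider: str, prompt: str) -> str:
        # Budget is checked before any provider call.
        if tenant.budget_remaining < self.cost_per_call:
            raise BudgetExceeded(tenant.tenant_id)
        # HIPAA tenants may only reach allowlisted providers.
        if tenant.hipaa and provider not in self.hipaa_allowlist:
            raise ProviderNotAllowed(provider)
        # Safety is unconditional: complete() has no bypass parameter.
        self._pre_call_safety(prompt)
        output = self.providers[provider](prompt)
        tenant.budget_remaining -= self.cost_per_call
        verdict = self._post_call_verdict(output)
        # Every provider call is recorded, whether or not its output passes.
        self.provenance_log.append(
            {"tenant": tenant.tenant_id, "provider": provider, "verdict": verdict}
        )
        if verdict != "pass":
            raise SafetyRejected("post-call validation failed")
        return output


gateway = Gateway(
    providers={"provider-a": lambda p: "hint: " + p, "provider-b": lambda p: ""},
    hipaa_allowlist=frozenset({"provider-a"}),
)
clinic = Tenant("tenant-1", budget_remaining=1.0, hipaa=True)
print(gateway.complete(clinic, "provider-a", "fractions"))  # hint: fractions
```

Because the budget check runs first, an exhausted tenant never reaches a provider. The sketch charges the budget as soon as the provider returns, before validation, on the assumption that a rejected completion has still been paid for; a real implementation would estimate cost per request rather than use a flat figure.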