Overview

:::info Source
Sourced from services/ai-gateway-service/SERVICE_OVERVIEW.md in the documentation repo.
:::

1. Purpose

All AI calls across the platform route through this service. It provides the prompt registry, model routing, the safety pipeline, per-tenant budget enforcement, provenance tracking, output caching, and audit logging.

2. Bounded Context

AI Services (Core, competitive differentiator). Ubiquitous language (UL): Prompt, PromptVersion, Model, AICompletion, Embedding, AIArtifact, SafetyVerdict, AIBudget, AIAuditEntry, PromptInjectionSignal.

3. Responsibilities

Single AIClient port for all services (freeze point F09, frozen at M0 end).
Prompt registry (platform + per-tenant prompts).
Model routing: local → small cloud → large cloud; cost optimization.
Safety: pre-call moderation, PII redaction, prompt-injection shield; post-call validation.
Provenance: every artifact carries AIProvenance VO.
Per-tenant budgets; hard/soft caps.
Embeddings: per-tenant vector partitions.
Audit log (regulated class, 7-year retention).
EU AI Act compliance: risk classification, bias monitoring, right-to-explanation.
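
The tiered routing order listed above (local → small cloud → large cloud) can be illustrated with a minimal sketch. It assumes that the gateway tries the cheapest tier first and escalates only when a tier declines the request. The source does not state the escalation criterion, so the `try_tier` callback, the `Tier` type, and the cost figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    """Hypothetical routing tier; not a type defined by the service."""
    name: str
    cost_per_call: float  # illustrative flat cost in arbitrary units


# Ordered cheapest first, mirroring local -> small cloud -> large cloud.
TIERS = [
    Tier("local", 0.0),
    Tier("small-cloud", 1.0),
    Tier("large-cloud", 10.0),
]


def route(prompt: str,
          try_tier: Callable[[Tier, str], Optional[str]]) -> Tuple[str, str]:
    """Return (tier name, completion) from the cheapest tier that answers.

    `try_tier` returns None when a tier declines the request; the reason
    for declining (capability, latency, quality) is an assumption here.
    """
    for tier in TIERS:
        result = try_tier(tier, prompt)
        if result is not None:
            return tier.name, result
    raise RuntimeError("no tier could serve the request")
```

Keeping the tier list ordered by cost means the routing loop itself stays trivial, and cost optimization reduces to deciding when a cheaper tier should decline.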

4. Non-Responsibilities

Does not embed business logic (authoring, assessment, delivery own that).
Does not store domain data (services own their data).
Does not train models (always uses hosted inference).

5. Dependencies

Upstream: all services (callers via AIClient port).
Downstream: analytics (AI audit firehose), content-service (when AI generates content artifacts).
External: OpenAI, Anthropic, Google (Gemini), Azure OpenAI, Mistral, local Llama variants, Whisper, replicate.com for specialty models.

6. Slices & Milestones

S0 (M0): gateway + safety pipeline + prompt registry + AIClient port.
S1 (M1): AI tutor + local inference.
S2 (M2): AI co-author.
S4 (M3): full model routing + cost controls.
S6 (M5): AI insights v2; predictive; on-device privacy-preserving.

7. Architectural Freeze Points

F04 — AIProvenance VO (M0 end)
F09 — AIClient port (M0 end)
F10 — Prompt registry format (M0 end)
F18 — Safety verdict schema (M1 start)
F32 — Model routing policy (M3 start)

8. Key Invariants

Every AI call produces a provenance record (no untracked AI artifacts).
Tenant-scoped budgets enforced before any provider call.
Safety pipeline is mandatory; cannot be bypassed.
Tenant-scoped embeddings; never cross-tenant.
No training on tenant data (verified at integration test layer).
HIPAA tenants: restricted provider allowlist + BAA verified.
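
The ordering implied by the first three invariants can be sketched as follows: budget check, then pre-call safety, then the provider call, then a provenance record for every call that reached a provider. This is a sketch under stated assumptions, not the service's implementation. `Gateway`, `TenantBudget`, the exception names, and the record fields are hypothetical, and the cap semantics (hard cap rejects, soft cap only warns) are an assumption the source does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class BudgetExceeded(Exception):
    pass


class SafetyRejected(Exception):
    pass


@dataclass
class TenantBudget:
    soft_cap: float
    hard_cap: float
    spent: float = 0.0


@dataclass
class Gateway:
    budgets: Dict[str, TenantBudget]
    provider: Callable[[str], str]   # stand-in for any hosted model call
    is_safe: Callable[[str], bool]   # stand-in for the safety pipeline
    provenance_log: List[dict] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    def complete(self, tenant_id: str, prompt: str, cost: float) -> str:
        budget = self.budgets[tenant_id]
        # Invariant: tenant budget is enforced before any provider call.
        if budget.spent + cost > budget.hard_cap:
            raise BudgetExceeded(tenant_id)
        if budget.spent + cost > budget.soft_cap:
            self.warnings.append(tenant_id)  # assumed: soft cap only warns
        # Invariant: the safety pipeline has no bypass path.
        if not self.is_safe(prompt):
            raise SafetyRejected("pre-call")
        output = self.provider(prompt)
        budget.spent += cost
        safe = self.is_safe(output)
        # Invariant: every call that reached a provider leaves a record.
        self.provenance_log.append(
            {"tenant_id": tenant_id, "prompt": prompt, "post_call_safe": safe}
        )
        if not safe:
            raise SafetyRejected("post-call")
        return output
```

Writing the provenance record before raising on a failed post-call check keeps rejected outputs traceable as well.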
