Overview

:::info Source
Sourced from services/ai-gateway-service/SERVICE_OVERVIEW.md in the documentation repo.
:::

1. Purpose

All AI calls across the platform route through this service. It provides the prompt registry, model routing, the safety pipeline, per-tenant budget enforcement, provenance tracking, output caching, and audit logging.

2. Bounded Context

AI Services (Core domain, competitive differentiator). Ubiquitous language (UL): Prompt, PromptVersion, Model, AICompletion, Embedding, AIArtifact, SafetyVerdict, AIBudget, AIAuditEntry, PromptInjectionSignal.

3. Responsibilities

  • Single AIClient port for all services (freeze point F09, frozen at the end of M0).
  • Prompt registry (platform + per-tenant prompts).
  • Model routing: local → small cloud → large cloud; cost optimization.
  • Safety: pre-call moderation, PII redaction, prompt-injection shield; post-call validation.
  • Provenance: every artifact carries AIProvenance VO.
  • Per-tenant budgets; hard/soft caps.
  • Embeddings: per-tenant vector partitions.
  • Audit log (regulated class, 7-year retention).
  • EU AI Act compliance: risk classification, bias monitoring, right-to-explanation.
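
The routing order listed above (local → small cloud → large cloud) can be illustrated with a minimal sketch. It is not the service's actual routing policy (F32 is frozen only at M3 start), and `Tier`, `RouteRequest`, `choose_tier`, and every field name are hypothetical. The sketch assumes the caller supplies the lowest capable tier and an estimated cost per tier, and that cost rises with tier.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Tier(IntEnum):
    """Routing tiers in escalation order, cheapest first (assumed ordering)."""
    LOCAL = 0
    SMALL_CLOUD = 1
    LARGE_CLOUD = 2


@dataclass
class RouteRequest:
    # Hypothetical fields; the real routing inputs are not described in this overview.
    min_tier: Tier                      # lowest tier judged capable of the task
    est_cost_by_tier: Dict[Tier, float] # estimated cost per tier, arbitrary units
    remaining_budget: float             # tenant budget left, same units


def choose_tier(req: RouteRequest) -> Optional[Tier]:
    """Return the lowest capable tier whose estimated cost fits the budget.

    Returns None when no capable tier is affordable, so the caller can
    reject the request before any provider is contacted.
    """
    for tier in Tier:  # iterates LOCAL, SMALL_CLOUD, LARGE_CLOUD
        if tier < req.min_tier:
            continue  # tier is not capable enough for this task
        if req.est_cost_by_tier[tier] <= req.remaining_budget:
            return tier
    return None
```

Preferring the lowest capable tier is one simple way to express "cost optimization"; a real policy might also weigh latency, provider health, or per-tenant provider allowlists.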

4. Non-Responsibilities

  • Does not embed business logic (authoring, assessment, delivery own that).
  • Does not store domain data (services own their data).
  • Does not train models (always uses hosted inference).

5. Dependencies

  • Upstream: all services (callers via AIClient port).
  • Downstream: analytics (AI audit firehose), content-service (when AI generates content artifacts).
  • External: OpenAI, Anthropic, Google (Gemini), Azure OpenAI, Mistral, local Llama variants, Whisper, replicate.com for specialty models.

6. Slices & Milestones

  • S0 (M0): gateway + safety pipeline + prompt registry + AIClient port.
  • S1 (M1): AI tutor + local inference.
  • S2 (M2): AI co-author.
  • S4 (M3): full model routing + cost controls.
  • S6 (M5): AI insights v2; predictive; on-device privacy-preserving.

7. Architectural Freeze Points

  • F04 — AIProvenance VO (M0 end)
  • F09 — AIClient port (M0 end)
  • F10 — Prompt registry format (M0 end)
  • F18 — Safety verdict schema (M1 start)
  • F32 — Model routing policy (M3 start)

8. Key Invariants

  • Every AI call produces a provenance record (no untracked AI artifacts).
  • Tenant-scoped budgets enforced before any provider call.
  • Safety pipeline is mandatory; cannot be bypassed.
  • Tenant-scoped embeddings; never cross-tenant.
  • No training on tenant data (verified at the integration-test layer).
  • HIPAA tenants: restricted provider allowlist and a verified BAA (business associate agreement).
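
As a rough illustration of how the first three invariants constrain call ordering, the sketch below checks the tenant budget before contacting a provider, always runs a safety check (there is no bypass parameter), and records provenance for every provider call. All names here (`Gateway`, `Provenance`, `complete`, `is_safe`, the exception classes) are hypothetical stand-ins, not the real AIClient port or AIProvenance VO, whose shapes this overview does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class BudgetExceeded(Exception):
    """Raised when the tenant cannot afford the call (hypothetical)."""


class SafetyRejected(Exception):
    """Raised when the safety check rejects input or output (hypothetical)."""


@dataclass
class Provenance:
    # Stand-in for the AIProvenance VO; the real fields are not given here.
    tenant_id: str
    model: str


@dataclass
class Gateway:
    call_provider: Callable[[str, str], str]  # (model, prompt) -> completion
    is_safe: Callable[[str], bool]            # stand-in for the safety pipeline
    remaining_budget: Dict[str, float]        # tenant_id -> remaining units
    provenance_log: List[Provenance] = field(default_factory=list)

    def complete(self, tenant_id: str, model: str, prompt: str, cost: float) -> str:
        # Budget is enforced before any provider call.
        remaining = self.remaining_budget.get(tenant_id, 0.0)
        if remaining < cost:
            raise BudgetExceeded(tenant_id)
        # Pre-call safety check; the method offers no way to skip it.
        if not self.is_safe(prompt):
            raise SafetyRejected(tenant_id)
        output = self.call_provider(model, prompt)
        # The provider was called, so charge the tenant and record provenance
        # even if post-call validation rejects the output below.
        self.remaining_budget[tenant_id] = remaining - cost
        self.provenance_log.append(Provenance(tenant_id, model))
        # Post-call validation.
        if not self.is_safe(output):
            raise SafetyRejected(tenant_id)
        return output
```

Recording provenance immediately after the provider call, rather than after post-call validation, is a design choice made here so that a rejected output still leaves a trace.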