
AI Integration

:::info Source
Sourced from services/ai-gateway-service/AI_INTEGRATION.md in the documentation repo.
:::

1. Self-Reference

This service is the AI integration layer. It owns:

  • AIClient port implementation.
  • Prompt registry.
  • Model registry.
  • Safety pipeline.
  • Provenance enforcement.
  • Budget enforcement.
  • Audit logging.

2. Prompt Catalog (platform + tenant)

Platform prompts (versioned, eval-gated):

| ID | Used By | Classification |
| --- | --- | --- |
| tutor.rag.respond | delivery-service | Limited-risk |
| tutor.explain.concept | delivery-service | Limited-risk |
| tutor.summarize.section | delivery-service | Limited-risk |
| coauthor.block.generate | authoring-service | Limited-risk |
| coauthor.block.improve | authoring-service | Limited-risk |
| coauthor.translate | authoring-service | Limited-risk |
| quiz.generate | assessment-service | Limited-risk |
| quiz.distractors | assessment-service | Limited-risk |
| rubric.grade | assessment-service | High-risk (EU AI Act) |
| scenario.next | assessment-service | Limited-risk |
| analytics.nl_query | analytics-service | Limited-risk |
| analytics.atrisk.predict | analytics-service | High-risk |
| analytics.anomaly.explain | analytics-service | Limited-risk |
| listing.improve | marketplace-service | Limited-risk |
| dunning.personalize | billing-service | Limited-risk |
| notif.copy.personalize | notification-service | Limited-risk |
| media.image.generate | media-service | Limited-risk |
| media.audio.tts | media-service | Limited-risk |
| media.stt.caption | media-service | Limited-risk |

Tenants may add tenant-specific prompts that inherit platform safety policy.
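
One way to picture tenant prompts inheriting platform safety policy is a registry that always attaches the platform policy when resolving a prompt. This is a minimal sketch under stated assumptions: `PromptRegistry`, `PromptDef`, `SafetyPolicy`, and `PLATFORM_POLICY` are hypothetical names, and the rule that a tenant prompt may not reuse a platform prompt ID is an assumption, not something the source states.

```typescript
// All names here are hypothetical illustrations, not the service's real types.
type RiskClass = "minimal" | "limited" | "high";
type SafetyPolicy = { moderation: boolean; piiRedaction: boolean; injectionShield: boolean };
type PromptDef = { id: string; version: string; usedBy: string; risk: RiskClass; template: string };
type Resolved = { def: PromptDef; policy: SafetyPolicy; scope: "platform" | "tenant" };

// Single platform-wide policy; tenant prompts cannot replace it in this sketch.
const PLATFORM_POLICY: SafetyPolicy = { moderation: true, piiRedaction: true, injectionShield: true };

class PromptRegistry {
  private platform = new Map<string, PromptDef>();
  private tenant = new Map<string, Map<string, PromptDef>>();

  addPlatform(def: PromptDef): void {
    this.platform.set(def.id, def);
  }

  addTenant(tenantId: string, def: PromptDef): void {
    // Assumption: tenant prompts may not shadow a platform prompt ID.
    if (this.platform.has(def.id)) throw new Error(`"${def.id}" is a platform prompt`);
    const prompts = this.tenant.get(tenantId) ?? new Map<string, PromptDef>();
    prompts.set(def.id, def);
    this.tenant.set(tenantId, prompts);
  }

  // Platform prompts are visible to every tenant; tenant prompts only to their owner.
  resolve(tenantId: string, id: string): Resolved | undefined {
    const p = this.platform.get(id);
    if (p) return { def: p, policy: PLATFORM_POLICY, scope: "platform" };
    const t = this.tenant.get(tenantId)?.get(id);
    return t ? { def: t, policy: PLATFORM_POLICY, scope: "tenant" } : undefined;
  }
}
```

Because `resolve` returns the same policy object for both scopes, a tenant prompt has no path to a weaker policy in this sketch.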

3. Model Registry

| Family | Examples |
| --- | --- |
| chat | GPT-4o-mini, GPT-4o, Claude Sonnet/Opus/Haiku, Gemini Pro, Llama-3-8B-local |
| embedding | text-embedding-3-small, text-embedding-3-large, local BGE |
| image | GPT-4o-image, DALL·E, Stable Diffusion XL |
| tts | ElevenLabs, OpenAI TTS, local Piper |
| stt | Whisper-large, local Whisper-small |
| moderation | OpenAI moderation, local classifier |
| classifier | custom classifiers (e.g., prompt-injection detector) |

4. Safety Pipeline

Pre-call

  1. Template render with user input; validate against inputSchema.
  2. Moderation (categories: sexual, violence, hate, self_harm, illegal). Per-category action.
  3. PII classifier; redact or block per policy.
  4. Prompt-injection shield (heuristic + classifier). In shield mode, sanitize the input; in detect mode, score it and allow the call.
  5. Budget check.
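
The pre-call order above can be sketched as a chain of stages, each of which either passes the (possibly rewritten) input on or blocks the request. This is a minimal sketch, not the service's code: `Stage`, `runPreCall`, and the stage factories are hypothetical names, the regexes are toy stand-ins for the real classifiers, and every reason code other than `ai.refused.budget` is invented for illustration.

```typescript
// Hypothetical types; a stage either allows (possibly rewriting input) or blocks.
type StageResult =
  | { kind: "allow"; input: string }
  | { kind: "block"; reason: string };
type Stage = { name: string; run: (input: string) => StageResult };

// Runs stages in order; the first block short-circuits the rest.
function runPreCall(stages: Stage[], input: string): { result: StageResult; trace: string[] } {
  const trace: string[] = [];
  let current = input;
  for (const stage of stages) {
    trace.push(stage.name);
    const result = stage.run(current);
    if (result.kind === "block") return { result, trace };
    current = result.input;
  }
  return { result: { kind: "allow", input: current }, trace };
}

// 1. Stand-in for inputSchema validation: a simple length limit.
const validateInput = (maxLength: number): Stage => ({
  name: "validate",
  run: (input) =>
    input.length <= maxLength ? { kind: "allow", input } : { kind: "block", reason: "input.invalid" },
});

// 2. Moderation: `classify` returns a flagged category or null. Always blocking is a simplification
//    of the per-category action.
const moderate = (classify: (input: string) => string | null): Stage => ({
  name: "moderation",
  run: (input) => {
    const category = classify(input);
    return category === null ? { kind: "allow", input } : { kind: "block", reason: `moderation.${category}` };
  },
});

// 3. PII: toy redaction of one US SSN-like pattern.
const redactPii: Stage = {
  name: "pii",
  run: (input) => ({ kind: "allow", input: input.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED]") }),
};

// 4. Injection: shield mode sanitizes; detect mode lets the input through unchanged (scoring omitted).
const injectionShield = (mode: "shield" | "detect"): Stage => ({
  name: "injection",
  run: (input) => {
    const suspicious = /ignore (all )?previous instructions/i;
    return { kind: "allow", input: mode === "shield" ? input.replace(suspicious, "[removed]") : input };
  },
});

// 5. Budget: refuse when nothing remains.
const budgetCheck = (remainingMicroUSD: number): Stage => ({
  name: "budget",
  run: (input) =>
    remainingMicroUSD > 0 ? { kind: "allow", input } : { kind: "block", reason: "ai.refused.budget" },
});
```

Keeping each step as an independent stage makes the order explicit and lets the `trace` show how far a refused request got.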

Provider call

  • Route via ModelPreference; fallback on provider failure.
  • Strip baggage/trace headers from provider-bound requests (no x-ghasi-* outbound).
  • Configure noTrain.
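
The header stripping and fallback behaviour can be sketched as below. This is an illustration under assumptions: `stripOutboundHeaders`, `Provider`, and `callWithFallback` are hypothetical names, the blocked list uses the W3C header names `baggage`, `traceparent`, and `tracestate` as a guess at what "baggage/trace headers" covers, and providers are modelled as synchronous functions only to keep the sketch short. How `noTrain` is set per provider is not specified in the source, so it is not modelled.

```typescript
// Removes baggage/trace headers and anything prefixed x-ghasi- (case-insensitive).
// The exact blocked list is an assumption.
function stripOutboundHeaders(headers: Record<string, string>): Record<string, string> {
  const blocked = new Set(["baggage", "traceparent", "tracestate"]);
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (blocked.has(lower) || lower.startsWith("x-ghasi-")) continue;
    out[name] = value;
  }
  return out;
}

// Hypothetical provider shape; real provider calls would be asynchronous.
type Provider = { id: string; call: (prompt: string) => string };

// Tries providers in preference order and moves on when one throws.
function callWithFallback(preference: Provider[], prompt: string): { providerId: string; output: string } {
  const errors: string[] = [];
  for (const provider of preference) {
    try {
      return { providerId: provider.id, output: provider.call(prompt) };
    } catch (err) {
      errors.push(`${provider.id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```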

Post-call

  1. Moderation on output.
  2. Output schema validation (if declared).
  3. If structured output fails validation: attempt repair (one retry with a stricter system prompt).
  4. Cache decision.
  5. Cost debit + audit entry.
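
The validate-then-repair step for structured output might look like the following. It is a sketch under assumptions: `generateStructured` is a hypothetical helper, `generate` stands in for the provider call, `validate` stands in for the declared output schema, and the wording of the stricter system prompt is invented.

```typescript
// Hypothetical helper: parse and validate, then allow exactly one repair attempt.
function generateStructured<T>(
  generate: (systemPrompt: string) => string,
  validate: (value: unknown) => value is T,
  systemPrompt: string,
): { value: T; repaired: boolean } {
  const tryParse = (raw: string): T | undefined => {
    try {
      const parsed: unknown = JSON.parse(raw);
      return validate(parsed) ? parsed : undefined;
    } catch {
      return undefined; // output was not JSON at all
    }
  };

  const first = tryParse(generate(systemPrompt));
  if (first !== undefined) return { value: first, repaired: false };

  // One retry only, with a stricter system prompt (wording is illustrative).
  const stricter = `${systemPrompt}\nRespond with valid JSON matching the schema and nothing else.`;
  const second = tryParse(generate(stricter));
  if (second !== undefined) return { value: second, repaired: true };

  throw new Error("output failed schema validation after one repair attempt");
}
```

Returning `repaired` lets the caller record in the audit entry that a retry was needed.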

5. Provenance (F04)

Every artifact carries:

    interface AIProvenance {
      model: string;
      version?: string;
      promptId?: string;
      promptVersion?: SemVer;
      traceId: string;
      decisionId?: string;
      local: boolean;
      generatedAt: ISODate;
      reviewedBy?: UserId;
      reviewedAt?: ISODate;
      cost?: { microUSD: number; tokens: { in: number; out: number } };
    }
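
As an illustration, a provenance record and a human-review stamp could be built as follows. The aliases `SemVer`, `ISODate`, and `UserId` are assumed to be plain strings because the source does not define them, `markReviewed` is a hypothetical helper, and all field values are made up.

```typescript
// Assumed aliases: the source does not define these types.
type SemVer = string;  // e.g. "1.0.0"
type ISODate = string; // ISO 8601 timestamp
type UserId = string;

interface AIProvenance {
  model: string;
  version?: string;
  promptId?: string;
  promptVersion?: SemVer;
  traceId: string;
  decisionId?: string;
  local: boolean;
  generatedAt: ISODate;
  reviewedBy?: UserId;
  reviewedAt?: ISODate;
  cost?: { microUSD: number; tokens: { in: number; out: number } };
}

// Illustrative values only.
const draftProvenance: AIProvenance = {
  model: "example-model",
  promptId: "rubric.grade",
  promptVersion: "1.0.0",
  traceId: "trace-0001",
  local: false,
  generatedAt: new Date(Date.UTC(2024, 0, 1)).toISOString(),
  cost: { microUSD: 1200, tokens: { in: 800, out: 150 } },
};

// Returns a copy stamped with the reviewer; the original record is not mutated.
function markReviewed(p: AIProvenance, reviewer: UserId, at: Date): AIProvenance {
  return { ...p, reviewedBy: reviewer, reviewedAt: at.toISOString() };
}
```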

6. Budget Enforcement

  • Per-tenant daily + monthly budgets.
  • Atomic debit on completion.
  • Soft alert at 80%; hard cap at 100% (refuse with ai.refused.budget).
  • Additional budget purchasable via billing (AI credit packs).
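
The 80% and 100% thresholds can be sketched as a pure check over the daily and monthly windows. The names `BudgetWindow`, `checkWindow`, `checkTenantBudget`, and `debit` are hypothetical, and amounts are assumed to be tracked in micro-USD to match the `cost.microUSD` field. Atomicity of the debit is not modelled here; a real implementation would need an atomic update in whatever store holds the counters.

```typescript
// Hypothetical shapes; amounts are integers in micro-USD.
type BudgetWindow = { spentMicroUSD: number; limitMicroUSD: number };
type BudgetDecision =
  | { allowed: true; softAlert: boolean }
  | { allowed: false; code: "ai.refused.budget" };

function checkWindow(w: BudgetWindow): BudgetDecision {
  // Hard cap at 100%.
  if (w.spentMicroUSD >= w.limitMicroUSD) return { allowed: false, code: "ai.refused.budget" };
  // Soft alert at 80%; integer arithmetic avoids floating-point error at the boundary.
  return { allowed: true, softAlert: w.spentMicroUSD * 100 >= w.limitMicroUSD * 80 };
}

// A call is allowed only if both the daily and the monthly window allow it.
function checkTenantBudget(daily: BudgetWindow, monthly: BudgetWindow): BudgetDecision {
  const d = checkWindow(daily);
  const m = checkWindow(monthly);
  if (!d.allowed) return d;
  if (!m.allowed) return m;
  return { allowed: true, softAlert: d.softAlert || m.softAlert };
}

// Debit applied once the call completes; returns a new window rather than mutating.
function debit(w: BudgetWindow, costMicroUSD: number): BudgetWindow {
  return { ...w, spentMicroUSD: w.spentMicroUSD + costMicroUSD };
}
```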

7. Caching

  • Key: (tenantId, promptHash, modelId, inputFingerprint).
  • Default TTL 24h; per-prompt override.
  • Non-deterministic prompts (temperature > 0.3): cache disabled.
  • A cache hit returns the stored provenance in full rather than recomputing it, and flags cacheHit: true.
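
A sketch of the caching rules, assuming an in-memory map and caller-supplied timestamps so that expiry is easy to reason about. `cacheKey`, `isCacheable`, and `ResponseCache` are hypothetical names, and how `promptHash` and `inputFingerprint` are computed is left open because the source does not say.

```typescript
type CacheKeyParts = { tenantId: string; promptHash: string; modelId: string; inputFingerprint: string };

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // default TTL of 24h

// JSON-array encoding keeps parts unambiguous even if one contains a separator character.
function cacheKey(p: CacheKeyParts): string {
  return JSON.stringify([p.tenantId, p.promptHash, p.modelId, p.inputFingerprint]);
}

// Caching is disabled for non-deterministic prompts (temperature > 0.3).
function isCacheable(temperature: number): boolean {
  return temperature <= 0.3;
}

class ResponseCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  // ttlMs models the per-prompt override.
  set(key: string, value: V, nowMs: number, ttlMs: number = DEFAULT_TTL_MS): void {
    this.entries.set(key, { value, expiresAt: nowMs + ttlMs });
  }

  // Returns the stored value unchanged (including any provenance it carries) and flags the hit.
  get(key: string, nowMs: number): { value: V; cacheHit: true } | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key);
      return undefined;
    }
    return { value: entry.value, cacheHit: true };
  }
}
```

Including `tenantId` in the key means two tenants never share a cached response, even for identical inputs.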

8. EU AI Act Classification

  • High-risk AI: rubric grading, at-risk predictor.
  • Limited-risk AI: tutor, co-author, analytics NL query, listing improve, notification copy.
  • Minimal-risk: moderation, classification.

High-risk capabilities require:

  • Explicit HITL (human override path).
  • Accuracy documentation + post-market monitoring.
  • Bias monitoring quarterly (demographic parity, equalized odds).
  • Right-to-explanation UI.
  • Refusal + dispute mechanism.

9. Bias Monitoring

  • Quarterly evaluation of demographic parity and equalized odds on sample data from consenting users.
  • Findings reviewed by compliance + AI leads.
  • Prompt rollback if bias increases.
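
The two metrics named above can be computed for a pair of groups as shown in this sketch. It uses the common textbook definitions (gap in positive-prediction rate for demographic parity; the larger of the true-positive-rate and false-positive-rate gaps for equalized odds). Whether the service uses exactly these formulas, and what counts as an "increase" that triggers rollback, is not stated in the source; `Sample` and the function names are hypothetical.

```typescript
// One evaluated decision: the group it belongs to, the model's prediction, and the true outcome.
type Sample = { group: string; predicted: boolean; actual: boolean };

// Share of positive predictions within the samples selected by `keep` (NaN if none match).
function positiveRate(samples: Sample[], keep: (s: Sample) => boolean): number {
  const subset = samples.filter(keep);
  if (subset.length === 0) return NaN;
  return subset.filter((s) => s.predicted).length / subset.length;
}

// Demographic parity gap: |P(pred=1 | A=a) - P(pred=1 | A=b)|.
function demographicParityGap(samples: Sample[], a: string, b: string): number {
  return Math.abs(
    positiveRate(samples, (s) => s.group === a) - positiveRate(samples, (s) => s.group === b),
  );
}

// Equalized odds gap: the larger of the TPR gap and the FPR gap between the two groups.
function equalizedOddsGap(samples: Sample[], a: string, b: string): number {
  const gapFor = (actual: boolean): number =>
    Math.abs(
      positiveRate(samples, (s) => s.group === a && s.actual === actual) -
        positiveRate(samples, (s) => s.group === b && s.actual === actual),
    );
  return Math.max(gapFor(true), gapFor(false));
}
```

The two metrics can disagree: a model may flag both groups at the same overall rate while being much less accurate for one of them, which is why both are tracked.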

10. On-Device AI (S1+)

  • Local models (Llama-3-8B quantized, Whisper-small, Piper TTS).
  • Bundled in PlayPackage for offline.
  • Same safety pipeline applied on-device.
  • Provenance recorded locally, synced on reconnect.
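
Recording provenance locally and syncing on reconnect resembles an outbox queue. The following is a minimal in-memory sketch: `ProvenanceOutbox` and its methods are hypothetical names, persistence across app restarts is omitted, and the upload callback is synchronous only to keep the example short.

```typescript
// Hypothetical in-memory outbox; a real device would persist pending entries to disk.
class ProvenanceOutbox<T> {
  private pending: T[] = [];

  // Called for every on-device AI call, online or not.
  record(entry: T): void {
    this.pending.push(entry);
  }

  get size(): number {
    return this.pending.length;
  }

  // Called on reconnect. Entries are removed only after `upload` returns without throwing,
  // so a failed sync leaves the queue intact for the next attempt.
  flush(upload: (batch: T[]) => void): number {
    if (this.pending.length === 0) return 0;
    const batch = [...this.pending];
    upload(batch);
    this.pending = this.pending.slice(batch.length);
    return batch.length;
  }
}
```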

11. Privacy

  • noTrain enforced on all providers.
  • PII redaction pre-call.
  • Tenant-scoped embeddings.
  • HIPAA tenants: on-premise / BAA-signed providers only.
  • HIPAA tenants: audit log of every AI call (regulated).

12. Cost Routing

  • Local first if eligible (tasks with low quality demands).
  • Small cloud (e.g., Claude Haiku) for mid-complexity.
  • Large cloud (e.g., Opus) for complex reasoning.
  • Automatic fallback on provider failure.
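
The tier choice can be sketched as a small routing function. The tier and complexity labels, `preferredTier`, and `routeOrder` are hypothetical, and the fallback order (escalating to the next larger tier) is an assumption; the source says only that fallback is automatic.

```typescript
type Tier = "local" | "small-cloud" | "large-cloud";
type Complexity = "low" | "mid" | "high";

// Cheapest to most capable; also used as the assumed escalation order on failure.
const ESCALATION: Tier[] = ["local", "small-cloud", "large-cloud"];

function preferredTier(complexity: Complexity, localEligible: boolean): Tier {
  if (complexity === "high") return "large-cloud";
  if (complexity === "mid") return "small-cloud";
  // Low-complexity work goes local first, but only if the request is eligible.
  return localEligible ? "local" : "small-cloud";
}

// Preferred tier first, then each larger tier as a fallback candidate.
function routeOrder(complexity: Complexity, localEligible: boolean): Tier[] {
  const start = ESCALATION.indexOf(preferredTier(complexity, localEligible));
  return ESCALATION.slice(start);
}
```

For a `high` request the list contains only `large-cloud`; falling back there would mean another provider within the same tier, which this sketch does not model.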