AI Integration
:::info Source
Sourced from services/ai-gateway-service/AI_INTEGRATION.md in the documentation repo.
:::
1. Self-Reference
This service is the AI integration layer. It owns:
- AIClient port implementation.
- Prompt registry.
- Model registry.
- Safety pipeline.
- Provenance enforcement.
- Budget enforcement.
- Audit logging.
2. Prompt Catalog (platform + tenant)
Platform prompts (versioned, eval-gated):
| ID | Used By | Classification |
|---|---|---|
| tutor.rag.respond | delivery-service | Limited-risk |
| tutor.explain.concept | delivery-service | Limited-risk |
| tutor.summarize.section | delivery-service | Limited-risk |
| coauthor.block.generate | authoring-service | Limited-risk |
| coauthor.block.improve | authoring-service | Limited-risk |
| coauthor.translate | authoring-service | Limited-risk |
| quiz.generate | assessment-service | Limited-risk |
| quiz.distractors | assessment-service | Limited-risk |
| rubric.grade | assessment-service | High-risk (EU AI Act) |
| scenario.next | assessment-service | Limited-risk |
| analytics.nl_query | analytics-service | Limited-risk |
| analytics.atrisk.predict | analytics-service | High-risk |
| analytics.anomaly.explain | analytics-service | Limited-risk |
| listing.improve | marketplace-service | Limited-risk |
| dunning.personalize | billing-service | Limited-risk |
| notif.copy.personalize | notification-service | Limited-risk |
| media.image.generate | media-service | Limited-risk |
| media.audio.tts | media-service | Limited-risk |
| media.stt.caption | media-service | Limited-risk |
Tenants may add tenant-specific prompts that inherit platform safety policy.
3. Model Registry
| Family | Examples |
|---|---|
| chat | GPT-4o-mini, GPT-4o, Claude Sonnet/Opus/Haiku, Gemini Pro, Llama-3-8B-local |
| embedding | text-embedding-3-small, text-embedding-3-large, local BGE |
| image | GPT-4o-image, DALL·E, Stable Diffusion XL |
| tts | ElevenLabs, OpenAI TTS, local Piper |
| stt | Whisper-large, local Whisper-small |
| moderation | OpenAI moderation, local classifier |
| classifier | custom classifiers (e.g., prompt-injection detector) |
4. Safety Pipeline
Pre-call
- Template render with user input; validate against inputSchema.
- Moderation (categories: sexual, violence, hate, self_harm, illegal). Per-category action.
- PII classifier; redact or block per policy.
- Prompt-injection shield (heuristic + classifier). If `shield` mode → sanitize; if `detect` → score + allow.
- Budget check.
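
The difference between the two shield modes can be sketched as follows. This is a minimal illustration, not the service's implementation: `applyShield`, `scoreInjection`, and the pattern list are hypothetical names, and the toy regex heuristic stands in for the real heuristic + classifier combination.

```typescript
// Hypothetical types and helpers, for illustration only.
type ShieldMode = "shield" | "detect";

interface ShieldResult {
  text: string;       // text passed on to the next pipeline step
  score: number;      // share of suspicious patterns matched, in [0, 1]
  sanitized: boolean; // true if the input was rewritten
}

// Toy heuristic: phrases that commonly appear in injection attempts.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/gi,
  /reveal (the )?system prompt/gi,
];

function scoreInjection(input: string): number {
  const hits = SUSPICIOUS_PATTERNS.filter((re) => input.search(re) !== -1).length;
  return hits / SUSPICIOUS_PATTERNS.length;
}

function applyShield(input: string, mode: ShieldMode): ShieldResult {
  const score = scoreInjection(input);
  if (mode === "shield" && score > 0) {
    // shield mode: rewrite the input so the suspicious spans are removed.
    const text = SUSPICIOUS_PATTERNS.reduce((acc, re) => acc.replace(re, "[removed]"), input);
    return { text, score, sanitized: true };
  }
  // detect mode (or nothing found): keep the score, let the input through unchanged.
  return { text: input, score, sanitized: false };
}
```

In `detect` mode the score would typically be attached to the audit entry so that flagged calls can be reviewed later; that wiring is not shown here.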
Provider call
- Route via ModelPreference; fallback on provider failure.
- Strip baggage/trace headers from provider-bound requests (no `x-ghasi-*` outbound).
- Configure `noTrain`.
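
A minimal sketch of the header-stripping step, assuming the baggage/trace headers in question are the W3C `traceparent`, `tracestate`, and `baggage` headers. The function names and the `ProviderRequest` shape are hypothetical; how `noTrain` maps onto each provider's API is not specified here, so it is modeled as a plain flag.

```typescript
// Assumption: W3C Trace Context / Baggage header names, plus the internal prefix.
const BLOCKED_HEADERS = ["traceparent", "tracestate", "baggage"];
const BLOCKED_PREFIX = "x-ghasi-";

function stripInternalHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    const blocked =
      BLOCKED_HEADERS.indexOf(lower) !== -1 || lower.indexOf(BLOCKED_PREFIX) === 0;
    if (!blocked) out[name] = headers[name];
  }
  return out;
}

// Hypothetical request shape: noTrain is always set on provider-bound requests.
interface ProviderRequest {
  headers: Record<string, string>;
  noTrain: true;
}

function buildProviderRequest(inbound: Record<string, string>): ProviderRequest {
  return { headers: stripInternalHeaders(inbound), noTrain: true };
}
```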
Post-call
- Moderation on output.
- Output schema validation (if declared).
- If structured output: attempt repair (one retry with stricter system prompt).
- Cache decision.
- Cost debit + audit entry.
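
The structured-output repair step can be sketched as a single bounded retry. This is an illustration under stated assumptions: `callModel` stands in for the provider call, `validate` for the declared output schema check, and the wording of the stricter system prompt is invented for the example.

```typescript
// Hypothetical signatures, for illustration only.
type ModelCall = (systemPrompt: string, userPrompt: string) => string;
type Validator<T> = (raw: string) => T | null;

interface RepairOutcome<T> {
  value: T | null;   // null if both attempts failed validation
  attempts: number;
  repaired: boolean; // true only if the retry produced valid output
}

function callWithRepair<T>(
  callModel: ModelCall,
  validate: Validator<T>,
  systemPrompt: string,
  userPrompt: string
): RepairOutcome<T> {
  const first = validate(callModel(systemPrompt, userPrompt));
  if (first !== null) return { value: first, attempts: 1, repaired: false };

  // Exactly one retry with a stricter system prompt (wording is an assumption).
  const strict = systemPrompt + "\nReturn only valid JSON matching the declared schema.";
  const second = validate(callModel(strict, userPrompt));
  return { value: second, attempts: 2, repaired: second !== null };
}

// Example validator: accepts only JSON objects with a numeric `score` field.
function parseScore(raw: string): { score: number } | null {
  try {
    const v = JSON.parse(raw);
    const ok = v !== null && typeof v === "object" && typeof v.score === "number";
    return ok ? { score: v.score } : null;
  } catch (e) {
    return null;
  }
}
```

Capping the loop at one retry keeps the cost of a malformed response bounded; a caller that still receives `value: null` would treat the call as failed rather than retrying again.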
5. Provenance (F04)
Every artifact carries:
interface AIProvenance {
  model: string;
  version?: string;
  promptId?: string;
  promptVersion?: SemVer;
  traceId: string;
  decisionId?: string;
  local: boolean;
  generatedAt: ISODate;
  reviewedBy?: UserId;
  reviewedAt?: ISODate;
  cost?: { microUSD: number; tokens: { in: number; out: number } };
}
6. Budget Enforcement
- Per-tenant daily + monthly budgets.
- Atomic debit on completion.
- Soft alert at 80%; hard cap at 100% (refuse with `ai.refused.budget`).
- Additional budget purchasable via billing (AI credit packs).
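
The thresholds above can be sketched as a pre-call check plus a debit on completion. This is an in-memory illustration only: the type and function names are hypothetical, amounts are assumed to be tracked in micro-USD (as in the provenance `cost` field), and a real atomic debit would need a transactional store.

```typescript
// Hypothetical budget model, for illustration only.
interface BudgetWindow { limitMicroUSD: number; spentMicroUSD: number; }
interface TenantBudget { daily: BudgetWindow; monthly: BudgetWindow; }

type BudgetDecision =
  | { allowed: true; softAlert: boolean }
  | { allowed: false; code: "ai.refused.budget" };

const SOFT_ALERT_RATIO = 0.8; // soft alert at 80%

function usage(w: BudgetWindow): number {
  // A zero limit is treated as fully used.
  return w.limitMicroUSD > 0 ? w.spentMicroUSD / w.limitMicroUSD : 1;
}

function checkBudget(b: TenantBudget): BudgetDecision {
  // The tighter of the daily and monthly windows decides.
  const worst = Math.max(usage(b.daily), usage(b.monthly));
  if (worst >= 1) return { allowed: false, code: "ai.refused.budget" };
  return { allowed: true, softAlert: worst >= SOFT_ALERT_RATIO };
}

function debit(b: TenantBudget, costMicroUSD: number): TenantBudget {
  // Returns a new budget; both windows are charged for the completed call.
  return {
    daily: { limitMicroUSD: b.daily.limitMicroUSD, spentMicroUSD: b.daily.spentMicroUSD + costMicroUSD },
    monthly: { limitMicroUSD: b.monthly.limitMicroUSD, spentMicroUSD: b.monthly.spentMicroUSD + costMicroUSD },
  };
}
```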
7. Caching
- Key: `(tenantId, promptHash, modelId, inputFingerprint)`.
- Default TTL 24h; per-prompt override.
- Non-deterministic prompts (temperature > 0.3): cache disabled.
- Cache returns full provenance (stored), not re-computed; flags `cacheHit: true`.
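
A minimal sketch of the cache rules above, using an in-memory store. All names (`cacheKey`, `inputFingerprint`, `ResponseCache`) are hypothetical, and the fingerprint is a sorted-key JSON string rather than a real hash; nested objects are not normalized in this simplified version.

```typescript
interface CacheRequest {
  tenantId: string;
  promptHash: string;
  modelId: string;
  input: Record<string, unknown>;
  temperature: number;
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // default TTL 24h
const MAX_CACHEABLE_TEMPERATURE = 0.3;

// Stable across top-level key order; a production version would likely hash this.
function inputFingerprint(input: Record<string, unknown>): string {
  const keys = Object.keys(input).sort();
  return JSON.stringify(keys.map((k) => [k, input[k]]));
}

// Returns null when caching is disabled for the request.
function cacheKey(req: CacheRequest): string | null {
  if (req.temperature > MAX_CACHEABLE_TEMPERATURE) return null;
  return JSON.stringify([req.tenantId, req.promptHash, req.modelId, inputFingerprint(req.input)]);
}

interface StoredResult { output: string; provenance: { model: string; traceId: string }; }
interface CacheEntry { result: StoredResult; expiresAt: number; }

class ResponseCache {
  private entries: Record<string, CacheEntry> = {};

  put(key: string, result: StoredResult, nowMs: number, ttlMs: number = DEFAULT_TTL_MS): void {
    this.entries[key] = { result, expiresAt: nowMs + ttlMs };
  }

  // Returns the stored provenance as-is and flags the hit.
  get(key: string, nowMs: number): (StoredResult & { cacheHit: true }) | null {
    const entry = this.entries[key];
    if (!entry || nowMs >= entry.expiresAt) return null;
    return { output: entry.result.output, provenance: entry.result.provenance, cacheHit: true };
  }
}
```

Passing `nowMs` explicitly keeps the expiry logic easy to test; the `ttlMs` parameter is where a per-prompt override would plug in.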
8. EU AI Act Classification
- High-risk AI: rubric grading, at-risk predictor.
- Limited-risk AI: tutor, co-author, analytics NL query, listing improve, notification copy.
- Minimal-risk: moderation, classification.
High-risk capabilities require:
- Explicit HITL (human override path).
- Accuracy documentation + post-market monitoring.
- Bias monitoring quarterly (demographic parity, equalized odds).
- Right-to-explanation UI.
- Refusal + dispute mechanism.
9. Bias Monitoring
- Quarterly eval on demographic-parity + equalized-odds on consenting sample data.
- Findings reviewed by compliance + AI leads.
- Prompt rollback if bias increases.
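
For reference, the two metrics can be computed as gaps between groups: the demographic-parity difference is the spread in positive-prediction rates, and the equalized-odds difference is the larger of the spreads in true-positive and false-positive rates. The sketch below uses those common definitions; the `Sample` shape and function names are hypothetical, and the rollback rule (any increase over the previous quarter, with no tolerance) is an assumption, since no threshold is stated here.

```typescript
interface Sample { group: string; predicted: boolean; actual: boolean; }
interface GroupRates { positiveRate: number; truePositiveRate: number; falsePositiveRate: number; }
interface BiasMetrics { demographicParityDiff: number; equalizedOddsDiff: number; }

function ratio(n: number, d: number): number {
  return d === 0 ? 0 : n / d;
}

function ratesByGroup(samples: Sample[]): Record<string, GroupRates> {
  const tally: Record<string, { n: number; predPos: number; tp: number; actPos: number; fp: number; actNeg: number }> = {};
  for (const s of samples) {
    if (!tally[s.group]) tally[s.group] = { n: 0, predPos: 0, tp: 0, actPos: 0, fp: 0, actNeg: 0 };
    const t = tally[s.group];
    t.n++;
    if (s.predicted) t.predPos++;
    if (s.actual) { t.actPos++; if (s.predicted) t.tp++; }
    else { t.actNeg++; if (s.predicted) t.fp++; }
  }
  const out: Record<string, GroupRates> = {};
  for (const g of Object.keys(tally)) {
    const t = tally[g];
    out[g] = {
      positiveRate: ratio(t.predPos, t.n),
      truePositiveRate: ratio(t.tp, t.actPos),
      falsePositiveRate: ratio(t.fp, t.actNeg),
    };
  }
  return out;
}

// Largest minus smallest value across groups.
function spread(values: number[]): number {
  return values.length === 0 ? 0 : Math.max(...values) - Math.min(...values);
}

function biasMetrics(samples: Sample[]): BiasMetrics {
  const byGroup = ratesByGroup(samples);
  const rates = Object.keys(byGroup).map((g) => byGroup[g]);
  return {
    demographicParityDiff: spread(rates.map((r) => r.positiveRate)),
    equalizedOddsDiff: Math.max(
      spread(rates.map((r) => r.truePositiveRate)),
      spread(rates.map((r) => r.falsePositiveRate))
    ),
  };
}

// Assumed rule: flag for rollback if either gap grew since the previous evaluation.
function biasIncreased(previous: BiasMetrics, current: BiasMetrics): boolean {
  return current.demographicParityDiff > previous.demographicParityDiff ||
    current.equalizedOddsDiff > previous.equalizedOddsDiff;
}
```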
10. On-Device AI (S1+)
- Local models (Llama-3-8B quantized, Whisper-small, Piper TTS).
- Bundled in PlayPackage for offline.
- Same safety pipeline applied on-device.
- Provenance recorded locally, synced on reconnect.
11. Privacy
- `noTrain` enforced on all providers.
- PII redaction pre-call.
- Tenant-scoped embeddings.
- HIPAA tenants: on-premise / BAA-signed providers only.
- HIPAA tenants: audit log of every AI call (regulated).
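
The provider restrictions above amount to a filter applied before routing. A minimal sketch, assuming a hypothetical `Provider` record that carries `noTrain`, `onPremise`, and `baaSigned` flags (these field names are not taken from the service):

```typescript
// Hypothetical shapes, for illustration only.
interface Provider { id: string; noTrain: boolean; onPremise: boolean; baaSigned: boolean; }
interface TenantPolicy { hipaa: boolean; }

function eligibleProviders(providers: Provider[], tenant: TenantPolicy): Provider[] {
  return providers.filter((p) =>
    // noTrain is required for every tenant.
    p.noTrain &&
    // HIPAA tenants additionally need an on-premise or BAA-signed provider.
    (!tenant.hipaa || p.onPremise || p.baaSigned)
  );
}
```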
12. Cost Routing
- Local model first when the request is eligible (tasks with low quality demands).
- Small cloud (e.g., Claude Haiku) for mid-complexity.
- Large cloud (e.g., Opus) for complex reasoning.
- Automatic fallback on provider failure.
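
The routing order can be sketched as a tier preference list with fallback. The tier and complexity labels, the `route` function, and in particular the fallback order within each list are assumptions made for the example; only the local / small cloud / large cloud split and the fallback-on-failure rule come from the list above.

```typescript
type Tier = "local" | "small-cloud" | "large-cloud";
type Complexity = "low" | "mid" | "high";

interface RouteRequest { complexity: Complexity; localEligible: boolean; }

// Ordered preference list; later entries are fallbacks (order is an assumption).
function preferredTiers(req: RouteRequest): Tier[] {
  if (req.complexity === "low" && req.localEligible) return ["local", "small-cloud", "large-cloud"];
  if (req.complexity === "high") return ["large-cloud", "small-cloud"];
  return ["small-cloud", "large-cloud"];
}

// Tries each tier in order; a thrown error counts as a provider failure.
function route<T>(req: RouteRequest, providers: Record<Tier, () => T>): { tier: Tier; result: T } {
  let lastError: unknown = new Error("no tiers available");
  for (const tier of preferredTiers(req)) {
    try {
      return { tier, result: providers[tier]() };
    } catch (e) {
      lastError = e; // fall through to the next tier
    }
  }
  throw lastError;
}
```

For example, a low-complexity, local-eligible request whose local model fails would be served by the small cloud tier instead.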