AI Integration
:::info Source
Sourced from services/delivery-service/AI_INTEGRATION.md in the documentation repo.
:::
Companion: 03 ai-gateway-service · DOMAIN_MODEL · SECURITY_MODEL
1. AI Usage in Delivery
Delivery uses AI for a single core feature: the AI Tutor — a conversational assistant scoped to the learner's current lesson context. The tutor operates in two modes:
| Mode | Inference Path | Use Case |
|---|---|---|
| Online | ai-gateway-service -> cloud provider (Anthropic, OpenAI) | Device has connectivity |
| Offline | Local on-device model (via local AI runtime) | Device is offline (mounted bundle scenario) |
All AI calls route through the AIClient port — delivery-service never imports vendor SDKs directly.
2. AIClient Port
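```typescript
// TenantId, UserId, PlaySessionId, ISODate, ToolCallTrace and AIProvenance are
// assumed to come from the shared domain model; they are not redefined here.

interface AIClient {
  // Streams the tutor's answer as a sequence of chunks.
  streamTutorResponse(request: TutorRequest): AsyncIterable<TutorChunk>;
  // Reports whether the on-device model can currently serve requests.
  checkLocalAvailability(): Promise<boolean>;
}

interface TutorRequest {
  tenantId: TenantId;
  userId: UserId;
  sessionId: PlaySessionId;
  turnId: string;
  lessonContext: {
    lessonId: string;
    lessonTitle: string;
    lessonContent: string; // flattened lesson text
    blockIds: string[];
  };
  conversationHistory: ConversationTurn[];
  prompt: string;
  preferLocal: boolean; // true when offline or when the device prefers local inference
  maxTokens?: number;
  temperature?: number;
}

interface TutorChunk {
  type: 'token' | 'tool_call' | 'done' | 'error';
  text?: string;            // token text for 'token' chunks
  index?: number;
  toolCall?: ToolCallTrace; // present on 'tool_call' chunks
  done?: {                  // present on the final 'done' chunk
    totalTokens: number;
    aiProvenance: AIProvenance;
  };
  error?: { code: string; message: string }; // present on 'error' chunks
}

interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
  timestamp: ISODate;
}
```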
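Because every inference call goes through this port, the choice between ai-gateway-service and the on-device runtime can live in a single adapter. The sketch below is illustrative only: it assumes two hypothetical backends behind a made-up `InferenceBackend` interface, uses trimmed-down request and chunk types, and encodes an assumed routing policy (honour `preferLocal` when the local runtime is up, otherwise try the gateway, otherwise yield an `ai.unavailable` error chunk).

```typescript
// Trimmed-down shapes for illustration; the real TutorRequest/TutorChunk carry more fields.
interface RoutedRequest {
  prompt: string;
  preferLocal: boolean;
}

interface RoutedChunk {
  type: "token" | "done" | "error";
  text?: string;
  error?: { code: string; message: string };
}

// Hypothetical backend abstraction: one instance wraps ai-gateway-service,
// the other wraps the on-device runtime.
interface InferenceBackend {
  isAvailable(): Promise<boolean>;
  stream(req: RoutedRequest): AsyncIterable<RoutedChunk>;
}

class RoutingAIClient {
  private readonly gateway: InferenceBackend;
  private readonly local: InferenceBackend;

  constructor(gateway: InferenceBackend, local: InferenceBackend) {
    this.gateway = gateway;
    this.local = local;
  }

  checkLocalAvailability(): Promise<boolean> {
    return this.local.isAvailable();
  }

  async *streamTutorResponse(req: RoutedRequest): AsyncIterable<RoutedChunk> {
    const backend = await this.pickBackend(req);
    if (backend === undefined) {
      // Neither path is usable: surface the same code the fallback section uses.
      yield { type: "error", error: { code: "ai.unavailable", message: "No inference path available" } };
      return;
    }
    yield* backend.stream(req);
  }

  // Assumed policy: local first when requested and available, then the gateway.
  private async pickBackend(req: RoutedRequest): Promise<InferenceBackend | undefined> {
    if (req.preferLocal && (await this.local.isAvailable())) return this.local;
    if (await this.gateway.isAvailable()) return this.gateway;
    return undefined;
  }
}
```

Keeping vendor SDKs inside the backends means delivery code only ever sees `RoutedChunk`-style values, whichever path served the turn.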
3. Tutor Orchestration Flow
1. The client POSTs `/api/v1/play-sessions/{id}/tutor/turn` with body `{ prompt: "Explain polymorphism" }`.
2. Delivery:
   - Verifies session ownership and that the session is active.
   - Loads the current cursor and lesson content from the manifest.
   - Fetches the last 5 tutor turns for conversation context.
   - Determines the inference path: an offline session sets `preferLocal = true`; otherwise, if ai-gateway is reachable, `preferLocal = false`.
   - Builds the `TutorRequest`.
3. Delivery calls `AIClient.streamTutorResponse(request)`, streaming chunks back to the client via SSE while buffering the full response for persistence.
4. When the stream completes, Delivery:
   - Creates an `AssistantTurn` record.
   - Persists it with `aiProvenance`.
   - Emits `delivery.play_session.tutor_turn.completed.v1`.
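The flow above could be sketched as a single handler. This is an illustration under stated assumptions, not the service's code: the collaborator callbacks (`stream`, `sendSse`, `persistAssistantTurn`) are hypothetical stand-ins for the real AIClient and repositories; ownership checks, manifest loading and event emission are omitted; "last 5 tutor turns" is read as the five most recent `ConversationTurn` entries; and an online session whose gateway is unreachable is assumed to fall back to local inference.

```typescript
interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
  timestamp: string; // ISO-8601
}

interface StreamChunk {
  type: "token" | "done" | "error";
  text?: string;
  error?: { code: string; message: string };
}

interface AssistantTurnRecord {
  response: string | null; // null when the turn failed
  errorCode?: string;
}

// Hypothetical collaborators; names are illustrative, not the service's real API.
interface TurnDeps {
  isOfflineSession: boolean;
  gatewayReachable: boolean;
  history: ConversationTurn[];
  stream(prompt: string, history: ConversationTurn[], preferLocal: boolean): AsyncIterable<StreamChunk>;
  sendSse(chunk: StreamChunk): void;
  persistAssistantTurn(record: AssistantTurnRecord): void;
}

const HISTORY_LIMIT = 5;

async function handleTutorTurn(prompt: string, deps: TurnDeps): Promise<AssistantTurnRecord> {
  // Offline sessions always prefer local; online sessions use the gateway when reachable.
  const preferLocal = deps.isOfflineSession || !deps.gatewayReachable;
  const history = deps.history.slice(-HISTORY_LIMIT);

  let buffer = "";
  let failed: AssistantTurnRecord | undefined;
  for await (const chunk of deps.stream(prompt, history, preferLocal)) {
    deps.sendSse(chunk); // forward every chunk to the client as it arrives
    if (chunk.type === "token") buffer += chunk.text ?? "";
    if (chunk.type === "error") {
      failed = { response: null, errorCode: chunk.error?.code };
      break;
    }
  }

  // Persist once the stream has ended, either with the full text or with error metadata.
  const record = failed ?? { response: buffer };
  deps.persistAssistantTurn(record);
  return record;
}
```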
4. RAG Context Strategy
The tutor uses in-context RAG: the lesson content itself is placed in the prompt, with no vector search. Rationale:
- Lessons are short and self-contained (typically 500-5,000 tokens).
- Scoping the tutor to the current lesson prevents curriculum drift.
- At lesson scale, vector search adds latency and cost without a meaningful quality gain.
For long-form courses or book chapters, ai-gateway-service can be invoked with a retrieval hint to pull from course-level embeddings; this is opt-in per course via a PlayPackage manifest flag.
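A minimal sketch of assembling the in-context lesson payload from manifest blocks. The `LessonBlock` shape and the `courseRetrieval` flag name are hypothetical (the real manifest flag is not named here), and token counts use a rough four-characters-per-token heuristic rather than a real tokenizer, which is only good enough to sanity-check the 500-5,000 token range.

```typescript
interface LessonBlock {
  id: string;
  text: string;
}

interface PlayPackageFlags {
  courseRetrieval?: boolean; // hypothetical name for the opt-in manifest flag
}

interface LessonContext {
  lessonId: string;
  lessonTitle: string;
  lessonContent: string; // flattened lesson text
  blockIds: string[];
}

// Rough heuristic (~4 characters per token for English prose); not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildLessonContext(lessonId: string, lessonTitle: string, blocks: LessonBlock[]): LessonContext {
  return {
    lessonId,
    lessonTitle,
    // The whole lesson goes into the prompt; there is no retrieval step.
    lessonContent: blocks.map((b) => b.text).join("\n\n"),
    blockIds: blocks.map((b) => b.id),
  };
}

// Course-level retrieval is requested only when the package explicitly opts in.
function wantsRetrievalHint(flags: PlayPackageFlags): boolean {
  return flags.courseRetrieval === true;
}
```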
5. System Prompt
The delivery-service injects a system prompt controlled by the ai-gateway's prompt registry. The active prompt (pinned per tenant) reads approximately as follows:
```text
You are a focused educational tutor helping a learner understand the current lesson.

LESSON: {lessonTitle}
CONTENT:
{lessonContent}

Rules:
- Answer only questions related to this lesson.
- If the learner asks about unrelated topics, politely redirect to the lesson.
- Be concise: aim for 100-300 words.
- Use examples from the lesson when possible.
- Never invent facts not in the lesson.
- If you don't know, say so.
```
Delivery does not craft this prompt itself; it calls the gateway's `getSystemPrompt('delivery.tutor', tenantId)` to retrieve the pinned version.
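After retrieving the pinned template, delivery still has to fill the `{lessonTitle}` and `{lessonContent}` slots. A minimal sketch, assuming the single-brace `{name}` placeholders shown in the prompt above (the registry's actual templating syntax may differ); `renderPrompt` is a hypothetical helper, not part of the gateway API.

```typescript
// Replaces {name} placeholders in a single pass. A replacer function is used, so values
// are inserted verbatim: "$" sequences in lesson text are not treated as replacement
// patterns, and braces inside inserted text are never re-scanned as placeholders.
// Unknown placeholders are left intact so a template/variable mismatch stays visible.
function renderPrompt(template: string, vars: Record<string, string>): string {
  const values = new Map<string, string>(Object.entries(vars));
  return template.replace(/\{(\w+)\}/g, (placeholder: string, name: string) => values.get(name) ?? placeholder);
}

// Usage with a shortened template:
const systemPrompt = renderPrompt("LESSON: {lessonTitle}\nCONTENT:\n{lessonContent}", {
  lessonTitle: "Polymorphism",
  lessonContent: "Polymorphism lets one interface stand for many concrete types.",
});
```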
6. Offline AI Inference
When the session is offline:
| Aspect | Behavior |
|---|---|
| Model | Small on-device model (e.g., Phi-4, Qwen 4B quantized) bundled with the app |
| Inference | Runs in a dedicated worker thread / native runtime |
| Context window | Reduced (typically 4K tokens vs. 32K+ online) |
| Response quality | Acceptable for Q&A on lesson content; weaker for complex reasoning |
| Cost | Zero (compute is on-device) |
| Provenance | `aiProvenance.local = true`, `aiProvenance.cost = undefined` |
The local model is updated via app update cycles. PlayPackage bundles may include supplementary fine-tuning artifacts for domain-specific tutoring (planned for S5+).
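With a much smaller local context window, the same turn may need a shorter conversation history offline. A minimal sketch of newest-first history fitting, assuming "4K" and "32K" mean 4,096 and 32,768 tokens, a rough four-characters-per-token estimate, and a hypothetical `fitHistory` helper; how the real service splits the budget between lesson content, history and output is not specified in this document.

```typescript
interface HistoryTurn {
  role: "user" | "assistant";
  content: string;
}

// Assumed window sizes: 4K local, 32K online (the online figure is a lower bound).
const LOCAL_CONTEXT_TOKENS = 4096;
const ONLINE_CONTEXT_TOKENS = 32768;

// Heuristic estimate (~4 characters per token); a real tokenizer would be more accurate.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Keeps the most recent turns that fit after reserving room for the fixed prompt
// (system prompt, lesson and learner question) and for the model's output.
function fitHistory(
  fixedPrompt: string,
  history: HistoryTurn[],
  local: boolean,
  reserveForOutput: number,
): HistoryTurn[] {
  const contextTokens = local ? LOCAL_CONTEXT_TOKENS : ONLINE_CONTEXT_TOKENS;
  let remaining = contextTokens - reserveForOutput - approxTokens(fixedPrompt);
  const kept: HistoryTurn[] = [];
  for (const turn of [...history].reverse()) {
    const cost = approxTokens(turn.content);
    if (cost > remaining) break; // stop at the first (oldest) turn that no longer fits
    remaining -= cost;
    kept.unshift(turn); // restore chronological order
  }
  return kept;
}
```

Dropping the oldest turns first keeps the exchange the learner is most likely referring to.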
7. AIProvenance
Every AssistantTurn records full provenance, following the platform convention:
```typescript
const exampleProvenance = {
  model: "claude-haiku-4.5",   // or "qwen-4b-local" for on-device turns (local: true, no cost)
  version: "2026.03",
  promptId: "delivery.tutor",
  promptVersion: "1.4.2",
  traceId: "00-abc...-01",
  decisionId: undefined,       // no HITL for tutor
  local: false,
  generatedAt: "2026-04-15T10:15:00Z",
  cost: {
    microUSD: 450,             // 0.00045 USD
    tokens: { in: 850, out: 320 },
  },
};
```
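Since on-device turns carry `local = true` and no cost, a provenance builder mainly decides whether to attach a cost object. A minimal sketch, assuming hypothetical per-million-token prices passed in by the caller (actual pricing and model routing live in ai-gateway); it relies on the identity that 1 USD per million tokens equals 1 micro-USD per token.

```typescript
interface TokenCounts {
  in: number;
  out: number;
}

interface Provenance {
  model: string;
  version: string;
  promptId: string;
  promptVersion: string;
  traceId: string;
  decisionId?: string;
  local: boolean;
  generatedAt: string;
  cost?: { microUSD: number; tokens: TokenCounts };
}

// Hypothetical price sheet, in USD per million tokens.
interface Prices {
  inUsdPerMTok: number;
  outUsdPerMTok: number;
}

// 1 USD per 1M tokens == 1 micro-USD per token, so this sum is already in micro-USD.
function costMicroUSD(tokens: TokenCounts, prices: Prices): number {
  return Math.round(tokens.in * prices.inUsdPerMTok + tokens.out * prices.outUsdPerMTok);
}

function buildProvenance(
  base: Omit<Provenance, "generatedAt" | "cost">,
  tokens: TokenCounts,
  prices: Prices,
  now: Date,
): Provenance {
  const generatedAt = now.toISOString();
  // On-device inference is free, so local turns carry no cost object at all.
  if (base.local) return { ...base, generatedAt };
  return { ...base, generatedAt, cost: { microUSD: costMicroUSD(tokens, prices), tokens } };
}
```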
8. Cost Controls
| Control | Enforcement |
|---|---|
| Per-session turn limit | 30 turns/hour (via Redis counter) |
| Per-user daily budget | Inherited from tenant; enforced by ai-gateway |
| Token limits | `maxTokens` capped at 2048 per turn |
| Model routing | ai-gateway selects the cheapest model that meets the quality threshold |
| Local-first for offline | Zero cost when offline |
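The per-session turn limit could be enforced with a fixed-window counter: Redis `INCR` on a key that embeds the current hour, plus `EXPIRE` on the first increment so that old windows clean themselves up. The sketch below is synchronous and uses an in-memory stand-in for Redis (real Redis clients are asynchronous); the key format is hypothetical, and whether production uses a fixed or sliding window is not stated in this document. It also clamps `maxTokens` to the 2048-token cap.

```typescript
const TURNS_PER_HOUR = 30;
const MAX_TOKENS_PER_TURN = 2048;
const WINDOW_SECONDS = 3600;

// Minimal counter-store contract; in production, Redis INCR + EXPIRE would play this role.
interface CounterStore {
  incr(key: string): number;
  expire(key: string, seconds: number): void;
}

// Fixed-window limiter: the key embeds the hour bucket, so each window starts from zero.
function allowTurn(store: CounterStore, sessionId: string, nowMs: number): boolean {
  const bucket = Math.floor(nowMs / (WINDOW_SECONDS * 1000));
  const key = `tutor:turns:${sessionId}:${bucket}`; // hypothetical key format
  const count = store.incr(key);
  if (count === 1) store.expire(key, WINDOW_SECONDS); // first hit sets the TTL
  return count <= TURNS_PER_HOUR;
}

// Missing or oversized requests are capped at the per-turn limit.
function clampMaxTokens(requested: number | undefined): number {
  return Math.min(requested ?? MAX_TOKENS_PER_TURN, MAX_TOKENS_PER_TURN);
}

// In-memory stand-in for tests; TTLs are not simulated.
class InMemoryCounterStore implements CounterStore {
  private readonly counts = new Map<string, number>();

  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  expire(_key: string, _seconds: number): void {}
}
```

A fixed window can admit up to twice the limit across the boundary between two hours; a sliding window or token bucket avoids that at the cost of more state.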
9. Safety Controls
| Control | Where |
|---|---|
| Prompt injection classifier | ai-gateway (pre-call) |
| PII redaction in prompts | ai-gateway (pre-call) |
| Output toxicity filter | ai-gateway (post-stream) |
| Curriculum-drift detector | ai-gateway (post-stream); flags off-topic responses |
| Rate limiting | delivery + ai-gateway |
When unsafe content is detected, the client receives an `error` chunk with a user-friendly fallback message, and the turn is recorded with `rating: 'unhelpful'` auto-set for review.
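A sketch of turning a post-stream safety verdict into the client-facing error chunk and the auto-rated turn record. The `SafetyVerdict` shape, the `ai.unsafe_content` code and the fallback wording are all hypothetical, and keeping the flagged text on the record for reviewers is an assumption.

```typescript
// Hypothetical verdict shape returned by the gateway's post-stream filters.
interface SafetyVerdict {
  safe: boolean;
  reason?: string;
}

interface ErrorChunk {
  type: "error";
  error: { code: string; message: string };
}

interface ReviewedTurn {
  response: string;
  rating?: "unhelpful";
  flagReason?: string;
}

// Hypothetical wording; the real fallback text is not specified here.
const SAFETY_FALLBACK_MESSAGE = "I can't help with that here. Let's get back to the lesson.";

function applySafetyVerdict(
  verdict: SafetyVerdict,
  bufferedText: string,
): { clientChunk?: ErrorChunk; turn: ReviewedTurn } {
  if (verdict.safe) return { turn: { response: bufferedText } };
  return {
    // The learner sees a friendly fallback instead of the flagged output.
    clientChunk: { type: "error", error: { code: "ai.unsafe_content", message: SAFETY_FALLBACK_MESSAGE } },
    // The original text is kept for instructor review, pre-rated as unhelpful.
    turn: { response: bufferedText, rating: "unhelpful", flagReason: verdict.reason },
  };
}
```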
10. HITL (Human-in-the-Loop)
Tutor responses are not subject to HITL in production because of latency requirements. However:
- An instructor review surface is available in the admin portal via `GET /api/v1/admin/tutor-turns?flagged=true`.
- Instructors can flag turns for retraining the safety classifier.
- Turns with `rating: 'unhelpful'` feed into the quality-improvement dataset.
11. Fallback Behavior
If AIClient is unavailable:
- The SSE stream sends an `error` event with code `ai.unavailable` and a friendly fallback message ("The AI tutor is temporarily unavailable. Please try again in a moment, or review the lesson content.").
- The AssistantTurn is recorded with no response and with error metadata.
- No charge is incurred.
- The client-side UI shows the fallback state and offers a retry.
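On the wire, the fallback is an ordinary Server-Sent Events message: an `event:` line, a `data:` line, and a terminating blank line. A minimal sketch of that framing; the JSON payload shape `{ code, message }` mirrors `TutorChunk.error` but is an assumption about what the client expects, and `formatSseEvent` is a hypothetical helper.

```typescript
// Serializes one SSE message. JSON.stringify escapes newlines inside strings, so the
// payload always fits on a single "data:" line; the trailing blank line ends the event.
function formatSseEvent(event: string, payload: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const AI_UNAVAILABLE_MESSAGE =
  "The AI tutor is temporarily unavailable. Please try again in a moment, or review the lesson content.";

function aiUnavailableEvent(): string {
  return formatSseEvent("error", { code: "ai.unavailable", message: AI_UNAVAILABLE_MESSAGE });
}
```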
12. Telemetry
| Metric | Type | Labels |
|---|---|---|
| `delivery_tutor_turn_duration_seconds` | Histogram | `tenant_id`, `local`, `model` |
| `delivery_tutor_turn_tokens_total` | Counter | `direction` (`in`/`out`), `tenant_id` |
| `delivery_tutor_cost_microusd_total` | Counter | `tenant_id`, `model` |
| `delivery_tutor_errors_total` | Counter | `error_code`, `tenant_id` |
| `delivery_tutor_rating_total` | Counter | `rating`, `tenant_id` |
All tutor interactions carry trace_id across the SSE stream so ai-gateway traces can be correlated with delivery sessions.
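The `traceId` in the provenance example (`00-abc...-01`) looks like a W3C Trace Context `traceparent` value, whose second field is the 32-hex-digit trace ID. Assuming that format, a minimal sketch of extracting the `trace_id` used to correlate delivery and ai-gateway spans; only version `00` is handled, and `parseTraceparent` is a hypothetical helper.

```typescript
interface TraceParent {
  traceId: string;  // 32 lowercase hex digits
  parentId: string; // 16 lowercase hex digits
  flags: string;    // 2 hex digits; "01" means sampled
}

// Version 00 layout: 00-<trace-id>-<parent-id>-<flags>, all lowercase hex.
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

function parseTraceparent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (match === null) return undefined;
  const [, traceId, parentId, flags] = match;
  // All-zero trace and parent IDs are invalid per the specification.
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(parentId)) return undefined;
  return { traceId, parentId, flags };
}
```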