
AI Integration

:::info Source
Sourced from services/delivery-service/AI_INTEGRATION.md in the documentation repo.
:::

Companion: 03 ai-gateway-service · DOMAIN_MODEL · SECURITY_MODEL

1. AI Usage in Delivery

Delivery uses AI for a single core feature: the AI Tutor — a conversational assistant scoped to the learner's current lesson context. The tutor operates in two modes:

| Mode | Inference Path | Use Case |
| --- | --- | --- |
| Online | ai-gateway-service -> cloud provider (Anthropic, OpenAI) | Device has connectivity |
| Offline | Local on-device model (via local AI runtime) | Device is offline (mounted bundle scenario) |

All AI calls route through the AIClient port — delivery-service never imports vendor SDKs directly.

2. AIClient Port

interface AIClient {
  streamTutorResponse(request: TutorRequest): AsyncIterable<TutorChunk>;
  checkLocalAvailability(): Promise<boolean>;
}

interface TutorRequest {
  tenantId: TenantId;
  userId: UserId;
  sessionId: PlaySessionId;
  turnId: string;
  lessonContext: {
    lessonId: string;
    lessonTitle: string;
    lessonContent: string; // flattened lesson text
    blockIds: string[];
  };
  conversationHistory: ConversationTurn[];
  prompt: string;
  preferLocal: boolean; // true when offline or device prefers local
  maxTokens?: number;
  temperature?: number;
}

interface TutorChunk {
  type: 'token' | 'tool_call' | 'done' | 'error';
  text?: string;
  index?: number;
  toolCall?: ToolCallTrace;
  done?: {
    totalTokens: number;
    aiProvenance: AIProvenance;
  };
  error?: { code: string; message: string };
}

interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
  timestamp: ISODate;
}
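The chunk types above imply a simple folding step on the consumer side. The following is a minimal sketch, not the service's actual code: `bufferChunks` and `BufferedResponse` are hypothetical names, the `TutorChunk` copy omits `toolCall` and `aiProvenance` for brevity, and it assumes `token` chunks arrive in order.

```typescript
// Trimmed local copy of the chunk shape from the port definition.
interface TutorChunk {
  type: 'token' | 'tool_call' | 'done' | 'error';
  text?: string;
  index?: number;
  done?: { totalTokens: number };
  error?: { code: string; message: string };
}

// Hypothetical container for the buffered result.
interface BufferedResponse {
  text: string;
  totalTokens?: number;
  error?: { code: string; message: string };
}

// Fold streamed chunks into one response suitable for persistence.
function bufferChunks(chunks: Iterable<TutorChunk>): BufferedResponse {
  const result: BufferedResponse = { text: '' };
  for (const chunk of chunks) {
    switch (chunk.type) {
      case 'token':
        result.text += chunk.text ?? '';
        break;
      case 'done':
        if (chunk.done) result.totalTokens = chunk.done.totalTokens;
        return result; // 'done' terminates the stream
      case 'error':
        if (chunk.error) result.error = chunk.error;
        return result; // 'error' also terminates the stream
      case 'tool_call':
        break; // tool-call traces are ignored in this sketch
    }
  }
  return result;
}
```

A real consumer would iterate the `AsyncIterable` with `for await` and forward each chunk to the client while buffering; the folding logic stays the same.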

3. Tutor Orchestration Flow

┌───────────────────────────────────────────────────────────────┐
│ 1. Client POSTs /api/v1/play-sessions/{id}/tutor/turn         │
│    Body: { prompt: "Explain polymorphism" }                   │
│                                                               │
│ 2. Delivery:                                                  │
│    a. Verify session ownership + active state                 │
│    b. Load current cursor and lesson content from manifest    │
│    c. Fetch last 5 tutor turns for conversation context       │
│    d. Determine inference path:                               │
│       - isOffline session? -> preferLocal = true              │
│       - ai-gateway reachable? -> preferLocal = false          │
│    e. Build TutorRequest                                      │
│                                                               │
│ 3. Call AIClient.streamTutorResponse(request)                 │
│    - Stream chunks back to client via SSE                     │
│    - Buffer full response for persistence                     │
│                                                               │
│ 4. On stream complete:                                        │
│    a. Create AssistantTurn record                             │
│    b. Persist with aiProvenance                               │
│    c. Emit delivery.play_session.tutor_turn.completed.v1      │
└───────────────────────────────────────────────────────────────┘
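Steps 2c and 2d can be sketched as two small pure functions. This is an illustration under assumptions rather than the service's implementation: `resolvePreferLocal` and `recentTurns` are hypothetical names, each `ConversationTurn` entry is counted as one turn, and the flow does not say what happens when an online session cannot reach the gateway, so the sketch assumes it falls back to local inference.

```typescript
interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
  timestamp: string; // ISODate in the port definition
}

// Step 2d: decide the inference path.
// Assumption: an online session with an unreachable gateway prefers local.
function resolvePreferLocal(isOfflineSession: boolean, gatewayReachable: boolean): boolean {
  if (isOfflineSession) return true;
  return !gatewayReachable;
}

// Step 2c: keep only the most recent turns, oldest first.
function recentTurns(history: ConversationTurn[], limit = 5): ConversationTurn[] {
  // slice(-0) would return the whole array, so guard non-positive limits.
  return limit > 0 ? history.slice(-limit) : [];
}
```

Keeping both decisions free of I/O makes them easy to unit test; the reachability probe itself would live behind the `AIClient` port.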

4. RAG Context Strategy

The tutor uses in-context RAG with lesson content (not vector search). Rationale:

  • Lessons are short and self-contained (typically 500-5000 tokens).
  • The tutor should be scoped to the current lesson to prevent curriculum drift.
  • Vector search introduces latency and cost without meaningful quality gains at lesson scale.

For long-form courses or book chapters, the ai-gateway-service can be invoked with a retrieval hint to pull from course-level embeddings. This is opt-in per course via a PlayPackage manifest flag.

5. System Prompt

The delivery-service injects a system prompt controlled by the ai-gateway's prompt registry. The active prompt (pinned per tenant) resembles the following:

You are a focused educational tutor helping a learner understand the current lesson.

LESSON: {lessonTitle}
CONTENT:
{lessonContent}

Rules:
- Answer only questions related to this lesson.
- If the learner asks about unrelated topics, politely redirect to the lesson.
- Be concise: aim for 100-300 words.
- Use examples from the lesson when possible.
- Never invent facts not in the lesson.
- If you don't know, say so.

Delivery does not craft this prompt itself; it calls the gateway's getSystemPrompt('delivery.tutor', tenantId) to retrieve the pinned version.
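The `{lessonTitle}` and `{lessonContent}` placeholders in the prompt above have to be filled at some point before inference. The source does not say which service performs the substitution or how, so the following is only a sketch of one way to do it; `renderPrompt` is a hypothetical helper, not part of the gateway API.

```typescript
// Replace {name} placeholders in a template with values from `vars`.
// Throws on an unknown placeholder so a typo in the template fails loudly.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return value;
  });
}

const rendered = renderPrompt('LESSON: {lessonTitle}\nCONTENT:\n{lessonContent}', {
  lessonTitle: 'Polymorphism',
  lessonContent: 'Objects of type {T} share an interface.',
});
```

Because `replace` makes a single pass over the template, braces that appear inside the lesson text (such as `{T}` here) are inserted verbatim and never re-expanded.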

6. Offline AI Inference

When the session is offline:

| Aspect | Behavior |
| --- | --- |
| Model | Small on-device model (e.g., Phi-4, Qwen 4B quantized) bundled with the app |
| Inference | Runs in a dedicated worker thread / native runtime |
| Context window | Reduced (typically 4K vs 32K+ online) |
| Response quality | Acceptable for Q&A on lesson content; reduced for complex reasoning |
| Cost | Zero (compute is on-device) |
| Provenance | aiProvenance.local = true, aiProvenance.cost = undefined |

The local model is updated via app update cycles. PlayPackage bundles may include supplementary fine-tuning artifacts for domain-specific tutoring (planned for S5+).

7. AIProvenance

Every AssistantTurn records full provenance per platform convention:

{
  model: "claude-haiku-4.5" | "qwen-4b-local",
  version: "2026.03",
  promptId: "delivery.tutor",
  promptVersion: "1.4.2",
  traceId: "00-abc...-01",
  decisionId: undefined, // no HITL for tutor
  local: false,
  generatedAt: "2026-04-15T10:15:00Z",
  cost: {
    microUSD: 450, // 0.00045 USD
    tokens: { in: 850, out: 320 }
  }
}
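The record above suggests a type along the following lines. The authoritative `AIProvenance` definition lives in the platform convention and is not reproduced in this document, so the interface below is inferred from the example only, and `costInUSD` is a hypothetical helper showing the micro-USD conversion.

```typescript
// Inferred from the example record; the platform definition may differ.
interface AIProvenance {
  model: string;
  version: string;
  promptId: string;
  promptVersion: string;
  traceId: string;
  decisionId?: string; // unset for tutor turns (no HITL)
  local: boolean;
  generatedAt: string; // ISO 8601 timestamp
  cost?: {
    microUSD: number;
    tokens: { in: number; out: number };
  };
}

// Convert the recorded cost to USD; undefined when no cost was recorded
// (for example, local inference).
function costInUSD(p: AIProvenance): number | undefined {
  return p.cost === undefined ? undefined : p.cost.microUSD / 1e6;
}
```

Storing cost as integer micro-USD avoids accumulating floating-point error when many small per-turn costs are summed; conversion to USD is only needed for display.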

8. Cost Controls

| Control | Enforcement |
| --- | --- |
| Per-session turn limit | 30 turns/hour (via Redis counter) |
| Per-user daily budget | Inherited from tenant; enforced by ai-gateway |
| Token limits | maxTokens capped at 2048 per turn |
| Model routing | ai-gateway selects cheapest model satisfying quality threshold |
| Local-first for offline | Zero cost when offline |
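The turn limit and token cap can be illustrated with a small in-memory sketch. Several things here are assumptions: the table only says "Redis counter", so the fixed one-hour window and the key layout are guesses; the `Map` stands in for Redis (where `INCR` plus `EXPIRE` would play the same role); and defaulting an unspecified `maxTokens` to the cap is a choice made for the example. `TurnLimiter` and `clampMaxTokens` are hypothetical names.

```typescript
const MAX_TURNS_PER_HOUR = 30;
const MAX_TOKENS_CAP = 2048;
const HOUR_MS = 60 * 60 * 1000;

// Fixed-window counter keyed by session and hour bucket.
class TurnLimiter {
  private counts = new Map<string, number>();

  // Returns true and records the turn if the session is under its limit.
  tryConsume(sessionId: string, nowMs: number): boolean {
    const windowKey = `${sessionId}:${Math.floor(nowMs / HOUR_MS)}`;
    const used = this.counts.get(windowKey) ?? 0;
    if (used >= MAX_TURNS_PER_HOUR) return false;
    this.counts.set(windowKey, used + 1);
    return true;
  }
}

// Clamp a requested maxTokens into [1, MAX_TOKENS_CAP].
// Assumption: an unspecified value defaults to the cap.
function clampMaxTokens(requested?: number): number {
  if (requested === undefined) return MAX_TOKENS_CAP;
  return Math.min(Math.max(1, Math.floor(requested)), MAX_TOKENS_CAP);
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic under test. A fixed window allows a burst of up to 60 turns across a window boundary; a sliding window would tighten that at the cost of more state.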

9. Safety Controls

| Control | Where |
| --- | --- |
| Prompt injection classifier | ai-gateway (pre-call) |
| PII redaction in prompts | ai-gateway (pre-call) |
| Output toxicity filter | ai-gateway (post-stream) |
| Curriculum-drift detector | ai-gateway (post-stream); flags off-topic responses |
| Rate limiting | delivery + ai-gateway |

When unsafe content is detected, the client receives an error chunk with a user-friendly fallback message, and the turn is recorded with rating: 'unhelpful' set automatically so that it surfaces for review.
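As a rough sketch of that handling: the error code `ai.unsafe_content`, the fallback wording, and the `FlaggedTurnRecord` shape below are all hypothetical, since the source specifies only that an error chunk is sent and that the rating is auto-set.

```typescript
interface ErrorChunk {
  type: 'error';
  error: { code: string; message: string };
}

// Hypothetical subset of the persisted turn record.
interface FlaggedTurnRecord {
  turnId: string;
  rating: 'unhelpful';
  errorCode: string;
}

// Hypothetical code and message; the real values are not given in the source.
const UNSAFE_CODE = 'ai.unsafe_content';
const UNSAFE_MESSAGE = "I can't help with that here. Let's get back to the lesson.";

// Build both outputs of an unsafe-content detection: what the client sees
// and what is persisted for instructor review.
function handleUnsafeDetection(turnId: string): { chunk: ErrorChunk; record: FlaggedTurnRecord } {
  return {
    chunk: { type: 'error', error: { code: UNSAFE_CODE, message: UNSAFE_MESSAGE } },
    record: { turnId, rating: 'unhelpful', errorCode: UNSAFE_CODE },
  };
}
```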

10. HITL (Human-in-the-Loop)

Tutor responses are not subject to HITL in production (latency requirement). However:

  • An instructor review surface is available in the admin portal via GET /api/v1/admin/tutor-turns?flagged=true.
  • Instructors can flag turns for retraining the safety classifier.
  • Turns with rating: 'unhelpful' feed into the quality-improvement dataset.

11. Fallback Behavior

If AIClient is unavailable:

  1. The SSE stream sends an error event with code ai.unavailable and a friendly fallback message ("The AI tutor is temporarily unavailable. Please try again in a moment, or review the lesson content.").
  2. The AssistantTurn is recorded with no response and with error metadata.
  3. No charge is incurred.
  4. The client-side UI shows a fallback state and offers a retry.
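Step 1 of the list above can be sketched as a small formatter that emits the event in the standard Server-Sent Events wire format (`event:` and `data:` fields terminated by a blank line). The helper names and the JSON payload shape are assumptions; the source fixes only the code `ai.unavailable` and the message text.

```typescript
const FALLBACK_MESSAGE =
  'The AI tutor is temporarily unavailable. Please try again in a moment, or review the lesson content.';

// Serialize one SSE event. JSON.stringify escapes newlines, so the payload
// always fits on a single data: line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Assumed payload shape: { code, message }, mirroring TutorChunk.error.
function unavailableEvent(): string {
  return formatSseEvent('error', { code: 'ai.unavailable', message: FALLBACK_MESSAGE });
}
```

The client can parse the `data:` line as JSON and switch on `code` to decide whether to show the retry affordance.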

12. Telemetry

| Metric | Type | Labels |
| --- | --- | --- |
| delivery_tutor_turn_duration_seconds | Histogram | tenant_id, local, model |
| delivery_tutor_turn_tokens_total | Counter | direction (in/out), tenant_id |
| delivery_tutor_cost_microusd_total | Counter | tenant_id, model |
| delivery_tutor_errors_total | Counter | error_code, tenant_id |
| delivery_tutor_rating_total | Counter | rating, tenant_id |

All tutor interactions carry trace_id across the SSE stream so ai-gateway traces can be correlated with delivery sessions.
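To make the label sets concrete, the sketch below derives the metric samples a single completed turn would contribute. It is library-agnostic on purpose: `MetricSample`, `CompletedTurn`, and `samplesForTurn` are hypothetical names, and the actual service presumably records these through its metrics client rather than building plain objects.

```typescript
interface MetricSample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Hypothetical summary of a finished turn.
interface CompletedTurn {
  tenantId: string;
  model: string;
  local: boolean;
  durationSeconds: number;
  tokensIn: number;
  tokensOut: number;
  microUSD?: number; // undefined for local inference
}

// Map one completed turn to the duration, token, and cost samples.
function samplesForTurn(t: CompletedTurn): MetricSample[] {
  const samples: MetricSample[] = [
    {
      name: 'delivery_tutor_turn_duration_seconds',
      labels: { tenant_id: t.tenantId, local: String(t.local), model: t.model },
      value: t.durationSeconds,
    },
    {
      name: 'delivery_tutor_turn_tokens_total',
      labels: { direction: 'in', tenant_id: t.tenantId },
      value: t.tokensIn,
    },
    {
      name: 'delivery_tutor_turn_tokens_total',
      labels: { direction: 'out', tenant_id: t.tenantId },
      value: t.tokensOut,
    },
  ];
  // Cost is only recorded when the provenance carries one.
  if (t.microUSD !== undefined) {
    samples.push({
      name: 'delivery_tutor_cost_microusd_total',
      labels: { tenant_id: t.tenantId, model: t.model },
      value: t.microUSD,
    });
  }
  return samples;
}
```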