AI Integration
:::info Source
Sourced from services/authoring-service/08-AI_INTEGRATION.md in the documentation repo.
:::
Companion: ai-gateway-service · 10 Authoring Tool · 02 Domain Model
1. Mission
The authoring-service is the primary consumer of AI on the platform. The AI Co-Author is the dominant user-facing differentiator. Every AI interaction here must be:
- Gateway-routed: zero direct provider calls; all calls go through ai-gateway-service.
- HITL-enforced: AI output never reaches learners without human review.
- Provenance-tracked: every AI artifact carries AIProvenance persisted on the block.
- Tenant-scoped: prompts, budgets, models, caches are all per-tenant.
- Cost-observed: AI cost and token burn are first-class metrics.
- Safe: pre- and post-moderation; refusals surfaced transparently.
- Offline-capable (S5): falls back to on-device model when disconnected.
2. Architecture
┌─────────────────── authoring-service ──────────────────────┐
│ │
│ ┌─────────────────┐ ┌────────────────────────────┐ │
│ │ AI Use Cases │─────►│ AIClient (port) │ │
│ │ (generate, │ │ │ │
│ │ improve, │ │ ┌──────────────────────┐ │ │
│ │ quiz, etc.) │ │ │ RemoteAIClient │ │ │
│ └─────────────────┘ │ │ (HTTP → ai-gateway) │ │ │
│ │ │ └──────────────────────┘ │ │
│ │ │ ┌──────────────────────┐ │ │
│ │ │ │ LocalAIClient (S5) │ │ │
│ │ │ │ (on-device model) │ │ │
│ │ │ └──────────────────────┘ │ │
│ │ └────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ AIJobManager │◄── ai.completion.finished.v1 │
│ │ (background │ │
│ │ orchestrator) │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ ▲
▼ │
┌──────────────────┐ ┌────────────────┐
│ │ │ │
│ ai-gateway- │───────►│ LLM provider │
│ service │ │ (Claude/GPT/ │
│ │ │ local) │
└──────────────────┘ └────────────────┘
3. AI Flow Catalog
| Flow | Prompt ID | Frontend entry | Output block kind |
|---|---|---|---|
| Generate block from intent | authoring/block_from_intent@1.0.0 | Inline /ai command, canvas button | Any (text, quiz, image, branching) |
| Lesson from PDF | authoring/lesson_from_pdf@1.0.0 | Import panel | Multiple blocks |
| Improve text block | authoring/simplify@1.0.0 | Inline rewrite menu | Text |
| Quiz from lesson | assessment/quiz_from_lesson@1.0.0 | "Add Quiz" on lesson | Quiz (→ assessment-service) |
| Branching scenario | assessment/branching@1.0.0 | "Add Branching" on lesson | Branching (→ assessment-service) |
| Translate draft | i18n/translate@1.0.0 | Locale dropdown | Duplicate all text/image-alt per locale |
| Narrate (TTS) | media/tts@1.0.0 | "Add Narration" on text | Audio (via media-service) |
| Generate diagram | media/diagram_from_text@1.0.0 | "Add Image" → "AI Diagram" | Image (via media-service) |
| Auto-caption | media/captions@1.0.0 | Video block actions | Caption track on video |
| Learning objectives | authoring/objectives@1.0.0 | Outline panel | Metadata on course |
| Summarize lesson | authoring/summarize@1.0.0 | Lesson actions | Text block |
3.1 Co-author → assessment-service (EPIC-AIT-001, US-62 / US-63)
Structured co-author flows use POST /api/v1/ai/structured on ai-gateway with input.kind set to authoring.coauthor.quiz_bank or authoring.coauthor.scenario. When ASSESSMENT_SERVICE_BASE_URL is set on authoring-service, the same request cycle persists the gateway-validated JSON into assessment-service:
- Quiz: POST /api/v1/quiz-banks (empty items), then PATCH /api/v1/quiz-banks/{id}/items with MCQ-shaped items (id, choices[].id, correct) aligned with quiz attempt scoring.
- Scenario: POST /api/v1/scenarios with a graph { nodes, edges } where branching edges use choiceId (mapped from the model’s label when needed).
If the env var is unset (e.g. local proposal-only), responses still include proposal but omit quizBankId / scenarioId.
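The label-to-choiceId mapping for scenario edges could look roughly like the sketch below. The type names, the slug rule, and the positional fallback are assumptions made for illustration; the source only states that choiceId is mapped from the model's label when needed.

```typescript
// Hypothetical shapes: only `choiceId` and `label` are named in this document.
interface ModelEdge {
  from: string;
  to: string;
  choiceId?: string;
  label?: string;
}

interface ScenarioEdge {
  from: string;
  to: string;
  choiceId: string;
}

// Assumed slug rule: lowercase, runs of non-alphanumerics collapsed to "-".
function slugify(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function toScenarioEdges(edges: ModelEdge[]): ScenarioEdge[] {
  return edges.map((edge, index) => {
    const fromLabel = edge.label ? slugify(edge.label) : "";
    // Prefer an explicit choiceId, then the label-derived id, then a positional fallback.
    const choiceId = edge.choiceId ?? (fromLabel !== "" ? fromLabel : `choice-${index}`);
    return { from: edge.from, to: edge.to, choiceId };
  });
}
```

A real mapping would also need to guarantee that choiceId values are unique per source node, which this sketch does not check.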
4. AIClient Port
interface AIClient {
generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle>;
improveBlock(params: AIImproveBlockParams): Promise<AIJobHandle>;
generateQuiz(params: AIQuizParams): Promise<AIJobHandle>;
generateBranching(params: AIBranchingParams): Promise<AIJobHandle>;
translateContent(params: AITranslateParams): Promise<AIJobHandle>;
narrateText(params: AITTSParams): Promise<AIJobHandle>;
generateImage(params: AIImageParams): Promise<AIJobHandle>;
summarize(params: AISummarizeParams): Promise<AIJobHandle>;
cancel(jobId: string): Promise<void>;
}
interface AIJobHandle {
jobId: string; // authoring-service local job id
aiGatewayJobId: string; // correlates with ai-gateway-service
streamUrl: string;
estimatedDurationMs?: number;
}
5. RemoteAIClient (HTTP → ai-gateway)
class RemoteAIClient implements AIClient {
constructor(
private readonly http: HttpClient,
private readonly config: AIGatewayConfig,
private readonly logger: Logger,
) {}
async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
const req = {
promptId: 'authoring/block_from_intent',
promptVersion: '1.0.0',
tenantId: params.tenantId,
callerService: 'authoring',
callerRef: {
draftId: params.draftId,
lessonId: params.lessonId,
},
inputs: {
intent: params.intent,
context: this.buildContext(params),
targetKind: params.targetKind,
locale: params.locale,
},
streaming: true,
moderation: { pre: true, post: true },
budget: { maxTokensOut: 2000, maxCostMicroUSD: 50_000 },
};
const resp = await this.http.post('/api/v1/completions', req);
return {
jobId: resp.data.jobId,
aiGatewayJobId: resp.data.aiGatewayJobId,
streamUrl: resp.data.streamUrl,
estimatedDurationMs: resp.data.estimatedDurationMs,
};
}
// ... other methods
}
6. AI Job Lifecycle
User clicks "Generate Quiz"
│
▼
┌─────────────────────────────────┐
│ POST /drafts/{id}/ai/generate- │
│ quiz │
└──────────────┬──────────────────┘
│
▼
┌──────────────────────┐ ┌─────────────────┐
│ RequestAIQuiz │───────►│ Create ai_job │
│ use case │ │ row (status= │
└──────────────────────┘ │ queued) │
│ └────────┬────────┘
│ │
▼ │
┌──────────────────────┐ │
│ AIClient.generate- │ │
│ Quiz() │ │
└──────────┬───────────┘ │
│ │
▼ │
┌──────────────────────┐ │
│ ai-gateway-service │ │
│ (async job) │ │
└──────────┬───────────┘ │
│ │
stream progress Return 202 Accepted
│ with streamUrl to client
▼
┌─────────────────────────────┐
│ ai.completion.finished.v1 │──┐
└─────────────────────────────┘ │
│
▼
┌───────────────────────┐
│ AIJobManager handler │
└──────────┬────────────┘
│
▼
┌───────────────────────┐
│ Create AIBlock with │
│ status=draft_ai, │
│ aiProvenance filled │
└──────────┬────────────┘
│
▼
┌───────────────────────┐
│ Emit block.ai_ │
│ generated.v1 │
└───────────────────────┘
7. AIProvenance Tracking
Every AI-generated artifact carries full provenance, stored in the blocks.ai_provenance JSONB column.
interface AIProvenance {
model: string;
version?: string;
promptId: string;
promptVersion: SemVer;
traceId: string;
decisionId?: string; // HITL audit link
local: boolean;
generatedAt: ISODate;
reviewedBy?: UserId;
reviewedAt?: ISODate;
cost?: {
microUSD: number;
tokens: { in: number; out: number };
};
// Additional fields set on acceptance
acceptedVerbatim?: boolean; // true if user accepted without edits
editDistance?: number; // semantic distance from AI output to final
}
8. HITL (Human-in-the-Loop) Workflow
┌─────────────────────────────────────────────────────────┐
│ 1. AI generates block with status='draft_ai' │
│ Block is visible in draft but marked as pending │
│ review. Provenance badge visible in editor. │
├─────────────────────────────────────────────────────────┤
│ 2. User options: │
│ a. Accept verbatim → status='reviewed' │
│ b. Edit then accept → status='reviewed' │
│ provenance.acceptedVerbatim│
│ = false │
│ c. Reject → block deleted │
├─────────────────────────────────────────────────────────┤
│ 3. On accept: │
│ - aiProvenance.reviewedBy = userId │
│ - aiProvenance.reviewedAt = now │
│ - Emit block.reviewed.v1 {decision:'accepted'} │
│ - Cross-service HITL record created via │
│ ai-gateway-service audit │
├─────────────────────────────────────────────────────────┤
│ 4. Required blocks cannot be 'draft_ai' (INV-6) │
│ Publish gate rejects drafts with any 'draft_ai' │
│ required block │
└─────────────────────────────────────────────────────────┘
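The accept step and the publish gate above can be expressed as two small pure functions. This is a minimal sketch: the Block shape, the required flag, and the function names are illustrative assumptions, and acceptedVerbatim is derived here by plain string equality.

```typescript
type BlockStatus = "draft_ai" | "reviewed";

interface ReviewProvenance {
  reviewedBy?: string;
  reviewedAt?: string;
  acceptedVerbatim?: boolean;
}

// Hypothetical block shape, reduced to the fields this workflow touches.
interface Block {
  id: string;
  required: boolean;
  status: BlockStatus;
  text: string;
  aiProvenance?: ReviewProvenance;
}

// Steps 2a/2b and 3: accept (optionally after edits) and stamp the reviewer.
function acceptBlock(block: Block, userId: string, finalText: string, now: Date): Block {
  return {
    ...block,
    text: finalText,
    status: "reviewed",
    aiProvenance: {
      ...block.aiProvenance,
      reviewedBy: userId,
      reviewedAt: now.toISOString(),
      acceptedVerbatim: finalText === block.text,
    },
  };
}

// Step 4 (INV-6): ids of required blocks that still await review.
function publishBlockers(blocks: Block[]): string[] {
  return blocks
    .filter((b) => b.required && b.status === "draft_ai")
    .map((b) => b.id);
}
```

A publish handler would reject the draft whenever publishBlockers returns a non-empty list, and could return those ids so the editor can highlight them.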
9. Streaming
AI generations use SSE for progressive rendering.
GET /api/v1/drafts/{id}/ai/stream?jobId={id}
Accept: text/event-stream
Event stream:
event: queued
data: {"jobId":"aij_01H...","position":3}
event: started
data: {"jobId":"aij_01H...","model":"claude-sonnet-4-20250514"}
event: chunk
data: {"jobId":"aij_01H...","text":"Here is a quiz about "}
event: chunk
data: {"jobId":"aij_01H...","text":"list comprehensions:\n\n"}
event: structured
data: {"jobId":"aij_01H...","partial":{"questions":[...]}}
event: moderation
data: {"jobId":"aij_01H...","verdict":"clean"}
event: complete
data: {"jobId":"aij_01H...","blockId":"blk_01H...","provenance":{...}}
Error events:
event: error
data: {"jobId":"...","code":"moderation_blocked","detail":"Content violated safety policy"}
event: error
data: {"jobId":"...","code":"budget_exceeded","detail":"Tenant AI budget exhausted"}
event: error
data: {"jobId":"...","code":"provider_error","detail":"Upstream provider returned 503"}
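For clients that read the stream manually rather than through the browser's EventSource, the frames above could be parsed along these lines. This is a sketch under two assumptions: events are separated by a blank line, as the SSE format requires (the listing above omits those separators), and the input contains only complete frames. The helper names are hypothetical.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parses a buffer of complete SSE frames into typed events.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when no `event:` field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Strip the single optional space after the colon.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

// Concatenates the text of all `chunk` events for progressive rendering.
function collectText(events: SseEvent[]): string {
  return events
    .filter((e) => e.event === "chunk")
    .map((e) => (e.data as { text: string }).text)
    .join("");
}
```

A production client would additionally buffer partial frames across network reads and stop rendering when an error event arrives.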
10. Prompts & Pinning
Prompts are pinned per tenant to guarantee reproducibility. Pins are stored in the ai-gateway-service tenant config:
Tenant 'acme-corp':
authoring/block_from_intent → 1.0.0
assessment/quiz_from_lesson → 1.0.2
i18n/translate → 1.1.0
The authoring-service never hardcodes prompt content. It sends only promptId + promptVersion + inputs. The ai-gateway-service resolves content.
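As an illustration of what a pin lookup on the gateway side might look like, the sketch below models the tenant config as a nested map. The PinTable type, the resolvePin name, and the choice to fail when no pin exists are assumptions; the source does not describe the gateway's actual behavior for a missing pin.

```typescript
// tenantId → promptId → pinned version (hypothetical in-memory form of the tenant config)
type PinTable = Record<string, Record<string, string>>;

const pins: PinTable = {
  "acme-corp": {
    "authoring/block_from_intent": "1.0.0",
    "assessment/quiz_from_lesson": "1.0.2",
    "i18n/translate": "1.1.0",
  },
};

function resolvePin(table: PinTable, tenantId: string, promptId: string): string {
  const version = table[tenantId]?.[promptId];
  if (version === undefined) {
    // Assumption: an unpinned prompt is an error rather than a silent "latest".
    throw new Error(`No pinned version for ${promptId} in tenant ${tenantId}`);
  }
  return version;
}
```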
11. Safety Layers
11.1 Pre-Moderation
Every AI call runs a pre-flight classifier on inputs (at ai-gateway). If flagged, the call is refused with moderation_blocked before any token cost is incurred.
11.2 Prompt Injection Defense
User-supplied context is sanitized and wrapped in dedicated delimiters. System prompts remain isolated.
11.3 Post-Moderation
Every completion is scanned for:
- Toxic content
- PII leakage
- Prompt injection artifacts
- Hallucinated URLs / citations
Post-moderation failures result in ai.completion.finished.v1 with payload.completion.rejected=true. The authoring-service surfaces a non-blocking inline notice; no AIBlock is created.
11.4 Refusal Handling
If the model refuses (out-of-scope, policy violation), the UI shows: "The AI declined this request. [Why?]" with detail from the gateway.
12. Budget & Cost Controls
| Layer | Mechanism |
|---|---|
| Per-request | budget.maxTokensOut, budget.maxCostMicroUSD on every call |
| Per-user | Rate limit: 30 AI generations / min |
| Per-tenant | Monthly AI budget (config); enforced by ai-gateway; authoring-service receives budget_exceeded errors |
| Per-flow | Each flow has a default cost envelope; exceeding it requires opt-in |
Cost is recorded in ai_jobs.cost and aggregated into the metric authoring_ai_cost_microusd_total{tenant,flow,model}.
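The per-user limit of 30 generations per minute could be enforced with a sliding-window counter such as the one sketched here. The class name, the in-memory storage, and the injected clock are assumptions; the source does not say which service enforces this limit or which algorithm it uses.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 30, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if the user is under the limit.
  tryAcquire(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(userId, recent);
    return allowed;
  }
}
```

In a multi-instance deployment the timestamps would have to live in shared storage rather than process memory.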
13. Offline AI (S5)
class AIClientRouter implements AIClient {
constructor(
private readonly remote: RemoteAIClient,
private readonly local: LocalAIClient,
private readonly networkStatus: NetworkStatusPort,
) {}
async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
if (await this.networkStatus.isOnline()) {
return this.remote.generateBlock(params);
}
if (await this.local.available()) {
const result = await this.local.generateBlock(params);
// Mark provenance as local:true
return this.wrapLocal(result);
}
throw new AIError.OfflineUnavailable();
}
}
The local model is a smaller quantized model (e.g. Phi-3 mini, 3.8B params) baked into the authoring desktop app. Feature parity with cloud is not required; offline mode focuses on text improvement and simple generation.
14. Caching
- Exact-prompt cache: Per-tenant, 7-day TTL. Keyed by (promptId, promptVersion, inputsHash, tenantId).
- Semantic cache: Per-tenant, pgvector embedding of inputs. Threshold 0.92 cosine similarity.
- Cache bypass: Flag noCache: true on request.
The cache lives in ai-gateway-service. The authoring-service receives cached: true in the completion event for analytics.
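To make the two cache lookups concrete, here is a sketch of an order-insensitive exact-match key and a cosine-similarity check against the 0.92 threshold. Everything here is illustrative: the key layout and function names are assumptions, FNV-1a is a non-cryptographic stand-in for whatever hash the gateway actually uses for inputsHash, and treating the threshold as inclusive is also an assumption.

```typescript
// Serializes JSON-like values with sorted object keys so that key order does not change the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in, not a collision-resistant hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function exactCacheKey(tenantId: string, promptId: string, promptVersion: string, inputs: unknown): string {
  return `${tenantId}:${promptId}@${promptVersion}:${fnv1a(canonicalize(inputs))}`;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function isSemanticHit(a: number[], b: number[], threshold: number = 0.92): boolean {
  return cosineSimilarity(a, b) >= threshold;
}
```

Including tenantId in the key is what keeps one tenant's cached completions from ever being served to another.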
15. Prompt Regression Testing
Every prompt has:
- Golden test set: ~20 curated inputs with expected output structure
- Structural assertions: e.g. "quiz must have >= 3 questions", "text must be < 500 words"
- Safety eval set: red-team inputs expected to trigger refusals
- Cost budget: automated check that avg cost per generation doesn't regress
These checks run in CI on every prompt change. Any regression blocks the prompt version bump.
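The structural assertions quoted above could be encoded as named predicates and run against each golden-set output. The Quiz shape, the Check type, and the word-counting rule (whitespace-separated tokens) are assumptions for illustration, not the project's actual test harness.

```typescript
// Hypothetical output shape for the quiz flow.
interface Quiz {
  questions: { prompt: string; choices: string[] }[];
}

interface Check<T> {
  name: string;
  passes: (output: T) => boolean;
}

const quizChecks: Check<Quiz>[] = [
  { name: "quiz must have >= 3 questions", passes: (q) => q.questions.length >= 3 },
];

const textChecks: Check<string>[] = [
  {
    name: "text must be < 500 words",
    passes: (t) => t.split(/\s+/).filter((w) => w !== "").length < 500,
  },
];

// Returns the names of all failed checks; an empty list means the output passes.
function failedChecks<T>(output: T, checks: Check<T>[]): string[] {
  return checks.filter((c) => !c.passes(output)).map((c) => c.name);
}
```

A CI job could call failedChecks for every golden input and fail the build when any list is non-empty.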
16. Observability
Metrics
| Metric | Type | Labels |
|---|---|---|
| authoring_ai_requests_total | counter | tenant, flow, status |
| authoring_ai_duration_seconds | histogram | tenant, flow |
| authoring_ai_tokens_total | counter | tenant, flow, direction (in/out) |
| authoring_ai_cost_microusd_total | counter | tenant, flow, model |
| authoring_ai_acceptance_rate | gauge | tenant, flow |
| authoring_ai_moderation_blocks_total | counter | tenant, flow, verdict |
| authoring_ai_local_fallback_total | counter | tenant |
Traces
- Every AI use case emits a root span ai.authoring.{flow}
- Child spans include: gateway call, moderation, streaming, persist
- Trace ID propagates to ai-gateway-service and the LLM provider
17. Audit & Compliance
- Every AI call logged with: (tenantId, userId, draftId, promptId, promptVersion, traceId, decisionId, tokens, cost, moderationVerdict)
- HITL decisions persisted: (blockId, decision, reviewedBy, reviewedAt, editDistance)
- Compliance Officers have read access via internal API
- Audit trail retention: 7 years (regulated class)
18. Failure Modes
| Failure | Behavior |
|---|---|
| ai-gateway down | Circuit breaker opens; AI requests return 503 with retry hint |
| Prompt version mismatch | 422 with upgrade notice to client |
| Moderation blocked | Surface as inline notice; no block created; emit ai.moderation.blocked.v1 |
| Budget exceeded | 429 with clear error; admin notified |
| Provider timeout | Retry once with a different model; if it still fails, surface the error |
| Partial completion (client disconnects) | Job continues server-side; result attached when reconnected |
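The circuit-breaker row in the table can be illustrated with a small state machine. This is a sketch, not the service's implementation: the failure threshold, the cooldown, the half-open probing rule, and all names are assumptions. The value from retryAfterMs is the kind of figure that could feed the retry hint on the 503 response.

```typescript
type BreakerState = "closed" | "open" | "half_open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // True if a request may be sent to ai-gateway right now.
  allow(nowMs: number): boolean {
    if (this.state === "open" && nowMs - this.openedAt >= this.cooldownMs) {
      this.state = "half_open"; // cooldown elapsed: let probe requests through
    }
    return this.state !== "open";
  }

  // Milliseconds until the next probe is allowed; 0 when not open.
  retryAfterMs(nowMs: number): number {
    if (this.state !== "open") return 0;
    return Math.max(0, this.openedAt + this.cooldownMs - nowMs);
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(nowMs: number): void {
    this.failures += 1;
    // A failed probe reopens immediately; otherwise open once the threshold is reached.
    if (this.state === "half_open" || this.failures >= this.threshold) {
      this.state = "open";
      this.openedAt = nowMs;
    }
  }
}
```

A caller would check allow before each gateway request, return 503 with the retry hint when it is false, and report the outcome through recordSuccess or recordFailure.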