
AI Integration

:::info Source
Sourced from services/authoring-service/08-AI_INTEGRATION.md in the documentation repo.
:::

Companion: ai-gateway-service · 10 Authoring Tool · 02 Domain Model


1. Mission

The authoring-service is the primary consumer of AI on the platform. The AI Co-Author is the dominant user-facing differentiator. Every AI interaction here must be:

  1. Gateway-routed — zero direct provider calls; all calls go through ai-gateway-service.
  2. HITL-enforced — AI output never reaches learners without human review.
  3. Provenance-tracked — every AI artifact carries AIProvenance persisted on the block.
  4. Tenant-scoped — prompts, budgets, models, caches are all per-tenant.
  5. Cost-observed — AI cost and token burn are first-class metrics.
  6. Safe — pre- and post-moderation; refusals surfaced transparently.
  7. Offline-capable (S5) — falls back to on-device model when disconnected.

2. Architecture

┌───────────────────── authoring-service ─────────────────────┐
│                                                             │
│  ┌─────────────────┐      ┌────────────────────────────┐    │
│  │ AI Use Cases    │─────►│ AIClient (port)            │    │
│  │ (generate,      │      │                            │    │
│  │  improve,       │      │ ┌──────────────────────┐   │    │
│  │  quiz, etc.)    │      │ │ RemoteAIClient       │   │    │
│  └────────┬────────┘      │ │ (HTTP → ai-gateway)  │   │    │
│           │               │ └──────────────────────┘   │    │
│           │               │ ┌──────────────────────┐   │    │
│           │               │ │ LocalAIClient (S5)   │   │    │
│           │               │ │ (on-device model)    │   │    │
│           │               │ └──────────────────────┘   │    │
│           │               └────────────────────────────┘    │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────┐                                        │
│  │ AIJobManager    │◄── ai.completion.finished.v1           │
│  │ (background     │                                        │
│  │  orchestrator)  │                                        │
│  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘
           │    ▲
           ▼    │
    ┌──────────────────┐        ┌────────────────┐
    │                  │        │                │
    │  ai-gateway-     │───────►│  LLM provider  │
    │  service         │        │  (Claude/GPT/  │
    │                  │        │   local)       │
    └──────────────────┘        └────────────────┘

3. AI Flow Catalog

| Flow | Prompt ID | Frontend entry | Output block kind |
|---|---|---|---|
| Generate block from intent | authoring/block_from_intent@1.0.0 | Inline /ai command, canvas button | Any (text, quiz, image, branching) |
| Lesson from PDF | authoring/lesson_from_pdf@1.0.0 | Import panel | Multiple blocks |
| Improve text block | authoring/simplify@1.0.0 | Inline rewrite menu | Text |
| Quiz from lesson | assessment/quiz_from_lesson@1.0.0 | "Add Quiz" on lesson | Quiz (→ assessment-service) |
| Branching scenario | assessment/branching@1.0.0 | "Add Branching" on lesson | Branching (→ assessment-service) |
| Translate draft | i18n/translate@1.0.0 | Locale dropdown | Duplicate all text/image-alt per locale |
| Narrate (TTS) | media/tts@1.0.0 | "Add Narration" on text | Audio (via media-service) |
| Generate diagram | media/diagram_from_text@1.0.0 | "Add Image" → "AI Diagram" | Image (via media-service) |
| Auto-caption | media/captions@1.0.0 | Video block actions | Caption track on video |
| Learning objectives | authoring/objectives@1.0.0 | Outline panel | Metadata on course |
| Summarize lesson | authoring/summarize@1.0.0 | Lesson actions | Text block |

3.1 Co-author → assessment-service (EPIC-AIT-001, US-62 / US-63)

Structured co-author flows call POST /api/v1/ai/structured on ai-gateway with input.kind set to authoring.coauthor.quiz_bank or authoring.coauthor.scenario. When ASSESSMENT_SERVICE_BASE_URL is set on authoring-service, the same request cycle also persists the gateway-validated JSON to assessment-service:

  • Quiz: POST /api/v1/quiz-banks (empty items) then PATCH /api/v1/quiz-banks/{id}/items with MCQ-shaped items (id, choices[].id, correct) aligned with quiz attempt scoring.
  • Scenario: POST /api/v1/scenarios with a graph { nodes, edges } where branching edges use choiceId (mapped from the model’s label when needed).

If the env var is unset (e.g. in local proposal-only mode), responses still include proposal but omit quizBankId / scenarioId.
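The label-to-choiceId mapping mentioned for scenarios can be sketched as follows. This is a minimal illustration, not the service's actual code: the ProposedEdge and ScenarioEdge shapes, the slugify helper, and the `choice-N` fallback are all assumptions, since the gateway schema and the assessment-service contract are not shown here.

```typescript
// Hypothetical shapes: the real gateway schema and assessment-service
// contract are not part of this document.
interface ProposedEdge {
  from: string;
  to: string;
  choiceId?: string; // present when the model already emitted an id
  label?: string; // free-text label the model may emit instead
}

interface ScenarioEdge {
  from: string;
  to: string;
  choiceId: string;
}

// Derive a stable, URL-safe id from a label ("Ask for help!" becomes "ask-for-help").
function slugify(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Ensure every branching edge carries a choiceId, falling back to the label.
function toScenarioEdges(edges: ProposedEdge[]): ScenarioEdge[] {
  return edges.map((edge, index) => {
    const derived = edge.choiceId ?? (edge.label ? slugify(edge.label) : "");
    return {
      from: edge.from,
      to: edge.to,
      // Assumed last-resort id so the payload stays valid without a label.
      choiceId: derived !== "" ? derived : `choice-${index + 1}`,
    };
  });
}
```

Two edges with identical labels would receive the same id under this scheme, so a real mapper would likely need to de-duplicate ids per source node.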

4. AIClient Port

interface AIClient {
  generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle>;
  improveBlock(params: AIImproveBlockParams): Promise<AIJobHandle>;
  generateQuiz(params: AIQuizParams): Promise<AIJobHandle>;
  generateBranching(params: AIBranchingParams): Promise<AIJobHandle>;
  translateContent(params: AITranslateParams): Promise<AIJobHandle>;
  narrateText(params: AITTSParams): Promise<AIJobHandle>;
  generateImage(params: AIImageParams): Promise<AIJobHandle>;
  summarize(params: AISummarizeParams): Promise<AIJobHandle>;
  cancel(jobId: string): Promise<void>;
}

interface AIJobHandle {
  jobId: string; // authoring-service local job id
  aiGatewayJobId: string; // correlates with ai-gateway-service
  streamUrl: string;
  estimatedDurationMs?: number;
}

5. RemoteAIClient (HTTP → ai-gateway)

class RemoteAIClient implements AIClient {
  constructor(
    private readonly http: HttpClient,
    private readonly config: AIGatewayConfig,
    private readonly logger: Logger,
  ) {}

  async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
    const req = {
      promptId: 'authoring/block_from_intent',
      promptVersion: '1.0.0',
      tenantId: params.tenantId,
      callerService: 'authoring',
      callerRef: {
        draftId: params.draftId,
        lessonId: params.lessonId,
      },
      inputs: {
        intent: params.intent,
        context: this.buildContext(params),
        targetKind: params.targetKind,
        locale: params.locale,
      },
      streaming: true,
      moderation: { pre: true, post: true },
      budget: { maxTokensOut: 2000, maxCostMicroUSD: 50_000 },
    };

    const resp = await this.http.post('/api/v1/completions', req);
    return {
      jobId: resp.data.jobId,
      aiGatewayJobId: resp.data.aiGatewayJobId,
      streamUrl: resp.data.streamUrl,
      estimatedDurationMs: resp.data.estimatedDurationMs,
    };
  }

  // ... other methods
}
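The remaining client methods presumably send the same envelope with a different prompt and inputs. The sketch below shows one way to factor that out. The FLOWS table and buildCompletionRequest are hypothetical names, and the quiz budget simply reuses the block budget as a placeholder because no per-flow values are given.

```typescript
// Request envelope mirroring the fields used by generateBlock.
interface CompletionRequest {
  promptId: string;
  promptVersion: string;
  tenantId: string;
  callerService: "authoring";
  callerRef: Record<string, string>;
  inputs: Record<string, unknown>;
  streaming: boolean;
  moderation: { pre: boolean; post: boolean };
  budget: { maxTokensOut: number; maxCostMicroUSD: number };
}

// Hypothetical per-flow defaults. Only the block_from_intent budget comes
// from the source; the quiz entry reuses it as a placeholder.
const FLOWS = {
  generateBlock: {
    promptId: "authoring/block_from_intent",
    promptVersion: "1.0.0",
    budget: { maxTokensOut: 2000, maxCostMicroUSD: 50_000 },
  },
  generateQuiz: {
    promptId: "assessment/quiz_from_lesson",
    promptVersion: "1.0.0",
    budget: { maxTokensOut: 2000, maxCostMicroUSD: 50_000 },
  },
} as const;

type FlowName = keyof typeof FLOWS;

// Build the gateway request for a flow; moderation is always on in both directions.
function buildCompletionRequest(
  flow: FlowName,
  tenantId: string,
  callerRef: Record<string, string>,
  inputs: Record<string, unknown>,
): CompletionRequest {
  const spec = FLOWS[flow];
  return {
    promptId: spec.promptId,
    promptVersion: spec.promptVersion,
    tenantId,
    callerService: "authoring",
    callerRef,
    inputs,
    streaming: true,
    moderation: { pre: true, post: true },
    budget: { ...spec.budget },
  };
}
```

Keeping the envelope in one builder makes it harder for a new flow to skip moderation or the per-request budget by accident.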

6. AI Job Lifecycle

User clicks "Generate Quiz"
           │
           ▼
┌─────────────────────────────────┐
│ POST /drafts/{id}/ai/generate-  │
│ quiz                            │
└──────────┬──────────────────────┘
           │
           ▼
┌──────────────────────┐        ┌─────────────────┐
│ RequestAIQuiz        │───────►│ Create ai_job   │
│ use case             │        │ row (status=    │
└──────────┬───────────┘        │ queued)         │
           │                    └────────┬────────┘
           │                             │
           ▼                             │
┌──────────────────────┐                 │
│ AIClient.generate-   │                 │
│ Quiz()               │                 │
└──────────┬───────────┘                 │
           │                             │
           ▼                             │
┌──────────────────────┐                 │
│ ai-gateway-service   │                 │
│ (async job)          │                 │
└──────────┬───────────┘                 │
           │                             ▼
    stream progress             Return 202 Accepted
           │                  with streamUrl to client
           ▼
┌─────────────────────────────┐
│ ai.completion.finished.v1   │
└──────────┬──────────────────┘
           │
           ▼
┌───────────────────────┐
│ AIJobManager handler  │
└──────────┬────────────┘
           │
           ▼
┌───────────────────────┐
│ Create AIBlock with   │
│ status=draft_ai,      │
│ aiProvenance filled   │
└──────────┬────────────┘
           │
           ▼
┌───────────────────────┐
│ Emit block.ai_        │
│ generated.v1          │
└───────────────────────┘
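The last three steps of the lifecycle can be sketched as a pure handler. This is an illustration under assumptions: the event and job shapes are hypothetical, only the queued status, payload.completion.rejected, draft_ai, and block.ai_generated.v1 come from this document, and ignoring already-settled jobs is an added choice to make redelivered events harmless.

```typescript
type JobStatus = "queued" | "completed" | "rejected"; // only "queued" is named in the source

interface AIJob {
  jobId: string;
  aiGatewayJobId: string;
  draftId: string;
  status: JobStatus;
}

// Hypothetical event shape; only payload.completion.rejected is documented.
interface CompletionFinishedEvent {
  aiGatewayJobId: string;
  payload: { completion: { rejected: boolean; content?: string } };
  provenance: { model: string; promptId: string; promptVersion: string; traceId: string };
}

interface AIBlock {
  draftId: string;
  status: "draft_ai";
  content: string;
  aiProvenance: CompletionFinishedEvent["provenance"] & { local: boolean; generatedAt: string };
}

type Outcome =
  | { kind: "block_created"; block: AIBlock; emit: "block.ai_generated.v1" }
  | { kind: "rejected" }
  | { kind: "ignored" };

function handleCompletionFinished(
  jobs: Map<string, AIJob>,
  event: CompletionFinishedEvent,
  now: Date = new Date(),
): Outcome {
  const job = Array.from(jobs.values()).find((j) => j.aiGatewayJobId === event.aiGatewayJobId);
  // Unknown or already-settled jobs are ignored (assumed idempotency rule).
  if (!job || job.status !== "queued") return { kind: "ignored" };

  const { rejected, content } = event.payload.completion;
  if (rejected || content === undefined) {
    job.status = "rejected"; // no AIBlock is created for rejected completions
    return { kind: "rejected" };
  }

  job.status = "completed";
  return {
    kind: "block_created",
    emit: "block.ai_generated.v1",
    block: {
      draftId: job.draftId,
      status: "draft_ai",
      content,
      aiProvenance: { ...event.provenance, local: false, generatedAt: now.toISOString() },
    },
  };
}
```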

7. AIProvenance Tracking

Every AI-generated artifact carries full provenance, stored in the blocks.ai_provenance JSONB column.

interface AIProvenance {
  model: string;
  version?: string;
  promptId: string;
  promptVersion: SemVer;
  traceId: string;
  decisionId?: string; // HITL audit link
  local: boolean;
  generatedAt: ISODate;
  reviewedBy?: UserId;
  reviewedAt?: ISODate;
  cost?: {
    microUSD: number;
    tokens: { in: number; out: number };
  };
  // Additional fields set on acceptance
  acceptedVerbatim?: boolean; // true if user accepted without edits
  editDistance?: number; // semantic distance from AI output to final
}

8. HITL (Human-in-the-Loop) Workflow

┌─────────────────────────────────────────────────────────┐
│ 1. AI generates block with status='draft_ai' │
│ Block is visible in draft but marked as pending │
│ review. Provenance badge visible in editor. │
├─────────────────────────────────────────────────────────┤
│ 2. User options: │
│ a. Accept verbatim → status='reviewed' │
│ b. Edit then accept → status='reviewed' │
│ provenance.acceptedVerbatim│
│ = false │
│ c. Reject → block deleted │
├─────────────────────────────────────────────────────────┤
│ 3. On accept: │
│ - aiProvenance.reviewedBy = userId │
│ - aiProvenance.reviewedAt = now │
│ - Emit block.reviewed.v1 {decision:'accepted'} │
│ - Cross-service HITL record created via │
│ ai-gateway-service audit │
├─────────────────────────────────────────────────────────┤
│ 4. Required blocks cannot be 'draft_ai' (INV-6) │
│ Publish gate rejects drafts with any 'draft_ai' │
│ required block │
└─────────────────────────────────────────────────────────┘
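The accept transition and the publish gate (INV-6) from the workflow above can be sketched as two small functions. ReviewableBlock is a hypothetical, trimmed-down block shape, and comparing the final text to the AI text with strict equality is an assumed way of deriving acceptedVerbatim.

```typescript
type BlockStatus = "draft_ai" | "reviewed";

// Hypothetical, reduced block shape for illustration only.
interface ReviewableBlock {
  id: string;
  required: boolean;
  status: BlockStatus;
  text: string;
  aiProvenance: { reviewedBy?: string; reviewedAt?: string; acceptedVerbatim?: boolean };
}

// Steps 2a/2b and 3: accept an AI draft, optionally with edits.
function acceptBlock(
  block: ReviewableBlock,
  userId: string,
  finalText: string,
  now: Date = new Date(),
): ReviewableBlock {
  if (block.status !== "draft_ai") {
    throw new Error(`block ${block.id} is not pending review`);
  }
  return {
    ...block,
    status: "reviewed",
    text: finalText,
    aiProvenance: {
      ...block.aiProvenance,
      reviewedBy: userId,
      reviewedAt: now.toISOString(),
      acceptedVerbatim: finalText === block.text, // false when the user edited
    },
  };
}

// Step 4: ids of required blocks that still block publishing.
function publishBlockers(blocks: ReviewableBlock[]): string[] {
  return blocks.filter((b) => b.required && b.status === "draft_ai").map((b) => b.id);
}
```

A publish handler would reject the request whenever publishBlockers returns a non-empty list. Emitting block.reviewed.v1 and writing the cross-service HITL record are left out of this sketch.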

9. Streaming

AI generations use SSE for progressive rendering.

GET /api/v1/drafts/{id}/ai/stream?jobId={jobId}
Accept: text/event-stream

Event stream:

event: queued
data: {"jobId":"aij_01H...","position":3}

event: started
data: {"jobId":"aij_01H...","model":"claude-sonnet-4-20250514"}

event: chunk
data: {"jobId":"aij_01H...","text":"Here is a quiz about "}

event: chunk
data: {"jobId":"aij_01H...","text":"list comprehensions:\n\n"}

event: structured
data: {"jobId":"aij_01H...","partial":{"questions":[...]}}

event: moderation
data: {"jobId":"aij_01H...","verdict":"clean"}

event: complete
data: {"jobId":"aij_01H...","blockId":"blk_01H...","provenance":{...}}

Error events:

event: error
data: {"jobId":"...","code":"moderation_blocked","detail":"Content violated safety policy"}

event: error
data: {"jobId":"...","code":"budget_exceeded","detail":"Tenant AI budget exhausted"}

event: error
data: {"jobId":"...","code":"provider_error","detail":"Upstream provider returned 503"}
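To make the wire format concrete, here is a small parser for the event stream shown above. It is a sketch only: in a browser the built-in EventSource API already handles framing, and parseSse and collectText are hypothetical helper names. It assumes each data payload is a single JSON document.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parse a complete SSE payload into named events.
// Frames are separated by a blank line; "event:" names the frame and
// "data:" lines carry the JSON body (one optional leading space is dropped).
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let name = "message"; // default event name when none is given
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        name = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
      // Other lines (comments, id:, retry:) are ignored in this sketch.
    }
    if (dataLines.length > 0) {
      events.push({ event: name, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

// Concatenate the text of all chunk events for progressive rendering.
function collectText(events: SseEvent[]): string {
  return events
    .filter((e) => e.event === "chunk")
    .map((e) => (e.data as { text?: string }).text ?? "")
    .join("");
}

// First error event in the stream, if any.
function firstError(events: SseEvent[]): { code: string; detail: string } | undefined {
  const hit = events.find((e) => e.event === "error");
  return hit ? (hit.data as { code: string; detail: string }) : undefined;
}
```

A client would typically render collectText output as chunks arrive and switch to an error state as soon as firstError returns a value.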

10. Prompts & Pinning

Prompts are pinned per tenant to guarantee reproducibility. Pins live in the ai-gateway-service tenant config:

Tenant 'acme-corp':
  authoring/block_from_intent → 1.0.0
  assessment/quiz_from_lesson → 1.0.2
  i18n/translate → 1.1.0

The authoring-service never hardcodes prompt content. It sends only promptId + promptVersion + inputs. The ai-gateway-service resolves content.
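A pin lookup could look like the sketch below. The PinTable shape, the resolvePinnedVersion helper, and the fallback to a caller-supplied default for unpinned prompts are assumptions, since the document does not describe how the gateway stores pins or resolves prompts without one.

```typescript
// tenantId -> promptId -> pinned semantic version (hypothetical storage shape)
type PinTable = Record<string, Record<string, string>>;

const pins: PinTable = {
  "acme-corp": {
    "authoring/block_from_intent": "1.0.0",
    "assessment/quiz_from_lesson": "1.0.2",
    "i18n/translate": "1.1.0",
  },
};

// Return the tenant's pinned version, or the supplied default when unpinned.
function resolvePinnedVersion(
  table: PinTable,
  tenantId: string,
  promptId: string,
  defaultVersion: string,
): string {
  return table[tenantId]?.[promptId] ?? defaultVersion;
}

// Format a reference in the promptId@version style used by the flow catalog.
function promptRef(promptId: string, version: string): string {
  return `${promptId}@${version}`;
}
```

For the example tenant, promptRef("i18n/translate", resolvePinnedVersion(pins, "acme-corp", "i18n/translate", "1.0.0")) yields "i18n/translate@1.1.0".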

11. Safety Layers

11.1 Pre-Moderation

Every AI call runs a pre-flight classifier on inputs (at ai-gateway). If flagged, the call is refused with moderation_blocked before any token cost is incurred.

11.2 Prompt Injection Defense

User-supplied context is sanitized and wrapped in dedicated delimiters. System prompts remain isolated.
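One common way to implement this kind of wrapping is sketched below. The delimiter tag, the set of stripped characters, and the replacement marker are all assumptions; the document does not specify the actual delimiters or where the sanitization runs.

```typescript
// Hypothetical delimiters; the real ones are not specified in this document.
const OPEN_TAG = "<user_context>";
const CLOSE_TAG = "</user_context>";

// Remove control characters and any forged delimiter so user text cannot
// close the wrapper early and pose as system instructions.
function sanitizeUserContext(input: string): string {
  return input
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "") // keep tab, LF, CR
    .replace(/<\/?user_context>/gi, "[removed]");
}

// Wrap sanitized user text in the dedicated delimiters.
function wrapUserContext(input: string): string {
  return `${OPEN_TAG}\n${sanitizeUserContext(input)}\n${CLOSE_TAG}`;
}
```

Delimiting reduces the risk of injected instructions being read as part of the system prompt, but it does not remove it, which is why post-moderation also scans for injection artifacts.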

11.3 Post-Moderation

Every completion is scanned for:

  • Toxic content
  • PII leakage
  • Prompt injection artifacts
  • Hallucinated URLs / citations

Post-moderation failures result in ai.completion.finished.v1 with payload.completion.rejected=true. The authoring-service surfaces a non-blocking inline notice; no AIBlock is created.

11.4 Refusal Handling

If the model refuses (out-of-scope, policy violation), the UI shows: "The AI declined this request. [Why?]" with detail from the gateway.

12. Budget & Cost Controls

| Layer | Mechanism |
|---|---|
| Per-request | budget.maxTokensOut, budget.maxCostMicroUSD on every call |
| Per-user | Rate limit: 30 AI generations / min |
| Per-tenant | Monthly AI budget (config); enforced by ai-gateway; authoring-service receives budget_exceeded errors |
| Per-flow | Each flow has a default cost envelope; exceeding it requires opt-in |

Cost is recorded in ai_jobs.cost and aggregated into the metric authoring_ai_cost_microusd_total{tenant,flow,model}.
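The per-user limit of 30 generations per minute could be enforced with a sliding window, as in this sketch. The algorithm choice and the SlidingWindowLimiter class are assumptions; the document states only the limit itself, and a multi-instance deployment would need shared storage rather than an in-memory map.

```typescript
// In-memory sliding-window limiter (illustrative, single process only).
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 30, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the user is under the limit.
  tryAcquire(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(userId, recent);
    return true;
  }
}
```

Passing the clock in as nowMs keeps the limiter deterministic under test; production code would pass Date.now().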

13. Offline AI (S5)

class AIClientRouter implements AIClient {
  constructor(
    private readonly remote: RemoteAIClient,
    private readonly local: LocalAIClient,
    private readonly networkStatus: NetworkStatusPort,
  ) {}

  async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
    // Prefer the gateway whenever the network is reachable.
    if (await this.networkStatus.isOnline()) {
      return this.remote.generateBlock(params);
    }
    // Offline: fall back to the on-device model if one is installed.
    if (await this.local.available()) {
      const result = await this.local.generateBlock(params);
      // Mark provenance as local:true
      return this.wrapLocal(result);
    }
    throw new AIError.OfflineUnavailable();
  }

  // ... other methods
}

The local model is a smaller quantized model (e.g. Phi-3 mini, 3.8B params) bundled with the authoring desktop app. Feature parity with the cloud is not required; offline mode focuses on text improvement and simple generation.

14. Caching

  • Exact-prompt cache: Per-tenant, 7-day TTL. Keyed by (promptId, promptVersion, inputsHash, tenantId).
  • Semantic cache: Per-tenant, pgvector embedding of inputs. Threshold 0.92 cosine similarity.
  • Cache bypass: Flag noCache: true on request.

The cache lives in ai-gateway-service. Authoring receives cached: true in the completion event for analytics.
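The two cache lookups can be illustrated as follows. This is a sketch under assumptions: canonicalJson, the key layout, and the use of greater-or-equal at the 0.92 threshold are guesses, and the FNV-1a hash over UTF-16 code units is only a stand-in for whatever hash the gateway actually uses (a real implementation would more likely use SHA-256). In production the similarity search would run inside pgvector rather than in application code.

```typescript
// Serialize with sorted object keys so logically equal inputs hash the same.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a over UTF-16 code units. Stand-in hash for illustration only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Exact-prompt cache key: (promptId, promptVersion, inputsHash, tenantId).
function exactCacheKey(
  tenantId: string,
  promptId: string,
  promptVersion: string,
  inputs: unknown,
): string {
  return [tenantId, promptId, promptVersion, fnv1a(canonicalJson(inputs))].join("|");
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vectors never match
  return dot / Math.sqrt(normA * normB);
}

const SEMANTIC_THRESHOLD = 0.92;

// Semantic cache hit when the embeddings are at least 0.92 similar.
function isSemanticHit(query: number[], cached: number[]): boolean {
  return cosineSimilarity(query, cached) >= SEMANTIC_THRESHOLD;
}
```

Including tenantId in the key keeps cache entries tenant-scoped even when two tenants send identical inputs to the same prompt version.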

15. Prompt Regression Testing

Every prompt has:

  • Golden test set: ~20 curated inputs with expected output structure
  • Structural assertions: e.g. "quiz must have >= 3 questions", "text must be < 500 words"
  • Safety eval set: red-team inputs expected to trigger refusals
  • Cost budget: automated check that avg cost per generation doesn't regress

These checks run in CI on every prompt change. Any regression blocks the prompt version bump.
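The two structural assertions quoted in the list above could be expressed as data-driven checks, as in this sketch. The Check type, the QuizOutput shape, and failedChecks are hypothetical; the actual test harness is not described in this document.

```typescript
// Hypothetical output shape for a quiz prompt.
interface QuizOutput {
  questions: { prompt: string; choices: string[] }[];
}

interface Check<T> {
  name: string;
  passes: (output: T) => boolean;
}

// Count whitespace-separated words; an empty string has zero words.
function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const quizChecks: Check<QuizOutput>[] = [
  { name: "quiz must have >= 3 questions", passes: (q) => q.questions.length >= 3 },
];

const textChecks: Check<string>[] = [
  { name: "text must be < 500 words", passes: (t) => wordCount(t) < 500 },
];

// Names of all checks the output fails; an empty list means the output passes.
function failedChecks<T>(output: T, checks: Check<T>[]): string[] {
  return checks.filter((c) => !c.passes(output)).map((c) => c.name);
}
```

A CI job would run each golden input through the prompt, apply the matching check list, and fail the build if failedChecks returns anything.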

16. Observability

Metrics

| Metric | Type | Labels |
|---|---|---|
| authoring_ai_requests_total | counter | tenant, flow, status |
| authoring_ai_duration_seconds | histogram | tenant, flow |
| authoring_ai_tokens_total | counter | tenant, flow, direction (in/out) |
| authoring_ai_cost_microusd_total | counter | tenant, flow, model |
| authoring_ai_acceptance_rate | gauge | tenant, flow |
| authoring_ai_moderation_blocks_total | counter | tenant, flow, verdict |
| authoring_ai_local_fallback_total | counter | tenant |

Traces

  • Every AI use case emits a root span ai.authoring.{flow}
  • Child spans include: gateway call, moderation, streaming, persist
  • Trace ID propagates to ai-gateway-service and the LLM provider

17. Audit & Compliance

  • Every AI call logged with: (tenantId, userId, draftId, promptId, promptVersion, traceId, decisionId, tokens, cost, moderationVerdict)
  • HITL decisions persisted: (blockId, decision, reviewedBy, reviewedAt, editDistance)
  • Compliance Officers have read access via internal API
  • Audit trail retention: 7 years (regulated class)

18. Failure Modes

| Failure | Behavior |
|---|---|
| ai-gateway down | Circuit breaker opens; AI requests return 503 with retry hint |
| Prompt version mismatch | 422 with upgrade notice to client |
| Moderation blocked | Surface as inline notice; no block created; emit ai.moderation.blocked.v1 |
| Budget exceeded | 429 with clear error; admin notified |
| Provider timeout | Retry once with a different model; if it still fails, surface the error |
| Partial completion (client disconnects) | Job continues server-side; result attached when reconnected |
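The HTTP-facing rows of the table can be captured in a small mapping, sketched below. The failure code names other than budget_exceeded, the 30-second retry hint, and the modelForAttempt helper are assumptions made for illustration; only the status codes and the single-retry rule come from the table.

```typescript
// Failures that the table maps to an HTTP status. Code names other than
// "budget_exceeded" are hypothetical.
type HttpMappedFailure = "gateway_unavailable" | "prompt_version_mismatch" | "budget_exceeded";

interface FailureResponse {
  status: number;
  code: HttpMappedFailure;
  retryAfterSeconds?: number;
}

function toFailureResponse(code: HttpMappedFailure): FailureResponse {
  switch (code) {
    case "gateway_unavailable":
      // Circuit breaker open: 503 plus a retry hint (30 s is a placeholder).
      return { status: 503, code, retryAfterSeconds: 30 };
    case "prompt_version_mismatch":
      return { status: 422, code };
    case "budget_exceeded":
      return { status: 429, code };
  }
}

// Provider timeout: attempt 0 is the first try, attempt 1 is the single
// retry on a different model. Any later attempt returns null (give up).
function modelForAttempt(models: string[], attempt: number): string | null {
  if (attempt < 0 || attempt > 1) return null;
  return models[attempt] ?? null;
}
```

With models ordered by preference, a caller loops while modelForAttempt returns a model and surfaces the provider error once it returns null.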