AI Integration

:::info Source
Sourced from services/authoring-service/08-AI_INTEGRATION.md in the documentation repo.
:::

Companion: ai-gateway-service · 10 Authoring Tool · 02 Domain Model

1. Mission

The authoring-service is the primary consumer of AI on the platform. The AI Co-Author is the dominant user-facing differentiator. Every AI interaction here must be:

Gateway-routed — zero direct provider calls; all calls go through ai-gateway-service.
HITL-enforced — AI output never reaches learners without human review.
Provenance-tracked — every AI artifact carries AIProvenance persisted on the block.
Tenant-scoped — prompts, budgets, models, caches are all per-tenant.
Cost-observed — AI cost and token burn are first-class metrics.
Safe — pre- and post-moderation; refusals surfaced transparently.
Offline-capable (S5) — falls back to on-device model when disconnected.

2. Architecture

┌─────────────────── authoring-service ──────────────────────┐
│                                                             │
│  ┌─────────────────┐      ┌────────────────────────────┐   │
│  │  AI Use Cases   │─────►│  AIClient (port)           │   │
│  │  (generate,     │      │                            │   │
│  │   improve,      │      │  ┌──────────────────────┐  │   │
│  │   quiz, etc.)   │      │  │  RemoteAIClient       │  │   │
│  └─────────────────┘      │  │  (HTTP → ai-gateway)  │  │   │
│         │                 │  └──────────────────────┘  │   │
│         │                 │  ┌──────────────────────┐  │   │
│         │                 │  │  LocalAIClient (S5)   │  │   │
│         │                 │  │  (on-device model)    │  │   │
│         │                 │  └──────────────────────┘  │   │
│         │                 └────────────────────────────┘   │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────────┐                                       │
│  │   AIJobManager  │◄── ai.completion.finished.v1          │
│  │  (background    │                                       │
│  │   orchestrator) │                                       │
│  └─────────────────┘                                       │
└─────────────────────────────────────────────────────────────┘
          │                          ▲
          ▼                          │
┌──────────────────┐        ┌────────────────┐
│                  │        │                │
│  ai-gateway-     │───────►│  LLM provider  │
│  service         │        │  (Claude/GPT/  │
│                  │        │   local)       │
└──────────────────┘        └────────────────┘

3. AI Flow Catalog

Flow	Prompt ID	Frontend entry	Output block kind
Generate block from intent	`authoring/block_from_intent@1.0.0`	Inline `/ai` command, canvas button	Any (text, quiz, image, branching)
Lesson from PDF	`authoring/lesson_from_pdf@1.0.0`	Import panel	Multiple blocks
Improve text block	`authoring/simplify@1.0.0`	Inline rewrite menu	Text
Quiz from lesson	`assessment/quiz_from_lesson@1.0.0`	"Add Quiz" on lesson	Quiz (→ assessment-service)
Branching scenario	`assessment/branching@1.0.0`	"Add Branching" on lesson	Branching (→ assessment-service)
Translate draft	`i18n/translate@1.0.0`	Locale dropdown	Duplicate all text/image-alt per locale
Narrate (TTS)	`media/tts@1.0.0`	"Add Narration" on text	Audio (via media-service)
Generate diagram	`media/diagram_from_text@1.0.0`	"Add Image" → "AI Diagram"	Image (via media-service)
Auto-caption	`media/captions@1.0.0`	Video block actions	Caption track on video
Learning objectives	`authoring/objectives@1.0.0`	Outline panel	Metadata on course
Summarize lesson	`authoring/summarize@1.0.0`	Lesson actions	Text block

3.1 Co-author → assessment-service (EPIC-AIT-001, US-62 / US-63)

Structured co-author flows call POST /api/v1/ai/structured on ai-gateway with input.kind set to authoring.coauthor.quiz_bank or authoring.coauthor.scenario. When ASSESSMENT_SERVICE_BASE_URL is set on authoring-service, the same request cycle also persists the gateway-validated JSON into assessment-service:

Quiz: POST /api/v1/quiz-banks (empty items) then PATCH /api/v1/quiz-banks/{id}/items with MCQ-shaped items (id, choices[].id, correct) aligned with quiz attempt scoring.
Scenario: POST /api/v1/scenarios with a graph { nodes, edges } where branching edges use choiceId (mapped from the model’s label when needed).

If the env var is unset (e.g. in a local, proposal-only setup), responses still include proposal but omit quizBankId / scenarioId.
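The quiz persistence sequence described above can be sketched roughly as follows. This is a minimal sketch, not the service's actual code: `HttpPort`, `QuizItem`, `persistQuizProposal`, the `text` field on choices, the request bodies, and the `id` field on the POST response are all assumptions made for illustration. Only the two endpoint paths and the `proposal` / `quizBankId` response fields come from the text.

```typescript
// Hypothetical minimal HTTP port; the real service may use a different client.
interface HttpPort {
  post(path: string, body: unknown): Promise<{ id: string }>;
  patch(path: string, body: unknown): Promise<void>;
}

// MCQ-shaped item as described above: id, choices[].id, correct.
interface QuizItem {
  id: string;
  choices: { id: string; text: string }[]; // `text` is an assumed field
  correct: string; // assumed to hold the id of the correct choice
}

interface CoauthorQuizResponse {
  proposal: QuizItem[];
  quizBankId?: string;
}

async function persistQuizProposal(
  proposal: QuizItem[],
  // `undefined` models ASSESSMENT_SERVICE_BASE_URL being unset.
  assessment: HttpPort | undefined,
): Promise<CoauthorQuizResponse> {
  if (!assessment) {
    // Proposal-only mode: return the proposal without a quizBankId.
    return { proposal };
  }
  // Step 1: create an empty quiz bank.
  const bank = await assessment.post('/api/v1/quiz-banks', { items: [] });
  // Step 2: attach the gateway-validated items to it.
  await assessment.patch(`/api/v1/quiz-banks/${bank.id}/items`, { items: proposal });
  return { proposal, quizBankId: bank.id };
}
```

Passing the assessment client as an optional dependency keeps the proposal-only path and the persisting path in one function, which mirrors the "same request cycle" wording above.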

4. AIClient Port

interface AIClient {
  generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle>;
  improveBlock(params: AIImproveBlockParams): Promise<AIJobHandle>;
  generateQuiz(params: AIQuizParams): Promise<AIJobHandle>;
  generateBranching(params: AIBranchingParams): Promise<AIJobHandle>;
  translateContent(params: AITranslateParams): Promise<AIJobHandle>;
  narrateText(params: AITTSParams): Promise<AIJobHandle>;
  generateImage(params: AIImageParams): Promise<AIJobHandle>;
  summarize(params: AISummarizeParams): Promise<AIJobHandle>;
  cancel(jobId: string): Promise<void>;
}

interface AIJobHandle {
  jobId: string;                   // authoring-service local job id
  aiGatewayJobId: string;          // correlates with ai-gateway-service
  streamUrl: string;
  estimatedDurationMs?: number;
}

5. RemoteAIClient (HTTP → ai-gateway)

class RemoteAIClient implements AIClient {
  constructor(
    private readonly http: HttpClient,
    private readonly config: AIGatewayConfig,
    private readonly logger: Logger,
  ) {}

  async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
    const req = {
      promptId: 'authoring/block_from_intent',
      promptVersion: '1.0.0',
      tenantId: params.tenantId,
      callerService: 'authoring',
      callerRef: {
        draftId: params.draftId,
        lessonId: params.lessonId,
      },
      inputs: {
        intent: params.intent,
        context: this.buildContext(params),
        targetKind: params.targetKind,
        locale: params.locale,
      },
      streaming: true,
      moderation: { pre: true, post: true },
      budget: { maxTokensOut: 2000, maxCostMicroUSD: 50_000 },
    };

    const resp = await this.http.post('/api/v1/completions', req);
    return {
      jobId: resp.data.jobId,
      aiGatewayJobId: resp.data.aiGatewayJobId,
      streamUrl: resp.data.streamUrl,
      estimatedDurationMs: resp.data.estimatedDurationMs,
    };
  }

  // ... other methods
}

6. AI Job Lifecycle

 User clicks "Generate Quiz"
            │
            ▼
┌─────────────────────────────────┐
│  POST /drafts/{id}/ai/generate- │
│       quiz                       │
└──────────────┬──────────────────┘
               │
               ▼
    ┌──────────────────────┐        ┌─────────────────┐
    │  RequestAIQuiz       │───────►│ Create ai_job   │
    │  use case            │        │ row (status=    │
    └──────────────────────┘        │ queued)         │
               │                    └────────┬────────┘
               │                             │
               ▼                             │
    ┌──────────────────────┐                 │
    │  AIClient.generate-  │                 │
    │  Quiz()              │                 │
    └──────────┬───────────┘                 │
               │                             │
               ▼                             │
    ┌──────────────────────┐                 │
    │  ai-gateway-service   │                 │
    │  (async job)          │                 │
    └──────────┬───────────┘                 │
               │                             │
         stream progress              Return 202 Accepted
               │                      with streamUrl to client
               ▼
 ┌─────────────────────────────┐
 │  ai.completion.finished.v1  │──┐
 └─────────────────────────────┘  │
                                  │
                                  ▼
                  ┌───────────────────────┐
                  │  AIJobManager handler │
                  └──────────┬────────────┘
                             │
                             ▼
                  ┌───────────────────────┐
                  │  Create AIBlock with   │
                  │  status=draft_ai,      │
                  │  aiProvenance filled  │
                  └──────────┬────────────┘
                             │
                             ▼
                  ┌───────────────────────┐
                  │  Emit block.ai_       │
                  │  generated.v1         │
                  └───────────────────────┘
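The last three steps of the lifecycle (handler, block creation, event emission) could look roughly like the sketch below. The event payload shape is an assumption: this document names only `ai.completion.finished.v1`, `payload.completion.rejected`, the `draft_ai` status, and `block.ai_generated.v1`. The other field names, `DraftAIBlock`, `HandlerOutcome`, and `handleCompletionFinished` are hypothetical.

```typescript
// Assumed event shape; only `payload.completion.rejected` is named in this document.
interface CompletionFinishedEvent {
  payload: {
    aiGatewayJobId: string;
    completion: { rejected: boolean; content?: unknown };
    provenance: { model: string; promptId: string; promptVersion: string; traceId: string };
  };
}

interface DraftAIBlock {
  blockId: string;
  status: 'draft_ai';
  content: unknown;
  aiProvenance: {
    model: string;
    promptId: string;
    promptVersion: string;
    traceId: string;
    local: boolean;
    generatedAt: string; // ISO-8601 timestamp
  };
}

type HandlerOutcome =
  | { kind: 'block_created'; block: DraftAIBlock; emit: 'block.ai_generated.v1' }
  | { kind: 'rejected' };

// Pure handler: id generation and the clock are injected so it stays testable.
function handleCompletionFinished(
  event: CompletionFinishedEvent,
  newBlockId: () => string,
  now: () => Date,
): HandlerOutcome {
  const { completion, provenance } = event.payload;
  if (completion.rejected) {
    // Post-moderation failure: no AIBlock is created.
    return { kind: 'rejected' };
  }
  const block: DraftAIBlock = {
    blockId: newBlockId(),
    status: 'draft_ai',
    content: completion.content,
    aiProvenance: { ...provenance, local: false, generatedAt: now().toISOString() },
  };
  return { kind: 'block_created', block, emit: 'block.ai_generated.v1' };
}
```

Returning the event name instead of publishing it directly keeps the sketch free of any message-bus assumptions; a real handler would persist the block and publish in one transaction or via an outbox.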

7. AIProvenance Tracking

Every AI-generated artifact carries full provenance, stored in the blocks.ai_provenance JSONB column.

interface AIProvenance {
  model: string;
  version?: string;
  promptId: string;
  promptVersion: SemVer;
  traceId: string;
  decisionId?: string;            // HITL audit link
  local: boolean;
  generatedAt: ISODate;
  reviewedBy?: UserId;
  reviewedAt?: ISODate;
  cost?: {
    microUSD: number;
    tokens: { in: number; out: number };
  };
  // Additional fields set on acceptance
  acceptedVerbatim?: boolean;     // true if user accepted without edits
  editDistance?: number;          // semantic distance from AI output to final
}

8. HITL (Human-in-the-Loop) Workflow

┌─────────────────────────────────────────────────────────┐
│ 1. AI generates block with status='draft_ai'           │
│    Block is visible in draft but marked as pending     │
│    review. Provenance badge visible in editor.         │
├─────────────────────────────────────────────────────────┤
│ 2. User options:                                       │
│    a. Accept verbatim     → status='reviewed'          │
│    b. Edit then accept    → status='reviewed'          │
│                             provenance.acceptedVerbatim│
│                             = false                    │
│    c. Reject              → block deleted              │
├─────────────────────────────────────────────────────────┤
│ 3. On accept:                                          │
│    - aiProvenance.reviewedBy = userId                  │
│    - aiProvenance.reviewedAt = now                     │
│    - Emit block.reviewed.v1 {decision:'accepted'}      │
│    - Cross-service HITL record created via            │
│      ai-gateway-service audit                          │
├─────────────────────────────────────────────────────────┤
│ 4. Required blocks cannot be 'draft_ai' (INV-6)        │
│    Publish gate rejects drafts with any 'draft_ai'     │
│    required block                                      │
└─────────────────────────────────────────────────────────┘
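Steps 2 to 4 of the workflow can be illustrated with two small pure functions: one that applies an accept decision, and one that models the publish gate for INV-6. This is a sketch under assumptions: `ReviewableBlock`, `acceptBlock`, `publishGate`, and the `required` flag are hypothetical names, and the cross-service audit record is left out.

```typescript
type BlockStatus = 'draft_ai' | 'reviewed';

interface ReviewableBlock {
  blockId: string;
  required: boolean; // assumed flag marking a block as required
  status: BlockStatus;
  aiProvenance: { reviewedBy?: string; reviewedAt?: string; acceptedVerbatim?: boolean };
}

// Apply an accept decision (verbatim or after edits) without mutating the input.
function acceptBlock(
  block: ReviewableBlock,
  userId: string,
  edited: boolean,
  now: Date,
): ReviewableBlock {
  return {
    ...block,
    status: 'reviewed',
    aiProvenance: {
      ...block.aiProvenance,
      reviewedBy: userId,
      reviewedAt: now.toISOString(),
      acceptedVerbatim: !edited,
    },
  };
}

// Publish gate: a draft with any required block still in 'draft_ai' is rejected.
function publishGate(blocks: ReviewableBlock[]): { ok: boolean; blocking: string[] } {
  const blocking = blocks
    .filter((b) => b.required && b.status === 'draft_ai')
    .map((b) => b.blockId);
  return { ok: blocking.length === 0, blocking };
}
```

For example, a draft containing one required `draft_ai` block fails the gate until that block is accepted; an optional `draft_ai` block does not block publishing in this sketch.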

9. Streaming

AI generations use SSE for progressive rendering.

GET /api/v1/drafts/{id}/ai/stream?jobId={id}
Accept: text/event-stream

Event stream:

event: queued
data: {"jobId":"aij_01H...","position":3}

event: started
data: {"jobId":"aij_01H...","model":"claude-sonnet-4-20250514"}

event: chunk
data: {"jobId":"aij_01H...","text":"Here is a quiz about "}

event: chunk
data: {"jobId":"aij_01H...","text":"list comprehensions:\n\n"}

event: structured
data: {"jobId":"aij_01H...","partial":{"questions":[...]}}

event: moderation
data: {"jobId":"aij_01H...","verdict":"clean"}

event: complete
data: {"jobId":"aij_01H...","blockId":"blk_01H...","provenance":{...}}

Error events:

event: error
data: {"jobId":"...","code":"moderation_blocked","detail":"Content violated safety policy"}

event: error
data: {"jobId":"...","code":"budget_exceeded","detail":"Tenant AI budget exhausted"}

event: error
data: {"jobId":"...","code":"provider_error","detail":"Upstream provider returned 503"}
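To make the frame format above concrete, here is a minimal sketch that parses a fully buffered stream into events and concatenates the `chunk` text. `parseSSE` and `collectText` are hypothetical helpers written for illustration. In a browser the standard `EventSource` API would normally do the parsing, and a production parser must also handle frames split across network reads, which this sketch does not.

```typescript
interface AIStreamEvent {
  event: string;
  data: Record<string, unknown>;
}

// Parse a complete SSE payload: frames are separated by a blank line,
// each frame has an `event:` line and one or more `data:` lines.
function parseSSE(raw: string): AIStreamEvent[] {
  const events: AIStreamEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let name = 'message'; // SSE default event name when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        name = line.slice('event:'.length).trim();
      } else if (line.startsWith('data:')) {
        // Simplification: trims all surrounding whitespace, which is safe for JSON payloads.
        dataLines.push(line.slice('data:'.length).trim());
      }
    }
    if (dataLines.length > 0) {
      events.push({ event: name, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return events;
}

// Concatenate the text of all `chunk` events for progressive rendering.
function collectText(events: AIStreamEvent[]): string {
  return events
    .filter((e) => e.event === 'chunk' && typeof e.data.text === 'string')
    .map((e) => e.data.text as string)
    .join('');
}
```

A client would typically render `collectText` output as chunks arrive and stop on the first `complete` or `error` event.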

10. Prompts & Pinning

Prompts are pinned per tenant to guarantee reproducibility. Pins live in the ai-gateway-service tenant config:

Tenant 'acme-corp':
  authoring/block_from_intent → 1.0.0
  assessment/quiz_from_lesson → 1.0.2
  i18n/translate              → 1.1.0

The authoring-service never hardcodes prompt content. It sends only promptId + promptVersion + inputs. The ai-gateway-service resolves content.
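As an illustration of how a per-tenant pin lookup might behave, the sketch below resolves a prompt reference from a pin table. The table layout, `resolvePrompt`, and the fallback to a caller-supplied default version are assumptions; the real resolution logic lives in ai-gateway-service and is not described here.

```typescript
// tenantId -> promptId -> pinned version (layout is an assumption)
const pinsByTenant: Record<string, Record<string, string>> = {
  'acme-corp': {
    'authoring/block_from_intent': '1.0.0',
    'assessment/quiz_from_lesson': '1.0.2',
    'i18n/translate': '1.1.0',
  },
};

interface PromptRef {
  promptId: string;
  promptVersion: string;
}

// Return the pinned version for the tenant, or the caller's default when no pin exists.
function resolvePrompt(tenantId: string, promptId: string, defaultVersion: string): PromptRef {
  const pinned = pinsByTenant[tenantId]?.[promptId];
  return { promptId, promptVersion: pinned ?? defaultVersion };
}
```

Note that the result contains only identifiers, never prompt text, which matches the rule that authoring-service sends promptId + promptVersion + inputs and nothing else.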

11. Safety Layers

11.1 Pre-Moderation

Every AI call runs a pre-flight classifier on inputs (at ai-gateway). If flagged, the call is refused with moderation_blocked before any token cost is incurred.

11.2 Prompt Injection Defense

User-supplied context is sanitized and wrapped in dedicated delimiters. System prompts remain isolated.
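One way the delimiter wrapping could work is sketched below. The delimiter strings and function names are hypothetical, and the actual sanitization rules are not specified in this document; the sketch only shows the general idea that user text must not be able to contain, or reassemble, the delimiters that fence it off.

```typescript
// Hypothetical delimiters; the real ones are not specified in this document.
const OPEN = '<<<USER_CONTEXT>>>';
const CLOSE = '<<<END_USER_CONTEXT>>>';

// Remove delimiter tokens from user text. Loops until stable because removing
// one token can splice the surrounding characters into a new token.
function stripDelimiters(text: string): string {
  let current = text;
  let previous: string;
  do {
    previous = current;
    current = current.split(OPEN).join('').split(CLOSE).join('');
  } while (current !== previous);
  return current;
}

// Wrap sanitized user context so it is clearly separated from system instructions.
function wrapUserContext(text: string): string {
  return `${OPEN}\n${stripDelimiters(text)}\n${CLOSE}`;
}
```

Delimiter stripping alone is not a complete injection defense; it complements the isolated system prompt and the moderation layers described in this section.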

11.3 Post-Moderation

Every completion is scanned for:

Toxic content
PII leakage
Prompt injection artifacts
Hallucinated URLs / citations

Post-moderation failures result in ai.completion.finished.v1 with payload.completion.rejected=true. The authoring-service surfaces a non-blocking inline notice; no AIBlock is created.

11.4 Refusal Handling

If the model refuses (out-of-scope, policy violation), the UI shows: "The AI declined this request. [Why?]" with detail from the gateway.

12. Budget & Cost Controls

Layer	Mechanism
Per-request	`budget.maxTokensOut`, `budget.maxCostMicroUSD` on every call
Per-user	Rate limit: 30 AI generations / min
Per-tenant	Monthly AI budget (config); enforced by ai-gateway; authoring-service receives `budget_exceeded` errors
Per-flow	Each flow has a default cost envelope; exceeding it requires opt-in

Cost is observed via ai_jobs.cost and aggregated into metrics (authoring_ai_cost_microusd_total{tenant,flow,model}).
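The per-user limit of 30 generations per minute could be enforced in several ways; the document does not say which algorithm or which component enforces it. The sketch below uses an in-memory sliding-window log purely as an illustration. `SlidingWindowLimiter` is a hypothetical class, and a multi-instance deployment would need a shared store instead of a local `Map`.

```typescript
class SlidingWindowLimiter {
  // userId -> timestamps (ms) of accepted requests inside the current window
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly limit: number = 30,
    private readonly windowMs: number = 60_000,
  ) {}

  // Returns true and records the request if the user is under the limit.
  tryAcquire(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(userId, recent);
    return true;
  }
}
```

Passing the clock value in as `nowMs` keeps the limiter deterministic under test; callers in production would pass `Date.now()`.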

13. Offline AI (S5)

class AIClientRouter implements AIClient {
  constructor(
    private readonly remote: RemoteAIClient,
    private readonly local: LocalAIClient,
    private readonly networkStatus: NetworkStatusPort,
  ) {}

  async generateBlock(params: AIGenerateBlockParams): Promise<AIJobHandle> {
    if (await this.networkStatus.isOnline()) {
      return this.remote.generateBlock(params);
    }
    if (await this.local.available()) {
      const result = await this.local.generateBlock(params);
      // Mark provenance as local:true
      return this.wrapLocal(result);
    }
    throw new AIError.OfflineUnavailable();
  }
}

The local model is a smaller quantized model (e.g. Phi-3 mini, 3.8B params) baked into the authoring desktop app. Feature parity with the cloud is not required; offline mode focuses on text improvement and simple generation.

14. Caching

Exact-prompt cache: Per-tenant, 7-day TTL. Keyed by (promptId, promptVersion, inputsHash, tenantId).
Semantic cache: Per-tenant, pgvector embedding of inputs. Threshold 0.92 cosine similarity.
Cache bypass: Flag noCache: true on request.

The cache lives in ai-gateway-service. Authoring receives cached: true in the completion event for analytics.
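The semantic-cache decision comes down to a cosine-similarity comparison against the 0.92 threshold. The sketch below shows that comparison in plain TypeScript for illustration only: in the gateway the search would run inside Postgres via pgvector, and `SemanticCacheEntry` and `semanticLookup` are hypothetical names.

```typescript
interface SemanticCacheEntry {
  tenantId: string;
  embedding: number[];
  completion: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must have the same non-zero length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar entry for this tenant, if it meets the threshold.
function semanticLookup(
  entries: SemanticCacheEntry[],
  tenantId: string,
  query: number[],
  threshold: number = 0.92,
): SemanticCacheEntry | undefined {
  let best: SemanticCacheEntry | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    if (entry.tenantId !== tenantId) continue; // cache is per-tenant
    const score = cosineSimilarity(entry.embedding, query);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

The tenant filter runs before any similarity check, so one tenant's cached completions can never be returned to another.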

15. Prompt Regression Testing

Every prompt has:

Golden test set: ~20 curated inputs with expected output structure
Structural assertions: e.g. "quiz must have >= 3 questions", "text must be < 500 words"
Safety eval set: red-team inputs expected to trigger refusals
Cost budget: automated check that avg cost per generation doesn't regress

These checks run in CI on every prompt change. Any regression blocks the prompt version bump.
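The two structural assertions quoted above could be expressed as small checker functions that return a list of violations. This is a sketch under assumptions: `QuizOutput`, `checkQuiz`, `checkText`, and the whitespace-based word count are hypothetical and not taken from the actual test harness.

```typescript
// Assumed output shape for a quiz generation.
interface QuizOutput {
  questions: { prompt: string; choices: string[] }[];
}

// Whitespace-delimited word count; a simplification for illustration.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// "quiz must have >= 3 questions"
function checkQuiz(quiz: QuizOutput, minQuestions: number = 3): string[] {
  const n = quiz.questions.length;
  return n >= minQuestions ? [] : [`quiz must have >= ${minQuestions} questions, got ${n}`];
}

// "text must be < 500 words"
function checkText(text: string, maxWords: number = 500): string[] {
  const n = countWords(text);
  return n < maxWords ? [] : [`text must be < ${maxWords} words, got ${n}`];
}
```

Returning violations rather than throwing lets a CI run report every failing golden case in one pass before blocking the version bump.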

16. Observability

Metrics

Metric	Type	Labels
`authoring_ai_requests_total`	counter	tenant, flow, status
`authoring_ai_duration_seconds`	histogram	tenant, flow
`authoring_ai_tokens_total`	counter	tenant, flow, direction (in/out)
`authoring_ai_cost_microusd_total`	counter	tenant, flow, model
`authoring_ai_acceptance_rate`	gauge	tenant, flow
`authoring_ai_moderation_blocks_total`	counter	tenant, flow, verdict
`authoring_ai_local_fallback_total`	counter	tenant

Traces

Every AI use case emits a root span ai.authoring.{flow}
Child spans include: gateway call, moderation, streaming, persist
Trace ID propagates to ai-gateway-service and the LLM provider

17. Audit & Compliance

Every AI call logged with: (tenantId, userId, draftId, promptId, promptVersion, traceId, decisionId, tokens, cost, moderationVerdict)
HITL decisions persisted: (blockId, decision, reviewedBy, reviewedAt, editDistance)
Compliance Officers have read access via internal API
Audit trail retention: 7 years (regulated class)

18. Failure Modes

Failure	Behavior
ai-gateway down	Circuit breaker opens; AI requests return 503 with retry hint
Prompt version mismatch	422 with upgrade notice to client
Moderation blocked	Surface as inline notice; no block created; emit `ai.moderation.blocked.v1`
Budget exceeded	429 with clear error; admin notified
Provider timeout	Retry once with a different model; if that also fails, surface the error
Partial completion (client disconnects)	Job continues server-side; result attached when reconnected
