Application Logic

:::info Source
Sourced from services/ai-gateway-service/APPLICATION_LOGIC.md in the documentation repo.
:::

1. Application Services

  • CompletionService — chat completions via AIClient.
  • EmbeddingService — vector embeddings.
  • ModerationService — standalone moderation (for non-completion uses).
  • ImageGenService — image synthesis.
  • AudioGenService — TTS.
  • STTService — speech-to-text.
  • PromptRegistryService — CRUD of prompts, versioning, eval integration.
  • BudgetService — per-tenant budget tracking + alerting.
  • ModelRouterService — selects model per request.
  • SafetyPipelineService — pre + post-call pipeline.
  • CacheService — Redis-backed cache by (promptHash, modelId, tenantId).
  • AuditService — append audit entries + provenance.

2. Commands

| Command | Trigger |
| --- | --- |
| Complete | Caller via AIClient (most common) |
| Embed | Caller |
| Moderate | Standalone safety check |
| GenerateImage / GenerateAudio / Transcribe | Domain callers |
| PublishPromptVersion | Admin (after eval pass) |
| DeprecatePromptVersion | Admin |
| UpdateBudget | Platform admin or billing |
| RecordAuditEntry | Internal |

3. Queries

  • getPrompt(id, version?) — resolve active version.
  • listModels(family?, locality?, residency?).
  • getBudget(tenantId, period).
  • auditQuery(tenantId, filters) — compliance officer.
  • similaritySearch(tenantId, vector, filters, k) — for RAG.

4. Sagas / Policies

  • GDPR Erasure Saga: delete the user's embeddings and audit entries; audit entries are retained instead when a legal hold applies under tenant policy.
  • Prompt Rollout Saga (M4+): new prompt version → eval pass → canary % → full.

5. Policies

  • Budget hard-cap: refuse request with ai.refused.budget.
  • Safety block: refuse with ai.refused.safety; log AuditEntry.
  • Model selection: prefer local if eligible + meets quality; else cloud.
  • Cache: same (promptHash + modelId + tenantId) returns cached result (respect maxAge per prompt).
  • Per-prompt rate limit (e.g., tutor: 60/min/user).
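
The cache policy above can be sketched as follows. This is a minimal illustration only: it assumes an in-memory object in place of Redis, and `CacheKeyParts`, `PromptCache`, and the injectable clock are hypothetical names, not part of the service.

```typescript
// Hypothetical sketch: an in-memory stand-in for the Redis-backed cache.
interface CacheKeyParts {
  promptHash: string;
  modelId: string;
  tenantId: string;
}

interface CachedCompletion {
  value: string;
  storedAtMs: number;
}

// Combine the three parts named in the policy into one lookup key.
// JSON encoding avoids collisions when a part contains a separator.
function cacheKey(parts: CacheKeyParts): string {
  return JSON.stringify([parts.tenantId, parts.modelId, parts.promptHash]);
}

class PromptCache {
  private entries: Record<string, CachedCompletion> = {};
  private now: () => number;

  // The clock is injectable so expiry can be checked deterministically.
  constructor(now?: () => number) {
    this.now = now !== undefined ? now : () => Date.now();
  }

  set(parts: CacheKeyParts, value: string): void {
    this.entries[cacheKey(parts)] = { value, storedAtMs: this.now() };
  }

  // Return the cached value only while it is within the prompt's maxAge.
  get(parts: CacheKeyParts, maxAgeMs: number): string | undefined {
    const key = cacheKey(parts);
    const entry = this.entries[key];
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAtMs > maxAgeMs) {
      delete this.entries[key]; // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Because tenantId is part of the key, two tenants that send the same prompt to the same model never share an entry.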

6. Use Case Flows

6.1 Completion

Caller: client.complete({ promptId, input, context })


1. Resolve prompt version (tenant pin or active).
2. Validate input against inputSchema.
3. Check budget (soft alert at 80%, hard-cap at 100%).
4. Input safety: moderation + PII redaction + injection shield.
   - Block if configured; emit refused event.
5. Cache lookup (promptHash + modelId + tenantId + input fingerprint).
   - Hit → return cached.
6. Model routing: iterate preferences; skip if locality/residency mismatch or BAA required but missing.
7. Call provider with baggage-stripped context.
8. Output safety: moderation + schema validation.
   - Block if violates policy.
9. Store completion record + cost debit.
10. Cache output (if cacheable).
11. Emit ai.gateway.call.completed.v1.
12. Return (output, provenance).
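
Step 6 (model routing) is the least obvious part of this flow, so here is a hedged sketch of it. The `ModelCandidate` and `RoutingConstraints` shapes, and the `routeModel` name, are assumptions made for illustration; the real ModelRouterService may represent locality, residency, and BAA status differently.

```typescript
// Hypothetical shapes for illustration; the real model catalogue may differ.
type Locality = "local" | "cloud";

interface ModelCandidate {
  id: string;
  locality: Locality;
  residency: string; // region label such as "eu" or "us" (assumed format)
  hasBaa: boolean;   // whether a BAA is in place for this model's provider
}

interface RoutingConstraints {
  allowedLocalities: Locality[];
  requiredResidency?: string;
  baaRequired: boolean;
}

// Walk the preference list in order and return the first model that
// passes every constraint, or undefined when none is eligible.
function routeModel(
  preferences: ModelCandidate[],
  constraints: RoutingConstraints
): ModelCandidate | undefined {
  for (const model of preferences) {
    // Skip on locality mismatch.
    if (constraints.allowedLocalities.indexOf(model.locality) === -1) continue;
    // Skip on residency mismatch.
    if (
      constraints.requiredResidency !== undefined &&
      model.residency !== constraints.requiredResidency
    ) {
      continue;
    }
    // Skip when a BAA is required but missing.
    if (constraints.baaRequired && !model.hasBaa) continue;
    return model;
  }
  return undefined;
}
```

Ordering the preference list with local models first gives the "prefer local if eligible, else cloud" behaviour without extra branching. How the caller handles an undefined result (refuse or fall back) is left open here.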

6.2 Embedding

Caller: client.embed({ sourceRef, content, modelPreference })


1. Safety: basic classification.
2. Budget check.
3. Model select (per-tenant default embedding model).
4. Generate vector.
5. Store in embeddings table (tenant partition).
6. Emit ai.embedding.created.v1.
7. Return (vector, id, provenance).

6.3 Similarity Search

Caller: client.knn({ vector, sourceKind, filters, k })


1. Tenant filter mandatory.
2. pgvector HNSW query within tenant partition.
3. Return matches with (score, sourceRef).
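
The shape of this query can be illustrated with a small in-memory sketch. It is not the pgvector HNSW query: it is an exact brute-force cosine search, swapped in so the example is self-contained. The `StoredEmbedding` and `KnnMatch` types and the `knn` function signature are hypothetical.

```typescript
// Hypothetical row shape; the real embeddings table may hold more columns.
interface StoredEmbedding {
  tenantId: string;
  sourceRef: string;
  sourceKind: string;
  vector: number[];
}

interface KnnMatch {
  score: number;
  sourceRef: string;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search. The tenant filter is applied before any scoring,
// so rows from other tenants can never appear in the result.
function knn(
  rows: StoredEmbedding[],
  tenantId: string,
  query: number[],
  sourceKind: string | undefined,
  k: number
): KnnMatch[] {
  if (tenantId === "") throw new Error("tenantId is mandatory");
  return rows
    .filter((r) => r.tenantId === tenantId)
    .filter((r) => sourceKind === undefined || r.sourceKind === sourceKind)
    .map((r) => ({ score: cosine(query, r.vector), sourceRef: r.sourceRef }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

An HNSW index returns approximate neighbours, so results from the real service may differ slightly from this exact search on large tables.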

6.4 Prompt Publishing

Admin: publishPromptVersion(promptId, newVersion)


1. Validate template syntax + schemas.
2. Run eval set (ai-gateway owns eval harness).
3. If pass: mark 'active'; deprecate prior active.
4. Emit ai.prompt.version_published.v1.
5. Optional: canary rollout (M4+).
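
Steps 2 to 4 amount to a small state transition, sketched below. This assumes the eval result arrives as a boolean and that version statuses are "draft", "active", and "deprecated"; those status names, the `PromptVersion` shape, and the return type are illustrative assumptions. Template validation and canary rollout are left out.

```typescript
// Hypothetical status values; only 'active' is named in the flow above.
type VersionStatus = "draft" | "active" | "deprecated";

interface PromptVersion {
  version: number;
  status: VersionStatus;
}

interface PublishResult {
  published: boolean;
  events: string[];
}

// Promote newVersion to active when the eval set passed, deprecating
// whichever version was active before. Mutates the versions array in place.
function publishPromptVersion(
  versions: PromptVersion[],
  newVersion: number,
  evalPassed: boolean
): PublishResult {
  const target = versions.filter((v) => v.version === newVersion)[0];
  if (target === undefined) throw new Error("unknown prompt version");

  // A failed eval leaves every version untouched and emits nothing.
  if (!evalPassed) return { published: false, events: [] };

  for (const v of versions) {
    if (v.status === "active") v.status = "deprecated";
  }
  target.status = "active";
  return { published: true, events: ["ai.prompt.version_published.v1"] };
}
```

Deprecating before activating keeps the invariant that at most one version is active at any point in the loop.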

6.5 Budget Exhaustion

Incoming request → budget check → exceeded


1. Refuse with ai.refused.budget.
2. Emit ai.budget.exhausted.v1.
3. Notify tenant admin.
4. On reset (daily/monthly boundary), clear and resume.
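
The budget check behind this flow can be sketched as a pure function. The 80% soft-alert and 100% hard-cap thresholds come from the completion flow; treating spend exactly equal to the limit as exhausted, and the `BudgetDecision` shape, are assumptions of this sketch. Notification and the periodic reset are not shown.

```typescript
// Hypothetical result type: either the request may proceed (possibly with
// a soft alert) or it is refused with the codes named in this section.
type BudgetDecision =
  | { allowed: true; softAlert: boolean }
  | { allowed: false; code: "ai.refused.budget"; event: "ai.budget.exhausted.v1" };

const SOFT_ALERT_RATIO = 0.8; // soft alert at 80% of the limit

// Decide whether a request may proceed given spend so far in this period.
// Assumption: spend equal to the limit already counts as exhausted.
function checkBudget(spent: number, limit: number): BudgetDecision {
  if (limit <= 0 || spent >= limit) {
    return {
      allowed: false,
      code: "ai.refused.budget",
      event: "ai.budget.exhausted.v1",
    };
  }
  return { allowed: true, softAlert: spent / limit >= SOFT_ALERT_RATIO };
}
```

Returning the event name alongside the refusal code lets the caller emit ai.budget.exhausted.v1 without re-deriving why the request was refused.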

7. Structured completions — authoring.coauthor.* (EPIC-AIT-001)

Authoring calls POST /api/v1/ai/structured with input.kind set to one of the co-author kinds below. Local / test environments use StubStructuredLlmAdapter, which returns deterministic JSON matching the caller’s outputSchema (no live LLM). Production uses the provider pipeline + schema validation.

| input.kind | Purpose |
| --- | --- |
| authoring.coauthor.lesson_blocks | Intent → markdown blocks |
| authoring.coauthor.pdf_sections | PDF ingest stub → section markdown |
| authoring.coauthor.translate | Locale variants |
| authoring.coauthor.rewrite | Tone / audience rewrite |
| authoring.coauthor.metadata | Title, description, objectives |
| authoring.coauthor.narrate | Transcript + placeholder asset id |
| authoring.coauthor.image | Alt text + placeholder asset id |
| authoring.coauthor.quiz_bank | Quiz proposal (assessment persistence separate) |
| authoring.coauthor.scenario | Branching scenario graph proposal |

Note: Adding a new kind requires updating the stub adapter (or real router), authoring DraftAiCoauthorService, and (when applicable) EVENT_SCHEMAS / API contracts in the documentation repo.
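
To make "deterministic JSON matching the caller's outputSchema" concrete, here is one way a stub could derive output from a schema. This is not the StubStructuredLlmAdapter implementation: `MiniSchema` is a hypothetical subset of JSON Schema, and `stubStructured` is an illustrative name.

```typescript
// Hypothetical, deliberately tiny subset of JSON Schema.
interface MiniSchema {
  type: "object" | "array" | "string" | "number" | "boolean";
  properties?: { [key: string]: MiniSchema };
  items?: MiniSchema;
}

// Build a placeholder value for a schema node. The same schema and path
// always produce the same value, so no randomness or clock is involved.
function stubValue(schema: MiniSchema, path: string): unknown {
  switch (schema.type) {
    case "string":
      return "stub:" + path;
    case "number":
      return 0;
    case "boolean":
      return false;
    case "array":
      return schema.items ? [stubValue(schema.items, path + "[0]")] : [];
    case "object": {
      const out: { [key: string]: unknown } = {};
      const props = schema.properties || {};
      // Sorted keys keep the serialized output stable across runs.
      for (const key of Object.keys(props).sort()) {
        out[key] = stubValue(props[key], path + "." + key);
      }
      return out;
    }
  }
}

// Entry point: the input.kind seeds the path so that string placeholders
// show which co-author kind produced them.
function stubStructured(kind: string, outputSchema: MiniSchema): unknown {
  return stubValue(outputSchema, kind);
}
```

With a schema-driven stub like this one, a new kind would need no new branch in the stub itself; per the note above, the authoring service and contracts still need updating.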