Application Logic

:::info Source
Sourced from `services/ai-gateway-service/APPLICATION_LOGIC.md` in the documentation repo.
:::

1. Application Services

CompletionService — chat completions via AIClient.
EmbeddingService — vector embeddings.
ModerationService — standalone moderation (for non-completion uses).
ImageGenService — image synthesis.
AudioGenService — TTS.
STTService — speech-to-text.
PromptRegistryService — CRUD of prompts, versioning, eval integration.
BudgetService — per-tenant budget tracking + alerting.
ModelRouterService — selects model per request.
SafetyPipelineService — pre + post-call pipeline.
CacheService — Redis-backed cache by (promptHash, modelId, tenantId).
AuditService — append audit entries + provenance.

2. Commands

| Command | Trigger |
| --- | --- |
| `Complete` | Caller via AIClient (most common) |
| `Embed` | Caller |
| `Moderate` | Standalone safety check |
| `GenerateImage` / `GenerateAudio` / `Transcribe` | Domain callers |
| `PublishPromptVersion` | Admin (after eval pass) |
| `DeprecatePromptVersion` | Admin |
| `UpdateBudget` | Platform admin or billing |
| `RecordAuditEntry` | Internal |

3. Queries

getPrompt(id, version?) — resolve active version.
listModels(family?, locality?, residency?).
getBudget(tenantId, period).
auditQuery(tenantId, filters) — compliance officer.
similaritySearch(tenantId, vector, filters, k) — for RAG.

4. Sagas / Policies

GDPR Erasure Saga: delete the user's embeddings and audit entries; audit entries are retained where a legal hold applies under tenant policy.
Prompt Rollout Saga (M4+): new prompt version → eval pass → canary % → full.
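
The canary step of the rollout saga can be pictured as a stable percentage split. The following is a minimal sketch, assuming callers are bucketed by a hash of some caller key; `RolloutStage`, `pickVersion`, and the hash function are hypothetical names for illustration and are not taken from the service.

```typescript
// Hypothetical rollout stages mirroring "eval pass → canary % → full".
type RolloutStage =
  | { stage: "evaluating" }
  | { stage: "canary"; percent: number }
  | { stage: "full" };

// Simple polynomial rolling hash over UTF-16 code units.
// Assumption: any stable hash would do; this one is chosen for brevity.
function stableHash(text: string): number {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Pick the prompt version served to one caller.
// The same callerKey always lands in the same bucket, so a caller does not
// flip between versions while the canary percentage stays fixed.
function pickVersion(
  rollout: RolloutStage,
  callerKey: string,
  currentVersion: number,
  candidateVersion: number
): number {
  if (rollout.stage === "full") {
    return candidateVersion;
  }
  if (rollout.stage === "canary") {
    const bucket = stableHash(callerKey) % 100; // 0..99
    return bucket < rollout.percent ? candidateVersion : currentVersion;
  }
  // Still evaluating: nobody sees the candidate yet.
  return currentVersion;
}
```

Bucketing on a hash rather than a random draw keeps each caller's experience consistent for the duration of the canary.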

5. Policies

Budget hard-cap: refuse request with ai.refused.budget.
Safety block: refuse with ai.refused.safety; log AuditEntry.
Model selection: prefer local if eligible + meets quality; else cloud.
Cache: same (promptHash + modelId + tenantId) returns cached result (respect maxAge per prompt).
Per-prompt rate limit (e.g., tutor: 60/min/user).
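
The cache policy above can be sketched as a keyed lookup with a per-prompt age limit. This is an illustration only: an in-memory `Map` stands in for Redis, and `PromptCache`, its method names, and the explicit `nowMs` parameter are assumptions made so the example is self-contained.

```typescript
interface CacheEntry {
  output: string;
  storedAtMs: number;
}

// In-memory stand-in for the Redis-backed cache (assumption for illustration).
class PromptCache {
  private entries = new Map<string, CacheEntry>();

  // The key mirrors the (promptHash, modelId, tenantId) tuple from the policy.
  // JSON encoding avoids collisions that a plain delimiter join could cause.
  static key(promptHash: string, modelId: string, tenantId: string): string {
    return JSON.stringify([promptHash, modelId, tenantId]);
  }

  // Return the cached output unless it is older than the prompt's maxAge.
  get(key: string, maxAgeMs: number, nowMs: number): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (nowMs - entry.storedAtMs > maxAgeMs) {
      this.entries.delete(key); // expired: drop it so later reads miss quickly
      return undefined;
    }
    return entry.output;
  }

  set(key: string, output: string, nowMs: number): void {
    this.entries.set(key, { output, storedAtMs: nowMs });
  }
}
```

Because `tenantId` is part of the key, one tenant can never read another tenant's cached output, even for an identical prompt and model.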

6. Use Case Flows

6.1 Completion

Caller: client.complete({ promptId, input, context })
 │
 ▼
1. Resolve prompt version (tenant pin or active).
2. Validate input against inputSchema.
3. Check budget (soft alert at 80%, hard-cap at 100%).
4. Input safety: moderation + PII redaction + injection shield.
   - Block if configured; emit refused event.
5. Cache lookup (promptHash + modelId + tenantId + input fingerprint).
   - Hit → return cached.
6. Model routing: iterate preferences; skip if locality/residency mismatch or BAA required but missing.
7. Call provider with baggage-stripped context.
8. Output safety: moderation + schema validation.
   - Block if violates policy.
9. Store completion record + cost debit.
10. Cache output (if cacheable).
11. Emit ai.gateway.call.completed.v1.
12. Return (output, provenance).
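
Step 6 of the flow above (model routing) can be sketched as a first-match scan over an ordered preference list. The field names (`locality`, `residency`, `hasBaa`) and the `RoutingConstraints` shape are assumptions for illustration; the actual ModelRouterService may model these differently.

```typescript
type Locality = "local" | "cloud";

interface ModelCandidate {
  id: string;
  locality: Locality;
  residency: string; // e.g. a region code; the format is an assumption
  hasBaa: boolean;   // whether a BAA is in place for this model's provider
}

interface RoutingConstraints {
  allowedLocalities: ReadonlyArray<Locality>;
  residency: string;
  baaRequired: boolean;
}

// Walk the preferences in order and return the first model that satisfies
// every constraint; undefined means no eligible model was found.
function routeModel(
  preferences: ReadonlyArray<ModelCandidate>,
  constraints: RoutingConstraints
): ModelCandidate | undefined {
  for (const model of preferences) {
    if (constraints.allowedLocalities.indexOf(model.locality) === -1) {
      continue; // locality mismatch
    }
    if (model.residency !== constraints.residency) {
      continue; // residency mismatch
    }
    if (constraints.baaRequired && !model.hasBaa) {
      continue; // BAA required but missing
    }
    return model;
  }
  return undefined;
}
```

Listing local models first in `preferences` gives the "prefer local, else cloud" behaviour without any special casing in the router itself.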

6.2 Embedding

Caller: client.embed({ sourceRef, content, modelPreference })
 │
 ▼
1. Safety: basic classification.
2. Budget check.
3. Model select (per-tenant default embedding model).
4. Generate vector.
5. Store in embeddings table (tenant partition).
6. Emit ai.embedding.created.v1.
7. Return (vector, id, provenance).

6.3 RAG Similarity Search

Caller: client.knn({ vector, sourceKind, filters, k })
 │
 ▼
1. Tenant filter mandatory.
2. pgvector HNSW query within tenant partition.
3. Return matches with (score, sourceRef).
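
To make the tenant-scoping rule concrete, here is a brute-force sketch of the same search shape in memory. It deliberately swaps the pgvector HNSW index for an exact cosine-similarity scan, so it illustrates the contract (mandatory tenant filter, top-k by score) rather than the real query; all type and function names are hypothetical.

```typescript
interface StoredEmbedding {
  tenantId: string;
  sourceRef: string;
  sourceKind: string;
  vector: number[];
}

interface Match {
  sourceRef: string;
  score: number;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Exact top-k search restricted to one tenant (and optionally one sourceKind).
function knn(
  rows: ReadonlyArray<StoredEmbedding>,
  tenantId: string,
  query: number[],
  k: number,
  sourceKind?: string
): Match[] {
  if (tenantId === "") {
    throw new Error("tenantId is mandatory"); // never search across tenants
  }
  return rows
    .filter((r) => r.tenantId === tenantId)
    .filter((r) => sourceKind === undefined || r.sourceKind === sourceKind)
    .map((r) => ({ sourceRef: r.sourceRef, score: cosine(query, r.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```

The tenant filter runs before any scoring, which mirrors querying inside a tenant partition: rows from other tenants are never candidates, regardless of how similar they are.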

6.4 Prompt Publishing

Admin: publishPromptVersion(promptId, newVersion)
 │
 ▼
1. Validate template syntax + schemas.
2. Run eval set (ai-gateway owns eval harness).
3. If pass: mark 'active'; deprecate prior active.
4. Emit ai.prompt.version_published.v1.
5. Optional: canary rollout (M4+).
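
Step 3 of the flow above (activate the new version, deprecate the prior active one) can be sketched as a pure state transition. The status names and the `publishVersion` signature are assumptions for illustration; validation, the eval harness, and event emission are left out.

```typescript
type VersionStatus = "draft" | "active" | "deprecated";

interface PromptVersion {
  version: number;
  status: VersionStatus;
}

// Return a new version list reflecting the publish attempt.
// If the eval did not pass, nothing changes.
function publishVersion(
  versions: ReadonlyArray<PromptVersion>,
  newVersion: number,
  evalPassed: boolean
): PromptVersion[] {
  if (!versions.some((v) => v.version === newVersion)) {
    throw new Error("unknown prompt version: " + newVersion);
  }
  if (!evalPassed) {
    return versions.map((v): PromptVersion => ({ ...v }));
  }
  return versions.map((v): PromptVersion => {
    if (v.version === newVersion) {
      return { ...v, status: "active" };
    }
    if (v.status === "active") {
      return { ...v, status: "deprecated" }; // prior active is retired
    }
    return { ...v };
  });
}
```

Doing both status changes in one pass keeps the invariant that at most one version is active at a time.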

6.5 Budget Exhaustion

Incoming request → budget check → exceeded
 │
 ▼
Refuse with ai.refused.budget.
Emit ai.budget.exhausted.v1.
Notify tenant admin.
On reset (daily/monthly boundary), clear + resume.
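
The budget check and reset can be sketched as two small functions. This assumes the check compares spend already recorded against the limit (the source does not say whether the estimated cost of the incoming request is included); `BudgetState`, `BudgetDecision`, and the function names are hypothetical.

```typescript
interface BudgetState {
  tenantId: string;
  limit: number; // budget for the current period, in any consistent unit
  spent: number; // spend recorded so far in the period
}

interface BudgetDecision {
  allowed: boolean;
  softAlert: boolean;   // true once spend reaches the soft threshold
  refusalCode?: string; // set only when the request is refused
  events: string[];     // event names to emit as a result of the check
}

const SOFT_ALERT_RATIO = 0.8; // soft alert at 80%, per the completion flow

function checkBudget(budget: BudgetState): BudgetDecision {
  if (budget.spent >= budget.limit) {
    // Hard cap reached: refuse and signal exhaustion.
    return {
      allowed: false,
      softAlert: false,
      refusalCode: "ai.refused.budget",
      events: ["ai.budget.exhausted.v1"],
    };
  }
  return {
    allowed: true,
    softAlert: budget.spent >= budget.limit * SOFT_ALERT_RATIO,
    events: [],
  };
}

// Called at the daily/monthly boundary: clear spend so requests resume.
function resetBudget(budget: BudgetState): BudgetState {
  return { ...budget, spent: 0 };
}
```

Notifying the tenant admin is omitted here; in this sketch it would hang off the `ai.budget.exhausted.v1` entry in `events`.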

7. Structured completions — `authoring.coauthor.*` (EPIC-AIT-001)

Authoring calls POST /api/v1/ai/structured with input.kind set to one of the co-author kinds below. Local / test environments use StubStructuredLlmAdapter, which returns deterministic JSON matching the caller’s outputSchema (no live LLM). Production uses the provider pipeline + schema validation.

| `input.kind` | Purpose |
| --- | --- |
| `authoring.coauthor.lesson_blocks` | Intent → markdown blocks |
| `authoring.coauthor.pdf_sections` | PDF ingest stub → section markdown |
| `authoring.coauthor.translate` | Locale variants |
| `authoring.coauthor.rewrite` | Tone / audience rewrite |
| `authoring.coauthor.metadata` | Title, description, objectives |
| `authoring.coauthor.narrate` | Transcript + placeholder asset id |
| `authoring.coauthor.image` | Alt text + placeholder asset id |
| `authoring.coauthor.quiz_bank` | Quiz proposal (assessment persistence separate) |
| `authoring.coauthor.scenario` | Branching scenario graph proposal |

Note: Adding a new kind requires updating the stub adapter (or the real router), the authoring service's DraftAiCoauthorService, and, where applicable, EVENT_SCHEMAS and the API contracts in the documentation repo.
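
To illustrate what "deterministic JSON matching the caller's outputSchema" could mean in practice, the sketch below derives a fixed value from a small schema. It is not the actual StubStructuredLlmAdapter: the `Schema` type covers only a hypothetical subset of JSON Schema, and `stubFromSchema` is an invented name.

```typescript
// Hypothetical minimal schema subset; real outputSchema documents are richer.
type Schema =
  | { type: "string" }
  | { type: "number" }
  | { type: "boolean" }
  | { type: "array"; items: Schema }
  | { type: "object"; properties: { [key: string]: Schema } };

// Build a placeholder value whose shape matches the schema.
// The same schema always yields the same value: no randomness, no clock.
function stubFromSchema(schema: Schema, path: string = "$"): unknown {
  switch (schema.type) {
    case "string":
      return "stub:" + path; // path makes each string field distinguishable
    case "number":
      return 0;
    case "boolean":
      return false;
    case "array":
      return [stubFromSchema(schema.items, path + "[0]")];
    case "object": {
      const out: { [key: string]: unknown } = {};
      // Sorted keys keep the serialized output stable across runs.
      for (const key of Object.keys(schema.properties).sort()) {
        out[key] = stubFromSchema(schema.properties[key], path + "." + key);
      }
      return out;
    }
  }
}
```

A stub of this kind lets local and test environments exercise every `authoring.coauthor.*` kind end to end, since callers still receive output that passes their own schema validation.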
