Application Logic
:::info Source
Sourced from services/ai-gateway-service/APPLICATION_LOGIC.md in the documentation repo.
:::
1. Application Services
- CompletionService: chat completions via AIClient.
- EmbeddingService: vector embeddings.
- ModerationService: standalone moderation (for non-completion uses).
- ImageGenService: image synthesis.
- AudioGenService: TTS.
- STTService: speech-to-text.
- PromptRegistryService: CRUD of prompts, versioning, eval integration.
- BudgetService: per-tenant budget tracking + alerting.
- ModelRouterService: selects model per request.
- SafetyPipelineService: pre + post-call pipeline.
- CacheService: Redis-backed cache by (promptHash, modelId, tenantId).
- AuditService: append audit entries + provenance.
2. Commands
| Command | Trigger |
|---|---|
| Complete | Caller via AIClient (most common) |
| Embed | Caller |
| Moderate | Standalone safety check |
| GenerateImage / GenerateAudio / Transcribe | Domain callers |
| PublishPromptVersion | Admin (after eval pass) |
| DeprecatePromptVersion | Admin |
| UpdateBudget | Platform admin or billing |
| RecordAuditEntry | Internal |
3. Queries
- getPrompt(id, version?): resolve active version.
- listModels(family?, locality?, residency?).
- getBudget(tenantId, period).
- auditQuery(tenantId, filters): compliance officer.
- similaritySearch(tenantId, vector, filters, k): for RAG.
4. Sagas
- GDPR Erasure Saga: delete the user's embeddings + audit entries; audit entries are retained instead when a legal hold applies under tenant policy.
- Prompt Rollout Saga (M4+): new prompt version → eval pass → canary % → full.
5. Policies
- Budget hard-cap: refuse request with ai.refused.budget.
- Safety block: refuse with ai.refused.safety; log AuditEntry.
- Model selection: prefer local if eligible + meets quality; else cloud.
- Cache: same (promptHash + modelId + tenantId) returns cached result (respect maxAge per prompt).
- Per-prompt rate limit (e.g., tutor: 60/min/user).
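
The cache policy above can be illustrated with a small in-memory sketch. It is not the Redis-backed CacheService itself: the `PromptCache` class, the `CacheKeyParts` and `CacheEntry` shapes, and the `|` key separator are all assumptions made for illustration. It shows only the two rules the policy states: the key is built from (promptHash, modelId, tenantId), and an entry older than the prompt's maxAge is treated as a miss.

```typescript
// Hypothetical shapes: field names follow the policy text, not a confirmed API.
interface CacheKeyParts {
  promptHash: string;
  modelId: string;
  tenantId: string;
}

interface CacheEntry {
  output: string;
  storedAtMs: number;
}

// Join the key parts with a separator that is assumed not to occur in the ids.
function cacheKey(parts: CacheKeyParts): string {
  return [parts.tenantId, parts.modelId, parts.promptHash].join("|");
}

class PromptCache {
  private readonly entries = new Map<string, CacheEntry>();

  put(parts: CacheKeyParts, output: string, nowMs: number): void {
    this.entries.set(cacheKey(parts), { output: output, storedAtMs: nowMs });
  }

  // Return the cached output only while it is no older than the prompt's maxAge.
  get(parts: CacheKeyParts, maxAgeMs: number, nowMs: number): string | undefined {
    const key = cacheKey(parts);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs - entry.storedAtMs > maxAgeMs) {
      this.entries.delete(key); // expired: evict so the caller falls through to the provider
      return undefined;
    }
    return entry.output;
  }
}
```

Because tenantId is part of the key, two tenants sending the same prompt to the same model never share an entry.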
6. Use Case Flows
6.1 Completion
Caller: client.complete({ promptId, input, context })
│
▼
1. Resolve prompt version (tenant pin or active).
2. Validate input against inputSchema.
3. Check budget (soft alert at 80%, hard-cap at 100%).
4. Input safety: moderation + PII redaction + injection shield.
- Block if configured; emit refused event.
5. Cache lookup (promptHash + modelId + tenantId + input fingerprint).
- Hit → return cached.
6. Model routing: iterate preferences; skip if locality/residency mismatch or BAA required but missing.
7. Call provider with baggage-stripped context.
8. Output safety: moderation + schema validation.
- Block if violates policy.
9. Store completion record + cost debit.
10. Cache output (if cacheable).
11. Emit ai.gateway.call.completed.v1.
12. Return (output, provenance).
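
Step 6 (model routing) is the least obvious step in this flow, so here is a minimal sketch of it. The `ModelCandidate` and `RoutingConstraints` shapes and the `routeModel` function are hypothetical; the source only states that preferences are iterated in order and that a candidate is skipped on a locality or residency mismatch, or when a BAA is required but missing.

```typescript
// Hypothetical shapes: the source names the checks but not the data model.
interface ModelCandidate {
  modelId: string;
  locality: "local" | "cloud";
  residency: string; // e.g. "eu" or "us" (illustrative values)
  hasBaa: boolean;   // whether a BAA is in place for this provider
}

interface RoutingConstraints {
  allowedLocalities: ReadonlyArray<"local" | "cloud">;
  requiredResidency?: string;
  baaRequired: boolean;
}

// Walk the preference list in order and return the first eligible model.
function routeModel(
  preferences: ReadonlyArray<ModelCandidate>,
  constraints: RoutingConstraints,
): ModelCandidate | undefined {
  for (const candidate of preferences) {
    // Skip on locality mismatch.
    if (constraints.allowedLocalities.indexOf(candidate.locality) === -1) continue;
    // Skip on residency mismatch (only when the request pins a residency).
    if (
      constraints.requiredResidency !== undefined &&
      candidate.residency !== constraints.requiredResidency
    ) {
      continue;
    }
    // Skip when a BAA is required but missing.
    if (constraints.baaRequired && !candidate.hasBaa) continue;
    return candidate;
  }
  return undefined; // no eligible model: the caller decides how to refuse
}
```

Putting local models first in the preference list is one way to express the "prefer local if eligible" policy without a separate code path.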
6.2 Embedding
Caller: client.embed({ sourceRef, content, modelPreference })
│
▼
1. Safety: basic classification.
2. Budget check.
3. Model select (per-tenant default embedding model).
4. Generate vector.
5. Store in embeddings table (tenant partition).
6. Emit ai.embedding.created.v1.
7. Return (vector, id, provenance).
6.3 RAG Similarity Search
Caller: client.knn({ vector, sourceKind, filters, k })
│
▼
1. Tenant filter mandatory.
2. pgvector HNSW query within tenant partition.
3. Return matches with (score, sourceRef).
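
As a rough sketch of steps 1 and 2, the function below builds a parameterized pgvector query that cannot be constructed without a tenant filter. The table and column names (`embeddings`, `tenant_id`, `source_kind`, `source_ref`, `embedding`), the `KnnRequest` shape, and the choice of cosine distance are assumptions; the source does not specify the schema or the distance metric. The `<=>` operator is pgvector's cosine-distance operator, and an HNSW index can only serve this ordering if it was created for the same metric.

```typescript
// Hypothetical request shape; `filters` from the flow is reduced to sourceKind here.
interface KnnRequest {
  tenantId: string;
  vector: number[];
  sourceKind?: string;
  k: number;
}

interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

function buildKnnQuery(req: KnnRequest): SqlQuery {
  // Step 1: the tenant filter is mandatory, so refuse to build a query without it.
  if (req.tenantId.trim() === "") {
    throw new Error("tenantId is required for similarity search");
  }
  if (!Number.isInteger(req.k) || req.k <= 0) {
    throw new Error("k must be a positive integer");
  }

  // pgvector accepts a text literal of the form '[1,2,3]' cast to vector.
  const vectorLiteral = "[" + req.vector.join(",") + "]";
  const values: Array<string | number> = [req.tenantId, vectorLiteral];
  const where = ["tenant_id = $1"];

  if (req.sourceKind !== undefined) {
    values.push(req.sourceKind);
    where.push("source_kind = $" + values.length);
  }
  values.push(req.k);

  // Step 2: order by cosine distance; 1 - distance is reported as the score.
  const text =
    "SELECT source_ref, 1 - (embedding <=> $2::vector) AS score " +
    "FROM embeddings WHERE " + where.join(" AND ") + " " +
    "ORDER BY embedding <=> $2::vector LIMIT $" + values.length;

  return { text: text, values: values };
}
```

The returned `text` and `values` are shaped for a driver that takes positional `$n` parameters; all user-supplied values travel as parameters rather than being concatenated into the SQL.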
6.4 Prompt Publishing
Admin: publishPromptVersion(promptId, newVersion)
│
▼
1. Validate template syntax + schemas.
2. Run eval set (ai-gateway owns eval harness).
3. If pass: mark 'active'; deprecate prior active.
4. Emit ai.prompt.version_published.v1.
5. Optional: canary rollout (M4+).
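
Step 3 is a small state transition, sketched below under stated assumptions. The `draft` status, the `PromptVersion` shape, and the `publishVersion` function are hypothetical; the source only says that a passing version is marked 'active' and the prior active version is deprecated. The `evalPassed` flag stands in for steps 1 and 2.

```typescript
// "draft" is an assumed pre-publish status; "active" and "deprecated" come from the flow.
type VersionStatus = "draft" | "active" | "deprecated";

interface PromptVersion {
  version: number;
  status: VersionStatus;
}

interface PublishResult {
  published: boolean;
  versions: PromptVersion[];
}

const copyVersion = (v: PromptVersion): PromptVersion => ({
  version: v.version,
  status: v.status,
});

// Returns a new list; the input list is never mutated.
function publishVersion(
  versions: ReadonlyArray<PromptVersion>,
  newVersion: number,
  evalPassed: boolean,
): PublishResult {
  const target = versions.filter((v) => v.version === newVersion)[0];
  if (target === undefined || target.status !== "draft" || !evalPassed) {
    // Unknown version, already published, or eval failed: nothing changes.
    return { published: false, versions: versions.map(copyVersion) };
  }
  const next = versions.map((v): PromptVersion => {
    if (v.version === newVersion) return { version: v.version, status: "active" };
    if (v.status === "active") return { version: v.version, status: "deprecated" };
    return copyVersion(v);
  });
  return { published: true, versions: next };
}
```

Doing both status changes in one pass keeps the invariant that at most one version is active at a time.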
6.5 Budget Exhaustion
Incoming request → budget check → exceeded
│
▼
Refuse with ai.refused.budget.
Emit ai.budget.exhausted.v1.
Notify tenant admin.
On reset (daily/monthly boundary), clear + resume.
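
The budget check behind this flow, together with the 80% soft alert from the completion flow, might look like the sketch below. The `BudgetDecision` type and `checkBudget` function are hypothetical, and comparing only the amount already spent (rather than spent plus an estimated request cost) is an assumption. Amounts are unitless numbers here; the source does not say which unit the budget is tracked in.

```typescript
type BudgetDecision =
  | { kind: "allow" }
  | { kind: "allow_with_alert"; usedFraction: number }
  | { kind: "refuse"; code: "ai.refused.budget" };

const SOFT_ALERT_FRACTION = 0.8; // soft alert at 80% of the period budget

function checkBudget(spent: number, limit: number): BudgetDecision {
  // Hard cap at 100%; a non-positive limit is treated as already exhausted.
  if (limit <= 0 || spent >= limit) {
    return { kind: "refuse", code: "ai.refused.budget" };
  }
  const usedFraction = spent / limit;
  if (usedFraction >= SOFT_ALERT_FRACTION) {
    return { kind: "allow_with_alert", usedFraction: usedFraction };
  }
  return { kind: "allow" };
}
```

A caller would emit ai.budget.exhausted.v1 and notify the tenant admin on `refuse`, and serve the request normally in the other two cases.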
7. Structured completions — authoring.coauthor.* (EPIC-AIT-001)
Authoring calls POST /api/v1/ai/structured with input.kind set to one of the co-author kinds below. Local / test environments use StubStructuredLlmAdapter, which returns deterministic JSON matching the caller’s outputSchema (no live LLM). Production uses the provider pipeline + schema validation.
| input.kind | Purpose |
|---|---|
| authoring.coauthor.lesson_blocks | Intent → markdown blocks |
| authoring.coauthor.pdf_sections | PDF ingest stub → section markdown |
| authoring.coauthor.translate | Locale variants |
| authoring.coauthor.rewrite | Tone / audience rewrite |
| authoring.coauthor.metadata | Title, description, objectives |
| authoring.coauthor.narrate | Transcript + placeholder asset id |
| authoring.coauthor.image | Alt text + placeholder asset id |
| authoring.coauthor.quiz_bank | Quiz proposal (assessment persistence separate) |
| authoring.coauthor.scenario | Branching scenario graph proposal |
Note: Adding a new kind requires updating the stub adapter (or real router), authoring DraftAiCoauthorService, and (when applicable) EVENT_SCHEMAS / API contracts in the documentation repo.
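
To make "deterministic JSON matching the caller's outputSchema" concrete, here is one way a stub could derive a fixed value from a schema. This is not the StubStructuredLlmAdapter implementation, which the source does not show: `StubSchema` is an assumed, minimal JSON-Schema-like subset, and `stubFromSchema` and `stubStructuredCompletion` are hypothetical names.

```typescript
// Minimal JSON-Schema-like subset, assumed for illustration only.
interface StubSchema {
  type: "string" | "number" | "boolean" | "array" | "object";
  properties?: { [name: string]: StubSchema };
  items?: StubSchema;
}

type Json = string | number | boolean | Json[] | { [key: string]: Json };

// Build a fixed value for each schema node; `path` makes string values traceable.
function stubFromSchema(schema: StubSchema, path: string): Json {
  switch (schema.type) {
    case "string":
      return "stub:" + path;
    case "number":
      return 0;
    case "boolean":
      return false;
    case "array":
      // One element when the item schema is known, otherwise an empty array.
      return schema.items ? [stubFromSchema(schema.items, path + "[0]")] : [];
    case "object": {
      const out: { [key: string]: Json } = {};
      const props = schema.properties || {};
      // Sort keys so the output does not depend on schema authoring order.
      for (const name of Object.keys(props).sort()) {
        const child = props[name];
        if (child !== undefined) {
          out[name] = stubFromSchema(child, path + "." + name);
        }
      }
      return out;
    }
  }
}

// The kind is only used to label values; no LLM is called.
function stubStructuredCompletion(kind: string, outputSchema: StubSchema): Json {
  return stubFromSchema(outputSchema, kind);
}
```

Because the output depends only on the kind and the schema, repeated calls return identical JSON, which keeps local and test runs reproducible.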