ai-orchestrator-service — AI Integration
Companion to: SERVICE_OVERVIEW.md · 08 AI Architecture
This is a meta-integration doc: the AI service describes how it integrates with itself. Everything else flows through this service, so the integration here covers (a) self-evaluation, (b) the dog-fooded "AI tutor for backoffice" capability, (c) how this service uses its own RunInferenceUseCase to power the prompt linter and capability registration assistant, and (d) the canonical capability catalog.
1. Capability catalog (canonical)
Every capability is a row in capabilities and is registered via the admin API. The columns below are mirrored in DOMAIN_MODEL.md and the canonical detailed list is in docs/08-ai-architecture.md. Summary view here:
| Capability key | Bounded context | Primary cloud model | Edge model | Fallback chain | HITL | p95 latency | Cost class | Output schema | Eval suite |
|---|---|---|---|---|---|---|---|---|---|
| pricing.suggest | Pricing | gemini-1.5-flash | — | claude-sonnet → gpt-4o-mini → deterministic | yes if Δ > 5% from BAR | 1.5 s | M | PricingSuggestion | EVAL_PRICING_001 |
| pricing.demand_forecast | Pricing | tabular + gemini-1.5-flash-8b | — | last-year-naive | no | 1 h batch | L | DemandForecast | EVAL_FORECAST_001 |
| housekeeping.route_optimize | Housekeeping | OR-tools + gemini-1.5-flash (explanations) | OR-tools (deterministic) | none | no | 800 ms | L | RoutePlan | EVAL_HK_001 |
| staff.shift_optimize | Staff | gemini-1.5-flash | — | deterministic LP | no | 2 s | M | ShiftPlan | EVAL_STAFF_001 |
| anomaly.detect | Multi (IAM, Payment) | gemini-1.5-flash + features | logistic-regression on-device | rule-based | no | 600 ms | L | AnomalySignal | EVAL_ANOMALY_001 |
| upsell.recommend | Reservation | gemini-1.5-flash | — | top-N popular per segment | yes (auto-send only after 3 successful auto-sends are reviewed) | 900 ms | M | UpsellList | EVAL_UPSELL_001 |
| message.draft | Notification | gemini-1.5-flash | phi-3-mini-4k-instruct (int4) | claude-sonnet → template | yes for guest-facing first send per channel/template | 2 s online / 4 s edge | M | MessageDraft | EVAL_MSG_001 |
| review.summarize | Reservation/CRM | gemini-1.5-flash | — | extractive heuristic | no | 3 s | M | ReviewSummary | EVAL_REVIEW_001 |
| vision.id_ocr | IAM/Reservation | gemini-1.5-pro-vision | — | tesseract + manual | yes for low-confidence | 3 s | H | IdDocFields | EVAL_OCR_001 |
| audio.transcribe | Notification/Calls | chirp-2 (Vertex Speech) | — | whisper-tiny on edge / queue for cloud | no | 1× real-time | M | Transcript | EVAL_ASR_001 |
| content.description | Property | gemini-1.5-flash | — | manual | yes (publish gate) | 4 s | M | RoomDescription | EVAL_DESC_001 |
| content.translate | Property/Notification | gemini-1.5-flash | — | DeepL → manual | yes (publish gate) | 2 s | M | Translation | EVAL_TRANS_001 |
| tutor.answer | Backoffice (any) | gemini-1.5-flash + RAG | phi-3-mini-4k-instruct + edge RAG | extractive snippet | no | 3 s | M | TutorAnswer | EVAL_TUTOR_001 |
| internal.prompt_lint | AI orchestrator (self) | gemini-1.5-flash | — | static linter | no | 2 s | L | PromptLintReport | EVAL_PROMPT_LINT_001 |
| internal.capability_register_assist | AI orchestrator (self) | gemini-1.5-pro | — | none | yes (always — admin reviews) | 5 s | M | CapabilityDraft | EVAL_CAP_REG_001 |
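
The "Fallback chain" column can be read as an ordered list of steps that are tried until one succeeds. The following is a minimal sketch of that reading, not the service's implementation: `ChainStep`, `ChainResult` and `runWithFallback` are hypothetical names, and the provider calls are stubbed with synchronous functions (real adapters would presumably be asynchronous) so the control flow stays easy to follow.

```typescript
// Hypothetical types for illustration; not the service's actual API.
type ChainStep<T> = { name: string; run: () => T };

interface ChainResult<T> {
  output: T;
  servedBy: string; // the step that produced the output
  failed: string[]; // steps that threw before a success
}

// Try the primary model first, then each fallback in catalog order.
function runWithFallback<T>(steps: ChainStep<T>[]): ChainResult<T> {
  const failed: string[] = [];
  for (const step of steps) {
    try {
      return { output: step.run(), servedBy: step.name, failed };
    } catch {
      failed.push(step.name);
    }
  }
  throw new Error("all steps failed: " + failed.join(" -> "));
}

// Stubbed chain mirroring the pricing.suggest row; the prices are made up.
const chainResult = runWithFallback<number>([
  { name: "gemini-1.5-flash", run: () => { throw new Error("timeout"); } },
  { name: "claude-sonnet", run: () => { throw new Error("quota"); } },
  { name: "gpt-4o-mini", run: () => 129 },
  { name: "deterministic", run: () => 120 },
]);
console.log(chainResult.servedBy, chainResult.failed);
```

Recording which steps failed alongside the step that served the request is one way to keep the fallback visible to the provenance pipeline; whether the service records it in this shape is an assumption.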
2. Self-integration: the prompt linter
When an admin posts a new prompt version (POST /api/v1/ai/prompts/:id/versions), the controller invokes the internal.prompt_lint capability against the candidate's system + few-shot + output schema. The linter's output schema (PromptLintReport) drives the response:
interface PromptLintReport {
  severity: 'pass' | 'warn' | 'block';
  findings: Array<{
    code:
      | 'INSTRUCTION_DRIFT'
      | 'PII_LEAK'
      | 'INJECTION_VECTOR'
      | 'OUTPUT_SCHEMA_INCONSISTENT'
      | 'TONE_DRIFT'
      | 'COST_INEFFICIENT'
      | 'AMBIGUOUS_INSTRUCTION';
    message: string;
    span?: { startLine: number; endLine: number };
    suggestedFix?: string;
  }>;
  approxTokens: { system: number; fewshot: number };
}
When the report's severity is 'block', the controller returns 422 MELMASTOON.AI.PROMPT_LINT_FAILED and refuses to persist the draft. This is the only place where the service AI-evaluates AI configuration.
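One way the controller could turn a lint report into a response is sketched below. Only the 'block' branch (422, no persistence) comes from the text above; returning 201 for 'pass' and 'warn', surfacing findings as warnings, and the names `LintReportLite`, `LintDecision` and `decideOnLint` are assumptions made for illustration.

```typescript
// Trimmed copy of PromptLintReport with only the fields this sketch needs.
type LintSeverity = "pass" | "warn" | "block";

interface LintReportLite {
  severity: LintSeverity;
  findings: Array<{ code: string; message: string }>;
}

interface LintDecision {
  httpStatus: number;
  persist: boolean;
  errorCode?: string;
  warnings: string[];
}

function decideOnLint(report: LintReportLite): LintDecision {
  if (report.severity === "block") {
    // From the doc: block => 422 and the draft is not persisted.
    return {
      httpStatus: 422,
      persist: false,
      errorCode: "MELMASTOON.AI.PROMPT_LINT_FAILED",
      warnings: [],
    };
  }
  // Assumption: pass and warn both persist; warn carries its findings along.
  const warnings =
    report.severity === "warn"
      ? report.findings.map((f) => f.code + ": " + f.message)
      : [];
  return { httpStatus: 201, persist: true, warnings };
}

const blocked = decideOnLint({
  severity: "block",
  findings: [{ code: "PII_LEAK", message: "few-shot contains an email address" }],
});
console.log(blocked.httpStatus, blocked.persist);
```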
3. Self-integration: capability registration assist
A new capability requires authoring 14 fields. The admin UI calls internal.capability_register_assist with a free-text description; the model returns a CapabilityDraft (a partial Capability object with proposed prompt skeleton, fallback chain, HITL config, eval suite seeds, latency target, cost class). HITL is always open here: an AI engineer must approve the draft before it becomes a row in capabilities. The HITL gate carries policyKey: 'capability.register' and a 7-day timeout (auto-reject, no defaults).
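The gate's timeout behaviour can be sketched as a small pure function. The policy key and the 7-day auto-reject come from the paragraph above; the `HitlGate` shape, the function names, and the use of epoch milliseconds are assumptions.

```typescript
const GATE_TIMEOUT_MS = 7 * 24 * 60 * 60 * 1000; // 7-day timeout from the text

type GateState = "open" | "approved" | "rejected";
type GateDecision = "approve" | "reject";

// Hypothetical gate record; the real gate likely carries more fields.
interface HitlGate {
  policyKey: string;
  openedAt: number; // epoch milliseconds
  state: GateState;
}

function openRegistrationGate(now: number): HitlGate {
  return { policyKey: "capability.register", openedAt: now, state: "open" };
}

// Returns the gate's next state. A timed-out gate is rejected even if an
// approval arrives late: auto-reject, no defaults.
function resolveGate(gate: HitlGate, now: number, decision?: GateDecision): GateState {
  if (gate.state !== "open") return gate.state;
  if (now - gate.openedAt >= GATE_TIMEOUT_MS) return "rejected";
  if (decision === "approve") return "approved";
  if (decision === "reject") return "rejected";
  return "open";
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const registrationGate = openRegistrationGate(0);
console.log(resolveGate(registrationGate, 1 * ONE_DAY_MS));            // open
console.log(resolveGate(registrationGate, 2 * ONE_DAY_MS, "approve")); // approved
console.log(resolveGate(registrationGate, 7 * ONE_DAY_MS, "approve")); // rejected
```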
4. Eval harness: the AI service evaluating itself
The eval harness is described in detail in TESTING_STRATEGY.md. Key self-integration points:
- Eval runs are themselves inference jobs routed through RunInferenceUseCase with purpose='eval' so they hit the same provider adapters, the same caching path (eval bypasses cache via header), and the same provenance pipeline.
- A nightly job runs every active prompt version's eval suite against the shadow cloud model (the next-generation candidate) and emits melmastoon.ai_orchestrator.eval.run_completed.v1 with a delta report.
- A "promote candidate" workflow is HITL-gated and only fires when the candidate beats the active by ≥ the suite's delta_promote_threshold.
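The promotion condition in the last point reduces to a single comparison. The sketch below assumes each suite produces one scalar score per model where higher is better; `EvalDelta`, its field names, the example scores, and `isPromotionEligible` are hypothetical.

```typescript
// Hypothetical slice of a delta report; field names are assumptions.
interface EvalDelta {
  suiteId: string;
  activeScore: number;
  candidateScore: number;
  deltaPromoteThreshold: number; // the suite's delta_promote_threshold
}

// True when the candidate beats the active model by at least the threshold.
// This only makes the HITL-gated workflow eligible to fire; it does not promote.
function isPromotionEligible(d: EvalDelta): boolean {
  return d.candidateScore - d.activeScore >= d.deltaPromoteThreshold;
}

const clearWin: EvalDelta = {
  suiteId: "EVAL_TUTOR_001",
  activeScore: 0.82,
  candidateScore: 0.9,
  deltaPromoteThreshold: 0.03,
};
const marginalGain: EvalDelta = {
  suiteId: "EVAL_TUTOR_001",
  activeScore: 0.82,
  candidateScore: 0.83,
  deltaPromoteThreshold: 0.03,
};
console.log(isPromotionEligible(clearWin), isPromotionEligible(marginalGain)); // true false
```

Scores that sit exactly on the threshold are sensitive to floating-point rounding, so a production check would likely compare with a tolerance or use integer-scaled scores.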
5. AI tutor for backoffice (the dog-food capability)
The tutor capability (tutor.answer) demonstrates the entire stack end-to-end:
- The user types a question in the backoffice help drawer (Electron renderer process).
- The renderer calls POST /api/v1/ai/complete with capabilityKey: 'tutor.answer' and inputs: { question, context: { route, recentClicks } }.
- The service performs RAG against the tenant's tutor namespace + the platform's kb_global namespace (read-only, owned by docs CI).
- The response includes provenance + RAG citations.
- If the desktop is offline, the renderer falls back to local phi-3-mini-4k-instruct against the bundled edge RAG (a curated subset of the tutor namespace).
- The audit row is pushed to the cloud on next sync (see SYNC_CONTRACT.md).
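The renderer-side part of this flow might look roughly like the sketch below. The request body follows the steps above; the element type of `recentClicks`, the `AuditOutbox` class, the `TutorAuditRow` shape, and the example question are assumptions, and the actual network call and local inference are left out.

```typescript
// Request body from the steps above; recentClicks as string[] is an assumption.
interface TutorRequest {
  capabilityKey: "tutor.answer";
  inputs: { question: string; context: { route: string; recentClicks: string[] } };
}

type TutorPath = "cloud" | "edge";

interface TutorAuditRow {
  capabilityKey: string;
  servedVia: TutorPath;
  question: string;
}

function buildTutorRequest(question: string, route: string, recentClicks: string[]): TutorRequest {
  return { capabilityKey: "tutor.answer", inputs: { question, context: { route, recentClicks } } };
}

// Hypothetical outbox: rows written on the edge path wait here until the next sync.
class AuditOutbox {
  private pending: TutorAuditRow[] = [];

  record(row: TutorAuditRow): void {
    this.pending.push(row);
  }

  // Hands back everything queued so far and empties the outbox.
  drainForSync(): TutorAuditRow[] {
    const rows = this.pending;
    this.pending = [];
    return rows;
  }
}

// Picks the serving path and queues an audit row when the answer is served locally.
function routeTutorQuestion(req: TutorRequest, isOnline: boolean, outbox: AuditOutbox): TutorPath {
  const servedVia: TutorPath = isOnline ? "cloud" : "edge";
  if (servedVia === "edge") {
    outbox.record({ capabilityKey: req.capabilityKey, servedVia, question: req.inputs.question });
  }
  return servedVia;
}

const tutorOutbox = new AuditOutbox();
const tutorReq = buildTutorRequest("How do I reassign a room?", "/reservations", ["open-drawer"]);
console.log(routeTutorQuestion(tutorReq, false, tutorOutbox)); // edge
```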
6. Prompts are versioned artifacts
A prompt is a first-class aggregate. Every published version has:
- system (the system message),
- userTemplate (Mustache-style; placeholders match the inputSchema),
- inputSchema and outputSchemaJson (JSON Schema draft 2020-12),
- safetyHints (e.g. "do not reveal tenant id"),
- fewShot (array of { input, output }),
- evalSuiteId,
- metadata.cost_class, metadata.tone, metadata.locale_targets.
Promotion is two-phase: draft → candidate (via lint + author review) → active (via eval pass + admin approval). Promotion is roll-forward only: an active version is never mutated in place.
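The two phases can be sketched as a small state function. The stage names and the four conditions come from the sentence above; representing them as boolean flags in a `PromotionEvidence` object is an assumption.

```typescript
type PromptStage = "draft" | "candidate" | "active";

// Hypothetical evidence record; each flag stands for one gate named in the text.
interface PromotionEvidence {
  lintPassed: boolean;
  authorReviewed: boolean;
  evalPassed: boolean;
  adminApproved: boolean;
}

// Advances at most one stage per call and never moves backwards.
function nextStage(stage: PromptStage, e: PromotionEvidence): PromptStage {
  if (stage === "draft") {
    return e.lintPassed && e.authorReviewed ? "candidate" : "draft";
  }
  if (stage === "candidate") {
    return e.evalPassed && e.adminApproved ? "active" : "candidate";
  }
  // Roll-forward only: an active version stays as it is.
  return "active";
}

const reviewedOnly: PromotionEvidence = {
  lintPassed: true,
  authorReviewed: true,
  evalPassed: false,
  adminApproved: false,
};
console.log(nextStage("draft", reviewedOnly));     // candidate
console.log(nextStage("candidate", reviewedOnly)); // candidate
```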
7. PromptVersion → Capability binding
Capabilities reference prompts by promptId + a routing rule. The active version is resolved at request time via:
const promptVersion = await prompts.resolveActive(capability.promptId, tenantId, abAssignmentFor(tenantId, capability.id));
This indirection is what allows A/B testing without changing capability definitions.
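How abAssignmentFor assigns tenants is not specified here. One plausible approach, sketched below purely as an assumption, is a deterministic hash bucket so that a tenant always lands in the same arm for a given capability. The `treatmentPercent` parameter, the `RoutingRule` shape, and `resolveVersionId` are hypothetical and do not reflect the real signatures.

```typescript
type AbArm = "control" | "treatment";

// Small deterministic string hash (polynomial rolling hash kept in 32 bits).
function hashString(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Same tenant + capability always yields the same bucket in [0, 100).
function abAssignmentFor(tenantId: string, capabilityId: string, treatmentPercent: number = 10): AbArm {
  const bucket = hashString(tenantId + ":" + capabilityId) % 100;
  return bucket < treatmentPercent ? "treatment" : "control";
}

// Hypothetical routing rule: which prompt version each arm resolves to.
interface RoutingRule {
  control: string;
  treatment?: string;
}

function resolveVersionId(rule: RoutingRule, arm: AbArm): string {
  return arm === "treatment" && rule.treatment !== undefined ? rule.treatment : rule.control;
}

const assignedArm = abAssignmentFor("tenant-42", "message.draft");
console.log(resolveVersionId({ control: "v7", treatment: "v8" }, assignedArm));
```

Because the capability only stores a promptId and a rule, changing the split or the treatment version touches the rule alone, which is the indirection described above.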
8. Provenance is mandatory
Every persisted AI artifact in the platform must carry an AIProvenance value object. The service writes the row in provenances and emits the inference event with the same payload. Persistence in any service that omits provenance is a MELMASTOON.AI.PROVENANCE_MISSING failure raised by the persistence-layer guard. See docs/08-ai-architecture.md §6.
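A persistence-layer guard of this kind could be as small as the sketch below. The error code is taken from the paragraph above; the fields of `AIProvenance` shown here and the helper names are assumptions, and the authoritative shape is whatever docs/08-ai-architecture.md §6 defines.

```typescript
// Assumed minimal provenance shape; the real value object likely has more fields.
interface AIProvenance {
  capabilityKey: string;
  promptVersionId: string;
  model: string;
}

type CodedError = Error & { code: string };

function provenanceMissing(artifactKind: string): CodedError {
  const err = new Error(artifactKind + " persisted without AIProvenance") as CodedError;
  err.code = "MELMASTOON.AI.PROVENANCE_MISSING";
  return err;
}

// Call before writing any AI-produced artifact; throws if provenance is absent.
function assertHasProvenance<T extends { provenance?: AIProvenance }>(
  artifactKind: string,
  artifact: T
): T & { provenance: AIProvenance } {
  if (!artifact.provenance) {
    throw provenanceMissing(artifactKind);
  }
  return artifact as T & { provenance: AIProvenance };
}

const draftWithProvenance = assertHasProvenance("MessageDraft", {
  body: "Welcome back!",
  provenance: { capabilityKey: "message.draft", promptVersionId: "pv-1", model: "gemini-1.5-flash" },
});
console.log(draftWithProvenance.provenance.model);
```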