
ai-orchestrator-service — AI Integration

Companion to: SERVICE_OVERVIEW.md · 08 AI Architecture

This is a meta-integration doc: the AI service describes how it integrates with itself. Everything else flows through this service, so the integration here covers (a) self-evaluation, (b) the dog-fooded "AI tutor for backoffice" capability, (c) how this service uses its own RunInferenceUseCase to power the prompt linter and capability registration assistant, and (d) the canonical capability catalog.

1. Capability catalog (canonical)

Every capability is a row in capabilities and is registered via the admin API. The columns below are mirrored in DOMAIN_MODEL.md and the canonical detailed list is in docs/08-ai-architecture.md. Summary view here:

| Capability key | Bounded context | Primary cloud model | Edge model | Fallback chain | HITL | p95 latency | Cost class | Output schema | Eval suite |
|---|---|---|---|---|---|---|---|---|---|
| pricing.suggest | Pricing | gemini-1.5-flash | | claude-sonnet → gpt-4o-mini → deterministic | yes if Δ > 5% from BAR | 1.5 s | M | PricingSuggestion | EVAL_PRICING_001 |
| pricing.demand_forecast | Pricing | tabular + gemini-1.5-flash-8b | | last-year-naive | no | 1 h batch | L | DemandForecast | EVAL_FORECAST_001 |
| housekeeping.route_optimize | Housekeeping | OR-tools + gemini-1.5-flash (explanations) | OR-tools (deterministic) | none | no | 800 ms | L | RoutePlan | EVAL_HK_001 |
| staff.shift_optimize | Staff | gemini-1.5-flash | | deterministic LP | no | 2 s | M | ShiftPlan | EVAL_STAFF_001 |
| anomaly.detect | Multi (IAM, Payment) | gemini-1.5-flash + features | logistic-regression on-device | rule-based | no | 600 ms | L | AnomalySignal | EVAL_ANOMALY_001 |
| upsell.recommend | Reservation | gemini-1.5-flash | | top-N popular per segment | yes (auto-send only after 3 successful auto-sends are reviewed) | 900 ms | M | UpsellList | EVAL_UPSELL_001 |
| message.draft | Notification | gemini-1.5-flash | phi-3-mini-4k-instruct (int4) | claude-sonnet → template | yes for guest-facing first send per channel/template | 2 s online / 4 s edge | M | MessageDraft | EVAL_MSG_001 |
| review.summarize | Reservation/CRM | gemini-1.5-flash | | extractive heuristic | no | 3 s | M | ReviewSummary | EVAL_REVIEW_001 |
| vision.id_ocr | IAM/Reservation | gemini-1.5-pro-vision | | tesseract + manual | yes for low-confidence | 3 s | H | IdDocFields | EVAL_OCR_001 |
| audio.transcribe | Notification/Calls | chirp-2 (Vertex Speech) | | whisper-tiny on edge / queue for cloud | no | 1× real-time | M | Transcript | EVAL_ASR_001 |
| content.description | Property | gemini-1.5-flash | | manual | yes (publish gate) | 4 s | M | RoomDescription | EVAL_DESC_001 |
| content.translate | Property/Notification | gemini-1.5-flash | | DeepL → manual | yes (publish gate) | 2 s | M | Translation | EVAL_TRANS_001 |
| tutor.answer | Backoffice (any) | gemini-1.5-flash + RAG | phi-3-mini-4k-instruct + edge RAG | extractive snippet | no | 3 s | M | TutorAnswer | EVAL_TUTOR_001 |
| internal.prompt_lint | AI orchestrator (self) | gemini-1.5-flash | | static linter | no | 2 s | L | PromptLintReport | EVAL_PROMPT_LINT_001 |
| internal.capability_register_assist | AI orchestrator (self) | gemini-1.5-pro | | none | yes (always; admin reviews) | 5 s | M | CapabilityDraft | EVAL_CAP_REG_001 |
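
The fallback chains in the catalog can be read as an ordered list of providers that are tried in turn. The following is a minimal sketch of that idea, not the service's actual routing code: the `Provider` shape and the `runWithFallback` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical provider shape; the real provider adapters may look different.
interface Provider<I, O> {
  name: string;
  run(input: I): Promise<O>;
}

interface FallbackResult<O> {
  output: O;
  servedBy: string; // name of the provider that produced the output
  failed: string[]; // providers that threw before one succeeded
}

// Try the primary first, then each fallback in order; throw only if all fail.
async function runWithFallback<I, O>(
  chain: Provider<I, O>[],
  input: I,
): Promise<FallbackResult<O>> {
  const failed: string[] = [];
  for (const provider of chain) {
    try {
      const output = await provider.run(input);
      return { output, servedBy: provider.name, failed };
    } catch {
      failed.push(provider.name);
    }
  }
  throw new Error(`all providers failed: ${failed.join(', ')}`);
}
```

Recording which provider served the request (and which ones failed) keeps the result usable for provenance, whichever link in the chain answered.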

2. Self-integration: the prompt linter

When an admin posts a new prompt version (POST /api/v1/ai/prompts/:id/versions), the controller invokes the internal.prompt_lint capability against the candidate's system + few-shot + output schema. The linter's output schema (PromptLintReport) drives the response:

interface PromptLintReport {
  // Overall verdict for the candidate prompt version.
  severity: 'pass' | 'warn' | 'block';
  findings: Array<{
    code:
      | 'INSTRUCTION_DRIFT'
      | 'PII_LEAK'
      | 'INJECTION_VECTOR'
      | 'OUTPUT_SCHEMA_INCONSISTENT'
      | 'TONE_DRIFT'
      | 'COST_INEFFICIENT'
      | 'AMBIGUOUS_INSTRUCTION';
    message: string;
    span?: { startLine: number; endLine: number };
    suggestedFix?: string;
  }>;
  // Approximate token counts for the system message and the few-shot examples.
  approxTokens: { system: number; fewshot: number };
}

When severity is 'block', the endpoint returns 422 with error code MELMASTOON.AI.PROMPT_LINT_FAILED and refuses to persist the draft. This is the only place where the service AI-evaluates AI configuration.
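
The mapping from lint severity to the HTTP outcome could be sketched as follows. The 422 status and the error code come from the text above; the `LintDecision` shape, the `decideFromLint` name, and the 201 status for accepted drafts are assumptions made for this example.

```typescript
type Severity = 'pass' | 'warn' | 'block';

// Hypothetical decision object returned to the controller.
interface LintDecision {
  httpStatus: number;
  persistDraft: boolean;
  errorCode?: string;
}

function decideFromLint(severity: Severity): LintDecision {
  if (severity === 'block') {
    // Blocked drafts are never persisted.
    return {
      httpStatus: 422,
      persistDraft: false,
      errorCode: 'MELMASTOON.AI.PROMPT_LINT_FAILED',
    };
  }
  // 'pass' and 'warn' both persist the draft; 201 is an assumed status.
  return { httpStatus: 201, persistDraft: true };
}
```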

3. Self-integration: capability registration assist

A new capability requires authoring 14 fields. The admin UI calls internal.capability_register_assist with a free-text description; the model returns a CapabilityDraft (a partial Capability object with a proposed prompt skeleton, fallback chain, HITL config, eval suite seeds, latency target, and cost class). HITL review is always required here: an AI engineer must approve the draft before it becomes a row in capabilities. The HITL gate carries policyKey: 'capability.register' and a 7-day timeout (auto-reject, no defaults).
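
The timeout behaviour of the gate might look like the sketch below. The `HitlGate` fields and the `gateStatus` function are hypothetical, and treating exactly 7 days as expired is an assumption; only the policy key, the 7-day window, and the auto-reject outcome come from the text.

```typescript
type GateStatus = 'pending' | 'approved' | 'rejected';

// Hypothetical gate record; the real HITL gate schema may differ.
interface HitlGate {
  policyKey: string;
  openedAt: Date;
  decision?: 'approved' | 'rejected'; // set by the reviewing AI engineer
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// An explicit human decision wins; otherwise the gate auto-rejects at the timeout.
function gateStatus(gate: HitlGate, now: Date): GateStatus {
  if (gate.decision) return gate.decision;
  const ageMs = now.getTime() - gate.openedAt.getTime();
  return ageMs >= SEVEN_DAYS_MS ? 'rejected' : 'pending';
}
```

There is no branch that approves by default: an unanswered gate can only stay pending or become rejected.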

4. Eval harness: the AI service evaluating itself

The eval harness is described in detail in TESTING_STRATEGY.md. Key self-integration points:

  • Eval runs are themselves inference jobs routed through RunInferenceUseCase with purpose='eval', so they use the same provider adapters, the same caching path (eval requests set a header that bypasses the cache), and the same provenance pipeline.
  • A nightly job runs every active prompt version's eval suite against the shadow cloud model (the next-generation candidate) and emits melmastoon.ai_orchestrator.eval.run_completed.v1 with a delta report.
  • A "promote candidate" workflow is HITL-gated and fires only when the candidate beats the active version by at least the suite's delta_promote_threshold.
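
The promotion precondition in the last bullet reduces to a single comparison. A minimal sketch, assuming each suite yields one numeric score per model (the `SuiteResult` shape and the score scale are not specified in this document):

```typescript
// Hypothetical summary of one eval suite run against both models.
interface SuiteResult {
  activeScore: number;
  candidateScore: number;
  deltaPromoteThreshold: number; // the suite's delta_promote_threshold
}

// True when the "promote candidate" workflow may be opened for human review.
// Passing this check does not promote anything by itself; the HITL gate still applies.
function qualifiesForPromotion(r: SuiteResult): boolean {
  return r.candidateScore - r.activeScore >= r.deltaPromoteThreshold;
}
```

If scores are fractional, comparing in integer units (for example percentage points) avoids floating-point rounding at the threshold boundary.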

5. AI tutor for backoffice (the dog-food capability)

The tutor capability (tutor.answer) demonstrates the entire stack end-to-end:

  1. The user types a question in the backoffice help drawer (Electron renderer process).
  2. The renderer calls POST /api/v1/ai/complete with capabilityKey: 'tutor.answer' and inputs: { question, context: { route, recentClicks }}.
  3. The service performs RAG against the tenant's tutor namespace + the platform's kb_global namespace (read-only, owned by docs CI).
  4. The response includes provenance + RAG citations.
  5. If the desktop is offline, the renderer falls back to local phi-3-mini-4k-instruct against the bundled edge RAG (a curated subset of the tutor namespace).
  6. The audit row is pushed to the cloud on next sync (see SYNC_CONTRACT.md).
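
Steps 2 to 5 can be sketched from the renderer's point of view as below. This is an illustration under assumptions: both backends are injected as plain functions, the `TutorAnswer` shape and `askTutor` name are hypothetical, and the type of `recentClicks` is guessed to be a string array.

```typescript
interface TutorRequest {
  capabilityKey: 'tutor.answer';
  inputs: { question: string; context: { route: string; recentClicks: string[] } };
}

// Hypothetical response shape; the real TutorAnswer schema is not shown here.
interface TutorAnswer {
  text: string;
  citations: string[];
  source: 'cloud' | 'edge';
}

type TutorBackend = (req: TutorRequest) => Promise<Omit<TutorAnswer, 'source'>>;

async function askTutor(
  req: TutorRequest,
  isOnline: boolean,
  cloud: TutorBackend, // would wrap POST /api/v1/ai/complete
  edge: TutorBackend,  // would wrap local phi-3-mini-4k-instruct + bundled edge RAG
): Promise<TutorAnswer> {
  if (isOnline) {
    return { ...(await cloud(req)), source: 'cloud' };
  }
  // Offline: answer locally; the audit row is synced later.
  return { ...(await edge(req)), source: 'edge' };
}
```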

6. Prompts are versioned artifacts

A prompt is a first-class aggregate. Every published version has:

  • system (the system message),
  • userTemplate (Mustache-style; placeholders match the inputSchema),
  • inputSchema and outputSchemaJson (JSON Schema draft 2020-12),
  • safetyHints (e.g. "do not reveal tenant id"),
  • fewShot (array of { input, output }),
  • evalSuiteId,
  • metadata.cost_class, metadata.tone, metadata.locale_targets.
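
As a rough illustration, the fields above could be typed as follows, together with a deliberately simplified renderer for the user template. The field types are assumptions, and `renderUserTemplate` is a hypothetical helper that handles only flat `{{name}}` placeholders, not the sections or partials of the full Mustache language.

```typescript
interface PromptVersion {
  system: string;
  userTemplate: string;                      // Mustache-style placeholders
  inputSchema: Record<string, unknown>;      // JSON Schema draft 2020-12
  outputSchemaJson: Record<string, unknown>; // JSON Schema draft 2020-12
  safetyHints: string[];
  fewShot: Array<{ input: unknown; output: unknown }>;
  evalSuiteId: string;
  metadata: { cost_class: 'L' | 'M' | 'H'; tone: string; locale_targets: string[] };
}

// Replace each {{name}} with the matching input; fail loudly on a missing one.
function renderUserTemplate(template: string, inputs: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    const value = inputs[key];
    if (value === undefined) throw new Error(`missing input: ${key}`);
    return value;
  });
}
```

Failing on a missing placeholder, rather than rendering an empty string, surfaces mismatches between userTemplate and inputSchema early.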

Promotion is two-phase: draft → candidate (via lint + author review) → active (via eval pass + admin approval). Roll-forward only: no in-place mutation of active.
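
The two-phase promotion can be modelled as a small state machine. A minimal sketch, assuming the four gates are available as booleans (`PromotionChecks` and `nextState` are names invented for this example):

```typescript
type PromptState = 'draft' | 'candidate' | 'active';

// Hypothetical summary of the gates named in the text.
interface PromotionChecks {
  lintPassed: boolean;
  authorReviewed: boolean;
  evalPassed: boolean;
  adminApproved: boolean;
}

// Returns the next state, or the current one if the gates for the next phase are not met.
function nextState(current: PromptState, c: PromotionChecks): PromptState {
  if (current === 'draft' && c.lintPassed && c.authorReviewed) return 'candidate';
  if (current === 'candidate' && c.evalPassed && c.adminApproved) return 'active';
  return current; // 'active' never moves back: roll-forward only
}
```

Each call advances at most one phase, so a draft cannot jump straight to active even when all four checks are true.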

7. PromptVersion → Capability binding

Capabilities reference prompts by promptId + a routing rule. The active version is resolved at request time via:

const promptVersion = await prompts.resolveActive(capability.promptId, tenantId, abAssignmentFor(tenantId, capability.id));

This indirection is what allows A/B testing without changing capability definitions.
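
One common way to make such an assignment stable is to hash the tenant and capability into a bucket. The sketch below is a hypothetical stand-in for abAssignmentFor, not the service's implementation; the arm names, the `treatmentPercent` parameter, and the FNV-1a-style hash are all assumptions.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; any stable hash would do here.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

type AbArm = 'control' | 'treatment';

// The same tenant + capability pair always lands in the same arm.
function abAssignmentFor(tenantId: string, capabilityId: string, treatmentPercent = 10): AbArm {
  const bucket = fnv1a(`${tenantId}:${capabilityId}`) % 100;
  return bucket < treatmentPercent ? 'treatment' : 'control';
}
```

Because the assignment depends only on its inputs, no per-tenant state has to be stored to keep a tenant on the same prompt version across requests.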

8. Provenance is mandatory

Every persisted AI artifact in the platform must carry an AIProvenance value object. The service writes the row in provenances and emits the inference event with the same payload. Any service that tries to persist an AI artifact without provenance fails with MELMASTOON.AI.PROVENANCE_MISSING, raised by the persistence-layer guard. See docs/08-ai-architecture.md §6.
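
A persistence-layer guard of this kind might be sketched as follows. The error code is taken from the text; the `AIProvenance` fields shown, the `ProvenanceMissingError` class, and `assertHasProvenance` are hypothetical and stand in for whatever the real guard defines.

```typescript
// Hypothetical fields; the real AIProvenance value object is defined in docs/08-ai-architecture.md.
interface AIProvenance {
  capabilityKey: string;
  promptVersionId: string;
  model: string;
}

class ProvenanceMissingError extends Error {
  readonly code = 'MELMASTOON.AI.PROVENANCE_MISSING';
  constructor(artifactType: string) {
    super(`AI artifact "${artifactType}" has no provenance`);
  }
}

// Call before persisting; throws instead of writing a row without provenance.
function assertHasProvenance(artifactType: string, provenance?: AIProvenance): AIProvenance {
  if (!provenance) throw new ProvenanceMissingError(artifactType);
  return provenance;
}
```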