ai-orchestrator-service — AI Integration

Companion to: SERVICE_OVERVIEW.md · 08 AI Architecture

This is a meta-integration doc: the AI service describes how it integrates with itself. Everything else flows through this service, so the integration here covers (a) self-evaluation, (b) the dog-fooded "AI tutor for backoffice" capability, (c) how this service uses its own RunInferenceUseCase to power the prompt linter and capability registration assistant, and (d) the canonical capability catalog.

1. Capability catalog (canonical)

Every capability is a row in the capabilities table and is registered via the admin API. Its columns are mirrored in DOMAIN_MODEL.md, and the canonical detailed list is in docs/08-ai-architecture.md. Summary view:

Capability key	Bounded context	Primary cloud model	Edge model	Fallback chain	HITL	p95 latency	Cost class	Output schema	Eval suite
`pricing.suggest`	Pricing	`gemini-1.5-flash`	—	claude-sonnet → gpt-4o-mini → deterministic	yes if Δ > 5% from BAR	1.5 s	M	`PricingSuggestion`	`EVAL_PRICING_001`
`pricing.demand_forecast`	Pricing	tabular + `gemini-1.5-flash-8b`	—	last-year-naive	no	1 h batch	L	`DemandForecast`	`EVAL_FORECAST_001`
`housekeeping.route_optimize`	Housekeeping	OR-tools + `gemini-1.5-flash` (explanations)	OR-tools (deterministic)	none	no	800 ms	L	`RoutePlan`	`EVAL_HK_001`
`staff.shift_optimize`	Staff	`gemini-1.5-flash`	—	deterministic LP	no	2 s	M	`ShiftPlan`	`EVAL_STAFF_001`
`anomaly.detect`	Multi (IAM, Payment)	`gemini-1.5-flash` + features	logistic-regression on-device	rule-based	no	600 ms	L	`AnomalySignal`	`EVAL_ANOMALY_001`
`upsell.recommend`	Reservation	`gemini-1.5-flash`	—	top-N popular per segment	yes (auto-send enabled only after 3 successful sends have been reviewed)	900 ms	M	`UpsellList`	`EVAL_UPSELL_001`
`message.draft`	Notification	`gemini-1.5-flash`	`phi-3-mini-4k-instruct` (int4)	claude-sonnet → template	yes for guest-facing first send per channel/template	2 s online / 4 s edge	M	`MessageDraft`	`EVAL_MSG_001`
`review.summarize`	Reservation/CRM	`gemini-1.5-flash`	—	extractive heuristic	no	3 s	M	`ReviewSummary`	`EVAL_REVIEW_001`
`vision.id_ocr`	IAM/Reservation	`gemini-1.5-pro-vision`	—	tesseract + manual	yes for low-confidence	3 s	H	`IdDocFields`	`EVAL_OCR_001`
`audio.transcribe`	Notification/Calls	`chirp-2` (Vertex Speech)	—	whisper-tiny on edge / queue for cloud	no	1× real-time	M	`Transcript`	`EVAL_ASR_001`
`content.description`	Property	`gemini-1.5-flash`	—	manual	yes (publish gate)	4 s	M	`RoomDescription`	`EVAL_DESC_001`
`content.translate`	Property/Notification	`gemini-1.5-flash`	—	DeepL → manual	yes (publish gate)	2 s	M	`Translation`	`EVAL_TRANS_001`
`tutor.answer`	Backoffice (any)	`gemini-1.5-flash` + RAG	`phi-3-mini-4k-instruct` + edge RAG	extractive snippet	no	3 s	M	`TutorAnswer`	`EVAL_TUTOR_001`
`internal.prompt_lint`	AI orchestrator (self)	`gemini-1.5-flash`	—	static linter	no	2 s	L	`PromptLintReport`	`EVAL_PROMPT_LINT_001`
`internal.capability_register_assist`	AI orchestrator (self)	`gemini-1.5-pro`	—	none	yes (always — admin reviews)	5 s	M	`CapabilityDraft`	`EVAL_CAP_REG_001`
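The Fallback chain column reads as an ordered list of providers tried after the primary model fails. Below is a minimal TypeScript sketch of that walk. It assumes each model, including a deterministic fallback, is reachable through a single async call function. CapabilityRow, ModelCall, and runWithFallback are hypothetical names, not the service's API, and the sketch ignores latency budgets and HITL.

```typescript
// Hypothetical, trimmed-down catalog row: only the routing-relevant fields.
interface CapabilityRow {
  key: string;
  primaryModel: string;
  fallbackChain: string[]; // tried in catalog order after the primary fails
}

// Assumption: every provider (or deterministic fallback) is one async call.
type ModelCall = (model: string, input: string) => Promise<string>;

interface RoutedResult {
  model: string; // the model that actually produced the output
  output: string;
  attempts: string[]; // every model tried, in order
}

// Try the primary model, then each fallback in order; return the first success.
// Throws only when every step in the chain has failed.
async function runWithFallback(
  cap: CapabilityRow,
  input: string,
  call: ModelCall,
): Promise<RoutedResult> {
  const chain = [cap.primaryModel, ...cap.fallbackChain];
  const attempts: string[] = [];
  for (const model of chain) {
    attempts.push(model);
    try {
      const output = await call(model, input);
      return { model, output, attempts };
    } catch {
      // Provider failed: fall through to the next entry in the chain.
    }
  }
  throw new Error(`all models failed for ${cap.key}: ${attempts.join(" -> ")}`);
}
```

In this sketch, timeouts against the p95 target would live inside the call function; the loop itself only encodes order.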

2. Self-integration: the prompt linter

When an admin posts a new prompt version (POST /api/v1/ai/prompts/:id/versions), the controller invokes the internal.prompt_lint capability on the candidate's system message, few-shot examples, and output schema. The linter's output (a PromptLintReport) determines the response:

interface PromptLintReport {
  severity: 'pass' | 'warn' | 'block';
  findings: Array<{
    code: 'INSTRUCTION_DRIFT' | 'PII_LEAK' | 'INJECTION_VECTOR' | 'OUTPUT_SCHEMA_INCONSISTENT' | 'TONE_DRIFT' | 'COST_INEFFICIENT' | 'AMBIGUOUS_INSTRUCTION';
    message: string;
    span?: { startLine: number; endLine: number };
    suggestedFix?: string;
  }>;
  approxTokens: { system: number; fewshot: number };
}

A report with severity = 'block' makes the endpoint return 422 MELMASTOON.AI.PROMPT_LINT_FAILED, and the draft is not persisted. This is the only place where the service uses AI to evaluate AI configuration.
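The severity-to-response mapping can be sketched as a pure function. This sketch assumes a success status of 201 and that 'warn' findings are returned alongside the persisted draft; the text above specifies neither. gateOnLint and saveDraft are hypothetical names.

```typescript
type Severity = "pass" | "warn" | "block";

interface LintFinding {
  code: string; // narrowed to a string union in the real PromptLintReport
  message: string;
}

interface PromptLintReport {
  severity: Severity;
  findings: LintFinding[];
  approxTokens: { system: number; fewshot: number };
}

interface HttpResult {
  status: number;
  body: { errorCode?: string; findings: LintFinding[]; persisted: boolean };
}

// 'block' -> 422 and nothing is saved; 'pass' or 'warn' -> save the draft.
function gateOnLint(report: PromptLintReport, saveDraft: () => void): HttpResult {
  if (report.severity === "block") {
    return {
      status: 422,
      body: { errorCode: "MELMASTOON.AI.PROMPT_LINT_FAILED", findings: report.findings, persisted: false },
    };
  }
  saveDraft(); // persistence only happens on the non-blocking path
  return { status: 201, body: { findings: report.findings, persisted: true } };
}
```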

3. Self-integration: capability registration assist

A new capability requires authoring 14 fields. The admin UI calls internal.capability_register_assist with a free-text description, and the model returns a CapabilityDraft: a partial Capability object with a proposed prompt skeleton, fallback chain, HITL config, eval suite seeds, latency target, and cost class. HITL is always required here: an AI engineer must approve the draft before it becomes a row in the capabilities table. The HITL gate carries policyKey: 'capability.register' and a 7-day timeout; an unreviewed draft is auto-rejected and is never approved by default.
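The auto-reject rule reduces to a small time comparison. The sketch below assumes the gate records when it opened and that the review endpoint refuses decisions after the timeout. HitlGate and effectiveDecision are illustrative names.

```typescript
type GateDecision = "pending" | "approved" | "rejected";

interface HitlGate {
  policyKey: string; // 'capability.register' for this flow
  openedAt: Date;
  timeoutMs: number;
  decision: GateDecision; // set by the reviewing AI engineer
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// A recorded decision wins; otherwise the gate auto-rejects once the timeout
// has elapsed. There is no path to approval without a reviewer.
function effectiveDecision(gate: HitlGate, now: Date): GateDecision {
  if (gate.decision !== "pending") return gate.decision;
  const elapsed = now.getTime() - gate.openedAt.getTime();
  return elapsed >= gate.timeoutMs ? "rejected" : "pending";
}

// Only an explicit approval lets the draft become a capabilities row.
function canRegister(gate: HitlGate, now: Date): boolean {
  return effectiveDecision(gate, now) === "approved";
}
```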

4. Eval harness: the AI service evaluating itself

The eval harness is described in detail in TESTING_STRATEGY.md. Key self-integration points:

Eval runs are themselves inference jobs routed through RunInferenceUseCase with purpose='eval', so they hit the same provider adapters and the same provenance pipeline as production traffic. They also take the same caching path, but set a request header that bypasses the cache.
A nightly job runs every active prompt version's eval suite against the shadow cloud model (the next-generation candidate) and emits melmastoon.ai_orchestrator.eval.run_completed.v1 with a delta report.
A "promote candidate" workflow is HITL-gated and fires only when the candidate beats the active version by at least the suite's delta_promote_threshold.
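Two of the points above lend themselves to a sketch: an eval request that shares the production request shape, and the promotion threshold check. The 'serve' purpose value, the x-ai-cache-bypass header name, and the assumption that suite scores are higher-is-better are all hypothetical.

```typescript
type Purpose = "serve" | "eval"; // 'serve' is a hypothetical name for normal traffic

interface InferenceRequest {
  capabilityKey: string;
  purpose: Purpose;
  inputs: Record<string, unknown>;
  headers: Record<string, string>;
}

const CACHE_BYPASS_HEADER = "x-ai-cache-bypass"; // hypothetical header name

// Same shape as production requests, so adapters and provenance are shared;
// only the purpose and the cache-bypass header differ.
function buildEvalRequest(capabilityKey: string, inputs: Record<string, unknown>): InferenceRequest {
  return { capabilityKey, purpose: "eval", inputs, headers: { [CACHE_BYPASS_HEADER]: "1" } };
}

interface EvalSuiteResult {
  suiteId: string;
  promptVersionId: string;
  score: number; // aggregate suite score, higher is better (assumption)
}

interface PromotionCheck {
  eligible: boolean; // may open the HITL "promote candidate" workflow
  delta: number;
}

// Eligible only if the candidate beats the active version by >= the threshold.
function checkPromotion(active: EvalSuiteResult, candidate: EvalSuiteResult, deltaPromoteThreshold: number): PromotionCheck {
  if (active.suiteId !== candidate.suiteId) {
    throw new Error("active and candidate were scored on different eval suites");
  }
  const delta = candidate.score - active.score;
  return { eligible: delta >= deltaPromoteThreshold, delta };
}
```

Eligibility here only opens the HITL workflow; a human still approves the actual promotion.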

5. AI tutor for backoffice (the dog-food capability)

The tutor capability (tutor.answer) demonstrates the entire stack end-to-end:

The user types a question in the backoffice help drawer (Electron renderer process).
The renderer calls POST /api/v1/ai/complete with capabilityKey: 'tutor.answer' and inputs: { question, context: { route, recentClicks }}.
The service performs RAG against the tenant's tutor namespace and the platform's kb_global namespace (read-only, owned by docs CI).
The response includes provenance + RAG citations.
If the desktop is offline, the renderer falls back to local phi-3-mini-4k-instruct against the bundled edge RAG (a curated subset of the tutor namespace).
In the offline case, the audit row is pushed to the cloud on the next sync (see SYNC_CONTRACT.md).
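The renderer-side branching above can be sketched with injected dependencies so it runs without Electron or a network. cloudComplete, edgeAnswer, isOnline, askTutor, and the AuditRow shape are hypothetical. The sketch also assumes the cloud path writes its own audit row server-side, so only offline answers are queued locally.

```typescript
interface TutorInputs {
  question: string;
  context: { route: string; recentClicks: string[] };
}

interface TutorAnswer {
  text: string;
  citations: string[]; // RAG citations
  source: "cloud" | "edge";
}

interface AuditRow {
  capabilityKey: string;
  source: "cloud" | "edge";
  at: string; // ISO-8601 timestamp
}

// Injected so the logic is testable; all three are hypothetical.
interface TutorDeps {
  cloudComplete: (capabilityKey: string, inputs: TutorInputs) => Promise<TutorAnswer>; // POST /api/v1/ai/complete
  edgeAnswer: (inputs: TutorInputs) => Promise<TutorAnswer>; // phi-3-mini + bundled edge RAG
  isOnline: () => boolean;
}

async function askTutor(
  inputs: TutorInputs,
  deps: TutorDeps,
  pendingAudit: AuditRow[], // local queue drained on the next sync
  now: Date = new Date(),
): Promise<TutorAnswer> {
  if (deps.isOnline()) {
    // Cloud path: the service runs RAG and records the audit row itself.
    return deps.cloudComplete("tutor.answer", inputs);
  }
  const answer = await deps.edgeAnswer(inputs);
  // Offline path: keep the audit row locally until the next sync.
  pendingAudit.push({ capabilityKey: "tutor.answer", source: "edge", at: now.toISOString() });
  return answer;
}
```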

6. Prompts are versioned artifacts

A prompt is a first-class aggregate. Every published version has:

system (the system message),
userTemplate (Mustache-style; placeholders match the inputSchema),
inputSchema and outputSchemaJson (JSON Schema draft 2020-12),
safetyHints (e.g. "do not reveal tenant id"),
fewShot (array of { input, output }),
evalSuiteId,
metadata.cost_class, metadata.tone, metadata.locale_targets.
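A userTemplate placeholder that is not declared in inputSchema can be caught before publish. The sketch below handles only plain {{name}} substitution: real Mustache also HTML-escapes {{name}} and supports sections and partials, which are omitted here. placeholders, undeclaredPlaceholders, and render are hypothetical names.

```typescript
interface InputSchema {
  type: "object";
  properties: Record<string, { type: string }>;
}

// {{ name }} with optional inner whitespace; no sections, partials, or escaping.
const PLACEHOLDER = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

// Placeholder names in order of first appearance.
function placeholders(template: string): string[] {
  const names: string[] = [];
  template.replace(PLACEHOLDER, (_match: string, name: string) => {
    if (names.indexOf(name) === -1) names.push(name);
    return "";
  });
  return names;
}

// Publish-time check: every placeholder must be a property of inputSchema.
function undeclaredPlaceholders(template: string, schema: InputSchema): string[] {
  return placeholders(template).filter((name) => !hasOwn(schema.properties, name));
}

// Request-time rendering: plain substitution, failing loudly on missing inputs.
function render(template: string, inputs: Record<string, string>): string {
  return template.replace(PLACEHOLDER, (_match: string, name: string) => {
    const value = hasOwn(inputs, name) ? inputs[name] : undefined;
    if (value === undefined) throw new Error(`missing input: ${name}`);
    return value;
  });
}
```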

Promotion is two-phase: draft → candidate (after lint and author review) → active (after an eval pass and admin approval). Promotion is roll-forward only: an active version is never mutated in place, and any change ships as a new version.
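The two-phase lifecycle and the roll-forward rule can be expressed as pure transitions over immutable records. promote, reviseActive, and GateResults are hypothetical names, and the sketch assumes the caller supplies the next free version number.

```typescript
type Stage = "draft" | "candidate" | "active";

interface PromptVersion {
  readonly promptId: string;
  readonly version: number;
  readonly stage: Stage;
  readonly system: string;
}

interface GateResults {
  lintPassed: boolean;
  authorReviewed: boolean;
  evalPassed: boolean;
  adminApproved: boolean;
}

// Advance one phase. Returns a new object; the input is never mutated.
function promote(v: PromptVersion, gates: GateResults): PromptVersion {
  if (v.stage === "draft") {
    if (!gates.lintPassed || !gates.authorReviewed) {
      throw new Error("draft -> candidate requires lint and author review");
    }
    return { ...v, stage: "candidate" };
  }
  if (v.stage === "candidate") {
    if (!gates.evalPassed || !gates.adminApproved) {
      throw new Error("candidate -> active requires an eval pass and admin approval");
    }
    return { ...v, stage: "active" };
  }
  throw new Error("active versions are immutable; publish a new version instead");
}

// Roll-forward: changing an active prompt means starting a new draft.
function reviseActive(active: PromptVersion, system: string, nextVersion: number): PromptVersion {
  if (active.stage !== "active") throw new Error("only active versions are revised this way");
  if (nextVersion <= active.version) throw new Error("versions only move forward");
  return { promptId: active.promptId, version: nextVersion, stage: "draft", system };
}
```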

7. PromptVersion → Capability binding

Capabilities reference prompts by promptId plus a routing rule. The active version is resolved at request time via:

const promptVersion = await prompts.resolveActive(capability.promptId, tenantId, abAssignmentFor(tenantId, capability.id));

This indirection is what allows A/B testing without changing capability definitions.
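One way abAssignmentFor could produce a stable assignment is to hash tenantId and capability.id into a bucket in [0, 100) and compare that bucket with a rollout percentage. The document does not specify how assignment works. The FNV-1a hash, the number-returning abAssignmentFor, PromptRouting, and resolveVersionId below are all assumptions for illustration.

```typescript
// 32-bit FNV-1a over UTF-16 code units (matches byte-wise FNV-1a for ASCII ids).
function fnv1a32(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits unsigned
  }
  return h >>> 0;
}

// Hypothetical: a stable bucket in [0, 100) for this tenant and capability.
function abAssignmentFor(tenantId: string, capabilityId: string): number {
  return fnv1a32(`${tenantId}:${capabilityId}`) % 100;
}

// Hypothetical routing rule stored alongside the capability's promptId.
interface PromptRouting {
  activeVersionId: string;
  experiment?: { candidateVersionId: string; rolloutPercent: number }; // 0..100
}

function resolveVersionId(routing: PromptRouting, bucket: number): string {
  const exp = routing.experiment;
  return exp && bucket < exp.rolloutPercent ? exp.candidateVersionId : routing.activeVersionId;
}
```

The bucket depends only on the two ids, so a tenant stays in the same arm across requests. Raising rolloutPercent only moves tenants from the active version to the candidate, never back.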

8. Provenance is mandatory

Every persisted AI artifact in the platform must carry an AIProvenance value object. This service writes the row to the provenances table and emits the inference event with the same payload. If any service tries to persist an AI artifact without provenance, the persistence-layer guard raises MELMASTOON.AI.PROVENANCE_MISSING. See docs/08-ai-architecture.md §6.
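A persistence-layer guard of the kind described above might look like the following. The AIProvenance fields listed here are placeholders, since the authoritative definition is in docs/08-ai-architecture.md §6. ProvenanceMissingError, assertProvenance, and persistWithGuard are hypothetical names.

```typescript
// Placeholder fields; see docs/08-ai-architecture.md §6 for the real shape.
interface AIProvenance {
  capabilityKey: string;
  promptVersionId: string;
  model: string;
  inferenceId: string;
  createdAt: string; // ISO-8601
}

interface AiArtifact {
  entity: string;
  provenance?: Partial<AIProvenance>;
}

class ProvenanceMissingError extends Error {
  readonly code = "MELMASTOON.AI.PROVENANCE_MISSING";
  constructor(entity: string) {
    super(`refusing to persist ${entity}: AIProvenance is missing or incomplete`);
    this.name = "ProvenanceMissingError";
  }
}

const REQUIRED_FIELDS: (keyof AIProvenance)[] = ["capabilityKey", "promptVersionId", "model", "inferenceId", "createdAt"];

// Reject artifacts with no provenance or with any required field empty.
function assertProvenance(artifact: AiArtifact): void {
  const p = artifact.provenance;
  if (p === undefined) throw new ProvenanceMissingError(artifact.entity);
  for (const field of REQUIRED_FIELDS) {
    const value = p[field];
    if (typeof value !== "string" || value.length === 0) throw new ProvenanceMissingError(artifact.entity);
  }
}

// The guard runs before the write callback, so nothing reaches storage unchecked.
function persistWithGuard<T extends AiArtifact>(artifact: T, write: (a: T) => void): void {
  assertProvenance(artifact);
  write(artifact);
}
```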
