ai-orchestrator-service

Bounded Context: AI (Core) · Owner: AI Platform squad · Phase: 0 (gateway + minimal use cases) → 4 (self-tuning) · Storage: Cloud SQL Postgres + pgvector + Memorystore Redis + GCS · Bundle: services/ai-orchestrator-service/ · Canonical AI thesis: 08 AI Architecture

ai-orchestrator-service is the single AI gateway of Ghasi Melmastoon — the multi-tenant hotel SaaS platform whose backoffice is an Electron offline-first desktop app and whose cloud is GCP with Vertex AI as the primary model provider. No other service is allowed to import a model SDK (@google-cloud/vertexai, @anthropic-ai/sdk, openai, onnxruntime-node). Every AI capability — dynamic pricing suggestion, demand forecast, housekeeping route optimization, anomaly detection, upsell, smart guest message draft, review summarization, OCR for ID scan, voice transcription, description generation, translation drafts, AI tutor — funnels through this service via REST or Pub/Sub event request/reply.

The service owns the capability catalog, prompt registry (with semver versioning, A/B rollout, deprecation policy), model catalog (cloud + edge), provider routing (Vertex AI primary, Anthropic + OpenAI fallback adapters, ONNX Runtime Node on Electron for edge), cost & budget control (per-tenant token caps, soft + hard, per-feature quotas), content moderation (pre + post), PII redaction, eval harness (golden sets, A/B promotion gates), RAG over per-tenant pgvector namespaces, provenance metadata generation (AIProvenance stamped on every artifact), HITL gate orchestration (request, capture decision, audit), and the edge model manifest — the signed list of ONNX models packaged with the Electron installer with SHA-256 + signature integrity check at every load.
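The edge manifest check described above (SHA-256 digest plus signature, verified at every load) can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `ManifestEntry` and `SignedManifest` shapes, the `verifyModel` helper, and the choice of Ed25519 are assumptions, since the document does not specify the manifest schema or signature algorithm. The demo uses a throwaway key pair standing in for the Melmastoon release key.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical manifest shapes; the real schema is not shown in this document.
interface ManifestEntry {
  file: string;   // model file name inside the installer
  sha256: string; // lowercase hex digest of the model bytes
}

interface SignedManifest {
  payload: string;   // JSON string of ManifestEntry[]
  signature: string; // base64 signature over `payload` (Ed25519 assumed)
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Returns true only if the manifest signature verifies AND the model bytes
// match the digest listed for `file`. Any failure means "refuse to load".
function verifyModel(
  manifest: SignedManifest,
  releaseKey: KeyObject,
  file: string,
  bytes: Uint8Array,
): boolean {
  const signatureOk = verify(
    null, // Ed25519 takes no separate digest algorithm
    Buffer.from(manifest.payload),
    releaseKey,
    Buffer.from(manifest.signature, "base64"),
  );
  if (!signatureOk) return false;
  const entries = JSON.parse(manifest.payload) as ManifestEntry[];
  const entry = entries.find((e) => e.file === file);
  return entry !== undefined && entry.sha256 === sha256Hex(bytes);
}

// Demo: a throwaway key pair and fake model bytes.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const modelBytes = new TextEncoder().encode("fake onnx bytes");
const payload = JSON.stringify([{ file: "demo.onnx", sha256: sha256Hex(modelBytes) }]);
const manifest: SignedManifest = {
  payload,
  signature: sign(null, Buffer.from(payload), privateKey).toString("base64"),
};
console.log(verifyModel(manifest, publicKey, "demo.onnx", modelBytes));
```

Checking the signature before parsing the payload means a tampered manifest is rejected before any of its contents are trusted.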

Purpose

  • Be the only path between Melmastoon and any model provider. CI fails any service whose dependency graph reaches a model SDK outside this service.
  • Enforce uniform provenance, moderation, redaction, budget, cache, and eval so a feature team building "review summarization" cannot ship a regression on those concerns.
  • Provide a typed, discoverable capability catalog (one row per capability with prompt, model, latency target, cost class, fallback chain, HITL gate config, eval suite) so other services and BFFs request inference by capability id, never by model name.
  • Carry the edge AI surface for the Electron desktop: ship signed ONNX models with the installer, expose window.melmastoon.ai.infer(capability, input) via preload, replay edge-inference audit events on next sync.
  • Run the eval harness that gates any prompt or model rollout (draft → 5% A/B → active), and surface drift on the active set.
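To illustrate "request inference by capability id, never by model name", here is a small sketch of a typed catalog lookup. The `CapabilityRow` field names, the sample rows, and the `resolveCapability` helper are all hypothetical; the document lists the attributes a row carries but not their schema.

```typescript
// Illustrative row shape; attribute list follows the text, field names are assumed.
interface CapabilityRow {
  id: string;
  promptId: string;
  defaultModel: string;
  latencyTargetMs: number;
  costClass: "low" | "medium" | "high";
  fallbackChain: string[];
  hitlRequired: boolean;
  evalSuiteId: string;
}

// Two sample rows with made-up identifiers and values.
const catalog = new Map<string, CapabilityRow>([
  ["message.draft", {
    id: "message.draft",
    promptId: "prm_example_message",
    defaultModel: "gemini-1.5-flash",
    latencyTargetMs: 1500,
    costClass: "low",
    fallbackChain: ["vertex", "anthropic", "onnx-edge"],
    hitlRequired: true, // guest-message drafts are always HITL
    evalSuiteId: "eva_example_message",
  }],
  ["review.summarize", {
    id: "review.summarize",
    promptId: "prm_example_review",
    defaultModel: "gemini-1.5-flash",
    latencyTargetMs: 1500,
    costClass: "low",
    fallbackChain: ["vertex", "anthropic"],
    hitlRequired: false,
    evalSuiteId: "eva_example_review",
  }],
]);

// Callers pass a capability id; a model name is simply not a valid key.
function resolveCapability(capabilityId: string): CapabilityRow {
  const row = catalog.get(capabilityId);
  if (row === undefined) {
    throw new Error(`unknown capability: ${capabilityId}`);
  }
  return row;
}

console.log(resolveCapability("message.draft").defaultModel);
```

Because the model is a property of the row, a model swap changes the catalog and nothing in the calling services.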

Key responsibilities

  1. Capability catalog management — versioned registry of every AI capability (pricing.suggest, housekeeping.route, anomaly.detect, upsell.recommend, message.draft, review.summarize, ocr.id_scan, stt.transcribe, description.generate, translation.draft, tutor.answer, etc.) with all attributes pinned at the gateway.
  2. Prompt registry with semver, eval suite reference, ownership, deprecation timeline (active → deprecated ≥14 d → retired).
  3. Model routing: pickProvider(capability, context) chooses cloud vs edge, with a per-capability fallback chain executed on provider error or unhealthy circuit.
  4. Per-tenant + per-feature cost control — soft cap (warn at 80%) + hard cap (degrade to deterministic fallback at 100%); per-feature sub-budgets.
  5. RAG — per-tenant pgvector namespaces; HNSW indexes; cross-tenant query path strictly forbidden by RLS + session GUC + assertion.
  6. Provenance — every persisted artifact carries AIProvenance; CI gate refuses persistence in any sibling service that omits it.
  7. HITL gate orchestration — opens a HitlGate row, notifies the right role, captures the HitlDecision (accept/modify/reject) within SLA, audits both the request and the decision.
  8. Content moderation — pre-call on input, post-call on output; blocked outputs return deterministic fallback and emit melmastoon.ai_orchestrator.moderation.flagged.v1.
  9. Embedding generation — batched via Vertex text-embedding-004 (cloud) or all-MiniLM-L6-v2 (edge); written to embeddings_* per-tenant tables.
  10. Eval harness — golden sets per capability; precision/recall + acceptance metrics; A/B promotion gate; drift alerts.
  11. Edge model manifest — signed JSON manifest of ONNX models packaged with the Electron installer; integrity verified on every model load by the desktop main process.
  12. Telemetry — token counts, latency, cost, cache hits per (tenant_id, capability, model) to BigQuery melmastoon_analytics_prod.ai_calls_fact.
  13. Cache — per-tenant prompt+input hash cache in Memorystore; TTL per capability; cache hit returns instantly with cacheHit: true provenance.
  14. GDPR participation — purges per-tenant embeddings + RAG corpora + cached prompt artifacts on melmastoon.tenant.guest.erasure_requested.v1 within 7 days.
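The soft and hard cap rule from item 4 can be expressed as a small pure function. This is a sketch under assumptions: the names `evaluateBudget` and `effectiveState` are hypothetical, and treating the stricter of the tenant and per-feature states as the effective one is one plausible reading of "per-feature sub-budgets".

```typescript
type BudgetState = "ok" | "warning" | "exceeded";

const SOFT_CAP_RATIO = 0.8; // warn at 80% of the cap

// Classifies current burn against a cap: warning from 80%, exceeded from 100%.
function evaluateBudget(used: number, cap: number): BudgetState {
  if (cap <= 0 || used >= cap) return "exceeded";
  return used / cap >= SOFT_CAP_RATIO ? "warning" : "ok";
}

const SEVERITY: Record<BudgetState, number> = { ok: 0, warning: 1, exceeded: 2 };

// Assumption: a call is governed by whichever budget (tenant or feature) is worse.
function effectiveState(tenant: BudgetState, feature: BudgetState): BudgetState {
  return SEVERITY[tenant] >= SEVERITY[feature] ? tenant : feature;
}

const tenantState = evaluateBudget(850_000, 1_000_000); // 85% of the tenant cap
const featureState = evaluateBudget(40_000, 100_000);   // 40% of the feature cap
console.log(effectiveState(tenantState, featureState));
```

On `"exceeded"` the router would select the deterministic fallback registered for the capability, as described under edge cases.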

Hotel-specific shape

  • Edge inference is critical — target markets (Afghanistan, Tajikistan rural, Iran provinces, Pakistan KPK) routinely lose connectivity. The Electron desktop must keep producing message drafts, anomaly flags, and offline RAG answers when the cloud is unreachable.
  • Phi-3-mini-4k-instruct (INT4, ~2.4 GB) ships with the installer for offline drafting; all-MiniLM-L6-v2 (FP16) for offline embeddings; melmastoon-edge-anomaly-v3.onnx for booking/payment/lock anomaly heuristics; melmastoon-edge-hkt-v2.onnx for housekeeping route optimization; mobilenet-v3-small-image-quality.onnx for property photo upload triage.
  • All edge models are packaged with the installer (no first-launch download — onboarding may be offline), signed by the Melmastoon release key, verified on every load.
  • Cloud-first when available: same capability id, gateway picks Vertex AI Gemini family by default; falls back to edge only when explicitly requested or when cloud is unhealthy.
  • Multilingual by default: Pashto, Dari, and Arabic (RTL); English and French (LTR). Translation drafts always pass through a HITL gate.
  • Hard rules — edge inference never sees PCI data; never dispatches guest-facing messages without HITL; never executes destructive actions even when tutor is invoked.
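The cloud-first rule above can be sketched as a route planner. This is one interpretation, not the service's actual pickProvider: the `planRoute` name, the `RoutingContext` shape, and the reading that edge is used only when explicitly requested or when no cloud provider in the chain is healthy are assumptions.

```typescript
type ProviderId = "vertex" | "anthropic" | "openai" | "onnx-edge";
type CircuitState = "closed" | "open"; // closed = healthy, open = tripped

interface RoutingContext {
  forceEdge?: boolean; // caller explicitly requests edge inference
  health: Record<ProviderId, CircuitState>;
}

// Returns the providers to try, in order, for one capability's fallback chain.
function planRoute(fallbackChain: ProviderId[], ctx: RoutingContext): ProviderId[] {
  const edgeAvailable =
    fallbackChain.includes("onnx-edge") && ctx.health["onnx-edge"] === "closed";
  if (ctx.forceEdge) {
    return edgeAvailable ? ["onnx-edge"] : [];
  }
  // Cloud providers keep their chain order; unhealthy circuits are skipped.
  const healthyCloud = fallbackChain.filter(
    (p) => p !== "onnx-edge" && ctx.health[p] === "closed",
  );
  if (healthyCloud.length > 0) return healthyCloud;
  // Cloud is unhealthy: fall back to edge if the chain allows it.
  return edgeAvailable ? ["onnx-edge"] : [];
}

const chain: ProviderId[] = ["vertex", "anthropic", "onnx-edge"];
const allHealthy: Record<ProviderId, CircuitState> = {
  vertex: "closed", anthropic: "closed", openai: "closed", "onnx-edge": "closed",
};
console.log(planRoute(chain, { health: allHealthy }));
```

An empty result would mean no provider is eligible, at which point a caller would use the deterministic fallback for the capability.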

Aggregates owned

| Aggregate | Cardinality | Purpose | Identity prefix |
| --- | --- | --- | --- |
| Capability | 1 per capability id | Declarative row binding capability id → prompt template + default model + fallback chain + HITL config + eval suite + latency target + cost class | cap_ |
| Prompt | 1 per (domain, ordinal) | Logical prompt; carries pointer to active version | prm_ |
| PromptVersion | 1..N per Prompt | Immutable system + user template + output schema; draft → active → deprecated → retired | pmv_ |
| Model | 1 per registered model | Model catalog row (provider, modality, context, cost class, latency class) | mdl_ |
| ModelDeployment | 1 per active deployment | Per-region deployment + traffic share (used during model rollouts) | mdp_ |
| Provider | 1 per provider | vertex, anthropic, openai, onnx-edge health + circuit state | prv_ |
| InferenceRequest | 1 per call | Captured input hash + capability + tenant + caller; PII-redacted | ifr_ |
| InferenceResult | 1 per request | Captured output + provenance reference; PII-redacted | ifs_ |
| Provenance | 1 per result | AIProvenance row (model, prompt, tokens, cost, safety verdicts, decision) | prv_p_ |
| EvalSuite | 1 per suite | Golden set + scoring rubric per capability | eva_ |
| EvalRun | 1 per scheduled or ad-hoc run | Suite + prompt version + model + scores | evr_ |
| RAGCorpus | 1 per (tenant, namespace) | Logical corpus of policies / FAQ / SOPs / amenity catalog | rag_ |
| Embedding | 1 per chunk | 768-dim (cloud) or 384-dim (edge) vector + chunk text | (composite) |
| BudgetCounter | 1 per (tenant, period, scope) | Real-time token + cost burn vs cap | bdg_ |
| HitlGate | 1 per gated artifact | Open request to a human; carries SLA timer | hgt_ |
| HitlDecision | 1 per gate | accepted / modified / rejected + justification + reviewer | dec_ |
| EdgeModelManifest | 1 per published manifest | Signed JSON of ONNX models packaged with installer | emm_ |
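The PromptVersion lifecycle (draft → active → deprecated → retired, with at least 14 days in deprecated before retirement) can be sketched as a small state machine. The `PromptVersion` fields and the `transition` helper are hypothetical, and the eval gate that promotion also requires is not modeled here.

```typescript
type PromptState = "draft" | "active" | "deprecated" | "retired";

// The only legal successor of each state; retired is terminal.
const NEXT: Record<PromptState, PromptState | null> = {
  draft: "active",
  active: "deprecated",
  deprecated: "retired",
  retired: null,
};

const MIN_DEPRECATION_MS = 14 * 24 * 60 * 60 * 1000; // 14 days

interface PromptVersion {
  id: string;
  state: PromptState;
  deprecatedAt?: Date; // set when the version enters "deprecated"
}

// Returns a new version record in state `to`, or throws if the move is illegal.
function transition(v: PromptVersion, to: PromptState, now: Date): PromptVersion {
  if (NEXT[v.state] !== to) {
    throw new Error(`illegal transition ${v.state} -> ${to}`);
  }
  if (to === "retired") {
    const elapsed = v.deprecatedAt ? now.getTime() - v.deprecatedAt.getTime() : 0;
    if (elapsed < MIN_DEPRECATION_MS) {
      throw new Error("version must stay deprecated for at least 14 days");
    }
  }
  return { ...v, state: to, deprecatedAt: to === "deprecated" ? now : v.deprecatedAt };
}

const day = (n: number) => new Date(Date.UTC(2025, 0, 1 + n));
const active = transition({ id: "pmv_example", state: "draft" }, "active", day(0));
const deprecated = transition(active, "deprecated", day(1));
console.log(transition(deprecated, "retired", day(15)).state);
```

Returning a new record rather than mutating the input mirrors the table's description of prompt versions as immutable.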

Key APIs (REST, /api/v1/ai/*)

| Method | Path | Purpose | Auth |
| --- | --- | --- | --- |
| POST | /api/v1/ai/complete | Synchronous completion for a capability | service-to-service mTLS or BFF JWT |
| POST | /api/v1/ai/embed | Embedding generation (single or batch) | service-to-service |
| POST | /api/v1/ai/moderate | Standalone moderation pass | service-to-service |
| POST | /api/v1/ai/rag/query | RAG retrieval for a tenant corpus | service-to-service |
| POST | /api/v1/ai/vision | Vision capability (photo quality, OCR) | service-to-service |
| POST | /api/v1/ai/transcribe | STT (cloud or edge fallback) | service-to-service |
| GET | /api/v1/ai/capabilities | List capability catalog visible to caller | service or BFF |
| GET | /api/v1/ai/capabilities/:capabilityId | Capability detail | service or BFF |
| POST | /api/v1/ai/prompts | Create new prompt or version (admin) | platform admin |
| POST | /api/v1/ai/prompts/:id/promote | Promote draft to active after eval green | platform admin |
| POST | /api/v1/ai/prompts/:id/deprecate | Mark active row deprecated (≥14 d before retire) | platform admin |
| POST | /api/v1/ai/eval/runs | Trigger an eval run for a capability + prompt version + model | AI team |
| GET | /api/v1/ai/eval/runs/:runId | Eval run results | AI team |
| POST | /api/v1/ai/hitl/gates/:gateId/decision | Submit HITL decision (accept/modify/reject) | reviewer (RBAC) |
| GET | /api/v1/ai/hitl/gates | List open HITL gates for the caller's role + tenant | tenant member |
| GET | /api/v1/ai/budget | Per-tenant + per-feature budget snapshot | tenant owner / gm |
| GET | /api/v1/ai/edge-model-manifest | Current signed manifest (consumed by Electron installer + runtime check) | desktop device-bound |
| POST | /bff/backoffice/v1/ai/tutor/ask | AI tutor question (BFF entry) | tenant member |

Top events published

| Event | When |
| --- | --- |
| melmastoon.ai_orchestrator.inference.requested.v1 | On every accepted call |
| melmastoon.ai_orchestrator.inference.completed.v1 | On every successful return |
| melmastoon.ai_orchestrator.inference.failed.v1 | On call failure (provider error, schema invalid after repair) |
| melmastoon.ai_orchestrator.inference.cached_hit.v1 | On cache hit (cost = 0) |
| melmastoon.ai_orchestrator.suggestion.dynamic_pricing.v1 | Pricing suggestion produced |
| melmastoon.ai_orchestrator.suggestion.demand_forecast.v1 | Forecast produced |
| melmastoon.ai_orchestrator.suggestion.housekeeping_routing.v1 | Route suggested |
| melmastoon.ai_orchestrator.suggestion.shift_optimization.v1 | Shift schedule suggested |
| melmastoon.ai_orchestrator.anomaly.detected.v1 | Anomaly flagged |
| melmastoon.ai_orchestrator.upsell.recommended.v1 | Upsell produced |
| melmastoon.ai_orchestrator.message.drafted.v1 | Guest-message draft produced (always HITL) |
| melmastoon.ai_orchestrator.review.summarized.v1 | Review summary produced |
| melmastoon.ai_orchestrator.ocr.completed.v1 | OCR + structured extraction returned |
| melmastoon.ai_orchestrator.transcription.completed.v1 | STT returned |
| melmastoon.ai_orchestrator.description.drafted.v1 | Description draft produced |
| melmastoon.ai_orchestrator.translation.drafted.v1 | Translation draft produced |
| melmastoon.ai_orchestrator.hitl.gate_opened.v1 | HITL gate opened, SLA timer started |
| melmastoon.ai_orchestrator.hitl.gate_decided.v1 | Reviewer accepted / modified / rejected |
| melmastoon.ai_orchestrator.budget.warning.v1 | 80% soft cap crossed |
| melmastoon.ai_orchestrator.budget.exceeded.v1 | 100% hard cap crossed; degraded to deterministic fallback |
| melmastoon.ai_orchestrator.eval.run_completed.v1 | Scheduled or ad-hoc eval finished |
| melmastoon.ai_orchestrator.prompt.version_published.v1 | Prompt version promoted to active |
| melmastoon.ai_orchestrator.model.deployment_changed.v1 | Model traffic shift or deployment activation |
| melmastoon.ai_orchestrator.edge_model.manifest_updated.v1 | New signed manifest published for installer |
| melmastoon.ai_orchestrator.moderation.flagged.v1 | Pre or post moderation blocked content |

Top events consumed

| Event | Triggers capability |
| --- | --- |
| melmastoon.reservation.booking.confirmed.v1 | upsell.recommend (immediate + pre-arrival), anomaly.detect |
| melmastoon.reservation.booking.cancelled.v1 | anomaly.detect (cancellation pattern) |
| melmastoon.iam.user.login_failed.v1 | anomaly.detect (credential stuffing) |
| melmastoon.payment.transaction.failed.v1 | anomaly.detect (payment fraud signal) |
| melmastoon.payment.intent.captured.v1 | anomaly.detect (rapid-fire pattern) |
| melmastoon.lock_integration.key_credential.issued.v1 | anomaly.detect (key issuance pattern) |
| melmastoon.lock_integration.key_credential.not_returned.v1 | anomaly.detect (key-not-returned) |
| melmastoon.housekeeping.task.assigned.v1 (batch) | housekeeping.route |
| melmastoon.inventory.allocation.committed.v1 (occupancy ≥ 70%) | pricing.suggest |
| melmastoon.tenant.guest.erasure_requested.v1 | Purge embeddings + cached artifacts (saga participant) |
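The event-to-capability mapping in this table could be held as a dispatch table. The sketch below is illustrative only: `capabilitiesFor` and the `occupancy` payload field (a fraction between 0 and 1) are assumptions, and the erasure event is left out because it triggers a purge rather than a capability.

```typescript
// Consumed event type -> capability ids it triggers (from the table above).
const TRIGGERS = new Map<string, string[]>([
  ["melmastoon.reservation.booking.confirmed.v1", ["upsell.recommend", "anomaly.detect"]],
  ["melmastoon.reservation.booking.cancelled.v1", ["anomaly.detect"]],
  ["melmastoon.iam.user.login_failed.v1", ["anomaly.detect"]],
  ["melmastoon.payment.transaction.failed.v1", ["anomaly.detect"]],
  ["melmastoon.payment.intent.captured.v1", ["anomaly.detect"]],
  ["melmastoon.lock_integration.key_credential.issued.v1", ["anomaly.detect"]],
  ["melmastoon.lock_integration.key_credential.not_returned.v1", ["anomaly.detect"]],
  ["melmastoon.housekeeping.task.assigned.v1", ["housekeeping.route"]],
]);

const PRICING_EVENT = "melmastoon.inventory.allocation.committed.v1";
const OCCUPANCY_THRESHOLD = 0.7; // pricing.suggest only fires at occupancy >= 70%

// Hypothetical payload shape: only the field this sketch needs.
interface EventPayload {
  occupancy?: number; // fraction in [0, 1]
}

function capabilitiesFor(eventType: string, payload: EventPayload = {}): string[] {
  if (eventType === PRICING_EVENT) {
    return (payload.occupancy ?? 0) >= OCCUPANCY_THRESHOLD ? ["pricing.suggest"] : [];
  }
  return TRIGGERS.get(eventType) ?? [];
}

console.log(capabilitiesFor(PRICING_EVENT, { occupancy: 0.82 }));
```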

Storage

  • Cloud SQL Postgres (HA, regional) with the pgvector extension: capability catalog, prompt registry, model catalog, inference + result audit, provenance, HITL gates + decisions, budget counters, eval suites + runs, RAG corpora + chunks + embeddings (HNSW indexes per tenant namespace).
  • Memorystore Redis (HA): prompt+input hash cache (per-tenant keyspace), result cache, hot capability config, in-flight HITL SLA timers, rate limiters.
  • GCS bucket gs://melmastoon-ai-artifacts-<env>: eval datasets (versioned), large prompt fixtures, model artifacts (signed ONNX models for edge installer), RAG source documents pre-chunking.
  • BigQuery melmastoon_analytics_prod.ai_calls_fact: long-tail analytics; per-call fact row (token counts, latency, cost) for cost dashboards and capacity planning.
  • Vertex AI: cloud inference + embeddings + Document AI for OCR + Speech-to-Text; private VPC connectivity; CMEK; region pinned to nearest available (me-central1 preferred for AF/IR; europe-west4 failover).
  • ONNX Runtime Node: only on the Electron desktop main process (@ghasi/app-desktop-backoffice); never inside this service.
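As an illustration of the per-tenant prompt+input hash cache kept in Memorystore, the following sketch derives a cache key. The key layout, the `cacheKey` helper, and the inclusion of the prompt version id are assumptions; a production version would also need a canonical serializer, since `JSON.stringify` is sensitive to object key order.

```typescript
import { createHash } from "node:crypto";

// Builds a tenant-scoped cache key from the prompt version and the call input.
// The "ai:cache:" prefix and segment order are illustrative, not the real layout.
function cacheKey(
  tenantId: string,
  capabilityId: string,
  promptVersionId: string,
  input: unknown,
): string {
  const digest = createHash("sha256")
    .update(promptVersionId)
    .update("\n") // separator so adjacent fields cannot run together
    .update(JSON.stringify(input))
    .digest("hex");
  return `ai:cache:${tenantId}:${capabilityId}:${digest}`;
}

const keyA = cacheKey("tenant_a", "review.summarize", "pmv_example", { reviewIds: [1, 2, 3] });
const keyB = cacheKey("tenant_b", "review.summarize", "pmv_example", { reviewIds: [1, 2, 3] });
console.log(keyA !== keyB); // identical input, different tenants, different keys
```

Putting the tenant id in the key prefix keeps each tenant in its own keyspace, so identical inputs from two tenants never share a cached result.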

Multi-tenancy & residency

  • Every persisted row carries tenant_id + RLS policy <table>_tenant_isolation; session GUC app.tenant_id is set on every connection.
  • Per-tenant residency preference (region_pin) honored by the router; tenants pinned to me-central1 never have their data egressed to us-*.
  • Per-tenant model preference (Plus + Enterprise plans) honored — e.g., "Anthropic-only" tenants never route to OpenAI.
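The residency and provider-preference rules can be read as a filter applied before routing. The sketch below assumes a deliberately strict interpretation, where a pinned tenant is only eligible for deployments in exactly the pinned region; the `Deployment` and `TenantPolicy` shapes, the `eligibleDeployments` helper, and the sample deployments are hypothetical.

```typescript
interface Deployment {
  provider: string; // e.g. "vertex", "anthropic", "openai"
  region: string;   // e.g. "me-central1"
}

interface TenantPolicy {
  regionPin?: string;          // residency preference, if any
  allowedProviders?: string[]; // model preference, if any
}

// Keeps only deployments that satisfy both the region pin and the provider allowlist.
function eligibleDeployments(all: Deployment[], policy: TenantPolicy): Deployment[] {
  return all.filter(
    (d) =>
      (policy.regionPin === undefined || d.region === policy.regionPin) &&
      (policy.allowedProviders === undefined || policy.allowedProviders.includes(d.provider)),
  );
}

// Sample deployments, invented for the example.
const deployments: Deployment[] = [
  { provider: "vertex", region: "me-central1" },
  { provider: "vertex", region: "us-central1" },
  { provider: "anthropic", region: "europe-west4" },
  { provider: "openai", region: "us-central1" },
];

console.log(eligibleDeployments(deployments, { regionPin: "me-central1" }));
```

Filtering first means the fallback chain can never select a deployment the tenant's policy excludes, even when every preferred provider is unhealthy.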

Edge cases & invariants

  • Budget hard-cap exceeded → router selects the deterministic fallback registered for the capability (template fill, rule-based ranker, BAR baseline) and stamps costUsd: 0, model: 'fallback-deterministic'.
  • Provider down → fallback chain executed in order; melmastoon.ai_orchestrator.inference.failed.v1 emitted only after the chain is exhausted.
  • HITL gate timeout → conservative default applied (reject the AI suggestion; decision: 'rejected', reason: 'timeout'); hitl.gate_decided.v1 emitted with auto: true.
  • Edge model integrity fail → ONNX Runtime refuses to load; capability falls back to cloud or deterministic; emits model.deployment_changed.v1 with degradation: 'edge_signature_invalid'.
  • Prompt-injection attempt → input length cap + system prompt isolation + output schema validation; offending output replaced with deterministic fallback; emits moderation.flagged.v1.
  • RAG cross-tenant leak attempt → namespace assertion fails before query reaches pgvector; returns MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE; pages on-call.
  • A/B routing → draft prompt versions get exactly 5% sticky-by-tenant traffic; promotion requires green eval + 7-day production metric review.
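Sticky-by-tenant A/B assignment is commonly done by hashing the tenant id into a fixed bucket. The sketch below assumes that approach; the document does not say how the 5% cohort is chosen, and `inDraftCohort` is a hypothetical helper. Hash bucketing yields roughly 5% of tenants, so hitting exactly 5% of traffic would need an explicit cohort list or traffic weighting.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII ids).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash;
}

// A tenant sees the draft version when its bucket (0..99) falls below `percent`.
// Salting with the prompt id gives each experiment an independent cohort.
function inDraftCohort(tenantId: string, promptId: string, percent = 5): boolean {
  return fnv1a(`${promptId}:${tenantId}`) % 100 < percent;
}

const first = inDraftCohort("tenant_a", "prm_example");
const second = inDraftCohort("tenant_a", "prm_example");
console.log(first === second); // the same tenant always lands in the same arm
```

Because the assignment depends only on the ids, it needs no stored state and survives restarts, which is what makes it sticky.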

Non-functional targets

| Concern | Target |
| --- | --- |
| Availability (cloud gateway) | 99.9% monthly |
| Latency p95 (gemini-1.5-flash capabilities) | < 1.5 s end-to-end (gateway overhead + provider) |
| Latency p95 (edge phi-3-mini on M1/i7 baseline) | < 4 s |
| Edge model load (cold) | < 1.5 s for MiniLM, < 8 s for Phi-3-mini INT4 |
| Cache hit rate (target) | ≥ 35% across all capabilities |
| Provenance completeness | 100% of persisted artifacts |
| Eval drift detection latency | ≤ 24 h on active prompts |
| GDPR purge SLA | 7 days from tenant.guest.erasure_requested.v1 |
| Budget enforcement accuracy | ≤ 1% over hard cap before degradation |
| Cross-tenant leak | 0 incidents (CI + RLS + assertion) |