
ai-orchestrator-service — API Contracts

Companion to: APPLICATION_LOGIC.md · EVENT_SCHEMAS.md · Standards: 05 API Design · ERROR_CODES · NAMING

REST surface under /api/v1/ai/*, plus a small BFF passthrough under /bff/backoffice/v1/ai/*. Service-to-service calls use mTLS; BFF calls use a JWT issued by iam-service carrying the claims tenant_id, user_id, roles, and surface. All error responses use the canonical error envelope from ERROR_CODES.md.

1. Common conventions

Concern | Detail
Base URL | https://ai.svc.melmastoon.internal (mTLS) and https://api.melmastoon.ghasi.io for the BFF passthrough
Auth | mTLS (service callers) or Authorization: Bearer <jwt> (BFF)
Tenant scoping | X-Tenant-Id: tnt_… required on every call; rejected with MELMASTOON.TENANT.NOT_FOUND if absent or unknown
Idempotency | Idempotency-Key: <ULID> on every POST that mutates state or invokes a model; responses are replayed for 24 h
Correlation | traceparent (W3C Trace Context); X-Request-Id: req_<ULID> echoed in every response
Rate limit | Token bucket per (tenantId, capability); surfaced as X-RateLimit-Remaining and X-RateLimit-Reset; exhaustion returns 429 with MELMASTOON.GENERAL.RATE_LIMITED
Pagination | ?cursor=<opaque>&limit=<int ≤ 100>; the response carries nextCursor and hasMore
Versioning | /api/v1/... is the only stable surface; breaking changes require /api/v2
Content type | application/json; charset=utf-8
Provenance | Every response carrying an AI artifact includes a provenance block; raw model responses are never returned
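The rate-limit row describes a token bucket keyed by (tenantId, capability). The sketch below is an in-memory illustration only: capacity and refill rate are hypothetical constructor arguments, the clock is injectable for testing, and reading X-RateLimit-Reset as "seconds until the bucket is full again" is an assumption. A multi-replica deployment would keep bucket state in a shared store rather than in process memory.

```typescript
// In-memory token bucket keyed by (tenantId, capability). Capacity and refill
// rate are hypothetical; this contract does not state the real limits.
interface BucketState {
  tokens: number; // tokens currently available (may be fractional)
  updatedAtMs: number; // time of the last refill
}

interface RateDecision {
  allowed: boolean;
  remaining: number; // candidate value for X-RateLimit-Remaining
  resetSeconds: number; // assumed meaning of X-RateLimit-Reset: seconds until full
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, BucketState>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  take(tenantId: string, capability: string): RateDecision {
    const key = `${tenantId}|${capability}`;
    const nowMs = this.now();
    const state = this.buckets.get(key) ?? { tokens: this.capacity, updatedAtMs: nowMs };

    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = (nowMs - state.updatedAtMs) / 1000;
    state.tokens = Math.min(this.capacity, state.tokens + elapsedSec * this.refillPerSecond);
    state.updatedAtMs = nowMs;

    // Spend one token per request if one is available.
    const allowed = state.tokens >= 1;
    if (allowed) state.tokens -= 1;
    this.buckets.set(key, state);

    return {
      allowed,
      remaining: Math.floor(state.tokens),
      resetSeconds: Math.ceil((this.capacity - state.tokens) / this.refillPerSecond),
    };
  }
}
```

A decision with allowed set to false corresponds to the 429 MELMASTOON.GENERAL.RATE_LIMITED response.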

2. Inference endpoints

2.1 POST /api/v1/ai/complete

Synchronous completion for a capability.

Request:

{
"capability": "pricing.suggest",
"tenantId": "tnt_01H8ZC0X8M0K6F9YV6T7RZWQS5",
"input": {
"propertyId": "ppt_01H8...",
"roomTypeId": "rmt_01H8...",
"date": "2026-05-12",
"occupancyPct": 0.78,
"baselineAmountMicros": 4500000000,
"currency": "USD",
"seasonalSignal": "shoulder"
},
"context": {
"local": false,
"regionPin": "me-central1",
"callerService": "pricing-service",
"callerSurface": "backoffice"
},
"timeoutMs": 4000,
"fallback": "deterministic",
"correlation": { "traceId": "00-...-00", "requestId": "req_01H8..." }
}

Response 200:

{
"capability": "pricing.suggest",
"output": {
"suggestedAmountMicros": 4725000000,
"currency": "USD",
"deviationPctFromBaseline": 0.05,
"rationale": "Occupancy 78% with shoulder-season trend; suggests +5%.",
"confidence": 0.74
},
"cached": false,
"fallbackApplied": false,
"hitl": {
"required": true,
"gateId": "hgt_01H8...",
"slaDeadline": "2026-05-12T03:00:00.000Z"
},
"provenance": {
"id": "prv_p_01H8...",
"promptId": "pmv_01H8...",
"promptCanonicalCode": "PRMP_PRICING_001_v3",
"model": { "provider": "vertex", "name": "gemini-1.5-flash" },
"tokens": { "input": 612, "output": 184 },
"costMicros": 412,
"local": false,
"cacheHit": false,
"safety": { "input": "pass", "output": "pass" },
"occurredAt": "2026-05-12T01:31:09.412Z"
}
}

Errors: see §11. Most common are MELMASTOON.AI.REFUSED_BUDGET, MELMASTOON.AI.REFUSED_SAFETY, MELMASTOON.AI.PROVIDER_UNAVAILABLE, MELMASTOON.AI.OUTPUT_INVALID, MELMASTOON.GENERAL.RATE_LIMITED.
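Putting the §1 conventions together, a caller of POST /api/v1/ai/complete sets X-Tenant-Id, a fresh Idempotency-Key, and traceparent on each logical call. The sketch below assumes Node 18+ and hand-rolls a non-monotonic ULID for the idempotency key; ulid and buildCompleteRequest are hypothetical helpers, not part of @ghasi/api-contracts. The returned object can be passed as fetch(req.url, req).

```typescript
import { randomBytes } from "node:crypto";

const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Minimal, non-monotonic ULID: 10 chars of millisecond time + 16 random chars.
function ulid(nowMs: number = Date.now()): string {
  let time = "";
  let t = nowMs;
  for (let i = 0; i < 10; i++) {
    time = CROCKFORD[t % 32] + time;
    t = Math.floor(t / 32);
  }
  const bytes = randomBytes(16);
  let rand = "";
  for (let i = 0; i < 16; i++) rand += CROCKFORD[bytes[i] % 32]; // 5 random bits per char
  return time + rand;
}

interface CompleteRequestBody {
  capability: string;
  tenantId: string;
  input: Record<string, unknown>;
  context?: { local?: boolean; regionPin?: string; callerService?: string; callerSurface?: string };
  timeoutMs?: number;
  fallback?: string;
}

// Hypothetical helper: builds the URL and fetch() options for one logical call.
function buildCompleteRequest(baseUrl: string, body: CompleteRequestBody, traceparent: string) {
  return {
    url: `${baseUrl}/api/v1/ai/complete`,
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json; charset=utf-8",
      "X-Tenant-Id": body.tenantId, // tenant scoping header, required on every call
      "Idempotency-Key": ulid(), // reuse this value when retrying the same call
      traceparent, // W3C Trace Context header
    },
    body: JSON.stringify(body),
  };
}
```

A retry of the same logical call should resend the original Idempotency-Key instead of calling buildCompleteRequest again, so the service can replay the first response within the 24 h window.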

2.2 POST /api/v1/ai/embed

Embedding generation. Single or batch.

Request:

{
"tenantId": "tnt_01H8...",
"capability": "internal.rag_ingest",
"inputs": ["chunk text 1", "chunk text 2"],
"context": { "local": false }
}

Response 200:

{
"embeddings": [
{ "vector": [0.0123, -0.0456, "..."], "tokens": 9, "model": { "provider": "vertex", "name": "text-embedding-004" } }
],
"provenance": { "id": "prv_p_...", "tokens": { "input": 18, "output": 0 }, "costMicros": 4, "local": false, "cacheHit": false }
}

2.3 POST /api/v1/ai/moderate

Standalone moderation pass.

Request: { "tenantId": "...", "input": "string", "axis": ["hate", "sexual", "dangerous", "self_harm", "pii_exposed"] }

Response 200: { "verdict": "pass" | "flag_low" | "flag_high" | "block", "scores": { "hate": 0.01, ... } }

2.4 POST /api/v1/ai/rag/query

RAG retrieval over a tenant corpus.

Request:

{
"tenantId": "tnt_01H8...",
"corpusId": "rag_01H8...",
"query": "What is the cancellation policy for non-refundable rates after the cutoff?",
"topK": 5,
"filter": { "namespace": "policies", "language": "en" }
}

Response 200:

{
"hits": [
{
"chunkId": "01H8...",
"score": 0.832,
"text": "Non-refundable rates ...",
"sourceUri": "gs://melmastoon-tenant-content/.../policies/cancellation.md#L23",
"metadata": { "language": "en", "section": "cancellation" }
}
],
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "text-embedding-004" }, "tokens": { "input": 22, "output": 0 }, "costMicros": 5 }
}
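This contract does not say how score is computed. Assuming cosine similarity between the query embedding and each chunk embedding, and assuming filter keys are matched against chunk metadata, a brute-force top-K looks like the following. Chunk, Hit, cosine, and topK are hypothetical names, and a production corpus would sit behind a vector index rather than a linear scan.

```typescript
interface Chunk {
  chunkId: string;
  text: string;
  vector: number[];
  metadata: Record<string, string>;
}

interface Hit {
  chunkId: string;
  score: number;
  text: string;
  metadata: Record<string, string>;
}

// Cosine similarity; returns 0 for a zero-length vector.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan: keep chunks whose metadata matches every filter entry,
// score them against the query vector, and return the K best.
function topK(query: number[], chunks: Chunk[], k: number, filter: Record<string, string> = {}): Hit[] {
  return chunks
    .filter((c) => Object.entries(filter).every(([key, value]) => c.metadata[key] === value))
    .map((c) => ({ chunkId: c.chunkId, score: cosine(query, c.vector), text: c.text, metadata: c.metadata }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```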

2.5 POST /api/v1/ai/vision

Vision capability — photo quality scoring or visual classification.

Request:

{
"tenantId": "tnt_01H8...",
"capability": "vision.photo_quality",
"imageUri": "gs://melmastoon-tenant-media/.../room-12.jpg",
"context": { "local": true }
}

Response 200:

{
"output": {
"score": 0.72,
"issues": ["low_resolution"],
"verdict": "acceptable"
},
"provenance": { "id": "prv_p_...", "model": { "provider": "onnx-edge", "name": "mobilenet-v3-small-image-quality" }, "local": true, "costMicros": 0 }
}

2.6 POST /api/v1/ai/transcribe

Speech-to-text.

Request:

{
"tenantId": "tnt_01H8...",
"audioUri": "gs://melmastoon-tenant-media/.../voice/01H8....opus",
"languageHint": "ps",
"context": { "local": false }
}

Response 200:

{
"output": {
"transcript": "اتاق 204 تمیز شد",
"language": "ps",
"intent": { "action": "housekeeping.mark_clean", "roomNumber": "204" },
"confidence": 0.91
},
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "speech-to-text-v2" } }
}

3. Capability catalog

3.1 GET /api/v1/ai/capabilities

List capabilities visible to the caller. Service callers see all active rows; tenant callers see rows enabled for their plan.

Query params: status, domain, cursor, limit.

Response 200:

{
"items": [
{
"key": "pricing.suggest",
"displayName": "Dynamic pricing suggestion",
"status": "active",
"defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
"latencyClass": "low",
"costClass": "medium",
"hitl": { "required": true, "trigger": { "kind": "threshold", "field": "deviationPctFromBaseline", "comparator": "gt", "value": 0.05 }, "slaSeconds": 3600 },
"outputSchemaUri": "https://schemas.melmastoon.ghasi.io/ai/pricing-suggestion.v1.json"
}
],
"nextCursor": null,
"hasMore": false
}
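Because list endpoints follow the cursor convention from §1, a caller can drain every page with a small async generator. The page fetcher is injected (PageFetcher and paginate are hypothetical names) so the loop does not depend on a particular HTTP client.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Fetches one page for a given cursor; undefined means "first page".
type PageFetcher<T> = (cursor: string | undefined, limit: number) => Promise<Page<T>>;

async function* paginate<T>(fetchPage: PageFetcher<T>, limit = 100): AsyncGenerator<T> {
  if (limit < 1 || limit > 100) throw new RangeError("limit must be between 1 and 100");
  let cursor: string | undefined;
  while (true) {
    const page = await fetchPage(cursor, limit);
    yield* page.items;
    // Stop when the server says so, or defensively if no cursor came back.
    if (!page.hasMore || page.nextCursor === null) return;
    cursor = page.nextCursor;
  }
}
```

For this endpoint the fetcher would call GET /api/v1/ai/capabilities?cursor=…&limit=… with the caller's X-Tenant-Id and return the parsed body.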

3.2 GET /api/v1/ai/capabilities/:capabilityKey

Detailed view of a single capability, including fallbackChain, evalSuiteId, cacheTtlSeconds, and the current BudgetCounter snapshot for the caller's tenant.
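The hitl.trigger object in a catalog entry decides whether an output needs human review. Only kind "threshold" with comparator "gt" appears in this document; the sketch below treats gte, lt, and lte as hypothetical extensions and fails closed (opens a gate) when the field is missing or non-numeric, which is a design assumption rather than documented behavior.

```typescript
// Only "gt" is confirmed by the catalog sample; the rest are hypothetical.
type Comparator = "gt" | "gte" | "lt" | "lte";

interface ThresholdTrigger {
  kind: "threshold";
  field: string;
  comparator: Comparator;
  value: number;
}

const COMPARE: Record<Comparator, (actual: number, limit: number) => boolean> = {
  gt: (a, l) => a > l,
  gte: (a, l) => a >= l,
  lt: (a, l) => a < l,
  lte: (a, l) => a <= l,
};

// Returns true when the capability output should open a HITL gate.
function triggerFires(trigger: ThresholdTrigger, output: Record<string, unknown>): boolean {
  const actual = output[trigger.field];
  // Assumption: a missing or non-numeric field fails closed.
  if (typeof actual !== "number" || Number.isNaN(actual)) return true;
  return COMPARE[trigger.comparator](actual, trigger.value);
}
```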

4. Prompt registry (admin)

Auth: requires JWT scope melmastoon:ai:admin.

4.1 POST /api/v1/ai/prompts

Creates a new logical prompt row (on first registration) or a new draft version of an existing prompt.

Request:

{
"domain": "PRICING",
"ordinal": 1,
"displayName": "Dynamic pricing suggestion",
"capabilityKey": "pricing.suggest",
"systemPrompt": "You are a pricing analyst...",
"userTemplate": "Property {{propertyId}} ...",
"outputSchemaJson": { "$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object", "...": "..." },
"defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
"evalSuiteId": "eva_01H8...",
"notes": "Tightened deviation rationale section."
}

Response 201:

{
"promptVersionId": "pmv_01H8...",
"canonicalCode": "PRMP_PRICING_001_v4",
"status": "draft"
}
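The canonical codes in the samples (PRMP_PRICING_001_v3, PRMP_PRICING_001_v4) suggest a PRMP_<DOMAIN>_<ordinal>_v<version> pattern with a zero-padded three-digit ordinal. Treating that pattern as an assumption until NAMING confirms it, a hypothetical parser and formatter could look like this:

```typescript
// Pattern inferred from the samples only; not a confirmed naming rule.
const CANONICAL_CODE = /^PRMP_([A-Z][A-Z0-9_]*)_(\d{3})_v(\d+)$/;

interface PromptCode {
  domain: string;
  ordinal: number;
  version: number;
}

function formatPromptCode({ domain, ordinal, version }: PromptCode): string {
  if (!/^[A-Z][A-Z0-9_]*$/.test(domain)) throw new Error(`bad domain: ${domain}`);
  if (!Number.isInteger(ordinal) || ordinal < 1 || ordinal > 999) throw new RangeError("ordinal must be 1..999");
  if (!Number.isInteger(version) || version < 1) throw new RangeError("version must be >= 1");
  return `PRMP_${domain}_${String(ordinal).padStart(3, "0")}_v${version}`;
}

function parsePromptCode(code: string): PromptCode {
  const m = CANONICAL_CODE.exec(code);
  if (!m) throw new Error(`not a canonical prompt code: ${code}`);
  return { domain: m[1], ordinal: Number(m[2]), version: Number(m[3]) };
}

// A new draft of an existing prompt takes the next version number.
function nextDraftCode(latest: string): string {
  const p = parsePromptCode(latest);
  return formatPromptCode({ ...p, version: p.version + 1 });
}
```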

4.2 GET /api/v1/ai/prompts

List prompts with filters: domain, capabilityKey, status.

4.3 GET /api/v1/ai/prompts/:promptVersionId

Retrieve a specific version (immutable).

4.4 POST /api/v1/ai/prompts/:promptVersionId/promote

Promote a draft to active. Requires a green EvalRun reference and ≥7 days of A/B traffic at 5%.

Request: { "evalRunId": "evr_01H8...", "abReportRef": "..." }

Response 200: { "status": "active", "activatedAt": "2026-05-12T01:31:00Z", "deprecatedVersionId": "pmv_01H8..." }

Errors: MELMASTOON.GENERAL.PRECONDITION_FAILED if the eval run is not green or the A/B window is too short.

4.5 POST /api/v1/ai/prompts/:promptVersionId/deprecate

Mark an active version deprecated.

4.6 POST /api/v1/ai/prompts/:promptVersionId/retire

Moves a deprecated version to retired. Refused with MELMASTOON.GENERAL.PRECONDITION_FAILED if fewer than 14 days have passed since deprecatedAt.
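The promote, deprecate, and retire rules in §4.4 to §4.6 form a small state machine. This sketch encodes them as pure functions with hypothetical names; reading "at 5%" as "at least 5% of traffic" and measuring the A/B window from a single start timestamp are assumptions. It does not model the automatic deprecation of the previously active version that the deprecatedVersionId field in §4.4 implies.

```typescript
type PromptStatus = "draft" | "active" | "deprecated" | "retired";

const DAY_MS = 24 * 60 * 60 * 1000;

interface PromptVersion {
  id: string;
  status: PromptStatus;
  deprecatedAt?: Date;
}

interface PromoteEvidence {
  evalVerdict: string; // must be "green"
  abTrafficPct: number; // share of traffic the draft received during A/B
  abStartedAt: Date;
}

class PreconditionFailed extends Error {
  readonly code = "MELMASTOON.GENERAL.PRECONDITION_FAILED";
}

function promote(v: PromptVersion, ev: PromoteEvidence, now: Date): PromptVersion {
  if (v.status !== "draft") throw new PreconditionFailed(`cannot promote from ${v.status}`);
  if (ev.evalVerdict !== "green") throw new PreconditionFailed("eval run is not green");
  if (ev.abTrafficPct < 5) throw new PreconditionFailed("A/B traffic below 5%");
  if (now.getTime() - ev.abStartedAt.getTime() < 7 * DAY_MS) {
    throw new PreconditionFailed("A/B window shorter than 7 days");
  }
  return { ...v, status: "active" };
}

function deprecate(v: PromptVersion, now: Date): PromptVersion {
  if (v.status !== "active") throw new PreconditionFailed(`cannot deprecate from ${v.status}`);
  return { ...v, status: "deprecated", deprecatedAt: now };
}

function retire(v: PromptVersion, now: Date): PromptVersion {
  if (v.status !== "deprecated" || !v.deprecatedAt) throw new PreconditionFailed(`cannot retire from ${v.status}`);
  if (now.getTime() - v.deprecatedAt.getTime() < 14 * DAY_MS) {
    throw new PreconditionFailed("deprecated for less than 14 days");
  }
  return { ...v, status: "retired" };
}
```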

5. Eval harness

5.1 POST /api/v1/ai/eval/runs

Trigger an eval run.

Request:

{
"suiteId": "eva_01H8...",
"promptVersionId": "pmv_01H8...",
"modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" }
}

Response 202: { "runId": "evr_01H8...", "status": "queued" }

5.2 GET /api/v1/ai/eval/runs/:runId

Returns the run's status, scores, and verdict:

{
"runId": "evr_01H8...",
"status": "completed",
"promptVersionId": "pmv_01H8...",
"modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" },
"scores": {
"directionAccuracy": 0.79,
"schemaConformance": 1.0,
"adversarialBlocked": 1.0
},
"comparison": { "baseline": "pmv_01H8...prior", "delta": { "directionAccuracy": 0.04 } },
"verdict": "green",
"completedAt": "2026-05-12T02:14:00Z"
}

5.3 GET /api/v1/ai/eval/suites

Lists eval suites; use GET /api/v1/ai/eval/suites/:id for a single suite's detail.

6. HITL gates

6.1 GET /api/v1/ai/hitl/gates

Lists open gates for the caller's roles and tenant.

Query: status, capability, cursor, limit.

Response 200:

{
"items": [
{
"gateId": "hgt_01H8...",
"tenantId": "tnt_01H8...",
"capability": "pricing.suggest",
"artifactRef": { "kind": "pricing-suggestion", "id": "prv_p_01H8..." },
"openedAt": "2026-05-12T03:00:00Z",
"slaDeadline": "2026-05-12T04:00:00Z",
"draftJson": { "suggestedAmountMicros": 4725000000, "deviationPctFromBaseline": 0.05 },
"reviewerRoles": ["gm", "owner"]
}
]
}

6.2 POST /api/v1/ai/hitl/gates/:gateId/decision

Submits a reviewer decision on an open gate.

Request:

{
"outcome": "modified",
"modifiedJson": { "suggestedAmountMicros": 4600000000, "deviationPctFromBaseline": 0.022 },
"justification": "Adjusted closer to baseline; competitor moved last hour."
}

Response 200:

{
"decisionId": "dec_01H8...",
"outcome": "modified",
"decidedAt": "2026-05-12T03:14:21Z",
"gateStatus": "decided"
}

Errors: MELMASTOON.IDENTITY.PERMISSION_DENIED if the reviewer lacks an allowed role; MELMASTOON.GENERAL.PRECONDITION_FAILED if the gate is not open.
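In both the gate draft and the reviewer's modifiedJson, deviationPctFromBaseline is consistent with suggestedAmountMicros / baselineAmountMicros - 1, using the 4,500,000,000-micro baseline from the §2.1 request: 4,725,000,000 gives 0.05 and 4,600,000,000 gives about 0.022. A review UI could recompute the field whenever the amount is edited; the helper name is hypothetical, and rounding to three decimals matches the samples but is an assumption.

```typescript
// Hypothetical helper. Amounts are integer micros; Number represents them
// exactly up to Number.MAX_SAFE_INTEGER. Rounding mirrors the samples only.
function deviationPctFromBaseline(
  suggestedAmountMicros: number,
  baselineAmountMicros: number,
  decimals = 3,
): number {
  if (!Number.isSafeInteger(suggestedAmountMicros) || !Number.isSafeInteger(baselineAmountMicros)) {
    throw new RangeError("amounts must be integer micros");
  }
  if (baselineAmountMicros <= 0) throw new RangeError("baseline must be positive");
  const raw = suggestedAmountMicros / baselineAmountMicros - 1; // 0.05 means +5%
  const factor = 10 ** decimals;
  return Math.round(raw * factor) / factor;
}
```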

6.3 GET /api/v1/ai/hitl/gates/:gateId

Detailed view of a single gate, including notification dispatch attempts and the remaining SLA.

7. Budget

7.1 GET /api/v1/ai/budget

Per-tenant budget snapshot.

Query: period=2026-05 (defaults to the current month), scope=tenant_total|capability:pricing.suggest|feature:pricing.

Response 200:

{
"tenantId": "tnt_01H8...",
"period": "2026-05",
"scopes": [
{
"scope": { "kind": "tenant_total" },
"tokensUsed": 1230000,
"tokensCap": 5000000,
"costMicrosUsed": 9412000,
"costMicrosCap": 50000000,
"softCapPct": 80,
"hardCapPct": 100,
"softCapWarnedAt": null,
"hardCapTrippedAt": null,
"resetsAt": "2026-06-01T00:00:00Z"
},
{
"scope": { "kind": "capability", "capabilityKey": "pricing.suggest" },
"tokensUsed": 240000,
"tokensCap": 1000000,
"costMicrosUsed": 1800000,
"costMicrosCap": 10000000
}
]
}
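Each scope carries usage, caps, and soft/hard percentages; reaching the hard cap corresponds to MELMASTOON.AI.REFUSED_BUDGET in §11. A minimal classifier follows, assuming a scope is judged by whichever dimension (tokens or cost) is closer to its cap, that a zero cap means "not configured", and that missing percentages default to the 80/100 values shown for tenant_total. BudgetScope, utilizationPct, and budgetState are hypothetical names.

```typescript
interface BudgetScope {
  tokensUsed: number;
  tokensCap: number;
  costMicrosUsed: number;
  costMicrosCap: number;
  softCapPct?: number;
  hardCapPct?: number;
}

type BudgetState = "ok" | "soft_cap" | "hard_cap";

// Assumption: the tighter of the two dimensions determines utilization,
// and a zero cap is treated as "not configured" (contributes 0%).
function utilizationPct(s: BudgetScope): number {
  const tokenPct = s.tokensCap > 0 ? (100 * s.tokensUsed) / s.tokensCap : 0;
  const costPct = s.costMicrosCap > 0 ? (100 * s.costMicrosUsed) / s.costMicrosCap : 0;
  return Math.max(tokenPct, costPct);
}

function budgetState(s: BudgetScope): BudgetState {
  const pct = utilizationPct(s);
  if (pct >= (s.hardCapPct ?? 100)) return "hard_cap";
  if (pct >= (s.softCapPct ?? 80)) return "soft_cap";
  return "ok";
}
```

With the tenant_total sample above, token usage is 24.6% and cost usage about 18.8%, so that scope is "ok".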

8. Edge model manifest

8.1 GET /api/v1/ai/edge-model-manifest

Returns the current published manifest. Auth: device-bound JWT only (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).

Response 200:

{
"manifestId": "emm_01H8...",
"version": "2.4.1",
"publishedAt": "2026-05-09T18:22:00Z",
"models": [
{
"modelKey": "phi-3-mini-4k-instruct",
"fileName": "phi-3-mini-int4.onnx",
"sha256": "f3e7...e1",
"bytes": 2415412938,
"minRamMb": 2048,
"idleUnloadMinutes": 10,
"capabilities": ["message.draft", "tutor.answer"]
},
{
"modelKey": "all-MiniLM-L6-v2",
"fileName": "minilm-l6-fp16.onnx",
"sha256": "9a1c...77",
"bytes": 96214120,
"minRamMb": 256,
"idleUnloadMinutes": 30,
"capabilities": ["internal.rag_query_edge"]
}
],
"signature": {
"kmsKeyId": "projects/melmastoon-prod/locations/global/keyRings/edge/cryptoKeys/manifest-signer/cryptoKeyVersions/4",
"algorithm": "RSASSA_PSS_SHA_256",
"valueB64": "MIIB..."
}
}

At startup, the Electron desktop app verifies the signature block against the KMS public key embedded in its binary. If verification fails, the desktop refuses to load any edge model.
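This contract does not define the exact bytes that are signed. Assuming the signature covers the manifest without its signature field, serialized as JSON with object keys sorted recursively, the desktop-side check could use Node's crypto.verify with PSS padding. The 32-byte salt length (the SHA-256 digest size) is likewise an assumption about how the KMS key signs, and canonicalJson and verifyManifest are hypothetical helpers.

```typescript
import { verify, constants, KeyObject } from "node:crypto";

// Hypothetical canonical form: JSON with object keys sorted recursively.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

interface SignedManifest {
  signature: { kmsKeyId: string; algorithm: string; valueB64: string };
  [field: string]: unknown;
}

// Returns true only if the RSASSA-PSS/SHA-256 signature matches the
// canonicalized manifest body (everything except `signature`).
function verifyManifest(manifest: SignedManifest, publicKey: KeyObject): boolean {
  if (manifest.signature.algorithm !== "RSASSA_PSS_SHA_256") return false;
  const { signature, ...unsigned } = manifest;
  const payload = Buffer.from(canonicalJson(unsigned), "utf8");
  return verify(
    "sha256",
    payload,
    { key: publicKey, padding: constants.RSA_PKCS1_PSS_PADDING, saltLength: 32 },
    Buffer.from(signature.valueB64, "base64"),
  );
}
```

In the desktop binary the embedded PEM would be loaded once with crypto.createPublicKey and passed in as publicKey.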

8.2 POST /api/v1/ai/edge-model-manifest

Publish a new manifest. Admin only.

Request:

{
"version": "2.4.2",
"models": [ { "modelKey": "...", "fileName": "...", "sha256": "...", "bytes": 0, "minRamMb": 0, "idleUnloadMinutes": 0, "capabilities": [] } ],
"notes": "Added phi-3-mini retrained on hospitality corpus."
}

Response 201: { "manifestId": "emm_01H8...", "version": "2.4.2", "status": "published", "supersedesId": "emm_01H8...prior" }

9. RAG corpora (admin / tenant authoring)

9.1 POST /api/v1/ai/rag/corpora

Create a corpus for a tenant namespace.

Request: { "tenantId": "...", "namespace": "policies", "chunkStrategy": { "method": "fixed", "targetTokens": 384, "overlap": 64 }, "embeddingModel": { "provider": "vertex", "name": "text-embedding-004" } }

Response 201: { "corpusId": "rag_01H8...", "status": "provisioning" }
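With method "fixed", targetTokens 384 and overlap 64, consecutive chunks start 320 tokens apart and share 64 tokens. The sketch below assumes the document has already been tokenized (the tokenizer is not specified here); fixedChunks is a hypothetical illustration of fixed-window chunking, not the service's ingest code.

```typescript
interface FixedChunkStrategy {
  method: "fixed";
  targetTokens: number;
  overlap: number;
}

// Splits an already-tokenized document into windows of `targetTokens`,
// each sharing `overlap` tokens with the previous window.
function fixedChunks<T>(tokens: T[], { targetTokens, overlap }: FixedChunkStrategy): T[][] {
  if (targetTokens <= 0 || overlap < 0 || overlap >= targetTokens) {
    throw new RangeError("require 0 <= overlap < targetTokens");
  }
  const stride = targetTokens - overlap;
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + targetTokens));
    if (start + targetTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}
```

A 1,000-token document yields three chunks starting at tokens 0, 320, and 640.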

9.2 POST /api/v1/ai/rag/corpora/:corpusId/ingest

Ingest documents (URIs or inline text).

Request: { "documents": [{ "uri": "gs://...", "metadata": { "language": "en", "section": "cancellation" } }] }

Response 202: { "jobId": "...", "ingested": 0, "queued": 12 }. Ingestion runs as an asynchronous job; poll its status via GET /api/v1/ai/rag/jobs/:jobId.

9.3 DELETE /api/v1/ai/rag/corpora/:corpusId

Soft-deletes the corpus (status → archived). Embeddings remain queryable for 30 days and are then physically purged.

10. BFF passthrough

10.1 POST /bff/backoffice/v1/ai/tutor/ask

A tenant member asks the AI tutor a question.

Request: { "question": "How do I issue a digital key for a walk-in?", "context": { "screenId": "reservations.walkin" } }

Response 200:

{
"answer": "1) Open Reservations → Walk-in...",
"links": [ { "label": "Walk-in flow", "screenId": "reservations.walkin" } ],
"thumbsKey": "tutor_answer_01H8...",
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "gemini-1.5-flash-8b" }, "local": false }
}

11. Error catalog (response envelope per ERROR_CODES.md)

Code | HTTP | When
MELMASTOON.AI.REFUSED_SAFETY | 422 | Pre- or post-call moderation block, or output still schema-invalid after one repair attempt
MELMASTOON.AI.REFUSED_BUDGET | 429 | Hard cap crossed. A deterministic fallback is applied where one exists and the response carries its output; the error code surfaces only when no fallback applies
MELMASTOON.AI.PROVIDER_UNAVAILABLE | 502 | Fallback chain exhausted
MELMASTOON.AI.HITL_REQUIRED | 403 | Caller attempted to commit a state change before the HITL decision
MELMASTOON.AI.PROVENANCE_MISSING | 422 | Defensive check; should never reach a sibling service
MELMASTOON.AI.OUTPUT_INVALID | 502 | Structured output failed schema validation after repair
MELMASTOON.GENERAL.RATE_LIMITED | 429 | Per-(tenant, capability) rate limit exceeded
MELMASTOON.GENERAL.VALIDATION_FAILED | 422 | Request schema invalid
MELMASTOON.GENERAL.PRECONDITION_FAILED | 412 | Optimistic-concurrency or lifecycle precondition failed (e.g., promote without a green eval)
MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE | 422 | Embedding query or corpus reference crosses tenants
MELMASTOON.GENERAL.RESOURCE_NOT_FOUND | 404 | Capability, prompt, or corpus unknown in this tenant scope
MELMASTOON.IDENTITY.PERMISSION_DENIED | 403 | Reviewer lacks an allowed role, or an admin endpoint was called without the melmastoon:ai:admin scope
MELMASTOON.IDENTITY.DEVICE_NOT_BOUND | 403 | GET /edge-model-manifest called with a non-device JWT
MELMASTOON.TENANT.SUSPENDED | 403 | Tenant suspended; only catalog reads are allowed
MELMASTOON.TENANT.PLAN_LIMIT_EXCEEDED | 402 | Capability not enabled for the tenant's plan

Sample envelope:

{
"error": {
"type": "https://errors.melmastoon.ghasi.io/ai/refused-budget",
"code": "MELMASTOON.AI.REFUSED_BUDGET",
"title": "AI budget exceeded",
"status": 429,
"detail": "Monthly AI budget for tenant exceeded; deterministic fallback applied.",
"instance": "/api/v1/ai/complete",
"errors": [],
"traceId": "00-...-00",
"requestId": "req_01H8...",
"tenantId": "tnt_01H8...",
"retriable": true,
"retryAfter": 86400,
"userMessageKey": "errors.ai.refused_budget",
"docUrl": "https://docs.melmastoon.ghasi.io/errors/ai/refused-budget",
"runbook": "https://runbooks.melmastoon.ghasi.io/ai/refused-budget"
}
}
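A client can drive retries from retriable and retryAfter in the envelope. The policy below is a hypothetical sketch: it reads retryAfter as seconds, otherwise backs off exponentially, and gives up when the wait exceeds a cap, so a budget refusal with retryAfter: 86400 is surfaced to the user instead of sleeping for a day. Retries of a POST should reuse the original Idempotency-Key.

```typescript
interface ErrorEnvelope {
  error: {
    type: string;
    code: string;
    title: string;
    status: number;
    detail: string;
    instance: string;
    errors: unknown[];
    traceId: string;
    requestId: string;
    tenantId?: string;
    retriable: boolean;
    retryAfter?: number; // read as seconds (assumption)
    userMessageKey: string;
    docUrl: string;
    runbook: string;
  };
}

// Returns the delay before the next attempt, or null to stop retrying.
// `attempt` counts retries already made, starting at 0.
function retryDelayMs(
  env: ErrorEnvelope,
  attempt: number,
  opts = { maxAttempts: 3, baseMs: 250, maxDelayMs: 30000 },
): number | null {
  const { retriable, retryAfter } = env.error;
  if (!retriable || attempt >= opts.maxAttempts) return null;
  const delay = retryAfter !== undefined ? retryAfter * 1000 : opts.baseMs * 2 ** attempt;
  // Do not wait in-process for long server-mandated pauses (e.g. a budget reset).
  return delay > opts.maxDelayMs ? null : delay;
}
```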

12. OpenAPI

openapi.json is generated from the controllers via @nestjs/swagger. The CI gate:

  • Diffs the spec against the previous main snapshot.
  • Fails the build on any breaking change without a /api/v2 bump (pnpm openapi:diff).
  • Exports schemas to @ghasi/api-contracts/ai-orchestrator/v1 so consumers bind to typed clients at compile time.

13. Routing diagram

             ┌──────────────────────────┐
             │   /api/v1/ai/complete    │
             └─────────────┬────────────┘
                           │
          ┌────────────────┴─────────────────┐
          │ pre-call: moderate / redact /    │
          │ budget reserve / pin prompt /    │
          │ hash + cache lookup              │
          └────────────────┬─────────────────┘
                           │
        cache hit ◀────────┤
                           │ miss
                           ▼
      ┌────────────────────────────────────────┐
      │ pickProvider(capability, context)      │
      │                                        │
      │ if context.local && hasEdge: edge      │
      │ else if regionPin matches: vertex      │
      │ else: walk capability.fallbackChain    │
      └────────────────────────────────────────┘
          │                │                │
          ▼                ▼                ▼
       Vertex          Anthropic         OpenAI  ── (or) ──▶  ONNX edge passthrough
          │                │                │                 (returns 200 with provenance.local=true,
          ▼                ▼                ▼                  caller is the desktop main process)
      ┌────────────────────────────────────────┐
      │ post-call: moderate / schema validate /│
      │ stamp provenance / commit budget /     │
      │ open HITL / outbox / cache put         │
      └────────────────────────────────────────┘
                           │
                           ▼
                     200 response
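The provider-selection box in the diagram can be read as the following function. "regionPin matches" is interpreted here as membership in a hypothetical set of regions where Vertex is deployed, isAvailable stands in for whatever health signal the service uses, and the "anthropic" and "openai" identifiers are hypothetical; only "vertex" and "onnx-edge" appear in the response samples.

```typescript
type Provider = "onnx-edge" | "vertex" | "anthropic" | "openai";

interface CallContext {
  local: boolean;
  regionPin?: string;
}

interface CapabilityRouting {
  fallbackChain: Provider[]; // ordered, most preferred first (assumed shape)
}

interface RoutingEnv {
  hasEdge: boolean; // an edge model serves this capability
  vertexRegions: ReadonlySet<string>; // hypothetical: regions where Vertex is deployed
  isAvailable: (p: Provider) => boolean; // hypothetical health / circuit-breaker check
}

class ProviderUnavailable extends Error {
  readonly code = "MELMASTOON.AI.PROVIDER_UNAVAILABLE";
}

// Mirrors the three branches of pickProvider in the diagram.
function pickProvider(cap: CapabilityRouting, ctx: CallContext, env: RoutingEnv): Provider {
  if (ctx.local && env.hasEdge) return "onnx-edge";
  if (ctx.regionPin !== undefined && env.vertexRegions.has(ctx.regionPin)) return "vertex";
  const next = cap.fallbackChain.find((p) => env.isAvailable(p));
  if (next === undefined) throw new ProviderUnavailable("fallback chain exhausted");
  return next;
}
```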