ai-orchestrator-service — API Contracts
Companion to: APPLICATION_LOGIC.md · EVENT_SCHEMAS.md · Standards: 05 API Design · ERROR_CODES · NAMING
REST surface under /api/v1/ai/* plus a small BFF passthrough under /bff/backoffice/v1/ai/*. Service-to-service calls use mTLS; BFF calls use JWT issued by iam-service with claims tenant_id, user_id, roles, surface. All responses use the canonical error envelope from ERROR_CODES.md.
1. Common conventions
| Concern | Detail |
|---|---|
| Base URL | https://ai.svc.melmastoon.internal (mTLS) and https://api.melmastoon.ghasi.io for BFF passthrough |
| Auth | mTLS (service callers) or Authorization: Bearer <jwt> (BFF) |
| Tenant scoping | X-Tenant-Id: tnt_… required on every call; rejected with MELMASTOON.TENANT.NOT_FOUND if absent or unknown |
| Idempotency | Idempotency-Key: <ULID> on every POST that mutates or invokes a model; 24 h replay window |
| Correlation | traceparent (W3C Trace Context); X-Request-Id: req_<ULID> echoed in every response |
| Rate limit | Per (tenantId, capability) token bucket; surfaced as X-RateLimit-Remaining, X-RateLimit-Reset; 429 with MELMASTOON.GENERAL.RATE_LIMITED |
| Pagination | ?cursor=<opaque>&limit=<int ≤ 100>; response carries nextCursor, hasMore |
| Versioning | /api/v1/... is the only stable surface; breaking changes require /api/v2 |
| Content type | application/json; charset=utf-8 |
| Provenance | Every response carrying an AI artifact includes provenance block; raw model responses are never returned |
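As a rough illustration of the conventions above, a caller might assemble its headers and pagination parameters as follows. This is a minimal sketch: the header and parameter names come from the table, while the helper names (`buildHeaders`, `pageQuery`), the `tnt_` prefix check, and the lower bound of 1 on `limit` are assumptions. ULID generation is left to the caller.

```typescript
// Hypothetical client-side helpers. Header and parameter names come from the
// conventions table; function names and validation rules are assumptions.
interface CallOptions {
  tenantId: string;        // sent as X-Tenant-Id
  bearerJwt?: string;      // BFF callers only; service callers rely on mTLS
  idempotencyKey?: string; // ULID, for POSTs that mutate or invoke a model
  requestId?: string;      // "req_<ULID>"
}

function buildHeaders(opts: CallOptions): Record<string, string> {
  if (!opts.tenantId.startsWith("tnt_")) {
    throw new Error("X-Tenant-Id must start with tnt_");
  }
  const headers: Record<string, string> = {
    "Content-Type": "application/json; charset=utf-8",
    "X-Tenant-Id": opts.tenantId,
  };
  if (opts.bearerJwt) headers["Authorization"] = `Bearer ${opts.bearerJwt}`;
  if (opts.idempotencyKey) headers["Idempotency-Key"] = opts.idempotencyKey;
  if (opts.requestId) headers["X-Request-Id"] = opts.requestId;
  return headers;
}

// Builds the ?cursor=...&limit=... query string, enforcing limit <= 100.
function pageQuery(limit: number, cursor?: string): string {
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new RangeError("limit must be an integer between 1 and 100");
  }
  const parts = [`limit=${limit}`];
  if (cursor) parts.push(`cursor=${encodeURIComponent(cursor)}`);
  return parts.join("&");
}
```

Rejecting an out-of-range `limit` on the client avoids a round trip that would presumably end in MELMASTOON.GENERAL.VALIDATION_FAILED.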
2. Inference endpoints
2.1 POST /api/v1/ai/complete
Synchronous completion for a capability.
Request:
{
"capability": "pricing.suggest",
"tenantId": "tnt_01H8ZC0X8M0K6F9YV6T7RZWQS5",
"input": {
"propertyId": "ppt_01H8...",
"roomTypeId": "rmt_01H8...",
"date": "2026-05-12",
"occupancyPct": 0.78,
"baselineAmountMicros": 4500000000,
"currency": "USD",
"seasonalSignal": "shoulder"
},
"context": {
"local": false,
"regionPin": "me-central1",
"callerService": "pricing-service",
"callerSurface": "backoffice"
},
"timeoutMs": 4000,
"fallback": "deterministic",
"correlation": { "traceId": "00-...-00", "requestId": "req_01H8..." }
}
Response 200:
{
"capability": "pricing.suggest",
"output": {
"suggestedAmountMicros": 4725000000,
"currency": "USD",
"deviationPctFromBaseline": 0.05,
"rationale": "Occupancy 78% with shoulder-season trend; suggests +5%.",
"confidence": 0.74
},
"cached": false,
"fallbackApplied": false,
"hitl": {
"required": true,
"gateId": "hgt_01H8...",
"slaDeadline": "2026-05-12T03:00:00.000Z"
},
"provenance": {
"id": "prv_p_01H8...",
"promptId": "pmv_01H8...",
"promptCanonicalCode": "PRMP_PRICING_001_v3",
"model": { "provider": "vertex", "name": "gemini-1.5-flash" },
"tokens": { "input": 612, "output": 184 },
"costMicros": 412,
"local": false,
"cacheHit": false,
"safety": { "input": "pass", "output": "pass" },
"occurredAt": "2026-05-12T01:31:09.412Z"
}
}
Errors: see §11. Most common are MELMASTOON.AI.REFUSED_BUDGET, MELMASTOON.AI.REFUSED_SAFETY, MELMASTOON.AI.PROVIDER_UNAVAILABLE, MELMASTOON.AI.OUTPUT_INVALID, MELMASTOON.GENERAL.RATE_LIMITED.
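The `hitl` block is what a caller needs to inspect before acting on the output. The sketch below models a subset of the response and derives the next step. The type and function names are hypothetical, and it assumes `gateId` and `slaDeadline` are always present when `hitl.required` is true, which the sample suggests but the contract does not state.

```typescript
// Subset of the /complete response; field names follow the sample above.
interface HitlInfo {
  required: boolean;
  gateId?: string;
  slaDeadline?: string; // ISO 8601 timestamp
}

interface CompleteResponse<TOutput> {
  capability: string;
  output: TOutput;
  cached: boolean;
  fallbackApplied: boolean;
  hitl: HitlInfo;
}

type NextStep =
  | { kind: "commit" }
  | { kind: "await_hitl"; gateId: string; msUntilDeadline: number };

// Decides whether the caller may commit the output or must wait for a reviewer.
function nextStep<TOutput>(resp: CompleteResponse<TOutput>, now: Date): NextStep {
  if (!resp.hitl.required) return { kind: "commit" };
  if (!resp.hitl.gateId || !resp.hitl.slaDeadline) {
    // Assumption: a required gate always carries both fields.
    throw new Error("hitl.required is true but gateId or slaDeadline is missing");
  }
  return {
    kind: "await_hitl",
    gateId: resp.hitl.gateId,
    msUntilDeadline: Date.parse(resp.hitl.slaDeadline) - now.getTime(),
  };
}
```

A caller that commits while the step is `await_hitl` would be expected to receive MELMASTOON.AI.HITL_REQUIRED (see §11).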
2.2 POST /api/v1/ai/embed
Embedding generation. Single or batch.
Request:
{
"tenantId": "tnt_01H8...",
"capability": "internal.rag_ingest",
"inputs": ["chunk text 1", "chunk text 2"],
"context": { "local": false }
}
Response 200:
{
"embeddings": [
{ "vector": [0.0123, -0.0456, "..."], "tokens": 9, "model": { "provider": "vertex", "name": "text-embedding-004" } }
],
"provenance": { "id": "prv_p_...", "tokens": { "input": 18, "output": 0 }, "costMicros": 4, "local": false, "cacheHit": false }
}
2.3 POST /api/v1/ai/moderate
Standalone moderation pass.
Request: { "tenantId": "...", "input": "string", "axis": ["hate", "sexual", "dangerous", "self_harm", "pii_exposed"] }
Response 200: { "verdict": "pass" | "flag_low" | "flag_high" | "block", "scores": { "hate": 0.01, ... } }
2.4 POST /api/v1/ai/rag/query
RAG retrieval over a tenant corpus.
Request:
{
"tenantId": "tnt_01H8...",
"corpusId": "rag_01H8...",
"query": "What is the cancellation policy for non-refundable rates after the cutoff?",
"topK": 5,
"filter": { "namespace": "policies", "language": "en" }
}
Response 200:
{
"hits": [
{
"chunkId": "01H8...",
"score": 0.832,
"text": "Non-refundable rates ...",
"sourceUri": "gs://melmastoon-tenant-content/.../policies/cancellation.md#L23",
"metadata": { "language": "en", "section": "cancellation" }
}
],
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "text-embedding-004" }, "tokens": { "input": 22, "output": 0 }, "costMicros": 5 }
}
2.5 POST /api/v1/ai/vision
Vision capability — photo quality scoring or visual classification.
Request:
{
"tenantId": "tnt_01H8...",
"capability": "vision.photo_quality",
"imageUri": "gs://melmastoon-tenant-media/.../room-12.jpg",
"context": { "local": true }
}
Response 200:
{
"output": {
"score": 0.72,
"issues": ["low_resolution"],
"verdict": "acceptable"
},
"provenance": { "id": "prv_p_...", "model": { "provider": "onnx-edge", "name": "mobilenet-v3-small-image-quality" }, "local": true, "costMicros": 0 }
}
2.6 POST /api/v1/ai/transcribe
Speech-to-text.
Request:
{
"tenantId": "tnt_01H8...",
"audioUri": "gs://melmastoon-tenant-media/.../voice/01H8....opus",
"languageHint": "ps",
"context": { "local": false }
}
Response 200:
{
"output": {
"transcript": "اتاق 204 تمیز شد",
"language": "ps",
"intent": { "action": "housekeeping.mark_clean", "roomNumber": "204" },
"confidence": 0.91
},
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "speech-to-text-v2" } }
}
3. Capability catalog
3.1 GET /api/v1/ai/capabilities
List capabilities visible to the caller. Service callers see all active rows; tenant callers see rows enabled for their plan.
Query params: status, domain, cursor, limit.
Response 200:
{
"items": [
{
"key": "pricing.suggest",
"displayName": "Dynamic pricing suggestion",
"status": "active",
"defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
"latencyClass": "low",
"costClass": "medium",
"hitl": { "required": true, "trigger": { "kind": "threshold", "field": "deviationPctFromBaseline", "comparator": "gt", "value": 0.05 }, "slaSeconds": 3600 },
"outputSchemaUri": "https://schemas.melmastoon.ghasi.io/ai/pricing-suggestion.v1.json"
}
],
"nextCursor": null,
"hasMore": false
}
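The `hitl.trigger` object reads as a declarative threshold rule. One plausible way to evaluate it against a capability output is sketched below. Only the `gt` comparator appears in the sample; the other comparators, the function name, and the fail-closed behaviour for a missing field are assumptions.

```typescript
// Only "gt" appears in the catalog sample; the other comparators are assumed.
type Comparator = "gt" | "gte" | "lt" | "lte";

interface ThresholdTrigger {
  kind: "threshold";
  field: string;
  comparator: Comparator;
  value: number;
}

// Returns true when the output should be routed to a human reviewer.
function triggerFires(trigger: ThresholdTrigger, output: Record<string, unknown>): boolean {
  const actual = output[trigger.field];
  // Assumption: a missing or non-numeric field fails closed (review required).
  if (typeof actual !== "number") return true;
  switch (trigger.comparator) {
    case "gt":
      return actual > trigger.value;
    case "gte":
      return actual >= trigger.value;
    case "lt":
      return actual < trigger.value;
    case "lte":
      return actual <= trigger.value;
  }
}
```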
3.2 GET /api/v1/ai/capabilities/:capabilityKey
Detailed view including fallbackChain, evalSuiteId, cacheTtlSeconds, current BudgetCounter snapshot for the caller's tenant.
4. Prompt registry (admin)
Auth: requires JWT scope melmastoon:ai:admin.
4.1 POST /api/v1/ai/prompts
Create a new prompt logical row (first time) or a new draft version.
Request:
{
"domain": "PRICING",
"ordinal": 1,
"displayName": "Dynamic pricing suggestion",
"capabilityKey": "pricing.suggest",
"systemPrompt": "You are a pricing analyst...",
"userTemplate": "Property {{propertyId}} ...",
"outputSchemaJson": { "$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object", "...": "..." },
"defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
"evalSuiteId": "eva_01H8...",
"notes": "Tightened deviation rationale section."
}
Response 201:
{
"promptVersionId": "pmv_01H8...",
"canonicalCode": "PRMP_PRICING_001_v4",
"status": "draft"
}
4.2 GET /api/v1/ai/prompts
List prompts with filters: domain, capabilityKey, status.
4.3 GET /api/v1/ai/prompts/:promptVersionId
Retrieve a specific version (immutable).
4.4 POST /api/v1/ai/prompts/:promptVersionId/promote
Promote a draft to active. Requires a green EvalRun reference and ≥7 days of A/B traffic at 5%.
Request: { "evalRunId": "evr_01H8...", "abReportRef": "..." }
Response 200: { "status": "active", "activatedAt": "2026-05-12T01:31:00Z", "deprecatedVersionId": "pmv_01H8..." }
Errors: MELMASTOON.GENERAL.PRECONDITION_FAILED if eval not green or A/B window too short.
4.5 POST /api/v1/ai/prompts/:promptVersionId/deprecate
Mark an active version deprecated.
4.6 POST /api/v1/ai/prompts/:promptVersionId/retire
Move deprecated to retired. Refused if (now - deprecatedAt) < 14 days (MELMASTOON.GENERAL.PRECONDITION_FAILED).
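The lifecycle rules in 4.4 to 4.6 amount to a small state machine: draft, active, deprecated, retired. A minimal sketch of the two guarded transitions, using the 7-day and 14-day thresholds stated above, might look like this. The function names and the shape of the inputs are hypothetical.

```typescript
type PromptStatus = "draft" | "active" | "deprecated" | "retired";

const DAY_MS = 24 * 60 * 60 * 1000;

// Promotion needs a green eval run and at least 7 days of A/B traffic.
function canPromote(status: PromptStatus, evalVerdict: string, abTrafficDays: number): boolean {
  return status === "draft" && evalVerdict === "green" && abTrafficDays >= 7;
}

// Retirement is refused while (now - deprecatedAt) is under 14 days.
function canRetire(status: PromptStatus, deprecatedAt: Date, now: Date): boolean {
  return status === "deprecated" && now.getTime() - deprecatedAt.getTime() >= 14 * DAY_MS;
}
```

When either check fails, the service would respond with MELMASTOON.GENERAL.PRECONDITION_FAILED.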
5. Eval harness
5.1 POST /api/v1/ai/eval/runs
Trigger an eval run.
Request:
{
"suiteId": "eva_01H8...",
"promptVersionId": "pmv_01H8...",
"modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" }
}
Response 202: { "runId": "evr_01H8...", "status": "queued" }
5.2 GET /api/v1/ai/eval/runs/:runId
Returns metrics:
{
"runId": "evr_01H8...",
"status": "completed",
"promptVersionId": "pmv_01H8...",
"modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" },
"scores": {
"directionAccuracy": 0.79,
"schemaConformance": 1.0,
"adversarialBlocked": 1.0
},
"comparison": { "baseline": "pmv_01H8...prior", "delta": { "directionAccuracy": 0.04 } },
"verdict": "green",
"completedAt": "2026-05-12T02:14:00Z"
}
5.3 GET /api/v1/ai/eval/suites
List suites; GET /api/v1/ai/eval/suites/:id for detail.
6. HITL gates
6.1 GET /api/v1/ai/hitl/gates
List open gates for the caller's role + tenant.
Query: status, capability, cursor, limit.
Response 200:
{
"items": [
{
"gateId": "hgt_01H8...",
"tenantId": "tnt_01H8...",
"capability": "pricing.suggest",
"artifactRef": { "kind": "pricing-suggestion", "id": "prv_p_01H8..." },
"openedAt": "2026-05-12T03:00:00Z",
"slaDeadline": "2026-05-12T04:00:00Z",
"draftJson": { "suggestedAmountMicros": 4725000000, "deviationPctFromBaseline": 0.05 },
"reviewerRoles": ["gm", "owner"]
}
]
}
6.2 POST /api/v1/ai/hitl/gates/:gateId/decision
Submit decision.
Request:
{
"outcome": "modified",
"modifiedJson": { "suggestedAmountMicros": 4600000000, "deviationPctFromBaseline": 0.022 },
"justification": "Adjusted closer to baseline; competitor moved last hour."
}
Response 200:
{
"decisionId": "dec_01H8...",
"outcome": "modified",
"decidedAt": "2026-05-12T03:14:21Z",
"gateStatus": "decided"
}
Errors: MELMASTOON.IDENTITY.PERMISSION_DENIED if reviewer lacks an allowed role; MELMASTOON.GENERAL.PRECONDITION_FAILED if gate not open.
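The two error conditions above can be expressed as a single pre-check on the gate. The sketch below is illustrative only: the `Gate` shape is a subset of the 6.1 payload plus a `status` field, and checking the role before the gate status is an assumed ordering.

```typescript
interface Gate {
  gateId: string;
  status: string;          // "open" until a decision is recorded
  reviewerRoles: string[]; // e.g. ["gm", "owner"]
}

type DecisionError =
  | "MELMASTOON.IDENTITY.PERMISSION_DENIED"
  | "MELMASTOON.GENERAL.PRECONDITION_FAILED";

// Returns the error code to send, or null when the decision may proceed.
function checkDecision(gate: Gate, callerRoles: string[]): DecisionError | null {
  const hasAllowedRole = callerRoles.some((role) => gate.reviewerRoles.includes(role));
  if (!hasAllowedRole) return "MELMASTOON.IDENTITY.PERMISSION_DENIED";
  if (gate.status !== "open") return "MELMASTOON.GENERAL.PRECONDITION_FAILED";
  return null;
}
```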
6.3 GET /api/v1/ai/hitl/gates/:gateId
Detailed view including notification dispatch attempts and remaining SLA.
7. Budget
7.1 GET /api/v1/ai/budget
Per-tenant budget snapshot.
Query: period=2026-05 (default current month), scope=tenant_total|capability:pricing.suggest|feature:pricing.
Response 200:
{
"tenantId": "tnt_01H8...",
"period": "2026-05",
"scopes": [
{
"scope": { "kind": "tenant_total" },
"tokensUsed": 1230000,
"tokensCap": 5000000,
"costMicrosUsed": 9412000,
"costMicrosCap": 50000000,
"softCapPct": 80,
"hardCapPct": 100,
"softCapWarnedAt": null,
"hardCapTrippedAt": null,
"resetsAt": "2026-06-01T00:00:00Z"
},
{
"scope": { "kind": "capability", "capabilityKey": "pricing.suggest" },
"tokensUsed": 240000,
"tokensCap": 1000000,
"costMicrosUsed": 1800000,
"costMicrosCap": 10000000
}
]
}
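A consumer of this snapshot will usually want to reduce each scope to a single state. The sketch below does that under two assumptions that the contract does not spell out: `softCapPct` and `hardCapPct` are percentages of the cap, and the higher of token usage and cost usage governs. The fallback to 80 and 100 for scopes that omit the percentages is also an assumption.

```typescript
interface BudgetScope {
  tokensUsed: number;
  tokensCap: number;
  costMicrosUsed: number;
  costMicrosCap: number;
  softCapPct?: number;
  hardCapPct?: number;
}

type BudgetState = "ok" | "soft_cap" | "hard_cap";

// Classifies a scope by its worst utilisation. Assumes both caps are positive.
function budgetState(scope: BudgetScope): BudgetState {
  const usedPct =
    Math.max(scope.tokensUsed / scope.tokensCap, scope.costMicrosUsed / scope.costMicrosCap) * 100;
  const soft = scope.softCapPct ?? 80;  // assumed default
  const hard = scope.hardCapPct ?? 100; // assumed default
  if (usedPct >= hard) return "hard_cap";
  if (usedPct >= soft) return "soft_cap";
  return "ok";
}
```

With the `tenant_total` sample above (24.6% of tokens, about 18.8% of cost) this yields `"ok"`.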
8. Edge model manifest
8.1 GET /api/v1/ai/edge-model-manifest
Returns the current published manifest. Auth: device-bound JWT only (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).
Response 200:
{
"manifestId": "emm_01H8...",
"version": "2.4.1",
"publishedAt": "2026-05-09T18:22:00Z",
"models": [
{
"modelKey": "phi-3-mini-4k-instruct",
"fileName": "phi-3-mini-int4.onnx",
"sha256": "f3e7...e1",
"bytes": 2415412938,
"minRamMb": 2048,
"idleUnloadMinutes": 10,
"capabilities": ["message.draft", "tutor.answer"]
},
{
"modelKey": "all-MiniLM-L6-v2",
"fileName": "minilm-l6-fp16.onnx",
"sha256": "9a1c...77",
"bytes": 96214120,
"minRamMb": 256,
"idleUnloadMinutes": 30,
"capabilities": ["internal.rag_query_edge"]
}
],
"signature": {
"kmsKeyId": "projects/melmastoon-prod/locations/global/keyRings/edge/cryptoKeys/manifest-signer/cryptoKeyVersions/4",
"algorithm": "RSASSA_PSS_SHA_256",
"valueB64": "MIIB..."
}
}
The Electron desktop verifies signature against the KMS public key embedded in the binary at startup. If verification fails, the desktop refuses to load any edge model.
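The verification step might look roughly like the following with Node's built-in `crypto` module, which is available in the Electron main process. This is a sketch under several assumptions: the exact bytes covered by the signature are not specified here, so the function takes them as an opaque `signedBytes` buffer; the PSS salt length is assumed to equal the digest length; and the demo key pair is generated on the fly purely to show the call shape, whereas the real public key is embedded in the binary.

```typescript
import { constants, generateKeyPairSync, KeyObject, sign, verify } from "node:crypto";

interface ManifestSignature {
  kmsKeyId: string;
  algorithm: string;
  valueB64: string;
}

// Returns true only if the signature is a valid RSA-PSS/SHA-256 signature
// over signedBytes. Which bytes are signed is an assumption left to the caller.
function verifyManifestSignature(
  signedBytes: Buffer,
  signature: ManifestSignature,
  publicKey: KeyObject,
): boolean {
  if (signature.algorithm !== "RSASSA_PSS_SHA_256") return false;
  return verify(
    "sha256",
    signedBytes,
    {
      key: publicKey,
      padding: constants.RSA_PKCS1_PSS_PADDING,
      saltLength: constants.RSA_PSS_SALTLEN_DIGEST, // assumed salt length
    },
    Buffer.from(signature.valueB64, "base64"),
  );
}

// Demo with a throwaway key pair (not the real KMS key).
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const payload = Buffer.from(JSON.stringify({ version: "2.4.1" }), "utf8");
const demoSignature: ManifestSignature = {
  kmsKeyId: "demo-key",
  algorithm: "RSASSA_PSS_SHA_256",
  valueB64: sign("sha256", payload, {
    key: privateKey,
    padding: constants.RSA_PKCS1_PSS_PADDING,
    saltLength: constants.RSA_PSS_SALTLEN_DIGEST,
  }).toString("base64"),
};

const untamperedOk = verifyManifestSignature(payload, demoSignature, publicKey);
const tamperedOk = verifyManifestSignature(Buffer.from("tampered"), demoSignature, publicKey);
```

On a `false` result the desktop would refuse to load any edge model, as described above.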
8.2 POST /api/v1/ai/edge-model-manifest
Publish a new manifest. Admin only.
Request:
{
"version": "2.4.2",
"models": [ { "modelKey": "...", "fileName": "...", "sha256": "...", "bytes": 0, "minRamMb": 0, "idleUnloadMinutes": 0, "capabilities": [] } ],
"notes": "Added phi-3-mini retrained on hospitality corpus."
}
Response 201: { "manifestId": "emm_01H8...", "version": "2.4.2", "status": "published", "supersedesId": "emm_01H8...prior" }
9. RAG corpora (admin / tenant authoring)
9.1 POST /api/v1/ai/rag/corpora
Create a corpus for a tenant namespace.
Request: { "tenantId": "...", "namespace": "policies", "chunkStrategy": { "method": "fixed", "targetTokens": 384, "overlap": 64 }, "embeddingModel": { "provider": "vertex", "name": "text-embedding-004" } }
Response 201: { "corpusId": "rag_01H8...", "status": "provisioning" }
9.2 POST /api/v1/ai/rag/corpora/:corpusId/ingest
Ingest documents (URIs or inline text).
Request: { "documents": [{ "uri": "gs://...", "metadata": { "language": "en", "section": "cancellation" } }] }
Response 202: { "jobId": "...", "ingested": 0, "queued": 12 } (asynchronous job; status via GET /api/v1/ai/rag/jobs/:jobId).
9.3 DELETE /api/v1/ai/rag/corpora/:corpusId
Soft-delete (status → archived); embeddings remain queryable for 30 days and are then physically purged.
10. BFF passthrough
10.1 POST /bff/backoffice/v1/ai/tutor/ask
Tenant member asks the AI tutor.
Request: { "question": "How do I issue a digital key for a walk-in?", "context": { "screenId": "reservations.walkin" } }
Response 200:
{
"answer": "1) Open Reservations → Walk-in...",
"links": [ { "label": "Walk-in flow", "screenId": "reservations.walkin" } ],
"thumbsKey": "tutor_answer_01H8...",
"provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "gemini-1.5-flash-8b" }, "local": false }
}
11. Error catalog (response envelope per ERROR_CODES.md)
| Code | HTTP | When |
|---|---|---|
| MELMASTOON.AI.REFUSED_SAFETY | 422 | Pre or post moderation block; or schema invalid after one repair attempt |
| MELMASTOON.AI.REFUSED_BUDGET | 429 | Hard cap crossed; deterministic fallback applied (response carries output if available; surfaces error code only when no fallback applies) |
| MELMASTOON.AI.PROVIDER_UNAVAILABLE | 502 | Fallback chain exhausted |
| MELMASTOON.AI.HITL_REQUIRED | 403 | Caller attempted to commit a state change before HITL decision |
| MELMASTOON.AI.PROVENANCE_MISSING | 422 | Defensive; should never reach a sibling service |
| MELMASTOON.AI.OUTPUT_INVALID | 502 | Structured output failed schema after repair |
| MELMASTOON.GENERAL.RATE_LIMITED | 429 | Per-(tenant, capability) rate limit exceeded |
| MELMASTOON.GENERAL.VALIDATION_FAILED | 422 | Request schema invalid |
| MELMASTOON.GENERAL.PRECONDITION_FAILED | 412 | Optimistic-concurrency / lifecycle precondition (e.g., promote without green eval) |
| MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE | 422 | Embedding query / corpus reference cross-tenant |
| MELMASTOON.GENERAL.RESOURCE_NOT_FOUND | 404 | Capability / prompt / corpus unknown for this tenant scope |
| MELMASTOON.IDENTITY.PERMISSION_DENIED | 403 | Reviewer lacks allowed role; admin endpoint without melmastoon:ai:admin scope |
| MELMASTOON.IDENTITY.DEVICE_NOT_BOUND | 403 | GET /edge-model-manifest from non-device JWT |
| MELMASTOON.TENANT.SUSPENDED | 403 | Tenant suspended; only catalog reads allowed |
| MELMASTOON.TENANT.PLAN_LIMIT_EXCEEDED | 402 | Capability not enabled for plan |
Sample envelope:
{
"error": {
"type": "https://errors.melmastoon.ghasi.io/ai/refused-budget",
"code": "MELMASTOON.AI.REFUSED_BUDGET",
"title": "AI budget exceeded",
"status": 429,
"detail": "Monthly AI budget for tenant exceeded; deterministic fallback applied.",
"instance": "/api/v1/ai/complete",
"errors": [],
"traceId": "00-...-00",
"requestId": "req_01H8...",
"tenantId": "tnt_01H8...",
"retriable": true,
"retryAfter": 86400,
"userMessageKey": "errors.ai.refused_budget",
"docUrl": "https://docs.melmastoon.ghasi.io/errors/ai/refused-budget",
"runbook": "https://runbooks.melmastoon.ghasi.io/ai/refused-budget"
}
}
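Clients can drive retry behaviour from the `retriable` and `retryAfter` fields. The sketch below assumes `retryAfter` is expressed in seconds (the sample value 86400 would then be one day) and that a caller gives up rather than waits when the delay exceeds its own ceiling. The function name and the one-second default are hypothetical.

```typescript
// Subset of the canonical error envelope shown above.
interface ErrorEnvelope {
  error: {
    code: string;
    status: number;
    retriable: boolean;
    retryAfter?: number; // assumed to be seconds
    requestId?: string;
  };
}

// Returns the delay before the next attempt in milliseconds,
// or null when the caller should not retry.
function retryDelayMs(envelope: ErrorEnvelope, maxDelayMs: number): number | null {
  const { retriable, retryAfter } = envelope.error;
  if (!retriable) return null;
  const delayMs = (retryAfter ?? 1) * 1000; // assumed default of one second
  return delayMs > maxDelayMs ? null : delayMs;
}
```

For the sample envelope, an interactive caller with a 30-second ceiling gets `null` and should surface `userMessageKey` instead of blocking.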
12. OpenAPI
openapi.json is generated from controllers via nestjs/swagger. CI gate:
- Diff against the previous main snapshot.
- Any breaking change without a /api/v2 bump fails the build (pnpm openapi:diff).
- Schemas are exported to @ghasi/api-contracts/ai-orchestrator/v1 so consumers compile-time-bind to typed clients.
13. Routing diagram
┌──────────────────────────┐
│ /api/v1/ai/complete │
└─────────────┬────────────┘
│
┌────────────────┴─────────────────┐
│ pre-call: moderate / redact / │
│ budget reserve / pin prompt / │
│ hash + cache lookup │
└────────────────┬─────────────────┘
│
cache hit ◀────────┤
│ miss
▼
┌────────────────────────────────────────┐
│ pickProvider(capability, context) │
│ │
│ if context.local && hasEdge: edge │
│ else if regionPin matches: vertex │
│ else: walk capability.fallbackChain │
└────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
Vertex Anthropic OpenAI ── (or) ──▶ ONNX edge passthrough
│ │ │ (returns 200 with provenance.local=true,
▼ ▼ ▼ caller is the desktop main process)
┌────────────────────────────────────────┐
│ post-call: moderate / schema validate /│
│ stamp provenance / commit budget / │
│ open HITL / outbox / cache put │
└────────────────────────────────────────┘
│
▼
200 response
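
The `pickProvider` box in the diagram can be sketched as a pure function. What "regionPin matches" compares against is not stated, so the sketch assumes it means the pinned region is one where Vertex is deployed. The `RoutingEnv` shape, the `isAvailable` probe, and the provider identifiers in the fallback chain are likewise hypothetical.

```typescript
interface RoutingContext {
  local: boolean;
  regionPin?: string;
}

interface CapabilityRouting {
  key: string;
  fallbackChain: string[]; // provider ids in priority order
}

interface RoutingEnv {
  hasEdge: boolean;                           // an ONNX edge runtime is reachable
  vertexRegions: string[];                    // assumed meaning of "regionPin matches"
  isAvailable: (provider: string) => boolean; // hypothetical health probe
}

// Mirrors the three branches of the diagram; null means the chain is exhausted.
function pickProvider(
  capability: CapabilityRouting,
  context: RoutingContext,
  env: RoutingEnv,
): string | null {
  if (context.local && env.hasEdge) return "onnx-edge";
  if (context.regionPin !== undefined && env.vertexRegions.includes(context.regionPin)) {
    return "vertex";
  }
  for (const provider of capability.fallbackChain) {
    if (env.isAvailable(provider)) return provider;
  }
  return null;
}
```

A `null` result corresponds to MELMASTOON.AI.PROVIDER_UNAVAILABLE ("Fallback chain exhausted") in §11.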