ai-orchestrator-service — API Contracts

Companion to: APPLICATION_LOGIC.md · EVENT_SCHEMAS.md · Standards: 05 API Design · ERROR_CODES · NAMING

REST surface under /api/v1/ai/* plus a small BFF passthrough under /bff/backoffice/v1/ai/*. Service-to-service calls use mTLS; BFF calls use a JWT issued by iam-service with the claims tenant_id, user_id, roles, and surface. All error responses use the canonical error envelope from ERROR_CODES.md.

1. Common conventions

Concern	Detail
Base URL	`https://ai.svc.melmastoon.internal` (mTLS) and `https://api.melmastoon.ghasi.io` for BFF passthrough
Auth	mTLS (service callers) or `Authorization: Bearer <jwt>` (BFF)
Tenant scoping	`X-Tenant-Id: tnt_…` required on every call; rejected with `MELMASTOON.TENANT.NOT_FOUND` if absent or unknown
Idempotency	`Idempotency-Key: <ULID>` on every POST that mutates or invokes a model; 24 h replay window
Correlation	`traceparent` (W3C Trace Context); `X-Request-Id: req_<ULID>` echoed in every response
Rate limit	Per `(tenantId, capability)` token bucket; surfaced as `X-RateLimit-Remaining`, `X-RateLimit-Reset`; 429 with `MELMASTOON.GENERAL.RATE_LIMITED`
Pagination	`?cursor=<opaque>&limit=<int ≤ 100>`; response carries `nextCursor`, `hasMore`
Versioning	`/api/v1/...` is the only stable surface; breaking changes require `/api/v2`
Content type	`application/json; charset=utf-8`
Provenance	Every response carrying an AI artifact includes a `provenance` block; raw model responses are never returned
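
As a rough illustration of the conventions above, a BFF-side caller might assemble its headers as follows. `buildAiHeaders`, `newUlid`, and `AiCallOptions` are hypothetical names, not part of any published client; only the header names come from the table. The ULID generator uses `Math.random` for brevity and is a sketch, not a production implementation.

```typescript
// Hypothetical helpers; only the header names are taken from the conventions table.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Minimal ULID sketch: 10 chars of millisecond timestamp followed by 16 random chars.
// Math.random is used for brevity; a real client would use a CSPRNG.
function newUlid(nowMs: number = Date.now()): string {
  let time = "";
  let t = nowMs;
  for (let i = 0; i < 10; i++) {
    time = CROCKFORD.charAt(t % 32) + time;
    t = Math.floor(t / 32);
  }
  let random = "";
  for (let i = 0; i < 16; i++) {
    random += CROCKFORD.charAt(Math.floor(Math.random() * 32));
  }
  return time + random;
}

interface AiCallOptions {
  tenantId: string;     // e.g. "tnt_..."
  jwt?: string;         // BFF callers only; service callers rely on mTLS
  traceparent?: string; // W3C trace context, if one is already in flight
  mutating: boolean;    // true for POSTs that mutate or invoke a model
}

function buildAiHeaders(opts: AiCallOptions): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json; charset=utf-8",
    "X-Tenant-Id": opts.tenantId,
    "X-Request-Id": `req_${newUlid()}`,
  };
  if (opts.jwt) headers["Authorization"] = `Bearer ${opts.jwt}`;
  if (opts.traceparent) headers["traceparent"] = opts.traceparent;
  // Assumption: the same key is reused by the caller when it retries the same logical call.
  if (opts.mutating) headers["Idempotency-Key"] = newUlid();
  return headers;
}
```

A retrying caller would build the headers once and resend them unchanged, so that the replay window can deduplicate the call.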

2. Inference endpoints

2.1 `POST /api/v1/ai/complete`

Synchronous completion for a capability.

Request:

{
  "capability": "pricing.suggest",
  "tenantId": "tnt_01H8ZC0X8M0K6F9YV6T7RZWQS5",
  "input": {
    "propertyId": "ppt_01H8...",
    "roomTypeId": "rmt_01H8...",
    "date": "2026-05-12",
    "occupancyPct": 0.78,
    "baselineAmountMicros": 4500000000,
    "currency": "USD",
    "seasonalSignal": "shoulder"
  },
  "context": {
    "local": false,
    "regionPin": "me-central1",
    "callerService": "pricing-service",
    "callerSurface": "backoffice"
  },
  "timeoutMs": 4000,
  "fallback": "deterministic",
  "correlation": { "traceId": "00-...-00", "requestId": "req_01H8..." }
}

Response 200:

{
  "capability": "pricing.suggest",
  "output": {
    "suggestedAmountMicros": 4725000000,
    "currency": "USD",
    "deviationPctFromBaseline": 0.05,
    "rationale": "Occupancy 78% with shoulder-season trend; suggests +5%.",
    "confidence": 0.74
  },
  "cached": false,
  "fallbackApplied": false,
  "hitl": {
    "required": true,
    "gateId": "hgt_01H8...",
    "slaDeadline": "2026-05-12T03:00:00.000Z"
  },
  "provenance": {
    "id": "prv_p_01H8...",
    "promptId": "pmv_01H8...",
    "promptCanonicalCode": "PRMP_PRICING_001_v3",
    "model": { "provider": "vertex", "name": "gemini-1.5-flash" },
    "tokens": { "input": 612, "output": 184 },
    "costMicros": 412,
    "local": false,
    "cacheHit": false,
    "safety": { "input": "pass", "output": "pass" },
    "occurredAt": "2026-05-12T01:31:09.412Z"
  }
}

Errors: see §11. Most common are MELMASTOON.AI.REFUSED_BUDGET, MELMASTOON.AI.REFUSED_SAFETY, MELMASTOON.AI.PROVIDER_UNAVAILABLE, MELMASTOON.AI.OUTPUT_INVALID, MELMASTOON.GENERAL.RATE_LIMITED.
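
A minimal sketch of how a caller such as pricing-service might interpret this response. The field names mirror the sample payload; `deviationFromBaseline`, `nextStep`, and the `NextStep` type are hypothetical. It assumes a caller must not commit the suggestion while `hitl.required` is true, which is what `MELMASTOON.AI.HITL_REQUIRED` in §11 suggests.

```typescript
// Subset of the response shape shown above; helper names are hypothetical.
interface CompleteResponse {
  output: { suggestedAmountMicros: number; deviationPctFromBaseline: number };
  fallbackApplied: boolean;
  hitl: { required: boolean; gateId?: string; slaDeadline?: string };
}

// Relative deviation of the suggestion from the baseline, both expressed in micros.
function deviationFromBaseline(baselineMicros: number, suggestedMicros: number): number {
  if (baselineMicros <= 0) throw new RangeError("baselineMicros must be positive");
  return (suggestedMicros - baselineMicros) / baselineMicros;
}

type NextStep =
  | { kind: "apply"; amountMicros: number }
  | { kind: "await_review"; gateId: string };

// Assumption: while a HITL gate is required, the caller waits for the decision
// instead of applying the suggested amount.
function nextStep(res: CompleteResponse): NextStep {
  if (res.hitl.required) {
    if (!res.hitl.gateId) throw new Error("hitl.required is true but gateId is missing");
    return { kind: "await_review", gateId: res.hitl.gateId };
  }
  return { kind: "apply", amountMicros: res.output.suggestedAmountMicros };
}
```

With the sample values, `deviationFromBaseline(4500000000, 4725000000)` evaluates to 0.05, matching `deviationPctFromBaseline` in the response.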

2.2 `POST /api/v1/ai/embed`

Embedding generation. Single or batch.

Request:

{
  "tenantId": "tnt_01H8...",
  "capability": "internal.rag_ingest",
  "inputs": ["chunk text 1", "chunk text 2"],
  "context": { "local": false }
}

Response 200:

{
  "embeddings": [
    { "vector": [0.0123, -0.0456, "..."], "tokens": 9, "model": { "provider": "vertex", "name": "text-embedding-004" } }
  ],
  "provenance": { "id": "prv_p_...", "tokens": { "input": 18, "output": 0 }, "costMicros": 4, "local": false, "cacheHit": false }
}

2.3 `POST /api/v1/ai/moderate`

Standalone moderation pass.

Request: { "tenantId": "...", "input": "string", "axis": ["hate", "sexual", "dangerous", "self_harm", "pii_exposed"] }

Response 200: { "verdict": "pass" | "flag_low" | "flag_high" | "block", "scores": { "hate": 0.01, ... } }

2.4 `POST /api/v1/ai/rag/query`

RAG retrieval over a tenant corpus.

Request:

{
  "tenantId": "tnt_01H8...",
  "corpusId": "rag_01H8...",
  "query": "What is the cancellation policy for non-refundable rates after the cutoff?",
  "topK": 5,
  "filter": { "namespace": "policies", "language": "en" }
}

Response 200:

{
  "hits": [
    {
      "chunkId": "01H8...",
      "score": 0.832,
      "text": "Non-refundable rates ...",
      "sourceUri": "gs://melmastoon-tenant-content/.../policies/cancellation.md#L23",
      "metadata": { "language": "en", "section": "cancellation" }
    }
  ],
  "provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "text-embedding-004" }, "tokens": { "input": 22, "output": 0 }, "costMicros": 5 }
}

2.5 `POST /api/v1/ai/vision`

Vision capability — photo quality scoring or visual classification.

Request:

{
  "tenantId": "tnt_01H8...",
  "capability": "vision.photo_quality",
  "imageUri": "gs://melmastoon-tenant-media/.../room-12.jpg",
  "context": { "local": true }
}

Response 200:

{
  "output": {
    "score": 0.72,
    "issues": ["low_resolution"],
    "verdict": "acceptable"
  },
  "provenance": { "id": "prv_p_...", "model": { "provider": "onnx-edge", "name": "mobilenet-v3-small-image-quality" }, "local": true, "costMicros": 0 }
}

2.6 `POST /api/v1/ai/transcribe`

Speech-to-text.

Request:

{
  "tenantId": "tnt_01H8...",
  "audioUri": "gs://melmastoon-tenant-media/.../voice/01H8....opus",
  "languageHint": "fa",
  "context": { "local": false }
}

Response 200:

{
  "output": {
    "transcript": "اتاق 204 تمیز شد",
    "language": "fa",
    "intent": { "action": "housekeeping.mark_clean", "roomNumber": "204" },
    "confidence": 0.91
  },
  "provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "speech-to-text-v2" } }
}

3. Capability catalog

3.1 `GET /api/v1/ai/capabilities`

List capabilities visible to the caller. Service callers see all active rows; tenant callers see rows enabled for their plan.

Query params: status, domain, cursor, limit.

Response 200:

{
  "items": [
    {
      "key": "pricing.suggest",
      "displayName": "Dynamic pricing suggestion",
      "status": "active",
      "defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
      "latencyClass": "low",
      "costClass": "medium",
      "hitl": { "required": true, "trigger": { "kind": "threshold", "field": "deviationPctFromBaseline", "comparator": "gt", "value": 0.05 }, "slaSeconds": 3600 },
      "outputSchemaUri": "https://schemas.melmastoon.ghasi.io/ai/pricing-suggestion.v1.json"
    }
  ],
  "nextCursor": null,
  "hasMore": false
}
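
The `nextCursor` / `hasMore` pair can be drained with a loop like the one below. This is a sketch: `listAll`, `Page`, and `FetchPage` are hypothetical names, and the HTTP call is injected so the loop stays independent of any particular client.

```typescript
// Generic shape of a paginated response, per the Pagination convention in §1.
interface Page<T> {
  items: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Hypothetical transport hook: performs the GET with ?cursor=...&limit=...
type FetchPage<T> = (cursor: string | null, limit: number) => Promise<Page<T>>;

async function listAll<T>(fetchPage: FetchPage<T>, limit = 100): Promise<T[]> {
  if (limit < 1 || limit > 100) throw new RangeError("limit must be between 1 and 100");
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor, limit);
    all.push(...page.items);
    // Stop as soon as the server reports no further pages.
    cursor = page.hasMore ? page.nextCursor : null;
  } while (cursor !== null);
  return all;
}
```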

3.2 `GET /api/v1/ai/capabilities/:capabilityKey`

Detailed view including fallbackChain, evalSuiteId, cacheTtlSeconds, and the current BudgetCounter snapshot for the caller's tenant.
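
The `hitl.trigger` object in the catalog entry of §3.1 reads as a threshold rule over an output field. One plausible way to evaluate it is sketched below. Only the `gt` comparator appears in this document; the other comparators, the `triggerFires` name, and the fail-closed behaviour for a missing field are assumptions.

```typescript
// Only "gt" is shown in the catalog sample; the other comparators are assumed.
type Comparator = "gt" | "gte" | "lt" | "lte";

interface ThresholdTrigger {
  kind: "threshold";
  field: string;
  comparator: Comparator;
  value: number;
}

const COMPARE: Record<Comparator, (actual: number, limit: number) => boolean> = {
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

// Returns true when the output should be routed to a human reviewer.
function triggerFires(trigger: ThresholdTrigger, output: Record<string, unknown>): boolean {
  const actual = output[trigger.field];
  // Assumption: a missing or non-numeric field fails closed and requires review.
  if (typeof actual !== "number") return true;
  return COMPARE[trigger.comparator](actual, trigger.value);
}
```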

4. Prompt registry (admin)

Auth: requires JWT scope melmastoon:ai:admin.

4.1 `POST /api/v1/ai/prompts`

Creates a new logical prompt row (first submission for a domain and ordinal) or a new draft version of an existing prompt.

Request:

{
  "domain": "PRICING",
  "ordinal": 1,
  "displayName": "Dynamic pricing suggestion",
  "capabilityKey": "pricing.suggest",
  "systemPrompt": "You are a pricing analyst...",
  "userTemplate": "Property {{propertyId}} ...",
  "outputSchemaJson": { "$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object", "...": "..." },
  "defaultModel": { "provider": "vertex", "name": "gemini-1.5-flash" },
  "evalSuiteId": "eva_01H8...",
  "notes": "Tightened deviation rationale section."
}

Response 201:

{
  "promptVersionId": "pmv_01H8...",
  "canonicalCode": "PRMP_PRICING_001_v4",
  "status": "draft"
}

4.2 `GET /api/v1/ai/prompts`

List prompts with filters: domain, capabilityKey, status.

4.3 `GET /api/v1/ai/prompts/:promptVersionId`

Retrieve a specific version (immutable).

4.4 `POST /api/v1/ai/prompts/:promptVersionId/promote`

Promote a draft to active. Requires a green EvalRun reference and ≥7 days of A/B traffic at 5%.

Request: { "evalRunId": "evr_01H8...", "abReportRef": "..." }

Response 200: { "status": "active", "activatedAt": "2026-05-12T01:31:00Z", "deprecatedVersionId": "pmv_01H8..." }

Errors: MELMASTOON.GENERAL.PRECONDITION_FAILED if eval not green or A/B window too short.

4.5 `POST /api/v1/ai/prompts/:promptVersionId/deprecate`

Mark an active version deprecated.

4.6 `POST /api/v1/ai/prompts/:promptVersionId/retire`

Move deprecated to retired. Refused if (now - deprecatedAt) < 14 days (MELMASTOON.GENERAL.PRECONDITION_FAILED).
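
The lifecycle rules in §4.4 to §4.6 can be summarised as a small state machine with two guarded transitions. The sketch below uses hypothetical names (`NEXT_STATUS`, `canPromote`, `canRetire`) and encodes only the preconditions stated above; how the service measures A/B traffic days is not specified here and is taken as an input.

```typescript
type PromptStatus = "draft" | "active" | "deprecated" | "retired";

// Forward-only lifecycle: draft -> active -> deprecated -> retired.
const NEXT_STATUS: Record<PromptStatus, PromptStatus | null> = {
  draft: "active",
  active: "deprecated",
  deprecated: "retired",
  retired: null,
};

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_AB_DAYS = 7;          // from §4.4
const MIN_DEPRECATED_DAYS = 14; // from §4.6

// Promotion needs a green eval run and at least 7 days of A/B traffic.
function canPromote(evalVerdict: string, abTrafficDays: number): boolean {
  return evalVerdict === "green" && abTrafficDays >= MIN_AB_DAYS;
}

// Retirement is refused while (now - deprecatedAt) < 14 days.
function canRetire(deprecatedAt: Date, now: Date): boolean {
  return now.getTime() - deprecatedAt.getTime() >= MIN_DEPRECATED_DAYS * DAY_MS;
}
```

When either guard returns false, the corresponding endpoint would answer with `MELMASTOON.GENERAL.PRECONDITION_FAILED`.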

5. Eval harness

5.1 `POST /api/v1/ai/eval/runs`

Trigger an eval run.

Request:

{
  "suiteId": "eva_01H8...",
  "promptVersionId": "pmv_01H8...",
  "modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" }
}

Response 202: { "runId": "evr_01H8...", "status": "queued" }

5.2 `GET /api/v1/ai/eval/runs/:runId`

Returns metrics:

{
  "runId": "evr_01H8...",
  "status": "completed",
  "promptVersionId": "pmv_01H8...",
  "modelRef": { "provider": "vertex", "name": "gemini-1.5-flash" },
  "scores": {
    "directionAccuracy": 0.79,
    "schemaConformance": 1.0,
    "adversarialBlocked": 1.0
  },
  "comparison": { "baseline": "pmv_01H8...prior", "delta": { "directionAccuracy": 0.04 } },
  "verdict": "green",
  "completedAt": "2026-05-12T02:14:00Z"
}
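
Because §5.1 returns 202 with `status: "queued"`, a client has to poll §5.2 until the run finishes. A minimal polling sketch follows; `waitForEvalRun` and `isPromotable` are hypothetical names, the fetch is injected, and only the `queued` and `completed` statuses and the `green` verdict are taken from this document. Any other status is treated as still pending.

```typescript
// Subset of the eval run payload shown above.
interface EvalRun {
  runId: string;
  status: string;
  verdict?: string;
}

// Polls until the run reports "completed" or the attempt budget is spent.
async function waitForEvalRun(
  getRun: (runId: string) => Promise<EvalRun>, // hypothetical transport hook
  runId: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<EvalRun> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await getRun(runId);
    if (run.status === "completed") return run;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`eval run ${runId} did not complete after ${maxAttempts} polls`);
}

// A run can back a promotion request only when it completed with a green verdict.
function isPromotable(run: EvalRun): boolean {
  return run.status === "completed" && run.verdict === "green";
}
```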

5.3 `GET /api/v1/ai/eval/suites`

Lists eval suites. Use GET /api/v1/ai/eval/suites/:id for the detail of a single suite.

6. HITL gates

6.1 `GET /api/v1/ai/hitl/gates`

List open gates for the caller's role and tenant.

Query: status, capability, cursor, limit.

Response 200:

{
  "items": [
    {
      "gateId": "hgt_01H8...",
      "tenantId": "tnt_01H8...",
      "capability": "pricing.suggest",
      "artifactRef": { "kind": "pricing-suggestion", "id": "prv_p_01H8..." },
      "openedAt": "2026-05-12T03:00:00Z",
      "slaDeadline": "2026-05-12T04:00:00Z",
      "draftJson": { "suggestedAmountMicros": 4725000000, "deviationPctFromBaseline": 0.05 },
      "reviewerRoles": ["gm", "owner"]
    }
  ]
}

6.2 `POST /api/v1/ai/hitl/gates/:gateId/decision`

Submit decision.

Request:

{
  "outcome": "modified",
  "modifiedJson": { "suggestedAmountMicros": 4600000000, "deviationPctFromBaseline": 0.022 },
  "justification": "Adjusted closer to baseline; competitor moved last hour."
}

Response 200:

{
  "decisionId": "dec_01H8...",
  "outcome": "modified",
  "decidedAt": "2026-05-12T03:14:21Z",
  "gateStatus": "decided"
}

Errors: MELMASTOON.IDENTITY.PERMISSION_DENIED if reviewer lacks an allowed role; MELMASTOON.GENERAL.PRECONDITION_FAILED if gate not open.

6.3 `GET /api/v1/ai/hitl/gates/:gateId`

Detailed view including notification dispatch attempts and remaining SLA.
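
The remaining SLA can be derived from `slaDeadline` on the client as well. A minimal sketch, assuming ISO 8601 UTC timestamps as in the samples; `remainingSlaSeconds` is a hypothetical helper name.

```typescript
// Seconds left until the gate's SLA deadline, clamped at zero once breached.
function remainingSlaSeconds(slaDeadlineIso: string, now: Date = new Date()): number {
  const deadlineMs = Date.parse(slaDeadlineIso);
  if (Number.isNaN(deadlineMs)) {
    throw new RangeError(`invalid slaDeadline: ${slaDeadlineIso}`);
  }
  return Math.max(0, Math.floor((deadlineMs - now.getTime()) / 1000));
}
```

For the gate in §6.1 and the decision time in §6.2, `remainingSlaSeconds("2026-05-12T04:00:00Z", new Date("2026-05-12T03:14:21Z"))` gives 2739, that is 45 minutes and 39 seconds.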

7. Budget

7.1 `GET /api/v1/ai/budget`

Per-tenant budget snapshot.

Query: period=2026-05 (default current month), scope=tenant_total|capability:pricing.suggest|feature:pricing.

Response 200:

{
  "tenantId": "tnt_01H8...",
  "period": "2026-05",
  "scopes": [
    {
      "scope": { "kind": "tenant_total" },
      "tokensUsed": 1230000,
      "tokensCap": 5000000,
      "costMicrosUsed": 9412000,
      "costMicrosCap": 50000000,
      "softCapPct": 80,
      "hardCapPct": 100,
      "softCapWarnedAt": null,
      "hardCapTrippedAt": null,
      "resetsAt": "2026-06-01T00:00:00Z"
    },
    {
      "scope": { "kind": "capability", "capabilityKey": "pricing.suggest" },
      "tokensUsed": 240000,
      "tokensCap": 1000000,
      "costMicrosUsed": 1800000,
      "costMicrosCap": 10000000
    }
  ]
}
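
A caller could classify each scope of this snapshot against its soft and hard caps roughly as follows. This is a sketch under assumptions: it takes whichever of tokens or cost is closer to its cap as the governing figure, and it defaults to 80 / 100 when a scope omits `softCapPct` / `hardCapPct`. Neither rule is stated in this document, and `budgetState` is a hypothetical name.

```typescript
// Subset of one entry of `scopes` in the response above.
interface BudgetScope {
  tokensUsed: number;
  tokensCap: number;
  costMicrosUsed: number;
  costMicrosCap: number;
  softCapPct?: number;
  hardCapPct?: number;
}

type BudgetState = "ok" | "soft_cap" | "hard_cap";

function budgetState(scope: BudgetScope): { usedPct: number; state: BudgetState } {
  if (scope.tokensCap <= 0 || scope.costMicrosCap <= 0) {
    throw new RangeError("caps must be positive");
  }
  // Assumed defaults for scopes that do not carry explicit percentages.
  const softPct = scope.softCapPct ?? 80;
  const hardPct = scope.hardCapPct ?? 100;
  // Assumption: the dimension closest to its cap governs the state.
  const usedPct =
    100 * Math.max(scope.tokensUsed / scope.tokensCap, scope.costMicrosUsed / scope.costMicrosCap);
  const state: BudgetState =
    usedPct >= hardPct ? "hard_cap" : usedPct >= softPct ? "soft_cap" : "ok";
  return { usedPct, state };
}
```

For the `tenant_total` scope above, tokens are at about 24.6% and cost at about 18.8% of their caps, so the state is `ok`.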

8. Edge model manifest

8.1 `GET /api/v1/ai/edge-model-manifest`

Returns the current published manifest. Auth: device-bound JWT only (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).

Response 200:

{
  "manifestId": "emm_01H8...",
  "version": "2.4.1",
  "publishedAt": "2026-05-09T18:22:00Z",
  "models": [
    {
      "modelKey": "phi-3-mini-4k-instruct",
      "fileName": "phi-3-mini-int4.onnx",
      "sha256": "f3e7...e1",
      "bytes": 2415412938,
      "minRamMb": 2048,
      "idleUnloadMinutes": 10,
      "capabilities": ["message.draft", "tutor.answer"]
    },
    {
      "modelKey": "all-MiniLM-L6-v2",
      "fileName": "minilm-l6-fp16.onnx",
      "sha256": "9a1c...77",
      "bytes": 96214120,
      "minRamMb": 256,
      "idleUnloadMinutes": 30,
      "capabilities": ["internal.rag_query_edge"]
    }
  ],
  "signature": {
    "kmsKeyId": "projects/melmastoon-prod/locations/global/keyRings/edge/cryptoKeys/manifest-signer/cryptoKeyVersions/4",
    "algorithm": "RSASSA_PSS_SHA_256",
    "valueB64": "MIIB..."
  }
}

At startup, the Electron desktop verifies the manifest signature against the KMS public key embedded in its binary. If verification fails, the desktop refuses to load any edge model.
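
A minimal sketch of such a check using Node's built-in `crypto` module, which is available in an Electron main process. Several things here are assumptions rather than facts from this document: that `RSASSA_PSS_SHA_256` maps to RSA-PSS with SHA-256 and a salt length equal to the digest length, that the public key is embedded as a PEM string, and above all which bytes the signature covers. The function therefore takes the signed bytes as an input. `verifyManifestSignature` is a hypothetical name, and the key pair in the demo is a throwaway generated only to make the example runnable.

```typescript
import { constants, generateKeyPairSync, sign, verify } from "node:crypto";

// Assumption: the signature covers exactly `signedBytes`; how the manifest is
// canonicalised before signing is not specified in this document.
function verifyManifestSignature(
  signedBytes: Buffer,
  signatureB64: string,
  publicKeyPem: string,
): boolean {
  return verify(
    "sha256",
    signedBytes,
    {
      key: publicKeyPem,
      padding: constants.RSA_PKCS1_PSS_PADDING,
      saltLength: constants.RSA_PSS_SALTLEN_DIGEST, // assumed: salt length = digest length
    },
    Buffer.from(signatureB64, "base64"),
  );
}

// Demo with a throwaway key pair standing in for the KMS key.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const manifestBytes = Buffer.from(JSON.stringify({ version: "2.4.1", models: [] }), "utf8");
const demoSignatureB64 = sign("sha256", manifestBytes, {
  key: privateKey,
  padding: constants.RSA_PKCS1_PSS_PADDING,
  saltLength: constants.RSA_PSS_SALTLEN_DIGEST,
}).toString("base64");
const demoPublicKeyPem = publicKey.export({ type: "spki", format: "pem" }).toString();

// An untouched manifest verifies; any change to the bytes does not.
const accepted = verifyManifestSignature(manifestBytes, demoSignatureB64, demoPublicKeyPem);
const rejected = verifyManifestSignature(Buffer.from("tampered"), demoSignatureB64, demoPublicKeyPem);
```

A desktop following the rule above would load edge models only when the check returns true.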

8.2 `POST /api/v1/ai/edge-model-manifest`

Publish a new manifest. Admin only.

Request:

{
  "version": "2.4.2",
  "models": [ { "modelKey": "...", "fileName": "...", "sha256": "...", "bytes": 0, "minRamMb": 0, "idleUnloadMinutes": 0, "capabilities": [] } ],
  "notes": "Added phi-3-mini retrained on hospitality corpus."
}

Response 201: { "manifestId": "emm_01H8...", "version": "2.4.2", "status": "published", "supersedesId": "emm_01H8...prior" }

9. RAG corpora (admin / tenant authoring)

9.1 `POST /api/v1/ai/rag/corpora`

Create a corpus for a tenant namespace.

Request: { "tenantId": "...", "namespace": "policies", "chunkStrategy": { "method": "fixed", "targetTokens": 384, "overlap": 64 }, "embeddingModel": { "provider": "vertex", "name": "text-embedding-004" } }

Response 201: { "corpusId": "rag_01H8...", "status": "provisioning" }

9.2 `POST /api/v1/ai/rag/corpora/:corpusId/ingest`

Ingest documents (URIs or inline text).

Request: { "documents": [{ "uri": "gs://...", "metadata": { "language": "en", "section": "cancellation" } }] }

Response 202: { "jobId": "...", "ingested": 0, "queued": 12 } (asynchronous job; status via GET /api/v1/ai/rag/jobs/:jobId).

9.3 `DELETE /api/v1/ai/rag/corpora/:corpusId`

Soft-delete (status → archived); embeddings remain queryable for 30 days and are then physically purged.

10. BFF passthrough

10.1 `POST /bff/backoffice/v1/ai/tutor/ask`

Tenant member asks the AI tutor.

Request: { "question": "How do I issue a digital key for a walk-in?", "context": { "screenId": "reservations.walkin" } }

Response 200:

{
  "answer": "1) Open Reservations → Walk-in...",
  "links": [ { "label": "Walk-in flow", "screenId": "reservations.walkin" } ],
  "thumbsKey": "tutor_answer_01H8...",
  "provenance": { "id": "prv_p_...", "model": { "provider": "vertex", "name": "gemini-1.5-flash-8b" }, "local": false }
}

11. Error catalog (response envelope per ERROR_CODES.md)

Code	HTTP	When
`MELMASTOON.AI.REFUSED_SAFETY`	422	Pre or post moderation block; or schema invalid after one repair attempt
`MELMASTOON.AI.REFUSED_BUDGET`	429	Hard cap crossed. When a deterministic fallback applies, the response carries its `output` instead; this error code is surfaced only when no fallback applies
`MELMASTOON.AI.PROVIDER_UNAVAILABLE`	502	Fallback chain exhausted
`MELMASTOON.AI.HITL_REQUIRED`	403	Caller attempted to commit a state change before HITL decision
`MELMASTOON.AI.PROVENANCE_MISSING`	422	Defensive — should never reach a sibling service
`MELMASTOON.AI.OUTPUT_INVALID`	502	Structured output failed schema after repair
`MELMASTOON.GENERAL.RATE_LIMITED`	429	Per-`(tenant, capability)` rate limit exceeded
`MELMASTOON.GENERAL.VALIDATION_FAILED`	422	Request schema invalid
`MELMASTOON.GENERAL.PRECONDITION_FAILED`	412	Optimistic-concurrency / lifecycle precondition (e.g., promote without green eval)
`MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE`	422	Embedding query / corpus reference cross-tenant
`MELMASTOON.GENERAL.RESOURCE_NOT_FOUND`	404	Capability / prompt / corpus unknown for this tenant scope
`MELMASTOON.IDENTITY.PERMISSION_DENIED`	403	Reviewer lacks allowed role; admin endpoint without `melmastoon:ai:admin` scope
`MELMASTOON.IDENTITY.DEVICE_NOT_BOUND`	403	`GET /api/v1/ai/edge-model-manifest` called with a non-device JWT
`MELMASTOON.TENANT.SUSPENDED`	403	Tenant suspended; only catalog reads allowed
`MELMASTOON.TENANT.PLAN_LIMIT_EXCEEDED`	402	Capability not enabled for plan

Sample envelope:

{
  "error": {
    "type": "https://errors.melmastoon.ghasi.io/ai/refused-budget",
    "code": "MELMASTOON.AI.REFUSED_BUDGET",
    "title": "AI budget exceeded",
    "status": 429,
    "detail": "Monthly AI budget for tenant exceeded; no deterministic fallback available.",
    "instance": "/api/v1/ai/complete",
    "errors": [],
    "traceId": "00-...-00",
    "requestId": "req_01H8...",
    "tenantId": "tnt_01H8...",
    "retriable": true,
    "retryAfter": 86400,
    "userMessageKey": "errors.ai.refused_budget",
    "docUrl": "https://docs.melmastoon.ghasi.io/errors/ai/refused-budget",
    "runbook": "https://runbooks.melmastoon.ghasi.io/ai/refused-budget"
  }
}
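
A client can drive its retry behaviour from `retriable` and `retryAfter`. The sketch below assumes `retryAfter` is expressed in seconds (consistent with the sample value 86400 for a budget that resets later) and falls back to capped exponential backoff when the field is absent. `retryDelayMs` and the backoff constants are hypothetical.

```typescript
// Subset of the canonical error envelope shown above.
interface ErrorEnvelope {
  error: {
    code: string;
    status: number;
    retriable: boolean;
    retryAfter?: number; // assumed to be seconds
  };
}

// Returns the delay before the next attempt, or null when the call must not be retried.
function retryDelayMs(envelope: ErrorEnvelope, attempt: number, maxBackoffMs = 60_000): number | null {
  const { retriable, retryAfter } = envelope.error;
  if (!retriable) return null;
  if (typeof retryAfter === "number") return retryAfter * 1000;
  // No server hint: 500 ms, 1 s, 2 s, ... capped at maxBackoffMs.
  return Math.min(maxBackoffMs, 500 * 2 ** attempt);
}
```

For the sample envelope this yields 86 400 000 ms, so an interactive caller would normally surface `userMessageKey` rather than wait.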

12. OpenAPI

openapi.json is generated from controllers via @nestjs/swagger. CI gate:

Diff against the previous main snapshot.
Any breaking change without a /api/v2 bump fails the build (pnpm openapi:diff).
Schemas are exported to @ghasi/api-contracts/ai-orchestrator/v1 so consumers compile-time-bind to typed clients.

13. Routing diagram

                   ┌──────────────────────────┐
                   │ /api/v1/ai/complete      │
                   └─────────────┬────────────┘
                                 │
                ┌────────────────┴─────────────────┐
                │ pre-call: moderate / redact /    │
                │ budget reserve / pin prompt /    │
                │ hash + cache lookup              │
                └────────────────┬─────────────────┘
                                 │
              cache hit ◀────────┤
                                 │ miss
                                 ▼
            ┌────────────────────────────────────────┐
            │  pickProvider(capability, context)     │
            │                                        │
            │   if context.local && hasEdge: edge    │
            │   else if regionPin matches: vertex    │
            │   else: walk capability.fallbackChain  │
            └────────────────────────────────────────┘
                          │            │            │
                          ▼            ▼            ▼
                       Vertex     Anthropic     OpenAI ── (or) ──▶ ONNX edge passthrough
                          │            │            │                 (returns 200 with provenance.local=true,
                          ▼            ▼            ▼                  caller is the desktop main process)
            ┌────────────────────────────────────────┐
            │ post-call: moderate / schema validate /│
            │ stamp provenance / commit budget /     │
            │ open HITL / outbox / cache put         │
            └────────────────────────────────────────┘
                                 │
                                 ▼
                            200 response
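
The `pickProvider` step in the diagram could look roughly like this. It is a sketch of the three branches only. The type names, the `RoutingEnv` shape, and the reading of "regionPin matches" as "the pinned region is one where Vertex is served" are assumptions, as is the health check used while walking the fallback chain.

```typescript
type Provider = "onnx-edge" | "vertex" | "anthropic" | "openai";

interface CallContext {
  local: boolean;
  regionPin?: string;
}

interface CapabilityRouting {
  fallbackChain: Provider[];
}

// Hypothetical view of the runtime state the router consults.
interface RoutingEnv {
  hasEdge: boolean;                        // an edge runtime is reachable for this caller
  vertexRegions: string[];                 // regions where Vertex can serve the call
  isHealthy: (provider: Provider) => boolean;
}

// Returns the provider to call, or null when the fallback chain is exhausted
// (which would surface as MELMASTOON.AI.PROVIDER_UNAVAILABLE).
function pickProvider(
  capability: CapabilityRouting,
  context: CallContext,
  env: RoutingEnv,
): Provider | null {
  if (context.local && env.hasEdge) return "onnx-edge";
  if (context.regionPin !== undefined && env.vertexRegions.includes(context.regionPin)) {
    return "vertex";
  }
  for (const provider of capability.fallbackChain) {
    if (env.isHealthy(provider)) return provider;
  }
  return null;
}
```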
