API Contracts

:::info Source
Sourced from services/ai-gateway-service/API_CONTRACTS.md in the documentation repo.
:::

1. Surfaces

Internal REST (/api/v1/ai/…) — via AIClient SDK.
SSE streaming for chat + tutor.
Admin surfaces for prompts, budgets, audit.

2. AIClient Port (F09 — frozen contract)

The port is defined in TypeScript; each service consumes it by importing @ghasi/ai-client.

interface AIClient {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
  completeStream(req: CompletionRequest): AsyncIterable<CompletionChunk>;
  embed(req: EmbedRequest): Promise<EmbedResponse>;
  moderate(text: string, policy?: SafetyPolicy): Promise<SafetyVerdict>;
  generateImage(req: ImageRequest): Promise<ImageResponse>;
  generateAudio(req: TTSRequest): Promise<AudioResponse>;
  transcribe(req: STTRequest): Promise<TranscriptResponse>;
  classify(req: ClassifyRequest): Promise<ClassifyResponse>;
  knn(req: KNNRequest): Promise<KNNResponse>;
}

Port is additive: adding new methods does not break existing callers.
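A minimal sketch of why additive changes are safe for callers: a caller that depends only on the methods it uses keeps compiling when the port grows. The request and response shapes here are simplified stand-ins rather than the real types from @ghasi/ai-client, and `answerQuestion`, `fakeClient`, and `rerank` are hypothetical names used only for illustration.

```typescript
// Simplified stand-ins for the real request/response types (assumption).
interface CompletionRequest {
  promptId: string;
  input: Record<string, unknown>;
}
interface CompletionResponse {
  output: Record<string, unknown>;
}

// The caller declares only the slice of the port it actually needs.
type Completer = {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
};

async function answerQuestion(ai: Completer, question: string): Promise<string> {
  const res = await ai.complete({ promptId: "tutor.rag.respond", input: { question } });
  return String(res.output["answer"] ?? "");
}

// A fake client for tests. It also carries a method the caller never uses,
// standing in for a method added to the port at a later date.
const fakeClient = {
  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    return { output: { answer: `echo: ${String(req.input["question"])}` } };
  },
  async rerank(items: string[]): Promise<string[]> {
    return items;
  },
};
```

Implementations of the full port are a different matter: a class declared as `implements AIClient` would need updating whenever a method is added, so the guarantee stated above covers callers only.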

3. Endpoints (REST)

3.1 Completion / Chat

POST   /api/v1/ai/complete                 body: CompletionRequest
GET    /api/v1/ai/stream/{requestId}       (SSE)

3.2 Embeddings / RAG

POST   /api/v1/ai/embed                    body: { sourceRef, content, modelId? }
POST   /api/v1/ai/knn                      body: { vector, k, filters }

3.3 Moderation

POST   /api/v1/ai/moderate                 body: { text, policy? }

3.4 Media AI

POST   /api/v1/ai/images                   body: { prompt, size, style? }
POST   /api/v1/ai/audio/tts                body: { text, voice, language }
POST   /api/v1/ai/audio/stt                body: { assetId, language }

3.5 Prompt Registry (admin)

GET    /api/v1/ai/prompts                  ?tenantId=&status=active
POST   /api/v1/ai/prompts
GET    /api/v1/ai/prompts/{id}/versions
POST   /api/v1/ai/prompts/{id}/versions
POST   /api/v1/ai/prompts/{id}/versions/{version}/activate
POST   /api/v1/ai/prompts/{id}/versions/{version}/deprecate
POST   /api/v1/ai/prompts/{id}/eval        (runs eval set)

3.6 Models

GET    /api/v1/ai/models                   ?family=&locality=&residency=

3.7 Budgets

GET    /api/v1/ai/budgets/{tenantId}
PATCH  /api/v1/ai/budgets/{tenantId}       (platform admin)

3.8 Audit

GET    /api/v1/ai/audit                    (compliance_officer) ?userId=&promptId=&tenantId=
POST   /api/v1/ai/audit/export             (GDPR export)

3.9 Structured JSON (gateway internal — EDT-142 / EP-14+)

POST   /api/v1/ai/structured               body: { promptId?, input, outputSchema, forceCloud? }
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id, X-Ai-Local?, X-Ai-Force-Cloud?

When forceCloud is true (in the body, in the X-Ai-Force-Cloud: true header, or both), the cloud budget path is used even if X-Ai-Local is set (EP-19 / US-97).
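The precedence rule can be sketched as a small pure function. This is an illustration under assumptions, not the gateway's code: `resolveRoute` is a hypothetical name, header names are assumed to arrive lower-cased, X-Ai-Local is assumed to carry the value `true` when set, and falling back to cloud when neither flag is present is a guess.

```typescript
type Route = "cloud" | "local";

interface StructuredBody {
  forceCloud?: boolean;
}

// Headers are assumed to be lower-cased by the HTTP layer (assumption).
function resolveRoute(
  body: StructuredBody,
  headers: Record<string, string | undefined>,
): Route {
  const isTrue = (v: string | undefined): boolean =>
    v !== undefined && v.trim().toLowerCase() === "true";

  // Either the body flag or the header is enough to force the cloud path.
  const forceCloud = body.forceCloud === true || isTrue(headers["x-ai-force-cloud"]);
  if (forceCloud) return "cloud";

  // Without a force flag, honour the local hint; default to cloud (assumption).
  return isTrue(headers["x-ai-local"]) ? "local" : "cloud";
}
```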

3.10 Offline local inference telemetry replay (EP-19 / US-99)

POST   /api/v1/ai/local-inference/telemetry   body: { events: [...] }   /* max 100 events */
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id

Each event is written to the transactional outbox as ai.inference.local.completed.v1 with idempotent row ids per batch.
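Because a request carries at most 100 events, a client replaying a longer offline queue has to split it. The sketch below shows one way to do that with a deterministic Idempotency-Key per batch, so a retried batch reuses the same key. The key format and the names `toBatches` and `replayId` are assumptions; the contract does not prescribe them.

```typescript
const MAX_EVENTS_PER_REQUEST = 100; // limit stated on the endpoint above

interface TelemetryBatch<E> {
  idempotencyKey: string; // sent as the Idempotency-Key header
  events: E[]; // sent as the request body { events }
}

// Splits queued events into request-sized batches. The key is derived from a
// caller-chosen replay id plus the batch index (hypothetical scheme), so
// re-running the split over the same queue yields the same keys.
function toBatches<E>(events: E[], replayId: string): TelemetryBatch<E>[] {
  const batches: TelemetryBatch<E>[] = [];
  for (let i = 0; i < events.length; i += MAX_EVENTS_PER_REQUEST) {
    batches.push({
      idempotencyKey: `${replayId}-${batches.length}`,
      events: events.slice(i, i + MAX_EVENTS_PER_REQUEST),
    });
  }
  return batches;
}
```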

4. Request / Response

CompletionRequest

{
  "promptId": "tutor.rag.respond",
  "promptVersion": "2.1.0",           // optional; omit for active
  "input": { "question": "...", "context": { "blocks": [...] } },
  "userId": "u_01H...",
  "sessionId": "ps_01H...",
  "maxTokens": 400,
  "temperature": 0.3,
  "stream": true,
  "metadata": { "traceparent": "..." }
}

CompletionResponse

{
  "data": {
    "completionId": "cpl_01H...",
    "output": { "answer": "...", "citations": [...] },
    "provenance": {
      "model": "gpt-4o-mini",
      "promptId": "tutor.rag.respond",
      "promptVersion": "2.1.0",
      "traceId": "00-...-01",
      "local": false,
      "generatedAt": "...",
      "cost": { "microUSD": 2340, "tokens": { "in": 1245, "out": 412 } }
    },
    "safety": {
      "input": { "overallAction": "allow" },
      "output": { "overallAction": "allow" }
    },
    "cacheHit": false,
    "latencyMs": 1240
  }
}
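As a small worked example of reading the provenance block, the helper below converts the cost into dollars and totals the tokens. It reads microUSD as millionths of a US dollar, which is an interpretation of the field name; the `Cost` interface and `summarizeCost` are hypothetical names mirroring the JSON above.

```typescript
// Mirrors provenance.cost from the sample response (shape assumed from the JSON).
interface Cost {
  microUSD: number;
  tokens: { in: number; out: number };
}

function summarizeCost(cost: Cost): { usd: number; totalTokens: number } {
  return {
    usd: cost.microUSD / 1000000, // micro = one millionth
    totalTokens: cost.tokens.in + cost.tokens.out,
  };
}

// For the sample above: 2340 microUSD is 0.00234 USD, and 1245 + 412 = 1657 tokens.
const sample = summarizeCost({ microUSD: 2340, tokens: { in: 1245, out: 412 } });
```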

5. Error Model

ai.refused.safety (422) — safety pipeline blocked.
ai.refused.budget (429) — budget exhausted.
ai.refused.provider (502) — all providers unavailable.
ai.refused.policy (403) — caller lacks AI scope or tenant policy denies.
ai.prompt.not_found (404).
ai.schema.output_invalid (502 — provider returned invalid shape).
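On the client side, the codes above can be kept in one lookup table so that handlers branch on the code rather than on the bare HTTP status (two codes share 502). This is a sketch; `AI_ERRORS`, `isAiErrorCode`, and `statusFor` are hypothetical names, and the wire format of the error body is not specified here.

```typescript
// Code-to-status table transcribed from the list above.
const AI_ERRORS = {
  "ai.refused.safety": { status: 422, reason: "safety pipeline blocked" },
  "ai.refused.budget": { status: 429, reason: "budget exhausted" },
  "ai.refused.provider": { status: 502, reason: "all providers unavailable" },
  "ai.refused.policy": { status: 403, reason: "caller lacks AI scope or tenant policy denies" },
  "ai.prompt.not_found": { status: 404, reason: "prompt not found" },
  "ai.schema.output_invalid": { status: 502, reason: "provider returned invalid shape" },
} as const;

type AiErrorCode = keyof typeof AI_ERRORS;

// Type guard; hasOwnProperty avoids matching inherited keys such as "toString".
function isAiErrorCode(code: string): code is AiErrorCode {
  return Object.prototype.hasOwnProperty.call(AI_ERRORS, code);
}

function statusFor(code: string): number | undefined {
  return isAiErrorCode(code) ? AI_ERRORS[code].status : undefined;
}
```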

6. Streaming (SSE)

Events:

token — partial output.
tool_use — function call (tool use).
safety_verdict — mid-stream safety signal.
done — terminal with full provenance.
error — terminal error.
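A minimal sketch of consuming these events from raw text/event-stream data. The framing follows the standard SSE rules (blank line between events, `event:` and `data:` fields); the payload of each event is not specified in this document, so the parser leaves `data` as an opaque string. `parseSse` and `isTerminal` are hypothetical helper names.

```typescript
interface SseEvent {
  event: string; // e.g. token, tool_use, safety_verdict, done, error
  data: string; // payload left unparsed; its shape is not defined here
}

// Parses a complete chunk of SSE text into events. A streaming client would
// buffer partial chunks first; that part is omitted for brevity.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event field is present
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // skip comments
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Blocks without data are not dispatched, matching SSE semantics.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// done and error are the two terminal events listed above.
function isTerminal(e: SseEvent): boolean {
  return e.event === "done" || e.event === "error";
}
```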

7. Rate Limits

Per-prompt per-user (e.g., tutor 60/min).
Per-tenant overall (plan-based).
Burst permitted up to 2x sustained.
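One common way to get a sustained rate with a bounded burst is a token bucket. The sketch below reads "2x sustained" as a bucket whose capacity is twice the per-minute allowance and which refills at the sustained rate; that reading is an assumption, and the gateway's actual limiter is not described here. `TokenBucket` is a hypothetical class, and time is passed in explicitly to keep it testable.

```typescript
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(sustainedPerMinute: number, nowMs: number, burstFactor = 2) {
    this.capacity = sustainedPerMinute * burstFactor; // burst ceiling (assumed reading)
    this.refillPerMs = sustainedPerMinute / 60000; // sustained refill rate
    this.tokens = this.capacity; // start full
    this.last = nowMs;
  }

  // Returns true and consumes one token if the request is allowed at nowMs.
  tryTake(nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.last);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With the tutor example of 60/min, this bucket admits up to 120 back-to-back requests and then roughly one per second.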

8. Security

The JWT tid and sub claims are used for budgets and provenance.
ai scope required on most endpoints.
Admin endpoints: platform_admin or tenant org_admin.
Audit endpoints: compliance_officer only.

9. Idempotency

Completions are idempotent on (promptHash + input fingerprint + modelId + tenantId) when cacheable: true.
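A sketch of how such a key could be derived. The input fingerprint here is a canonical JSON string with sorted object keys, so that logically equal inputs produce the same key; a real gateway would presumably hash this string, and the exact fingerprint algorithm is not given in this document. `stableStringify`, `idempotencyKey`, and the `CacheKeyParts` shape are hypothetical.

```typescript
interface CacheKeyParts {
  promptHash: string;
  input: unknown;
  modelId: string;
  tenantId: string;
  cacheable: boolean;
}

// Canonical JSON: object keys are sorted so key order does not affect the result.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null"; // undefined serializes as null
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter((k) => obj[k] !== undefined) // drop undefined, as JSON.stringify does
    .sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

// Returns null when the completion is not cacheable, since the rule above
// applies only when cacheable is true.
function idempotencyKey(p: CacheKeyParts): string | null {
  if (!p.cacheable) return null;
  return [p.tenantId, p.modelId, p.promptHash, stableStringify(p.input)].join("|");
}
```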

10. SLOs

First-token p95 < 600ms (cloud); < 1s (local).
Cache hit p95 < 50ms.
Embed p95 < 300ms.
Moderate p95 < 100ms.
Availability 99.9% (degraded operation on a fallback model is acceptable).
