API Contracts

Source: services/ai-gateway-service/API_CONTRACTS.md in the documentation repo.

1. Surfaces

  • Internal REST (/api/v1/ai/…) — via AIClient SDK.
  • SSE streaming for chat + tutor.
  • Admin surfaces for prompts, budgets, audit.

2. AIClient Port (F09 — frozen contract)

The port is defined in TypeScript; each service's implementation imports @ghasi/ai-client.

interface AIClient {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
  completeStream(req: CompletionRequest): AsyncIterable<CompletionChunk>;
  embed(req: EmbedRequest): Promise<EmbedResponse>;
  moderate(text: string, policy?: SafetyPolicy): Promise<SafetyVerdict>;
  generateImage(req: ImageRequest): Promise<ImageResponse>;
  generateAudio(req: TTSRequest): Promise<AudioResponse>;
  transcribe(req: STTRequest): Promise<TranscriptResponse>;
  classify(req: ClassifyRequest): Promise<ClassifyResponse>;
  knn(req: KNNRequest): Promise<KNNResponse>;
}

Port is additive: adding new methods does not break existing callers.
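One way to see why additive changes are safe: a caller that depends only on the methods it uses keeps compiling when the port grows. The sketch below is illustrative only. The request and response shapes are placeholders (the real types are not reproduced in this section), and `CompletionPort`, `answer`, and `fakeClient` are hypothetical names.

```typescript
// Placeholder shapes: the real request/response types are not shown in this
// section, so these are assumptions made for the example.
type CompletionRequest = { promptId: string; input: unknown };
type CompletionResponse = { data: { output: unknown } };

// The caller declares only the slice of the port it actually needs.
type CompletionPort = {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
};

async function answer(client: CompletionPort, question: string): Promise<unknown> {
  const res = await client.complete({
    promptId: "tutor.rag.respond",
    input: { question },
  });
  return res.data.output;
}

// Hypothetical in-memory client. The extra `embed` method stands in for a
// method added to the port after `answer` was written; `answer` is unaffected.
const fakeClient = {
  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    return { data: { output: { echoed: req.input } } };
  },
  async embed(_req: unknown): Promise<number[]> {
    return [];
  },
};
```

Because TypeScript checks structurally, any object with a compatible `complete` method satisfies `CompletionPort`, regardless of how many other methods it carries.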

3. Endpoints (REST)

3.1 Completion / Chat

POST /api/v1/ai/complete body: CompletionRequest
GET /api/v1/ai/stream/{requestId} (SSE)

3.2 Embeddings / RAG

POST /api/v1/ai/embed body: { sourceRef, content, modelId? }
POST /api/v1/ai/knn body: { vector, k, filters }

3.3 Moderation

POST /api/v1/ai/moderate body: { text, policy? }

3.4 Media AI

POST /api/v1/ai/images body: { prompt, size, style? }
POST /api/v1/ai/audio/tts body: { text, voice, language }
POST /api/v1/ai/audio/stt body: { assetId, language }

3.5 Prompt Registry (admin)

GET /api/v1/ai/prompts ?tenantId=&status=active
POST /api/v1/ai/prompts
GET /api/v1/ai/prompts/{id}/versions
POST /api/v1/ai/prompts/{id}/versions
POST /api/v1/ai/prompts/{id}/versions/{version}/activate
POST /api/v1/ai/prompts/{id}/versions/{version}/deprecate
POST /api/v1/ai/prompts/{id}/eval (runs eval set)

3.6 Models

GET /api/v1/ai/models ?family=&locality=&residency=

3.7 Budgets

GET /api/v1/ai/budgets/{tenantId}
PATCH /api/v1/ai/budgets/{tenantId} (platform admin)

3.8 Audit

GET /api/v1/ai/audit (compliance_officer) ?userId=&promptId=&tenantId=
POST /api/v1/ai/audit/export (GDPR export)

3.9 Structured JSON (gateway internal — EDT-142 / EP-14+)

POST /api/v1/ai/structured body: { promptId?, input, outputSchema, forceCloud? }
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id, X-Ai-Local?, X-Ai-Force-Cloud?

When forceCloud is true (in the body, via X-Ai-Force-Cloud: true, or both), the cloud budget path is used even if X-Ai-Local is set (EP-19 / US-97).
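The precedence rule above can be sketched as a small pure function. This is an assumption-laden illustration, not the gateway's code: `resolveRoute` and its types are hypothetical, header names are matched case-insensitively, and X-Ai-Local is treated as "set" only when its value is `true`.

```typescript
type InferenceRoute = "cloud" | "local";

interface RouteInput {
  bodyForceCloud?: boolean;
  headers: Record<string, string | undefined>;
}

function resolveRoute(input: RouteInput): InferenceRoute {
  // Normalize header names and values so lookups are case-insensitive.
  const normalized: Record<string, string> = {};
  for (const name of Object.keys(input.headers)) {
    const value = input.headers[name];
    if (value !== undefined) {
      normalized[name.toLowerCase()] = value.trim().toLowerCase();
    }
  }

  // forceCloud from either source wins over X-Ai-Local.
  const forceCloud =
    input.bodyForceCloud === true || normalized["x-ai-force-cloud"] === "true";
  if (forceCloud) return "cloud";

  // Assumption: "X-Ai-Local is set" means the header carries the value "true".
  return normalized["x-ai-local"] === "true" ? "local" : "cloud";
}
```

For example, `resolveRoute({ bodyForceCloud: true, headers: { "X-Ai-Local": "true" } })` yields `"cloud"`, while the same headers without forceCloud yield `"local"`.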

3.10 Offline local inference telemetry replay (EP-19 / US-99)

POST /api/v1/ai/local-inference/telemetry body: { events: [...] } /* max 100 events */
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id

Each event is written to the transactional outbox as ai.inference.local.completed.v1 with idempotent row ids per batch.
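A client replaying a large offline backlog has to respect the 100-event cap. The sketch below splits a backlog into batches and derives a per-event row id. The id scheme (Idempotency-Key plus position in the batch) is an assumption chosen for illustration; this section does not specify how the gateway derives its row ids, and `chunkEvents` and `outboxRowId` are hypothetical helpers.

```typescript
// Cap taken from the endpoint contract above.
const MAX_EVENTS_PER_BATCH = 100;

// Split a backlog into batches no larger than `size`.
function chunkEvents<T>(events: readonly T[], size: number = MAX_EVENTS_PER_BATCH): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push(events.slice(i, i + size));
  }
  return batches;
}

// Hypothetical id scheme: the same batch replayed with the same
// Idempotency-Key always produces the same row ids, so a retry
// cannot create duplicate outbox rows.
function outboxRowId(idempotencyKey: string, indexInBatch: number): string {
  return `${idempotencyKey}:${indexInBatch}`;
}
```

Each batch would be sent with its own Idempotency-Key, so that retrying one batch does not collide with another.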

4. Request / Response

CompletionRequest

{
  "promptId": "tutor.rag.respond",
  "promptVersion": "2.1.0", // optional; omit for active
  "input": { "question": "...", "context": { "blocks": [...] } },
  "userId": "u_01H...",
  "sessionId": "ps_01H...",
  "maxTokens": 400,
  "temperature": 0.3,
  "stream": true,
  "metadata": { "traceparent": "..." }
}

CompletionResponse

{
  "data": {
    "completionId": "cpl_01H...",
    "output": { "answer": "...", "citations": [...] },
    "provenance": {
      "model": "gpt-4o-mini",
      "promptId": "tutor.rag.respond",
      "promptVersion": "2.1.0",
      "traceId": "00-...-01",
      "local": false,
      "generatedAt": "...",
      "cost": { "microUSD": 2340, "tokens": { "in": 1245, "out": 412 } }
    },
    "safety": {
      "input": { "overallAction": "allow" },
      "output": { "overallAction": "allow" }
    },
    "cacheHit": false,
    "latencyMs": 1240
  }
}
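As a small worked example of reading the provenance block, the sketch below converts the `cost` object from the response above into dollars and a token total. It assumes `microUSD` means millionths of a US dollar, which the field name suggests but this section does not state; `CompletionCost` and `summarizeCost` are hypothetical names.

```typescript
// Mirrors the `provenance.cost` object in the example response.
interface CompletionCost {
  microUSD: number;
  tokens: { in: number; out: number };
}

interface CostSummary {
  usd: number;
  totalTokens: number;
  microUSDPerToken: number;
}

// Assumption: 1 USD = 1,000,000 microUSD.
const MICRO_USD_PER_USD = 1e6;

function summarizeCost(cost: CompletionCost): CostSummary {
  const totalTokens = cost.tokens.in + cost.tokens.out;
  return {
    usd: cost.microUSD / MICRO_USD_PER_USD,
    totalTokens,
    // Guard against division by zero for empty completions.
    microUSDPerToken: totalTokens === 0 ? 0 : cost.microUSD / totalTokens,
  };
}

// Values taken from the example response: 2340 microUSD, 1245 + 412 tokens.
const summary = summarizeCost({ microUSD: 2340, tokens: { in: 1245, out: 412 } });
```

Under that assumption the example call costs 0.00234 USD across 1657 tokens.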

5. Error Model

  • ai.refused.safety (422) — safety pipeline blocked.
  • ai.refused.budget (429) — budget exhausted.
  • ai.refused.provider (502) — all providers unavailable.
  • ai.refused.policy (403) — caller lacks AI scope or tenant policy denies.
  • ai.prompt.not_found (404).
  • ai.schema.output_invalid (502) — provider returned invalid shape.
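Callers may find it convenient to hold the error codes in a typed lookup. The table below copies the codes and statuses from the list above; the helper names (`AI_ERROR_STATUS`, `isAiErrorCode`, `httpStatusFor`) are hypothetical and not part of the documented SDK.

```typescript
// Codes and HTTP statuses copied from the error model above.
const AI_ERROR_STATUS = {
  "ai.refused.safety": 422,
  "ai.refused.budget": 429,
  "ai.refused.provider": 502,
  "ai.refused.policy": 403,
  "ai.prompt.not_found": 404,
  "ai.schema.output_invalid": 502,
} as const;

type AiErrorCode = keyof typeof AI_ERROR_STATUS;

// Type guard for codes arriving as plain strings, e.g. from a JSON error body.
function isAiErrorCode(code: string): code is AiErrorCode {
  return Object.prototype.hasOwnProperty.call(AI_ERROR_STATUS, code);
}

// Returns the documented status, or undefined for an unknown code.
function httpStatusFor(code: string): number | undefined {
  return isAiErrorCode(code) ? AI_ERROR_STATUS[code] : undefined;
}
```

Note that two codes share status 502, so the code string, not the HTTP status, is what distinguishes a provider outage from an invalid provider response.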

6. Streaming (SSE)

Events:

  • token — partial output.
  • tool_use — function call (tool use).
  • safety_verdict — mid-stream safety signal.
  • done — terminal with full provenance.
  • error — terminal error.
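A consumer of this stream typically folds the events into a final result and stops at the first terminal event. The sketch below works on already-parsed events; the payload shapes (`text`, `name`, `code`, and so on) are assumptions, since this section names the event types but not their data fields, and `foldStream` is a hypothetical helper.

```typescript
// Event names come from the list above; payload fields are assumed.
type StreamEvent =
  | { event: "token"; data: { text: string } }
  | { event: "tool_use"; data: { name: string } }
  | { event: "safety_verdict"; data: { overallAction: string } }
  | { event: "done"; data: { provenance: unknown } }
  | { event: "error"; data: { code: string } };

interface StreamResult {
  text: string;
  toolCalls: string[];
  terminal: "done" | "error" | "open";
  errorCode?: string;
}

function foldStream(events: readonly StreamEvent[]): StreamResult {
  const result: StreamResult = { text: "", toolCalls: [], terminal: "open" };
  for (const e of events) {
    switch (e.event) {
      case "token":
        result.text += e.data.text; // accumulate partial output
        break;
      case "tool_use":
        result.toolCalls.push(e.data.name);
        break;
      case "safety_verdict":
        break; // informational in this sketch; a real client might abort here
      case "done":
        result.terminal = "done";
        return result; // terminal: ignore anything after it
      case "error":
        result.terminal = "error";
        result.errorCode = e.data.code;
        return result; // terminal: ignore anything after it
    }
  }
  return result; // stream ended without a terminal event
}
```

A result with `terminal: "open"` indicates the connection dropped before `done` or `error` arrived, which a client would usually treat as a failure.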

7. Rate Limits

  • Per-prompt per-user (e.g., tutor 60/min).
  • Per-tenant overall (plan-based).
  • Burst permitted up to 2x sustained.
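One common way to combine a sustained rate with a bounded burst is a token bucket. The sketch below is an interpretation, not the gateway's limiter: it assumes "2x sustained" means the bucket can hold up to twice the per-minute allowance, and `TokenBucket` is a hypothetical class. The clock is passed in explicitly so the behaviour is deterministic.

```typescript
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(sustainedPerMinute: number, nowMs: number, burstFactor: number = 2) {
    // Assumption: burst capacity = burstFactor x the sustained per-minute rate.
    this.capacity = sustainedPerMinute * burstFactor;
    this.refillPerMs = sustainedPerMinute / 60000;
    this.tokens = this.capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes one token if the request is allowed.
  tryAcquire(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsedMs * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With the tutor example of 60/min, a full bucket admits 120 back-to-back requests and then refills at one token per second.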

8. Security

  • JWT tid and sub claims are used for budget and provenance.
  • ai scope required on most endpoints.
  • Admin endpoints: platform_admin or tenant org_admin.
  • Audit endpoints: compliance_officer only.
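The role and scope rules above can be sketched as a single check per surface. This is a simplification under stated assumptions: the `Caller` shape and the `Surface` grouping are hypothetical, and the sketch requires the ai scope on every inference call even though the contract says only "most" endpoints need it.

```typescript
type Surface = "inference" | "admin" | "audit";

// Hypothetical view of the caller after JWT validation.
interface Caller {
  scopes: string[];
  roles: string[];
}

function isAuthorized(caller: Caller, surface: Surface): boolean {
  switch (surface) {
    case "audit":
      // Audit endpoints: compliance_officer only.
      return caller.roles.includes("compliance_officer");
    case "admin":
      // Admin endpoints: platform_admin or tenant org_admin.
      return (
        caller.roles.includes("platform_admin") ||
        caller.roles.includes("org_admin")
      );
    case "inference":
      // Simplification: treat the ai scope as required everywhere.
      return caller.scopes.includes("ai");
  }
}
```

Note that a platform_admin does not pass the audit check in this sketch, which follows the "compliance_officer only" wording.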

9. Idempotency

Completions are idempotent on (promptHash + input fingerprint + modelId + tenantId) when cacheable: true.
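A composite key of this kind is usually built by fingerprinting the input deterministically and then hashing the parts together. The sketch below uses SHA-256 from Node's `node:crypto` module with a key-sorted JSON encoding. Both choices are assumptions, since this section does not say how the gateway computes the fingerprint, and `canonicalize` and `completionCacheKey` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

// Deterministic JSON encoding: object keys are sorted so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
  return `{${entries.join(",")}}`;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

interface CacheKeyParts {
  promptHash: string;
  input: unknown;
  modelId: string;
  tenantId: string;
}

// Combines the four components named in the contract into one key.
function completionCacheKey(parts: CacheKeyParts): string {
  const inputFingerprint = sha256Hex(canonicalize(parts.input));
  return sha256Hex(
    [parts.promptHash, inputFingerprint, parts.modelId, parts.tenantId].join("\n"),
  );
}
```

Including tenantId in the key means identical prompts from different tenants never share a cached completion.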

10. SLOs

  • First-token p95 < 600ms (cloud); < 1s (local).
  • Cache hit p95 < 50ms.
  • Embed p95 < 300ms.
  • Moderate p95 < 100ms.
  • Availability 99.9% (degraded with fallback model acceptable).
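To check measured latencies against these targets, a p95 can be computed with the nearest-rank method. The thresholds below are copied from the list above; the nearest-rank definition, the strict less-than comparison, and the names `percentile`, `SLO_P95_MS`, and `meetsSlo` are assumptions made for this sketch.

```typescript
// p95 targets in milliseconds, copied from the SLO list above.
const SLO_P95_MS = {
  firstTokenCloud: 600,
  firstTokenLocal: 1000,
  cacheHit: 50,
  embed: 300,
  moderate: 100,
} as const;

type SloName = keyof typeof SLO_P95_MS;

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// The targets are written as "<", so the comparison is strict.
function meetsSlo(latenciesMs: readonly number[], slo: SloName): boolean {
  return percentile(latenciesMs, 95) < SLO_P95_MS[slo];
}
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps the check easy to reason about on small sample sets.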