API Contracts
:::info Source
Sourced from services/ai-gateway-service/API_CONTRACTS.md in the documentation repo.
:::
1. Surfaces
- Internal REST (/api/v1/ai/…), via the AIClient SDK.
- SSE streaming for chat + tutor.
- Admin surfaces for prompts, budgets, audit.
2. AIClient Port (F09 — frozen contract)
TypeScript port; implementations per service import @ghasi/ai-client.
interface AIClient {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
  completeStream(req: CompletionRequest): AsyncIterable<CompletionChunk>;
  embed(req: EmbedRequest): Promise<EmbedResponse>;
  moderate(text: string, policy?: SafetyPolicy): Promise<SafetyVerdict>;
  generateImage(req: ImageRequest): Promise<ImageResponse>;
  generateAudio(req: TTSRequest): Promise<AudioResponse>;
  transcribe(req: STTRequest): Promise<TranscriptResponse>;
  classify(req: ClassifyRequest): Promise<ClassifyResponse>;
  knn(req: KNNRequest): Promise<KNNResponse>;
}
The port is additive: new methods can be added without breaking existing callers.
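One way to see why additive changes are safe for callers is to have each caller depend only on the methods it uses. The sketch below assumes heavily trimmed request/response shapes (the real types come from @ghasi/ai-client), and `Completer`, `answer`, `fakeClient`, and `someFutureMethod` are hypothetical names used only for illustration.

```typescript
// Hypothetical, trimmed-down shapes; the real types live in @ghasi/ai-client.
interface CompletionRequest {
  promptId: string;
  input: Record<string, unknown>;
}
interface CompletionResponse {
  data: { completionId: string; output: unknown };
}

// The caller depends only on the single port method it actually needs.
type Completer = {
  complete(req: CompletionRequest): Promise<CompletionResponse>;
};

async function answer(client: Completer, question: string): Promise<unknown> {
  const res = await client.complete({
    promptId: "tutor.rag.respond",
    input: { question },
  });
  return res.data.output;
}

// In-memory fake for tests. It carries an extra (hypothetical) method,
// standing in for a method added to the port later; `answer` is unaffected.
const fakeClient = {
  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    return { data: { completionId: "cpl_fake", output: { echoed: req.input } } };
  },
  async someFutureMethod(): Promise<void> {
    // intentionally empty
  },
};
```

Because `answer` is typed against `Completer` rather than the full interface, adding methods to the port does not require touching it or its tests.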
3. Endpoints (REST)
3.1 Completion / Chat
POST /api/v1/ai/complete body: CompletionRequest
GET /api/v1/ai/stream/{requestId} (SSE)
3.2 Embeddings / RAG
POST /api/v1/ai/embed body: { sourceRef, content, modelId? }
POST /api/v1/ai/knn body: { vector, k, filters }
3.3 Moderation
POST /api/v1/ai/moderate body: { text, policy? }
3.4 Media AI
POST /api/v1/ai/images body: { prompt, size, style? }
POST /api/v1/ai/audio/tts body: { text, voice, language }
POST /api/v1/ai/audio/stt body: { assetId, language }
3.5 Prompt Registry (admin)
GET /api/v1/ai/prompts ?tenantId=&status=active
POST /api/v1/ai/prompts
GET /api/v1/ai/prompts/{id}/versions
POST /api/v1/ai/prompts/{id}/versions
POST /api/v1/ai/prompts/{id}/versions/{version}/activate
POST /api/v1/ai/prompts/{id}/versions/{version}/deprecate
POST /api/v1/ai/prompts/{id}/eval (runs eval set)
3.6 Models
GET /api/v1/ai/models ?family=&locality=&residency=
3.7 Budgets
GET /api/v1/ai/budgets/{tenantId}
PATCH /api/v1/ai/budgets/{tenantId} (platform admin)
3.8 Audit
GET /api/v1/ai/audit (compliance_officer) ?userId=&promptId=&tenantId=
POST /api/v1/ai/audit/export (GDPR export)
3.9 Structured JSON (gateway internal — EDT-142 / EP-14+)
POST /api/v1/ai/structured body: { promptId?, input, outputSchema, forceCloud? }
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id, X-Ai-Local?, X-Ai-Force-Cloud?
When forceCloud is true (in the body and/or via X-Ai-Force-Cloud: true), the cloud budget path is used even if X-Ai-Local is set (EP-19 / US-97).
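The precedence rule above can be sketched as a small pure function. This is an illustration under assumptions, not the gateway's code: it assumes header names arrive lowercased, that X-Ai-Local carries the value `true` when local inference is requested, and that cloud is the default when neither flag is present. `resolveRoute` and `StructuredRouteInput` are hypothetical names.

```typescript
type Route = "cloud" | "local";

interface StructuredRouteInput {
  bodyForceCloud?: boolean;
  // Assumption: header names are already lowercased by the HTTP layer.
  headers: Record<string, string | undefined>;
}

function resolveRoute(input: StructuredRouteInput): Route {
  const header = (name: string): string | undefined =>
    input.headers[name.toLowerCase()]?.trim().toLowerCase();

  // forceCloud may come from the body, the header, or both.
  const forceCloud =
    input.bodyForceCloud === true || header("X-Ai-Force-Cloud") === "true";
  if (forceCloud) return "cloud"; // wins even when X-Ai-Local is set

  // Assumption: "true" marks a local request; anything else falls back to cloud.
  return header("X-Ai-Local") === "true" ? "local" : "cloud";
}
```

Keeping the decision in one function makes the override order easy to unit test independently of the HTTP handler.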
3.10 Offline local inference telemetry replay (EP-19 / US-99)
POST /api/v1/ai/local-inference/telemetry body: { events: [...] } /* max 100 events */
Headers: Idempotency-Key (required), X-Tenant-Id, X-User-Id
Each event is written to the transactional outbox as ai.inference.local.completed.v1 with idempotent row ids per batch.
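Since a single request accepts at most 100 events, a client replaying a larger offline backlog has to split it. The following is a minimal sketch of that split; the `replayId:index` format for the Idempotency-Key is an assumption, as are the names `chunk`, `toBatches`, and `TelemetryBatch`.

```typescript
const MAX_EVENTS_PER_BATCH = 100; // limit stated by the endpoint contract

function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

interface TelemetryBatch<E> {
  idempotencyKey: string; // value for the Idempotency-Key header
  body: { events: E[] };
}

// Assumption: a stable replay id plus the batch index yields a key that is
// identical on retry, so a re-sent batch is recognised as a duplicate.
function toBatches<E>(replayId: string, events: readonly E[]): TelemetryBatch<E>[] {
  return chunk(events, MAX_EVENTS_PER_BATCH).map((batch, index) => ({
    idempotencyKey: replayId + ":" + index,
    body: { events: batch },
  }));
}
```

Deriving the key from the batch position only works if the event order is stable between retries; a client that reorders its backlog would need a content-based key instead.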
4. Request / Response
CompletionRequest
{
  "promptId": "tutor.rag.respond",
  "promptVersion": "2.1.0", // optional; omit for active
  "input": { "question": "...", "context": { "blocks": [...] } },
  "userId": "u_01H...",
  "sessionId": "ps_01H...",
  "maxTokens": 400,
  "temperature": 0.3,
  "stream": true,
  "metadata": { "traceparent": "..." }
}
CompletionResponse
{
  "data": {
    "completionId": "cpl_01H...",
    "output": { "answer": "...", "citations": [...] },
    "provenance": {
      "model": "gpt-4o-mini",
      "promptId": "tutor.rag.respond",
      "promptVersion": "2.1.0",
      "traceId": "00-...-01",
      "local": false,
      "generatedAt": "...",
      "cost": { "microUSD": 2340, "tokens": { "in": 1245, "out": 412 } }
    },
    "safety": {
      "input": { "overallAction": "allow" },
      "output": { "overallAction": "allow" }
    },
    "cacheHit": false,
    "latencyMs": 1240
  }
}
5. Error Model
- ai.refused.safety (422): safety pipeline blocked.
- ai.refused.budget (429): budget exhausted.
- ai.refused.provider (502): all providers unavailable.
- ai.refused.policy (403): caller lacks AI scope or tenant policy denies.
- ai.prompt.not_found (404).
- ai.schema.output_invalid (502): provider returned invalid shape.
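A client can mirror the error codes above as a typed lookup table so that unknown codes are caught at the boundary. The codes and statuses are taken from this section; the helper names (`statusFor`, `errorSide`) and the caller/upstream split derived from the status class are illustrative assumptions.

```typescript
// Codes and HTTP statuses as listed in the error model.
const AI_ERROR_STATUS = {
  "ai.refused.safety": 422,
  "ai.refused.budget": 429,
  "ai.refused.provider": 502,
  "ai.refused.policy": 403,
  "ai.prompt.not_found": 404,
  "ai.schema.output_invalid": 502,
} as const;

type AiErrorCode = keyof typeof AI_ERROR_STATUS;

function isAiErrorCode(code: string): code is AiErrorCode {
  return Object.prototype.hasOwnProperty.call(AI_ERROR_STATUS, code);
}

// Returns the documented status, or undefined for a code not in the table.
function statusFor(code: string): number | undefined {
  return isAiErrorCode(code) ? AI_ERROR_STATUS[code] : undefined;
}

// Hypothetical helper: 5xx means the fault is upstream of the caller.
function errorSide(code: AiErrorCode): "caller" | "upstream" {
  return AI_ERROR_STATUS[code] >= 500 ? "upstream" : "caller";
}
```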
6. Streaming (SSE)
Events:
- token: partial output.
- tool_use: function call (tool use).
- safety_verdict: mid-stream safety signal.
- done: terminal with full provenance.
- error: terminal error.
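On the consuming side, the five event types map naturally onto a discriminated union and a reducer. The event names come from the list above; the payload fields (`text`, `name`, `overallAction`, `provenance`, `code`) are assumptions, since this section does not define the event bodies, and `StreamState` and `reduceStream` are hypothetical names.

```typescript
// Event names are from the contract; payload fields are assumed for illustration.
type StreamEvent =
  | { event: "token"; text: string }
  | { event: "tool_use"; name: string }
  | { event: "safety_verdict"; overallAction: string }
  | { event: "done"; provenance: Record<string, unknown> }
  | { event: "error"; code: string };

interface StreamState {
  text: string;
  toolCalls: string[];
  finished: boolean;
  errorCode?: string;
}

const initialState: StreamState = { text: "", toolCalls: [], finished: false };

function reduceStream(state: StreamState, ev: StreamEvent): StreamState {
  if (state.finished) return state; // ignore anything after a terminal event
  switch (ev.event) {
    case "token":
      return { ...state, text: state.text + ev.text };
    case "tool_use":
      return { ...state, toolCalls: state.toolCalls.concat(ev.name) };
    case "safety_verdict":
      return state; // how to react to a mid-stream verdict is left to the caller
    case "done":
      return { ...state, finished: true };
    case "error":
      return { ...state, finished: true, errorCode: ev.code };
  }
}
```

Treating `done` and `error` as the only terminal events keeps the reducer total: every event either extends the partial output or closes the stream.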
7. Rate Limits
- Per-prompt per-user (e.g., tutor 60/min).
- Per-tenant overall (plan-based).
- Burst permitted up to 2x sustained.
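One common way to model a sustained rate with a bounded burst is a token bucket. The sketch below reads "2x sustained" as a bucket whose capacity is twice the per-minute limit; that interpretation, and the `TokenBucket` class itself, are assumptions rather than the gateway's actual limiter.

```typescript
class TokenBucket {
  private readonly sustainedPerMinute: number;
  private readonly capacity: number;
  private tokens: number;
  private lastMs: number;

  constructor(sustainedPerMinute: number, nowMs: number, burstFactor: number = 2) {
    this.sustainedPerMinute = sustainedPerMinute;
    // Assumption: burst headroom is expressed as bucket capacity.
    this.capacity = sustainedPerMinute * burstFactor;
    this.tokens = this.capacity; // start full
    this.lastMs = nowMs;
  }

  // Returns true and consumes one token if the request is allowed at nowMs.
  tryTake(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastMs);
    const refill = (elapsedMs * this.sustainedPerMinute) / 60000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With the tutor example of 60/min, a fresh bucket admits 120 back-to-back requests and then refills at one request per second. Passing the clock in as `nowMs` keeps the class deterministic under test.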
8. Security
- JWT tid + sub used for budget + provenance.
- ai scope required on most endpoints.
- Admin endpoints: platform_admin or tenant org_admin.
- Audit endpoints: compliance_officer only.
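The access rules above can be expressed as a single authorization check. This is a sketch under assumptions: `tid` and `sub` come from the JWT as stated, but the `scopes` and `roles` claim names, the three-way `Surface` grouping, and the `isAllowed` helper are hypothetical. Which endpoints are exempt from the ai scope is not specified here, so the sketch does not model exemptions.

```typescript
interface Claims {
  tid: string; // tenant id, used for budget + provenance
  sub: string; // subject (user) id
  scopes: string[]; // assumed claim name
  roles: string[]; // assumed claim name
}

// Hypothetical grouping of the endpoint families described in this document.
type Surface = "standard" | "admin" | "audit";

function hasAny(values: string[], wanted: string[]): boolean {
  return wanted.some((w) => values.indexOf(w) !== -1);
}

function isAllowed(claims: Claims, surface: Surface): boolean {
  switch (surface) {
    case "standard":
      return hasAny(claims.scopes, ["ai"]);
    case "admin":
      return hasAny(claims.roles, ["platform_admin", "org_admin"]);
    case "audit":
      return hasAny(claims.roles, ["compliance_officer"]);
  }
}
```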
9. Idempotency
Completions are idempotent on (promptHash + input fingerprint + modelId + tenantId) when cacheable: true.
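To make the four-part key concrete, the sketch below builds one from a canonicalized input. It is illustrative only: the fingerprint uses a 32-bit FNV-1a hash over UTF-16 code units as a dependency-free stand-in (a production fingerprint would more plausibly use a cryptographic hash), and `canonicalize`, `fnv1a`, `completionCacheKey`, and the key layout are all hypothetical.

```typescript
// Serialize with sorted object keys so that key order does not change the result.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units; a non-cryptographic stand-in.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

interface CacheKeyParts {
  promptHash: string;
  input: unknown;
  modelId: string;
  tenantId: string;
}

function completionCacheKey(parts: CacheKeyParts): string {
  const fingerprint = fnv1a(canonicalize(parts.input));
  return [parts.tenantId, parts.modelId, parts.promptHash, fingerprint].join(":");
}
```

Including tenantId in the key means two tenants sending identical prompts never share a cached completion.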
10. SLOs
- First-token p95 < 600ms (cloud); < 1s (local).
- Cache hit p95 < 50ms.
- Embed p95 < 300ms.
- Moderate p95 < 100ms.
- Availability 99.9% (degraded responses served by a fallback model count as available).
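The latency targets above can be checked against a window of samples with a percentile calculation. The sketch uses the nearest-rank definition of p95, which is one of several conventions and is an assumption here; the thresholds are copied from the list, while `SLO_P95_MS`, `p95`, and `meetsSlo` are hypothetical names.

```typescript
// p95 targets in milliseconds, as listed in the SLOs.
const SLO_P95_MS = {
  firstTokenCloud: 600,
  firstTokenLocal: 1000,
  cacheHit: 50,
  embed: 300,
  moderate: 100,
} as const;

type SloKind = keyof typeof SLO_P95_MS;

// Nearest-rank percentile: the smallest sample with at least 95% of samples at or below it.
function p95(samplesMs: readonly number[]): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// The targets are strict upper bounds ("<"), so equality fails the check.
function meetsSlo(kind: SloKind, samplesMs: readonly number[]): boolean {
  return p95(samplesMs) < SLO_P95_MS[kind];
}
```

For small windows the nearest-rank p95 is simply the maximum, so a check like this is only meaningful once enough samples have accumulated.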