Data Model
:::info Source
Sourced from services/ai-gateway-service/DATA_MODEL.md in the documentation repo.
:::
1. Database
Postgres 16 schema ai_gateway. pgvector for embeddings (HNSW per tenant).
2. Tables
2.1 prompts + prompt_versions
CREATE TABLE ai_gateway.prompts (
id ULID PRIMARY KEY,
tenant_id UUID, -- NULL = platform prompt
name TEXT NOT NULL,
description TEXT,
created_by ULID,
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (tenant_id, name)
);
CREATE TABLE ai_gateway.prompt_versions (
id ULID PRIMARY KEY,
prompt_id ULID NOT NULL REFERENCES ai_gateway.prompts(id),
version TEXT NOT NULL, -- SemVer
template TEXT NOT NULL,
input_schema JSONB NOT NULL,
output_schema JSONB,
model_preference JSONB NOT NULL,
safety_policy JSONB NOT NULL,
eval_set_ref TEXT,
eval_score NUMERIC(5,4),
status TEXT NOT NULL CHECK (status IN ('draft','active','deprecated')),
published_by ULID,
published_at TIMESTAMPTZ,
UNIQUE (prompt_id, version)
);
CREATE INDEX prompt_versions_active ON ai_gateway.prompt_versions (prompt_id, status);
2.2 models
CREATE TABLE ai_gateway.models (
id ULID PRIMARY KEY,
family TEXT NOT NULL CHECK (family IN ('chat','embedding','image','tts','stt','moderation','classifier')),
vendor TEXT NOT NULL,
name TEXT NOT NULL,
context_window INT,
cost_per_1k_in NUMERIC(10,6),
cost_per_1k_out NUMERIC(10,6),
capabilities TEXT[] NOT NULL,
status TEXT NOT NULL CHECK (status IN ('available','deprecated')),
locality TEXT NOT NULL CHECK (locality IN ('local','cloud')),
residency TEXT[], -- regions the model is permitted in
baa_signed BOOLEAN DEFAULT false
);
2.3 completions (partitioned monthly)
CREATE TABLE ai_gateway.completions (
id ULID NOT NULL,
tenant_id UUID NOT NULL,
user_id ULID NOT NULL,
prompt_id ULID,
prompt_version TEXT,
model_id ULID NOT NULL,
input_tokens INT NOT NULL,
output_tokens INT NOT NULL,
cost_micro_usd BIGINT NOT NULL,
latency_ms INT NOT NULL,
output JSONB,
safety_input JSONB NOT NULL,
safety_output JSONB NOT NULL,
cache_hit BOOLEAN NOT NULL,
trace_id TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL,
finished_at TIMESTAMPTZ NOT NULL,
decision_id TEXT,
PRIMARY KEY (id, started_at)
) PARTITION BY RANGE (started_at);
2.4 embeddings
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE ai_gateway.embeddings (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
model_id ULID NOT NULL,
vector vector(1536) NOT NULL, -- size per model
source_kind TEXT NOT NULL,
source_id TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX emb_hnsw_per_tenant ON ai_gateway.embeddings USING hnsw (vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX emb_by_source ON ai_gateway.embeddings (tenant_id, source_kind, source_id);
Note: HNSW index is global but every query adds WHERE tenant_id = ? to enforce isolation. For very large tenants, separate partitioned index per tenant.
2.5 budgets
CREATE TABLE ai_gateway.budgets (
tenant_id UUID NOT NULL,
period TEXT NOT NULL CHECK (period IN ('day','month')),
limit_micro_usd BIGINT NOT NULL,
used_micro_usd BIGINT NOT NULL DEFAULT 0,
reset_at TIMESTAMPTZ NOT NULL,
soft_alert_pct INT DEFAULT 80,
hard_cap BOOLEAN DEFAULT true,
updated_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (tenant_id, period)
);
2.6 audit_log (append-only, 7-year retention)
CREATE TABLE ai_gateway.audit_log (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
actor_type TEXT NOT NULL,
actor_id TEXT NOT NULL, -- raw (restricted access)
completion_id ULID,
event TEXT NOT NULL,
provenance JSONB NOT NULL,
ip_hash TEXT,
at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (at);
2.7 cache
Redis-backed (not Postgres). Key: ai:cache:{tenantId}:{promptHash}:{modelId}:{inputHash}. TTL per prompt (default 24h).
2.8 outbox + inbox
Standard.
3. RLS
All tenant-scoped tables. Audit log restricted via separate role (compliance_officer).
4. Indexing
- Completions:
(tenant_id, started_at DESC),(prompt_id, started_at DESC). - Embeddings: HNSW + source lookup.
- Audit:
(tenant_id, at DESC).
5. Partitioning
- Completions + audit: monthly.
- Retention: 18 months hot, 7 years cold (regulated).
6. Performance
- Completions: ~1k/sec write; 5k/sec read (for cache decisions).
- Embeddings: HNSW search p95 < 10ms on <10M vectors.
- Budget check: atomic UPDATE with RETURNING.