Skip to main content

Data Model

:::info Source Sourced from services/ai-gateway-service/DATA_MODEL.md in the documentation repo. :::

1. Database

Postgres 16 schema ai_gateway. pgvector for embeddings (HNSW per tenant).

2. Tables

2.1 prompts + prompt_versions

CREATE TABLE ai_gateway.prompts (
id ULID PRIMARY KEY,
tenant_id UUID, -- NULL = platform prompt
name TEXT NOT NULL,
description TEXT,
created_by ULID,
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (tenant_id, name)
);

CREATE TABLE ai_gateway.prompt_versions (
id ULID PRIMARY KEY,
prompt_id ULID NOT NULL REFERENCES ai_gateway.prompts(id),
version TEXT NOT NULL, -- SemVer
template TEXT NOT NULL,
input_schema JSONB NOT NULL,
output_schema JSONB,
model_preference JSONB NOT NULL,
safety_policy JSONB NOT NULL,
eval_set_ref TEXT,
eval_score NUMERIC(5,4),
status TEXT NOT NULL CHECK (status IN ('draft','active','deprecated')),
published_by ULID,
published_at TIMESTAMPTZ,
UNIQUE (prompt_id, version)
);
CREATE INDEX prompt_versions_active ON ai_gateway.prompt_versions (prompt_id, status);

2.2 models

CREATE TABLE ai_gateway.models (
id ULID PRIMARY KEY,
family TEXT NOT NULL CHECK (family IN ('chat','embedding','image','tts','stt','moderation','classifier')),
vendor TEXT NOT NULL,
name TEXT NOT NULL,
context_window INT,
cost_per_1k_in NUMERIC(10,6),
cost_per_1k_out NUMERIC(10,6),
capabilities TEXT[] NOT NULL,
status TEXT NOT NULL CHECK (status IN ('available','deprecated')),
locality TEXT NOT NULL CHECK (locality IN ('local','cloud')),
residency TEXT[], -- regions the model is permitted in
baa_signed BOOLEAN DEFAULT false
);

2.3 completions (partitioned monthly)

CREATE TABLE ai_gateway.completions (
id ULID NOT NULL,
tenant_id UUID NOT NULL,
user_id ULID NOT NULL,
prompt_id ULID,
prompt_version TEXT,
model_id ULID NOT NULL,
input_tokens INT NOT NULL,
output_tokens INT NOT NULL,
cost_micro_usd BIGINT NOT NULL,
latency_ms INT NOT NULL,
output JSONB,
safety_input JSONB NOT NULL,
safety_output JSONB NOT NULL,
cache_hit BOOLEAN NOT NULL,
trace_id TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL,
finished_at TIMESTAMPTZ NOT NULL,
decision_id TEXT,
PRIMARY KEY (id, started_at)
) PARTITION BY RANGE (started_at);

2.4 embeddings

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE ai_gateway.embeddings (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
model_id ULID NOT NULL,
vector vector(1536) NOT NULL, -- size per model
source_kind TEXT NOT NULL,
source_id TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX emb_hnsw_per_tenant ON ai_gateway.embeddings USING hnsw (vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX emb_by_source ON ai_gateway.embeddings (tenant_id, source_kind, source_id);

Note: HNSW index is global but every query adds WHERE tenant_id = ? to enforce isolation. For very large tenants, separate partitioned index per tenant.

2.5 budgets

CREATE TABLE ai_gateway.budgets (
tenant_id UUID NOT NULL,
period TEXT NOT NULL CHECK (period IN ('day','month')),
limit_micro_usd BIGINT NOT NULL,
used_micro_usd BIGINT NOT NULL DEFAULT 0,
reset_at TIMESTAMPTZ NOT NULL,
soft_alert_pct INT DEFAULT 80,
hard_cap BOOLEAN DEFAULT true,
updated_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (tenant_id, period)
);

2.6 audit_log (append-only, 7-year retention)

CREATE TABLE ai_gateway.audit_log (
id ULID PRIMARY KEY,
tenant_id UUID NOT NULL,
actor_type TEXT NOT NULL,
actor_id TEXT NOT NULL, -- raw (restricted access)
completion_id ULID,
event TEXT NOT NULL,
provenance JSONB NOT NULL,
ip_hash TEXT,
at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (at);

2.7 cache

Redis-backed (not Postgres). Key: ai:cache:{tenantId}:{promptHash}:{modelId}:{inputHash}. TTL per prompt (default 24h).

2.8 outbox + inbox

Standard.

3. RLS

All tenant-scoped tables. Audit log restricted via separate role (compliance_officer).

4. Indexing

  • Completions: (tenant_id, started_at DESC), (prompt_id, started_at DESC).
  • Embeddings: HNSW + source lookup.
  • Audit: (tenant_id, at DESC).

5. Partitioning

  • Completions + audit: monthly.
  • Retention: 18 months hot, 7 years cold (regulated).

6. Performance

  • Completions: ~1k/sec write; 5k/sec read (for cache decisions).
  • Embeddings: HNSW search p95 < 10ms on <10M vectors.
  • Budget check: atomic UPDATE with RETURNING.