Data Model

:::info Source Sourced from services/ai-gateway-service/DATA_MODEL.md in the documentation repo. :::

1. Database

Postgres 16 schema ai_gateway. pgvector for embeddings (HNSW per tenant).

2. Tables

2.1 `prompts` + `prompt_versions`

CREATE TABLE ai_gateway.prompts (
  id ULID PRIMARY KEY,
  tenant_id UUID,                      -- NULL = platform prompt
  name TEXT NOT NULL,
  description TEXT,
  created_by ULID,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE (tenant_id, name)
);

CREATE TABLE ai_gateway.prompt_versions (
  id ULID PRIMARY KEY,
  prompt_id ULID NOT NULL REFERENCES ai_gateway.prompts(id),
  version TEXT NOT NULL,              -- SemVer
  template TEXT NOT NULL,
  input_schema JSONB NOT NULL,
  output_schema JSONB,
  model_preference JSONB NOT NULL,
  safety_policy JSONB NOT NULL,
  eval_set_ref TEXT,
  eval_score NUMERIC(5,4),
  status TEXT NOT NULL CHECK (status IN ('draft','active','deprecated')),
  published_by ULID,
  published_at TIMESTAMPTZ,
  UNIQUE (prompt_id, version)
);
CREATE INDEX prompt_versions_active ON ai_gateway.prompt_versions (prompt_id, status);

2.2 `models`

CREATE TABLE ai_gateway.models (
  id ULID PRIMARY KEY,
  family TEXT NOT NULL CHECK (family IN ('chat','embedding','image','tts','stt','moderation','classifier')),
  vendor TEXT NOT NULL,
  name TEXT NOT NULL,
  context_window INT,
  cost_per_1k_in NUMERIC(10,6),
  cost_per_1k_out NUMERIC(10,6),
  capabilities TEXT[] NOT NULL,
  status TEXT NOT NULL CHECK (status IN ('available','deprecated')),
  locality TEXT NOT NULL CHECK (locality IN ('local','cloud')),
  residency TEXT[],                    -- regions the model is permitted in
  baa_signed BOOLEAN DEFAULT false
);

2.3 `completions` (partitioned monthly)

CREATE TABLE ai_gateway.completions (
  id ULID NOT NULL,
  tenant_id UUID NOT NULL,
  user_id ULID NOT NULL,
  prompt_id ULID,
  prompt_version TEXT,
  model_id ULID NOT NULL,
  input_tokens INT NOT NULL,
  output_tokens INT NOT NULL,
  cost_micro_usd BIGINT NOT NULL,
  latency_ms INT NOT NULL,
  output JSONB,
  safety_input JSONB NOT NULL,
  safety_output JSONB NOT NULL,
  cache_hit BOOLEAN NOT NULL,
  trace_id TEXT NOT NULL,
  started_at TIMESTAMPTZ NOT NULL,
  finished_at TIMESTAMPTZ NOT NULL,
  decision_id TEXT,
  PRIMARY KEY (id, started_at)
) PARTITION BY RANGE (started_at);

2.4 `embeddings`

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE ai_gateway.embeddings (
  id ULID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  model_id ULID NOT NULL,
  vector vector(1536) NOT NULL,       -- size per model
  source_kind TEXT NOT NULL,
  source_id TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX emb_hnsw_per_tenant ON ai_gateway.embeddings USING hnsw (vector vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
CREATE INDEX emb_by_source ON ai_gateway.embeddings (tenant_id, source_kind, source_id);

Note: HNSW index is global but every query adds WHERE tenant_id = ? to enforce isolation. For very large tenants, separate partitioned index per tenant.

2.5 `budgets`

CREATE TABLE ai_gateway.budgets (
  tenant_id UUID NOT NULL,
  period TEXT NOT NULL CHECK (period IN ('day','month')),
  limit_micro_usd BIGINT NOT NULL,
  used_micro_usd BIGINT NOT NULL DEFAULT 0,
  reset_at TIMESTAMPTZ NOT NULL,
  soft_alert_pct INT DEFAULT 80,
  hard_cap BOOLEAN DEFAULT true,
  updated_at TIMESTAMPTZ DEFAULT now(),
  PRIMARY KEY (tenant_id, period)
);

2.6 `audit_log` (append-only, 7-year retention)

CREATE TABLE ai_gateway.audit_log (
  id ULID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  actor_type TEXT NOT NULL,
  actor_id TEXT NOT NULL,              -- raw (restricted access)
  completion_id ULID,
  event TEXT NOT NULL,
  provenance JSONB NOT NULL,
  ip_hash TEXT,
  at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (at);

2.7 `cache`

Redis-backed (not Postgres). Key: ai:cache:{tenantId}:{promptHash}:{modelId}:{inputHash}. TTL per prompt (default 24h).

2.8 `outbox` + `inbox`

Standard.

3. RLS

All tenant-scoped tables. Audit log restricted via separate role (compliance_officer).

4. Indexing

Completions: (tenant_id, started_at DESC), (prompt_id, started_at DESC).
Embeddings: HNSW + source lookup.
Audit: (tenant_id, at DESC).

5. Partitioning

Completions + audit: monthly.
Retention: 18 months hot, 7 years cold (regulated).

6. Performance

Completions: ~1k/sec write; 5k/sec read (for cache decisions).
Embeddings: HNSW search p95 < 10ms on <10M vectors.
Budget check: atomic UPDATE with RETURNING.

1. Database​

2. Tables​

2.1 prompts + prompt_versions​

2.2 models​

2.3 completions (partitioned monthly)​

2.4 embeddings​

2.5 budgets​

2.6 audit_log (append-only, 7-year retention)​

2.7 cache​

2.8 outbox + inbox​

3. RLS​

4. Indexing​

5. Partitioning​

6. Performance​