Deployment Topology

:::info Source
Sourced from services/ai-gateway-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::

1. Containers

ai-api — REST + SSE endpoint.
ai-local-worker — local model inference (GPU-backed).
ai-outbox-relay.
ai-eval-worker — runs eval sets nightly + on PR.
ai-budget-reaper — resets daily/monthly budgets.
ai-audit-archiver — batch archives audit rows to cold S3.
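ai-budget-reaper is described only as resetting daily and monthly budgets. As a minimal sketch, assuming both windows roll over at 00:00 UTC (the actual timezone and schedule are not specified on this page), the hypothetical helper below computes the next two reset instants a reaper could schedule against:

```python
from datetime import datetime, timedelta, timezone


def next_resets(now: datetime) -> tuple[datetime, datetime]:
    """Return (next_daily_reset, next_monthly_reset) for an aware datetime.

    Assumption (not from the source): both budget windows roll over at 00:00 UTC.
    """
    now = now.astimezone(timezone.utc)
    # Daily window: the next UTC midnight strictly after `now`.
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    next_daily = today_midnight + timedelta(days=1)
    # Monthly window: 00:00 UTC on the first day of the following month.
    if now.month == 12:
        next_monthly = today_midnight.replace(year=now.year + 1, month=1, day=1)
    else:
        next_monthly = today_midnight.replace(month=now.month + 1, day=1)
    return next_daily, next_monthly
```

On the last day of a month both resets coincide, so a reaper built this way would run the daily and monthly resets together at that instant.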

2. Scaling

Container	Min replicas	Max replicas	HPA scale-out trigger
api	5	50	CPU > 60% or in-flight completions > 200 per pod
local-worker	2	20 (GPU pool)	GPU utilization > 70%
outbox-relay	2	8	outbox backlog > 5000
eval-worker	1	5	cron-driven
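For the metric-driven rows, the Kubernetes Horizontal Pod Autoscaler computes, per metric, desiredReplicas = ceil(currentReplicas × currentValue / targetValue), skips a change when that ratio is within a tolerance of 1.0 (0.1 by default), acts on the largest proposal across metrics, and clamps to the min/max bounds. The sketch below reproduces that arithmetic, assuming the table's thresholds are configured as HPA targets; stabilization windows and scaling policies are omitted, and desired_replicas is a hypothetical name.

```python
import math


def desired_replicas(current: int, metrics: dict[str, tuple[float, float]],
                     min_r: int, max_r: int, tolerance: float = 0.1) -> int:
    """HPA-style replica calculation.

    `metrics` maps a metric name to (current_value, target_value), both per-pod
    averages. Stabilization windows and scaling policies are ignored.
    """
    proposals = []
    for current_value, target_value in metrics.values():
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics, the largest proposal wins; then clamp to bounds.
    return max(min_r, min(max_r, max(proposals)))
```

With 10 api pods at 90% CPU against a 60% target and 150 in-flight completions per pod against 200, CPU proposes 15 pods and in-flight completions propose 8, so the result is 15.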

3. Resources

api: CPU 1000m request / 4000m limit; memory 1Gi request / 4Gi limit.
local-worker: GPU node (e.g., NVIDIA L4 or A100), 8-16 vCPU, 32Gi memory.
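To make the per-pod api figures concrete at the fleet level, the sketch below multiplies them by the HPA bounds. It assumes the first value in each pair is the Kubernetes request and the second the limit, and it only handles the `m` and `Gi` suffixes used on this page; parse_cpu, parse_mem_gi, and fleet_footprint are hypothetical helpers.

```python
def parse_cpu(q: str) -> float:
    """Convert a Kubernetes CPU quantity such as '1000m' or '2' to cores."""
    return int(q[:-1]) / 1000 if q.endswith("m") else float(q)


def parse_mem_gi(q: str) -> float:
    """Convert a memory quantity with a 'Gi' suffix to GiB (only 'Gi' handled)."""
    if not q.endswith("Gi"):
        raise ValueError(f"unsupported memory unit in {q!r}")
    return float(q[:-2])


def fleet_footprint(replicas: int, cpu: str, mem: str) -> tuple[float, float]:
    """Total (cores, GiB) for `replicas` pods with the given per-pod values."""
    return replicas * parse_cpu(cpu), replicas * parse_mem_gi(mem)


# api at the HPA floor and ceiling, requests vs. limits (values from this page).
for n in (5, 50):
    req = fleet_footprint(n, "1000m", "1Gi")
    lim = fleet_footprint(n, "4000m", "4Gi")
    print(f"api x{n}: requests {req[0]:g} cores / {req[1]:g} GiB, "
          f"limits {lim[0]:g} cores / {lim[1]:g} GiB")
```

At the 50-pod ceiling this works out to 50 cores and 50 GiB of requests, with limits allowing bursts up to 200 cores and 200 GiB.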

4. Provider Egress

Dedicated egress NAT per region.
Outbound traffic restricted to an allowlist of providers.
Per-provider outbound rate limits, to protect against burst DoS.
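The egress rules do not say which rate-limiting algorithm is used. A token bucket per provider is one common choice: it permits short bursts up to the bucket size while capping the sustained rate. The sketch below assumes that technique; the provider keys and limits are invented, and rejecting unknown providers mirrors the allowlist rule.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` requests/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-provider limits (requests/second, burst size).
limiters = {
    "openai": TokenBucket(rate=50, capacity=100),
    "anthropic": TokenBucket(rate=30, capacity=60),
}


def may_send(provider: str) -> bool:
    """Gate an outbound call; providers without a limiter are not allowlisted."""
    bucket = limiters.get(provider)
    return bucket is not None and bucket.allow()
```

Injecting the clock keeps the bucket deterministic under test; in a multi-replica deployment the counters would need to be shared (for example in the regional Redis) rather than held per process.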

5. Cache

Redis (per region):

AI output cache
Rate-limit counters
Budget counters (primary in Postgres; Redis for hot reads).
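The budget-counter bullet describes Postgres as the source of truth with Redis serving hot reads, which is the shape of a read-through cache. A minimal sketch, with plain dicts standing in for Redis and Postgres and a hypothetical 5-second TTL bounding staleness; BudgetReader and the budget:{tenant} key format are illustrative, not the gateway's actual schema:

```python
import time


class BudgetReader:
    """Read-through cache for budget counters: hot cache first, primary on miss."""

    def __init__(self, primary: dict, ttl_seconds: float = 5.0, clock=time.monotonic):
        self.primary = primary   # stand-in for the Postgres budgets table
        self.cache = {}          # stand-in for Redis: key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock

    def spent(self, tenant: str) -> float:
        key = f"budget:{tenant}"
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]                      # hot read served from cache
        value = self.primary.get(tenant, 0.0)  # authoritative read
        self.cache[key] = (value, now + self.ttl)
        return value

    def record_spend(self, tenant: str, amount: float) -> None:
        # Write the primary first, then drop the cached copy so the next read refreshes.
        self.primary[tenant] = self.primary.get(tenant, 0.0) + amount
        self.cache.pop(f"budget:{tenant}", None)
```

The TTL caps how stale a hot read can be when the primary changes through another path; writes that go through record_spend invalidate immediately.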

6. Regional

Deployed per region: us, eu, me, ap. Provider routing respects each tenant's data-residency region.
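Residency-aware routing can be modeled as filtering provider endpoints by the tenant's region before any other routing decision. In the minimal sketch below, the endpoint catalogue and its region tags are entirely hypothetical (they do not describe real provider contracts or deployments); only the region codes come from this page.

```python
from dataclasses import dataclass

REGIONS = {"us", "eu", "me", "ap"}


@dataclass(frozen=True)
class Endpoint:
    provider: str
    name: str
    regions: frozenset  # regions whose tenant data this endpoint may process


# Hypothetical catalogue; real endpoints and permitted regions will differ.
ENDPOINTS = [
    Endpoint("provider-a", "a-us", frozenset({"us"})),
    Endpoint("provider-a", "a-eu", frozenset({"eu"})),
    Endpoint("provider-b", "b-global", frozenset({"us", "ap"})),
    Endpoint("local", "local-worker", frozenset(REGIONS)),
]


def eligible_endpoints(tenant_region: str) -> list[str]:
    """Return endpoint names allowed to serve a tenant pinned to `tenant_region`."""
    if tenant_region not in REGIONS:
        raise ValueError(f"unknown region {tenant_region!r}")
    return [e.name for e in ENDPOINTS if tenant_region in e.regions]
```

Filtering first means cost- or latency-based selection can only ever choose among endpoints that already satisfy residency.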

7. Service Mesh

All internal traffic uses mTLS. Egress to providers goes through a dedicated proxy.

8. Release

api: blue/green deployment. local-worker: drain before replacing (GPU scheduling). Prompt changes ship as new prompt versions, so they require no gateway restart.
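Drain before replace means a local worker stops accepting new work and waits for in-flight inference to finish before it is torn down. A minimal asyncio sketch of that pattern; DrainableWorker is hypothetical and asyncio.sleep stands in for GPU inference. How the platform actually signals a drain is not described on this page.

```python
import asyncio


class DrainableWorker:
    """Tracks in-flight jobs and refuses new ones once draining starts."""

    def __init__(self) -> None:
        self.accepting = True
        self.in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # no jobs yet, so the worker starts idle

    async def handle(self, job_seconds: float) -> str:
        if not self.accepting:
            raise RuntimeError("draining: not accepting new work")
        self.in_flight += 1
        self._idle.clear()
        try:
            await asyncio.sleep(job_seconds)  # stand-in for GPU inference
            return "done"
        finally:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.set()

    async def drain(self, timeout: float) -> bool:
        """Stop intake, then wait for in-flight jobs; False if the timeout expires."""
        self.accepting = False
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

Create the worker inside a running event loop (for example within the coroutine passed to asyncio.run) so the asyncio.Event is bound to that loop on older Python versions. A False result from drain is the point where an orchestrator would decide between waiting longer and forcing replacement.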

9. DR

RPO 5 min (audit outbox + cold archive). RTO 60 min.
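An RPO of 5 min means at most five minutes of recent data may be lost. If recovery points (for example, moments when rows became durable in the audit outbox or cold archive) are recorded as timestamps, the worst-case loss is the longest gap between consecutive points, including the gap from the last point to now. A minimal sketch under that assumption; worst_case_loss and meets_rpo are hypothetical helpers, and RTO, being a restore-time target, is not checked here.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=5)


def worst_case_loss(checkpoints: list[datetime], now: datetime) -> timedelta:
    """Largest window of data that a failure at any moment up to `now` could lose."""
    if not checkpoints:
        raise ValueError("no recovery points recorded")
    points = sorted(checkpoints) + [now]
    return max(b - a for a, b in zip(points, points[1:]))


def meets_rpo(checkpoints: list[datetime], now: datetime) -> bool:
    return worst_case_loss(checkpoints, now) <= RPO
```

Checkpoints every 4 minutes satisfy the target, but a stalled archiver fails it as soon as the time since its last checkpoint exceeds 5 minutes.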

10. Diagram

Service SDK (AIClient) ──mTLS──▶ ai-api
                                    │
                                    ├─▶ Postgres (prompts, completions, budgets, embeddings)
                                    ├─▶ Redis (cache, rate-limit)
                                    ├─▶ Safety pipeline (local classifiers)
                                    ├─▶ local-worker (GPU; on-prem or cloud)
                                    └─▶ Provider egress (NAT + allowlist)
                                           │
                                           ├─▶ OpenAI
                                           ├─▶ Anthropic
                                           ├─▶ Google
                                           ├─▶ Azure OpenAI (BAA)
                                           └─▶ Mistral / etc.

 Audit firehose ──▶ analytics-service + audit sink (WORM S3)
