ai-orchestrator-service — Deployment Topology

Companion to: docs/02-enterprise-architecture.md · SECURITY_MODEL.md

1. Runtime targets

| Component | Runtime | Why |
|---|---|---|
| ai-orchestrator-service (main API) | Cloud Run (managed) | Fast scale-to-zero off-peak; CPU bursts during peak booking windows; per-request billing aligns with bursty AI traffic |
| ai-orchestrator-manifest-signer | Cloud Run (revision with no public ingress; KMS-signer SA) | Smallest blast radius for the only role with cloudkms.signer |
| ai-orchestrator-eval-runner | Cloud Run Job (scheduled) | Long-running batch fits jobs better than service revisions |
| ai-orchestrator-rag-ingestor | Cloud Run Job (event-triggered + scheduled) | Bursty ingestion of policies / FAQ / corpora |
| ai-orchestrator-budget-resetter | Cloud Scheduler → Cloud Run Job | Period rollover (daily / monthly) |
| Provider - Vertex AI | Google managed | Primary cloud LLM/embeddings/vision/speech |
| Provider - Anthropic | External (TLS) | Fallback only; routed via egress NAT |
| Provider - OpenAI | External (TLS) | Fallback only; routed via egress NAT |
| Edge - ONNX Runtime Node | Electron desktop | Offline + low-bandwidth fallback |

2. GCP region & project layout

  • Project: melmastoon-prod-ai (separate from the rest of platform to isolate AI cost + security blast radius). VPC peered with melmastoon-prod-platform.
  • Primary region: europe-west1 (Belgium) — lowest aggregate latency to MENA + EU customer base; Vertex AI Gemini family available.
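  • Vertex AI region pinning: explicit location='europe-west1' per call. No automatic multi-region fallback (data residency); a move to europe-west4 happens only through the manual region-failure procedure in section 9.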
  • Failover region (active-passive): europe-west4 (Netherlands) for Cloud SQL read replica + warm Cloud Run revision; manual failover.
  • Provider egress: a single Cloud NAT in europe-west1 for outbound to Anthropic/OpenAI; static IP whitelisted with each vendor.
            ┌────────────────────────┐
public ───▶ │ HTTPS LB + Cloud Armor │
            └───────────┬────────────┘
                        │ mTLS
                        ▼
   ┌─────────────────────────────────────────┐
   │ Cloud Run revision: ai-orchestrator-svc │  (autoscale 0..200, min 2 in peak)
   │ ├─ Postgres connector → Cloud SQL       │
   │ ├─ Memorystore connector → Redis        │
   │ ├─ Pub/Sub publisher                    │
   │ ├─ GCS reader (eval/manifest/RAG)       │
   │ └─ Egress via Cloud NAT (Anthropic/OpenAI)
   └────────┬───────────────┬────────────────┘
            │               │
            ▼               ▼
   ┌──────────────┐ ┌──────────────────┐
   │ Vertex AI    │ │ Cloud Run Jobs   │
   │ europe-west1 │ │ eval, ingest,    │
   └──────────────┘ │ budget reset     │
                    └──────────────────┘
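
The region pinning rule in the list above could be enforced with a small guard in front of every Vertex AI call. This is a minimal sketch, not the service's actual code: `resolve_vertex_location` and the `failover_active` flag are hypothetical names introduced for illustration, and the europe-west4 branch assumes the manual failover described in section 9.

```python
from typing import Optional

# Hypothetical constants; the region names come from section 2.
PRIMARY_LOCATION = "europe-west1"
FAILOVER_LOCATION = "europe-west4"  # only after a manual failover


def resolve_vertex_location(requested: Optional[str],
                            failover_active: bool = False) -> str:
    """Return the one location a Vertex AI call may use, or raise."""
    pinned = FAILOVER_LOCATION if failover_active else PRIMARY_LOCATION
    if requested is None:
        # Callers that do not specify a location get the pinned one.
        return pinned
    if requested != pinned:
        # Reject any attempt to route outside the pinned region.
        raise ValueError(
            f"location {requested!r} is not allowed; pinned to {pinned!r}"
        )
    return pinned
```

A caller would pass the returned value as the explicit location for each call, so a misconfigured region fails loudly instead of silently routing elsewhere.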

3. Cloud Run revision sizing

| Setting | Main service | Manifest signer | Eval runner |
|---|---|---|---|
| CPU | 2 vCPU | 1 vCPU | 4 vCPU |
| Memory | 4 GiB | 1 GiB | 8 GiB |
| Concurrency | 40 | 1 | 1 |
| Min instances | 2 (production), 0 (staging) | 0 | 0 |
| Max instances | 200 | 1 | 20 |
| Request timeout | 60 s | 30 s | 3600 s |
| Startup probe | /readyz 200 within 20 s | /healthz | /healthz |
| Cold start budget | < 3 s (carry preloaded JWKS + Vertex SDK warm) | n/a | n/a |

Autoscale signals: target CPU 60%, target concurrency 30. The concurrency target sits below the per-instance limit of 40, so a new instance starts before existing ones saturate.
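
As a rough illustration of the concurrency signal, the instance count can be estimated from in-flight requests. This is simplified arithmetic under the numbers stated for the main service, not Cloud Run's actual autoscaling algorithm (which also weighs CPU); `desired_instances` is a hypothetical helper.

```python
import math


def desired_instances(in_flight_requests: int,
                      target_concurrency: int = 30,
                      min_instances: int = 2,
                      max_instances: int = 200) -> int:
    """Estimate instances needed to keep each one at or below the target."""
    needed = math.ceil(in_flight_requests / target_concurrency)
    # Clamp to the configured floor and ceiling for the main service.
    return max(min_instances, min(max_instances, needed))
```

With 900 in-flight requests the estimate is 30 instances; idle traffic stays at the production floor of 2, and anything above 6000 in-flight requests is capped at 200.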

4. Storage topology

| Store | Type | Sizing | HA |
|---|---|---|---|
| Cloud SQL Postgres 16 | db-custom-8-32768 (8 vCPU, 32 GiB) | ~120 GiB at year 1 | Regional HA + read replica in europe-west4 |
| pgvector | extension on the same instance | indexes ~25% of base data | inherits HA |
| Memorystore Redis 7 | Standard tier, 5 GiB | inference cache + idempotency | regional, read replicas |
| GCS - eval datasets | melmastoon-prod-ai-eval | versioned, lifecycle-archive after 1 y | dual-region EU |
| GCS - model artifacts (ONNX) | melmastoon-prod-ai-models | manifest-controlled | dual-region EU |
| GCS - desktop snapshots | melmastoon-prod-ai-snapshots | per-tenant, 1-h signed URLs | regional |
| BigQuery | melmastoon_prod_ai_analytics | ai_calls_fact partitioned daily, clustered (tenant_id, capability_key) | multi-region EU |
| Cloud KMS | manifest-signer keyring (asymmetric RSA-PSS-SHA256), data-encryption keyring (CMEK for SQL/GCS) | rotate yearly | regional |
| Secret Manager | provider keys, BYOK tenant keys | rotate 60 d | regional |
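
The rotation periods in the table (yearly for Cloud KMS, 60 days for Secret Manager) lend themselves to a simple overdue check. The sketch below is illustrative only: the dictionary keys and `rotation_overdue` are hypothetical names, and "yearly" is assumed to mean 365 days.

```python
from datetime import datetime, timedelta, timezone

# Rotation periods from the storage table; "yearly" is assumed to be 365 days.
ROTATION_PERIODS = {
    "cloud_kms": timedelta(days=365),
    "secret_manager": timedelta(days=60),
}


def rotation_overdue(store: str, last_rotated: datetime, now: datetime) -> bool:
    """Return True when the time since the last rotation exceeds the period."""
    return now - last_rotated > ROTATION_PERIODS[store]


# Example: a provider key rotated 90 days ago is past its 60-day period.
_now = datetime(2025, 3, 1, tzinfo=timezone.utc)
_last = datetime(2024, 12, 1, tzinfo=timezone.utc)
print(rotation_overdue("secret_manager", _last, _now))
```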

5. Networking & ingress

  • HTTPS LB (global) → mTLS-required serverless NEG fronting Cloud Run.
  • Cloud Armor: bot rules, geo allowlist (matches tenant footprint), per-IP rate limiting on /admin/*.
  • VPC-SC perimeter: melmastoon-prod-ai is inside the AI VPC-SC perimeter; only the platform project + on-call jump host are allowed ingress to GCS/BigQuery from outside the perimeter.
  • Internal calls from sibling services arrive via a private serverless NEG (no public route).

6. Pub/Sub topology

| Topic | Producer | Subscribers |
|---|---|---|
| melmastoon.ai_orchestrator.inference.events.v1 | this service | reservation-svc (upsell), notification-svc (drafts), reporting-svc, audit-svc |
| melmastoon.ai_orchestrator.eval.events.v1 | this service | reporting-svc, ai-engineering Slack relay |
| melmastoon.ai_orchestrator.hitl.events.v1 | this service | notification-svc (assignee notify), audit-svc |
| melmastoon.ai_orchestrator.budget.events.v1 | this service | billing-svc, finance-ops |
| melmastoon.ai_orchestrator.commands.inference_request.v1 | siblings | this service (worker pool consumes for async inference) |

Each topic has its own DLQ; a message is parked there after 5 redeliveries.
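
The parking behaviour can be pictured with a small in-memory simulation. In production this is handled by Pub/Sub itself; the sketch only shows the control flow. `deliver_with_dlq` is a hypothetical function, and it assumes "5 redeliveries" means one initial delivery plus up to five retries.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_REDELIVERIES = 5  # value stated in the text


@dataclass
class DeliveryResult:
    attempts: int
    parked: bool


def deliver_with_dlq(message: dict,
                     handler: Callable[[dict], None],
                     dlq: List[dict],
                     max_redeliveries: int = MAX_REDELIVERIES) -> DeliveryResult:
    """Try the handler; park the message in the DLQ once retries run out."""
    attempts = 0
    # Assumption: one initial delivery plus max_redeliveries retries.
    for _ in range(1 + max_redeliveries):
        attempts += 1
        try:
            handler(message)
            return DeliveryResult(attempts, parked=False)
        except Exception:
            continue  # the broker would redeliver after a backoff
    dlq.append(message)
    return DeliveryResult(attempts, parked=True)
```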

7. Deployment pipeline

GitHub Actions:

  1. PR: lint + unit + integration (Testcontainers) + red-team must pass.
  2. Merge to main: build + push to Artifact Registry; deploy to staging Cloud Run; run e2e + nightly suites; canary 10% on staging.
  3. Manual promote to prod: deploy a tagged Cloud Run revision with 0% traffic, run smoke probes, then progressive rollout 10 → 50 → 100% with auto-rollback on SLO breach (Cloud Deploy with Cloud Monitoring policy attached).
  4. DB migrations: applied via Flyway in a one-shot Cloud Run Job before the Cloud Run service revision rolls out (forward-only; backwards-compatible per MIGRATION_PLAN.md).
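
The progressive rollout in step 3 follows a simple control flow, sketched below. In the pipeline this is driven by Cloud Deploy with a Cloud Monitoring policy; the `set_traffic` and `slo_healthy` callbacks here are hypothetical stand-ins for those systems.

```python
from typing import Callable, Tuple

ROLLOUT_STEPS: Tuple[int, ...] = (10, 50, 100)  # percent of traffic, from step 3


def progressive_rollout(set_traffic: Callable[[int], None],
                        slo_healthy: Callable[[], bool],
                        steps: Tuple[int, ...] = ROLLOUT_STEPS) -> bool:
    """Shift traffic step by step; roll back to 0% on the first SLO breach."""
    for percent in steps:
        set_traffic(percent)
        if not slo_healthy():
            set_traffic(0)  # auto-rollback: the new revision gets no traffic
            return False
    return True
```

Returning a boolean lets the calling job fail the promotion explicitly when a rollback has happened.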

8. Secrets & config

  • Secrets are read from Secret Manager and exposed as environment variables at boot (never written to disk).
  • Non-secret config via Cloud Run env vars + a config/<env>.json baked into the image (capability defaults, model deployment table fallbacks).
  • Feature flags via ConfigCat-style table in Postgres; refreshed on a 30 s ticker.
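
A 30 s flag refresh can be sketched as a small cache in front of the Postgres table. Note the swap: this version refreshes lazily on read once the interval has elapsed, rather than running a background ticker, purely to keep the example short. `FlagCache` and `load_flags` are hypothetical names; `load_flags` stands in for the real query.

```python
import time
from typing import Callable, Dict


class FlagCache:
    """Caches feature flags and reloads them once the interval has elapsed."""

    def __init__(self,
                 load_flags: Callable[[], Dict[str, bool]],
                 refresh_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_flags
        self._refresh = refresh_seconds
        self._clock = clock
        self._flags = load_flags()  # initial load at construction
        self._loaded_at = clock()

    def is_enabled(self, key: str, default: bool = False) -> bool:
        now = self._clock()
        if now - self._loaded_at >= self._refresh:
            # Interval elapsed: reload from the backing store.
            self._flags = self._load()
            self._loaded_at = now
        return self._flags.get(key, default)
```

The injectable `clock` makes the refresh interval testable without sleeping.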

9. Disaster recovery

| Scenario | RPO | RTO | Procedure |
|---|---|---|---|
| Single AZ failure (Cloud Run) | 0 | seconds | automatic |
| Cloud SQL primary failure | < 60 s | < 5 min | regional HA failover automatic |
| Region failure europe-west1 | < 5 min | < 60 min | manual: promote europe-west4 replica; flip Cloud Run traffic; flip Vertex AI to europe-west4 |
| Provider (Vertex) outage | n/a | 0 (degrade) | router fallback chain; deterministic if all providers fail |
| KMS key compromise (manifest signer) | 0 | < 4 h | rotate key, publish a new signed manifest, force desktops to re-fetch |
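
The provider-outage row describes a router fallback chain that ends in a deterministic response. A minimal sketch of that shape follows; `route_with_fallback` is a hypothetical function, and the order of the fallback providers after the primary is an assumption, since the document only states that Vertex AI is primary.

```python
from typing import Callable, List, Tuple

# (provider name, callable that takes a prompt and returns a completion)
Provider = Tuple[str, Callable[[str], str]]


def route_with_fallback(prompt: str,
                        providers: List[Provider],
                        deterministic: Callable[[str], str]) -> Tuple[str, str]:
    """Try providers in order; use the deterministic path if all of them fail."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception:
            continue  # provider unavailable: move to the next in the chain
    # Every provider failed: degrade to a deterministic, non-LLM answer.
    return "deterministic", deterministic(prompt)
```

Returning the provider name alongside the result makes it possible to record which path served each request.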