ai-orchestrator-service — Deployment Topology
Companion to: `docs/02-enterprise-architecture.md` · `SECURITY_MODEL.md`
1. Runtime targets
| Component | Runtime | Why |
|---|---|---|
| ai-orchestrator-service (main API) | Cloud Run (managed) | Fast scale-to-zero off-peak; CPU bursts during peak booking windows; per-request billing aligns with bursty AI traffic |
| ai-orchestrator-manifest-signer | Cloud Run (revision with no public ingress; KMS-signer SA) | Smallest blast radius for the only role with cloudkms.signer |
| ai-orchestrator-eval-runner | Cloud Run Job (scheduled) | Long-running batch fits jobs better than service revisions |
| ai-orchestrator-rag-ingestor | Cloud Run Job (event-triggered + scheduled) | Bursty ingestion of policies / FAQ / corpora |
| ai-orchestrator-budget-resetter | Cloud Scheduler → Cloud Run Job | Period rollover (daily / monthly) |
| Provider — Vertex AI | Google managed | Primary cloud LLM/embeddings/vision/speech |
| Provider — Anthropic | External (TLS) | Fallback only; routed via egress NAT |
| Provider — OpenAI | External (TLS) | Fallback only; routed via egress NAT |
| Edge — ONNX Runtime Node | Electron desktop | Offline + low-bandwidth fallback |
2. GCP region & project layout
- Project: `melmastoon-prod-ai` (separate from the rest of the platform to isolate AI cost + security blast radius). VPC peered with `melmastoon-prod-platform`.
- Primary region: `europe-west1` (Belgium), chosen for the lowest aggregate latency to the MENA + EU customer base; the Vertex AI Gemini family is available there.
- Vertex AI region pinning: explicit `location='europe-west1'` per call. No multi-region fallback (data residency).
- Failover region (active-passive): `europe-west4` (Netherlands) for Cloud SQL read replica + warm Cloud Run revision; manual failover.
- Provider egress: a single Cloud NAT in `europe-west1` for outbound to Anthropic/OpenAI; static IP whitelisted with each vendor.
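
The region-pinning rule above could be enforced with a small guard in front of every Vertex AI call. This is a minimal sketch, not the service's actual code: `PINNED_LOCATION`, `RegionPinningError`, and `resolve_vertex_location` are hypothetical names, and the guard is assumed to run before any SDK client is built.

```python
from typing import Optional

# Hypothetical guard for the region-pinning rule; names are illustrative only.
PINNED_LOCATION = "europe-west1"  # primary region from this document


class RegionPinningError(ValueError):
    """Raised when a call tries to use a location other than the pinned one."""


def resolve_vertex_location(requested: Optional[str] = None) -> str:
    """Return the location to pass explicitly on each Vertex AI call.

    None means "use the pinned region". Any other value must match it exactly,
    so a multi-region or global endpoint cannot be selected by accident.
    """
    if requested is None or requested == PINNED_LOCATION:
        return PINNED_LOCATION
    raise RegionPinningError(
        f"location {requested!r} is not allowed; pinned to {PINNED_LOCATION!r}"
    )
```

Failing closed (raising rather than silently substituting the pinned region) is a design choice of this sketch; it makes a misconfigured caller visible instead of hiding it.
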
                ┌────────────────────────┐
    public ───▶ │ HTTPS LB + Cloud Armor │
                └───────────┬────────────┘
                            │ mTLS
                            ▼
        ┌─────────────────────────────────────────┐
        │ Cloud Run revision: ai-orchestrator-svc │ (autoscale 0..200, min 2 in peak)
        │ ├─ Postgres connector → Cloud SQL       │
        │ ├─ Memorystore connector → Redis        │
        │ ├─ Pub/Sub publisher                    │
        │ ├─ GCS reader (eval/manifest/RAG)       │
        │ └─ Egress via Cloud NAT (Anthropic/OpenAI)
        └────────┬───────────────┬────────────────┘
                 │               │
                 ▼               ▼
          ┌──────────────┐ ┌──────────────────┐
          │ Vertex AI    │ │ Cloud Run Jobs   │
          │ europe-west1 │ │ eval, ingest,    │
          └──────────────┘ │ budget reset     │
                           └──────────────────┘
3. Cloud Run revision sizing
| Setting | Main service | Manifest signer | Eval runner |
|---|---|---|---|
| CPU | 2 vCPU | 1 vCPU | 4 vCPU |
| Memory | 4 GiB | 1 GiB | 8 GiB |
| Concurrency | 40 | 1 | 1 |
| Min instances | 2 (production), 0 (staging) | 0 | 0 |
| Max instances | 200 | 1 | 20 |
| Request timeout | 60 s | 30 s | 3600 s |
| Startup probe | /readyz 200 within 20 s | /healthz | /healthz |
| Cold start budget | < 3 s (JWKS preloaded, Vertex SDK kept warm) | n/a | n/a |
Autoscale signals: target CPU 60%, target concurrency 30, so a new instance starts before existing ones reach the concurrency limit of 40.
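
As a back-of-envelope check on the sizing table, the concurrency target can be turned into an instance estimate. This is a rough sketch and not Cloud Run's actual autoscaling algorithm: it ignores the CPU signal, and `estimate_instances` is a hypothetical helper.

```python
import math

# Values taken from the sizing table for the main service (production).
TARGET_CONCURRENCY = 30  # autoscale target, below the per-instance limit of 40
MIN_INSTANCES = 2
MAX_INSTANCES = 200


def estimate_instances(in_flight_requests: int) -> int:
    """Estimate instances needed to keep each one at or below the target."""
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must be non-negative")
    needed = math.ceil(in_flight_requests / TARGET_CONCURRENCY)
    # Clamp to the configured floor and ceiling.
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))


if __name__ == "__main__":
    for load in (0, 45, 900, 10_000):
        print(load, estimate_instances(load))
```

Under these assumptions, 200 instances absorb about 6,000 concurrent requests at the target of 30, and 8,000 at the per-instance limit of 40.
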
4. Storage topology
| Store | Type / resource | Sizing / notes | HA / location |
|---|---|---|---|
| Cloud SQL Postgres 16 | db-custom-8-32768 (8 vCPU, 32 GiB) | ~120 GiB at year 1 | Regional HA + read replica in europe-west4 |
| pgvector | extension on the same instance | indexes ~25% of base data | inherits HA |
| Memorystore Redis 7 | Standard tier, 5 GiB | inference cache + idempotency | regional, read replicas |
| GCS — eval datasets | melmastoon-prod-ai-eval | versioned, lifecycle-archive after 1 y | dual-region EU |
| GCS — model artifacts (ONNX) | melmastoon-prod-ai-models | manifest-controlled | dual-region EU |
| GCS — desktop snapshots | melmastoon-prod-ai-snapshots | per-tenant, 1-h signed URLs | regional |
| BigQuery | melmastoon_prod_ai_analytics | ai_calls_fact partitioned daily, clustered (tenant_id, capability_key) | multi-region EU |
| Cloud KMS | manifest-signer keyring (asymmetric RSA-PSS-SHA256), data-encryption keyring (CMEK for SQL/GCS) | rotate yearly | regional |
| Secret Manager | provider keys, BYOK tenant keys | rotate 60 d | regional |
5. Networking & ingress
- HTTPS LB (global) → mTLS-required serverless NEG fronting Cloud Run.
- Cloud Armor: bot rules, geo allowlist (matches tenant footprint), per-IP rate limiting on `/admin/*`.
- VPC-SC perimeter: `melmastoon-prod-ai` is inside the AI VPC-SC perimeter; only the platform project + on-call jump host are allowed ingress to GCS/BigQuery from outside the perimeter.
- Internal calls from sibling services arrive via a private serverless NEG (no public route).
6. Pub/Sub topology
| Topic | Producer | Subscribers |
|---|---|---|
| melmastoon.ai_orchestrator.inference.events.v1 | this service | reservation-svc (upsell), notification-svc (drafts), reporting-svc, audit-svc |
| melmastoon.ai_orchestrator.eval.events.v1 | this service | reporting-svc, ai-engineering Slack relay |
| melmastoon.ai_orchestrator.hitl.events.v1 | this service | notification-svc (assignee notify), audit-svc |
| melmastoon.ai_orchestrator.budget.events.v1 | this service | billing-svc, finance-ops |
| melmastoon.ai_orchestrator.commands.inference_request.v1 | siblings | this service (worker pool consumes for async inference) |
A dead-letter queue (DLQ) is configured per topic; a message is parked there after 5 redeliveries.
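
The park-after-redelivery behaviour can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions: it treats "5 redeliveries" as five retries after the initial delivery (six attempts in total), and `deliver_with_dlq` and `DeliveryResult` are hypothetical names, not part of any Pub/Sub client library.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_REDELIVERIES = 5  # from this document; assumed to exclude the first delivery


@dataclass
class DeliveryResult:
    attempts: int  # total deliveries, including the first one
    parked: bool   # True if the message ended up in the DLQ


def deliver_with_dlq(
    message: str,
    handler: Callable[[str], None],
    dead_letters: List[str],
) -> DeliveryResult:
    """Deliver a message, redeliver on failure, and park it when retries run out."""
    attempts = 0
    while attempts <= MAX_REDELIVERIES:  # 1 initial delivery + 5 redeliveries
        attempts += 1
        try:
            handler(message)
            return DeliveryResult(attempts=attempts, parked=False)
        except Exception:
            continue  # a real subscriber would nack and wait for backoff
    dead_letters.append(message)
    return DeliveryResult(attempts=attempts, parked=True)
```

A handler that always raises ends with the message in `dead_letters` after six attempts; one that succeeds on a retry returns early and nothing is parked.
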
7. Deployment pipeline
GitHub Actions:
- PR: lint + unit + integration (Testcontainers) + red-team must pass.
- Merge to `main`: build + push to Artifact Registry; deploy to staging Cloud Run; run e2e + nightly suites; canary 10% on staging.
- Manual promote to prod: deploy a tagged Cloud Run revision with 0% traffic, run smoke probes, then progressive rollout 10 → 50 → 100% with auto-rollback on SLO breach (Cloud Deploy with Cloud Monitoring policy attached).
- DB migrations: applied via `flyway` in a one-shot Cloud Run Job before the Cloud Run service revision rolls out (forward-only; backwards-compatible per `MIGRATION_PLAN.md`).
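
The progressive rollout with auto-rollback can be sketched as a simple control loop. In the pipeline above this logic lives in Cloud Deploy with a Cloud Monitoring policy; the code below only illustrates the control flow. `set_traffic` and `slo_healthy` are hypothetical callbacks, and the smoke probes at 0% traffic are assumed to have passed already.

```python
from typing import Callable, List, Tuple

ROLLOUT_STEPS = (10, 50, 100)  # traffic percentages from this document


def progressive_rollout(
    set_traffic: Callable[[int], None],
    slo_healthy: Callable[[int], bool],
) -> Tuple[bool, List[int]]:
    """Walk the new revision through each step; roll back to 0% on SLO breach.

    Returns (succeeded, history of traffic percentages applied).
    """
    history: List[int] = []
    for percent in ROLLOUT_STEPS:
        set_traffic(percent)
        history.append(percent)
        if not slo_healthy(percent):
            set_traffic(0)  # auto-rollback: all traffic returns to the old revision
            history.append(0)
            return False, history
    return True, history
```

A breach at any step stops the walk immediately, so a failure at 50% never reaches the 100% step.
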
8. Secrets & config
- Secrets via Secret Manager mounted as env at boot (no disk).
- Non-secret config via Cloud Run env vars + a `config/<env>.json` baked into the image (capability defaults, model deployment table fallbacks).
- Feature flags via `ConfigCat`-style table in Postgres; refreshed on a 30 s ticker.
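
The 30 s flag refresh can be approximated with an in-memory snapshot. Note the substitution: this sketch reloads lazily on read once the interval has elapsed, instead of running a background ticker, which keeps it easy to test. `FlagCache` and `load_flags` are hypothetical names; `load_flags` stands in for a query against the flags table.

```python
import time
from typing import Callable, Dict

REFRESH_INTERVAL_S = 30.0  # refresh period from this document


class FlagCache:
    """In-memory flag snapshot that reloads at most once per interval."""

    def __init__(
        self,
        load_flags: Callable[[], Dict[str, bool]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_flags = load_flags  # hypothetical loader, e.g. a SELECT on the table
        self._clock = clock
        self._flags: Dict[str, bool] = load_flags()
        self._loaded_at = clock()

    def is_enabled(self, key: str, default: bool = False) -> bool:
        """Return the flag value, reloading first if the snapshot is stale."""
        if self._clock() - self._loaded_at >= REFRESH_INTERVAL_S:
            self._flags = self._load_flags()
            self._loaded_at = self._clock()
        return self._flags.get(key, default)
```

Injecting the clock lets a test advance time without sleeping; a change in the table becomes visible on the first read after 30 s.
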
9. Disaster recovery
| Scenario | RPO | RTO | Procedure |
|---|---|---|---|
| Single AZ failure (Cloud Run) | 0 | seconds | automatic |
| Cloud SQL primary failure | < 60 s | < 5 min | regional HA failover automatic |
| Region failure europe-west1 | < 5 min | < 60 min | manual: promote europe-west4 replica; flip Cloud Run traffic; flip Vertex AI to europe-west4 |
| Provider (Vertex) outage | n/a | 0 (degrade) | router fallback chain; deterministic fallback if all providers fail |
| KMS key compromise (manifest signer) | 0 | < 4 h | rotate key, publish a new signed manifest, force desktops to re-fetch |
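
The provider-outage row relies on a router fallback chain that ends in a deterministic response. A minimal sketch of that idea follows; `route_with_fallback` is a hypothetical function, and the relative order of the fallback providers is an assumption, since this document only states that Vertex AI is primary.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, call)


def route_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    deterministic: Callable[[str], str],
) -> Tuple[str, str]:
    """Try providers in order; use the deterministic handler if all fail.

    Returns (source name, response).
    """
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception:
            continue  # a real router would log the failure and record metrics
    return "deterministic", deterministic(prompt)
```

Because the deterministic handler is the last step and makes no external call, the chain always returns a response, which matches the "0 (degrade)" RTO in the table.
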