ai-orchestrator-service — Deployment Topology

Companion to: docs/02-enterprise-architecture.md · SECURITY_MODEL.md

1. Runtime targets

Component	Runtime	Why
`ai-orchestrator-service` (main API)	Cloud Run (managed)	Fast scale-to-zero off-peak; CPU bursts during peak booking windows; per-request billing aligns with bursty AI traffic
`ai-orchestrator-manifest-signer`	Cloud Run (revision with no public ingress; KMS-signer SA)	Smallest blast radius for the only service account granted `roles/cloudkms.signer`
`ai-orchestrator-eval-runner`	Cloud Run Job (scheduled)	Long-running batch fits jobs better than service revisions
`ai-orchestrator-rag-ingestor`	Cloud Run Job (event-triggered + scheduled)	Bursty ingestion of policies / FAQ / corpora
`ai-orchestrator-budget-resetter`	Cloud Scheduler → Cloud Run Job	Period rollover (daily / monthly)
Provider — Vertex AI	Google managed	Primary cloud LLM/embeddings/vision/speech
Provider — Anthropic	External (TLS)	Fallback only; routed via egress NAT
Provider — OpenAI	External (TLS)	Fallback only; routed via egress NAT
Edge — ONNX Runtime Node	Electron desktop	Offline + low-bandwidth fallback

2. GCP region & project layout

Project: melmastoon-prod-ai (separate from the rest of the platform to isolate AI cost and the security blast radius). Its VPC is peered with melmastoon-prod-platform.
Primary region: europe-west1 (Belgium) — lowest aggregate latency to MENA + EU customer base; Vertex AI Gemini family available.
Vertex AI region pinning: explicit location='europe-west1' per call. No multi-region fallback (data-residency).
Failover region (active-passive): europe-west4 (Netherlands) for Cloud SQL read replica + warm Cloud Run revision; manual failover.
Provider egress: a single Cloud NAT in europe-west1 carries outbound traffic to Anthropic/OpenAI; its static IP is allowlisted with each vendor.

                ┌────────────────────────┐
   public ───▶  │ HTTPS LB + Cloud Armor │
                └───────────┬────────────┘
                            │ mTLS
                            ▼
       ┌─────────────────────────────────────────┐
       │ Cloud Run revision: ai-orchestrator-svc │ (autoscale 0..200, min 2 in peak)
       │   ├─ Postgres connector → Cloud SQL     │
       │   ├─ Memorystore connector → Redis      │
       │   ├─ Pub/Sub publisher                  │
       │   ├─ GCS reader (eval/manifest/RAG)     │
       │   └─ Egress via Cloud NAT (Anthropic/OpenAI)
       └────────┬───────────────┬────────────────┘
                │               │
                ▼               ▼
        ┌──────────────┐ ┌──────────────────┐
        │ Vertex AI    │ │ Cloud Run Jobs   │
        │ europe-west1 │ │ eval, ingest,    │
        └──────────────┘ │ budget reset     │
                         └──────────────────┘
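
The region-pinning rule in the layout above (an explicit location on every Vertex AI call, no silent fallback) could be enforced with a small guard. This is a minimal sketch: `require_pinned_location` and `RegionPinError` are hypothetical names, and the pinned region is passed as a parameter so that the manual disaster-recovery flip stays a configuration change.

```python
PINNED_VERTEX_LOCATION = "europe-west1"  # primary region named in this document


class RegionPinError(ValueError):
    """Raised when a call targets a location other than the pinned one."""


def require_pinned_location(requested: str,
                            pinned: str = PINNED_VERTEX_LOCATION) -> str:
    """Return `requested` unchanged if it matches the pin, else raise.

    Hypothetical helper: the service's real call path is not shown in
    this document. The caller must always pass a location explicitly,
    so there is no default that could hide a missing value.
    """
    if requested != pinned:
        raise RegionPinError(
            f"location {requested!r} violates pin to {pinned!r}"
        )
    return requested
```

A caller would wrap every provider call, for example `location=require_pinned_location(cfg_location)`, so a misconfigured region fails loudly instead of sending data elsewhere.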

3. Cloud Run revision sizing

Setting	Main service	Manifest signer	Eval runner
CPU	2 vCPU	1 vCPU	4 vCPU
Memory	4 GiB	1 GiB	8 GiB
Concurrency	40	1	1
Min instances	2 (production), 0 (staging)	0	0
Max instances	200	1	20
Request timeout	60 s	30 s	3600 s
Startup probe	`/readyz` 200 within 20 s	`/healthz`	`/healthz`
Cold start budget	< 3 s (JWKS preloaded, Vertex SDK kept warm)	n/a	n/a

Autoscale signals: target CPU 60%, target concurrency 30. The concurrency target sits below the per-instance limit of 40, so a new instance starts before existing ones saturate.
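
The arithmetic behind the concurrency target can be sketched as follows. This is a simplified model for capacity planning, not Cloud Run's actual autoscaler: it ignores the CPU signal and scaling delays, and `desired_instances` is a hypothetical helper. The defaults come from the main-service column above.

```python
import math


def desired_instances(in_flight: int,
                      target_concurrency: int = 30,
                      min_instances: int = 2,
                      max_instances: int = 200) -> int:
    """Estimate the instance count for a given number of in-flight requests.

    Simplified model (assumption): instances = ceil(in_flight / target),
    clamped to the configured min/max. The CPU target is not modelled.
    """
    if in_flight < 0:
        raise ValueError("in_flight must be non-negative")
    needed = math.ceil(in_flight / target_concurrency)
    return max(min_instances, min(max_instances, needed))


# Hard ceiling on in-flight requests: max instances x per-instance limit.
MAX_IN_FLIGHT = 200 * 40
```

Under this model, 900 in-flight requests map to 30 instances, each with 10 slots of headroom below the limit of 40, and the fleet as a whole tops out at 8,000 in-flight requests.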

4. Storage topology

Store	Type	Sizing	HA
Cloud SQL Postgres 16	`db-custom-8-32768` (8 vCPU, 32 GiB)	~120 GiB at year 1	Regional HA + read replica in `europe-west4`
pgvector	extension on the same instance	indexes ~25% of base data	inherits HA
Memorystore Redis 7	Standard tier, 5 GiB	inference cache + idempotency	regional, read replicas
GCS — eval datasets	`melmastoon-prod-ai-eval`	versioned, lifecycle-archive after 1 y	dual-region EU
GCS — model artifacts (ONNX)	`melmastoon-prod-ai-models`	manifest-controlled	dual-region EU
GCS — desktop snapshots	`melmastoon-prod-ai-snapshots`	per-tenant, 1-h signed URLs	regional
BigQuery	`melmastoon_prod_ai_analytics`	`ai_calls_fact` partitioned daily, clustered (tenant_id, capability_key)	multi-region EU
Cloud KMS	`manifest-signer` keyring (asymmetric RSA-PSS-SHA256), `data-encryption` keyring (CMEK for SQL/GCS)	rotate yearly	regional
Secret Manager	provider keys, BYOK tenant keys	rotate 60 d	regional

5. Networking & ingress

HTTPS LB (global) → mTLS-required serverless NEG fronting Cloud Run.
Cloud Armor: bot rules, geo allowlist (matches tenant footprint), per-IP rate limiting on /admin/*.
VPC-SC perimeter: melmastoon-prod-ai is inside the AI VPC-SC perimeter; only the platform project + on-call jump host are allowed ingress to GCS/BigQuery from outside the perimeter.
Internal calls from sibling services arrive via a private serverless NEG (no public route).

6. Pub/Sub topology

Topic	Producer	Subscribers
`melmastoon.ai_orchestrator.inference.events.v1`	this service	reservation-svc (upsell), notification-svc (drafts), reporting-svc, audit-svc
`melmastoon.ai_orchestrator.eval.events.v1`	this service	reporting-svc, ai-engineering Slack relay
`melmastoon.ai_orchestrator.hitl.events.v1`	this service	notification-svc (assignee notify), audit-svc
`melmastoon.ai_orchestrator.budget.events.v1`	this service	billing-svc, finance-ops
`melmastoon.ai_orchestrator.commands.inference_request.v1`	siblings	this service (worker pool consumes for async inference)

Each topic has a dead-letter queue (DLQ); a message is parked there after 5 redeliveries.
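
The parking behaviour can be illustrated with a small in-memory simulation. In production Pub/Sub applies the dead-letter policy itself; this sketch only shows the semantics. It assumes the figure of 5 is a cap on total delivery attempts (which is how Pub/Sub counts), and `deliver_with_dlq` and `DeliveryResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeliveryResult:
    acked: bool    # handler eventually succeeded
    attempts: int  # delivery attempts made
    parked: bool   # message moved to the dead-letter list


def deliver_with_dlq(message: bytes,
                     handler: Callable[[bytes], None],
                     dead_letter: List[bytes],
                     max_attempts: int = 5) -> DeliveryResult:
    """Try the handler up to `max_attempts` times, then park the message.

    Assumption: 5 is treated as the total attempt cap. Raise it by one
    if the intent is 5 retries after the first delivery.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
        except Exception:
            continue  # equivalent to a nack: the broker would redeliver
        return DeliveryResult(acked=True, attempts=attempt, parked=False)
    dead_letter.append(message)
    return DeliveryResult(acked=False, attempts=max_attempts, parked=True)
```

A handler that always raises leaves the message in `dead_letter` after the fifth attempt, where it can be inspected and replayed without blocking the subscription.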

7. Deployment pipeline

GitHub Actions:

PR: lint + unit + integration (Testcontainers) + red-team must pass.
Merge to main: build + push to Artifact Registry; deploy to staging Cloud Run; run e2e + nightly suites; canary 10% on staging.
Manual promote to prod: deploy a tagged Cloud Run revision with 0% traffic, run smoke probes, then progressive rollout 10 → 50 → 100% with auto-rollback on SLO breach (Cloud Deploy with Cloud Monitoring policy attached).
DB migrations: applied via Flyway in a one-shot Cloud Run Job that runs before the new Cloud Run service revision rolls out (forward-only; backward-compatible per MIGRATION_PLAN.md).
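
The production promotion flow (0% traffic, smoke probes, then 10 → 50 → 100% with rollback on SLO breach) has roughly the following control flow. The document assigns this job to Cloud Deploy with a Cloud Monitoring policy; the sketch below is not that integration. `set_traffic` and `healthy` are hypothetical callbacks standing in for the traffic split and for the smoke/SLO checks.

```python
from typing import Callable, Iterable


def progressive_rollout(set_traffic: Callable[[int], None],
                        healthy: Callable[[int], bool],
                        steps: Iterable[int] = (10, 50, 100)) -> int:
    """Shift traffic to a new revision step by step.

    Returns the final traffic percentage on the new revision:
    the last step if every check passed, 0 if it was rolled back.
    """
    current = 0
    set_traffic(current)      # deploy dark: new revision receives 0%
    if not healthy(current):  # smoke probes against the dark revision
        return 0
    for percent in steps:
        set_traffic(percent)
        current = percent
        if not healthy(percent):  # SLO breach at this step
            set_traffic(0)        # roll all traffic back
            return 0
    return current
```

Keeping the health check as a callback means the same loop can be exercised in a unit test with a fake that fails at a chosen percentage.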

8. Secrets & config

Secrets from Secret Manager are injected as environment variables at boot (never written to disk).
Non-secret config via Cloud Run env vars + a config/<env>.json baked into the image (capability defaults, model deployment table fallbacks).
Feature flags live in a ConfigCat-style table in Postgres and are refreshed on a 30 s ticker.
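
A flag cache with a 30-second refresh interval could look like the sketch below. Two points are assumptions rather than the service's documented behaviour: it refreshes lazily on read instead of running a background ticker (which keeps the example single-threaded), and it keeps serving the last known flags if a reload fails. `FlagCache` and `load_flags` are hypothetical names, with `load_flags` standing in for the Postgres query.

```python
import time
from typing import Callable, Dict


class FlagCache:
    """In-memory feature flags, reloaded at most once per interval."""

    def __init__(self,
                 load_flags: Callable[[], Dict[str, bool]],
                 refresh_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_flags
        self._refresh = refresh_seconds
        self._clock = clock  # injectable so tests can control time
        self._flags = load_flags()
        self._loaded_at = clock()

    def is_enabled(self, key: str, default: bool = False) -> bool:
        now = self._clock()
        if now - self._loaded_at >= self._refresh:
            try:
                self._flags = self._load()
            except Exception:
                pass  # assumption: serve stale flags if the store is down
            self._loaded_at = now
        return self._flags.get(key, default)
```

With this shape a flag change becomes visible within one refresh interval, and a brief database outage does not flip every flag back to its default.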

9. Disaster recovery

Scenario	RPO	RTO	Procedure
Single AZ failure (Cloud Run)	0	seconds	automatic
Cloud SQL primary failure	< 60 s	< 5 min	regional HA failover automatic
Region failure `europe-west1`	< 5 min	< 60 min	manual: promote `europe-west4` replica; flip Cloud Run traffic; flip Vertex AI to `europe-west4`
Provider (Vertex) outage	n/a	0 (degrade)	router fallback chain; deterministic fallback if all providers fail
KMS key compromise (manifest signer)	0	< 4 h	rotate key, publish a new signed manifest, force desktops to re-fetch
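
The provider-outage row describes a router fallback chain that ends in a deterministic path. A minimal sketch of that chain follows. The ordering (Vertex AI first, then Anthropic and OpenAI) is taken from the runtime table, while `route`, the provider callables, and the shape of the deterministic response are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (provider name, callable that takes a prompt and returns a completion)
Provider = Tuple[str, Callable[[str], str]]


def route(prompt: str,
          providers: Sequence[Provider],
          deterministic: Callable[[str], str]) -> Tuple[str, str]:
    """Try each provider in order; use the deterministic path if all fail.

    Returns (name of the path that answered, response text).
    """
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception:
            continue  # provider unavailable: try the next one in the chain
    return "deterministic", deterministic(prompt)
```

Returning the name of the path that answered lets callers record which provider served each request, which is useful when auditing how often the service ran degraded.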
