ai-orchestrator-service — Local Dev Setup
Companion to: docs/standards/SERVICE_TEMPLATE.md · DEPLOYMENT_TOPOLOGY.md
The goal is that a new contributor can clone the monorepo, run pnpm dev in services/ai-orchestrator-service, and have a fully working local stack, including a fake LLM provider and an actual ONNX edge model, within 10 minutes.
1. Prerequisites
| Tool | Version | Notes |
|---|---|---|
| Node.js | >= 20.11 < 21 | matches Cloud Run runtime |
| pnpm | >= 9 | monorepo package manager |
| Docker Desktop | latest | for Postgres + Redis + emulators |
| gcloud CLI | latest | only needed for gcloud auth application-default login if testing real Vertex |
| Python 3.11 | optional | for the local LLM emulator (see §3) |
| jq | latest | seed scripts |
| direnv | optional | auto-loads .envrc |
2. One-shot bootstrap
From the monorepo root:
pnpm install
pnpm --filter @melmastoon/ai-orchestrator-service run setup:local
setup:local does:
- docker compose -f infra/local/ai/docker-compose.yml up -d: starts Postgres+pgvector, Redis, Pub/Sub emulator, GCS emulator, and the fake LLM emulator.
- Waits for health on each container.
- Runs Flyway migrations against the local Postgres.
- Seeds:
  - 5 capabilities (pricing.suggest, message.draft, tutor.answer, anomaly.detect, internal.prompt_lint).
  - 1 prompt + 1 active version per capability.
  - 1 model row per provider including onnx-edge/phi-3-mini-4k-instruct.
  - 1 RAG corpus per tenant (tenant_local_a, tenant_local_b) with 50 sample chunks.
  - A signed local edge model manifest (signed by a local KMS-emulator key).
- Downloads the ONNX models into ~/.melmastoon/edge-models/ (cached across runs):
  - phi-3-mini-4k-instruct-int4.onnx (~2.4 GB) from a public mirror.
  - all-MiniLM-L6-v2.onnx (~96 MB).
- Generates services/ai-orchestrator-service/.env.local from .env.local.example.
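The "waits for health" step can be pictured as a bounded polling loop. The following is only a sketch of that idea, not the script's actual code: waitUntilHealthy, its parameters, and the injected probe are hypothetical names, and a real probe might hit a container's health endpoint or inspect its Docker health status.

```typescript
// Hypothetical helper (not from the repo): poll a health probe until it
// reports healthy or the attempt budget is exhausted.
type Probe = () => Promise<boolean>;

async function waitUntilHealthy(
  probe: Probe,
  attempts = 30,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    // A thrown error (e.g. connection refused) counts as "not ready yet".
    const ok = await probe().catch(() => false);
    if (ok) return true;
    // Sleep between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Injecting the probe keeps the loop testable without any containers running.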
3. Local LLM emulator
The repo ships a tiny FastAPI emulator (infra/local/ai/llm-emulator/) that mimics the parts of Vertex AI, Anthropic, and OpenAI APIs the service uses:
- POST /vertex/projects/.../publishers/google/models/.../streamGenerateContent
- POST /anthropic/v1/messages
- POST /openai/v1/chat/completions
- POST /vertex/.../embeddings
- POST /vertex/.../moderate
Behaviour:
- Returns canned JSON for known capability prompt fingerprints (used in unit tests).
- Falls back to a local rule-based responder for unknown inputs (echoes input with a "[FAKE-MODEL]" marker that's schema-valid for the requesting capability).
- Supports query-string control: ?fail=503, ?slow=2000, ?invalid_json=true, ?budget_exceeded=true to simulate failure modes from FAILURE_MODES.md.
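From test code, the query-string controls listed above could be attached with a small URL helper. This is a minimal sketch: emulatorUrl and EmulatorFault are hypothetical names, the base URL assumes the emulator's default port 8088, and the unit of slow is assumed to be milliseconds.

```typescript
// Hypothetical helper (not from the repo): builds an LLM emulator URL carrying
// the failure-mode query parameters the emulator understands.
interface EmulatorFault {
  fail?: number;             // HTTP status to force, e.g. 503
  slow?: number;             // artificial delay, e.g. 2000 (assumed milliseconds)
  invalid_json?: boolean;    // request a malformed JSON body
  budget_exceeded?: boolean; // simulate a budget-exceeded response
}

const EMULATOR_BASE = "http://localhost:8088"; // default LLM emulator port

function emulatorUrl(path: string, fault: EmulatorFault = {}): string {
  const url = new URL(path, EMULATOR_BASE);
  for (const [key, value] of Object.entries(fault)) {
    // Unset and false flags are simply left off the query string.
    if (value !== undefined && value !== false) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}
```

For example, emulatorUrl("/anthropic/v1/messages", { fail: 503 }) yields a URL ending in /anthropic/v1/messages?fail=503.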
The provider adapters in the service detect MELMASTOON_LOCAL_EMULATOR=true and route to http://localhost:8088/... instead of real provider endpoints.
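The routing switch might look roughly like the sketch below. It is an illustration under assumptions rather than the adapters' real code: providerBaseUrl and realBaseUrl are hypothetical names, the emulator path prefixes are taken from the endpoint list above, and the real hosts are the providers' public API hosts.

```typescript
type Provider = "vertex" | "anthropic" | "openai";
type Env = Record<string, string | undefined>;

// Public API hosts of the real providers. Vertex AI uses a regional host.
function realBaseUrl(provider: Provider, env: Env): string {
  switch (provider) {
    case "anthropic":
      return "https://api.anthropic.com";
    case "openai":
      return "https://api.openai.com";
    case "vertex": {
      const location = env["VERTEX_LOCATION"];
      if (!location) throw new Error("VERTEX_LOCATION must be set for real Vertex calls");
      return `https://${location}-aiplatform.googleapis.com`;
    }
  }
}

// Hypothetical routing helper: emulator when the flag is "true", real host otherwise.
function providerBaseUrl(provider: Provider, env: Env): string {
  if (env["MELMASTOON_LOCAL_EMULATOR"] === "true") {
    return `http://localhost:8088/${provider}`;
  }
  return realBaseUrl(provider, env);
}
```

Passing the environment in as an argument (for instance process.env) keeps the decision easy to unit test.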
4. Edge ONNX runtime locally
The service has a thin OnnxEdgeAdapter that uses onnxruntime-node. To exercise it locally:
pnpm --filter @melmastoon/ai-orchestrator-service test:edge
This runs vitest --include test/edge/** which loads the cached phi-3-mini-4k-instruct-int4.onnx, runs a 5-prompt harness, and asserts schema validity. First run downloads the model; subsequent runs use the on-disk cache. CI uses a tiny stub model (tiny-llama-stub.onnx, 8 MB) for speed; the heavy model is gated behind EDGE_FULL=1.
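The model gating described above can be sketched as a small selection function. Everything here is an assumption for illustration: edgeModelPath is a hypothetical name, CI is assumed to be detected through the conventional CI=true variable, and the stub's directory (test/edge/fixtures) is made up; only the file names, the cache directory, and EDGE_FULL come from this document.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical selection logic: in CI the small stub is used unless
// EDGE_FULL=1 opts in to the heavy model; locally the cached heavy model is used.
function edgeModelPath(env: Env, homeDir: string): string {
  const inCi = env["CI"] === "true";       // assumed CI detection
  const full = env["EDGE_FULL"] === "1";
  if (inCi && !full) {
    return "test/edge/fixtures/tiny-llama-stub.onnx"; // assumed stub location
  }
  return `${homeDir}/.melmastoon/edge-models/phi-3-mini-4k-instruct-int4.onnx`;
}
```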
5. Running the service
pnpm --filter @melmastoon/ai-orchestrator-service dev
Default ports:
| Port | Service |
|---|---|
| 4080 | API |
| 4081 | metrics |
| 5432 | Postgres |
| 6379 | Redis |
| 8085 | Pub/Sub emulator |
| 4443 | GCS emulator |
| 8088 | LLM emulator |
Hit it:
curl -X POST http://localhost:4080/api/v1/ai/complete \
-H 'X-Tenant-Id: tenant_local_a' \
-H 'Authorization: Bearer dev.tenant_local_a.user_dev_001' \
-H 'Content-Type: application/json' \
-d '{"capabilityKey":"message.draft","inputs":{"intent":"welcome","guestName":"Asma"}}'
The dev JWT issuer (pnpm --filter @melmastoon/iam-service dev) accepts dev.<tenant>.<user> tokens for local-only convenience.
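The same request can be assembled from TypeScript, for example in a smoke test. This is a sketch: buildCompleteRequest and CompleteRequest are hypothetical names, and the token is simply the dev.<tenant>.<user> string described above.

```typescript
// Hypothetical request builder mirroring the curl call above.
interface CompleteRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildCompleteRequest(
  tenantId: string,
  userId: string,
  capabilityKey: string,
  inputs: Record<string, unknown>,
): CompleteRequest {
  return {
    url: "http://localhost:4080/api/v1/ai/complete", // default API port
    method: "POST",
    headers: {
      "X-Tenant-Id": tenantId,
      // Local-only dev token in the dev.<tenant>.<user> format.
      Authorization: `Bearer dev.${tenantId}.${userId}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ capabilityKey, inputs }),
  };
}

const req = buildCompleteRequest("tenant_local_a", "user_dev_001", "message.draft", {
  intent: "welcome",
  guestName: "Asma",
});
```

With the service running, the result can be passed to the global fetch available in Node 20: fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).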
6. Useful npm scripts
| Script | What it does |
|---|---|
| pnpm test | Vitest unit tests |
| pnpm test:integration | Spins Testcontainers; runs IT-AI-* |
| pnpm test:redteam | Red-team suite |
| pnpm test:edge | ONNX edge replay subset |
| pnpm eval -- --suite EVAL_MSG_001 | Run an eval suite end-to-end against the local LLM emulator (or real Vertex with EVAL_PROVIDER=vertex) |
| pnpm seed:capabilities | Re-seed capability catalog from infra/local/seed/capabilities.json |
| pnpm migrate | Run Flyway migrations |
| pnpm psql | Open a psql shell against the local DB with app.tenant_id pre-set |
| pnpm verify:manifest | Verify the local edge manifest signature using the local KMS emulator key |
7. Talking to real Vertex AI (optional)
gcloud auth application-default login
export MELMASTOON_LOCAL_EMULATOR=false
export GOOGLE_CLOUD_PROJECT=melmastoon-dev-ai
export VERTEX_LOCATION=europe-west1
pnpm dev
Real-Vertex calls cost real money. The dev project has a hard cap of $20/day; budget alerts route to #ai-eng-dev-spend.
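To show how the two Vertex variables fit together, here is a sketch that derives the REST streaming endpoint from them. It assumes direct REST access (the service may well use an SDK instead); vertexStreamUrl is a hypothetical name and the model ID in the usage line is only an example.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: builds the Vertex AI streamGenerateContent REST URL
// from the environment variables exported above.
function vertexStreamUrl(env: Env, model: string): string {
  const project = env["GOOGLE_CLOUD_PROJECT"];
  const location = env["VERTEX_LOCATION"];
  if (!project || !location) {
    throw new Error("GOOGLE_CLOUD_PROJECT and VERTEX_LOCATION must be set for real Vertex calls");
  }
  return (
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/${model}:streamGenerateContent`
  );
}

// Example model ID chosen for illustration only.
const exampleUrl = vertexStreamUrl(
  { GOOGLE_CLOUD_PROJECT: "melmastoon-dev-ai", VERTEX_LOCATION: "europe-west1" },
  "gemini-1.5-flash",
);
```

Failing fast on a missing variable avoids sending a malformed request to a billable endpoint.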
8. Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| pgvector extension missing | Wrong Postgres image | docker compose -f infra/local/ai/docker-compose.yml down -v && pnpm setup:local |
| ONNX runtime fails to load with NO_AVX2 | CPU lacks AVX2 (older M1 emulation) | set MELMASTOON_EDGE_DISABLED=true to skip edge tests |
| MELMASTOON.AI.PROVENANCE_MISSING on every call | local seed didn't write a default provider/model row | re-run pnpm seed:capabilities |
| Pub/Sub emulator returns DEADLINE_EXCEEDED | host Docker is paused | docker compose restart ai-pubsub |
| 401 on every request | dev JWT issuer not running | pnpm --filter @melmastoon/iam-service dev |