ai-orchestrator-service — Local Dev Setup

Companion to: docs/standards/SERVICE_TEMPLATE.md · DEPLOYMENT_TOPOLOGY.md

The goal: a new contributor can clone the monorepo, run `pnpm dev` in `services/ai-orchestrator-service`, and have a fully working local stack, including a fake LLM provider and a real ONNX edge model, within 10 minutes.

1. Prerequisites

| Tool | Version | Notes |
| --- | --- | --- |
| Node.js | `>= 20.11 < 21` | matches the Cloud Run runtime |
| pnpm | `>= 9` | monorepo package manager |
| Docker Desktop | latest | runs Postgres, Redis, and the emulators |
| gcloud CLI | latest | only needed for `gcloud auth application-default login` when testing against real Vertex AI (see §7) |
| Python | 3.11 (optional) | for the local LLM emulator (see §3) |
| jq | latest | used by the seed scripts |
| `direnv` | optional | auto-loads `.envrc` |
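
The Node.js range in the table can be checked programmatically. The following is a minimal sketch, assuming a helper named `satisfiesNodeRange` (hypothetical, not part of the repo) that hard-codes the `>= 20.11 < 21` range rather than parsing arbitrary semver expressions.

```typescript
// Sketch: check a Node.js version string against the range ">= 20.11 < 21".
// `satisfiesNodeRange` is a hypothetical helper, not something the repo ships.
function satisfiesNodeRange(version: string): boolean {
  // Accept both "v20.11.1" (the style of process.version) and "20.11.1".
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    return false;
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  // ">= 20.11 < 21" means: major is exactly 20 and minor is at least 11.
  return major === 20 && minor >= 11;
}
```

In a Node.js script this could be called as `satisfiesNodeRange(process.version)` before starting the service.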

2. One-shot bootstrap

From the monorepo root:

pnpm install
pnpm --filter @melmastoon/ai-orchestrator-service run setup:local

`setup:local` does the following:

1. Runs `docker compose -f infra/local/ai/docker-compose.yml up -d`, which starts Postgres+pgvector, Redis, the Pub/Sub emulator, the GCS emulator, and the fake LLM emulator.
2. Waits for each container to report healthy.
3. Runs Flyway migrations against the local Postgres.
4. Seeds:
   - 5 capabilities (pricing.suggest, message.draft, tutor.answer, anomaly.detect, internal.prompt_lint).
   - 1 prompt + 1 active version per capability.
   - 1 model row per provider, including onnx-edge/phi-3-mini-4k-instruct.
   - 1 RAG corpus per tenant (tenant_local_a, tenant_local_b) with 50 sample chunks.
   - A signed local edge model manifest (signed by a local KMS-emulator key).
5. Downloads the ONNX models into `~/.melmastoon/edge-models/` (cached across runs):
   - phi-3-mini-4k-instruct-int4.onnx (~2.4 GB) from a public mirror.
   - all-MiniLM-L6-v2.onnx (~96 MB).
6. Generates `services/ai-orchestrator-service/.env.local` from `.env.local.example`.
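
The "cached across runs" behaviour of the model download can be sketched as follows. This is an illustration under assumptions, not the actual script: `EdgeModel`, `EDGE_MODELS`, and `modelsToDownload` are hypothetical names, and the real script may decide what to fetch differently (for example, by checksum rather than by file name).

```typescript
// Sketch: only fetch models that are not already in the cache directory.
// All identifiers here are hypothetical; only the file names and sizes
// come from this guide.
interface EdgeModel {
  fileName: string;
  approxSize: string; // human-readable size as quoted in this guide
}

const EDGE_MODELS: EdgeModel[] = [
  { fileName: "phi-3-mini-4k-instruct-int4.onnx", approxSize: "~2.4 GB" },
  { fileName: "all-MiniLM-L6-v2.onnx", approxSize: "~96 MB" },
];

// Given the file names already present in ~/.melmastoon/edge-models/,
// return the models that still need to be downloaded.
function modelsToDownload(cachedFiles: string[]): EdgeModel[] {
  return EDGE_MODELS.filter(
    (model) => cachedFiles.indexOf(model.fileName) === -1
  );
}
```

On a fresh machine `modelsToDownload([])` returns both models; once both files are cached it returns an empty list, so a repeat run skips the download.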

3. Local LLM emulator

The repo ships a tiny FastAPI emulator (infra/local/ai/llm-emulator/) that mimics the parts of Vertex AI, Anthropic, and OpenAI APIs the service uses:

POST /vertex/projects/.../publishers/google/models/.../streamGenerateContent
POST /anthropic/v1/messages
POST /openai/v1/chat/completions
POST /vertex/.../embeddings
POST /vertex/.../moderate

Behaviour:

- Returns canned JSON for known capability prompt fingerprints (used in unit tests).
- Falls back to a local rule-based responder for unknown inputs: it echoes the input, tagged with a "[FAKE-MODEL]" marker, in a response that is schema-valid for the requesting capability.
- Supports query-string controls (`?fail=503`, `?slow=2000`, `?invalid_json=true`, `?budget_exceeded=true`) to simulate the failure modes from FAILURE_MODES.md.
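
A test that exercises these failure modes needs to append the controls to an emulator URL. The sketch below shows one way to do that; `FaultControls` and `withFaults` are hypothetical names introduced for illustration, and the unit of `slow` is not stated in this guide (milliseconds is a guess).

```typescript
// Sketch: build an emulator URL carrying fault-injection controls.
// `FaultControls` and `withFaults` are hypothetical test helpers.
interface FaultControls {
  fail?: number; // HTTP status the emulator should return, e.g. 503
  slow?: number; // delay value, e.g. 2000 (unit not stated; presumably ms)
  invalidJson?: boolean; // maps to ?invalid_json=true
  budgetExceeded?: boolean; // maps to ?budget_exceeded=true
}

function withFaults(url: string, faults: FaultControls): string {
  const params: string[] = [];
  if (faults.fail !== undefined) params.push("fail=" + faults.fail);
  if (faults.slow !== undefined) params.push("slow=" + faults.slow);
  if (faults.invalidJson) params.push("invalid_json=true");
  if (faults.budgetExceeded) params.push("budget_exceeded=true");
  if (params.length === 0) {
    return url;
  }
  // Use "&" when the URL already carries a query string.
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + params.join("&");
}
```

For example, `withFaults("http://localhost:8088/anthropic/v1/messages", { fail: 503 })` yields a URL ending in `/anthropic/v1/messages?fail=503`.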

The provider adapters in the service detect MELMASTOON_LOCAL_EMULATOR=true and route to http://localhost:8088/... instead of real provider endpoints.
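
The routing decision can be pictured roughly as below. This is a sketch under assumptions, not the adapters' actual code: `resolveBaseUrl` is a hypothetical function, any value other than `"true"` is treated as real-provider mode, and the emulator path prefixes are taken from the endpoint list above. How the real adapters map individual request paths is not specified here.

```typescript
// Sketch: pick a base URL per provider depending on the emulator flag.
// `Provider`, `EnvVars`, and `resolveBaseUrl` are hypothetical names.
type Provider = "vertex" | "anthropic" | "openai";
type EnvVars = { [key: string]: string | undefined };

const EMULATOR_ORIGIN = "http://localhost:8088";

function resolveBaseUrl(provider: Provider, env: EnvVars): string {
  // Emulator mode: each provider lives under its own path prefix.
  if (env["MELMASTOON_LOCAL_EMULATOR"] === "true") {
    return EMULATOR_ORIGIN + "/" + provider;
  }
  if (provider === "anthropic") {
    return "https://api.anthropic.com";
  }
  if (provider === "openai") {
    return "https://api.openai.com";
  }
  // Vertex AI uses a regional endpoint, so a location is required.
  const location = env["VERTEX_LOCATION"];
  if (!location) {
    throw new Error("VERTEX_LOCATION is required for real Vertex AI calls");
  }
  return "https://" + location + "-aiplatform.googleapis.com";
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the decision easy to unit-test.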

4. Edge ONNX runtime locally

The service has a thin OnnxEdgeAdapter that uses onnxruntime-node. To exercise it locally:

pnpm --filter @melmastoon/ai-orchestrator-service test:edge

This runs `vitest --include test/edge/**`, which loads the cached `phi-3-mini-4k-instruct-int4.onnx`, runs a 5-prompt harness, and asserts that the outputs are schema-valid. If the model is not yet cached, the first run downloads it; subsequent runs use the on-disk cache. CI uses a tiny stub model (`tiny-llama-stub.onnx`, 8 MB) for speed; the heavy model is gated behind `EDGE_FULL=1`.
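
The choice between the stub and the full model might look like the sketch below. It rests on two assumptions that this guide does not confirm: that CI is detected through the conventional `CI=true` variable, and that the `EDGE_FULL` gate applies only in CI. `selectEdgeModel` is a hypothetical name; `MELMASTOON_EDGE_DISABLED` is the switch listed in §8.

```typescript
// Sketch: decide which model file the edge harness should load.
// `EdgeEnv` and `selectEdgeModel` are hypothetical names.
type EdgeEnv = { [key: string]: string | undefined };

// Returns the model file name, or null when edge tests should be skipped.
function selectEdgeModel(env: EdgeEnv): string | null {
  if (env["MELMASTOON_EDGE_DISABLED"] === "true") {
    return null;
  }
  // Assumption: CI is signalled by CI=true.
  const inCi = env["CI"] === "true";
  // Assumption: in CI the full model runs only when EDGE_FULL=1 is set.
  if (inCi && env["EDGE_FULL"] !== "1") {
    return "tiny-llama-stub.onnx";
  }
  return "phi-3-mini-4k-instruct-int4.onnx";
}
```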

5. Running the service

pnpm --filter @melmastoon/ai-orchestrator-service dev

Default ports:

| Port | Service |
| --- | --- |
| `4080` | API |
| `4081` | metrics |
| `5432` | Postgres |
| `6379` | Redis |
| `8085` | Pub/Sub emulator |
| `4443` | GCS emulator |
| `8088` | LLM emulator |

Hit it:

curl -X POST http://localhost:4080/api/v1/ai/complete \
  -H 'X-Tenant-Id: tenant_local_a' \
  -H 'Authorization: Bearer dev.tenant_local_a.user_dev_001' \
  -H 'Content-Type: application/json' \
  -d '{"capabilityKey":"message.draft","inputs":{"intent":"welcome","guestName":"Asma"}}'

This request assumes the dev JWT issuer is running (`pnpm --filter @melmastoon/iam-service dev`). As a local-only convenience, it accepts tokens of the form `dev.<tenant>.<user>`.
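
For scripted requests, the dev token and headers from the curl example can be assembled in code. The sketch below assumes tenant and user IDs contain no dots; `DevIdentity`, `formatDevToken`, `parseDevToken`, and `devHeaders` are hypothetical helpers and not part of the service or the IAM issuer.

```typescript
// Sketch: build and parse local dev tokens of the form dev.<tenant>.<user>.
// All names here are hypothetical; IDs are assumed to contain no dots.
interface DevIdentity {
  tenant: string;
  user: string;
}

function formatDevToken(id: DevIdentity): string {
  return "dev." + id.tenant + "." + id.user;
}

function parseDevToken(token: string): DevIdentity | null {
  const parts = token.split(".");
  if (parts.length !== 3 || parts[0] !== "dev") {
    return null;
  }
  const tenant = parts[1];
  const user = parts[2];
  if (!tenant || !user) {
    return null;
  }
  return { tenant: tenant, user: user };
}

// Headers matching the curl example above.
function devHeaders(id: DevIdentity): { [name: string]: string } {
  return {
    "X-Tenant-Id": id.tenant,
    Authorization: "Bearer " + formatDevToken(id),
    "Content-Type": "application/json",
  };
}
```

The result of `devHeaders({ tenant: "tenant_local_a", user: "user_dev_001" })` can be passed as the `headers` option of a `fetch` call against `http://localhost:4080/api/v1/ai/complete`.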

6. Useful npm scripts

| Script | What it does |
| --- | --- |
| `pnpm test` | Vitest unit tests |
| `pnpm test:integration` | Spins up Testcontainers; runs `IT-AI-*` |
| `pnpm test:redteam` | Red-team suite |
| `pnpm test:edge` | ONNX edge replay subset |
| `pnpm eval -- --suite EVAL_MSG_001` | Runs an eval suite end-to-end against the local LLM emulator (or real Vertex with `EVAL_PROVIDER=vertex`) |
| `pnpm seed:capabilities` | Re-seeds the capability catalog from `infra/local/seed/capabilities.json` |
| `pnpm migrate` | Runs Flyway migrations |
| `pnpm psql` | Opens a psql shell against the local DB with `app.tenant_id` pre-set |
| `pnpm verify:manifest` | Verifies the local edge manifest signature using the local KMS emulator key |

7. Talking to real Vertex AI (optional)

gcloud auth application-default login
export MELMASTOON_LOCAL_EMULATOR=false
export GOOGLE_CLOUD_PROJECT=melmastoon-dev-ai
export VERTEX_LOCATION=europe-west1
pnpm dev

Real-Vertex calls cost real money. The dev project has a hard cap of $20/day; budget alerts route to #ai-eng-dev-spend.
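
Because a misconfigured shell can silently keep traffic on the emulator, or fail late on a missing variable, it can help to validate the environment up front. The following is a minimal sketch under assumptions: `readVertexConfig` and `streamGenerateContentUrl` are hypothetical helpers, and the model ID is a caller-supplied placeholder. The URL shape follows the public Vertex AI REST endpoint for `streamGenerateContent`.

```typescript
// Sketch: validate the environment before making real Vertex AI calls.
// `VertexEnv`, `VertexConfig`, and both functions are hypothetical names.
type VertexEnv = { [key: string]: string | undefined };

interface VertexConfig {
  project: string;
  location: string;
}

function readVertexConfig(env: VertexEnv): VertexConfig {
  if (env["MELMASTOON_LOCAL_EMULATOR"] === "true") {
    throw new Error("MELMASTOON_LOCAL_EMULATOR is still true; calls would go to the emulator");
  }
  const project = env["GOOGLE_CLOUD_PROJECT"];
  const location = env["VERTEX_LOCATION"];
  if (!project || !location) {
    const missing: string[] = [];
    if (!project) missing.push("GOOGLE_CLOUD_PROJECT");
    if (!location) missing.push("VERTEX_LOCATION");
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return { project: project, location: location };
}

// Builds the regional streamGenerateContent URL for a given model ID.
function streamGenerateContentUrl(config: VertexConfig, modelId: string): string {
  return (
    "https://" + config.location + "-aiplatform.googleapis.com/v1" +
    "/projects/" + config.project +
    "/locations/" + config.location +
    "/publishers/google/models/" + modelId + ":streamGenerateContent"
  );
}
```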

8. Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `pgvector` extension missing | Wrong Postgres image | From the monorepo root: `docker compose -f infra/local/ai/docker-compose.yml down -v && pnpm --filter @melmastoon/ai-orchestrator-service run setup:local` |
| ONNX runtime fails to load with NO_AVX2 | CPU lacks AVX2 (for example, x86 emulation on older M1 setups) | Set `MELMASTOON_EDGE_DISABLED=true` to skip edge tests |
| `MELMASTOON.AI.PROVENANCE_MISSING` on every call | Local seed didn't write a default provider/model row | Re-run `pnpm seed:capabilities` |
| Pub/Sub emulator returns DEADLINE_EXCEEDED | Docker on the host is paused | Resume Docker, then run `docker compose -f infra/local/ai/docker-compose.yml restart ai-pubsub` from the monorepo root |
| 401 on every request | Dev JWT issuer not running | `pnpm --filter @melmastoon/iam-service dev` |
