
ai-orchestrator-service — Local Dev Setup

Companion to: docs/standards/SERVICE_TEMPLATE.md · DEPLOYMENT_TOPOLOGY.md

The goal is that a new contributor can clone the monorepo, run pnpm dev in services/ai-orchestrator-service, and have a fully working local stack, including a fake LLM provider and an actual ONNX edge model, within 10 minutes.

1. Prerequisites

| Tool | Version | Notes |
|---|---|---|
| Node.js | >= 20.11, < 21 | matches the Cloud Run runtime |
| pnpm | >= 9 | monorepo package manager |
| Docker Desktop | latest | for Postgres, Redis, and the emulators |
| gcloud CLI | latest | only needed for gcloud auth application-default login when testing against real Vertex AI (see §7) |
| Python 3.11 | optional | for the local LLM emulator (see §3) |
| jq | latest | used by the seed scripts |
| direnv | optional | auto-loads .envrc |
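To check the Node.js row before installing anything, a small version gate is enough. This is a sketch rather than something the repo ships: satisfiesNodeRange is a hypothetical name, and the >= 20.11, < 21 bounds are copied from the table above.

```typescript
// Hypothetical preflight helper (not part of the repo): checks a Node.js
// version string against the ">= 20.11, < 21" range from the prerequisites table.
function satisfiesNodeRange(version: string): boolean {
  // Accepts both "v20.11.1" (process.version) and "20.11.1" (process.versions.node).
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  // The major version must be exactly 20, and within 20.x the minor must be at least 11.
  return major === 20 && minor >= 11;
}

// Usage: fail fast before running pnpm install.
// if (!satisfiesNodeRange(process.version)) {
//   throw new Error(`Node ${process.version} is outside >= 20.11, < 21`);
// }
```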

2. One-shot bootstrap

From the monorepo root:

pnpm install
pnpm --filter @melmastoon/ai-orchestrator-service run setup:local

setup:local does:

  1. Runs docker compose -f infra/local/ai/docker-compose.yml up -d, which starts Postgres with pgvector, Redis, the Pub/Sub emulator, the GCS emulator, and the fake LLM emulator.
  2. Waits for each container to report healthy.
  3. Runs Flyway migrations against the local Postgres.
  4. Seeds:
    • 5 capabilities (pricing.suggest, message.draft, tutor.answer, anomaly.detect, internal.prompt_lint).
    • 1 prompt + 1 active version per capability.
    • 1 model row per provider, including onnx-edge/phi-3-mini-4k-instruct.
    • 1 RAG corpus per tenant (tenant_local_a, tenant_local_b) with 50 sample chunks.
    • A local edge model manifest, signed with a local KMS-emulator key.
  5. Downloads the ONNX models into ~/.melmastoon/edge-models/ (cached across runs):
    • phi-3-mini-4k-instruct-int4.onnx (~2.4 GB) from a public mirror.
    • all-MiniLM-L6-v2.onnx (~96 MB).
  6. Generates services/ai-orchestrator-service/.env.local from .env.local.example.
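Step 2 boils down to polling each dependency until it answers. A minimal sketch of such a loop follows; waitForHealthy and its defaults (30 attempts, 1 s apart) are hypothetical, and the actual setup:local script may rely on Docker Compose healthchecks instead. The probe is whatever check fits the container, for example a TCP connect to Postgres on 5432.

```typescript
// Hypothetical sketch of "waits for each container to report healthy".
type Probe = () => Promise<boolean>;

interface WaitOptions {
  attempts: number; // how many probes before giving up
  delayMs: number; // pause between probes
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the attempt number that succeeded; rejects after the last failed attempt.
async function waitForHealthy(
  name: string,
  probe: Probe,
  { attempts, delayMs }: WaitOptions = { attempts: 30, delayMs: 1000 },
): Promise<number> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // A probe that throws (e.g. connection refused) counts as "not healthy yet".
    const healthy = await probe().catch(() => false);
    if (healthy) return attempt;
    if (attempt < attempts) await sleep(delayMs);
  }
  throw new Error(`${name} did not become healthy after ${attempts} attempts`);
}
```

Swallowing probe errors matters here: while a container is still booting, a refused connection is the expected state, not a failure of the setup script.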

3. Local LLM emulator

The repo ships a small FastAPI emulator (infra/local/ai/llm-emulator/) that mimics the subset of the Vertex AI, Anthropic, and OpenAI APIs the service uses:

  • POST /vertex/projects/.../publishers/google/models/.../streamGenerateContent
  • POST /anthropic/v1/messages
  • POST /openai/v1/chat/completions
  • POST /vertex/.../embeddings
  • POST /vertex/.../moderate

Behaviour:

  • Returns canned JSON for known capability prompt fingerprints (used in unit tests).
  • Falls back to a local rule-based responder for unknown inputs: it echoes the input with a "[FAKE-MODEL]" marker, in a response that is still schema-valid for the requesting capability.
  • Supports query-string switches that simulate the failure modes in FAILURE_MODES.md: ?fail=503, ?slow=2000, ?invalid_json=true, and ?budget_exceeded=true.
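Tests can attach those switches to any emulator endpoint. The helper below is a hypothetical sketch (withFailureModes is not an existing repo function); the switch names are the ones listed above, and it assumes the emulator accepts several switches on one request and that slow is a delay in milliseconds.

```typescript
// Hypothetical test helper: appends the emulator's failure-mode switches to an endpoint URL.
interface FailureModes {
  fail?: number; // HTTP status the emulator should return, e.g. 503
  slow?: number; // artificial delay, e.g. 2000 (assumed to be milliseconds)
  invalidJson?: boolean; // ask for a malformed JSON body
  budgetExceeded?: boolean; // ask for a budget-exceeded response
}

function withFailureModes(endpoint: string, modes: FailureModes): string {
  const url = new URL(endpoint);
  if (modes.fail !== undefined) url.searchParams.set("fail", String(modes.fail));
  if (modes.slow !== undefined) url.searchParams.set("slow", String(modes.slow));
  if (modes.invalidJson) url.searchParams.set("invalid_json", "true");
  if (modes.budgetExceeded) url.searchParams.set("budget_exceeded", "true");
  return url.toString();
}
```

For example, withFailureModes("http://localhost:8088/openai/v1/chat/completions", { fail: 503, slow: 2000 }) yields the same URL with ?fail=503&slow=2000 appended.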

The provider adapters in the service detect MELMASTOON_LOCAL_EMULATOR=true and route to http://localhost:8088/... instead of real provider endpoints.
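A minimal sketch of that switch, assuming the adapters only need a base URL per provider and that the emulator namespaces each provider under /vertex, /anthropic, and /openai as in the endpoint list above. resolveBaseUrl is a hypothetical name; it treats anything other than the exact string "true" as "use real endpoints", and the europe-west1 fallback is borrowed from §7. The real adapters may read configuration differently.

```typescript
type Provider = "vertex" | "anthropic" | "openai";
type Env = Record<string, string | undefined>;

// Emulator origin from the default-ports table (port 8088).
const EMULATOR_ORIGIN = "http://localhost:8088";

// Hypothetical: picks the emulator or the real provider endpoint for one adapter.
function resolveBaseUrl(provider: Provider, env: Env): string {
  if (env.MELMASTOON_LOCAL_EMULATOR === "true") {
    // The emulator serves each provider under its own path prefix.
    return `${EMULATOR_ORIGIN}/${provider}`;
  }
  // Assumption: default region when VERTEX_LOCATION is unset.
  const location = env.VERTEX_LOCATION ?? "europe-west1";
  const realEndpoints: Record<Provider, string> = {
    vertex: `https://${location}-aiplatform.googleapis.com`,
    anthropic: "https://api.anthropic.com",
    openai: "https://api.openai.com",
  };
  return realEndpoints[provider];
}
```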

4. Edge ONNX runtime locally

The service has a thin OnnxEdgeAdapter that uses onnxruntime-node. To exercise it locally:

pnpm --filter @melmastoon/ai-orchestrator-service test:edge

This runs vitest --include test/edge/**, which loads the cached phi-3-mini-4k-instruct-int4.onnx, runs a 5-prompt harness, and asserts that every output is schema-valid. If the model is not cached yet, the first run downloads it; later runs use the on-disk cache. CI uses a tiny stub model (tiny-llama-stub.onnx, 8 MB) for speed; the heavy model only runs when EDGE_FULL=1 is set.
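The model-selection rules above can be read as a small decision function. This is one interpretation, not the harness's code: chooseEdgeModel is hypothetical, it assumes CI is detected through a CI=true variable, and the stub's directory (test/edge/fixtures) is a placeholder. MELMASTOON_EDGE_DISABLED is the escape hatch from the troubleshooting table in §8.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// File names from this guide; the stub's directory below is a hypothetical placeholder.
const FULL_MODEL = "phi-3-mini-4k-instruct-int4.onnx";
const STUB_MODEL = "tiny-llama-stub.onnx";

type EdgeModelChoice =
  | { kind: "skip"; reason: string }
  | { kind: "stub"; path: string }
  | { kind: "full"; path: string };

function chooseEdgeModel(env: Env, home: string = homedir()): EdgeModelChoice {
  // Escape hatch for CPUs without AVX2.
  if (env.MELMASTOON_EDGE_DISABLED === "true") {
    return { kind: "skip", reason: "MELMASTOON_EDGE_DISABLED=true" };
  }
  // Assumption: CI uses the stub unless EDGE_FULL=1 opts into the heavy model.
  if (env.CI === "true" && env.EDGE_FULL !== "1") {
    return { kind: "stub", path: join("test", "edge", "fixtures", STUB_MODEL) };
  }
  // Otherwise load the model cached under ~/.melmastoon/edge-models/.
  return { kind: "full", path: join(home, ".melmastoon", "edge-models", FULL_MODEL) };
}
```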

5. Running the service

pnpm --filter @melmastoon/ai-orchestrator-service dev

Default ports:

| Port | Service |
|---|---|
| 4080 | API |
| 4081 | Metrics |
| 5432 | Postgres |
| 6379 | Redis |
| 8085 | Pub/Sub emulator |
| 4443 | GCS emulator |
| 8088 | LLM emulator |
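If pnpm dev or docker compose fails with a port conflict, a quick TCP probe of the default ports shows which ones are already taken. A minimal sketch using Node's built-in net module; isListening and reportPorts are hypothetical helper names, and the probe only checks 127.0.0.1.

```typescript
import { createConnection } from "node:net";

// Default ports from the table above.
const DEFAULT_PORTS: Record<string, number> = {
  api: 4080,
  metrics: 4081,
  postgres: 5432,
  redis: 6379,
  pubsub: 8085,
  gcs: 4443,
  llmEmulator: 8088,
};

// Resolves true if something accepts a TCP connection on host:port.
function isListening(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = createConnection({ port, host });
    const finish = (listening: boolean) => {
      socket.destroy();
      resolve(listening); // later calls are no-ops once the promise has settled
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Prints one line per default port, e.g. to spot a stale process before pnpm dev.
async function reportPorts(): Promise<void> {
  for (const [name, port] of Object.entries(DEFAULT_PORTS)) {
    const listening = await isListening(port);
    console.log(`${name.padEnd(12)} ${port}  ${listening ? "in use" : "free"}`);
  }
}
```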

Hit it:

curl -X POST http://localhost:4080/api/v1/ai/complete \
-H 'X-Tenant-Id: tenant_local_a' \
-H 'Authorization: Bearer dev.tenant_local_a.user_dev_001' \
-H 'Content-Type: application/json' \
-d '{"capabilityKey":"message.draft","inputs":{"intent":"welcome","guestName":"Asma"}}'

The dev JWT issuer (pnpm --filter @melmastoon/iam-service dev) accepts dev.<tenant>.<user> tokens as a local-only convenience. Start it before sending requests; if it isn't running, every request fails with 401 (see §8).
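This guide doesn't say how the dev issuer validates these tokens. As a sketch, assuming a token is literally the three dot-separated parts dev, tenant, and user, a parser could look like the one below. parseDevToken and isAuthorizedForTenant are hypothetical, and the rule that the token's tenant must match X-Tenant-Id is an assumption, not documented iam-service behaviour.

```typescript
interface DevPrincipal {
  tenantId: string;
  userId: string;
}

// Hypothetical parser for the local-only "dev.<tenant>.<user>" bearer format.
// Tenant and user ids may not contain dots or whitespace.
function parseDevToken(authorizationHeader: string): DevPrincipal | null {
  const match = /^Bearer dev\.([^.\s]+)\.([^.\s]+)$/.exec(authorizationHeader.trim());
  if (!match) return null;
  return { tenantId: match[1], userId: match[2] };
}

// Assumption: a request is only accepted when the token's tenant matches X-Tenant-Id.
function isAuthorizedForTenant(authorizationHeader: string, tenantHeader: string): boolean {
  const principal = parseDevToken(authorizationHeader);
  return principal !== null && principal.tenantId === tenantHeader;
}
```

With the curl example above, parseDevToken("Bearer dev.tenant_local_a.user_dev_001") gives tenant tenant_local_a and user user_dev_001, which matches the X-Tenant-Id header.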

6. Useful npm scripts

Run these from services/ai-orchestrator-service:

| Script | What it does |
|---|---|
| pnpm test | Runs the Vitest unit tests |
| pnpm test:integration | Spins up Testcontainers and runs the IT-AI-* integration tests |
| pnpm test:redteam | Runs the red-team suite |
| pnpm test:edge | Runs the ONNX edge replay subset (see §4) |
| pnpm eval -- --suite EVAL_MSG_001 | Runs an eval suite end to end against the local LLM emulator, or against real Vertex AI when EVAL_PROVIDER=vertex is set |
| pnpm seed:capabilities | Re-seeds the capability catalog from infra/local/seed/capabilities.json |
| pnpm migrate | Runs Flyway migrations |
| pnpm psql | Opens a psql shell against the local DB with app.tenant_id pre-set |
| pnpm verify:manifest | Verifies the local edge manifest signature using the local KMS-emulator key |
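The guide doesn't say how pnpm psql pre-sets app.tenant_id. One plausible mechanism, sketched here as an assumption, is libpq's PGOPTIONS environment variable, which sends -c name=value options to the server at connection start; custom dotted parameters such as app.tenant_id can be set this way. psqlEnv and openPsql are hypothetical, and connectionUri stands in for the local database's connection string.

```typescript
import { spawnSync } from "node:child_process";

type Env = Record<string, string | undefined>;

// Hypothetical: one way a "pnpm psql" wrapper could pre-set app.tenant_id.
// libpq forwards PGOPTIONS to the server, so "-c app.tenant_id=..." applies to the whole session.
function psqlEnv(tenantId: string, base: Env): Env {
  // Reject anything that could break out of the -c option (spaces, quotes, semicolons).
  if (!/^[A-Za-z0-9_]+$/.test(tenantId)) {
    throw new Error(`unexpected tenant id: ${tenantId}`);
  }
  // Note: this replaces any PGOPTIONS already present in the base environment.
  return { ...base, PGOPTIONS: `-c app.tenant_id=${tenantId}` };
}

// Launches an interactive psql session and returns its exit code.
function openPsql(tenantId: string, connectionUri: string): number {
  const result = spawnSync("psql", [connectionUri], {
    stdio: "inherit",
    env: psqlEnv(tenantId, process.env),
  });
  return result.status ?? 1;
}
```

Inside such a session, SELECT current_setting('app.tenant_id'); should return the tenant that was passed in.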

7. Talking to real Vertex AI (optional)

gcloud auth application-default login
export MELMASTOON_LOCAL_EMULATOR=false
export GOOGLE_CLOUD_PROJECT=melmastoon-dev-ai
export VERTEX_LOCATION=europe-west1
pnpm dev

Real-Vertex calls cost real money. The dev project has a hard cap of $20/day; budget alerts route to #ai-eng-dev-spend.
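Before starting the service against real Vertex AI, it can help to fail fast on a missing variable. The sketch below checks the three exports above and builds the public Vertex AI streamGenerateContent REST URL from them. requireRealVertex and streamGenerateContentUrl are hypothetical names, the service's adapters may build URLs differently, and example-model in the usage is a placeholder rather than a model the service is configured with.

```typescript
type Env = Record<string, string | undefined>;

interface VertexTarget {
  project: string;
  location: string;
}

// Hypothetical preflight: refuse to start unless the real-Vertex variables are set.
function requireRealVertex(env: Env): VertexTarget {
  if (env.MELMASTOON_LOCAL_EMULATOR !== "false") {
    throw new Error("set MELMASTOON_LOCAL_EMULATOR=false to call real Vertex AI");
  }
  const project = env.GOOGLE_CLOUD_PROJECT;
  const location = env.VERTEX_LOCATION;
  if (!project || !location) {
    throw new Error("GOOGLE_CLOUD_PROJECT and VERTEX_LOCATION must both be set");
  }
  return { project, location };
}

// Public Vertex AI REST endpoint for streaming generation from a Google publisher model.
function streamGenerateContentUrl({ project, location }: VertexTarget, model: string): string {
  return (
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/${model}:streamGenerateContent`
  );
}
```

With the exports above, streamGenerateContentUrl(requireRealVertex(env), "example-model") targets the europe-west1 regional endpoint for project melmastoon-dev-ai.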

8. Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| pgvector extension missing | wrong Postgres image | docker compose -f infra/local/ai/docker-compose.yml down -v && pnpm --filter @melmastoon/ai-orchestrator-service run setup:local |
| ONNX runtime fails to load with NO_AVX2 | CPU lacks AVX2 (e.g. older x86 emulation on Apple M1) | set MELMASTOON_EDGE_DISABLED=true to skip the edge tests |
| MELMASTOON.AI.PROVENANCE_MISSING on every call | the local seed didn't write a default provider/model row | re-run pnpm seed:capabilities |
| Pub/Sub emulator returns DEADLINE_EXCEEDED | host Docker is paused | resume Docker, then docker compose -f infra/local/ai/docker-compose.yml restart ai-pubsub |
| 401 on every request | dev JWT issuer not running | pnpm --filter @melmastoon/iam-service dev |
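To confirm the NO_AVX2 diagnosis on a Linux host or inside a container, you can check the CPU flags directly. A hypothetical helper that parses /proc/cpuinfo text follows; it does not apply on macOS, which has no /proc, and on ARM Linux it correctly reports false because there is no avx2 flag to find.

```typescript
// Hypothetical helper: detects AVX2 from the text of /proc/cpuinfo (Linux only).
function cpuinfoHasAvx2(cpuinfo: string): boolean {
  return cpuinfo
    .split("\n")
    .filter((line) => /^flags\s*:/.test(line)) // x86 kernels list CPU features on "flags" lines
    .some((line) =>
      line
        .slice(line.indexOf(":") + 1)
        .trim()
        .split(/\s+/)
        .includes("avx2"),
    );
}

// Usage on Linux:
// import { readFileSync } from "node:fs";
// if (!cpuinfoHasAvx2(readFileSync("/proc/cpuinfo", "utf8"))) {
//   console.warn("No AVX2: set MELMASTOON_EDGE_DISABLED=true to skip edge tests");
// }
```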