ai-orchestrator-service — Sync Contract

Companion to: DATA_MODEL.md · ADR-0003 Electron Offline-First · 08 AI Architecture §9

1. Why this service participates in sync

The Electron desktop must continue producing AI artifacts when the cloud is unreachable: anomaly flags, message drafts via Phi-3-mini, tutor answers, RAG over local policies, photo-quality scoring, and housekeeping route optimization. To do that the desktop needs a read-only snapshot of:

The active prompt registry (prompt_versions rows in status='active', restricted to capabilities the tenant can run on the edge).
The edge model manifest (the signed EdgeModelManifest published row).
The active capability catalog (capabilities + fallback chains + HITL config).
A per-tenant edge RAG bundle — a subset of embeddings_edge + rag_chunks for the tenant's policies and faq namespaces.
The active model catalog (so the desktop knows model versions for provenance).
The AB sticky assignment for this tenant (so a draft prompt version served to 5% of traffic online stays sticky offline).

The desktop does not sync inference audit, provenance, or HITL data downward. Those are server-authoritative and stay in the cloud. Edge inference still produces audit records, which the desktop pushes upward on the next sync (see §6 below).

2. Aggregate sync policy table

Aggregate	Direction	Conflict policy	Snapshot scope	Notes
`Capability`	cloud → desktop (read-only)	`server_authoritative`	All `active`; tenant-enabled subset filtered server-side	Replaced atomically per snapshot
`Prompt` / `PromptVersion`	cloud → desktop (read-only)	`server_authoritative`	All `active` versions for capabilities the desktop can run	Pinned to the AB assignment row for this tenant
`Model`	cloud → desktop (read-only)	`server_authoritative`	All `available` rows for `provider='onnx-edge'` plus a metadata row per cloud model used in provenance
`EdgeModelManifest`	cloud → desktop (read-only)	`server_authoritative`	The single `published` row	Signature verified at every load
`RAGCorpus` (edge)	cloud → desktop (read-only)	`server_authoritative`	Per-tenant `policies` + `faq` namespaces	Limit to `embedding_dim = 384` (edge bundle)
`embeddings_edge` + `rag_chunks` for that corpus	cloud → desktop (read-only)	`server_authoritative`	Per-tenant subset; capped at 50,000 chunks per corpus per tenant	Compressed bundle in the snapshot
`BudgetCounter`	cloud → desktop (read-only, throttled)	`server_authoritative`	Current period for tenant	Refreshed on every sync (≤ 5 min stale tolerated)
`AB assignment`	cloud → desktop (read-only)	`server_authoritative`	Per-tenant per-capability
`InferenceRequest`	desktop → cloud (push)	`append_only`	Edge inference audit	One-shot push; idempotent on `requestId`
`InferenceResult`	desktop → cloud (push)	`append_only`	Same	Carries provenance with `local: true`
`Provenance`	desktop → cloud (push)	`append_only`	Same	Computed locally; cloud verifies + persists
`HitlGate`	(none on edge)	n/a	HITL is cloud-orchestrated; the desktop UI consumes them via the cloud API when online; offline HITL is not allowed for AI-drafted artifacts
`HitlDecision`	desktop → cloud (push)	`append_only`	Decisions made offline against gates that were already open before disconnection are queued; cloud accepts on idempotent `(gateId, reviewerUserId)`
`EvalSuite` / `EvalRun`	(none)	n/a	Eval runs are cloud-only
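
One way to keep a desktop sync client honest about the table above is to encode it as a typed lookup and check it before queuing a push. The sketch below covers only a subset of rows; the type names, the `AI_SYNC_POLICIES` constant, and `canPush` are hypothetical and not part of the service's API.

```typescript
// Hypothetical encoding of a subset of the policy table above.
type SyncDirection = "cloud_to_desktop" | "desktop_to_cloud" | "none";
type ConflictPolicy = "server_authoritative" | "append_only" | "n/a";

interface AggregatePolicy {
  direction: SyncDirection;
  conflictPolicy: ConflictPolicy;
}

const AI_SYNC_POLICIES: Record<string, AggregatePolicy> = {
  Capability: { direction: "cloud_to_desktop", conflictPolicy: "server_authoritative" },
  PromptVersion: { direction: "cloud_to_desktop", conflictPolicy: "server_authoritative" },
  EdgeModelManifest: { direction: "cloud_to_desktop", conflictPolicy: "server_authoritative" },
  BudgetCounter: { direction: "cloud_to_desktop", conflictPolicy: "server_authoritative" },
  InferenceRequest: { direction: "desktop_to_cloud", conflictPolicy: "append_only" },
  InferenceResult: { direction: "desktop_to_cloud", conflictPolicy: "append_only" },
  Provenance: { direction: "desktop_to_cloud", conflictPolicy: "append_only" },
  HitlDecision: { direction: "desktop_to_cloud", conflictPolicy: "append_only" },
  HitlGate: { direction: "none", conflictPolicy: "n/a" },
  EvalRun: { direction: "none", conflictPolicy: "n/a" },
};

// Only aggregates marked desktop → cloud may appear in a push batch.
function canPush(aggregate: string): boolean {
  const policy = AI_SYNC_POLICIES[aggregate];
  return policy !== undefined && policy.direction === "desktop_to_cloud";
}
```

Unknown aggregate names are treated as not pushable, which fails closed if the table and the client drift apart.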

3. Snapshot endpoints

The desktop pulls a single snapshot per session start (and on demand from the in-app "Force AI sync" action) via:

GET /api/v1/sync/v1/pull?since=<cursor>&aggregates=ai-orchestrator

handled by sync-service which delegates to ai-orchestrator-service for the AI portion. The response payload shape:

{
  "ai-orchestrator": {
    "cursor": "ai_2026-05-12T01:31:09.412Z_evt_01H...",
    "capabilities": [ /* active rows; ≤ 200 */ ],
    "promptVersions": [ /* active rows pinned to AB; ≤ 200 */ ],
    "abAssignments": [ /* per-capability for this tenant */ ],
    "models": [ /* edge models + cloud-model metadata used in provenance */ ],
    "edgeModelManifest": { /* the signed manifest */ },
    "ragBundles": [
      {
        "corpusId": "rag_01H...",
        "namespace": "policies",
        "embeddingDim": 384,
        "chunks": [ /* { chunkId, text, metadata, sourceUri } */ ],
        "vectorsBlobUri": "https://desktop-snapshots.../tenants/.../policies.fvecs.zst"
      }
    ],
    "budgetSnapshot": { /* per-scope counters */ }
  }
}
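
As an illustration of the request and the row caps noted in the payload comments, the sketch below builds the pull path and checks the two capped lists. The helper names are hypothetical, and treating a missing `since` as a request for a full snapshot is an assumption, not something this contract states.

```typescript
// Hypothetical helpers; only the endpoint path and query keys come from the text.
function buildAiPullPath(sinceCursor: string | null): string {
  const params: string[] = [];
  // Assumption: omitting `since` asks the server for a full snapshot.
  if (sinceCursor !== null) {
    params.push("since=" + encodeURIComponent(sinceCursor));
  }
  params.push("aggregates=ai-orchestrator");
  return "/api/v1/sync/v1/pull?" + params.join("&");
}

interface AiSnapshotRows {
  capabilities: unknown[];
  promptVersions: unknown[];
}

// Taken from the "≤ 200" comments in the payload shape.
const MAX_ROWS_PER_LIST = 200;

// Returns the names of lists that exceed the documented row cap.
function rowCapViolations(snapshot: AiSnapshotRows): string[] {
  const violations: string[] = [];
  if (snapshot.capabilities.length > MAX_ROWS_PER_LIST) violations.push("capabilities");
  if (snapshot.promptVersions.length > MAX_ROWS_PER_LIST) violations.push("promptVersions");
  return violations;
}
```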

vectorsBlobUri is a per-snapshot signed GCS URL; vectors travel as a compressed binary blob (fvecs.zst) to keep the JSON payload small. The desktop main process loads them into SQLite + a local HNSW index (or sqlite-vss) on first use.
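
A minimal sketch of decoding the vectors blob after download, assuming the blob uses the conventional fvecs layout (each record is a little-endian int32 dimension followed by that many little-endian float32 values) and that zstd decompression has already happened in a separate step. `parseFvecs` is a hypothetical helper name.

```typescript
// Decode an already-decompressed fvecs buffer into one Float32Array per vector.
// Assumption: every record has the same dimension (384 for the edge bundle).
function parseFvecs(bytes: Uint8Array, expectedDim: number): Float32Array[] {
  const recordSize = 4 + expectedDim * 4; // int32 header + float32 payload
  if (bytes.byteLength % recordSize !== 0) {
    throw new Error("fvecs blob length is not a multiple of the record size");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const vectors: Float32Array[] = [];
  for (let offset = 0; offset < bytes.byteLength; offset += recordSize) {
    const dim = view.getInt32(offset, true);
    if (dim !== expectedDim) {
      throw new Error(`unexpected vector dimension ${dim}, wanted ${expectedDim}`);
    }
    const vec = new Float32Array(expectedDim);
    for (let i = 0; i < expectedDim; i++) {
      vec[i] = view.getFloat32(offset + 4 + i * 4, true);
    }
    vectors.push(vec);
  }
  return vectors;
}
```

Checking the per-record dimension at load time catches a bundle built with the wrong embedding model before it reaches the local index.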

4. Push endpoints

POST /api/v1/sync/v1/push

The desktop pushes batches. AI-related batches contain:

{
  "ai-orchestrator": {
    "edgeInference": [
      {
        "requestId": "ifr_01H...",
        "capabilityKey": "message.draft",
        "tenantId": "tnt_01H...",
        "promptVersionId": "pmv_01H...",
        "inputHash": "sha256:...",
        "redactedInputHash": "sha256:...",
        "completedAt": "2026-05-12T01:30:00Z",
        "latencyMs": 2841,
        "status": "completed",
        "outputJson": { /* schema-validated locally; cloud re-validates */ },
        "provenance": {
          "promptVersionId": "pmv_01H...",
          "promptVersionNo": 3,
          "model": { "provider": "onnx-edge", "name": "phi-3-mini-4k-instruct", "version": "int4-2.4.1" },
          "tokens": { "input": 612, "output": 184 },
          "costMicros": 0,
          "local": true,
          "cacheHit": false,
          "safety": { "input": "pass", "output": "pass" }
        }
      }
    ],
    "hitlDecisions": [ /* decisions made offline against still-open gates */ ]
  }
}

The cloud:

Dedupes on requestId.
Re-validates the output against the pinned prompt's output_schema_json. Mismatch → rejects with MELMASTOON.AI.OUTPUT_INVALID; the desktop surfaces the artifact as degraded and asks for a re-draft online.
Re-runs server-side moderation on the output. If the verdict is block, the artifact is replaced with the deterministic fallback and melmastoon.ai_orchestrator.moderation.flagged.v1 is emitted with side: 'output', source: 'edge_replay'.
Persists the inference + provenance rows in the cloud (with local: true flag).
Emits melmastoon.ai_orchestrator.inference.completed.v1 with local: true so analytics + downstream subscribers see the same envelope.
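
The five cloud-side steps above can be sketched as a single ingest function. This is an in-memory illustration, not the service's implementation: `EdgeInferenceIngest`, `IngestDeps`, and its three callbacks are hypothetical stand-ins for the real schema validator, moderation client, and fallback generator, and event emission is reduced to recording event names.

```typescript
type IngestOutcome =
  | { kind: "accepted"; requestId: string; output: unknown; local: true }
  | { kind: "rejected"; requestId: string; code: "MELMASTOON.AI.OUTPUT_INVALID" };

interface EdgeInferenceRecord {
  requestId: string;
  promptVersionId: string;
  outputJson: unknown;
}

// Hypothetical collaborators; the real ones are not described in this document.
interface IngestDeps {
  matchesOutputSchema(promptVersionId: string, output: unknown): boolean;
  moderate(output: unknown): "pass" | "block";
  deterministicFallback(record: EdgeInferenceRecord): unknown;
}

class EdgeInferenceIngest {
  private readonly persisted = new Map<string, IngestOutcome>();
  readonly emitted: string[] = [];
  private readonly deps: IngestDeps;

  constructor(deps: IngestDeps) {
    this.deps = deps;
  }

  ingest(record: EdgeInferenceRecord): IngestOutcome {
    // 1. Dedupe on requestId: a replay returns the originally persisted result.
    const existing = this.persisted.get(record.requestId);
    if (existing !== undefined) return existing;

    // 2. Re-validate the output against the pinned prompt's schema.
    if (!this.deps.matchesOutputSchema(record.promptVersionId, record.outputJson)) {
      return { kind: "rejected", requestId: record.requestId, code: "MELMASTOON.AI.OUTPUT_INVALID" };
    }

    // 3. Re-run moderation; a block swaps in the deterministic fallback.
    let output = record.outputJson;
    if (this.deps.moderate(output) === "block") {
      output = this.deps.deterministicFallback(record);
      this.emitted.push("melmastoon.ai_orchestrator.moderation.flagged.v1");
    }

    // 4. Persist with the local flag set.
    const outcome: IngestOutcome = { kind: "accepted", requestId: record.requestId, output, local: true };
    this.persisted.set(record.requestId, outcome);

    // 5. Emit the same completion event that online inference produces.
    this.emitted.push("melmastoon.ai_orchestrator.inference.completed.v1");
    return outcome;
  }
}
```

In this sketch a rejected record is not persisted, so a corrected re-push with the same `requestId` would be evaluated again; whether the real service behaves that way is not stated here.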

5. Conflict semantics

Read-only snapshots cannot conflict — the cloud is authoritative. Push-side conflicts:

Conflict	Resolution
Edge inference duplicate `requestId`	Idempotent — return the original cloud-persisted result
Edge HITL decision against a gate the cloud has already auto-closed (timeout)	The cloud's auto-decision wins; the desktop's late decision is recorded as `decision.outcome` with `superseded: true` and not used to gate downstream effects
Edge inference produced by a deprecated prompt version (desktop bundle was stale)	Cloud accepts the audit row but flags `provenance.notes: 'deprecated_prompt'`; downstream subscribers may still consume
Edge embedding dimension mismatch (e.g., 768 sent for an edge corpus)	Reject with `MELMASTOON.AI.OUTPUT_INVALID`
Edge moderation passed but cloud post-moderation blocks	Cloud wins; artifact replaced with deterministic fallback; user sees a banner explaining "your offline draft contained content that didn't pass server moderation"
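
Three of the resolutions in the table reduce to small pure functions. The sketch below is illustrative only: the gate status names, the `outcome` values, and the `gatesDownstream` field are assumptions, while `superseded`, `deprecated_prompt`, the 384 dimension, and the error code come from this document.

```typescript
type CloudGateStatus = "open" | "auto_closed"; // hypothetical status names

interface OfflineHitlDecision {
  gateId: string;
  reviewerUserId: string;
  outcome: "approved" | "rejected"; // hypothetical outcome values
}

interface RecordedDecision extends OfflineHitlDecision {
  superseded: boolean;
  gatesDownstream: boolean;
}

// A late decision against an auto-closed gate is kept for audit but does not gate anything.
function recordOfflineDecision(gateStatus: CloudGateStatus, decision: OfflineHitlDecision): RecordedDecision {
  const superseded = gateStatus === "auto_closed";
  return { ...decision, superseded, gatesDownstream: !superseded };
}

// Audit rows from a stale bundle are accepted but annotated.
function provenanceNotesFor(promptStatus: "active" | "deprecated"): string | undefined {
  return promptStatus === "deprecated" ? "deprecated_prompt" : undefined;
}

const EDGE_EMBEDDING_DIM = 384;

// Returns the rejection code for a wrong embedding dimension, or null when it matches.
function embeddingDimError(dim: number): "MELMASTOON.AI.OUTPUT_INVALID" | null {
  return dim === EDGE_EMBEDDING_DIM ? null : "MELMASTOON.AI.OUTPUT_INVALID";
}
```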

6. Audit-of-edge

Every edge inference must produce inference.completed.v1 on next sync. To prove no edge artifact is "lost", the cloud reconciler runs a daily job comparing the count of edge artifacts created locally (the desktop emits an hourly edge.inference.tally event) with the count of inference.completed.v1 events with provenance.local = true for that tenant. Drift > 1% pages the AI on-call.
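
The reconciliation check can be sketched as follows. The text does not say which count is the denominator, so this sketch assumes drift is measured relative to the locally tallied count; the function names are hypothetical.

```typescript
const DRIFT_PAGE_THRESHOLD = 0.01; // "Drift > 1%" from the text

// Assumption: drift is the absolute difference relative to the local tally.
function edgeAuditDrift(talliedLocally: number, completedEvents: number): number {
  if (talliedLocally === 0) {
    // Nothing was tallied; any completed event is unexplained, so report full drift.
    return completedEvents === 0 ? 0 : 1;
  }
  return Math.abs(talliedLocally - completedEvents) / talliedLocally;
}

function shouldPageOnCall(talliedLocally: number, completedEvents: number): boolean {
  return edgeAuditDrift(talliedLocally, completedEvents) > DRIFT_PAGE_THRESHOLD;
}
```

For example, 1,000 tallied artifacts against 985 completion events is 1.5% drift and would page, while 995 events (0.5%) would not.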

7. Snapshot size budget

Bundle	Hard cap	Behavior on overflow
Capabilities + prompts + manifest + AB + models + budget	1 MB JSON	Hard fail; degrade caps before snapshot
Per-corpus RAG bundle (chunks + metadata, JSON)	5 MB JSON	Truncate by importance score (manual tag in `metadata.priority`)
Per-corpus vectors blob (.fvecs.zst)	80 MB	Truncate matching the chunk truncation; emit `melmastoon.ai_orchestrator.edge_rag.truncated.v1`
Total snapshot	100 MB	Reject; require `?aggregates=ai-orchestrator&namespaces=policies` to scope
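
A minimal sketch of truncating a RAG bundle by `metadata.priority`, assuming a higher number means more important and that chunks without a priority rank lowest. The default size estimate uses the JSON string length, which only approximates bytes for non-ASCII text; `truncateByPriority` and its result shape are hypothetical.

```typescript
interface RagChunk {
  chunkId: string;
  text: string;
  metadata: { priority?: number };
}

interface TruncationResult {
  kept: RagChunk[];
  droppedChunkIds: string[]; // drop the matching vectors from the blob as well
}

function truncateByPriority(
  chunks: RagChunk[],
  capBytes: number,
  sizeOf: (chunk: RagChunk) => number = (chunk) => JSON.stringify(chunk).length,
): TruncationResult {
  // Highest priority first; the sort is stable, so ties keep their original order.
  const ordered = [...chunks].sort(
    (a, b) => (b.metadata.priority ?? 0) - (a.metadata.priority ?? 0),
  );
  const kept: RagChunk[] = [];
  const droppedChunkIds: string[] = [];
  let used = 0;
  let full = false;
  for (const chunk of ordered) {
    const size = sizeOf(chunk);
    if (!full && used + size <= capBytes) {
      used += size;
      kept.push(chunk);
    } else {
      // Once one chunk does not fit, everything of lower priority is dropped too.
      full = true;
      droppedChunkIds.push(chunk.chunkId);
    }
  }
  return { kept, droppedChunkIds };
}
```

Returning the dropped IDs makes it straightforward to apply the same cut to the vectors blob, so chunks and vectors stay aligned.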

8. Refresh cadence

Trigger	Action
Desktop session start	Pull full AI snapshot
`melmastoon.ai_orchestrator.edge_model.manifest_updated.v1` consumed by sync-service	Push notification to online desktops; opportunistic re-pull on next idle
`melmastoon.ai_orchestrator.prompt.version_published.v1`	Same as above
Tenant changes a `policies` or `faq` document	Re-ingest in cloud; re-bundle on next snapshot
User opens "Force AI sync"	Full re-pull
Daily idle window (03:00 local on the desktop)	Background re-pull if the snapshot is > 24 h old
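
On the desktop side, the cadence table maps naturally onto a small decision function. The trigger and action names below are hypothetical labels for the rows above; the tenant document change row is omitted because it requires no desktop action until the next snapshot.

```typescript
type RefreshTrigger =
  | "session_start"
  | "manifest_updated"
  | "prompt_version_published"
  | "force_ai_sync"
  | "daily_idle_window";

type RefreshAction = "full_pull" | "repull_on_next_idle" | "background_repull" | "none";

const MAX_SNAPSHOT_AGE_MS = 24 * 60 * 60 * 1000; // 24 h

function refreshActionFor(trigger: RefreshTrigger, snapshotAgeMs: number): RefreshAction {
  switch (trigger) {
    case "session_start":
    case "force_ai_sync":
      return "full_pull";
    case "manifest_updated":
    case "prompt_version_published":
      // Pushed to online desktops; the re-pull waits for the next idle period.
      return "repull_on_next_idle";
    case "daily_idle_window":
      return snapshotAgeMs > MAX_SNAPSHOT_AGE_MS ? "background_repull" : "none";
  }
}
```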

9. Security

The snapshot endpoint requires a device-bound JWT (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).
The signed EdgeModelManifest is verified by the desktop main process at every model load via the KMS public key embedded in the binary; if verification fails, the load is refused.
The vectors blob URL is short-TTL (1 h) and tenant-scoped.
The desktop encrypts the local SQLite snapshot at rest with the device-binding key (Argon2id-derived from the device passphrase + Ed25519 device private key); see ADR-0003 §5.
The desktop never has a service-account JWT; only the device-bound subject token.

10. Backwards compatibility

Snapshot payloads carry schemaVersion: 1. Cloud emits the highest version supported by the requesting client (declared via X-Client-Version header). Removed fields require a major version bump and a 30-day overlap window.
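
The version selection rule can be illustrated with a short function. How the X-Client-Version header maps to a maximum supported schema version is not specified here, so the sketch takes that maximum as an input; `negotiateSchemaVersion` is a hypothetical name.

```typescript
// Pick the highest schema version the server can emit that the client also supports.
// Returns null when there is no overlap (for example, after an overlap window has ended).
function negotiateSchemaVersion(serverVersions: number[], clientMaxVersion: number): number | null {
  const usable = serverVersions.filter((version) => version <= clientMaxVersion);
  return usable.length === 0 ? null : Math.max(...usable);
}
```

During an overlap window the server would list both the old and new major versions, so an older desktop keeps receiving the payload shape it understands.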

11. Test coverage

test/integration/sync-snapshot.spec.ts ensures:

A new tenant gets a non-empty capabilities list and an empty RAG bundle.
Promoting a prompt invalidates the snapshot cache for affected tenants.
An edge inference push that omits provenance is rejected.
A duplicate requestId is idempotent.
An out-of-range since cursor returns MELMASTOON.SYNC.CURSOR_OUT_OF_RANGE and forces a full pull.
A push for a different tenant than the device-bound JWT returns MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE.
