ai-orchestrator-service — Sync Contract

Companion to: DATA_MODEL.md · ADR-0003 Electron Offline-First · 08 AI Architecture §9

1. Why this service participates in sync

The Electron desktop must continue producing AI artifacts when the cloud is unreachable: anomaly flags, message drafts via Phi-3-mini, tutor answers, RAG over local policies, photo-quality scoring, and housekeeping route optimization. To do that the desktop needs a read-only snapshot of:

  1. The active prompt registry (prompt_versions rows in status='active', restricted to capabilities the tenant can run on the edge).
  2. The edge model manifest (the signed EdgeModelManifest published row).
  3. The active capability catalog (capabilities + fallback chains + HITL config).
  4. A per-tenant edge RAG bundle — a subset of embeddings_edge + rag_chunks for the tenant's policies and faq namespaces.
  5. The active model catalog (so the desktop knows model versions for provenance).
  6. The AB sticky assignment for this tenant (so a draft prompt versioned at 5% online stays sticky offline).

The desktop does not sync inference audit, provenance, or HITL data downward. Those are server-authoritative and stay in the cloud. Edge inference still produces audit records, which the desktop pushes to the cloud on the next sync (see §4 and §6 below).

2. Aggregate sync policy table

| Aggregate | Direction | Conflict policy | Snapshot scope | Notes |
| --- | --- | --- | --- | --- |
| Capability | cloud → desktop (read-only) | server_authoritative | All active; tenant-enabled subset filtered server-side | Replaced atomically per snapshot |
| Prompt / PromptVersion | cloud → desktop (read-only) | server_authoritative | All active versions for capabilities the desktop can run | Pinned to the AB assignment row for this tenant |
| Model | cloud → desktop (read-only) | server_authoritative | All available rows for provider='onnx-edge' plus a metadata row per cloud model used in provenance | |
| EdgeModelManifest | cloud → desktop (read-only) | server_authoritative | The single published row | Signature verified at every load |
| RAGCorpus (edge) | cloud → desktop (read-only) | server_authoritative | Per-tenant policies + faq namespaces | Limit to embedding_dim = 384 (edge bundle) |
| embeddings_edge + rag_chunks for that corpus | cloud → desktop (read-only) | server_authoritative | Per-tenant subset; capped at 50,000 chunks per corpus per tenant | Compressed bundle in the snapshot |
| BudgetCounter | cloud → desktop (read-only, throttled) | server_authoritative | Current period for tenant | Refreshed on every sync (≤ 5 min stale tolerated) |
| AB assignment | cloud → desktop (read-only) | server_authoritative | Per-tenant per-capability | |
| InferenceRequest | desktop → cloud (push) | append_only | Edge inference audit | One-shot push; idempotent on requestId |
| InferenceResult | desktop → cloud (push) | append_only | Same | Carries provenance with local: true |
| Provenance | desktop → cloud (push) | append_only | Same | Computed locally; cloud verifies + persists |
| HitlGate | (none on edge) | n/a | | HITL is cloud-orchestrated; the desktop UI consumes gates via the cloud API when online; offline HITL is not allowed for AI-drafted artifacts |
| HitlDecision | desktop → cloud (push) | append_only | | Decisions made offline against gates that were already open before disconnection are queued; cloud accepts idempotently on (gateId, reviewerUserId) |
| EvalSuite / EvalRun | (none) | n/a | | Eval runs are cloud-only |
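
The direction and conflict-policy columns above lend themselves to a lookup the desktop can consult before queueing a push. The following is a minimal sketch only: the type names, the `SYNC_POLICY` constant, and `canPush` are hypothetical and do not come from the service's codebase.

```typescript
// Hypothetical encoding of the policy table; names are illustrative only.
type Direction = "cloud_to_desktop" | "desktop_to_cloud" | "none";
type ConflictPolicy = "server_authoritative" | "append_only" | "n/a";

interface SyncPolicy {
  direction: Direction;
  conflictPolicy: ConflictPolicy;
}

const PULL: SyncPolicy = { direction: "cloud_to_desktop", conflictPolicy: "server_authoritative" };
const PUSH: SyncPolicy = { direction: "desktop_to_cloud", conflictPolicy: "append_only" };
const NONE: SyncPolicy = { direction: "none", conflictPolicy: "n/a" };

const SYNC_POLICY: Record<string, SyncPolicy> = {
  Capability: PULL,
  PromptVersion: PULL,
  Model: PULL,
  EdgeModelManifest: PULL,
  RAGCorpus: PULL,
  BudgetCounter: PULL,
  AbAssignment: PULL,
  InferenceRequest: PUSH,
  InferenceResult: PUSH,
  Provenance: PUSH,
  HitlGate: NONE,
  HitlDecision: PUSH,
  EvalRun: NONE,
};

// The desktop may only push aggregates whose direction is desktop → cloud.
// Unknown aggregate names are treated as not pushable.
function canPush(aggregate: string): boolean {
  const policy: SyncPolicy | undefined = SYNC_POLICY[aggregate];
  return policy !== undefined && policy.direction === "desktop_to_cloud";
}
```

For example, `canPush("InferenceRequest")` is true while `canPush("Capability")` and `canPush("HitlGate")` are false, matching the table.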

3. Snapshot endpoints

The desktop pulls a single snapshot per session start (and on demand from the in-app "Force AI sync" action) via:

GET /api/v1/sync/v1/pull?since=<cursor>&aggregates=ai-orchestrator

The request is handled by sync-service, which delegates to ai-orchestrator-service for the AI portion. The response payload has this shape:

{
  "ai-orchestrator": {
    "cursor": "ai_2026-05-12T01:31:09.412Z_evt_01H...",
    "capabilities": [ /* active rows; ≤ 200 */ ],
    "promptVersions": [ /* active rows pinned to AB; ≤ 200 */ ],
    "abAssignments": [ /* per-capability for this tenant */ ],
    "models": [ /* edge models + cloud-model metadata used in provenance */ ],
    "edgeModelManifest": { /* the signed manifest */ },
    "ragBundles": [
      {
        "corpusId": "rag_01H...",
        "namespace": "policies",
        "embeddingDim": 384,
        "chunks": [ /* { chunkId, text, metadata, sourceUri } */ ],
        "vectorsBlobUri": "https://desktop-snapshots.../tenants/.../policies.fvecs.zst"
      }
    ],
    "budgetSnapshot": { /* per-scope counters */ }
  }
}

vectorsBlobUri is a per-snapshot signed GCS URL; vectors travel as a compressed binary blob (fvecs.zst) to keep the JSON payload small. The desktop main process loads them into SQLite + a local HNSW index (or sqlite-vss) on first use.
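
Before loading a pulled snapshot, the desktop could sanity-check it against the limits stated in this contract (≤ 200 active rows, embedding dimension 384, 50,000 chunks per corpus). The sketch below assumes a hypothetical `findSnapshotProblems` helper and models only the fields it inspects; the real client types may differ.

```typescript
// Hypothetical client-side types: only the fields checked below are modeled.
interface RagBundle {
  corpusId: string;
  namespace: string;
  embeddingDim: number;
  chunks: unknown[];
  vectorsBlobUri: string;
}

interface AiSnapshot {
  cursor: string;
  capabilities: unknown[];
  promptVersions: unknown[];
  ragBundles: RagBundle[];
}

const EDGE_EMBEDDING_DIM = 384;      // edge bundle dimension from §2
const MAX_CHUNKS_PER_CORPUS = 50000; // per-corpus cap from §2
const MAX_ACTIVE_ROWS = 200;         // row limit noted in the payload comments

// Returns a human-readable list of violations; an empty list means the
// snapshot respects the limits checked here.
function findSnapshotProblems(snapshot: AiSnapshot): string[] {
  const problems: string[] = [];
  if (snapshot.capabilities.length > MAX_ACTIVE_ROWS) {
    problems.push("capabilities exceeds 200 rows");
  }
  if (snapshot.promptVersions.length > MAX_ACTIVE_ROWS) {
    problems.push("promptVersions exceeds 200 rows");
  }
  for (const bundle of snapshot.ragBundles) {
    if (bundle.embeddingDim !== EDGE_EMBEDDING_DIM) {
      problems.push(`${bundle.corpusId}: unexpected embeddingDim ${bundle.embeddingDim}`);
    }
    if (bundle.chunks.length > MAX_CHUNKS_PER_CORPUS) {
      problems.push(`${bundle.corpusId}: more than 50000 chunks`);
    }
  }
  return problems;
}
```

Whether the desktop should refuse the snapshot or only log such problems is a design choice this contract does not specify.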

4. Push endpoints

POST /api/v1/sync/v1/push

The desktop pushes batches. AI-related batches contain:

{
  "ai-orchestrator": {
    "edgeInference": [
      {
        "requestId": "ifr_01H...",
        "capabilityKey": "message.draft",
        "tenantId": "tnt_01H...",
        "promptVersionId": "pmv_01H...",
        "inputHash": "sha256:...",
        "redactedInputHash": "sha256:...",
        "completedAt": "2026-05-12T01:30:00Z",
        "latencyMs": 2841,
        "status": "completed",
        "outputJson": { /* schema-validated locally; cloud re-validates */ },
        "provenance": {
          "promptVersionId": "pmv_01H...",
          "promptVersionNo": 3,
          "model": { "provider": "onnx-edge", "name": "phi-3-mini-4k-instruct", "version": "int4-2.4.1" },
          "tokens": { "input": 612, "output": 184 },
          "costMicros": 0,
          "local": true,
          "cacheHit": false,
          "safety": { "input": "pass", "output": "pass" }
        }
      }
    ],
    "hitlDecisions": [ /* decisions made offline against still-open gates */ ]
  }
}
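
Because pushes are idempotent on requestId, the desktop can keep a simple outbound queue keyed by that field and resend a batch safely after a failed sync. This is a sketch under assumptions: `EdgePushQueue` and its methods are hypothetical, the record type is trimmed to a few fields, and real persistence (for example in the local SQLite store) is left out.

```typescript
// Trimmed, hypothetical record type; the real payload carries more fields.
interface EdgeInferenceRecord {
  requestId: string;
  capabilityKey: string;
  completedAt: string;
}

interface PushBatch {
  "ai-orchestrator": {
    edgeInference: EdgeInferenceRecord[];
    hitlDecisions: unknown[];
  };
}

class EdgePushQueue {
  // Keyed by requestId so the same inference is never queued twice.
  private readonly pending = new Map<string, EdgeInferenceRecord>();

  // Returns false when the requestId is already queued.
  enqueue(record: EdgeInferenceRecord): boolean {
    if (this.pending.has(record.requestId)) return false;
    this.pending.set(record.requestId, record);
    return true;
  }

  // Builds a batch in insertion order without removing anything, so a
  // failed push can simply be retried with the same records.
  nextBatch(maxItems: number): PushBatch {
    const edgeInference = Array.from(this.pending.values()).slice(0, maxItems);
    return { "ai-orchestrator": { edgeInference, hitlDecisions: [] } };
  }

  // Drops records only once the cloud has confirmed them.
  acknowledge(requestIds: string[]): void {
    for (const id of requestIds) this.pending.delete(id);
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Keeping records until they are acknowledged trades a little local storage for the guarantee that a dropped connection never loses an audit row.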

The cloud:

  1. Dedupes on requestId.
  2. Re-validates the output against the pinned prompt's output_schema_json. Mismatch → rejects with MELMASTOON.AI.OUTPUT_INVALID; the desktop surfaces the artifact as degraded and asks for a re-draft online.
  3. Re-runs server-side moderation on the output. If moderation returns block, the artifact is replaced with a deterministic fallback and melmastoon.ai_orchestrator.moderation.flagged.v1 is emitted with side: 'output', source: 'edge_replay'.
  4. Persists the inference + provenance rows in the cloud (with local: true flag).
  5. Emits melmastoon.ai_orchestrator.inference.completed.v1 with local: true so analytics + downstream subscribers see the same envelope.
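
The five cloud-side steps above can be sketched as a single function with its collaborators injected. Everything here other than the error code and event names is an assumption for illustration: `processEdgeInference`, the `Deps` shape, and the in-memory map standing in for persistence are hypothetical.

```typescript
interface EdgeInference {
  requestId: string;
  outputJson: unknown;
}

type Outcome =
  | { kind: "duplicate"; output: unknown }
  | { kind: "rejected"; code: "MELMASTOON.AI.OUTPUT_INVALID" }
  | { kind: "persisted"; output: unknown; replacedByFallback: boolean };

// Hypothetical collaborators; a real service would back these with a
// database, a schema validator, a moderation client, and an event bus.
interface Deps {
  persisted: Map<string, unknown>;
  matchesSchema: (output: unknown) => boolean;
  moderate: (output: unknown) => "pass" | "block";
  fallback: () => unknown;
  emit: (eventName: string) => void;
}

function processEdgeInference(rec: EdgeInference, deps: Deps): Outcome {
  // 1. Dedupe on requestId: a replay returns the originally persisted output.
  if (deps.persisted.has(rec.requestId)) {
    return { kind: "duplicate", output: deps.persisted.get(rec.requestId) };
  }
  // 2. Re-validate against the pinned prompt's output schema.
  if (!deps.matchesSchema(rec.outputJson)) {
    return { kind: "rejected", code: "MELMASTOON.AI.OUTPUT_INVALID" };
  }
  // 3. Re-run moderation; a block swaps in the deterministic fallback.
  let output = rec.outputJson;
  let replacedByFallback = false;
  if (deps.moderate(output) === "block") {
    output = deps.fallback();
    replacedByFallback = true;
    deps.emit("melmastoon.ai_orchestrator.moderation.flagged.v1");
  }
  // 4. Persist, then 5. emit the completion event.
  deps.persisted.set(rec.requestId, output);
  deps.emit("melmastoon.ai_orchestrator.inference.completed.v1");
  return { kind: "persisted", output, replacedByFallback };
}
```

Rejected records are deliberately not stored in this sketch, so a corrected re-push under the same requestId would still be processed; whether the real service behaves that way is not stated in this contract.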

5. Conflict semantics

Read-only snapshots cannot conflict — the cloud is authoritative. Push-side conflicts:

| Conflict | Resolution |
| --- | --- |
| Edge inference duplicate requestId | Idempotent: return the original cloud-persisted result |
| Edge HITL decision against a gate the cloud has already auto-closed (timeout) | The cloud's auto-decision wins; the desktop's late decision is recorded as decision.outcome with superseded: true and not used to gate downstream effects |
| Edge inference produced by a deprecated prompt version (desktop bundle was stale) | Cloud accepts the audit row but flags provenance.notes: 'deprecated_prompt'; downstream subscribers may still consume |
| Edge embedding dimension mismatch (e.g., 768 sent for an edge corpus) | Reject with MELMASTOON.AI.OUTPUT_INVALID |
| Edge moderation passed but cloud post-moderation blocks | Cloud wins; artifact replaced with deterministic fallback; user sees a banner explaining "your offline draft contained content that didn't pass server moderation" |
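
The HITL row above combines two rules: decisions are idempotent on (gateId, reviewerUserId), and a decision arriving after the cloud auto-closed the gate is kept but marked superseded. A minimal sketch follows, assuming a hypothetical `recordDecision` function, a hypothetical gate `status` field, and an in-memory map in place of real storage.

```typescript
// Hypothetical shapes; field names other than gateId, reviewerUserId,
// and superseded are assumptions for illustration.
interface Gate {
  gateId: string;
  status: "open" | "auto_closed";
}

interface Decision {
  gateId: string;
  reviewerUserId: string;
  outcome: "approve" | "reject";
}

interface RecordedDecision extends Decision {
  superseded: boolean;
}

function recordDecision(
  gate: Gate,
  decision: Decision,
  log: Map<string, RecordedDecision>,
): RecordedDecision {
  // Idempotency key from the contract: (gateId, reviewerUserId).
  const key = `${decision.gateId}:${decision.reviewerUserId}`;
  const existing = log.get(key);
  if (existing !== undefined) return existing;

  // A late decision is still recorded for audit, but flagged so that it
  // is never used to gate downstream effects.
  const recorded: RecordedDecision = {
    ...decision,
    superseded: gate.status === "auto_closed",
  };
  log.set(key, recorded);
  return recorded;
}
```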

6. Audit-of-edge

Every edge inference must produce inference.completed.v1 on the next sync. To prove that no edge artifact is "lost", the cloud reconciler runs a daily job that compares, per tenant, the count of edge artifacts created locally (the desktop emits an hourly edge.inference.tally event) with the count of inference.completed.v1 events that have provenance.local = true. Drift > 1% pages the AI on-call.
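
The drift check reduces to a ratio of two counts. The sketch below is one possible reading: it assumes drift is measured relative to the locally tallied count, which this contract does not state explicitly, and the function names are hypothetical.

```typescript
const DRIFT_PAGE_THRESHOLD = 0.01; // "Drift > 1%" from the contract

// Relative difference between what desktops say they produced and what
// the cloud actually received. Assumption: the local tally is the
// denominator.
function driftRatio(talliedLocally: number, completedInCloud: number): number {
  if (talliedLocally === 0) {
    // Nothing tallied: any cloud-side event is unexplained, so treat it
    // as full drift; otherwise there is no drift at all.
    return completedInCloud === 0 ? 0 : 1;
  }
  return Math.abs(talliedLocally - completedInCloud) / talliedLocally;
}

function shouldPageOnCall(talliedLocally: number, completedInCloud: number): boolean {
  return driftRatio(talliedLocally, completedInCloud) > DRIFT_PAGE_THRESHOLD;
}
```

With 1,000 tallied artifacts, 995 completed events give 0.5% drift and no page, while 980 give 2% drift and a page.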

7. Snapshot size budget

| Bundle | Hard cap | Behavior on overflow |
| --- | --- | --- |
| Capabilities + prompts + manifest + AB + models + budget | 1 MB JSON | Hard fail; degrade caps before snapshot |
| Per-corpus RAG bundle (chunks + metadata, JSON) | 5 MB JSON | Truncate by importance score (manual tag in metadata.priority) |
| Per-corpus vectors blob (.fvecs.zst) | 80 MB | Truncate matching the chunk truncation; emit melmastoon.ai_orchestrator.edge_rag.truncated.v1 |
| Total snapshot | 100 MB | Reject; require ?aggregates=ai-orchestrator&namespaces=policies to scope |
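
The caps above can be checked together before a snapshot is served. The following sketch assumes 1 MB means 1024 × 1024 bytes, which this contract does not specify; `checkSnapshotBudget` and the violation labels are hypothetical names.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

interface SnapshotSizes {
  coreJsonBytes: number;             // capabilities + prompts + manifest + AB + models + budget
  ragJsonBytesByCorpus: number[];    // chunks + metadata JSON, one entry per corpus
  vectorBlobBytesByCorpus: number[]; // .fvecs.zst blob, one entry per corpus
}

type Violation =
  | "core_hard_fail"
  | "rag_json_truncate"
  | "vectors_truncate"
  | "total_reject";

function sum(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

// Lists every cap that is exceeded; the caller decides how to act on
// each label (fail, truncate, or reject) according to the table.
function checkSnapshotBudget(s: SnapshotSizes): Violation[] {
  const violations: Violation[] = [];
  if (s.coreJsonBytes > 1 * MB) violations.push("core_hard_fail");
  if (s.ragJsonBytesByCorpus.some((b) => b > 5 * MB)) violations.push("rag_json_truncate");
  if (s.vectorBlobBytesByCorpus.some((b) => b > 80 * MB)) violations.push("vectors_truncate");
  const total =
    s.coreJsonBytes + sum(s.ragJsonBytesByCorpus) + sum(s.vectorBlobBytesByCorpus);
  if (total > 100 * MB) violations.push("total_reject");
  return violations;
}
```

Note that two corpora can each stay under the 80 MB vectors cap and still push the total past 100 MB, which is the case the namespaces scoping parameter addresses.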

8. Refresh cadence

| Trigger | Action |
| --- | --- |
| Desktop session start | Pull full AI snapshot |
| melmastoon.ai_orchestrator.edge_model.manifest_updated.v1 consumed by sync-service | Push notification to online desktops; opportunistic re-pull on next idle |
| melmastoon.ai_orchestrator.prompt.version_published.v1 | Same as above |
| Tenant changes a policies or faq document | Re-ingest in cloud; re-bundle on next snapshot |
| User opens "Force AI sync" | Full re-pull |
| Daily idle window (03:00 local on the desktop) | Background re-pull if the snapshot is > 24 h old |
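
On the desktop, the triggers above reduce to a small decision function. This is a sketch under assumptions: the trigger and action labels and `decideRefresh` are hypothetical, and the tenant-document trigger is omitted because it only changes what the cloud bundles into the next snapshot.

```typescript
type Trigger =
  | "session_start"
  | "force_ai_sync"
  | "manifest_updated"
  | "prompt_version_published"
  | "daily_idle_window";

type RefreshAction =
  | "full_pull"
  | "repull_on_next_idle"
  | "background_repull"
  | "none";

const DAY_MS = 24 * 60 * 60 * 1000;

// snapshotAgeMs is the time since the last successful pull; it only
// matters for the daily idle window.
function decideRefresh(trigger: Trigger, snapshotAgeMs: number): RefreshAction {
  if (trigger === "session_start" || trigger === "force_ai_sync") {
    return "full_pull";
  }
  if (trigger === "manifest_updated" || trigger === "prompt_version_published") {
    return "repull_on_next_idle";
  }
  // Remaining case: daily idle window. Re-pull only when the snapshot
  // is more than 24 h old.
  return snapshotAgeMs > DAY_MS ? "background_repull" : "none";
}
```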

9. Security

  • The snapshot endpoint requires a device-bound JWT (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).
  • The signed EdgeModelManifest is verified by the desktop main process at every model load via the KMS public key embedded in the binary; a manifest that fails verification is not loaded.
  • The vectors blob URL is short-TTL (1 h) and tenant-scoped.
  • The desktop encrypts the local SQLite snapshot at rest with the device-binding key (Argon2id-derived from the device passphrase + Ed25519 device private key); see ADR-0003 §5.
  • The desktop never has a service-account JWT; only the device-bound subject token.

10. Backwards compatibility

Snapshot payloads carry schemaVersion: 1. Cloud emits the highest version supported by the requesting client (declared via X-Client-Version header). Removed fields require a major version bump and a 30-day overlap window.
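
Version selection can be sketched as picking the highest schema version both sides understand. How X-Client-Version maps to a maximum supported schemaVersion is not described here, so the sketch takes that maximum as an already-resolved input; `pickSchemaVersion` is a hypothetical name.

```typescript
// serverSupported: schema versions the cloud can still emit (including
// versions kept alive during an overlap window).
// clientMax: highest schemaVersion the requesting client understands,
// assumed to be derived elsewhere from the X-Client-Version header.
function pickSchemaVersion(
  serverSupported: number[],
  clientMax: number,
): number | undefined {
  const usable = serverSupported.filter((v) => v <= clientMax);
  // undefined signals that no common version exists, for example a
  // client older than every version the cloud still serves.
  return usable.length > 0 ? Math.max(...usable) : undefined;
}
```

For instance, a cloud serving versions 1 to 3 answers a client that supports up to 2 with version 2.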

11. Test coverage

test/integration/sync-snapshot.spec.ts ensures:

  • A new tenant gets a non-empty capabilities list and an empty RAG bundle.
  • Promoting a prompt invalidates the snapshot cache for affected tenants.
  • An edge inference push that omits provenance is rejected.
  • A duplicate requestId is idempotent.
  • An out-of-range since cursor returns MELMASTOON.SYNC.CURSOR_OUT_OF_RANGE and forces a full pull.
  • A push for a different tenant than the device-bound JWT returns MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE.