ai-orchestrator-service — Sync Contract
Companion to: DATA_MODEL.md · ADR-0003 Electron Offline-First · 08 AI Architecture §9
1. Why this service participates in sync
The Electron desktop must continue producing AI artifacts when the cloud is unreachable: anomaly flags, message drafts via Phi-3-mini, tutor answers, RAG over local policies, photo-quality scoring, and housekeeping route optimization. To do that the desktop needs a read-only snapshot of:
- The active prompt registry (prompt_versions rows in status='active', restricted to capabilities the tenant can run on the edge).
- The edge model manifest (the signed EdgeModelManifest published row).
- The active capability catalog (capabilities + fallback chains + HITL config).
- A per-tenant edge RAG bundle: a subset of embeddings_edge + rag_chunks for the tenant's policies and faq namespaces.
- The active model catalog (so the desktop knows model versions for provenance).
- The AB sticky assignment for this tenant (so a draft prompt versioned at 5% online stays sticky offline).
The desktop does not sync inference audit, provenance, or HITL data downward. Those are server-authoritative and stay in the cloud. Edge inference still produces audit records, which the desktop pushes upward on the next sync (see §6).
2. Aggregate sync policy table
| Aggregate | Direction | Conflict policy | Snapshot scope | Notes |
|---|---|---|---|---|
| Capability | cloud → desktop (read-only) | server_authoritative | All active; tenant-enabled subset filtered server-side | Replaced atomically per snapshot |
| Prompt / PromptVersion | cloud → desktop (read-only) | server_authoritative | All active versions for capabilities the desktop can run | Pinned to the AB assignment row for this tenant |
| Model | cloud → desktop (read-only) | server_authoritative | All available rows for provider='onnx-edge' plus a metadata row per cloud model used in provenance | |
| EdgeModelManifest | cloud → desktop (read-only) | server_authoritative | The single published row | Signature verified at every load |
| RAGCorpus (edge) | cloud → desktop (read-only) | server_authoritative | Per-tenant policies + faq namespaces | Limit to embedding_dim = 384 (edge bundle) |
| embeddings_edge + rag_chunks for that corpus | cloud → desktop (read-only) | server_authoritative | Per-tenant subset; capped at 50,000 chunks per corpus per tenant | Compressed bundle in the snapshot |
| BudgetCounter | cloud → desktop (read-only, throttled) | server_authoritative | Current period for tenant | Refreshed on every sync (≤ 5 min stale tolerated) |
| AB assignment | cloud → desktop (read-only) | server_authoritative | Per-tenant per-capability | |
| InferenceRequest | desktop → cloud (push) | append_only | Edge inference audit | One-shot push; idempotent on requestId |
| InferenceResult | desktop → cloud (push) | append_only | Same | Carries provenance with local: true |
| Provenance | desktop → cloud (push) | append_only | Same | Computed locally; cloud verifies + persists |
| HitlGate | (none on edge) | n/a | HITL is cloud-orchestrated; the desktop UI consumes them via the cloud API when online; offline HITL is not allowed for AI-drafted artifacts | |
| HitlDecision | desktop → cloud (push) | append_only | Decisions made offline against gates that were already open before disconnection are queued; cloud accepts on idempotent (gateId, reviewerUserId) | |
| EvalSuite / EvalRun | (none) | n/a | Eval runs are cloud-only | |
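
One way to make the direction column enforceable is a small lookup that a push handler consults before accepting a batch. The sketch below is illustrative only: the key names, the `SYNC_POLICY` constant, and `assertPushAllowed` are hypothetical and not taken from the service code; only the directions and conflict policies mirror the table above.

```typescript
// Hypothetical encoding of the policy table; names are illustrative.
type Direction = "pull" | "push" | "none";
type ConflictPolicy = "server_authoritative" | "append_only" | "n/a";

interface AggregatePolicy {
  direction: Direction;
  conflict: ConflictPolicy;
}

const SYNC_POLICY: Record<string, AggregatePolicy> = {
  Capability: { direction: "pull", conflict: "server_authoritative" },
  PromptVersion: { direction: "pull", conflict: "server_authoritative" },
  Model: { direction: "pull", conflict: "server_authoritative" },
  EdgeModelManifest: { direction: "pull", conflict: "server_authoritative" },
  RAGCorpus: { direction: "pull", conflict: "server_authoritative" },
  BudgetCounter: { direction: "pull", conflict: "server_authoritative" },
  AbAssignment: { direction: "pull", conflict: "server_authoritative" },
  InferenceRequest: { direction: "push", conflict: "append_only" },
  InferenceResult: { direction: "push", conflict: "append_only" },
  Provenance: { direction: "push", conflict: "append_only" },
  HitlGate: { direction: "none", conflict: "n/a" },
  HitlDecision: { direction: "push", conflict: "append_only" },
  EvalRun: { direction: "none", conflict: "n/a" },
};

// Throws when a desktop tries to push an aggregate that is read-only or unsynced.
function assertPushAllowed(aggregate: string): void {
  const policy = SYNC_POLICY[aggregate];
  if (!policy || policy.direction !== "push") {
    throw new Error(`push not allowed for aggregate: ${aggregate}`);
  }
}
```

Keeping the table in one structure means an unknown aggregate is rejected by default rather than silently accepted.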
3. Snapshot endpoints
The desktop pulls a single snapshot per session start (and on demand from the in-app "Force AI sync" action) via:
GET /api/v1/sync/v1/pull?since=<cursor>&aggregates=ai-orchestrator
handled by sync-service which delegates to ai-orchestrator-service for the AI portion. The response payload shape:
{
  "ai-orchestrator": {
    "cursor": "ai_2026-05-12T01:31:09.412Z_evt_01H...",
    "capabilities": [ /* active rows; ≤ 200 */ ],
    "promptVersions": [ /* active rows pinned to AB; ≤ 200 */ ],
    "abAssignments": [ /* per-capability for this tenant */ ],
    "models": [ /* edge models + cloud-model metadata used in provenance */ ],
    "edgeModelManifest": { /* the signed manifest */ },
    "ragBundles": [
      {
        "corpusId": "rag_01H...",
        "namespace": "policies",
        "embeddingDim": 384,
        "chunks": [ /* { chunkId, text, metadata, sourceUri } */ ],
        "vectorsBlobUri": "https://desktop-snapshots.../tenants/.../policies.fvecs.zst"
      }
    ],
    "budgetSnapshot": { /* per-scope counters */ }
  }
}
vectorsBlobUri is a per-snapshot signed GCS URL; vectors travel as a compressed binary blob (fvecs.zst) to keep the JSON payload small. The desktop main process loads them into SQLite + a local HNSW index (or sqlite-vss) on first use.
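
As a rough illustration of the load step, the sketch below parses an already-decompressed vectors buffer and checks every record against the bundle's embeddingDim. It assumes the common fvecs layout, where each record is a little-endian int32 dimension followed by that many float32 values; whether the service's blob follows exactly this layout is an assumption. zstd decompression and the SQLite/HNSW insert are out of scope, and `parseFvecs` is a hypothetical helper name.

```typescript
// Hypothetical helper: parse a decompressed .fvecs buffer into vectors.
// Assumed record layout: int32 dim (little-endian), then dim float32 values.
function parseFvecs(buf: ArrayBuffer, expectedDim: number): Float32Array[] {
  const view = new DataView(buf);
  const vectors: Float32Array[] = [];
  let offset = 0;
  while (offset < buf.byteLength) {
    if (offset + 4 > buf.byteLength) {
      throw new Error("truncated record header");
    }
    const dim = view.getInt32(offset, true);
    offset += 4;
    // Reject anything that does not match the bundle's declared dimension.
    if (dim !== expectedDim) {
      throw new Error(`dimension mismatch: got ${dim}, expected ${expectedDim}`);
    }
    if (offset + dim * 4 > buf.byteLength) {
      throw new Error("truncated vector body");
    }
    const vec = new Float32Array(dim);
    for (let i = 0; i < dim; i++) {
      vec[i] = view.getFloat32(offset + i * 4, true);
    }
    offset += dim * 4;
    vectors.push(vec);
  }
  return vectors;
}
```

Failing fast on a dimension mismatch at load time keeps a corrupt or mis-scoped blob out of the local index.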
4. Push endpoints
POST /api/v1/sync/v1/push
The desktop pushes batches. AI-related batches contain:
{
  "ai-orchestrator": {
    "edgeInference": [
      {
        "requestId": "ifr_01H...",
        "capabilityKey": "message.draft",
        "tenantId": "tnt_01H...",
        "promptVersionId": "pmv_01H...",
        "inputHash": "sha256:...",
        "redactedInputHash": "sha256:...",
        "completedAt": "2026-05-12T01:30:00Z",
        "latencyMs": 2841,
        "status": "completed",
        "outputJson": { /* schema-validated locally; cloud re-validates */ },
        "provenance": {
          "promptVersionId": "pmv_01H...",
          "promptVersionNo": 3,
          "model": { "provider": "onnx-edge", "name": "phi-3-mini-4k-instruct", "version": "int4-2.4.1" },
          "tokens": { "input": 612, "output": 184 },
          "costMicros": 0,
          "local": true,
          "cacheHit": false,
          "safety": { "input": "pass", "output": "pass" }
        }
      }
    ],
    "hitlDecisions": [ /* decisions made offline against still-open gates */ ]
  }
}
The cloud:
- Dedupes on requestId.
- Re-validates the output against the pinned prompt's output_schema_json. Mismatch → rejects with MELMASTOON.AI.OUTPUT_INVALID; the desktop surfaces the artifact as degraded and asks for a re-draft online.
- Re-runs server-side moderation on the output. If block, the artifact is replaced with deterministic fallback and melmastoon.ai_orchestrator.moderation.flagged.v1 is emitted with side: 'output', source: 'edge_replay'.
- Persists the inference + provenance rows in the cloud (with local: true flag).
- Emits melmastoon.ai_orchestrator.inference.completed.v1 with local: true so analytics + downstream subscribers see the same envelope.
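
The cloud-side steps above can be sketched as a single function with the validator, moderator, and store injected. This is a minimal in-memory illustration, not the service's implementation: `ingestEdgeInference`, `IngestDeps`, and the outcome shapes are hypothetical, and recording blocked artifacts in the same store (so retries stay idempotent) is an assumption the contract does not state.

```typescript
// Hypothetical shapes; only requestId, outputJson, the error code and the
// event names come from the contract.
interface EdgeInference {
  requestId: string;
  outputJson: unknown;
}

type StoredOutcome =
  | { kind: "fallback"; event: string }
  | { kind: "persisted"; event: string };

type IngestOutcome =
  | StoredOutcome
  | { kind: "duplicate"; original: StoredOutcome }
  | { kind: "rejected"; code: string };

interface IngestDeps {
  store: Map<string, StoredOutcome>;               // stand-in for the cloud tables
  matchesSchema: (output: unknown) => boolean;     // stand-in for output_schema_json validation
  moderate: (output: unknown) => "pass" | "block"; // stand-in for server-side moderation
}

function ingestEdgeInference(item: EdgeInference, deps: IngestDeps): IngestOutcome {
  // 1. Dedupe on requestId and hand back the original result.
  const original = deps.store.get(item.requestId);
  if (original) return { kind: "duplicate", original };

  // 2. Re-validate the output against the pinned prompt's schema.
  if (!deps.matchesSchema(item.outputJson)) {
    return { kind: "rejected", code: "MELMASTOON.AI.OUTPUT_INVALID" };
  }

  // 3. Re-run moderation; a block swaps in the deterministic fallback.
  const outcome: StoredOutcome =
    deps.moderate(item.outputJson) === "block"
      ? { kind: "fallback", event: "melmastoon.ai_orchestrator.moderation.flagged.v1" }
      : { kind: "persisted", event: "melmastoon.ai_orchestrator.inference.completed.v1" };

  // 4. Persist (assumption: blocked artifacts are recorded too).
  deps.store.set(item.requestId, outcome);
  return outcome;
}
```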
5. Conflict semantics
Read-only snapshots cannot conflict — the cloud is authoritative. Push-side conflicts:
| Conflict | Resolution |
|---|---|
| Edge inference duplicate requestId | Idempotent — return the original cloud-persisted result |
| Edge HITL decision against a gate the cloud has already auto-closed (timeout) | The cloud's auto-decision wins; the desktop's late decision is recorded as decision.outcome with superseded: true and not used to gate downstream effects |
| Edge inference produced by a deprecated prompt version (desktop bundle was stale) | Cloud accepts the audit row but flags provenance.notes: 'deprecated_prompt'; downstream subscribers may still consume |
| Edge embedding dimension mismatch (e.g., 768 sent for an edge corpus) | Reject with MELMASTOON.AI.OUTPUT_INVALID |
| Edge moderation passed but cloud post-moderation blocks | Cloud wins; artifact replaced with deterministic fallback; user sees a banner explaining "your offline draft contained content that didn't pass server moderation" |
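
The late-HITL-decision row can be illustrated with a small reconciliation function. This is a sketch under assumptions: the `Gate` and `OfflineDecision` shapes and the `recordOfflineDecision` name are hypothetical, and the in-memory map stands in for whatever store enforces idempotency on (gateId, reviewerUserId).

```typescript
// Hypothetical shapes for illustration only.
interface Gate {
  gateId: string;
  status: "open" | "auto_closed";
}

interface OfflineDecision {
  gateId: string;
  reviewerUserId: string;
  outcome: string;
}

interface RecordedDecision extends OfflineDecision {
  superseded: boolean;
}

function recordOfflineDecision(
  gate: Gate,
  decision: OfflineDecision,
  recorded: Map<string, RecordedDecision>,
): RecordedDecision {
  // Idempotent on (gateId, reviewerUserId): a replay returns the first record.
  const key = `${decision.gateId}:${decision.reviewerUserId}`;
  const prior = recorded.get(key);
  if (prior) return prior;

  // A decision arriving after the cloud auto-closed the gate is kept for
  // audit but marked superseded so it does not gate downstream effects.
  const entry: RecordedDecision = {
    ...decision,
    superseded: gate.status === "auto_closed",
  };
  recorded.set(key, entry);
  return entry;
}
```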
6. Audit-of-edge
Every edge inference must produce inference.completed.v1 on next sync. To prove no edge artifact is "lost", the cloud reconciler runs a daily job comparing the count of edge artifacts created locally (the desktop emits an hourly edge.inference.tally event) with the count of inference.completed.v1 events with provenance.local = true for that tenant. Drift > 1% pages the AI on-call.
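
The drift check can be expressed as a short calculation. The contract does not say which count is the denominator, so the sketch below assumes drift is measured relative to the desktop's local tally; `edgeAuditDrift` and `shouldPageOnCall` are hypothetical names.

```typescript
// Assumption: drift = |local - cloud| / local, with the local tally as baseline.
function edgeAuditDrift(localTally: number, cloudCompleted: number): number {
  if (localTally === 0) {
    // No local artifacts: any cloud rows count as full drift.
    return cloudCompleted === 0 ? 0 : 1;
  }
  return Math.abs(localTally - cloudCompleted) / localTally;
}

// Pages only when drift strictly exceeds the threshold (1% by default).
function shouldPageOnCall(
  localTally: number,
  cloudCompleted: number,
  threshold = 0.01,
): boolean {
  return edgeAuditDrift(localTally, cloudCompleted) > threshold;
}
```

For example, a tenant with 1,000 locally tallied inferences and 980 cloud events has 2% drift and would page, while 995 cloud events (0.5%) would not.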
7. Snapshot size budget
| Bundle | Hard cap | Behavior on overflow |
|---|---|---|
| Capabilities + prompts + manifest + AB + models + budget | 1 MB JSON | Hard fail; degrade caps before snapshot |
| Per-corpus RAG bundle (chunks + metadata, JSON) | 5 MB JSON | Truncate by importance score (manual tag in metadata.priority) |
| Per-corpus vectors blob (.fvecs.zst) | 80 MB | Truncate matching the chunk truncation; emit melmastoon.ai_orchestrator.edge_rag.truncated.v1 |
| Total snapshot | 100 MB | Reject; require ?aggregates=ai-orchestrator&namespaces=policies to scope |
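
Truncation by importance score could look like the sketch below. It assumes a higher metadata.priority means more important and that untagged chunks rank lowest; both are assumptions, as is the `truncateByPriority` name. The default size estimate uses the JSON string length, which only approximates the byte size for non-ASCII text.

```typescript
// Hypothetical chunk shape; only chunkId, text and metadata.priority are named in the contract.
interface RagChunk {
  chunkId: string;
  text: string;
  metadata: { priority?: number };
}

function truncateByPriority(
  chunks: RagChunk[],
  maxBytes: number,
  sizeOf: (c: RagChunk) => number = (c) => JSON.stringify(c).length,
): { kept: RagChunk[]; droppedIds: string[] } {
  // Highest priority first; the sort is stable, so ties keep their input order.
  const ranked = [...chunks].sort(
    (a, b) => (b.metadata.priority ?? 0) - (a.metadata.priority ?? 0),
  );
  const kept: RagChunk[] = [];
  const droppedIds: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    const size = sizeOf(chunk);
    if (used + size <= maxBytes) {
      kept.push(chunk);
      used += size;
    } else {
      // Dropped ids let the vectors blob be truncated to match.
      droppedIds.push(chunk.chunkId);
    }
  }
  return { kept, droppedIds };
}
```

Returning the dropped chunk ids alongside the kept chunks makes it straightforward to remove the same entries from the vectors blob.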
8. Refresh cadence
| Trigger | Action |
|---|---|
| Desktop session start | Pull full AI snapshot |
| melmastoon.ai_orchestrator.edge_model.manifest_updated.v1 consumed by sync-service | Push notification to online desktops; opportunistic re-pull on next idle |
| melmastoon.ai_orchestrator.prompt.version_published.v1 | Same as above |
| Tenant changes a policies or faq document | Re-ingest in cloud; re-bundle on next snapshot |
| User opens "Force AI sync" | Full re-pull |
| Daily idle window (03:00 local on the desktop) | Background re-pull if the snapshot is > 24 h old |
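
On the desktop side, the trigger table reduces to a small decision function. The sketch below is illustrative: the trigger and action names are hypothetical labels for the rows above, and the tenant-document row is omitted because it requires no desktop action beyond the next snapshot.

```typescript
// Hypothetical labels for the desktop-visible triggers in the table.
type RefreshTrigger =
  | "session_start"
  | "force_ai_sync"
  | "manifest_updated"
  | "prompt_version_published"
  | "daily_idle";

type RefreshAction = "full_pull" | "repull_on_next_idle" | "background_repull" | "skip";

const MAX_SNAPSHOT_AGE_MS = 24 * 60 * 60 * 1000; // 24 h

function planRefresh(trigger: RefreshTrigger, snapshotAgeMs: number): RefreshAction {
  switch (trigger) {
    case "session_start":
    case "force_ai_sync":
      return "full_pull";
    case "manifest_updated":
    case "prompt_version_published":
      return "repull_on_next_idle";
    case "daily_idle":
      // Only re-pull in the idle window when the snapshot is older than 24 h.
      return snapshotAgeMs > MAX_SNAPSHOT_AGE_MS ? "background_repull" : "skip";
  }
}
```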
9. Security
- The snapshot endpoint requires a device-bound JWT (MELMASTOON.IDENTITY.DEVICE_NOT_BOUND otherwise).
- The signed EdgeModelManifest is verified by the desktop main process at every model load via the KMS public key embedded in the binary; a manifest that fails verification is not loaded.
- The vectors blob URL is short-TTL (1 h) and tenant-scoped.
- The desktop encrypts the local SQLite snapshot at rest with the device-binding key (Argon2id-derived from the device passphrase + Ed25519 device private key); see ADR-0003 §5.
- The desktop never has a service-account JWT; only the device-bound subject token.
10. Backwards compatibility
Snapshot payloads carry schemaVersion: 1. Cloud emits the highest version supported by the requesting client (declared via X-Client-Version header). Removed fields require a major version bump and a 30-day overlap window.
11. Test coverage
test/integration/sync-snapshot.spec.ts ensures:
- A new tenant gets a non-empty capabilities + empty RAG bundle.
- Promoting a prompt invalidates the snapshot cache for affected tenants.
- An edge inference push that omits provenance is rejected.
- A duplicate requestId is idempotent.
- An out-of-range since cursor returns MELMASTOON.SYNC.CURSOR_OUT_OF_RANGE and forces a full pull.
- A push for a different tenant than the device-bound JWT returns MELMASTOON.GENERAL.CROSS_TENANT_REFERENCE.