AI_INTEGRATION — bff-backoffice-service

Sibling: API_CONTRACTS · APPLICATION_LOGIC · DOMAIN_MODEL · SECURITY_MODEL

Cross-cutting: 08 AI Architecture · ADR-0003 §6 AI inference placement

1. Posture

This BFF makes no direct LLM calls and runs no on-device inference. It is an orchestration layer between the Electron desktop's AI surfaces (suggestion inbox, anomaly badges, "ask Ghasi" prompt) and two upstream AI runtimes:

Cloud AI — ai-orchestrator-service for cloud-hosted models (planning, summarization, what-if forecasting).
Edge AI — ONNX Runtime Node inside the desktop's Electron main process, for housekeeping reorder, anomaly heuristics, demand smoothing, image quality scoring (per ADR-0003 §6).

The BFF's role is to: list AI suggestions from the cloud orchestrator, record operator decisions with full audit, render AI provenance to the operator UI, and emit decision telemetry. Edge inference results bypass the BFF — they are written to the desktop's local outbox and replayed via the standard sync stream.

2. Capabilities surfaced (cloud)

Capability	Trigger	Upstream call	Decision UX
Overbooking warning	Inventory + reservation drift	`ai-orchestrator-service.fetchSuggestions`	Operator can override; decision logged
Rate-change suggestion	Pricing engine + occupancy curve	same	Decision logged + reason captured
Housekeeping reorder	Arrival pattern + housekeeping queue	same	One-click apply; modified delta possible
Maintenance priority	Work order + room arrival overlap	same	Operator confirms or overrides
Guest special handling	Guest profile + reservation	same	Suggestion only; no autonomous action
Staffing	Forecast + roster	same	Advisory; logs decision
Audit anomaly	Operator activity + comparison	same	High-severity; full audit

All suggestions are advisory; nothing autonomously mutates a domain aggregate. Acceptance triggers a normal mutation proxy via /reservations/*, /housekeeping/*, etc., still owned by the operator.

3. Capabilities surfaced (edge)

Edge inference is not routed through this BFF. The desktop main process invokes ONNX Runtime Node directly. The BFF sees the result only when the desktop subsequently:

Fires a normal mutation proxy whose decision included edge AI input (e.g., an accepted housekeeping reorder), or
Replays the local ai.inference.local.completed.v1 event on next sync.

The BFF therefore renders edge-AI provenance via the same provenance envelope (with modelClass: 'edge') but plays no orchestration role.
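As a rough illustration of that shared envelope, the sketch below types the provenance fields listed under Compliance and derives a badge label from them. The names `AiProvenance` and `provenanceBadge`, the `'cloud'` value of `modelClass`, the label wording, and the sample values are all assumptions made for the example, not part of the documented contract.

```typescript
// Hypothetical shape: field names follow the Compliance section, the type name is illustrative.
type ModelClass = 'cloud' | 'edge'; // 'edge' appears in this document; 'cloud' is assumed

interface AiProvenance {
  model: string;
  modelVersion: string;
  promptVersion: string;
  modelClass: ModelClass;
  signatureFingerprint: string;
}

// Builds the short text for the operator-facing badge; the same code path
// serves cloud and edge suggestions, only the location wording differs.
function provenanceBadge(p: AiProvenance): string {
  const where = p.modelClass === 'edge' ? 'on-device' : 'cloud';
  return `Suggested by Ghasi AI (${p.model} ${p.modelVersion}, ${where})`;
}

// Placeholder values, invented for the example.
const edgeExample: AiProvenance = {
  model: 'housekeeping-reorder',
  modelVersion: '1.4.0',
  promptVersion: 'n/a',
  modelClass: 'edge',
  signatureFingerprint: 'sha256:example',
};
console.log(provenanceBadge(edgeExample));
```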

4. Calling pattern (cloud)

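class FetchAISuggestionsUseCase {
  // Collaborators are injected; the type names used here are illustrative.
  constructor(
    private readonly cache: SuggestionCache,
    private readonly aiOrchestrator: AiOrchestratorClient,
  ) {}

  async execute(ctx: SessionContext, propertyId: PropertyId, filter: AiFilter): Promise<AiSuggestionListVm> {
    // One cache entry per (tenant, property); the filter is not part of the key.
    const cacheKey = `ai:inbox:${ctx.tenantId}:${propertyId}`;

    // Serve from cache unless the operator explicitly asked for a refresh.
    const cached = await this.cache.read(cacheKey);
    if (cached && !filter.bypassCache) return cached;

    // Only non-PII identifiers and filter fields are sent upstream.
    const upstream = await this.aiOrchestrator.fetchSuggestions({
      tenantId: ctx.tenantId,
      propertyId,
      operatorRole: ctx.session.primaryRole(),
      categoryFilter: filter.category,
      limit: filter.limit ?? 20,
    });

    // Compose the view model and cache it for 60 s.
    const vm = composeAiInboxVm(upstream);
    await this.cache.writeWithTtl(cacheKey, vm, 60);
    return vm;
  }
}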

The BFF does not mint prompts, choose models, or post-process model output. Those concerns live in ai-orchestrator-service.

5. Telemetry annotations

Every event published by this BFF for an action that involved AI carries an aiInfluence envelope:

"aiInfluence": {
  "suggestionsViewed": ["sg_..."],
  "suggestionsAccepted": ["sg_..."],
  "suggestionsRejected": [],
  "edgeInferenceUsed": false
}

This lets downstream analytics compare AI-influenced decisions with operator-only decisions and measure uplift.
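One way to assemble the envelope before publishing an event is sketched below. The interface mirrors the JSON above; the helper name `buildAiInfluence` and the rule that a decided suggestion always counts as viewed are assumptions for illustration.

```typescript
// Mirrors the aiInfluence JSON shown above.
interface AiInfluence {
  suggestionsViewed: string[];
  suggestionsAccepted: string[];
  suggestionsRejected: string[];
  edgeInferenceUsed: boolean;
}

// Hypothetical helper. Assumes a suggestion that was accepted or rejected
// was necessarily viewed, so it is merged into suggestionsViewed.
function buildAiInfluence(
  viewed: string[],
  accepted: string[],
  rejected: string[],
  edgeInferenceUsed: boolean,
): AiInfluence {
  const seen: string[] = [];
  viewed.concat(accepted, rejected).forEach((id) => {
    if (seen.indexOf(id) === -1) seen.push(id); // de-duplicate, keep first-seen order
  });
  return {
    suggestionsViewed: seen,
    suggestionsAccepted: accepted.slice(),
    suggestionsRejected: rejected.slice(),
    edgeInferenceUsed,
  };
}
```

An action that involved no AI would presumably omit the envelope altogether rather than send empty arrays, but the source does not say.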

6. Failure handling

AI orchestrator down → AI surfaces hidden in dashboard + workbench; operator sees no banner; aiAvailable: false on bootstrap response.
AI orchestrator slow → 800 ms deadline; the response is composed without fresh AI data, and any cached suggestions are served with a cached-stale tag.
Edge inference failure → owned by desktop; never surfaces here.
Decision recording failure → return 503 so the client retries; the idempotency key absorbs duplicates.
Notify-orchestrator-of-decision failure → recorded locally; orchestrator reconciles via inbox sync.

The default UX rule: silent degradation. AI is advisory; missing AI never blocks operator work.
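The "slow" and "down" rows can be sketched as a deadline race with a stale-cache fallback. This is a minimal sketch under stated assumptions: `raceDeadline`, `fetchInbox`, and the `InboxResult` shape are hypothetical names, and hiding AI surfaces when a timeout finds nothing cached is a guess at the intended behaviour.

```typescript
type Outcome<T> = { timedOut: true } | { timedOut: false; value: T };

interface InboxResult<T> {
  value: T | null;
  stale: boolean;       // true when cached data is served after a missed deadline
  aiAvailable: boolean; // false tells the UI to hide AI surfaces
}

// Resolves with the upstream value, or with { timedOut: true } once the deadline passes.
function raceDeadline<T>(work: Promise<T>, ms: number): Promise<Outcome<T>> {
  return new Promise<Outcome<T>>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timedOut: true }), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve({ timedOut: false, value }); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function fetchInbox<T>(
  fetchUpstream: () => Promise<T>,
  readCached: () => Promise<T | null>,
  deadlineMs: number = 800,
): Promise<InboxResult<T>> {
  try {
    const outcome = await raceDeadline(fetchUpstream(), deadlineMs);
    if (!outcome.timedOut) {
      return { value: outcome.value, stale: false, aiAvailable: true };
    }
    // Slow upstream: serve whatever is cached and tag it stale.
    const cached = await readCached();
    return { value: cached, stale: cached !== null, aiAvailable: cached !== null };
  } catch {
    // Upstream down: degrade silently instead of surfacing an error to the operator.
    return { value: null, stale: false, aiAvailable: false };
  }
}
```

Neither path throws, which keeps the operator's non-AI workflow independent of orchestrator health.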

7. PII handling

The BFF never sends raw PII to the orchestrator. The fetch request body carries:

tenantId, propertyId, operatorRole
category, severity, limit
lastSeenAt (for delta polling)
never operator name / email / phone; never guest names; never folio details

The orchestrator's prompt assembly is its concern; it has its own PII rules per 08 AI Architecture §5.

Decision payloads include notes (free-text). We truncate at 500 chars before persisting and at 200 chars before exporting to BigQuery. We never send notes back to the orchestrator (orchestrator does not need them for next-suggestion ranking).
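The two rules above (an allow-listed request body and truncated notes) might look like the sketch below. The names `FetchSuggestionsBody`, `toFetchBody`, and `truncateNotes` are hypothetical, and counting "chars" as UTF-16 code units is an assumption; the source does not say how characters are counted.

```typescript
// Field list follows the PII section; the interface name is illustrative.
interface FetchSuggestionsBody {
  tenantId: string;
  propertyId: string;
  operatorRole: string;
  category?: string;
  severity?: string;
  limit?: number;
  lastSeenAt?: string;
}

// Copies only allow-listed fields, so extra properties on the input
// (operator email, guest name, folio data) are never forwarded.
function toFetchBody(input: FetchSuggestionsBody): FetchSuggestionsBody {
  return {
    tenantId: input.tenantId,
    propertyId: input.propertyId,
    operatorRole: input.operatorRole,
    category: input.category,
    severity: input.severity,
    limit: input.limit,
    lastSeenAt: input.lastSeenAt,
  };
}

const NOTES_PERSIST_MAX = 500; // before persisting
const NOTES_EXPORT_MAX = 200;  // before exporting to BigQuery

// Truncates to at most `max` UTF-16 code units without splitting a surrogate pair.
function truncateNotes(text: string, max: number): string {
  if (text.length <= max) return text;
  let cut = text.slice(0, max);
  const last = cut.charCodeAt(cut.length - 1);
  if (last >= 0xd800 && last <= 0xdbff) cut = cut.slice(0, -1); // drop a dangling high surrogate
  return cut;
}
```

Building the body field by field, rather than spreading the input object, is what keeps an unexpected property from leaking upstream.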

8. Caching AI outputs

Cache	TTL	Rationale
AI inbox list per (tenant, property)	60 s	Suggestions don't change every second
Per-suggestion detail	5 min	Detail is heavier; rarely re-fetched
Provenance digest per (model, modelVersion)	24 h	Used for UI rendering; rarely changes

Bypass with ?bypassCache=true on GET /ai/suggestions (operator-initiated refresh).
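To make the TTL semantics concrete, here is a small in-memory stand-in for the cache with an injectable clock. The production cache is presumably a shared store, so `TtlCache` and the constant `AI_CACHE_TTL_SEC` are hypothetical; only the inbox key format and the TTL values come from this document.

```typescript
// TTLs from the table above, in seconds.
const AI_CACHE_TTL_SEC = {
  inboxList: 60,
  suggestionDetail: 300,
  provenanceDigest: 86400,
};

// Key format taken from the calling-pattern code.
function inboxKey(tenantId: string, propertyId: string): string {
  return `ai:inbox:${tenantId}:${propertyId}`;
}

// Illustrative in-memory cache; entries expire once their TTL has elapsed.
class TtlCache<V> {
  private entries: { [key: string]: { value: V; expiresAt: number } } = {};
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  read(key: string): V | undefined {
    const hit = this.entries[key];
    if (!hit || hit.expiresAt <= this.now()) return undefined;
    return hit.value;
  }

  writeWithTtl(key: string, value: V, ttlSec: number): void {
    this.entries[key] = { value, expiresAt: this.now() + ttlSec * 1000 };
  }
}
```

With `?bypassCache=true` the use case ignores the cached entry and overwrites it with the fresh result, so later requests benefit from the operator's refresh.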

9. Feature flags

Flag	Default	Purpose
`ai.surfaces.enabled[<tenantId>]`	true	Per-tenant kill switch
`ai.suggestion.categories.enabled`	per-tenant list	Limit visible categories
`ai.cache.ttlSec`	60	TTL override
`ai.transport.preferred`	`sse`	Push vs polling for AI inbox updates
`ai.decision.requireMfa[<category>]`	false	Force step-up on certain decisions

Flags loaded from bff-backoffice-flags Memorystore key with 30 s refresh.
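A sketch of how a loaded flag snapshot could be consulted, applying the defaults from the table when a key is absent. The snapshot shape and the function names are assumptions; the source does not describe how the flags are stored under the Memorystore key.

```typescript
// Hypothetical in-memory view of the flags after a refresh.
interface AiFlagSnapshot {
  surfacesEnabled: { [tenantId: string]: boolean | undefined };
  requireMfa: { [category: string]: boolean | undefined };
  cacheTtlSec?: number;
}

// Per-tenant kill switch; defaults to true when no entry exists for the tenant.
function isAiEnabled(flags: AiFlagSnapshot, tenantId: string): boolean {
  const value = flags.surfacesEnabled[tenantId];
  return value === undefined ? true : value;
}

// Step-up MFA per decision category; defaults to false.
function requiresStepUp(flags: AiFlagSnapshot, category: string): boolean {
  return flags.requireMfa[category] === true;
}

// Inbox cache TTL override; defaults to 60 s.
function inboxTtlSec(flags: AiFlagSnapshot): number {
  return flags.cacheTtlSec === undefined ? 60 : flags.cacheTtlSec;
}
```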

10. Compliance

Provenance recorded on every suggestion: model, modelVersion, promptVersion, modelClass, signatureFingerprint. Stored 7 years in ai_decision_log.
Operator can always override; no autonomous mutations.
Sharia-compliance flag passes through to orchestrator via complianceProfile; orchestrator filters suggestions accordingly.
Audit lake export retained for 7 years to serve regulatory queries.
AI usage disclosure in operator-facing UI (small "Suggested by Ghasi AI" badge with model + provenance link).

11. Performance targets

Metric	Target
`GET /ai/suggestions` p95 (cached)	< 50 ms
`GET /ai/suggestions` p95 (composed)	< 500 ms
`POST /ai/suggestions/{id}/decide` p95	< 200 ms (decision recorded; orchestrator notify async)
SSE `ai.new` push latency p95	< 200 ms from orchestrator emit

12. Sharia / regional compliance

For tenants flagged complianceProfile = sharia:

Pricing change suggestions filtered to remove interest-bearing payment plan recommendations.
Loyalty point suggestions filtered to remove gambling-style mechanics.
Image-quality suggestions skip review of any imagery flagged as religious-sensitive.
The filter is applied at the orchestrator. The BFF's enforcement is limited to passing complianceProfile and refusing to render suggestions when the orchestrator response lacks the complianceFiltered: true attestation.
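The attestation check could be as small as the guard below. The response shape and the name `renderableSuggestions` are hypothetical, and returning an empty list is only one reading of "refusing to render"; the BFF might equally hide the AI surfaces or log an alert.

```typescript
// Illustrative response shape; only complianceFiltered is named in this document.
interface OrchestratorResponse<S> {
  suggestions: S[];
  complianceFiltered?: boolean;
}

// Returns the suggestions that are safe to render for the tenant's profile.
function renderableSuggestions<S>(
  complianceProfile: string | undefined,
  response: OrchestratorResponse<S>,
): S[] {
  // For sharia tenants, a missing or false attestation means nothing is rendered.
  if (complianceProfile === 'sharia' && response.complianceFiltered !== true) {
    return [];
  }
  return response.suggestions;
}
```

The guard fails closed: an orchestrator version that forgets the attestation results in no suggestions, not unfiltered ones.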

13. Cross-links

services/ai-orchestrator-service/ — upstream AI runtime
docs/08-ai-architecture.md — platform AI architecture
ADR-0003 §6 — edge inference placement
SECURITY_MODEL — MFA gates on AI-driven decisions
API_CONTRACTS §13–14 — AI suggestion endpoints
