AI_INTEGRATION — bff-consumer-service

Sibling: APPLICATION_LOGIC · API_CONTRACTS · SECURITY_MODEL

Cross-cutting: 08 AI Architecture

Posture. This BFF performs no direct AI calls in Phase 1. AI personalisation for the consumer surface (semantic ranking, LLM-driven query rewrite, cluster-bias detection) is performed inside search-aggregation-service and ai-orchestrator-service, and the results are projected into the cross-tenant index this BFF reads. We pass through the AIProvenance we receive; we never invent AI artifacts.

1. Where AI lives in this surface

Capability	Owner	This BFF's role
Semantic ranking of search results (pgvector + LLM-judged relevance)	`search-aggregation-service`	We send `(query, locale, geo)` and receive a ranked list with `aiProvenance` per row when ranking is AI-influenced
Query rewrite ("hotel near big mosque" → geo + amenity expansion)	`ai-orchestrator-service` (called by `search-aggregation-service`)	We forward the rewritten interpretation back to the client as a `queryRewriteHint` for transparency
Auto-translation of property descriptions	`ai-orchestrator-service` (translation invoked by `theme-config-service` / `property-service`)	We receive translated strings already; we surface the `translationProvenance` field but never re-translate
Bot detection (LLM-augmented in Phase 2)	This BFF (Phase 1 = rules + reCAPTCHA; Phase 2 = ai-orchestrator routing)	Phase 1: rules + reCAPTCHA only, no LLM call
Personalised wishlist suggestions	Deferred to Phase 2	n/a Phase 1

2. Provenance pass-through

Every AI-influenced response field carries an aiProvenance block as defined in 08 AI Architecture. The BFF's responsibilities are:

Preserve aiProvenance on every nested object that came from an AI-touching upstream.
Strip aiProvenance only when explicitly required by privacy rules (none currently apply for the consumer surface).
Never synthesise an aiProvenance block locally.

interface AIProvenance {
  modelId: string;             // e.g. 'vertex/gemini-1.5-flash'
  promptTemplateId: string;    // e.g. 'pmt_search_rerank_v3'
  invokedAt: ISODateTime;
  decisionId?: DecisionId;     // dec_<ulid> when HITL gated
  routedThrough: 'cloud' | 'edge';
  costMicroUsd?: number;
  safetyVerdict?: 'allow' | 'review' | 'block';
}

A ListingCardVM may carry aiProvenance on the name.localized.<locale> (translation), the rank position (semantic re-rank), or the description field (auto-summary). The BFF copies these fields verbatim from upstream.
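
The three pass-through rules can be sketched as a single mapping step. This is a minimal illustration, not the service's actual mapper: the Provenanced<T> wrapper, the passThrough function, and the stripForPrivacy flag are hypothetical names, and ISODateTime and DecisionId are assumed here to be string aliases.

```typescript
// Assumed aliases; this document does not define these types.
type ISODateTime = string;
type DecisionId = string;

interface AIProvenance {
  modelId: string;
  promptTemplateId: string;
  invokedAt: ISODateTime;
  decisionId?: DecisionId;
  routedThrough: 'cloud' | 'edge';
  costMicroUsd?: number;
  safetyVerdict?: 'allow' | 'review' | 'block';
}

// Hypothetical wrapper for any upstream value that may be AI-influenced.
interface Provenanced<T> {
  value: T;
  aiProvenance?: AIProvenance;
}

// Copies the upstream provenance block by reference and never builds one locally.
// stripForPrivacy models the "explicitly required by privacy rules" case.
function passThrough<T>(upstream: Provenanced<T>, stripForPrivacy = false): Provenanced<T> {
  if (stripForPrivacy || upstream.aiProvenance === undefined) {
    return { value: upstream.value };
  }
  return { value: upstream.value, aiProvenance: upstream.aiProvenance };
}
```

Because the block is copied by reference, there is no code path in this sketch that could alter or invent a provenance field.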

3. Phase 2 plans (informational)

3.1 LLM-augmented bot detection

When Phase 2 lands, the BotDetector will optionally route ambiguous cases (score ∈ [0.5, 0.85)) to ai-orchestrator-service via the BotJudgePort for an LLM verdict. Routing rules:

Cloud Vertex AI for the primary path; ONNX Runtime edge fallback when the Cloud Run instance has EDGE_AI_AVAILABLE=true.
Strict budget cap: ≤ 1 LLM call per session per minute.
HITL: not required (the LLM is advisory; the rules engine retains final say).
Provenance is recorded in bot_score_log.signals as kind='llm-judge' with an aiProvenance block.

interface BotJudgePort {
  judge(input: BotJudgeInput, ctx: RequestContext): Promise<{
    verdict: 'human' | 'suspect' | 'bot';
    confidence: number;
    reasoning: string;
    aiProvenance: AIProvenance;
  }>;
}
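
The gating in front of BotJudgePort (ambiguous band plus the per-session call cap) might look like the sketch below. The names isAmbiguous, SessionCallLimiter, and shouldConsultJudge are hypothetical, and the in-memory Map is an assumption made for brevity; a multi-instance deployment would presumably need a shared store.

```typescript
const AMBIGUOUS_MIN = 0.5;     // inclusive lower bound of the ambiguous band
const AMBIGUOUS_MAX = 0.85;    // exclusive upper bound of the ambiguous band
const MIN_INTERVAL_MS = 60000; // at most one LLM call per session per minute

function isAmbiguous(score: number): boolean {
  return score >= AMBIGUOUS_MIN && score < AMBIGUOUS_MAX;
}

// Remembers the last LLM call time per session (hypothetical, in-memory only).
class SessionCallLimiter {
  private readonly lastCallMs = new Map<string, number>();

  // Returns true and records the call if the session has no call in the last minute.
  tryAcquire(sessionId: string, nowMs: number): boolean {
    const last = this.lastCallMs.get(sessionId);
    if (last !== undefined && nowMs - last < MIN_INTERVAL_MS) {
      return false;
    }
    this.lastCallMs.set(sessionId, nowMs);
    return true;
  }
}

// Only ambiguous scores consume the per-session allowance.
function shouldConsultJudge(
  score: number,
  sessionId: string,
  limiter: SessionCallLimiter,
  nowMs: number,
): boolean {
  return isAmbiguous(score) && limiter.tryAcquire(sessionId, nowMs);
}
```

When shouldConsultJudge returns false, the rules engine's own score stands, which matches the advisory role described above.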

3.2 Personalised wishlist suggestions

When Phase 2 introduces consumer accounts, an ai-orchestrator-service job will produce WishlistSuggestion[] per user. The BFF will project these into a new /wishlist/suggestions endpoint, again purely as pass-through.

3.3 Conversational search

A future Phase 3 capability — conversational refinement on top of /search — will require streaming via SSE and is explicitly not in Phase 1 scope. When it lands, the BFF will own the SSE channel but delegate the LLM turn to ai-orchestrator-service.

4. Moderation + safety

When AI-touched fields (translations, summaries) flow through this BFF, the upstream service has already applied the platform moderation policy via ai-orchestrator-service. The BFF performs:

Schema validation: reject any AI-touched value whose aiProvenance.safetyVerdict === 'block' and fall back to the untranslated default value.
Length sanity: cap the pass-through translated name, description, and policies at the same byte limits as the canonical strings (e.g., name ≤ 256 bytes).
Locale match: ensure each translated localized.<locale> key matches a locale the user requested or one in their fallback chain.
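
The three checks can be combined into one resolution step per translated field. The sketch below is illustrative only: TranslatedField, capUtf8, and resolveField are hypothetical names, "cap" is read here as truncation on a code-point boundary, and falling back to the canonical string on a locale mismatch is an assumption (the text only says the locale must match).

```typescript
type SafetyVerdict = 'allow' | 'review' | 'block';

// Hypothetical shape of one translated string as received from upstream.
interface TranslatedField {
  locale: string;
  value: string;
  safetyVerdict?: SafetyVerdict;
}

const encoder = new TextEncoder();

// Truncates to at most maxBytes of UTF-8, never splitting a multi-byte character.
function capUtf8(text: string, maxBytes: number): string {
  let out = '';
  let used = 0;
  for (const ch of text) { // iterates by code point
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    out += ch;
    used += size;
  }
  return out;
}

function resolveField(
  translated: TranslatedField,
  canonical: string,
  acceptedLocales: readonly string[], // requested locale plus fallback chain
  maxBytes: number,
): string {
  // Blocked content is never shown; use the untranslated default.
  if (translated.safetyVerdict === 'block') return canonical;
  // Assumption: a locale outside the user's chain also falls back.
  if (acceptedLocales.indexOf(translated.locale) === -1) return canonical;
  return capUtf8(translated.value, maxBytes);
}
```

For a name field, the call would pass maxBytes = 256 to match the canonical limit quoted above.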

5. HITL surfaces

This BFF surfaces no HITL approval to consumers. Every AI artifact reaching the consumer surface has been auto-approved or human-approved upstream. An aiProvenance.safetyVerdict of 'review' indicates that upstream HITL is still in progress; the BFF treats such values as not yet visible and falls back to the non-AI version.
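
A minimal sketch of that visibility rule follows. AiVariant and selectVisible are hypothetical names, and treating an absent safetyVerdict as approved is an assumption, since the field is optional in AIProvenance and this document does not say how a missing value is handled.

```typescript
type SafetyVerdict = 'allow' | 'review' | 'block';

// Hypothetical pairing of an AI-produced value with its upstream verdict.
interface AiVariant<T> {
  value: T;
  safetyVerdict?: SafetyVerdict;
}

// Chooses between the AI-produced value and the non-AI value for one field.
function selectVisible<T>(ai: AiVariant<T> | undefined, nonAi: T): T {
  if (ai === undefined) return nonAi;
  switch (ai.safetyVerdict) {
    case 'review': // upstream HITL still in progress: not yet visible
    case 'block':  // rejected upstream: never visible
      return nonAi;
    default:       // 'allow', or absent (assumed approved upstream)
      return ai.value;
  }
}
```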

6. Cost + budget

The BFF's own AI cost is zero in Phase 1. Phase 2 budgets:

Capability	Budget cap (per day per environment)	Enforcer
LLM-augmented bot detection	$25 USD	`ai-orchestrator-service` token bucket
Wishlist suggestions	$5 USD	Same

When a budget is exceeded, the capability falls back gracefully (rules-only bot detection; no suggestions surfaced). An alert pages the on-call engineer when more than 80% of the daily budget has been consumed by 10 a.m. local time.
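
As a worked example, 80% of the $25 bot-detection cap is $20 and 80% of the $5 wishlist cap is $4. The sketch below expresses the alert and fallback conditions in micro-USD, the unit used by costMicroUsd. DailyBudget, shouldPageOnCall, and botDetectionMode are hypothetical names; reading "by 10 a.m." as "before hour 10 local" and treating a fully spent cap as exhausted are assumptions.

```typescript
const MICRO_USD_PER_USD = 1000000;

// Hypothetical view of one capability's daily budget in one environment.
interface DailyBudget {
  capMicroUsd: number;
  spentMicroUsd: number;
}

// True when more than 80% of the cap is spent before 10 a.m. local time.
// Integer arithmetic avoids floating-point rounding at the threshold.
function shouldPageOnCall(budget: DailyBudget, localHour: number): boolean {
  const overEightyPercent = budget.spentMicroUsd * 100 > budget.capMicroUsd * 80;
  return localHour < 10 && overEightyPercent;
}

// Graceful fallback: once the cap is spent, bot detection runs on rules alone.
function botDetectionMode(budget: DailyBudget): 'rules-only' | 'rules+llm' {
  return budget.spentMicroUsd >= budget.capMicroUsd ? 'rules-only' : 'rules+llm';
}
```

With a cap of 25 * MICRO_USD_PER_USD, spending of exactly 20,000,000 micro-USD does not page, while 20,000,001 before 10 a.m. does.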

7. Observability

Every AI pass-through field carries the upstream decisionId, where one was issued (the field is optional in AIProvenance), so analysts can trace from a search-result re-rank back to the AI call in ai-orchestrator-service.
We emit an OpenTelemetry span attribute ai.upstream.invoked=true on responses where any nested object had aiProvenance. This makes it easy to filter "AI-touched responses" in Cloud Trace.
We do not double-count cost: an AI call's cost is attributed only to the originating service.
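
Detecting "any nested object had aiProvenance" amounts to a recursive walk over the response body. The sketch below assumes the body is plain JSON-like data without cycles. hasAiProvenance and tagAiTouched are hypothetical names, and SpanLike is a stand-in for the one span method used here, so the example runs without the OpenTelemetry package.

```typescript
// Minimal stand-in for a tracing span; only the method this sketch needs.
interface SpanLike {
  setAttribute(key: string, value: boolean): void;
}

// Returns true if the node or any descendant has a non-null aiProvenance property.
function hasAiProvenance(node: unknown): boolean {
  if (node === null || typeof node !== 'object') return false;
  if (Array.isArray(node)) {
    return node.some((child) => hasAiProvenance(child));
  }
  const record = node as Record<string, unknown>;
  const own = record['aiProvenance'];
  if (own !== undefined && own !== null) return true;
  return Object.keys(record).some((key) => hasAiProvenance(record[key]));
}

// Sets the attribute only on AI-touched responses; other spans are left untouched.
function tagAiTouched(span: SpanLike, responseBody: unknown): void {
  if (hasAiProvenance(responseBody)) {
    span.setAttribute('ai.upstream.invoked', true);
  }
}
```

Leaving the attribute unset on responses with no AI involvement keeps the Cloud Trace filter a simple presence check.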

8. Audit + compliance

Pass-through AI responses inherit the audit position of the upstream service. No new audit entries are written by this BFF for AI artifacts. The audit-service join key is decisionId; the BFF's contribution is only that it surfaced the artifact to a guestSessionId. That linkage lives in the MetaPageView row, where the page's AI-touched fields are summarised in an aiTouchedFields[] JSON column (Phase 2; not present in Phase 1).
