AI_INTEGRATION — bff-consumer-service
Sibling: APPLICATION_LOGIC · API_CONTRACTS · SECURITY_MODEL
Cross-cutting: 08 AI Architecture
Posture. This BFF performs no direct AI calls in Phase 1. AI personalisation for the consumer surface (semantic ranking, LLM-driven query rewrite, cluster-bias detection) is performed inside search-aggregation-service and ai-orchestrator-service, and the results are projected into the cross-tenant index this BFF reads. We pass through the AIProvenance we receive; we never invent AI artifacts.
1. Where AI lives in this surface
| Capability | Owner | This BFF's role |
|---|---|---|
| Semantic ranking of search results (pgvector + LLM-judged relevance) | search-aggregation-service | We send (query, locale, geo) and receive a ranked list with aiProvenance per row when ranking is AI-influenced |
| Query rewrite ("hotel near big mosque" → geo + amenity expansion) | ai-orchestrator-service (called by search-aggregation-service) | We forward the rewritten interpretation back to the client as a queryRewriteHint for transparency |
| Auto-translation of property descriptions | ai-orchestrator-service (translation invoked by theme-config-service / property-service) | We receive translated strings already; we surface the translationProvenance field but never re-translate |
| Bot detection (LLM-augmented in Phase 2) | This BFF (Phase 1 = rules + reCAPTCHA; Phase 2 = ai-orchestrator routing) | Phase 1: rules-only |
| Personalised wishlist suggestions | Deferred to Phase 2 | n/a Phase 1 |
2. Provenance pass-through
Every AI-influenced response field carries an aiProvenance block per 08. The BFF's responsibility is:
- Preserve aiProvenance on every nested object that came from an AI-touching upstream.
- Strip aiProvenance only when explicitly required by privacy rules (none currently apply for the consumer surface).
- Never synthesise an aiProvenance block locally.
interface AIProvenance {
  modelId: string;            // e.g. 'vertex/gemini-1.5-flash'
  promptTemplateId: string;   // e.g. 'pmt_search_rerank_v3'
  invokedAt: ISODateTime;
  decisionId?: DecisionId;    // dec_<ulid> when HITL gated
  routedThrough: 'cloud' | 'edge';
  costMicroUsd?: number;
  safetyVerdict?: 'allow' | 'review' | 'block';
}
A ListingCardVM may carry aiProvenance on name.localized.<locale> (translation), on the rank position (semantic re-rank), or on the description field (auto-summary). The BFF copies these fields verbatim from upstream.
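The copy-verbatim rule can be sketched as below. This is a minimal illustration, not the service's actual mapper: UpstreamListing, the reduced ListingCardVM shape, and toListingCardVM are hypothetical names, and ISODateTime and DecisionId are assumed to be plain string aliases.

```typescript
// Assumption: both aliases are plain strings in this sketch.
type ISODateTime = string;
type DecisionId = string;

interface AIProvenance {
  modelId: string;
  promptTemplateId: string;
  invokedAt: ISODateTime;
  decisionId?: DecisionId;
  routedThrough: 'cloud' | 'edge';
  costMicroUsd?: number;
  safetyVerdict?: 'allow' | 'review' | 'block';
}

// Hypothetical upstream row; the real search-aggregation-service contract may differ.
interface UpstreamListing {
  id: string;
  rank: { position: number; aiProvenance?: AIProvenance };
  description: { text: string; aiProvenance?: AIProvenance };
}

// Hypothetical, reduced view model for illustration only.
interface ListingCardVM {
  id: string;
  rank: { position: number; aiProvenance?: AIProvenance };
  description: { text: string; aiProvenance?: AIProvenance };
}

// Copies each field as received. Spreading keeps an upstream aiProvenance
// block untouched and does not add the key when upstream sent none,
// so nothing is synthesised locally.
function toListingCardVM(row: UpstreamListing): ListingCardVM {
  return {
    id: row.id,
    rank: { ...row.rank },
    description: { ...row.description },
  };
}
```

A row whose rank was AI-influenced but whose description was not yields a view model with aiProvenance on rank only.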
3. Phase 2 plans (informational)
3.1 LLM-augmented bot detection
When Phase 2 lands, the BotDetector will optionally route ambiguous cases (score ∈ [0.5, 0.85)) to ai-orchestrator-service via the BotJudgePort for an LLM verdict. Routing rules:
- Cloud Vertex AI for primary path; ONNX Runtime edge fallback when Cloud Run instance has EDGE_AI_AVAILABLE=true.
- Strict budget cap: ≤ 1 LLM call per session per minute.
- HITL: not required (the LLM is advisory; the rules engine retains final say).
- Provenance recorded in bot_score_log.signals as kind='llm-judge' with aiProvenance block.
interface BotJudgePort {
  judge(input: BotJudgeInput, ctx: RequestContext): Promise<{
    verdict: 'human' | 'suspect' | 'bot';
    confidence: number;
    reasoning: string;
    aiProvenance: AIProvenance;
  }>;
}
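The routing gate in front of the BotJudgePort might look like the sketch below. LlmJudgeGate and its methods are hypothetical names. The per-session cap is interpreted here as a minimum interval of 60 seconds between calls, which is one possible reading of "≤ 1 LLM call per session per minute" and not a confirmed design.

```typescript
const AMBIGUOUS_MIN = 0.5;      // inclusive lower bound of the ambiguous band
const AMBIGUOUS_MAX = 0.85;     // exclusive upper bound of the ambiguous band
const MIN_INTERVAL_MS = 60000;  // assumption: cap enforced as a 60 s minimum gap

// Hypothetical gate deciding whether a rules score may be escalated to the LLM judge.
class LlmJudgeGate {
  private readonly lastCallAt = new Map<string, number>();
  private readonly now: () => number;

  // The clock is injectable so the gate can be exercised deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // True only for ambiguous scores in [0.5, 0.85) and only when the session
  // has not used its LLM call within the last minute.
  shouldRoute(sessionId: string, score: number): boolean {
    if (score < AMBIGUOUS_MIN || score >= AMBIGUOUS_MAX) return false;
    const last = this.lastCallAt.get(sessionId);
    return last === undefined || this.now() - last >= MIN_INTERVAL_MS;
  }

  // Call immediately before invoking the judge so the budget is consumed
  // even if the LLM call later fails.
  recordCall(sessionId: string): void {
    this.lastCallAt.set(sessionId, this.now());
  }
}
```

The caller would check shouldRoute, call recordCall, then invoke judge; whatever verdict comes back stays advisory, with the rules engine making the final decision.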
3.2 Personalised wishlist suggestions
When Phase 2 introduces consumer accounts, an ai-orchestrator-service job will produce WishlistSuggestion[] per user. The BFF will project these into a new /wishlist/suggestions endpoint, again purely as pass-through.
3.3 Conversational search
A future Phase 3 capability — conversational refinement on top of /search — will require streaming via SSE and is explicitly not in Phase 1 scope. When it lands, the BFF will own the SSE channel but delegate the LLM turn to ai-orchestrator-service.
4. Moderation + safety
When AI-touched fields (translations, summaries) flow through this BFF, the upstream service has already applied the platform moderation policy via ai-orchestrator-service. The BFF performs:
- Schema validation: reject any payload where aiProvenance.safetyVerdict === 'block' and fall back to the un-translated default value.
- Length sanity: cap pass-through translated name, description, policies at the same byte limits as the canonical strings (e.g., name ≤ 256 bytes).
- Locale match: ensure the translated localized.<locale> key matches a locale the user requested or their fallback chain.
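The three checks above could be combined roughly as follows. TranslatedField, selectDisplayText, and utf8ByteLength are hypothetical names. The sketch also assumes that an over-limit translation falls back to the canonical string rather than being truncated; the text above only says the length is capped.

```typescript
type SafetyVerdict = 'allow' | 'review' | 'block';

// Hypothetical shape of one translated value as received from upstream.
interface TranslatedField {
  locale: string;
  text: string;
  safetyVerdict?: SafetyVerdict;
}

// UTF-8 byte length computed from UTF-16 code units, without platform APIs.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, counted as a 3-byte replacement
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Returns the translated text only if it passes all three checks;
// otherwise returns the canonical (un-translated) value.
function selectDisplayText(
  canonical: string,
  translated: TranslatedField | undefined,
  acceptedLocales: string[], // requested locale followed by its fallback chain
  maxBytes: number,
): string {
  if (translated === undefined) return canonical;
  if (translated.safetyVerdict === 'block') return canonical;
  if (acceptedLocales.indexOf(translated.locale) === -1) return canonical;
  if (utf8ByteLength(translated.text) > maxBytes) return canonical; // assumption: fall back, do not truncate
  return translated.text;
}
```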
5. HITL surfaces
This BFF surfaces no HITL approval to consumers. Every AI artifact reaching the consumer surface has been auto-approved or human-approved upstream. An aiProvenance.safetyVerdict of 'review' indicates that the upstream HITL review is still in progress; the BFF treats such values as not yet visible and falls back to the non-AI version.
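As a compact illustration of that rule, the verdict can be mapped to a visibility decision. visibilityFor and the Visibility type are hypothetical names, and treating a missing safetyVerdict as already approved is an assumption rather than stated policy.

```typescript
type SafetyVerdict = 'allow' | 'review' | 'block';
type Visibility = 'show-ai' | 'fallback';

// 'review' means upstream HITL is still in progress, and 'block' means the
// value was rejected; both resolve to the non-AI version.
// Assumption: an absent verdict is treated like 'allow'.
function visibilityFor(verdict?: SafetyVerdict): Visibility {
  if (verdict === 'review' || verdict === 'block') return 'fallback';
  return 'show-ai';
}
```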
6. Cost + budget
The BFF's own AI cost is zero in Phase 1. Phase 2 budgets:
| Capability | Budget cap (per day per environment) | Enforcer |
|---|---|---|
| LLM-augmented bot detection | $25 USD | ai-orchestrator-service token bucket |
| Wishlist suggestions | $5 USD | Same |
Exceeded budgets fall back gracefully (rules-only bot detection; no suggestions surfaced). An alert pages the on-call engineer when more than 80% of a daily budget has been consumed by 10 a.m. local time.
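The alert condition and the fallback can be written down as two small predicates. The names below are hypothetical, enforcement itself stays with the ai-orchestrator-service token bucket, and reading "by 10 a.m." as "at any point before 10:00 local time" is an assumption.

```typescript
// Phase 2 daily caps from the table above, in USD per environment.
const PHASE2_DAILY_BUDGET_USD = {
  botDetection: 25,
  wishlistSuggestions: 5,
} as const;

// Pages on-call when more than 80% of the cap is consumed before 10:00 local.
// localHour is the hour of day in the environment's local time (0-23).
function shouldPageOnCall(consumedUsd: number, capUsd: number, localHour: number): boolean {
  return localHour < 10 && consumedUsd / capUsd > 0.8;
}

// Once the cap is reached, the capability degrades instead of failing:
// rules-only bot detection, or no suggestions surfaced.
function isBudgetExhausted(consumedUsd: number, capUsd: number): boolean {
  return consumedUsd >= capUsd;
}
```

For the bot-detection cap, 21 USD consumed at 9 a.m. is 84% of 25 USD and would page; the same spend at 11 a.m. would not.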
7. Observability
- Every AI pass-through field carries the upstream decisionId so analysts can trace from a search-result re-rank back to the AI call in ai-orchestrator-service.
- We emit an OpenTelemetry span attribute ai.upstream.invoked=true on responses where any nested object had aiProvenance. This makes it easy to filter "AI-touched responses" in Cloud Trace.
- We do not double-count cost: only the originating service is the cost-attributed service for an AI call.
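One way to derive the span attribute is a recursive scan of the response body, sketched below. hasAiProvenance and tagAiInvocation are hypothetical helpers, and SpanLike is a local stand-in that mirrors only the setAttribute method of an OpenTelemetry span so the example stays self-contained. The scan assumes JSON-like, acyclic response bodies.

```typescript
// Minimal stand-in for the part of an OpenTelemetry span used here.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
}

// True if the value, or any object nested inside it, has an aiProvenance key.
function hasAiProvenance(value: unknown): boolean {
  if (value === null || typeof value !== 'object') return false;
  if (Array.isArray(value)) return value.some((item) => hasAiProvenance(item));
  const record = value as Record<string, unknown>;
  if (record['aiProvenance'] !== undefined) return true;
  return Object.keys(record).some((key) => hasAiProvenance(record[key]));
}

// Sets the attribute only on AI-touched responses; other spans are left alone.
function tagAiInvocation(span: SpanLike, responseBody: unknown): void {
  if (hasAiProvenance(responseBody)) {
    span.setAttribute('ai.upstream.invoked', true);
  }
}
```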
8. Audit + compliance
Pass-through AI responses inherit the audit position of the upstream service. No new audit entries are written by this BFF for AI artifacts. The audit-service join key is decisionId; the BFF's contribution is only that it surfaced the artifact to a guestSessionId. That linkage lives in the MetaPageView row, where the page's AI-touched fields are summarised in an aiTouchedFields[] JSON column (Phase 2; not present in Phase 1).