search-aggregation-service — AI_INTEGRATION

Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · DATA_MODEL · SECURITY_MODEL · ../../docs/08-ai-architecture.md

1. Posture

search-aggregation-service is an AI consumer, never an AI orchestrator. It does not call LLMs directly; all model calls go through ai-orchestrator-service per 08 AI Architecture. It uses AI for three narrow read-side capabilities, all of which degrade gracefully when AI is unavailable:

Capability	Phase	LLM call?	Embedding?	Degrade-without-AI behaviour
Multilingual query understanding	Phase 1	one cached intent classification per canonical-query-hash	no	Treat query as keyword text in detected language
Semantic re-rank (k-NN over hotel embeddings)	Phase 2+	no	yes — `pgvector` + OpenSearch k-NN	Skip semantic boost, BM25 + popularity only
AI-generated city/hotel summaries (optional copy)	Phase 3+	yes (cached, HITL-gated)	no	Show static description

The service never uses AI to make ranking decisions that affect monetization (boost rules, sponsored slots) without HITL approval recorded as AIProvenance.

2. Allowed AI surfaces

2.1 Query understanding (`QueryUnderstandingPort`)

Adapter calls POST {AI_ORCHESTRATOR_BASE_URL}/v1/intents/parse-search with:

{
  "text": "هتل سه ستاره در کابل با وای فای رایگان",
  "locale": "fa",
  "region": "AF",
  "tenantId": null
}

Response:

{
  "intent": "lodging.search",
  "extracted": {
    "destination": { "city": "kabul", "confidence": 0.97 },
    "starRating": { "min": 3, "confidence": 0.93 },
    "amenities":  [{ "code": "wifi.free", "confidence": 0.95 }],
    "datesHint":  null
  },
  "provenance": {
    "model": "gemini-2.0-flash",
    "promptVersion": "search-intent@2025-04-01.v3",
    "latencyMs": 142,
    "costMicros": 84,
    "redactionApplied": true,
    "hitlReviewed": false
  }
}

Constraints:

30-day cache keyed on sha256(locale|region|normalize(text)). Cache hit ⇒ no orchestrator call.
800 ms hard timeout. On timeout/error: extracted = null, search proceeds with raw text.
tenantId is always null in cross-tenant search ⇒ no per-tenant model leakage.
The orchestrator runs PII redaction on text before the model call. The cache key is built from the redacted text.
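
The constraints above can be sketched as a small adapter. This is an illustration under stated assumptions, not the service's actual code: `normalize`, `ParseCall`, and `understandQuery` are hypothetical names, the in-memory `Map` stands in for the real cache (the 30-day TTL is not modelled), and the `text` passed in is assumed to be whatever text the key is meant to be built from.

```typescript
import { createHash } from "node:crypto";

// Shape of the `extracted` object in the response above, left opaque here.
type Extracted = Record<string, unknown>;

interface ParseRequest {
  text: string;
  locale: string;
  region: string;
  tenantId: null; // always null in cross-tenant search
}

// Hypothetical normalizer: the document names normalize(text) but does not define it.
function normalize(text: string): string {
  return text.normalize("NFKC").trim().replace(/\s+/g, " ").toLowerCase();
}

// sha256(locale|region|normalize(text)), hex-encoded.
function cacheKey(locale: string, region: string, text: string): string {
  return createHash("sha256")
    .update(`${locale}|${region}|${normalize(text)}`, "utf8")
    .digest("hex");
}

// Hypothetical transport: performs the POST and honours the abort signal.
type ParseCall = (
  req: ParseRequest,
  signal: AbortSignal,
) => Promise<{ extracted: Extracted | null }>;

async function understandQuery(
  req: ParseRequest,
  cache: Map<string, Extracted | null>, // stand-in for the real cache, no TTL
  call: ParseCall,
  timeoutMs = 800,
): Promise<Extracted | null> {
  const key = cacheKey(req.locale, req.region, req.text);
  if (cache.has(key)) return cache.get(key) ?? null; // cache hit: no orchestrator call

  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error("intent parse timed out"));
    }, timeoutMs);
  });
  try {
    const res = await Promise.race([call(req, controller.signal), timeout]);
    cache.set(key, res.extracted);
    return res.extracted;
  } catch {
    return null; // timeout or error: search proceeds with raw text
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Injecting `call` keeps the timeout and fallback logic testable without a running orchestrator; failures are deliberately not cached in this sketch, so the next request retries.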

2.2 Semantic re-rank (Phase 2+)

Adapter calls POST {AI_ORCHESTRATOR_BASE_URL}/v1/embeddings/encode once per canonical-query-hash:

{ "text": "boutique hotel central kabul wifi", "modality": "search-query.v1" }

Returns a 768-dim vector (provider-stable, e.g. gemini-text-embedding-004). The vector is used to issue a k-NN query against the OpenSearch embedding field, restricted by the same filter clause used by BM25.
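
A minimal sketch of how the filtered k-NN request body might be assembled. It assumes the OpenSearch `knn` query with an in-query `filter` (which depends on the k-NN engine configured for the index) and an `embedding` field as named above; `buildKnnQuery`, `FilterClause`, and the default `k = 50` are hypothetical.

```typescript
// Filter clauses as already produced for the BM25 query (opaque here).
type FilterClause = Record<string, unknown>;

interface KnnSearchBody {
  size: number;
  query: {
    knn: {
      embedding: {
        vector: number[];
        k: number;
        filter: { bool: { filter: FilterClause[] } };
      };
    };
  };
}

const EMBEDDING_DIMS = 768;

// Builds the request body in code; nothing here is generated by a model.
function buildKnnQuery(
  vector: number[],
  filters: FilterClause[],
  k = 50, // hypothetical default, not taken from the document
): KnnSearchBody {
  if (vector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dims, got ${vector.length}`);
  }
  return {
    size: k,
    query: {
      knn: {
        embedding: { vector, k, filter: { bool: { filter: filters } } },
      },
    },
  };
}
```

Passing the same `filters` array used for the BM25 query keeps both retrieval paths restricted to the same candidate set.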

Hotel embeddings are produced offline by a nightly job (embed-properties) that:

Reads new/changed HotelIndexEntry rows since the last watermark.
Builds a per-hotel "embedding text" from allow-listed fields only: name.default || description.default || city || country || amenities.join(' ').
Calls the orchestrator embeddings endpoint.
Persists the vector to Postgres pgvector column and pushes to OpenSearch embedding.
Records AIProvenance (model, promptVersion, costMicros, latencyMs) on the row.
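
The allow-list step can be illustrated as follows. This is a sketch only: the field shapes of `HotelIndexEntryView` are assumptions inferred from the expression above, and joining parts with a single space is one reading of `||` as concatenation.

```typescript
// Assumed view of a HotelIndexEntry row; only the allow-listed fields are typed.
interface HotelIndexEntryView {
  name: { default: string };
  description?: { default?: string };
  city: string;
  country: string;
  amenities: string[];
  [other: string]: unknown; // any other column is ignored below
}

// Builds the per-hotel embedding text from allow-listed fields only.
function buildEmbeddingText(h: HotelIndexEntryView): string {
  const parts: Array<string | undefined> = [
    h.name.default,
    h.description?.default,
    h.city,
    h.country,
    h.amenities.join(" "),
  ];
  return parts
    .filter((p): p is string => typeof p === "string" && p.trim().length > 0)
    .map((p) => p.trim())
    .join(" ");
}
```

Because the function reads named fields rather than iterating over the row, a column added to the row later cannot reach the embedding text without a code change.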

Hard rules:

Embedding text never includes any field outside the allow-list.
A hotel without an embedding is still searchable: for that doc, ranking falls back to BM25 score + popularity with no semantic boost.
An embedding model upgrade requires an IndexBuild cycle that wipes the old vectors, because cosine similarity is not comparable across model versions.
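
As a rough illustration of the last two rules, the sketch below decides which ranking signals apply to a document. `RankableDoc`, `signalsFor`, and the per-document `embeddingModel` field are hypothetical; the model comparison is a defensive check on top of the IndexBuild wipe, not something the document prescribes.

```typescript
interface RankableDoc {
  bm25: number;
  popularity: number;
  embeddingModel?: string; // absent when the hotel has no embedding yet
}

type Signal = "bm25" | "popularity" | "semantic";

// Semantic similarity is only used when the doc has a vector from the same
// model as the query vector; otherwise BM25 + popularity only.
function signalsFor(doc: RankableDoc, queryEmbeddingModel: string): Signal[] {
  const base: Signal[] = ["bm25", "popularity"];
  const comparable =
    doc.embeddingModel !== undefined &&
    doc.embeddingModel === queryEmbeddingModel;
  return comparable ? [...base, "semantic"] : base;
}
```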

2.3 City/Hotel AI summaries (Phase 3+)

Optional pre-rendered marketing copy ("things to do in Mazar-e Sharif", "highlights of Hotel Kabul Serena"). Generated by an offline job; all output is HITL-reviewed and persisted as a MarketingSnippet (separate aggregate, not in this bundle). The search service simply joins the snippet on propertyId / city for the detail page.

Hard rules:

AI text is never returned in a search response without a stored hitlReviewed=true flag.
Tenant-owned text always wins over AI-generated text. AI text is shown only if the tenant supplied none.
A user-visible "AI-assisted summary" badge is required by 07 Security & Compliance.
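
One way to picture these rules at the join step is the sketch below. All names (`Snippet`, `Description`, `pickDescription`) and the `HitlLeakError` error name are hypothetical; raising on an unreviewed snippet is modelled on the "returns 500 instead of leaking" behaviour listed under failure modes.

```typescript
interface Snippet {
  text: string;
  provenance: { hitlReviewed: boolean };
}

interface Description {
  text: string;
  aiAssisted: boolean; // drives the "AI-assisted summary" badge
}

function pickDescription(
  tenantText: string | null,
  snippet: Snippet | null,
  staticText: string,
): Description {
  // Tenant-owned text always wins over AI-generated text.
  if (tenantText !== null && tenantText.trim() !== "") {
    return { text: tenantText, aiAssisted: false };
  }
  if (snippet !== null) {
    if (snippet.provenance.hitlReviewed !== true) {
      // Never leak unreviewed AI text; the caller is assumed to map this to a 500.
      const err = new Error("unreviewed AI summary blocked");
      err.name = "HitlLeakError";
      throw err;
    }
    return { text: snippet.text, aiAssisted: true };
  }
  // No tenant text and no snippet: show the static description.
  return { text: staticText, aiAssisted: false };
}
```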

3. Forbidden AI surfaces

Forbidden	Why
Per-user personalization model	Cross-tenant anonymous traffic; no PII to personalize on. Recommendations belong to `bff-consumer-service` later, with explicit consent.
LLM in the hot search path (per request)	Latency budget is 250 ms p95; one LLM call eats 60 % of it. Intent parse runs only on cache miss.
LLM-generated SQL/OpenSearch DSL	Injection risk against the only cross-tenant data store. All DSL is templated and parameterized in code.
LLM that decides boost multipliers, sponsored slots, or pricing display	Direct revenue impact ⇒ HITL or rule-based only.
Model fine-tuning on search queries	Logged queries are sampled and PII-redacted; not approved as training data.
Direct provider SDK (Anthropic / Vertex / OpenAI)	All providers must be reached via `ai-orchestrator-service` for cost, audit, and rate-limit governance.

4. AIProvenance contract

Every AI-derived field on a domain object carries an AIProvenance record per DOMAIN_MODEL.md:

interface AIProvenance {
  model: string;              // e.g. "gemini-2.0-flash"
  promptVersion: string;      // e.g. "search-intent@2025-04-01.v3"
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
  hitlReviewed: boolean;
  reviewedBy?: UserId;
  reviewedAt?: ISODateString;
}

The record is persisted alongside the field it describes (e.g. embedding_provenance jsonb, summary_provenance jsonb). It is surfaced via GET /api/v1/search/hotels/{propertyId} only when the operator-admin scope is present, and never on consumer responses.
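
The scope gate can be sketched as a response mapper. The `HotelDetail` shape and the `toResponse` helper are hypothetical; only the two provenance column names and the operator-admin scope come from the text above.

```typescript
interface HotelDetail {
  propertyId: string;
  summary?: string;
  embedding_provenance?: unknown;
  summary_provenance?: unknown;
}

// Returns a copy of the document; provenance survives only for operator-admin.
function toResponse(doc: HotelDetail, scopes: ReadonlySet<string>): HotelDetail {
  const out: HotelDetail = { ...doc };
  if (!scopes.has("operator-admin")) {
    delete out.embedding_provenance;
    delete out.summary_provenance;
  }
  return out;
}
```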

5. Cost & rate limits

Surface	Calls / hour (steady state)	Hard cap (rate limiter)	Budget owner
Query intent parse	≤ 5 K (cache hit ratio target ≥ 90 %)	12 K	search-aggregation-service
Embedding (offline job)	≤ 5 K rows/day	50 K/day	search-aggregation-service
Marketing summary (offline)	≤ 100/day, HITL-gated	500/day	content-team-service (Phase 3)

All caps are enforced by the orchestrator; the service emits melmastoon.search.ai.budget_breach.v1 on cap denial and continues without AI.
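
A hedged sketch of the cap-denial path follows. How the orchestrator signals a denial is not specified here, so the `cap_denied` outcome is an assumption, as are the event payload fields `surface` and `occurredAt` and the helper name `callWithinBudget`; only the event name is taken from the text.

```typescript
const BUDGET_BREACH_EVENT = "melmastoon.search.ai.budget_breach.v1";

interface BudgetBreachEvent {
  type: typeof BUDGET_BREACH_EVENT;
  surface: string;    // hypothetical payload field
  occurredAt: string; // hypothetical payload field, ISO 8601
}

// Assumed outcome shape returned by the orchestrator adapter.
type OrchestratorOutcome<T> = { kind: "ok"; value: T } | { kind: "cap_denied" };

async function callWithinBudget<T>(
  surface: string,
  call: () => Promise<OrchestratorOutcome<T>>,
  emit: (event: BudgetBreachEvent) => void,
): Promise<T | null> {
  const outcome = await call();
  if (outcome.kind === "cap_denied") {
    emit({
      type: BUDGET_BREACH_EVENT,
      surface,
      occurredAt: new Date().toISOString(),
    });
    return null; // continue without AI
  }
  return outcome.value;
}
```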

6. Tenancy & data minimization

Cross-tenant: query understanding never tags a tenant; embeddings of public hotels are public allow-listed text only.
Per-tenant intelligence (e.g. occupancy forecast, dynamic pricing suggestions) is owned by pricing-service / analytics-service, not by this service.
Logs of orchestrator calls retain only cacheKey, model, latencyMs, costMicros, redactionApplied — never raw text.
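
The logging rule lends itself to an explicit projection. In this sketch the five retained fields come from the text above, while `OrchestratorCallLog` and `toLogRecord` are hypothetical names.

```typescript
interface OrchestratorCallLog {
  cacheKey: string;
  model: string;
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
}

// Copies only the retained fields; raw text and anything else is dropped.
function toLogRecord(
  call: OrchestratorCallLog & Record<string, unknown>,
): OrchestratorCallLog {
  return {
    cacheKey: call.cacheKey,
    model: call.model,
    latencyMs: call.latencyMs,
    costMicros: call.costMicros,
    redactionApplied: call.redactionApplied,
  };
}
```

Projecting field by field, rather than deleting known-sensitive keys, means a newly added request field stays out of the logs by default.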

7. Failure modes

Failure	Behaviour	Alert
Orchestrator 5xx	Serve cached intent or fall back to keyword search; increment the `ai_search_intent_fallbacks_total` counter	Burn alert if fallback rate > 20 % over 15 min
Orchestrator timeout	Same as Orchestrator 5xx	Same as Orchestrator 5xx
Embedding job failure	Job retried with backoff; missing embeddings flagged in `index.health_alert.v1` if > 10 % of corpus	Page on > 24 h gap
Provider model deprecation	Embedding job halts; `IndexBuild` required to re-embed	Runbook: Embedding model migration
HITL-unreviewed AI summary leak	Hard alarm; surface returns 500 instead of leaking	Security on-call
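
To make the burn-alert threshold concrete, here is a small in-process sketch of the fallback-rate check. In practice this would more likely be an alerting rule over the counter; `IntentOutcome`, `fallbackRate`, and `shouldAlert` are hypothetical, and only the 20 % threshold and 15-minute window come from the table.

```typescript
interface IntentOutcome {
  atMs: number;      // timestamp of the intent lookup
  fallback: boolean; // true when the keyword fallback was used
}

const WINDOW_MS = 15 * 60 * 1000;

// Share of lookups inside the window that fell back to keyword search.
function fallbackRate(
  events: IntentOutcome[],
  nowMs: number,
  windowMs = WINDOW_MS,
): number {
  const recent = events.filter(
    (e) => e.atMs <= nowMs && nowMs - e.atMs <= windowMs,
  );
  if (recent.length === 0) return 0;
  return recent.filter((e) => e.fallback).length / recent.length;
}

// Alert only when the rate is strictly above the threshold.
function shouldAlert(
  events: IntentOutcome[],
  nowMs: number,
  threshold = 0.2,
): boolean {
  return fallbackRate(events, nowMs) > threshold;
}
```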
