
search-aggregation-service — AI_INTEGRATION

Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · DATA_MODEL · SECURITY_MODEL · ../../docs/08-ai-architecture.md

1. Posture

search-aggregation-service is an AI consumer, never an AI orchestrator. It does not call LLMs directly; all model calls go through ai-orchestrator-service per 08 AI Architecture. It uses AI for three narrow read-side capabilities, all of which degrade gracefully when AI is unavailable:

| Capability | Phase | LLM call? | Embedding? | Degrade-without-AI behaviour |
|---|---|---|---|---|
| Multilingual query understanding | Phase 1 | one cached intent classification per canonical-query-hash | no | Treat query as keyword text in detected language |
| Semantic re-rank (k-NN over hotel embeddings) | Phase 2+ | no | yes (pgvector + OpenSearch k-NN) | Skip semantic boost, BM25 + popularity only |
| AI-generated city/hotel summaries (optional copy) | Phase 3+ | yes (cached, HITL-gated) | no | Show static description |

The service never uses AI to make ranking decisions that affect monetization (boost rules, sponsored slots) without HITL approval recorded as AIProvenance.

2. Allowed AI surfaces

2.1 Query understanding (QueryUnderstandingPort)

Adapter calls POST {AI_ORCHESTRATOR_BASE_URL}/v1/intents/parse-search with:

{
  "text": "هتل سه ستاره در کابل با وای فای رایگان",
  "locale": "fa",
  "region": "AF",
  "tenantId": null
}

Response:

{
  "intent": "lodging.search",
  "extracted": {
    "destination": { "city": "kabul", "confidence": 0.97 },
    "starRating": { "min": 3, "confidence": 0.93 },
    "amenities": [{ "code": "wifi.free", "confidence": 0.95 }],
    "datesHint": null
  },
  "provenance": {
    "model": "gemini-2.0-flash",
    "promptVersion": "search-intent@2025-04-01.v3",
    "latencyMs": 142,
    "costMicros": 84,
    "redactionApplied": true,
    "hitlReviewed": false
  }
}

Constraints:

  • 30-day cache keyed on sha256(locale|region|normalize(text)). Cache hit ⇒ no orchestrator call.
  • 800 ms hard timeout. On timeout/error: extracted = null, search proceeds with raw text.
  • tenantId is always null in cross-tenant search ⇒ no per-tenant model leakage.
  • The orchestrator runs PII redaction on text before the model call. The cache key is built from the redacted text.
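The caching and timeout constraints above can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's actual adapter: `normalizeQueryText` is a hypothetical stand-in for the `normalize(text)` step, whose exact rules this document does not define; the SHA-256 function is injected so the sketch does not depend on a particular runtime; and `callOrchestrator` is a placeholder for the real HTTP call.

```typescript
// Hypothetical normalization. The source only names normalize(text); the steps
// used here (NFKC, trim, collapse whitespace, lower-case) are assumptions.
function normalizeQueryText(text: string): string {
  return text.normalize("NFKC").trim().replace(/\s+/g, " ").toLowerCase();
}

// Injected hash function so the sketch stays runtime-agnostic.
type Sha256Hex = (input: string) => string;

// Builds the key described above: sha256(locale|region|normalize(text)).
function intentCacheKey(
  locale: string,
  region: string,
  text: string,
  sha256Hex: Sha256Hex,
): string {
  return sha256Hex([locale, region, normalizeQueryText(text)].join("|"));
}

type Extracted = Record<string, unknown>;

// Races the orchestrator call against the hard timeout. On timeout or error
// it returns null (extracted = null), so search proceeds with the raw text.
// Note: Promise.race does not cancel the underlying request.
async function parseWithFallback(
  callOrchestrator: () => Promise<Extracted>,
  timeoutMs = 800,
): Promise<Extracted | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("intent parse timed out")), timeoutMs);
  });
  try {
    return await Promise.race([callOrchestrator(), timeout]);
  } catch {
    return null;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A real adapter would check the cache with `intentCacheKey(...)` first and call `parseWithFallback` only on a miss.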

2.2 Semantic re-rank (Phase 2+)

Adapter calls POST {AI_ORCHESTRATOR_BASE_URL}/v1/embeddings/encode once per canonical-query-hash:

{ "text": "boutique hotel central kabul wifi", "modality": "search-query.v1" }

Returns a 768-dim vector (provider-stable, e.g. gemini-text-embedding-004). The vector is used to issue a k-NN query against the OpenSearch embedding field, restricted by the same filter clause used by BM25.
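As a rough illustration of the query described above, the sketch below builds a templated OpenSearch k-NN request body in code. The `embedding` field name and the 768-dimension check come from this document; the function name, the `k` parameter, and placing the filter inside the `knn` clause are assumptions (filter-inside-knn support depends on the OpenSearch version and k-NN engine in use).

```typescript
// The same filter clause that the BM25 query uses, passed through untouched.
type FilterClause = Record<string, unknown>;

const EMBEDDING_DIMS = 768;

// Builds a parameterized k-NN query body. Nothing here is generated by a
// model: the structure is fixed in code and only values are substituted.
function buildKnnQuery(queryVector: number[], filter: FilterClause, k: number) {
  if (queryVector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dimensions, got ${queryVector.length}`);
  }
  return {
    size: k,
    query: {
      knn: {
        embedding: {
          vector: queryVector,
          k,
          filter, // restricts candidates to the BM25 filter set
        },
      },
    },
  };
}
```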

Hotel embeddings are produced offline by a nightly job (embed-properties) that:

  1. Reads new/changed HotelIndexEntry rows since the last watermark.
  2. Builds a per-hotel "embedding text" from allow-listed fields only: name.default || description.default || city || country || amenities.join(' ').
  3. Calls the orchestrator embeddings endpoint.
  4. Persists the vector to a Postgres pgvector column and pushes it to the OpenSearch embedding field.
  5. Records AIProvenance (model, promptVersion, costMicros, latencyMs) on the row.
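Step 2 can be sketched as below. The field shapes are hypothetical, and the `||` in the step is read here as string concatenation with a single-space separator, which is an assumption. Because the function picks the allow-listed fields by name, any other field on the row never reaches the embedding text.

```typescript
// Hypothetical shape covering only the allow-listed fields named in step 2.
interface EmbeddingSourceFields {
  name: { default: string };
  description?: { default?: string };
  city: string;
  country: string;
  amenities: string[];
}

// Concatenates the allow-listed fields in the documented order, skipping
// missing or blank values. The space separator is an assumption.
function buildEmbeddingText(entry: EmbeddingSourceFields): string {
  const parts: Array<string | undefined> = [
    entry.name.default,
    entry.description?.default,
    entry.city,
    entry.country,
    entry.amenities.join(" "),
  ];
  return parts
    .filter((p): p is string => typeof p === "string" && p.trim().length > 0)
    .map((p) => p.trim())
    .join(" ");
}
```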

Hard rules:

  • Embedding text never includes any field outside the allow-list.
  • A hotel without an embedding is still searchable — semantic re-rank just uses BM25 score + popularity for that doc.
  • An embedding model upgrade requires an IndexBuild cycle so that old vectors are wiped (cosine similarity is not comparable across model versions).
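The fallback rule for hotels without an embedding might look like the sketch below. This document does not specify how the scores are combined, so the linear blend and every weight here are purely hypothetical; the only behaviour taken from the text is that a missing embedding (or unavailable AI) leaves BM25 plus popularity.

```typescript
interface Candidate {
  propertyId: string;
  bm25: number;
  popularity: number;
  semanticSimilarity?: number; // absent when the hotel has no embedding
}

// Hypothetical weights, chosen for illustration only.
const WEIGHTS = { bm25: 1.0, popularity: 0.2, semantic: 0.5 };

// Returns BM25 + popularity when there is no semantic signal, and adds the
// semantic boost only when AI is available and the doc has an embedding.
function finalScore(c: Candidate, aiAvailable: boolean): number {
  const base = WEIGHTS.bm25 * c.bm25 + WEIGHTS.popularity * c.popularity;
  if (!aiAvailable || c.semanticSimilarity === undefined) {
    return base;
  }
  return base + WEIGHTS.semantic * c.semanticSimilarity;
}
```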

2.3 City/Hotel AI summaries (Phase 3+)

Optional pre-rendered marketing copy ("things to do in Mazar-e Sharif", "highlights of Hotel Kabul Serena"). Generated by an offline job; all output is HITL-reviewed and persisted as a MarketingSnippet (separate aggregate, not in this bundle). The search service simply joins the snippet on propertyId / city for the detail page.

Hard rules:

  • AI text is never returned in a search response without a stored hitlReviewed=true flag.
  • Tenant-owned text always wins over AI-generated text. AI text is shown only if the tenant supplied none.
  • A user-visible "AI-assisted summary" badge is required by 07 Security & Compliance.
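The three rules above amount to a small precedence check, sketched here. The type and function names are hypothetical; only the precedence (tenant text first, AI text only when `hitlReviewed` is true) and the badge requirement come from this document.

```typescript
interface MarketingSnippetView {
  text: string;
  hitlReviewed: boolean;
}

interface SummaryView {
  text: string;
  aiAssisted: boolean; // drives the "AI-assisted summary" badge
}

// Tenant-owned text always wins. AI text is used only when the tenant
// supplied none and the snippet is stored with hitlReviewed = true.
function selectSummary(
  tenantText: string | null,
  snippet: MarketingSnippetView | null,
): SummaryView | null {
  if (tenantText !== null && tenantText.trim() !== "") {
    return { text: tenantText, aiAssisted: false };
  }
  if (snippet !== null && snippet.hitlReviewed === true) {
    return { text: snippet.text, aiAssisted: true };
  }
  return null; // nothing safe to show; caller falls back to its default
}
```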

3. Forbidden AI surfaces

| Forbidden | Why |
|---|---|
| Per-user personalization model | Cross-tenant anonymous traffic; no PII to personalize on. Recommendations belong to bff-consumer-service later, with explicit consent. |
| LLM in the hot search path (per request) | Latency budget is 250 ms p95; one LLM call eats 60 % of it. Intent parse runs only on cache miss. |
| LLM-generated SQL/OpenSearch DSL | Injection risk against the only cross-tenant data store. All DSL is templated and parameterized in code. |
| LLM that decides boost multipliers, sponsored slots, or pricing display | Direct revenue impact ⇒ HITL or rule-based only. |
| Model fine-tuning on search queries | Logged queries are sampled and PII-redacted; not approved as training data. |
| Direct provider SDK (Anthropic / Vertex / OpenAI) | All providers must be reached via ai-orchestrator-service for cost, audit, and rate-limit governance. |

4. AIProvenance contract

Every AI-derived field on a domain object carries an AIProvenance record per DOMAIN_MODEL.md:

interface AIProvenance {
  model: string;            // e.g. "gemini-2.0-flash"
  promptVersion: string;    // e.g. "search-intent@2025-04-01.v3"
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
  hitlReviewed: boolean;
  reviewedBy?: UserId;
  reviewedAt?: ISODateString;
}

The record is persisted alongside the field it describes (e.g. embedding_provenance jsonb, summary_provenance jsonb). It is surfaced via GET /api/v1/search/hotels/{propertyId} only when the operator-admin scope is present, and never on consumer responses.
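The scope-gated exposure of provenance could be sketched as below. The `embedding_provenance` and `summary_provenance` field names and the `operator-admin` scope come from this document; the response shape and the function name are hypothetical.

```typescript
type ProvenanceJson = Record<string, unknown>;

// Hypothetical detail payload; only the two provenance fields are from the doc.
interface HotelDetail {
  propertyId: string;
  name: string;
  embedding_provenance?: ProvenanceJson;
  summary_provenance?: ProvenanceJson;
}

// Returns the payload unchanged for operator-admin callers and a copy with
// all provenance fields removed for everyone else.
function toResponse(detail: HotelDetail, scopes: ReadonlySet<string>): HotelDetail {
  if (scopes.has("operator-admin")) {
    return detail;
  }
  const publicView: HotelDetail = { ...detail };
  delete publicView.embedding_provenance;
  delete publicView.summary_provenance;
  return publicView;
}
```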

5. Cost & rate limits

| Surface | Steady-state volume | Hard cap (rate limiter) | Budget owner |
|---|---|---|---|
| Query intent parse | ≤ 5 K calls/hour (cache hit ratio target ≥ 90 %) | 12 K/hour | search-aggregation-service |
| Embedding (offline job) | ≤ 5 K rows/day | 50 K/day | search-aggregation-service |
| Marketing summary (offline) | ≤ 100/day, HITL-gated | 500/day | content-team-service (Phase 3) |

All caps are enforced by the orchestrator; the service emits melmastoon.search.ai.budget_breach.v1 on cap denial and continues without AI.
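A minimal sketch of the "emit and continue" behaviour follows. The event name is taken from this document; the event payload fields, the `cap_denied` result shape, and the `emit` callback are all assumptions, since the orchestrator's denial signal and the event schema are not described here.

```typescript
// Hypothetical payload; only the event type string comes from the document.
interface BudgetBreachEvent {
  type: "melmastoon.search.ai.budget_breach.v1";
  surface: string;
  occurredAt: string;
}

// Hypothetical result shape for an orchestrator call.
type OrchestratorResult<T> = { kind: "ok"; value: T } | { kind: "cap_denied" };

// On cap denial: emit the breach event and return null so the caller
// continues on its non-AI path. Otherwise return the value unchanged.
function unwrapOrDegrade<T>(
  surface: string,
  result: OrchestratorResult<T>,
  emit: (event: BudgetBreachEvent) => void,
  now: () => string = () => new Date().toISOString(),
): T | null {
  if (result.kind === "cap_denied") {
    emit({
      type: "melmastoon.search.ai.budget_breach.v1",
      surface,
      occurredAt: now(),
    });
    return null;
  }
  return result.value;
}
```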

6. Tenancy & data minimization

  • Cross-tenant: query understanding never tags a tenant; hotel embeddings are built only from public, allow-listed text.
  • Per-tenant intelligence (e.g. occupancy forecast, dynamic pricing suggestions) is owned by pricing-service / analytics-service, not by this service.
  • Logs of orchestrator calls retain only cacheKey, model, latencyMs, costMicros, redactionApplied — never raw text.
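The logging rule in the last bullet can be enforced by constructing the log record from an explicit field list, as in this sketch. The five retained fields come from this document; the input shape and function name are hypothetical.

```typescript
// Exactly the fields the document allows in orchestrator call logs.
interface OrchestratorCallLog {
  cacheKey: string;
  model: string;
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
}

// Hypothetical in-memory view of a completed call, including the raw text.
interface CompletedCall {
  cacheKey: string;
  text: string;
  provenance: {
    model: string;
    latencyMs: number;
    costMicros: number;
    redactionApplied: boolean;
  };
}

// Copies fields one by one instead of spreading the input, so raw text
// (or any field added later) cannot reach the log by accident.
function toLogRecord(call: CompletedCall): OrchestratorCallLog {
  return {
    cacheKey: call.cacheKey,
    model: call.provenance.model,
    latencyMs: call.provenance.latencyMs,
    costMicros: call.provenance.costMicros,
    redactionApplied: call.provenance.redactionApplied,
  };
}
```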

7. Failure modes

| Failure | Behaviour | Alert |
|---|---|---|
| Orchestrator 5xx | Cached intent or fallback to keyword; ai_search_intent_fallbacks_total counter increment | Burn alert if fallback rate > 20 % over 15 min |
| Orchestrator timeout | Same as above | same |
| Embedding job failure | Job retried with backoff; missing embeddings flagged in index.health_alert.v1 if > 10 % of corpus | page on >24 h gap |
| Provider model deprecation | Embedding job halts; IndexBuild required to re-embed | runbook Embedding model migration |
| HITL-unreviewed AI summary leak | Hard alarm; surface returns 500 instead of leaking | security on-call |

8. References