search-aggregation-service — AI_INTEGRATION
Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · DATA_MODEL · SECURITY_MODEL · ../../docs/08-ai-architecture.md
1. Posture
search-aggregation-service is an AI consumer, never an AI orchestrator. It does not call LLMs directly; all model calls go through ai-orchestrator-service per 08 AI Architecture. It uses AI for three narrow read-side capabilities, all of which degrade gracefully when AI is unavailable:
| Capability | Phase | LLM call? | Embedding? | Degrade-without-AI behaviour |
|---|---|---|---|---|
| Multilingual query understanding | Phase 1 | one cached intent classification per canonical-query-hash | no | Treat query as keyword text in detected language |
| Semantic re-rank (k-NN over hotel embeddings) | Phase 2+ | no | yes — pgvector + OpenSearch k-NN | Skip semantic boost, BM25 + popularity only |
| AI-generated city/hotel summaries (optional copy) | Phase 3+ | yes (cached, HITL-gated) | no | Show static description |
The service never uses AI to make ranking decisions that affect monetization (boost rules, sponsored slots) without HITL approval recorded as AIProvenance.
2. Allowed AI surfaces
2.1 Query understanding (QueryUnderstandingPort)
Adapter calls `POST {AI_ORCHESTRATOR_BASE_URL}/v1/intents/parse-search` with, for example, a Persian (`fa`) query meaning "three-star hotel in Kabul with free Wi-Fi":
```json
{
  "text": "هتل سه ستاره در کابل با وای فای رایگان",
  "locale": "fa",
  "region": "AF",
  "tenantId": null
}
```
Response:
```json
{
  "intent": "lodging.search",
  "extracted": {
    "destination": { "city": "kabul", "confidence": 0.97 },
    "starRating": { "min": 3, "confidence": 0.93 },
    "amenities": [{ "code": "wifi.free", "confidence": 0.95 }],
    "datesHint": null
  },
  "provenance": {
    "model": "gemini-2.0-flash",
    "promptVersion": "search-intent@2025-04-01.v3",
    "latencyMs": 142,
    "costMicros": 84,
    "redactionApplied": true,
    "hitlReviewed": false
  }
}
```
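On the adapter side, the request and response above can be typed roughly as follows. This is a sketch inferred from the single example shown: the names `ParseSearchRequest`, `ParseSearchResponse` and `isParseSearchResponse` are hypothetical, and the real contract may mark other fields optional. The guard checks only the fields the service relies on before trusting orchestrator output.

```typescript
// Hypothetical adapter-side types inferred from the example payloads.
interface ParseSearchRequest {
  text: string;
  locale: string;          // e.g. "fa"
  region: string;          // e.g. "AF"
  tenantId: string | null; // always null in cross-tenant search
}

interface Scored { confidence: number; }

interface ParseSearchResponse {
  intent: string; // e.g. "lodging.search"
  extracted: {
    destination?: { city: string } & Scored;
    starRating?: { min: number } & Scored;
    amenities?: Array<{ code: string } & Scored>;
    datesHint?: unknown;
  } | null;
  provenance: {
    model: string;
    promptVersion: string;
    latencyMs: number;
    costMicros: number;
    redactionApplied: boolean;
    hitlReviewed: boolean;
  };
}

// Minimal runtime check before the adapter trusts orchestrator output.
function isParseSearchResponse(x: unknown): x is ParseSearchResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const p = r.provenance;
  if (typeof p !== "object" || p === null) return false;
  const prov = p as Record<string, unknown>;
  return (
    typeof r.intent === "string" &&
    typeof r.extracted === "object" && // object or null
    typeof prov.model === "string" &&
    typeof prov.promptVersion === "string" &&
    typeof prov.redactionApplied === "boolean" &&
    typeof prov.hitlReviewed === "boolean"
  );
}
```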
Constraints:
- 30-day cache keyed on `sha256(locale|region|normalize(text))`. Cache hit ⇒ no orchestrator call.
- 800 ms hard timeout. On timeout/error: `extracted = null`, and search proceeds with the raw text.
- `tenantId` is always `null` in cross-tenant search ⇒ no per-tenant model leakage.
- The orchestrator runs PII redaction on `text` before the model call. The cache key is built from the redacted text.
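The cache-then-call flow with the 800 ms timeout might look like this. Assumptions, all hypothetical: `normalize` only applies NFC, trims, collapses whitespace and lowercases (the document does not define it); an in-memory `Map` stands in for the 30-day cache; and the orchestrator call is injected as a function. The sketch hashes whatever `text` it is given and does not model where the redacted text comes from.

```typescript
import { createHash } from "crypto";

type Extracted = Record<string, unknown> | null;

// Hypothetical normalization; the real normalize() is not specified.
function normalize(text: string): string {
  return text.normalize("NFC").trim().replace(/\s+/g, " ").toLowerCase();
}

// sha256(locale|region|normalize(text)), hex-encoded.
function cacheKey(locale: string, region: string, text: string): string {
  return createHash("sha256")
    .update(`${locale}|${region}|${normalize(text)}`)
    .digest("hex");
}

const TIMEOUT_MS = 800;

// Rejects if `p` does not settle within `ms`. The underlying request is not
// cancelled here; a real adapter would also abort it (e.g. via AbortController).
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

async function understandQuery(
  locale: string,
  region: string,
  text: string,
  cache: Map<string, Extracted>,
  callOrchestrator: (text: string) => Promise<Extracted>, // hypothetical injected client
): Promise<Extracted> {
  const key = cacheKey(locale, region, text);
  if (cache.has(key)) return cache.get(key) ?? null; // cache hit: no orchestrator call
  try {
    const extracted = await withTimeout(callOrchestrator(text), TIMEOUT_MS);
    cache.set(key, extracted);
    return extracted;
  } catch {
    return null; // timeout or error: search proceeds with the raw text
  }
}
```

Errors are deliberately not cached, so a transient orchestrator failure does not pin a keyword-only result for 30 days.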
2.2 Semantic re-rank (Phase 2+)
Adapter calls POST {AI_ORCHESTRATOR_BASE_URL}/v1/embeddings/encode once per canonical-query-hash:
{ "text": "boutique hotel central kabul wifi", "modality": "search-query.v1" }
Returns a 768-dim vector (provider-stable, e.g. gemini-text-embedding-004). The vector is used to issue a k-NN query against the OpenSearch embedding field, restricted by the same filter clause used by BM25.
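A minimal sketch of a templated, parameterized k-NN request body that reuses the BM25 filter clause. It assumes the OpenSearch k-NN plugin's `knn` query with an in-query `filter` (efficient filtering, available for the Lucene and Faiss engines in recent OpenSearch releases); `buildKnnQuery`, `FilterClause` and the default `k = 50` are hypothetical.

```typescript
// Hypothetical shape of the filter clause shared with the BM25 query.
type FilterClause = { bool: { filter: Array<Record<string, unknown>> } };

const EMBEDDING_DIM = 768;

function buildKnnQuery(queryVector: number[], filter: FilterClause, k = 50) {
  if (queryVector.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM}-dim vector, got ${queryVector.length}`);
  }
  if (!Number.isInteger(k) || k <= 0) throw new Error("k must be a positive integer");
  // Values are placed into a fixed template as data; no caller text becomes DSL.
  return {
    size: k,
    query: {
      knn: {
        embedding: { vector: queryVector, k, filter }, // "embedding" = OpenSearch field
      },
    },
  };
}
```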
Hotel embeddings are produced offline by a nightly job (embed-properties) that:
- Reads new/changed `HotelIndexEntry` rows since the last watermark.
- Builds a per-hotel "embedding text" from allow-listed fields only: `name.default || description.default || city || country || amenities.join(' ')`.
- Calls the orchestrator embeddings endpoint.
- Persists the vector to the Postgres `pgvector` column and pushes it to the OpenSearch `embedding` field.
- Records `AIProvenance` (`model`, `promptVersion`, `costMicros`, `latencyMs`) on the row.
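The allow-list step can be made structural: the builder reads exactly the five listed fields and nothing else. This sketch assumes `||` above means concatenation and joins non-empty parts with a single space; the `HotelIndexEntry` shape and `buildEmbeddingText` are hypothetical.

```typescript
// Hypothetical subset of HotelIndexEntry; real rows carry more fields.
interface HotelIndexEntry {
  name: { default: string };
  description?: { default?: string };
  city?: string;
  country?: string;
  amenities?: string[];
  [other: string]: unknown; // any other field must never reach the embedding text
}

function buildEmbeddingText(e: HotelIndexEntry): string {
  // Allow-list only: name.default, description.default, city, country, amenities.
  const parts = [
    e.name.default,
    e.description?.default,
    e.city,
    e.country,
    (e.amenities ?? []).join(" "),
  ];
  return parts
    .filter((p): p is string => typeof p === "string" && p.trim() !== "")
    .join(" ");
}
```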
Hard rules:
- Embedding text never includes any field outside the allow-list.
- A hotel without an embedding is still searchable — semantic re-rank just uses BM25 score + popularity for that doc.
- An embedding model upgrade requires an `IndexBuild` cycle so old vectors are wiped (cosine similarity isn't comparable across model versions).
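If the semantic term is combined in application code, the two rules above could be expressed as follows. The linear weights, field names, and in-process cosine similarity are assumptions (the document fixes none of them); the point is that a document with no vector, or a vector from a different embedding model, gets BM25 + popularity only instead of an incomparable similarity.

```typescript
interface RerankDoc {
  bm25: number;
  popularity: number;      // assumed pre-normalized to [0, 1]
  embedding?: number[];
  embeddingModel?: string; // model that produced `embedding`
}

// Hypothetical weights, for illustration only.
const W_BM25 = 1.0, W_POP = 0.3, W_SEM = 0.5;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rerankScore(doc: RerankDoc, queryVec: number[] | null, queryModel: string): number {
  const base = W_BM25 * doc.bm25 + W_POP * doc.popularity;
  // Missing vector or a vector from another model version: BM25 + popularity only.
  if (!queryVec || !doc.embedding || doc.embeddingModel !== queryModel) return base;
  if (doc.embedding.length !== queryVec.length) return base;
  return base + W_SEM * cosine(queryVec, doc.embedding);
}
```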
2.3 City/Hotel AI summaries (Phase 3+)
Optional pre-rendered marketing copy ("things to do in Mazar-e Sharif", "highlights of Hotel Kabul Serena"). Generated by an offline job; all output is HITL-reviewed and persisted as a MarketingSnippet (separate aggregate, not in this bundle). The search service simply joins the snippet on propertyId / city for the detail page.
Hard rules:
- AI text is never returned in a search response without a stored `hitlReviewed=true` flag.
- Tenant-owned text always wins over AI-generated text. AI text is shown only if the tenant supplied none.
- A user-visible "AI-assisted summary" badge is required by 07 Security & Compliance.
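The precedence rules reduce to a small selection function. A sketch with hypothetical names: the snippet is reduced to the two fields the rules need, `staticText` is the static fallback description, and `aiBadge` stands for the user-visible "AI-assisted summary" badge.

```typescript
interface MarketingSnippetView {
  text: string;
  hitlReviewed: boolean;
}

interface DescriptionView {
  text: string;
  aiBadge: boolean; // true => render the "AI-assisted summary" badge
}

function selectDescription(
  tenantText: string | null,
  snippet: MarketingSnippetView | null,
  staticText: string,
): DescriptionView {
  // Tenant-owned text always wins.
  if (tenantText && tenantText.trim() !== "") return { text: tenantText, aiBadge: false };
  // AI text only with a stored hitlReviewed=true flag, and always badged.
  if (snippet && snippet.hitlReviewed) return { text: snippet.text, aiBadge: true };
  // Otherwise fall back to the static description.
  return { text: staticText, aiBadge: false };
}
```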
3. Forbidden AI surfaces
| Forbidden | Why |
|---|---|
| Per-user personalization model | Cross-tenant anonymous traffic; no PII to personalize on. Recommendations belong to bff-consumer-service later, with explicit consent. |
| LLM in the hot search path (per request) | Latency budget is 250 ms p95; a single LLM call (~150 ms) consumes about 60 % of it. Intent parse therefore runs only on cache miss. |
| LLM-generated SQL/OpenSearch DSL | Injection risk against the only cross-tenant data store. All DSL is templated and parameterized in code. |
| LLM that decides boost multipliers, sponsored slots, or pricing display | Direct revenue impact ⇒ HITL or rule-based only. |
| Model fine-tuning on search queries | Logged queries are sampled and PII-redacted; not approved as training data. |
| Direct provider SDK (Anthropic / Vertex / OpenAI) | All providers must be reached via ai-orchestrator-service for cost, audit, and rate-limit governance. |
4. AIProvenance contract
Every AI-derived field on a domain object carries an AIProvenance record per DOMAIN_MODEL.md:
```typescript
interface AIProvenance {
  model: string;         // e.g. "gemini-2.0-flash"
  promptVersion: string; // e.g. "search-intent@2025-04-01.v3"
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
  hitlReviewed: boolean;
  reviewedBy?: UserId;
  reviewedAt?: ISODateString;
}
```
Persisted alongside the field (e.g. embedding_provenance jsonb, summary_provenance jsonb). Surfaced via GET /api/v1/search/hotels/{propertyId} only when the operator-admin scope is present, never on consumer responses.
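One way to keep provenance off consumer responses is to strip it at serialization time for `GET /api/v1/search/hotels/{propertyId}`. The scope name `operator-admin` comes from the text above; `HotelDetail`, `toResponse` and the `*Provenance` field names are hypothetical, and `UserId` / `ISODateString` are assumed to be string aliases so the block compiles on its own.

```typescript
type UserId = string;        // assumed alias
type ISODateString = string; // assumed alias

interface AIProvenance {
  model: string;
  promptVersion: string;
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
  hitlReviewed: boolean;
  reviewedBy?: UserId;
  reviewedAt?: ISODateString;
}

// Hypothetical detail document with provenance stored next to each AI field.
interface HotelDetail {
  propertyId: string;
  summary?: string;
  summaryProvenance?: AIProvenance;
  embeddingProvenance?: AIProvenance;
}

function toResponse(doc: HotelDetail, scopes: ReadonlySet<string>): HotelDetail {
  if (scopes.has("operator-admin")) return doc;
  // Consumer responses never carry provenance.
  const { summaryProvenance, embeddingProvenance, ...publicFields } = doc;
  return publicFields;
}
```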
5. Cost & rate limits
| Surface | Steady-state volume | Hard cap (rate limiter) | Budget owner |
|---|---|---|---|
| Query intent parse | ≤ 5 K calls/hour (cache hit ratio target ≥ 90 %) | 12 K calls/hour | search-aggregation-service |
| Embedding (offline job) | ≤ 5 K rows/day | 50 K rows/day | search-aggregation-service |
| Marketing summary (offline) | ≤ 100/day, HITL-gated | 500/day | content-team-service (Phase 3) |
All caps are enforced by the orchestrator; the service emits melmastoon.search.ai.budget_breach.v1 on cap denial and continues without AI.
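A sketch of the degrade-on-denial behaviour. Only the event name comes from the text; the assumption that a cap denial surfaces as HTTP 429, the injected `emit` function, and the payload fields are all hypothetical.

```typescript
type Emit = (eventType: string, payload: Record<string, unknown>) => void;

interface OrchestratorResult<T> {
  status: number; // HTTP status from ai-orchestrator-service
  body?: T;
}

const BUDGET_BREACH_EVENT = "melmastoon.search.ai.budget_breach.v1";

// Returns the AI result, or null when the caller must continue without AI.
function handleOrchestratorResult<T>(
  surface: string,
  res: OrchestratorResult<T>,
  emit: Emit,
): T | null {
  if (res.status === 429) {
    // Assumed: 429 signals a cap denial. Record it and degrade; never fail the search.
    emit(BUDGET_BREACH_EVENT, { surface, at: new Date().toISOString() });
    return null;
  }
  if (res.status >= 200 && res.status < 300 && res.body !== undefined) return res.body;
  return null; // other errors take the normal fallback path
}
```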
6. Tenancy & data minimization
- Cross-tenant: query understanding never tags a tenant; embeddings of public hotels are built only from public, allow-listed text.
- Per-tenant intelligence (e.g. occupancy forecast, dynamic pricing suggestions) is owned by `pricing-service` / `analytics-service`, not by this service.
- Logs of orchestrator calls retain only `cacheKey`, `model`, `latencyMs`, `costMicros` and `redactionApplied`; never the raw `text`.
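A log record type that cannot carry the raw query is one way to enforce the last rule. `OrchestratorCall` and `toLogRecord` are hypothetical; the sketch copies the five permitted fields explicitly rather than spreading the call object.

```typescript
interface OrchestratorCall {
  cacheKey: string;
  text: string; // raw query: must never be logged
  model: string;
  latencyMs: number;
  costMicros: number;
  redactionApplied: boolean;
}

type OrchestratorLogRecord = Pick<
  OrchestratorCall,
  "cacheKey" | "model" | "latencyMs" | "costMicros" | "redactionApplied"
>;

function toLogRecord(call: OrchestratorCall): OrchestratorLogRecord {
  // Copy only the allow-listed fields; `{ ...call }` would leak `text`.
  return {
    cacheKey: call.cacheKey,
    model: call.model,
    latencyMs: call.latencyMs,
    costMicros: call.costMicros,
    redactionApplied: call.redactionApplied,
  };
}
```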
7. Failure modes
| Failure | Behaviour | Alert |
|---|---|---|
| Orchestrator 5xx | Use the cached intent if present, else fall back to keyword search; increment the ai_search_intent_fallbacks_total counter | Burn alert if fallback rate > 20 % over 15 min |
| Orchestrator timeout | Same as above | same |
| Embedding job failure | Job retried with backoff; missing embeddings flagged in index.health_alert.v1 if > 10 % of corpus | page on >24 h gap |
| Provider model deprecation | Embedding job halts; an IndexBuild is required to re-embed | Runbook: Embedding model migration |
| HITL-unreviewed AI summary leak | Hard alarm; surface returns 500 instead of leaking | security on-call |
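The burn alert in the first row is a ratio over a sliding window. A minimal in-process sketch, assuming per-request intent outcomes are available as timestamped events and that the denominator is all intent lookups in the window; in production this would more likely be a query over `ai_search_intent_fallbacks_total` in the metrics backend. `fallbackRateAlert` is a hypothetical name.

```typescript
interface IntentOutcome {
  atMs: number;      // epoch millis
  fallback: boolean; // true when the keyword fallback was used
}

const WINDOW_MS = 15 * 60 * 1000; // 15 min
const THRESHOLD = 0.2;            // 20 %

function fallbackRateAlert(events: IntentOutcome[], nowMs: number): boolean {
  const recent = events.filter((e) => e.atMs > nowMs - WINDOW_MS && e.atMs <= nowMs);
  if (recent.length === 0) return false;
  const fallbacks = recent.filter((e) => e.fallback).length;
  return fallbacks / recent.length > THRESHOLD; // strictly greater than 20 %
}
```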