
search-aggregation-service — SYNC_CONTRACT

Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL · ../../docs/architecture/ADR-0003-electron-offline-first-desktop.md

1. Posture: NO Electron sync surface

search-aggregation-service is a cloud-only meta-search service. It is consumed by:

  • bff-consumer-service (web/PWA, anonymous traffic)
  • bff-tenant-marketing-service (server-side, optional)
  • internal operator tooling for boost rules and index health

It is not consumed by any Electron desktop client. There is therefore no:

  • offline-first replication ledger,
  • LWW-with-vector-clock client diff,
  • melmastoon.<aggregate>.synced.v1 event,
  • desktop-resident SQLite mirror,
  • sync_state table or change_log table on this service.

The desktop frontoffice and backoffice clients (per ADR-0003) do not need cross-tenant search; they search within their own tenant via property-service and reservation-service. Cross-tenant search is exclusively a public, read-only, web capability.

This document exists for two reasons:

  1. To make the "no sync" decision explicit and auditable, so a future engineer doesn't accidentally introduce a stale local search index on the desktop client.
  2. To document the internal projection sync — how search-aggregation-service keeps its own Postgres+OpenSearch+Redis replicas of upstream data consistent, since this is a comparable convergence problem even though it has no Electron surface.

2. Internal projection convergence (the sync that does exist here)

The service maintains three replicas of an upstream truth:

property-service / pricing-service / inventory-service / tenant-service
                                   │
                                   ▼
                 Pub/Sub (per-aggregate ordering keys)
                                   │
                                   ▼
              ┌─────────────────────────────────────────┐
              │ search-aggregation-service application  │
              │ (consumers + ProjectionAllowListPolicy) │
              └────────────────────┬────────────────────┘
                                   │ single transaction
                 ┌─────────────────┴─────────────────┐
                 ▼                                   ▼
  ┌────────────────────────────┐        ┌─────────────────────────┐
  │ Postgres                   │        │ Outbox row              │
  │ search.hotel_index_entries │        │ (projection.updated.v1) │
  │ + …                        │        │                         │
  └──────────────┬─────────────┘        └────────────┬────────────┘
                 │                                   │
                 │         outbox publisher          │
                 ▼                                   ▼
  ┌────────────────────────────┐        ┌─────────────────────────┐
  │ OpenSearch                 │        │ Memorystore             │
  │ (mirror writer)            │        │ cache invalidation      │
  └────────────────────────────┘        └─────────────────────────┘
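The "single transaction" step in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's implementation: it uses Python's built-in sqlite3 as a stand-in for Postgres and models only the pricing slice, and the simplified table layouts, the `handle_pricing_event` function, the 'pending'/'applied'/'duplicate' inbox.result values and the outbox payload shape are all hypothetical. Only 'dropped_stale', event_id dedup, inbox.processed_at, outbox.published_at and projection.updated.v1 come from this document.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified stand-in schema (sqlite3 in place of Postgres).
SCHEMA = """
CREATE TABLE inbox (
    event_id     TEXT PRIMARY KEY,          -- unique event_id gives inbox dedup
    processed_at TEXT NOT NULL,
    result       TEXT NOT NULL
);
CREATE TABLE hotel_index_entries (
    property_id           TEXT PRIMARY KEY,
    price_from_base_micro INTEGER,
    vc_pricing_service    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE outbox (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type   TEXT NOT NULL,
    ordering_key TEXT NOT NULL,
    payload      TEXT NOT NULL,
    published_at TEXT                       -- NULL until the outbox publisher sends it
);
"""


def handle_pricing_event(conn: sqlite3.Connection, event: dict) -> str:
    """Apply one pricing event; inbox, projection and outbox rows commit together or not at all."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: commit on normal exit, rollback on exception
        try:
            conn.execute(
                "INSERT INTO inbox (event_id, processed_at, result) VALUES (?, ?, 'pending')",
                (event["event_id"], now),
            )
        except sqlite3.IntegrityError:
            return "duplicate"  # redelivered event_id: nothing else is written

        row = conn.execute(
            "SELECT vc_pricing_service FROM hotel_index_entries WHERE property_id = ?",
            (event["property_id"],),
        ).fetchone()
        stored_vc = row[0] if row else 0

        if event["vector_clock"] <= stored_vc:
            result = "dropped_stale" if event["vector_clock"] < stored_vc else "duplicate"
            conn.execute("UPDATE inbox SET result = ? WHERE event_id = ?", (result, event["event_id"]))
            return result  # no outbox row, so no projection.updated.v1

        conn.execute("INSERT OR IGNORE INTO hotel_index_entries (property_id) VALUES (?)",
                     (event["property_id"],))
        conn.execute(
            "UPDATE hotel_index_entries SET price_from_base_micro = ?, vc_pricing_service = ? "
            "WHERE property_id = ?",
            (event["price_from_base_micro"], event["vector_clock"], event["property_id"]),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, ordering_key, payload) VALUES (?, ?, ?)",
            ("projection.updated.v1", event["property_id"],
             json.dumps({"propertyId": event["property_id"], "slice": "pricing"})),
        )
        conn.execute("UPDATE inbox SET result = 'applied' WHERE event_id = ?", (event["event_id"],))
    return "applied"
```

Because the outbox row is written in the same transaction as the projection, the publisher can never announce a change that was rolled back, and a crash between commit and publish only delays the event until the outbox publisher catches up.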

2.1 Convergence guarantees

| Property | Guarantee | How |
|---|---|---|
| At-least-once consume | yes | Pub/Sub default delivery + inbox dedup (event_id unique) |
| Per-aggregate ordering | yes | Pub/Sub ordering key = propertyId for the property, pricing, inventory, and projection topics |
| Out-of-order safety across producers | yes | Vector-clock guard: a slice is rejected as stale when the stored vc_<service> >= incoming.vc_<service> |
| Atomic local commit | yes | One DB transaction covers inbox.processed_at, the projection write, and the outbox row |
| OpenSearch ↔ Postgres convergence | eventual, ≤ 2 s p95 | The OpenSearch writer lags the Postgres outbox by one publish cycle; a recovery job replays outbox rows with published_at IS NULL, plus a 5-minute drift scan |
| Cache ↔ Postgres convergence | best-effort: immediate, or ≤ 60 s via TTL | Direct invalidation on projection.updated.v1; the TTL is the fallback if an invalidation is dropped |
| Total drift bound | < 30 s p95 (event → result) | Freshness SLO in SERVICE_OVERVIEW.md |
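The cache row combines two mechanisms: an immediate invalidation when projection.updated.v1 arrives, and a TTL that bounds staleness if that invalidation is lost. A minimal sketch, assuming an in-process dict in place of Memorystore (with Redis the TTL would normally be a key expiry set at write time); the `ProjectionCache` class, the `hotel:<propertyId>` key scheme (borrowed from the surrogate keys in § 3) and the injected clock are illustrative, not the service's code.

```python
import time
from typing import Callable, Dict, Tuple


class ProjectionCache:
    """In-process stand-in for the Memorystore cache: invalidate on event, expire by TTL."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}  # key -> (stored_at, value)

    def get(self, key: str, load: Callable[[], object]) -> object:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = load()  # miss or expired: read the canonical Postgres projection
        self._entries[key] = (now, value)
        return value

    def on_projection_updated(self, property_id: str) -> None:
        """Immediate path: drop every cached key that belongs to this property."""
        prefix = f"hotel:{property_id}"
        stale = [k for k in self._entries if k == prefix or k.startswith(prefix + ":")]
        for key in stale:
            del self._entries[key]
```

If the invalidation for a property is dropped, the stale value survives at most one TTL (60 s here), which is the "≤ 60 s via TTL" bound in the table.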

2.2 Conflict resolution

Concurrent writes to a HotelIndexEntry happen when two upstream services emit events about the same propertyId simultaneously. Each consumer only writes its own slice:

| Slice | Owner consumer | Vector-clock column |
|---|---|---|
| Identity, geo, amenities, languages, hero, region, status | PropertyEventConsumer | vc_property_service |
| priceFromBaseMicro, freeCancellation, payAtProperty, RateSnapshot | PricingEventConsumer | vc_pricing_service |
| roomsAvailable, AvailabilityHint | InventoryEventConsumer | vc_inventory_service |
| popularityScore7d/28d, boostMultiplier, freshnessBoost, qualityScore | this service (commands + jobs) | none (internal fields) |
| Tenant cascade (status='suppressed', then delete) | TenantEventConsumer | n/a |

There is no field that is jointly written by two consumers. Conflicts are therefore reduced to "is this incoming event newer than what I last applied for my own slice?" — answered by the vector-clock column.
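The single-writer property can be stated as a check, for example in a unit test guarding the consumer wiring. A sketch, assuming the slice table is mirrored in a hypothetical `SLICE_OWNERS` map whose field labels are copied from the table rather than taken from real column names; the tenant cascade row is left out because it is a cascade rather than a field slice.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Hypothetical map mirroring the slice table; labels are illustrative, not column names.
SLICE_OWNERS: Dict[str, FrozenSet[str]] = {
    "PropertyEventConsumer": frozenset(
        {"identity", "geo", "amenities", "languages", "hero", "region", "status"}),
    "PricingEventConsumer": frozenset(
        {"priceFromBaseMicro", "freeCancellation", "payAtProperty", "RateSnapshot"}),
    "InventoryEventConsumer": frozenset({"roomsAvailable", "AvailabilityHint"}),
    "internal (commands + jobs)": frozenset(
        {"popularityScore7d", "popularityScore28d", "boostMultiplier",
         "freshnessBoost", "qualityScore"}),
}


def jointly_written_fields(owners: Dict[str, FrozenSet[str]]) -> Dict[str, List[str]]:
    """Return every field claimed by more than one writer, with its writers sorted."""
    writers: Dict[str, List[str]] = defaultdict(list)
    for owner, fields in owners.items():
        for field in fields:
            writers[field].append(owner)
    return {field: sorted(ws) for field, ws in writers.items() if len(ws) > 1}
```

An empty result is what makes the vector-clock comparison sufficient: with exactly one writer per field, no merge function is ever needed.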

When the consumer sees incoming.vectorClock < stored.vc_<service>, it:

  1. logs projection.skipped_stale,
  2. records inbox.result = 'dropped_stale',
  3. emits no projection.updated.v1 event.

When incoming.vectorClock > stored.vc_<service>, the slice is overwritten and the vector clock advances.

When the two are equal, the consumer treats the event as a duplicate. In practice the inbox unique constraint on event_id already short-circuits this path, because a redelivered event carries the same event_id.
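Because each producer has its own clock component, a late event from one producer never blocks a newer event from another. The pure function below sketches the comparison rules above for a single row held as a dict. The abbreviated `SLICE_COLUMNS` map, the `apply_slice` name and the 'applied'/'duplicate' labels are hypothetical; only 'dropped_stale', the projection.skipped_stale log name and the vc_<service> column pattern come from this document.

```python
import logging
from typing import Dict, Tuple

log = logging.getLogger("search.projection")

# Hypothetical, abbreviated slice map: producing service -> columns it may write.
SLICE_COLUMNS: Dict[str, Tuple[str, ...]] = {
    "property_service": ("region", "status"),
    "pricing_service": ("priceFromBaseMicro", "freeCancellation"),
    "inventory_service": ("roomsAvailable",),
}


def apply_slice(row: Dict[str, object], service: str, incoming_vc: int,
                values: Dict[str, object]) -> Tuple[Dict[str, object], str]:
    """Apply one producer's slice to a projection row; returns (row, inbox result)."""
    vc_col = f"vc_{service}"
    stored_vc = int(row.get(vc_col, 0))
    if incoming_vc < stored_vc:
        log.info("projection.skipped_stale service=%s incoming=%d stored=%d",
                 service, incoming_vc, stored_vc)
        return row, "dropped_stale"  # row unchanged, no projection.updated.v1
    if incoming_vc == stored_vc:
        return row, "duplicate"
    foreign = set(values) - set(SLICE_COLUMNS[service])
    if foreign:
        raise ValueError(f"{service} may not write {sorted(foreign)}")
    # Only this producer's columns and its own clock component advance.
    return {**row, **values, vc_col: incoming_vc}, "applied"
```

For example, a stale pricing event (clock 4 against a stored 5) is dropped, while an inventory event with clock 10 against a stored 9 is applied on the same row without touching the pricing columns or vc_pricing_service.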

2.3 Recovery & rebuild

Three recovery mechanisms:

  1. Outbox publisher catch-up: runs continuously and handles transient Pub/Sub publish failures.
  2. Drift sweep: every 5 minutes, picks 1 000 hotel_index_entries rows updated in the last hour and re-emits projection.updated.v1 if the OpenSearch document hash differs.
  3. Full reindex: IndexBuild orchestration consumes a BigQuery archive of canonical events starting at since_ts, replays them into a fresh OpenSearch index melmastoon-search-v<n>-<region>, then atomically swaps the melmastoon-search-current alias. See APPLICATION_LOGIC § StartIndexRebuild and DEPLOYMENT_TOPOLOGY § index swap runbook.

Postgres is always the canonical projection — OpenSearch and Redis can be wiped and rebuilt at any time without data loss.
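Recovery mechanism 2 (the drift sweep) can be sketched as a pure function. Assumptions, all hypothetical: how the 1 000 rows are chosen is not specified above, so this sketch simply takes the first `limit` rows inside the window; the hash of each OpenSearch document is assumed to be retrievable per property through an `indexed_hash` lookup that returns None when the document is missing; and rows are plain dicts with an `updated_at` field rather than query results.

```python
import hashlib
import json
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Optional


def doc_hash(doc: Dict[str, object]) -> str:
    """SHA-256 of canonical JSON, so key order does not affect the hash."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift_sweep(rows: Iterable[Dict[str, object]],
                indexed_hash: Callable[[str], Optional[str]],
                emit: Callable[[str], None],
                now: datetime,
                limit: int = 1000,
                window: timedelta = timedelta(hours=1)) -> List[str]:
    """Re-emit projection.updated.v1 for recent rows whose OpenSearch copy differs."""
    cutoff = now - window
    recent = [r for r in rows if r["updated_at"] >= cutoff][:limit]
    drifted = []
    for row in recent:
        property_id = str(row["property_id"])
        doc = {k: v for k, v in row.items() if k != "updated_at"}
        if indexed_hash(property_id) != doc_hash(doc):  # a missing doc (None) also counts
            emit(property_id)
            drifted.append(property_id)
    return drifted
```

Since Postgres is canonical, the sweep only ever re-emits events; the normal OpenSearch writer then repairs the document, so there is no second write path to keep correct.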


3. Read-side cache contract for bff-consumer-service

Although there is no client-side mirror, bff-consumer-service may cache responses. The contract:

  • Search responses for anonymous, non-personalized queries carry Cache-Control: public, max-age=60, stale-while-revalidate=120.
  • Every hotel-detail response carries Cache-Control: public, max-age=300, stale-while-revalidate=600.
  • Personalized responses (future: a recommendation route with X-User-Bucket set) carry Cache-Control: private, max-age=30.
  • Surrogate keys: responses carry Surrogate-Key: hotel:<propertyId> hotel:<propertyId>:<currency>, so a CDN can purge per property on projection.updated.v1 if bff-consumer-service chooses to wire the webhook.
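The header rules above fit in one small function. This is a sketch only: the route names, the `cache_headers` signature, attaching surrogate keys to every public response that names properties, and omitting them on private responses are assumptions, not part of the contract.

```python
from typing import Dict, Optional, Sequence


def cache_headers(route: str, property_ids: Sequence[str] = (), currency: Optional[str] = None,
                  user_bucket: Optional[str] = None) -> Dict[str, str]:
    """Pick Cache-Control (and optional Surrogate-Key) per the read-side cache contract."""
    if user_bucket is not None:
        # Personalized responses must never land in a shared cache.
        return {"Cache-Control": "private, max-age=30"}
    if route == "search":
        headers = {"Cache-Control": "public, max-age=60, stale-while-revalidate=120"}
    elif route == "hotel-detail":
        headers = {"Cache-Control": "public, max-age=300, stale-while-revalidate=600"}
    else:
        raise ValueError(f"no caching rule for route {route!r}")
    keys = []
    for pid in property_ids:
        keys.append(f"hotel:{pid}")
        if currency is not None:
            keys.append(f"hotel:{pid}:{currency}")
    if keys:
        headers["Surrogate-Key"] = " ".join(keys)  # space-separated, one token per key
    return headers
```

A hotel-detail response for property p1 priced in EUR would then carry Surrogate-Key: hotel:p1 hotel:p1:EUR, so purging hotel:p1 clears every currency variant while hotel:p1:EUR targets one.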

search-aggregation-service itself does not push to the CDN. It exposes GET /internal/v1/projection/changes (see API_CONTRACTS.md) so any cache layer can pull recent change keys and purge accordingly.
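A pull-based purger could be as small as the sketch below. The response shape of GET /internal/v1/projection/changes is defined in API_CONTRACTS.md, not here, so this sketch assumes a hypothetical page such as {"changes": [{"propertyId": "p1"}], "nextCursor": "c1"}; the `fetch` and `purge` callables are injected so that no particular HTTP client or CDN purge API is implied.

```python
from typing import Callable, Dict, Iterable, List, Optional


def purge_keys(changes: Iterable[Dict[str, str]]) -> List[str]:
    """One hotel:<propertyId> key per changed property, de-duplicated, first-seen order."""
    return list(dict.fromkeys(f"hotel:{c['propertyId']}" for c in changes))


def poll_once(fetch: Callable[[Optional[str]], Dict],
              purge: Callable[[List[str]], None],
              cursor: Optional[str]) -> Optional[str]:
    """Pull one page of changes, purge the matching surrogate keys, return the next cursor."""
    page = fetch(cursor)  # would wrap GET /internal/v1/projection/changes (shape assumed)
    keys = purge_keys(page["changes"])
    if keys:
        purge(keys)
    # Keep the old cursor if the page does not advance it, so nothing is skipped.
    return page.get("nextCursor", cursor)
```

The caller should persist the returned cursor only after `poll_once` returns: a crash then re-purges some keys rather than skipping them, and a redundant purge is harmless.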


4. Forbidden patterns (will fail review)

  • A new Electron client that mirrors hotel_index_entries to a local SQLite database: use the existing search API instead. The desktop must not "ship a search engine" containing results from other tenants. That violates the cross-tenant boundary rules even though those rows are public, because the desktop client cannot enforce the allow-list contract over time.
  • Bidirectional sync (mobile or desktop clients writing back to the index). All writes are event-driven; there is no client-write API.
  • A new "sync_state" table for any external client — every reader uses the search API.
  • Direct Postgres reads from bff-consumer-service against search.*. Reads must go through the public REST API to preserve degradation, caching, ranking, and rate-limit semantics.

5. Versioning

This service participates in server-side schema versioning only:

| Versioned thing | How it's versioned | Backward compatibility |
|---|---|---|
| Public REST API | URL /api/v<N>/… (see API_CONTRACTS.md) | Two majors served side by side for at least 90 d |
| Event payloads | event.version integer + topic suffix .v<n> | A topic-major break ⇒ new topic; producers dual-publish ≥ 30 d |
| OpenSearch index template | melmastoon-search-v<n>-<region> index behind one alias | Atomic alias swap, instant rollback |
| Postgres schema | expand → backfill → contract (see MIGRATION_PLAN.md) | Old code keeps running through the expand and backfill phases |

There is no client-side schema to coordinate with; rollouts are pure server orchestration.
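The "atomic alias swap, instant rollback" row relies on the OpenSearch `_aliases` endpoint, which applies a list of add/remove actions in one request as a single atomic operation. A sketch of the request body, assuming one melmastoon-search-current alias per regional cluster; the helper names and the "eu" region label are hypothetical.

```python
from typing import Dict, List

ALIAS = "melmastoon-search-current"


def index_name(version: int, region: str) -> str:
    """Versioned index name following melmastoon-search-v<n>-<region>."""
    return f"melmastoon-search-v{version}-{region}"


def alias_swap_body(from_index: str, to_index: str, alias: str = ALIAS) -> Dict[str, List[Dict]]:
    """Body for POST /_aliases; both actions are applied together or not at all."""
    return {
        "actions": [
            {"remove": {"index": from_index, "alias": alias}},
            {"add": {"index": to_index, "alias": alias}},
        ]
    }
```

Rollback is the same request with the two index names exchanged, for example `alias_swap_body(index_name(8, "eu"), index_name(7, "eu"))`, which is why the previous index should be kept until the new one has proven itself.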