search-aggregation-service — SYNC_CONTRACT

Companion: SERVICE_OVERVIEW · APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL · ../../docs/architecture/ADR-0003-electron-offline-first-desktop.md

1. Posture: NO Electron sync surface

search-aggregation-service is a cloud-only meta-search service. It is consumed by:

bff-consumer-service (web/PWA, anonymous traffic)
bff-tenant-marketing-service (server-side, optional)
internal operator tooling for boost rules and index health

It is not consumed by any Electron desktop client. There is therefore no:

offline-first replication ledger,
LWW-with-vector-clock client diff,
melmastoon.<aggregate>.synced.v1 event,
desktop-resident SQLite mirror,
sync_state table or change_log table on this service.

The desktop frontoffice and backoffice clients (per ADR-0003) do not need cross-tenant search; they search within their own tenant via property-service and reservation-service. Cross-tenant search is exclusively a public, read-only, web capability.

This document exists for two reasons:

To make the "no sync" decision explicit and auditable, so a future engineer doesn't accidentally introduce a stale local search index on the desktop client.
To document the internal projection sync — how search-aggregation-service keeps its own Postgres+OpenSearch+Redis replicas of upstream data consistent, since this is a comparable convergence problem even though it has no Electron surface.

2. Internal projection convergence (the sync that does exist here)

The service maintains three replicas of upstream truth (Postgres, OpenSearch, and the Redis cache on Memorystore):

property-service / pricing-service / inventory-service / tenant-service
                                │
                  Pub/Sub (per-aggregate ordering keys)
                                │
                                ▼
            ┌──────────────────────────────────────────┐
            │  search-aggregation-service application  │
            │  (consumers + ProjectionAllowListPolicy) │
            └────────────┬─────────────────────────────┘
                         │ single transaction
            ┌────────────┴─────────────┐
            ▼                          ▼
    ┌─────────────────┐        ┌─────────────────┐
    │ Postgres `search│        │ Outbox row      │
    │  .hotel_index_  │        │ (projection.    │
    │  entries` + …   │        │  updated.v1)    │
    └────────┬────────┘        └────────┬────────┘
             │                          │
             │  outbox publisher        │
             ▼                          ▼
    ┌─────────────────┐        ┌─────────────────┐
    │   OpenSearch    │        │ Memorystore     │
    │ (mirror writer) │        │ cache invalid.  │
    └─────────────────┘        └─────────────────┘

2.1 Convergence guarantees

Property	Guarantee	How
At-least-once consume	yes	Pub/Sub default + inbox dedup (`event_id` unique)
Per-aggregate ordering	yes	Pub/Sub ordering key = `propertyId` for property/pricing/inventory/projection topics
Out-of-order safety across producers	yes	Vector-clock guard: an incoming slice is rejected as stale when `stored.vc_<service> >= incoming.vectorClock`
Atomic local commit	yes	One DB tx for `inbox.processed_at` + projection write + outbox row
OpenSearch ↔ Postgres convergence	eventual, ≤ 2 s p95	OpenSearch writer lags Postgres outbox by one publish cycle; a recovery job replays from `outbox.published_at IS NULL` plus a 5-minute scan-for-drift
Cache ↔ Postgres convergence	best-effort, immediate or ≤ 60 s TTL	Direct invalidation on `projection.updated.v1`; TTL fallback if invalidator drops
Total drift bound	< 30 s p95 (event → result)	SLO Freshness in SERVICE_OVERVIEW.md
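
The "Atomic local commit" row can be illustrated with a minimal sketch. It uses an in-memory SQLite database purely so the example is runnable; the real service uses Postgres. The table layouts, the function name apply_event, and the inbox result value 'applied' are assumptions made for illustration, not the service's actual schema.

```python
import sqlite3

# Hypothetical, simplified schema. Only the column names event_id,
# processed_at, result and published_at come from this document.
SCHEMA = """
CREATE TABLE inbox (event_id TEXT PRIMARY KEY, processed_at TEXT, result TEXT);
CREATE TABLE hotel_index_entries (property_id TEXT PRIMARY KEY, rooms_available INTEGER);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT,
                     property_id TEXT, published_at TEXT);
"""


def apply_event(conn, event_id, property_id, rooms_available):
    """Write inbox row, projection row and outbox row in one transaction.

    Returns True if the event was applied, False if event_id was already
    in the inbox (in which case nothing is written).
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO inbox (event_id, processed_at, result) "
                "VALUES (?, datetime('now'), 'applied')",
                (event_id,),
            )
            conn.execute(
                "INSERT OR REPLACE INTO hotel_index_entries "
                "(property_id, rooms_available) VALUES (?, ?)",
                (property_id, rooms_available),
            )
            conn.execute(
                "INSERT INTO outbox (topic, property_id) "
                "VALUES ('projection.updated.v1', ?)",
                (property_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Unique constraint on inbox.event_id fired: redelivery, skip.
        return False
```

Because the inbox insert runs first inside the same transaction, a redelivered event fails on the unique constraint before any projection or outbox write happens, which is what makes at-least-once delivery safe to consume.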

2.2 Conflict resolution

Concurrent writes to a HotelIndexEntry happen when two upstream services emit events about the same propertyId simultaneously. Each consumer only writes its own slice:

Slice	Owner consumer	Vector-clock column
Identity, geo, amenities, languages, hero, region, status	`PropertyEventConsumer`	`vc_property_service`
`priceFromBaseMicro`, `freeCancellation`, `payAtProperty`, `RateSnapshot`	`PricingEventConsumer`	`vc_pricing_service`
`roomsAvailable`, `AvailabilityHint`	`InventoryEventConsumer`	`vc_inventory_service`
`popularityScore7d/28d`, `boostMultiplier`, `freshnessBoost`, `qualityScore`	this service (commands + jobs)	none — internal fields
Tenant cascade (`status='suppressed'` then delete)	`TenantEventConsumer`	n/a

There is no field that is jointly written by two consumers. Conflicts are therefore reduced to "is this incoming event newer than what I last applied for my own slice?" — answered by the vector-clock column.

When the consumer sees incoming.vectorClock < stored.vc_<service>, it:

logs projection.skipped_stale,
records inbox.result = 'dropped_stale',
emits no projection.updated.v1 event.

When incoming.vectorClock > stored.vc_<service>, the slice is overwritten and the vector clock advances.

When the two values are equal, the consumer treats the event as a duplicate; the inbox unique constraint on event_id already short-circuits this path.
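
The three cases above can be sketched as a small decision function. This is an illustration only: it assumes each vc_<service> column holds a single monotonically increasing integer that starts at 1, and the names Decision, decide, and apply_slice are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    APPLY = "apply"
    DROPPED_STALE = "dropped_stale"
    DUPLICATE = "duplicate"


def decide(incoming_vc: int, stored_vc: int) -> Decision:
    """Compare the incoming clock with the stored clock for one slice."""
    if incoming_vc < stored_vc:
        return Decision.DROPPED_STALE  # log it, emit no projection.updated.v1
    if incoming_vc > stored_vc:
        return Decision.APPLY          # overwrite slice, advance the clock
    return Decision.DUPLICATE          # normally caught earlier by the inbox


def apply_slice(entry: dict, service: str, incoming_vc: int, fields: dict) -> Decision:
    """Apply one consumer's slice to an in-memory entry if it is newer.

    Assumption: a missing clock column counts as 0, i.e. nothing applied yet.
    """
    column = f"vc_{service}"
    decision = decide(incoming_vc, entry.get(column, 0))
    if decision is Decision.APPLY:
        entry.update(fields)
        entry[column] = incoming_vc
    return decision
```

Since every consumer touches only its own fields and its own clock column, a stale pricing event can be dropped without affecting the inventory or property slices of the same entry.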

2.3 Recovery & rebuild

Three recovery mechanisms:

Outbox publisher catch-up — runs continuously; handles transient Pub/Sub publish failures.
Drift sweep — every 5 minutes, picks 1 000 hotel_index_entries rows updated in the last hour and re-emits a projection.updated.v1 if the OpenSearch document hash differs.
Full reindex — IndexBuild orchestration consumes a BigQuery archive of canonical events from since_ts, replays them into a fresh OpenSearch index melmastoon-search-v<n>-<region>, then atomically swaps the melmastoon-search-current alias. See APPLICATION_LOGIC § StartIndexRebuild and DEPLOYMENT_TOPOLOGY § index swap runbook.
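
The comparison step of the drift sweep can be sketched as below. The hash algorithm (SHA-256 over canonical JSON) and the names document_hash and find_drifted are assumptions for illustration; this document does not specify how the OpenSearch document hash is computed or stored.

```python
import hashlib
import json


def document_hash(doc: dict) -> str:
    """Hash a projection document in a key-order-independent way."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drifted(pg_rows: dict, os_hashes: dict, limit: int = 1000) -> list:
    """Return property IDs whose OpenSearch hash differs from Postgres.

    pg_rows:   propertyId -> row from hotel_index_entries (canonical side)
    os_hashes: propertyId -> hash recorded for the OpenSearch document
    A missing OpenSearch hash also counts as drift.
    """
    drifted = []
    for property_id, row in list(pg_rows.items())[:limit]:
        if os_hashes.get(property_id) != document_hash(row):
            drifted.append(property_id)
    return drifted
```

Each returned ID would then get a fresh projection.updated.v1, so the repair flows through the same outbox path as a normal update.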

Postgres is always the canonical projection — OpenSearch and Redis can be wiped and rebuilt at any time without data loss.

3. Read-side cache contract for `bff-consumer-service`

Although there is no client-side mirror, bff-consumer-service may cache responses. The contract:

Every search response for an anonymous, non-personalized query carries Cache-Control: public, max-age=60, stale-while-revalidate=120.
Every hotel-detail response carries Cache-Control: public, max-age=300, stale-while-revalidate=600.
Personalized responses (a future recommendation route with X-User-Bucket set) carry Cache-Control: private, max-age=30.
Surrogate keys: Surrogate-Key: hotel:<propertyId> hotel:<propertyId>:<currency> so a CDN can purge per-property on projection.updated.v1 if bff-consumer-service chooses to wire the webhook.
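
The header rules above can be summarized in a short sketch. The function names and the route labels "search" and "hotel-detail" are hypothetical; only the header values themselves come from this contract.

```python
def cache_control(route: str, personalized: bool = False) -> str:
    """Pick the Cache-Control value for a response.

    route is a hypothetical label: "search" or "hotel-detail".
    """
    if personalized:
        return "private, max-age=30"
    if route == "hotel-detail":
        return "public, max-age=300, stale-while-revalidate=600"
    return "public, max-age=60, stale-while-revalidate=120"


def surrogate_key(property_id: str, currency: str) -> str:
    """Build the Surrogate-Key value: one per-property key, one per-currency key."""
    return f"hotel:{property_id} hotel:{property_id}:{currency}"
```

Purging on the key hotel:<propertyId> is what lets a CDN drop a single property when its projection changes.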

search-aggregation-service itself does not push to the CDN. It exposes GET /internal/v1/projection/changes (see API_CONTRACTS.md) so any cache layer can pull recent change keys and purge accordingly.

4. Forbidden patterns (will fail review)

A new Electron client that mirrors hotel_index_entries to a local SQLite — use the existing search API instead. The desktop must not "ship a search engine" with results from other tenants; that violates the cross-tenant boundary rules even though those rows are public, because the desktop client cannot enforce the allow-list contract over time.
Bidirectional sync (a mobile or desktop client writing back to the index). All writes are event-driven; there is no client-write API.
A new "sync_state" table for any external client — every reader uses the search API.
Direct Postgres reads from bff-consumer-service against search.*. Reads must go through the public REST API to preserve degradation, caching, ranking, and rate-limit semantics.

5. Versioning

This service participates in server-side schema versioning only:

Versioned thing	How it's versioned	Backward compat
Public REST API	URL `/api/v<N>/…` (see API_CONTRACTS.md)	Two major versions served concurrently for at least 90 d
Event payloads	`event.version` integer + topic suffix `.v<n>`	Topic-major break ⇒ new topic; producers dual-publish ≥ 30 d
OpenSearch index template	`melmastoon-search-v<n>-<region>` index, one alias	Atomic alias swap, instant rollback
Postgres schema	`expand → backfill → contract`, see MIGRATION_PLAN.md	Old code keeps running through expand & backfill phases

There is no client-side schema to coordinate with; rollouts are pure server orchestration.
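
As an illustration of the OpenSearch row, an atomic alias swap can be expressed as a single request body for the OpenSearch `_aliases` endpoint, which applies all listed actions together. The helper names and the example region value are hypothetical; the index naming pattern and the alias name are the ones used in this document.

```python
ALIAS = "melmastoon-search-current"


def index_name(version: int, region: str) -> str:
    """Build an index name following melmastoon-search-v<n>-<region>."""
    return f"melmastoon-search-v{version}-{region}"


def alias_swap_body(old_index: str, new_index: str, alias: str = ALIAS) -> dict:
    """Request body for POST /_aliases that moves the alias in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical region value, for illustration only.
old, new = index_name(3, "region-a"), index_name(4, "region-a")
swap = alias_swap_body(old, new)
rollback = alias_swap_body(new, old)  # same call with the indices reversed
```

Rollback is the same request with the two indices exchanged, which is why it can be instant as long as the previous index has not been deleted.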
