search-aggregation-service — SECURITY_MODEL

Companion: SERVICE_OVERVIEW · DATA_MODEL · API_CONTRACTS · DEPLOYMENT_TOPOLOGY · ../../docs/07-security-compliance-tenancy.md · ../../docs/architecture/ADR-0002-multi-tenancy-model.md

1. Threat model — top three

Cross-tenant PII leak through the search index. The service is uniquely permitted to read across tenants, so any field that escapes the allow-list reaches anonymous users worldwide. Mitigation: four defense layers (type, projection, schema, audit).
Tenant data injection by a compromised upstream service publishing a malicious payload. Mitigation: strict event payload validation, allow-list filtering, signed event provenance.
DoS via expensive queries (from + size > 10 000, very wide bbox, many facets). Mitigation: request validation, hard limits, per-IP and per-user-bucket rate limits, query-cost circuit breaker.

The full threat model is maintained in SERVICE_RISK_REGISTER.md.

2. Tenancy posture (inverted)

This service is the single exception to the tenant-isolation rule documented in ADR-0002. It is therefore subject to extra controls:

Posture	Standard service	search-aggregation-service
`X-Tenant-Id` required on consumer reads	yes	no (anonymous meta-search)
Postgres RLS scoped to `tenant_id = current_setting('app.tenant_id')`	yes	no — sentinel `__cross_tenant__` allowed; tables have no tenant-scoped policy
Per-tenant Cloud SQL connection pool	yes	no — single pool with the cross-tenant SA
Field-level allow-list for indexable data	optional	mandatory and CI-enforced
OpenSearch index per tenant	n/a (only this service uses OpenSearch)	no — single index across tenants, by region
Cascade purge on `tenant.deleted.v1`	optional	mandatory
Tenant-isolation integration test	mandatory	inverted — asserts cross-tenant reads succeed AND that no forbidden field appears in any document
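
The inverted tenant-isolation test in the last row can be pictured as a check with two conditions: documents from more than one tenant must be visible, and no document may carry a key outside the allow-list. The sketch below is illustrative only. The helper name `checkInvertedIsolation`, the report shape, and the inline allow-list are assumptions (the field names are taken from the `public` group in section 5), not the service's actual test code.

```typescript
// Illustrative allow-list; the authoritative one lives in domain/policies/allow-list.ts.
const ALLOWED_FIELDS = new Set<string>([
  "propertyId", "tenantId", "region", "geo", "amenities",
  "name", "description", "hero", "priceFrom", "roomsAvailable",
]);

type IndexedDoc = Record<string, unknown>;

interface IsolationReport {
  tenantsSeen: string[];
  forbiddenFields: string[];
  ok: boolean;
}

// Hypothetical helper: passes only when more than one tenant is visible
// AND every top-level key on every document is allow-listed.
function checkInvertedIsolation(docs: IndexedDoc[]): IsolationReport {
  const tenants = new Set<string>();
  const forbidden = new Set<string>();
  for (const doc of docs) {
    const tenantId = doc["tenantId"];
    if (typeof tenantId === "string") tenants.add(tenantId);
    for (const key of Object.keys(doc)) {
      if (!ALLOWED_FIELDS.has(key)) forbidden.add(key);
    }
  }
  const tenantsSeen = Array.from(tenants).sort();
  const forbiddenFields = Array.from(forbidden).sort();
  return {
    tenantsSeen,
    forbiddenFields,
    ok: tenantsSeen.length > 1 && forbiddenFields.length === 0,
  };
}
```

A real integration test would feed this kind of check with documents read back from the index rather than with in-memory fixtures.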

3. Identity, authentication, authorization

3.1 Identity context

Surface	Identity provider	Credential	Scopes
Public consumer routes (`/api/v1/search/queries`, `/hotels/*`, `/suggest`)	none	none	none — anonymous
Public consumer click route (`/api/v1/search/clicks`)	optional opaque `X-User-Bucket`	none	none — anonymous
Operator boost-rule routes	`iam-service` (JWT, RS256)	`Authorization: Bearer <jwt>`, `X-Tenant-Id: <uuid>`	`search:boost-rule:read
Operator index routes	platform admin JWT	as above + `X-Admin: true` claim	`search:index:rebuild
Internal endpoints (`/internal/*`)	mTLS via service-mesh + IAM SA on Cloud Run	x509	per-route IAM bindings
Health (`/healthz`, `/readyz`)	none	none	none

3.2 Authorization rules

Public routes: no authorization check; only rate limiting and request validation.
Boost-rule routes: an operator may only read/write boost rules whose tenantId matches the JWT tenantId claim. Cross-tenant boost-rule writes are rejected with MELMASTOON.SEARCH.BOOST_RULE_SCOPE_VIOLATION.
Index admin routes: gated by an OPA policy bundle search-admin.rego requiring role: platform.search_admin AND mfa: true claims. Source-IP allow-list (operator VPN ranges) enforced at Cloud Armor.
Internal routes: bound to caller SAs:
- analytics-service@… → GET /internal/v1/projection/changes
- bff-consumer-service@… → POST /internal/v1/cache:invalidate
- other SAs → 403.
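
A minimal sketch of the boost-rule and internal-route rules above, assuming the JWT has already been verified. The function names, the result shape, and the HTTP status attached to the scope violation are assumptions. Because the source elides the full service-account addresses, the sketch matches on the local part only; a real binding would compare the full caller identity.

```typescript
const BOOST_RULE_SCOPE_VIOLATION = "MELMASTOON.SEARCH.BOOST_RULE_SCOPE_VIOLATION";

interface OperatorClaims { sub: string; tenantId: string; }
interface BoostRule { id: string; tenantId: string; }

type AuthzResult =
  | { allowed: true }
  | { allowed: false; status: number; code: string };

// An operator may only touch boost rules owned by the tenant in the JWT.
function authorizeBoostRuleAccess(claims: OperatorClaims, rule: BoostRule): AuthzResult {
  if (claims.tenantId !== rule.tenantId) {
    // 403 is an assumed status; the source only names the error code.
    return { allowed: false, status: 403, code: BOOST_RULE_SCOPE_VIOLATION };
  }
  return { allowed: true };
}

// Route -> local part of the one service account allowed to call it.
const INTERNAL_ROUTE_BINDINGS: Record<string, string> = {
  "GET /internal/v1/projection/changes": "analytics-service",
  "POST /internal/v1/cache:invalidate": "bff-consumer-service",
};

// Returns 200 when the caller SA is bound to the route, 403 otherwise.
function internalRouteStatus(callerSa: string, method: string, path: string): number {
  const expected = INTERNAL_ROUTE_BINDINGS[`${method} ${path}`];
  const localPart = callerSa.split("@")[0];
  return expected !== undefined && expected === localPart ? 200 : 403;
}
```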

3.3 Token verification

JWKS pulled from iam-service and cached 10 minutes.
iss, aud, exp, nbf, tenantId, sub, mfa claims validated.
Clock skew tolerance: 60 s.
On verification failure: 401, an audit log entry, and a decrement of the caller's remaining per-IP rate-limit budget.
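
The claim checks and the 60 s skew tolerance could look roughly like the sketch below. It assumes signature verification against the cached JWKS has already happened in a JWT library and covers only the claim validation step. The function name, the `Expected` shape, and the choice to treat a missing `nbf` or `mfa` claim as a failure are assumptions.

```typescript
interface Claims {
  iss?: string;
  aud?: string | string[];
  exp?: number; // seconds since epoch
  nbf?: number; // seconds since epoch
  tenantId?: string;
  sub?: string;
  mfa?: boolean;
}

interface Expected { issuer: string; audience: string; }

const CLOCK_SKEW_SECONDS = 60;

// Returns the names of the claims that failed validation (empty = valid).
function validateClaims(claims: Claims, expected: Expected, nowSeconds: number): string[] {
  const errors: string[] = [];
  if (claims.iss !== expected.issuer) errors.push("iss");

  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (audiences.indexOf(expected.audience) === -1) errors.push("aud");

  // Expired once "now" reaches exp plus the allowed skew.
  if (typeof claims.exp !== "number" || nowSeconds >= claims.exp + CLOCK_SKEW_SECONDS) {
    errors.push("exp");
  }
  // Not yet valid while "now" is still before nbf minus the allowed skew.
  if (typeof claims.nbf !== "number" || nowSeconds < claims.nbf - CLOCK_SKEW_SECONDS) {
    errors.push("nbf");
  }
  if (!claims.tenantId) errors.push("tenantId");
  if (!claims.sub) errors.push("sub");
  if (typeof claims.mfa !== "boolean") errors.push("mfa");
  return errors;
}
```

Any non-empty result would map to the 401 path described above.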

4. Cross-tenant data exposure controls (the centerpiece)

Layer	Control
L1 — Type system	`HotelIndexEntry` TypeScript type literally has no field for forbidden data; `extends never` static guard on any extra key. CI test `type-allowlist.spec.ts` walks the type via `ts-morph` and rejects PRs that add a non-allow-listed property.
L2 — Projection policy	`ProjectionAllowListPolicy` filters every inbound event payload to the explicit allow-list set in `domain/policies/allow-list.ts`. Anything outside ⇒ stripped, counter `projection_field_stripped_total{field}` incremented, alert when > 0 in steady state.
L3 — Schema	Postgres `search.hotel_index_entries` columns and OpenSearch index template (`dynamic: "strict"`) only declare allow-listed fields. Schema diff CI gate compares the live schema to the committed allow-list.
L4 — Audit job	Nightly `ProjectionExposureAuditor` (DATA_MODEL § 7) scans all rows and all OpenSearch fields and pages security on any anomaly.

Adding a new searchable field requires a PR that touches all four layers and includes a security review checkbox referencing the field's classification.
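
The L2 projection step can be sketched as a filter that copies only allow-listed keys and counts everything it drops. This is a simplified illustration, not the actual `ProjectionAllowListPolicy`: the function name and the in-memory `strippedTotal` map (standing in for the `projection_field_stripped_total{field}` counter) are assumptions, the allow-list is a partial copy of the section 5 field groups, and only top-level keys are handled.

```typescript
// Illustrative subset; the authoritative set lives in domain/policies/allow-list.ts.
const ALLOW_LIST = new Set<string>([
  "propertyId", "tenantId", "region", "geo", "amenities",
  "name", "description", "hero", "priceFrom", "roomsAvailable",
]);

// Stand-in for the projection_field_stripped_total{field} metric.
const strippedTotal = new Map<string, number>();

// Copies allow-listed keys, strips and counts everything else.
function projectAllowListed(payload: Record<string, unknown>): Record<string, unknown> {
  const projected: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    if (ALLOW_LIST.has(key)) {
      projected[key] = payload[key];
    } else {
      strippedTotal.set(key, (strippedTotal.get(key) ?? 0) + 1);
    }
  }
  return projected;
}
```

Because the filter builds a fresh object from the allow-list rather than deleting known-bad keys, an unexpected field is dropped by default, which matches the "anything outside is stripped" rule in the table.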

5. Data classification & handling

Field group	Classification	At rest	In transit	In logs
`propertyId`, `tenantId`, `region`, `geo`, `amenities`, `name`, `description`, `hero`, `priceFrom`, `roomsAvailable`	`public`	no column-level encryption (CMEK at storage layer)	TLS 1.2+	allowed
`popularity*`, `boostMultiplier`, `freshnessBoost`, `qualityScore`	`internal`	as above	TLS	allowed
`search_queries.text`, `search_queries.user_bucket`	`restricted` (potential PII in free text)	column CMEK + nullified at 30 d	TLS	hash only, never raw
`click_events.user_bucket`	`restricted`	nullified at 30 d	TLS	hash only
Any AI provenance	`internal`	jsonb	TLS	allowed
Anything else	forbidden — must not exist	n/a	n/a	n/a

Logs are routed to Cloud Logging with severity, traceId, spanId, eventId, tenantId (when present), and request.shape (only the request structure, never raw text or user identifiers). PII redaction middleware runs before the log appender.
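
One way to derive a `request.shape` value is to replace every leaf with its type name so that the structure survives but no raw text does. The sketch below is an assumption about how such a field might be computed; the function name `requestShape` is hypothetical, and it presumes that request key names themselves are not sensitive.

```typescript
type Shape = string | Shape[] | { [key: string]: Shape };

// Replaces every leaf value with its type name; arrays are summarized
// by the shape of their first element.
function requestShape(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    return value.length === 0 ? [] : [requestShape(value[0])];
  }
  if (typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: { [key: string]: Shape } = {};
    // Sorted keys keep the logged shape stable across requests.
    for (const key of Object.keys(source).sort()) {
      out[key] = requestShape(source[key]);
    }
    return out;
  }
  return typeof value; // "string", "number", "boolean", ...
}
```

For example, a body with a free-text `text` field and a numeric `page.size` would be logged as `{"page":{"size":"number"},"text":"string"}`.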

6. Secret handling

Secrets:

OPENSEARCH_BASIC_AUTH — Aiven managed credentials.
MEMORYSTORE_AUTH_STRING — Memorystore AUTH.
IAM_JWKS_URL and JWKS cache (no secret per se, hardened TLS only).
AI_ORCHESTRATOR_BASE_URL (no secret; SA-bound).

All secrets live in Google Secret Manager, accessed via Workload Identity Federation. The runtime SA search-aggregation@<project>.iam.gserviceaccount.com has roles/secretmanager.secretAccessor scoped to those secret names only. Rotation: 90 days for OpenSearch and Memorystore; automated via Cloud Scheduler + Secret Manager rotation hook.

No secret is ever read from environment variables in production. Local dev pulls from a synthetic secret bundle (see LOCAL_DEV_SETUP.md).

7. Network posture

Service runs on Cloud Run (region europe-west1 primary, asia-south1 secondary). Ingress: internal + Cloud Load Balancing only.
Public traffic enters via Apigee → Cloud Armor → external HTTPS LB → bff-consumer-service → this service. Direct internet ingress is denied at the LB.
Egress is restricted by Serverless VPC Connector to:
- Cloud SQL Postgres private IP,
- OpenSearch (Aiven peered VPC),
- Memorystore Redis private IP,
- Pub/Sub via Private Google Access,
- Secret Manager via PGA,
- ai-orchestrator-service via internal LB.
All other egress denied by VPC firewall rules.
Cloud Armor rules: WAF preconfigured rules (OWASP CRS), per-IP rate limit (60 rps burst / 10 rps sustained for /api/v1/search/queries), geographic block list (synced from sanctions/OFAC list weekly).
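
The "60 rps burst / 10 rps sustained" limit can be read as a token bucket with capacity 60 that refills at 10 tokens per second. The sketch below illustrates that reading only; Cloud Armor enforces the limit with its own rate-based rules, and the class and method names here are hypothetical.

```typescript
// Token bucket: `capacity` bounds the burst, `refillPerSecond` the sustained rate.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true when the request may proceed, false when it is throttled.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per client IP for /api/v1/search/queries under this reading.
const searchQueryBucket = new TokenBucket(60, 10, 0);
```

Under this model a client can send 60 requests at once, after which it is held to 10 requests per second until the bucket refills.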

8. Input validation & query safety

Request bodies validated with zod schemas per endpoint, max body size 32 KiB.
text field: max 256 characters, NFKC-normalized, control characters stripped, regex/wildcard syntax stripped before going to OpenSearch.
bbox: rejected if area > 250 000 km².
radiusKm: clamped to [0.1, 200].
page.size: hard max 50; page.cursor validated as opaque base64 of an HMAC-signed payload (rejects forged cursors).
from + size cap on OpenSearch translates to a "deep paging" guard returning MELMASTOON.SEARCH.PAGE_OUT_OF_RANGE past 10 000.
Facet selection limited to 12 simultaneous facets per query.
Idempotency-Key validated as ULID/UUID; replay returns the previous response and tracks idempotency_replay_total.
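
Several of the limits above can be sketched as small pure functions. This is an illustration under stated assumptions, not the service's zod schemas: all function names are hypothetical, the set of stripped wildcard/regex characters is an assumed subset, over-long text is rejected rather than truncated, length is measured in UTF-16 code units after normalization, and the bbox area uses a spherical-Earth formula that ignores antimeridian crossing.

```typescript
const MAX_TEXT_LENGTH = 256;
const MAX_RESULT_WINDOW = 10000;
const MAX_BBOX_AREA_KM2 = 250000;
const EARTH_RADIUS_KM = 6371; // mean Earth radius
const PAGE_OUT_OF_RANGE = "MELMASTOON.SEARCH.PAGE_OUT_OF_RANGE";

// NFKC-normalize, strip control characters, strip an assumed set of
// wildcard/regex metacharacters, then enforce the length limit.
function sanitizeText(raw: string): string {
  const cleaned = raw
    .normalize("NFKC")
    .replace(/[\u0000-\u001f\u007f-\u009f]/g, "")
    .replace(/[*?~\\\/\[\]{}()^|]/g, "");
  if (cleaned.length > MAX_TEXT_LENGTH) {
    throw new RangeError("text exceeds 256 characters");
  }
  return cleaned;
}

// radiusKm is clamped, not rejected.
function clampRadiusKm(radiusKm: number): number {
  return Math.min(200, Math.max(0.1, radiusKm));
}

interface BBox { south: number; west: number; north: number; east: number; }

// Area of a latitude/longitude rectangle on a sphere, in km^2.
function bboxAreaKm2(b: BBox): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const latTerm = Math.abs(Math.sin(toRad(b.north)) - Math.sin(toRad(b.south)));
  const lonTerm = Math.abs(toRad(b.east) - toRad(b.west));
  return EARTH_RADIUS_KM * EARTH_RADIUS_KM * latTerm * lonTerm;
}

function isBboxAllowed(b: BBox): boolean {
  return bboxAreaKm2(b) <= MAX_BBOX_AREA_KM2;
}

// Deep-paging guard: from + size may not exceed the 10 000 window.
function checkResultWindow(from: number, size: number): { ok: boolean; code?: string } {
  if (from + size > MAX_RESULT_WINDOW) {
    return { ok: false, code: PAGE_OUT_OF_RANGE };
  }
  return { ok: true };
}
```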

9. AI surface controls

Per AI_INTEGRATION.md:

All LLM/embedding calls go through ai-orchestrator-service and are subject to platform-wide redaction, rate limits, and cost caps.
AI is never used to construct OpenSearch DSL or SQL.
Cache keys for AI responses use the redacted prompt; raw prompts never leak into Redis.
AI-generated user-visible text requires hitlReviewed=true to be served.

10. Auditability

Event	Sink	Retention
Boost rule create/activate/cancel	`melmastoon.search.boost_rule.v1` topic + `audit-service` mirror	7 years
Index rebuild start/complete/fail	`melmastoon.search.index.v1` topic + audit	2 years
Allow-list strip event	counter + `projection.failed.v1` if persistent	2 years
Auth failure (operator routes)	structured log + audit mirror	1 year
Sensitive admin action (cache invalidate, index swap)	structured log + Slack `#sec-ops` notification	1 year
Tenant cascade purge	`melmastoon.tenant.purge_completed.v1` ack	7 years

Tamper-evidence: audit topic is append-only with subscriber audit-service writing to BigQuery + GCS Object Lock (90-day immutability).

11. Compliance hooks

Right-to-erasure (per tenant or per upstream subject): when a property is deleted upstream, the projection row is removed and the OpenSearch document is deleted within the SLO window. Search query logs tied to clicks on the affected propertyId are anonymized in the next run of the nightly nightly-anonymize-search-queries job.
Data residency: region-pinned indexes ensure AF/TJ/IR property data is stored in regional indexes; cross-region reads from the consumer surface are an explicit user choice.
Sanctions screening: not enforced here (no transactions). Sanctioned tenants are handled by tenant-service, which issues tenant.deleted.v1 (or .suspended.v1 in Phase 2+); this service then suppresses the tenant's index entries.
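
The tenant suppression path (a `tenant.deleted.v1` event removing every index entry for that tenant, followed by a purge acknowledgement) could be sketched as below. The in-memory index, the handler name, and the `removed` field on the acknowledgement are assumptions for illustration; the real handler would delete from both Postgres and OpenSearch.

```typescript
interface IndexEntry { propertyId: string; tenantId: string; region: string; }

// In-memory stand-in for the projection table plus the OpenSearch index.
class InMemoryIndex {
  private entries = new Map<string, IndexEntry>();

  upsert(entry: IndexEntry): void {
    this.entries.set(entry.propertyId, entry);
  }

  // Removes every entry owned by the tenant and returns how many were removed.
  purgeTenant(tenantId: string): number {
    const toDelete: string[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.tenantId === tenantId) toDelete.push(key);
    });
    toDelete.forEach((key) => this.entries.delete(key));
    return toDelete.length;
  }

  count(): number {
    return this.entries.size;
  }
}

interface TenantDeletedEvent { type: "tenant.deleted.v1"; tenantId: string; }

// Hypothetical acknowledgement payload; only the topic name comes from this document.
interface PurgeAck {
  type: "melmastoon.tenant.purge_completed.v1";
  tenantId: string;
  removed: number;
}

function handleTenantDeleted(index: InMemoryIndex, event: TenantDeletedEvent): PurgeAck {
  const removed = index.purgeTenant(event.tenantId);
  return { type: "melmastoon.tenant.purge_completed.v1", tenantId: event.tenantId, removed };
}
```

The handler is idempotent: replaying the same event removes nothing and reports `removed: 0`.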

12. Service identity & supply chain

Container built from a hermetic Cloud Build pipeline; image signed with Sigstore cosign, attestation stored in Artifact Registry.
Cloud Run admission policy requires cosign.verified=true on the deployed image.
Base image: distroless gcr.io/distroless/nodejs20-debian12.
SBOM (CycloneDX) generated per build; CVE scan via Container Analysis. Critical CVEs block release.

13. Penetration test coverage (annual)

Scope items required for the annual pentest:

Allow-list bypass attempts via crafted upstream events (replay malformed payloads in staging).
OpenSearch DSL injection through text, filter.amenities, sort.key.
Boost-rule scope violation (cross-tenant write attempts with valid JWT for tenant A targeting tenant B).
Cursor forgery (re-use, modify, sign with wrong key).
Rate-limit bypass (header smuggling, IPv6 expansion, Apigee header spoofing).
Cache poisoning via Vary mismatch on X-Currency / X-Region / Accept-Language.
Cross-tenant query log inference (can a logged query expose another tenant's identity?).
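
As an illustration of the cursor-forgery scope item, the sketch below signs and verifies an opaque cursor with HMAC-SHA256 using Node's built-in `crypto` module. The envelope format (`body` plus hex `mac`, base64-encoded), the function names, and the key handling are assumptions, not the service's actual cursor format. Note that a signature alone rejects modified cursors and cursors signed with the wrong key, but does not by itself prevent re-use of a valid cursor.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical envelope: base64(JSON({ body, mac })), mac = HMAC-SHA256(body).
function signCursor(payload: object, key: string): string {
  const body = JSON.stringify(payload);
  const mac = createHmac("sha256", key).update(body).digest("hex");
  return Buffer.from(JSON.stringify({ body, mac }), "utf8").toString("base64");
}

// Returns the decoded payload, or null for any malformed or forged cursor.
function verifyCursor(cursor: string, key: string): unknown {
  try {
    const decoded = JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
    if (typeof decoded.body !== "string" || typeof decoded.mac !== "string") {
      return null;
    }
    const expected = createHmac("sha256", key).update(decoded.body).digest();
    const actual = Buffer.from(decoded.mac, "hex");
    // Length check first: timingSafeEqual throws on unequal lengths.
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
      return null;
    }
    return JSON.parse(decoded.body);
  } catch (_err) {
    return null;
  }
}
```

A pentest case for "modify" would decode a valid cursor, change the body, re-encode it with the original `mac`, and expect the request to be rejected.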
