search-aggregation-service — SECURITY_MODEL
Companion: SERVICE_OVERVIEW · DATA_MODEL · API_CONTRACTS · DEPLOYMENT_TOPOLOGY · ../../docs/07-security-compliance-tenancy.md · ../../docs/architecture/ADR-0002-multi-tenancy-model.md
1. Threat model — top three
- Cross-tenant PII leak through the search index. The service is uniquely permitted to read across tenants, so any field that escapes the allow-list reaches anonymous users worldwide. Mitigation: four defense layers (type, projection, schema, audit).
- Tenant data injection by a compromised upstream service publishing a malicious payload. Mitigation: strict event payload validation, allow-list filtering, signed event provenance.
- DoS via expensive queries (`from + size > 10 000`, very wide bbox, many facets). Mitigation: request validation, hard limits, per-IP and per-user-bucket rate limits, query-cost circuit breaker.
The full threat model is maintained in SERVICE_RISK_REGISTER.md.
2. Tenancy posture (inverted)
This service is the single exception to the tenant-isolation rule documented in ADR-0002. It is therefore subject to extra controls:
| Posture | Standard service | search-aggregation-service |
|---|---|---|
| `X-Tenant-Id` required on consumer reads | yes | no (anonymous meta-search) |
| Postgres RLS scoped to `tenant_id = current_setting('app.tenant_id')` | yes | no; sentinel `__cross_tenant__` allowed, tables have no tenant-scoped policy |
| Per-tenant Cloud SQL connection pool | yes | no — single pool with the cross-tenant SA |
| Field-level allow-list for indexable data | optional | mandatory and CI-enforced |
| OpenSearch index per tenant | n/a (only this service uses OpenSearch) | no — single index across tenants, by region |
| Cascade purge on `tenant.deleted.v1` | optional | mandatory |
| Tenant-isolation integration test | mandatory | inverted — asserts cross-tenant reads succeed AND that no forbidden field appears in any document |
3. Identity, authentication, authorization
3.1 Identity context
| Surface | Identity provider | Credential | Scopes |
|---|---|---|---|
| Public consumer routes (`/api/v1/search/queries`, `/hotels/*`, `/suggest`) | none | none | none (anonymous) |
| Public consumer click route (`/api/v1/search/clicks`) | optional opaque `X-User-Bucket` | none | none (anonymous) |
| Operator boost-rule routes | iam-service (JWT, RS256) | Authorization: Bearer <jwt>, X-Tenant-Id: <uuid> | `search:boost-rule:read |
| Operator index routes | platform admin JWT | as above + X-Admin: true claim | `search:index:rebuild |
| Internal endpoints (`/internal/*`) | mTLS via service-mesh + IAM SA on Cloud Run | x509 | per-route IAM bindings |
| Health (`/healthz`, `/readyz`) | none | none | none |
3.2 Authorization rules
- Public routes: no authorization check; only rate limiting and request validation.
- Boost-rule routes: an operator may only read/write boost rules whose `tenantId` matches the JWT `tenantId` claim. Cross-tenant boost-rule writes are rejected with `MELMASTOON.SEARCH.BOOST_RULE_SCOPE_VIOLATION`.
- Index admin routes: gated by an OPA policy bundle `search-admin.rego` requiring `role: platform.search_admin` AND `mfa: true` claims. Source-IP allow-list (operator VPN ranges) enforced at Cloud Armor.
- Internal routes: bound to caller SAs:
  - `analytics-service@…` → `GET /internal/v1/projection/changes`
  - `bff-consumer-service@…` → `POST /internal/v1/cache:invalidate`
  - other SAs → 403.
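The boost-rule scope rule above can be sketched as a small pure check. This is a minimal illustration, not the service's actual code: the `OperatorClaims`, `BoostRule`, and `checkBoostRuleScope` names are hypothetical, and only the `tenantId` comparison and the error code come from the rules above. It assumes the JWT has already been verified.

```typescript
// Hypothetical shapes for illustration; only `tenantId` and the error code
// are taken from the authorization rules above.
interface OperatorClaims {
  sub: string;
  tenantId: string;
}

interface BoostRule {
  id: string;
  tenantId: string;
}

type ScopeDecision =
  | { allowed: true }
  | { allowed: false; code: "MELMASTOON.SEARCH.BOOST_RULE_SCOPE_VIOLATION" };

// Allows the operation only when the rule belongs to the tenant named in the
// (already verified) JWT claims; otherwise returns the scope-violation code.
function checkBoostRuleScope(claims: OperatorClaims, rule: BoostRule): ScopeDecision {
  if (claims.tenantId === rule.tenantId) {
    return { allowed: true };
  }
  return { allowed: false, code: "MELMASTOON.SEARCH.BOOST_RULE_SCOPE_VIOLATION" };
}
```

Returning a decision value rather than throwing keeps the check easy to unit-test; a real handler would presumably map the rejected case to an HTTP error response.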
3.3 Token verification
- JWKS pulled from `iam-service` and cached 10 minutes.
- `iss`, `aud`, `exp`, `nbf`, `tenantId`, `sub`, `mfa` claims validated.
- Clock skew tolerance: 60 s.
- On verification failure: `401` + audit log entry + per-IP rate-limit decrement.
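To make the claim checks and the 60 s skew tolerance concrete, here is a hedged sketch of the validation step that would run after signature verification. The function name, the `Expected` type, and the choice to treat every listed claim as required are assumptions for illustration; signature checking against the JWKS is deliberately out of scope.

```typescript
// Claim names follow the list above. Treating all of them as required is an
// assumption of this sketch.
interface JwtClaims {
  iss?: string;
  aud?: string | string[];
  exp?: number; // seconds since epoch
  nbf?: number; // seconds since epoch
  tenantId?: string;
  sub?: string;
  mfa?: boolean;
}

// Hypothetical holder for the issuer and audience the service expects.
interface Expected {
  issuer: string;
  audience: string;
}

const CLOCK_SKEW_SECONDS = 60;

// Returns the names of the claims that failed validation (empty array = valid).
function validateClaims(claims: JwtClaims, expected: Expected, nowSeconds: number): string[] {
  const errors: string[] = [];
  if (claims.iss !== expected.issuer) errors.push("iss");

  const audiences = Array.isArray(claims.aud) ? claims.aud : claims.aud ? [claims.aud] : [];
  if (audiences.indexOf(expected.audience) === -1) errors.push("aud");

  // Expired only once we are more than the skew tolerance past `exp`.
  if (typeof claims.exp !== "number" || nowSeconds > claims.exp + CLOCK_SKEW_SECONDS) {
    errors.push("exp");
  }
  // Not yet valid only if `nbf` is more than the skew tolerance in the future.
  if (typeof claims.nbf !== "number" || nowSeconds < claims.nbf - CLOCK_SKEW_SECONDS) {
    errors.push("nbf");
  }

  if (!claims.tenantId) errors.push("tenantId");
  if (!claims.sub) errors.push("sub");
  if (typeof claims.mfa !== "boolean") errors.push("mfa");
  return errors;
}
```

With `nowSeconds = 1000`, a token with `exp = 950` still passes (it is within the 60 s tolerance), while one with `exp = 930` fails on `exp`.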
4. Cross-tenant data exposure controls (the centerpiece)
| Layer | Control |
|---|---|
| L1 — Type system | HotelIndexEntry TypeScript type literally has no field for forbidden data; extends never static guard on any extra key. CI test type-allowlist.spec.ts walks the type via ts-morph and rejects PRs that add a non-allow-listed property. |
| L2 — Projection policy | ProjectionAllowListPolicy filters every inbound event payload to the explicit allow-list set in domain/policies/allow-list.ts. Anything outside ⇒ stripped, counter projection_field_stripped_total{field} incremented, alert when > 0 in steady state. |
| L3 — Schema | Postgres search.hotel_index_entries columns and OpenSearch index template (dynamic: "strict") only declare allow-listed fields. Schema diff CI gate compares the live schema to the committed allow-list. |
| L4 — Audit job | Nightly ProjectionExposureAuditor (DATA_MODEL § 7) scans all rows and all OpenSearch fields and pages security on any anomaly. |
Adding a new searchable field requires a PR that touches all four layers and includes a security review checkbox referencing the field's classification.
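As an illustration of layer L2, the following sketch strips every top-level key that is not on an allow-list and counts what it removed. It is not the real `ProjectionAllowListPolicy`: the `ALLOW_LIST` contents are a sample drawn from the public field group in section 5, and `projectAllowListed` and the in-memory `strippedTotal` map are hypothetical stand-ins for the policy and for the `projection_field_stripped_total{field}` counter.

```typescript
type Payload = Record<string, unknown>;

// Sample allow-list for illustration; the authoritative set lives in
// domain/policies/allow-list.ts.
const ALLOW_LIST: readonly string[] = [
  "propertyId",
  "tenantId",
  "region",
  "geo",
  "amenities",
  "name",
  "description",
];

// In-memory stand-in for the projection_field_stripped_total{field} counter.
const strippedTotal: Record<string, number> = {};

// Copies only allow-listed top-level keys; everything else is dropped and
// counted. Nested objects are not inspected in this sketch.
function projectAllowListed(payload: Payload): Payload {
  const projected: Payload = {};
  for (const key of Object.keys(payload)) {
    if (ALLOW_LIST.indexOf(key) !== -1) {
      projected[key] = payload[key];
    } else {
      strippedTotal[key] = (strippedTotal[key] ?? 0) + 1;
    }
  }
  return projected;
}
```

Building a fresh object from the allow-list, rather than deleting known-bad keys from the input, means an unexpected new upstream field is dropped by default.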
5. Data classification & handling
| Field group | Classification | At rest | In transit | In logs |
|---|---|---|---|---|
| `propertyId`, `tenantId`, `region`, `geo`, `amenities`, `name`, `description`, `hero*`, `priceFrom*`, `roomsAvailable` | public | no column-level encryption (CMEK at storage layer) | TLS 1.2+ | allowed |
| `popularity*`, `boostMultiplier`, `freshnessBoost`, `qualityScore` | internal | as above | TLS | allowed |
| `search_queries.text`, `search_queries.user_bucket` | restricted (potential PII in free text) | column CMEK + nullified at 30 d | TLS | hash only, never raw |
| `click_events.user_bucket` | restricted | nullified at 30 d | TLS | hash only |
| Any AI provenance | internal | jsonb | TLS | allowed |
| Anything else | forbidden — must not exist | n/a | n/a | n/a |
Logs are routed to Cloud Logging with severity, traceId, spanId, eventId, tenantId (when present), and request.shape (only the request structure, never raw text or user identifiers). PII redaction middleware runs before the log appender.
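One plausible reading of `request.shape` is "the keys and value types of the request, with every value discarded". The sketch below implements that reading; the `requestShape` function and the `Shape` type are hypothetical, and the actual redaction middleware may work differently.

```typescript
// A shape is a type name, a list of shapes, or an object of shapes.
type Shape = string | Shape[] | { [key: string]: Shape };

// Replaces every leaf value with its type name so that no raw text or
// identifier from the request reaches the log line.
function requestShape(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.map((item) => requestShape(item));
  if (typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: { [key: string]: Shape } = {};
    for (const key of Object.keys(source)) {
      out[key] = requestShape(source[key]);
    }
    return out;
  }
  return typeof value;
}
```

For example, `requestShape({ text: "kabul hotel", page: { size: 20 }, facets: ["amenities"] })` yields `{ text: "string", page: { size: "number" }, facets: ["string"] }`. Object keys are kept verbatim, so this only holds up if the request schema rejects arbitrary user-chosen keys.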
6. Secret handling
Secrets:
- `OPENSEARCH_BASIC_AUTH`: Aiven managed credentials.
- `MEMORYSTORE_AUTH_STRING`: Memorystore AUTH.
- `IAM_JWKS_URL` and JWKS cache (no secret per se, hardened TLS only).
- `AI_ORCHESTRATOR_BASE_URL` (no secret; SA-bound).
All secrets live in Google Secret Manager, accessed via Workload Identity Federation. The runtime SA search-aggregation@<project>.iam.gserviceaccount.com has roles/secretmanager.secretAccessor scoped to those secret names only. Rotation: 90 days for OpenSearch and Memorystore; automated via Cloud Scheduler + Secret Manager rotation hook.
No secret is ever read from environment variables in production. Local dev pulls from a synthetic secret bundle (see LOCAL_DEV_SETUP.md).
7. Network posture
- Service runs on Cloud Run (region `europe-west1` primary, `asia-south1` secondary). Ingress: internal + Cloud Load Balancing only.
- Public traffic enters via the Apigee → Cloud Armor → external HTTPS LB → `bff-consumer-service` → this service. Direct internet ingress is denied at the LB.
- Egress is restricted by Serverless VPC Connector to:
  - Cloud SQL Postgres private IP,
  - OpenSearch (Aiven peered VPC),
  - Memorystore Redis private IP,
  - Pub/Sub via Private Google Access,
  - Secret Manager via PGA,
  - `ai-orchestrator-service` via internal LB.
- All other egress denied by VPC firewall rules.
- Cloud Armor rules: WAF preconfigured rules (OWASP CRS), per-IP rate limit (60 rps burst / 10 rps sustained for `/api/v1/search/queries`), geographic block list (synced from sanctions/OFAC list weekly).
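The "60 rps burst / 10 rps sustained" limit can be read as a token bucket with capacity 60 that refills at 10 tokens per second. The sketch below models that reading only to make the numbers concrete; it is an assumption about the intended semantics, not a description of how Cloud Armor implements rate limiting, and the `TokenBucket` class is hypothetical.

```typescript
// Token bucket: `capacity` bounds the burst, `refillPerSecond` bounds the
// sustained rate. Time is injected so the behavior is deterministic.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    nowMs: number,
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if the request is admitted.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    // Never move the refill timestamp backwards if the clock regresses.
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per client IP for /api/v1/search/queries under this reading.
const perIpBucket = new TokenBucket(60, 10, 0);
```

Under this model a fresh client can send 60 requests at once, after which it is admitted at 10 requests per second.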
8. Input validation & query safety
- Request bodies validated with `zod` schemas per endpoint, max body size 32 KiB.
- `text` field: max 256 characters, NFKC-normalized, control characters stripped, regex/wildcard syntax stripped before going to OpenSearch.
- `bbox`: rejected if `area > 250 000 km²`.
- `radiusKm`: clamped to `[0.1, 200]`.
- `page.size`: hard max 50; `page.cursor` validated as opaque base64 of an HMAC-signed payload (rejects forged cursors).
- `from + size` cap on OpenSearch translates to a "deep paging" guard returning `MELMASTOON.SEARCH.PAGE_OUT_OF_RANGE` past 10 000.
- Facet selection limited to 12 simultaneous facets per query.
- `Idempotency-Key` validated as ULID/UUID; replay returns the previous response and tracks `idempotency_replay_total`.
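Three of the rules above (text sanitization, radius clamping, and the deep-paging guard) can be sketched without any library. The function names are hypothetical, the set of stripped wildcard/regex characters is an illustrative guess, and rejecting over-long text (rather than truncating it) is an assumption. The real service validates with `zod` schemas, which this sketch does not use.

```typescript
const MAX_TEXT_LENGTH = 256;
const MAX_RESULT_WINDOW = 10000;
const PAGE_OUT_OF_RANGE = "MELMASTOON.SEARCH.PAGE_OUT_OF_RANGE";

// NFKC-normalizes, enforces the length limit, then strips control characters
// and an illustrative set of wildcard/regex characters.
// Note: `.length` counts UTF-16 code units, which over-counts astral characters.
function sanitizeText(raw: string): string {
  const normalized = raw.normalize("NFKC");
  if (normalized.length > MAX_TEXT_LENGTH) {
    throw new RangeError("text exceeds " + MAX_TEXT_LENGTH + " characters");
  }
  return normalized
    .replace(/[\u0000-\u001f\u007f-\u009f]/g, "") // C0 and C1 control characters
    .replace(/[*?~\\\/\[\]{}()^|]/g, ""); // assumed wildcard/regex syntax set
}

// Clamps the search radius to the documented [0.1, 200] km range.
function clampRadiusKm(radiusKm: number): number {
  return Math.min(200, Math.max(0.1, radiusKm));
}

// Returns the error code when the requested window reaches past 10 000 hits,
// or null when the page is within range.
function deepPagingError(from: number, size: number): string | null {
  return from + size > MAX_RESULT_WINDOW ? PAGE_OUT_OF_RANGE : null;
}
```

For instance, `clampRadiusKm(500)` returns `200`, and `deepPagingError(9990, 11)` returns the `PAGE_OUT_OF_RANGE` code while `deepPagingError(9990, 10)` returns `null`.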
9. AI surface controls
Per AI_INTEGRATION.md:
- All LLM/embedding calls go through `ai-orchestrator-service` and are subject to platform-wide redaction, rate limits, and cost caps.
- AI is never used to construct OpenSearch DSL or SQL.
- Cache keys for AI responses use the redacted prompt; raw prompts never leak into Redis.
- AI-generated user-visible text requires `hitlReviewed=true` to be served.
10. Auditability
| Event | Sink | Retention |
|---|---|---|
| Boost rule create/activate/cancel | melmastoon.search.boost_rule.v1 topic + audit-service mirror | 7 years |
| Index rebuild start/complete/fail | melmastoon.search.index.v1 topic + audit | 2 years |
| Allow-list strip event | counter + projection.failed.v1 if persistent | 2 years |
| Auth failure (operator routes) | structured log + audit mirror | 1 year |
| Sensitive admin action (cache invalidate, index swap) | structured log + Slack #sec-ops notification | 1 year |
| Tenant cascade purge | melmastoon.tenant.purge_completed.v1 ack | 7 years |
Tamper-evidence: audit topic is append-only with subscriber audit-service writing to BigQuery + GCS Object Lock (90-day immutability).
11. Compliance hooks
- Right-to-erasure (per tenant or per upstream subject): when a property is deleted upstream, the projection row is removed and OpenSearch document deleted within the SLO window. Search query logs for the affected `propertyId` clicks are anonymized in the next nightly `nightly-anonymize-search-queries` run.
- Data residency: Region-pinned indexes ensure AF/TJ/IR property data are stored in regional indexes; cross-region reads from the consumer surface are explicit user choice.
- Sanctions screening: not enforced here (no transactions). Tenants under sanctions are suppressed by `tenant-service` issuing `tenant.deleted.v1` (or `.suspended.v1` Phase 2+) which suppresses index entries.
12. Service identity & supply chain
- Container built from a hermetic Cloud Build pipeline; image signed with Sigstore cosign, attestation stored in Artifact Registry.
- Cloud Run admission policy requires `cosign.verified=true` on the deployed image.
- Base image: distroless `gcr.io/distroless/nodejs20-debian12`.
- SBOM (CycloneDX) generated per build; CVE scan via Container Analysis. Critical CVEs block release.
13. Penetration test coverage (annual)
Scope items required for the annual pentest:
- Allow-list bypass attempts via crafted upstream events (replay malformed payloads in staging).
- OpenSearch DSL injection through `text`, `filter.amenities`, `sort.key`.
- Boost-rule scope violation (cross-tenant write attempts with valid JWT for tenant A targeting tenant B).
- Cursor forgery (re-use, modify, sign with wrong key).
- Rate-limit bypass (header smuggling, IPv6 expansion, Apigee header spoofing).
- Cache poisoning via `Vary` mismatch on `X-Currency` / `X-Region` / `Accept-Language`.
- Cross-tenant query log inference (can a logged query expose another tenant's identity?).