
Security

:::info Source
Sourced from services/search-service/SECURITY_MODEL.md in the documentation repo.
:::

Inherits platform baselines from docs/13-security-compliance-tenancy.md.

Security in search is about three things: tenant isolation, authorization on every hit, and content safety in the index.

1. Threat Model (Abbreviated)

| Threat | Impact | Mitigation |
|---|---|---|
| Cross-tenant leak via query | HIGH | Mandatory tenantId filter injected at the port layer; unit, integration, and fuzz tests cover every adapter call path |
| Bypass via direct OpenSearch access | CRITICAL | OpenSearch is not internet-exposed and sits inside the VPC; no direct clients outside search-service |
| Query injection (Lucene) | HIGH | Raw user input is never concatenated into Lucene DSL; queries are built via a typed AST → OS DSL |
| Embedding leak (PII in vectors) | HIGH | Sanitizer runs before embedding when visibility ∈ {marketplace, public}; a tenant's vectors are never served to other tenants |
| DoS via expensive queries | MEDIUM | Query cost analyzer (wildcard/regex rejection); per-actor quotas; 5 s timeout |
| Prompt injection via LLM path | MEDIUM | Template-guided prompts; LLM output validated against a schema and never executed |
| Reindex abuse (resource exhaustion) | MEDIUM | Rate limit of 2 reindexes/day/tenant; quota based on org size |
| Stale visibility (a document that should be private still appears) | HIGH | Visibility events prioritized in the consumer; reindex on demotion |
| Recommendation feedback spoofing | MEDIUM | Signed generationId; only the recipient user can submit feedback |
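The "signed generationId" mitigation can be sketched with an HMAC that binds each recommendation generation to the user it was served to. This is a minimal sketch, not the service's actual scheme: HMAC-SHA256, the token format `<generationId>.<hex tag>`, and the names `sign_generation_id` / `verify_feedback_token` are all assumptions, and the key shown is a hypothetical per-process placeholder.

```python
import hashlib
import hmac
import secrets

# HYPOTHETICAL key for illustration only. A real key would come from Vault/config
# (never from code) and be shared by every search-service replica.
SIGNING_KEY = secrets.token_bytes(32)


def _tag(generation_id: str, user_id: str, key: bytes) -> str:
    # Assumes neither ID contains ":", so each "gen:user" message is unambiguous.
    msg = f"{generation_id}:{user_id}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def sign_generation_id(generation_id: str, user_id: str, key: bytes = SIGNING_KEY) -> str:
    """Return an opaque token binding a recommendation generation to its recipient."""
    return f"{generation_id}.{_tag(generation_id, user_id, key)}"


def verify_feedback_token(token: str, user_id: str, key: bytes = SIGNING_KEY) -> bool:
    """Accept feedback only if the token was issued to this exact user."""
    generation_id, sep, tag = token.rpartition(".")
    if not sep or not generation_id:
        return False
    expected = _tag(generation_id, user_id, key)
    # Compare as bytes in constant time so timing does not leak tag prefixes and
    # non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8"))
```

Because the user ID is part of the MAC input, a token replayed by any other user fails verification even though the generationId itself is unchanged.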

2. Authentication

  • All endpoints except /healthz and /readyz require a valid JWT.
  • JWT validated against identity-service JWKS (cached 1h).
  • Required claims: sub, tid, roles, iat, exp, scope.
  • S2S (projector → ai-gateway, for example) uses mTLS + service-account JWT.
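The claim rules above can be checked once the token's signature has been verified against the JWKS. The sketch below assumes that a JWT library has already done the signature step and only enforces the required-claim list, expiry, and the unauthenticated health endpoints; `check_claims`, `requires_jwt`, `AuthError`, and the `leeway_s` parameter are hypothetical names.

```python
import time
from typing import Optional

# Claims this section lists as required. Signature verification against the
# identity-service JWKS is assumed to have happened before this point.
REQUIRED_CLAIMS = ("sub", "tid", "roles", "iat", "exp", "scope")
UNAUTHENTICATED_PATHS = frozenset({"/healthz", "/readyz"})


class AuthError(Exception):
    """Raised when a request must be rejected as unauthenticated."""


def requires_jwt(path: str) -> bool:
    return path not in UNAUTHENTICATED_PATHS


def check_claims(claims: dict, now: Optional[float] = None, leeway_s: int = 0) -> dict:
    missing = [name for name in REQUIRED_CLAIMS if name not in claims]
    if missing:
        raise AuthError("missing claims: " + ", ".join(missing))
    now = time.time() if now is None else now
    if claims["exp"] + leeway_s <= now:
        raise AuthError("token expired")
    if claims["iat"] - leeway_s > now:
        raise AuthError("token issued in the future")
    return claims
```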

3. Authorization

3.1 Role Matrix

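| Role | /search | /suggest | /recommendations/* | /reindex | /rebuild-embeddings | /debug/explain |
|---|---|---|---|---|---|---|
| learner | yes | yes | self | no | no | no |
| instructor | yes | yes | self | no | no | no |
| manager | yes | yes | self + reports | no | no | no |
| tenant-admin | yes | yes | any in tenant | yes | no | no |
| platform-admin | yes (cross-tenant) | n/a | n/a | yes | yes | yes |
| service-account (projector) | n/a | n/a | n/a | n/a | n/a | n/a |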

Platform-admin cross-tenant access is strictly audited.
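The role matrix translates directly into a default-deny lookup table. This is a sketch under stated assumptions: the cell constants and `is_allowed` / `is_allowed_for` are hypothetical, "no" and "n/a" are both treated as deny, and the tenant boundary itself is assumed to be enforced separately (by the query-time filter and operator-plane checks), so this function only decides role × endpoint × subject.

```python
# Cell values taken from the role matrix; the constant names are hypothetical.
YES, NO, NA = "yes", "no", "n/a"
CROSS_TENANT, SELF, SELF_AND_REPORTS, ANY_IN_TENANT = (
    "cross-tenant", "self", "self+reports", "any-in-tenant")

ENDPOINTS = ("/search", "/suggest", "/recommendations/*",
             "/reindex", "/rebuild-embeddings", "/debug/explain")

_ROWS = {
    "learner":         (YES, YES, SELF, NO, NO, NO),
    "instructor":      (YES, YES, SELF, NO, NO, NO),
    "manager":         (YES, YES, SELF_AND_REPORTS, NO, NO, NO),
    "tenant-admin":    (YES, YES, ANY_IN_TENANT, YES, NO, NO),
    "platform-admin":  (CROSS_TENANT, NA, NA, YES, YES, YES),
    "service-account": (NA,) * len(ENDPOINTS),
}
MATRIX = {role: dict(zip(ENDPOINTS, row)) for role, row in _ROWS.items()}


def is_allowed(role, endpoint, actor_id, subject_id=None, reports=frozenset()):
    """Default-deny check of one role against one endpoint."""
    rule = MATRIX.get(role, {}).get(endpoint, NO)  # unknown role or endpoint: deny
    if rule in (YES, CROSS_TENANT, ANY_IN_TENANT):
        return True
    if rule == SELF:
        return subject_id == actor_id
    if rule == SELF_AND_REPORTS:
        return subject_id == actor_id or subject_id in reports
    return False  # "no" and "n/a" both deny


def is_allowed_for(roles, endpoint, actor_id, subject_id=None, reports=frozenset()):
    """A JWT carries a roles list; any role granting access is sufficient."""
    return any(is_allowed(r, endpoint, actor_id, subject_id, reports) for r in roles)
```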

3.2 Query-Time Filter

Every inbound query is augmented (before reaching OpenSearch) with:

```
tenantId = <actor.tid>
AND (visibility IN applicableTo(actor.roles))
AND deletedAt NOT EXISTS
```

For marketplace queries (e.g. marketplace discovery), the tenantId filter is replaced by visibility = marketplace plus an additional region filter.
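One way to make the augmentation hold regardless of what the caller sent is to wrap the user's query in an OpenSearch bool query whose filter and must_not clauses carry the scope. A minimal sketch, assuming the field names tenantId, visibility, deletedAt, and region as used in this document, and assuming applicableTo(actor.roles) has already been resolved into the `visibilities` list; `build_secured_query` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Sequence


def build_secured_query(user_query: Dict[str, Any], tenant_id: str,
                        visibilities: Sequence[str],
                        marketplace_region: Optional[str] = None) -> Dict[str, Any]:
    """Wrap the caller's query so the scope filters always apply, whatever it contains."""
    if marketplace_region is None:
        if not tenant_id or not visibilities:
            raise ValueError("tenant scope requires tenantId and at least one visibility")
        scope = [
            {"term": {"tenantId": tenant_id}},
            {"terms": {"visibility": list(visibilities)}},
        ]
    else:
        # Marketplace discovery: the tenantId filter is replaced by visibility + region.
        scope = [
            {"term": {"visibility": "marketplace"}},
            {"term": {"region": marketplace_region}},
        ]
    return {
        "bool": {
            "must": [user_query],      # the caller's scored relevance clause
            "filter": scope,           # non-scoring clauses that every hit must match
            "must_not": [{"exists": {"field": "deletedAt"}}],
        }
    }
```

Nesting the user query inside must (rather than merging dictionaries) means nothing the caller supplies can remove or override the filter clauses.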

Cross-tenant search is a distinct, audited endpoint used only by platform-admin tooling.

4. Tenant Isolation

4.1 Data plane

  • Every OpenSearch query carries a tenantId filter, enforced at the port layer.
  • The top-N tenants get dedicated per-tenant aliases, with zero index-level sharing.
  • Postgres search schema uses RLS on every table (USING (tenant_id = current_setting('app.tenant_id')::uuid)).
  • pgvector calls to ai-gateway carry tenantId header; ai-gateway-side RLS enforces isolation.
  • Redis keys prefixed search:{tenantId}:…; key scan restricted in code.
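The Redis prefix rule is easiest to keep when keys and SCAN patterns are only ever produced by one helper. A sketch assuming tenant IDs are UUIDs (consistent with the ::uuid cast in the RLS policy); `redis_key`, `scan_pattern`, and the forbidden-character set are hypothetical choices, meant to stop a segment from escaping its prefix or turning into a glob.

```python
import uuid

# Characters that would let a caller break out of its segment or turn a key into a
# glob pattern for SCAN MATCH.
_FORBIDDEN = frozenset(":*?[]\\")


def _tenant_segment(tenant_id: str) -> str:
    # uuid.UUID raises ValueError for non-UUIDs; str() yields one canonical
    # lower-case spelling so the same tenant never maps to two prefixes.
    return str(uuid.UUID(tenant_id))


def _checked(parts) -> tuple:
    for part in parts:
        if not part or _FORBIDDEN.intersection(part):
            raise ValueError(f"illegal key segment: {part!r}")
    return tuple(parts)


def redis_key(tenant_id: str, *parts: str) -> str:
    if not parts:
        raise ValueError("a key needs at least one segment after the tenant")
    return ":".join(("search", _tenant_segment(tenant_id), *_checked(parts)))


def scan_pattern(tenant_id: str, *parts: str) -> str:
    """SCAN MATCH pattern that can only cover keys inside one tenant's prefix."""
    return ":".join(("search", _tenant_segment(tenant_id), *_checked(parts), "*"))
```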

4.2 Observability plane

  • Logs redact PII fields; tenantId allowed.
  • Traces are tagged with tenant.id; spans from different tenants cannot be joined in the UI (enforced at the Grafana datasource level).

4.3 Operator plane

  • Reindex and rebuild endpoints require the caller's tid to equal the target tenant (except platform-admin).
  • Platform-admin actions emit platform.admin.action.performed.v1 audit events.
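The operator-plane rule (same tenant unless platform-admin, and every platform-admin action audited) fits in a single guard. A minimal sketch: the event name comes from this section, while `Actor`, `authorize_operator_action`, the `emit` callback, and the payload field names are hypothetical. The role-level permission to call /reindex or /rebuild-embeddings is assumed to be checked separately against the role matrix.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Actor:
    id: str
    tid: str
    roles: FrozenSet[str]


def authorize_operator_action(actor: Actor, target_tenant: str, action: str,
                              emit: Callable[[dict], None]) -> None:
    """Enforce the same-tenant rule for reindex/rebuild calls."""
    is_platform_admin = "platform-admin" in actor.roles
    if not is_platform_admin and actor.tid != target_tenant:
        raise PermissionError("caller tid does not match target tenant")
    if is_platform_admin:
        # Event type from this document; payload fields are hypothetical.
        emit({
            "type": "platform.admin.action.performed.v1",
            "actorId": actor.id,
            "targetTenant": target_tenant,
            "action": action,
        })
```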

5. Input Validation

| Surface | Validation |
|---|---|
| Query string q | length ≤ 512; UTF-8 well-formed; reject control chars except tab |
| Filter DSL | Parsed by typed AST parser; unknown ops rejected |
| hybridAlpha | 0.0..1.0 bounded |
| page.size | 1..100 |
| type | strict enum |
| locale | BCP47 regex |
| Reindex body | JSON schema validated |
| Feedback body | schema + verified generationId belongs to user |

Zod or JSON-Schema validators live at the HTTP layer; every use case also revalidates domain invariants.
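For illustration, the scalar rows of the table can be expressed as a plain validator (sketched in Python here rather than Zod). Assumptions to note: the 512 limit is taken to count code points, the locale regex is a simplified BCP 47 subset (language, optional script, optional region), the type enum is passed in because its values are not listed here, and all function names are hypothetical. Filter DSL parsing is not covered.

```python
import re
import unicodedata
from typing import AbstractSet

MAX_Q_CHARS = 512  # assumption: the limit counts code points, not bytes
# Simplified BCP 47 subset: language[-Script][-REGION]. Full RFC 5646 also allows
# variants, extensions and private-use subtags.
_LOCALE_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?")


class ValidationError(ValueError):
    pass


def validate_q(raw: bytes) -> str:
    try:
        q = raw.decode("utf-8")  # strict decoding rejects malformed UTF-8
    except UnicodeDecodeError as exc:
        raise ValidationError("q is not well-formed UTF-8") from exc
    if len(q) > MAX_Q_CHARS:
        raise ValidationError("q is longer than 512 characters")
    if any(unicodedata.category(ch) == "Cc" and ch != "\t" for ch in q):
        raise ValidationError("q contains control characters")
    return q


def validate_search_params(raw_q: bytes, hybrid_alpha: float, page_size: int,
                           locale: str, type_: str, allowed_types: AbstractSet[str]) -> dict:
    q = validate_q(raw_q)
    alpha_ok = (isinstance(hybrid_alpha, (int, float)) and not isinstance(hybrid_alpha, bool)
                and 0.0 <= hybrid_alpha <= 1.0)  # NaN fails the range check as well
    if not alpha_ok:
        raise ValidationError("hybridAlpha must be within 0.0..1.0")
    if not isinstance(page_size, int) or isinstance(page_size, bool) or not 1 <= page_size <= 100:
        raise ValidationError("page.size must be within 1..100")
    if not isinstance(locale, str) or not _LOCALE_RE.fullmatch(locale):
        raise ValidationError("locale is not an accepted BCP 47 tag")
    if type_ not in allowed_types:
        raise ValidationError("type is not in the allowed enum")
    return {"q": q, "hybridAlpha": float(hybrid_alpha), "pageSize": page_size,
            "locale": locale, "type": type_}
```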

6. Secrets & Credentials

  • OpenSearch credentials: short-lived IAM tokens via SigV4 (or mTLS on self-hosted cluster).
  • ai-gateway service token: 15-min JWT rotated via spiffe/SPIRE.
  • Redis ACLs: dedicated user for search-service, restricted key prefix search:*.
  • Postgres creds: Vault-issued, 1h TTL.
  • No secrets in env files in prod; dev uses a .env.local file that is never committed.

7. Rate Limiting & Quotas

Per API_CONTRACTS.md §11. Additionally:

  • Suspect patterns (regex, wildcard-leading, deeply paginated) rate-capped tighter.
  • Per-tenant embedding budget enforced by ai-gateway; exceed → lexical-only fallback.
  • Per-IP anomaly detection (100× baseline) triggers soft block.
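The two extra controls can be sketched as a classifier that routes suspect queries to a tighter rate-limit bucket, plus the 100× baseline soft-block test. This is a heuristic sketch only: the offset threshold, the baseline floor, and the function names are hypothetical, and the regex check only recognizes Lucene-style /.../ terms without spaces.

```python
# HYPOTHETICAL thresholds; the policy only names the patterns and the 100x multiplier.
MAX_UNSUSPICIOUS_OFFSET = 1_000
ANOMALY_MULTIPLIER = 100
MIN_BASELINE_RPS = 1.0  # floor so a near-zero baseline does not block on a few requests


def is_suspect_query(q: str, offset: int) -> bool:
    """Flag regex, leading-wildcard, or deeply paginated queries for a tighter cap."""
    for token in q.split():
        if token[0] in "*?":  # wildcard-leading term
            return True
        if len(token) > 2 and token[0] == "/" and token[-1] == "/":  # /regex/ term
            return True
    return offset > MAX_UNSUSPICIOUS_OFFSET


def should_soft_block(ip_rps: float, baseline_rps: float) -> bool:
    """Soft-block an IP whose request rate reaches 100x its baseline."""
    return ip_rps >= ANOMALY_MULTIPLIER * max(baseline_rps, MIN_BASELINE_RPS)
```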

8. Audit Logging

Audit entries written for:

  • Every /reindex or /rebuild-embeddings invocation.
  • Every platform-admin cross-tenant query.
  • Every document with visibility=public indexed or tombstoned.
  • Every recommendation feedback (for L2R integrity).

Audit schema:

```json
{
  "type": "search.audit",
  "actor": { "id": "...", "roles": [] },
  "tenantId": "...",
  "action": "reindex | cross-tenant-query | public-doc-index | feedback-recorded",
  "target": { "kind": "...", "id": "..." },
  "at": "...",
  "traceId": "..."
}
```

Retained 7 years; routed to analytics-service audit workspace.
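A builder that only accepts the four action values from the schema keeps audit records uniform. A minimal sketch: the field layout follows the schema above, while the ISO-8601 UTC timestamp format for "at" and the name `build_audit_event` are assumptions.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Action values taken from the audit schema.
AUDIT_ACTIONS = frozenset({"reindex", "cross-tenant-query", "public-doc-index", "feedback-recorded"})


def build_audit_event(actor_id: str, roles: Iterable[str], tenant_id: str, action: str,
                      target_kind: str, target_id: str, trace_id: str,
                      at: Optional[datetime] = None) -> dict:
    if action not in AUDIT_ACTIONS:
        raise ValueError(f"unknown audit action: {action!r}")
    at = at or datetime.now(timezone.utc)
    return {
        "type": "search.audit",
        "actor": {"id": actor_id, "roles": list(roles)},
        "tenantId": tenant_id,
        "action": action,
        "target": {"kind": target_kind, "id": target_id},
        "at": at.isoformat(),  # assumption: ISO-8601 timestamp with UTC offset
        "traceId": trace_id,
    }
```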

9. Encryption

| Layer | At rest | In transit |
|---|---|---|
| OpenSearch | AES-256 via cloud KMS | TLS 1.3 |
| Postgres | TDE | TLS 1.3 |
| Redis | Disk encryption | TLS 1.2+ |
| NATS | Per-stream encryption | mTLS |
| Client offline store | SQLCipher / WebCrypto | n/a |

All TLS inside mesh pinned to service identity (spiffe).
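On the client side, "TLS 1.3 + pinned to service identity" can be sketched with Python's ssl module: require TLS 1.3 and a verified peer certificate, then compare the peer's SPIFFE ID (a URI SAN) with the expected workload. Assumptions: the SPIRE trust bundle is available as a CA file, presenting the client's own SVID (load_cert_chain) is omitted, and the function names are hypothetical. Production mesh sidecars or SPIFFE libraries normally handle this.

```python
import ssl
from typing import List, Optional


def mesh_client_context(trust_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Client context that refuses anything below TLS 1.3 and requires a peer cert."""
    ctx = ssl.create_default_context(cafile=trust_bundle)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # X.509-SVIDs identify workloads by a spiffe:// URI SAN rather than a DNS name,
    # so hostname checking is replaced by the explicit SPIFFE ID check below.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def peer_spiffe_ids(peercert: dict) -> List[str]:
    """Extract spiffe:// URI SANs from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "URI" and value.startswith("spiffe://")]


def check_peer_identity(peercert: dict, expected_id: str) -> None:
    # An X.509-SVID carries exactly one SPIFFE ID; anything else is rejected.
    if peer_spiffe_ids(peercert) != [expected_id]:
        raise PermissionError("peer SPIFFE ID does not match the expected service")
```

After the handshake, the caller would pass `sock.getpeercert()` and the expected peer ID to `check_peer_identity` before sending any request.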

10. Compliance Obligations

  • GDPR erasure: On tenant.user.removed.v1, search-service deletes the user document, the user-profile embedding, and the query history within 30 days (the audit trail is kept).
  • Data residency: Enforced via region field; never ship records cross-region except for the global marketplace slice.
  • COPPA: Queries for minor-flagged tenants exclude visibility=marketplace unless explicitly allow-listed.
  • Accessibility: Suggest and explain responses respect locale + RTL.

11. Security Testing

  • SAST (CodeQL) on every PR.
  • DAST against staging /search + /suggest weekly.
  • Fuzzing q + filter DSL 1h/day in CI.
  • Annual pen test; scope includes the cross-tenant boundary.
  • Chaos security drill: force-delete tenant → verify index purge within SLO.

12. Incident Response

| Scenario | Action | SLA |
|---|---|---|
| Confirmed cross-tenant leak | Disable offending query path; forensic replay; notify compliance | 1h |
| PII found in public embeddings | Purge vectors; rebuild affected doc; notify DPO | 4h |
| OpenSearch credential leak | Rotate IAM; rotate cluster; audit 30d access | 1h |
| DLQ > 0 | Triage poison messages; fix projector; replay | 24h |

Runbooks in FAILURE_MODES.md.

13. Security Checklist (pre-merge)

  • All new endpoints covered by authz tests (both positive and negative).
  • No string interpolation into DSL/SQL.
  • Every query path provably injects tenantId filter (reviewed via tenancy lint).
  • PII paths run through sanitizer.
  • Rate-limit rule defined for any new endpoint.
  • Secrets read from Vault/config, not code.
  • Audit event defined for any privileged action.
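The "no string interpolation into DSL" item, together with the typed AST rule in the threat model, can be illustrated with a tiny filter language: client JSON is parsed into frozen dataclasses, unknown ops and non-allow-listed fields are rejected, and the AST compiles to OpenSearch term/terms clauses where user values stay JSON data. The node shapes (`{"op": ..., "field": ..., ...}`), the field allow-list, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple, Union

# HYPOTHETICAL allow-list. tenantId, visibility and deletedAt are deliberately absent,
# so a client filter can never touch the server-injected scope.
FILTERABLE_FIELDS = frozenset({"type", "locale", "tags"})


@dataclass(frozen=True)
class Eq:
    field: str
    value: str


@dataclass(frozen=True)
class In:
    field: str
    values: Tuple[str, ...]


@dataclass(frozen=True)
class And:
    clauses: Tuple["Filter", ...]


Filter = Union[Eq, In, And]


def _field(node: Dict[str, Any]) -> str:
    name = node.get("field")
    if not isinstance(name, str) or name not in FILTERABLE_FIELDS:
        raise ValueError(f"field not filterable: {name!r}")
    return name


def parse_filter(node: Any) -> Filter:
    if not isinstance(node, dict):
        raise ValueError("filter node must be an object")
    op = node.get("op")
    if op == "eq" and isinstance(node.get("value"), str):
        return Eq(_field(node), node["value"])
    values = node.get("values")
    if op == "in" and isinstance(values, list) and values and all(isinstance(v, str) for v in values):
        return In(_field(node), tuple(values))
    clauses = node.get("clauses")
    if op == "and" and isinstance(clauses, list) and clauses:
        return And(tuple(parse_filter(c) for c in clauses))
    raise ValueError(f"unsupported filter node: op={op!r}")  # unknown ops are rejected


def to_opensearch(f: Filter) -> Dict[str, Any]:
    """User values become JSON data inside term/terms clauses, never query-string text."""
    if isinstance(f, Eq):
        return {"term": {f.field: f.value}}
    if isinstance(f, In):
        return {"terms": {f.field: list(f.values)}}
    return {"bool": {"filter": [to_opensearch(c) for c in f.clauses]}}
```

A value such as "x OR tenantId:*" is compiled into an exact-match term on the allowed field, so Lucene syntax inside it is never interpreted.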