Security

:::info Source
Sourced from services/search-service/SECURITY_MODEL.md in the documentation repo.
:::

Inherits platform baselines from docs/13-security-compliance-tenancy.md.

Security in search is about three things: tenant isolation, authorization on every hit, and content safety in the index.

1. Threat Model (Abbreviated)

Threat	Impact	Mitigation
Cross-tenant leak via query	HIGH	Mandatory `tenantId` filter injected at the port layer; unit, integration, and fuzz tests cover every adapter call path
Bypass via direct OpenSearch access	CRITICAL	OpenSearch is not internet-exposed; it sits inside the VPC; no direct clients outside search-service
Query injection (Lucene)	HIGH	Never concatenate raw user input into Lucene DSL; queries are built via a typed AST that compiles to OpenSearch DSL
Embedding leak (PII in vectors)	HIGH	Sanitizer runs before embedding for `visibility ∈ {marketplace, public}`; a tenant's vectors are never served to other tenants
DoS via expensive queries	MED	Query cost analyzer (wildcard/regex rejection); per-actor quotas; 5s query timeout
Prompt injection via LLM path	MED	Template-guided prompts; LLM output validated against a schema and never executed
Reindex abuse (resource exhaust)	MED	Rate limit of 2 reindexes per tenant per day; quota scaled by org size
Stale visibility (document should be private but shows up)	HIGH	Visibility events prioritized in consumer; reindex on demotion
Recommendation feedback spoofing	MED	Signed generationId; only the recipient user can feedback
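
The "signed generationId" mitigation can be illustrated with an HMAC that binds a generation to its recipient. This is a minimal sketch under stated assumptions, not the service's actual scheme: the key, the message layout, and the function names `sign_generation` and `may_submit_feedback` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would come from a secret store.
SIGNING_KEY = b"example-key-not-for-production"


def sign_generation(generation_id: str, user_id: str, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag binding generation_id to its recipient."""
    # A newline separator is an assumption; it keeps ("a", "b:c") and ("a:b", "c") distinct.
    message = f"{generation_id}\n{user_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def may_submit_feedback(generation_id: str, user_id: str, tag: str,
                        key: bytes = SIGNING_KEY) -> bool:
    """Accept feedback only if the tag matches this exact (generation, user) pair."""
    expected = sign_generation(generation_id, user_id, key)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, tag)
```

Because the user id is part of the signed message, a tag issued for one user does not verify for another, which is the property the threat table asks for.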

2. Authentication

All endpoints except /healthz and /readyz require a valid JWT.
JWT validated against identity-service JWKS (cached 1h).
Required claims: sub, tid, roles, iat, exp, scope.
Service-to-service calls (for example, projector → ai-gateway) use mTLS plus a service-account JWT.
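
The required-claims rule above can be sketched as a check that runs after signature verification. The sketch assumes the token has already been verified against the identity-service JWKS and decoded into a dict; `check_claims` is a hypothetical helper, and it applies no clock-skew leeway.

```python
import time
from typing import List, Optional

REQUIRED_CLAIMS = ("sub", "tid", "roles", "iat", "exp", "scope")


def check_claims(claims: dict, now: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the claims pass."""
    now = time.time() if now is None else now
    problems = [f"missing claim: {name}" for name in REQUIRED_CLAIMS if name not in claims]
    if problems:
        return problems
    # A token is only valid strictly before its expiry time.
    if claims["exp"] <= now:
        problems.append("token expired")
    if claims["iat"] > now:
        problems.append("token issued in the future")
    return problems
```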

3. Authorization

3.1 Role Matrix

Role	`/search`	`/suggest`	`/recommendations/*`	`/reindex`	`/rebuild-embeddings`	`/debug/explain`
learner	yes	yes	self	no	no	no
instructor	yes	yes	self	no	no	no
manager	yes	yes	self + reports	no	no	no
tenant-admin	yes	yes	any in tenant	yes	no	no
platform-admin	yes (cross-tenant)	n/a	n/a	yes	yes	yes
service-account (projector)	n/a	n/a	n/a	n/a	n/a	n/a

Platform-admin cross-tenant access is audited: every cross-tenant query writes an audit entry (see 8. Audit Logging).
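
One way to keep the matrix testable is to hold it as data and derive decisions from it. The sketch below copies the cells from the table verbatim; treating both "no" and "n/a" as deny, and the helper names `grants_for` and `is_allowed`, are assumptions made for illustration.

```python
from typing import Iterable, List

ENDPOINTS = ("/search", "/suggest", "/recommendations/*",
             "/reindex", "/rebuild-embeddings", "/debug/explain")

# One row per role, in the same column order as ENDPOINTS.
ROLE_MATRIX = {
    "learner": ("yes", "yes", "self", "no", "no", "no"),
    "instructor": ("yes", "yes", "self", "no", "no", "no"),
    "manager": ("yes", "yes", "self + reports", "no", "no", "no"),
    "tenant-admin": ("yes", "yes", "any in tenant", "yes", "no", "no"),
    "platform-admin": ("yes (cross-tenant)", "n/a", "n/a", "yes", "yes", "yes"),
    "service-account (projector)": ("n/a",) * 6,
}

DENY = {"no", "n/a"}  # assumption: "n/a" is handled the same way as "no"


def grants_for(roles: Iterable[str], endpoint: str) -> List[str]:
    """Collect the non-deny cells that the actor's roles give for an endpoint."""
    column = ENDPOINTS.index(endpoint)  # raises ValueError for unknown endpoints
    grants = []
    for role in roles:
        row = ROLE_MATRIX.get(role)  # unknown roles grant nothing
        if row is not None and row[column] not in DENY:
            grants.append(row[column])
    return grants


def is_allowed(roles: Iterable[str], endpoint: str) -> bool:
    """True if at least one role grants any access to the endpoint."""
    return bool(grants_for(roles, endpoint))
```

Returning the raw cell (for example "self + reports") leaves the scope decision to the caller, which still has to check whose recommendations are being requested.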

3.2 Query-Time Filter

Every inbound query is augmented (before reaching OpenSearch) with:

tenantId = <actor.tid>
AND (visibility IN applicableTo(actor.roles))
AND deletedAt NOT EXISTS

For visibility=marketplace queries (e.g. marketplace discovery), the tenantId filter is replaced with visibility=marketplace plus an additional region filter.

Cross-tenant search is a distinct, audited endpoint used only by platform-admin tooling.
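
The tenant-scoped augmentation can be sketched as a wrapper around the caller's query, expressed as an OpenSearch `bool` query. The function name `with_mandatory_filters` is hypothetical, and the caller is assumed to have already resolved `applicableTo(actor.roles)` into a list of visibility values, since that mapping is not defined in this document.

```python
from typing import Any, Dict, List


def with_mandatory_filters(user_query: Dict[str, Any], tenant_id: str,
                           visibilities: List[str]) -> Dict[str, Any]:
    """Wrap user_query so tenant, visibility, and soft-delete filters always apply."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    if not visibilities:
        raise ValueError("at least one visibility is required")
    return {
        "bool": {
            # The user's query is nested, never merged, so it cannot override the filters.
            "must": [user_query],
            "filter": [
                {"term": {"tenantId": tenant_id}},
                {"terms": {"visibility": list(visibilities)}},
            ],
            # deletedAt NOT EXISTS
            "must_not": [{"exists": {"field": "deletedAt"}}],
        }
    }
```

Wrapping instead of editing the inbound query means a caller-supplied `filter` or `must_not` clause stays inside the inner query and cannot widen the result set.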

4. Tenant Isolation

4.1 Data plane

Every OpenSearch index is queried with a tenantId filter (enforced at the port layer).
Per-tenant dedicated aliases for top-N tenants, with zero index-level sharing.
Postgres search schema uses RLS on every table (USING (tenant_id = current_setting('app.tenant_id')::uuid)).
pgvector calls to ai-gateway carry tenantId header; ai-gateway-side RLS enforces isolation.
Redis keys prefixed search:{tenantId}:…; key scan restricted in code.
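
Restricting key scans "in code" usually starts with a single place that builds keys. The helper below is a sketch under assumptions: `tenant_key` is a hypothetical name, tenant ids are assumed to be UUIDs (as the `::uuid` cast in the RLS policy suggests), and the braces in `search:{tenantId}:…` are read as a placeholder, not as literal characters.

```python
import uuid


def tenant_key(tenant_id: str, *parts: str) -> str:
    """Build a Redis key under the search:<tenantId>: prefix."""
    # uuid.UUID raises ValueError on malformed ids, so "*" or "" can never become a prefix.
    canonical = str(uuid.UUID(tenant_id))
    for part in parts:
        # Reject separators and glob characters that could escape the tenant prefix.
        if not part or any(ch in part for ch in ":*?[]"):
            raise ValueError(f"unsafe key segment: {part!r}")
    return ":".join(["search", canonical, *parts])
```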

4.2 Observability plane

Logs redact PII fields; tenantId allowed.
Traces are tagged with tenant.id; spans from different tenants cannot be joined in the UI (enforced at the Grafana datasource level).

4.3 Operator plane

Reindex and rebuild endpoints require the caller's tid to equal the target tenant (except platform-admin).
Platform-admin actions emit platform.admin.action.performed.v1 audit events.
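
The tenant-match rule for operator endpoints reduces to a small predicate. This sketch covers only that rule; the role check from the role matrix is assumed to run separately, and `may_target_tenant` is a hypothetical name.

```python
from typing import Iterable


def may_target_tenant(caller_tid: str, caller_roles: Iterable[str],
                      target_tenant_id: str) -> bool:
    """Allow reindex/rebuild only on the caller's own tenant, unless platform-admin."""
    if "platform-admin" in set(caller_roles):
        return True
    # An empty tid never matches, even against an empty target.
    return bool(caller_tid) and caller_tid == target_tenant_id
```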

5. Input Validation

Surface	Validation
Query string `q`	length ≤ 512; UTF-8 well-formed; reject control chars except tab
Filter DSL	Parsed by typed AST parser; unknown ops rejected
`hybridAlpha`	`0.0..1.0` bounded
`page.size`	`1..100`
`type`	strict enum
`locale`	BCP47 regex
Reindex body	JSON schema validated
Feedback body	schema validated; signed generationId verified to belong to the user

Zod or JSON-Schema validators live at the HTTP layer; every use case also revalidates domain invariants.
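
Three of the rules in the table can be written out directly. The sketch below is illustrative and does not replace the Zod or JSON-Schema validators: `validate_search_params` is a hypothetical helper, and it assumes the 512 limit counts characters, which the table does not specify.

```python
import unicodedata
from typing import List

MAX_Q_LENGTH = 512  # assumption: measured in characters, not bytes


def validate_search_params(q: str, hybrid_alpha: float, page_size: int) -> List[str]:
    """Return validation errors for q, hybridAlpha, and page.size; empty means valid."""
    errors = []
    if len(q) > MAX_Q_LENGTH:
        errors.append("q: longer than 512 characters")
    try:
        q.encode("utf-8")  # fails on lone surrogates, i.e. ill-formed input
    except UnicodeEncodeError:
        errors.append("q: not well-formed UTF-8")
    # Unicode category "Cc" is the control-character class; tab is the one exception.
    if any(unicodedata.category(ch) == "Cc" and ch != "\t" for ch in q):
        errors.append("q: control character other than tab")
    # The chained comparison is False for NaN, so NaN is rejected too.
    if not 0.0 <= hybrid_alpha <= 1.0:
        errors.append("hybridAlpha: must be within 0.0..1.0")
    if not 1 <= page_size <= 100:
        errors.append("page.size: must be within 1..100")
    return errors
```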

6. Secrets & Credentials

OpenSearch credentials: short-lived IAM tokens via SigV4 (or mTLS on self-hosted cluster).
ai-gateway service token: 15-min JWT rotated via SPIFFE/SPIRE.
Redis ACLs: dedicated user for search-service, restricted key prefix search:*.
Postgres creds: Vault-issued, 1h TTL.
No secrets in env files in prod; dev uses .env.local, which is never committed.

7. Rate Limiting & Quotas

Per API_CONTRACTS.md §11. Additionally:

Suspect patterns (regex, wildcard-leading, deeply paginated) rate-capped tighter.
Per-tenant embedding budget enforced by ai-gateway; when it is exceeded, queries fall back to lexical-only search.
Per-IP anomaly detection (100× baseline) triggers a soft block.
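
Detecting the suspect patterns named above can be sketched with a few heuristics over the raw query string. Everything here is an assumption made for illustration: the function name `suspect_patterns`, the `MAX_OFFSET` threshold, and the regex used to spot Lucene-style `/.../` literals. A real cost analyzer would more likely inspect the parsed AST.

```python
import re
from typing import List

MAX_OFFSET = 10_000  # hypothetical threshold for "deeply paginated"

# A /.../ literal at the start, or after whitespace, a colon, or an opening parenthesis.
_REGEX_LITERAL = re.compile(r"(?:^|[\s:(])/[^/\s][^/]*/")


def suspect_patterns(q: str, offset: int = 0) -> List[str]:
    """Return the suspect-pattern labels that apply to a query."""
    flags: List[str] = []
    for term in q.split():
        # Drop an optional field prefix such as "title:" before looking at the value.
        value = term.rsplit(":", 1)[-1]
        if value.startswith(("*", "?")):
            flags.append("wildcard-leading")
            break
    if _REGEX_LITERAL.search(q):
        flags.append("regex")
    if offset > MAX_OFFSET:
        flags.append("deep-pagination")
    return flags
```

A non-empty result would then select the tighter rate cap instead of rejecting the request outright.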

8. Audit Logging

Audit entries written for:

Every /reindex or /rebuild-embeddings invocation.
Every platform-admin cross-tenant query.
Every document with visibility=public indexed or tombstoned.
Every recommendation feedback (for L2R integrity).

Audit schema:

{
  "type": "search.audit",
  "actor": { "id": "...", "roles": [] },
  "tenantId": "...",
  "action": "reindex | cross-tenant-query | public-doc-index | feedback-recorded",
  "target": { "kind": "...", "id": "..." },
  "at": "...",
  "traceId": "..."
}

Retained 7 years; routed to analytics-service audit workspace.
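
A small constructor keeps emitted entries aligned with the schema above. This is a sketch: `audit_event` is a hypothetical helper, and the ISO 8601 UTC format for `at` is an assumption, since the schema leaves the timestamp format open.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

ACTIONS = {"reindex", "cross-tenant-query", "public-doc-index", "feedback-recorded"}


def audit_event(actor_id: str, roles: Iterable[str], tenant_id: str, action: str,
                target_kind: str, target_id: str, trace_id: str,
                at: Optional[datetime] = None) -> dict:
    """Build one search.audit entry with the fields listed in the audit schema."""
    if action not in ACTIONS:
        raise ValueError(f"unknown audit action: {action}")
    at = at or datetime.now(timezone.utc)
    return {
        "type": "search.audit",
        "actor": {"id": actor_id, "roles": list(roles)},
        "tenantId": tenant_id,
        "action": action,
        "target": {"kind": target_kind, "id": target_id},
        "at": at.isoformat(),  # assumed format: ISO 8601 with UTC offset
        "traceId": trace_id,
    }
```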

9. Encryption

Layer	At rest	In transit
OpenSearch	AES-256 via cloud KMS	TLS 1.3
Postgres	TDE	TLS 1.3
Redis	Disk encryption	TLS 1.2+
NATS	per-stream encryption	mTLS
Client offline store	SQLCipher / WebCrypto	n/a

All TLS inside the mesh is pinned to service identity (SPIFFE).

10. Compliance Obligations

GDPR erasure: On tenant.user.removed.v1, search-service deletes the user document, user profile embedding, and query history within 30 days (the audit trail is kept).
Data residency: Enforced via region field; never ship records cross-region except for the global marketplace slice.
COPPA: Queries for minor-flagged tenants exclude visibility=marketplace unless explicitly allow-listed.
Accessibility: Suggest and explain responses respect locale + RTL.

11. Security Testing

SAST (CodeQL) on every PR.
DAST against staging /search + /suggest weekly.
Fuzzing of q and the filter DSL runs 1h/day in CI.
Pen test annually — scope includes cross-tenant boundary.
Chaos security drill: force-delete tenant → verify index purge within SLO.

12. Incident Response

Scenario	Action	SLA
Confirmed cross-tenant leak	Disable offending query path; forensic replay; notify compliance	1h
PII found in public embeddings	Purge vectors; rebuild affected doc; notify DPO	4h
OpenSearch credential leak	Rotate IAM; rotate cluster; audit 30d access	1h
DLQ > 0	Triage poison messages; fix projector; replay	24h

Runbooks in FAILURE_MODES.md.

13. Security Checklist (pre-merge)

All new endpoints covered by authz tests (both positive and negative).
No string interpolation into DSL/SQL.
Every query path provably injects tenantId filter (reviewed via tenancy lint).
PII paths run through sanitizer.
Rate-limit rule defined for any new endpoint.
Secrets read from Vault/config, not code.
Audit event defined for any privileged action.
