Security
:::info Source
Sourced from services/search-service/SECURITY_MODEL.md in the documentation repo.
:::
Inherits platform baselines from docs/13-security-compliance-tenancy.md.
Security in search is about three things: tenant isolation, authorization on every hit, and content safety in the index.
1. Threat Model (Abbreviated)
| Threat | Impact | Mitigation |
|---|---|---|
| Cross-tenant leak via query | HIGH | Mandatory tenantId filter injected at port layer; unit + integration + fuzz test every adapter call path |
| Bypass via direct OpenSearch access | CRITICAL | OpenSearch not internet-exposed; inside VPC; no direct clients outside search-service |
| Query injection (Lucene) | HIGH | Never concatenate raw user input into Lucene DSL; query built via typed AST → OS DSL |
| Embedding leak (PII in vectors) | HIGH | Sanitizer runs before embedding for visibility ∈ {marketplace, public}; a tenant's vectors are never served to other tenants |
| DoS via expensive queries | MED | Query cost analyzer (wildcard/regex rejection); per-actor quotas; timeout 5s |
| Prompt injection via LLM path | MED | Template-guided prompts; LLM output validated via schema; never executed |
| Reindex abuse (resource exhaust) | MED | 2/day/tenant rate limit; quota per org size |
| Stale visibility (document should be private but shows up) | HIGH | Visibility events prioritized in consumer; reindex on demotion |
| Recommendation feedback spoofing | MED | Signed generationId; only the recipient user can submit feedback |
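The "typed AST → OS DSL" mitigation for query injection can be illustrated with a minimal sketch. The node shapes, the `compileFilter` name, and the `FILTERABLE_FIELDS` allow-list are hypothetical and are not taken from the service code. The point is only that user values end up as JSON values in `term`/`terms` clauses and are never spliced into a Lucene query string.

```typescript
// Hypothetical filter AST; node shapes and operator names are illustrative only.
type FilterNode =
  | { op: "eq"; field: string; value: string | number | boolean }
  | { op: "in"; field: string; values: Array<string | number> }
  | { op: "and"; children: FilterNode[] }
  | { op: "or"; children: FilterNode[] }
  | { op: "not"; child: FilterNode };

// Assumed allow-list of filterable fields; anything else is rejected.
const FILTERABLE_FIELDS = new Set<string>(["type", "locale", "tags", "visibility"]);

function assertFilterable(field: string): void {
  if (!FILTERABLE_FIELDS.has(field)) {
    throw new Error(`field not filterable: ${field}`);
  }
}

// Compile the AST into OpenSearch query DSL objects. User-supplied values only
// ever appear as JSON values, never inside a query string.
function compileFilter(node: FilterNode): Record<string, unknown> {
  switch (node.op) {
    case "eq":
      assertFilterable(node.field);
      return { term: { [node.field]: node.value } };
    case "in":
      assertFilterable(node.field);
      return { terms: { [node.field]: node.values } };
    case "and":
      return { bool: { filter: node.children.map(compileFilter) } };
    case "or":
      return { bool: { should: node.children.map(compileFilter), minimum_should_match: 1 } };
    case "not":
      return { bool: { must_not: [compileFilter(node.child)] } };
  }
}
```

Because `tenantId` is not in the allow-list of this sketch, a caller cannot override the tenant filter through the filter DSL.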
2. Authentication
- All endpoints except `/healthz` and `/readyz` require a valid JWT.
- JWT validated against identity-service JWKS (cached 1h).
- Required claims: `sub`, `tid`, `roles`, `iat`, `exp`, `scope`.
- S2S (projector → ai-gateway, for example) uses mTLS + service-account JWT.
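As a rough illustration of the required-claims rule, the sketch below checks a payload that has already passed signature verification. Verification against the identity-service JWKS is assumed to happen earlier and is not shown. The `checkClaims` name and the result shape are hypothetical.

```typescript
const REQUIRED_CLAIMS = ["sub", "tid", "roles", "iat", "exp", "scope"] as const;

type ClaimCheck = { ok: true } | { ok: false; reason: string };

// Checks a decoded, signature-verified JWT payload. nowSeconds is the current
// time as a Unix timestamp in seconds, the unit used by the iat and exp claims.
function checkClaims(payload: Record<string, unknown>, nowSeconds: number): ClaimCheck {
  for (const claim of REQUIRED_CLAIMS) {
    if (payload[claim] === undefined || payload[claim] === null) {
      return { ok: false, reason: `missing claim: ${claim}` };
    }
  }
  if (!Array.isArray(payload["roles"])) {
    return { ok: false, reason: "roles must be an array" };
  }
  const exp = payload["exp"];
  if (typeof exp !== "number" || exp <= nowSeconds) {
    return { ok: false, reason: "token expired" };
  }
  return { ok: true };
}
```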
3. Authorization
3.1 Role Matrix
| Role | /search | /suggest | /recommendations/* | /reindex | /rebuild-embeddings | /debug/explain |
|---|---|---|---|---|---|---|
| learner | yes | yes | self | no | no | no |
| instructor | yes | yes | self | no | no | no |
| manager | yes | yes | self + reports | no | no | no |
| tenant-admin | yes | yes | any in tenant | yes | no | no |
| platform-admin | yes (cross-tenant) | n/a | n/a | yes | yes | yes |
| service-account (projector) | n/a | n/a | n/a | n/a | n/a | n/a |
Every platform-admin cross-tenant query is written to the audit log (section 8).
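One way to keep the matrix reviewable is to encode it as data and derive every check from it. The sketch below is a hypothetical encoding, not the service's actual policy module. It omits `/search` and `/suggest` and leaves out the service-account row (all n/a). It also assumes the tenant match between actor and target is checked elsewhere.

```typescript
type SearchRole = "learner" | "instructor" | "manager" | "tenant-admin" | "platform-admin";
type RecScope = "self" | "self+reports" | "tenant" | "n/a";
type OperatorEndpoint = "reindex" | "rebuildEmbeddings" | "debugExplain";

interface RolePolicy {
  recommendations: RecScope;
  reindex: boolean;
  rebuildEmbeddings: boolean;
  debugExplain: boolean;
}

// Mirrors the role matrix table row by row.
const ROLE_MATRIX: Record<SearchRole, RolePolicy> = {
  learner: { recommendations: "self", reindex: false, rebuildEmbeddings: false, debugExplain: false },
  instructor: { recommendations: "self", reindex: false, rebuildEmbeddings: false, debugExplain: false },
  manager: { recommendations: "self+reports", reindex: false, rebuildEmbeddings: false, debugExplain: false },
  "tenant-admin": { recommendations: "tenant", reindex: true, rebuildEmbeddings: false, debugExplain: false },
  "platform-admin": { recommendations: "n/a", reindex: true, rebuildEmbeddings: true, debugExplain: true },
};

// An actor may hold several roles; access is granted if any role allows it.
function canInvoke(roles: SearchRole[], endpoint: OperatorEndpoint): boolean {
  return roles.some((role) => ROLE_MATRIX[role][endpoint]);
}

// reportIds is the (hypothetical) list of user ids reporting to the actor.
function canReadRecommendations(
  roles: SearchRole[],
  actorId: string,
  targetUserId: string,
  reportIds: string[],
): boolean {
  return roles.some((role): boolean => {
    switch (ROLE_MATRIX[role].recommendations) {
      case "tenant":
        return true; // tenant equality is assumed to be enforced separately
      case "self+reports":
        return targetUserId === actorId || reportIds.includes(targetUserId);
      case "self":
        return targetUserId === actorId;
      case "n/a":
        return false;
    }
  });
}
```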
3.2 Query-Time Filter
Every inbound query is augmented (before reaching OpenSearch) with:
tenantId = <actor.tid>
AND (visibility IN applicableTo(actor.roles))
AND deletedAt NOT EXISTS
For marketplace queries (e.g. marketplace discovery), the `tenantId` filter is replaced with `visibility=marketplace` plus a region filter.
Cross-tenant search is a distinct, audited endpoint used only by platform-admin tooling.
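The augmentation can be sketched as a wrapper that puts the caller's query inside a `bool` query and adds the mandatory clauses as non-scoring filters. This is a minimal sketch under assumptions: the function names are hypothetical, the role-to-visibility mapping is passed in as `applicableTo` because its rules are not given here, and the marketplace variant is assumed to keep the `deletedAt` clause.

```typescript
type Clause = Record<string, unknown>;

interface QueryActor {
  tid: string;
  roles: string[];
}

interface GuardedQuery {
  bool: { must: Clause[]; filter: Clause[] };
}

// "deletedAt NOT EXISTS" expressed in OpenSearch query DSL.
const NOT_DELETED: Clause = { bool: { must_not: [{ exists: { field: "deletedAt" } }] } };

// Tenant-scoped path: tenantId, visibility and soft-delete filters are always added.
function guardTenantQuery(
  actor: QueryActor,
  userQuery: Clause,
  applicableTo: (roles: string[]) => string[],
): GuardedQuery {
  return {
    bool: {
      must: [userQuery],
      filter: [
        { term: { tenantId: actor.tid } },
        { terms: { visibility: applicableTo(actor.roles) } },
        NOT_DELETED,
      ],
    },
  };
}

// Marketplace path: the tenantId filter is replaced by visibility + region.
function guardMarketplaceQuery(userQuery: Clause, region: string): GuardedQuery {
  return {
    bool: {
      must: [userQuery],
      filter: [{ term: { visibility: "marketplace" } }, { term: { region } }, NOT_DELETED],
    },
  };
}
```

Placing the mandatory clauses in `filter` rather than `must` keeps them out of relevance scoring, so they restrict the result set without changing ranking.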
4. Tenant Isolation
4.1 Data plane
- Every OpenSearch index filtered by `tenantId` (enforced at port).
- Per-tenant dedicated aliases for top-N tenants; zero index-level sharing.
- Postgres `search` schema uses RLS on every table (`USING (tenant_id = current_setting('app.tenant_id')::uuid)`).
- pgvector calls to ai-gateway carry `tenantId` header; ai-gateway-side RLS enforces isolation.
- Redis keys prefixed `search:{tenantId}:…`; key scan restricted in code.
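For the Redis key prefix, a small builder can make it hard to construct a key outside the caller's tenant. The sketch below is hypothetical. It assumes `{tenantId}` in the prefix is a placeholder rather than a literal Redis hash tag, and that tenant ids are UUIDs, as the `::uuid` cast in the RLS policy suggests.

```typescript
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Redis glob metacharacters, the ":" separator and whitespace are rejected so a
// segment can neither widen a pattern nor imitate another tenant's prefix.
const UNSAFE_SEGMENT = /[*?\[\]\\:\s]/;

// Builds "search:<tenantId>:<segment>:..." from validated parts only.
function tenantKey(tenantId: string, ...segments: string[]): string {
  if (!UUID_SHAPE.test(tenantId)) {
    throw new Error("tenantId must be a UUID");
  }
  for (const segment of segments) {
    if (segment.length === 0 || UNSAFE_SEGMENT.test(segment)) {
      throw new Error(`invalid key segment: ${segment}`);
    }
  }
  return ["search", tenantId, ...segments].join(":");
}
```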
4.2 Observability plane
- Logs redact PII fields; `tenantId` allowed.
- Traces tagged with `tenant.id`; cross-tenant query span ties are not joinable in the UI (enforced at Grafana datasource level).
4.3 Operator plane
- Reindex and rebuild endpoints require the caller's `tid` to equal the target tenant (except platform-admin).
- Platform-admin actions emit `platform.admin.action.performed.v1` audit events.
5. Input Validation
| Surface | Validation |
|---|---|
| Query string `q` | length ≤ 512; UTF-8 well-formed; reject control chars except tab |
| Filter DSL | Parsed by typed AST parser; unknown ops rejected |
| `hybridAlpha` | 0.0..1.0 bounded |
| `page.size` | 1..100 |
| `type` | strict enum |
| `locale` | BCP47 regex |
| Reindex body | JSON schema validated |
| Feedback body | schema + verified generationId belongs to user |
Zod or JSON-Schema validators live at the HTTP layer; every use case also revalidates domain invariants.
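The scalar rules in the table can be sketched without a validation library, as below. This is an illustration rather than the service's Zod or JSON-Schema definitions, and it makes several assumptions. The length limit is counted in UTF-16 code units. "UTF-8 well-formed" is approximated by rejecting lone surrogates, which cannot be encoded as UTF-8. C1 control characters are rejected along with the C0 set. The locale pattern is a simplified BCP 47 shape check, not a full parser. `SearchParams` and `validateSearchParams` are hypothetical names.

```typescript
interface SearchParams {
  q: string;
  hybridAlpha?: number;
  pageSize?: number;
  locale?: string;
}

// C0 and C1 control characters, with tab (U+0009) allowed.
const CONTROL_EXCEPT_TAB = /[\u0000-\u0008\u000A-\u001F\u007F-\u009F]/;
// Simplified shape: language, optional script, optional region (e.g. "en", "zh-Hant-TW").
const LOCALE_SHAPE = /^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|\d{3}))?$/;

// encodeURIComponent throws URIError on lone surrogates, which have no UTF-8 encoding.
function isEncodableAsUtf8(s: string): boolean {
  try {
    encodeURIComponent(s);
    return true;
  } catch {
    return false;
  }
}

// Returns a list of violations; an empty list means the parameters pass.
function validateSearchParams(p: SearchParams): string[] {
  const errors: string[] = [];
  if (p.q.length > 512) errors.push("q: longer than 512");
  if (!isEncodableAsUtf8(p.q)) errors.push("q: not well-formed");
  if (CONTROL_EXCEPT_TAB.test(p.q)) errors.push("q: control character");
  if (p.hybridAlpha !== undefined &&
      !(Number.isFinite(p.hybridAlpha) && p.hybridAlpha >= 0 && p.hybridAlpha <= 1)) {
    errors.push("hybridAlpha: outside 0.0..1.0");
  }
  if (p.pageSize !== undefined &&
      !(Number.isInteger(p.pageSize) && p.pageSize >= 1 && p.pageSize <= 100)) {
    errors.push("page.size: outside 1..100");
  }
  if (p.locale !== undefined && !LOCALE_SHAPE.test(p.locale)) {
    errors.push("locale: not a BCP 47 tag");
  }
  return errors;
}
```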
6. Secrets & Credentials
- OpenSearch credentials: short-lived IAM tokens via SigV4 (or mTLS on self-hosted cluster).
- ai-gateway service token: 15-min JWT rotated via SPIFFE/SPIRE.
- Redis ACLs: dedicated user for search-service, restricted key prefix `search:*`.
- Postgres creds: Vault-issued, 1h TTL.
- No secrets in env files in prod; dev uses `.env.local`, never committed.
7. Rate Limiting & Quotas
Per API_CONTRACTS.md §11. Additionally:
- Suspect patterns (regex, wildcard-leading, deeply paginated) rate-capped tighter.
- Per-tenant embedding budget enforced by ai-gateway; exceed → lexical-only fallback.
- Per-IP anomaly detection (100× baseline) triggers soft block.
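The "suspect pattern" classification can be sketched as a pure function that a rate limiter consults before choosing a bucket. The sketch is hypothetical. The pagination threshold is an assumed value, and the regex and wildcard checks work per whitespace-separated term, so a Lucene regex that contains spaces would not be caught.

```typescript
type SuspectReason = "regex" | "leading-wildcard" | "deep-pagination";

// Assumed threshold; how deep "deeply paginated" is has not been specified here.
const DEEP_PAGE_OFFSET = 1000;

// Returns the reasons a query should fall under the tighter rate cap;
// an empty array means the normal limits apply.
function suspectReasons(q: string, offset: number): SuspectReason[] {
  const reasons: SuspectReason[] = [];
  const terms = q.split(/\s+/).filter((term) => term.length > 0);
  // Lucene regex terms are written between forward slashes, e.g. /ab.*/
  if (terms.some((term) => /^\/.+\/$/.test(term))) reasons.push("regex");
  // A term starting with * or ? forces a scan of the whole term dictionary.
  if (terms.some((term) => /^[*?]/.test(term))) reasons.push("leading-wildcard");
  if (offset >= DEEP_PAGE_OFFSET) reasons.push("deep-pagination");
  return reasons;
}
```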
8. Audit Logging
Audit entries written for:
- Every `/reindex` or `/rebuild-embeddings` invocation.
- Every platform-admin cross-tenant query.
- Every document with `visibility=public` indexed or tombstoned.
- Every recommendation feedback (for L2R integrity).
Audit schema:
{
"type": "search.audit",
"actor": { "id": "...", "roles": [] },
"tenantId": "...",
"action": "reindex | cross-tenant-query | public-doc-index | feedback-recorded",
"target": { "kind": "...", "id": "..." },
"at": "...",
"traceId": "..."
}
Retained 7 years; routed to analytics-service audit workspace.
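A typed mirror of the audit schema lets the compiler reject an unknown `action` value or a missing field. The sketch below is an assumption-laden illustration. The field names follow the schema above, while `buildAuditEntry` is a hypothetical helper. The ISO 8601 format for `at` is also assumed, since the schema leaves the timestamp format open.

```typescript
type AuditAction = "reindex" | "cross-tenant-query" | "public-doc-index" | "feedback-recorded";

interface SearchAuditEntry {
  type: "search.audit";
  actor: { id: string; roles: string[] };
  tenantId: string;
  action: AuditAction;
  target: { kind: string; id: string };
  at: string; // assumed ISO 8601 UTC timestamp
  traceId: string;
}

// Everything except the constant type tag and the timestamp comes from the caller.
type AuditInput = Omit<SearchAuditEntry, "type" | "at">;

// Builds an entry; `now` is injectable so the timestamp is testable.
function buildAuditEntry(input: AuditInput, now: Date = new Date()): SearchAuditEntry {
  return {
    type: "search.audit",
    actor: { id: input.actor.id, roles: [...input.actor.roles] },
    tenantId: input.tenantId,
    action: input.action,
    target: { kind: input.target.kind, id: input.target.id },
    at: now.toISOString(),
    traceId: input.traceId,
  };
}
```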
9. Encryption
| Layer | At rest | In transit |
|---|---|---|
| OpenSearch | AES-256 via cloud KMS | TLS 1.3 |
| Postgres | TDE | TLS 1.3 |
| Redis | Disk encryption | TLS 1.2+ |
| NATS | per-stream encryption | mTLS |
| Client offline store | SQLCipher / WebCrypto | n/a |
All TLS inside the mesh is pinned to service identity (SPIFFE).
10. Compliance Obligations
- GDPR erasure: On `tenant.user.removed.v1`, search-service deletes user document, user profile embedding, query history within 30d (audit trail kept).
- Data residency: Enforced via `region` field; never ship records cross-region except for the global marketplace slice.
- COPPA: Queries for minor-flagged tenants exclude `visibility=marketplace` unless explicitly allow-listed.
- Accessibility: Suggest and explain responses respect locale + RTL.
11. Security Testing
- SAST (CodeQL) on every PR.
- DAST against staging `/search` + `/suggest` weekly.
- Fuzzing `q` + filter DSL 1h/day in CI.
- Pen test annually; scope includes cross-tenant boundary.
- Chaos security drill: force-delete tenant → verify index purge within SLO.
12. Incident Response
| Scenario | Action | SLA |
|---|---|---|
| Confirmed cross-tenant leak | Disable offending query path; forensic replay; notify compliance | 1h |
| PII found in public embeddings | Purge vectors; rebuild affected doc; notify DPO | 4h |
| OpenSearch credential leak | Rotate IAM; rotate cluster; audit 30d access | 1h |
| DLQ > 0 | Triage poison messages; fix projector; replay | 24h |
Runbooks in FAILURE_MODES.md.
13. Security Checklist (pre-merge)
- All new endpoints covered by authz tests (both positive and negative).
- No string interpolation into DSL/SQL.
- Every query path provably injects `tenantId` filter (reviewed via tenancy lint).
- PII paths run through sanitizer.
- Rate-limit rule defined for any new endpoint.
- Secrets read from Vault/config, not code.
- Audit event defined for any privileged action.