Security & Tenancy
:::info Source
Sourced from docs/13-security-compliance-tenancy.md in the documentation repo.
:::
Companion: 01 Enterprise Architecture · 03 Microservices · 05 API Design · 12 Data Models
This document consolidates the platform's security model, compliance posture, and the multi-layer enforcement of multi-tenant isolation. Every other doc references this one for the canonical rules.
1. Threat Model (summary)
| Threat | Surface | Mitigation |
|---|---|---|
| Credential stuffing / spray | identity-service | argon2id, lockout, adaptive MFA, anomaly classifier |
| Session hijack | API gateway | short JWT TTL + rotating refresh; device binding |
| Cross-tenant data leak | All services | RLS + JWT scope + domain invariants |
| Insecure direct object reference | All services | ABAC policy on every resource access |
| SCORM zip RCE | content-service | Sandbox import + manifest validator + signed origin allowlist |
| Bundle tampering | client + content-service | JWS signature + AES-GCM + tamper report flow |
| Prompt injection | ai-gateway-service | Prompt-injection classifier + system prompt isolation |
| AI PII exfiltration | ai-gateway-service | Pre-call PII redaction; provider no-train flag |
| Webhook replay | webhooks (in/out) | HMAC signing + nonce + timestamp window |
| LMS embed CSRF | LTI | LTI 1.3 platform key validation + per-launch nonce |
| Storage takeover | media + content | Signed URLs scoped per object + per-caller |
| Supply chain | All services | SBOM + provenance attestations (SLSA) + lockfiles |
| Insider abuse | platform admin | Just-in-time elevation + four-eyes + audit |
| DoS | edge + APIs | Rate limits + WAF + geographic blocking |
2. Identity & Authentication
- Standards: OAuth 2.1, OIDC, WebAuthn Level 2, SAML 2.0.
- Local AuthN: email + password (argon2id, m=64 MiB, t=3, p=1), magic link, WebAuthn passkeys.
- MFA factors: TOTP, SMS (deprecated for sensitive scopes), WebAuthn, recovery codes.
- Adaptive MFA: triggered by risk classifier (new device, atypical IP, behavior).
- Sessions: access JWT (15 min), rotating refresh (30 d sliding, single-use rotation, family-revoke on detected reuse).
- Device binding: every offline-capable device has a public key registered with identity-service; PlayPackage Bundles are encrypted with a key derived from `(tenantKey, devicePubKey, bundleId)`.
- JWT signing: asymmetric (EdDSA Ed25519) with KMS-backed keys + `kid` rotation; JWKS published.
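The single-use rotation and family-revoke rule for refresh tokens can be sketched as follows. This is a minimal in-memory illustration, not identity-service's implementation: `RefreshStore`, its fields, and the error messages are hypothetical, and a real store would persist token hashes rather than raw tokens.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class RefreshStore:
    """In-memory stand-in for a refresh-token table (hypothetical)."""
    tokens: dict = field(default_factory=dict)        # token -> [family_id, used]
    revoked_families: set = field(default_factory=set)

    def issue(self, family_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self.tokens[token] = [family_id, False]
        return token

    def rotate(self, token: str) -> str:
        """Exchange a refresh token for a new one; each token works once."""
        record = self.tokens.get(token)
        if record is None:
            raise PermissionError("unknown refresh token")
        family_id, used = record
        if family_id in self.revoked_families:
            raise PermissionError("token family revoked")
        if used:
            # A spent token being replayed suggests theft: revoke the family.
            self.revoked_families.add(family_id)
            raise PermissionError("refresh token reuse detected")
        record[1] = True
        return self.issue(family_id)
```

Replaying a spent token invalidates every descendant in the same family, so the legitimate client and the attacker are both forced back through full authentication.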
3. Authorization (RBAC + ABAC)
3.1 Coarse RBAC roles (system + tenant)
- `platform_admin`, `compliance_officer`
- `org_owner`, `org_admin`, `org_manager`
- `provider_admin`, `author`, `reviewer`, `publisher`
- `learner`, `individual`
3.2 Fine-grained ABAC predicates
Examples (serialized as expression trees in permissions.condition):
- `resource.tenant_id == ctx.tenant_id`
- `resource.org_unit_id IN ctx.user.org_units`
- `resource.visibility IN ('marketplace','public')`
- `resource.created_by == ctx.user.id`
- `resource.assignment.owner_id == ctx.user.id`
3.3 Decision flow
- JWT presented → verified → claims loaded into `RequestContext`.
- Route declares required `resource:action`.
- Policy engine evaluates `(role grants action) AND (ABAC predicate true)`.
- Decision logged with `decisionId` (linked into AI/HITL audit chains).
Endpoint POST /api/v1/authz/check lets UIs ask before showing actions.
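The decision rule, role grant AND ABAC predicate, might look like the sketch below. The grant table, predicate functions, and the `check` helper are illustrative assumptions; the real engine evaluates expression trees stored in permissions.condition rather than Python callables.

```python
import uuid
from dataclasses import dataclass

# Hypothetical role -> granted "resource:action" pairs.
ROLE_GRANTS = {
    "author": {"course:read", "course:edit"},
    "learner": {"course:read"},
}


@dataclass(frozen=True)
class RequestContext:
    tenant_id: str
    user_id: str
    roles: tuple


def same_tenant(resource: dict, ctx: RequestContext) -> bool:
    return resource["tenant_id"] == ctx.tenant_id


def created_by_user(resource: dict, ctx: RequestContext) -> bool:
    return resource["created_by"] == ctx.user_id


# Hypothetical ABAC predicates required for each action.
ABAC_PREDICATES = {
    "course:read": [same_tenant],
    "course:edit": [same_tenant, created_by_user],
}


def check(ctx: RequestContext, action: str, resource: dict) -> dict:
    """Allow only if a role grants the action AND every predicate holds."""
    role_ok = any(action in ROLE_GRANTS.get(role, set()) for role in ctx.roles)
    predicates = ABAC_PREDICATES.get(action)
    # Deny by default when no predicates are registered for the action.
    abac_ok = predicates is not None and all(p(resource, ctx) for p in predicates)
    return {"decisionId": str(uuid.uuid4()), "allowed": role_ok and abac_ok}
```

Every call returns a fresh `decisionId`, including denials, so the audit chain records refusals as well as grants.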
4. Multi-Tenant Isolation (Multi-Layer)
| Layer | Enforcement |
|---|---|
| Edge (CDN) | Per-tenant domain or path; CSP nonces per tenant |
| Kong + services | Edge (Kong) validates JWT on protected routes; services inject tenant_id into RequestContext and reject mismatched X-Tenant-Id header (see ADR 0001) |
| Application (use cases) | Use-cases require TenantId parameter; cross-tenant references rejected at construction |
| Domain (aggregates) | TenantId value object on every aggregate root; invariants reject cross-tenant references |
| Postgres | RLS enabled on every table: USING (tenant_id = current_setting('app.tenant_id')::uuid) |
| Postgres connections | Pool wrapper sets app.tenant_id per request (proxy-init) |
| Storage (S3/R2) | Per-tenant prefix tenants/{tid}/...; signed URLs scoped per object + caller; bucket policy denies cross-tenant prefix access |
| Search (OpenSearch) | Tenant filter injected; alias-per-tenant for the largest tenants |
| Vectors (pgvector) | Tenant filter on every k-NN; collection partitions for the largest tenants |
| Caches (Redis) | Key prefixes tenants/{tid}/...; eviction never crosses tenants |
| AI Gateway | Per-tenant prompt pinning, per-tenant budgets, per-tenant cache |
| Sync | Cursors + mutations + conflicts scoped by tenant + user + device |
| Logs | tenant_id on every line; PII-scrubbed; per-tenant retention |
| Backups | Per-region per-tenant; restore test quarterly |
Tenant isolation tests are mandatory in CI: every service runs a "two-tenant simulator" suite that asserts every read/write/event surface refuses cross-tenant access.
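A "two-tenant simulator" check could take roughly the following shape. The store and the simulator function are hypothetical stand-ins for a real service surface; the point is only to show the pattern of seeding data as one tenant and probing it as another.

```python
from dataclasses import dataclass, field


@dataclass
class TenantScopedStore:
    """Toy store keyed by tenant (stand-in for a real read/write surface)."""
    _rows: dict = field(default_factory=dict)

    def write(self, ctx_tenant: str, key: str, row: dict) -> None:
        if row.get("tenant_id") != ctx_tenant:
            raise PermissionError("cross-tenant write rejected")
        self._rows[(ctx_tenant, key)] = row

    def read(self, ctx_tenant: str, key: str):
        # Lookups are always scoped by the caller's tenant.
        return self._rows.get((ctx_tenant, key))


def run_two_tenant_simulation(store) -> list:
    """Return a list of isolation violations; an empty list means a pass."""
    violations = []
    store.write("tenant-a", "course-1", {"tenant_id": "tenant-a", "title": "A"})

    # Tenant B must not be able to read tenant A's row.
    if store.read("tenant-b", "course-1") is not None:
        violations.append("cross-tenant read succeeded")

    # Tenant B must not be able to write a row that claims tenant A.
    try:
        store.write("tenant-b", "course-2", {"tenant_id": "tenant-a"})
        violations.append("cross-tenant write succeeded")
    except PermissionError:
        pass

    # Tenant A must still see its own data.
    if store.read("tenant-a", "course-1") is None:
        violations.append("own-tenant read failed")
    return violations
```

In CI the same probe would be repeated against each read, write, and event surface a service exposes, failing the build on any non-empty result.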
5. Data Classification & Encryption
| Class | Examples | At-rest | In-transit |
|---|---|---|---|
| Public | Marketing, public certs | TLS 1.3 | TLS 1.3 |
| Internal | Course catalog metadata | AES-256, KMS shared | TLS 1.3 |
| Confidential | Learner progress, quiz keys | AES-256, per-tenant KMS data keys (envelope) | TLS 1.3 + mTLS internal |
| Restricted | Credentials, payment refs, PHI | AES-256, per-tenant KMS, restricted access | TLS 1.3 + mTLS + JIT access |
| Offline-bundled | PlayPackage Bundles | AES-256-GCM, per-device-derived key | TLS 1.3 |
Key management:
- KMS-backed (HSM root); per-tenant DEK; hierarchical KEK rotation annual; emergency rotation supported.
- HSM-backed signing keys for JWT, JWS provenance, certificate proof, bundle signature.
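The per-device-derived key for offline bundles could be produced with a standard key-derivation function. The sketch below assumes HKDF-SHA256 (RFC 5869), which this document does not specify; `derive_bundle_key`, the way the three inputs are mapped to HKDF's parameters, and the `playpackage-bundle:` context label are all illustrative assumptions.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand using HMAC-SHA256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_bundle_key(tenant_key: bytes, device_pub_key: bytes, bundle_id: str) -> bytes:
    """Derive a 256-bit AES key bound to tenant, device, and bundle.

    Assumption: the tenant key is the secret input; the device public key
    acts as salt and the bundle id as context info.
    """
    salt = hashlib.sha256(device_pub_key).digest()
    info = b"playpackage-bundle:" + bundle_id.encode("utf-8")
    return hkdf_sha256(tenant_key, salt, info, length=32)
```

Because the device key and bundle id feed the derivation, a bundle copied to another device, or a key reused for a different bundle, yields a different key and fails AES-GCM decryption.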
6. Network & Edge
- TLS 1.3 only; HSTS preload.
- Strong CSP per-route; nonce-based scripts.
- WAF rules: OWASP CRS + custom (LMS-specific).
- Geo controls per tenant.
- Anti-bot on signup + checkout.
- Service mesh (mTLS) inside cluster; per-service identities for inter-service auth.
7. Application Security
- OWASP ASVS L2 baseline; selected modules (auth, sessions, payments) at L3.
- Input validation at every boundary using Zod (frontend) and Ajv (backend) against shared schemas.
- Output encoding: React escaping + DOMPurify for any innerHTML (rare).
- ORM-only DB access; no string-built SQL.
- File uploads scanned (AV + content-safety) before becoming addressable.
- SCORM imports validated + sandboxed (no eval in zip; manifest-driven).
- Webhook signatures HMAC-SHA256 with nonce + timestamp (5-min window).
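Webhook verification with HMAC-SHA256, a nonce, and a 5-minute timestamp window can be sketched as below. The signed-string layout (`timestamp.nonce.body`) and the in-memory nonce set are assumptions made for illustration; a deployed service would keep seen nonces in a shared store with expiry.

```python
import hashlib
import hmac
import time

WINDOW_SECONDS = 300  # the 5-minute window from the rule above


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 over timestamp, nonce, and body."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, nonce: str, body: bytes,
           signature: str, seen_nonces: set, now: float = None) -> bool:
    """Accept a webhook only if it is fresh, authentic, and not a replay."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > WINDOW_SECONDS:
        return False  # outside the timestamp window
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False  # bad signature (constant-time comparison)
    if nonce in seen_nonces:
        return False  # replay of an already accepted delivery
    # Record the nonce only after the signature checks out, so
    # unauthenticated callers cannot fill the nonce store.
    seen_nonces.add(nonce)
    return True
```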
8. AI Safety & Governance (Mandatory; full surface in 03 ai-gateway-service)
- All AI calls routed via ai-gateway-service.
- Pre-call: moderation; PII redaction (configurable); prompt-injection shield (heuristic + classifier).
- Routing: local → small cloud → large cloud; per-tenant budget gate.
- Post-call: moderation; structured-output schema validation; refusal handling.
- Provenance: every artifact carries `AIProvenance` (see 12).
- HITL: AI-generated authoring blocks stay at `status='draft_ai'` until accepted; `decisionId` ties acceptance into the audit chain.
- No training on tenant data: outbound provider configs explicitly disable training; verified at integration test layer with provider-specific assertions.
- Tenant-scoped embeddings: never cross-tenant; deletion follows tenant + user lifecycle.
- EU AI Act: each AI capability is classified (limited / high-risk); high-risk capabilities (e.g., AI grading, AI risk-scoring of learners) require additional documentation, post-market monitoring, and explicit human override paths.
- Bias monitoring: AI assignment recommendations + AI grading evaluated quarterly against demographic-parity + equalized-odds metrics on consenting sample data.
- Right to explanation: UI surfaces "why this recommendation" / "why this score" using model rationale + feature attribution where available.
- Refusal & dispute: users may dispute any AI decision; routes to human reviewer with SLA.
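The two fairness metrics named for bias monitoring can be computed as simple rate gaps between groups. The sketch below assumes binary predictions and labels and reports the largest gap across groups; the function names and the "max minus min" summary are illustrative choices, not the platform's evaluation code.

```python
def _positive_rate(preds: list) -> float:
    return sum(preds) / len(preds) if preds else 0.0


def _gap_by_group(preds: list, groups: list) -> float:
    by_group = {}
    for pred, group in zip(preds, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [_positive_rate(values) for values in by_group.values()]
    return max(rates) - min(rates) if rates else 0.0


def demographic_parity_gap(preds: list, groups: list) -> float:
    """Largest difference in P(pred = 1) between any two groups."""
    return _gap_by_group(preds, groups)


def equalized_odds_gap(preds: list, labels: list, groups: list) -> float:
    """Largest between-group difference in P(pred = 1 | label = y), y in {0, 1}.

    y = 1 compares true-positive rates; y = 0 compares false-positive rates.
    """
    gaps = []
    for y in (0, 1):
        subset = [(p, g) for p, l, g in zip(preds, labels, groups) if l == y]
        gaps.append(_gap_by_group([p for p, _ in subset], [g for _, g in subset]))
    return max(gaps)
```

A gap of 0.0 means the groups are treated identically on that metric; quarterly evaluation would compare these gaps against a tenant- or policy-defined threshold.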
9. Offline License Enforcement
- Every PlayPackage Bundle issued with a LicenseEnvelope (see 12) signed by tenant key.
- Enforces: expiry, device binding, feature gating (AI tutor on/off offline, certificate eligibility, copy/download).
- Player refuses to mount expired/revoked bundles; revocation propagates via sync within minutes online.
- License envelope tampering detected via signature verification on every mount.
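The mount-time checks (revocation, expiry, device binding, feature gating) might be arranged as in this sketch. It assumes the envelope's signature has already been verified; the `LicenseEnvelope` fields and reason strings are hypothetical, since the real schema is defined in doc 12.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LicenseEnvelope:
    # Hypothetical fields for illustration only.
    license_id: str
    device_id: str
    expires_at: datetime
    features: frozenset


def can_mount(lic: LicenseEnvelope, device_id: str, now: datetime,
              revoked_ids: set) -> tuple:
    """Return (allowed, reason) for a signature-verified envelope."""
    if lic.license_id in revoked_ids:
        return False, "revoked"
    if now >= lic.expires_at:
        return False, "expired"
    if lic.device_id != device_id:
        return False, "device_mismatch"
    return True, "ok"


def feature_enabled(lic: LicenseEnvelope, feature: str) -> bool:
    """Feature gating, e.g. offline AI tutor or certificate eligibility."""
    return feature in lic.features
```

Offline, `revoked_ids` can only reflect the last successful sync, which is why revocation is described as propagating within minutes once the device is online.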
10. Tamper Detection (Offline)
- Bundle SHA-256 verified against signed manifest at mount.
- Failure → unmount + `content.bundle.tamper_detected.v1` queued for next sync.
- Repeated failures → device flagged; user offered fresh download; admin alert.
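The mount-time integrity check can be sketched as a digest comparison followed by queuing the tamper event. The manifest keys (`sha256`, `bundleId`), the `outbox` list, and the event payload shape are assumptions; the sketch also presumes the manifest's own signature was verified beforehand.

```python
import hashlib
import hmac


def bundle_is_intact(bundle: bytes, manifest: dict) -> bool:
    """Compare the bundle's SHA-256 with the digest in the signed manifest."""
    actual = hashlib.sha256(bundle).hexdigest()
    expected = manifest["sha256"].lower()
    return hmac.compare_digest(actual, expected)


def mount(bundle: bytes, manifest: dict, outbox: list) -> bool:
    """Mount only intact bundles; otherwise queue a tamper event for next sync."""
    if bundle_is_intact(bundle, manifest):
        return True
    outbox.append({
        "type": "content.bundle.tamper_detected.v1",
        "bundleId": manifest["bundleId"],  # hypothetical payload field
    })
    return False
```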
11. Audit & Logging
- Append-only audit log for: identity events, role/permission changes, data access decisions, AI calls, billing actions, GDPR requests, license grants, certificate issuance, sync conflicts, tenant data residency changes.
- Daily Merkle anchoring: root hash committed to internal anchor store + emitted as `audit.merkle.anchored.v1`. Optionally anchored externally per tenant policy.
- Tamper evidence: any change to an anchored audit table produces an anchor mismatch, which daily verification detects.
- PII scrubbing in operational logs; full PII allowed only in dedicated audit log under restricted access.
- Per-tenant export: compliance officer can export audit slice via analytics-service.
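Daily Merkle anchoring and its verification can be illustrated with a small tree builder. The leaf/node prefixes and the rule of promoting an unpaired node are design assumptions for this sketch, not the platform's documented tree layout.

```python
import hashlib
import hmac


def _leaf(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()  # prefix separates leaves


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # from inner nodes


def merkle_root(entries: list) -> bytes:
    """Compute a Merkle root over the day's audit entries, in order."""
    if not entries:
        return hashlib.sha256(b"").digest()
    level = [_leaf(entry) for entry in entries]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # promote the unpaired node unchanged
        level = nxt
    return level[0]


def verify_anchor(entries: list, anchored_root_hex: str) -> bool:
    """Recompute the root and compare it with the stored anchor."""
    return hmac.compare_digest(merkle_root(entries).hex(), anchored_root_hex)
```

Promoting the odd node, rather than duplicating it, keeps `[a, b, c]` and `[a, b, c, c]` from hashing to the same root, so appended duplicates are also detected.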
12. Compliance Posture
| Standard | Status | Notes |
|---|---|---|
| GDPR | Required | DSR flow + lawful basis registry + DPA |
| SOC 2 Type II | Required | Annual audit; logging + access reviews |
| ISO 27001 | Required | ISMS docs aligned with this spec set |
| HIPAA (opt-in) | Available | BAA + restricted AI providers + PHI tagging |
| FERPA (opt-in) | Available | Education records + parental access flow |
| ISO/IEC 42001 (AI MS) | Adopt | Aligns with EU AI Act |
| EU AI Act | Required | Risk classification + transparency obligations |
| WCAG 2.2 AA | Required | All tenant-facing UIs |
| PCI DSS | Out-of-scope (tokenized) | Card data never touches our DB |
| KSA / UAE PDPL | Required for region | Data residency + lawful basis |
| Schrems II | Required | Cloud LLM transfers via SCCs + supplementary measures |
13. Data Subject Rights (GDPR / equivalents)
- Access / portability: `POST /api/v1/me/data-export` raises `gdpr.subject_request.received.v1`; each service contributes; aggregator zip emailed.
- Erasure: request raised; saga across services; financial/audit data may be retained under legal basis with `redacted` flag.
- Rectification: standard profile/preference endpoints.
- Objection / restriction: AI-decision opt-out and opt-down (manual review only).
- Right not to be subject to automated decision-making: AI features that meet the threshold (high-risk) provide explicit human-only path.
14. Data Residency
- Per-tenant region pin (`us`, `eu`, `me`, `ap`).
- Cross-region replication only with explicit opt-in.
- Tenant residency change is a saga (see 04).
- Vector + AI cache stay in-region; cross-region embeddings forbidden.
- Backups stay in-region; DR cross-region only with opt-in.
15. Operational Security
- Just-in-time access: internal staff request elevated tenant access; auto-expires; logged with reason; four-eyes for restricted actions.
- Bastion + audit: all production access via bastion with session recording.
- Secrets: vault-backed; rotated; per-environment; never in source.
- SBOM: generated per build; signed; vulnerability scan gate.
- Patch SLA: critical 24h, high 7d, medium 30d.
- DR drills: quarterly; full region failover annually.
16. Incident Response
- 24×7 on-call rotation per service.
- Severity matrix (Sev1 → Sev4) with comms playbooks.
- Customer-impacting incidents disclosed within contractual SLA (typically 72 h for data incidents).
- Post-incident review within 5 business days; corrective actions tracked.
17. Privacy & Data Minimization
- Collect only what's needed; defaults privacy-preserving.
- AI features default-off pending tenant opt-in.
- Telemetry user-identifying fields hashed; opt-out per user.
- Cookies: strictly necessary by default; consent for analytics; granular per region.
18. Testing & Verification
- Tenant isolation suite in every service.
- AuthZ test matrix (role × resource × condition).
- DAST + SAST in CI; targeted pen-test per release; bug-bounty program.
- AI safety suite: prompt-injection corpus; PII corpus; jailbreak corpus; bias evals.
- Offline integrity suite: bundle tamper, license expiry, revocation under sync.
- Audit proofs: daily Merkle root verified by independent job.
19. Why
Trust is the product's substrate. Multi-tenant SaaS with AI + offline can fail in many subtle ways: cross-tenant cache hits, AI provenance loss, offline license bypass. Concentrating these mitigations into a single doc + a single set of mandates (RLS, gateway, sync protocol, audit) means we enforce them consistently across 19 services rather than re-deriving them per team.