Testing
:::info Source
Sourced from services/ai-gateway-service/TESTING_STRATEGY.md in the documentation repo.
:::
1. Coverage
Domain 95%/98% branch/80% mutation. Integration 85%.
2. Unit Tests
- Prompt template rendering.
- Safety pipeline: each stage (moderation, PII, injection, schema).
- Budget atomic debit (concurrent callers).
- Model router preference resolution.
- Cache key derivation.
- Provenance VO invariants.
- AIClient port contract (shape validation).
3. Integration Tests
- Postgres + pgvector + Redis + mock providers.
- Full call flow with each prompt.
- Fallback from primary to secondary provider.
- Budget exhaustion path.
- Embedding + k-NN round trip.
4. Contract Tests
- AIClient port contract verified by every consumer (authoring, delivery, assessment, analytics, billing, notification, media).
- OpenAPI diff in CI.
5. AI Eval Harness
For every prompt, maintain:
- Eval set: 100–500 representative inputs with expected output shape / quality metrics.
- Regression eval: new prompt version must score ≥ current.
- Safety eval: adversarial inputs (injection, jailbreak, harmful content) expected to refuse.
- Bias eval: demographic parity test corpus (high-risk prompts).
Run on every prompt version PR + nightly against active prompts.
6. Safety Tests
- Known prompt injection vectors (e.g., "ignore previous instructions") → detected/blocked.
- PII in input → redacted or blocked per policy.
- CSAM simulation → blocked + reported path.
- Jailbreak attempts → refused.
7. E2E
- J-04: learner asks tutor; tutor responds with citations + provenance.
- J-05 continuation: author co-author flow with AI block.
- J-13: admin publishes new prompt version; staged rollout.
8. Load Tests
- 1k concurrent completions; p95 first-token < 600ms.
- Cache stampede test: 10k simultaneous same-key requests → dedup.
- Budget debit under concurrency: no negative balance, no double-debit.
9. Chaos
- Kill primary provider mid-request → fallback succeeds.
- NATS partition → audit events buffer in outbox.
- KMS 30s outage → tolerated (short-window provider keys cached).
- Cache failure → direct calls continue.
10. Bias Monitoring
- Quarterly automated eval on high-risk prompts.
- Statistical tests (chi-square on category distribution across demographics).
- Report reviewed by compliance.
11. Offline Tests
- Local model produces output with same contract shape.
- Local completions queue; sync on reconnect; audit reconciled.
12. Security Tests
- Cross-tenant prompt access → 403.
- Cross-tenant embedding k-NN → filtered (tenant filter injected).
- Audit log tamper detection (Merkle) works.
- Provider API key not in logs (CI grep).
13. CI Gates
- Unit + integration + Pact + AIClient contract green.
- All active prompts pass eval.
- Safety + bias eval on affected prompts.
- Two-tenant iso.
- Mutation ≥ 80%.