Skip to main content

Testing

:::info Source Sourced from services/ai-gateway-service/TESTING_STRATEGY.md in the documentation repo. :::

1. Coverage

Domain 95%/98% branch/80% mutation. Integration 85%.

2. Unit Tests

  • Prompt template rendering.
  • Safety pipeline: each stage (moderation, PII, injection, schema).
  • Budget atomic debit (concurrent callers).
  • Model router preference resolution.
  • Cache key derivation.
  • Provenance VO invariants.
  • AIClient port contract (shape validation).

3. Integration Tests

  • Postgres + pgvector + Redis + mock providers.
  • Full call flow with each prompt.
  • Fallback from primary to secondary provider.
  • Budget exhaustion path.
  • Embedding + k-NN round trip.

4. Contract Tests

  • AIClient port contract verified by every consumer (authoring, delivery, assessment, analytics, billing, notification, media).
  • OpenAPI diff in CI.

5. AI Eval Harness

For every prompt, maintain:

  • Eval set: 100–500 representative inputs with expected output shape / quality metrics.
  • Regression eval: new prompt version must score ≥ current.
  • Safety eval: adversarial inputs (injection, jailbreak, harmful content) expected to refuse.
  • Bias eval: demographic parity test corpus (high-risk prompts).

Run on every prompt version PR + nightly against active prompts.

6. Safety Tests

  • Known prompt injection vectors (e.g., "ignore previous instructions") → detected/blocked.
  • PII in input → redacted or blocked per policy.
  • CSAM simulation → blocked + reported path.
  • Jailbreak attempts → refused.

7. E2E

  • J-04: learner asks tutor; tutor responds with citations + provenance.
  • J-05 continuation: author co-author flow with AI block.
  • J-13: admin publishes new prompt version; staged rollout.

8. Load Tests

  • 1k concurrent completions; p95 first-token < 600ms.
  • Cache stampede test: 10k simultaneous same-key requests → dedup.
  • Budget debit under concurrency: no negative balance, no double-debit.

9. Chaos

  • Kill primary provider mid-request → fallback succeeds.
  • NATS partition → audit events buffer in outbox.
  • KMS 30s outage → tolerated (short-window provider keys cached).
  • Cache failure → direct calls continue.

10. Bias Monitoring

  • Quarterly automated eval on high-risk prompts.
  • Statistical tests (chi-square on category distribution across demographics).
  • Report reviewed by compliance.

11. Offline Tests

  • Local model produces output with same contract shape.
  • Local completions queue; sync on reconnect; audit reconciled.

12. Security Tests

  • Cross-tenant prompt access → 403.
  • Cross-tenant embedding k-NN → filtered (tenant filter injected).
  • Audit log tamper detection (Merkle) works.
  • Provider API key not in logs (CI grep).

13. CI Gates

  • Unit + integration + Pact + AIClient contract green.
  • All active prompts pass eval.
  • Safety + bias eval on affected prompts.
  • Two-tenant iso.
  • Mutation ≥ 80%.