Testing

:::info Source Sourced from services/ai-gateway-service/TESTING_STRATEGY.md in the documentation repo. :::

1. Coverage

Domain 95%/98% branch/80% mutation. Integration 85%.

2. Unit Tests

Prompt template rendering.
Safety pipeline: each stage (moderation, PII, injection, schema).
Budget atomic debit (concurrent callers).
Model router preference resolution.
Cache key derivation.
Provenance VO invariants.
AIClient port contract (shape validation).

3. Integration Tests

Postgres + pgvector + Redis + mock providers.
Full call flow with each prompt.
Fallback from primary to secondary provider.
Budget exhaustion path.
Embedding + k-NN round trip.

4. Contract Tests

AIClient port contract verified by every consumer (authoring, delivery, assessment, analytics, billing, notification, media).
OpenAPI diff in CI.

5. AI Eval Harness

For every prompt, maintain:

Eval set: 100–500 representative inputs with expected output shape / quality metrics.
Regression eval: new prompt version must score ≥ current.
Safety eval: adversarial inputs (injection, jailbreak, harmful content) expected to refuse.
Bias eval: demographic parity test corpus (high-risk prompts).

Run on every prompt version PR + nightly against active prompts.

6. Safety Tests

Known prompt injection vectors (e.g., "ignore previous instructions") → detected/blocked.
PII in input → redacted or blocked per policy.
CSAM simulation → blocked + reported path.
Jailbreak attempts → refused.

7. E2E

J-04: learner asks tutor; tutor responds with citations + provenance.
J-05 continuation: author co-author flow with AI block.
J-13: admin publishes new prompt version; staged rollout.

8. Load Tests

1k concurrent completions; p95 first-token < 600ms.
Cache stampede test: 10k simultaneous same-key requests → dedup.
Budget debit under concurrency: no negative balance, no double-debit.

9. Chaos

Kill primary provider mid-request → fallback succeeds.
NATS partition → audit events buffer in outbox.
KMS 30s outage → tolerated (short-window provider keys cached).
Cache failure → direct calls continue.

10. Bias Monitoring

Quarterly automated eval on high-risk prompts.
Statistical tests (chi-square on category distribution across demographics).
Report reviewed by compliance.

11. Offline Tests

Local model produces output with same contract shape.
Local completions queue; sync on reconnect; audit reconciled.

12. Security Tests

Cross-tenant prompt access → 403.
Cross-tenant embedding k-NN → filtered (tenant filter injected).
Audit log tamper detection (Merkle) works.
Provider API key not in logs (CI grep).

13. CI Gates

Unit + integration + Pact + AIClient contract green.
All active prompts pass eval.
Safety + bias eval on affected prompts.
Two-tenant iso.
Mutation ≥ 80%.

1. Coverage​

2. Unit Tests​

3. Integration Tests​

4. Contract Tests​

5. AI Eval Harness​

6. Safety Tests​

7. E2E​

8. Load Tests​

9. Chaos​

10. Bias Monitoring​

11. Offline Tests​

12. Security Tests​

13. CI Gates​