AI Gateway Service — Testing Strategy
Status: populated
Owner: TBD
Last updated: 2026-04-17
Companion: Service Template · standards/TESTING_STANDARDS.md
1. Coverage targets
| Layer | Target |
|---|---|
| Unit | ≥ 85% statements, ≥ 80% branches |
| Integration | ≥ 80% critical paths |
| Contract | 100% of published events + OpenAPI |
| E2E | All P1 scenarios |
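
The unit-layer thresholds can be enforced by the test runner so that a drop in coverage fails the build. The sketch below assumes the unit suite runs under Jest, which this document does not state; `coverageConfig` is a hypothetical name for the object that would be the default export of `jest.config.ts`.

```typescript
// Assumption: Jest is the unit test runner. `coverageThreshold` is Jest's
// built-in option for failing a run when coverage falls below a minimum.
const coverageConfig = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      statements: 85, // unit target: >= 85% statements
      branches: 80,   // unit target: >= 80% branches
    },
  },
};
```

The integration, contract, and E2E targets are scenario counts rather than line coverage, so they would need a separate check.
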
2. Unit tests
| Unit | Coverage |
|---|---|
| AIDecision state machine | valid + invalid transitions |
| ProviderRouter | rule matching, residency filtering, fallback |
| QuotaService | window rollover, concurrent consume |
| ModerationPipeline | threshold logic, short-circuit on block |
| ProvenanceFactory | all required fields stamped |
| Prompt template resolver | semver match, locale fallback |
| PHI redactor | no raw text leaks to log formatter |
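
As an illustration of the "valid + invalid transitions" row, here is a minimal sketch of a transition table and the two kinds of check a unit test would make. The `draft`, `under_review`, and `accepted` states come from the HITL flow in this document; the `rejected` state, the exact transition set, and the names `ALLOWED`, `canTransition`, and `transition` are assumptions made for the example.

```typescript
type DecisionState = "draft" | "under_review" | "accepted" | "rejected";

// Assumed transition table: terminal states have no outgoing transitions.
const ALLOWED: Record<DecisionState, readonly DecisionState[]> = {
  draft: ["under_review"],
  under_review: ["accepted", "rejected"],
  accepted: [],
  rejected: [],
};

function canTransition(from: DecisionState, to: DecisionState): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the transition is not in the table.
function transition(from: DecisionState, to: DecisionState): DecisionState {
  if (!canTransition(from, to)) {
    throw new Error("Invalid transition: " + from + " -> " + to);
  }
  return to;
}
```

A unit test would walk every pair of states and assert that exactly the pairs listed in the table succeed, so a newly added state cannot silently gain or lose a transition.
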
3. Integration tests (mandatory)
| Spec | Verifies |
|---|---|
| tenant-isolation.spec | Cross-tenant reads/writes forbidden by RLS and app guard |
| outbox.spec | Assist commits decision + outbox row in same tx; relay publishes exactly-once |
| inbox.spec | Consumed config.* and tenant.* events deduped |
| policy-timeout.spec | AI_POLICY_DENY when access-policy times out |
| quota-exceeded.spec | 429 + ai_gateway.quota.exceeded.v1 |
| provider-fallback.spec | Primary errors → fallback provider; provider.degraded emitted |
| moderation-block.spec | 422 path + flagged event |
| hitl-flow.spec | draft → under_review → accepted; accepted event consumed by owner |
| phi-logging.spec | Assert raw instructions and draftText never appear in default event payloads |
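
One way the phi-logging.spec assertion could be expressed is a recursive scan of each captured event payload for the raw strings that were submitted. This is a sketch under assumptions: the helper name `findLeaks` and the payload shape are hypothetical, and a real spec would feed it the events captured from the outbox or message bus.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Walks a JSON value and returns the path of every string field that
// contains one of the forbidden raw strings (e.g. instructions, draftText).
function findLeaks(value: Json, forbidden: readonly string[], path: string = "$"): string[] {
  const hits: string[] = [];
  if (typeof value === "string") {
    for (let i = 0; i < forbidden.length; i++) {
      if (forbidden[i].length > 0 && value.indexOf(forbidden[i]) !== -1) {
        hits.push(path);
        break; // one hit per field is enough
      }
    }
  } else if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      hits.push(...findLeaks(value[i], forbidden, path + "[" + i + "]"));
    }
  } else if (value !== null && typeof value === "object") {
    const keys = Object.keys(value);
    for (let i = 0; i < keys.length; i++) {
      hits.push(...findLeaks(value[keys[i]], forbidden, path + "." + keys[i]));
    }
  }
  return hits;
}
```

The spec would then assert that `findLeaks(payload, [rawInstructions, rawDraftText])` is empty for every default event. Returning paths rather than a boolean makes a failure point directly at the offending field.
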
4. Contract tests
| Type | Tool / scope |
|---|---|
| OpenAPI (REST) | openapi diff + Dredd |
| Event schema | Ajv against @ghasi/event-envelope |
| Pact consumer | patient-chart, medication, portal expectations |
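
The "100% of published events" target can be checked mechanically by comparing the event types the service publishes with the event types that have a schema registered for validation. The sketch below assumes a simple in-memory registry keyed by event type; `SchemaRegistry` and `missingSchemas` are hypothetical names, and the schema objects stand in for whatever is compiled with Ajv.

```typescript
// Hypothetical registry: event type -> JSON Schema object.
type SchemaRegistry = { [eventType: string]: object };

// Returns every published event type that has no registered schema.
function missingSchemas(published: readonly string[], registry: SchemaRegistry): string[] {
  const missing: string[] = [];
  for (let i = 0; i < published.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(registry, published[i])) {
      missing.push(published[i]);
    }
  }
  return missing;
}
```

A contract test would fail whenever `missingSchemas` returns a non-empty list, which keeps the target from eroding as new events are added.
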
5. E2E
- Playwright suite: reviewer dashboard accept/reject flow; portal triage end-to-end; cross-service accept in patient-chart.
- Load: k6 script driving the assist endpoint at 100 RPS sustained for 10 min, then ramping to 500 RPS over 2 min.
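
The load profile could be expressed with k6's arrival-rate executors, which hold a request rate rather than a virtual-user count. The fragment below is only the scenario configuration; in a k6 script it would be exported as `options` alongside a default function that calls the assist endpoint. The scenario names, the VU pool sizes, and the reading of the ramp as a 2-minute climb from 100 to 500 RPS are assumptions.

```typescript
// Scenario configuration sketch for k6 (assumed values are marked).
const loadOptions = {
  scenarios: {
    sustained: {
      executor: "constant-arrival-rate",
      rate: 100,            // 100 iterations per timeUnit
      timeUnit: "1s",       // i.e. 100 RPS
      duration: "10m",
      preAllocatedVUs: 100, // assumed pool size
      maxVUs: 500,          // assumed ceiling
    },
    ramp: {
      executor: "ramping-arrival-rate",
      startTime: "10m",     // begins when the sustained phase ends
      startRate: 100,
      timeUnit: "1s",
      stages: [{ target: 500, duration: "2m" }],
      preAllocatedVUs: 200, // assumed pool size
      maxVUs: 2000,         // assumed ceiling
    },
  },
};
```

Arrival-rate executors keep the offered load constant even when responses slow down, which is what an RPS target implies; a VU-based executor would let throughput fall as latency rises.
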
6. Safety & safety-adjacent tests
| Scenario | Assertion |
|---|---|
| Prompt-injection sample set | Block rate ≥ 95% on curated corpus |
| PHI-sniff sample set | Block rate ≥ 98% on synthetic PHI prompts |
| Red-team triage prompts | No medical advice is returned without a disclaimer and a triage escalation path |
| Regression vs previous prompt template version | Delta report reviewed by clinical SME |
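
The block-rate assertions reduce to a ratio over a labelled corpus. A minimal sketch, assuming every sample in the corpus is expected to be blocked and that the moderation step can be called as a function; `blockRate`, `Verdict`, and the `moderate` callback are hypothetical names.

```typescript
type Verdict = "block" | "allow";

// Fraction of corpus samples that the moderation function blocks.
function blockRate(samples: readonly string[], moderate: (prompt: string) => Verdict): number {
  if (samples.length === 0) {
    throw new Error("corpus is empty");
  }
  let blocked = 0;
  for (let i = 0; i < samples.length; i++) {
    if (moderate(samples[i]) === "block") {
      blocked++;
    }
  }
  return blocked / samples.length;
}
```

A safety test would then assert `blockRate(injectionCorpus, moderate) >= 0.95` and `blockRate(phiCorpus, moderate) >= 0.98`. Throwing on an empty corpus avoids a vacuous pass if the sample set fails to load.
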
7. Non-functional tests
- Chaos: provider kill, NATS partition, Redis outage — verify fail-closed behaviour and event delivery recovery.
- Security: static + dynamic analysis, dependency scan, secret scan.
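
The fail-closed behaviour that the chaos runs verify can be pictured as a wrapper that treats a timeout or an error from a dependency as a denial. This is a sketch, not the service's implementation: `checkPolicyFailClosed` and the `callPolicy` callback are hypothetical, and only the `AI_POLICY_DENY` code comes from this document.

```typescript
type PolicyOutcome = "ALLOW" | "AI_POLICY_DENY";

// Resolves with the dependency's decision, but denies if the call
// times out, rejects, or throws. It never rejects.
function checkPolicyFailClosed(
  callPolicy: () => Promise<boolean>,
  timeoutMs: number,
): Promise<PolicyOutcome> {
  return new Promise<PolicyOutcome>((resolve) => {
    // Deny if the dependency has not answered in time.
    const timer = setTimeout(() => resolve("AI_POLICY_DENY"), timeoutMs);
    Promise.resolve()
      .then(callPolicy) // also captures synchronous throws from callPolicy
      .then((allowed) => resolve(allowed ? "ALLOW" : "AI_POLICY_DENY"))
      .catch(() => resolve("AI_POLICY_DENY"))
      .then(() => clearTimeout(timer));
  });
}
```

In a chaos run the callback would hit the real dependency while it is killed or partitioned, and the test would assert that every request in that window resolves to `AI_POLICY_DENY` rather than hanging or allowing.
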