Testing
:::info Source
Sourced from services/search-service/TESTING_STRATEGY.md in the documentation repo.
:::
Inherits platform baselines from docs/16-testing-strategy-qa.md. Minimum 80% coverage. TDD required.
1. Test Pyramid
┌────────────────┐
│ e2e / cross │ ~10%
├────────────────┤
│ contract │ ~10%
├────────────────┤
│ integration │ ~25%
├────────────────┤
│ unit │ ~55%
└────────────────┘
Total minimum coverage: 80% lines, 75% branches. Ranking / ranking-adjacent code requires 90%.
2. Unit Tests
Scope
- Domain model validators (all invariants D1..D8 from DOMAIN_MODEL.md).
- Query AST parser and filter translator.
- RRF merge, hybrid score blend, MMR diversification.
- Authorization filter builder.
- Content hash + sanitizer.
- Projector mapping functions (event → document).
- Rate limiter sliding window logic.
- Cache key derivation.
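To make one of the scoped units concrete, the following is a minimal sketch of Reciprocal Rank Fusion (RRF), the merge step named in the list above. The function name `rrfMerge`, its signature, and the constant `k = 60` are assumptions for illustration; this document does not state the value or shape the service actually uses.

```typescript
// Reciprocal Rank Fusion: every ranked list contributes 1 / (k + rank) per document.
// k = 60 is a commonly used default, assumed here; the service's real value is not documented.
// Assumes each input ranking contains unique document ids.
function rrfMerge(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties broken by id so the output is deterministic.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Hypothetical lexical and semantic result lists for one query.
const lexicalHits = ["d1", "d2", "d3"];
const semanticHits = ["d3", "d1", "d4"];
const fused = rrfMerge([lexicalHits, semanticHits]);
```

A unit test for this kind of function would typically assert both the final order and the deterministic tie-break, since unstable ordering is a common source of flaky ranking tests.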
Tools
- vitest (TypeScript/Node runtime).
- fast-check for property tests on query parser + auth filter.
- msw (Mock Service Worker) for ai-gateway stubs.
Examples
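import { describe, expect } from 'vitest';
import { test, fc } from '@fast-check/vitest'; // provides test.prop
// hybridBlend and buildAuthFilter come from the service's own modules (import paths omitted here).

describe('hybridBlend', () => {
  test('returns lexical-only when alpha=1', () => {
    expect(hybridBlend({ lex: 0.8, sem: 0.2 }, 1.0)).toBe(0.8);
  });

  // noNaN keeps fast-check from generating NaN, which would fail both bounds checks.
  test.prop([fc.float({ min: 0, max: 1, noNaN: true })])('alpha is convex', (alpha) => {
    const s = hybridBlend({ lex: 1, sem: 0 }, alpha);
    expect(s).toBeLessThanOrEqual(1);
    expect(s).toBeGreaterThanOrEqual(0);
  });
});

describe('buildAuthFilter', () => {
  test('blocks cross-tenant by default', () => {
    const f = buildAuthFilter({ tenantId: 'A', roles: ['learner'] }, 'A');
    // The generated filter must pin results to the caller's tenant.
    expect(JSON.stringify(f)).toContain('"tenantId":"A"');
  });
});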
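For readers who want to see what the hybridBlend tests above exercise, here is one implementation that would satisfy them. It is a sketch only: the convex-combination formula is inferred from the test names and assertions, and the real function in the service may differ (for example in how it normalizes scores).

```typescript
interface BlendInput {
  lex: number; // lexical score, assumed to be in [0, 1]
  sem: number; // semantic score, assumed to be in [0, 1]
}

// Convex combination: alpha = 1 yields the lexical score, alpha = 0 the semantic score.
// This formula is an assumption inferred from the tests, not taken from the service code.
function hybridBlend(scores: BlendInput, alpha: number): number {
  if (!(alpha >= 0 && alpha <= 1)) {
    // Also rejects NaN, because every comparison with NaN is false.
    throw new RangeError("alpha must be within [0, 1]");
  }
  return alpha * scores.lex + (1 - alpha) * scores.sem;
}

const lexicalOnly = hybridBlend({ lex: 0.8, sem: 0.2 }, 1.0);
const semanticOnly = hybridBlend({ lex: 0.8, sem: 0.2 }, 0.0);
```

Because the result is a convex combination, it always lies between the two input scores, which is the property the fast-check test probes with random alpha values.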
3. Integration Tests
Scope
- Real OpenSearch (via testcontainers).
- Real Postgres (testcontainers).
- Real Redis.
- Mocked ai-gateway (HTTP mock).
- NATS ephemeral stream (testcontainers).
Fixtures
- Seed a multi-tenant corpus: 3 tenants × 100 docs across every type.
- Deterministic embeddings (hash → float vector) so semantic tests are reproducible.
- 20 golden queries with known expected top-3 per tenant.
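The deterministic-embedding fixture can be sketched as follows. This is an illustration under stated assumptions: the names `seededHash` and `stubEmbedding`, the FNV-1a-style hash with a per-dimension seed, and the 8-dimension default are all hypothetical and not taken from the service. Such vectors carry no semantic meaning; they only guarantee that the same text always maps to the same vector.

```typescript
// FNV-1a-style 32-bit hash, seeded so each vector dimension gets a different value.
function seededHash(text: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, kept unsigned
  }
  return h;
}

// Maps text to a reproducible unit-length vector (hash -> float vector).
function stubEmbedding(text: string, dims = 8): number[] {
  const raw: number[] = [];
  for (let d = 0; d < dims; d++) {
    // Scale the hash from [0, 2^32) to [-1, 1).
    raw.push(seededHash(text, d) / 0x80000000 - 1);
  }
  const norm = Math.sqrt(raw.reduce((sum, x) => sum + x * x, 0));
  // Unit-normalize so cosine similarity reduces to a dot product.
  return norm === 0 ? raw : raw.map((x) => x / norm);
}

const first = stubEmbedding("intro to graph search");
const second = stubEmbedding("intro to graph search");
```

Since no model call or random source is involved, golden-query expectations computed against these vectors stay stable across machines and runs.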
Examples
- Publish a catalog.course_version.published.v1 event → assert document searchable within 2s.
- Reindex lifecycle: create → running → completed; alias swap verified; old index deleted.
- DLQ: inject a malformed event → observe DLQ row + alert trigger.
- Tenant erasure: tenant.user.removed.v1 → user doc absent in search within 30s.
4. Contract Tests
Consumer side (search-service validates it understands producers)
For every subscribed subject, a Pact/JSON-Schema contract is fetched from the source service's schema registry; search-service runs a verifier on each PR.
| Producer | Subjects |
|---|---|
| catalog-service | course_version.published.v1, course.deleted.v1, course.visibility_changed.v1 |
| authoring-service | block.updated.v1, lesson.updated.v1 |
| marketplace-service | listing.approved.v1, listing.withdrawn.v1 |
| certification-service | certificate.issued.v1, certificate.revoked.v1 |
| tenant-service | membership_activated.v1, membership_deactivated.v1, policy.updated.v1 |
Producer side (search-service's own events)
Consumers (analytics, sync) register contracts → search-service runs a producer verifier.
5. E2E Tests
Scenario-level, runs against staging.
| Scenario | Steps | Assertion |
|---|---|---|
| Publish course → search | Publish course, wait 3s, GET /search?q=<title> | Document appears |
| Authoring block edit → doc freshness | Edit block body, wait 3s, semantic query | Updated body reflected |
| Tenant removal | Remove membership, wait 60s | User doc gone |
| Hybrid fallback | Kill pgvector, run hybrid query | Lexical-only result, degraded=true |
| Recommendation cache | Request recs twice in 5s | Second served from cache (trace span cache.hit=true) |
| Cross-tenant negative | Issue query with X-Tenant-Id ≠ JWT tid | 403 |
| Reindex | Start reindex, wait until completed, query | Same results, no gaps |
| Offline | Flight mode → query local trie | Offline banner, results from cache |
Framework: Playwright (for app-layer) + custom Node runner for API-only flows.
6. Sync E2E
| Scenario | Expected |
|---|---|
| Airplane mode while browsing | Offline doc set + cached recommendations shown |
| Reconnect with queued feedback | Feedback flushed, no duplicates (idempotent by localId) |
| Logout | Offline store wiped |
| Switch device | Pulls fresh bundle within 30s |
| Tenant removed | Bundle invalidated via sync.bundle.revoked.v1 |
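
The "no duplicates (idempotent by localId)" expectation in the table above can be illustrated with a small in-memory sketch. The `FeedbackItem` shape and `FeedbackSink` class are hypothetical; the real service presumably persists seen ids rather than holding them in memory.

```typescript
interface FeedbackItem {
  localId: string; // client-generated id, stable across retries
  docId: string;
  signal: "click" | "dismiss"; // hypothetical feedback kinds
}

// In-memory stand-in for the receiving side of a feedback flush.
class FeedbackSink {
  private readonly seenLocalIds = new Set<string>();
  readonly accepted: FeedbackItem[] = [];

  // Returns how many items were newly accepted; replays of a known localId are ignored.
  ingest(batch: FeedbackItem[]): number {
    let added = 0;
    for (const item of batch) {
      if (this.seenLocalIds.has(item.localId)) continue;
      this.seenLocalIds.add(item.localId);
      this.accepted.push(item);
      added++;
    }
    return added;
  }
}

// Simulate a reconnect where the first flush succeeded but the ack was lost,
// so the client sends the same queue again.
const queued: FeedbackItem[] = [
  { localId: "f-1", docId: "d1", signal: "click" },
  { localId: "f-2", docId: "d2", signal: "dismiss" },
];
const sink = new FeedbackSink();
const firstFlush = sink.ingest(queued);
const retryFlush = sink.ingest(queued);
```

An E2E assertion for the reconnect scenario would then check that the stored feedback count equals the number of distinct localId values, regardless of how many times the queue was replayed.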
7. AI / Quality Evaluation
Not a CI gate — nightly pipeline.
7.1 Ranking eval
- Golden judgments: 500 (q, relevantDocIds) tuples per tenant segment.
- Compute NDCG@10, MRR, Recall@20.
- Regression threshold: a drop of 1% or more vs baseline → block promotion.
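The three metrics above can be computed per query as in the sketch below. It assumes binary relevance (a document is either in relevantDocIds or not) and that ranked lists contain unique ids; the function names are illustrative, and graded judgments would need a different gain function.

```typescript
// NDCG@k with binary gains: DCG sums 1 / log2(position + 1) over relevant hits.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2); // i is 0-based
  });
  // Ideal DCG places all relevant documents at the top.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

// Reciprocal rank of the first relevant hit; MRR is the mean of this over all queries.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const idx = ranked.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Fraction of relevant documents retrieved within the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((s, v) => s + v, 0) / values.length;
}

// One hypothetical golden judgment: relevant docs are "b" and "d".
const rankedIds = ["a", "b", "c", "d"];
const relevantIds = new Set(["b", "d"]);
const ndcg10 = ndcgAtK(rankedIds, relevantIds, 10);
const rr = reciprocalRank(rankedIds, relevantIds);
const recall20 = recallAtK(rankedIds, relevantIds, 20);
```

The nightly pipeline would average these per-query values across the golden set before comparing against the baseline.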
7.2 L2R training
- Weekly cadence.
- Offline job reads from analytics-service; trains LambdaMART; runs eval; if NDCG@10 > baseline + 0.5%, promote artifact.
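The promotion and regression rules can be expressed as a pair of small predicates. This sketch assumes both percentages are relative changes against the baseline NDCG@10; the text does not say whether they are relative or absolute, so treat that reading, along with the function names, as an assumption.

```typescript
// Relative change of the candidate metric versus the baseline, e.g. 0.005 = +0.5%.
function relativeChange(baseline: number, candidate: number): number {
  if (!(baseline > 0)) throw new RangeError("baseline must be positive");
  return (candidate - baseline) / baseline;
}

// Section 7.1 rule: a drop of 1% or more blocks promotion.
function isRegression(baseline: number, candidate: number): boolean {
  return relativeChange(baseline, candidate) <= -0.01;
}

// Section 7.2 rule: promote only when the gain exceeds 0.5%.
function shouldPromote(baseline: number, candidate: number): boolean {
  return relativeChange(baseline, candidate) > 0.005;
}

// Hypothetical NDCG@10 values for illustration only.
const baselineNdcg = 0.7;
const promoteCandidate = shouldPromote(baselineNdcg, 0.704); // about +0.57%
const holdCandidate = shouldPromote(baselineNdcg, 0.703); // about +0.43%
```

If the thresholds are meant as absolute points of NDCG instead, the comparison becomes `candidate - baseline` without the division.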
7.3 PII leakage audit
- Monthly sample of 500 visibility=public documents.
- Presidio scan + manual review.
- Target: zero leaks.
8. Performance Tests
Load profile
| Scenario | Load | Duration | Target |
|---|---|---|---|
| Steady state | 500 RPS mixed | 30m | p95 ≤ 250ms |
| Spike | 2000 RPS sustained | 5m | p99 ≤ 800ms |
| Embedding burst | bulk index 50k docs | - | ≤ 10 min |
| Reindex 1M docs | - | - | ≤ 1h |
Tool: k6 + custom JS scripts. Runs nightly on staging.
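To clarify what a target such as "p95 ≤ 250ms" means, the sketch below computes a percentile with the nearest-rank method and compares it to a target. The helper names are hypothetical, and k6 computes its own percentiles internally (possibly with a different interpolation), so this is for understanding the check, not for reproducing k6 output exactly.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function latencyPercentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function meetsLatencyTarget(samplesMs: number[], p: number, targetMs: number): boolean {
  return latencyPercentile(samplesMs, p) <= targetMs;
}

// 100 synthetic samples of 1..100 ms, deliberately unsorted.
const latencySamples = Array.from({ length: 100 }, (_, i) => 100 - i);
const p95 = latencyPercentile(latencySamples, 95);
const steadyStateOk = meetsLatencyTarget(latencySamples, 95, 250);
```

In a k6 script the same gate is normally declared as a threshold on `http_req_duration`, for example `p(95)<=250`, so the run fails on its own when the target is missed.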
9. Chaos / Failure Injection
Monthly chaos drill (see FAILURE_MODES.md for scenarios):
- Kill OpenSearch primary.
- Disconnect pgvector.
- Inject 50% network loss to ai-gateway.
- Poison NATS message.
- Tenant erasure under load.
Pass criteria: degraded-but-available, no cross-tenant leaks, recovery within 5m of restoration.
10. Fuzzing
The q parameter and the filter DSL are fuzzed via a libFuzzer wrapper. Goals:
- No 5xx.
- No OpenSearch DSL injection.
- No request budget overrun (always < 5s).
Runs 1h/day in CI.
11. Coverage Enforcement
- CI fails if line coverage < 80%, branch < 75%, ranking-pkg line < 90%.
- PR comment posts delta with Codecov.
- The gate can be verified locally with npm run coverage:gate.
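The gate logic itself is simple enough to sketch. The following assumes the coverage tool's summary has already been reduced to percentages for the whole repo and for the ranking package; the `CoveragePct` shape and `coverageGateFailures` name are hypothetical and say nothing about how the actual coverage:gate script is written.

```typescript
interface CoveragePct {
  lines: number; // percentage, 0..100
  branches: number; // percentage, 0..100
}

// Returns a list of human-readable failures; an empty list means the gate passes.
function coverageGateFailures(total: CoveragePct, rankingPkg: CoveragePct): string[] {
  const failures: string[] = [];
  if (total.lines < 80) failures.push(`line coverage ${total.lines}% < 80%`);
  if (total.branches < 75) failures.push(`branch coverage ${total.branches}% < 75%`);
  if (rankingPkg.lines < 90) failures.push(`ranking-pkg line coverage ${rankingPkg.lines}% < 90%`);
  return failures;
}

// Hypothetical numbers: overall coverage passes, the ranking package does not.
const gateResult = coverageGateFailures(
  { lines: 84.2, branches: 76.0 },
  { lines: 88.5, branches: 80.0 },
);
```

A CI wrapper would print the failures and exit non-zero when the list is not empty.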
12. Test Data
Synthetic generator in test/fixtures/generator.ts creates deterministic corpora. No real tenant data in CI. Staging uses anonymized snapshot.
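A deterministic generator of this kind usually relies on a seeded pseudo-random source instead of Math.random. The sketch below is not the content of test/fixtures/generator.ts; the document types, id format, and function names are assumptions, and mulberry32 is simply a small well-known PRNG chosen for illustration.

```typescript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SyntheticDoc {
  id: string;
  tenantId: string;
  type: string;
  title: string;
}

// Hypothetical document types; the real set comes from the domain model.
const DOC_TYPES = ["course", "lesson", "listing", "certificate"];

function generateCorpus(seed: number, tenants = ["t1", "t2", "t3"], docsPerTenant = 100): SyntheticDoc[] {
  const rand = mulberry32(seed);
  const docs: SyntheticDoc[] = [];
  for (const tenantId of tenants) {
    for (let i = 0; i < docsPerTenant; i++) {
      const type = DOC_TYPES[i % DOC_TYPES.length]; // round-robin so every type appears
      const token = Math.floor(rand() * 1e6).toString(36); // pseudo-random title token
      docs.push({ id: `${tenantId}-${type}-${i}`, tenantId, type, title: `${type} ${token}` });
    }
  }
  return docs;
}

const corpusA = generateCorpus(42);
const corpusB = generateCorpus(42);
```

Using the same seed in every CI run keeps golden-query expectations stable, while changing the seed is a cheap way to check that tests do not depend on one particular corpus.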
13. TDD Workflow (mandatory)
- Write red test.
- Minimum green impl.
- Refactor; re-run.
- Every new domain invariant = new unit test.
- Every new endpoint = contract + happy path + at least 2 negative paths.
14. Flaky Test Policy
- Any test flaking > 1% over 14d → quarantined + ticket filed → must be fixed within 2 sprints or deleted.
- Integration tests use deterministic embedding stubs to prevent floating-point flakiness.