Testing

:::info Source
Sourced from services/search-service/TESTING_STRATEGY.md in the documentation repo.
:::

Inherits platform baselines from docs/16-testing-strategy-qa.md. Minimum 80% coverage. TDD required.

1. Test Pyramid

        ┌────────────────┐
        │  e2e / cross   │  ~10%
        ├────────────────┤
        │   contract     │  ~10%
        ├────────────────┤
        │  integration   │  ~25%
        ├────────────────┤
        │    unit        │  ~55%
        └────────────────┘

Total minimum coverage: 80% lines, 75% branches. Ranking and ranking-adjacent code requires 90% line coverage.

2. Unit Tests

Scope

Domain model validators (all invariants D1..D8 from DOMAIN_MODEL.md).
Query AST parser and filter translator.
RRF merge, hybrid score blend, MMR diversification.
Authorization filter builder.
Content hash + sanitizer.
Projector mapping functions (event → document).
Rate limiter sliding window logic.
Cache key derivation.
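The sliding-window item above is easy to unit test when the clock is passed in rather than read from `Date.now()`. The following is a minimal sketch under that assumption; `SlidingWindowLimiter`, its constructor arguments, and the in-memory store are hypothetical and not taken from the service code.

```typescript
// Sliding-window log limiter: keeps the timestamps of accepted requests per key
// and allows a request only if fewer than `limit` fall inside the window.
// Hypothetical sketch; the real limiter may use Redis and a different algorithm.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `nowMs` is injected so tests can move time forward deterministically.
  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000);
const decisions = [0, 500, 900, 1001].map((t) => limiter.allow('tenant-A', t));
```

With a limit of 2 per 1000 ms, the third request is rejected and the fourth is accepted again once the first timestamp has left the window.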

Tools

vitest (TypeScript/Node runtime).
fast-check for property tests on query parser + auth filter.
msw (mock service worker) for ai-gateway stubs.

Examples

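import { describe, expect } from 'vitest';
// `test.prop` and `fc` come from the fast-check integration for vitest.
import { test, fc } from '@fast-check/vitest';
// hybridBlend and buildAuthFilter are imported from the service's own modules (paths omitted here).

describe('hybridBlend', () => {
  test('returns lexical-only when alpha=1', () => {
    expect(hybridBlend({ lex: 0.8, sem: 0.2 }, 1.0)).toBe(0.8);
  });

  // noNaN: fc.float may otherwise generate NaN, which would fail both bounds checks.
  test.prop([fc.float({ min: 0, max: 1, noNaN: true })])('blend stays within [0, 1] for any alpha', (alpha) => {
    const s = hybridBlend({ lex: 1, sem: 0 }, alpha);
    expect(s).toBeLessThanOrEqual(1);
    expect(s).toBeGreaterThanOrEqual(0);
  });
});

describe('buildAuthFilter', () => {
  test('blocks cross-tenant by default', () => {
    // The generated filter must always pin results to the caller's tenant.
    const f = buildAuthFilter({ tenantId: 'A', roles: ['learner'] }, 'A');
    expect(JSON.stringify(f)).toContain('"tenantId":"A"');
  });
});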
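For the RRF merge listed in the unit-test scope, a self-contained sketch of Reciprocal Rank Fusion may help when writing expectations by hand. The function name `rrfMerge`, its signature, and the constant `k = 60` (a commonly used default) are assumptions for illustration; the service's actual implementation may differ.

```typescript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per document,
// and documents are ordered by the sum of their contributions.
// Illustrative only; not the service's actual API.
function rrfMerge(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  rankings.forEach((ranking) => {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  });
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest score first; ties broken by id so the output is deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const lexical = ['doc-a', 'doc-b', 'doc-c'];
const semantic = ['doc-b', 'doc-d', 'doc-a'];
const merged = rrfMerge([lexical, semantic]);
// doc-b (ranks 2 and 1) edges out doc-a (ranks 1 and 3); doc-d and doc-c follow.
```

A deterministic tie-break matters here because golden-query assertions compare exact orderings.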

3. Integration Tests

Scope

Real OpenSearch (via testcontainers).
Real Postgres (testcontainers).
Real Redis.
Mocked ai-gateway (HTTP mock).
NATS ephemeral stream (testcontainers).

Fixtures

Seed a multi-tenant corpus: 3 tenants × 100 docs across every type.
Deterministic embeddings (hash → float vector) so semantic tests are reproducible.
20 golden queries with known expected top-3 per tenant.
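The "hash → float vector" fixture above can be sketched as follows. This is one possible construction, assuming an FNV-1a hash to seed a small PRNG (mulberry32); the function names and the default of 8 dimensions are hypothetical, and the real fixture may use a different hash or dimensionality.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units; used only to derive a seed.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: small seeded PRNG that returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same text always yields the same unit-length vector, with no model call involved.
function deterministicEmbedding(text: string, dims = 8): number[] {
  const next = mulberry32(fnv1a(text));
  const raw = Array.from({ length: dims }, () => next() * 2 - 1);
  const norm = Math.sqrt(raw.reduce((sum, x) => sum + x * x, 0)) || 1;
  return raw.map((x) => x / norm);
}

const first = deterministicEmbedding('course:intro-to-search');
const second = deterministicEmbedding('course:intro-to-search');
```

Vectors built this way carry no semantic meaning, so golden queries should rely on exact or near-exact text matches rather than on paraphrases.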

Examples

Publish a catalog.course_version.published.v1 event → assert document searchable within 2s.
Reindex lifecycle: create → running → completed; alias swap verified; old index deleted.
DLQ: inject a malformed event → observe DLQ row + alert trigger.
Tenant erasure: tenant.user.removed.v1 → user doc absent in search within 30s.
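Assertions of the form "searchable within 2s" are usually written as a bounded poll rather than a fixed sleep. The helper below is a minimal sketch of that pattern; `eventually` and its parameters are hypothetical names, and the in-memory `indexed` set stands in for a real search call.

```typescript
// Polls `probe` until it returns a value or the deadline passes.
// Hypothetical helper; intervals and error text are illustrative.
async function eventually<T>(
  probe: () => Promise<T | undefined>,
  timeoutMs: number,
  intervalMs = 50,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await probe();
    if (result !== undefined) return result;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for the search index; a real test would query OpenSearch here.
const indexed = new Set<string>();
setTimeout(() => indexed.add('course-42'), 100);

eventually(async () => (indexed.has('course-42') ? 'course-42' : undefined), 2000)
  .then((id) => console.log(`searchable: ${id}`));
```

Polling keeps the test fast when the projector is quick and still fails with a clear message when the budget is exceeded.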

4. Contract Tests

Consumer side (search-service validates it understands producers)

For every subscribed subject, a Pact/JSON-Schema contract is fetched from the source service's schema registry; search-service runs a verifier on each PR.

| Producer | Subjects |
| --- | --- |
| catalog-service | `course_version.published.v1`, `course.deleted.v1`, `course.visibility_changed.v1` |
| authoring-service | `block.updated.v1`, `lesson.updated.v1` |
| marketplace-service | `listing.approved.v1`, `listing.withdrawn.v1` |
| certification-service | `certificate.issued.v1`, `certificate.revoked.v1` |
| tenant-service | `membership_activated.v1`, `membership_deactivated.v1`, `policy.updated.v1` |

Producer side (search-service's own events)

Consumers (analytics, sync) register contracts → search-service runs a producer verifier.

5. E2E Tests

Scenario-level, runs against staging.

| Scenario | Steps | Assertion |
| --- | --- | --- |
| Publish course → search | Publish course, wait 3s, `GET /search?q=<title>` | Document appears |
| Authoring block edit → doc freshness | Edit block body, wait 3s, semantic query | Updated body reflected |
| Tenant removal | Remove membership, wait 60s | User doc gone |
| Hybrid fallback | Kill pgvector, run hybrid query | Lexical-only result, `degraded=true` |
| Recommendation cache | Request recs twice in 5s | Second served from cache (trace span `cache.hit=true`) |
| Cross-tenant negative | Issue query with `X-Tenant-Id` ≠ JWT `tid` | 403 |
| Reindex | Start reindex, wait until completed, query | Same results, no gaps |
| Offline | Flight mode → query local trie | Offline banner, results from cache |

Framework: Playwright (for app-layer) + custom Node runner for API-only flows.

6. Sync E2E

| Scenario | Expected |
| --- | --- |
| Airplane mode while browsing | Offline doc set + cached recommendations shown |
| Reconnect with queued feedback | Feedback flushed, no duplicates (idempotent by localId) |
| Logout | Offline store wiped |
| Switch device | Pulls fresh bundle within 30s |
| Tenant removed | Bundle invalidated via `sync.bundle.revoked.v1` |
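The "idempotent by localId" expectation in the reconnect scenario can be checked by replaying the same queue twice and asserting that nothing is stored a second time. The sketch below assumes a server-side sink keyed by `localId`; `FeedbackItem`, `FeedbackSink`, and `flushQueue` are hypothetical names, not the sync API.

```typescript
interface FeedbackItem {
  localId: string; // client-generated id, stable across retries
  docId: string;
  rating: number;
}

// In-memory stand-in for the receiving side; real storage is assumed to
// enforce the same uniqueness on localId.
class FeedbackSink {
  private readonly seen = new Map<string, FeedbackItem>();

  // Returns true when the item is newly accepted, false when it is a replay.
  accept(item: FeedbackItem): boolean {
    if (this.seen.has(item.localId)) return false;
    this.seen.set(item.localId, item);
    return true;
  }

  count(): number {
    return this.seen.size;
  }
}

// Sends every queued item and reports how many were newly accepted.
function flushQueue(queue: FeedbackItem[], sink: FeedbackSink): number {
  let accepted = 0;
  queue.forEach((item) => {
    if (sink.accept(item)) accepted += 1;
  });
  return accepted;
}

const queue: FeedbackItem[] = [
  { localId: 'l-1', docId: 'doc-a', rating: 1 },
  { localId: 'l-2', docId: 'doc-b', rating: -1 },
];
const sink = new FeedbackSink();
const firstFlush = flushQueue(queue, sink);  // both items are new
const secondFlush = flushQueue(queue, sink); // replay after a dropped ack
```

Replaying the whole queue models the case where the client never received the acknowledgement and retries after reconnecting.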

7. AI / Quality Evaluation

Not a CI gate; runs in the nightly pipeline.

7.1 Ranking eval

Golden judgments: 500 (q, relevantDocIds) tuples per tenant segment.
Compute NDCG@10, MRR, Recall@20.
Regression threshold: a drop of 1% or more against the baseline blocks promotion.
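The three metrics and the regression check can be sketched as below. Because the golden judgments are (q, relevantDocIds) tuples, the sketch assumes binary relevance; it also reads the 1% threshold as a relative drop against the baseline, which is an interpretation rather than something the source states. All function names are hypothetical.

```typescript
// NDCG@k with binary gains: a relevant document at 0-based position i adds 1 / log2(i + 2).
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2);
  });
  // Ideal ordering places all relevant documents first.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

// Reciprocal rank of the first relevant document; MRR is the mean over queries.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Fraction of relevant documents found in the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// Assumption: the threshold is relative, so 0.01 means "1% below baseline or worse".
function blocksPromotion(candidate: number, baseline: number, maxRelativeDrop = 0.01): boolean {
  return (candidate - baseline) / baseline <= -maxRelativeDrop;
}

const ranked = ['a', 'x', 'b'];
const ndcg = ndcgAtK(ranked, new Set(['a', 'b']), 10);
```

In a nightly run these per-query values would be averaged per tenant segment before being compared with the stored baseline.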

7.2 L2R training

Weekly cadence.
Offline job reads from analytics-service; trains LambdaMART; runs eval; if NDCG@10 > baseline + 0.5%, promote artifact.

7.3 PII leakage audit

Monthly sample of 500 visibility=public documents.
Presidio scan + manual review.
Target: zero leaks.

8. Performance Tests

Load profile

| Scenario | Load | Duration | Target |
| --- | --- | --- | --- |
| Steady state | 500 RPS mixed | 30m | p95 ≤ 250ms |
| Spike | 2000 RPS sustained | 5m | p99 ≤ 800ms |
| Embedding burst | bulk index 50k docs | - | ≤ 10 min |
| Reindex 1M docs | - | - | ≤ 1h |

Tool: k6 + custom JS scripts. Runs nightly on staging.

9. Chaos / Failure Injection

Monthly chaos drill (see FAILURE_MODES.md for scenarios):

Kill OpenSearch primary.
Disconnect pgvector.
Inject 50% network loss to ai-gateway.
Poison NATS message.
Tenant erasure under load.

Pass criteria: degraded-but-available, no cross-tenant leaks, recovery within 5m of restoration.

10. Fuzzing

The `q` parameter and the filter DSL are fuzzed via a libFuzzer wrapper. Goals:

No 5xx.
No OpenSearch DSL injection.
No request budget overrun (always < 5s).

Runs 1h/day in CI.

11. Coverage Enforcement

CI fails if line coverage < 80%, branch < 75%, ranking-pkg line < 90%.
Codecov posts the coverage delta as a PR comment.
Run `npm run coverage:gate` to verify the gate locally.
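The gate logic amounts to three threshold comparisons. The sketch below shows one way such a check could be written; it is not the contents of the `coverage:gate` script, and the input shape (`CoverageSummary`, percentages as plain numbers) is an assumption.

```typescript
interface CoverageSummary {
  lines: number;    // percentage, 0..100
  branches: number; // percentage, 0..100
}

// Thresholds taken from the rules above.
const THRESHOLDS = { lines: 80, branches: 75, rankingLines: 90 };

// Returns one message per violated threshold; an empty array means the gate passes.
function coverageGateFailures(overall: CoverageSummary, rankingLines: number): string[] {
  const failures: string[] = [];
  if (overall.lines < THRESHOLDS.lines) {
    failures.push(`line coverage ${overall.lines}% < ${THRESHOLDS.lines}%`);
  }
  if (overall.branches < THRESHOLDS.branches) {
    failures.push(`branch coverage ${overall.branches}% < ${THRESHOLDS.branches}%`);
  }
  if (rankingLines < THRESHOLDS.rankingLines) {
    failures.push(`ranking line coverage ${rankingLines}% < ${THRESHOLDS.rankingLines}%`);
  }
  return failures;
}

const failures = coverageGateFailures({ lines: 84.2, branches: 73.9 }, 91);
// Only the branch threshold is violated in this sample.
```

A CI wrapper would typically print the messages and exit non-zero when the array is not empty.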

12. Test Data

Synthetic generator in test/fixtures/generator.ts creates deterministic corpora. No real tenant data in CI. Staging uses anonymized snapshot.

13. TDD Workflow (mandatory)

Write red test.
Minimum green impl.
Refactor; re-run.
Every new domain invariant = new unit test.
Every new endpoint = contract + happy path + at least 2 negative paths.

14. Flaky Test Policy

Any test flaking > 1% over 14d → quarantined + ticket filed → must be fixed within 2 sprints or deleted.
Integration tests use deterministic embedding stubs to prevent floating-point flakiness.
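The quarantine rule can be expressed as a small function over recent run history. The sketch below assumes a run record with a `flaked` flag (for example, failed and then passed on retry without code changes); the `TestRun` shape and `testsToQuarantine` are hypothetical, and how flakes are detected is outside the sketch.

```typescript
interface TestRun {
  testId: string;
  finishedAt: Date;
  flaked: boolean; // assumed to be set by the CI reporter
}

// Returns ids of tests whose flake rate inside the window exceeds the threshold.
function testsToQuarantine(
  runs: TestRun[],
  now: Date,
  windowDays = 14,
  threshold = 0.01,
): string[] {
  const windowStart = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  const stats = new Map<string, { total: number; flaky: number }>();
  runs.forEach((run) => {
    if (run.finishedAt.getTime() < windowStart) return; // outside the 14d window
    const entry = stats.get(run.testId) ?? { total: 0, flaky: 0 };
    entry.total += 1;
    if (run.flaked) entry.flaky += 1;
    stats.set(run.testId, entry);
  });
  const result: string[] = [];
  stats.forEach((entry, id) => {
    if (entry.flaky / entry.total > threshold) result.push(id);
  });
  return result.sort();
}
```

The comparison is strict, matching the "> 1%" wording: a test that flakes in exactly 1 of 100 runs is not quarantined.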
