Testing

:::info Source
Sourced from services/search-service/TESTING_STRATEGY.md in the documentation repo.
:::

Inherits platform baselines from docs/16-testing-strategy-qa.md. Minimum 80% coverage. TDD required.

1. Test Pyramid

```text
┌────────────────┐
│  e2e / cross   │  ~10%
├────────────────┤
│    contract    │  ~10%
├────────────────┤
│  integration   │  ~25%
├────────────────┤
│      unit      │  ~55%
└────────────────┘
```

Total minimum coverage: 80% lines, 75% branches. Ranking and ranking-adjacent code requires 90% line coverage.

2. Unit Tests

Scope

  • Domain model validators (all invariants D1..D8 from DOMAIN_MODEL.md).
  • Query AST parser and filter translator.
  • RRF merge, hybrid score blend, MMR diversification.
  • Authorization filter builder.
  • Content hash + sanitizer.
  • Projector mapping functions (event → document).
  • Rate limiter sliding window logic.
  • Cache key derivation.
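RRF merge is one of the units in scope above. As a minimal sketch of reciprocal rank fusion, assuming 1-based ranks and the commonly used constant k = 60 (the service's actual k is not stated here), a merge that unit tests could target might look like this; `rrfMerge` and `Fused` are hypothetical names.

```typescript
interface Fused {
  id: string;
  score: number;
}

// Reciprocal rank fusion: score(d) = sum over input rankings of 1 / (k + rank(d)),
// with 1-based ranks. A document missing from a ranking contributes nothing for it.
function rrfMerge(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    // Highest fused score first; ties broken by id so the output is deterministic.
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
}
```

Because RRF uses only ranks, it needs no score normalization across the lexical and vector result lists, which also makes it easy to unit-test with hand-built rankings.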

Tools

  • vitest (TypeScript/Node runtime).
  • fast-check (via @fast-check/vitest, which provides test.prop) for property tests on the query parser and auth filter.
  • msw (Mock Service Worker) for ai-gateway stubs.

Examples

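```typescript
import { describe, expect } from 'vitest';
import { test, fc } from '@fast-check/vitest'; // adds test.prop for property-based tests
// Hypothetical module paths; adjust to the service's source layout.
import { hybridBlend } from '../src/ranking/hybridBlend';
import { buildAuthFilter } from '../src/auth/buildAuthFilter';

describe('hybridBlend', () => {
  test('returns lexical-only when alpha=1', () => {
    expect(hybridBlend({ lex: 0.8, sem: 0.2 }, 1.0)).toBe(0.8);
  });

  // noNaN: fc.float can otherwise generate NaN even when min/max are set.
  test.prop([fc.float({ min: 0, max: 1, noNaN: true })])('blend stays within [0, 1] (convex)', (alpha) => {
    const s = hybridBlend({ lex: 1, sem: 0 }, alpha);
    expect(s).toBeLessThanOrEqual(1);
    expect(s).toBeGreaterThanOrEqual(0);
  });
});

describe('buildAuthFilter', () => {
  test('blocks cross-tenant by default', () => {
    // The generated filter must pin results to the caller's tenant.
    const f = buildAuthFilter({ tenantId: 'A', roles: ['learner'] }, 'A');
    expect(JSON.stringify(f)).toContain('"tenantId":"A"');
  });
});
```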
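The tests pin down two properties of hybridBlend: alpha = 1 returns the lexical score, and the result stays within [0, 1]. A minimal sketch that satisfies both is a convex combination, assuming both input scores are already normalized to [0, 1]; the `ScorePair` shape and the range check are inferred from the tests, not taken from the real implementation.

```typescript
// Hypothetical shape inferred from the unit tests: both scores normalized to [0, 1].
interface ScorePair {
  lex: number;
  sem: number;
}

function hybridBlend(scores: ScorePair, alpha: number): number {
  // Rejects NaN as well, since NaN fails both comparisons.
  if (!(alpha >= 0 && alpha <= 1)) {
    throw new RangeError(`alpha must be in [0, 1], got ${alpha}`);
  }
  // Convex combination: alpha = 1 is lexical-only, alpha = 0 is semantic-only.
  return alpha * scores.lex + (1 - alpha) * scores.sem;
}
```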

3. Integration Tests

Scope

  • Real OpenSearch (via testcontainers).
  • Real Postgres (testcontainers).
  • Real Redis.
  • Mocked ai-gateway (HTTP mock).
  • NATS ephemeral stream (testcontainers).

Fixtures

  • Seed a multi-tenant corpus: 3 tenants × 100 docs across every type.
  • Deterministic embeddings (hash → float vector) so semantic tests are reproducible.
  • 20 golden queries with known expected top-3 per tenant.
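One way to build the "hash → float vector" embeddings described above is to hash the text together with each dimension index. The sketch below assumes FNV-1a as the hash and an 8-dimension vector purely for illustration; `stubEmbedding` is a hypothetical helper, and a real stub would have to match the dimension of the model behind ai-gateway.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; Math.imul keeps the multiply in 32-bit integer space.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // unsigned 32-bit result
}

// Hypothetical stub: dimension d is hash(text + "#" + d) mapped to [-1, 1),
// then the vector is L2-normalized so cosine and dot-product scores are well defined.
function stubEmbedding(text: string, dims = 8): number[] {
  const raw = Array.from({ length: dims }, (_, d) => fnv1a(`${text}#${d}`) / 2 ** 31 - 1);
  const norm = Math.sqrt(raw.reduce((sum, x) => sum + x * x, 0)) || 1; // guard the all-zero case
  return raw.map((x) => x / norm);
}
```

Identical text always yields a bit-identical vector, so semantic assertions on the golden queries do not drift between runs.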

Examples

  • Publish a catalog.course_version.published.v1 event → assert document searchable within 2s.
  • Reindex lifecycle: create → running → completed; alias swap verified; old index deleted.
  • DLQ: inject a malformed event → observe DLQ row + alert trigger.
  • Tenant erasure: tenant.user.removed.v1 → user doc absent in search within 30s.
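Assertions such as "searchable within 2s" and "absent within 30s" read most naturally as polling against a deadline rather than a fixed sleep. A minimal sketch of such a helper; `eventually` is a hypothetical name, and in these tests the probe would typically query the testcontainers OpenSearch instance.

```typescript
// Polls `probe` until it resolves to true, or rejects once `timeoutMs` has elapsed.
async function eventually(
  probe: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage (searchClient and courseId are hypothetical):
// await eventually(async () => (await searchClient.getById(courseId)) !== null, 2_000);
```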

4. Contract Tests

Consumer side (search-service validates it understands producers)

For every subscribed subject, a Pact/JSON-Schema contract is fetched from the source service's schema registry; search-service runs a verifier on each PR.

| Producer | Subjects |
|---|---|
| catalog-service | course_version.published.v1, course.deleted.v1, course.visibility_changed.v1 |
| authoring-service | block.updated.v1, lesson.updated.v1 |
| marketplace-service | listing.approved.v1, listing.withdrawn.v1 |
| certification-service | certificate.issued.v1, certificate.revoked.v1 |
| tenant-service | membership_activated.v1, membership_deactivated.v1, policy.updated.v1 |

Producer side (search-service's own events)

Consumers (analytics, sync) register contracts → search-service runs a producer verifier.

5. E2E Tests

Scenario-level, runs against staging.

| Scenario | Steps | Assertion |
|---|---|---|
| Publish course → search | Publish course, wait 3s, `GET /search?q=<title>` | Document appears |
| Authoring block edit → doc freshness | Edit block body, wait 3s, semantic query | Updated body reflected |
| Tenant removal | Remove membership, wait 60s | User doc gone |
| Hybrid fallback | Kill pgvector, run hybrid query | Lexical-only result, degraded=true |
| Recommendation cache | Request recs twice in 5s | Second served from cache (trace span cache.hit=true) |
| Cross-tenant negative | Issue query with X-Tenant-Id ≠ JWT tid | 403 |
| Reindex | Start reindex, wait until completed, query | Same results, no gaps |
| Offline | Flight mode → query local trie | Offline banner, results from cache |

Framework: Playwright (for app-layer) + custom Node runner for API-only flows.

6. Sync E2E

| Scenario | Expected |
|---|---|
| Airplane mode while browsing | Offline doc set + cached recommendations shown |
| Reconnect with queued feedback | Feedback flushed, no duplicates (idempotent by localId) |
| Logout | Offline store wiped |
| Switch device | Pulls fresh bundle within 30s |
| Tenant removed | Bundle invalidated via sync.bundle.revoked.v1 |
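The "no duplicates (idempotent by localId)" expectation implies that the receiving side deduplicates on the client-generated localId, so replaying a queued batch after reconnect is harmless. A minimal in-memory sketch assuming that design; `FeedbackSink` and the feedback fields other than localId are hypothetical, and a real implementation would persist the applied ids rather than keep them in memory.

```typescript
// Hypothetical queued feedback item; localId is generated on the client and is stable across retries.
interface QueuedFeedback {
  localId: string;
  docId: string;
  signal: 'click' | 'dismiss';
}

// Applies each feedback item at most once, keyed by localId.
class FeedbackSink {
  private readonly applied = new Set<string>();
  readonly stored: QueuedFeedback[] = [];

  // Returns how many items in the batch were newly applied.
  flush(batch: QueuedFeedback[]): number {
    let accepted = 0;
    for (const item of batch) {
      if (this.applied.has(item.localId)) continue; // duplicate from a retried flush
      this.applied.add(item.localId);
      this.stored.push(item);
      accepted++;
    }
    return accepted;
  }
}
```

The sync E2E check then amounts to flushing the same queue twice and asserting the second flush accepts nothing.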

7. AI / Quality Evaluation

Not a CI gate; runs as a nightly pipeline.

7.1 Ranking eval

  • Golden judgments: 500 (q, relevantDocIds) tuples per tenant segment.
  • Compute NDCG@10, MRR, Recall@20.
  • Regression threshold: a change of ≤ -1% vs baseline blocks promotion.
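With judgments stored as (q, relevantDocIds) tuples, relevance is binary, which keeps the metrics short. A sketch under that assumption; the function names are hypothetical, MRR is the mean of `reciprocalRank` over all judged queries, and reading "-1%" as a relative change against baseline is also an assumption.

```typescript
// Binary relevance: a doc is relevant iff its id is in the judged set.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2); // position i has rank i + 1
  });
  let idcg = 0; // ideal ordering puts every relevant doc first
  for (let i = 0; i < Math.min(k, relevant.size); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

// Per-query reciprocal rank; MRR averages this over the query set.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// Blocks promotion when the candidate metric is 1% or more below baseline (relative change).
function blocksPromotion(candidate: number, baseline: number): boolean {
  return (candidate - baseline) / baseline <= -0.01;
}
```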

7.2 L2R training

  • Weekly cadence.
  • An offline job reads training data from analytics-service, trains a LambdaMART model, and runs the ranking eval; the artifact is promoted only if NDCG@10 exceeds baseline by more than 0.5%.

7.3 PII leakage audit

  • Monthly sample of 500 visibility=public documents.
  • Presidio scan + manual review.
  • Target: zero leaks.

8. Performance Tests

Load profile

| Scenario | Load | Duration | Target |
|---|---|---|---|
| Steady state | 500 RPS mixed | 30m | p95 ≤ 250ms |
| Spike | 2000 RPS sustained | 5m | p99 ≤ 800ms |
| Embedding burst | Bulk index 50k docs | - | ≤ 10 min |
| Reindex 1M docs | - | - | ≤ 1h |

Tool: k6 + custom JS scripts. Runs nightly on staging.
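The two RPS-driven rows of the load profile map onto k6's constant-arrival-rate executor plus per-scenario thresholds. The options export below is a sketch: the scenario names, VU pool sizes, and sequencing via startTime are assumptions, one request per iteration is assumed, and the default function that issues the mixed query traffic is omitted. The object is written in TypeScript here but is identical in a plain JS k6 script.

```typescript
// k6 options fragment (scenario names and VU counts are hypothetical).
export const options = {
  scenarios: {
    steady_state: {
      executor: 'constant-arrival-rate',
      rate: 500, // iterations started per timeUnit
      timeUnit: '1s',
      duration: '30m',
      preAllocatedVUs: 200,
      maxVUs: 1000,
    },
    spike: {
      executor: 'constant-arrival-rate',
      rate: 2000,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 800,
      maxVUs: 4000,
      startTime: '30m', // starts once the steady-state scenario has finished
    },
  },
  thresholds: {
    // Latency targets from the load profile, scoped by k6's built-in `scenario` tag.
    'http_req_duration{scenario:steady_state}': ['p(95)<=250'],
    'http_req_duration{scenario:spike}': ['p(99)<=800'],
  },
};
```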

9. Chaos / Failure Injection

Monthly chaos drill (see FAILURE_MODES.md for scenarios):

  • Kill OpenSearch primary.
  • Disconnect pgvector.
  • Inject 50% network loss to ai-gateway.
  • Poison NATS message.
  • Tenant erasure under load.

Pass criteria: degraded-but-available, no cross-tenant leaks, recovery within 5m of restoration.

10. Fuzzing

The q parameter and the filter DSL are fuzzed via a libFuzzer wrapper. Goals:

  • No 5xx.
  • No OpenSearch DSL injection.
  • No request budget overrun (always < 5s).

Runs 1h/day in CI.
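The "no OpenSearch DSL injection" goal needs an oracle the fuzzer can check on every input. One sketch, assuming the translator only ever places user text inside string values: walk the generated DSL and reject any object key outside an allowlist. The `toDsl` translator, the key allowlist, and the seeded LCG loop are hypothetical stand-ins for the real translator and the libFuzzer wrapper.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical translator under test: user text only ever becomes a string value.
function toDsl(q: string): Json {
  return { query: { simple_query_string: { query: q, fields: ['title^2', 'body'] } } };
}

const ALLOWED_KEYS = new Set(['query', 'simple_query_string', 'fields']);

// Oracle: every object key must come from the allowlist, so fuzzed input
// can never introduce new clauses such as "script".
function keysAreAllowed(node: Json): boolean {
  if (Array.isArray(node)) return node.every(keysAreAllowed);
  if (node !== null && typeof node === 'object') {
    return Object.entries(node).every(([key, value]) => ALLOWED_KEYS.has(key) && keysAreAllowed(value));
  }
  return true;
}

// Deterministic random-string loop (32-bit LCG) standing in for the coverage-guided fuzzer.
function fuzz(iterations: number, seed = 1): number {
  let state = seed;
  const next = () => (state = (Math.imul(state, 1103515245) + 12345) >>> 0);
  const alphabet = '{}[]":,\\ abc*~()|+-';
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    let q = '';
    const length = next() % 32;
    for (let j = 0; j < length; j++) q += alphabet[next() % alphabet.length];
    if (!keysAreAllowed(toDsl(q))) failures++;
  }
  return failures;
}
```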

11. Coverage Enforcement

  • CI fails if line coverage < 80%, branch < 75%, ranking-pkg line < 90%.
  • Codecov posts the coverage delta as a PR comment.
  • The same gate can be run locally with npm run coverage:gate.
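For the gate to fail CI without extra scripting, the thresholds can live in the Vitest config, where a run with coverage enabled exits non-zero if a threshold is missed. A sketch, assuming the v8 coverage provider and a hypothetical src/ranking/** path for the ranking package; glob-keyed entries under coverage.thresholds set per-path thresholds in recent Vitest versions.

```typescript
// vitest.config.ts sketch; a plain object export is accepted (defineConfig only adds typing).
const config = {
  test: {
    coverage: {
      provider: 'v8' as const,
      thresholds: {
        lines: 80,
        branches: 75,
        // Glob-scoped threshold: ranking code must reach 90% line coverage.
        'src/ranking/**': { lines: 90 },
      },
    },
  },
};

export default config;
```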

12. Test Data

Synthetic generator in test/fixtures/generator.ts creates deterministic corpora. No real tenant data in CI. Staging uses an anonymized snapshot.

13. TDD Workflow (mandatory)

  1. Write a failing (red) test.
  2. Write the minimum implementation that makes it pass (green).
  3. Refactor; re-run.
  4. Every new domain invariant gets a new unit test.
  5. Every new endpoint gets a contract test, a happy-path test, and at least 2 negative-path tests.

14. Flaky Test Policy

  • Any test flaking > 1% over 14d → quarantined + ticket filed → must be fixed within 2 sprints or deleted.
  • Integration tests use deterministic embedding stubs to prevent floating-point flakiness.
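The quarantine rule reduces to a per-test flake rate over a trailing 14-day window. A sketch, assuming each CI run is recorded with a `flaked` flag (for example, failed and then passed on retry; the policy does not define a flake) and that the 1% bound is exclusive, as the "> 1%" wording suggests; `TestRun` and `testsToQuarantine` are hypothetical.

```typescript
// Hypothetical CI run record; `flaked` marks a run counted as flaky.
interface TestRun {
  testId: string;
  finishedAt: Date;
  flaked: boolean;
}

const WINDOW_MS = 14 * 24 * 60 * 60 * 1000; // trailing 14 days

// Returns ids of tests whose flake rate inside the window exceeds `threshold`.
function testsToQuarantine(runs: TestRun[], now: Date, threshold = 0.01): string[] {
  const stats = new Map<string, { total: number; flaky: number }>();
  for (const run of runs) {
    if (now.getTime() - run.finishedAt.getTime() > WINDOW_MS) continue; // outside the window
    const s = stats.get(run.testId) ?? { total: 0, flaky: 0 };
    s.total++;
    if (run.flaked) s.flaky++;
    stats.set(run.testId, s);
  }
  return Array.from(stats)
    .filter(([, s]) => s.flaky / s.total > threshold)
    .map(([id]) => id)
    .sort();
}
```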