TESTING_STRATEGY — notification-service
Sibling: APPLICATION_LOGIC · API_CONTRACTS · EVENT_SCHEMAS · SECURITY_MODEL · LOCAL_DEV_SETUP
Strategic anchors: 02 Enterprise Architecture §13 Testing · standards/SERVICE_TEMPLATE
We follow the platform's test-pyramid plus contract testing on both sides of every wire. Coverage gates and CI gates are enforced per the platform CI configuration; a PR cannot merge if any gate is red.
1. Pyramid
| Level | Approx. count | Runtime | What it covers |
|---|---|---|---|
| Unit (domain) | 600+ tests | <30 s | aggregates, value objects, state machines, invariants, domain services |
| Unit (application) | 300+ tests | <60 s | use cases with in-memory ports |
| Contract (HTTP) | one per route + per error code | <30 s | OpenAPI ↔ implementation |
| Contract (events) | one per published + per consumed subject | <30 s | JSON Schema ↔ payload |
| Contract (webhooks) | one per vendor with positive + negative samples | <30 s | HMAC + parser |
| Integration (slice) | 80+ tests | <3 min | one use case end-to-end against real Postgres + Redis (Testcontainers) |
| Integration (event flow) | 30+ tests | <5 min | consume → enqueue → render → outbox → publish, with NATS/Pub/Sub emulator |
| End-to-end (slice) | 10–20 tests | <10 min | through bff-backoffice-service with stub vendors |
| Performance | nightly | <30 min | enqueue throughput, dispatch concurrency, webhook ingestion |
| Chaos | weekly | varies | vendor outage, DB failover, Pub/Sub backpressure |
| Security | per-PR + nightly | varies | static + dynamic + dependency scans |
Coverage targets: lines ≥ 85 %, branches ≥ 80 %; domain layer ≥ 95 %; use cases ≥ 90 %.
2. Frameworks and conventions
- Vitest for unit/application/contract tests (fast, ESM, native TS).
- Testcontainers (Node) for Postgres 16, Memorystore-compatible Redis (`redis:7-alpine`), Pub/Sub emulator, GCS emulator (`fsouza/fake-gcs-server`).
- MSW for HTTP vendor stubs (SendGrid, Twilio, etc.) at the unit/integration boundary.
- Pact (or our internal `event-pact` tool) for consumer/provider event contract tests with the platform broker.
- k6 for performance and load.
- gremlin / litmus for chaos.
- Playwright for the BFF-driven E2E slices (run in the `melmastoon` E2E suite).
Tests live under tests/ mirroring src/:
    tests/
      unit/
        domain/
        application/
      contracts/
        http/
        events/published/
        events/consumed/
        webhooks/
      integration/
        use-cases/
        flows/
      e2e/
      performance/
      chaos/
      fixtures/
      helpers/
3. Domain unit tests (highlights)
- Every state machine: every transition asserted positively and negatively (`IllegalStateTransitionError`).
- Every invariant: at least one positive (passes) and one negative (throws) test.
- `PreferenceGate.evaluate(...)`: matrix of (channel × category × consent × suppression × quietHours × locale-fallback) with golden expectations.
- `TemplateRenderer`: golden file tests per renderer profile (`mjml-handlebars-1`, `text-handlebars-1`, `whatsapp-handlebars-1`, `inapp-handlebars-1`); RTL snapshot for Arabic and Pashto; bidi-marker presence assertions; XSS-injection inputs are sanitised.
- `RateLimiter`: token-bucket math; clock injection; window rollovers.
- `LocalisedFormatter`: dates/numbers/currency per locale; DST and Hijri-compatible rendering for Pashto/Persian/Urdu.
- `Sender` validators: PK-PTA registration check; DKIM-required-on-EnqueueNotification check.
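The `RateLimiter` bullet above can be illustrated with a minimal token bucket whose clock is injected, so a test can advance time by hand instead of sleeping. This is only a sketch: the class name `TokenBucket`, its constructor arguments, and `tryAcquire` are hypothetical and not taken from the service's code.

```typescript
// Hypothetical sketch: the real RateLimiter's API is not shown in this document.
type Clock = () => number; // returns the current time in milliseconds

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: Clock;

  constructor(capacity: number, refillPerSecond: number, now: Clock) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // the bucket starts full
    this.lastRefillMs = now();
  }

  /** Takes one token if available; returns false when the bucket is empty. */
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  /** Adds tokens for the time elapsed since the last call, capped at capacity. */
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }
}

// A fake clock that the test advances by hand.
let fakeNowMs = 0;
const bucket = new TokenBucket(2, 1, () => fakeNowMs);

// Capacity is 2, so the third call in the same instant is refused.
const burst = [bucket.tryAcquire(), bucket.tryAcquire(), bucket.tryAcquire()];

fakeNowMs += 1000; // one second later, one token has been refilled
const afterRefill = [bucket.tryAcquire(), bucket.tryAcquire()];
```

Injecting the clock keeps window-rollover cases deterministic, which is presumably why the bullet lists clock injection alongside the bucket math.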
4. Application unit tests
- Each use case has a `*.spec.ts` file with in-memory implementations of all ports defined in APPLICATION_LOGIC §1.
- Idempotency: replay same request → same response; replay with different body → 409.
- Outbox semantics: every domain event added to outbox in the same transaction as the aggregate change; assert via in-memory transactional repo.
- `ApplyConsumedEventUseCase` matrix: for each consumed subject, given a fixture event, the resulting set of `EnqueueNotificationUseCase` invocations matches the trigger map.
- HITL gate: `PublishTemplateVersionUseCase` rejects without `approverUserId` when `source='ai_drafted'`.
- AI fallback: when `AIClient.fetchAIDraftedContent` throws, the deterministic render is used and a metric is incremented (assert via fake recorder).
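As an illustration of the idempotency rule above (same key and same body replays the stored response; same key with a different body yields 409), here is a small sketch against an in-memory store. All names here (`InMemoryIdempotencyStore`, `EnqueueNotificationSketch`, the `ntf_` id prefix, the request fields) are hypothetical; the real ports are the ones defined in APPLICATION_LOGIC §1.

```typescript
// Hypothetical names throughout; this is not the service's actual use case.
interface EnqueueRequest { recipientId: string; templateKey: string }
interface EnqueueResponse { status: 202 | 409; notificationId?: string }
interface StoredEntry { fingerprint: string; response: EnqueueResponse }

/** In-memory stand-in for an idempotency store port. */
class InMemoryIdempotencyStore {
  private readonly entries = new Map<string, StoredEntry>();

  get(key: string): StoredEntry | undefined {
    return this.entries.get(key);
  }

  put(key: string, fingerprint: string, response: EnqueueResponse): void {
    this.entries.set(key, { fingerprint, response });
  }
}

class EnqueueNotificationSketch {
  private nextId = 1;
  private readonly store: InMemoryIdempotencyStore;

  constructor(store: InMemoryIdempotencyStore) {
    this.store = store;
  }

  execute(idempotencyKey: string, request: EnqueueRequest): EnqueueResponse {
    // Assumption: a real implementation would hash a canonical form of the body.
    const fingerprint = JSON.stringify([request.recipientId, request.templateKey]);
    const previous = this.store.get(idempotencyKey);
    if (previous) {
      // Same key: replay the stored response only if the body matches.
      return previous.fingerprint === fingerprint ? previous.response : { status: 409 };
    }
    const response: EnqueueResponse = { status: 202, notificationId: `ntf_${this.nextId++}` };
    this.store.put(idempotencyKey, fingerprint, response);
    return response;
  }
}

const useCase = new EnqueueNotificationSketch(new InMemoryIdempotencyStore());
const firstCall = useCase.execute("key-1", { recipientId: "rcp_1", templateKey: "booking.confirmed" });
const replayCall = useCase.execute("key-1", { recipientId: "rcp_1", templateKey: "booking.confirmed" });
const conflictCall = useCase.execute("key-1", { recipientId: "rcp_2", templateKey: "booking.confirmed" });
```

A spec would then assert that `replayCall` equals `firstCall` and that `conflictCall.status` is 409, with no second notification created.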
5. Contract tests
5.1 HTTP / OpenAPI
openapi.v1.yaml is the source of truth. The contract test:
- Boots NestJS in `test` mode with stubbed deps.
- Iterates every operation in the spec.
- Sends representative requests (positive + each documented error code).
- Asserts response shape against the spec via `@apidevtools/swagger-cli` + `ajv`.
A separate test (openapi-codegen-drift.test.ts) regenerates types from the spec and asserts identity with the committed types — drift fails CI.
5.2 Published events
Every subject in EVENT_SCHEMAS §3 has a JSON Schema in event-schemas/.... The contract test:
- Loads our published-event factory.
- Generates 100 representative payloads (boundary + property-based via `fast-check`).
- Validates each against the JSON Schema.
- Sends to a local broker; downstream consumer test fixtures in `tests/contracts/events/published/<subject>.consumer.fixture.ts` express the expected interpretation; an internal `event-pact` runner asserts.
5.3 Consumed events
For each subject in EVENT_SCHEMAS §5:
- Load the upstream service's published JSON Schema (vendored in `event-schemas/`).
- Parse with our tolerant zod schema.
- Assert action set (e.g., for `reservation.confirmed.v1` → expected list of `EnqueueNotificationUseCase` calls).
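The "parse tolerantly, then assert the action set" steps above might look roughly like the following. To stay dependency-free, this sketch uses a hand-written tolerant parser in place of the zod schema the service uses. The trigger map contents, the `booking.confirmation` template key, and the `reservationId` field are assumptions made for illustration.

```typescript
// Hypothetical trigger map; the real one belongs to ApplyConsumedEventUseCase.
type Channel = "email" | "sms" | "whatsapp";
interface Trigger { templateKey: string; channels: Channel[] }
interface PlannedEnqueue { templateKey: string; channel: Channel; reservationId: string }

const triggerMap: Record<string, Trigger> = {
  "reservation.confirmed.v1": {
    templateKey: "booking.confirmation",
    channels: ["email", "sms", "whatsapp"],
  },
};

/** Tolerant parse: requires only the field we use and ignores everything else. */
function parseReservationEvent(raw: unknown): { reservationId: string } | null {
  if (typeof raw !== "object" || raw === null) return null;
  const reservationId = (raw as Record<string, unknown>)["reservationId"];
  return typeof reservationId === "string" ? { reservationId } : null;
}

/** Returns the enqueue calls a consumed event should produce; empty if none apply. */
function planEnqueues(subject: string, raw: unknown): PlannedEnqueue[] {
  const trigger: Trigger | undefined = triggerMap[subject];
  const parsed = parseReservationEvent(raw);
  if (!trigger || !parsed) return [];
  return trigger.channels.map((channel) => ({
    templateKey: trigger.templateKey,
    channel,
    reservationId: parsed.reservationId,
  }));
}

// The fixture carries an extra field to show that unknown fields are tolerated.
const fixtureEvent = { reservationId: "rsv_123", addedUpstreamField: "ignored" };
const planned = planEnqueues("reservation.confirmed.v1", fixtureEvent);
const unknownSubject = planEnqueues("reservation.unknown.v1", fixtureEvent);
```

The contract test would compare `planned` with the expected list for the subject, so an upstream schema change that breaks parsing shows up as a missing action rather than a runtime error.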
5.4 Vendor webhook contracts
Per vendor in tests/contracts/webhooks/<vendor>/:
- `valid-signature.test.ts`: real-world body/headers fixtures pass HMAC.
- `invalid-signature.test.ts`: tampered/missing header → 401.
- `replay.test.ts`: same body twice → second produces no additional state change.
- `events-parsing.test.ts`: vendor's enum values map to our internal types.
- `headers-skew.test.ts`: > 5 min skew → reject.
Vendor sample fixtures are anonymised real captures (license-clean) committed under tests/fixtures/webhooks/<vendor>/.
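The signature and skew cases above can be sketched with Node's built-in `crypto` module. The signing scheme shown (HMAC-SHA256 over `timestamp.body`, hex-encoded) is an assumption for illustration only; each real vendor defines its own headers and signing input. `signWebhook`, `verifyWebhook`, and the verdict names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over `${timestampMs}.${rawBody}`, hex-encoded.
const MAX_SKEW_MS = 5 * 60 * 1000; // the 5-minute limit from headers-skew.test.ts

type Verdict = "accepted" | "bad_signature" | "skew_exceeded";

function signWebhook(secret: string, timestampMs: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${rawBody}`).digest("hex");
}

function verifyWebhook(
  secret: string,
  timestampMs: number,
  rawBody: string,
  signatureHex: string | undefined,
  nowMs: number,
): Verdict {
  if (Math.abs(nowMs - timestampMs) > MAX_SKEW_MS) return "skew_exceeded";
  if (!signatureHex) return "bad_signature";
  const expected = Buffer.from(signWebhook(secret, timestampMs, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return "bad_signature";
  return timingSafeEqual(expected, received) ? "accepted" : "bad_signature";
}

const secret = "test-secret";
const sentAtMs = 1_700_000_000_000;
const rawBody = '{"event":"delivered"}';
const goodSignature = signWebhook(secret, sentAtMs, rawBody);

const validCase = verifyWebhook(secret, sentAtMs, rawBody, goodSignature, sentAtMs + 1_000);
const tamperedCase = verifyWebhook(secret, sentAtMs, rawBody + " ", goodSignature, sentAtMs + 1_000);
const missingCase = verifyWebhook(secret, sentAtMs, rawBody, undefined, sentAtMs + 1_000);
const skewedCase = verifyWebhook(secret, sentAtMs, rawBody, goodSignature, sentAtMs + MAX_SKEW_MS + 1);
```

In the HTTP layer, `bad_signature` would map to the 401 described above. Passing `nowMs` in as an argument keeps the skew test independent of the wall clock.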
6. Integration tests (Testcontainers)
tests/integration/ boots real Postgres/Redis/Pub/Sub-emulator/GCS-emulator per file. Two families:
6.1 Use-case slices
Each use case has at least one slice that:
- Migrates a fresh DB schema.
- Sets `app.tenant_id` to a fixture tenant.
- Inserts seed projections (tenants_local, recipients, templates).
- Executes the use case via the application layer.
- Asserts DB state, outbox rows, Pub/Sub-emulator messages, and Redis side-effects.
- Re-runs to assert idempotency.
6.2 Flow tests
End-to-end domain flows:
- Booking-confirmation flow: publish `reservation.confirmed.v1` → consume → enqueue `email` + `sms` + `whatsapp` per preference → render → dispatch through `MockEmailPort` etc. → simulate vendor-webhook callback → assert `delivered.v1` and DB state.
- Mobile-key delivery flow: `lock_integration.key_credential.issued.v1` → `mobile_key.issued.whatsapp` with token reference → simulated WhatsApp accepted → `delivered.v1`.
- Dunning flow: `billing.subscription.payment_failed.v1` → 3 scheduled rows → tick scheduler → 3 dispatched.
- AI-drafted template publish flow: emit `ai.draft_content.ready.v1` → register draft → publish without approver → 403 → publish with approver → `template.published.v1`.
- Suggest-only flow: tenant policy `suggest_only` → enqueue creates `scheduled` notification + admin in-app review → admin approves → dispatched.
- Webhook bounce → suppression flow: mock SendGrid bounce webhook → dedupe → suppression row + `bounced.v1` + `suppressed.v1`.
- Quiet-hours deferral flow: enqueue at 23:30 Asia/Kabul with quiet-hours 22–07 → `scheduled` for 07:00 → scheduler tick at 07:00 → `dispatched`.
- Cancellation propagation flow: scheduled pre-arrival reminder + `reservation.cancelled.v1` → row marked obsolete → no send.
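The arithmetic behind the quiet-hours deferral flow (23:30 with quiet hours 22–07 defers to 07:00) can be sketched as below. The helper works on minutes since local midnight and assumes the conversion into the recipient's zone (for example Asia/Kabul) has already happened upstream. `QuietHours`, `isQuiet`, and `deferralMinutes` are hypothetical names.

```typescript
// Hypothetical helper: all values are minutes since local midnight in the recipient's zone.
interface QuietHours { startMinute: number; endMinute: number }

const MINUTES_PER_DAY = 24 * 60;

function isQuiet(localMinute: number, quiet: QuietHours): boolean {
  return quiet.startMinute <= quiet.endMinute
    ? localMinute >= quiet.startMinute && localMinute < quiet.endMinute
    : localMinute >= quiet.startMinute || localMinute < quiet.endMinute; // window wraps midnight
}

/** Minutes to wait before dispatch; 0 means "send now". */
function deferralMinutes(localMinute: number, quiet: QuietHours): number {
  if (!isQuiet(localMinute, quiet)) return 0;
  // Distance to the end of the window, wrapping past midnight when needed.
  return (quiet.endMinute - localMinute + MINUTES_PER_DAY) % MINUTES_PER_DAY;
}

const quietWindow: QuietHours = { startMinute: 22 * 60, endMinute: 7 * 60 };
const lateEvening = deferralMinutes(23 * 60 + 30, quietWindow); // 23:30 → 450 min, i.e. 07:00
const earlyMorning = deferralMinutes(6 * 60, quietWindow); // 06:00 → 60 min, i.e. 07:00
const midday = deferralMinutes(12 * 60, quietWindow); // outside the window → 0
```

The flow test then only needs to assert that the stored schedule time equals enqueue time plus the deferral, and that the scheduler tick at 07:00 dispatches the row.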
6.3 RLS tests
A dedicated suite asserts:
- Cross-tenant SELECT returns zero rows even with crafted SQL.
- The app role cannot `BYPASSRLS` (assert by attempting `SET ROLE`).
- Drizzle middleware sets `app.tenant_id` per transaction; without it, queries return zero rows.
7. End-to-end through BFF
E2E slices live in the melmastoon-e2e suite (separate repo path):
- Staff logs into backoffice → opens reservation `rsv_*` → clicks "Resend confirmation" → expects a new `notification` to appear in the audit panel within 5 s, status reaching `delivered` (vendor stub returns delivered immediately).
- Staff publishes a tenant template override → preview rendered → test-send delivered to staff's own email → audit row.
- Guest receives an opt-out URL in the synthetic email → opens link → confirms → returns to app → marketing toggle is off; future marketing send is suppressed.
- Marketing manager creates a marketing batch with 200-row segment → schedule for T+5 min → wait → batch completes; per-row delivery audit visible.
8. Performance tests (k6)
| Scenario | Target |
|---|---|
| Enqueue burst | 2 000 req/s for 5 min, single tenant; p95 ≤ 350 ms; zero 5xx |
| Enqueue sustained | 500 req/s for 30 min, 50 tenants; CPU ≤ 60 %; p95 ≤ 250 ms |
| Dispatch throughput | 5 000 msgs/min/channel against vendor stub; queue drains within 60 s |
| Webhook ingestion | 3 000 req/s/vendor for 5 min; p95 ≤ 120 ms; zero loss |
| WS feed | 10 000 concurrent connections per region; 50 events/s push; p95 push latency ≤ 150 ms |
Performance gates run nightly on staging with a synthetic dataset; regressions > 15 % fail the build the next morning.
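A regression check of this kind could be as small as the sketch below. It assumes the gate compares each scenario's p95 latency with a stored baseline; the text above fixes only the 15 % threshold, so the `LatencySample` shape and the choice of metric are assumptions.

```typescript
// Hypothetical result shape; only the 15 % limit comes from the document.
interface LatencySample { scenario: string; baselineP95Ms: number; currentP95Ms: number }

const REGRESSION_LIMIT = 0.15;

/** Returns the scenarios whose p95 grew by more than the limit relative to baseline. */
function findRegressions(samples: LatencySample[]): string[] {
  return samples
    .filter((s) => (s.currentP95Ms - s.baselineP95Ms) / s.baselineP95Ms > REGRESSION_LIMIT)
    .map((s) => s.scenario);
}

const failingScenarios = findRegressions([
  { scenario: "enqueue-burst", baselineP95Ms: 300, currentP95Ms: 340 }, // about +13 %, passes
  { scenario: "webhook-ingestion", baselineP95Ms: 100, currentP95Ms: 118 }, // +18 %, fails
]);
```

A non-empty `failingScenarios` list would be what marks the next morning's build as failed.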
9. Chaos tests (weekly)
| Experiment | Expected behaviour |
|---|---|
| Kill primary Cloud SQL → forced failover | enqueue requests retry through Cloud SQL connector reconnect; backlog drains within 5 min |
| Pub/Sub topic 503 for 60 s | outbox relay backs off; dispatch worker keeps running; on recovery, no duplicate publishes (event id stable) |
| Vendor (SendGrid) returns 503 for 5 min | dispatch worker retries with backoff; channel health flips to degraded then down; fallback vendor takes over; alert fires |
| Vendor returns 4xx for invalid recipient at 50 % rate | suppressions added for those addresses; main funnel unaffected |
| Memorystore failover | brief enqueue latency spike; suppression check falls through to DB; no incorrect sends |
| Webhook flood (100k req/min) | Cloud Armor throttles; webhook_inbound write rate caps; no DB OOM |
| Clock skew on a worker (+10 min) | rate-limit windows still consistent (DB-side counters); affected pod self-detects + alerts |
Each chaos experiment has a runbook entry in FAILURE_MODES.
10. Security testing
- SAST: `eslint-plugin-security`, `semgrep` rules for Node + secrets pattern.
- Dependency scan: `npm audit`, `osv-scanner`, Snyk; CVEs ≥ high block release.
- Container scan: Trivy on every image; CVEs ≥ high block release.
- DAST: ZAP baseline against staging weekly; targeted scans on the public webhook + opt-out endpoints.
- Secret scan: `gitleaks` pre-commit + nightly history scan.
- Pentest: annual third-party (see SECURITY_MODEL §14).
- AI-specific: prompt-injection corpus (~200 cases) run nightly against `AIClient.fetchAIDraftedContent` with the orchestrator stub configured to forward to a real model under a low-cost canary tenant.
11. Test data
- Synthetic data factory `@melmastoon/test-fixtures-notification` produces deterministic recipients, templates, channels, and scheduled rows.
- No production data is ever copied into pre-prod environments. A pre-prod refresh seeds from the synthetic factory.
- Personal devices used for "real" channel testing belong to platform staff and are listed in an allowlist; sends to these are categorised `system`.
12. CI gates
| Gate | Required |
|---|---|
| Lint + format | ✅ |
| Unit + application | ✅ |
| Contract (HTTP, events, webhooks) | ✅ |
| Integration (use-case slices) | ✅ |
| Coverage thresholds (85 % lines, 80 % branches; domain ≥ 95 %) | ✅ |
| OpenAPI drift | ✅ |
| Event-schema drift | ✅ |
| Migrations dry-run on staging snapshot | ✅ |
| `pii-grep` scan | ✅ |
| SAST + secret scan + dependency scan | ✅ |
| Container scan | ✅ |
| E2E (booking-confirmation, opt-out, template publish) | ✅ on release/* and main |
| Performance gates | ✅ on release/* |
13. Production verification
Post-deploy smoke (run automatically by cloud-deploy after a rollout):
- `GET /api/v1/internal/health` returns `200 ok` from a fresh pod.
- Send a synthetic notification (test tenant) end-to-end; expect `delivered.v1` within 60 s.
- Send a synthetic webhook (test vendor + signed payload); expect `applied`.
- Outbox lag p95 ≤ 1 s for 5 min.
If any check fails, the rollout is automatically rolled back per DEPLOYMENT_TOPOLOGY §6.