SERVICE_READINESS — bff-consumer-service

Sibling: SERVICE_OVERVIEW · SERVICE_RISK_REGISTER · DEPLOYMENT_TOPOLOGY · TESTING_STRATEGY

Cross-cutting: Standards · DEFINITION_OF_DONE · Standards · SERVICE_TEMPLATE §Service-readiness gate

This is the production-readiness gate for bff-consumer-service. Every checkbox must be green before the service receives prod traffic. The gate is owned jointly by the Frontend Platform tech lead and SRE on-call. A signed copy of this checklist is filed in services/bff-consumer-service/_readiness/<release>.md for each promotion.

1. Documentation completeness

  • All 17 specs in this folder are complete (no TBD, no stubs).
  • 03-microservices/bff-consumer-service.md is up to date.
  • OpenAPI spec generated from controllers and committed (services/bff-consumer-service/openapi.json).
  • Event schemas registered in @ghasi/event-envelope/schemas/bff-consumer/ and CI conformance test green.
  • All ADRs that affect this BFF are linked from SERVICE_OVERVIEW.

2. Code quality

  • pnpm lint clean (ESLint strict, import-boundary rules, security plugin).
  • pnpm typecheck clean (tsc --noEmit, strict).
  • No any outside an explicit // allow-any justification.
  • No as unknown as outside test code.
  • No // eslint-disable-next-line outside an issue-linked comment.
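
The three escape-hatch rules above can be checked mechanically. The following is a minimal sketch, not the repo's actual lint setup: `findViolations` is a hypothetical helper, and it assumes the `// allow-any` justification sits on the same line as the `any`, and that an "issue-linked" comment contains either `#<number>` or a URL. The checklist does not define either format.

```typescript
// Hypothetical checker for the escape-hatch rules; comment formats are assumptions.
type Violation = { line: number; rule: string };

const ANY_TYPE = /(:\s*any\b|\bas\s+any\b|<any>)/;
const DOUBLE_CAST = /\bas\s+unknown\s+as\b/;
const ESLINT_DISABLE = /\/\/\s*eslint-disable-next-line\b/;
const ISSUE_LINK = /(#\d+|https?:\/\/\S+)/;

function findViolations(source: string, isTestFile: boolean): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    // `any` is tolerated only with an explicit justification marker.
    if (ANY_TYPE.test(text) && !text.includes("// allow-any")) {
      violations.push({ line, rule: "any-without-justification" });
    }
    // `as unknown as` is tolerated only in test code.
    if (DOUBLE_CAST.test(text) && !isTestFile) {
      violations.push({ line, rule: "double-cast-outside-tests" });
    }
    // Lint suppressions must reference an issue.
    if (ESLINT_DISABLE.test(text) && !ISSUE_LINK.test(text)) {
      violations.push({ line, rule: "eslint-disable-without-issue" });
    }
  });
  return violations;
}
```

A line-based scan like this is deliberately crude (it will also flag `: any` inside strings or comments); an AST-based ESLint rule would be the more robust choice.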

3. Test coverage

  • Unit coverage ≥ 90% statements / 85% branches.
  • Critical-file coverage 100% (HandoffSigner, BotDetector, all orchestrators).
  • Integration tests pass against ephemeral Postgres + Memorystore.
  • Mandatory isolation test passes: anonymous-isolation.spec.ts (no cross-session bleed). This BFF performs no tenant-scoped writes, so it has no tenant-isolation.spec.ts; the anonymous-isolation test is the equivalent gate.
  • outbox.spec.ts proves outbox ⇄ Pub/Sub at-least-once delivery.
  • inbox.spec.ts proves cache-invalidation events deduped by message ID.
  • Pact consumer pacts published; provider verification reports green for: search-aggregation-service, pricing-service, property-service, theme-config-service, tenant-service.
  • Pact provider verification green for the bff-tenant-booking-service consumer.
  • Stryker mutation score ≥ 75% on critical files.
  • Playwright E2E nightly green on stage for the four flows in TESTING_STRATEGY §6.
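
As an illustration of what inbox.spec.ts is meant to prove, here is a minimal in-memory sketch of deduplication by message ID. The `Inbox` class, the event shape, and the `invalidate` callback are all hypothetical; a real inbox would persist seen IDs in a durable store rather than a `Set`.

```typescript
// Hypothetical event shape; only the message ID matters for deduplication.
type CacheInvalidationEvent = { messageId: string; cacheKey: string };

class Inbox {
  private readonly seen = new Set<string>();
  private readonly invalidate: (cacheKey: string) => void;

  constructor(invalidate: (cacheKey: string) => void) {
    this.invalidate = invalidate;
  }

  // Returns true if the event was applied, false if it was a redelivery.
  handle(event: CacheInvalidationEvent): boolean {
    if (this.seen.has(event.messageId)) {
      return false;
    }
    this.seen.add(event.messageId);
    this.invalidate(event.cacheKey);
    return true;
  }
}
```

Because Pub/Sub delivery is at-least-once, the same message can arrive more than once; keying on the message ID makes the handler idempotent regardless of how many times it is redelivered.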

4. Performance

  • k6 load steady-state profile passes (p95 < 700 ms; error < 0.1%).
  • k6 load campaign spike profile passes (p95 < 900 ms warm; cache hit > 90%).
  • k6 load bot wave profile passes (legitimate p95 unaffected; bot rejection > 95%).
  • 8 h soak test passes (memory growth ≤ 10%).
  • /handoff p95 < 120 ms confirmed in stage.
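
The numeric gates above map naturally onto k6 threshold expressions. The sketch below only builds the `thresholds` objects and assumes a lot: `http_req_duration` and `http_req_failed` are k6 built-in metrics, but `cache_hit_rate`, `bot_rejection_rate`, the `route:handoff` tag, and the profile keys are hypothetical names, and folding the /handoff gate into the steady-state profile is an illustrative choice rather than something the checklist states. The "legitimate p95 unaffected" criterion is left out because it needs a baseline comparison rather than a fixed limit.

```typescript
// Threshold strings follow k6's expression syntax; custom metric names are hypothetical.
type Thresholds = Record<string, string[]>;

const profileThresholds: Record<string, Thresholds> = {
  "steady-state": {
    http_req_duration: ["p(95)<700"], // p95 < 700 ms
    http_req_failed: ["rate<0.001"], // error rate < 0.1%
    "http_req_duration{route:handoff}": ["p(95)<120"], // /handoff p95 < 120 ms (tag name assumed)
  },
  "campaign-spike": {
    http_req_duration: ["p(95)<900"], // p95 < 900 ms with a warm cache
    cache_hit_rate: ["rate>0.90"], // hypothetical custom Rate metric
  },
  "bot-wave": {
    bot_rejection_rate: ["rate>0.95"], // hypothetical custom Rate metric
  },
};

// Builds the part of a k6 `options` object that carries the pass/fail gates.
function optionsFor(profile: string): { thresholds: Thresholds } {
  const thresholds = profileThresholds[profile];
  if (thresholds === undefined) {
    throw new Error(`unknown profile: ${profile}`);
  }
  return { thresholds };
}
```

In a k6 script the result would be merged into the exported `options`, alongside whatever stages or scenarios define each load shape.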

5. Observability

  • All SLIs defined in OBSERVABILITY.md are emitting.
  • All SLOs are declared in the error-budget-policy doc.
  • Dashboards published in Cloud Monitoring + Grafana.
  • All alerts have ack'd runbooks linked from FAILURE_MODES.md.
  • Trace-tag coverage verified: tenant.id (always null on this BFF), request.id, session.id, route.name, cache.outcome, bot.score, handoff.id, upstream.name on every span.
  • Log fields verified: traceId, requestId, sessionId, route, latencyMs.
  • PII filter verified: no raw IP, raw UA, email, name in logs.
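
One way to picture the PII filter check is a scrubber applied to every log record before it is written. This is a sketch under assumptions: the key names `ip`, `userAgent`, `email`, and `name` are guesses at how those fields would be spelled, `scrub` is a hypothetical helper, and the email pattern is intentionally loose.

```typescript
type LogRecord = Record<string, unknown>;

// Assumed key names for the PII categories listed in the checklist.
const PII_KEYS = new Set(["ip", "userAgent", "email", "name"]);
// Loose pattern: anything shaped like local@domain.tld inside a string value.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

function scrub(record: LogRecord): LogRecord {
  const clean: LogRecord = {};
  for (const [key, value] of Object.entries(record)) {
    if (PII_KEYS.has(key)) {
      continue; // drop the field entirely
    }
    clean[key] =
      typeof value === "string" ? value.replace(EMAIL_PATTERN, "[redacted]") : value;
  }
  return clean;
}
```

The scrubber is shallow: nested objects would need a recursive walk, and a verification test should feed it records containing each PII category and assert that none survive.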

6. Security

  • Threat model reviewed (SECURITY_MODEL.md §Threat model).
  • All secrets in Secret Manager; none in env vars or repo.
  • HMAC key rotation drill executed in stage in last 90 days; previous-key window honored.
  • Cloud Armor WAF policy active with bot-management rules.
  • reCAPTCHA Enterprise integration verified end-to-end.
  • DAST report has zero high/critical findings.
  • Dependency audit clean (pnpm audit no high/critical).
  • Trivy image scan clean (no high/critical CVEs).
  • Cosign signature verified by binary authorization in prod cluster.
  • Cookie attributes verified: HttpOnly; Secure; SameSite=Lax.
  • CORS allow-list verified for prod consumer-web origin.
  • Penetration test signed off by security-reviewer in last 12 months.
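
The "previous-key window" in the rotation drill can be illustrated with a verifier that accepts signatures made under either the current key or the key it replaced. This is a sketch only: SHA-256, hex encoding, and the `KeyRing` shape are assumptions (the checklist says only "HMAC"), and the keys are passed in directly rather than fetched from Secret Manager. Once the window closes, `previous` would simply be left undefined.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed shape: the previous key is present only while its grace window is open.
type KeyRing = { current: string; previous?: string };

function sign(payload: string, key: string): string {
  return createHmac("sha256", key).update(payload).digest("hex");
}

function matches(payload: string, signature: string, key: string): boolean {
  const expected = Buffer.from(sign(payload, key), "hex");
  const actual = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

function verify(payload: string, signature: string, keys: KeyRing): boolean {
  if (matches(payload, signature, keys.current)) {
    return true;
  }
  // Fall back to the previous key only while the rotation window is open.
  return keys.previous !== undefined && matches(payload, signature, keys.previous);
}
```

A rotation drill would then confirm three things: tokens signed with the new key verify, tokens signed with the old key verify during the window, and old-key tokens are rejected once `previous` is removed.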

7. Reliability

  • Cloud Run min instances = 2 per region.
  • Multi-region: primary asia-south1, DR-warm europe-west4.
  • DR drill executed in stage in last 90 days; RTO ≤ 30 min met.
  • Circuit breakers configured for every upstream.
  • Per-route deadline + retry policy reviewed against 02 §10.
  • Memorystore HA enabled with standby replica.
  • Cloud SQL HA enabled with cross-region read replica.
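
For readers less familiar with the circuit-breaker item above, the core of the pattern is a small state machine. The sketch below is generic and does not reflect this service's actual library or settings: the class, its method names, and the example threshold and reset values are all hypothetical. The clock is injectable so the behaviour can be tested without waiting.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // Whether a call to the upstream may be attempted right now.
  allow(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open"; // let a trial call through after the cool-down
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `allow()` before each upstream request, fail fast when it returns false, and report the outcome with `recordSuccess()` or `recordFailure()`; one breaker instance per upstream keeps a failing dependency from affecting the others.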

8. Release process

  • CI pipeline includes: lint, typecheck, unit, integration, contract, build, scan, sign, deploy-dev, smoke.
  • Canary deploy to prod: 5% / 25% / 100% with metric guardrail.
  • Rollback budget verified: ≤ 5 min from rollback decision to traffic shifted off.
  • Feature flags for new endpoints documented; default off.
  • Release notes drafted and reviewed.
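
The canary progression with a metric guardrail can be sketched as a pure decision function. The 5% / 25% / 100% steps come from the checklist; everything else is an assumption for illustration: the guardrail limits are borrowed from the steady-state performance gate (p95 < 700 ms, error < 0.1%), and `decide`, `CanaryMetrics`, and `Decision` are hypothetical names.

```typescript
const CANARY_STEPS = [5, 25, 100] as const;

type CanaryMetrics = { p95Ms: number; errorRate: number };
type Decision =
  | { action: "promote"; percent: number }
  | { action: "rollback" }
  | { action: "done" };

// Assumed guardrail, reusing the steady-state performance limits.
const GUARDRAIL = { maxP95Ms: 700, maxErrorRate: 0.001 };

function decide(currentPercent: number, metrics: CanaryMetrics): Decision {
  // Any guardrail breach aborts the rollout, whatever the current step.
  if (metrics.p95Ms >= GUARDRAIL.maxP95Ms || metrics.errorRate >= GUARDRAIL.maxErrorRate) {
    return { action: "rollback" };
  }
  const next = CANARY_STEPS.find((step) => step > currentPercent);
  return next === undefined ? { action: "done" } : { action: "promote", percent: next };
}
```

Keeping the decision pure makes it easy to unit-test the guardrail separately from the deploy tooling that actually shifts traffic.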

9. Operations

  • On-call rotation assigned (Frontend Platform).
  • PagerDuty escalation policy verified.
  • Runbooks present and rehearsed for: F-1, F-9, F-11, F-13, F-18, F-19 (per FAILURE_MODES catalogue).
  • Cost dashboard published; budget alerts at 50/80/100/120%.
  • On-call handoff doc points to this folder.
  • Backup + restore tested for Cloud SQL.

10. Compliance / data governance

  • PII inventory in SECURITY_MODEL.md reviewed by data steward.
  • DPIA filed for anonymous tracking + reCAPTCHA.
  • Cookie consent flow integrated with consumer web (no telemetry until consent in EU).
  • Data retention enforced: MetaPageView 90 d, ConversionFunnelEvent 90 d, BotScore 7 d, handoff_replay_log 30 d.
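
The retention periods above can be captured as data and checked with a small helper. The day counts are taken from the checklist; the `isExpired` function and the idea of a purge job calling it are a hypothetical sketch, not a description of how the service enforces retention.

```typescript
// Retention periods in days, as listed in the checklist.
const RETENTION_DAYS: Record<string, number> = {
  MetaPageView: 90,
  ConversionFunnelEvent: 90,
  BotScore: 7,
  handoff_replay_log: 30,
};

const DAY_MS = 24 * 60 * 60 * 1000;

// True when a record of the given kind is older than its retention period.
function isExpired(kind: string, createdAt: Date, now: Date): boolean {
  const days = RETENTION_DAYS[kind];
  if (days === undefined) {
    throw new Error(`no retention rule for ${kind}`);
  }
  return now.getTime() - createdAt.getTime() > days * DAY_MS;
}
```

Throwing on an unknown kind is a deliberate choice here: a new data type without a retention rule should fail loudly rather than be kept indefinitely by default.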

11. Sign-off

| Role | Name | Date | Signature |
|---|---|---|---|
| Service tech lead (Frontend Platform) | | | |
| SRE on-call (rotating) | | | |
| Security reviewer | | | |
| Data steward | | | |
| Eng manager / Director | | | |

A snapshot of this completed checklist is committed to services/bff-consumer-service/_readiness/<release-tag>.md at promotion time.