SERVICE_READINESS — bff-consumer-service
Sibling: SERVICE_OVERVIEW · SERVICE_RISK_REGISTER · DEPLOYMENT_TOPOLOGY · TESTING_STRATEGY
Cross-cutting: Standards · DEFINITION_OF_DONE · Standards · SERVICE_TEMPLATE §Service-readiness gate
This is the production-readiness gate for bff-consumer-service. Every checkbox must be green before the service receives prod traffic. The gate is owned jointly by the Frontend Platform tech lead and SRE on-call. A signed copy of this checklist is filed in services/bff-consumer-service/_readiness/<release>.md for each promotion.
1. Documentation completeness
- All 17 specs in this folder are complete (no `TBD`, no stubs).
- `03-microservices/bff-consumer-service.md` is up to date.
- OpenAPI spec generated from controllers and committed (`services/bff-consumer-service/openapi.json`).
- Event schemas registered in `@ghasi/event-envelope/schemas/bff-consumer/` and CI conformance test green.
- All ADRs that affect this BFF are linked from `SERVICE_OVERVIEW`.
2. Code quality
- `pnpm lint` clean (ESLint strict, import-boundary rules, security plugin).
- `pnpm typecheck` clean (tsc --noEmit, strict).
- No `any` outside an explicit `// allow-any` justification.
- No `as unknown as` outside test code.
- No `// eslint-disable-next-line` outside an issue-linked comment.
3. Test coverage
- Unit coverage ≥ 90% statements / 85% branches.
- Critical-file coverage 100% (`HandoffSigner`, `BotDetector`, all orchestrators).
- Integration tests pass against ephemeral Postgres + Memorystore.
- Mandatory tests pass: there is no `tenant-isolation.spec.ts` for this BFF (no tenant-scoped writes); the equivalent gate is `anonymous-isolation.spec.ts` (no cross-session bleed).
- `outbox.spec.ts` proves outbox ⇄ Pub/Sub at-least-once delivery.
- `inbox.spec.ts` proves cache-invalidation events are deduped by message ID.
- Pact consumer pacts published; provider verification reports green for: `search-aggregation-service`, `pricing-service`, `property-service`, `theme-config-service`, `tenant-service`.
- Pact provider pact verified for `bff-tenant-booking-service`'s consumer.
- Stryker mutation score ≥ 75% on critical files.
- Playwright E2E nightly green on stage for the four flows in TESTING_STRATEGY §6.
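The inbox gate above (cache-invalidation events deduped by message ID) can be pictured with a minimal in-memory sketch. The names `InboxDeduper`, `CacheInvalidationEvent`, and `handleOnce` are hypothetical and not taken from the service; a real inbox would presumably persist seen IDs rather than hold them in memory.

```typescript
// Hypothetical event shape; only `messageId` matters for deduplication.
interface CacheInvalidationEvent {
  messageId: string;
  cacheKey: string;
}

// In-memory inbox that remembers which message IDs were already handled.
// Assumption: a production inbox persists IDs and expires them over time.
class InboxDeduper {
  private readonly seen = new Set<string>();

  // Runs `handler` only the first time a given messageId is observed.
  // Returns true if the event was handled, false if it was a duplicate.
  handleOnce(
    event: CacheInvalidationEvent,
    handler: (e: CacheInvalidationEvent) => void,
  ): boolean {
    if (this.seen.has(event.messageId)) return false;
    handler(event);
    // Record the ID only after the handler returns, so a thrown error
    // leaves the message eligible for redelivery.
    this.seen.add(event.messageId);
    return true;
  }
}
```

Because Pub/Sub delivery is at-least-once, the same message may arrive twice; with this shape the second delivery is acknowledged without invalidating the cache again.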
4. Performance
- k6 load `steady-state` profile passes (p95 < 700 ms; error < 0.1%).
- k6 load `campaign spike` profile passes (p95 < 900 ms warm; cache hit > 90%).
- k6 load `bot wave` profile passes (legitimate p95 unaffected; bot rejection > 95%).
- Long-soak passes 8 h (no memory growth > 10%).
- `/handoff` p95 < 120 ms confirmed in stage.
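As a rough illustration, the pass criteria above map naturally onto k6 threshold expressions. In the sketch below, `http_req_duration` and `http_req_failed` are k6 built-in metrics, while `cache_hit` and `bot_rejected` are hypothetical custom Rate metrics that the load scripts would have to define; the helper `k6OptionsFor` is also an assumption, not part of the service.

```typescript
// k6 threshold expressions per load profile, mirroring the gate values above.
type Thresholds = Record<string, string[]>;

const profileThresholds: Record<string, Thresholds> = {
  "steady-state": {
    http_req_duration: ["p(95)<700"], // p95 < 700 ms
    http_req_failed: ["rate<0.001"], // error < 0.1%
  },
  "campaign spike": {
    http_req_duration: ["p(95)<900"], // p95 < 900 ms with a warm cache
    cache_hit: ["rate>0.90"], // hypothetical custom metric
  },
  "bot wave": {
    bot_rejected: ["rate>0.95"], // hypothetical custom metric
  },
};

// Builds the `options` object that a k6 script would export for one profile.
function k6OptionsFor(profile: string): { thresholds: Thresholds } {
  const thresholds = profileThresholds[profile];
  if (!thresholds) throw new Error(`unknown profile: ${profile}`);
  return { thresholds };
}
```

The "legitimate p95 unaffected" criterion for the bot wave is not expressed here; it would need a comparison against a baseline run rather than a fixed threshold.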
5. Observability
- All SLIs defined in `OBSERVABILITY.md` are emitting.
- All SLOs declared in error-budget-policy doc.
- Dashboards published in Cloud Monitoring + Grafana.
- All alerts have ack'd runbooks linked from `FAILURE_MODES.md`.
- Trace-tag coverage verified: `tenant.id` (always null on this BFF), `request.id`, `session.id`, `route.name`, `cache.outcome`, `bot.score`, `handoff.id`, `upstream.name` on every span.
- Log fields verified: `traceId`, `requestId`, `sessionId`, `route`, `latencyMs`.
- PII filter verified: no raw IP, raw UA, email, name in logs.
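The log-field and PII checks above could be verified with helpers along these lines. This is a sketch under assumptions: the PII key names (`ip`, `userAgent`, `email`, `name`) and the functions `stripPii` and `missingFields` are hypothetical, and the actual logger and filter used by the service are not described here.

```typescript
type LogRecord = Record<string, unknown>;

// Hypothetical field names that would carry raw PII if logged.
const PII_KEYS = new Set(["ip", "userAgent", "email", "name"]);

// Required log fields, taken from the checklist above.
const REQUIRED_FIELDS = ["traceId", "requestId", "sessionId", "route", "latencyMs"];

// Returns a copy of the record without PII keys; other fields pass through.
function stripPii(record: LogRecord): LogRecord {
  const clean: LogRecord = {};
  for (const key of Object.keys(record)) {
    if (!PII_KEYS.has(key)) clean[key] = record[key];
  }
  return clean;
}

// Lists the required fields that are absent from a record.
function missingFields(record: LogRecord): string[] {
  return REQUIRED_FIELDS.filter((field) => !(field in record));
}
```

A verification test might pass sampled log lines through `missingFields` and assert that `stripPii` leaves them unchanged, which would show that no PII key reached the log in the first place.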
6. Security
- Threat model reviewed (`SECURITY_MODEL.md` §Threat model).
- All secrets in Secret Manager; none in env vars or repo.
- HMAC key rotation drill executed in stage in last 90 days; previous-key window honored.
- Cloud Armor WAF policy active with bot-management rules.
- reCAPTCHA Enterprise integration verified end-to-end.
- DAST report has zero high/critical findings.
- Dependency audit clean (`pnpm audit` no high/critical).
- Trivy image scan clean (no high/critical CVEs).
- Cosign signature verified by binary authorization in prod cluster.
- Cookie attributes verified: `HttpOnly; Secure; SameSite=Lax`.
- CORS allow-list verified for prod consumer-web origin.
- Penetration test signed off by `security-reviewer` in last 12 months.
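The cookie-attribute check above can be automated against a captured `Set-Cookie` header. The function name `hasRequiredCookieAttributes` and the sample cookie name `sid` are illustrative assumptions; the sketch only inspects attribute presence and does not validate the cookie value itself.

```typescript
// Checks a Set-Cookie header value for the attributes required by the gate:
// HttpOnly, Secure, SameSite=Lax. Attribute names are matched case-insensitively.
function hasRequiredCookieAttributes(setCookie: string): boolean {
  // The first segment is name=value; everything after it is an attribute.
  const attrs = setCookie
    .split(";")
    .slice(1)
    .map((part) => part.trim().toLowerCase());
  return (
    attrs.includes("httponly") &&
    attrs.includes("secure") &&
    attrs.includes("samesite=lax")
  );
}
```

For example, `hasRequiredCookieAttributes("sid=abc; HttpOnly; Secure; SameSite=Lax")` passes, while a cookie missing `HttpOnly` or using `SameSite=Strict` fails the check.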
7. Reliability
- Cloud Run min instances = 2 per region.
- Multi-region: primary `asia-south1`, DR-warm `europe-west4`.
- DR drill executed in stage in last 90 days; RTO ≤ 30 min met.
- Circuit breakers configured for every upstream.
- Per-route deadline + retry policy reviewed against 02 §10.
- Memorystore HA enabled with standby replica.
- Cloud SQL HA enabled with cross-region read replica.
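For readers unfamiliar with the circuit-breaker item above, the pattern can be sketched as follows. This is a generic illustration, not the service's implementation: the class `CircuitBreaker`, its thresholds, and the injectable clock are all assumptions, and the checklist does not say which library or settings the BFF uses.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Minimal circuit breaker: opens after N consecutive failures, then lets
// trial calls through once a cool-down period has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // True when a call may proceed; moves open -> half-open after the cool-down.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial call, or too many consecutive failures, opens the circuit.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  // Wraps an upstream call with the bookkeeping above.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) throw new Error("circuit open: upstream call skipped");
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

One breaker instance per upstream keeps a failing dependency from consuming the per-route deadline budget of the others.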
8. Release process
- CI pipeline includes: lint, typecheck, unit, integration, contract, build, scan, sign, deploy-dev, smoke.
- Canary deploy to prod: 5% / 25% / 100% with metric guardrail.
- Rollback budget verified: ≤ 5 min from rollback decision to traffic shifted off.
- Feature flags for new endpoints documented; default off.
- Release notes drafted and reviewed.
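The canary progression above (5% / 25% / 100% with a metric guardrail) amounts to a small decision function. In this sketch the guardrail limits reuse the steady-state performance numbers purely as an assumption, since the checklist does not state which metrics or limits the guardrail uses; `nextCanaryAction` and `CanaryMetrics` are hypothetical names.

```typescript
// Canary traffic steps from the checklist: 5% -> 25% -> 100%.
const CANARY_STEPS: number[] = [5, 25, 100];

// Hypothetical guardrail inputs and limits (assumed, not from the source).
interface CanaryMetrics {
  errorRate: number; // fraction of failed requests, e.g. 0.001 = 0.1%
  p95Ms: number; // 95th-percentile latency in milliseconds
}
const GUARDRAIL = { maxErrorRate: 0.001, maxP95Ms: 700 };

type CanaryDecision =
  | { action: "promote"; nextPercent: number }
  | { action: "done" }
  | { action: "rollback" };

// Decides what to do after observing metrics at the current traffic step.
function nextCanaryAction(currentPercent: number, metrics: CanaryMetrics): CanaryDecision {
  // Any guardrail breach triggers rollback, regardless of the step.
  if (metrics.errorRate > GUARDRAIL.maxErrorRate || metrics.p95Ms > GUARDRAIL.maxP95Ms) {
    return { action: "rollback" };
  }
  const index = CANARY_STEPS.indexOf(currentPercent);
  if (index === -1) throw new Error(`not a canary step: ${currentPercent}`);
  if (index === CANARY_STEPS.length - 1) return { action: "done" };
  return { action: "promote", nextPercent: CANARY_STEPS[index + 1] };
}
```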
9. Operations
- On-call rotation assigned (Frontend Platform).
- PagerDuty escalation policy verified.
- Runbooks present and rehearsed for: F-1, F-9, F-11, F-13, F-18, F-19 (per FAILURE_MODES catalogue).
- Cost dashboard published; budget alerts at 50/80/100/120%.
- On-call handoff doc points to this folder.
- Backup + restore tested for Cloud SQL.
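The budget-alert thresholds above (50/80/100/120%) reduce to simple arithmetic. A minimal sketch, with the hypothetical helper `crossedBudgetAlerts` and arbitrary example amounts:

```typescript
// Alert thresholds as percentages of the budget, from the checklist above.
const BUDGET_ALERT_PERCENTS = [50, 80, 100, 120];

// Returns the thresholds that the current spend has reached or exceeded.
// The comparison avoids division so whole-number inputs stay exact.
function crossedBudgetAlerts(spend: number, budget: number): number[] {
  if (budget <= 0) throw new Error("budget must be positive");
  return BUDGET_ALERT_PERCENTS.filter((percent) => spend * 100 >= percent * budget);
}
```

With a budget of 1000, a spend of 850 has crossed the 50% and 80% alerts, and a spend of 1200 has crossed all four.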
10. Compliance / data governance
- PII inventory in `SECURITY_MODEL.md` reviewed by data steward.
- DPIA filed for anonymous tracking + reCAPTCHA.
- Cookie consent flow integrated with consumer web (no telemetry until consent in EU).
- Data retention enforced: `MetaPageView` 90 d, `ConversionFunnelEvent` 90 d, `BotScore` 7 d, `handoff_replay_log` 30 d.
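The retention windows above can be expressed as a lookup plus an expiry check. This is a sketch only: `RETENTION_DAYS` copies the values from the checklist, while `isExpired` and its signature are assumptions, and the real enforcement mechanism (scheduled job, table partitioning, TTL) is not specified here.

```typescript
// Retention windows in days, taken from the checklist above.
const RETENTION_DAYS: Record<string, number> = {
  MetaPageView: 90,
  ConversionFunnelEvent: 90,
  BotScore: 7,
  handoff_replay_log: 30,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when a record created at `createdAt` is past its retention window at `now`.
function isExpired(kind: string, createdAt: Date, now: Date): boolean {
  const days = RETENTION_DAYS[kind];
  if (days === undefined) throw new Error(`no retention policy for ${kind}`);
  return now.getTime() - createdAt.getTime() > days * MS_PER_DAY;
}
```

A purge job would delete every row for which `isExpired` returns true; a retention test can seed rows just inside and just outside each window and assert which ones survive.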
11. Sign-off
| Role | Name | Date | Signature |
|---|---|---|---|
| Service tech lead (Frontend Platform) | | | |
| SRE on-call (rotating) | | | |
| Security reviewer | | | |
| Data steward | | | |
| Eng manager / Director | | | |
A snapshot of this completed checklist is committed to services/bff-consumer-service/_readiness/<release-tag>.md at promotion time.