SERVICE_RISK_REGISTER — bff-tenant-booking-service

Sibling: SERVICE_READINESS · FAILURE_MODES · SECURITY_MODEL

Living register of known risks. Quarterly review by Frontend Platform tech lead + SRE + security.

Severity scale: L low, M medium, H high, C critical. The L×I column rates likelihood × impact on this scale. Status: open, monitored, accepted, closed.

1. Strategic risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-S-1 | Tenant flash sale creates 30× spike that overwhelms pricing-service quote endpoint | M×H | Per-tenant rate limits; flashSale mode auto-scale to 60 instances; pre-warm bootstrap cache; capacity test required for every announced sale | FE Platform + Pricing | monitored | per sale |
| R-S-2 | Custom-domain support sprawl (50+ tenant domains) creates operational toil | H×M | Self-serve DNS-01 cert provisioning; automated DNS verification probe; tenant-managed-domain SLAs documented | SRE | open | quarterly |
| R-S-3 | Tenant brand violation via injected theme content (e.g., XSS in brand name) | M×H | Theme content sanitized at theme-config-service; CSP nonce; allow-list of HTML tags in policy summaries | Security + FE | monitored | quarterly |
| R-S-4 | Sharia-compliance regression in pricing/AI passthrough | L×H | complianceProfile propagation tested; AI suggestions filtered (AI §12); audit log on every quote | Compliance + FE | monitored | quarterly |
| R-S-5 | Phase 2 guest sign-in introduces account-takeover surface | M×H | Phase 2 design lives in _future/; security review with iam-service team before scaffolding; MFA required | Security | open | annual |
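
R-S-1 lists per-tenant rate limits as a mitigation but does not say how they are implemented. The sketch below shows one common approach, a token bucket per tenant with optional per-tenant overrides (compare R-O-4). The class name, config fields, and all numbers are hypothetical and are not taken from the service.

```typescript
// Hypothetical per-tenant token-bucket limiter; names and numbers are illustrative only.
interface BucketConfig {
  capacity: number;        // maximum burst size, in requests
  refillPerSecond: number; // sustained request rate
}

interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

class TenantRateLimiter {
  private readonly buckets = new Map<string, BucketState>();
  private readonly defaults: BucketConfig;
  private readonly overrides: Map<string, BucketConfig>;

  constructor(defaults: BucketConfig, overrides: Map<string, BucketConfig> = new Map()) {
    this.defaults = defaults;
    this.overrides = overrides;
  }

  /** Returns true if the request is admitted, false if the tenant is over its limit. */
  tryAcquire(tenantId: string, nowMs: number): boolean {
    const config = this.overrides.get(tenantId) ?? this.defaults;
    const state = this.buckets.get(tenantId) ?? { tokens: config.capacity, lastRefillMs: nowMs };

    // Refill in proportion to elapsed time, never exceeding the burst capacity.
    const elapsedSeconds = Math.max(0, nowMs - state.lastRefillMs) / 1000;
    state.tokens = Math.min(config.capacity, state.tokens + elapsedSeconds * config.refillPerSecond);
    state.lastRefillMs = nowMs;

    const admitted = state.tokens >= 1;
    if (admitted) state.tokens -= 1;
    this.buckets.set(tenantId, state);
    return admitted;
  }
}
```

Passing the clock in as nowMs keeps the limiter deterministic in tests. Because each tenant has its own bucket, one tenant's flash sale cannot consume another tenant's budget.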

2. Performance & reliability risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-P-1 | Slowest-of-N composition on /availability makes p95 unbounded if pricing-service is slow | H×M | Per-call deadlines; partial result composer; priceFromCheapest=null fallback; SLO tracked separately | FE Platform | monitored | quarterly |
| R-P-2 | Memorystore session-tier eviction during traffic spike loses booking drafts | L×H | noeviction policy on session tier; alarm on memory utilization; auto-scale to 6 GiB during flashSale | SRE | monitored | per sale |
| R-P-3 | reservation-service confirm latency drift causes /return timeouts | M×H | Confirm SLO tracked; idempotency keys absorb retries; UI polling as fallback | FE Platform + Reservation | monitored | quarterly |
| R-P-4 | payment-gateway-service ambiguous-return cases create stranded reservations | M×H | Polling fallback in UI for 60 s; manual reconciliation runbook; monthly metric review of ambiguous-return rate | FE Platform + Payment | monitored | monthly |
| R-P-5 | Cloud SQL HA failover > 60 s during peak | L×M | DR drill verifies failover time; idempotency keys absorb retries; failover duration trended monthly | SRE | accepted | annual |
| R-P-6 | Outbox grows unbounded if Pub/Sub is down for hours | L×M | Alert at 5k/50k/250k; manual flush script; outbox-relay redrive | SRE | monitored | quarterly |
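
The R-P-1 mitigations (per-call deadlines, a partial result composer, and the priceFromCheapest=null fallback) can be sketched roughly as follows. Only priceFromCheapest comes from the register; the types, the partial flag, the function names, and the idea that rooms and quotes are fetched as two separate calls are assumptions made for illustration.

```typescript
// Hypothetical shapes; only priceFromCheapest is named in the register.
interface RoomAvailability { roomTypeId: string; available: number; }
interface Quote { roomTypeId: string; amountMinor: number; }
interface AvailabilityResponse {
  rooms: RoomAvailability[];
  priceFromCheapest: number | null;
  partial: boolean; // true when pricing did not answer in time
}

/** Resolves with the call's value, or with null once deadlineMs elapses or the call fails. */
function withDeadline<T>(call: Promise<T>, deadlineMs: number): Promise<T | null> {
  return new Promise<T | null>((resolve) => {
    const timer = setTimeout(() => resolve(null), deadlineMs);
    call.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); },
    );
  });
}

/** Builds the response from whatever arrived; missing quotes degrade to a null price. */
function composeAvailability(rooms: RoomAvailability[], quotes: Quote[] | null): AvailabilityResponse {
  if (quotes === null) {
    return { rooms, priceFromCheapest: null, partial: true };
  }
  if (quotes.length === 0) {
    return { rooms, priceFromCheapest: null, partial: false };
  }
  const cheapest = Math.min(...quotes.map((q) => q.amountMinor));
  return { rooms, priceFromCheapest: cheapest, partial: false };
}

async function getAvailability(
  fetchRooms: () => Promise<RoomAvailability[]>,
  fetchQuotes: () => Promise<Quote[]>,
  quoteDeadlineMs: number,
): Promise<AvailabilityResponse> {
  // The pricing call is bounded by its own deadline, so it cannot set the overall latency.
  const [rooms, quotes] = await Promise.all([
    fetchRooms(),
    withDeadline(fetchQuotes(), quoteDeadlineMs),
  ]);
  return composeAvailability(rooms, quotes);
}
```

In this sketch only the pricing call is wrapped in a deadline. A real composer would presumably bound every upstream call and decide per call whether a timeout degrades the response or fails it.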

3. Security risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-Sec-1 | HMAC handoff key compromise enables tenant impersonation | L×C | Secret Manager restricted access; rotation 90 d with 7-d overlap; verifier logs all attempts; rotation drill quarterly | Security + FE | monitored | quarterly |
| R-Sec-2 | Cross-tenant cache key collision leaks data | L×C | Tenant-scoped cache keys; tenant-isolation.spec.ts; nightly cross-tenant probe | FE Platform | monitored | quarterly |
| R-Sec-3 | Payment-return tampering (forged returnState) | L×C | Server-side verifyReturn mandatory; provider-state opaque; payment_return_invalid alert | Security + Payment | monitored | quarterly |
| R-Sec-4 | Booking-flow scraper undercuts pricing | M×M | Per-IP / per-session rate limits; reCAPTCHA on anomalies; behavioural anomaly score | Security + FE | open | quarterly |
| R-Sec-5 | Cookie hijack via subdomain takeover | L×H | Subdomain-scoped cookie; CAA records; SNI cert per tenant; periodic DNS audit | Security | monitored | quarterly |
| R-Sec-6 | PII leak via free-text specialRequests field carrying NID/passport | M×M | Input length cap (500 chars); pre-storage redaction heuristic; truncated to 200 chars in cold mirror; periodic synthetic PII probe | Security + Data | monitored | quarterly |
| R-Sec-7 | Custom-domain MITM via misconfigured DNS | L×H | DNS-01 challenge required; CAA records advised; tenant onboarding playbook | Security | open | annual |
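
For R-Sec-2, the following is a minimal sketch of what a tenant-scoped cache key builder might look like. The function name, the ":" separator, and the segment layout are assumptions; the register only states that keys are tenant-scoped.

```typescript
/** Builds a cache key whose first segment is always the tenant ID (hypothetical helper). */
function tenantCacheKey(tenantId: string, namespace: string, ...parts: string[]): string {
  if (tenantId.length === 0) {
    throw new Error("tenantId is required for every cache key");
  }
  // Encoding each segment means a ":" inside a value can never be mistaken for the separator,
  // so ("a:b", "c") and ("a", "b:c") cannot produce the same key.
  return [tenantId, namespace, ...parts].map((segment) => encodeURIComponent(segment)).join(":");
}
```

Without the per-segment encoding, simple string concatenation would let a tenant ID or parameter that contains the separator collide with another tenant's key, which is the failure mode this row describes.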

4. Compliance & data risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-C-1 | EU traffic without consent banner | L×H | App defers telemetry until consent; BFF respects X-Consent: declined; DPIA on file | Legal + FE | monitored | annual |
| R-C-2 | DSR for guest email returns inconsistent results across services | M×M | Tenant-orchestrated DSR via tenant-service; this BFF documented as snapshot-only source | Data Steward | monitored | quarterly |
| R-C-3 | Data residency for EU users (Memorystore in asia-south1) | M×M | Phase 2 region-affinity routing; current scope: short-lived booking-time PII; legal review confirmed acceptable | Legal | accepted | annual |
| R-C-4 | Booking confirmation email contains PII; downstream notification-service audit needed | L×M | Notification service has its own retention policy; this BFF only emits event, not email | Notification + Legal | monitored | annual |
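
R-C-1 says the BFF respects X-Consent: declined. One way such a check could look is sketched below. The function and type names are hypothetical. Treating a missing header as "allowed" is an assumption based on the app withholding telemetry until consent; the register does not specify that case.

```typescript
type HeaderBag = Record<string, string | undefined>;

/** Returns false when the caller has explicitly declined consent via the X-Consent header. */
function telemetryAllowed(headers: HeaderBag): boolean {
  for (const [name, value] of Object.entries(headers)) {
    // HTTP header names are case-insensitive, so compare in lower case.
    if (name.toLowerCase() === "x-consent" && value !== undefined) {
      return value.trim().toLowerCase() !== "declined";
    }
  }
  // Assumption: with no header present, the app is trusted to have deferred telemetry itself.
  return true;
}
```

A stricter variant would drop telemetry unless consent is positively asserted; which default is right is a legal decision rather than a coding one.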

5. Operational risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-O-1 | Bus-factor 1 on the booking saga orchestration code | M×M | Pair-on-call; runbook completeness review; rotating reviewer | Eng Manager | monitored | annual |
| R-O-2 | Schema drift from upstream service released without contract test | L×H | Pact verification gate; OpenAPI diff gate; schema-drift alert | Platform Eng | monitored | quarterly |
| R-O-3 | On-call burn-out from custom-domain DNS failures | M×M | Tenant-self-serve dashboard; tenant-specific alerts batched | SRE | monitored | quarterly |
| R-O-4 | Misconfigured per-tenant rate limit blocks legitimate flash-sale traffic | L×M | Tenant-specific rate-limit overrides via flag; pre-sale capacity audit | SRE + FE | monitored | per sale |

6. Cost risks

| ID | Risk | L×I | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|
| R-Cost-1 | Cloud CDN cost inflation from non-cacheable variants | M×M | Vary header strict; query-param normalization; quarterly review | FE Platform | monitored | quarterly |
| R-Cost-2 | Pub/Sub volume from telemetry exceeds budget | M×M | Sample rates per event; cost alarm at 120% | SRE | monitored | quarterly |
| R-Cost-3 | Cert Manager cost growth with custom-domain count | L×L | Track per-cert cost; bundle by tenant tier | FE Platform | accepted | annual |
| R-Cost-4 | High-volume guest abandonment increases telemetry without revenue | M×L | Funnel rate review; sampling-rate calibration on flow.step_completed.v1 | SRE + Product | monitored | quarterly |
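
The query-param normalization listed under R-Cost-1 is not described further in the register. A minimal sketch of the idea, assuming an allow-list of parameters that actually affect the response: drop everything else and sort what remains, so that equivalent requests map to a single cache entry. The function name and the example parameter names are hypothetical.

```typescript
/** Returns a canonical path and query: allow-listed params only, sorted by name, then value. */
function normalizeCacheUrl(pathAndQuery: string, allowedParams: ReadonlySet<string>): string {
  const queryStart = pathAndQuery.indexOf("?");
  if (queryStart === -1) return pathAndQuery;

  const compare = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);
  const path = pathAndQuery.slice(0, queryStart);
  const pairs = pathAndQuery
    .slice(queryStart + 1)
    .split("&")
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const eq = pair.indexOf("=");
      return eq === -1
        ? { name: pair, value: "" }
        : { name: pair.slice(0, eq), value: pair.slice(eq + 1) };
    })
    // Tracking params and other noise are dropped so they cannot fragment the cache.
    .filter((p) => allowedParams.has(p.name))
    .sort((a, b) => (a.name === b.name ? compare(a.value, b.value) : compare(a.name, b.name)));

  if (pairs.length === 0) return path;
  return path + "?" + pairs.map((p) => `${p.name}=${p.value}`).join("&");
}
```

With an allow-list of checkIn and checkOut, the requests /availability?checkOut=2026-05-03&checkIn=2026-05-01 and /availability?utm_source=x&checkIn=2026-05-01&checkOut=2026-05-03 both normalize to /availability?checkIn=2026-05-01&checkOut=2026-05-03.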

7. Risk acceptance log

| ID | Date accepted | Accepted by | Reason | Re-evaluation |
|---|---|---|---|---|
| R-P-5 | 2026-04-15 | SRE | HA failover within SLA on every drill in last 12 months | 2027-04-15 |
| R-C-3 | 2026-04-15 | Legal | Booking-time PII short-lived; Phase 2 will introduce region affinity | 2027-04-15 |
| R-Cost-3 | 2026-04-15 | FE Platform | Cert Manager cost negligible vs revenue contribution | 2027-04-15 |

8. Review cadence

  • Quarterly: FE Platform tech lead + SRE on-call + security reviewer.
  • Per major release: every row touched by the release is re-rated.
  • Per incident: post-mortem owners audit the register.