SERVICE_RISK_REGISTER — bff-consumer-service

Sibling: SERVICE_READINESS · FAILURE_MODES · SECURITY_MODEL

Living register of known risks. Each row carries an owner, a current likelihood × impact rating, mitigation status, and a review cadence. The register is reviewed quarterly by the Frontend Platform tech lead with SRE and security reviewers.

Severity scale, used for both the likelihood and the impact column of each row:

  • L = low
  • M = medium
  • H = high
  • C = critical (production-stopping)

Status: open (active mitigation), monitored (mitigated but watched), accepted (residual risk acknowledged), closed (no longer applicable).
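
For tooling that lints or renders this register, the row shape described above could be modelled roughly as follows. This is only a sketch: the names `RiskRow`, `AcceptanceEntry`, and `missingAcceptanceEntries` are hypothetical, and the rule that every `accepted` row should have a matching acceptance-log entry is an assumption inferred from section 7, not a documented requirement.

```typescript
// Ratings and statuses exactly as defined in the scale and status legend.
type Rating = "L" | "M" | "H" | "C";
type Status = "open" | "monitored" | "accepted" | "closed";

// One row of the register (field names are illustrative).
interface RiskRow {
  id: string;
  risk: string;
  likelihood: Rating;
  impact: Rating;
  mitigation: string;
  owner: string;
  status: Status;
  review: string; // e.g. "quarterly", "annual", "per campaign"
}

// One row of the risk acceptance log (section 7).
interface AcceptanceEntry {
  id: string;
  dateAccepted: string; // ISO date
  acceptedBy: string;
  reason: string;
  reEvaluationDate: string; // ISO date
}

// ASSUMPTION: every row with status "accepted" should appear in the log.
// Returns the IDs of accepted rows that have no log entry.
function missingAcceptanceEntries(
  rows: RiskRow[],
  log: AcceptanceEntry[],
): string[] {
  const logged = new Set(log.map((entry) => entry.id));
  return rows
    .filter((row) => row.status === "accepted" && !logged.has(row.id))
    .map((row) => row.id);
}
```

A check like this could run in CI against a machine-readable copy of the register, if one exists.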

1. Strategic risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-S-1 | Marketing campaign creates 50× traffic spike (vs. 10× planned) | M | H | Cloud Armor rate-limit ratchet runbook; pre-warm autoscale to 80 instances; campaignMode flag toggled by ops; pre-event capacity test required for every major campaign | FE Platform | monitored | per campaign |
| R-S-2 | A bot operator scrapes the entire cross-tenant catalog and undercuts the meta layer | H | M | Bot detector with multi-signal scoring; reCAPTCHA Enterprise; per-fingerprint rate limit; legal cease-and-desist playbook | Security + FE Platform | open | quarterly |
| R-S-3 | Meta-layer search results stale during a tenant's price flash sale | M | M | melmastoon.search_aggregation.listing.indexed.v1 invalidates hot listing cache; tenant promotional events trigger same-second invalidate; priceFromCheapest always re-fetched; staleness banner shown when fallback used | FE Platform + Pricing | monitored | quarterly |
| R-S-4 | Phase 2 authenticated wishlist sync explodes in scope and pollutes anonymous path | M | M | Phase 2 design lives in _future/; feature flag scoped to authenticated users only; design review with architecture team before scaffolding starts | Architecture | open | annual |

2. Performance & reliability risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-P-1 | Single-flight collapses under sufficiently adversarial cache-key skew (e.g., random millisecond filters in query) | L | H | Cache key derivation strips cosmetic noise (rounding, ordering); fingerprinted-bot patterns trigger early; alarms on stampede metric | FE Platform | monitored | quarterly |
| R-P-2 | Memorystore eviction during traffic spike loses sessions and degrades conversion | M | M | 5 GiB working set with 30-day TTL; alert on eviction rate; auto-scale to 10 GiB during campaignMode | SRE | monitored | per campaign |
| R-P-3 | Outbox table grows unbounded if Pub/Sub publisher fails for hours | L | M | Alert at 5k / 50k / 250k row depth; manual flush script; outbox-relay redrive; storage-budget alarm | SRE | monitored | quarterly |
| R-P-4 | Cloud SQL HA failover takes > 60 s and degrades mutating endpoints | L | L | DR drill verifies actual failover time; idempotency keys absorb retries; monitor failover times trended over 12 months | SRE | accepted | annual |
| R-P-5 | One slow upstream (e.g., pricing-service) drags the slowest-of-N composition latency | H | M | Per-call deadlines; partial-result composer; priceFromCheapest=null fallback; SLO budget for upstream-attributed latency tracked separately | FE Platform | monitored | quarterly |
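
The R-P-1 mitigation (stripping cosmetic noise from cache keys by rounding and ordering) might look something like the sketch below. The function name `deriveCacheKey`, the filter names `checkInMs` and `maxPrice`, and the bucket sizes are all hypothetical; the register does not specify which fields are rounded or by how much.

```typescript
type FilterValue = string | number | boolean;

// ASSUMPTION: per-field rounding steps. Names and sizes are illustrative only.
const ROUNDING_STEPS: Record<string, number> = {
  checkInMs: 60000, // millisecond timestamps collapse to one-minute buckets
  maxPrice: 10, // price ceilings collapse to steps of 10
};

// Builds a cache key that is stable under parameter reordering and
// under sub-bucket jitter in numeric filters.
function deriveCacheKey(
  route: string,
  filters: Record<string, FilterValue>,
): string {
  const parts = Object.keys(filters)
    .sort() // ordering: the key no longer depends on parameter order
    .map((name) => {
      const value = filters[name];
      const step = ROUNDING_STEPS[name];
      // rounding: floor numeric values to their bucket when a step is configured
      const normalized =
        typeof value === "number" && step !== undefined
          ? Math.floor(value / step) * step
          : value;
      return `${encodeURIComponent(name)}=${encodeURIComponent(String(normalized))}`;
    });
  return `${route}?${parts.join("&")}`;
}
```

With this shape, two requests that differ only in parameter order or in the low-order milliseconds of a timestamp map to the same key, so single-flight can still coalesce them.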

3. Security risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-Sec-1 | HMAC handoff key compromise leaks ability to impersonate handoff into tenant booking | L | C | Key in Secret Manager with restricted access; rotation every 90 days with 7-day overlap; tenant BFF logs all handoff verifications and DLQs anomalies; rotation drill quarterly | Security + FE Platform | monitored | quarterly |
| R-Sec-2 | Bot detector false-negative: sophisticated bot blends in and harvests pricing | M | M | Multi-signal scoring; behavioural anomalies tracked; manual review of high-volume sessions; legal channel for repeat offenders | Security | open | quarterly |
| R-Sec-3 | Cookie hijack via XSS on @ghasi/app-web-meta | L | H | HttpOnly cookie; CSP with strict allow-list; CSRF protection on mutating endpoints via Origin header; periodic XSS audit on the consumer web app | Security + FE | monitored | quarterly |
| R-Sec-4 | Cross-tenant data leak via misconfigured cache key | L | C | Cache keys explicitly include tenant context where applicable (only for BrandPeek); tests assert key isolation; review every new caching adapter | FE Platform | monitored | quarterly |
| R-Sec-5 | reCAPTCHA leak (secret in client) | L | M | Server-side verification only; site key public by design; secret in Secret Manager | Security | accepted | annual |
| R-Sec-6 | Search query log carries identifying user data | M | M | Search query strings hashed before logging; geo coords rounded to ~ 1 km; UA bucketed not stored raw; test asserts no email/phone patterns reach logs | Security + Data | monitored | quarterly |
| R-Sec-7 | DDoS bypasses Cloud Armor via low-volume distributed attack | M | M | Per-fingerprint rate limit; behavioural anomaly detection; on-call playbook for L7 DDoS | SRE + Security | monitored | quarterly |
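
As an illustration of the R-Sec-3 mitigation "CSRF protection on mutating endpoints via Origin header", a minimal check could look like the following. The function name `isRequestAllowed` and the origin `https://meta.example.com` are hypothetical, and rejecting mutating requests that carry no Origin header is an assumption; the register does not say how missing headers are handled.

```typescript
// Methods treated as non-mutating; everything else must pass the Origin check.
const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

function isRequestAllowed(
  method: string,
  originHeader: string | undefined,
  allowedOrigins: ReadonlySet<string>,
): boolean {
  if (SAFE_METHODS.has(method.toUpperCase())) return true;
  // ASSUMPTION: a mutating request without an Origin header is rejected.
  if (!originHeader) return false;
  try {
    // Normalize through URL so case differences in scheme/host do not matter.
    return allowedOrigins.has(new URL(originHeader).origin);
  } catch {
    // Unparseable values, including the literal "null" origin, are rejected.
    return false;
  }
}
```

An allow-list of exact origins is used here rather than suffix matching, since suffix checks are easy to bypass with look-alike hostnames.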

4. Compliance & data risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-C-1 | EU traffic without consent banner triggers GDPR violation | L | H | Consumer web defers all telemetry until consent given; BFF accepts X-Consent: declined header and skips telemetry; DPIA on file | Legal + FE | monitored | annual |
| R-C-2 | Cookie banner blocks bot-detection signal collection and increases FP rate | M | M | Bot-detection runs from request signals (UA, IP-bucket, cadence) that don't require cookie consent; CAPTCHA challenge falls back gracefully | FE Platform | monitored | annual |
| R-C-3 | Data residency for EU users (Memorystore session in asia-south1) | M | M | Region-affinity routing planned in Phase 2; current scope: anonymous data only; no PII; legal review confirmed acceptable | Legal | accepted | annual |
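
The R-C-1 behaviour on the BFF side (skip telemetry when `X-Consent: declined` is present) can be sketched as a small gate in front of the telemetry sink. Only the `declined` value is documented in this register; the helper names, the treatment of other header values as "not declined", and the conservative default for an absent header are assumptions.

```typescript
type RequestHeaders = Record<string, string | undefined>;

interface TelemetryEvent {
  name: string;
  payload: Record<string, unknown>;
}

// Returns false when the request carries X-Consent: declined.
// ASSUMPTION: any other value counts as not declined; behaviour for an
// absent header is not documented, so the caller chooses it.
function telemetryAllowed(
  headers: RequestHeaders,
  allowWhenAbsent: boolean,
): boolean {
  for (const name of Object.keys(headers)) {
    // HTTP header names are case-insensitive.
    if (name.toLowerCase() === "x-consent") {
      const value = (headers[name] ?? "").trim().toLowerCase();
      return value !== "declined";
    }
  }
  return allowWhenAbsent;
}

// Pushes the event to the sink only when consent permits; reports whether it did.
function emitTelemetry(
  headers: RequestHeaders,
  event: TelemetryEvent,
  sink: TelemetryEvent[],
  allowWhenAbsent = false, // ASSUMPTION: conservative default
): boolean {
  if (!telemetryAllowed(headers, allowWhenAbsent)) return false;
  sink.push(event);
  return true;
}
```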

5. Operational risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-O-1 | On-call burn-out due to bot-related noise | M | M | Alert tuning quarterly; bot-related alerts batched and not individually paging; weekly bot review | SRE | monitored | quarterly |
| R-O-2 | Schema drift from upstream service released without contract test | L | H | Pact provider verification gate; OpenAPI diff gate; nightly schema sync between BFF and upstreams | Platform Eng | monitored | quarterly |
| R-O-3 | Loss of a single Frontend Platform engineer creates bus-factor 1 on the orchestrators | L | M | Pair-on-call rota; runbooks complete; quarterly ops review with rotating reviewer | Eng Manager | monitored | annual |

6. Cost risks

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|
| R-Cost-1 | Cloud CDN costs spiral due to non-cacheable parameters in URLs | M | M | Query parameter normalization; Vary header strict; analytics dashboard tracks cache hit by route; quarterly review | FE Platform | monitored | quarterly |
| R-Cost-2 | Pub/Sub volume from telemetry exceeds budget | M | M | Sample rate per event documented; sampling enforced at outbox enqueue; adjustable per-event from feature flags; cost alarm at 120% of monthly budget | SRE | monitored | quarterly |
| R-Cost-3 | Excessive Trace export volume during incident | L | L | Trace sampler caps at 5% steady-state, raised to 100% during incident with timer-bound revert | SRE | monitored | quarterly |
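
The "timer-bound revert" in R-Cost-3 can be pictured as a sampler whose elevated rate expires on its own, so nobody has to remember to turn it back down after an incident. The sketch below uses the 5% and 100% rates from the row above; the class name `IncidentTraceSampler`, the injected clock, and the incident duration are hypothetical, and wiring into the actual tracing SDK is not shown.

```typescript
class IncidentTraceSampler {
  private readonly steadyRate: number;
  private readonly incidentRate: number;
  private readonly now: () => number;
  private incidentUntilMs = 0; // 0 means no incident window is active

  constructor(
    steadyRate = 0.05, // 5% steady-state, per R-Cost-3
    incidentRate = 1.0, // 100% during an incident, per R-Cost-3
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.steadyRate = steadyRate;
    this.incidentRate = incidentRate;
    this.now = now;
  }

  // Raises the rate for a bounded window; the revert needs no further action.
  beginIncident(durationMs: number): void {
    this.incidentUntilMs = this.now() + durationMs;
  }

  currentRate(): number {
    return this.now() < this.incidentUntilMs
      ? this.incidentRate
      : this.steadyRate;
  }

  // Decides whether to sample one trace, given a uniform random source in [0, 1).
  shouldSample(random: () => number = Math.random): boolean {
    return random() < this.currentRate();
  }
}
```

Computing the rate from a deadline on each call, rather than scheduling a timer to reset it, means the revert still happens if the process restarts or a timer callback is lost.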

7. Risk acceptance log

| ID | Date accepted | Accepted by | Reason | Re-evaluation date |
|---|---|---|---|---|
| R-Sec-5 | 2026-04-15 | Security | reCAPTCHA secret stored correctly; site key public by design | 2027-04-15 |
| R-P-4 | 2026-04-15 | SRE | Cloud SQL HA failover within SLA on every drill in last 12 months | 2027-04-15 |
| R-C-3 | 2026-04-15 | Legal | No PII in cross-region session blob; Phase 2 will introduce region affinity | 2027-04-15 |

8. Review cadence

  • Quarterly: Frontend Platform tech lead + SRE on-call + security reviewer convene; revisit every row marked monitored or open; promote / demote severity; close mitigated rows; capture new risks discovered since the last review.
  • Per major release: any risk row touched by the release is re-rated.
  • Per incident: post-mortem owners audit this register and add any new risk surfaced.