SERVICE_RISK_REGISTER — bff-consumer-service

Sibling: SERVICE_READINESS · FAILURE_MODES · SECURITY_MODEL

Living register of known risks. Each row carries an owner, a current likelihood × impact rating, mitigation status, and a review cadence. The register is reviewed quarterly by the Frontend Platform tech lead with SRE and security reviewers.

Severity scale (likelihood × impact):

L = low
M = medium
H = high
C = critical (production-stopping)

Status: open (active mitigation), monitored (mitigated but watched), accepted (residual risk acknowledged), closed (no longer applicable).

1. Strategic risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-S-1	Marketing campaign creates 50× traffic spike (vs. 10× planned)	M	H	Cloud Armor rate-limit ratchet runbook; pre-warm autoscale to 80 instances; `campaignMode` flag toggled by ops; pre-event capacity test required for every major campaign	FE Platform	monitored	per campaign
R-S-2	A bot operator scrapes the entire cross-tenant catalog and undercuts the meta layer	H	M	Bot detector with multi-signal scoring; reCAPTCHA Enterprise; per-fingerprint rate limit; legal cease-and-desist playbook	Security + FE Platform	open	quarterly
R-S-3	Meta-layer search results stale during a tenant's price flash sale	M	M	`melmastoon.search_aggregation.listing.indexed.v1` invalidates hot listing cache; tenant promotional events trigger same-second invalidate; `priceFromCheapest` always re-fetched; staleness banner shown when fallback used	FE Platform + Pricing	monitored	quarterly
R-S-4	Phase 2 authenticated wishlist sync explodes in scope and pollutes anonymous path	M	M	Phase 2 design lives in `_future/`; feature flag scoped to authenticated users only; design review with architecture team before scaffolding starts	Architecture	open	annual
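The per-fingerprint rate limit named in R-S-2 (and again in R-Sec-7) could be approximated with a token bucket keyed by client fingerprint. This is a minimal in-memory sketch, not the service's actual limiter: the class name, the capacity and refill values, and the injectable clock are all assumptions made for illustration, and a real deployment would likely keep bucket state in a shared store rather than in process memory.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float       # tokens currently available
    updated_at: float   # clock reading at the last refill


class FingerprintRateLimiter:
    """Hypothetical token-bucket limiter with one bucket per fingerprint."""

    def __init__(
        self,
        capacity: float,
        refill_per_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill_per_s = refill_per_s
        self._clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, fingerprint: str) -> bool:
        """Return True if this fingerprint may make one more request now."""
        now = self._clock()
        bucket = self._buckets.get(fingerprint)
        if bucket is None:
            # A new fingerprint starts with a full bucket.
            bucket = _Bucket(tokens=self._capacity, updated_at=now)
            self._buckets[fingerprint] = bucket

        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(self._capacity, bucket.tokens + elapsed * self._refill_per_s)
        bucket.updated_at = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

Each fingerprint gets its own bucket, so one scraper exhausting its budget does not throttle other visitors. The burst size (`capacity`) and sustained rate (`refill_per_s`) are separate knobs that a campaign runbook could tune independently.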

2. Performance & reliability risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-P-1	Single-flight collapses under sufficiently adversarial cache-key skew (e.g., random millisecond filters in query)	L	H	Cache key derivation strips cosmetic noise (rounding, ordering); fingerprinted-bot patterns trigger early; alarms on stampede metric	FE Platform	monitored	quarterly
R-P-2	Memorystore eviction during traffic spike loses sessions and degrades conversion	M	M	5 GiB working set with 30-day TTL; alert on eviction rate; auto-scale to 10 GiB during `campaignMode`	SRE	monitored	per campaign
R-P-3	Outbox table grows unbounded if Pub/Sub publisher fails for hours	L	M	Alert at 5k / 50k / 250k row depth; manual flush script; outbox-relay redrive; storage-budget alarm	SRE	monitored	quarterly
R-P-4	Cloud SQL HA failover takes > 60 s and degrades mutating endpoints	L	L	DR drill verifies actual failover time; idempotency keys absorb retries; monitor failover times trended over 12 months	SRE	accepted	annual
R-P-5	One slow upstream (e.g., `pricing-service`) sets the latency of the whole slowest-of-N composition	H	M	Per-call deadlines; partial-result composer; `priceFromCheapest=null` fallback; SLO budget for upstream-attributed latency tracked separately	FE Platform	monitored	quarterly
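The R-P-5 mitigation (per-call deadlines, a partial-result composer, and the `priceFromCheapest=null` fallback) can be sketched with `asyncio`. This is an illustration under stated assumptions rather than the BFF's real orchestrator: the function names, the response fields other than `priceFromCheapest`, and the deadline values are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical per-upstream deadlines in seconds; real values would come from SLOs.
DETAILS_DEADLINE_S = 0.20
PRICING_DEADLINE_S = 0.05


async def call_with_deadline(call: Awaitable[Any], deadline_s: float) -> Optional[Any]:
    """Await one upstream call; return None if it misses its deadline or fails."""
    try:
        return await asyncio.wait_for(call, timeout=deadline_s)
    except (asyncio.TimeoutError, ConnectionError):
        return None


async def compose_listing(
    listing_id: str,
    fetch_details: Callable[[str], Awaitable[Any]],
    fetch_price: Callable[[str], Awaitable[Any]],
) -> Dict[str, Any]:
    """Fan out to both upstreams concurrently and compose whatever came back."""
    details, price = await asyncio.gather(
        call_with_deadline(fetch_details(listing_id), DETAILS_DEADLINE_S),
        call_with_deadline(fetch_price(listing_id), PRICING_DEADLINE_S),
    )
    return {
        "listingId": listing_id,
        "details": details,
        # None serializes to null: the fallback when pricing is slow or down.
        "priceFromCheapest": price,
        # Lets the caller decide whether to show a staleness or partial banner.
        "partial": details is None or price is None,
    }
```

Because each call carries its own deadline, total latency is bounded by the largest deadline rather than by the slowest upstream. A pricing call that overruns is cancelled and the response is still returned with `priceFromCheapest` set to null.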

3. Security risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-Sec-1	HMAC handoff key compromise leaks ability to impersonate handoff into tenant booking	L	C	Key in Secret Manager with restricted access; rotation every 90 days with 7-day overlap; tenant BFF logs all handoff verifications and DLQs anomalies; rotation drill quarterly	Security + FE Platform	monitored	quarterly
R-Sec-2	Bot detector false-negative — sophisticated bot blends in and harvests pricing	M	M	Multi-signal scoring; behavioural anomalies tracked; manual review of high-volume sessions; legal channel for repeat offenders	Security	open	quarterly
R-Sec-3	Cookie hijack via XSS on `@ghasi/app-web-meta`	L	H	`HttpOnly` cookie; CSP with strict allow-list; CSRF protection on mutating endpoints via Origin header; periodic XSS audit on the consumer web app	Security + FE	monitored	quarterly
R-Sec-4	Cross-tenant data leak via misconfigured cache key	L	C	Cache keys explicitly include tenant context where applicable (only for `BrandPeek`); tests assert key isolation; review every new caching adapter	FE Platform	monitored	quarterly
R-Sec-5	reCAPTCHA leak (secret in client)	L	M	Server-side verification only; site key public by design; secret in Secret Manager	Security	accepted	annual
R-Sec-6	Search query log carries identifying user data	M	M	Search query strings hashed before logging; geo coords rounded to ~ 1 km; UA bucketed not stored raw; test asserts no email/phone patterns reach logs	Security + Data	monitored	quarterly
R-Sec-7	DDoS bypasses Cloud Armor via low-volume distributed attack	M	M	Per-fingerprint rate limit; behavioural anomaly detection; on-call playbook for L7 DDoS	SRE + Security	monitored	quarterly
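The rotation overlap in R-Sec-1 implies that, during the 7-day window, a verifier has to accept handoff signatures made with either the outgoing or the incoming key. A minimal sketch of that check follows. HMAC-SHA256, the hex encoding, and both function names are assumptions for illustration; the register does not state the digest or the wire format.

```python
import hashlib
import hmac
from typing import Iterable


def sign_handoff(payload: bytes, key: bytes) -> str:
    """Sign a handoff payload (assumed here: HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_handoff(payload: bytes, signature: str, active_keys: Iterable[bytes]) -> bool:
    """Accept the signature if it matches any key in the current overlap window."""
    matched = False
    for key in active_keys:
        expected = sign_handoff(payload, key)
        # compare_digest runs in constant time with respect to the contents,
        # and every key is checked so timing does not reveal which one matched.
        if hmac.compare_digest(expected.encode(), signature.encode()):
            matched = True
    return matched
```

During the overlap, `active_keys` would hold both keys. Once the window closes, dropping the old key from that list is what actually revokes it, so a signature made with a retired key stops verifying.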

4. Compliance & data risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-C-1	EU traffic without consent banner triggers GDPR violation	L	H	Consumer web defers all telemetry until consent given; BFF accepts `X-Consent: declined` header and skips telemetry; DPIA on file	Legal + FE	monitored	annual
R-C-2	Cookie banner blocks bot-detection signal collection and increases FP rate	M	M	Bot-detection runs from request signals (UA, IP-bucket, cadence) that don't require cookie consent; CAPTCHA challenge falls back gracefully	FE Platform	monitored	annual
R-C-3	Data residency for EU users (Memorystore session in `asia-south1`)	M	M	Region-affinity routing planned in Phase 2; current scope: anonymous data only; no PII; legal review confirmed acceptable	Legal	accepted	annual

5. Operational risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-O-1	On-call burn-out due to bot-related noise	M	M	Alert tuning quarterly; bot-related alerts batched and not individually paging; weekly bot review	SRE	monitored	quarterly
R-O-2	Schema drift from upstream service released without contract test	L	H	Pact provider verification gate; OpenAPI diff gate; nightly schema sync between BFF and upstreams	Platform Eng	monitored	quarterly
R-O-3	Loss of a single Frontend Platform engineer creates bus-factor 1 on the orchestrators	L	M	Pair-on-call rota; runbooks complete; quarterly ops review with rotating reviewer	Eng Manager	monitored	annual

6. Cost risks

ID	Risk	Likelihood	Impact	Mitigation	Owner	Status	Review
R-Cost-1	Cloud CDN costs spiral due to non-cacheable parameters in URLs	M	M	Query parameter normalization; strict `Vary` header; analytics dashboard tracks cache hit rate by route; quarterly review	FE Platform	monitored	quarterly
R-Cost-2	Pub/Sub volume from telemetry exceeds budget	M	M	Sample rate per event documented; sampling enforced at outbox enqueue; adjustable per-event from feature flags; cost alarm at 120% of monthly budget	SRE	monitored	quarterly
R-Cost-3	Excessive Trace export volume during incident	L	L	Trace sampler caps at 5% steady-state, raised to 100% during incident with timer-bound revert	SRE	monitored	quarterly
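The query parameter normalization listed under R-Cost-1 could look roughly like the sketch below: drop parameters that do not change the response, then sort the rest so equivalent URLs map to one cache entry. The ignore list and the function name are hypothetical, and the real rules would depend on which parameters each route actually reads.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters that never affect the response body.
IGNORED_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"})


def normalize_url(url: str) -> str:
    """Return a canonical form of the URL for use as a cache key."""
    parts = urlsplit(url)
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    # Sorting makes ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    pairs.sort()
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))
```

For example, `https://example.test/search?guests=2&city=kabul&utm_source=ad` and `https://example.test/search?city=kabul&guests=2` both normalize to the second form, so a campaign link no longer creates its own cache miss.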

7. Risk acceptance log

ID	Date accepted	Accepted by	Reason	Re-evaluation date
R-Sec-5	2026-04-15	Security	reCAPTCHA secret stored correctly; site key public by design	2027-04-15
R-P-4	2026-04-15	SRE	Cloud SQL HA failover within SLA on every drill in last 12 months	2027-04-15
R-C-3	2026-04-15	Legal	No PII in cross-region session blob; Phase 2 will introduce region affinity	2027-04-15

8. Review cadence

Quarterly: Frontend Platform tech lead + SRE on-call + security reviewer convene; revisit every row marked monitored or open; promote / demote severity; close mitigated rows; capture new risks discovered since the last review.
Per major release: any risk row touched by the release is re-rated.
Per incident: post-mortem owners audit this register and add any new risk surfaced.
