SERVICE_RISK_REGISTER — theme-config-service

Sibling: FAILURE_MODES · SECURITY_MODEL · SERVICE_READINESS

This is the living risk register. Each risk is scored as likelihood (L: 1 = rare, 5 = almost certain) × impact (I: 1 = trivial, 5 = catastrophic), giving a score from 1 to 25, and is tracked through its mitigation, owner, status, and review date.
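The scoring rule can be sketched in a few lines. This is a minimal illustration, not the register's actual tooling: the `Rating` and `RiskRating` types and the `riskScore` function are hypothetical names, and the restriction to integer ratings from 1 to 5 is an assumption drawn from the scales stated above. The two sample ratings are those of TCS-R-007 and TCS-R-001.

```typescript
// Hypothetical types for illustration; not part of theme-config-service.
type Rating = 1 | 2 | 3 | 4 | 5;

interface RiskRating {
  id: string;
  likelihood: Rating; // 1 = rare, 5 = almost certain
  impact: Rating;     // 1 = trivial, 5 = catastrophic
}

// The score is the product of the two ratings, so it ranges from 1 to 25.
function riskScore(risk: RiskRating): number {
  return risk.likelihood * risk.impact;
}

const r007: RiskRating = { id: "TCS-R-007", likelihood: 4, impact: 3 };
const r001: RiskRating = { id: "TCS-R-001", likelihood: 1, impact: 5 };

console.log(r007.id, riskScore(r007)); // TCS-R-007 12
console.log(r001.id, riskScore(r001)); // TCS-R-001 5
```

Because the scale is multiplicative, a rare but catastrophic risk (1 × 5) scores lower than a likely, moderate one (4 × 3), which is why the impact column is worth reading alongside the score.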

| ID | Risk | L | I | Score | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|---|
| TCS-R-001 | Cross-tenant data leak via RLS bypass | 1 | 5 | 5 | RLS on every table; tenant_id GUC enforced; runtime metric + alert on theme_rls_violations_total > 0; integration test in CI; quarterly red-team | SecLead | Mitigated | Q1 2027 |
| TCS-R-002 | Stored XSS in tenant-branded site | 2 | 5 | 10 | dompurify allow-list at write + render; CSP script-src 'self' on booking origin; CSP report-uri monitored | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-003 | Bundle tampering at the CDN edge | 1 | 5 | 5 | SHA-256 verification at first BFF read; GCS object versioning; Terraform drift detection; CMEK | SRE | Mitigated | Q2 2027 |
| TCS-R-004 | Publishing latency spike during high traffic | 3 | 3 | 9 | Pre-publish bundle build off the request path is in place; CDN warm-up worker; load test profile publish_burst.k6.js | SRE | Mitigated | Q2 2027 |
| TCS-R-005 | OCC + concurrent edits cause user frustration in chains | 3 | 2 | 6 | Backoffice UI surfaces "stale draft" with diff view; soft-lock per draft (advisory only; Phase 2) | FrontendPlatform | In progress | Q2 2027 |
| TCS-R-006 | Preview link spread / brute force | 2 | 3 | 6 | 256-bit secret; SHA storage; 60 rpm/token + 600 rpm/tenant rate limit; auto IP block; revocation on edit/publish | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-007 | AI-generated translation degrades UX in low-resource locales | 4 | 3 | 12 | Locale-pack quality eval suite; HITL approver required; tenant opt-in to AI; rollback always available | AI Lead | In progress | Q1 2027 |
| TCS-R-008 | AI prompt injection from tenant-controlled keywords | 2 | 3 | 6 | Brand keywords sanitised; orchestrator-side instruction hierarchy; output schema enforcement; HITL gate | AI Lead | Mitigated | Q1 2027 |
| TCS-R-009 | AI cost runaway for a single tenant | 3 | 2 | 6 | Per-tenant monthly budget; per-surface rate limit; alert on first budget exhaustion | Finance + AI Lead | Mitigated | continuous |
| TCS-R-010 | Layout preset deactivation breaks tenants | 2 | 3 | 6 | Deactivation policy: flag is_active=false does not affect already-published versions; new publishes blocked with actionable error; 30-day deprecation notice | FrontendPlatform | Mitigated | per-deprecation |
| TCS-R-011 | Broken asset URL after file-storage-service deletion | 4 | 2 | 8 | media.deleted.v1 handler + daily scanner; backoffice banner; image fallback with alt text; never breaks page load | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-012 | Regression in bundle SHA stability across Node minor versions | 2 | 3 | 6 | Canonicaliser test suite pinned to test fixtures; Node version pinned in container; CI matrix tests against the next Node minor | FrontendPlatform | Mitigated | per Node release |
| TCS-R-013 | Memorystore outage → origin overload | 2 | 3 | 6 | Origin GCS scales; circuit breaker degrades gracefully; SLO target accommodates 2× origin reads | SRE | Mitigated | Q2 2027 |
| TCS-R-014 | Pub/Sub backlog after sustained outbox failures | 2 | 3 | 6 | Outbox retry-with-backoff; per-tenant alerting; manual replay runbook | SRE | Mitigated | Q2 2027 |
| TCS-R-015 | Poor RTL parity in tokens for new locales | 3 | 3 | 9 | Logical-property derivation enforced at validation; eval scenario ar_locale_rtl.spec.ts; design-system docs include RTL checklist | FrontendPlatform | In progress | Q1 2027 |
| TCS-R-016 | Bundle size growth pushes first-paint over budget | 4 | 2 | 8 | 40 KB gzipped budget; warning at publish; periodic audit of content-block size; lazy-loadable blocks (Phase 2) | FrontendPlatform | In progress | Q2 2027 |
| TCS-R-017 | Phase-2 chain-branding inheritance complexity | 3 | 3 | 9 | Designed as override-on-property over tenant baseline; explicit ADR planned (ADR-0006-chain-branding) before build | Architecture | Planned | Q3 2027 |
| TCS-R-018 | Tenant onboarding theme provisioning failure → first impression broken | 2 | 4 | 8 | Idempotent handler; default scaffold guarantees something publishes; theme.publish_rejected.v1 triggers backoffice prompt | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-019 | Migration introducing breaking schema change | 1 | 4 | 4 | Migration policy is expand-then-contract (see MIGRATION_PLAN); two-revision rollouts; CI drift check | DBA | Mitigated | per migration |
| TCS-R-020 | Region failure (europe-west1) | 1 | 4 | 4 | DR replica in europe-west4; promotion runbook; quarterly DR drill | SRE | Mitigated | Q3 2027 |
| TCS-R-021 | Color contrast validator false-negatives | 2 | 3 | 6 | Reference test vectors from W3C; eval suite; manual audit on every prompt change | a11y | Mitigated | Q1 2027 |
| TCS-R-022 | Author with theme:author accidentally publishes wrong theme | 2 | 2 | 4 | RBAC requires theme:publish; UI confirmation dialog with diff; rollback always available; audit trail | FrontendPlatform | Mitigated | continuous |
| TCS-R-023 | Notification-service consumer of email-theme falls behind | 2 | 2 | 4 | Internal endpoint cached at notification side; explicit theme.email_theme_updated.v1 event; degradation = uses prior theme | FrontendPlatform + Notif | Mitigated | Q1 2027 |
| TCS-R-024 | Desktop bundle drift causes wrong brand in offline operator console | 2 | 2 | 4 | SHA comparison on push; periodic poll; "stale" diagnostic event; tamper detection | FrontendPlatform + Desktop | Mitigated | Q2 2027 |
| TCS-R-025 | LayoutPreset registry single point of failure for global UX | 1 | 4 | 4 | Read-only registry with materialised view; cache; default scaffold has fallback preset compiled in | FrontendPlatform | Mitigated | Q3 2027 |
| TCS-R-026 | Compliance: DSAR fulfilment delays | 1 | 3 | 3 | All actor data routed through iam-service; this service contributes audit log entries via standard pipeline | Privacy | Mitigated | per regulator |
| TCS-R-027 | Vendor lock-in to GCP CDN + GCS | 3 | 2 | 6 | Bundle is plain JSON in a documented bucket layout; portable to S3 + CloudFront in <2 sprints per ADR-0001 alternatives | Architecture | Accepted | annual |
| TCS-R-028 | AI orchestrator API change breaks theme service | 2 | 3 | 6 | Versioned orchestrator client; contract tests; orchestrator deprecation policy | AI Lead | Mitigated | continuous |
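As an illustration of one mitigation in the register, the two-tier preview-link rate limit from TCS-R-006 (60 requests per minute per token, 600 per minute per tenant) could look roughly like the sketch below. It assumes a fixed one-minute window and in-memory counters. The register does not say which algorithm or store the service actually uses, and `PreviewRateLimiter`, `WindowCount`, and the method names are hypothetical.

```typescript
// Hypothetical sketch: fixed-window counters held in memory.
// The two limits come from TCS-R-006; everything else is an assumption.
const WINDOW_MS = 60 * 1000;
const TOKEN_LIMIT = 60;   // requests per minute per preview token
const TENANT_LIMIT = 600; // requests per minute per tenant

interface WindowCount {
  windowStart: number; // timestamp (ms) at which the current window began
  count: number;       // requests counted in the current window
}

class PreviewRateLimiter {
  private readonly counts = new Map<string, WindowCount>();

  // Returns the counter for a key, starting a new window once the old one has elapsed.
  private current(key: string, now: number): WindowCount {
    const entry = this.counts.get(key);
    if (entry === undefined || now - entry.windowStart >= WINDOW_MS) {
      const fresh: WindowCount = { windowStart: now, count: 0 };
      this.counts.set(key, fresh);
      return fresh;
    }
    return entry;
  }

  // Allows the request only if both the token and the tenant are under their limits.
  allow(tenantId: string, tokenId: string, now: number): boolean {
    const token = this.current(`token:${tenantId}:${tokenId}`, now);
    const tenant = this.current(`tenant:${tenantId}`, now);
    if (token.count >= TOKEN_LIMIT || tenant.count >= TENANT_LIMIT) {
      return false;
    }
    token.count += 1;
    tenant.count += 1;
    return true;
  }
}

const limiter = new PreviewRateLimiter();
let allowed = 0;
for (let i = 0; i < 61; i++) {
  if (limiter.allow("tenant-a", "token-1", 0)) allowed += 1;
}
console.log(allowed); // 60: the 61st request in the same window is rejected
```

The clock is passed in as `now` so the behaviour can be tested deterministically. A production limiter would more likely keep its counters in a shared store so that all service instances see the same counts.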

Review cadence

  • Weekly: owners review their In progress items at standup.
  • Monthly: the service tech lead reviews the full register, adds new risks, and moves mitigated risks to "Accepted residual" where appropriate.
  • Quarterly: Architecture + Security joint review; cross-service risk dependencies reconciled.

Risk score ≥ 10 escalation

Any risk scoring ≥ 10 (likelihood × impact) is escalated to the platform risk committee. Currently:

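  • TCS-R-007 (AI translation quality in low-resource locales, score 12) is the only active red. The mitigation owner is the AI Lead; eval suite improvements and HITL gating are the primary controls. Target: reduce the score to ≤ 9 by Q1 2027. TCS-R-002 (stored XSS) also scores 10, but its status is Mitigated.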
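The escalation rule reduces to a threshold check. The sketch below is illustrative only: `RegisterEntry`, `meetsEscalationThreshold`, and `isActiveRed` are hypothetical names, and reading "active" as any status other than Mitigated or Accepted is an assumption, not a definition taken from this register. The sample rows use scores and statuses from the register table.

```typescript
type RiskStatus = "Mitigated" | "In progress" | "Planned" | "Accepted";

interface RegisterEntry {
  id: string;
  score: number; // likelihood × impact
  status: RiskStatus;
}

const ESCALATION_THRESHOLD = 10;

// A risk reaches the escalation threshold when its score is 10 or more.
function meetsEscalationThreshold(entry: RegisterEntry): boolean {
  return entry.score >= ESCALATION_THRESHOLD;
}

// Assumption: an "active red" is at or above the threshold and not yet Mitigated or Accepted.
function isActiveRed(entry: RegisterEntry): boolean {
  return (
    meetsEscalationThreshold(entry) &&
    entry.status !== "Mitigated" &&
    entry.status !== "Accepted"
  );
}

const sampleEntries: RegisterEntry[] = [
  { id: "TCS-R-002", score: 10, status: "Mitigated" },
  { id: "TCS-R-004", score: 9, status: "Mitigated" },
  { id: "TCS-R-007", score: 12, status: "In progress" },
];

const activeReds = sampleEntries.filter(isActiveRed).map((entry) => entry.id);
console.log(activeReds.join(", ")); // TCS-R-007
```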

References