SERVICE_RISK_REGISTER — theme-config-service
Sibling: FAILURE_MODES · SECURITY_MODEL · SERVICE_READINESS
This is the living risk register. Each risk is rated on likelihood (1=rare, 5=almost certain) × impact (1=trivial, 5=catastrophic) and tracked through mitigation, owner, review date, and status.
| ID | Risk | L | I | Score | Mitigation | Owner | Status | Review |
|---|---|---|---|---|---|---|---|---|
| TCS-R-001 | Cross-tenant data leak via RLS bypass | 1 | 5 | 5 | RLS on every table; tenant_id GUC enforced; runtime metric + alert on theme_rls_violations_total > 0; integration test in CI; quarterly red-team | SecLead | Mitigated | Q1 2027 |
| TCS-R-002 | Stored XSS in tenant-branded site | 2 | 5 | 10 | dompurify allow-list at write + render; CSP script-src 'self' on booking origin; CSP report-uri monitored | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-003 | Bundle tampering at the CDN edge | 1 | 5 | 5 | SHA-256 verification at first BFF read; GCS object versioning; Terraform drift detection; CMEK | SRE | Mitigated | Q2 2027 |
| TCS-R-004 | Publishing latency spike during high traffic | 3 | 3 | 9 | Pre-publish bundle build off the request path is in place; CDN warm-up worker; load test profile publish_burst.k6.js | SRE | Mitigated | Q2 2027 |
| TCS-R-005 | OCC + concurrent edits cause user frustration in chains | 3 | 2 | 6 | Backoffice UI surfaces "stale draft" with diff view; soft-lock per draft (advisory only) — Phase 2 | FrontendPlatform | In progress | Q2 2027 |
| TCS-R-006 | Preview link spread / brute force | 2 | 3 | 6 | 256-bit secret; SHA storage; 60 rpm/token + 600 rpm/tenant rate limit; auto IP block; revocation on edit/publish | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-007 | AI-generated translation degrades UX in low-resource locales | 4 | 3 | 12 | Locale-pack quality eval suite; HITL approver required; tenant opt-in to AI; rollback always available | AI Lead | In progress | Q1 2027 |
| TCS-R-008 | AI prompt injection from tenant-controlled keywords | 2 | 3 | 6 | Brand keywords sanitised; orchestrator-side instruction hierarchy; output schema enforcement; HITL gate | AI Lead | Mitigated | Q1 2027 |
| TCS-R-009 | AI cost runaway for a single tenant | 3 | 2 | 6 | Per-tenant monthly budget; per-surface rate limit; alert on first budget exhaustion | Finance + AI Lead | Mitigated | continuous |
| TCS-R-010 | Layout preset deactivation breaks tenants | 2 | 3 | 6 | Deactivation policy: flag is_active=false does not affect already-published versions; new publishes blocked with actionable error; 30-day deprecation notice | FrontendPlatform | Mitigated | per-deprecation |
| TCS-R-011 | Broken asset URL after file-storage-service deletion | 4 | 2 | 8 | media.deleted.v1 handler + daily scanner; backoffice banner; image fallback with alt text; never breaks page load | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-012 | Regression in bundle SHA stability across Node minor versions | 2 | 3 | 6 | Canonicaliser test suite pinned to test fixtures; Node version pinned in container; CI matrix tests against the next Node minor | FrontendPlatform | Mitigated | per Node release |
| TCS-R-013 | Memorystore outage → origin overload | 2 | 3 | 6 | Origin GCS scales; circuit breaker degrades gracefully; SLO target accommodates 2× origin reads | SRE | Mitigated | Q2 2027 |
| TCS-R-014 | Pub/Sub backlog after sustained outbox failures | 2 | 3 | 6 | Outbox retry-with-backoff; per-tenant alerting; manual replay runbook | SRE | Mitigated | Q2 2027 |
| TCS-R-015 | Poor RTL parity in tokens for new locales | 3 | 3 | 9 | Logical-property derivation enforced at validation; eval scenario ar_locale_rtl.spec.ts; design-system docs include RTL checklist | FrontendPlatform | In progress | Q1 2027 |
| TCS-R-016 | Bundle size growth pushes first-paint over budget | 4 | 2 | 8 | 40 KB gzipped budget; warning at publish; periodic audit of content-block size; lazy-loadable blocks (Phase 2) | FrontendPlatform | In progress | Q2 2027 |
| TCS-R-017 | Phase-2 chain-branding inheritance complexity | 3 | 3 | 9 | Designed as override-on-property over tenant baseline; explicit ADR planned (ADR-0006-chain-branding) before build | Architecture | Planned | Q3 2027 |
| TCS-R-018 | Tenant onboarding theme provisioning failure → first impression broken | 2 | 4 | 8 | Idempotent handler; default scaffold guarantees something publishes; theme.publish_rejected.v1 triggers backoffice prompt | FrontendPlatform | Mitigated | Q1 2027 |
| TCS-R-019 | Migration introducing breaking schema change | 1 | 4 | 4 | Migration policy is expand-then-contract (see MIGRATION_PLAN); two-revision rollouts; CI drift check | DBA | Mitigated | per migration |
| TCS-R-020 | Region failure (europe-west1) | 1 | 4 | 4 | DR replica in europe-west4; promotion runbook; quarterly DR drill | SRE | Mitigated | Q3 2027 |
| TCS-R-021 | Color contrast validator false-negatives | 2 | 3 | 6 | Reference test vectors from W3C; eval suite; manual audit on every prompt change | a11y | Mitigated | Q1 2027 |
| TCS-R-022 | Author with theme:author accidentally publishes wrong theme | 2 | 2 | 4 | RBAC requires theme:publish; UI confirmation dialog with diff; rollback always available; audit trail | FrontendPlatform | Mitigated | continuous |
| TCS-R-023 | Notification-service consumer of email-theme falls behind | 2 | 2 | 4 | Internal endpoint cached at notification side; explicit theme.email_theme_updated.v1 event; degradation = uses prior theme | FrontendPlatform + Notif | Mitigated | Q1 2027 |
| TCS-R-024 | Desktop bundle drift causes wrong brand in offline operator console | 2 | 2 | 4 | SHA comparison on push; periodic poll; "stale" diagnostic event; tamper detection | FrontendPlatform + Desktop | Mitigated | Q2 2027 |
| TCS-R-025 | LayoutPreset registry single point of failure for global UX | 1 | 4 | 4 | Read-only registry with materialised view; cache; default scaffold has fallback preset compiled in | FrontendPlatform | Mitigated | Q3 2027 |
| TCS-R-026 | Compliance — DSAR fulfilment delays | 1 | 3 | 3 | All actor data routed through iam-service; this service contributes audit log entries via standard pipeline | Privacy | Mitigated | per regulator |
| TCS-R-027 | Vendor lock-in to GCP CDN + GCS | 3 | 2 | 6 | Bundle is plain JSON in a documented bucket layout; portable to S3 + CloudFront in <2 sprints per ADR-0001 alternatives | Architecture | Accepted | annual |
| TCS-R-028 | AI orchestrator API change breaks theme service | 2 | 3 | 6 | Versioned orchestrator client; contract tests; orchestrator deprecation policy | AI Lead | Mitigated | continuous |
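The Score column is meant to equal L × I for every row. A minimal sketch of how that could be checked automatically (for example in CI), assuming the register is available as raw markdown table lines. `RegisterRow`, `parseRow`, and `findScoreMismatches` are hypothetical names for illustration and are not part of theme-config-service.

```typescript
// Hypothetical helper types and functions; nothing here exists in the service.
interface RegisterRow {
  id: string;
  likelihood: number;
  impact: number;
  score: number;
}

// Parse one markdown table row into its ID and numeric columns.
// Returns null for the header row, the separator row, or a malformed row.
function parseRow(line: string): RegisterRow | null {
  // A row starts and ends with "|", so cells[0] is empty:
  // cells[1] = ID, cells[2] = Risk, cells[3] = L, cells[4] = I, cells[5] = Score.
  const cells = line.split("|").map((c) => c.trim());
  if (cells.length < 6) return null;
  const id = cells[1];
  const numbers = cells.slice(3, 6);
  const allDigits = numbers.every((c) => /^\d+$/.test(c));
  if (!/^TCS-R-\d+$/.test(id) || !allDigits) return null;
  return {
    id: id,
    likelihood: parseInt(numbers[0], 10),
    impact: parseInt(numbers[1], 10),
    score: parseInt(numbers[2], 10),
  };
}

// Return the IDs of rows whose Score column is not likelihood × impact.
function findScoreMismatches(lines: string[]): string[] {
  return lines
    .map(parseRow)
    .filter((row): row is RegisterRow => row !== null)
    .filter((row) => row.likelihood * row.impact !== row.score)
    .map((row) => row.id);
}
```

An empty result means every parsed row is arithmetically consistent. The sketch does not validate the 1 to 5 range or the other columns.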
Review cadence
- Weekly: owners review their In progress items at standup.
- Monthly: the service tech lead reviews the full register; new risks are added; mitigated risks are moved to "Accepted residual" if appropriate.
- Quarterly: Architecture + Security joint review; cross-service risk dependencies reconciled.
Risk score ≥ 10 escalation
Any risk scoring ≥ 10 (likelihood × impact) is escalated to the platform risk committee. Currently:
- TCS-R-007 (AI translation quality in low-resource locales, score 12) is the only active red. TCS-R-002 also scores 10, but its status is Mitigated. The mitigation owner for TCS-R-007 is the AI Lead; eval suite improvements and HITL gating are the primary controls. Target: score ≤ 9 by Q1 2027.
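The escalation rule can be sketched as a filter over the register. This is an illustration under stated assumptions, not the committee's tooling: `Risk`, `escalated`, and `activeReds` are hypothetical names, and treating "active" as "status is not Mitigated" is an assumption inferred from the table. The sample rows are copied from the register.

```typescript
// Hypothetical types and helpers for illustration only.
type Status = "Mitigated" | "In progress" | "Planned" | "Accepted";

interface Risk {
  id: string;
  likelihood: number; // 1 = rare, 5 = almost certain
  impact: number;     // 1 = trivial, 5 = catastrophic
  status: Status;
}

const ESCALATION_THRESHOLD = 10;

// Score is likelihood × impact, as defined at the top of the register.
function score(risk: Risk): number {
  return risk.likelihood * risk.impact;
}

// Every risk at or above the escalation threshold.
function escalated(risks: Risk[]): Risk[] {
  return risks.filter((r) => score(r) >= ESCALATION_THRESHOLD);
}

// Assumption: an "active red" is an escalated risk that is not yet Mitigated.
function activeReds(risks: Risk[]): Risk[] {
  return escalated(risks).filter((r) => r.status !== "Mitigated");
}

// Three rows taken from the register table.
const sample: Risk[] = [
  { id: "TCS-R-002", likelihood: 2, impact: 5, status: "Mitigated" },
  { id: "TCS-R-004", likelihood: 3, impact: 3, status: "Mitigated" },
  { id: "TCS-R-007", likelihood: 4, impact: 3, status: "In progress" },
];

console.log(escalated(sample).map((r) => r.id));  // TCS-R-002 and TCS-R-007
console.log(activeReds(sample).map((r) => r.id)); // TCS-R-007 only
```

Keeping the threshold as a single named constant means the monthly review can change it in one place if the committee revises the rule.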
References
- Failure modes: FAILURE_MODES
- Threat model: SECURITY_MODEL §1
- Migration policy: MIGRATION_PLAN