AI_INTEGRATION — theme-config-service

Sibling documents: APPLICATION_LOGIC · DOMAIN_MODEL · SECURITY_MODEL

Platform anchors: docs/08-ai-architecture.md

theme-config-service is an AI-assisted authoring service but never an AI-autonomous service. Every AI surface is HITL-gated: the model drafts; a human approves; the system applies. This document specifies the integration contract, prompt structure, model routing, redaction, provenance, evaluation, and rollback behaviour.

1. AI surfaces

Surface	Purpose	Trigger	Output	Auto-applied?
Palette suggestion	Derive secondary/accent/status tokens from one primary color + brand keywords	`POST /themes/:id/ai-suggest-palette`	`Partial<ColorTokens>`	No (HITL)
Translation drafting	Draft missing locale entries from the default-locale source	`POST /themes/:id/ai-draft-translations`	`Map<key, translatedText>`	No (HITL)
Content drafting	Draft a content block in a target locale (e.g. "About us" in `ar-SA` from `en-US` source)	`POST /content-blocks/:id/ai-draft`	`MarkupEntry`	No (HITL)
Contrast remediation suggestion	When a token pair fails WCAG AA, propose an adjusted hex that passes	Inline editor button	`Partial<ColorTokens>`	No (HITL)
Layout preset recommendation	Suggest a layout preset based on tenant's property type + photos	Backoffice "Suggest layout" affordance	`LayoutSelections`	No (HITL)
Copy proofreading	Detect typos, tone inconsistencies, and brand-voice drift in copy strings	Inline editor lint	`Suggestion[]`	No (HITL)

There is no streaming AI surface in this service — all calls are request/response and bounded by an orchestrator-side budget.

2. AI orchestration

All AI calls are routed through ai-orchestrator-service. theme-config-service does not hold model API keys; it speaks only to the orchestrator over mTLS using its workload identity:

interface AIClient {
  suggestPalette(input: SuggestPaletteInput): Promise<{ tokens: Partial<ColorTokens>; provenance: AIProvenance }>;
  draftTranslations(input: DraftTranslationsInput): Promise<{ entries: ReadonlyMap<string, string>; provenance: AIProvenance }>;
  draftContent(input: DraftContentInput): Promise<{ markup: MarkupEntry; provenance: AIProvenance }>;
  suggestContrastFix(input: SuggestContrastFixInput): Promise<{ tokens: Partial<ColorTokens>; provenance: AIProvenance }>;
  recommendLayout(input: RecommendLayoutInput): Promise<{ selections: Partial<LayoutSelections>; provenance: AIProvenance }>;
  proofreadCopy(input: ProofreadCopyInput): Promise<{ suggestions: ProofreadSuggestion[]; provenance: AIProvenance }>;
}

The orchestrator chooses the model per docs/08-ai-architecture.md §5:

Surface	Default model	Fallback	Rationale
Palette suggestion	`claude-3-5-sonnet` (latest)	`gpt-4.1-mini`	Color reasoning is design-heavy; Sonnet's tool-use is reliable for emitting strict JSON
Translation drafting (high-resource locales: en, fr, ar, fa)	`gpt-4.1-mini`	`gemini-2.0-flash`	Cost-optimized; we still HITL
Translation drafting (low-resource: ps-AF, dialectal Dari)	`claude-3-5-sonnet`	local Pashto MT model on Vertex	Quality-critical for our home market
Content drafting	`claude-3-5-sonnet`	`gpt-4.1`	Long-form quality + brand-safe output
Contrast remediation	deterministic algorithm + `gpt-4.1-mini` for naming	none	Algorithm is exact; LLM only names the new shade
Layout recommendation	`gemini-2.0-pro` (vision)	`claude-3-5-sonnet` (vision)	Vision input from property photos
Copy proofreading	`gpt-4.1-mini`	`claude-3-5-haiku`	Cheap, high-throughput

Routing is the orchestrator's responsibility; this service treats it as a black box.

3. Prompt + response shapes

All prompts are versioned in services/theme-config-service/prompts/ and referenced by the orchestrator via a promptId + promptVersion envelope. Example shapes follow.

3.1 Palette suggestion

System prompt (id: theme.palette.system, v3):

You are a brand designer for a multi-tenant hotel SaaS. Given a primary brand color and optional brand keywords, derive a complete tenant palette that:

Passes WCAG 2.1 AA contrast on every pair documented in the schema.

Respects warm/cool harmony with the primary.

Uses sober status colors (success / warning / error / info) that hotel guests will read as neutral and trustworthy.

Outputs only the JSON object that conforms to the provided schema. Do not explain.

User prompt template:

Primary color: {{primaryColor}}
Brand keywords (optional): {{brandKeywordsJoined}}
Tenant market context (optional, for cultural color reading): {{marketContext}}
Schema: {{schemaSnippet}}

Response constraint: the orchestrator forces JSON-mode (response_format: json_schema) against theme.palette.suggestion.schema.v1.json. Any non-conforming response is rejected at the orchestrator and surfaced as MELMASTOON.AI.OUTPUT_INVALID.
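As a rough illustration of the promptId + promptVersion envelope, the sketch below renders the palette user prompt template and wraps it for the orchestrator. The `PromptEnvelope` shape, its `userPrompt` field, and the `renderTemplate` and `buildPaletteEnvelope` helpers are assumptions made for this example; the real envelope is owned by ai-orchestrator-service and is not specified in this document.

```typescript
// Hypothetical envelope: only promptId and promptVersion are named in this document.
interface PromptEnvelope {
  promptId: string;
  promptVersion: string;
  userPrompt: string;
}

// Replaces simple {{name}} placeholders. Block helpers such as {{#each}} are
// deliberately out of scope for this sketch and are left untouched.
function renderTemplate(template: string, vars: Record<string, string | undefined>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => vars[name] ?? "");
}

const PALETTE_USER_TEMPLATE = [
  "Primary color: {{primaryColor}}",
  "Brand keywords (optional): {{brandKeywordsJoined}}",
  "Tenant market context (optional, for cultural color reading): {{marketContext}}",
  "Schema: {{schemaSnippet}}",
].join("\n");

function buildPaletteEnvelope(input: {
  primaryColor: string;
  brandKeywords?: string[];
  marketContext?: string;
  schemaSnippet: string;
}): PromptEnvelope {
  return {
    promptId: "theme.palette.system",
    promptVersion: "v3",
    userPrompt: renderTemplate(PALETTE_USER_TEMPLATE, {
      primaryColor: input.primaryColor,
      brandKeywordsJoined: (input.brandKeywords ?? []).join(", "),
      marketContext: input.marketContext,
      schemaSnippet: input.schemaSnippet,
    }),
  };
}
```

Optional variables that are absent render as empty strings here; whether the production templates omit the whole line instead is not stated in this document.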

3.2 Translation drafting

System prompt (id: theme.translate.system, v4):

You are translating UI strings for a hotel booking flow. Preserve ICU placeholders ({name}, {count, plural, one {…} other {…}}) exactly. Match the formality the hospitality industry uses in the target locale. Do not add commentary. Output only the JSON map of key → translated string.

User prompt template:

Source locale: {{sourceLocale}}
Target locale: {{targetLocale}}
Brand voice: {{brandVoiceShort}}
Hotel name: {{hotelName}}
Entries:
{{#each keys}}
  - key: {{this.key}}
    source: {{this.sourceText}}
    context: {{this.context}}
{{/each}}

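The response is a JSON object of the form { "<key>": "<translated>" }. Validation: every input key must be present in the response, and each translated string must contain exactly the same set of ICU placeholders as its source string.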
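The validation rule above (key coverage plus placeholder parity) could be checked along these lines. This is a simplified sketch, not the service's validator: `icuPlaceholders` and `validateDraft` are hypothetical names, and the scanner relies on the observation that ICU argument braces open at odd nesting depths while plural/select branch bodies open at even depths. It ignores ICU apostrophe quoting.

```typescript
// Collects the argument names used in an ICU message, sorted and de-duplicated.
// Simplification: apostrophe-quoted literals are not handled.
function icuPlaceholders(message: string): string[] {
  const seen: Record<string, true> = {};
  let depth = 0;
  for (let i = 0; i < message.length; i++) {
    const ch = message.charAt(i);
    if (ch === "{") {
      depth++;
      // Odd depth: an argument such as {name} or {count, plural, ...}.
      // Even depth: a branch body such as one {...}, which is not an argument.
      if (depth % 2 === 1) {
        const m = /^\s*([A-Za-z0-9_]+)\s*[,}]/.exec(message.slice(i + 1));
        const name = m ? m[1] : undefined;
        if (name) seen[name] = true;
      }
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
    }
  }
  return Object.keys(seen).sort();
}

// Returns a list of problems; an empty list means the draft passes both rules.
function validateDraft(source: Record<string, string>, draft: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(source)) {
    const translated = draft[key];
    if (typeof translated !== "string") {
      problems.push(`missing key: ${key}`);
      continue;
    }
    const expected = icuPlaceholders(source[key] ?? "").join(",");
    const actual = icuPlaceholders(translated).join(",");
    if (expected !== actual) problems.push(`placeholder mismatch: ${key}`);
  }
  return problems;
}
```

A draft that renames `{name}` to `{nom}` would be reported as a placeholder mismatch, and a draft that omits a source key would be reported as missing.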

3.3 Content drafting (e.g. about-us in target locale)

The system prompt enforces brand voice and safety rails (no medical claims, no exaggerated guarantees). The user prompt carries property context (name, location, amenities, source markup). The output is a MarkupEntry, which is sanitised after generation through the same DOMPurify allow-list used for human-authored markup.

3.4 Contrast remediation

The deterministic component computes the minimum L* shift on the failing token needed to reach AA, in whichever direction (lighter or darker) best preserves the hue. The LLM only produces a human-readable name like "Deeper Hazara Turquoise" for the new shade.
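To make the deterministic step concrete, here is a minimal sketch of an L* search under stated assumptions. It uses the WCAG 2.1 contrast-ratio formula and a standard sRGB/CIELAB (D65) conversion, holds a* and b* fixed, and steps L* in 0.5 increments until the pair reaches 4.5:1. The function names, the step size, and the tie-break (smallest shift, darker tried first) are assumptions of this example; the service's own hue-preservation rule is not reproduced here.

```typescript
type Triple = [number, number, number];

// sRGB transfer functions. WCAG 2.1 prints the threshold as 0.03928; the
// difference from 0.04045 does not affect 8-bit channel values.
const toLinear = (c8: number): number => {
  const s = c8 / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
};
const toSrgb8 = (v: number): number => {
  const c = Math.min(1, Math.max(0, v)); // clip out-of-gamut values
  return Math.round((c <= 0.0031308 ? 12.92 * c : 1.055 * Math.pow(c, 1 / 2.4) - 0.055) * 255);
};
function hexToLinear(hex: string): Triple { // expects "#rrggbb"
  const n = parseInt(hex.slice(1), 16);
  return [toLinear((n >> 16) & 255), toLinear((n >> 8) & 255), toLinear(n & 255)];
}
function linearToHex(rgb: Triple): string {
  return "#" + rgb.map((v) => (256 + toSrgb8(v)).toString(16).slice(1)).join("");
}

// WCAG 2.1 contrast ratio between two colours.
function contrastRatio(hexA: string, hexB: string): number {
  const lum = ([r, g, b]: Triple): number => 0.2126 * r + 0.7152 * g + 0.0722 * b;
  const la = lum(hexToLinear(hexA));
  const lb = lum(hexToLinear(hexB));
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// CIELAB conversion with the D65 reference white.
const XN = 0.95047, YN = 1.0, ZN = 1.08883, D = 6 / 29;
const f = (t: number): number => (t > D * D * D ? Math.pow(t, 1 / 3) : t / (3 * D * D) + 4 / 29);
const fInv = (t: number): number => (t > D ? t * t * t : 3 * D * D * (t - 4 / 29));
function linearToLab([r, g, bl]: Triple): Triple {
  const fx = f((0.4124564 * r + 0.3575761 * g + 0.1804375 * bl) / XN);
  const fy = f((0.2126729 * r + 0.7151522 * g + 0.072175 * bl) / YN);
  const fz = f((0.0193339 * r + 0.119192 * g + 0.9503041 * bl) / ZN);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}
function labToLinear([L, a, b]: Triple): Triple {
  const fy = (L + 16) / 116;
  const x = fInv(fy + a / 500) * XN, y = fInv(fy) * YN, z = fInv(fy - b / 200) * ZN;
  return [
    3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
    -0.969266 * x + 1.8760108 * y + 0.041556 * z,
    0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
  ];
}

// Smallest L* shift (on the search grid) that makes failingHex reach `target`
// against otherHex. Returns null when no shift within 0..100 succeeds.
function minimalLightnessFix(failingHex: string, otherHex: string, target = 4.5, step = 0.5): string | null {
  if (contrastRatio(failingHex, otherHex) >= target) return failingHex;
  const [L, a, b] = linearToLab(hexToLinear(failingHex));
  for (let delta = step; delta <= 100; delta += step) {
    for (const sign of [-1, 1]) { // assumption: darker is tried before lighter
      const shifted = L + sign * delta;
      if (shifted < 0 || shifted > 100) continue;
      const candidate = linearToHex(labToLinear([shifted, a, b]));
      if (contrastRatio(candidate, otherHex) >= target) return candidate;
    }
  }
  return null;
}
```

Contrast is re-checked on the rounded 8-bit hex value rather than on the unrounded Lab colour, so a returned candidate always passes when re-measured by a downstream checker.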

4. Redaction & data minimisation

Per docs/08-ai-architecture.md §7, only the minimum necessary payload crosses the model trust boundary:

Surface	Sent to model	Never sent to model
Palette suggestion	Primary color, brand keywords, market context (e.g. `AF`)	Tenant ID, tenant name, user ID, IP, full revenue / booking data
Translation drafting	Locale codes, source strings, brand voice short, hotel name	Tenant ID, guest data, booking data
Content drafting	Property name, location label, amenity short list, source markup	Guest data, internal pricing, revenue
Layout recommendation	Property type label, ≤ 6 hero photos	Personally identifying info in photos (faces auto-blurred by `file-storage-service` before send)
Copy proofreading	Locale, copy strings, brand voice short	Tenant ID, user ID, surrounding business data

The orchestrator additionally:

Strips request headers (no Authorization, no X-Tenant-Id reach the model).
Replaces tenant identifiers with stable opaque tokens for the request span.
Logs a redacted prompt hash for audit; the raw prompt is encrypted and held for 30 days for incident response only.
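One way to enforce the "Sent to model" column is an explicit allow-list per surface, so that anything not listed (tenant ID, user ID, and so on) is dropped by construction. The sketch below is illustrative only: the field names, the `minimise` and `stripHeaders` helpers, and the subset of surfaces shown are assumptions, and header stripping is the orchestrator's job in this design rather than this service's.

```typescript
type Surface = "palette" | "translation" | "proofread";

// Hypothetical field names mirroring the "Sent to model" column.
const SENT_TO_MODEL: Record<Surface, readonly string[]> = {
  palette: ["primaryColor", "brandKeywords", "marketContext"],
  translation: ["sourceLocale", "targetLocale", "entries", "brandVoiceShort", "hotelName"],
  proofread: ["locale", "copyStrings", "brandVoiceShort"],
};

// Copies only allow-listed fields; everything else never leaves the service.
function minimise(surface: Surface, payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const field of SENT_TO_MODEL[surface]) {
    if (field in payload) out[field] = payload[field];
  }
  return out;
}

// Headers that must not reach the model, compared case-insensitively.
const BLOCKED_HEADERS = ["authorization", "x-tenant-id"];

function stripHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (value !== undefined && BLOCKED_HEADERS.indexOf(name.toLowerCase()) === -1) {
      out[name] = value;
    }
  }
  return out;
}
```

An allow-list fails closed: a field added to the domain model later is withheld from the model until someone deliberately lists it.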

5. Provenance

Every aggregate that carries AI-drafted content stores aiProvenance:

interface AIProvenance {
  model: string;                  // e.g. 'claude-3-5-sonnet-2026-01'
  modelProvider: 'anthropic'|'openai'|'google'|'self_hosted';
  promptId: string;               // 'theme.palette.system'
  promptVersion: string;          // 'v3'
  promptHash: string;             // sha256 of the rendered prompt (post-redaction)
  responseHash: string;           // sha256 of the raw response
  tokensIn: number;
  tokensOut: number;
  latencyMs: number;
  costUsd: number;                // orchestrator-attributed
  createdAt: ISODate;
  redactionApplied: boolean;
  approverUserId?: UserId;        // set on apply
  approvedAt?: ISODate;
  approverNote?: string;
}

This data is:

Persisted on the aggregate (theme_versions.ai_provenance, content_blocks.ai_provenance, locale_packs.ai_provenance).
Echoed in the published bundle's meta.aiProvenance for downstream auditability.
Surfaced in theme.published.v1.aiProvenance for the audit service.
Used by HITL gating: any aggregate with non-null aiProvenance.model and null approverUserId blocks publish.
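The publish gate in the last item reduces to a small predicate. The sketch below assumes a trimmed-down view of AIProvenance (`ProvenanceGateView`) and hypothetical helper names; it is meant to show the rule, not the use case that enforces it.

```typescript
// Only the two fields the gate inspects; the full shape is AIProvenance above.
interface ProvenanceGateView {
  model: string | null;
  approverUserId?: string | null;
}

// True when the aggregate carries AI-drafted content that nobody has approved yet.
function blocksPublish(provenance: ProvenanceGateView | null | undefined): boolean {
  return provenance != null && provenance.model != null && provenance.approverUserId == null;
}

// A bundle may publish only if none of its aggregates is blocked.
function canPublish(aggregates: Array<{ aiProvenance?: ProvenanceGateView | null }>): boolean {
  return aggregates.every((aggregate) => !blocksPublish(aggregate.aiProvenance));
}
```

Aggregates with no aiProvenance at all (purely human-authored content) pass the gate untouched.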

6. HITL approval flow

Author opens "AI suggest palette"
        │
        ▼
POST /themes/:id/ai-suggest-palette
        │
        ▼  draftedTokens + provenance persisted in palette_suggestions (status='pending')
        │
        ▼  hitlTaskUrl returned
        │
        ▼
HITL approver opens task in backoffice
        │
        ▼  reviews drafted tokens; can edit before approving
        │
        ▼
POST /themes/:id/ai-suggest-palette/:suggestionId/apply
        │   { themeVersionId, approverNote }
        ▼
ApplyAiPaletteSuggestionUseCase
        │   merges tokens onto target draft
        │   attaches aiProvenance with approverUserId + approvedAt
        │   marks suggestion status='approved'
        ▼
PatchThemeVersionUseCase emits theme.draft_updated.v1

Reject path: POST .../reject sets status='rejected'; suggestion is retained for audit but never applied.

Eligibility: only roles with theme:approve_ai (typically tenant_admin, brand_owner) can call apply. The author of the suggestion cannot approve it (separation of duties).
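The eligibility rules, together with the stale-suggestion case from the failure table, might be checked in the apply use case roughly as follows. `Actor`, `PendingSuggestion`, `targetDraftRevision`, and `checkApply` are hypothetical names, and the order of the checks is an assumption; only the permission name, status codes, and error codes come from this document.

```typescript
interface Actor {
  userId: string;
  permissions: readonly string[];
}

// Hypothetical view of a row in palette_suggestions.
interface PendingSuggestion {
  authorUserId: string;
  targetDraftRevision: number; // revision of the draft when the suggestion was created
}

type ApplyCheck = { ok: true } | { ok: false; status: 403 | 409; code: string };

function checkApply(actor: Actor, suggestion: PendingSuggestion, currentDraftRevision: number): ApplyCheck {
  // 1. Role check: only holders of theme:approve_ai may apply.
  if (actor.permissions.indexOf("theme:approve_ai") === -1) {
    return { ok: false, status: 403, code: "MELMASTOON.AI.HITL_INELIGIBLE_APPROVER" };
  }
  // 2. Separation of duties: the author cannot approve their own suggestion.
  if (actor.userId === suggestion.authorUserId) {
    return { ok: false, status: 403, code: "MELMASTOON.AI.HITL_SAME_ACTOR_FORBIDDEN" };
  }
  // 3. Staleness: the target draft must not have changed since drafting.
  if (suggestion.targetDraftRevision !== currentDraftRevision) {
    return { ok: false, status: 409, code: "MELMASTOON.AI.SUGGESTION_STALE" };
  }
  return { ok: true };
}
```

Running the authorisation checks before the staleness check avoids telling an ineligible caller anything about the state of the draft.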

7. Evaluation harness

Per docs/08-ai-architecture.md §10, every AI surface ships with an offline evaluation harness in services/theme-config-service/evals/:

Eval	Dataset size	Pass criteria	Run frequency
`palette.contrast`	200 primary colors × 5 keyword sets	≥ 99 % of generated palettes pass WCAG AA on every documented pair	per prompt change + nightly
`palette.harmony`	100 hand-graded palettes	≥ 90 % graded "acceptable" or higher by 2/3 designers	per prompt change
`translate.placeholders`	500 ICU strings × 8 target locales	100 % placeholder parity	per prompt change + nightly
`translate.fluency`	200 strings × 4 locales (ps, fa, ar, fr), graded	≥ 95 % "fluent" by native graders (sampling)	weekly
`content.brand_safety`	50 prompts × 3 sensitive contexts	0 unsafe outputs	per prompt change
`content.hallucination`	50 prompts (no real amenity for property)	0 fabricated amenities	per prompt change
`contrast.fix.minimum_shift`	200 failing pairs	algorithmic exact match in 100 % of cases	per change to deterministic engine
`proofread.precision`	200 strings, hand-labelled	≥ 90 % precision @ recall 0.7	weekly

Evals run in CI on every change to the prompt directory; a failing eval blocks deployment of the changed prompts.
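The CI gate amounts to comparing each eval's pass rate with its threshold from the table. A minimal sketch, assuming a hypothetical `EvalRun` result record and helper names (the real harness output format is not described here):

```typescript
// Hypothetical summary of one eval run.
interface EvalRun {
  name: string;        // e.g. "palette.contrast"
  passed: number;      // cases that met the eval's criterion
  total: number;       // cases executed
  minPassRate: number; // threshold from the table, as a fraction (0.99 for 99 %)
}

function evalPasses(run: EvalRun): boolean {
  // An eval that executed no cases is treated as a failure, not a pass.
  return run.total > 0 && run.passed / run.total >= run.minPassRate;
}

// Names of the evals that block deployment; an empty list means the gate is open.
function blockingEvals(runs: EvalRun[]): string[] {
  return runs.filter((run) => !evalPasses(run)).map((run) => run.name);
}
```

Zero-tolerance rows such as `content.brand_safety` fit the same shape by counting safe outputs as passes and setting `minPassRate` to 1.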

8. Cost & rate budgets

The orchestrator enforces per-tenant monthly budgets (default $20 / month for AI surfaces; configurable per tenant plan). On budget exhaustion, requests return 429 MELMASTOON.AI.BUDGET_EXCEEDED.

Per-surface rate limits, in requests per minute (rpm), apply in addition to the API rate limits in API_CONTRACTS §7:

Surface	Per-tenant rpm	Per-actor rpm
Palette	10	5
Translations	30	15 (batch up to 200 keys per call)
Content draft	10	5
Contrast fix	60	30
Layout recommend	5	5
Proofread	60	30
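For illustration, the table can be expressed as configuration plus a limiter that checks both the per-tenant and the per-actor ceiling. The sketch uses a fixed one-minute window, which is an assumption: this document does not say which algorithm (fixed window, sliding window, token bucket) is actually used, and `createLimiter` and the surface keys are hypothetical names.

```typescript
type Surface = "palette" | "translations" | "contentDraft" | "contrastFix" | "layoutRecommend" | "proofread";

interface Limit {
  perTenantRpm: number;
  perActorRpm: number;
}

// Values copied from the table above.
const LIMITS: Record<Surface, Limit> = {
  palette: { perTenantRpm: 10, perActorRpm: 5 },
  translations: { perTenantRpm: 30, perActorRpm: 15 },
  contentDraft: { perTenantRpm: 10, perActorRpm: 5 },
  contrastFix: { perTenantRpm: 60, perActorRpm: 30 },
  layoutRecommend: { perTenantRpm: 5, perActorRpm: 5 },
  proofread: { perTenantRpm: 60, perActorRpm: 30 },
};

// Fixed-window limiter. The clock is injectable so the behaviour is testable.
// Counters for past windows are never evicted in this sketch.
function createLimiter(now: () => number = () => Date.now()) {
  const counts: Record<string, number> = {};
  return {
    allow(surface: Surface, tenantId: string, actorId: string): boolean {
      const limit = LIMITS[surface];
      const minute = Math.floor(now() / 60000);
      const tenantKey = `${minute}|${surface}|t|${tenantId}`;
      const actorKey = `${minute}|${surface}|a|${tenantId}|${actorId}`;
      const tenantCount = counts[tenantKey] ?? 0;
      const actorCount = counts[actorKey] ?? 0;
      // A denied request does not consume quota.
      if (tenantCount >= limit.perTenantRpm || actorCount >= limit.perActorRpm) return false;
      counts[tenantKey] = tenantCount + 1;
      counts[actorKey] = actorCount + 1;
      return true;
    },
  };
}
```

With the palette limits, one actor is cut off after 5 calls in a minute, and the tenant as a whole after 10, regardless of how many actors share the quota.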

9. Failure modes

Failure	Symptom	Service behaviour
Orchestrator unreachable	gRPC `UNAVAILABLE`	Return `503 MELMASTOON.AI.UNAVAILABLE` with `Retry-After`
Output schema invalid	Orchestrator-side validation fails	Return `502 MELMASTOON.AI.OUTPUT_INVALID`; suggestion not persisted
Output unsafe (safety filter)	Orchestrator returns safety-blocked	Return `422 MELMASTOON.AI.OUTPUT_BLOCKED`; suggestion logged for review
Budget exhausted	Orchestrator returns `429`	Pass through; surface in UI
Caller without the theme:approve_ai permission attempts to apply	Use case checks role	`403 MELMASTOON.AI.HITL_INELIGIBLE_APPROVER`
Approver is also author	Use case checks separation of duties	`403 MELMASTOON.AI.HITL_SAME_ACTOR_FORBIDDEN`
Stale suggestion (target draft changed since suggestion was created)	Apply detects version mismatch	`409 MELMASTOON.AI.SUGGESTION_STALE`; UI invites refresh
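The orchestrator-facing rows of the table can be captured in a single mapping function so that every handler translates failures the same way. This is a sketch under assumptions: the `AiFailure` discriminants, the `HttpFailure` shape, and the 30-second default for `Retry-After` are invented for the example; only the status codes and error codes come from the table.

```typescript
// Hypothetical classification of what went wrong on the orchestrator call.
type AiFailure = "orchestrator_unreachable" | "output_invalid" | "output_blocked" | "budget_exhausted";

interface HttpFailure {
  status: number;
  code: string;
  headers: Record<string, string>;
}

function toHttpFailure(kind: AiFailure, retryAfterSeconds = 30): HttpFailure {
  switch (kind) {
    case "orchestrator_unreachable":
      return {
        status: 503,
        code: "MELMASTOON.AI.UNAVAILABLE",
        headers: { "Retry-After": String(retryAfterSeconds) },
      };
    case "output_invalid":
      return { status: 502, code: "MELMASTOON.AI.OUTPUT_INVALID", headers: {} };
    case "output_blocked":
      return { status: 422, code: "MELMASTOON.AI.OUTPUT_BLOCKED", headers: {} };
    case "budget_exhausted":
      return { status: 429, code: "MELMASTOON.AI.BUDGET_EXCEEDED", headers: {} };
  }
}
```

Because the switch is exhaustive over `AiFailure`, adding a new failure kind without a mapping becomes a compile-time error rather than a silent 500.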

10. Auditability & opt-out

Every AI request and every HITL decision is logged to audit-service via theme.ai.suggestion_created.v1 / …approved.v1 / …rejected.v1 (sibling stream of the main theme events).
Tenants can opt out of AI assistance at the tenant config level (tenant.config.featureFlags.ai_assistance = false); the AI endpoints then return 403 MELMASTOON.AI.DISABLED_FOR_TENANT.
No AI-generated content is included in a published bundle automatically: all AI-drafted content that ships through the CDN has passed human review.

11. References

Platform AI architecture: docs/08-ai-architecture.md
Use cases: APPLICATION_LOGIC §2.11–2.12
Provenance shape: DOMAIN_MODEL §6.2
HITL RBAC: SECURITY_MODEL
