AI_INTEGRATION — notification-service

Sibling: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · OBSERVABILITY

Strategic anchors: 08 AI Architecture · 07 Security/Compliance/Tenancy · 02 Enterprise Architecture §10 AI Integration

notification-service is an AI consumer, never an AI runner. All model calls are routed through ai-orchestrator-service, which owns prompt templates, model selection, key handling, content safety, cost accounting, and telemetry. We never embed an LLM SDK, never hold a model API key, and never call Vertex / OpenAI / Anthropic directly. Every AI-derived artefact carries an AIProvenance record and, for guest-facing sends, crosses a Human-in-the-Loop (HITL) gate before dispatch.

1. AI capabilities consumed

Capability	Surface	Model class (orchestrator-selected)	HITL?	Latency budget
AI-drafted personalised messages (per-recipient body for transactional + reminder)	Inline during `EnqueueNotificationUseCase` via `AIClient.fetchAIDraftedContent` (sync, p99 ≤ 600 ms) or async `ai.draft_content.ready.v1`	low-cost chat (e.g., gpt-4o-mini class)	per-tenant policy: `disabled | suggest_only | auto_send`; default `suggest_only` for new tenants; `auto_send` only for `marketing` and `reminder` categories	inline 600 ms; async unbounded
Multilingual translation (rendering body in a guest's locale when no human translation exists)	Async via orchestrator at `TemplateVersion` publish time; produces a draft locale	high-quality translation model	yes — every machine-translated locale must be approved by a tenant translator/reviewer before becoming `active`	n/a (offline)
Tone control for staff-authored ad-hoc sends (warm / professional / apologetic / promotional)	Backoffice UI calls orchestrator's "rewrite with tone" tool; result returned to staff for review, then submitted via `notifications.create`	low-cost chat	implicit — staff reviews the rewrite themselves before submit	sub-second UX
Sentiment-aware response suggestions for inbound guest replies (SMS-IN, WhatsApp inbound, email reply-to)	When inbound webhook arrives at `inbox-service`, it calls orchestrator's "suggest reply" tool and posts suggestions into the staff thread; if staff selects, they call `POST /api/v1/notifications` to send	mid-tier chat with structured output	yes — staff explicitly picks a suggestion; nothing auto-sends	sub-second per suggestion
Template lint / quality score at draft time (readability, grammar, missing variables, RTL/LTR mistakes)	Backoffice UI calls orchestrator's "lint" tool; we display warnings on the template editor	low-cost chat	n/a — advisory only	sub-second
Variable extraction from staff prose (staff types "Send Ahmed his confirmation in Pashto with a 10% loyalty offer"; we extract `{guest, locale, promo}`)	Backoffice UI utility	low-cost chat with JSON-mode	yes — staff confirms parsed slots before submit	sub-second
Suppression-reason classifier for free-text vendor bounce diagnostics that don't map cleanly to enums	Async classifier in `ProcessVendorWebhookUseCase` (best-effort enrichment; never gates persistence)	small classifier	no	non-blocking

We do not use AI for: send/no-send decisions on transactional categories (deterministic only), vendor selection (rule-based via Channel.fallbackVendor and vendor health), routing across channels (preference-based), or anything that affects security/compliance categories.

2. Routing through ai-orchestrator-service

All capabilities above are exposed by ai-orchestrator-service as named tools with versioned contracts:

Tool name (orchestrator-side)	Purpose	We invoke via
`notification.draft_personalised_body.v1`	Generate per-recipient body	`AIClient.fetchAIDraftedContent(draftId)` (we hold a `draftId` from a prior async hint, OR we invoke synchronously by capability key)
`notification.translate_template_locale.v1`	Translate a template version's body into a target locale	event-driven workflow; orchestrator emits `ai.draft_content.ready.v1` → we run `RegisterAIDraftedTemplateUseCase`
`notification.rewrite_tone.v1`	Tone rewrite	called by `bff-backoffice-service`; we never see the call directly
`notification.suggest_reply.v1`	Inbound-reply suggestions	owned by `inbox-service`; result reaches us only when staff submits a `notifications.create`
`notification.lint_template.v1`	Template lint	called by `bff-backoffice-service`
`notification.classify_bounce_reason.v1`	Bounce reason enrichment	we publish a small fact-event to a Pub/Sub topic; orchestrator returns a labelled event we consume non-critically

Why this indirection? It keeps model selection, prompt safety, model cost accounting, and content-policy enforcement in one platform layer; we just consume strongly-typed outputs.

The orchestrator request always carries the requesting service identity, tenant id, capability key, and a purposeOfProcessing claim used for downstream audit and DPA compliance (08 §5).
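As a rough illustration of the envelope described above, the sketch below builds a request carrying the four mandatory claims. The field names (`callerService`, `input`) and the helper `buildOrchestratorRequest` are hypothetical; the actual contract is owned by ai-orchestrator-service.

```typescript
// Hypothetical envelope shape; field names are illustrative, not the orchestrator's real contract.
interface OrchestratorRequest<TInput> {
  callerService: string;        // requesting service identity
  tenantId: string;
  capabilityKey: string;        // e.g. 'notification.draft_personalised_body.v1'
  purposeOfProcessing: string;  // claim used for downstream audit and DPA compliance
  input: TInput;                // structured JSON only, never concatenated prose
}

function requireNonEmpty(name: string, value: string): string {
  if (value.trim() === '') throw new Error(`missing required claim: ${name}`);
  return value;
}

function buildOrchestratorRequest<TInput>(
  tenantId: string,
  capabilityKey: string,
  purposeOfProcessing: string,
  input: TInput,
): OrchestratorRequest<TInput> {
  // Refuse to build a request that could not be attributed or audited downstream.
  return {
    callerService: 'notification-service',
    tenantId: requireNonEmpty('tenantId', tenantId),
    capabilityKey: requireNonEmpty('capabilityKey', capabilityKey),
    purposeOfProcessing: requireNonEmpty('purposeOfProcessing', purposeOfProcessing),
    input,
  };
}
```

Validating the claims at construction time means an unattributable call fails inside notification-service rather than at the orchestrator boundary.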

3. AIProvenance attached to every AI-derived artefact

interface AIProvenanceRef {
  draftId: string;                  // orchestrator-issued
  capabilityKey: string;            // 'notification.draft_personalised_body.v1' etc.
  modelHandle: string;              // opaque, orchestrator-managed (e.g., 'or:llm:openai:gpt-4o-mini@2026-01')
  promptTemplateRef: string;        // versioned prompt ref maintained by orchestrator
  inputDigest: string;              // sha256 of canonical input
  outputDigest: string;             // sha256 of canonical output
  safety: {
    classification: 'green'|'yellow'|'red';
    flagged?: string[];             // categories that fired
  };
  hitl?: {
    approvalRequestId?: string;
    approverUserId?: string;
    approvedAt?: string;
    rationale?: string;
  };
  cost: {
    inputTokens: number;
    outputTokens: number;
    estimatedUsdMicro: string;      // bigint string
  };
  generatedAt: string;
}

This object is persisted on:

notifications.ai_provenance (per-message personalisation)
template_versions.ai_provenance (per-version drafting/translation)
the corresponding notification.requested.v1, template.published.v1 events

Provenance is required for any artefact that originated from an AI tool. Missing provenance fails the publish/enqueue use case with MELMASTOON.AI.PROVENANCE_MISSING.
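A minimal sketch of that guard, assuming a simplified artefact shape. Only the error code and the `source='ai_drafted'` value come from this document; `Artefact`, `ProvenanceMissingError`, `assertProvenance`, and the `'human'` source value are hypothetical names.

```typescript
// Error code taken from the text above; class and helper names are illustrative.
class ProvenanceMissingError extends Error {
  readonly code = 'MELMASTOON.AI.PROVENANCE_MISSING';
  constructor(artefactId: string) {
    super(`AI-derived artefact ${artefactId} has no AIProvenance`);
  }
}

// Simplified stand-in for a Notification or TemplateVersion.
interface Artefact {
  id: string;
  source: 'human' | 'ai_drafted';   // 'human' is an assumed counterpart value
  aiProvenance?: { draftId: string; capabilityKey: string };
}

// Called at the top of the publish/enqueue use case, before any persistence.
function assertProvenance(artefact: Artefact): void {
  if (artefact.source === 'ai_drafted' && !artefact.aiProvenance) {
    throw new ProvenanceMissingError(artefact.id);
  }
}
```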

4. HITL gate

4.1 What gates send

Artefact	Always-on gate	Conditional gate
AI-drafted template version for any category	yes — `PublishTemplateVersionUseCase` requires `approverUserId` when `source='ai_drafted'`	n/a
AI machine-translated locale of an existing template	yes — the new locale lands as `status='draft'` and must be reviewed before activation	n/a
Per-message AI personalisation	conditional on tenant policy `aiPersonalisation`	`disabled` (never), `suggest_only` (HITL via in-app review queue before send), `auto_send` (no per-message HITL — but only categories `marketing` and `reminder` allowed; provenance still attached)
AI-suggested reply to inbound guest message	yes (always) — staff selects a suggestion explicitly	n/a
AI rewrite tone of staff prose	implicit — staff is the author and reviews the rewrite	n/a
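The per-message row of the table, combined with the safety-classification rule in §5, can be sketched as a single decision function. The outcome labels and function name are hypothetical, and routing a non-eligible category under `auto_send` to the review queue is an assumption (the table only says such categories are not allowed to auto-send).

```typescript
type AiPersonalisationPolicy = 'disabled' | 'suggest_only' | 'auto_send';
type SafetyClass = 'green' | 'yellow' | 'red';
// Hypothetical outcome labels for illustration.
type GateOutcome = 'no_ai' | 'rejected' | 'review_queue' | 'dispatch';

// Only these categories may skip per-message HITL under auto_send.
const AUTO_SEND_CATEGORIES = ['marketing', 'reminder'];

function gatePersonalisedMessage(
  policy: AiPersonalisationPolicy,
  category: string,
  safety: SafetyClass,
): GateOutcome {
  if (policy === 'disabled') return 'no_ai';        // deterministic body only
  if (safety === 'red') return 'rejected';          // never sent
  if (safety === 'yellow') return 'review_queue';   // HITL irrespective of tenant policy
  if (policy === 'auto_send' && AUTO_SEND_CATEGORIES.indexOf(category) !== -1) {
    return 'dispatch';                              // provenance is still attached
  }
  // Assumption: anything else falls back to human review rather than silently sending.
  return 'review_queue';
}
```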

4.2 The `suggest_only` review queue

When tenant policy is suggest_only and an AI-personalised body is produced for a candidate notification:

EnqueueNotificationUseCase
  → orchestrator returns RenderedMessage with AIProvenance
  → instead of inserting Notification(status='queued'),
    insert Notification(status='scheduled', scheduledFor=null, suppressed=false) + draftHold=true
  → emit melmastoon.notification.scheduled.v1 reason='suggest_only_review'
  → emit a separate in-app notification to tenant admins / property managers:
       templateKey='ai.review.required.inapp', payload includes draftId and a review URL
  → Staff reviewer either approves (status→queued, dispatch worker takes over)
                    or rejects (status→suppressed reason='hitl_rejected', emit suppressed.v1)
                    or edits → orchestrator records the human edit as an outputOverride;
                              new variant gets a fresh AIProvenance.hitl block with rationale

A queued review is auto-cancelled if the underlying source event is invalidated (e.g., reservation cancelled) — preventing embarrassing post-cancellation sends.
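The approve, reject, and auto-cancel paths above amount to a small state transition on the held notification. The sketch below assumes a reduced `HeldNotification` shape; the event names and the `'source_invalidated'` suppression reason are hypothetical (only `'hitl_rejected'` is named in this section), and the edit path is omitted because it involves the orchestrator.

```typescript
type HoldEvent = 'approved' | 'rejected' | 'source_invalidated';

// Reduced view of a Notification; real rows carry many more fields.
interface HeldNotification {
  status: 'scheduled' | 'queued' | 'suppressed';
  draftHold: boolean;
  suppressedReason?: string;
}

function resolveHold(n: HeldNotification, event: HoldEvent): HeldNotification {
  // Only notifications parked by the suggest_only flow may be resolved here.
  if (n.status !== 'scheduled' || !n.draftHold) {
    throw new Error('notification is not awaiting suggest_only review');
  }
  if (event === 'approved') {
    return { status: 'queued', draftHold: false };   // dispatch worker takes over
  }
  // 'source_invalidated' is an assumed reason string for the auto-cancel case.
  const reason = event === 'rejected' ? 'hitl_rejected' : 'source_invalidated';
  return { status: 'suppressed', draftHold: false, suppressedReason: reason };
}
```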

4.3 Time-bound HITL

Reviews older than tenant.aiPolicy.hitlReviewTtlHours (default 24h, configurable up to 168h) fail safe automatically, depending on category:

transactional categories: never auto-approve; deterministic fallback template is sent instead.
marketing categories: auto-cancel (suppressed reason='hitl_expired').
reminder categories: deterministic fallback template (no AI personalisation) is sent.
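The expiry rules can be sketched as three small pure functions. Function and action names are hypothetical; the default (24h), the cap (168h), and the per-category outcomes are taken from the list above.

```typescript
type ReviewCategory = 'transactional' | 'marketing' | 'reminder';
// Hypothetical action labels for illustration.
type ExpiryAction = 'send_deterministic_fallback' | 'suppress_hitl_expired';

const HOUR_MS = 60 * 60 * 1000;

// Default 24h; values above the documented 168h ceiling are capped.
function effectiveTtlHours(configured?: number): number {
  const ttl = configured === undefined ? 24 : configured;
  return Math.min(ttl, 168);
}

// "Older than" is read as strictly greater than the TTL.
function isReviewExpired(createdAtMs: number, nowMs: number, ttlHours: number): boolean {
  return nowMs - createdAtMs > ttlHours * HOUR_MS;
}

// An expired review is never auto-approved, whatever the category.
function onReviewExpired(category: ReviewCategory): ExpiryAction {
  return category === 'marketing' ? 'suppress_hitl_expired' : 'send_deterministic_fallback';
}
```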

5. Safety, content policy, and prompt injection

The orchestrator runs platform-level safety; we additionally enforce:

Concern	Control
Prompt injection via guest names / variables	All variables passed to the orchestrator are framed as structured JSON inputs, not concatenated prose. For variables marked `untrusted=true` (`guest.notes`, `guest.specialRequests.freeText`), the prompt instructs the model to treat them as data, not instructions. The orchestrator's safety layer also strips/encodes these before the model sees them.
Guest data leakage between tenants	We never call the orchestrator with cross-tenant context. The orchestrator's tenant-id claim is asserted on every call; mismatched outputs are rejected.
PII minimisation	We send only the minimum context needed: locale, channel, category, template key, allowed variables (declared in `variablesSchema`). We never send full guest profiles or other notifications' bodies.
Brand-safety / hotel-domain guardrails	Prompts include a system block forbidding promotional tone in transactional categories, forbidding price quotes that aren't in the variables, forbidding mentions of competitors, and forbidding any content that looks like medical / legal / financial advice.
Hallucination of money or dates	The model is instructed to only echo numbers/dates that appear in the input variables. Output is post-validated: any digit sequence in the output that is not present in the canonical input variables fails validation and the deterministic template is used instead.
Content-policy classification (`safety.classification`)	`red` outputs are rejected and emit a `failed.v1` with `reason='render_error'`; `yellow` requires HITL irrespective of tenant policy.
RTL/LTR correctness	Output is checked for matching `direction` and balanced bidi marks; mismatch logs a warning and the deterministic template is used.
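The "hallucination of money or dates" post-validation is the most mechanical of these controls, so a minimal sketch may help. It assumes variables are flat string/number values and compares maximal runs of ASCII digits; the helper names are hypothetical, and non-ASCII digits (for example Eastern Arabic numerals in ur-PK output) would need extra normalisation that is not shown here.

```typescript
// Extracts maximal runs of ASCII digits, e.g. '2026-03-14' -> ['2026', '03', '14'].
function digitRuns(text: string): string[] {
  return text.match(/\d+/g) || [];
}

// Returns true only if every digit run in the model output also occurs
// in at least one canonical input variable.
function outputNumbersAreGrounded(
  output: string,
  variables: Record<string, string | number>,
): boolean {
  const allowed: { [run: string]: true } = {};
  for (const key of Object.keys(variables)) {
    for (const run of digitRuns(String(variables[key]))) {
      allowed[run] = true;
    }
  }
  return digitRuns(output).every((run) => allowed[run] === true);
}
```

Run-level matching is deliberately coarse: it rejects any invented number, but a run such as `14` taken from a date would also be accepted in an unrelated position. When the check fails, the caller renders the deterministic template instead.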

6. Multilingual translation pipeline

Backoffice → "Translate this template version into ur-PK"
   ↓
bff-backoffice-service → ai-orchestrator-service.translate_template_locale.v1
   ↓ (async)
orchestrator emits melmastoon.ai.draft_content.ready.v1
   { tenantId, draftId, purpose='notification.template.copy',
     templateKey, locale='ur-PK', channel, body, subject, aiProvenance }
   ↓
notification-service.RegisterAIDraftedTemplateUseCase
   → appends a TemplateVersion(source='ai_drafted', status='draft', locales={'ur-PK': {...}})
   → triggers an in-app notification to translation reviewers (using our own service!)
   ↓
Reviewer opens preview → edits if needed → POST /publish with approverUserId
   → PublishTemplateVersionUseCase enforces approver presence; emits template.published.v1

The reviewer UI shows side-by-side: the canonical English (en-US) on one side, the AI-translated target locale on the other, with diff highlights and a "previously translated" reference (the prior active version of the same locale, if any). The reviewer can spot-check by clicking "send to my own number" (test-send route from API_CONTRACTS §4.7).

Locale fallback chain (per DOMAIN_MODEL §2 LOCALE_FALLBACK) means a missing translation never hard-fails a send: we render the next-best locale and emit a soft warning event melmastoon.notification.locale_fallback_used.v1 (currently informational-only) so platform admins see translation gaps.
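A minimal sketch of that resolution step follows. The chain itself is defined in DOMAIN_MODEL §2 and is passed in here rather than guessed; `resolveLocale` and its return shape are hypothetical.

```typescript
interface LocaleResolution {
  locale: string;
  fallbackUsed: boolean;   // true when the first (requested) locale was not available
}

// `chain` is the ordered fallback chain, requested locale first.
// `available` lists the locales that have an active translation.
function resolveLocale(chain: string[], available: string[]): LocaleResolution | null {
  let index = 0;
  for (const candidate of chain) {
    if (available.indexOf(candidate) !== -1) {
      return { locale: candidate, fallbackUsed: index > 0 };
    }
    index++;
  }
  return null;   // nothing in the chain is available; caller decides how to handle this
}
```

When `fallbackUsed` is true, the caller would emit melmastoon.notification.locale_fallback_used.v1 alongside the send.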

7. AI in inbound reply suggestion (guest replies to SMS/WhatsApp)

The inbox-service owns inbound parsing. When a guest replies to one of our outbound notifications, inbox-service:

Loads the original Notification (via reply-token / Twilio in-reply-to / WhatsApp message id) for context.
Calls ai-orchestrator-service.notification.suggest_reply.v1 with: outbound body (rendered), guest reply text, locale, sentiment, recommended next-action set.
Returns up to 3 ranked suggestions to the staff inbox UI.

When staff chooses a suggestion, the BFF calls our POST /api/v1/notifications with the chosen body (and aiProvenance reflecting the suggestion they picked). Staff can edit before sending; the edit is recorded as hitl.rationale='staff_edit' in provenance.

Sentiment escalation: when the orchestrator tags the inbound reply as negative with confidence ≥ 0.85, the inbox UI surfaces an "Escalate to manager" action — and the suggested reply is never auto-marked safe; it always requires explicit staff confirmation, even on tenants with tone-rewrite "fast-send" enabled.

8. Cost & quotas

The orchestrator publishes melmastoon.ai.usage.recorded.v1 for every call attributed to notification-service. We do not budget at our boundary; instead:

Tenant admins see a per-month AI cost panel under "Notifications → AI usage" (rendered by the BFF; reads events_operational.ai_usage in BigQuery).
Tenant policy can cap per-month USD for AI-personalisation in marketing; once exceeded, future enqueue with aiPersonalisation=requested falls back to deterministic body and emits melmastoon.notification.ai_budget_exhausted.v1 (log-only, no consumer in v1).
Soft budget warnings at 75 % / 90 % / 100 % delivered as in-app notifications to OWNER and BILLING_ADMIN.
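The cap and the soft-warning thresholds can be sketched as below. Amounts are in USD micro-units held as plain numbers for brevity (the provenance record uses a bigint string, so a production version would likely do the same); function names are hypothetical.

```typescript
const WARNING_THRESHOLDS_PCT = [75, 90, 100];

// Returns the thresholds crossed by moving from the previous to the current
// month-to-date spend, so each warning is delivered once.
function newlyCrossedThresholds(prevMicro: number, currMicro: number, capMicro: number): number[] {
  if (capMicro <= 0) throw new Error('cap must be positive');
  return WARNING_THRESHOLDS_PCT.filter(
    (pct) => prevMicro * 100 < pct * capMicro && currMicro * 100 >= pct * capMicro,
  );
}

// "Once exceeded" is read as strictly above the cap; from then on, enqueue
// falls back to the deterministic body.
function budgetExhausted(currMicro: number, capMicro: number): boolean {
  return currMicro > capMicro;
}
```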

9. Data privacy and DPA compliance

The orchestrator's data-processing boundary is regional and is colocated with our regional Postgres (gcp-asia-south1 and gcp-me-central1). Cross-region model calls are blocked by VPC-SC (07 §6).
Per-tenant data sovereignty toggle: tenant.aiPolicy.allowExternalModels=false forces the orchestrator to use only models hosted in-region with no logging at the model provider; on tenants with the toggle off, AI capabilities for those tenants are disabled (UI greyed out).
Per-recipient consent: notification_preferences.aiConsent (additive opt-in) governs whether AI personalisation MAY be applied to that recipient's outbound messages. Missing or false → deterministic only. Default for new recipients is false for marketing; true (with platform default) for transactional/operational/reminder since the AI capability there is wording-only and content invariants are deterministic-checked.
Right to erasure: iam.user.deleted.v1 propagates to the orchestrator's drafts ledger; provenance rows on our side become unjoinable to the source person after crypto-shred (we keep outputDigest for audit but cannot reconstruct the input).
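The per-recipient consent rule can be sketched as follows, assuming one preference row per category with an optional `aiConsent` flag. The row shape and function names are hypothetical; the defaults and the "missing or false → deterministic only" rule are taken from the consent item above.

```typescript
type ConsentCategory = 'marketing' | 'transactional' | 'operational' | 'reminder';

// Assumed shape of a notification_preferences row, reduced to what matters here.
interface PreferenceRow {
  category: ConsentCategory;
  aiConsent?: boolean;
}

// Value written when a new recipient's preferences are first created.
function defaultAiConsent(category: ConsentCategory): boolean {
  return category !== 'marketing';
}

// Additive opt-in: only an explicit `true` permits AI personalisation.
// A missing row or a missing/false flag means deterministic only.
function mayPersonalise(row: PreferenceRow | undefined): boolean {
  return row !== undefined && row.aiConsent === true;
}
```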

10. Bias and quality guardrails

Locale parity reviews: every quarter, a sampling job compares delivery acceptance rates and reply sentiment across locales; statistically significant disparities trigger a tracked review action (SERVICE_RISK_REGISTER row R-NTF-12).
Tone consistency: the lint tool grades a tenant's templates against a chosen brand tone vector; periodic drift produces an alert.
Fairness in marketing: AI-personalised marketing copy must not vary the offer (price, discount %, eligibility) by recipient — the model is constrained to vary only wording. Validation happens in the orchestrator (the offer is a structured field that is not exposed to the language tool).
Human escape hatch: every AI-mediated UI surface ("personalised draft", "reply suggestion", "tone rewrite") has a "Use deterministic body" / "Write yourself" affordance with one click.

11. Telemetry

Per-call AI telemetry (emitted by orchestrator, joined to our notifications via draftId):

Metric	Notes
`notif.ai.calls_total{capability,model,outcome,tenant}`	counter
`notif.ai.latency_seconds{capability}`	histogram
`notif.ai.tokens_input/output_total{capability,model}`	counter
`notif.ai.cost_usd_micro_total{tenant,capability}`	counter
`notif.ai.safety_rejections_total{reason}`	counter
`notif.ai.hitl_decisions_total{decision,category}`	counter — `decision ∈ {approved,rejected,edited,expired}`
`notif.ai.fallback_to_deterministic_total{reason}`	counter

Dashboards in OBSERVABILITY §6. Alerts:

AI safety-rejection rate > 1% over 30 min → page on-call.
HITL queue depth > 200 for any tenant for > 1h → page tenant ops.
AI fallback rate > 25% on any capability for > 1h → ticket (signals model regression or prompt drift).

12. Disable / kill-switch

Two kill-switches:

Per-tenant (tenant.aiPolicy.enabled=false) — all AI capabilities disabled for that tenant within seconds (Memorystore policy cache TTL 30s).
Platform-global feature flag notifications.ai.enabled=false (LaunchDarkly / Cloud Run env propagated via tenant-service config) — disables all AI capabilities platform-wide. Used for incident response.

When disabled, the deterministic template path runs unmodified; there is no degradation to non-AI sends.
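To illustrate how the two switches combine and why a per-tenant change takes effect within the cache TTL, here is a minimal sketch. An in-memory object stands in for Memorystore, the loader is synchronous for brevity, and the class and function names are hypothetical.

```typescript
// In-memory stand-in for the Memorystore policy cache (TTL 30s by default).
class TenantPolicyCache {
  private entries: { [tenantId: string]: { enabled: boolean; fetchedAtMs: number } } = {};

  constructor(
    private load: (tenantId: string) => boolean,   // reads tenant.aiPolicy.enabled
    private ttlMs: number = 30000,
  ) {}

  isEnabled(tenantId: string, nowMs: number): boolean {
    const hit = Object.prototype.hasOwnProperty.call(this.entries, tenantId)
      ? this.entries[tenantId]
      : undefined;
    if (hit && nowMs - hit.fetchedAtMs < this.ttlMs) return hit.enabled;
    const enabled = this.load(tenantId);           // entry missing or stale: refresh
    this.entries[tenantId] = { enabled, fetchedAtMs: nowMs };
    return enabled;
  }
}

// The platform-global flag wins; the tenant policy is consulted only if it is on.
function aiEnabledFor(
  tenantId: string,
  platformFlag: boolean,
  cache: TenantPolicyCache,
  nowMs: number,
): boolean {
  return platformFlag && cache.isEnabled(tenantId, nowMs);
}
```

With this shape, a tenant-level disable is observed at most one TTL after the policy changes, while the platform flag short-circuits before any cache lookup.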
