AI_INTEGRATION — notification-service

Sibling: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · OBSERVABILITY

Strategic anchors: 08 AI Architecture · 07 Security/Compliance/Tenancy · 02 Enterprise Architecture §10 AI Integration

notification-service is an AI consumer, never an AI runner. All model calls are routed through ai-orchestrator-service, which owns prompt templates, model selection, key handling, content safety, cost accounting, and telemetry. We never embed an LLM SDK, we never hold a model API key, and we never call Vertex / OpenAI / Anthropic directly. Every AI-derived artefact carries an AIProvenance record and (for guest-facing sends) crosses a Human-in-the-Loop (HITL) gate before dispatch.


1. AI capabilities consumed

Capability | Surface | Model class (orchestrator-selected) | HITL? | Latency budget
AI-drafted personalised messages (per-recipient body for transactional + reminder) | Inline during EnqueueNotificationUseCase via AIClient.fetchAIDraftedContent (sync, p99 ≤ 600 ms) or async ai.draft_content.ready.v1 | low-cost chat (e.g., gpt-4o-mini class) | per-tenant policy: disabled, suggest_only, or auto_send; default suggest_only for new tenants; auto_send only for marketing and reminder categories | inline 600 ms; async unbounded
Multilingual translation (rendering body in a guest's locale when no human translation exists) | Async via orchestrator at TemplateVersion publish time; produces a draft locale | high-quality translation model | yes: every machine-translated locale must be approved by a tenant translator/reviewer before becoming active | n/a (offline)
Tone control for staff-authored ad-hoc sends (warm / professional / apologetic / promotional) | Backoffice UI calls orchestrator's "rewrite with tone" tool; result returned to staff for review, then submitted via notifications.create | low-cost chat | implicit: staff reviews the rewrite themselves before submit | sub-second UX
Sentiment-aware response suggestions for inbound guest replies (SMS-IN, WhatsApp inbound, email reply-to) | When inbound webhook arrives at inbox-service, it calls orchestrator's "suggest reply" tool and posts suggestions into the staff thread; if staff selects, they call POST /api/v1/notifications to send | mid-tier chat with structured output | yes: staff explicitly picks a suggestion; nothing auto-sends | sub-second per suggestion
Template lint / quality score at draft time (readability, grammar, missing variables, RTL/LTR mistakes) | Backoffice UI calls orchestrator's "lint" tool; we display warnings on the template editor | low-cost chat | n/a: advisory only | sub-second
Variable extraction from staff prose (staff types "Send Ahmed his confirmation in Pashto with a 10% loyalty offer"; we extract {guest, locale, promo}) | Backoffice UI utility | low-cost chat with JSON-mode | yes: staff confirms parsed slots before submit | sub-second
Suppression-reason classifier for free-text vendor bounce diagnostics that don't map cleanly to enums | Async classifier in ProcessVendorWebhookUseCase (best-effort enrichment; never gates persistence) | small classifier | no | non-blocking

We do not use AI for: send/no-send decisions on transactional categories (deterministic only), choosing a vendor (rule-based via Channel.fallbackVendor and health), routing across channels (preference-based), or anything that affects security/compliance categories.


2. Routing through ai-orchestrator-service

All capabilities above are exposed by ai-orchestrator-service as named tools with versioned contracts:

Tool name (orchestrator-side) | Purpose | We invoke via
notification.draft_personalised_body.v1 | Generate per-recipient body | AIClient.fetchAIDraftedContent(draftId) (we hold a draftId from a prior async hint, OR we invoke synchronously by capability key)
notification.translate_template_locale.v1 | Translate a template version's body into a target locale | event-driven workflow; orchestrator emits ai.draft_content.ready.v1 → we run RegisterAIDraftedTemplateUseCase
notification.rewrite_tone.v1 | Tone rewrite | called by bff-backoffice-service; we never see the call directly
notification.suggest_reply.v1 | Inbound-reply suggestions | owned by inbox-service; result reaches us only when staff submits a notifications.create
notification.lint_template.v1 | Template lint | called by bff-backoffice-service
notification.classify_bounce_reason.v1 | Bounce reason enrichment | we publish a small fact-event to a Pub/Sub topic; orchestrator returns a labelled event we consume non-critically

Why this indirection? It keeps model selection, prompt safety, model cost accounting, and content-policy enforcement in one platform layer; we just consume strongly-typed outputs.

The orchestrator request always carries the requesting service identity, tenant id, capability key, and a purposeOfProcessing claim used for downstream audit and DPA compliance (08 §5).
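
As a rough illustration of the four claims listed above, the request envelope could be assembled along these lines. This is only a sketch: the `OrchestratorRequest` shape, the `buildOrchestratorRequest` helper, and every field name other than `purposeOfProcessing` are assumptions for illustration, not the orchestrator's actual contract.

```typescript
// Sketch only: the envelope shape and field names are assumptions,
// not the orchestrator's published contract.
interface OrchestratorRequest<TInput> {
  serviceIdentity: string;     // requesting service
  tenantId: string;
  capabilityKey: string;       // e.g. 'notification.draft_personalised_body.v1'
  purposeOfProcessing: string; // claim used for audit and DPA compliance
  input: TInput;
}

function buildOrchestratorRequest<TInput>(
  tenantId: string,
  capabilityKey: string,
  purposeOfProcessing: string,
  input: TInput,
): OrchestratorRequest<TInput> {
  const claims: Array<[string, string]> = [
    ["tenantId", tenantId],
    ["capabilityKey", capabilityKey],
    ["purposeOfProcessing", purposeOfProcessing],
  ];
  // Refuse to build a request that would lose its audit context.
  for (const [name, value] of claims) {
    if (value.trim() === "") throw new Error(`missing required claim: ${name}`);
  }
  return {
    serviceIdentity: "notification-service",
    tenantId,
    capabilityKey,
    purposeOfProcessing,
    input,
  };
}
```

Building the envelope in one place means a call site cannot forget a claim; a blank claim fails before anything leaves the service.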


3. AIProvenance attached to every AI-derived artefact

interface AIProvenanceRef {
  draftId: string;            // orchestrator-issued
  capabilityKey: string;      // 'notification.draft_personalised_body.v1' etc.
  modelHandle: string;        // opaque, orchestrator-managed (e.g., 'or:llm:openai:gpt-4o-mini@2026-01')
  promptTemplateRef: string;  // versioned prompt ref maintained by orchestrator
  inputDigest: string;        // sha256 of canonical input
  outputDigest: string;       // sha256 of canonical output
  safety: {
    classification: 'green' | 'yellow' | 'red';
    flagged?: string[];       // categories that fired
  };
  hitl?: {
    approvalRequestId?: string;
    approverUserId?: string;
    approvedAt?: string;
    rationale?: string;
  };
  cost: {
    inputTokens: number;
    outputTokens: number;
    estimatedUsdMicro: string; // bigint string
  };
  generatedAt: string;
}

This object is persisted on:

  • notifications.ai_provenance (per-message personalisation)
  • template_versions.ai_provenance (per-version drafting/translation)
  • the corresponding notification.requested.v1, template.published.v1 events

Provenance is required for any artefact that originated from an AI tool. Missing provenance fails the publish/enqueue use case with MELMASTOON.AI.PROVENANCE_MISSING.
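
A minimal sketch of the guard described above, assuming a simplified artefact shape. `Artefact`, `UseCaseError`, `assertProvenance`, and the `'human'` source value are hypothetical names for illustration; only the error code and `source='ai_drafted'` come from this document.

```typescript
// Sketch only: Artefact and UseCaseError are hypothetical shapes;
// 'human' is an assumed value for non-AI sources.
const PROVENANCE_MISSING = "MELMASTOON.AI.PROVENANCE_MISSING";

interface Artefact {
  id: string;
  source: "human" | "ai_drafted";
  aiProvenance?: { draftId: string; capabilityKey: string };
}

class UseCaseError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Throws when an AI-derived artefact arrives without its provenance record.
function assertProvenance(artefact: Artefact): void {
  if (artefact.source === "ai_drafted" && artefact.aiProvenance === undefined) {
    throw new UseCaseError(
      PROVENANCE_MISSING,
      `artefact ${artefact.id} is AI-derived but has no provenance`,
    );
  }
}
```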


4. HITL gate

4.1 What gates send

Artefact | Always-on gate | Conditional gate
AI-drafted template version for any category | yes: PublishTemplateVersionUseCase requires approverUserId when source='ai_drafted' | n/a
AI machine-translated locale of an existing template | yes: the new locale lands as status='draft' and must be reviewed before activation | n/a
Per-message AI personalisation | conditional on tenant policy aiPersonalisation | disabled (never), suggest_only (HITL via in-app review queue before send), auto_send (no per-message HITL, but only categories marketing and reminder allowed; provenance still attached)
AI-suggested reply to inbound guest message | yes (always): staff selects a suggestion explicitly | n/a
AI rewrite tone of staff prose | implicit: staff is the author and reviews the rewrite | n/a
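
The per-message row above can be read as a small decision function. The sketch below is illustrative only: `perMessageGate` and the `Gate` labels are hypothetical names, and sending an `auto_send` request for a non-allowed category to the review queue is an assumption, since this document states only that such categories are not allowed under `auto_send`.

```typescript
// Sketch only: function name and Gate labels are hypothetical.
type AiPersonalisationPolicy = "disabled" | "suggest_only" | "auto_send";
type Gate = "no_ai" | "review_queue" | "send_without_review";

const AUTO_SEND_CATEGORIES = ["marketing", "reminder"];

function perMessageGate(policy: AiPersonalisationPolicy, category: string): Gate {
  if (policy === "disabled") return "no_ai";
  if (policy === "auto_send" && AUTO_SEND_CATEGORIES.indexOf(category) !== -1) {
    return "send_without_review";
  }
  // suggest_only, or (ASSUMPTION) auto_send requested for a category outside
  // the allow-list: fall back to human review rather than sending.
  return "review_queue";
}
```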

4.2 The suggest_only review queue

When tenant policy is suggest_only and an AI-personalised body is produced for a candidate notification:

EnqueueNotificationUseCase
  → orchestrator returns RenderedMessage with AIProvenance
  → instead of inserting Notification(status='queued'),
      insert Notification(status='scheduled', scheduledFor=null, suppressed=false) + draftHold=true
  → emit melmastoon.notification.scheduled.v1 reason='suggest_only_review'
  → emit a separate in-app notification to tenant admins / property managers:
      templateKey='ai.review.required.inapp', payload includes draftId and a review URL
  → Staff reviewer either approves (status→queued, dispatch worker takes over)
      or rejects (status→suppressed reason='hitl_rejected', emit suppressed.v1)
      or edits → orchestrator records the human edit as an outputOverride;
        new variant gets a fresh AIProvenance.hitl block with rationale

A queued review is auto-cancelled if the underlying source event is invalidated (e.g., reservation cancelled) — preventing embarrassing post-cancellation sends.

4.3 Time-bound HITL

Reviews older than tenant.aiPolicy.hitlReviewTtlHours (default 24h, configurable up to 168h) fail safe automatically, with the outcome depending on the category:

  • transactional categories: never auto-approve; deterministic fallback template is sent instead.
  • marketing categories: auto-cancel (suppressed reason='hitl_expired').
  • reminder categories: deterministic fallback template (no AI personalisation) is sent.
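
The expiry rules above could be sketched as follows. The helper names are hypothetical, and clamping an out-of-range or missing TTL to the documented default and maximum is an assumption about how configuration is validated.

```typescript
// Sketch only: helper names are hypothetical.
type Category = "transactional" | "marketing" | "reminder";
type ExpiryOutcome =
  | { action: "send_deterministic_fallback" }
  | { action: "suppress"; reason: "hitl_expired" };

const DEFAULT_TTL_HOURS = 24;
const MAX_TTL_HOURS = 168;

// ASSUMPTION: missing or non-positive values use the default; larger values are capped.
function clampTtlHours(configured?: number): number {
  if (configured === undefined || !(configured > 0)) return DEFAULT_TTL_HOURS;
  return Math.min(configured, MAX_TTL_HOURS);
}

// "Older than" is read as strictly greater than the TTL.
function isExpired(createdAt: Date, now: Date, ttlHours: number): boolean {
  return now.getTime() - createdAt.getTime() > ttlHours * 3_600_000;
}

// Expired reviews are never auto-approved: marketing is suppressed,
// everything else goes out with the deterministic template.
function onExpiry(category: Category): ExpiryOutcome {
  return category === "marketing"
    ? { action: "suppress", reason: "hitl_expired" }
    : { action: "send_deterministic_fallback" };
}
```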

5. Safety, content policy, and prompt injection

The orchestrator runs platform-level safety; we additionally enforce:

Concern | Control
Prompt injection via guest names / variables | All variables passed to the orchestrator are framed as structured JSON inputs, not concatenated prose. For variables marked untrusted=true (guest.notes, guest.specialRequests.freeText), the prompt instructs the model to treat them as data, not instructions. The orchestrator's safety layer also strips/encodes these before the model sees them.
Guest data leakage between tenants | We never call the orchestrator with cross-tenant context. The orchestrator's tenant-id claim is asserted on every call; mismatched outputs are rejected.
PII minimisation | We send only the minimum context needed: locale, channel, category, template key, allowed variables (declared in variablesSchema). We never send full guest profiles or other notifications' bodies.
Brand-safety / hotel-domain guardrails | Prompts include a system block forbidding promotional tone in transactional categories, forbidding price quotes that aren't in the variables, forbidding mentions of competitors, and forbidding any content that looks like medical / legal / financial advice.
Hallucination of money or dates | The model is instructed to only echo numbers/dates that appear in the input variables. Output is post-validated: any digit sequence in the output that is not present in the canonical input variables fails validation and the deterministic template is used instead.
Content-policy classification (safety.classification) | red outputs are rejected and emit a failed.v1 with reason='render_error'; yellow requires HITL irrespective of tenant policy.
RTL/LTR correctness | Output is checked for matching direction and balanced bidi marks; mismatch logs a warning and the deterministic template is used.
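
The digit-sequence post-validation in the "Hallucination of money or dates" row might look roughly like this. It is a sketch under stated assumptions: `digitRuns` and `hasOnlyKnownNumbers` are hypothetical names, and the variables are assumed to arrive as a flat map of strings and numbers.

```typescript
// Sketch only: function names and the flat variable map are assumptions.
// Note: \d in JavaScript matches ASCII 0-9 only, so Eastern Arabic or other
// locale-specific numerals would need separate handling.
function digitRuns(text: string): string[] {
  return text.match(/\d+/g) ?? [];
}

// True when every digit run in the output also occurs in some input variable.
function hasOnlyKnownNumbers(
  output: string,
  variables: Record<string, string | number>,
): boolean {
  const allowed = new Set<string>();
  for (const key of Object.keys(variables)) {
    for (const run of digitRuns(String(variables[key]))) allowed.add(run);
  }
  return digitRuns(output).every((run) => allowed.has(run));
}

const vars = { checkIn: "2026-03-12", total: 450 };
const ok = hasOnlyKnownNumbers("Your stay from 12/03/2026 costs 450.", vars);
const bad = hasOnlyKnownNumbers("Your stay costs 455.", vars);
```

Here `ok` is true because every run (12, 03, 2026, 450) appears in the inputs, while `bad` is false because 455 does not; in the second case the caller would fall back to the deterministic template.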

6. Multilingual translation pipeline

Backoffice → "Translate this template version into ur-PK"

bff-backoffice-service → ai-orchestrator-service.notification.translate_template_locale.v1
    ↓ (async)
orchestrator emits melmastoon.ai.draft_content.ready.v1
    { tenantId, draftId, purpose='notification.template.copy',
      templateKey, locale='ur-PK', channel, body, subject, aiProvenance }

notification-service.RegisterAIDraftedTemplateUseCase
    → appends a TemplateVersion(source='ai_drafted', status='draft', locales={'ur-PK': {...}})
    → triggers an in-app notification to translation reviewers (using our own service!)

Reviewer opens preview → edits if needed → POST /publish with approverUserId
    → PublishTemplateVersionUseCase enforces approver presence; emits template.published.v1

The reviewer UI shows side-by-side: the canonical English (en-US) on one side, the AI-translated target locale on the other, with diff highlights and a "previously translated" reference (the prior active version of the same locale, if any). The reviewer can spot-check by clicking "send to my own number" (test-send route from API_CONTRACTS §4.7).

Locale fallback chain (per DOMAIN_MODEL §2 LOCALE_FALLBACK) means a missing translation never hard-fails a send: we render the next-best locale and emit a soft warning event melmastoon.notification.locale_fallback_used.v1 (currently informational-only) so platform admins see translation gaps.
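
A minimal sketch of the fallback behaviour, assuming the chain is an ordered list of locale codes. The real chain is defined in DOMAIN_MODEL §2 and is not reproduced here; `resolveLocale`, its parameters, and the example chain are hypothetical.

```typescript
// Sketch only: the chain contents and function name are assumptions.
interface LocaleResolution {
  locale: string;
  fallbackUsed: boolean; // true → caller emits the locale_fallback_used event
}

function resolveLocale(
  requested: string,
  fallbackChain: string[],
  activeLocales: ReadonlySet<string>,
): LocaleResolution | null {
  if (activeLocales.has(requested)) return { locale: requested, fallbackUsed: false };
  for (const candidate of fallbackChain) {
    if (activeLocales.has(candidate)) return { locale: candidate, fallbackUsed: true };
  }
  // ASSUMPTION: unreachable in practice if the chain ends in a locale
  // that every template is guaranteed to have.
  return null;
}
```

For example, with only `en-US` active and a hypothetical chain `['ur', 'en-US']`, a request for `ur-PK` resolves to `en-US` with `fallbackUsed` set, which is the point at which the soft warning event would be emitted.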


7. AI in inbound reply suggestion (guest replies to SMS/WhatsApp)

The inbox-service owns inbound parsing. When a guest replies to one of our outbound notifications, inbox-service:

  1. Loads the original Notification (via reply-token / Twilio in-reply-to / WhatsApp message id) for context.
  2. Calls ai-orchestrator-service.notification.suggest_reply.v1 with: outbound body (rendered), guest reply text, locale, sentiment, recommended next-action set.
  3. Returns up to 3 ranked suggestions to the staff inbox UI.

When staff chooses a suggestion, the BFF calls our POST /api/v1/notifications with the chosen body (and aiProvenance reflecting the suggestion they picked). Staff can edit before sending; the edit is recorded as hitl.rationale='staff_edit' in provenance.

Sentiment escalation: when the orchestrator tags the inbound reply as negative with confidence ≥ 0.85, the inbox UI surfaces an "Escalate to manager" action — and the suggested reply is never auto-marked safe; it always requires explicit staff confirmation, even on tenants with tone-rewrite "fast-send" enabled.
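
The escalation threshold can be expressed as a one-line predicate. The `InboundSentiment` shape and the label values other than `negative` are assumptions; only the 0.85 threshold comes from the text above.

```typescript
// Sketch only: the sentiment shape and non-negative labels are assumptions.
interface InboundSentiment {
  label: "negative" | "neutral" | "positive";
  confidence: number; // 0..1
}

const ESCALATION_CONFIDENCE = 0.85;

// True when the inbox UI should surface "Escalate to manager".
function shouldOfferEscalation(sentiment: InboundSentiment): boolean {
  return sentiment.label === "negative" && sentiment.confidence >= ESCALATION_CONFIDENCE;
}
```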


8. Cost & quotas

The orchestrator publishes melmastoon.ai.usage.recorded.v1 for every call attributed to notification-service. We do not budget at our boundary; instead:

  • Tenant admins see a per-month AI cost panel under "Notifications → AI usage" (rendered by the BFF; reads events_operational.ai_usage in BigQuery).
  • Tenant policy can cap per-month USD for AI-personalisation in marketing; once exceeded, future enqueue with aiPersonalisation=requested falls back to deterministic body and emits melmastoon.notification.ai_budget_exhausted.v1 (log-only, no consumer in v1).
  • Soft budget warnings at 75 % / 90 % / 100 % delivered as in-app notifications to OWNER and BILLING_ADMIN.
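
One way to decide which soft warnings to raise is to compare spend before and after each usage event against the cap. The sketch below is an illustration, not the BFF's implementation: `crossedThresholds` is a hypothetical helper, and it uses plain numbers for USD micro amounts (the provenance record stores them as bigint strings) on the assumption that monthly totals stay within the safe integer range.

```typescript
// Sketch only: helper name and the use of number instead of bigint are assumptions.
const WARNING_THRESHOLDS_PCT = [75, 90, 100];

// Returns the thresholds newly crossed when spend moves from previous to current.
function crossedThresholds(
  previousUsdMicro: number,
  currentUsdMicro: number,
  capUsdMicro: number,
): number[] {
  if (capUsdMicro <= 0) return [];
  return WARNING_THRESHOLDS_PCT.filter((pct) => {
    const limit = capUsdMicro * pct; // compare in "percent x micro" units to avoid division
    return previousUsdMicro * 100 < limit && currentUsdMicro * 100 >= limit;
  });
}
```

Comparing the previous and current totals means each threshold fires once, when it is first crossed, rather than on every subsequent usage event.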

9. Data privacy and DPA compliance

  • The orchestrator's data-processing boundary is regional and is colocated with our regional Postgres (gcp-asia-south1 and gcp-me-central1). Cross-region model calls are blocked by VPC-SC (07 §6).
  • Per-tenant data sovereignty toggle: tenant.aiPolicy.allowExternalModels=false forces the orchestrator to use only models hosted in-region with no logging at the model provider; on tenants with the toggle off, AI capabilities for those tenants are disabled (UI greyed out).
  • Per-recipient consent: notification_preferences.aiConsent (additive opt-in) governs whether AI personalisation MAY be applied to that recipient's outbound messages. Missing or false → deterministic only. Default for new recipients is false for marketing; true (with platform default) for transactional/operational/reminder since the AI capability there is wording-only and content invariants are deterministic-checked.
  • Right to erasure: iam.user.deleted.v1 propagates to the orchestrator's drafts ledger; provenance rows on our side become unjoinable to the source person after crypto-shred (we keep outputDigest for audit but cannot reconstruct the input).

10. Bias and quality guardrails

  • Locale parity reviews: every quarter, a sampling job compares delivery acceptance rates and reply sentiment across locales; statistically significant disparities trigger a tracked review action (SERVICE_RISK_REGISTER row R-NTF-12).
  • Tone consistency: the lint tool grades a tenant's templates against a chosen brand tone vector; periodic drift produces an alert.
  • Fairness in marketing: AI-personalised marketing copy must not vary the offer (price, discount %, eligibility) by recipient — the model is constrained to vary only wording. Validation happens in the orchestrator (the offer is a structured field that is not exposed to the language tool).
  • Human escape hatch: every AI-mediated UI surface ("personalised draft", "reply suggestion", "tone rewrite") has a "Use deterministic body" / "Write yourself" affordance with one click.

11. Telemetry

Per-call AI telemetry (emitted by orchestrator, joined to our notifications via draftId):

Metric | Notes
notif.ai.calls_total{capability,model,outcome,tenant} | counter
notif.ai.latency_seconds{capability} | histogram
notif.ai.tokens_input/output_total{capability,model} | counter
notif.ai.cost_usd_micro_total{tenant,capability} | counter
notif.ai.safety_rejections_total{reason} | counter
notif.ai.hitl_decisions_total{decision,category} | counter; decision ∈ {approved,rejected,edited,expired}
notif.ai.fallback_to_deterministic_total{reason} | counter

Dashboards in OBSERVABILITY §6. Alerts:

  • AI safety-rejection rate > 1% over 30 min → page on-call.
  • HITL queue depth > 200 for any tenant > 1h → page tenant ops.
  • AI fallback rate > 25% on any capability for > 1h → ticket (signals model regression or prompt drift).

12. Disable / kill-switch

Two kill-switches:

  • Per-tenant (tenant.aiPolicy.enabled=false) — all AI capabilities disabled for that tenant within seconds (Memorystore policy cache TTL 30s).
  • Platform-global feature flag notifications.ai.enabled=false (LaunchDarkly / Cloud Run env propagated via tenant-service config) — disables all AI capabilities platform-wide. Used for incident response.

When disabled, the deterministic template path runs unmodified; there is no degradation to non-AI sends.
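
A rough sketch of how the two switches and the 30-second policy cache could combine. `TtlCache` and `isAiEnabled` are hypothetical; the in-memory map stands in for Memorystore, and the loader callback stands in for whatever reads tenant.aiPolicy.enabled.

```typescript
// Sketch only: an in-memory stand-in for the Memorystore policy cache.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // stale: force a reload of the policy
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// The global flag wins; otherwise the tenant policy is read through the cache.
function isAiEnabled(
  tenantId: string,
  globalEnabled: boolean,
  cache: TtlCache<boolean>,
  loadTenantEnabled: (tenantId: string) => boolean, // hypothetical policy loader
): boolean {
  if (!globalEnabled) return false;
  let tenantEnabled = cache.get(tenantId);
  if (tenantEnabled === undefined) {
    tenantEnabled = loadTenantEnabled(tenantId);
    cache.set(tenantId, tenantEnabled);
  }
  return tenantEnabled;
}
```

With a 30 000 ms TTL, a tenant-level disable takes effect at the next cache expiry at the latest, while the global flag is checked on every call and bypasses the cache.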