AI_INTEGRATION — notification-service
Sibling: DOMAIN_MODEL · APPLICATION_LOGIC · SECURITY_MODEL · OBSERVABILITY
Strategic anchors: 08 AI Architecture · 07 Security/Compliance/Tenancy · 02 Enterprise Architecture §10 AI Integration
notification-service is an AI consumer, never an AI runner. All model calls are routed through ai-orchestrator-service, which owns prompt templates, model selection, key handling, content safety, cost accounting, and telemetry. We never embed an LLM SDK, we never hold a model API key, and we never call Vertex / OpenAI / Anthropic directly. Every AI-derived artefact carries an AIProvenance record and (for guest-facing send) crosses a Human-in-the-Loop (HITL) gate before dispatch.
1. AI capabilities consumed
| Capability | Surface | Model class (orchestrator-selected) | HITL? | Latency budget |
|---|---|---|---|---|
| AI-drafted personalised messages (per-recipient body for transactional + reminder) | Inline during EnqueueNotificationUseCase via AIClient.fetchAIDraftedContent (sync, p99 ≤ 600 ms) or async ai.draft_content.ready.v1 | low-cost chat (e.g., gpt-4o-mini class) | per-tenant policy: disabled / suggest_only / auto_send; default suggest_only for new tenants; auto_send only for marketing and reminder categories | inline 600 ms; async unbounded |
| Multilingual translation (rendering body in a guest's locale when no human translation exists) | Async via orchestrator at TemplateVersion publish time; produces a draft locale | high-quality translation model | yes — every machine-translated locale must be approved by a tenant translator/reviewer before becoming active | n/a (offline) |
| Tone control for staff-authored ad-hoc sends (warm / professional / apologetic / promotional) | Backoffice UI calls orchestrator's "rewrite with tone" tool; result returned to staff for review, then submitted via notifications.create | low-cost chat | implicit — staff reviews the rewrite themselves before submit | sub-second UX |
| Sentiment-aware response suggestions for inbound guest replies (SMS-IN, WhatsApp inbound, email reply-to) | When inbound webhook arrives at inbox-service, it calls orchestrator's "suggest reply" tool and posts suggestions into the staff thread; if staff selects, they call POST /api/v1/notifications to send | mid-tier chat with structured output | yes — staff explicitly picks a suggestion; nothing auto-sends | sub-second per suggestion |
| Template lint / quality score at draft time (readability, grammar, missing variables, RTL/LTR mistakes) | Backoffice UI calls orchestrator's "lint" tool; we display warnings on the template editor | low-cost chat | n/a — advisory only | sub-second |
| Variable extraction from staff prose (staff types "Send Ahmed his confirmation in Pashto with a 10% loyalty offer"; we extract {guest, locale, promo}) | Backoffice UI utility | low-cost chat with JSON-mode | yes — staff confirms parsed slots before submit | sub-second |
| Suppression-reason classifier for free-text vendor bounce diagnostics that don't map cleanly to enums | Async classifier in ProcessVendorWebhookUseCase (best-effort enrichment; never gates persistence) | small classifier | no | non-blocking |
We do not use AI for: send/no-send decisions on transactional categories (deterministic only), choosing a vendor (rule-based via Channel.fallbackVendor and health), routing across channels (preference-based), or anything that affects security/compliance categories.
2. Routing through ai-orchestrator-service
All capabilities above are exposed by ai-orchestrator-service as named tools with versioned contracts:
| Tool name (orchestrator-side) | Purpose | We invoke via |
|---|---|---|
| notification.draft_personalised_body.v1 | Generate per-recipient body | AIClient.fetchAIDraftedContent(draftId) (we hold a draftId from a prior async hint, OR we invoke synchronously by capability key) |
| notification.translate_template_locale.v1 | Translate a template version's body into a target locale | event-driven workflow; orchestrator emits ai.draft_content.ready.v1 → we run RegisterAIDraftedTemplateUseCase |
| notification.rewrite_tone.v1 | Tone rewrite | called by bff-backoffice-service; we never see the call directly |
| notification.suggest_reply.v1 | Inbound-reply suggestions | owned by inbox-service; result reaches us only when staff submits a notifications.create |
| notification.lint_template.v1 | Template lint | called by bff-backoffice-service |
| notification.classify_bounce_reason.v1 | Bounce reason enrichment | we publish a small fact-event to a Pub/Sub topic; orchestrator returns a labelled event we consume non-critically |
Why this indirection? It keeps model selection, prompt safety, model cost accounting, and content-policy enforcement in one platform layer; we just consume strongly-typed outputs.
The orchestrator request always carries the requesting service identity, tenant id, capability key, and a purposeOfProcessing claim used for downstream audit and DPA compliance (08 §5).
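As an illustration of the four mandatory claims, the sketch below builds a request envelope and refuses to send one with an empty claim. The type names, the field names other than purposeOfProcessing, and the buildToolRequest helper are hypothetical; the actual orchestrator contract is not shown in this document.

```typescript
// Hypothetical envelope: only the four claims are taken from the text above.
interface OrchestratorCallContext {
  serviceIdentity: string;     // requesting service, e.g. 'notification-service'
  tenantId: string;            // tenant the call is made for
  capabilityKey: string;       // e.g. 'notification.draft_personalised_body.v1'
  purposeOfProcessing: string; // claim used for downstream audit / DPA compliance
}

interface OrchestratorToolRequest<TInput> {
  context: OrchestratorCallContext;
  input: TInput;
}

// Rejects envelopes where any mandatory claim is blank, so a call can never
// leave the service without identity, tenant, capability and purpose.
function buildToolRequest<TInput>(
  context: OrchestratorCallContext,
  input: TInput
): OrchestratorToolRequest<TInput> {
  const missing = (Object.keys(context) as Array<keyof OrchestratorCallContext>)
    .filter((key) => context[key].trim() === '');
  if (missing.length > 0) {
    throw new Error('orchestrator request missing: ' + missing.join(', '));
  }
  return { context: { ...context }, input };
}
```

Validating the claims at construction time keeps the check in one place instead of repeating it in every use case that calls the orchestrator.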
3. AIProvenance attached to every AI-derived artefact
interface AIProvenanceRef {
  draftId: string;            // orchestrator-issued
  capabilityKey: string;      // 'notification.draft_personalised_body.v1' etc.
  modelHandle: string;        // opaque, orchestrator-managed (e.g., 'or:llm:openai:gpt-4o-mini@2026-01')
  promptTemplateRef: string;  // versioned prompt ref maintained by orchestrator
  inputDigest: string;        // sha256 of canonical input
  outputDigest: string;       // sha256 of canonical output
  safety: {
    classification: 'green' | 'yellow' | 'red';
    flagged?: string[];       // categories that fired
  };
  hitl?: {
    approvalRequestId?: string;
    approverUserId?: string;
    approvedAt?: string;
    rationale?: string;
  };
  cost: {
    inputTokens: number;
    outputTokens: number;
    estimatedUsdMicro: string; // bigint string
  };
  generatedAt: string;
}
This object is persisted on:
- notifications.ai_provenance (per-message personalisation)
- template_versions.ai_provenance (per-version drafting/translation)
- the corresponding notification.requested.v1, template.published.v1 events
Provenance is required for any artefact that originated from an AI tool. Missing provenance fails the publish/enqueue use case with MELMASTOON.AI.PROVENANCE_MISSING.
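A minimal sketch of that guard, assuming the artefact exposes a source discriminator and an optional aiProvenance field. The 'human' source value, the ProvenanceMissingError class and the assertProvenance helper are illustrative names, not the service's actual code; only the error code and the 'ai_drafted' value come from this document.

```typescript
// Only the fields the guard needs; the full shape is AIProvenanceRef.
interface ProvenanceSubset {
  draftId: string;
  capabilityKey: string;
  outputDigest: string;
}

// 'ai_drafted' appears in this document; 'human' is an assumed counterpart.
type ArtefactSource = 'human' | 'ai_drafted';

interface GuardedArtefact {
  id: string;
  source: ArtefactSource;
  aiProvenance?: ProvenanceSubset;
}

class ProvenanceMissingError extends Error {
  readonly code = 'MELMASTOON.AI.PROVENANCE_MISSING';
  constructor(artefactId: string) {
    super('AI-derived artefact ' + artefactId + ' has no provenance');
    this.name = 'ProvenanceMissingError';
  }
}

// Throws for AI-derived artefacts without provenance; human-authored
// artefacts pass through untouched.
function assertProvenance(artefact: GuardedArtefact): void {
  if (artefact.source === 'ai_drafted' && artefact.aiProvenance === undefined) {
    throw new ProvenanceMissingError(artefact.id);
  }
}
```

Calling the guard at the top of the publish and enqueue use cases would make the failure happen before anything is persisted or emitted.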
4. HITL gate
4.1 What gates send
| Artefact | Always-on gate | Conditional gate |
|---|---|---|
| AI-drafted template version for any category | yes — PublishTemplateVersionUseCase requires approverUserId when source='ai_drafted' | n/a |
| AI machine-translated locale of an existing template | yes — the new locale lands as status='draft' and must be reviewed before activation | n/a |
| Per-message AI personalisation | conditional on tenant policy aiPersonalisation | disabled (never), suggest_only (HITL via in-app review queue before send), auto_send (no per-message HITL — but only categories marketing and reminder allowed; provenance still attached) |
| AI-suggested reply to inbound guest message | yes (always) — staff selects a suggestion explicitly | n/a |
| AI rewrite tone of staff prose | implicit — staff is the author and reviews the rewrite | n/a |
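
The per-message personalisation row can be read as a small decision function. The sketch below combines it with the safety rule from section 5 (red is rejected, yellow always needs HITL). The type names and resolveSendGate are hypothetical, and one branch is an assumption: the document does not say what happens when a tenant is on auto_send but the category is not marketing or reminder, so the sketch routes that case to review.

```typescript
type AiPersonalisationPolicy = 'disabled' | 'suggest_only' | 'auto_send';
type MessageCategory = 'transactional' | 'operational' | 'reminder' | 'marketing';
type SafetyClassification = 'green' | 'yellow' | 'red';
type GateDecision = 'no_ai' | 'reject' | 'hitl_review' | 'auto_send';

function resolveSendGate(
  policy: AiPersonalisationPolicy,
  category: MessageCategory,
  safety: SafetyClassification
): GateDecision {
  // disabled: AI personalisation is never applied.
  if (policy === 'disabled') return 'no_ai';
  // red outputs are rejected outright (section 5).
  if (safety === 'red') return 'reject';
  // yellow requires HITL irrespective of tenant policy (section 5).
  if (safety === 'yellow') return 'hitl_review';
  if (policy === 'suggest_only') return 'hitl_review';
  // auto_send is only allowed for marketing and reminder.
  if (category === 'marketing' || category === 'reminder') return 'auto_send';
  // ASSUMPTION: other categories under auto_send fall back to review.
  return 'hitl_review';
}
```

Ordering the checks from most to least restrictive means a later, more permissive rule can never override an earlier one.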
4.2 The suggest_only review queue
When tenant policy is suggest_only and an AI-personalised body is produced for a candidate notification:
EnqueueNotificationUseCase
→ orchestrator returns RenderedMessage with AIProvenance
→ instead of inserting Notification(status='queued'),
insert Notification(status='scheduled', scheduledFor=null, suppressed=false) + draftHold=true
→ emit melmastoon.notification.scheduled.v1 reason='suggest_only_review'
→ emit a separate in-app notification to tenant admins / property managers:
templateKey='ai.review.required.inapp', payload includes draftId and a review URL
→ Staff reviewer either approves (status→queued, dispatch worker takes over)
or rejects (status→suppressed reason='hitl_rejected', emit suppressed.v1)
or edits → orchestrator records the human edit as an outputOverride;
new variant gets a fresh AIProvenance.hitl block with rationale
A queued review is auto-cancelled if the underlying source event is invalidated (e.g., reservation cancelled) — preventing embarrassing post-cancellation sends.
4.3 Time-bound HITL
Reviews older than tenant.aiPolicy.hitlReviewTtlHours (default 24h, configurable up to 168h) auto-fail-safe to:
- transactional categories: never auto-approve; deterministic fallback template is sent instead.
- marketing categories: auto-cancel (suppressed reason='hitl_expired').
- reminder categories: deterministic fallback template (no AI personalisation) is sent.
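The expiry rules above can be sketched as two small functions: one that decides whether a review has outlived its TTL and one that maps the category to the fail-safe outcome. Function and type names are hypothetical. Clamping an over-large TTL to 168h and falling back to 24h for a missing or non-positive value are assumptions about how "configurable up to 168h" is enforced.

```typescript
type ReviewCategory = 'transactional' | 'marketing' | 'reminder';

type ExpiryOutcome =
  | { action: 'send_deterministic_fallback' }
  | { action: 'suppress'; reason: 'hitl_expired' };

const DEFAULT_TTL_HOURS = 24;
const MAX_TTL_HOURS = 168;
const MS_PER_HOUR = 3600 * 1000;

// ASSUMPTION: invalid values use the default, large values are clamped.
function effectiveTtlHours(configured?: number): number {
  if (configured === undefined || !isFinite(configured) || configured <= 0) {
    return DEFAULT_TTL_HOURS;
  }
  return Math.min(configured, MAX_TTL_HOURS);
}

// A review is expired once it is strictly older than the effective TTL.
function isReviewExpired(createdAt: Date, now: Date, configuredTtlHours?: number): boolean {
  const ageMs = now.getTime() - createdAt.getTime();
  return ageMs > effectiveTtlHours(configuredTtlHours) * MS_PER_HOUR;
}

// No branch ever approves the AI body: expiry always fails safe.
function resolveExpiredReview(category: ReviewCategory): ExpiryOutcome {
  switch (category) {
    case 'marketing':
      return { action: 'suppress', reason: 'hitl_expired' };
    case 'transactional':
    case 'reminder':
      return { action: 'send_deterministic_fallback' };
  }
}
```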
5. Safety, content policy, and prompt injection
The orchestrator runs platform-level safety; we additionally enforce:
| Concern | Control |
|---|---|
| Prompt injection via guest names / variables | All variables passed to the orchestrator are framed as structured JSON inputs, not concatenated prose. Variables marked untrusted=true (guest.notes, guest.specialRequests.freeText) are instructed in the prompt to be treated as data, not instructions. The orchestrator's safety layer also strips/encodes these before the model sees them. |
| Guest data leakage between tenants | We never call the orchestrator with cross-tenant context. The orchestrator's tenant-id claim is asserted on every call; mismatched outputs are rejected. |
| PII minimisation | We send only the minimum context needed: locale, channel, category, template key, allowed variables (declared in variablesSchema). We never send full guest profiles or other notifications' bodies. |
| Brand-safety / hotel-domain guardrails | Prompts include a system block forbidding promotional tone in transactional categories, forbidding price quotes that aren't in the variables, forbidding mentions of competitors, and forbidding any content that looks like medical / legal / financial advice. |
| Hallucination of money or dates | The model is instructed to only echo numbers/dates that appear in the input variables. Output is post-validated: any digit sequence in the output that is not present in the canonical input variables fails validation and the deterministic template is used instead. |
| Content-policy classification (safety.classification) | red outputs are rejected and emit a failed.v1 with reason='render_error'; yellow requires HITL irrespective of tenant policy. |
| RTL/LTR correctness | Output is checked for matching direction and balanced bidi marks; mismatch logs a warning and the deterministic template is used. |
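
The "hallucination of money or dates" control lends itself to a short example. The sketch below is one possible reading of the rule: every run of digits in the model output must also occur as a run of digits in some input variable. The function names are hypothetical, and the sketch only handles ASCII digits; outputs using other digit scripts (for example Arabic-Indic numerals) would need normalising first, which is not shown.

```typescript
type VariableValue = string | number;

// Collect every run of ASCII digits in a piece of text.
function digitRuns(text: string): string[] {
  return text.match(/\d+/g) ?? [];
}

// Returns the digit runs in the output that no input variable backs.
function findUnbackedNumbers(
  output: string,
  variables: Record<string, VariableValue>
): string[] {
  const allowed: Record<string, boolean> = {};
  for (const key of Object.keys(variables)) {
    for (const run of digitRuns(String(variables[key]))) {
      allowed[run] = true;
    }
  }
  return digitRuns(output).filter((run) => allowed[run] !== true);
}

// Falls back to the deterministic body when any number is unbacked.
function selectBody(
  aiBody: string,
  deterministicBody: string,
  variables: Record<string, VariableValue>
): { body: string; usedFallback: boolean } {
  return findUnbackedNumbers(aiBody, variables).length === 0
    ? { body: aiBody, usedFallback: false }
    : { body: deterministicBody, usedFallback: true };
}
```

Because the comparison is on digit runs, variables should be passed in their canonical display form: a numeric 120.5 would not back an output that prints 120.50.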
6. Multilingual translation pipeline
Backoffice → "Translate this template version into ur-PK"
↓
bff-backoffice-service → ai-orchestrator-service.translate_template_locale.v1
↓ (async)
orchestrator emits melmastoon.ai.draft_content.ready.v1
{ tenantId, draftId, purpose='notification.template.copy',
templateKey, locale='ur-PK', channel, body, subject, aiProvenance }
↓
notification-service.RegisterAIDraftedTemplateUseCase
→ appends a TemplateVersion(source='ai_drafted', status='draft', locales={'ur-PK': {...}})
→ triggers an in-app notification to translation reviewers (using our own service!)
↓
Reviewer opens preview → edits if needed → POST /publish with approverUserId
→ PublishTemplateVersionUseCase enforces approver presence; emits template.published.v1
The reviewer UI shows side-by-side: the canonical English (en-US) on one side, the AI-translated target locale on the other, with diff highlights and a "previously translated" reference (the prior active version of the same locale, if any). The reviewer can spot-check by clicking "send to my own number" (test-send route from API_CONTRACTS §4.7).
Locale fallback chain (per DOMAIN_MODEL §2 LOCALE_FALLBACK) means a missing translation never hard-fails a send: we render the next-best locale and emit a soft warning event melmastoon.notification.locale_fallback_used.v1 (currently informational-only) so platform admins see translation gaps.
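A minimal sketch of that fallback behaviour, assuming the chain is an ordered list of locales to try after the requested one. The real LOCALE_FALLBACK definition lives in DOMAIN_MODEL §2 and is not reproduced here; resolveLocale and the example chain are hypothetical.

```typescript
interface LocaleResolution {
  locale: string;
  // True when a locale other than the requested one was chosen; the caller
  // would then emit melmastoon.notification.locale_fallback_used.v1.
  fallbackUsed: boolean;
}

// Tries the requested locale first, then each chain entry in order.
// Returns undefined only if nothing matches, which should not happen when
// the chain ends in a locale that every template carries (assumed here).
function resolveLocale(
  requested: string,
  available: readonly string[],
  fallbackChain: readonly string[]
): LocaleResolution | undefined {
  const candidates = [requested, ...fallbackChain];
  for (const candidate of candidates) {
    if (available.indexOf(candidate) !== -1) {
      return { locale: candidate, fallbackUsed: candidate !== requested };
    }
  }
  return undefined;
}
```

For example, a request for ur-PK against a template that only has en-US, with a chain of ['en-US'], resolves to en-US with fallbackUsed set.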
7. AI in inbound reply suggestion (guest replies to SMS/WhatsApp)
The inbox-service owns inbound parsing. When a guest replies to one of our outbound notifications, inbox-service:
- Loads the original Notification (via reply-token / Twilio in-reply-to / WhatsApp message id) for context.
- Calls ai-orchestrator-service.notification.suggest_reply.v1 with: outbound body (rendered), guest reply text, locale, sentiment, recommended next-action set.
- Returns up to 3 ranked suggestions to the staff inbox UI.
When staff chooses a suggestion, the BFF calls our POST /api/v1/notifications with the chosen body (and aiProvenance reflecting the suggestion they picked). Staff CAN edit before sending; the edit is recorded as hitl.rationale='staff_edit' in provenance.
Sentiment escalation: when the orchestrator tags the inbound reply as negative with confidence ≥ 0.85, the inbox UI surfaces an "Escalate to manager" action — and the suggested reply is never auto-marked safe; it always requires explicit staff confirmation, even on tenants with tone-rewrite "fast-send" enabled.
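The presentation rules of this section (at most three ranked suggestions, escalation offered at negative sentiment with confidence ≥ 0.85, never auto-send) can be sketched as below. This logic belongs to inbox-service and its UI, so everything here is illustrative: the score field, the sentiment labels other than negative, and presentSuggestions are assumptions.

```typescript
// 'negative' is from this document; the other labels are assumed.
type SentimentLabel = 'positive' | 'neutral' | 'negative';

interface ReplySuggestion {
  body: string;
  score: number; // hypothetical ranking score, higher is better
}

interface InboxPresentation {
  suggestions: ReplySuggestion[];
  offerEscalation: boolean;
  requiresStaffConfirmation: true; // nothing auto-sends
}

const ESCALATION_CONFIDENCE = 0.85;
const MAX_SUGGESTIONS = 3;

function presentSuggestions(
  candidates: readonly ReplySuggestion[],
  sentiment: { label: SentimentLabel; confidence: number }
): InboxPresentation {
  // Copy before sorting so the caller's array is not mutated.
  const ranked = candidates
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_SUGGESTIONS);
  return {
    suggestions: ranked,
    offerEscalation:
      sentiment.label === 'negative' && sentiment.confidence >= ESCALATION_CONFIDENCE,
    requiresStaffConfirmation: true,
  };
}
```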
8. Cost & quotas
The orchestrator publishes melmastoon.ai.usage.recorded.v1 for every call attributed to notification-service. We do not budget at our boundary; instead:
- Tenant admins see a per-month AI cost panel under "Notifications → AI usage" (rendered by the BFF; reads events_operational.ai_usage in BigQuery).
- Tenant policy can cap per-month USD for AI-personalisation in marketing; once exceeded, future enqueue with aiPersonalisation=requested falls back to deterministic body and emits melmastoon.notification.ai_budget_exhausted.v1 (log-only, no consumer in v1).
- Soft budget warnings at 75 % / 90 % / 100 % delivered as in-app notifications to OWNER and BILLING_ADMIN.
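The threshold logic behind the soft warnings and the cap can be sketched as follows. Amounts are plain numbers in USD micro-units for brevity, which assumes monthly totals stay within the safe integer range (the provenance record itself carries a bigint string). Function names are hypothetical, and the boundary handling is an assumption: a warning fires when spend reaches a threshold, while the cap counts as exhausted only once spend goes above it.

```typescript
const WARNING_THRESHOLDS_PCT = [75, 90, 100] as const;

// Returns the thresholds crossed by this spend update, so each warning is
// sent once rather than on every subsequent usage event.
function newlyCrossedThresholds(
  previousSpendMicro: number,
  currentSpendMicro: number,
  capMicro: number
): number[] {
  if (capMicro <= 0) return []; // no cap configured
  return WARNING_THRESHOLDS_PCT.filter((pct) => {
    const limit = (capMicro * pct) / 100;
    return previousSpendMicro < limit && currentSpendMicro >= limit;
  });
}

// ASSUMPTION: "once exceeded" is read as strictly greater than the cap.
function isBudgetExhausted(currentSpendMicro: number, capMicro: number): boolean {
  return capMicro > 0 && currentSpendMicro > capMicro;
}
```

With a cap of 100 USD (100000000 micro), a jump from 70 USD to 92 USD crosses both the 75 % and 90 % thresholds in one update.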
9. Data privacy and DPA compliance
- The orchestrator's data-processing boundary is regional and is colocated with our regional Postgres (gcp-asia-south1 and gcp-me-central1). Cross-region model calls are blocked by VPC-SC (07 §6).
- Per-tenant data sovereignty toggle: tenant.aiPolicy.allowExternalModels=false forces the orchestrator to use only models hosted in-region with no logging at the model provider; on tenants with the toggle off, AI capabilities for those tenants are disabled (UI greyed out).
- Per-Recipient consent: notification_preferences.aiConsent (additive opt-in) governs whether AI personalisation MAY be applied to that recipient's outbound messages. Missing or false → deterministic only. Default for new recipients is false for marketing; true (with platform default) for transactional/operational/reminder since the AI capability there is wording-only and content invariants are deterministic-checked.
- Right to erasure: iam.user.deleted.v1 propagates to the orchestrator's drafts ledger; provenance rows on our side become unjoinable to the source person after crypto-shred (we keep outputDigest for audit but cannot reconstruct the input).
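The per-recipient consent rule can be sketched as below. The shape of notification_preferences.aiConsent is not given in this document; the sketch assumes a per-category map of booleans, which is one way to reconcile a single field with category-specific defaults. ConsentCategory, defaultAiConsent and mayPersonalise are hypothetical names.

```typescript
type ConsentCategory = 'marketing' | 'transactional' | 'operational' | 'reminder';

// ASSUMPTION: consent is stored per category; absent keys mean "missing".
type AiConsent = Partial<Record<ConsentCategory, boolean>>;

// Defaults written when a recipient is first created.
function defaultAiConsent(category: ConsentCategory): boolean {
  return category !== 'marketing';
}

function initialConsent(): AiConsent {
  return {
    marketing: defaultAiConsent('marketing'),
    transactional: defaultAiConsent('transactional'),
    operational: defaultAiConsent('operational'),
    reminder: defaultAiConsent('reminder'),
  };
}

// Missing or false means deterministic only; only an explicit true allows AI.
function mayPersonalise(consent: AiConsent | undefined, category: ConsentCategory): boolean {
  return consent !== undefined && consent[category] === true;
}
```

Treating a missing value as a refusal keeps the check fail-closed: a recipient row that predates the consent field never receives AI-personalised content by accident.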
10. Bias and quality guardrails
- Locale parity reviews: every quarter, a sampling job compares delivery acceptance rates and reply sentiment across locales; statistically significant disparities trigger a tracked review action (SERVICE_RISK_REGISTER row R-NTF-12).
- Tone consistency: the lint tool grades a tenant's templates against a chosen brand tone vector; periodic drift produces an alert.
- Fairness in marketing: AI-personalised marketing copy must not vary the offer (price, discount %, eligibility) by recipient — the model is constrained to vary only wording. Validation happens in the orchestrator (the offer is a structured field that is not exposed to the language tool).
- Human escape hatch: every AI-mediated UI surface ("personalised draft", "reply suggestion", "tone rewrite") has a "Use deterministic body" / "Write yourself" affordance with one click.
11. Telemetry
Per-call AI telemetry (emitted by orchestrator, joined to our notifications via draftId):
| Metric | Notes |
|---|---|
| notif.ai.calls_total{capability,model,outcome,tenant} | counter |
| notif.ai.latency_seconds{capability} | histogram |
| notif.ai.tokens_input/output_total{capability,model} | counter |
| notif.ai.cost_usd_micro_total{tenant,capability} | counter |
| notif.ai.safety_rejections_total{reason} | counter |
| notif.ai.hitl_decisions_total{decision,category} | counter; decision ∈ {approved,rejected,edited,expired} |
| notif.ai.fallback_to_deterministic_total{reason} | counter |
Dashboards in OBSERVABILITY §6. Alerts:
- AI safety-rejection rate > 1% over 30 min → page on-call.
- HITL queue depth > 200 for any tenant > 1h → page tenant ops.
- AI fallback rate > 25% on any capability for > 1h → ticket (signals model regression or prompt drift).
12. Disable / kill-switch
Two kill-switches:
- Per-tenant (tenant.aiPolicy.enabled=false): all AI capabilities disabled for that tenant within seconds (Memorystore policy cache TTL 30s).
- Platform-global feature flag notifications.ai.enabled=false (LaunchDarkly / Cloud Run env propagated via tenant-service config): disables all AI capabilities platform-wide. Used for incident response.
When disabled, the deterministic template path runs unmodified; there is no degradation to non-AI sends.
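A sketch of how the two switches could combine, with a 30-second policy cache standing in for Memorystore. The class and function names are hypothetical, the loader is synchronous only to keep the example short, and the clock is injected so the TTL behaviour can be exercised without waiting.

```typescript
interface TenantAiPolicy {
  enabled: boolean;
}

// In-memory stand-in for the policy cache; entries older than ttlMs are reloaded.
class TenantPolicyCache {
  private readonly entries: Record<string, { policy: TenantAiPolicy; fetchedAtMs: number }> = {};

  constructor(
    private readonly load: (tenantId: string) => TenantAiPolicy,
    private readonly nowMs: () => number,
    private readonly ttlMs: number = 30000
  ) {}

  get(tenantId: string): TenantAiPolicy {
    const hit = this.entries[tenantId];
    const now = this.nowMs();
    if (hit !== undefined && now - hit.fetchedAtMs < this.ttlMs) {
      return hit.policy;
    }
    const policy = this.load(tenantId);
    this.entries[tenantId] = { policy, fetchedAtMs: now };
    return policy;
  }
}

// AI runs only when both the global flag and the tenant policy allow it.
function isAiEnabled(globalFlagEnabled: boolean, tenantPolicy: TenantAiPolicy): boolean {
  return globalFlagEnabled && tenantPolicy.enabled;
}
```

With this shape a tenant-level switch takes effect within one TTL window, while the global flag is evaluated on every call and is not delayed by the cache.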