SECURITY_MODEL — notification-service
Sibling: API_CONTRACTS · DATA_MODEL · AI_INTEGRATION · OBSERVABILITY
Strategic anchors: 07 Security/Compliance/Tenancy · 02 Enterprise Architecture §11 Security · ADR-0002 Multi-Tenancy
The notification path is one of the highest-risk surfaces in the platform: it touches guest PII, sends messages on behalf of tenants, holds sender-IDs and channel credentials, and is a frequent phishing/abuse vector. The controls below assume honest-but-curious staff and a hostile internet.
1. Trust boundaries
┌──────────────────────────────────────────────────────┐
│ Internet (vendors, end-users)                        │
│   - Vendor webhook callers (HMAC, no JWT)            │
│   - End-user opt-out token bearers                   │
└──────────────────┬───────────────────────────────────┘
                   │ TLS 1.3 + WAF
                   ▼
┌──────────────────────────────────────────────────────┐
│ GCP Cloud Load Balancer + Cloud Armor                │
│   - DDoS protection, IP throttle, geo rules          │
│   - WAF managed rules                                │
└──────────────────┬───────────────────────────────────┘
                   │ mTLS (vendor mesh) / TLS+JWT (BFFs)
                   ▼
┌──────────────────────────────────────────────────────┐
│ notification-service Cloud Run                       │
│   - REST + WS API                                    │
│   - Pub/Sub subscribers                              │
│   - Background workers                               │
└─┬────────────────┬───────────────┬──────────────────┬┘
  │                │               │                  │
  ▼                ▼               ▼                  ▼
┌──────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│ Cloud SQL│  │ Memorystore│  │ GCS        │  │ Secret Mgr │
│ Postgres │  │ Redis      │  │ (CMEK)     │  │ (vendor    │
│ (CMEK)   │  │ (in VPC)   │  │            │  │  creds)    │
└──────────┘  └────────────┘  └────────────┘  └────────────┘
     ▲                                              │
     │ RLS-enforced session                         │
     │                                              ▼
┌──────────────────┐                    ┌──────────────────┐
│ Cloud KMS CMEK   │                    │ vendor APIs      │
│ per region/env   │                    │ (egress via NAT) │
└──────────────────┘                    └──────────────────┘
All east-west traffic between services rides over the internal VPC with mTLS certificates issued by iam-service's SPIFFE/SPIRE intermediate CA. North-south egress to vendors goes through Cloud NAT with a fixed egress IP per region, so vendors can allowlist us.
2. AuthN
| Caller class | Mechanism | Issued by | Lifetime |
|---|---|---|---|
| Backoffice staff via bff-backoffice-service | OAuth2/OIDC → JWT (RS256) with aud='notification-service' | iam-service | 15 min access; 8 h refresh |
| Guest via bff-tenant-booking-service (in-app feed, opt-out) | Anonymous booking session token (JWT, scoped) | iam-service (anonymous flow) | 30 min sliding |
| Internal services (iam, billing, tenant, reservation, etc.) | mTLS + SPIFFE id + service JWT | iam-service SPIRE | 1 h |
| Vendors (webhooks) | HMAC-SHA256 over raw body + per-vendor secret stored in Secret Manager | per-vendor | rotated quarterly |
| End users (opt-out token) | bearer token in URL; lookup by sha256(token) against opt_out_tokens | this service | 30 days |
JWTs are validated against the JWKS published by iam-service (cached for 10 min and refetched on a kid miss). Tokens whose time claims (exp, nbf, iat) fall outside a 60 s clock-skew tolerance are rejected.
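The caching and skew rules above can be sketched as follows. This is a minimal illustration, assuming a hypothetical fetchJwks callback that returns iam-service's published keys; JwksCache and withinSkew are invented names, and the RS256 signature check itself is left to a JOSE library.

```typescript
// Only the JWK fields this sketch needs.
interface Jwk {
  kid: string;
  kty: string;
}

type FetchJwks = () => Promise<Jwk[]>;

const JWKS_TTL_MS = 10 * 60 * 1000; // cache the JWKS for 10 min
const CLOCK_SKEW_S = 60; // tolerate 60 s of clock skew on time claims

class JwksCache {
  private keys = new Map<string, Jwk>();
  private fetchedAt = -Infinity;
  private readonly fetchJwks: FetchJwks;
  private readonly now: () => number;

  constructor(fetchJwks: FetchJwks, now: () => number = Date.now) {
    this.fetchJwks = fetchJwks;
    this.now = now;
  }

  private async refresh(): Promise<void> {
    const keys = await this.fetchJwks();
    this.keys = new Map(keys.map((k): [string, Jwk] => [k.kid, k]));
    this.fetchedAt = this.now();
  }

  /** Returns the key for kid, refetching when the cache is stale or the kid is unknown. */
  async getKey(kid: string): Promise<Jwk | undefined> {
    let refreshed = false;
    if (this.now() - this.fetchedAt > JWKS_TTL_MS) {
      await this.refresh();
      refreshed = true;
    }
    // kid miss: iam-service may have rotated keys since the last fetch.
    if (!this.keys.has(kid) && !refreshed) await this.refresh();
    return this.keys.get(kid);
  }
}

interface TimeClaims {
  exp: number; // seconds since epoch
  nbf?: number;
  iat?: number;
}

/** True when exp/nbf/iat are acceptable within the skew tolerance. */
function withinSkew(claims: TimeClaims, nowMs: number): boolean {
  const now = Math.floor(nowMs / 1000);
  if (now > claims.exp + CLOCK_SKEW_S) return false; // expired
  if (claims.nbf !== undefined && now < claims.nbf - CLOCK_SKEW_S) return false; // not yet valid
  if (claims.iat !== undefined && claims.iat > now + CLOCK_SKEW_S) return false; // issued in the future
  return true;
}
```

A production version would also throttle kid-miss refetches, so a flood of tokens carrying random kid values cannot hammer iam-service.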
For the WebSocket feed (WS /api/v1/notifications/feed/stream), the JWT is passed in the Sec-WebSocket-Protocol header and verified before we answer with 101 Switching Protocols. The token is re-checked on every heartbeat; once it has expired, the socket is closed with code 4401.
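One way to carry the token in Sec-WebSocket-Protocol is a two-entry list such as "bearer, <jwt>". That convention, the bearer marker and the helper names below are assumptions for illustration. Clients generally expect the server to select one of the offered subprotocols, so the sketch echoes only the marker in the 101 response and never reflects the token.

```typescript
// Hypothetical convention: the client offers ["bearer", "<jwt>"] as subprotocols.
const AUTH_SUBPROTOCOL = "bearer";
const CLOSE_TOKEN_EXPIRED = 4401;

interface Handshake {
  token: string; // the JWT to verify before sending 101 Switching Protocols
  selectedProtocol: string; // the value to echo in the 101 response
}

/** Parses "bearer, <jwt>" from Sec-WebSocket-Protocol; undefined means refuse the upgrade. */
function parseAuthSubprotocol(header: string | undefined): Handshake | undefined {
  if (!header) return undefined;
  const parts = header.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
  const i = parts.indexOf(AUTH_SUBPROTOCOL);
  if (i === -1 || i + 1 >= parts.length) return undefined;
  // Echo only the marker, never the token, so the JWT is not reflected in response headers.
  return { token: parts[i + 1], selectedProtocol: AUTH_SUBPROTOCOL };
}

/** Run on each heartbeat: returns the close code to use once the token has expired. */
function heartbeatCheck(expSeconds: number, nowMs: number): number | undefined {
  return Math.floor(nowMs / 1000) >= expSeconds ? CLOSE_TOKEN_EXPIRED : undefined;
}
```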
3. AuthZ — RBAC + scope claims
Roles are issued by iam-service per tenant; we evaluate them against route scopes. The full matrix lives in API_CONTRACTS §12; operational summary:
| Scope | Granted to roles |
|---|---|
| notifications:read | OWNER, GM, FRONT_DESK, RESERVATIONS, MARKETING_MANAGER, SUPPORT |
| notifications:write | OWNER, GM, FRONT_DESK, RESERVATIONS, SUPPORT (transactional/operational only); MARKETING_MANAGER for marketing category |
| notifications:batch | OWNER, GM, MARKETING_MANAGER |
| notifications:feed.read.self | any authenticated user (guest or staff) for own feed |
| notifications:feed.read.any | OWNER, GM, FRONT_DESK, SUPPORT (with property scope) |
| notifications:preferences.write.self | the user themselves |
| notifications:preferences.write.any | OWNER, GM, SUPPORT with documented consent (audit-tagged) |
| notifications:templates.read | OWNER, GM, MARKETING_MANAGER, FRONT_DESK |
| notifications:templates.write | OWNER, GM, MARKETING_MANAGER |
| notifications:templates.publish | OWNER, GM, MARKETING_MANAGER (HITL-gated for ai_drafted source) |
| notifications:channels.read | OWNER, GM, PLATFORM_ENG |
| notifications:channels.write | OWNER, PLATFORM_ENG |
| notifications:channels.rotate | OWNER, PLATFORM_ENG |
| notifications:internal | PLATFORM_ENG only; available only via internal mesh |
| notifications:suppressions.read/write | OWNER, GM, COMPLIANCE_OFFICER |
| notifications:suppressions.release | OWNER, COMPLIANCE_OFFICER (4-eyes via approval ticket for hard bounces older than 30 days) |
ABAC overlay: X-Property-Id is cross-checked against iam.property_assignments for staff scopes (read/write per property). Deny by default.
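A deny-by-default check combining the scope matrix with the X-Property-Id overlay might look like this. SCOPE_ROLES reproduces only a few rows of the table above, Principal.assignedPropertyIds is a hypothetical in-memory stand-in for rows loaded from iam.property_assignments, and treating a missing X-Property-Id as a denial is also an assumption.

```typescript
type Role =
  | "OWNER" | "GM" | "FRONT_DESK" | "RESERVATIONS"
  | "MARKETING_MANAGER" | "SUPPORT" | "PLATFORM_ENG" | "COMPLIANCE_OFFICER";

// A few rows of the scope matrix; API_CONTRACTS §12 holds the full one.
// A Map (not a plain object) so keys like "toString" cannot hit Object.prototype.
const SCOPE_ROLES = new Map<string, readonly Role[]>([
  ["notifications:read", ["OWNER", "GM", "FRONT_DESK", "RESERVATIONS", "MARKETING_MANAGER", "SUPPORT"]],
  ["notifications:batch", ["OWNER", "GM", "MARKETING_MANAGER"]],
  ["notifications:templates.write", ["OWNER", "GM", "MARKETING_MANAGER"]],
  ["notifications:channels.write", ["OWNER", "PLATFORM_ENG"]],
]);

interface Principal {
  tenantId: string; // tenant claim from the JWT
  roles: readonly Role[];
  assignedPropertyIds: ReadonlySet<string>; // stand-in for iam.property_assignments
}

/** Deny by default: every check must pass explicitly. */
function authorize(p: Principal, scope: string, routeTenantId: string, propertyId?: string): boolean {
  if (p.tenantId !== routeTenantId) return false; // tenant claim must match the route
  const allowed = SCOPE_ROLES.get(scope);
  if (allowed === undefined) return false; // unknown scope
  if (!p.roles.some((r) => allowed.includes(r))) return false; // RBAC
  if (propertyId === undefined) return false; // assumption: staff scopes require X-Property-Id
  return p.assignedPropertyIds.has(propertyId); // ABAC overlay
}
```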
4. Database isolation (RLS + roles)
Three Postgres roles in melmastoon_notification:
| Role | Capabilities |
|---|---|
| melmastoon_notification_app | SELECT/INSERT/UPDATE/DELETE on tenant-scoped tables; no BYPASSRLS; sets app.tenant_id per transaction (SET LOCAL) |
| melmastoon_notification_dba | BYPASSRLS; granted only via break-glass (audit-tagged); MFA-required |
| melmastoon_notification_ro_analytics | SELECT on materialised views with PII redacted; used by Datastream/BigQuery sink |
At the start of every transaction the app role runs SET LOCAL app.tenant_id = '<tenantId>' (Drizzle middleware). RLS policies (see DATA_MODEL §3) reject rows with a mismatched tenant_id, so cross-tenant queries by the app role are impossible.
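Because SET LOCAL cannot take a bind parameter, a common equivalent is set_config('app.tenant_id', $1, true), whose third argument makes the setting transaction-local. The sketch below assumes a minimal Queryable interface (a node-postgres client can be adapted to it) and that tenant ids are UUIDs; withTenant is a hypothetical name, not the actual Drizzle middleware.

```typescript
// Minimal query surface; a node-postgres PoolClient can be adapted to it.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Assumption: tenant ids are UUIDs, so anything else is rejected up front.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

/**
 * Runs fn inside a transaction with app.tenant_id set transaction-locally, so RLS
 * policies such as USING (tenant_id = current_setting('app.tenant_id')::uuid) apply.
 */
async function withTenant<T>(client: Queryable, tenantId: string, fn: (c: Queryable) => Promise<T>): Promise<T> {
  if (!UUID_RE.test(tenantId)) throw new Error("invalid tenant id");
  await client.query("BEGIN");
  try {
    // set_config(name, value, is_local = true) behaves like SET LOCAL but accepts a bind parameter.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```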
CMEK on:
- Cloud SQL data at rest (per-region key in Cloud KMS).
- Cloud SQL backups.
- GCS buckets for rendered/webhooks/attachments.
- Memorystore (TLS in transit; no on-disk keys; in-memory only).
KMS keys rotate every 90 days; rotation is automated; old key versions retained for 5 years to decrypt historical backups.
5. PII handling
| PII class | Where it lives | Encryption | Access |
|---|---|---|---|
| Guest email / phone (recipient address) | recipient_addresses.address_ciphertext (envelope-encrypted DEK + KMS KEK), plus address_hash | yes — envelope; key per tenant; KEK in KMS | server-side decrypt only when sending; never in logs/events; never in BigQuery; never on Electron |
| Vendor message id | delivery_attempts.vendor_message_id | none (opaque) | service + analytics |
| Guest name (display) | recipients.display_name | none (low risk; needed for staff UI) | RLS-bound |
| Free-text notes attached to ad-hoc sends | not persisted; only present in render_snapshot | server-side only; never exfiltrated to AI without explicit opt-in | RLS-bound |
| Webhook raw bodies (may contain user agents, IPs) | gs://…/webhooks/** | CMEK on bucket | restricted to platform engineers; access audited |
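The envelope scheme in the first row can be illustrated with AES-256-GCM from node:crypto. This is a sketch under stated assumptions: a locally generated KEK stands in for the Cloud KMS key (in production the DEK would be wrapped and unwrapped by KMS, and the KEK never leaves it), the AAD strings are invented, and computing address_hash as a keyed HMAC over a lower-cased address is an assumption, since the document does not say how the hash is derived.

```typescript
import { createCipheriv, createDecipheriv, createHmac, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// AES-256-GCM with associated data that binds the ciphertext to its context.
function sealWith(key: Buffer, plaintext: Buffer, aad: string): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, fresh per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(Buffer.from(aad));
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function openWith(key: Buffer, s: Sealed, aad: string): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
  decipher.setAAD(Buffer.from(aad));
  decipher.setAuthTag(s.tag);
  return Buffer.concat([decipher.update(s.ciphertext), decipher.final()]); // throws if tampered
}

// Stand-in for Cloud KMS Encrypt/Decrypt: here the KEK is a local key, which production avoids.
const wrapDek = (kek: Buffer, dek: Buffer, tenantId: string) => sealWith(kek, dek, `dek:${tenantId}`);
const unwrapDek = (kek: Buffer, w: Sealed, tenantId: string) => openWith(kek, w, `dek:${tenantId}`);

function encryptAddress(dek: Buffer, tenantId: string, recipientId: string, address: string): Sealed {
  return sealWith(dek, Buffer.from(address, "utf8"), `addr:${tenantId}:${recipientId}`);
}

function decryptAddress(dek: Buffer, tenantId: string, recipientId: string, s: Sealed): string {
  return openWith(dek, s, `addr:${tenantId}:${recipientId}`).toString("utf8");
}

// Assumption: address_hash is a keyed HMAC over a normalised address. It supports equality
// lookups, and while hashKey stays secret it cannot be reversed by hashing candidate numbers.
function addressHash(hashKey: Buffer, address: string): string {
  return createHmac("sha256", hashKey).update(address.trim().toLowerCase()).digest("hex");
}
```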
Logging policy: addresses are NEVER logged in plaintext; only addressKindHash and recipient_id. A pre-commit log-scanner (pii-grep) blocks PRs that introduce log lines containing @ or phone-number-shaped strings outside an allow-list. CI also runs a Loki query to assert recent prod logs hold no plaintext PII (sample-based).
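A toy version of the pii-grep rule (flag log calls that contain an @ or a phone-number-shaped run of digits, unless allow-listed) is sketched below. The regexes, the log-call pattern and the scanForPii name are assumptions, not the real tool's rules.

```typescript
// Hypothetical rules; the real pii-grep patterns and allow-list are not reproduced here.
const LOG_CALL = /\b(?:logger|log|console)\.(?:debug|info|warn|error|log)\s*\(/;
// At least 9 digits, each optionally preceded by one space, dot or hyphen, with an optional "+".
const PHONE_LIKE = /\+?\d(?:[\s.-]?\d){8,}/;

interface Finding {
  line: number; // 1-based line number in the scanned file
  text: string;
}

function scanForPii(source: string, allowList: readonly RegExp[] = []): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, idx) => {
    if (!LOG_CALL.test(text)) return; // only inspect lines that look like log calls
    if (allowList.some((re) => re.test(text))) return;
    if (text.includes("@") || PHONE_LIKE.test(text)) {
      findings.push({ line: idx + 1, text: text.trim() });
    }
  });
  return findings;
}
```

Heuristics like PHONE_LIKE will also flag long numeric ids such as epoch-millisecond timestamps, which is what the allow-list is for.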
Crypto-shred on iam.user.deleted.v1: we destroy the per-tenant DEK that wraps the user's address ciphertexts; the row remains for suppression continuity but is unreadable.
6. Vendor credentials
- All vendor API keys / OAuth client secrets / WhatsApp Business tokens / FCM service account JSONs live in Secret Manager (regional, CMEK).
- A ChannelCredential row stores only a secret_ref (secretmanager://projects/.../secrets/.../versions/N); the plaintext is fetched at adapter-init time and held in process memory.
- Rotation: POST /api/v1/notification-channels/{id}/credentials/{credId}/rotate writes a new Secret Manager version, sets the new ChannelCredential to active, and leaves the prior one superseded for a 24 h overlap. After 24 h the superseded credential is revoked and its secret version is disabled.
- Secret access audit: every Secret Manager AccessSecretVersion call is exported to BigQuery (audit.secret_access) and tagged with requesterServiceAccount.
Access surface:
- Cloud Run service account: notification-service-runtime@<project>.iam.gserviceaccount.com holds roles/secretmanager.secretAccessor only on notification-* secrets.
- Tenant admins NEVER see plaintext credentials. The "create credential" UI accepts a one-shot input, returns success or error, and thereafter echoes back only the secret_ref and last-4 fingerprint.
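The active, superseded, revoked lifecycle with a 24 h overlap can be modelled as two pure functions: one run by the rotate endpoint and one by a periodic sweeper. The types, the sweeper and the field names are hypothetical, and the actual Secret Manager disable call is left to the caller.

```typescript
type CredentialStatus = "active" | "superseded" | "revoked";

interface ChannelCredential {
  id: string;
  secretRef: string; // secretmanager://projects/.../secrets/.../versions/N
  status: CredentialStatus;
  supersededAt?: number; // epoch ms
}

const OVERLAP_MS = 24 * 60 * 60 * 1000; // 24 h overlap

/** Rotate endpoint: the new credential becomes active; the previously active one is superseded. */
function rotate(creds: readonly ChannelCredential[], next: { id: string; secretRef: string }, now: number): ChannelCredential[] {
  const updated = creds.map((c): ChannelCredential =>
    c.status === "active" ? { ...c, status: "superseded", supersededAt: now } : c,
  );
  return [...updated, { ...next, status: "active" }];
}

/** Sweeper: superseded credentials past the overlap become revoked; returns secret refs to disable. */
function sweep(creds: readonly ChannelCredential[], now: number): { creds: ChannelCredential[]; toDisable: string[] } {
  const toDisable: string[] = [];
  const result = creds.map((c): ChannelCredential => {
    if (c.status === "superseded" && c.supersededAt !== undefined && now - c.supersededAt >= OVERLAP_MS) {
      toDisable.push(c.secretRef);
      return { ...c, status: "revoked" };
    }
    return c;
  });
  return { creds: result, toDisable };
}
```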
7. Webhook security
- Each vendor has a dedicated route POST /api/v1/webhooks/vendors/{vendor} with a per-vendor HMAC verifier picked from a registry.
- The verifier reads the raw body before body parsing; SignedHeader and body bytes are concatenated per the vendor's algorithm; comparison is constant-time.
- Replay window: most vendors include a timestamp in headers; we reject requests with > 5 min skew (401, no body persisted).
- Even if an attacker replays a request within the window, the application-level dedupe (vendor + vendorMessageId + type + occurredAt) prevents applying the same state change twice.
- Cloud Armor rule: 1000 req/s per vendor source; 429 above that; bodies larger than 1 MB are rejected at the LB.
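A generic version of the verifier's core checks (timestamp window, HMAC-SHA256, constant-time comparison) plus the dedupe key is sketched below. The signing string "<timestamp>.<raw body>" is a hypothetical scheme chosen for illustration; each vendor defines its own, which is why the real verifiers are per vendor.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_MS = 5 * 60 * 1000; // 5 min replay window

/**
 * Generic check for a hypothetical scheme: hex(HMAC-SHA256(secret, "<timestamp>." + rawBody)).
 * rawBody must be the exact bytes received, captured before any JSON parsing.
 */
function verifyWebhook(secret: string, rawBody: Buffer, timestampSec: string, signatureHex: string, nowMs: number): boolean {
  const ts = Number(timestampSec);
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts * 1000) > MAX_SKEW_MS) return false; // outside the window
  const expected = createHmac("sha256", secret).update(`${timestampSec}.`).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first (length is not secret).
  return given.length === expected.length && timingSafeEqual(given, expected);
}

interface VendorEvent {
  vendor: string;
  vendorMessageId: string;
  type: string;
  occurredAt: string;
}

/** Dedupe key; a unique index on it turns a replayed delivery into a no-op. */
function dedupeKey(e: VendorEvent): string {
  // JSON encoding avoids collisions that a plain delimiter join could produce.
  return JSON.stringify([e.vendor, e.vendorMessageId, e.type, e.occurredAt]);
}
```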
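For SendGrid / Mailgun / SES / Twilio / Infobip / Meta WhatsApp / FCM, the vendor-specific signature schemes and headers are codified in src/infrastructure/webhooks/<vendor>/verifier.ts. Not every vendor uses HMAC-SHA256: Twilio signs with HMAC-SHA1, SendGrid's signed event webhook uses ECDSA, and SES notifications arrive via SNS with certificate-based signatures. A contract test (tests/contracts/webhooks/<vendor>.test.ts) exercises positive and negative samples per vendor.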
8. Opt-out tokens
- Generated as 256-bit random values, URL-safe base64; only the sha256 is stored, and the plaintext exists only in the URL.
- Bound to (recipient_id, channel, tenant_id); single use; 30-day expiry.
- The opt-out endpoint POST /api/v1/notification-preferences/opt-out/:token is public, rate-limited (30 req/min/IP) and served with Cache-Control: no-store.
- Successful opt-out emits notification.opted_out.v1; the corresponding suppression row prevents future sends.
- CSRF: the opt-out is a single explicit POST gated by a one-tap confirmation page; we do not rely on session cookies.
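Issuing and redeeming a token as described can be sketched with node:crypto. The row shape, the in-memory Map standing in for opt_out_tokens, and the function names are assumptions. In Postgres the single-use check would be a conditional UPDATE, so two concurrent redemptions cannot both succeed.

```typescript
import { createHash, randomBytes } from "node:crypto";

const OPT_OUT_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

interface OptOutTokenRow {
  tokenHash: string; // hex sha256(token); the plaintext token is never stored
  recipientId: string;
  channel: string;
  tenantId: string;
  expiresAt: number; // epoch ms
  usedAt?: number;
}

function sha256Hex(s: string): string {
  return createHash("sha256").update(s).digest("hex");
}

function issueOptOutToken(recipientId: string, channel: string, tenantId: string, now: number) {
  const token = randomBytes(32).toString("base64url"); // 256 bits, URL-safe, unpadded
  const row: OptOutTokenRow = { tokenHash: sha256Hex(token), recipientId, channel, tenantId, expiresAt: now + OPT_OUT_TTL_MS };
  return { token, row };
}

/** Looks the token up by hash; enforces expiry and single use. Returns the bound row or undefined. */
function redeemOptOutToken(rows: Map<string, OptOutTokenRow>, token: string, now: number): OptOutTokenRow | undefined {
  const row = rows.get(sha256Hex(token));
  if (row === undefined || row.usedAt !== undefined || now > row.expiresAt) return undefined;
  row.usedAt = now; // SQL equivalent: UPDATE ... SET used_at = now() WHERE token_hash = $1 AND used_at IS NULL
  return row;
}
```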
9. Rate limiting and abuse control
| Surface | Limit | Action when exceeded |
|---|---|---|
| POST /api/v1/notifications | 100 req/s/tenant; 5 req/s/IP from staff origin | 429 with Retry-After; bulk patterns trigger Cloud Armor backoff |
| POST /api/v1/notifications/batch | 5 req/min/tenant; max 1 active per category | 429 |
| POST /api/v1/notifications/{id}/resend | 3 resends per notification per 24 h | 429 |
| Per-recipient per-day across all channels | tenant-policy default 5; tenant-configurable up to 20 | suppress with reason='rate_limit' |
| WhatsApp marketing per recipient | 1 per 24 h (Meta policy) | suppress |
| Vendor webhooks | 1000 req/s/vendor IP | 429 |
| Opt-out endpoint | 30 req/min/IP | 429; sustained → Cloud Armor block |
| WS feed connections | 1 per recipient | refuse second connect; existing wins |
Repeated 401/429 responses to a single IP (>50/min) trigger a Cloud Armor adaptive rule for 1 h.
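The document does not name a limiting algorithm; a token bucket is one common choice for limits such as 100 req/s/tenant, and it yields a natural Retry-After value. The sketch keeps buckets in process memory, an assumption that only holds for a single instance; across Cloud Run instances the state would need to live in a shared store such as Memorystore.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number; // epoch ms of the last refill
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly ratePerSec: number;
  private readonly burst: number;

  constructor(ratePerSec: number, burst: number) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
  }

  /** Consumes one token for key; returns 0 if allowed, otherwise a Retry-After in whole seconds. */
  take(key: string, nowMs: number): number {
    const b = this.buckets.get(key) ?? { tokens: this.burst, updatedAt: nowMs };
    const elapsedSec = Math.max(0, nowMs - b.updatedAt) / 1000;
    b.tokens = Math.min(this.burst, b.tokens + elapsedSec * this.ratePerSec); // refill
    b.updatedAt = nowMs;
    this.buckets.set(key, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - b.tokens) / this.ratePerSec);
  }
}
```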
10. Sender-ID and brand impersonation
- A Sender is bound to a Channel and validated against per-tenant verification records:
  - Email: DKIM/SPF for the fromEmail.domain must verify (verified out-of-band via the tenant-service DNS workflow). EnqueueNotificationUseCase rejects sends whose Sender domain is not verified.
  - SMS PK/AF: senderId must reference a registrationRef (PTA/regulator approval). Missing → 422.
  - WhatsApp: whatsappPhoneNumberId must match Meta's verified number for the tenant.
- Tenant-on-tenant brand impersonation: a tenant cannot configure a Sender that uses another tenant's verified domain or sender id. The verification table is unique per (domain | senderId | phoneNumberId).
- Display-name spoof checks: Sender.fromName cannot exactly match well-known platform names ("Melmastoon", "Ghasi", "Support", etc.) for tenant senders.
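The display-name check and the cross-tenant uniqueness rule could be sketched as below. Comparing names after NFKC, whitespace and case normalisation is slightly stricter than a byte-exact match and is an assumption, as are the helper names; the in-memory registry stands in for the verification table's unique constraint.

```typescript
// Names from the list above; the real list is longer ("etc.").
const RESERVED_SENDER_NAMES = ["Melmastoon", "Ghasi", "Support"];

function normaliseName(s: string): string {
  return s.normalize("NFKC").trim().replace(/\s+/g, " ").toLowerCase();
}

/** True when a tenant Sender.fromName collides with a reserved platform name. */
function isReservedFromName(fromName: string): boolean {
  const n = normaliseName(fromName);
  return RESERVED_SENDER_NAMES.some((r) => normaliseName(r) === n);
}

type VerificationKind = "domain" | "senderId" | "phoneNumberId";

// In-memory stand-in for the verification table's unique constraint.
class SenderVerificationRegistry {
  private readonly owners = new Map<string, string>(); // "<kind>:<value>" -> tenantId

  /** Records tenantId as the owner of the value; false if another tenant already holds it. */
  claim(kind: VerificationKind, value: string, tenantId: string): boolean {
    const key = `${kind}:${value.trim().toLowerCase()}`;
    const owner = this.owners.get(key);
    if (owner !== undefined && owner !== tenantId) return false;
    this.owners.set(key, tenantId);
    return true;
  }
}
```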
11. Output sanitisation
- Email HTML: rendered MJML output is sanitised server-side through dompurify with an allowlist; inline scripts and event handlers are stripped before the checksum is computed.
- SMS / WhatsApp / push bodies: control characters are stripped; emoji are allowed; URL footers are signed (HMAC tracking) so we can attribute clicks without exposing internal ids.
- Opt-out URLs and tracking pixels live on https://*.notify.melmastoon.com (a separate, cookie-isolated domain).
- Subject lines have a per-channel length cap (email <= 140; not applicable to other channels), enforced post-render.
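The body sanitisation, signed tracking URLs and subject cap could look roughly like this. Keeping newlines while stripping other C0/C1 control characters, the c/u/s query parameter names, the t.notify.melmastoon.com host, counting the cap in code points, and rejecting (rather than truncating) over-long subjects are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const EMAIL_SUBJECT_MAX = 140;

/** Removes C0/C1 control characters except "\n"; emoji and other printable text pass through. */
function sanitiseShortBody(body: string): string {
  return body.replace(/[\u0000-\u0009\u000B-\u001F\u007F-\u009F]/g, "");
}

function trackingSig(key: Buffer, clickId: string, target: string): Buffer {
  return createHmac("sha256", key).update(`${clickId}\n${target}`).digest();
}

/** Builds a click URL carrying an opaque clickId and the target, signed so neither can be altered. */
function signedTrackingUrl(base: string, key: Buffer, clickId: string, target: string): string {
  const u = new URL(base);
  u.searchParams.set("c", clickId);
  u.searchParams.set("u", target);
  u.searchParams.set("s", trackingSig(key, clickId, target).toString("hex"));
  return u.toString();
}

function verifyTrackingUrl(url: string, key: Buffer): boolean {
  const u = new URL(url);
  const c = u.searchParams.get("c");
  const t = u.searchParams.get("u");
  const s = u.searchParams.get("s");
  if (c === null || t === null || s === null) return false;
  const expected = trackingSig(key, c, t);
  const given = Buffer.from(s, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

/** Post-render cap: counts code points and rejects rather than truncates (both assumptions). */
function enforceSubjectCap(channel: string, subject: string): string {
  if (channel === "email" && Array.from(subject).length > EMAIL_SUBJECT_MAX) {
    throw new Error(`email subject exceeds ${EMAIL_SUBJECT_MAX} characters`);
  }
  return subject;
}
```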
12. Threat model (STRIDE summary)
| Threat | Attack | Mitigation |
|---|---|---|
| Spoofing | Forged inbound webhook | HMAC + replay-window + dedupe + Cloud Armor |
| Spoofing | Forged staff token | RS256 JWKS validation, tenant claim check, mTLS for service-to-service |
| Tampering | Modify rendered HTML in transit | renderSnapshot.checksum (sha256) + immutable GCS object versioning |
| Repudiation | Tenant denies sending a marketing blast | Outbox + DispatchBatch + template.published.v1 audit trail with publishedBy |
| Information disclosure | Cross-tenant data leak | RLS + app.tenant_id discipline + envelope encryption per tenant |
| Information disclosure | PII in logs | pii-grep pre-commit + sample log scan |
| Denial of service | Abuse of POST /notifications | per-route + per-tenant + per-recipient rate limits + CDN-side WAF |
| Denial of service | Vendor webhook flood | 1000 rps cap + early HMAC reject + persist-then-process |
| Elevation of privilege | Staff with RESERVATIONS triggering marketing batch | Scope check (notifications:batch requires MARKETING_MANAGER+) |
| Elevation of privilege | Tenant admin reading another tenant's templates | RLS + tenant claim check |
| AI prompt injection | Guest name contains "ignore previous instructions" | Structured input, untrusted=true mark, output validation against canonical input; see AI_INTEGRATION §5 |
| Phishing-by-template | Tenant authors a template that mimics a bank/login | Template lint flags suspicious patterns; OWNER required to publish; per-locale moderation queue for new tenants in first 30 days |
13. Compliance hooks
- GDPR Art 7 (consent) for marketing: recorded in RecipientPreferences.marketingConsent with timestamp + source.
- GDPR Art 17 (erasure): handled by the crypto-shred path in §5.
- CAN-SPAM, CASL, UK PECR: every marketing email carries a physical postal address (from tenant.profile.postalAddress) and a one-tap unsubscribe; marketing batch publish is refused if the address is missing.
- WhatsApp Business policy: marketing only with opt-in; transactional only via approved templates.
- Local SMS regulators: PK PTA-registered sender-IDs; AF/IR transmission via licensed aggregators; templates in the marketing category for these markets require a registered short-code or business-name sender id.
- Data residency: per-tenant region pinning enforced by routing (tenants_local.region); cross-region replication only for DR backups within the same data-residency zone.
Every compliance-relevant action publishes an event consumed by audit-service (see EVENT_SCHEMAS published list).
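The publish-time guard from the CAN-SPAM/CASL/PECR bullet, together with a per-recipient consent check that honours the timestamp + source requirement, might be sketched as follows. The type and function names are hypothetical.

```typescript
type NotificationCategory = "transactional" | "operational" | "marketing";
type DeliveryChannel = "email" | "sms" | "whatsapp" | "push";

interface TenantProfile {
  postalAddress?: string; // tenant.profile.postalAddress
}

interface MarketingConsent {
  granted: boolean;
  at: string; // ISO-8601 timestamp of the consent event
  source: string; // where consent was captured, e.g. "booking_form"
}

/** Publish-time guard for marketing email batches; throws with a reason when a requirement is missing. */
function assertCanPublishMarketing(
  category: NotificationCategory,
  channel: DeliveryChannel,
  profile: TenantProfile,
  hasOneTapUnsubscribe: boolean,
): void {
  if (category !== "marketing" || channel !== "email") return;
  if (!profile.postalAddress || profile.postalAddress.trim() === "") {
    throw new Error("marketing email requires tenant.profile.postalAddress");
  }
  if (!hasOneTapUnsubscribe) throw new Error("marketing email requires a one-tap unsubscribe");
}

/** Per-recipient check: consent must be granted and carry both a timestamp and a source. */
function hasMarketingConsent(c: MarketingConsent | undefined): boolean {
  return c !== undefined && c.granted && c.at !== "" && c.source !== "";
}
```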
14. Penetration test scope
This service is in scope for the platform's annual third-party pentest. Specific test cases:
- HMAC bypass on each vendor webhook route.
- Forced opt-out via guessable tokens.
- Cross-tenant template read via path manipulation.
- Cross-tenant suppression read.
- Prompt injection via guest-controlled variables.
- Brand impersonation via Sender configuration.
- DKIM bypass via misconfigured tenant override.
- Resend abuse (loops via webhook → resend).
- WS feed cross-tenant subscription.
Findings tracked in SERVICE_RISK_REGISTER with severity and time-bound remediation.