
SECURITY_MODEL — notification-service

Sibling: API_CONTRACTS · DATA_MODEL · AI_INTEGRATION · OBSERVABILITY

Strategic anchors: 07 Security/Compliance/Tenancy · 02 Enterprise Architecture §11 Security · ADR-0002 Multi-Tenancy

The notification path is one of the highest-risk surfaces in the platform: it touches guest PII, sends messages on behalf of tenants, holds sender IDs and channel credentials, and is a frequent phishing/abuse vector. The controls below assume honest-but-curious staff and a hostile internet.


1. Trust boundaries

┌──────────────────────────────────────────────────────┐
│ Internet (vendors, end-users)                        │
│ - Vendor webhook callers (HMAC, no JWT)              │
│ - End-user opt-out token bearers                     │
└──────────────────┬───────────────────────────────────┘
                   │ TLS 1.3 + WAF
                   ▼
┌──────────────────────────────────────────────────────┐
│ GCP Cloud Load Balancer + Cloud Armor                │
│ - DDoS protection, IP throttle, geo rules            │
│ - WAF managed rules                                  │
└──────────────────┬───────────────────────────────────┘
                   │ mTLS (vendor mesh) / TLS+JWT (BFFs)
                   ▼
┌──────────────────────────────────────────────────────┐
│ notification-service Cloud Run                       │
│ - REST + WS API                                      │
│ - Pub/Sub subscribers                                │
│ - Background workers                                 │
└─┬────────────────┬───────────────┬──────────────────┬┘
  │                │               │                  │
  ▼                ▼               ▼                  ▼
┌──────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ Cloud SQL│ │ Memorystore│ │ GCS        │ │ Secret Mgr │
│ Postgres │ │ Redis      │ │ (CMEK)     │ │ (vendor    │
│ (CMEK)   │ │ (in VPC)   │ │            │ │ creds)     │
└──────────┘ └────────────┘ └────────────┘ └────────────┘
     ▲                                           │
     │ RLS-enforced session                      │
     │                                           ▼
┌──────────────────┐                 ┌──────────────────┐
│ Cloud KMS CMEK   │                 │ vendor APIs      │
│ per region/env   │                 │ (egress via NAT) │
└──────────────────┘                 └──────────────────┘

All east-west traffic between services stays on the internal VPC and uses mTLS, with certificates issued by iam-service's SPIFFE/SPIRE intermediate CA. North-south egress to vendors goes through Cloud NAT with a fixed egress IP per region, so vendors can allowlist us.
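On an mTLS connection, checking the caller's SPIFFE identity comes down to two steps: read the URI SAN from the peer certificate, then match it against an allowlist. The sketch below rests on three assumptions. Node's TLS layer has already validated the chain against the SPIRE trust bundle. The SAN string comes from tlsSocket.getPeerCertificate().subjectaltname. The trust domain melmastoon.internal and the service path are hypothetical.

```typescript
// Sketch only: the trust domain and service paths used with this are hypothetical.
// In Node the SAN string comes from tlsSocket.getPeerCertificate().subjectaltname,
// e.g. "URI:spiffe://melmastoon.internal/ns/prod/sa/billing-service".

/** Extract every SPIFFE ID (a URI SAN with the spiffe:// scheme) from a SAN string. */
export function spiffeIdsFromSan(subjectAltName: string | undefined): string[] {
  if (!subjectAltName) return [];
  return subjectAltName
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.startsWith("URI:spiffe://"))
    .map((entry) => entry.slice("URI:".length));
}

/**
 * Accept the peer only if it presents exactly one SPIFFE ID and that ID is
 * allowlisted. An X.509-SVID carries a single URI SAN, so more than one
 * SPIFFE ID is treated as invalid.
 */
export function isAllowedPeer(subjectAltName: string | undefined, allowed: ReadonlySet<string>): boolean {
  const ids = spiffeIdsFromSan(subjectAltName);
  return ids.length === 1 && allowed.has(ids[0]);
}
```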


2. AuthN

| Caller class | Mechanism | Issued by | Lifetime |
|---|---|---|---|
| Backoffice staff via bff-backoffice-service | OAuth2/OIDC → JWT (RS256) with aud='notification-service' | iam-service | 15 min access; 8 h refresh |
| Guest via bff-tenant-booking-service (in-app feed, opt-out) | Anonymous booking session token (JWT, scoped) | iam-service (anonymous flow) | 30 min sliding |
| Internal services (iam, billing, tenant, reservation, etc.) | mTLS + SPIFFE ID + service JWT | iam-service SPIRE | 1 h |
| Vendors (webhooks) | HMAC-SHA256 over raw body + per-vendor secret stored in Secret Manager | per-vendor | rotated quarterly |
| End users (opt-out token) | Bearer token in URL; looked up by sha256(token) against opt_out_tokens | this service | 30 days |

JWTs are validated against the JWKS published by iam-service. The key set is cached for 10 min and re-fetched on a kid miss. Tokens whose time claims (exp, nbf, iat) fall outside a 60 s clock-skew tolerance are rejected.
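The JWKS caching rule (10 min TTL, re-fetch on kid miss) and the 60 s skew check can be sketched as follows. This is a minimal sketch under a few assumptions. fetchJwks stands in for the HTTP call to iam-service's JWKS endpoint. Keys stay opaque JWK objects. RS256 signature verification, which a JOSE library would handle, is left out.

```typescript
// Sketch under assumptions: fetchJwks stands in for an HTTP GET of iam-service's
// JWKS endpoint; keys are opaque JWK objects; signature verification is out of scope.

export interface Jwk { kid: string; [member: string]: unknown }
export type JwksFetcher = () => Promise<Jwk[]>;

const JWKS_TTL_MS = 10 * 60 * 1000; // cache the key set for 10 min
const MAX_SKEW_S = 60;              // clock-skew tolerance for time claims

export class JwksCache {
  private keys = new Map<string, Jwk>();
  private fetchedAt = -Infinity;

  constructor(private readonly fetchJwks: JwksFetcher, private readonly now: () => number = Date.now) {}

  private async refresh(): Promise<void> {
    const jwks = await this.fetchJwks();
    this.keys = new Map(jwks.map((k): [string, Jwk] => [k.kid, k]));
    this.fetchedAt = this.now();
  }

  /** Return the key for kid, re-fetching when the cache is stale or the kid is unknown. */
  async getKey(kid: string): Promise<Jwk | undefined> {
    let refreshed = false;
    if (this.now() - this.fetchedAt > JWKS_TTL_MS) {
      await this.refresh();
      refreshed = true;
    }
    // A kid miss usually means iam-service rotated its signing key since the last fetch.
    if (!this.keys.has(kid) && !refreshed) await this.refresh();
    return this.keys.get(kid);
  }
}

/** Reject tokens whose exp / nbf / iat fall outside the skew tolerance. */
export function timeClaimsValid(claims: { exp?: number; nbf?: number; iat?: number }, nowMs: number): boolean {
  const now = Math.floor(nowMs / 1000);
  if (claims.exp === undefined || now > claims.exp + MAX_SKEW_S) return false;    // expired
  if (claims.nbf !== undefined && now < claims.nbf - MAX_SKEW_S) return false;    // not yet valid
  if (claims.iat !== undefined && claims.iat > now + MAX_SKEW_S) return false;    // issued in the future
  return true;
}
```

An unknown kid forces a fetch. A production version would therefore also rate-limit kid-miss refreshes, so that tokens carrying random kid values cannot be used to hammer iam-service.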

For the WebSocket feed (WS /api/v1/notifications/feed/stream), the JWT is passed in the Sec-WebSocket-Protocol header and verified before the server replies 101 Switching Protocols. Mid-stream re-authentication happens on each heartbeat; an expired token closes the socket with code 4401.
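One way to carry the JWT in Sec-WebSocket-Protocol is for the client to offer two subprotocol values: an application protocol name and the token. The name notifications.v1 and the "bearer." prefix below are hypothetical conventions that this document does not specify. RFC 6455 requires the server to answer with one of the offered values, so it echoes the application protocol and never the token. Code 4401 sits in the 4000-4999 range that the RFC reserves for private use.

```typescript
// Sketch only: "notifications.v1" and the "bearer." prefix are hypothetical conventions.
export const APP_PROTOCOL = "notifications.v1";
const BEARER_PREFIX = "bearer.";
export const CLOSE_AUTH_EXPIRED = 4401; // 4000-4999 is reserved for private use (RFC 6455)

export interface WsAuthResult {
  token: string;
  /** Value to echo in the 101 response; the token itself is never echoed. */
  selectedProtocol: string;
}

/** Parse Sec-WebSocket-Protocol; undefined means "reject the upgrade". */
export function parseWsAuth(header: string | undefined): WsAuthResult | undefined {
  if (!header) return undefined;
  const offered = header.split(",").map((p) => p.trim()).filter(Boolean);
  if (!offered.includes(APP_PROTOCOL)) return undefined;
  const bearer = offered.filter((p) => p.startsWith(BEARER_PREFIX));
  if (bearer.length !== 1) return undefined;
  const token = bearer[0].slice(BEARER_PREFIX.length);
  if (token.split(".").length !== 3) return undefined; // compact JWS: header.payload.signature
  return { token, selectedProtocol: APP_PROTOCOL };
}
```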


3. AuthZ — RBAC + scope claims

Roles are issued by iam-service per tenant; we evaluate them against route scopes. The full matrix lives in API_CONTRACTS §12; operational summary:

| Scope | Granted to roles |
|---|---|
| notifications:read | OWNER, GM, FRONT_DESK, RESERVATIONS, MARKETING_MANAGER, SUPPORT |
| notifications:write | OWNER, GM, FRONT_DESK, RESERVATIONS, SUPPORT (transactional/operational only); MARKETING_MANAGER for marketing category |
| notifications:batch | OWNER, GM, MARKETING_MANAGER |
| notifications:feed.read.self | any authenticated user (guest or staff) for own feed |
| notifications:feed.read.any | OWNER, GM, FRONT_DESK, SUPPORT (with property scope) |
| notifications:preferences.write.self | the user themselves |
| notifications:preferences.write.any | OWNER, GM, SUPPORT with documented consent (audit-tagged) |
| notifications:templates.read | OWNER, GM, MARKETING_MANAGER, FRONT_DESK |
| notifications:templates.write | OWNER, GM, MARKETING_MANAGER |
| notifications:templates.publish | OWNER, GM, MARKETING_MANAGER (HITL-gated for ai_drafted source) |
| notifications:channels.read | OWNER, GM, PLATFORM_ENG |
| notifications:channels.write | OWNER, PLATFORM_ENG |
| notifications:channels.rotate | OWNER, PLATFORM_ENG |
| notifications:internal | PLATFORM_ENG only; available only via internal mesh |
| notifications:suppressions.read/write | OWNER, GM, COMPLIANCE_OFFICER |
| notifications:suppressions.release | OWNER, COMPLIANCE_OFFICER (4-eyes via approval ticket for hard bounces older than 30 days) |

ABAC overlay: X-Property-Id is cross-checked against iam.property_assignments for staff scopes (read/write per property). Deny by default.
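A deny-by-default check that combines the scope matrix with the X-Property-Id overlay might look like the sketch below. Only a subset of the matrix is encoded. The Principal shape, with property assignments preloaded from iam.property_assignments, is an assumption.

```typescript
// Sketch: a subset of the scope matrix above plus the ABAC property overlay.
// The Principal shape is an assumption; assignments would come from iam.property_assignments.

type Role =
  | "OWNER" | "GM" | "FRONT_DESK" | "RESERVATIONS"
  | "MARKETING_MANAGER" | "SUPPORT" | "PLATFORM_ENG" | "COMPLIANCE_OFFICER";

// A Map (not a plain object) so that keys like "constructor" cannot hit prototype members.
const SCOPE_ROLES = new Map<string, readonly Role[]>([
  ["notifications:read", ["OWNER", "GM", "FRONT_DESK", "RESERVATIONS", "MARKETING_MANAGER", "SUPPORT"]],
  ["notifications:batch", ["OWNER", "GM", "MARKETING_MANAGER"]],
  ["notifications:templates.write", ["OWNER", "GM", "MARKETING_MANAGER"]],
  ["notifications:channels.write", ["OWNER", "PLATFORM_ENG"]],
]);

interface Principal {
  tenantId: string;
  roles: readonly Role[];
  assignedPropertyIds: ReadonlySet<string>; // hypothetical: preloaded property assignments
}

/** Deny by default: tenant mismatch, unknown scope, missing role, or unassigned property all fail. */
export function authorize(p: Principal, scope: string, requestTenantId: string, propertyId?: string): boolean {
  if (p.tenantId !== requestTenantId) return false;                 // tenant claim check
  const allowedRoles = SCOPE_ROLES.get(scope);
  if (!allowedRoles) return false;                                  // unknown scope
  if (!p.roles.some((r) => allowedRoles.includes(r))) return false; // RBAC
  if (propertyId !== undefined && !p.assignedPropertyIds.has(propertyId)) return false; // X-Property-Id overlay
  return true;
}
```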


4. Database isolation (RLS + roles)

Three Postgres roles in melmastoon_notification:

| Role | Capabilities |
|---|---|
| melmastoon_notification_app | SELECT/INSERT/UPDATE/DELETE on tenant-scoped tables; no BYPASSRLS; sets app.tenant_id per transaction (SET LOCAL) |
| melmastoon_notification_dba | BYPASSRLS; granted only via break-glass (audit-tagged); MFA required |
| melmastoon_notification_ro_analytics | SELECT on materialised views with PII redacted; used by the Datastream/BigQuery sink |

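At the start of every transaction, Drizzle middleware runs SET LOCAL app.tenant_id = '<tenantId>'. The setting is transaction-scoped, so it is discarded at COMMIT or ROLLBACK and cannot leak to the next request on a pooled connection. RLS policies (see DATA_MODEL §3) reject rows with a mismatched tenant_id, so the app role cannot read or modify another tenant's rows.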
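Below is a sketch of the per-transaction tenant binding. It assumes a client that exposes a pg-style query(text, params) method, not Drizzle's actual middleware API. SET LOCAL does not accept bind parameters, so the sketch calls PostgreSQL's set_config(name, value, true) instead. That call is likewise transaction-scoped, and it lets the tenant id be passed as a parameter rather than spliced into SQL. The UUID format check is an assumption about tenant ids.

```typescript
// Sketch, assuming a minimal client with a pg-style query(text, params) method.
export interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

export async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  if (!UUID_RE.test(tenantId)) throw new Error("tenantId must be a UUID"); // assumption: UUID tenant ids
  await client.query("BEGIN");
  try {
    // is_local = true: same scope as SET LOCAL, discarded at COMMIT/ROLLBACK.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```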

CMEK on:

  • Cloud SQL data at rest (per-region key in Cloud KMS).
  • Cloud SQL backups.
  • GCS buckets for rendered/webhooks/attachments.
  • Memorystore (TLS in transit; no on-disk keys; in-memory only).

KMS keys rotate every 90 days; rotation is automated; old key versions retained for 5 years to decrypt historical backups.


5. PII handling

| PII class | Where it lives | Encryption | Access |
|---|---|---|---|
| Guest email / phone (recipient address) | recipient_addresses.address_ciphertext (envelope-encrypted: DEK + KMS KEK), plus address_hash | yes: envelope; key per tenant; KEK in KMS | server-side decrypt only when sending; never in logs/events; never in BigQuery; never on Electron |
| Vendor message id | delivery_attempts.vendor_message_id | none (opaque) | service + analytics |
| Guest name (display) | recipients.display_name | none (low risk; needed for staff UI) | RLS-bound |
| Free-text notes attached to ad-hoc sends | not persisted separately; present only in render_snapshot | server-side only; never exfiltrated to AI without explicit opt-in | RLS-bound |
| Webhook raw bodies (may contain user agents, IPs) | gs://…/webhooks/** | CMEK on bucket | restricted to platform engineers; access audited |

Logging policy: addresses are NEVER logged in plaintext; log lines carry only addressKindHash and recipient_id. A log scanner (pii-grep) runs as a pre-commit hook and blocks PRs that introduce log lines containing @ or phone-number-shaped strings outside an allow-list. CI also runs a sampled Loki query asserting that recent production logs contain no plaintext PII.
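The kind of check pii-grep performs can be sketched as a line scanner that looks only at log calls. The log-call pattern and both regexes are illustrative assumptions, not the tool's actual rules. A scanner like this will produce both false positives and false negatives.

```typescript
// Sketch of a pii-grep-style check. LOG_CALL_RE, EMAIL_RE and PHONE_RE are
// illustrative assumptions, not the rules of the actual pii-grep tool.

const LOG_CALL_RE = /\blog(?:ger)?\.(?:trace|debug|info|warn|error|fatal)\(/;
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
// Optional "+", then at least 9 digits, optionally separated by spaces, dots, dashes or parentheses.
const PHONE_RE = /\+?\d(?:[\s().-]*\d){8,}/;

export interface Finding { line: number; kind: "email" | "phone" }

/** Report log-call lines that appear to embed an email address or phone number. */
export function scanForPii(source: string, allowList: readonly RegExp[] = []): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    if (!LOG_CALL_RE.test(text)) return;               // only log lines are in scope
    if (allowList.some((re) => re.test(text))) return; // explicitly allowed patterns
    if (EMAIL_RE.test(text)) findings.push({ line: i + 1, kind: "email" });
    else if (PHONE_RE.test(text)) findings.push({ line: i + 1, kind: "phone" });
  });
  return findings;
}
```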

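Crypto-shred on iam.user.deleted.v1: we destroy the DEK that wraps the user's address ciphertexts; the row remains for suppression continuity but is unreadable. This erases only that user if the DEK is not shared with other users' rows: destroying a DEK shared across the whole tenant would make every address in that tenant unreadable.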
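The envelope scheme and the crypto-shred path can be sketched with Node's built-in AES-256-GCM. The sketch makes three assumptions. A local 256-bit key stands in for the Cloud KMS KEK. An in-memory Map stands in for DEK storage. DEKs are scoped per recipient, so shredding one user leaves other users' addresses readable.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch only: a local KEK stands in for Cloud KMS; DEKs are per recipient (an assumption).

interface Sealed { iv: Buffer; tag: Buffer; data: Buffer }

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

function open(key: Buffer, s: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
  decipher.setAuthTag(s.tag);
  return Buffer.concat([decipher.update(s.data), decipher.final()]);
}

export class AddressVault {
  private readonly wrappedDeks = new Map<string, Sealed>(); // recipientId -> DEK wrapped by the KEK

  constructor(private readonly kek: Buffer) {}

  encrypt(recipientId: string, address: string): Sealed {
    let wrapped = this.wrappedDeks.get(recipientId);
    if (!wrapped) {
      wrapped = seal(this.kek, randomBytes(32)); // new DEK, stored only in wrapped form
      this.wrappedDeks.set(recipientId, wrapped);
    }
    return seal(open(this.kek, wrapped), Buffer.from(address, "utf8"));
  }

  decrypt(recipientId: string, ciphertext: Sealed): string {
    const wrapped = this.wrappedDeks.get(recipientId);
    if (!wrapped) throw new Error("DEK destroyed: address is unrecoverable");
    return open(open(this.kek, wrapped), ciphertext).toString("utf8");
  }

  /** Crypto-shred: drop the wrapped DEK; the ciphertext row can stay for suppression continuity. */
  shred(recipientId: string): void {
    this.wrappedDeks.delete(recipientId);
  }
}
```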


6. Vendor credentials

  • All vendor API keys / OAuth client secrets / WhatsApp Business tokens / FCM service account JSONs live in Secret Manager (regional, CMEK).
  • A ChannelCredential row stores only a secret_ref (secretmanager://projects/.../secrets/.../versions/N); the plaintext is fetched at adapter-init time and held in process memory.
  • Rotation: POST /api/v1/notification-channels/{id}/credentials/{credId}/rotate writes a new Secret Manager version, marks the new ChannelCredential active, and leaves the prior one superseded for a 24 h overlap. After 24 h the superseded credential is revoked and its Secret Manager version is disabled.
  • Secret access audit: every Secret Manager AccessSecretVersion call is exported to BigQuery (audit.secret_access) and tagged with requesterServiceAccount.

Access surface:

  • Cloud Run service account: notification-service-runtime@<project>.iam.gserviceaccount.com holds roles/secretmanager.secretAccessor only on notification-* secrets.
  • Tenant admins NEVER see plaintext credentials. The "create credential" UI accepts a one-shot input, returns success or error, and only echoes back the secret_ref and last-4 fingerprint thereafter.
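The rotation lifecycle can be modelled as a small state transition: a credential is active, then superseded for a 24 h overlap, then revoked with its Secret Manager version disabled. The record shape is a hypothetical simplification of ChannelCredential. Treating superseded credentials as still usable during the overlap is an assumption about what the overlap is for.

```typescript
// Sketch of the rotation lifecycle; the record shape is a hypothetical simplification.

type CredentialState = "active" | "superseded" | "revoked";

export interface ChannelCredential {
  id: string;
  secretRef: string;      // e.g. secretmanager://projects/.../secrets/.../versions/N
  state: CredentialState;
  supersededAt?: number;  // epoch ms
}

export const OVERLAP_MS = 24 * 60 * 60 * 1000;

/** Activate next and mark the current active credential superseded, starting the overlap. */
export function rotate(creds: ChannelCredential[], next: ChannelCredential, now: number): ChannelCredential[] {
  const updated = creds.map((c) =>
    c.state === "active" ? { ...c, state: "superseded" as const, supersededAt: now } : c,
  );
  return [...updated, { ...next, state: "active" }];
}

/** Revoke superseded credentials whose overlap has elapsed; report secret versions to disable. */
export function sweep(creds: ChannelCredential[], now: number): { creds: ChannelCredential[]; toDisable: string[] } {
  const toDisable: string[] = [];
  const out = creds.map((c) => {
    if (c.state === "superseded" && c.supersededAt !== undefined && now - c.supersededAt >= OVERLAP_MS) {
      toDisable.push(c.secretRef);
      return { ...c, state: "revoked" as const };
    }
    return c;
  });
  return { creds: out, toDisable };
}

/** Assumption: adapters may use active credentials and superseded ones still inside the overlap. */
export function usable(creds: ChannelCredential[]): ChannelCredential[] {
  return creds.filter((c) => c.state !== "revoked");
}
```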

7. Webhook security

  • Each vendor has a dedicated route POST /api/v1/webhooks/vendors/{vendor} with a per-vendor HMAC verifier picked from a registry.
  • The verifier reads the raw body before any body parsing. It concatenates the signed header values (such as the timestamp) with the body bytes as each vendor's algorithm specifies, and compares signatures in constant time.
  • Replay window: most vendors include a timestamp in headers; we reject > 5 min skew (rejected with 401, no body persisted).
  • Even if an attacker sends a replay within window, the application-level dedupe (vendor + vendorMessageId + type + occurredAt) prevents double-applying state.
  • Cloud Armor rule: 1000 req/s/vendor source; 429 above; large bodies (>1 MB) rejected at LB.

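For SendGrid / Mailgun / SES / Twilio / Infobip / Meta WhatsApp / FCM, the vendor-specific signature algorithms and headers are codified in src/infrastructure/webhooks/<vendor>/verifier.ts. Not every vendor uses HMAC. SendGrid's signed Event Webhook uses an ECDSA signature, and SES events delivered through Amazon SNS carry an X.509 certificate-based signature, so those verifiers check a public-key signature instead. A contract test (tests/contracts/webhooks/<vendor>.test.ts) exercises positive and negative samples per vendor.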
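A generic HMAC verifier following the rules above (raw bytes, timestamp window, constant-time comparison) might look like this, together with the application-level dedupe key. The signed-payload layout (timestamp header, a dot, then the raw body) is an assumption, as are hex-encoded signatures. Each vendor's verifier.ts encodes that vendor's actual scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of a generic verifier. The payload layout (timestamp + "." + raw body) and
// hex-encoded signatures are assumptions; real vendors each define their own scheme.

const MAX_SKEW_S = 5 * 60; // replay window from the text

export type VerifyResult = "ok" | "stale" | "bad_signature"; // both failures map to 401

export function verifyWebhook(
  rawBody: Buffer,          // bytes exactly as received, before any JSON parsing
  timestampHeader: string,
  signatureHex: string,
  secret: string,
  nowMs: number,
): VerifyResult {
  const ts = Number(timestampHeader);
  if (!Number.isInteger(ts) || Math.abs(nowMs / 1000 - ts) > MAX_SKEW_S) return "stale";
  const expected = createHmac("sha256", secret).update(`${timestampHeader}.`).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return "bad_signature";
  return "ok";
}

/** Dedupe key from the text: vendor + vendorMessageId + type + occurredAt (JSON keeps it unambiguous). */
export function dedupeKey(e: { vendor: string; vendorMessageId: string; type: string; occurredAt: string }): string {
  return JSON.stringify([e.vendor, e.vendorMessageId, e.type, e.occurredAt]);
}
```

A replay inside the 5 min window passes verifyWebhook, which is why the dedupe key still matters: the second delivery maps to the same key and is dropped before it can change state.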


8. Opt-out tokens

  • Generated from 256 random bits and encoded as URL-safe base64; only the SHA-256 hash is stored, and the plaintext appears only in the URL.
  • Bound to (recipient_id, channel, tenant_id); single use; 30-day expiry.
  • The opt-out endpoint POST /api/v1/notification-preferences/opt-out/{token} is public, rate-limited (30 req/min/IP), and served with Cache-Control: no-store.
  • Successful opt-out emits notification.opted_out.v1; the corresponding suppression row prevents future sends.
  • CSRF: the opt-out is a single explicit POST gated by a one-tap confirmation page; we do not rely on session cookies.
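Issuing and redeeming opt-out tokens can be sketched with Node's crypto module. An in-memory Map keyed by the token hash stands in for the opt_out_tokens table. The record fields are illustrative.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Sketch only: a Map stands in for opt_out_tokens; record fields are illustrative.

interface OptOutRecord { recipientId: string; channel: string; tenantId: string; expiresAt: number; usedAt?: number }

const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day expiry
const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

export class OptOutTokens {
  private readonly byHash = new Map<string, OptOutRecord>();

  /** Returns the plaintext token for the URL; only its hash is stored. */
  issue(recipientId: string, channel: string, tenantId: string, now: number): string {
    const token = randomBytes(32).toString("base64url"); // 256 bits, URL-safe, no padding
    this.byHash.set(sha256Hex(token), { recipientId, channel, tenantId, expiresAt: now + TTL_MS });
    return token;
  }

  /** Single use: an unknown, already-used or expired token yields undefined. */
  redeem(token: string, now: number): OptOutRecord | undefined {
    const rec = this.byHash.get(sha256Hex(token));
    if (!rec || rec.usedAt !== undefined || now >= rec.expiresAt) return undefined;
    rec.usedAt = now;
    return rec;
  }
}
```

Against Postgres, single use needs an atomic conditional update. An UPDATE that matches only rows whose used_at is still null does this, so two concurrent requests cannot both redeem the same token.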

9. Rate limiting and abuse control

| Surface | Limit | Action when exceeded |
|---|---|---|
| POST /api/v1/notifications | 100 req/s/tenant; 5 req/s/IP from staff origin | 429 with Retry-After; bulk patterns trigger Cloud Armor backoff |
| POST /api/v1/notifications/batch | 5 req/min/tenant; max 1 active per category | 429 |
| POST /api/v1/notifications/{id}/resend | 3 resends per notification per 24 h | 429 |
| Per-recipient per-day across all channels | tenant-policy default 5; tenant-configurable up to 20 | suppress with reason='rate_limit' |
| WhatsApp marketing per recipient | 1 per 24 h (Meta policy) | suppress |
| Vendor webhooks | 1000 req/s/vendor IP | 429 |
| Opt-out endpoint | 30 req/min/IP | 429; sustained → Cloud Armor block |
| WS feed connections | 1 per recipient | refuse second connect; existing wins |

Repeated 401/429 responses to a single IP (more than 50/min) trigger a Cloud Armor adaptive rule for 1 h.
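Most of the limits above reduce to counting hits per key within a window. The fixed-window counter below is a sketch with an injectable clock. The key naming (for example tenant:<id>) is hypothetical. A production limiter would keep its counters in Memorystore so that every Cloud Run instance shares them.

```typescript
// Sketch only: in-memory fixed windows with an injectable clock. Key names are
// hypothetical; old windows are never evicted in this sketch.

export interface Decision { allowed: boolean; retryAfterS?: number }

export class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  hit(key: string, nowMs: number): Decision {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const prev = this.windows.get(key);
    const current = prev && prev.start === start ? prev : { start, count: 0 };
    if (current.count >= this.limit) {
      // Seconds until this window closes, suitable for a Retry-After header.
      return { allowed: false, retryAfterS: Math.ceil((start + this.windowMs - nowMs) / 1000) };
    }
    current.count += 1;
    this.windows.set(key, current);
    return { allowed: true };
  }
}

/** Per-recipient daily cap: default 5, tenant-configurable up to 20. */
export function recipientDailyCap(tenantSetting?: number): number {
  return Math.min(tenantSetting ?? 5, 20);
}
```

Fixed windows allow up to twice the limit across a window boundary. A sliding window or token bucket smooths that out, at the cost of more state per key.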


10. Sender-ID and brand impersonation

  • A Sender is bound to a Channel and validated against per-tenant verification records:
    • Email: DKIM/SPF for the fromEmail.domain must verify (verified out-of-band via tenant-service DNS workflow). EnqueueNotificationUseCase rejects sends whose Sender domain is not verified.
    • SMS PK/AF: senderId must reference a registrationRef (PTA/regulator approval). Missing → 422.
    • WhatsApp: whatsappPhoneNumberId must match Meta's verified number for the tenant.
  • Tenant-on-tenant brand impersonation: a tenant cannot configure a Sender that uses another tenant's verified domain or sender ID, because the verification table enforces uniqueness on each of domain, senderId, and phoneNumberId.
  • Display-name spoof checks: Sender.fromName cannot exactly match well-known platform names ("Melmastoon", "Ghasi", "Support", etc.) for tenant senders.
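The Sender checks above can be sketched as one validation function that returns a reason (mapped to 422) or nothing. The Sender and VerificationRecords shapes are assumptions. RESERVED_NAMES holds only the names quoted in the text. The text says only "exact match", so comparing display names after trimming and lower-casing is also an assumption.

```typescript
// Sketch: the Sender / VerificationRecords shapes and the normalised fromName
// comparison are assumptions.

type Channel = "email" | "sms" | "whatsapp";

interface Sender {
  channel: Channel;
  fromEmail?: string;
  fromName?: string;
  senderId?: string;
  registrationRef?: string;
  countryCode?: string;          // ISO 3166-1 alpha-2
  whatsappPhoneNumberId?: string;
}

interface VerificationRecords {
  verifiedDomains: ReadonlySet<string>; // DKIM/SPF verified via the tenant-service DNS workflow
  metaVerifiedPhoneNumberId?: string;
}

const RESERVED_NAMES = ["melmastoon", "ghasi", "support"]; // names quoted in the text
const REGISTERED_SENDER_MARKETS = new Set(["PK", "AF"]);

/** Returns a rejection reason (map to 422) or undefined when the Sender is acceptable. */
export function validateSender(s: Sender, v: VerificationRecords): string | undefined {
  if (s.fromName && RESERVED_NAMES.includes(s.fromName.trim().toLowerCase())) return "reserved display name";
  switch (s.channel) {
    case "email": {
      const domain = s.fromEmail?.split("@")[1]?.toLowerCase();
      return domain && v.verifiedDomains.has(domain) ? undefined : "sender domain not verified";
    }
    case "sms":
      if (s.countryCode && REGISTERED_SENDER_MARKETS.has(s.countryCode) && !s.registrationRef) return "missing registrationRef";
      return undefined;
    case "whatsapp":
      return s.whatsappPhoneNumberId && s.whatsappPhoneNumberId === v.metaVerifiedPhoneNumberId
        ? undefined
        : "whatsapp number not verified";
  }
}
```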

11. Output sanitisation

  • Email HTML: rendered MJML output is sanitised server-side with DOMPurify and an allowlist; inline scripts and event-handler attributes are stripped before the render checksum is computed.
  • SMS / WhatsApp / push bodies: control characters are stripped and emoji are allowed; footer URLs are HMAC-signed tracking links, so clicks can be attributed without exposing internal ids.
  • Opt-out URLs and tracking pixels live on https://*.notify.melmastoon.com (a separate, cookie-isolated domain).
  • Subject lines are length-capped after rendering: 140 characters for email; the cap does not apply to other channels.
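Control-character stripping, the subject cap, and signed tracking links can be sketched as below. Several details are hypothetical: the tracking host t.notify.melmastoon.com, the /c path, the l/u/s parameter names, and the HMAC input layout. Counting the subject cap in code points is also an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch under assumptions: host, path, parameter names and HMAC input layout are
// hypothetical; only the *.notify.melmastoon.com domain comes from the text.

const TRACKING_BASE = "https://t.notify.melmastoon.com/c";
export const EMAIL_SUBJECT_MAX = 140;

/** Remove C0/C1 control characters except tab, LF and CR; emoji and other text pass through. */
export function stripControlChars(body: string): string {
  return body.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g, "");
}

/** Post-render check, counting code points (assumption: the unit is not specified). */
export function subjectWithinCap(subject: string): boolean {
  return Array.from(subject).length <= EMAIL_SUBJECT_MAX;
}

function sign(linkId: string, target: string, key: string): string {
  return createHmac("sha256", key).update(`${linkId}\n${target}`).digest("base64url");
}

/** Wrap a destination in a signed tracking URL; linkId is opaque, not an internal row id. */
export function signedTrackingUrl(target: string, linkId: string, key: string): string {
  const url = new URL(TRACKING_BASE);
  url.searchParams.set("l", linkId);
  url.searchParams.set("u", target);
  url.searchParams.set("s", sign(linkId, target, key));
  return url.toString();
}

/** Redirect only when the signature matches. */
export function verifyTrackingUrl(signedUrl: string, key: string): boolean {
  const params = new URL(signedUrl).searchParams;
  const linkId = params.get("l");
  const target = params.get("u");
  const sig = params.get("s");
  if (!linkId || !target || !sig) return false;
  const expected = Buffer.from(sign(linkId, target, key));
  const given = Buffer.from(sig);
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Verifying the signature before redirecting keeps the click endpoint from becoming an open redirect that phishers could borrow.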

12. Threat model (STRIDE summary)

| Threat | Attack | Mitigation |
|---|---|---|
| Spoofing | Forged inbound webhook | HMAC + replay window + dedupe + Cloud Armor |
| Spoofing | Forged staff token | RS256 JWKS validation, tenant claim check, mTLS for service-to-service |
| Tampering | Modify rendered HTML in transit | renderSnapshot.checksum (sha256) + immutable GCS object versioning |
| Repudiation | Tenant denies sending a marketing blast | Outbox + DispatchBatch + template.published.v1 audit trail with publishedBy |
| Information disclosure | Cross-tenant data leak | RLS + app.tenant_id discipline + envelope encryption per tenant |
| Information disclosure | PII in logs | pii-grep pre-commit + sample log scan |
| Denial of service | Abuse of POST /notifications | per-route + per-tenant + per-recipient rate limits + CDN-side WAF |
| Denial of service | Vendor webhook flood | 1000 rps cap + early HMAC reject + persist-then-process |
| Elevation of privilege | Staff with RESERVATIONS triggering marketing batch | Scope check (notifications:batch requires MARKETING_MANAGER+) |
| Elevation of privilege | Tenant admin reading another tenant's templates | RLS + tenant claim check |
| AI prompt injection | Guest name carries "ignore previous instructions" | Structured input, untrusted=true mark, output validation against canonical input (see AI_INTEGRATION §5) |
| Phishing-by-template | Tenant authors a template that mimics a bank/login | Template lint flags suspicious patterns; OWNER required to publish; per-locale moderation queue for new tenants in first 30 days |

13. Compliance hooks

  • GDPR Art. 7 (consent) for marketing: recorded in RecipientPreferences.marketingConsent with timestamp + source.
  • GDPR Art. 17 (erasure): see the crypto-shred path in §5.
  • CAN-SPAM, CASL, UK PECR: every marketing email carries a physical postal address (from tenant.profile.postalAddress) and a one-tap unsubscribe; publishing a marketing batch is refused if the address is missing.
  • WhatsApp Business policy: marketing only with opt-in; transactional only via approved templates.
  • Local SMS regulators: PK PTA-registered sender IDs; AF/IR transmission via licensed aggregators; marketing templates for these markets require a registered short code or business-name sender ID.
  • Data residency: per-tenant region pinning enforced by routing (tenants_local.region); cross-region replication only for DR backups within the same data-residency zone.

Every compliance-relevant action publishes an event consumed by audit-service (see EVENT_SCHEMAS published list).
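The publish-time guard for marketing batches can be sketched as a precondition plus a filter over recipients. The Recipient and MarketingConsent shapes are assumptions modelled on tenant.profile.postalAddress and RecipientPreferences.marketingConsent. Treating consent without a timestamp or source as ineligible is an assumption about how strictly Art. 7 evidence is enforced.

```typescript
// Sketch: shapes are assumptions modelled on the fields named in the text.

interface MarketingConsent { granted: boolean; grantedAt?: string; source?: string }
interface Recipient { id: string; marketingConsent?: MarketingConsent }

export interface BatchCheck { ok: boolean; reason?: string; eligible: Recipient[] }

export function checkMarketingBatch(postalAddress: string | undefined, recipients: Recipient[]): BatchCheck {
  // CAN-SPAM / CASL / PECR: refuse the whole batch without a physical postal address.
  if (!postalAddress?.trim()) return { ok: false, reason: "tenant postal address missing", eligible: [] };
  // Only recipients whose consent carries a timestamp and a source count as eligible.
  const eligible = recipients.filter(
    (r) => r.marketingConsent?.granted && r.marketingConsent.grantedAt && r.marketingConsent.source,
  );
  return { ok: true, eligible };
}
```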


14. Penetration test scope

This service is in scope for the platform's annual third-party pentest. Specific test cases:

  • HMAC bypass on each vendor webhook route.
  • Forced opt-out via guessable tokens.
  • Cross-tenant template read via path manipulation.
  • Cross-tenant suppression read.
  • Prompt injection via guest-controlled variables.
  • Brand impersonation via Sender configuration.
  • DKIM bypass via misconfigured tenant override.
  • Resend abuse (loops via webhook → resend).
  • WS feed cross-tenant subscription.

Findings are tracked in SERVICE_RISK_REGISTER with a severity and a time-bound remediation.