
Notification Service — Deployment Topology

Status: populated
Owner: SRE
Last updated: 2026-04-18

1. Runtime

Node.js 22 LTS, TypeScript 5.x strict.
NestJS 10 + @nestjs/platform-fastify.
Image: distroless Node, multi-stage build, non-root.
Listen ports: 3030 HTTP (health + internal API), 9464 metrics.

2. Kubernetes

Object	Value
Kind	Deployment
Replicas	2 (min) / HPA 2–4
HPA triggers	CPU 70%, custom metric `notif_nats_consumer_lag` (target 2000 across subjects)
Rolling update	`maxSurge=1, maxUnavailable=0`
PDB	`minAvailable=1`
Resources	`requests: 200m CPU / 256Mi`; `limits: 500m CPU / 512Mi`
ServiceAccount	`notification-service` (Vault K8s auth)
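
To make the HPA row concrete, here is a minimal sketch of how the replica count might be derived from the two triggers. It follows the general Kubernetes HPA rule (scale by the ratio of observed to target value, take the largest proposal across metrics, clamp to the bounds) and omits stabilization windows and pod-readiness handling. The names `MetricSample` and `desiredReplicas` are hypothetical, the lag target is assumed to be a plain value target, and the 10% tolerance is the Kubernetes default, assumed unchanged here.

```typescript
// Simplified model of the HPA replica calculation for this Deployment.
// Bounds (2-4) and targets (CPU 70%, lag 2000) come from the table above;
// all type and function names are hypothetical.

interface MetricSample {
  name: string;
  current: number; // observed value
  target: number;  // configured HPA target
}

const MIN_REPLICAS = 2;
const MAX_REPLICAS = 4;
const TOLERANCE = 0.1; // Kubernetes default; assumed not overridden

function desiredReplicas(currentReplicas: number, metrics: MetricSample[]): number {
  const proposals = metrics.map((m) => {
    const ratio = m.current / m.target;
    // Within tolerance of the target: this metric proposes no change.
    if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics, the largest proposal wins.
  const wanted = proposals.length > 0 ? Math.max(...proposals) : currentReplicas;
  // Clamp to the configured replica bounds.
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, wanted));
}

// Example: CPU on target, consumer lag at three times its target.
const replicas = desiredReplicas(2, [
  { name: "cpu", current: 70, target: 70 },
  { name: "notif_nats_consumer_lag", current: 6000, target: 2000 },
]);
console.log(replicas); // 4 (lag proposes 6, clamped to the HPA maximum)
```

Because the largest proposal wins, a lag spike can scale the service out even while CPU sits at its target.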

3. Dependencies at Runtime

Dep	Mode	Notes
PostgreSQL	Primary writer + reader	Connection pool 10 per pod
Redis	Cluster	Recipient cache only (`notif:recipients:*`, TTL 300s)
NATS JetStream	3-node cluster	4 durable consumers (one per subject)
SendGrid API	External HTTPS	API key from Vault
sms-orchestrator	HTTP (internal K8s DNS)	For SMS channel delivery
auth-service	HTTP (internal K8s DNS)	Recipient resolution
Vault	Sidecar	API keys + DB credentials
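
The per-pod pool size has a cluster-wide consequence that is easy to overlook: the PostgreSQL connection budget depends on how many pods can exist at once. The sketch below works through that arithmetic using the pool size from this table and the HPA maximum and `maxSurge` from section 2. It assumes every pod opens its full pool; the function names are hypothetical, and the database's actual connection limit is not stated in this document.

```typescript
// Worst-case PostgreSQL connection count for this service.
// Inputs come from sections 2 and 3; function names are hypothetical.

const POOL_SIZE_PER_POD = 10; // "Connection pool 10 per pod"
const HPA_MAX_REPLICAS = 4;   // HPA 2-4
const MAX_SURGE = 1;          // rolling update maxSurge=1

// During a rolling update the Deployment may run up to replicas + maxSurge pods.
function peakPods(replicas: number, maxSurge: number): number {
  return replicas + maxSurge;
}

// Assumes each pod fully opens its pool.
function connections(pods: number, poolSize: number): number {
  return pods * poolSize;
}

const steadyStateMax = connections(HPA_MAX_REPLICAS, POOL_SIZE_PER_POD);
const rolloutPeak = connections(peakPods(HPA_MAX_REPLICAS, MAX_SURGE), POOL_SIZE_PER_POD);

console.log(steadyStateMax); // 40
console.log(rolloutPeak);    // 50
```

Sizing the database limit against the rollout peak rather than the steady-state figure avoids connection failures while a deployment is in progress at full scale.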

4. Network Policy

Ingress: admin-dashboard pod only (port 3030).
Egress: NATS cluster, PG, Redis, SendGrid (external), sms-orchestrator (internal), auth-service (internal).
/metrics accessible from Prometheus namespace.
No ingress from Kong or external internet.
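
The ingress rules above can be read as a small decision table, which is handy as a fixture when testing the real NetworkPolicy. The sketch below is only a model of that table, not the policy manifest itself. It assumes `/metrics` is served on the metrics port 9464 from section 1; the `app` label, the `monitoring` namespace name, and all identifiers are hypothetical.

```typescript
// Ingress decision table for section 4.
// Only the ports and the allowed callers come from the document;
// label values and namespace names are assumptions.

interface Caller {
  namespace: string;
  app: string; // value of a hypothetical `app` pod label
}

const HTTP_PORT = 3030;    // health + internal API
const METRICS_PORT = 9464; // assumed to serve /metrics

const ADMIN_DASHBOARD_APP = "admin-dashboard"; // assumed label value
const PROMETHEUS_NAMESPACE = "monitoring";     // assumed namespace name

function isIngressAllowed(caller: Caller, port: number): boolean {
  if (port === HTTP_PORT) return caller.app === ADMIN_DASHBOARD_APP;
  if (port === METRICS_PORT) return caller.namespace === PROMETHEUS_NAMESPACE;
  return false; // default deny, which covers Kong and external traffic
}

console.log(isIngressAllowed({ namespace: "apps", app: "admin-dashboard" }, 3030)); // true
console.log(isIngressAllowed({ namespace: "gateway", app: "kong" }, 3030));         // false
```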

5. Scaling Model

Event volume is low relative to the SMS pipeline (notifications are a small fraction of total events).
CPU-bound on MJML-to-HTML compilation for email.
SendGrid and sms-orchestrator are the throughput limits, not this service.
Expected peak: ~1000 notifications/hour during invoice generation cron run.
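
To put the peak figure in per-pod terms, the following sketch converts it to a rate. It assumes the load is spread evenly over the hour and evenly across the minimum two replicas; a real cron-driven burst is likely to be more concentrated. The function names are hypothetical.

```typescript
// Back-of-the-envelope rate conversion for the expected peak.
// 1000/hour and the 2-replica minimum come from this document.

const PEAK_PER_HOUR = 1000;
const MIN_REPLICAS = 2;

function perSecond(perHour: number): number {
  return perHour / 3600;
}

// Assumes an even split of work across replicas.
function perPodPerSecond(perHour: number, replicas: number): number {
  return perSecond(perHour) / replicas;
}

const clusterRate = perSecond(PEAK_PER_HOUR);                 // about 0.28 notifications/s
const podRate = perPodPerSecond(PEAK_PER_HOUR, MIN_REPLICAS); // about 0.14 notifications/s
const secondsPerNotification = 1 / podRate;                   // about 7.2 s between notifications per pod

console.log(clusterRate.toFixed(2), podRate.toFixed(2), secondsPerNotification.toFixed(1));
```

At roughly one notification every seven seconds per pod, the minimum replica count covers the stated peak with a wide margin, which is consistent with the downstream providers being the throughput limit.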

6. Rollout

Standard rolling update (no canary needed for internal-only service).
Template changes deployed via admin-dashboard without service restart.
Rollback: revert Deployment image tag.
