Notification Service — Deployment Topology

Status: populated
Owner: SRE
Last updated: 2026-04-18

1. Runtime

  • Node.js 22 LTS, TypeScript 5.x in strict mode.
  • NestJS 10 + @nestjs/platform-fastify.
  • Image: distroless Node base, multi-stage build, runs as non-root.
  • Listen ports: 3030 (HTTP: health checks + internal API), 9464 (metrics).
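
The metrics port is kept separate from the API port. The HPA in section 2 scales on a lag metric, and this port is one place that metric could be exported from. The sketch below is a hypothetical illustration only. It renders `notif_nats_consumer_lag` per subject in the Prometheus text exposition format. The `subject` label, the subject names, and the HELP wording are assumptions. In a real service, a client library such as prom-client would normally generate this output.

```typescript
// Hypothetical sketch: per-subject NATS consumer lag rendered in the
// Prometheus text exposition format, as the 9464 metrics endpoint might serve it.
// The `subject` label, subject names, and HELP wording are assumptions.

type LagBySubject = Record<string, number>;

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderConsumerLag(lag: LagBySubject): string {
  const lines: string[] = [
    "# HELP notif_nats_consumer_lag Pending messages per durable consumer.",
    "# TYPE notif_nats_consumer_lag gauge",
  ];
  for (const [subject, pending] of Object.entries(lag)) {
    lines.push(`notif_nats_consumer_lag{subject="${escapeLabelValue(subject)}"} ${pending}`);
  }
  // The format requires every line, including the last, to end with "\n".
  return lines.join("\n") + "\n";
}

// Example with two hypothetical subjects.
const sample = renderConsumerLag({ "notif.email": 120, "notif.sms": 30 });
console.log(sample);
```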

2. Kubernetes

Setting | Value
--- | ---
Kind | Deployment
Replicas | 2 minimum; HPA scales between 2 and 4
HPA triggers | CPU utilization 70%; custom metric notif_nats_consumer_lag (target 2000, aggregated across all subjects)
Rolling update | maxSurge=1, maxUnavailable=0
PDB | minAvailable=1
Resources | Requests: 200m CPU / 256Mi memory; limits: 500m CPU / 512Mi memory
ServiceAccount | notification-service (used for Vault Kubernetes auth)
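
The replica range and the two triggers combine under the standard HPA rule:

- Each metric proposes ceil(currentReplicas × current / target).
- A ratio within the default 10% tolerance leaves the count unchanged.
- The largest proposal wins.
- The result is clamped to the 2 to 4 range.

The sketch below is a simplified model under stated assumptions:

- CPU utilization is measured against the 200m request.
- The lag metric is compared as one aggregate value against 2000.
- Stabilization windows and pod-readiness adjustments are ignored.

```typescript
// Simplified model of the HPA scaling rule for this Deployment.
// Assumptions: CPU is % of the 200m request; lag is one aggregate value.
// Stabilization windows and pod-readiness handling are deliberately omitted.

interface MetricSample {
  name: string;
  current: number; // observed value (CPU utilization %, or total lag)
  target: number; // configured target (70 for CPU, 2000 for lag)
}

const MIN_REPLICAS = 2;
const MAX_REPLICAS = 4;
const TOLERANCE = 0.1; // Kubernetes' default HPA tolerance

// Replica count proposed by a single metric.
function proposeReplicas(currentReplicas: number, metric: MetricSample): number {
  const ratio = metric.current / metric.target;
  // Close enough to target: keep the current count.
  if (Math.abs(ratio - 1) <= TOLERANCE) {
    return currentReplicas;
  }
  return Math.ceil(currentReplicas * ratio);
}

// The largest proposal wins, then the HPA bounds are applied.
function desiredReplicas(currentReplicas: number, metrics: MetricSample[]): number {
  const proposals = metrics.map((m) => proposeReplicas(currentReplicas, m));
  const wanted = Math.max(...proposals);
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, wanted));
}

// 2 pods, CPU at 35% of request, 5000 messages of total lag.
const next = desiredReplicas(2, [
  { name: "cpu", current: 35, target: 70 },
  { name: "notif_nats_consumer_lag", current: 5000, target: 2000 },
]);
console.log(next); // 4: lag proposes 5, clamped to the HPA maximum
```

In this example, CPU alone would propose 1 replica and lag would propose 5. The larger proposal is taken and then capped at the maximum of 4.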

3. Dependencies at Runtime

Dependency | Mode | Notes
--- | --- | ---
PostgreSQL | Primary writer + reader | Connection pool: 10 per pod
Redis | Cluster | Recipient cache only (notif:recipients:*, TTL 300 s)
NATS JetStream | 3-node cluster | 4 durable consumers (one per subject)
SendGrid API | External HTTPS | API key from Vault
sms-orchestrator | HTTP (internal K8s DNS) | SMS channel delivery
auth-service | HTTP (internal K8s DNS) | Recipient resolution
Vault | Sidecar | API keys and DB credentials
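
The Redis entry suggests a cache-aside pattern in front of auth-service recipient resolution. This is a minimal sketch assuming that pattern. The lookup checks `notif:recipients:<id>`; on a miss, it calls auth-service and stores the result for 300 s. An in-memory map with an injectable clock stands in for Redis, so the example runs on its own. The key suffix, the `Recipient` shape, and the resolver signature are all hypothetical.

```typescript
// Hypothetical cache-aside lookup for recipients (TTL 300 s).
// An in-memory map stands in for Redis; the key suffix, Recipient shape,
// and resolver signature are assumptions, not the service's real API.

interface Recipient {
  id: string;
  email: string;
}

const RECIPIENT_TTL_MS = 300_000; // 300 s, matching the Redis TTL

class InMemoryTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired, like a Redis key past its TTL
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Stand-in for the HTTP call to auth-service.
type ResolveRecipient = (id: string) => Promise<Recipient>;

async function getRecipient(
  id: string,
  cache: InMemoryTtlCache<Recipient>,
  resolveFromAuthService: ResolveRecipient,
): Promise<Recipient> {
  const key = `notif:recipients:${id}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // hit: no auth-service call
  const fresh = await resolveFromAuthService(id); // miss: ask auth-service
  cache.set(key, fresh, RECIPIENT_TTL_MS);
  return fresh;
}
```

Assume there is no explicit invalidation. In that case, a recipient change in auth-service can take up to 300 s to show up in notifications.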

4. Network Policy

  • Ingress: from admin-dashboard pods only, on port 3030.
  • Egress: NATS cluster, PostgreSQL, Redis, SendGrid (external), sms-orchestrator (internal), auth-service (internal).
  • Ingress from the Prometheus namespace is also allowed, for /metrics on port 9464.
  • No ingress from Kong or the external internet.
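
The ingress rules above map naturally onto a Kubernetes NetworkPolicy. The sketch below builds one as a plain `networking.k8s.io/v1` object in TypeScript and prints it as JSON, which `kubectl apply -f` accepts. Several details are hypothetical: the pod labels (`app: notification-service`, `app: admin-dashboard`), the namespace name `prometheus`, and the premise that admin-dashboard runs in the same namespace.

```typescript
// Hypothetical sketch: the ingress rules of section 4 as a NetworkPolicy.
// Labels, the "prometheus" namespace name, and admin-dashboard running in the
// same namespace are assumptions.

interface LabelSelector {
  matchLabels: Record<string, string>;
}

interface IngressRule {
  from: { podSelector?: LabelSelector; namespaceSelector?: LabelSelector }[];
  ports: { protocol: "TCP"; port: number }[];
}

const ingress: IngressRule[] = [
  {
    // admin-dashboard pods (same namespace) may reach the API port.
    from: [{ podSelector: { matchLabels: { app: "admin-dashboard" } } }],
    ports: [{ protocol: "TCP", port: 3030 }],
  },
  {
    // Any pod in the Prometheus namespace may scrape the metrics port.
    from: [{ namespaceSelector: { matchLabels: { "kubernetes.io/metadata.name": "prometheus" } } }],
    ports: [{ protocol: "TCP", port: 9464 }],
  },
];

const networkPolicy = {
  apiVersion: "networking.k8s.io/v1",
  kind: "NetworkPolicy",
  metadata: { name: "notification-service-ingress" },
  spec: {
    // Selects the service's own pods; listing Ingress isolates them, so any
    // inbound source not matched by a rule (e.g. Kong, the internet) is denied
    // unless another policy allows it.
    podSelector: { matchLabels: { app: "notification-service" } },
    policyTypes: ["Ingress"],
    ingress,
  },
};

// kubectl apply -f accepts this JSON directly.
console.log(JSON.stringify(networkPolicy, null, 2));
```

Egress is left out of the sketch. Standard NetworkPolicy matches peers by label or IP block, not by hostname. How the SendGrid rule is expressed therefore depends on how external egress is handled in this cluster.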

5. Scaling Model

  • Event volume is low relative to the SMS pipeline: notifications are a small fraction of total events.
  • CPU-bound on MJML-to-HTML compilation for email.
  • Throughput is limited by SendGrid and sms-orchestrator, not by this service.
  • Expected peak: ~1000 notifications/hour during the invoice-generation cron run.
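
Converting the expected peak to a per-second rate shows how light the load is. This assumes the load is spread evenly over the hour and across the 2 minimum replicas.

```latex
\frac{1000\ \text{notifications}}{3600\ \text{s}} \approx 0.28\ \text{notifications/s},
\qquad
\frac{0.28\ \text{notifications/s}}{2\ \text{pods}} \approx 0.14\ \text{notifications/s per pod}
```

That is roughly one notification every 7.2 s per pod.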

6. Rollout

  • Standard rolling update; no canary is needed for an internal-only service.
  • Template changes are deployed through admin-dashboard and take effect without a service restart.
  • Rollback: revert the Deployment's image tag.
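
Applying template changes without a restart implies that compiled output is not fixed at startup. One way this could work is sketched below, under two assumptions:

- Each send looks up the template's current version from wherever admin-dashboard stores it. This document does not specify where that is.
- Compiled HTML is cached per template ID and replaced when the version changes. The CPU-heavy MJML compilation noted in section 5 then runs once per version.

`TemplateSource`, `CompiledTemplateCache`, and the injected `compile` function are hypothetical. A real implementation would supply the MJML-to-HTML step as `compile`.

```typescript
// Hypothetical sketch: compiled templates cached per template ID and
// recompiled only when the version changes, so an admin-dashboard edit takes
// effect on the next send without a restart. Names are assumptions; `compile`
// stands in for the MJML-to-HTML step.

interface TemplateSource {
  id: string;
  version: number;
  mjml: string;
}

type Compile = (mjml: string) => string;

class CompiledTemplateCache {
  // One entry per template ID; older versions are replaced, not kept.
  private byId = new Map<string, { version: number; html: string }>();
  private readonly compile: Compile;

  constructor(compile: Compile) {
    this.compile = compile;
  }

  render(source: TemplateSource): string {
    const entry = this.byId.get(source.id);
    if (entry !== undefined && entry.version === source.version) {
      return entry.html; // unchanged template: reuse the compiled HTML
    }
    // New or updated template: pay the compilation cost once.
    const html = this.compile(source.mjml);
    this.byId.set(source.id, { version: source.version, html });
    return html;
  }
}
```

Keying on version rather than on a timer means a published change is picked up on the next send. The cost is one version lookup per send.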