Notification Service — Deployment Topology
Status: populated
Owner: SRE
Last updated: 2026-04-18
1. Runtime
- Node.js 22 LTS, TypeScript 5.x strict.
- NestJS 10 + @nestjs/platform-fastify.
- Image: distroless Node, multi-stage build, non-root.
- Listen ports: 3030 HTTP (health + internal API), 9464 metrics.
2. Kubernetes
| Object | Value |
|---|---|
| Kind | Deployment |
| Replicas | 2 (min) / HPA 2–4 |
| HPA triggers | CPU 70%, custom metric notif_nats_consumer_lag (target 2000 across subjects) |
| Rolling update | maxSurge=1, maxUnavailable=0 |
| PDB | minAvailable=1 |
| Resources | requests: 200m CPU / 256Mi; limits: 500m CPU / 512Mi |
| ServiceAccount | notification-service (Vault K8s auth) |
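
How the two HPA triggers combine can be sketched with the replica formula from the Kubernetes HPA documentation: each metric proposes `ceil(currentReplicas * current / target)`, the largest proposal wins, and the result is clamped to the 2–4 range. This is an illustration only. It assumes the lag metric is evaluated as a plain current/target ratio and that the controller uses its default 10% tolerance; neither is stated in this document, and the function and type names are hypothetical.

```typescript
// Sketch of the HPA replica calculation, not the controller's actual code.
interface MetricSample {
  name: string;
  current: number; // observed value (CPU % of request, or lag in messages)
  target: number;  // configured target for that metric
}

const MIN_REPLICAS = 2; // from the Replicas row
const MAX_REPLICAS = 4; // from the Replicas row
const TOLERANCE = 0.1;  // assumption: HPA controller default tolerance

function desiredReplicas(currentReplicas: number, metrics: MetricSample[]): number {
  const proposals = metrics.map((m) => {
    const ratio = m.current / m.target;
    // Within tolerance, this metric proposes no change.
    if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // The largest proposal wins, then the HPA bounds are applied.
  const wanted = Math.max(...proposals);
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, wanted));
}

// Low CPU but a lag spike: lag alone drives the scale-out, capped at 4.
const scaled = desiredReplicas(2, [
  { name: "cpu", current: 35, target: 70 },
  { name: "notif_nats_consumer_lag", current: 5000, target: 2000 },
]);
console.log(scaled);
```

Because the largest proposal wins, a consumer-lag spike scales the Deployment out even when CPU is idle, which is the reason to pair the two triggers.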
3. Dependencies at Runtime
| Dep | Mode | Notes |
|---|---|---|
| PostgreSQL | Primary writer + reader | Connection pool 10 per pod |
| Redis | Cluster | Recipient cache only (notif:recipients:*, TTL 300s) |
| NATS JetStream | 3-node cluster | 4 durable consumers (one per subject) |
| SendGrid API | External HTTPS | API key from Vault |
| sms-orchestrator | HTTP (internal K8s DNS) | For SMS channel delivery |
| auth-service | HTTP (internal K8s DNS) | Recipient resolution |
| Vault | Sidecar | API keys + DB credentials |
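
The per-pod pool size combines with the replica settings from section 2 to give an upper bound on PostgreSQL connections. The sketch below is a back-of-envelope calculation, assuming the pool of 10 is the total per pod (writer and reader combined) and that every pod fills its pool; the function name is hypothetical.

```typescript
const POOL_PER_POD = 10;    // from the PostgreSQL row
const HPA_MAX_REPLICAS = 4; // from the Replicas row in section 2
const MAX_SURGE = 1;        // from the Rolling update row in section 2

// Upper bound on open PostgreSQL connections for a given pod count.
function pgConnectionCeiling(replicas: number, surgePods: number, poolPerPod: number): number {
  return (replicas + surgePods) * poolPerPod;
}

// Steady state at the HPA maximum.
const steadyCeiling = pgConnectionCeiling(HPA_MAX_REPLICAS, 0, POOL_PER_POD);
// During a rolling update one extra pod exists (maxSurge=1, maxUnavailable=0).
const rolloutCeiling = pgConnectionCeiling(HPA_MAX_REPLICAS, MAX_SURGE, POOL_PER_POD);

console.log({ steadyCeiling, rolloutCeiling });
```

The rollout figure is the one to compare against the database's connection limit, since the surge pod opens its pool before an old pod is terminated.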
4. Network Policy
- Ingress: admin-dashboard pod only (port 3030).
- Egress: NATS cluster, PG, Redis, SendGrid (external), sms-orchestrator (internal), auth-service (internal).
- /metrics (port 9464) is accessible from the Prometheus namespace.
- No ingress from Kong or external internet.
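
The ingress rules above could be expressed as a Kubernetes NetworkPolicy along the following lines, built here as a TypeScript object and serialized to JSON, which Kubernetes accepts for manifests. This is a sketch, not the deployed policy. Every label and the policy name are hypothetical, the admin-dashboard pod is assumed to run in the same namespace as this service, and the Prometheus namespace is assumed to be named `prometheus`. Egress rules are omitted.

```typescript
// Hypothetical labels and names throughout; substitute the cluster's real ones.
const ingressPolicy = {
  apiVersion: "networking.k8s.io/v1",
  kind: "NetworkPolicy",
  metadata: { name: "notification-service-ingress" },
  spec: {
    podSelector: { matchLabels: { app: "notification-service" } },
    policyTypes: ["Ingress"],
    ingress: [
      {
        // Internal API: only the admin-dashboard pod, only port 3030.
        from: [{ podSelector: { matchLabels: { app: "admin-dashboard" } } }],
        ports: [{ protocol: "TCP", port: 3030 }],
      },
      {
        // Metrics scrape: pods in the Prometheus namespace, only port 9464.
        from: [
          {
            namespaceSelector: {
              matchLabels: { "kubernetes.io/metadata.name": "prometheus" },
            },
          },
        ],
        ports: [{ protocol: "TCP", port: 9464 }],
      },
    ],
  },
};

// JSON manifest text, suitable for review or for piping to a deploy tool.
const manifest = JSON.stringify(ingressPolicy, null, 2);
console.log(manifest);
```

Anything not matched by these two rules, including traffic from Kong, is dropped once a policy with `policyTypes: ["Ingress"]` selects the pod.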
5. Scaling Model
- Event volume is low relative to the SMS pipeline (notifications are a small fraction of total events).
- CPU-bound on MJML-to-HTML compilation for email.
- SendGrid and sms-orchestrator are the throughput limits, not this service.
- Expected peak: ~1000 notifications/hour during the invoice-generation cron run.
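
To put the expected peak in per-second terms, the conversion below is a rough sketch. It assumes the load is spread evenly across the hour and across pods, which a cron-driven burst will not do exactly, so treat the result as an average rather than an instantaneous rate.

```typescript
const PEAK_PER_HOUR = 1000; // from the expected-peak bullet
const MIN_REPLICAS = 2;     // from section 2

// Average rates, assuming an even spread over the hour and over pods.
const perSecondTotal = PEAK_PER_HOUR / 3600;
const perSecondPerPod = perSecondTotal / MIN_REPLICAS;
const secondsBetweenNotifications = 3600 / PEAK_PER_HOUR;

console.log(perSecondTotal.toFixed(2), perSecondPerPod.toFixed(2), secondsBetweenNotifications);
```

At well under one notification per second on average, even the minimum replica count leaves headroom, consistent with the downstream providers being the throughput limit.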
6. Rollout
- Standard rolling update (no canary needed for an internal-only service).
- Template changes are deployed via admin-dashboard without a service restart.
- Rollback: revert Deployment image tag.