Billing Service — Deployment Topology
Status: populated
Owner: SRE
Last updated: 2026-04-18
1. Runtime
- Node.js 22 LTS, TypeScript 5.x strict.
- NestJS 10 + @nestjs/platform-fastify.
- Image: distroless Node, multi-stage build, non-root.
- Listen ports: 3020 HTTP, 9464 metrics.
2. Kubernetes
| Object | Value |
|---|---|
| Kind | Deployment |
| Replicas | 2 (min) / HPA 2–6 |
| HPA triggers | CPU 70%, custom metric billing_nats_consumer_lag (target 5000) |
| Rolling update | maxSurge=1, maxUnavailable=0 |
| PDB | minAvailable=1 |
| Resources | requests: 300m CPU / 512Mi; limits: 1 CPU / 1Gi |
| ServiceAccount | billing-service (Vault K8s auth) |
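
The two HPA triggers in the table can be read through the standard Kubernetes HPA rule: desired replicas = ceil(current replicas × current metric / target metric), with the largest proposal across metrics winning. The sketch below applies that rule with the 2–6 bounds from the table. The 10% tolerance is the Kubernetes default and is assumed here, as is treating billing_nats_consumer_lag as a plain value target; the names MetricReading and desiredReplicas are illustrative only.

```typescript
// Sketch of the standard HPA replica calculation, clamped to the 2–6 range above.
// ASSUMPTIONS: default 10% tolerance; both metrics compared as current/target.
const MIN_REPLICAS = 2;
const MAX_REPLICAS = 6;
const TOLERANCE = 0.1;

interface MetricReading {
  current: number; // observed value (average CPU % or consumer lag)
  target: number;  // target from the table (70 for CPU, 5000 for lag)
}

function clampReplicas(n: number): number {
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, n));
}

function desiredReplicas(currentReplicas: number, readings: MetricReading[]): number {
  if (readings.length === 0) return clampReplicas(currentReplicas);
  const proposals = readings.map(({ current, target }) => {
    const ratio = current / target;
    // Within tolerance, this metric proposes no change.
    if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics, the largest proposal wins.
  return clampReplicas(Math.max(...proposals));
}
```

For example, with 2 replicas at 35% CPU and a consumer lag of 12000, the CPU metric proposes 1 replica and the lag metric proposes 5, so the result is 5.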
3. Dependencies at Runtime
| Dep | Mode | Notes |
|---|---|---|
| PostgreSQL | Primary writer + read replica for queries | Connection pool 15 per pod |
| Redis | Cluster | Pricing cache only; key billing:pricing:* |
| NATS JetStream | 3-node cluster | Durable consumer billing-consumer on BILLING_EVENTS |
| S3-compatible store | HTTP SDK | Invoice PDF storage; bucket ghasi-invoices |
| Vault | Sidecar | DB credentials, S3 credentials, NATS credentials |
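
As a rough capacity check, the per-pod pool size can be multiplied by the pod counts implied by the HPA range and the rolling-update surge in section 2. This is a back-of-the-envelope sketch only: it assumes the pool of 15 is the total per pod (the table does not say whether the primary and the read replica each get their own pool) and it ignores the invoice cron pod.

```typescript
// Back-of-the-envelope PostgreSQL connection budget.
// ASSUMPTION: 15 is the total pool per pod; the cron pod is not counted.
const POOL_PER_POD = 15;
const HPA_MIN_PODS = 2;
const HPA_MAX_PODS = 6;
const ROLLOUT_SURGE = 1; // maxSurge=1 from the rolling update settings

function pgConnections(pods: number): number {
  return pods * POOL_PER_POD;
}

const steadyMin = pgConnections(HPA_MIN_PODS);                 // 30
const steadyMax = pgConnections(HPA_MAX_PODS);                 // 90
const rolloutPeak = pgConnections(HPA_MAX_PODS + ROLLOUT_SURGE); // 105
```

Under these assumptions, the database would need headroom for roughly 105 connections from this service during a rollout at full scale.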
4. Kong Routes
| Route | Upstream | Scope |
|---|---|---|
| GET /v1/billing/usage | billing-service:3020 | billing:read |
| GET /v1/billing/invoices | billing-service:3020 | billing:read |
| GET /v1/billing/invoices/{id}/download | billing-service:3020 | billing:read |
| POST /v1/admin/invoices/{id}/void | billing-service:3020 | platform.finance |
| * /v1/admin/pricing | billing-service:3020 | billing:admin |
| * /v1/admin/operator-costs | billing-service:3020 | billing:admin |
5. Invoice Cron
Kubernetes CronJob in the same namespace:
- Schedule: 5 0 1 * * (00:05 UTC on the 1st of each month)
- Image: same billing-service image with CMD ["node", "dist/cmd/generate-invoices.js"]
- restartPolicy: OnFailure; backoffLimit: 2
- concurrencyPolicy: Forbid (prevents an overlapping run if the previous one has not finished)
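
Because the job fires just after midnight UTC on the 1st, a natural reading is that each run invoices the calendar month that has just ended. The source does not state this, so the sketch below is an assumption, and the names BillingPeriod and previousMonthPeriod are hypothetical rather than taken from generate-invoices.js.

```typescript
// ASSUMPTION: a run at 00:05 UTC on the 1st bills the previous calendar month.
interface BillingPeriod {
  start: Date; // inclusive, first instant of the previous month (UTC)
  end: Date;   // exclusive, first instant of the current month (UTC)
}

function previousMonthPeriod(runAt: Date): BillingPeriod {
  const year = runAt.getUTCFullYear();
  const month = runAt.getUTCMonth(); // 0-based
  // Date.UTC normalizes month -1 to December of the previous year.
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1));
  return { start, end };
}

const period = previousMonthPeriod(new Date("2026-01-01T00:05:00Z"));
console.log(period.start.toISOString(), period.end.toISOString());
```

Deriving the period from the run timestamp rather than from "now minus 30 days" keeps a retried run (restartPolicy: OnFailure) on the same month boundaries as the first attempt, as long as the retry happens within the same month.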
6. Scaling Model
- Ingest path (NATS consumer): I/O-bound on PG + Redis; scales with consumer lag.
- REST API: lightweight; minimal scaling needed.
- Invoice cron: single-pod, not latency-sensitive; bounded by number of accounts.
7. Rollout
- Canary: 10% traffic for 30 min (REST API path only; NATS consumer not canary-able).
- Promote if: ingestion error rate < 0.1%, REST API P95 ≤ 300 ms.
- Rollback: revert the Deployment image tag and the CronJob image tag.
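
The promotion criteria above can be expressed as a simple gate evaluated at the end of the 30-minute canary window. This is a minimal sketch: the CanaryMetrics shape and the shouldPromote name are hypothetical, and how the two values are collected (for example from the metrics port) is left out.

```typescript
// Sketch of the canary promotion gate; thresholds come from the Rollout list.
interface CanaryMetrics {
  ingestionErrorRate: number; // fraction of failed ingestions, 0.001 = 0.1%
  restP95Ms: number;          // REST API P95 latency in milliseconds
}

const MAX_INGESTION_ERROR_RATE = 0.001; // promote only if strictly below 0.1%
const MAX_REST_P95_MS = 300;            // promote only if at or below 300 ms

function shouldPromote(m: CanaryMetrics): boolean {
  return (
    m.ingestionErrorRate < MAX_INGESTION_ERROR_RATE &&
    m.restP95Ms <= MAX_REST_P95_MS
  );
}
```

Note the asymmetry carried over from the criteria: the error-rate bound is strict, while the latency bound is inclusive.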