Billing Service — Deployment Topology

Status: populated | Owner: SRE | Last updated: 2026-04-18

1. Runtime

  • Node.js 22 LTS, TypeScript 5.x strict.
  • NestJS 10 + @nestjs/platform-fastify.
  • Image: distroless Node, multi-stage build, non-root.
  • Listen ports: 3020 HTTP, 9464 metrics.

2. Kubernetes

| Object         | Value                                                              |
|----------------|--------------------------------------------------------------------|
| Kind           | Deployment                                                         |
| Replicas       | 2 (min) / HPA 2–6                                                  |
| HPA triggers   | CPU 70%, custom metric billing_nats_consumer_lag (target 5000)     |
| Rolling update | maxSurge=1, maxUnavailable=0                                       |
| PDB            | minAvailable=1                                                     |
| Resources      | requests: 300m CPU / 512Mi; limits: 1 CPU / 1Gi                    |
| ServiceAccount | billing-service (Vault K8s auth)                                   |
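
To make the interaction between the two HPA triggers and the 2–6 replica bounds concrete, here is a minimal sketch of the replica calculation as the Kubernetes documentation describes it: each metric proposes ceil(currentReplicas × current / target) and the largest proposal wins. It is a simplified model, not the controller itself: it ignores the tolerance band, stabilization windows, and pod readiness. The `MetricReading` type, the function name, and the assumption that the lag metric is averaged per pod are illustrative only.

```typescript
// Simplified model of the HorizontalPodAutoscaler replica calculation.
// Assumption: both metrics are expressed as a per-pod average.

interface MetricReading {
  name: string;
  current: number; // observed value
  target: number; // target value, same unit as `current`
}

const MIN_REPLICAS = 2; // HPA lower bound from the table above
const MAX_REPLICAS = 6; // HPA upper bound from the table above

function desiredReplicas(currentReplicas: number, metrics: MetricReading[]): number {
  // Each metric proposes a replica count; the largest proposal is used.
  const proposals = metrics.map((m) => Math.ceil(currentReplicas * (m.current / m.target)));
  const wanted = Math.max(MIN_REPLICAS, ...proposals);
  // Clamp to the configured maximum.
  return Math.min(MAX_REPLICAS, wanted);
}

// Example: CPU is well under target, but consumer lag is 2.4x its target.
const scaledOut = desiredReplicas(2, [
  { name: "cpu_utilization_percent", current: 35, target: 70 },
  { name: "billing_nats_consumer_lag", current: 12000, target: 5000 },
]);
console.log(scaledOut);
```

In this example the lag metric alone drives the scale-out, which matches the scaling model in section 6: the ingest path scales on consumer lag while CPU stays low.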

3. Dependencies at Runtime

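| Dep                 | Mode                                      | Notes                                                   |
|---------------------|-------------------------------------------|---------------------------------------------------------|
| PostgreSQL          | Primary writer + read replica for queries | Connection pool 15 per pod                              |
| Redis               | Cluster                                   | Pricing cache only; key billing:pricing:*               |
| NATS JetStream      | 3-node cluster                            | Durable consumer billing-consumer on BILLING_EVENTS     |
| S3-compatible store | HTTP SDK                                  | Invoice PDF storage; bucket ghasi-invoices              |
| Vault               | Sidecar                                   | DB credentials, S3 credentials, NATS credentials        |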
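
The per-pod pool size combines with the replica bounds from section 2 to give a PostgreSQL connection budget. The arithmetic below is a rough sketch under two assumptions not stated in this document: that 15 is the total pool per pod (not 15 per database endpoint), and that the invoice CronJob pod is counted separately. The function and constant names are illustrative.

```typescript
// Rough PostgreSQL connection budget for the Deployment pods only.
const POOL_PER_POD = 15; // from the dependency table

function connectionBudget(replicas: number, poolPerPod: number = POOL_PER_POD): number {
  return replicas * poolPerPod;
}

const atMinReplicas = connectionBudget(2); // HPA lower bound
const atMaxReplicas = connectionBudget(6); // HPA upper bound
// With maxSurge=1 and maxUnavailable=0, a rollout at full scale
// briefly runs one extra pod alongside the existing six.
const duringRolloutAtMax = connectionBudget(6 + 1);

console.log({ atMinReplicas, atMaxReplicas, duringRolloutAtMax });
```

Under these assumptions the Deployment needs between 30 and 90 connections in steady state and up to 105 during a rollout at maximum scale.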

4. Kong Routes

| Route                                  | Upstream             | Scope            |
|----------------------------------------|----------------------|------------------|
| GET /v1/billing/usage                  | billing-service:3020 | billing:read     |
| GET /v1/billing/invoices               | billing-service:3020 | billing:read     |
| GET /v1/billing/invoices/{id}/download | billing-service:3020 | billing:read     |
| POST /v1/admin/invoices/{id}/void      | billing-service:3020 | platform.finance |
| * /v1/admin/pricing                    | billing-service:3020 | billing:admin    |
| * /v1/admin/operator-costs             | billing-service:3020 | billing:admin    |
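
The route table can also be expressed as data, which is handy for contract tests that check a request maps to the expected scope. The sketch below is only a model of the table, not Kong configuration: Kong does the real matching. It assumes `*` means any HTTP method, `{id}` matches exactly one path segment, and paths match exactly (no prefix matching). `RouteRule`, `requiredScope`, and the sample id `inv_123` are hypothetical.

```typescript
interface RouteRule {
  method: string; // "*" matches any HTTP method
  path: string; // "{name}" matches exactly one path segment
  scope: string;
}

const ROUTES: RouteRule[] = [
  { method: "GET", path: "/v1/billing/usage", scope: "billing:read" },
  { method: "GET", path: "/v1/billing/invoices", scope: "billing:read" },
  { method: "GET", path: "/v1/billing/invoices/{id}/download", scope: "billing:read" },
  { method: "POST", path: "/v1/admin/invoices/{id}/void", scope: "platform.finance" },
  { method: "*", path: "/v1/admin/pricing", scope: "billing:admin" },
  { method: "*", path: "/v1/admin/operator-costs", scope: "billing:admin" },
];

// Convert a route template into an anchored regular expression.
function toRegExp(path: string): RegExp {
  const pattern = path
    .split("/")
    .map((seg) =>
      /^\{[^}]+\}$/.test(seg)
        ? "[^/]+" // placeholder: one non-empty segment
        : seg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // literal segment, escaped
    )
    .join("/");
  return new RegExp("^" + pattern + "$");
}

// Return the scope required for a request, or undefined if no route matches.
function requiredScope(method: string, path: string): string | undefined {
  const wanted = method.toUpperCase();
  const rule = ROUTES.find(
    (r) => (r.method === "*" || r.method === wanted) && toRegExp(r.path).test(path)
  );
  return rule?.scope;
}

console.log(requiredScope("GET", "/v1/billing/invoices/inv_123/download"));
```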

5. Invoice Cron

Kubernetes CronJob in the same namespace:

  • Schedule: 5 0 1 * * (00:05 UTC on 1st of each month)
  • Image: same billing-service image with CMD ["node", "dist/cmd/generate-invoices.js"]
  • restartPolicy: OnFailure; backoffLimit: 2
  • concurrencyPolicy: Forbid (a new run is skipped while the previous run is still in progress, which prevents double invoicing)
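
For alerting or dashboards it can help to compute when the next run of the `5 0 1 * *` schedule is due. The helper below is a small sketch that hard-codes this one schedule rather than parsing cron syntax, and it assumes the schedule is evaluated in UTC, as stated above. The function name is illustrative.

```typescript
// Next occurrence of "00:05 UTC on the 1st of the month", strictly after `now`.
function nextInvoiceRun(now: Date): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  // Candidate: 00:05 UTC on the 1st of the current month.
  const candidate = new Date(Date.UTC(year, month, 1, 0, 5, 0, 0));
  if (candidate.getTime() > now.getTime()) {
    return candidate;
  }
  // Otherwise the 1st of next month; Date.UTC rolls month 12 over to January.
  return new Date(Date.UTC(year, month + 1, 1, 0, 5, 0, 0));
}

console.log(nextInvoiceRun(new Date("2026-04-18T12:00:00Z")).toISOString());
```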

6. Scaling Model

  • Ingest path (NATS consumer): I/O-bound on PostgreSQL and Redis; scales with consumer lag.
  • REST API: lightweight; minimal scaling needed.
  • Invoice cron: single-pod, not latency-sensitive; bounded by number of accounts.

7. Rollout

  • Canary: 10% of traffic for 30 min (REST API path only; the NATS consumer cannot be canaried).
  • Promote if: ingestion error rate < 0.1% and REST API P95 ≤ 300 ms.
  • Rollback: revert the image tag on both the Deployment and the invoice CronJob.
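
The promotion criteria can be written as a single predicate, for example as a gate in a deploy pipeline. This is a minimal sketch: the `CanaryMetrics` shape and field names are assumptions, and how the two values are measured over the 30-minute window is left to the monitoring stack.

```typescript
interface CanaryMetrics {
  ingestionErrorRate: number; // fraction of failed events, e.g. 0.0005 = 0.05%
  restP95Ms: number; // REST API P95 latency in milliseconds
}

const MAX_INGESTION_ERROR_RATE = 0.001; // 0.1%, exclusive bound
const MAX_REST_P95_MS = 300; // inclusive bound

// True only when both promotion criteria hold.
function shouldPromote(m: CanaryMetrics): boolean {
  return m.ingestionErrorRate < MAX_INGESTION_ERROR_RATE && m.restP95Ms <= MAX_REST_P95_MS;
}

console.log(shouldPromote({ ingestionErrorRate: 0.0005, restP95Ms: 280 }));
```

Note the asymmetry in the bounds: an error rate of exactly 0.1% fails the gate, while a P95 of exactly 300 ms passes.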