DEPLOYMENT_TOPOLOGY — billing-service

All workloads run on Google Cloud Platform in the same region as the tenant data plane. The service ships as five runtime artifacts built from one repository: two Cloud Run services and three Cloud Run Jobs.

1. Workloads

| Workload | Runtime | Purpose | Min / Max replicas | Concurrency |
|---|---|---|---|---|
| billing-api | Cloud Run service | REST API + event consumers (in-process subscribers) | 3 / 50 | 80 |
| billing-outbox-drainer | Cloud Run service | Drains _outbox → Pub/Sub for both per-tenant and central schemas | 2 / 10 | 1 (single-flight per shard) |
| billing-tenant-migrator | Cloud Run Job | Per-tenant schema provisioning + forward DDL on tenant.created.v1 and on release | invoked per tenant | n/a |
| billing-subscription-cycle | Cloud Run Job (Cloud Scheduler) | Monthly subscription billing cycle per tenant fanout | invoked monthly + retry | n/a |
| billing-cash-analytics-job | Cloud Run Job (Cloud Scheduler) | Nightly daily reconciliation + AI cash pattern detection | nightly | n/a |

2. Region & data residency

  • Primary region: asia-south1 (Mumbai) — same as Cloud SQL primary.
  • DR region: europe-west3 (Frankfurt) — Cloud SQL cross-region replica + Cloud Run secondary deployments held in standby.
  • Cross-region failover is operator-driven; RTO ≤ 15 min, RPO ≤ 1 min.
  • Per-tenant data residency overrides (e.g., a Saudi tenant requesting me-central1) are reserved for v2 — see SERVICE_RISK_REGISTER.

3. Networking

  • All Cloud Run services share a single VPC connector and serverless VPC access.
  • Cloud SQL reachable via Private IP only (no public IP).
  • Pub/Sub via private Google access.
  • mTLS within the VPC for service-to-service (payment-gateway-service, iam-service, notification-service, file-storage-service, ai-orchestrator-service).

4. Containers

  • Base image: distroless gcr.io/distroless/nodejs22-debian12.
  • Multi-stage Dockerfile: install deps with pnpm install --frozen-lockfile, build tsc, prune dev deps, ship dist/.
  • Image size target: ≤ 200 MB.
  • SBOM produced at build (syft); grype scan blocks on critical CVEs.
  • Image signed with cosign; Cloud Run admission policy verifies signature.

5. Configuration

  • Configuration is supplied through environment variables; secret values are stored in Secret Manager and accessed through Workload Identity:
    • DB_HOST, DB_PORT, DB_USER, DB_PASSWORD (via Secret Manager)
    • PUBSUB_PROJECT, PUBSUB_OUTBOX_TOPICS (CSV)
    • IAM_JWKS_URL, IAM_STEPUP_AUDIENCE=billing-service
    • AI_ORCHESTRATOR_URL, AI_BUDGET_DEFAULT_QPS
    • FILE_STORAGE_BUCKET_PATTERN=billing-invoices-{tenantId}
    • OBSERVABILITY_OTLP_ENDPOINT
  • Tenant-specific config lives in tenant-service (tenant.settings.billing.*) and is fetched per request through the TenantSettingsClient with a 5-minute Memorystore cache.
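
The environment-driven configuration above could be validated once at startup. The following is a minimal sketch, not the service's actual loader: the BillingConfig shape and the loadConfig, required, and invoiceBucketFor helpers are hypothetical names, only a subset of the listed variables is covered, and the environment is passed in as a plain object so the code stays independent of how the variables are injected.

```typescript
// Sketch only: BillingConfig, loadConfig, required and invoiceBucketFor are
// hypothetical names, not taken from the billing-service codebase.
type Env = Record<string, string | undefined>;

interface BillingConfig {
  db: { host: string; port: number; user: string; password: string };
  pubsub: { project: string; outboxTopics: string[] };
  iam: { jwksUrl: string; stepUpAudience: string };
  fileStorageBucketPattern: string;
}

// Fail fast when a variable is missing or blank.
function required(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

function loadConfig(env: Env): BillingConfig {
  const port = Number(required(env, "DB_PORT"));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("DB_PORT must be an integer between 1 and 65535");
  }
  return {
    db: {
      host: required(env, "DB_HOST"),
      port,
      user: required(env, "DB_USER"),
      password: required(env, "DB_PASSWORD"),
    },
    pubsub: {
      project: required(env, "PUBSUB_PROJECT"),
      // PUBSUB_OUTBOX_TOPICS is a CSV list; trim entries and drop empties.
      outboxTopics: required(env, "PUBSUB_OUTBOX_TOPICS")
        .split(",")
        .map((topic) => topic.trim())
        .filter((topic) => topic.length > 0),
    },
    iam: {
      jwksUrl: required(env, "IAM_JWKS_URL"),
      stepUpAudience: required(env, "IAM_STEPUP_AUDIENCE"),
    },
    fileStorageBucketPattern: required(env, "FILE_STORAGE_BUCKET_PATTERN"),
  };
}

// Expands the {tenantId} placeholder of FILE_STORAGE_BUCKET_PATTERN.
function invoiceBucketFor(pattern: string, tenantId: string): string {
  return pattern.split("{tenantId}").join(tenantId);
}
```

In a Node.js process this would typically be called as loadConfig(process.env) during bootstrap, so that a misconfigured revision fails before it accepts traffic.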

6. Resource sizing

| Workload | CPU | Memory | Notes |
|---|---|---|---|
| billing-api | 2 vCPU | 2 GiB | PDF render shares the request worker; cap at 2 concurrent renders per instance |
| billing-outbox-drainer | 1 vCPU | 512 MiB | one shard per per-tenant schema chunk + 1 shard for central |
| billing-tenant-migrator | 1 vCPU | 512 MiB | runs to completion |
| billing-subscription-cycle | 2 vCPU | 1 GiB | parallel fanout per region with bounded worker pool (16) |
| billing-cash-analytics-job | 2 vCPU | 1 GiB | nightly window |
| Cloud SQL Postgres 16 | 8 vCPU / 32 GiB starter, scale on IO | dedicated SSD | HA + cross-region replica + PITR 7d |
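
The cap of 2 concurrent PDF renders per billing-api instance could be enforced in-process with a counting semaphore. This is a minimal sketch under that assumption; the Semaphore class and the renderLimiter name are hypothetical and do not come from the service code.

```typescript
// Sketch only: Semaphore and renderLimiter are hypothetical names.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new Error("permits must be a positive integer");
    }
    this.available = permits;
  }

  // Runs the task once a permit is free and always returns the permit.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    // No permit free: queue until a running task releases one.
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }
}

// One limiter per instance, sized to the cap from the sizing table.
const renderLimiter = new Semaphore(2);
```

A request handler would wrap the render call as renderLimiter.run(() => renderInvoicePdf(invoice)), where renderInvoicePdf stands in for whatever render function the service uses. Other requests on the same instance keep being served while renders queue.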

7. Release & deployment

  • Trunk-based on main; feature branches → PR → merge queue.
  • Cloud Build pipeline: lint → typecheck → unit → application → integration → contract → build image → cosign sign → push → deploy to staging → smoke E2E → manual gate to prod.
  • Prod deployment uses Cloud Run revision traffic split for canary: 5% → 25% → 100% with 10-min soak between steps; an SLO burn alert auto-rolls-back.
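  • The per-tenant migrator runs before the API revision is rolled out when a migration adds columns or constraints. Only backward-compatible migrations are allowed, following the standard "expand → cleanup" pattern across 2 releases: the first release adds the new schema elements, and a later release removes the ones no longer in use.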
  • Cloud Scheduler jobs paused during release; resumed once billing-api revision reaches 100%.
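
The canary progression described above can be modelled as a small loop over traffic steps. The sketch below is illustrative only: RolloutDeps and runCanary are hypothetical names, the three callbacks stand in for whatever the pipeline uses to shift Cloud Run revision traffic, wait, and query the SLO burn alert, and rolling back is assumed to mean returning the new revision to 0% traffic.

```typescript
// Sketch only: the interfaces and runCanary are hypothetical, not pipeline code.
interface CanaryStep {
  trafficPercent: number;
  soakMinutes: number;
}

// 5% -> 25% -> 100% with a 10-minute soak between steps.
const CANARY_STEPS: readonly CanaryStep[] = [
  { trafficPercent: 5, soakMinutes: 10 },
  { trafficPercent: 25, soakMinutes: 10 },
  { trafficPercent: 100, soakMinutes: 0 },
];

interface RolloutDeps {
  setTraffic(percent: number): Promise<void>; // shift traffic to the new revision
  soak(minutes: number): Promise<void>;       // wait out the soak window
  sloBurnAlertFiring(): Promise<boolean>;     // query the SLO burn alert
}

type RolloutResult =
  | { outcome: "promoted" }
  | { outcome: "rolled_back"; atPercent: number };

async function runCanary(
  deps: RolloutDeps,
  steps: readonly CanaryStep[] = CANARY_STEPS,
): Promise<RolloutResult> {
  for (const step of steps) {
    await deps.setTraffic(step.trafficPercent);
    if (step.soakMinutes > 0) {
      await deps.soak(step.soakMinutes);
    }
    // Assumption: a firing alert at any step sends all traffic back to the old revision.
    if (await deps.sloBurnAlertFiring()) {
      await deps.setTraffic(0);
      return { outcome: "rolled_back", atPercent: step.trafficPercent };
    }
  }
  return { outcome: "promoted" };
}
```

Injecting the three operations keeps the progression logic testable without touching a real deployment.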

8. Pub/Sub topology

| Topic | Partitions | Subscribers | Retention | DLQ |
|---|---|---|---|---|
| melmastoon.billing.folio | tenant-attribute partitioning | analytics, reporting, audit, sync, bff-backoffice | 7 d | yes |
| melmastoon.billing.invoice | same | notification, analytics, reporting, audit | 7 d | yes |
| melmastoon.billing.cash_drawer | same | audit, reporting, bff-backoffice, notification | 7 d | yes |
| melmastoon.billing.subscription | tenant + global | tenant-service, notification, analytics, reporting, audit | 7 d | yes |
| melmastoon.billing.usage | tenant | analytics, reporting, billing | 3 d | yes |
| <topic>.dlq | n/a | on-call manual triage | 14 d | n/a |
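
The topology above can be expressed as typed data so that publishers and tooling share one definition. This is a sketch under assumptions: the TopicSpec shape, the helper names, and the tenantId attribute key are hypothetical, and "tenant-attribute partitioning" is read here as attaching the tenant identifier as a message attribute.

```typescript
// Sketch only: TopicSpec, TOPICS, dlqTopicFor and toPublishRequest are hypothetical.
interface TopicSpec {
  retentionDays: number;
  hasDlq: boolean;
}

const TOPICS: Record<string, TopicSpec> = {
  "melmastoon.billing.folio": { retentionDays: 7, hasDlq: true },
  "melmastoon.billing.invoice": { retentionDays: 7, hasDlq: true },
  "melmastoon.billing.cash_drawer": { retentionDays: 7, hasDlq: true },
  "melmastoon.billing.subscription": { retentionDays: 7, hasDlq: true },
  "melmastoon.billing.usage": { retentionDays: 3, hasDlq: true },
};

function specFor(topic: string): TopicSpec {
  const spec = TOPICS[topic];
  if (spec === undefined) {
    throw new Error(`Unknown topic: ${topic}`);
  }
  return spec;
}

// Dead-letter topics follow the <topic>.dlq naming convention from the table.
function dlqTopicFor(topic: string): string {
  if (!specFor(topic).hasDlq) {
    throw new Error(`Topic has no DLQ: ${topic}`);
  }
  return `${topic}.dlq`;
}

interface PublishRequest {
  topic: string;
  data: string;
  attributes: Record<string, string>;
}

// Assumption: the tenant is carried in a "tenantId" message attribute.
function toPublishRequest(topic: string, tenantId: string, payload: unknown): PublishRequest {
  specFor(topic); // reject topics that are not part of the topology
  return { topic, data: JSON.stringify(payload), attributes: { tenantId } };
}
```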

9. Health checks

  • /healthz — liveness; returns 200 if process is up.
  • /readyz — readiness; checks Cloud SQL connectivity, Pub/Sub publisher init, Memorystore reachability; returns 503 on degraded dependency to drain traffic.
  • /version — returns git sha + build time.
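
The /readyz behaviour can be sketched as a function that runs every dependency check and maps the combined result to a status code. The names below (DependencyCheck, evaluateReadiness) are hypothetical; each check is assumed to reject when its dependency is unreachable.

```typescript
// Sketch only: the types and evaluateReadiness are hypothetical names.
type DependencyCheck = () => Promise<void>; // rejects when the dependency is degraded
type CheckState = "ok" | "failed";

interface ReadinessReport {
  status: 200 | 503;
  checks: Record<string, CheckState>;
}

async function evaluateReadiness(
  checks: Record<string, DependencyCheck>,
): Promise<ReadinessReport> {
  // Run all checks concurrently; a failing check never aborts the others.
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, CheckState]> => {
      try {
        await check();
        return [name, "ok"];
      } catch {
        return [name, "failed"];
      }
    }),
  );

  const report: Record<string, CheckState> = {};
  for (const [name, state] of entries) {
    report[name] = state;
  }
  // Any degraded dependency yields 503 so the instance is drained.
  const ready = entries.every(([, state]) => state === "ok");
  return { status: ready ? 200 : 503, checks: report };
}
```

The handler for /readyz would pass checks for Cloud SQL, the Pub/Sub publisher, and Memorystore, then return the report body with the computed status. A per-check timeout is omitted here for brevity and would likely be needed in practice.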

10. Auto-scaling

  • billing-api scales on concurrency (target 60) and CPU > 60%.
  • billing-outbox-drainer scales on outbox lag custom metric (target ≤ 10 s).
  • Cloud Scheduler jobs scale internally via worker pool.
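
Scaling a job internally through a worker pool amounts to processing a list of tenants with a fixed number of concurrent workers. The sketch below shows one way to do that; mapWithWorkerPool is a hypothetical helper, and the pool size of 16 mirrors the bounded pool noted for billing-subscription-cycle in the sizing table.

```typescript
// Sketch only: mapWithWorkerPool is a hypothetical helper, not service code.
async function mapWithWorkerPool<T, R>(
  items: readonly T[],
  poolSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(poolSize) || poolSize < 1) {
    throw new Error("poolSize must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  const runner = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const index = nextIndex;
      nextIndex += 1;
      results[index] = await worker(items[index] as T);
    }
  };

  const runnerCount = Math.min(poolSize, items.length);
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i += 1) {
    runners.push(runner());
  }
  // Rejects on the first worker failure; retry policy is left to the caller.
  await Promise.all(runners);
  return results;
}
```

A billing cycle run could then call mapWithWorkerPool(tenantIds, 16, billTenant), where billTenant is a placeholder for the per-tenant billing step. Results come back in the same order as the input list.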

11. Cost guardrails

  • Per-tenant Cloud SQL storage budget alert at 80% of plan-included quota.
  • Pub/Sub publish cost dashboard split per topic.
  • Cloud Run min-instance bill capped via FinOps review; emergency override via runbook.

12. Cross-references