DEPLOYMENT_TOPOLOGY — pricing-service
Sibling: OBSERVABILITY · SECURITY_MODEL · FAILURE_MODES
Strategic anchors: 02 §12 Deployment · ADR-0001 Core stack
The pricing-service is a stateless Node.js (NestJS, TypeScript) workload deployed on GCP. It runs as a containerised service on Cloud Run for the public/admin HTTP surface and as a separate deployment on GKE Autopilot for long-running consumers (Pub/Sub pull subscribers, outbox relay, cron). Both surfaces share the same image and differ only in the role selected at startup (see section 3). We never run on a non-GCP cloud, and we never process pricing data in serverless edge runtimes (state proximity matters too much).
1. Environments
| Env | Region | Project | Purpose |
|---|---|---|---|
| dev | me-central2 | melmastoon-dev | shared developer environment |
| staging | me-central2 | melmastoon-staging | release-candidate, full data shape |
| prod | me-central2 (primary) + europe-west4 (DR read replica) | melmastoon-prod | live |
| local | n/a | n/a | docker-compose; see LOCAL_DEV_SETUP |
Each cloud environment (dev, staging, prod) has its own Cloud SQL instance, Memorystore, Pub/Sub topics, and KMS keyring. No cross-env access.
2. Topology
          ┌─────────────────────────┐
          │   Cloud Armor + GCLB    │
          └────────────┬────────────┘
                       │ TLS 1.3
          ┌────────────▼────────────┐
          │  API Gateway (Apigee)   │ → JWT validation, rate-limit, X-Tenant-Id
          └────┬─────────────┬──────┘
┌──────────────▼─────┐  ┌────▼────────────────────┐
│ pricing-public     │  │ pricing-admin           │  Cloud Run services
│ (Cloud Run, scale  │  │ (Cloud Run, scale 1-20) │
│  2-200, vCPU 2,    │  │  vCPU 1, mem 1 Gi)      │
│  mem 2 Gi)         │  │                         │
└────────┬───────────┘  └────────┬────────────────┘
         │                       │
         │  mTLS (workload identity)
         │                       │
         ▼                       ▼
┌─────────────────────────────────────────┐
│ Cloud SQL Postgres 16 HA (regional)     │
│ schemas: pricing, pricing_quote         │
└─────────────────────────────────────────┘
         ▲                       ▲
         │                       │
┌────────┴───────────────────────┴────────┐
│ Memorystore Redis 7 (HA)                │
└─────────────────────────────────────────┘
         ▲
         │
┌────────┴────────────────────────────────┐
│ GKE Autopilot - pricing-workers ns:     │
│  • outbox-relay (Pub/Sub publisher)     │
│  • inbox-consumer (4 subscriptions)     │
│  • quote-expiry-cron (every 5 m)        │
│  • fx-refresh-cron (hourly)             │
│  • dynamic-suggestion-batch (nightly)   │
└─────────────────────────────────────────┘
         ▲
         │
┌────────┴────────────┐
│ Pub/Sub topics:     │
│ melmastoon.pricing.*│
└─────────────────────┘
Internal service-mesh: GKE Anthos Service Mesh (Istio); Cloud Run services participate via Cloud Run + ASM integration. mTLS is STRICT mesh-wide.
3. Container image
- Base: `gcr.io/distroless/nodejs20-debian12`
- Built via Cloud Build and signed with Sigstore cosign; Binary Authorization enforces the signature at deploy time.
- SBOM (CycloneDX) generated and attached on every build.
- Architecture: linux/amd64 only, no multi-arch manifest (both Cloud Run and our GKE Autopilot workloads run on amd64).
- Image size target: < 180 MB compressed.
Two roles share a single ENTRYPOINT in the image:
ENTRYPOINT ["node", "dist/main.js"]
# Role selected at startup via env: SERVICE_ROLE = http | worker
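
The role switch described above could be handled at the top of `dist/main.js` roughly as follows. This is a minimal sketch, not the service's actual bootstrap: only `SERVICE_ROLE` and its two values come from this document, while `parseServiceRole`, `bootstrap`, and the fail-fast behaviour on an unknown role are assumptions made for illustration.

```typescript
// Hypothetical role dispatcher. Only SERVICE_ROLE and the values
// "http" | "worker" come from the document; all other names are assumed.
type ServiceRole = "http" | "worker";

// Validates the raw env value. Assumption: an unset or unknown role is a
// deployment error, so we fail fast instead of silently defaulting.
function parseServiceRole(raw: string | undefined): ServiceRole {
  if (raw === "http" || raw === "worker") {
    return raw;
  }
  throw new Error(`SERVICE_ROLE must be "http" or "worker", got: ${String(raw)}`);
}

type Starter = () => Promise<void>;

// Picks the starter for the configured role and runs it.
// The starters are injected so the HTTP app and the worker loop stay
// decoupled from the dispatch logic.
async function bootstrap(
  env: Record<string, string | undefined>,
  starters: Record<ServiceRole, Starter>,
): Promise<ServiceRole> {
  const role = parseServiceRole(env["SERVICE_ROLE"]);
  await starters[role]();
  return role;
}
```

In this sketch the Cloud Run services would set `SERVICE_ROLE=http` and the GKE workloads `SERVICE_ROLE=worker`; a typo in either manifest surfaces as a crash at startup rather than as a container running the wrong role.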
4. Cloud Run services
pricing-public (handles /v1/pricing/* and /internal/v1/quotes:*)
| Setting | Value |
|---|---|
| Min instances | 2 |
| Max instances | 200 |
| Concurrency per instance | 80 |
| CPU | 2 vCPU (always allocated) |
| Memory | 2 Gi |
| Timeout | 10 s |
| Ingress | internal+lb (gateway only) |
| VPC connector | yes (to reach Cloud SQL via Private IP, Memorystore, AI orchestrator) |
| Service account | pricing-public@melmastoon-prod.iam.gserviceaccount.com |
| Startup CPU boost | enabled |
| Health probes | startup /v1/livez, liveness /v1/livez, readiness /v1/readyz |
pricing-admin (handles /v1/admin/*)
| Setting | Value |
|---|---|
| Min instances | 1 |
| Max instances | 20 |
| Concurrency | 40 |
| CPU | 1 vCPU |
| Memory | 1 Gi |
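
The scaling settings in the two tables imply a floor and a ceiling on in-flight requests per service (instances × concurrency). The sketch below only works through that arithmetic; the `CloudRunScaling` type and function name are illustrative, and the numbers are copied from the tables above.

```typescript
// Illustrative type; the field values below are taken from the tables above.
interface CloudRunScaling {
  name: string;
  minInstances: number;
  maxInstances: number;
  concurrency: number; // max concurrent requests per instance
}

const pricingPublic: CloudRunScaling = {
  name: "pricing-public",
  minInstances: 2,
  maxInstances: 200,
  concurrency: 80,
};

const pricingAdmin: CloudRunScaling = {
  name: "pricing-admin",
  minInstances: 1,
  maxInstances: 20,
  concurrency: 40,
};

// warm:    requests that can be served without a cold start (min instances)
// ceiling: requests that can be in flight once fully scaled out
function concurrentRequestBounds(s: CloudRunScaling): { warm: number; ceiling: number } {
  return {
    warm: s.minInstances * s.concurrency,
    ceiling: s.maxInstances * s.concurrency,
  };
}

// pricing-public: warm 160, ceiling 16000
// pricing-admin:  warm 40,  ceiling 800
```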
5. GKE Autopilot — pricing-workers namespace
| Workload | Replicas | CPU req | Mem req | Notes |
|---|---|---|---|---|
| outbox-relay | 3 | 250 m | 512 Mi | leader-elects via Postgres advisory lock; only one active publisher per shard |
| inbox-consumer | 4 | 500 m | 1 Gi | each replica pulls from all 4 inbox subscriptions; Pub/Sub load-balances messages across replicas |
| quote-expiry-cron | 1 | 250 m | 512 Mi | CronJob `*/5 * * * *` |
| fx-refresh-cron | 1 | 250 m | 512 Mi | CronJob `5 * * * *` (hourly, offset 5 m) |
| dynamic-suggestion-batch | 1 | 500 m | 1 Gi | CronJob `0 2 * * *` (02:00 me-central2) |
All workloads use Workload Identity bound to the pricing-workers@… Google service account.
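
The outbox-relay leader election mentioned in the table can be sketched with Postgres's `pg_try_advisory_lock`, which returns immediately with `true` or `false` instead of blocking. This is an illustration under assumptions, not the relay's actual code: the `SqlSession` interface, the lock namespace `4242`, and numeric shard ids are all hypothetical.

```typescript
// Hypothetical minimal session abstraction; a real Postgres client would be
// adapted to it. The session must stay pinned to ONE connection, because
// session-level advisory locks are released when that connection closes.
interface SqlSession {
  query(
    sql: string,
    params: readonly unknown[],
  ): Promise<{ rows: ReadonlyArray<Record<string, unknown>> }>;
}

// Assumed namespace so relay locks do not collide with other advisory-lock users.
const OUTBOX_RELAY_LOCK_NAMESPACE = 4242;

interface LockQuery {
  sql: string;
  params: readonly number[];
}

// Builds the non-blocking lock attempt for one shard, using the two-integer
// form of the advisory lock key: (namespace, shard).
function shardLockQuery(shard: number): LockQuery {
  if (!Number.isInteger(shard) || shard < 0) {
    throw new RangeError(`shard must be a non-negative integer, got ${shard}`);
  }
  return {
    sql: "SELECT pg_try_advisory_lock($1::int, $2::int) AS acquired",
    params: [OUTBOX_RELAY_LOCK_NAMESPACE, shard],
  };
}

// Returns true if this replica now holds the shard lock and may publish;
// false means another replica is the active publisher for that shard.
async function tryBecomeShardLeader(session: SqlSession, shard: number): Promise<boolean> {
  const { sql, params } = shardLockQuery(shard);
  const result = await session.query(sql, params);
  return result.rows[0]?.["acquired"] === true;
}
```

Replicas that fail to acquire the lock would simply retry on a timer. If the leader's pod dies, its connection closes, Postgres releases the lock, and a standby acquires it on its next attempt.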
6. Data plane
| Component | Spec |
|---|---|
| Cloud SQL Postgres 16 | db-custom-8-32768 (8 vCPU, 32 Gi) primary; HA enabled; 2 read replicas in me-central2; cross-region read replica in europe-west4 for DR |
| Memorystore Redis 7 | Standard tier, 5 Gi, HA, AUTH enabled |
| Pub/Sub | per-aggregate topics (see EVENT_SCHEMAS); CMEK enabled; per-topic DLQ subscription |
| GCS | bucket gs://melmastoon-prod-pricing-events/ for BigQuery sink staging; lifecycle rule: delete after 90 d |
| BigQuery | dataset melmastoon_prod.pricing for event mirror + analytics |
| Cloud KMS | keyring melmastoon-prod-pricing with keys db, redis, pubsub, gcs; auto-rotation 365 d |
7. Networking
- VPC `melmastoon-prod-vpc`, subnet `me-central2-pricing-subnet` (`10.40.16.0/20`).
- Cloud SQL via Private IP only; no public IP.
- Memorystore reachable only inside VPC.
- Egress to AI orchestrator over Private Service Connect.
- Egress to FX provider through a NAT with a static IP, allow-listed by the provider.
- Cloud Armor policy `pricing-edge` with WAF rules: SQLi, XSS, generic header validation, geo-allow `AF, TJ, IR, AE, QA, SA, OM, KW, EU, US` (admin only).
8. CI/CD
- GitHub Actions in `melmastoon/platform-monorepo`, workflow `services/pricing-service/.github/workflows/ci.yml`.
- Path filter: any change under `services/pricing-service/**` or `packages/pricing-engine/**`.
- Stages: `lint → typecheck → unit → property → integration → contracts → openapi-conformance → image-build → trivy-scan → push → terraform-plan → cloud-deploy`.
- Promotion: `dev` (auto on merge to `main`), `staging` (auto with smoke tests), `prod` (manual approval after 24h staging soak).
- Rollback: Cloud Run revision rollback in < 30 s; GKE workers rollback via `kubectl rollout undo`.
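
The promotion rules above can be read as a small gate function. The sketch below is one possible reading, not the pipeline's real implementation: `ReleaseState` and `canPromote` are hypothetical names, and it assumes that "auto with smoke tests" means smoke tests must pass before staging counts as promoted, and that the 24h soak is measured from the staging deploy time.

```typescript
type TargetEnv = "dev" | "staging" | "prod";

// Hypothetical snapshot of a release candidate's progress through the pipeline.
interface ReleaseState {
  mergedToMain: boolean;
  smokeTestsPassed: boolean;
  stagingDeployedAt: Date | null; // null until the candidate reaches staging
  manuallyApproved: boolean;
}

// 24 h staging soak, expressed in milliseconds.
const STAGING_SOAK_MS = 24 * 60 * 60 * 1000;

// Returns true if the candidate may be promoted to `target` at time `now`.
function canPromote(target: TargetEnv, state: ReleaseState, now: Date): boolean {
  switch (target) {
    case "dev":
      // Automatic on merge to main.
      return state.mergedToMain;
    case "staging":
      // Automatic, gated on smoke tests (assumed interpretation).
      return state.mergedToMain && state.smokeTestsPassed;
    case "prod": {
      // Manual approval, and only after the soak period has elapsed.
      if (state.stagingDeployedAt === null) {
        return false;
      }
      const soakedMs = now.getTime() - state.stagingDeployedAt.getTime();
      return soakedMs >= STAGING_SOAK_MS && state.manuallyApproved;
    }
  }
}
```

Passing `now` in explicitly, rather than calling `new Date()` inside the function, keeps the gate deterministic and easy to unit-test.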
9. Disaster recovery
| Asset | RPO | RTO | Strategy |
|---|---|---|---|
| Cloud SQL primary | 5 m | 15 m | Cross-region read replica + automated promotion runbook |
| Pub/Sub | 0 (in-region replicated) | n/a | Multi-zone by default |
| Application image | 0 | 5 m | Multi-region Artifact Registry mirror |
| Configuration | 0 | 5 m | Terraform state in GCS bucket with versioning |
| KMS keys | 0 | n/a | Multi-region keyrings |
Quarterly DR drill executes the failover runbook against staging.
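
A drill is only useful if its measurements are compared against the targets in the table. The following sketch shows one way that comparison might be automated. The type names, `missedTargets`, and the idea of recording measurements per asset are assumptions; only the target numbers come from the table, and rows whose RTO is n/a are left out.

```typescript
// Targets copied from the DR table (rows with a numeric RTO only).
interface RecoveryTarget {
  asset: string;
  rpoMinutes: number;
  rtoMinutes: number;
}

// Hypothetical shape of what a drill records for each asset.
interface DrillResult {
  asset: string;
  dataLossMinutes: number; // observed data-loss window
  recoveryMinutes: number; // observed time to restore service
}

const RECOVERY_TARGETS: readonly RecoveryTarget[] = [
  { asset: "Cloud SQL primary", rpoMinutes: 5, rtoMinutes: 15 },
  { asset: "Application image", rpoMinutes: 0, rtoMinutes: 5 },
  { asset: "Configuration", rpoMinutes: 0, rtoMinutes: 5 },
];

// Returns one human-readable line per missed target; an empty array means
// every measured asset met both its RPO and its RTO.
function missedTargets(
  results: readonly DrillResult[],
  targets: readonly RecoveryTarget[] = RECOVERY_TARGETS,
): string[] {
  const misses: string[] = [];
  for (const result of results) {
    const target = targets.find((t) => t.asset === result.asset);
    if (target === undefined) {
      misses.push(`${result.asset}: no target defined`);
      continue;
    }
    if (result.dataLossMinutes > target.rpoMinutes) {
      misses.push(`${result.asset}: RPO ${target.rpoMinutes} m exceeded (${result.dataLossMinutes} m)`);
    }
    if (result.recoveryMinutes > target.rtoMinutes) {
      misses.push(`${result.asset}: RTO ${target.rtoMinutes} m exceeded (${result.recoveryMinutes} m)`);
    }
  }
  return misses;
}
```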