DEPLOYMENT_TOPOLOGY — reservation-service
Sibling: LOCAL_DEV_SETUP · OBSERVABILITY · FAILURE_MODES
Strategic anchors: 02 §12 GCP Reference Topology · 04 §11 Pub/Sub topology
reservation-service runs on Google Cloud Run (managed, regional) with the platform's standard NestJS base image. It ships as two Cloud Run services — the request-handling API and the hold-expiry worker — to keep their scaling and IAM boundaries independent.
1. Runtime
| Property | Value |
|---|---|
| Language | TypeScript |
| Runtime | Node 20 LTS |
| Framework | NestJS 10 (composition root only; domain framework-free) |
| Container base | gcr.io/melmastoon-platform/node-20-distroless:<sha> |
| Boot script | node --enable-source-maps dist/main.js (telemetry initialized first) |
| Health endpoints | GET /internal/health (liveness), GET /internal/ready (readiness) |
| Graceful shutdown | SIGTERM → drain in-flight HTTP and inbox handlers; max 30 s |
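The graceful-shutdown row can be illustrated with a small drain helper. This is a minimal sketch, not the service's actual shutdown code: `drainWithDeadline`, `DrainTask`, and `DrainResult` are hypothetical names, and the 30 s budget is taken from the table. The platform's own termination grace period may be shorter than 30 s, so the deadline is left as a parameter.

```typescript
// Hypothetical helper (not from the service code base): runs every drain task
// and reports whether they all finished before the deadline.
type DrainTask = () => Promise<void>;
type DrainResult = "drained" | "timed-out";

async function drainWithDeadline(
  tasks: DrainTask[],
  deadlineMs: number,
): Promise<DrainResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with "timed-out" once the drain budget is exhausted.
  const deadline = new Promise<DrainResult>((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), deadlineMs);
  });

  // Resolves with "drained" once every task has completed.
  // A task that rejects makes the whole call reject.
  const drained = Promise.all(tasks.map((task) => task())).then(
    (): DrainResult => "drained",
  );

  try {
    return await Promise.race([drained, deadline]);
  } finally {
    // Clear the timer so it does not keep the process alive after draining.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In a Node process this would typically be called from a SIGTERM handler with one task that stops accepting HTTP requests and one that stops the inbox handlers, passing 30000 as the deadline; both tasks are assumptions about how the service is structured.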
2. Cloud Run services
2.1 reservation-service (API + inbox handlers)
| Setting | Value |
|---|---|
| Region | me-central1 (primary) and asia-south1 (active for region-pinned tenants) |
| Min replicas | 3 per region (hot path; eliminates cold start during business hours) |
| Max replicas | 30 per region |
| Concurrency per instance | 80 |
| CPU | 2 vCPU (always-allocated) |
| Memory | 1 GiB |
| VPC connector | melmastoon-private-connector |
| Egress | private VPC (Cloud SQL, Memorystore, Pub/Sub via private endpoints) |
| Ingress | internal + load-balancer (Kong upstream) |
| Authentication | IAM (service-to-service); Pub/Sub push principal whitelisted; Cloud Scheduler principal whitelisted |
| Service account | reservation-svc@<project>.iam.gserviceaccount.com |
2.2 reservation-hold-expiry-worker (separate Cloud Run service)
| Setting | Value |
|---|---|
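| Schedule | Cloud Scheduler, every 30 s (expression as written: */30 * * * * *; Cloud Scheduler's unix-cron format has five fields and one-minute granularity, so the 30 s cadence needs a mechanism beyond a single cron expression) |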
| Trigger | HTTPS POST to /internal/jobs/expire-holds on a dedicated single-replica Cloud Run service (not job) — chosen for steady-state warm cache |
| Min replicas | 1 |
| Max replicas | 1 (single writer for the sweeper batch) |
| CPU | 1 vCPU (CPU-on-request) |
| Memory | 512 MiB |
| Service account | reservation-holds-sweeper@<project>.iam.gserviceaccount.com (RLS-bypass on reservation_holds only) |
The sweeper is intentionally separated so its IAM scope is narrower and its outage cannot pin API capacity. It always runs single-replica to avoid two sweepers contending for the same hold row.
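A single sweeper pass can be sketched against an in-memory list of holds. This is an illustration under stated assumptions, not the worker's implementation: the `Hold` shape, the `expireHolds` name, and the `batchLimit` default of 100 are hypothetical (the sizing rationale only says about 100 holds per pass is typical), and the real worker would issue the equivalent update against reservation_holds.

```typescript
// Hypothetical in-memory model of a hold row; field names are assumptions.
type HoldStatus = "active" | "expired";

interface Hold {
  id: string;
  expiresAt: number; // epoch milliseconds
  status: HoldStatus;
}

// Expires every active hold whose deadline has passed, up to batchLimit,
// and returns the ids it changed. Running it again with the same clock
// changes nothing, which is what makes a retried pass safe.
function expireHolds(holds: Hold[], nowMs: number, batchLimit = 100): string[] {
  const expiredIds: string[] = [];
  for (const hold of holds) {
    if (expiredIds.length >= batchLimit) break;
    if (hold.status === "active" && hold.expiresAt <= nowMs) {
      hold.status = "expired";
      expiredIds.push(hold.id);
    }
  }
  return expiredIds;
}
```

Because the pass only touches rows that are still active, a duplicate scheduler delivery is harmless; the single-replica constraint is about avoiding row contention, not about correctness of a repeated pass.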
3. Infrastructure dependencies
| Dependency | Provisioning |
|---|---|
| Cloud SQL Postgres 15 (HA primary + read replica) | Shared instance with other PMS-core services; schema reservation; per-service IAM database users |
| Memorystore (Redis 7) | Shared with PMS-core for hot caches; namespaced keys reservation:* |
| GCP Pub/Sub | One topic per produced subject; push subscriptions for inbox (delivered to /internal/events/*); DLQs per subscription; ordering enabled per <tenantId>:<aggregateId> ordering key |
| Cloud KMS | Per-tenant CMK ring melmastoon-tenants for guest field-level DEKs |
| Secret Manager | tenantSalt per tenant (HMAC for hash-for-search); no payment or lock secrets |
| Cloud Scheduler | reservation-hold-expiry-30s job |
| Cloud Storage | None (we hold only mediaId references) |
| VPC Service Controls | Service is in the melmastoon-prod-perimeter VPC-SC perimeter; egress to non-perimeter Pub/Sub blocked |
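The two key conventions in the table (the reservation:* cache namespace and the <tenantId>:<aggregateId> ordering key) can be captured in small helpers. This is a sketch only: `cacheKey` and `orderingKey` are hypothetical names, and rejecting a colon inside an id is an added assumption to keep keys unambiguous.

```typescript
// Hypothetical helpers; the service may build its keys differently.
const CACHE_NAMESPACE = "reservation";

// Rejects empty segments and segments containing ":" so that a key can
// always be split back into its parts (an assumption, not a documented rule).
function assertSegment(segment: string): void {
  if (segment.length === 0 || segment.includes(":")) {
    throw new Error(`invalid key segment: "${segment}"`);
  }
}

// Builds a Memorystore key under the reservation:* namespace.
function cacheKey(...parts: string[]): string {
  parts.forEach(assertSegment);
  return [CACHE_NAMESPACE, ...parts].join(":");
}

// Builds the Pub/Sub ordering key <tenantId>:<aggregateId>.
function orderingKey(tenantId: string, aggregateId: string): string {
  assertSegment(tenantId);
  assertSegment(aggregateId);
  return `${tenantId}:${aggregateId}`;
}
```

Keying order on tenant plus aggregate means events for one reservation are delivered in sequence while different reservations still fan out in parallel.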
4. Network topology
Internet ──► Kong (Cloud Run) ──► reservation-service (Cloud Run, internal+LB)
                                  │
                                  ├── Cloud SQL (private endpoint)
                                  ├── Memorystore (VPC connector)
                                  ├── Pub/Sub (private service connect)
                                  ├── Cloud KMS (private endpoint)
                                  └── Secret Manager (private endpoint)
Pub/Sub push ──► reservation-service `/internal/events/*` (IAM-gated)
Cloud Scheduler ──► reservation-hold-expiry-worker `/internal/jobs/expire-holds` (IAM-gated)
There is no direct public ingress to reservation-service. The only public surface is via bff-tenant-booking-service and bff-backoffice-service, fronted by Kong and Cloudflare.
5. Deploy & release
| Stage | Mechanism |
|---|---|
| Build | GitHub Actions → Cloud Build → distroless image; SBOM + Cosign signature attached |
| Image registry | Artifact Registry gcr.io/melmastoon-platform/reservation-service:<git-sha> |
| Migrations | drizzle-kit push runs as a Cloud Build step before Cloud Run revision rollout; backwards-compatible only |
| Canary | 5% traffic split for 30 minutes; abort and roll back on alert ladder (RESV-001..010) firing for 10 min |
| Rollback | gcloud run services update-traffic --to-revisions=<prev>=100; image stays in registry |
| Promotion | Manual gate from staging to prod; tagged release notes link to PRs |
Terraform module references: terraform/modules/cloud-run-service and terraform/modules/cloud-scheduler-job from melmastoon-infra.
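The canary rule in the table (5% for 30 minutes, roll back when an alert in the RESV-001..010 ladder has fired for 10 minutes) reduces to a small decision function. This is a sketch with hypothetical names (`CanaryState`, `evaluateCanary`); treating the end of the 30-minute window as "promote" is an assumption, since the table does not say what happens after a clean canary.

```typescript
// Hypothetical types; the real pipeline's state model is not documented here.
type CanaryDecision = "continue" | "rollback" | "promote";

interface CanaryState {
  elapsedMinutes: number; // time since the 5% split started
  alertFiringMinutes: number; // longest continuous firing of any ladder alert
}

// Rollback takes priority over promotion: an alert that has fired long
// enough aborts the canary even if the observation window has ended.
function evaluateCanary(
  state: CanaryState,
  windowMinutes = 30,
  abortAfterMinutes = 10,
): CanaryDecision {
  if (state.alertFiringMinutes >= abortAfterMinutes) return "rollback";
  if (state.elapsedMinutes >= windowMinutes) return "promote";
  return "continue";
}
```

A "rollback" decision corresponds to the update-traffic command in the Rollback row; the previous revision's image remains in the registry, so no rebuild is needed.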
6. Resource sizing rationale
- Min 3 replicas (API): the booking saga is a hot synchronous path; a single cold start would push p99 above the 5 s SLO. Three replicas survive a single-AZ blip and absorb burst from morning check-in spikes.
- Concurrency 80: demand on the Drizzle connection pool grows with concurrency × instances. At the 3-replica floor that is 80 × 3 = 240 in-flight requests per region; each instance caps its Postgres pool at 60 connections, and requests beyond that block until a connection frees up.
- Single sweeper: the hold-expiry batch is bounded (≤ 100 holds per pass typical) and idempotent; concurrency would only add coordination overhead.
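The sizing figures above can be checked with a few lines of arithmetic. The constants come from the tables in this document (concurrency 80, pool of 60 per instance, 3 to 30 replicas per region); the `sizing` function is a hypothetical helper, and the numbers are per region.

```typescript
// Values taken from the Cloud Run settings and the sizing rationale.
const concurrencyPerInstance = 80;
const poolPerInstance = 60;
const minReplicas = 3;
const maxReplicas = 30;

// Hypothetical helper: worst-case load for a given replica count in one region.
function sizing(replicas: number): {
  inFlightRequests: number;
  pooledConnections: number;
} {
  return {
    inFlightRequests: replicas * concurrencyPerInstance,
    pooledConnections: replicas * poolPerInstance,
  };
}

const atFloor = sizing(minReplicas); // 240 in-flight requests, 180 connections
const atCeiling = sizing(maxReplicas); // 2400 in-flight requests, 1800 connections
```

The ceiling figure is the one to compare against the connection limit of the shared Cloud SQL instance, since other PMS-core services draw from the same budget.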
7. Region & residency
- me-central1 (Doha): primary region; serves Afghan, Iranian (where lawful), GCC, Tajik tenants by default.
- asia-south1 (Mumbai): secondary; serves South Asia tenants.
- Tenant pinning is read from tenant.region; cross-region writes are blocked at the connection middleware. Cross-region reads are allowed only for audit-service and analytics-service.
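The pinning rules can be expressed as a single predicate. This is a minimal sketch of the policy as described, not the connection middleware itself: `isAccessAllowed`, the parameter names, and the idea that the caller is identified by a service-name string are all assumptions.

```typescript
// Regions and the cross-region read allowlist come from this section;
// everything else is a hypothetical shape for illustration.
type Region = "me-central1" | "asia-south1";
type Access = "read" | "write";

const CROSS_REGION_READERS: ReadonlySet<string> = new Set([
  "audit-service",
  "analytics-service",
]);

function isAccessAllowed(
  tenantRegion: Region, // value of tenant.region
  connectionRegion: Region, // region the request is executing in
  access: Access,
  callerService: string,
): boolean {
  // Same-region traffic is always allowed.
  if (tenantRegion === connectionRegion) return true;
  // Cross-region writes are never allowed.
  if (access === "write") return false;
  // Cross-region reads are limited to the allowlisted services.
  return CROSS_REGION_READERS.has(callerService);
}
```

Checking writes before the allowlist means even audit-service and analytics-service cannot write across regions.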
8. Cross-references
- GCP topology overview: 02 §12
- Pub/Sub naming and DLQs: 04 §11
- Secrets and KMS: SECURITY_MODEL §5
- Local stack: LOCAL_DEV_SETUP