property-service — DEPLOYMENT_TOPOLOGY
Companion: SERVICE_OVERVIEW · OBSERVABILITY · SECURITY_MODEL · FAILURE_MODES · ../../docs/02-enterprise-architecture.md §GCP Reference Topology
This document describes how property-service is built, packaged, deployed, scaled, and isolated on GCP. It is binding for platform SRE and the service team.
Cloud: GCP only. Desktop client referenced by sync is Electron, never Tauri.
1. Runtime
| Aspect | Value |
|---|---|
| Language / runtime | Node.js 20 LTS |
| Framework | NestJS 10 (HTTP + microservices module for Pub/Sub) |
| Container base | gcr.io/distroless/nodejs20-debian12 |
| Compute | Cloud Run (regional) for HTTP + sync internal RPC; Cloud Run Jobs for migrations and the nightly tenant-isolation auditor |
| Min/max instances | min 2 (warm path), max 40 per region |
| Concurrency | 80 per instance |
| CPU / memory | 1 vCPU / 1 GiB (read-dominant); 2 vCPU / 2 GiB for the photo+AI fan-in path |
| Startup CPU boost | enabled |
| Request timeout | 30 s (60 s for POST /properties/:id/rooms/bulk) |
| Egress | VPC connector (Serverless VPC Access) into the platform shared VPC |
The service runs on a dedicated Cloud Run service per region. There is no Kubernetes hosting today; the AsyncAPI- and OpenAPI-derived clients are portable, so a future move to GKE is a non-breaking infrastructure change.
2. Regions & Multi-Region Posture
| Tenant residency | Primary region | Read replica | Standby DR |
|---|---|---|---|
| AF / TJ / IR | me-central1 (Doha) | me-central1 HA replica | europe-west4 warm standby |
| EU | europe-west4 (Eemshaven) | europe-west4 HA replica | me-central1 warm standby |
| US (future) | us-central1 | us-central1 HA replica | europe-west4 warm standby |
- Routing. tenant-service stamps tenantResidency; the global API gateway routes the request to the matching region.
- Cross-region calls. Forbidden on the read/write path. Background reconciliation jobs may cross regions for analytics export only.
- DR mode. A regional outage triggers Cloud DNS failover to the warm standby Cloud Run service; the warm Cloud SQL replica is promoted manually via runbook (RTO 30 min, RPO ≤ 5 min).
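The residency table and routing rules above can be sketched as a small lookup. This is an illustration only: the type and function names (`Residency`, `RegionPlan`, `resolveRegion`, `assertSameRegion`) are hypothetical and do not describe the gateway's actual implementation.

```typescript
// Hypothetical names; the region values come from the residency table above.
type Residency = "AF" | "TJ" | "IR" | "EU" | "US";

interface RegionPlan {
  primary: string; // region that serves reads and writes
  standby: string; // warm standby, used only in DR mode
}

const MIDDLE_EAST: RegionPlan = { primary: "me-central1", standby: "europe-west4" };

const REGION_PLAN: Record<Residency, RegionPlan> = {
  AF: MIDDLE_EAST,
  TJ: MIDDLE_EAST,
  IR: MIDDLE_EAST,
  EU: { primary: "europe-west4", standby: "me-central1" },
  US: { primary: "us-central1", standby: "europe-west4" }, // future residency
};

// Pick the serving region; the standby is chosen only when DR mode is on.
function resolveRegion(residency: Residency, drMode = false): string {
  const plan = REGION_PLAN[residency];
  return drMode ? plan.standby : plan.primary;
}

// Guard for the "no cross-region calls on the read/write path" rule.
function assertSameRegion(servingRegion: string, targetRegion: string): void {
  if (servingRegion !== targetRegion) {
    throw new Error(`cross-region call blocked: ${servingRegion} -> ${targetRegion}`);
  }
}
```

Keeping the mapping in one table makes the residency rule easy to review alongside the document.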
3. Dependencies
| Dependency | Connection | Region scope |
|---|---|---|
| Cloud SQL Postgres 15 + PostGIS | Cloud SQL Auth Proxy (IAM auth) over Private IP | per-residency region; HA enabled |
| Memorystore Redis 7 | Private VPC, AUTH+TLS | same region |
| Pub/Sub | Workload Identity, regional Pub/Sub Lite topics for high-volume domain events; standard Pub/Sub for control-plane | per-region |
| iam-service (JWKS, OPA bundle) | HTTPS to internal endpoint | same region (with global mirror) |
| file-storage-service | HTTPS internal RPC | same region |
| ai-orchestrator-service | HTTPS internal RPC | same region (the orchestrator decides further model-region routing) |
| geo-service | HTTPS internal RPC | same region |
| Secret Manager | Workload Identity | global resource, regionally cached |
| Cloud Logging / Trace / Monitoring | OTel SDK → OTLP collector sidecar (in-cluster) → Cloud Operations | regional |
4. Service Account & IAM
Runtime SA: property-service@melmastoon.iam.gserviceaccount.com.
Granted roles (least privilege):
- roles/cloudsql.client (Cloud SQL Auth Proxy)
- roles/cloudsql.instanceUser (IAM DB auth)
- roles/redis.viewer + custom melmastoon.memorystore.connect
- roles/pubsub.publisher on melmastoon.property.* topics
- roles/pubsub.subscriber on the consumer subscriptions listed in EVENT_SCHEMAS
- roles/secretmanager.secretAccessor on the property-* secret set
- roles/cloudtrace.agent, roles/logging.logWriter, roles/monitoring.metricWriter
- roles/iam.serviceAccountTokenCreator (signing internal JWTs for outbound calls, limited to the iam-service audience)
Explicitly not granted: any *.editor, *.admin, roles/storage.* (photo bytes go through file-storage-service), roles/aiplatform.* (Vertex AI access goes through ai-orchestrator-service).
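A policy check for the "explicitly not granted" list could look like the sketch below, for example as a CI step over an exported IAM binding list. The function name `findBannedRoles` and the regular expressions are assumptions that read the patterns above literally; basic roles without a dot (such as roles/editor) would need an extra pattern.

```typescript
// Patterns for roles the runtime SA must never hold, read literally from the list above.
const BANNED_ROLE_PATTERNS: RegExp[] = [
  /\.editor$/,            // any *.editor
  /\.admin$/,             // any *.admin
  /^roles\/storage\./,    // roles/storage.*
  /^roles\/aiplatform\./, // roles/aiplatform.*
];

// Return the subset of granted roles that violates the policy (hypothetical helper).
function findBannedRoles(grantedRoles: string[]): string[] {
  return grantedRoles.filter((role) =>
    BANNED_ROLE_PATTERNS.some((pattern) => pattern.test(role)),
  );
}
```

An empty result means the binding list is consistent with the least-privilege statement; any returned role should fail the check.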
5. Configuration
Configuration sources (in order of precedence):
- Per-environment Cloud Run env vars (build-time pinned; minimal — feature flags and dimensions only)
- Secret Manager (DB password backup, Redis AUTH backup, JWT verifier overrides)
- tenant-service settings (per-tenant feature flags pulled at request time and cached 60 s)
Required env vars:
NODE_ENV=production
SERVICE_NAME=property-service
SERVICE_VERSION=<git-sha>
REGION=me-central1
DB_INSTANCE=projects/melmastoon-prod/regions/me-central1/instances/property-pg
REDIS_INSTANCE=projects/melmastoon-prod/locations/me-central1/instances/property-cache
PUBSUB_PROJECT=melmastoon-prod
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.platform.svc.cluster.local:4317
LOG_LEVEL=info
Banned: any secret in env vars; any URL with embedded credentials.
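A startup check for the required variables and the embedded-credentials ban might look like the following sketch. `validateEnv` and `hasEmbeddedCredentials` are hypothetical helpers, not part of the service. The sketch does not attempt to detect secrets in general, which would need a separate scanner.

```typescript
// Names taken from the required env var list above.
const REQUIRED_ENV_VARS = [
  "NODE_ENV", "SERVICE_NAME", "SERVICE_VERSION", "REGION", "DB_INSTANCE",
  "REDIS_INSTANCE", "PUBSUB_PROJECT", "OTEL_EXPORTER_OTLP_ENDPOINT", "LOG_LEVEL",
];

// True when the value parses as a URL that carries a username or password.
function hasEmbeddedCredentials(value: string): boolean {
  try {
    const url = new URL(value);
    return url.username !== "" || url.password !== "";
  } catch {
    return false; // not parseable as a URL, so nothing to flag here
  }
}

// Collect human-readable problems instead of throwing on the first one.
function validateEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV_VARS) {
    if (!env[name]) problems.push(`missing required env var: ${name}`);
  }
  for (const [name, value] of Object.entries(env)) {
    if (value && hasEmbeddedCredentials(value)) {
      problems.push(`URL with embedded credentials in env var: ${name}`);
    }
  }
  return problems;
}
```

A service could call `validateEnv(process.env)` during bootstrap and refuse to start when the returned list is not empty.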
6. Build & Release
- CI: GitHub Actions on the property-service package in the future application monorepo.
- Image: built reproducibly via docker buildx, signed with cosign, attested with SLSA L3 build provenance.
- Tags: gcr.io/melmastoon-prod/property-service:<git-sha> (immutable) + :edge (latest main) + :vX.Y.Z (release).
- Promotion: dev → staging → prod via Cloud Deploy. Each rollout step gates on an automated SLO check (no burn for 30 min after release).
- Strategy: canary 10 % → 50 % → 100 % with auto-rollback on error_rate > 1 % or latency_p99 > 2× baseline for 5 min.
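The auto-rollback rule can be expressed as a small predicate. The sketch below assumes one metrics sample per minute and that the 5-minute window applies to both conditions. The sample shape and function names are hypothetical, and the real gate lives in the Cloud Deploy pipeline rather than in service code.

```typescript
interface CanarySample {
  errorRate: number;    // fraction of failed requests in the window, e.g. 0.012 = 1.2 %
  latencyP99Ms: number; // p99 latency for the window, in milliseconds
}

const ERROR_RATE_LIMIT = 0.01; // error_rate > 1 %
const LATENCY_FACTOR = 2;      // latency_p99 > 2x baseline
const SUSTAINED_SAMPLES = 5;   // 5 min, assuming one sample per minute

// A single sample breaches when either threshold is exceeded.
function breaches(sample: CanarySample, baselineP99Ms: number): boolean {
  return sample.errorRate > ERROR_RATE_LIMIT ||
    sample.latencyP99Ms > LATENCY_FACTOR * baselineP99Ms;
}

// Roll back only when the most recent five samples all breach.
function shouldRollBack(samples: CanarySample[], baselineP99Ms: number): boolean {
  if (samples.length < SUSTAINED_SAMPLES) return false;
  return samples.slice(-SUSTAINED_SAMPLES).every((s) => breaches(s, baselineP99Ms));
}
```

Requiring every sample in the window to breach avoids rolling back on a single noisy minute.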
7. Database Migrations
- Migrations run as a Cloud Run Job before traffic shifts to the new revision.
- Locking via PG advisory lock to avoid concurrent runs.
- Strict expand → backfill → contract per migration (MIGRATION_PLAN details the policy).
- Rollback path: every migration ships a paired down.sql reviewed at PR time, even if rarely used.
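The advisory-lock guard for the migration job could be sketched as follows. `SqlClient` is a hypothetical minimal interface so the example does not depend on a specific driver, and the lock key is an arbitrary assumption. `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions; session-level advisory locks must be released on the same connection that took them.

```typescript
// Hypothetical minimal client shape; a real driver connection would be adapted to it.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

const MIGRATION_LOCK_KEY = 7041001; // arbitrary application-chosen key (assumption)

// Runs the migrations only if the lock is free; resolves to false when another job holds it.
async function withMigrationLock(
  db: SqlClient,
  runMigrations: () => Promise<void>,
): Promise<boolean> {
  const res = await db.query("SELECT pg_try_advisory_lock($1) AS locked", [MIGRATION_LOCK_KEY]);
  if (res.rows[0]?.locked !== true) return false; // concurrent run detected, skip
  try {
    await runMigrations();
    return true;
  } finally {
    // Always release, even when a migration throws.
    await db.query("SELECT pg_advisory_unlock($1)", [MIGRATION_LOCK_KEY]);
  }
}
```

Using the non-blocking `pg_try_advisory_lock` lets a second job exit cleanly instead of waiting on the first.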
8. Networking & Ingress
- Ingress only via the platform's edge API Gateway (Cloud Run + Cloud Armor) on https://api.melmastoon.ghasi.io/property/v1/*.
- Internal RPC (sync) on https://internal.property.melmastoon-prod.svc/, callable only from the BFF SAs and the ai-orchestrator-service SA.
- WAF rules at Cloud Armor: standard OWASP top-10 ruleset + a tightened body-size cap (10 MB) and per-IP rate limit (200 req/min unauthenticated, 600 req/min authenticated).
- Pub/Sub uses Private Google Access; no traffic on the public Internet.
9. Resource Limits & Autoscaling
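- Autoscaling: Cloud Run scales on target concurrency = 80, within the min/max instance bounds set in §1 (there is no Kubernetes HPA, since the service does not run on GKE).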
- Memory headroom: peak observed 1.4 GiB on the photo fan-in path → 2 GiB cap applied there to absorb spikes.
- Cloud SQL: 4 vCPU / 16 GiB RAM in prod; HA enabled; PITR retention 7 days.
- Pub/Sub message retention: 7 days on dead-letter topics; default on hot topics.
- Memorystore: Standard tier, 5 GiB capacity, AUTH + TLS, persistence not required (cache is fully recoverable).
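The concurrency and instance bounds translate into a simple capacity estimate. The sketch below is a simplified model under stated assumptions: Cloud Run's real autoscaler also weighs other signals such as CPU utilization, and `desiredInstances` is a hypothetical helper for back-of-the-envelope planning only.

```typescript
const TARGET_CONCURRENCY = 80; // requests per instance
const MIN_INSTANCES = 2;       // warm path
const MAX_INSTANCES = 40;      // per region

// Instances needed for a given number of in-flight requests, clamped to the configured bounds.
function desiredInstances(inFlightRequests: number): number {
  const needed = Math.ceil(inFlightRequests / TARGET_CONCURRENCY);
  return Math.min(MAX_INSTANCES, Math.max(MIN_INSTANCES, needed));
}

// Upper bound on concurrent requests one region can hold: 40 * 80 = 3200.
const maxConcurrentPerRegion = MAX_INSTANCES * TARGET_CONCURRENCY;
```

For example, 1,000 in-flight requests need 13 instances in this model, and anything above 3,200 concurrent requests per region exceeds the configured ceiling.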
10. Disaster Recovery
| Scenario | RTO | RPO | Plan |
|---|---|---|---|
| Cloud Run regional outage | 5 min | 0 | Auto-failover via Cloud DNS to warm standby Cloud Run; Cloud SQL replica still serves reads |
| Cloud SQL primary failure | 5 min | 0 | Cloud SQL HA auto-failover |
| Cloud SQL region outage | 30 min | ≤ 5 min | Promote cross-region replica; runbook required |
| Pub/Sub regional outage | 0 (degraded) | 0 | Outbox absorbs the backlog; downstream catches up after recovery |
| Memorystore outage | 0 (degraded) | n/a | Cache misses fall through to Postgres; expect 2–3× read latency |
| Secret Manager outage | 5 min | 0 | Service holds last-good in memory; alert if refresh fails > 15 min |
DR drills run quarterly on the staging environment with the same topology; results recorded in runbooks/property/dr-drill-log.md.
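The degraded behaviour in the Memorystore outage row (cache misses fall through to Postgres) can be illustrated with a read-through helper. The `Cache` interface, the `readThrough` name, and the 60-second default TTL are assumptions for this sketch, not the service's actual cache layer.

```typescript
// Hypothetical cache shape; a Redis client would be adapted to it.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Serve from cache when possible; treat any cache failure as a miss.
async function readThrough(
  cache: Cache,
  key: string,
  loadFromDb: () => Promise<string>,
  ttlSeconds = 60, // assumed default, not taken from the source
): Promise<string> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return hit;
  } catch {
    // Cache unavailable: fall through and keep serving from Postgres.
  }
  const value = await loadFromDb();
  try {
    await cache.set(key, value, ttlSeconds);
  } catch {
    // Best effort; a failed write-back must not fail the request.
  }
  return value;
}
```

With this shape, a cache outage costs latency rather than availability, which matches the "0 (degraded)" RTO in the table.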
Cross-references: SLOs and burn alerts in OBSERVABILITY; secret + IAM specifics in SECURITY_MODEL; failure scenarios in FAILURE_MODES.