property-service — DEPLOYMENT_TOPOLOGY

Companion: SERVICE_OVERVIEW · OBSERVABILITY · SECURITY_MODEL · FAILURE_MODES · ../../docs/02-enterprise-architecture.md §GCP Reference Topology

This document describes how property-service is built, packaged, deployed, scaled, and isolated on GCP. It is binding for platform SRE and the service team.

Cloud: GCP only. Desktop client referenced by sync is Electron, never Tauri.


1. Runtime

| Aspect | Value |
| --- | --- |
| Language / runtime | Node.js 20 LTS |
| Framework | NestJS 10 (HTTP + microservices module for Pub/Sub) |
| Container base | gcr.io/distroless/nodejs20-debian12 |
| Compute | Cloud Run (regional) for HTTP + sync internal RPC; Cloud Run Jobs for migrations and the nightly tenant-isolation auditor |
| Min/max instances | min 2 (warm path), max 40 per region |
| Concurrency | 80 per instance |
| CPU / memory | 1 vCPU / 1 GiB (read-dominant); 2 vCPU / 2 GiB for the photo+AI fan-in path |
| Startup CPU boost | enabled |
| Request timeout | 30 s (60 s for POST /properties/:id/rooms/bulk) |
| Egress | VPC connector (Serverless VPC Access) into the platform shared VPC |

The service runs on a dedicated Cloud Run service per region. There is no Kubernetes hosting today; the AsyncAPI- and OpenAPI-derived clients are portable, so a future move to GKE is a non-breaking infrastructure change.


2. Regions & Multi-Region Posture

| Tenant residency | Primary region | Read replica | Standby DR |
| --- | --- | --- | --- |
| AF / TJ / IR | me-central1 (Doha) | me-central1 HA replica | europe-west4 warm standby |
| EU | europe-west4 (Eemshaven) | europe-west4 HA replica | me-central1 warm standby |
| US (future) | us-central1 | us-central1 HA replica | europe-west4 warm standby |

  • Routing. tenant-service stamps tenantResidency; the global API gateway routes the request to the matching region.
  • Cross-region calls. Forbidden on the read/write path. Background reconciliation jobs may cross regions for analytics export only.
  • DR mode. A regional outage triggers Cloud DNS failover to the warm standby Cloud Run service; the warm Cloud SQL replica is promoted manually via runbook (RTO 30 min, RPO 5 min).

3. Dependencies

| Dependency | Connection | Region scope |
| --- | --- | --- |
| Cloud SQL Postgres 15 + PostGIS | Cloud SQL Auth Proxy (IAM auth) over Private IP | per-residency region; HA enabled |
| Memorystore Redis 7 | Private VPC, AUTH+TLS | same region |
| Pub/Sub | Workload Identity, regional Pub/Sub Lite topics for high-volume domain events; standard Pub/Sub for control-plane | per-region |
| iam-service (JWKS, OPA bundle) | HTTPS to internal endpoint | same region (with global mirror) |
| file-storage-service | HTTPS internal RPC | same region |
| ai-orchestrator-service | HTTPS internal RPC | same region (the orchestrator decides further model-region routing) |
| geo-service | HTTPS internal RPC | same region |
| Secret Manager | Workload Identity | global resource, regionally cached |
| Cloud Logging / Trace / Monitoring | OTel SDK → OTLP collector sidecar (in-cluster) → Cloud Operations | regional |

4. Service Account & IAM

Runtime SA: property-service@melmastoon.iam.gserviceaccount.com.

Granted roles (least privilege):

  • roles/cloudsql.client (Cloud SQL Auth Proxy)
  • roles/cloudsql.instanceUser (IAM DB auth)
  • roles/redis.viewer + custom melmastoon.memorystore.connect
  • roles/pubsub.publisher on melmastoon.property.* topics
  • roles/pubsub.subscriber on the consumer subscriptions listed in EVENT_SCHEMAS
  • roles/secretmanager.secretAccessor on the property-* secret set
  • roles/cloudtrace.agent, roles/logging.logWriter, roles/monitoring.metricWriter
  • roles/iam.serviceAccountTokenCreator (signing internal JWTs for outbound calls — limited to the iam-service audience)

Explicitly not granted: any *.editor, *.admin, roles/storage.* (photo bytes go through file-storage-service), roles/aiplatform.* (Vertex AI access goes through ai-orchestrator-service).


5. Configuration

Configuration sources (in order of precedence):

  1. Per-environment Cloud Run env vars (build-time pinned; minimal — feature flags and dimensions only)
  2. Secret Manager (DB password backup, Redis AUTH backup, JWT verifier overrides)
  3. tenant-service settings (per-tenant feature flags pulled at request time and cached 60 s)
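As a rough sketch of the 60 s cache on per-tenant feature flags, the class below keeps entries in memory with a fixed time-to-live. It assumes a simple in-process map; `TtlCache` and its injectable clock are hypothetical and say nothing about how the service actually caches tenant-service responses.

```typescript
// Hypothetical in-memory TTL cache; the 60 s TTL comes from the list above.
type Clock = () => number;

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: Clock;

  constructor(ttlMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is absent or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }

  // Store a value that stays valid for ttlMs from now.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Feature flags per tenant, cached for 60 seconds.
const tenantFlags = new TtlCache<Record<string, boolean>>(60 * 1000);
```

On a miss, the caller would fetch the flags from tenant-service and call `set`; injecting the clock keeps expiry testable without waiting.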

Required env vars:

NODE_ENV=production
SERVICE_NAME=property-service
SERVICE_VERSION=<git-sha>
REGION=me-central1
DB_INSTANCE=projects/melmastoon-prod/regions/me-central1/instances/property-pg
REDIS_INSTANCE=projects/melmastoon-prod/locations/me-central1/instances/property-cache
PUBSUB_PROJECT=melmastoon-prod
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.platform.svc.cluster.local:4317
LOG_LEVEL=info

Banned: any secret in env vars; any URL with embedded credentials.
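A startup check along these lines could enforce the required variables and the ban on URLs with embedded credentials. It is a sketch only: `validateEnv` and `hasEmbeddedCredentials` are hypothetical helpers, and the check catches credentials inside URLs but cannot detect arbitrary secrets placed in env vars.

```typescript
// Variable names are the required set listed above; helper names are hypothetical.
const REQUIRED_VARS = [
  "NODE_ENV",
  "SERVICE_NAME",
  "SERVICE_VERSION",
  "REGION",
  "DB_INSTANCE",
  "REDIS_INSTANCE",
  "PUBSUB_PROJECT",
  "OTEL_EXPORTER_OTLP_ENDPOINT",
  "LOG_LEVEL",
] as const;

// True when the value parses as a URL that carries a username or password.
function hasEmbeddedCredentials(value: string): boolean {
  try {
    const url = new URL(value);
    return url.username !== "" || url.password !== "";
  } catch {
    return false; // not a URL at all
  }
}

// Return a list of human-readable problems; an empty list means the env is acceptable.
function validateEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`missing required env var ${name}`);
  }
  for (const [name, value] of Object.entries(env)) {
    if (value !== undefined && hasEmbeddedCredentials(value)) {
      problems.push(`${name} contains a URL with embedded credentials`);
    }
  }
  return problems;
}
```

In a Node.js process this would typically be called once at boot as `validateEnv(process.env)`, refusing to start if the list is non-empty.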


6. Build & Release

  • CI: GitHub Actions on the property-service package in the future application monorepo.
  • Image: built reproducibly via docker buildx, signed with cosign, attested with SLSA L3 build provenance.
  • Tags: gcr.io/melmastoon-prod/property-service:<git-sha> (immutable) + :edge (latest main) + :vX.Y.Z (release).
  • Promotion: dev → staging → prod via Cloud Deploy. Each rollout step gates on an automated SLO check (no error-budget burn alert for 30 min after release).
  • Strategy: canary 10 % → 50 % → 100 % with auto-rollback on error_rate > 1 % or latency_p99 > 2× baseline for 5 min.
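The canary thresholds above reduce to a small decision function. The sketch below assumes the metrics have already been aggregated over the 5-minute window; `WindowMetrics`, `shouldRollBack`, and `nextTrafficPercent` are hypothetical names and do not describe Cloud Deploy's own API.

```typescript
// Metrics aggregated over the 5-minute evaluation window (assumed shape).
interface WindowMetrics {
  errorRate: number; // fraction of failed requests, e.g. 0.01 = 1 %
  latencyP99Ms: number;
}

const CANARY_STEPS = [10, 50, 100] as const;

// Roll back when error rate exceeds 1 % or p99 latency exceeds 2x baseline.
function shouldRollBack(current: WindowMetrics, baselineP99Ms: number): boolean {
  return current.errorRate > 0.01 || current.latencyP99Ms > 2 * baselineP99Ms;
}

// Next traffic share for the new revision: 0 on rollback, otherwise the next step.
function nextTrafficPercent(
  currentPercent: number,
  current: WindowMetrics,
  baselineP99Ms: number,
): number {
  if (shouldRollBack(current, baselineP99Ms)) return 0;
  const next = CANARY_STEPS.find((step) => step > currentPercent);
  return next ?? 100;
}
```

With a 200 ms baseline, a canary at 10 % showing a 0.2 % error rate and 300 ms p99 advances to 50 %, while a 450 ms p99 breaches the 2× limit and drops the revision to 0 %.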

7. Database Migrations

  • Migrations run as a Cloud Run Job before traffic shifts to the new revision.
  • Locking via PG advisory lock to avoid concurrent runs.
  • Strict expand → backfill → contract per migration (MIGRATION_PLAN details the policy).
  • Rollback path: every migration ships a paired down.sql reviewed at PR time, even if rarely used.
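The advisory-lock step can be sketched as a wrapper around the migration run. This assumes a client exposing a `query(sql, params)` method in the style of node-postgres; the `SqlClient` interface, `withMigrationLock`, and the lock key are hypothetical. `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions, and because the lock is session-level, both calls must use the same connection.

```typescript
// Minimal client shape assumed for this sketch (not a real library type).
interface SqlClient {
  query(
    sql: string,
    params?: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

type LockResult<T> = { acquired: false } | { acquired: true; result: T };

// Run `migrate` only if the session-level advisory lock can be taken.
// If another job holds the lock, return without running anything.
async function withMigrationLock<T>(
  client: SqlClient,
  lockKey: number,
  migrate: () => Promise<T>,
): Promise<LockResult<T>> {
  const res = await client.query(
    "SELECT pg_try_advisory_lock($1) AS acquired",
    [lockKey],
  );
  if (res.rows[0]?.acquired !== true) {
    return { acquired: false };
  }
  try {
    return { acquired: true, result: await migrate() };
  } finally {
    // Always release, even when the migration throws.
    await client.query("SELECT pg_advisory_unlock($1)", [lockKey]);
  }
}
```

A second job that starts while the first is running gets `{ acquired: false }` and can exit cleanly instead of applying the same migration twice.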

8. Networking & Ingress

  • Ingress only via the platform's edge API Gateway (Cloud Run + Cloud Armor) on https://api.melmastoon.ghasi.io/property/v1/*.
  • Internal RPC (sync) on https://internal.property.melmastoon-prod.svc/, callable only from the BFF SAs and ai-orchestrator-service SA.
  • WAF rules at Cloud Armor: standard OWASP top-10 ruleset + a tightened body-size cap (10 MB) and per-IP rate limit (200 req/min unauthenticated, 600 req/min authenticated).
  • Pub/Sub uses Private Google Access; no traffic on the public Internet.

9. Resource Limits & Autoscaling

  • Autoscaling via Cloud Run target concurrency = 80 per instance, with the min/max instance counts set in §1 (min 2, max 40 per region).
  • Memory headroom: peak observed 1.4 GiB on the photo fan-in path → 2 GiB cap applied there to absorb spikes.
  • Cloud SQL: 4 vCPU / 16 GiB RAM in prod; HA enabled; PITR retention 7 days.
  • Pub/Sub message retention: 7 days on dead-letter topics; default on hot topics.
  • Memorystore: Standard tier, 5 GiB capacity, AUTH + TLS, persistence not required (cache is fully recoverable).
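For back-of-envelope capacity planning, the concurrency and instance limits above translate into a simple estimate. This is deliberately simplified: the real Cloud Run autoscaler also weighs CPU utilization and does not wait for instances to reach full concurrency, so treat `estimateInstances` (a hypothetical helper) as a lower bound.

```typescript
// Limits taken from this document; the helper itself is illustrative only.
const CONCURRENCY_PER_INSTANCE = 80;
const MIN_INSTANCES = 2;
const MAX_INSTANCES = 40;

// Instances needed for a given number of in-flight requests, clamped to min/max.
function estimateInstances(inFlightRequests: number): number {
  const needed = Math.ceil(inFlightRequests / CONCURRENCY_PER_INSTANCE);
  return Math.min(MAX_INSTANCES, Math.max(MIN_INSTANCES, needed));
}

// Hard ceiling on concurrent requests per region: 40 instances x 80 = 3200.
const regionalConcurrencyCeiling = MAX_INSTANCES * CONCURRENCY_PER_INSTANCE;
```

Beyond 3200 concurrent requests in one region, additional requests queue or are rejected until load drops or the max-instance setting is raised.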

10. Disaster Recovery

| Scenario | RTO | RPO | Plan |
| --- | --- | --- | --- |
| Cloud Run regional outage | 5 min | 0 | Auto-failover via Cloud DNS to warm standby Cloud Run; Cloud SQL replica still serves reads |
| Cloud SQL primary failure | 5 min | 0 | Cloud SQL HA auto-failover |
| Cloud SQL region outage | 30 min | ≤ 5 min | Promote cross-region replica; runbook required |
| Pub/Sub regional outage | 0 (degraded) | 0 | Outbox absorbs the backlog; downstream catches up after recovery |
| Memorystore outage | 0 (degraded) | n/a | Cache misses fall through to Postgres; expect 2–3× read latency |
| Secret Manager outage | 5 min | 0 | Service holds last-good in memory; alert if refresh fails > 15 min |

DR drills run quarterly on the staging environment with the same topology; results recorded in runbooks/property/dr-drill-log.md.
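The Secret Manager row in the table above (hold last-good in memory, alert if refresh fails for more than 15 min) could look roughly like the holder below. `LastGoodSecret` is a hypothetical class: the caller is assumed to invoke `recordSuccess` or `recordFailure` after each refresh attempt, and the actual fetch from Secret Manager is left out.

```typescript
// Hypothetical last-good holder; the 15-minute threshold comes from the table above.
class LastGoodSecret {
  private value: string | undefined;
  private failingSinceMs: number | undefined;
  private readonly alertAfterMs: number;
  private readonly now: () => number;

  constructor(alertAfterMs: number = 15 * 60 * 1000, now: () => number = Date.now) {
    this.alertAfterMs = alertAfterMs;
    this.now = now;
  }

  // A successful refresh replaces the value and clears the failure streak.
  recordSuccess(value: string): void {
    this.value = value;
    this.failingSinceMs = undefined;
  }

  // A failed refresh keeps the last-good value and starts (or continues) the streak.
  recordFailure(): void {
    if (this.failingSinceMs === undefined) this.failingSinceMs = this.now();
  }

  // The value to serve; throws if no refresh has ever succeeded.
  current(): string {
    if (this.value === undefined) throw new Error("no secret loaded yet");
    return this.value;
  }

  // True once refreshes have been failing continuously for longer than the threshold.
  shouldAlert(): boolean {
    return (
      this.failingSinceMs !== undefined &&
      this.now() - this.failingSinceMs > this.alertAfterMs
    );
  }
}
```

During an outage the service keeps serving `current()` while a periodic check on `shouldAlert()` decides when to page.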


Cross-references: SLOs and burn alerts in OBSERVABILITY; secret + IAM specifics in SECURITY_MODEL; failure scenarios in FAILURE_MODES.