
DEPLOYMENT_TOPOLOGY — bff-backoffice-service

Sibling: DATA_MODEL · SECURITY_MODEL · SYNC_CONTRACT · LOCAL_DEV_SETUP

Cross-cutting: 02 Enterprise Architecture · §4 GCP Reference Architecture · ADR-0003 Electron offline-first desktop

1. Runtime

| Property | Value |
|---|---|
| Compute | Cloud Run (managed) |
| Region (primary) | asia-south1 (Mumbai) |
| Region (DR-warm) | europe-west4 (Eemshaven) |
| Container | Distroless Node 20, multi-stage build, non-root, read-only root FS |
| Min instances | 2 (per region) |
| Max instances | 20 (per region) |
| Concurrency per instance | 80 |
| CPU | 2 vCPU, always-allocated |
| Memory | 1 GiB |
| Startup latency | < 800 ms |
| Request timeout | 30 s |
| VPC connector | bff-connector-asia-south1 |

2. Ingress

backoffice.melmastoon.ghasi.io (Electron desktop endpoint)
  ↓
Cloud DNS (CNAME → GCLB anycast)
  ↓
Global HTTPS Load Balancer
├── Cloud Armor (per-IP + per-device rules)
└── SNI cert via Certificate Manager
  ↓
Serverless NEG → Cloud Run (bff-backoffice-service)

CDN is off — every endpoint is authenticated and tenant/device-specific. Cache lives at the BFF layer (Memorystore).

3. Egress (upstream)

All upstreams are reached over internal Cloud Run-to-Cloud Run calls through the VPC connector, authenticated with Google ID tokens minted from the bff-backoffice-sa service account.

| Upstream | Hostname | Auth | Timeout | Retries |
|---|---|---|---|---|
| iam-service | iam.melmastoon.internal | Google ID token | 600 ms (refresh), 200 ms (validate) | 0 |
| tenant-service | tenant.melmastoon.internal | Google ID token | 500 ms | 1 |
| reservation-service | reservation.melmastoon.internal | Google ID token | 800 ms (read), 1500 ms (mutation) | 1 mutation |
| inventory-service | inventory.melmastoon.internal | Google ID token | 600 ms | 1 |
| pricing-service | pricing.melmastoon.internal | Google ID token | 600 ms | 1 |
| housekeeping-service | housekeeping.melmastoon.internal | Google ID token | 700 ms | 1 |
| maintenance-service | maintenance.melmastoon.internal | Google ID token | 700 ms | 1 |
| billing-service | billing.melmastoon.internal | Google ID token | 600 ms | 1 |
| lock-integration-service | lock.melmastoon.internal | Google ID token | 1200 ms | 0 (sensitive) |
| ai-orchestrator-service | ai.melmastoon.internal | Google ID token | 800 ms | 0 |
| notification-service | notification.melmastoon.internal | Google ID token | 500 ms | 1 |
| sync-service (handshake only) | sync.melmastoon.internal | Google ID token | 700 ms | 0 |
| analytics-service | analytics.melmastoon.internal | Google ID token | 600 ms | 1 |
| property-service | property.melmastoon.internal | Google ID token | 800 ms | 1 |
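
The timeout and retry columns above can be read as a per-upstream call policy. The following is a minimal sketch of how such a policy might be applied, not the service's actual client. The `UpstreamPolicy` shape, the `POLICIES` map, and `callWithPolicy` are hypothetical names, and it assumes the timeout applies per attempt rather than to the whole call, which the table does not state.

```typescript
// Hypothetical policy shape; the two entries copy values from the table above.
type UpstreamPolicy = { host: string; timeoutMs: number; retries: number };

const POLICIES: Record<string, UpstreamPolicy> = {
  "tenant-service": { host: "tenant.melmastoon.internal", timeoutMs: 500, retries: 1 },
  "lock-integration-service": { host: "lock.melmastoon.internal", timeoutMs: 1200, retries: 0 },
};

// One attempt against the upstream. It receives an AbortSignal that fires when
// the per-attempt timeout elapses; the attempt must honor it (fetch does).
type Attempt<T> = (signal: AbortSignal) => Promise<T>;

// Runs the attempt once, plus up to `policy.retries` more times on failure.
// Assumption: every failure is retryable. A real client would likely retry
// only idempotent or explicitly safe calls.
async function callWithPolicy<T>(policy: UpstreamPolicy, attempt: Attempt<T>): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= policy.retries; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), policy.timeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next attempt
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

With Node 20's built-in `fetch`, an attempt would pass the signal through as `fetch(url, { signal, headers })`, with the Google ID token carried in an `Authorization: Bearer` header. Minting that token is left out of the sketch.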

4. Stateful dependencies

| Dependency | Type | Region | HA |
|---|---|---|---|
| Memorystore (Redis 7), cache tier | bff-backoffice-cache-asia-south1, 5 GiB, standard | asia-south1 | Standby + auto-failover |
| Memorystore (Redis 7), session tier (no eviction) | bff-backoffice-session-asia-south1, 3 GiB, standard | asia-south1 | Standby + auto-failover |
| Cloud SQL (Postgres 16) | bff-backoffice-db-asia-south1, db-custom-2-8192 | asia-south1 | Regional HA + cross-region replica in europe-west4 |
| Pub/Sub topics | melmastoon.bff.backoffice.* | global | n/a |
| Secret Manager | pepper, sse-signing-key | global | replicated |

5. Electron desktop integration (deployment-side)

The desktop is an independent deployment artifact with its own release pipeline, but its operation depends on this BFF being reachable. Coordination points:

| Concern | Mechanism |
|---|---|
| Backoffice endpoint URL | Hard-coded per environment in the desktop build (prod, stage, dev); overridable via signed config update |
| App-version floor | This BFF rejects appVersion < floor with SYNC.VERSION_BLOCKED; floor managed via bff-backoffice-flags Memorystore key with 30 s refresh |
| Auto-update server | Separate (update.melmastoon.ghasi.io); not this BFF |
| Code signing for installers | electron-builder; not this BFF's concern |
| contextBridge API surface | Versioned in shared @ghasi/desktop-bridge-types package; consumed by both desktop + this BFF |
| DPoP device key provisioning | First-run flow against iam-service.deviceEnroll; this BFF is on the path only after enrollment |
| MFA factor enrollment | iam-service's responsibility; we only consume attestation tokens |
| Auto-update rollout pacing | Independent of BFF deploys; coordinated via release calendar |
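
The app-version floor check in the table above can be sketched as a plain version comparison. This is an illustration only: it assumes `appVersion` and the floor are both `MAJOR.MINOR.PATCH` strings, which the table does not specify, and `parseVersion`, `compareVersions`, and `checkAppVersion` are hypothetical names.

```typescript
// Assumption: versions look like "1.10.0"; pre-release tags are out of scope.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`invalid version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Returns -1, 0, or 1, comparing numerically so that 1.9.0 < 1.10.0.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

type FloorDecision =
  | { allowed: true }
  | { allowed: false; code: "SYNC.VERSION_BLOCKED" };

// Rejects only versions strictly below the floor, matching "appVersion < floor".
function checkAppVersion(appVersion: string, floor: string): FloorDecision {
  return compareVersions(appVersion, floor) < 0
    ? { allowed: false, code: "SYNC.VERSION_BLOCKED" }
    : { allowed: true };
}
```

Comparing components as numbers rather than strings matters here: a lexical comparison would rank `1.9.0` above `1.10.0` and let a blocked build through.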

5.1 Compatibility matrix

| Desktop major | BFF major | Status |
|---|---|---|
| 1.x | v1 | Supported |
| 2.x (planned Phase 2) | v1 + v2 | Overlap window 90 d during migration |

5.2 Rollout coordination

When the BFF ships a breaking change to /bff/backoffice/v1/*, we:

  1. Ship /bff/backoffice/v2/* first; keep v1 unchanged.
  2. Release a desktop version that supports both.
  3. Wait for ≥ 95% of devices to be on the new desktop version (tracked via heartbeat appVersion).
  4. Start sending the Sunset header on v1 responses to announce the deprecation.
  5. Remove v1 after the deprecation window.

The app-version-floor mechanism lets us force devices off truly broken versions; it is reserved for emergencies (e.g., a known data-corruption bug).
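
The adoption gate in step 3 can be sketched as a count over the latest heartbeat per device. This is a hedged illustration: the `Heartbeat` shape, the helper names, and the use of the major version to decide whether a device is on the new desktop version are all assumptions, not the tracked metric's actual definition.

```typescript
// Hypothetical heartbeat record; `at` is a timestamp in milliseconds.
type Heartbeat = { deviceId: string; appVersion: string; at: number };

function majorOf(appVersion: string): number {
  return Number.parseInt(appVersion.split(".")[0], 10);
}

// Counts devices by their most recent heartbeat only, so a device that has
// upgraded is not also counted under its old version.
function adoption(heartbeats: Heartbeat[], minMajor: number): { total: number; onNew: number } {
  const latest = new Map<string, Heartbeat>();
  for (const hb of heartbeats) {
    const prev = latest.get(hb.deviceId);
    if (!prev || hb.at > prev.at) latest.set(hb.deviceId, hb);
  }
  let onNew = 0;
  latest.forEach((hb) => {
    if (majorOf(hb.appVersion) >= minMajor) onNew++;
  });
  return { total: latest.size, onNew };
}

// Integer arithmetic avoids floating-point edge cases at exactly 95%.
function readyToSunsetV1(heartbeats: Heartbeat[], minMajor: number): boolean {
  const { total, onNew } = adoption(heartbeats, minMajor);
  return total > 0 && onNew * 100 >= total * 95;
}
```

An empty fleet is treated as not ready, so the gate cannot pass just because no heartbeats have arrived yet.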

6. CI/CD pipeline

GitHub PR → GitHub Actions
├── Lint + typecheck + unit + integration + contract tests
├── Build container (Cloud Build)
├── Trivy scan
├── Cosign sign
├── Push to Artifact Registry
├── Deploy to dev Cloud Run (no-traffic + smoke)
├── Manual approval → stage
└── Manual approval → prod (canary 10% → 50% → 100% over 30 min, with metric guardrails)

Binary Authorization on prod requires a valid Cosign signature on the image.
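
The prod canary progression (10% → 50% → 100% with metric guardrails) can be sketched as a small decision function. The pipeline does not specify which metrics act as guardrails, so the `Metrics` and `Guardrails` fields below (error rate and p95 latency) are assumptions chosen for illustration, as are all names.

```typescript
// Traffic percentages taken from the pipeline description above.
const CANARY_STAGES: number[] = [10, 50, 100];

// Hypothetical guardrail inputs; the real guardrail metrics are not listed here.
type Metrics = { errorRate: number; p95LatencyMs: number };
type Guardrails = { maxErrorRate: number; maxP95LatencyMs: number };

type CanaryDecision =
  | { action: "promote"; percent: number }
  | { action: "rollback" }
  | { action: "done" };

// Rolls back on any guardrail breach; otherwise moves to the next stage,
// or reports "done" once 100% has been reached.
function nextCanaryStep(currentPercent: number, observed: Metrics, limits: Guardrails): CanaryDecision {
  if (observed.errorRate > limits.maxErrorRate || observed.p95LatencyMs > limits.maxP95LatencyMs) {
    return { action: "rollback" };
  }
  for (const stage of CANARY_STAGES) {
    if (stage > currentPercent) return { action: "promote", percent: stage };
  }
  return { action: "done" };
}
```

How long each stage is held within the 30-minute window is not modeled in this sketch.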

7. Traffic management

  • Default routing: anycast → nearest healthy region.
  • Canary control via Cloud Deploy + Cloud Run revisions.
  • Rollback budget: 5 minutes once SLO burn is detected.
  • Force-logout is broadcast cross-region via Pub/Sub fanout to all instances; each instance then fans the event out locally over its SSE bus, per device.

8. Configuration

| Source | What |
|---|---|
| Cloud Run env vars | Non-secret toggles |
| Secret Manager (file mounts) | Secrets |
| Cloud Run Service YAML | Sizing, scaling, VPC connector |
| bff-backoffice-flags (Memorystore) | Feature flags + sample rates + app-version floor (30 s refresh) |
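
The 30 s refresh on bff-backoffice-flags can be pictured as a small time-based cache in front of the Memorystore read. The sketch below is an assumption about how such a refresh might work, not the service's code; `cachedFlag` is a hypothetical helper and `load` stands in for the actual Redis read.

```typescript
// Wraps an async loader so callers see a value that is at most `ttlMs` old.
// `now` is injectable so the behavior can be checked without real waiting.
function cachedFlag<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now
): () => Promise<T> {
  let cached: { value: T; at: number } | undefined;
  return async () => {
    const t = now();
    if (!cached || t - cached.at >= ttlMs) {
      // Note: concurrent callers may each trigger a load, and a failed load
      // rejects instead of serving the stale value. Both are simplifications.
      cached = { value: await load(), at: t };
    }
    return cached.value;
  };
}
```

With a 30,000 ms TTL, a change to the app-version floor would reach each instance within about 30 s of being written, since every instance holds its own copy.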

9. Networking

  • VPC: melmastoon-prod-vpc.
  • Subnet for Cloud Run connector: bff-backoffice-connector-asia-south1 (10.20.7.0/28).
  • Private Service Access for Cloud SQL.
  • Memorystore via VPC connector private IP.

10. Cost posture

| Item | Estimated monthly @ 50 RPS steady |
|---|---|
| Cloud Run | ~$220 |
| Memorystore (cache 5 GiB + session 3 GiB) | ~$280 |
| Cloud SQL (db-custom-2-8192 + HA) | ~$240 |
| Pub/Sub | ~$40 |
| Cloud Armor | ~$30 |
| Logging + Trace + Monitoring | ~$80 |
| Total | ~$890 / month / region |

11. Disaster recovery

  • RPO: 5 min (Cloud SQL PITR).
  • RTO: 30 min (DNS + redeploy in DR).
  • Quarterly DR drill: cut traffic to europe-west4; verify desktop fleet reconnects via DNS.
  • Sessions in Memorystore are best-effort — session loss in DR scenario triggers a /auth/refresh from the desktop; idempotency keys absorb mutation retries.

12. Health endpoints

  • /health/live: 200 if process running.
  • /health/ready: 200 if Memorystore and Postgres are reachable and at least 80% of upstream circuits are closed.

Cloud Run liveness and readiness probes point at these endpoints.
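
The readiness rule can be sketched as a pure function over dependency health and circuit-breaker states. Several details here are assumptions: that a half-open circuit does not count as closed, that the not-ready status is 503, and the `ReadinessInput` shape itself.

```typescript
type CircuitState = "closed" | "open" | "half-open";

// Hypothetical snapshot of what /health/ready would inspect.
interface ReadinessInput {
  memorystoreOk: boolean;
  postgresOk: boolean;
  circuits: Record<string, CircuitState>; // keyed by upstream name
}

// Ready only if both stores are healthy and at least 80% of circuits are closed.
// Assumption: "half-open" is treated as not closed; 503 signals not ready.
function readinessStatus(input: ReadinessInput): 200 | 503 {
  if (!input.memorystoreOk || !input.postgresOk) return 503;
  const names = Object.keys(input.circuits);
  let closed = 0;
  for (const name of names) {
    if (input.circuits[name] === "closed") closed++;
  }
  // Integer comparison: closed / total >= 0.8 without floating point.
  return closed * 100 >= names.length * 80 ? 200 : 503;
}
```

With the 14 upstreams listed in section 3, the 80% threshold means at least 12 circuits must be closed, so the endpoint stays ready with up to two circuits open and turns not-ready at three.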

13. Capacity model

| Driver | Per-tenant peak | All-tenant peak (50 tenants) |
|---|---|---|
| Active operators | 30 | 1,500 |
| Active devices | 50 | 2,500 |
| Heartbeats | 50/min | 2,500/min |
| Dashboard reads | 30/min | 1,500/min |
| Mutations | 60/min | 3,000/min |
| Lock actions | 10/hour | 500/hour |
| AI decisions | 5/hour | 250/hour |
| SSE connections | 50 | 2,500 |
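
As a sanity check, the all-tenant peak figures can be converted into requests per second and into the number of instances needed to hold the SSE connections. The sketch assumes that each SSE connection occupies one concurrency slot on an instance for as long as it is open; that is an assumption of this example, not a statement from the table. The function names are hypothetical.

```typescript
// Sums per-minute and per-hour rates into a single requests-per-second figure.
function peakRequestsPerSecond(perMinute: number[], perHour: number[]): number {
  let rps = 0;
  for (const v of perMinute) rps += v / 60;
  for (const v of perHour) rps += v / 3600;
  return rps;
}

// Instances needed if every long-lived connection holds one concurrency slot.
function instancesForConnections(connections: number, concurrencyPerInstance: number): number {
  return Math.ceil(connections / concurrencyPerInstance);
}

// All-tenant peak values from the table; concurrency 80 from section 1.
const peakRps = peakRequestsPerSecond([2500, 1500, 3000], [500, 250]);
const sseInstances = instancesForConnections(2500, 80);
```

This gives roughly 117 requests per second at peak (7,000 per minute plus 750 per hour), and 32 instances to hold 2,500 SSE connections at a concurrency of 80. Under the stated assumption, the second figure is the one to compare against the per-region max-instances setting in section 1.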

14. Observability of deploy

  • Cloud Build job emits OpenTelemetry traces.
  • Deploy event published to melmastoon.platform.deployment.v1 (consumed by SRE dashboards).
  • Per-revision RED metrics tagged with bff_version allow rollback evaluation.

15. Operational notes

  • Heartbeat traffic is high-volume and low-value: sample it at 5%, and do not page on heartbeat alerts unless they are SLO-relevant.
  • SSE long-running connections matter for instance recycling; pre-stop hook drains SSE gracefully (sends event: session\ndata: {"kind":"reconnect"} then closes).
  • Per-device idempotency state on Postgres: cleanup hourly; daily archival.
  • App-version floor change is a breaking operational action; requires runbook execution + Slack announcement.
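
The graceful SSE drain described above can be sketched as follows. The frame content comes from the note itself; the `SseConnection` interface and function names are hypothetical, and wiring the drain to the instance's shutdown signal is left out.

```typescript
// Formats one Server-Sent Events frame. The trailing blank line is what
// tells the client the event is complete and should be dispatched.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Minimal view of an open SSE response; an HTTP response object would fit.
interface SseConnection {
  write(chunk: string): void;
  end(): void;
}

// Tells every connected desktop to reconnect, then closes the stream so the
// instance can be recycled without clients waiting on a dead connection.
function drainSse(connections: SseConnection[]): number {
  const frame = formatSseEvent("session", { kind: "reconnect" });
  for (const conn of connections) {
    conn.write(frame);
    conn.end();
  }
  return connections.length;
}
```

Sending the reconnect event before closing lets the desktop distinguish a planned recycle from a network failure and reconnect immediately rather than backing off.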

16. Cross-references