DEPLOYMENT_TOPOLOGY — payment-gateway-service
Sibling: OBSERVABILITY · SECURITY_MODEL · LOCAL_DEV_SETUP · SERVICE_READINESS
This service runs on Google Cloud Platform (per platform standard; Electron is the only desktop technology, never Tauri). Production workloads run on GKE Autopilot; data lives in Cloud SQL Postgres 16; events flow through Pub/Sub; secrets in Secret Manager; key material in Cloud KMS.
1. Environments
| Env | GCP project | Purpose | Vendor mode |
|---|---|---|---|
| dev | gm-payments-dev | Local-merge integration | All processors in sandbox; hesabpay-mock |
| staging | gm-payments-staging | Pre-prod validation, contract tests | Stripe/PayPal sandbox; HesabPay sandbox |
| prod | gm-payments-prod | Live | Stripe/PayPal live; HesabPay live (per-region per-tenant) |
Per-tenant Cloud SQL schemas exist in each environment; tenants in dev/staging are synthetic.
2. Cluster topology
- GKE Autopilot clusters per env, regional (`us-central1` primary, `europe-west1` warm-standby for EU tenants).
- Two deployments:
  - `payment-api`: REST + webhook receivers, HPA on RPS + p95 latency, min 3 pods, max 60.
  - `payment-worker`: webhook dispatcher, reconciliation jobs, outbox flush, HPA on Pub/Sub backlog, min 2 pods, max 30.
- A scheduled CronJob `payments-reconciliation-daily` runs at 02:00 UTC per tenant per processor.
- A scheduled CronJob `payments-idempotency-gc` runs hourly, dropping expired idempotency keys.
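As a rough illustration of how the HPA bounds above interact with load, the following sketch applies the core Kubernetes HPA scaling rule, `ceil(current * metric / target)`, and clamps the result to the min/max pod counts. The function name and the per-pod RPS target are hypothetical; the real HPA also applies a tolerance band and stabilization windows, and with several metrics it takes the largest recommendation.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_pods: int, max_pods: int) -> int:
    """Core HPA rule: ceil(current * metric / target), clamped to the bounds."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_pods, min(max_pods, raw))


# Bounds from this document; the 50 RPS-per-pod target is an assumption.
PAYMENT_API_BOUNDS = {"min_pods": 3, "max_pods": 60}
PAYMENT_WORKER_BOUNDS = {"min_pods": 2, "max_pods": 30}

api_pods = desired_replicas(current=3, metric=120.0, target=50.0,
                            **PAYMENT_API_BOUNDS)
```

With 3 pods each seeing 120 RPS against a 50 RPS target, the raw recommendation is `ceil(7.2)`, so `api_pods` is 8; a recommendation below 3 or above 60 would be clamped to the bound.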
3. Network
- All inbound API traffic enters via the platform gateway (`api.melmastoon.ghasi.io`) → Cloud Armor → Internal HTTPS LB → `payment-api`.
- Webhook traffic enters via a separate subdomain `webhooks.payments.melmastoon.ghasi.io` → Cloud Armor (with vendor IP allowlists per SECURITY_MODEL §4.3) → `payment-api` webhook receivers.
- Egress to vendors (Stripe, PayPal, HesabPay) goes through a Cloud NAT with a fixed public IP, allowlisted on each vendor's side where they support it.
- Egress to the FX provider goes through the same NAT.
- Service-to-service (Pub/Sub, Cloud SQL, Secret Manager, KMS) over Private Service Connect — no public IPs.
                               ┌─────────────────────────────────┐
                               │ api.melmastoon.ghasi.io         │
guests / staff / s2s ─────────▶│ Cloud Armor + LB                │──▶ payment-api ──┬─▶ Cloud SQL (Postgres 16, schema-per-tenant)
                               └─────────────────────────────────┘                  │
                                                                                    ├─▶ Pub/Sub (events out + saga in)
                               ┌─────────────────────────────────┐                  │
stripe / paypal / hesabpay ───▶│ webhooks.payments.melmastoon.…  │──▶ payment-api   │
                               │ Cloud Armor + WAF + IP allow    │                  │
                               └─────────────────────────────────┘                  │
                                                                                    ├─▶ Secret Manager (vendor creds)
payment-worker ◀── Pub/Sub backlog (HPA) ── outbox / webhook inbox flush            │
                                                                                    └─▶ Cloud KMS (CMEK envelope decrypt)
Cloud NAT (fixed egress IP) ──▶ Stripe / PayPal / HesabPay / FX
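The vendor IP allowlisting on the webhook path is enforced at Cloud Armor, not in application code, but the matching logic can be sketched in a few lines. The CIDR ranges below are placeholders taken from the RFC 5737 documentation blocks, and the names `VENDOR_ALLOWLIST` and `is_allowed` are hypothetical; real ranges come from each vendor's published list.

```python
import ipaddress

# Placeholder CIDRs (RFC 5737 documentation ranges), NOT real vendor ranges.
VENDOR_ALLOWLIST = {
    "stripe": ["203.0.113.0/24"],
    "paypal": ["198.51.100.0/25"],
    "hesabpay": ["192.0.2.16/28"],
}


def is_allowed(vendor: str, source_ip: str) -> bool:
    """Return True if source_ip falls inside one of the vendor's CIDR blocks."""
    ip = ipaddress.ip_address(source_ip)
    networks = (ipaddress.ip_network(cidr)
                for cidr in VENDOR_ALLOWLIST.get(vendor, []))
    return any(ip in net for net in networks)
```

An unknown vendor has an empty allowlist and is therefore always rejected, which matches a default-deny posture.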
4. Data plane
- Cloud SQL Postgres 16, regional HA (primary + standby), private IP only.
- Read replica for analytics queries (no app traffic).
- Backups: continuous PITR (35-day retention), daily snapshots (90-day retention), monthly archive snapshot to Coldline GCS bucket (7-year retention).
- Maintenance window: Sundays 03:00–05:00 UTC.
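To make the three backup tiers easier to reason about, this sketch reports which tiers can still cover a restore point of a given age. The retention values come from the list above; the tier names, the function name, and the approximation of 7 years as 7 × 365 days are assumptions.

```python
from datetime import timedelta

# Retention periods from the backup policy; 7 years approximated as 7 * 365 days.
RETENTION = {
    "pitr": timedelta(days=35),
    "daily_snapshot": timedelta(days=90),
    "monthly_archive": timedelta(days=7 * 365),
}


def tiers_covering(age: timedelta) -> list[str]:
    """Return the backup tiers whose retention window still includes `age`."""
    return [name for name, keep in RETENTION.items() if age <= keep]
```

Coverage is not the same as granularity: a point 60 days back is only recoverable to the nearest daily snapshot, and one 400 days back only to the nearest monthly archive.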
5. Configuration & secrets
- App config via Kubernetes ConfigMaps for non-secret env (FX provider id, default circuit-breaker thresholds, feature flags).
- Secrets via External Secrets Operator pulling from Secret Manager. Pods receive secrets as env vars or mounted files (vendor private keys are always mounted as files, never exposed as env vars).
- Workload Identity binds the `payment-api` and `payment-worker` GSAs to namespace-scoped permissions (`secretmanager.secretAccessor`, `cloudkms.cryptoKeyDecrypter`, `cloudsql.client`, `pubsub.publisher`, `pubsub.subscriber`).
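The rule that vendor private keys are mounted as files and never passed through the environment can be enforced defensively at load time. This is a minimal sketch: the mount path, the `SECRETS_DIR` override, the file name `private_key.pem`, and the `<VENDOR>_PRIVATE_KEY` env var convention are all hypothetical.

```python
import os
from pathlib import Path

# Hypothetical mount point; the real path depends on the pod spec.
SECRETS_DIR = Path(os.environ.get("SECRETS_DIR", "/var/run/secrets/payments"))


def load_private_key(vendor: str, secrets_dir: Path = SECRETS_DIR) -> bytes:
    """Read a vendor private key from a mounted file, never from the environment."""
    env_name = f"{vendor.upper()}_PRIVATE_KEY"
    if env_name in os.environ:
        # Fail loudly if someone has wired key material into an env var.
        raise RuntimeError(f"{env_name} must not be set; private keys are file-mounted")
    return (secrets_dir / vendor / "private_key.pem").read_bytes()
```

Refusing to start when the env var exists turns a policy statement into a check that fails in dev rather than leaking in prod.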
6. Release pipeline
- CI on PR: lint + unit + integration + contract + PCI scan + OpenAPI diff + event-schema diff.
- Merge to `main`: build container image (Wolfi-based distroless), sign with Cosign, push to Artifact Registry, deploy to `dev`.
- Promotion to `staging`: manual approval; runs e2e:sandbox suite in staging post-deploy.
- Promotion to `prod`: manual approval (2-engineer); progressive rollout with canary at 5% for 30 min, then 25% for 30 min, then 100%; auto-rollback on SLO burn or error-rate spike.
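The prod rollout can be modeled as a small state machine: shift traffic, soak, check health, then either advance or roll back. The sketch below uses the stage percentages and soak times from the list above; `run_rollout`, `is_healthy`, and `soak` are hypothetical names, and the health check stands in for whatever SLO-burn and error-rate signals the real controller watches.

```python
from typing import Callable

# (traffic percent, soak minutes) from the prod promotion step.
STAGES = [(5, 30), (25, 30), (100, 0)]


def run_rollout(is_healthy: Callable[[int], bool],
                soak: Callable[[int], None] = lambda minutes: None) -> str:
    """Advance through STAGES; roll back at the first unhealthy stage."""
    for percent, minutes in STAGES:
        soak(minutes)  # a real controller would wait and watch metrics here
        if not is_healthy(percent):
            return f"rolled back at {percent}%"
    return "promoted to 100%"
```

Passing `soak` as a parameter keeps the sketch testable without sleeping; for example, `run_rollout(lambda p: p < 25)` rolls back at the 25% stage.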
7. Observability stack
- Metrics: OpenTelemetry → Cloud Operations.
- Traces: OpenTelemetry → Cloud Trace.
- Logs: stdout JSON → Cloud Logging.
- Alerts: routed via `notification-service` to PagerDuty.
- Dashboards: Cloud Monitoring workspace `payment-gateway-service` (see OBSERVABILITY).
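For the "stdout JSON → Cloud Logging" path, a service only needs to print one JSON object per line. The sketch below does that with the standard `logging` module; `severity` and `message` are field names Cloud Logging recognizes in structured payloads, while the `logger` field, the class name, and the default logger name are illustrative assumptions. The service's actual log schema is defined in OBSERVABILITY.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        })


def build_logger(name: str = "payment-api") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout only."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Calling `build_logger().info("webhook accepted")` would emit a single JSON line on stdout.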
8. Disaster recovery
| Scenario | RPO | RTO | Strategy |
|---|---|---|---|
| Cluster failure | 0 | 15 m | Failover to warm standby region (DNS swap) |
| DB primary failure | 0 (HA standby in same region) | 5 m | Cloud SQL automatic failover |
| Region-wide outage | ≤ 5 m (PITR) | 60 m | Restore PITR to standby region; webhook ingestion paused via 503 (vendors retry); events re-issued on recovery |
| Pub/Sub outage | n/a | Duration of the outage | Outbox holds events; publishing resumes when Pub/Sub returns |
| Secret Manager outage | Covered by 5 m read cache | Duration of the outage | Cached secrets serve in-flight requests; onboarding of new tenants is paused |
Quarterly DR drills include a region failover and a full restore test.
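The "outbox holds" strategy for a Pub/Sub outage can be illustrated with an in-memory stand-in: events stay pending until a publish call succeeds, and a failure stops the flush without dropping anything. In the real service the outbox is presumably a database table; `flush_outbox`, the `publish` callable, and the use of `ConnectionError` as the outage signal are assumptions for this sketch.

```python
from typing import Callable


def flush_outbox(pending: list[dict],
                 publish: Callable[[dict], None]) -> list[dict]:
    """Publish pending events in order; return the ones still unsent."""
    for index, event in enumerate(pending):
        try:
            publish(event)
        except ConnectionError:
            # Broker unavailable: keep this event and everything after it.
            return pending[index:]
    return []
```

Because an event may be published and then retried after a crash, this pattern gives at-least-once delivery, so consumers need to deduplicate.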
9. Tenant onboarding & offboarding flow (deploy-side)
- Onboarding: `tenant-service` calls `payments-admin-cli provision-tenant <id>`, which creates the schema, seeds vendor pointers, and runs initial migrations.
- Offboarding: 30-day soft-delete window; after that, the schema is dropped, secrets are destroyed, and the archived snapshot is retained per legal hold rules (SECURITY_MODEL §11).
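The offboarding window reduces to a simple date check that a cleanup job could run before dropping a schema. This is a sketch under assumptions: the function name is hypothetical, timestamps are taken to be timezone-aware UTC, and the boundary is treated as inclusive (eligible at exactly 30 days).

```python
from datetime import datetime, timedelta, timezone

SOFT_DELETE_WINDOW = timedelta(days=30)  # from the offboarding rule above


def hard_delete_due(soft_deleted_at: datetime, now: datetime) -> bool:
    """True once the 30-day soft-delete window has fully elapsed."""
    return now - soft_deleted_at >= SOFT_DELETE_WINDOW


deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_recoverable = not hard_delete_due(
    deleted_at, datetime(2024, 1, 15, tzinfo=timezone.utc))
```

Legal hold rules may override this check; the sketch covers only the time window.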