iam-service — Deployment Topology

GCP-native deployment. Cloud Run for compute, Cloud SQL for state, Memorystore for cache, KMS for crypto, Pub/Sub for events, Secret Manager for secrets. Multi-region active-active by M2.

1. Containers

| Container | Purpose | Replicas (prod) | CPU | Memory |
| --- | --- | --- | --- | --- |
| iam-api | HTTP/REST + JWKS endpoint | min 2, max 50 | 1 vCPU | 512 MiB |
| iam-worker | Outbox relay, inbox consumer, scheduled jobs (key rotation, breach audit, cert expiry) | min 2, max 10 | 1 vCPU | 512 MiB |
| iam-jwks | Static-content variant of API serving only /.well-known/jwks.json (cache-friendly, no DB) | min 2, max 5 | 0.5 vCPU | 256 MiB |

All three are built from the same source repo with different entrypoints. The image tag is the short git SHA; latest is not used in prod.
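The tagging rule can be enforced with a small pre-deploy check. The sketch below is illustrative only: the function name, the 7 to 12 character length range for a "short" SHA, and the example registry path are assumptions, not part of the pipeline described here.

```python
import re

# Assumption: a "short SHA" is 7 to 12 lowercase hex characters.
SHORT_SHA_RE = re.compile(r"[0-9a-f]{7,12}")


def is_prod_tag(image_ref: str) -> bool:
    """Return True if image_ref ends in a short-git-SHA tag (so never 'latest')."""
    _, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag at all, or the colon belonged to a registry port.
        return False
    return SHORT_SHA_RE.fullmatch(tag) is not None


# Hypothetical image references, for illustration only.
print(is_prod_tag("example.invalid/iam/iam-api:3f2a9c1"))
print(is_prod_tag("example.invalid/iam/iam-api:latest"))
```

A check like this would typically run in the deploy pipeline before a revision is created, so an untagged or `latest` image fails fast.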

2. Scaling Rules

| Container | Trigger | Threshold |
| --- | --- | --- |
| iam-api | Concurrent requests | target 80 / instance |
| iam-api | CPU | target 60 % |
| iam-worker | Outbox depth | scale-up if depth > 100 for 30 s |
| iam-worker | Pub/Sub subscription backlog | scale-up if num_undelivered_messages > 500 |
| iam-jwks | Concurrent requests | target 200 / instance |

Cold-start mitigation: min instances ≥ 2 in prod always. CPU is allocated even when idle for iam-api (Cloud Run "CPU always allocated").
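To show how a concurrency target interacts with the min/max replica bounds, here is a rough estimate of the instance count. It is a simplification under stated assumptions: the `ceil(load / target)` model and the function name are illustrative, and Cloud Run's real autoscaler also weighs CPU and other signals.

```python
import math


def desired_instances(concurrent_requests: int, target_per_instance: int,
                      min_instances: int, max_instances: int) -> int:
    """Estimate instances needed to keep each one at the concurrency target."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(concurrent_requests / target_per_instance)
    # Clamp to the warm floor and the configured ceiling.
    return max(min_instances, min(needed, max_instances))


# iam-api values from the tables above: target 80, min 2, max 50.
print(desired_instances(400, 80, 2, 50))
```

With no traffic the result stays at the warm floor of 2, which is the cold-start mitigation described above.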

3. Resource Budgets

| Resource | Limit | Notes |
| --- | --- | --- |
| Request timeout | 30 s | Most requests < 1 s; 30 s for OIDC flows w/ slow IdPs |
| Max concurrent requests / instance | 100 | tuned per release |
| Container startup probe | GET /health/startup 200, deadline 30 s | |
| Liveness probe | GET /health/live every 30 s | |
| Readiness probe | GET /health/ready every 10 s; checks DB, Redis, KMS | |
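The readiness probe aggregates several dependency checks into one answer. A minimal sketch of that aggregation follows; the function name, the result shape, and the choice of HTTP 503 for a failed dependency are assumptions, and the lambdas stand in for real DB, Redis, and KMS pings.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check; return (HTTP status, per-dependency result)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = "fail"
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results


# Placeholder checks; real ones would ping the actual dependencies.
status, detail = readiness({"db": lambda: True, "redis": lambda: True, "kms": lambda: False})
print(status, detail)
```

Returning the per-dependency detail alongside the status makes it easier to see which dependency took an instance out of rotation.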

4. Storage Topology

| Layer | Service | Config |
| --- | --- | --- |
| Primary DB | Cloud SQL Postgres 15 (Enterprise Plus) | HA (regional, read replicas in 2 zones); CMEK; PITR 7 d; backup daily 35 d retention |
| Sessions cache | Memorystore Redis 7 | HA (Standard tier, 5 GB), VPC-attached, AUTH enabled |
| Crypto | Cloud KMS | Region-pinned (per data-residency tenant policy); HSM keyring |
| Secrets | Secret Manager | Region-replicated; auto-rotation where supported |
| Events | Pub/Sub | Topics melmastoon.iam.*.v1, retention 7 d |
| Cold audit | BigQuery | Daily export from audit_events partitions ≥ 90 d old |
| Object storage | Cloud Storage | DSAR PDFs (short-lived, 30 d), tenant CA exports |

5. Region Topology

5.1 M0 (single region)

me-central1 (Doha)
├── iam-api (Cloud Run)
├── iam-worker (Cloud Run)
├── iam-jwks (Cloud Run + CDN)
├── Cloud SQL primary + 2 read replicas
├── Memorystore primary + replica
├── Cloud KMS keyring
├── Pub/Sub
└── Secret Manager

5.2 M2 (multi-region active-active)

         ┌─ Cloud DNS (geo-routed) ─┐
         │                          │
         ▼                          ▼
┌─────────────────┐        ┌─────────────────┐
│ me-central1     │        │ europe-west1    │
│ (primary)       │        │ (active-active) │
│                 │        │                 │
│ iam-api ▸▸▸▸▸▸▸ │◀──────▶│ iam-api         │
│ iam-worker      │        │ iam-worker      │
│ iam-jwks/CDN    │        │ iam-jwks/CDN    │
│ Cloud SQL HA    │ ▸pgsync│ Cloud SQL HA    │
│ Memorystore HA  │  TODO  │ Memorystore HA  │
│ KMS keyring     │ ── ── ─│ KMS keyring     │
│ Pub/Sub         │ X-rep  │ Pub/Sub         │
└─────────────────┘        └─────────────────┘

Cross-region DB topology TBD (logical replication w/ tenant-scoped routing OR per-tenant pinning). Decision lives in architecture/ADR-multi-region.md (M2 milestone).

6. Caching Strategy

| Layer | What | TTL | Invalidation |
| --- | --- | --- | --- |
| CDN (Cloud CDN) | /.well-known/jwks.json | 5 min | manual purge on key rotation |
| CDN | OIDC discovery doc | 1 h | rare changes |
| Memorystore | Session lookup | 24 h | on revoke (key delete) |
| Memorystore | Rate-limit counters | rolling 5 min | natural decay |
| Memorystore | Adaptive-MFA risk score (idempotent) | 60 s | natural |
| Memorystore | Magic-link tokens | 10 min | on consume |
| Memorystore | API-key denylist post-revoke | 60 min | TTL |
| In-memory (per pod) | Tenant CA chain | 1 h, jittered | rare changes |
| In-memory (per pod) | OIDC IdP metadata | 1 h, jittered | refresh on parse error |
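The "1 h, jittered" TTLs spread refreshes so that instances started together do not all reload at the same moment. One way this could look is sketched below; the ±10 % jitter range and the function name are assumptions, since the actual jitter policy is not stated here.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds shifted by a uniform random offset of up to ±jitter_fraction."""
    rng = rng or random.Random()
    delta = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-delta, delta)


# Each instance would draw its own TTL somewhere between 54 and 66 minutes.
print(jittered_ttl(3600))
```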

7. Edge Rules

| Layer | Rule |
| --- | --- |
| Cloud Armor (WAF) | Rate limit /auth/login 10/min/IP; /auth/password/reset/request 3/h/email; /auth/refresh 60/min/family |
| Cloud Armor | Block known TOR exit nodes for /auth/login unless tenant explicitly opts in |
| Cloud Armor | OWASP CRS preconfigured rules + custom credential-stuffing patterns |
| Cloud Armor | Geo block: per-tenant allowlist; OFAC denylist always |
| API Gateway | mTLS for /internal/*; reject otherwise |
| Cloud Load Balancer | TLS 1.3; HSTS preload; HTTP/2; QUIC opt-in |
| Cloud CDN | Caches only /.well-known/jwks.json and OIDC discovery |
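To make a limit such as "10/min/IP" concrete, the sketch below implements a sliding-window counter keyed by an arbitrary string (an IP, an email, or a token family). It only illustrates the semantics: Cloud Armor enforces these limits at the edge with its own mechanism, and the class and method names here are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


login_limiter = SlidingWindowLimiter(limit=10, window_seconds=60)
print(login_limiter.allow("198.51.100.4", now=0.0))
```

The same class covers the other rules by changing the key and window, for example `SlidingWindowLimiter(3, 3600)` keyed by email for password-reset requests.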

8. Service Mesh

| Aspect | Choice |
| --- | --- |
| Mesh | Anthos Service Mesh (Istio-managed) |
| Identity | SPIFFE: spiffe://melmastoon/prod/iam-service |
| mTLS | STRICT (mesh-internal only) |
| Authorization | AuthorizationPolicy: only tenant-service, audit-service, gdpr-service, notification-service, ai-orchestrator-service, api-gateway may call iam |
| Egress | Restricted: *.googleapis.com, configured IdPs, *.haveibeenpwned.com (HIBP) |

9. Release Strategy

| Phase | Mechanism |
| --- | --- |
| Build | Cloud Build, signed image to Artifact Registry; SBOM + provenance attestation |
| Deploy | Cloud Deploy pipeline: dev → staging → prod-canary (5 %) → prod-full |
| Canary criteria | Error rate Δ < 0.5 %, latency p99 Δ < 10 %, login success rate Δ < 0.1 %, no critical alerts in 30 min |
| Promotion | Auto if all criteria green; manual otherwise |
| Rollback | One-command revert to previous Cloud Run revision (≤ 2 min); preserves env + traffic split |
| Schema migrations | Forward-compatible only (per MIGRATION_PLAN); separate Cloud Build job; gated on ALL-PASS migration tests |
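The canary criteria can be read as a gate that compares canary metrics with the stable baseline. The sketch below is one possible reading, not the pipeline's actual check: it assumes the error-rate and login-success deltas are in percentage points, the p99 latency delta is relative, and all type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    error_rate_pct: float      # e.g. 0.2 means 0.2 %
    latency_p99_ms: float
    login_success_pct: float
    critical_alerts: int = 0   # alerts seen in the 30 min canary window


def canary_failures(baseline: Metrics, canary: Metrics) -> List[str]:
    """Return the names of violated criteria; an empty list means auto-promote."""
    if baseline.latency_p99_ms <= 0:
        raise ValueError("baseline latency must be positive")
    failures: List[str] = []
    if canary.error_rate_pct - baseline.error_rate_pct >= 0.5:
        failures.append("error_rate")
    latency_delta_pct = (canary.latency_p99_ms - baseline.latency_p99_ms) / baseline.latency_p99_ms * 100
    if latency_delta_pct >= 10:
        failures.append("latency_p99")
    if baseline.login_success_pct - canary.login_success_pct >= 0.1:
        failures.append("login_success")
    if canary.critical_alerts > 0:
        failures.append("critical_alerts")
    return failures


print(canary_failures(Metrics(0.1, 200.0, 99.5), Metrics(0.2, 210.0, 99.45)))
```

A non-empty result would correspond to the "manual otherwise" branch of the promotion rule.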

10. Disaster Recovery

| Event | RPO | RTO | Mechanism |
| --- | --- | --- | --- |
| Pod failure | 0 | < 1 min | Cloud Run auto-restart |
| Zone failure | 0 | < 5 min | regional Cloud SQL + Cloud Run |
| Region failure (M0) | ≤ 5 min | ≤ 4 h | Restore from cross-region backup; DNS failover |
| Region failure (M2) | 0 | < 10 min | Active-active failover |
| Data corruption | ≤ 5 min | ≤ 1 h | PITR to pre-corruption point |
| KMS regional outage | n/a | ≤ 30 min | Cross-region KMS replica + DR runbook |

DR drill cadence: quarterly. Result tracked in runbooks/iam/dr-drill-log.md.

11. Secret Rotation Cadence

| Secret | Cadence | Mechanism |
| --- | --- | --- |
| JWT signing key (kid) | Monthly | KMS rotation alias; 2-day overlap |
| Tenant CA | Annual | KMS; cert overlap; gradual reissue |
| OIDC client secrets | Quarterly or per IdP cadence | Secret Manager + rolling deploy |
| SAML signing key | Annual | KMS |
| HIBP API key | Per provider | Secret Manager |
| SMTP creds | Annual | Secret Manager |
| Tenant fingerprint HMAC secret | Annual | KMS DEK |
| API-key HMAC pepper | Annual | KMS DEK |
| Cloud SQL CMEK | Annual | KMS rotation; transparent |
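The 2-day overlap on the JWT signing key means verifiers briefly see two kids in the JWKS. The sketch below shows one interpretation: the retired key stays published for 2 days after its successor activates. That reading, the function name, and the kid values are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

OVERLAP = timedelta(days=2)


def published_kids(keys: List[Tuple[str, datetime]], now: datetime) -> List[str]:
    """keys are (kid, activated_at) pairs; return the kids verifiers should trust at `now`."""
    active = sorted((k for k in keys if k[1] <= now), key=lambda k: k[1])
    if not active:
        return []
    kids = [active[-1][0]]  # current signing key
    if len(active) >= 2 and now < active[-1][1] + OVERLAP:
        # Previous key is still inside the overlap window.
        kids.append(active[-2][0])
    return kids


# Hypothetical kids, named after their activation month.
keys = [
    ("2024-01", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("2024-02", datetime(2024, 2, 1, tzinfo=timezone.utc)),
]
print(published_kids(keys, datetime(2024, 2, 2, tzinfo=timezone.utc)))
```

Under this reading the overlap should be at least as long as the longest token lifetime, so tokens signed just before a rotation can still be verified until they expire.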

12. Observability Wiring

| Signal | Sink |
| --- | --- |
| Logs | Cloud Logging → Log Router → BigQuery (90 d analytical) + Coralogix mirror |
| Metrics | Cloud Monitoring + Prometheus scrape → Cloud Monitoring + SigNoz |
| Traces | OTel collector → Cloud Trace + SigNoz |
| Audit | Cloud Audit Logs (admin actions) + iam audit DB (auth events) |
| Alerts | Cloud Monitoring → PagerDuty #iam-oncall + Slack #oncall-iam |

13. Network

| Component | Subnet |
| --- | --- |
| iam-api | prod-services-me-central1 (private) |
| iam-worker | same |
| iam-jwks | same |
| Cloud SQL | prod-data-me-central1 (private; private IP only) |
| Memorystore | prod-data-me-central1 |
| Egress to internet | Cloud NAT (single egress IP per region for IdP allowlists) |

14. Compliance & Sovereignty

| Region | Tenant residency | Data scope |
| --- | --- | --- |
| me-central1 | Default for new tenants in MENA | All iam data + crypto |
| europe-west1 | EU tenants opt-in (GDPR data residency) | All iam data + crypto |
| us-central1 | M3, on demand | All iam data + crypto |

Tenant residency recorded in tenant.created.v1; iam writes only to the tenant's residency region. Cross-region read for ops requires elevated approval + audit event.
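The residency rule reduces to a small access check: writes only in the tenant's home region, cross-region reads only with elevated approval. The sketch below assumes a simple tenant-to-region map populated from tenant.created.v1; the function name and parameters are hypothetical, and the audit event that a real cross-region read must emit is omitted.

```python
from typing import Dict

REGIONS = frozenset({"me-central1", "europe-west1", "us-central1"})


def is_allowed(residency: Dict[str, str], tenant_id: str, region: str,
               operation: str, elevated_approval: bool = False) -> bool:
    """Decide whether `operation` ('read' or 'write') may run in `region` for the tenant."""
    home = residency.get(tenant_id)
    if home is None or home not in REGIONS or region not in REGIONS:
        return False
    if region == home:
        return True
    # Outside the home region: never write; read only with elevated approval.
    return operation == "read" and elevated_approval


# Hypothetical tenant id.
residency = {"tenant-eu-1": "europe-west1"}
print(is_allowed(residency, "tenant-eu-1", "me-central1", "write"))
```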

15. Cost Posture

| Lever | Optimization |
| --- | --- |
| iam-jwks separate | Cheap (no DB), CDN-fronted; offloads 70 %+ of read traffic |
| Min instances | 2 (warm) only; rest scales to zero (jwks) or low |
| Cloud SQL | Right-sized; auto-storage-grow off (manual) |
| KMS | Sign rate is bounded; bursts cached at API layer |
| Pub/Sub | 7-d retention only; longer-term in BigQuery |
| AI calls | Cached for 60 s by (userId, ipMasked) to reduce orchestrator load |

Monthly per-tenant cost dashboard in dashboards/cost/iam.json; outliers (P95 cost) reviewed weekly.
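The 60 s cache on AI calls keyed by (userId, ipMasked) could be sketched as follows. This is an in-process illustration only; the production cache would presumably live in Memorystore. How ipMasked is derived is not specified here, so the /24 (IPv4) and /48 (IPv6) masks, the class name, and the injected clock are assumptions.

```python
import ipaddress
import time
from typing import Any, Callable, Dict, Tuple


def mask_ip(ip: str) -> str:
    """Assumed masking: keep the /24 network for IPv4 and the /48 network for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


class RiskScoreCache:
    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get_or_compute(self, user_id: str, ip: str, compute: Callable[[], Any]) -> Any:
        key = (user_id, mask_ip(ip))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # fresh entry: skip the orchestrator call
        value = compute()
        self._entries[key] = (now, value)
        return value


cache = RiskScoreCache()
print(cache.get_or_compute("user-1", "203.0.113.7", lambda: 0.42))
```

Masking the IP before keying means requests from the same user on nearby addresses share one cached score, which is where the load reduction comes from.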

16. Versioning & Rollout Discipline

  • API: /api/v1/*. Adding endpoints / fields is non-breaking. Removing or changing requires /api/v2/* with Sunset header on v1.
  • Events: melmastoon.iam.<entity>.<verb>.vN. New version added side-by-side; consumers migrate; old version deprecated per MIGRATION_PLAN §7.
  • Database: forward-compatible migrations only.
  • Helm/Cloud Run config in GitOps (infra/iam/).
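The event naming scheme melmastoon.iam.<entity>.<verb>.vN is regular enough to validate mechanically. The sketch below parses a topic name and derives the next side-by-side version; the allowed character set for entity and verb, the helper names, and the example topic are assumptions.

```python
import re
from typing import NamedTuple

# Assumption: entity and verb are lowercase identifiers; N is a positive integer.
TOPIC_RE = re.compile(
    r"melmastoon\.iam\.([a-z][a-z0-9_-]*)\.([a-z][a-z0-9_-]*)\.v([1-9][0-9]*)"
)


class Topic(NamedTuple):
    entity: str
    verb: str
    version: int


def parse_topic(name: str) -> Topic:
    m = TOPIC_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a valid iam event topic: {name!r}")
    return Topic(m.group(1), m.group(2), int(m.group(3)))


def next_version(topic: Topic) -> str:
    """Name of the topic that would be added side-by-side for a breaking change."""
    return f"melmastoon.iam.{topic.entity}.{topic.verb}.v{topic.version + 1}"


# Hypothetical topic name, for illustration only.
t = parse_topic("melmastoon.iam.user.created.v1")
print(next_version(t))
```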

17. ASCII Deployment Diagram (M0)

                      ┌────────────────┐
Internet ────────────▶│ Cloud Armor    │
                      │ (WAF + rate)   │
                      └──────┬─────────┘
                             │
             ┌──────────────────────────────┐
             │ Global HTTPS Load Balancer   │
             │ (TLS term, HSTS, geo-route)  │
             └──────┬─────────────┬─────────┘
               CDN-cached      others
                    │             │
         ┌──────────▼──┐    ┌─────▼──────────┐
         │ iam-jwks    │    │ iam-api        │
         │ (Cloud Run) │    │ (Cloud Run)    │
         └─────────────┘    └─┬──────────────┘
                              │ mTLS (mesh)
        ┌──────────┐  ┌───────▼──────┐  ┌──────────┐
        │ Cloud SQL│  │ Memorystore  │  │ Cloud KMS│
        │ (Postgres│  │ (Redis 7)    │  │ (HSM)    │
        │ HA)      │  │              │  │          │
        └──────────┘  └──────────────┘  └──────────┘
                              │
        ┌─────────────────────┴─────────────────────┐
        │ iam-worker (Cloud Run)                    │
        │ outbox relay · inbox consumer · jobs      │
        └────────┬───────────────────────────┬──────┘
                 ▼                           ▼
         ┌──────────────┐            ┌──────────────┐
         │ Pub/Sub      │            │ External:    │
         │ topics + DLQ │            │ OIDC/SAML    │
         └──────────────┘            │ HIBP, SMTP   │
                                     └──────────────┘