Deployment Topology

:::info Source
Sourced from services/identity-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::

1. Containers

| Container | Purpose | Image |
|---|---|---|
| identity-api | REST API (auth, sessions, devices, MFA, API keys) | ghasi/identity-service:<ver> |
| identity-worker | Outbox relay, session reaper, token rotation detector, risk classifier | ghasi/identity-worker:<ver> |
| identity-jwks | JWKS endpoint cache (read-only replica) | ghasi/identity-jwks:<ver> |

All containers are stateless; state lives in Postgres (primary), Redis (session cache), and KMS (signing keys).

2. Scaling Rules

| Dimension | Rule |
|---|---|
| API replicas | HPA on CPU > 60% or login-rate > 200 rps per pod. Min 3, max 40 (per region). |
| Worker replicas | HPA on outbox backlog > 5000 rows. Min 2, max 10. |
| JWKS | Min 3 (Anycast behind CDN); essentially read-only. |
  • Horizontal preferred; all services stateless.
  • Vertical: baseline 500m CPU / 512Mi RAM; memory-heavy WebAuthn ops burst to 1Gi.
  • Regional pinning: identity-service is deployed in every data-residency region (us, eu, me, ap). Active-active intra-region; tenant-routed by homeRegion.
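
The API rule can be read as a clamp around the standard Kubernetes HPA calculation, desired = ceil(current × metric / target), evaluated per metric with the largest proposal winning. A minimal sketch, assuming the 60% and 200 rps thresholds act as HPA target values; the function and parameter names are illustrative and not taken from the service:

```python
import math

API_MIN, API_MAX = 3, 40      # per-region bounds from the scaling rules
CPU_TARGET_PCT = 60           # CPU utilisation target, in percent (assumed to be the HPA target)
LOGIN_TARGET_RPS = 200        # login rate target per pod (assumed to be the HPA target)


def desired_api_replicas(current: int, cpu_pct: float, login_rps_per_pod: float) -> int:
    """Return the replica count an HPA-style controller would request."""
    by_cpu = math.ceil(current * cpu_pct / CPU_TARGET_PCT)
    by_login = math.ceil(current * login_rps_per_pod / LOGIN_TARGET_RPS)
    # With several metrics, the HPA takes the largest proposal, then clamps
    # it to the configured minimum and maximum.
    return max(API_MIN, min(API_MAX, max(by_cpu, by_login)))
```

For example, 10 pods averaging 90% CPU and 100 login rps each would scale to 15 pods, driven by the CPU metric.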

3. Resource Requirements

| Workload | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| identity-api | 500m | 2000m | 512Mi | 1.5Gi |
| identity-worker | 200m | 1000m | 256Mi | 1Gi |
| identity-jwks | 100m | 500m | 128Mi | 256Mi |
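
For capacity planning, these requests multiply out against the replica bounds in section 2. A rough sketch of that arithmetic for a single region; identity-jwks is held at its minimum because no maximum is documented, and the dictionary layout is illustrative:

```python
# CPU in millicores, memory in Mi; request values copied from this section.
REQUESTS = {
    "identity-api": {"cpu_m": 500, "mem_mi": 512},
    "identity-worker": {"cpu_m": 200, "mem_mi": 256},
    "identity-jwks": {"cpu_m": 100, "mem_mi": 128},
}
# (min, max) replicas per region from section 2.
REPLICAS = {
    "identity-api": (3, 40),
    "identity-worker": (2, 10),
    "identity-jwks": (3, 3),  # assumption: only a minimum is documented
}


def regional_requests(at_max: bool) -> tuple[int, int]:
    """Sum scheduler requests (millicores, Mi) across the three workloads."""
    cpu = mem = 0
    for name, req in REQUESTS.items():
        replicas = REPLICAS[name][1 if at_max else 0]
        cpu += replicas * req["cpu_m"]
        mem += replicas * req["mem_mi"]
    return cpu, mem
```

At minimum replicas this comes to 2200m CPU and 2432Mi of memory per region; at maximum, 22300m and 23424Mi.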

4. Caching Layers

| Layer | Contents | TTL |
|---|---|---|
| CDN (JWKS) | /.well-known/jwks.json | 5 min, stale-while-revalidate=60 |
| Redis (per region) | Session lookup by sessionId, rate-limit counters, login-attempt tracker | 15 min, sliding |
| In-memory (per pod) | JWT verification keys (by kid) | 10 min |
| Postgres pgbouncer | Transaction-mode pool, per-tenant app.tenant_id set on checkout | |
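
The per-pod layer is the simplest of these to picture: a map from kid to verification key whose entries expire after 10 minutes. A minimal sketch, assuming a hypothetical `fetch_key` loader (for instance, a call to the JWKS endpoint) and an injectable clock so expiry can be tested; neither name comes from the service:

```python
import time
from typing import Callable, Dict, Tuple

KEY_TTL_SECONDS = 10 * 60  # in-memory TTL for verification keys


class KidKeyCache:
    """Per-pod cache of JWT verification keys, indexed by kid."""

    def __init__(self, fetch_key: Callable[[str], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_key = fetch_key  # hypothetical loader, not a real API
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, kid: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(kid)
        if hit is not None and now < hit[0]:
            return hit[1]
        # Miss or expired entry: reload and restart the TTL.
        key = self._fetch_key(kid)
        self._entries[kid] = (now + KEY_TTL_SECONDS, key)
        return key
```

Keeping the TTL at or below the key overlap window means a pod picks up a rotated key well before the old one is withdrawn.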

5. CDN Usage

  • Public endpoints (JWKS, password-reset landing) served via CDN with signed origin.
  • Private endpoints (/api/v1/auth/*, /api/v1/users/*) bypass CDN (origin direct).
  • Geographic routing: anycast for public; regional for private.

6. Edge Rules

  • WAF: OWASP CRS + custom rules (credential-stuffing heuristics, IP reputation, header fingerprinting).
  • Rate limits at edge:
    • /auth/login — 10/min per IP, 30/min per email
    • /auth/password/reset/request — 3/hour per email
    • /auth/refresh — 60/min per refresh-token family
    • /auth/sso/* — 20/min per tenant
  • Geographic blocking: per-tenant allowlist/denylist (OFAC + tenant-configured).
  • Anti-bot: Turnstile on signup + password reset.
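
The edge limits can be pictured as (key dimension, count, window) rules per route, each enforced by a counter. The sketch below uses a fixed-window counter for brevity; the source does not say which algorithm the edge uses, and the class name, dimension names, and literal route matching are all illustrative assumptions:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (key dimension, max requests, window in seconds), copied from the edge rules.
EDGE_LIMITS: Dict[str, List[Tuple[str, int, int]]] = {
    "/auth/login": [("ip", 10, 60), ("email", 30, 60)],
    "/auth/password/reset/request": [("email", 3, 3600)],
    "/auth/refresh": [("token_family", 60, 60)],
    "/auth/sso/*": [("tenant", 20, 60)],  # matched literally in this sketch
}


class FixedWindowLimiter:
    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str, str, int], int] = defaultdict(int)

    def allow(self, route: str, keys: Dict[str, str], now: float) -> bool:
        """Return True if the request is under every limit for this route."""
        rules = EDGE_LIMITS.get(route, [])
        buckets = [(route, dim, keys[dim], int(now // window))
                   for dim, _, window in rules]
        # Reject without counting if any dimension is already exhausted.
        for bucket, (_, limit, _) in zip(buckets, rules):
            if self._counts[bucket] >= limit:
                return False
        for bucket in buckets:
            self._counts[bucket] += 1
        return True
```

A login attempt has to pass both the per-IP and per-email rule, so an attacker rotating IPs against one account is still stopped at 30 attempts per minute.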

7. Service Mesh

  • mTLS inside cluster (SPIFFE identity: spiffe://ghasi/prod/identity-service).
  • Egress to IdPs (Google, Microsoft, SAML): restricted via egress gateway + per-destination allowlist.
  • Egress to KMS: per-region, mTLS + IAM.

8. Deployment Topology Diagram

                 ┌────────────────────┐
                 │  CDN (Cloudflare)  │  /.well-known/jwks.json
                 └─────────┬──────────┘
                           │
      ┌────────────────────┼────────────────────┐
      │                    │                    │
 ┌────▼──────┐     ┌───────▼────────┐     ┌─────▼───────┐
 │ WAF + LB  │     │  API Gateway   │     │ WAF + LB    │
 │ (us-east) │     │  (per region)  │     │ (eu-fra-1)  │
 └────┬──────┘     └───────┬────────┘     └─────┬───────┘
      │                    │                    │
 ┌────▼──────────┐   ┌─────▼──────────┐    ┌────▼─────────┐
 │ identity-api  │   │ identity-api   │    │ identity-api │
 │ (3–40 pods)   │   │ (3–40 pods)    │    │ (3–40 pods)  │
 └────┬──────────┘   └─────┬──────────┘    └────┬─────────┘
      │                    │                    │
┌─────▼─────┐         ┌────▼────┐          ┌────▼────┐
│ Postgres  │ RLS     │ Redis   │          │ KMS     │
│ (primary) │         │ (cache) │          │ (HSM)   │
└─────┬─────┘         └─────────┘          └─────────┘
      │
┌─────▼────────┐
│ Read replica │ → JWKS serve, analytics
└──────────────┘

9. Release Strategy

  • Blue/green for API container. New version registered with kid-new; old kid still served until drain.
  • Canary (10% → 50% → 100%) on identity-worker.
  • Zero-downtime KMS key rotation via kid overlap window (≥ 2 days).
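
The kid overlap window amounts to publishing both keys in the JWKS for at least two days after rotation, so tokens signed under the old kid still verify. A minimal sketch of that selection, assuming hypothetical `SigningKey` records with a `retired_at` timestamp; the source does not describe how key metadata is actually stored, and the date below is example data:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

OVERLAP = timedelta(days=2)  # minimum kid overlap window


@dataclass(frozen=True)
class SigningKey:
    kid: str
    retired_at: Optional[datetime] = None  # None means still the active signer


def published_kids(keys: List[SigningKey], now: datetime) -> List[str]:
    """Return the kids that should still appear in the JWKS at time `now`."""
    return [
        k.kid for k in keys
        if k.retired_at is None or now < k.retired_at + OVERLAP
    ]


# Example data: kid-old was rotated out at midnight UTC on an arbitrary date.
rotated = datetime(2024, 1, 1, tzinfo=timezone.utc)
keys = [SigningKey("kid-old", retired_at=rotated), SigningKey("kid-new")]
```

One day after rotation, `published_kids(keys, rotated + timedelta(days=1))` returns both kids; once the full two days have elapsed, only `kid-new` remains.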

10. Disaster Recovery

  • RPO: 5 min (Postgres WAL shipped to cold storage).
  • RTO: 60 min (DR drill quarterly).
  • Failover: intra-region automatic; cross-region manual with CTO sign-off.
  • Backups: per-tenant snapshot daily; restore tested monthly.