Deployment Topology
:::info Source
Sourced from services/identity-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::
1. Containers
| Container | Purpose | Image |
|---|---|---|
| `identity-api` | REST API (auth, sessions, devices, MFA, API keys) | `ghasi/identity-service:<ver>` |
| `identity-worker` | Outbox relay, session reaper, token rotation detector, risk classifier | `ghasi/identity-worker:<ver>` |
| `identity-jwks` | JWKS endpoint cache (read-only replica) | `ghasi/identity-jwks:<ver>` |
All containers are stateless; state lives in Postgres (primary) + Redis (session cache) + KMS (signing keys).
2. Scaling Rules
| Dimension | Rule |
|---|---|
| API replicas | HPA on CPU > 60% or login-rate > 200 rps per pod. Min 3, max 40 (per region). |
| Worker replicas | HPA on outbox backlog > 5000 rows. Min 2, max 10. |
| JWKS | Min 3 (Anycast behind CDN); essentially read-only. |
- Horizontal preferred; all services stateless.
- Vertical: baseline 500m CPU / 512Mi RAM; memory-heavy WebAuthn ops burst to 1Gi.
- Regional pinning: identity-service is deployed in every data-residency region (`us`, `eu`, `me`, `ap`). Active-active intra-region; tenant-routed by `homeRegion`.
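The API scaling rule can be illustrated with the standard Kubernetes HPA formula, `ceil(currentReplicas * currentMetric / targetMetric)`, taking the largest proposal when several metrics are configured. The sketch below assumes the 60% CPU and 200 rps figures act as HPA targets; the function names are hypothetical, and the real controller also applies a tolerance band and stabilization windows that are ignored here.

```python
import math


def desired_replicas(current: int, metric_value: float, target: float) -> int:
    """Core HPA formula: ceil(current * metric / target)."""
    return math.ceil(current * metric_value / target)


def api_replicas(current: int, cpu_pct: float, login_rps_per_pod: float,
                 min_replicas: int = 3, max_replicas: int = 40) -> int:
    """Replica count for identity-api under the rule in the table above.

    Assumption: 60% CPU and 200 rps per pod are treated as HPA targets.
    """
    # With several metrics, the HPA acts on the largest proposal.
    proposal = max(
        desired_replicas(current, cpu_pct, 60.0),
        desired_replicas(current, login_rps_per_pod, 200.0),
    )
    # Clamp to the per-region bounds (min 3, max 40).
    return max(min_replicas, min(max_replicas, proposal))
```

For example, 10 pods at 90% CPU and 100 rps per pod would scale to 15, because the CPU metric dominates; a quiet deployment never drops below 3 pods, and a spike never exceeds 40.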
3. Resource Requirements
| Workload | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| `identity-api` | 500m | 2000m | 512Mi | 1.5Gi |
| `identity-worker` | 200m | 1000m | 256Mi | 1Gi |
| `identity-jwks` | 100m | 500m | 128Mi | 256Mi |
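For capacity planning, the per-pod requests can be multiplied out by replica count to estimate what a region reserves. This is a rough sketch: the helper names are hypothetical, only the `m`, `Mi`, and `Gi` suffixes used in this document are parsed, and the replica counts are passed in by the caller.

```python
from typing import Dict, Tuple

# (CPU request, memory request) per workload, copied from the table above.
REQUESTS: Dict[str, Tuple[str, str]] = {
    "identity-api": ("500m", "512Mi"),
    "identity-worker": ("200m", "256Mi"),
    "identity-jwks": ("100m", "128Mi"),
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1.5Gi' to MiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def regional_requests(replicas: Dict[str, int]) -> Tuple[int, float]:
    """Total (millicores, MiB) requested by the given replica counts."""
    cpu = sum(cpu_millicores(REQUESTS[name][0]) * n for name, n in replicas.items())
    mem = sum(memory_mib(REQUESTS[name][1]) * n for name, n in replicas.items())
    return cpu, mem
```

At the minimum replica counts from the scaling rules (3 API, 2 worker, 3 JWKS), a region requests 2200m CPU and 2432Mi of memory.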
4. Caching Layers
| Layer | Contents | TTL |
|---|---|---|
| CDN (JWKS) | /.well-known/jwks.json | 5 min, stale-while-revalidate=60 |
| Redis (per region) | Session lookup by sessionId, rate-limit counters, login-attempt tracker | 15 min / sliding |
| In-memory (per pod) | JWT verification keys (by kid) | 10 min |
| Postgres pgbouncer | Transaction-mode pool, per-tenant app.tenant_id set on checkout | — |
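The per-pod in-memory layer could look roughly like the following: verification keys cached by `kid` and refetched once the 10-minute TTL lapses. The class name and the `fetch` callback are hypothetical stand-ins for however the service actually loads keys; the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Tuple


class KeyCache:
    """Per-pod cache of JWT verification keys, indexed by kid."""

    def __init__(self, fetch: Callable[[str], bytes],
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical loader, e.g. a JWKS lookup
        self._ttl = ttl_seconds      # 10 min, per the table above
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, kid: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(kid)
        if hit is not None and now < hit[0]:
            return hit[1]            # still fresh: serve from memory
        key = self._fetch(kid)       # missing or expired: reload
        self._entries[kid] = (now + self._ttl, key)
        return key
```

A fixed TTL (rather than a sliding one) means a revoked or rotated key disappears from every pod within 10 minutes, even if it is requested constantly.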
5. CDN Usage
- Public endpoints (JWKS, password-reset landing) served via CDN with signed origin.
- Private endpoints (`/api/v1/auth/*`, `/api/v1/users/*`) bypass CDN (origin direct).
- Geographic routing: anycast for public; regional for private.
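The public/private split amounts to a path match. A minimal sketch, assuming the two private patterns listed above are the complete bypass list (the function name is hypothetical):

```python
from fnmatch import fnmatchcase

# Private path patterns from the list above; these go straight to origin.
PRIVATE_PATTERNS = ("/api/v1/auth/*", "/api/v1/users/*")


def bypasses_cdn(path: str) -> bool:
    """True if the request path matches a private pattern and must skip the CDN."""
    return any(fnmatchcase(path, pattern) for pattern in PRIVATE_PATTERNS)
```

With these patterns, `/api/v1/auth/login` goes directly to origin, while `/.well-known/jwks.json` stays on the CDN path.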
6. Edge Rules
- WAF: OWASP CRS + custom rules (credential-stuffing heuristics, IP reputation, header fingerprinting).
- Rate limits at edge:
  - `/auth/login`: 10/min per IP, 30/min per email
  - `/auth/password/reset/request`: 3/hour per email
  - `/auth/refresh`: 60/min per refresh-token family
  - `/auth/sso/*`: 20/min per tenant
- Geographic blocking: per-tenant allowlist/denylist (OFAC + tenant-configured).
- Anti-bot: Turnstile on signup + password reset.
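To make the edge rate limits concrete, here is a sliding-window sketch in which a request is allowed only if every rule for its route still has headroom. It is an illustration, not the edge provider's implementation: the class, the attribute names (`ip`, `email`, `token_family`), and the in-process storage are assumptions, and the wildcard `/auth/sso/*` rule is left out to keep route lookup exact.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

# (limit, window in seconds, key attribute) per route, from the limits above.
RULES: Dict[str, List[Tuple[int, int, str]]] = {
    "/auth/login": [(10, 60, "ip"), (30, 60, "email")],
    "/auth/password/reset/request": [(3, 3600, "email")],
    "/auth/refresh": [(60, 60, "token_family")],
}


class SlidingWindowLimiter:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: Dict[Tuple[str, str, str], Deque[float]] = defaultdict(deque)

    def allow(self, route: str, attrs: Dict[str, str]) -> bool:
        """Record the request and return True if no rule for the route is exceeded."""
        now = self._clock()
        windows = []
        for limit, window, attr in RULES.get(route, []):
            hits = self._hits[(route, attr, attrs[attr])]
            # Drop timestamps that have aged out of this rule's window.
            while hits and hits[0] <= now - window:
                hits.popleft()
            if len(hits) >= limit:
                return False  # rejected requests are not counted
            windows.append(hits)
        for hits in windows:
            hits.append(now)
        return True
```

Keying `/auth/login` on both IP and email means a single address cycling through many accounts and many addresses targeting one account are each throttled.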
7. Service Mesh
- mTLS inside cluster (SPIFFE identity: `spiffe://ghasi/prod/identity-service`).
- Egress to IdPs (Google, Microsoft, SAML): restricted via egress gateway + per-destination allowlist.
- Egress to KMS: per-region, mTLS + IAM.
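A peer that wants to confirm it is talking to identity-service would compare the SPIFFE ID from the mTLS certificate against the expected trust domain and path. The sketch below only parses and compares the URI; extracting the ID from the certificate is out of scope, and the function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Expected identity, taken from the SPIFFE ID quoted above.
TRUST_DOMAIN = "ghasi"
WORKLOAD_PATH = "/prod/identity-service"


def parse_spiffe_id(uri: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust domain, workload path)."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"not a valid SPIFFE ID: {uri}")
    return parts.netloc, parts.path


def is_identity_service(uri: str) -> bool:
    """True only for the exact identity-service SPIFFE ID."""
    try:
        return parse_spiffe_id(uri) == (TRUST_DOMAIN, WORKLOAD_PATH)
    except ValueError:
        return False
```

Matching the full path rather than a prefix keeps a workload registered under a longer path from passing as identity-service.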
8. Deployment Topology Diagram
                     ┌────────────────────┐
                     │  CDN (Cloudflare)  │  /.well-known/jwks.json
                     └─────────┬──────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
     ┌────▼──────┐      ┌───────▼────────┐     ┌─────▼───────┐
     │ WAF + LB │      │  API Gateway   │     │  WAF + LB   │
     │ (us-east)│      │  (per region)  │     │ (eu-fra-1)  │
     └────┬──────┘      └───────┬────────┘     └─────┬───────┘
          │                    │                    │
     ┌────▼──────────┐   ┌─────▼──────────┐    ┌────▼─────────┐
     │ identity-api  │   │  identity-api  │    │ identity-api │
     │  (3–40 pods)  │   │  (3–40 pods)   │    │ (3–40 pods)  │
     └────┬──────────┘   └─────┬──────────┘    └────┬─────────┘
          │                    │                    │
    ┌─────▼─────┐         ┌────▼────┐          ┌────▼────┐
    │ Postgres  │ RLS     │  Redis  │          │   KMS   │
    │ (primary) │         │ (cache) │          │  (HSM)  │
    └─────┬─────┘         └─────────┘          └─────────┘
          │
    ┌─────▼──────┐
    │Read replica│ → JWKS serve, analytics
    └────────────┘
9. Release Strategy
- Blue/green for API container. New version registered with `kid-new`; old `kid` still served until drain.
- Canary (10% → 50% → 100%) on identity-worker.
- Zero-downtime KMS key rotation via `kid` overlap window (≥ 2 days).
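The `kid` overlap can be pictured as a rule for which key IDs the JWKS endpoint publishes at a given moment. This is a sketch under assumptions: the overlap is taken as exactly the 2-day minimum, the function and parameter names are hypothetical, and tokens signed with the old key are assumed to expire within the window.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Minimum overlap from the rotation rule above; real deployments may use longer.
OVERLAP = timedelta(days=2)


def published_kids(old_kid: str, new_kid: str,
                   new_key_published_at: datetime, now: datetime) -> List[str]:
    """Key IDs the JWKS document should list at time `now`."""
    if now < new_key_published_at:
        return [old_kid]                 # rotation has not started
    if now - new_key_published_at < OVERLAP:
        return [new_kid, old_kid]        # both verifiable during overlap
    return [new_kid]                     # old key retired


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
during = published_kids("kid-old", "kid-new", start, start + timedelta(days=1))
```

Here `during` lists both keys, so tokens signed before and after the switch verify without downtime; once the window closes, only `kid-new` remains.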
10. Disaster Recovery
- RPO: 5 min (Postgres WAL shipped to cold storage).
- RTO: 60 min (DR drill quarterly).
- Failover: intra-region automatic; cross-region manual with CTO sign-off.
- Backups: per-tenant snapshot daily; restore tested monthly.