Deployment Topology
:::info Source
Sourced from services/search-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::
Inherits platform topology from docs/01-enterprise-architecture.md.
1. Runtime
| Attribute | Value |
|---|---|
| Language | TypeScript / Node.js 22 LTS |
| Framework | Fastify (HTTP), NATS.js (broker), BullMQ (reindex jobs) |
| Container base | node:22-alpine hardened, non-root |
| Orchestrator | Kubernetes (EKS/GKE/AKS); Nomad acceptable for edge regions |
| Pod resources | 1 vCPU, 1 GiB RAM baseline; 4 vCPU, 4 GiB RAM for embedding-batcher |
| HPA | CPU 70%, custom metric search_query_qps, min=3 max=30 per region |
2. Workload Shape
Three distinct workload profiles deployed as separate Deployments sharing the same image:
| Workload | Purpose | Scaling driver | Replicas (prod) |
|---|---|---|---|
| `search-api` | Serves HTTP queries | QPS + CPU | 6 (min 3, max 30) |
| `search-indexer` | NATS projectors + embedding pipeline | NATS pending messages | 4 (min 2, max 20) |
| `search-jobs` | Reindex/rebuild workers | Queue depth | 2 (min 1, max 8) |
The `ROLE` environment variable (`api`, `indexer`, or `jobs`) selects the bootstrap path at startup.
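A minimal sketch of how one image could dispatch on `ROLE`, assuming an entry point that receives the environment as a plain object. `parseRole`, `bootstrap`, and the placeholder bootstrap bodies are hypothetical; in the real service each branch would start Fastify, the NATS.js consumers, or the BullMQ workers.

```typescript
// Sketch only: bootstrap bodies are hypothetical placeholders.
type Role = "api" | "indexer" | "jobs";

const ROLES: readonly Role[] = ["api", "indexer", "jobs"];

function parseRole(raw: string | undefined): Role {
  for (const role of ROLES) {
    if (raw === role) return role;
  }
  // Fail fast: a missing or misspelled ROLE should crash-loop visibly
  // instead of silently starting the wrong workload.
  throw new Error(`ROLE must be one of ${ROLES.join("|")}, got ${String(raw)}`);
}

// Hypothetical entry points for each workload profile.
const bootstraps: Record<Role, () => string> = {
  api: () => "starting HTTP server",
  indexer: () => "starting NATS projectors and embedding pipeline",
  jobs: () => "starting reindex workers",
};

function bootstrap(env: { [key: string]: string | undefined }): string {
  return bootstraps[parseRole(env["ROLE"])]();
}
```

In the container this would be called as `bootstrap(process.env)`, so the three Deployments differ only in the `ROLE` value they set.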
3. Regions
- Primary regions: US, EU, ME, AP.
- Each region is a full stack; tenants never span regions (the marketplace slice is the only exception).
- Cross-region traffic occurs only for marketplace searches that hit the global index.
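The region rule can be expressed as a small routing decision. This sketch assumes the serving region rejects requests for tenants pinned elsewhere (rather than proxying them) and that marketplace queries are flagged on the request; `route`, `SearchRequest`, and `RouteTarget` are hypothetical names.

```typescript
type Region = "US" | "EU" | "ME" | "AP";

interface SearchRequest {
  tenantHomeRegion: Region; // region the tenant is pinned to
  marketplace: boolean;     // true for marketplace-slice queries
}

type RouteTarget =
  | { kind: "regional"; region: Region }
  | { kind: "global-marketplace" };

function route(req: SearchRequest, servingRegion: Region): RouteTarget {
  // The marketplace slice is the only traffic allowed to reach the global index.
  if (req.marketplace) return { kind: "global-marketplace" };
  // Tenants never span regions: refuse rather than forward across regions.
  if (req.tenantHomeRegion !== servingRegion) {
    throw new Error(`tenant pinned to ${req.tenantHomeRegion}, request reached ${servingRegion}`);
  }
  return { kind: "regional", region: servingRegion };
}
```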
4. Dependencies
| Dep | Version | Notes |
|---|---|---|
| OpenSearch | 2.13+ | Dedicated cluster per region; 3 data + 3 master |
| Postgres | 15+ | Primary + 2 read replicas |
| Redis | 7.2+ | Cluster mode; 3 shards |
| NATS JetStream | 2.10+ | Cross-region mirror; per-region consumers |
| ai-gateway-service | internal | Service mesh endpoint ai-gateway.prod.svc |
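One way to enforce the version floors in this table is a startup check against the version each dependency reports. A sketch, assuming the version strings have already been fetched (for example from the OpenSearch root endpoint or Postgres `SHOW server_version`); `checkDependency`, `atLeast`, and the dependency keys are hypothetical.

```typescript
// Minimum versions from the dependency table (keys are hypothetical).
const MINIMUM_VERSIONS: { [dep: string]: string | undefined } = {
  opensearch: "2.13",
  postgres: "15",
  redis: "7.2",
  nats: "2.10",
};

// Extract the leading dotted-numeric part, e.g. "15.6 (Debian 15.6-1)" -> [15, 6].
function parseVersion(v: string): number[] {
  const m = /^\d+(\.\d+)*/.exec(v.trim());
  if (m === null) throw new Error(`unparseable version: ${v}`);
  return m[0].split(".").map(Number);
}

// Component-wise comparison; missing components count as 0, so "2.13" equals "2.13.0".
function atLeast(actual: string, minimum: string): boolean {
  const a = parseVersion(actual);
  const b = parseVersion(minimum);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = i < a.length ? a[i] : 0;
    const y = i < b.length ? b[i] : 0;
    if (x !== y) return x > y;
  }
  return true;
}

function checkDependency(name: string, actual: string): void {
  const min = MINIMUM_VERSIONS[name];
  if (min === undefined) throw new Error(`no minimum version recorded for ${name}`);
  if (!atLeast(actual, min)) {
    throw new Error(`${name} ${actual} is below the required ${min}+`);
  }
}
```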
5. Networking
- All pods in private subnets.
- Egress via NAT/gateway only (for ai-gateway and identity JWKS refresh).
- Mesh: Linkerd or Istio with mTLS (SPIFFE IDs).
- Ingress: API Gateway (Kong) in front of `search-api`.
6. Kubernetes Manifests (abridged)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: search-api
  labels: { app: search-service, role: api }
spec:
  replicas: 6
  selector: { matchLabels: { app: search-service, role: api } }
  template:
    metadata:
      labels: { app: search-service, role: api }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9464"
    spec:
      serviceAccountName: search-service
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: app
          image: registry.ghasi.io/search-service:{{ .Values.image.tag }}
          env:
            - { name: ROLE, value: api }
            # topology.kubernetes.io/region is a node label, and the Downward API only
            # exposes the pod's own labels, so the region is set at render time instead.
            # .Values.region is a hypothetical chart value.
            - { name: REGION, value: "{{ .Values.region }}" }
            - { name: OTEL_SERVICE_NAME, value: search-service }
          envFrom:
            - secretRef: { name: search-secrets }
            - configMapRef: { name: search-config }
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: 2, memory: 2Gi }
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          startupProbe:
            httpGet: { path: /startup, port: 8080 }
            failureThreshold: 30
          ports:
            - { name: http, containerPort: 8080 }
            - { name: metrics, containerPort: 9464 }
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector: { matchLabels: { app: search-service, role: api } }
```
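The manifest wires three probes to separate endpoints. A sketch of how their responses might be derived from process state, assuming hypothetical `ProcessState` flags. It follows standard Kubernetes semantics: the startup probe holds off the other two until it succeeds (with the default `periodSeconds` of 10, `failureThreshold: 30` allows roughly 300 s), a failing liveness probe restarts the container, and a failing readiness probe only removes the pod from Service endpoints.

```typescript
// Hypothetical process-state flags maintained by the service.
interface ProcessState {
  bootstrapped: boolean;      // config loaded, clients constructed
  eventLoopHealthy: boolean;  // e.g. event-loop lag under a threshold
  dependenciesReady: boolean; // OpenSearch and Redis reachable
  draining: boolean;          // SIGTERM received, finishing in-flight requests
}

// HTTP status for each probe path.
function probeStatus(path: string, s: ProcessState): number {
  switch (path) {
    case "/startup":
      return s.bootstrapped ? 200 : 503;
    case "/healthz":
      // Liveness stays narrow: a dependency outage should not restart pods.
      return s.eventLoopHealthy ? 200 : 503;
    case "/readyz":
      return s.bootstrapped && s.dependenciesReady && !s.draining ? 200 : 503;
    default:
      return 404;
  }
}
```

Keeping `/healthz` independent of OpenSearch and Redis avoids a dependency outage restarting every `search-api` pod at once; in Fastify each path would be a `GET` route that replies with the returned status code.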
7. HPA / KEDA
- `search-api`: HPA on CPU 70% plus the custom metric `search_query_qps`, targeting 200 rps per pod.
- `search-indexer`: KEDA `nats-jetstream` scaler on consumer pending messages.
- `search-jobs`: KEDA `postgresql` scaler on the count of `search.reindex_job` rows where `status = 'queued'`.
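For `search-api`, the HPA computes a replica proposal per metric and takes the largest. A simplified sketch of the documented Kubernetes HPA calculation, `desired = ceil(current * currentValue / targetValue)` with the default 10% tolerance, ignoring stabilization windows, scaling policies, and missing-metric handling; the function names are hypothetical.

```typescript
interface MetricSample {
  name: string;
  current: number; // observed value averaged across pods (CPU %, rps per pod)
  target: number;  // HPA target in the same unit
}

// Default HPA tolerance: ratios within 10% of 1.0 leave the replica count unchanged.
const TOLERANCE = 0.1;

function proposal(currentReplicas: number, m: MetricSample): number {
  const ratio = m.current / m.target;
  if (Math.abs(1 - ratio) <= TOLERANCE) return currentReplicas;
  return Math.ceil(currentReplicas * ratio);
}

// With several metrics the HPA takes the largest proposal, then applies min/max.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricSample[],
  minReplicas: number,
  maxReplicas: number,
): number {
  const wanted = Math.max(...metrics.map((m) => proposal(currentReplicas, m)));
  return Math.min(maxReplicas, Math.max(minReplicas, wanted));
}
```

With 6 pods at 66% CPU (within tolerance of the 70% target) and 300 rps per pod against the 200 rps target, the QPS metric wins and the HPA proposes 9 replicas. KEDA feeds the same HPA machinery through external metrics, so for `search-indexer` the result works out to roughly the pending-message count divided by the scaler's lag threshold.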
8. Rollouts
- Argo Rollouts canary strategy:
  - 10% for 10 min → automated SLO analysis (p95, error rate, NDCG if fresh).
  - 50% for 20 min.
  - 100%.
- Automated rollback on SLO breach or `search_error_rate > 1%` for 5m.
- Schema migrations (Postgres) are gated by `migrate dry-run` in a pre-rollout step.
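The rollback condition `search_error_rate > 1%` for 5m is a sustained-breach rule: the rate must stay above 1% for the whole window, as with a Prometheus `for:` clause. A sketch of that evaluation over periodic samples, assuming samples arrive as `{ atMs, errorRate }` pairs; in practice Argo Rollouts would evaluate this through an analysis query rather than application code, and `shouldRollback` is a hypothetical name.

```typescript
interface Sample {
  atMs: number;      // sample timestamp (ms)
  errorRate: number; // fraction of failed requests; 0.01 == 1%
}

const THRESHOLD = 0.01;
const WINDOW_MS = 5 * 60 * 1000;

// True when the current unbroken run of breaching samples has lasted
// at least the window.
function shouldRollback(samples: Sample[], nowMs: number): boolean {
  const ordered = samples.slice().sort((a, b) => a.atMs - b.atMs);
  let breachStart: number | null = null;
  for (const s of ordered) {
    if (s.atMs > nowMs) break; // ignore samples later than "now"
    if (s.errorRate > THRESHOLD) {
      if (breachStart === null) breachStart = s.atMs;
    } else {
      breachStart = null; // any healthy sample resets the run
    }
  }
  return breachStart !== null && nowMs - breachStart >= WINDOW_MS;
}
```

A single sample at or below 1% resets the window, so a brief spike does not trigger a rollback on its own.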
9. Configuration & Secrets
| Config | Source |
|---|---|
| Non-secret | ConfigMap search-config |
| Secrets | Vault → CSI driver → ephemeral mount |
| Per-tenant policies | search.index_policy table, cached in Redis |
| Feature flags | LaunchDarkly (or OpenFeature) |
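Per-tenant policies follow a cache-aside pattern: read Redis first, fall back to the `search.index_policy` table on a miss, then repopulate the cache. A self-contained sketch in which a `Map` stands in for Redis and an injected loader stands in for the Postgres query; `PolicyCache`, the `IndexPolicy` fields, and the 60 s default TTL are assumptions.

```typescript
// Hypothetical policy shape; the real columns of search.index_policy may differ.
interface IndexPolicy {
  tenantId: string;
  maxResults: number;
}

interface CacheEntry {
  value: IndexPolicy;
  expiresAtMs: number;
}

class PolicyCache {
  // Stand-in for Redis in this sketch.
  private readonly entries = new Map<string, CacheEntry>();
  private readonly loadFromDb: (tenantId: string) => Promise<IndexPolicy>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    loadFromDb: (tenantId: string) => Promise<IndexPolicy>,
    ttlMs: number = 60000,
    now: () => number = () => Date.now(),
  ) {
    this.loadFromDb = loadFromDb;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(tenantId: string): Promise<IndexPolicy> {
    const hit = this.entries.get(tenantId);
    if (hit !== undefined && hit.expiresAtMs > this.now()) return hit.value;
    // Miss or expired entry: read the source of truth, then repopulate.
    const value = await this.loadFromDb(tenantId);
    this.entries.set(tenantId, { value, expiresAtMs: this.now() + this.ttlMs });
    return value;
  }

  // Call when a policy row changes so the next read reloads it.
  invalidate(tenantId: string): void {
    this.entries.delete(tenantId);
  }
}
```

Concurrent misses for the same tenant each reach Postgres here; coalescing in-flight loads would be a possible refinement if policy reads spike after a cache flush.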
10. Image Supply Chain
- Multi-stage Docker build, distroless or alpine-hardened.
- SBOM generated (`syft`) and stored with the image.
- Image signed with `cosign` (Sigstore); deployment requires signature verification (`kyverno` policy).
- Vulnerability scan gate: zero CRITICAL, ≤ 5 HIGH.
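The scan gate reduces to counting findings by severity. A sketch, assuming findings have already been normalized from the CI scanner's report into a hypothetical `Finding` shape; the thresholds are the ones stated above.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "UNKNOWN";

// Hypothetical normalized finding from the scanner report.
interface Finding {
  id: string; // e.g. a CVE identifier
  severity: Severity;
}

interface GateResult {
  passed: boolean;
  critical: number;
  high: number;
}

const MAX_CRITICAL = 0;
const MAX_HIGH = 5;

function scanGate(findings: Finding[]): GateResult {
  const critical = findings.filter((f) => f.severity === "CRITICAL").length;
  const high = findings.filter((f) => f.severity === "HIGH").length;
  return { passed: critical <= MAX_CRITICAL && high <= MAX_HIGH, critical, high };
}
```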
11. Backup & DR
| Component | RPO | RTO | Strategy |
|---|---|---|---|
| OpenSearch | 6 h | 2 h | Snapshots + reindex from NATS |
| Postgres | 15 min | 1 h | PITR + read replicas |
| Redis | — | 5 min | Ephemeral; cache rewarming tolerated |
| NATS | 1 h | 30 min | Cross-region mirror |
DR drill: quarterly full region failover.
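Whether a component is inside its RPO can be checked by comparing the age of its newest recovery point against the table. A sketch, assuming the timestamps are gathered elsewhere (snapshot listings, WAL archiving status, mirror lag); `rpoBreaches` and the component keys are hypothetical, and Redis is left out because it has no RPO.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// RPO per component, from the Backup & DR table. Redis is ephemeral and has none.
const RPO_MS: { [component: string]: number } = {
  opensearch: 6 * HOUR_MS,
  postgres: 15 * MINUTE_MS,
  nats: 1 * HOUR_MS,
};

// Returns the components whose newest recovery point is older than their RPO.
// A component with no recorded recovery point counts as a breach.
function rpoBreaches(
  lastRecoveryPointMs: { [component: string]: number | undefined },
  nowMs: number,
): string[] {
  const breaches: string[] = [];
  for (const component of Object.keys(RPO_MS)) {
    const last = lastRecoveryPointMs[component];
    if (last === undefined || nowMs - last > RPO_MS[component]) {
      breaches.push(component);
    }
  }
  return breaches;
}
```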
12. Cost Controls
- Tenant IDs are bucketed in metric labels to cap cardinality.
- The embedding batcher minimizes ai-gateway cost (target: ≥ 70% cache hit rate).
- Per-tenant storage quota alert fires at 80%.
- Cold data is archived to object storage every 30 days.
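Bucketing caps cardinality by mapping an unbounded set of tenant IDs onto a fixed set of label values. A sketch using 32-bit FNV-1a and 64 buckets, both illustrative choices rather than the service's actual scheme, plus the 80% quota check; the function names are hypothetical.

```typescript
const BUCKETS = 64;

// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII IDs).
function fnv1a32(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, multiply mod 2^32
  }
  return h >>> 0; // reinterpret as unsigned
}

// Stable label value: the same tenant ID always lands in the same bucket.
function tenantBucket(tenantId: string): string {
  return `bucket_${fnv1a32(tenantId) % BUCKETS}`;
}

// Storage quota alert at 80%, in integer arithmetic to avoid float rounding.
function quotaAlert(usedBytes: number, quotaBytes: number): boolean {
  return quotaBytes > 0 && usedBytes * 10 >= quotaBytes * 8;
}
```

The hash only needs to be stable across pods and restarts; it does not need to be cryptographic.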
13. Blue/Green for Index Schema Changes
- Alias points to blue physical index.
- Breaking mapping change → new green index built from snapshots + replayed events.
- Alias swapped on full parity; blue retained 24h.
- Client sees one consistent alias; no downtime.
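The alias swap maps onto a single OpenSearch `POST /_aliases` request whose `actions` list removes the alias from blue and adds it to green; OpenSearch applies the actions in one request atomically. A sketch that builds that body, plus a parity check assumed here to mean equal document counts (the source does not define parity); index names are hypothetical.

```typescript
interface AliasTarget {
  index: string;
  alias: string;
}

// Each action object carries exactly one of `add` or `remove`.
type AliasAction = { add: AliasTarget } | { remove: AliasTarget };

// Body for POST /_aliases: remove blue and add green in one atomic request.
function swapAliasBody(alias: string, blue: string, green: string): { actions: AliasAction[] } {
  return {
    actions: [
      { remove: { index: blue, alias } },
      { add: { index: green, alias } },
    ],
  };
}

// Parity assumed to mean equal document counts; the real criterion may be stricter.
function hasParity(blueDocCount: number, greenDocCount: number): boolean {
  return blueDocCount === greenDocCount;
}
```

Keeping blue for 24 h means a rollback is the same request with `blue` and `green` swapped.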
14. On-Call Access
- `kubectl exec` disabled in prod; access via the audited `ghasi-ops` break-glass tool.
- Every production mutation is logged in `platform.admin.action.performed.v1`.