
Deployment Topology

:::info Source
Sourced from services/search-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::

Inherits platform topology from docs/01-enterprise-architecture.md.

1. Runtime

| Attribute | Value |
| --- | --- |
| Language | TypeScript / Node.js 22 LTS |
| Framework | Fastify (HTTP), NATS.js (broker), BullMQ (reindex jobs) |
| Container base | node:22-alpine hardened, non-root |
| Orchestrator | Kubernetes (EKS/GKE/AKS); Nomad acceptable for edge regions |
| Pod resources | 1 vCPU, 1 GiB RAM baseline; 4 vCPU, 4 GiB RAM for embedding-batcher |
| HPA | CPU 70%, custom metric search_query_qps, min=3 max=30 per region |

2. Workload Shape

Three distinct workload profiles are deployed as separate Deployments that share one image:

| Workload | Purpose | Scaling driver | Replicas (prod) |
| --- | --- | --- | --- |
| search-api | HTTP serving queries | QPS + CPU | 6 (min 3, max 30) |
| search-indexer | NATS projectors + embedding pipeline | NATS pending msgs | 4 (min 2, max 20) |
| search-jobs | Reindex/rebuild workers | queue depth | 2 (min 1, max 8) |

All three Deployments run the same image; the ROLE environment variable (ROLE=api|indexer|jobs) selects the bootstrap path.
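The role selection described above could look roughly like the following. This is a minimal sketch, not the service's actual entry point: `parseRole`, `bootstrap`, and the placeholder bootstrap bodies are hypothetical names, and the environment is passed in as a plain map so the example does not depend on Node.js typings.

```typescript
// Hypothetical sketch: only the ROLE values (api, indexer, jobs) come from the document.
type Role = "api" | "indexer" | "jobs";

const ROLES: readonly Role[] = ["api", "indexer", "jobs"];

// Parse ROLE from an env-like map; fail fast on a missing or unknown value.
function parseRole(env: Record<string, string | undefined>): Role {
  const raw = env["ROLE"];
  const role = ROLES.find((r) => r === raw);
  if (role === undefined) {
    throw new Error(`ROLE must be one of ${ROLES.join("|")}, got ${String(raw)}`);
  }
  return role;
}

// One bootstrap function per workload profile; the bodies are placeholders.
const bootstraps: Record<Role, () => string> = {
  api: () => "started HTTP server",
  indexer: () => "started NATS projectors",
  jobs: () => "started reindex workers",
};

// Select and run the bootstrap path for the configured role.
function bootstrap(env: Record<string, string | undefined>): string {
  return bootstraps[parseRole(env)]();
}
```

In a real process the map would typically be `process.env`. Failing fast on an unknown role keeps a misconfigured Deployment from silently starting the wrong workload.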

3. Regions

  • Primary regions: US, EU, ME, AP.
  • Each region is a full stack: tenants never span regions (except marketplace slice).
  • Cross-region traffic only for marketplace search hitting the global index.
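As an illustration of the routing rule in the list above, the sketch below decides whether a query stays in the tenant's home region or goes to the global index. The region codes are taken from the list; the `SearchRequest` shape, the `marketplace` flag, and `resolveTarget` are assumptions made for this example only.

```typescript
// Region codes from the list above; all other names are hypothetical.
type Region = "US" | "EU" | "ME" | "AP";

interface SearchRequest {
  tenantRegion: Region; // home region of the tenant issuing the query
  marketplace: boolean; // true when the query targets the marketplace slice
}

type Target =
  | { scope: "regional"; region: Region }
  | { scope: "global" };

// Marketplace queries are the only ones allowed to leave the tenant's region.
function resolveTarget(req: SearchRequest): Target {
  if (req.marketplace) {
    return { scope: "global" };
  }
  return { scope: "regional", region: req.tenantRegion };
}
```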

4. Dependencies

| Dependency | Version | Notes |
| --- | --- | --- |
| OpenSearch | 2.13+ | Dedicated cluster per region; 3 data + 3 master nodes |
| Postgres | 15+ | Primary + 2 read replicas |
| Redis | 7.2+ | Cluster mode; 3 shards |
| NATS JetStream | 2.10+ | Cross-region mirror; per-region consumers |
| ai-gateway-service | internal | Service mesh endpoint ai-gateway.prod.svc |

5. Networking

  • All pods in private subnets.
  • Egress via NAT/gateway only (for ai-gateway and identity JWKS refresh).
  • Mesh: Linkerd or Istio with mTLS (SPIFFE IDs).
  • Ingress: API Gateway (Kong) in front of search-api.

6. Kubernetes Manifests (abridged)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: search-api
  labels: { app: search-service, role: api }
spec:
  replicas: 6
  selector: { matchLabels: { app: search-service, role: api } }
  template:
    metadata:
      labels: { app: search-service, role: api }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9464"
    spec:
      serviceAccountName: search-service
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: app
          image: registry.ghasi.io/search-service:{{ .Values.image.tag }}
          env:
            - { name: ROLE, value: api }
            # fieldPath is quoted because [ ] are not allowed in unquoted flow-style YAML.
            # fieldRef reads the Pod's own labels, so this label must be present on the Pod.
            - { name: REGION, valueFrom: { fieldRef: { fieldPath: "metadata.labels['topology.kubernetes.io/region']" } } }
            - { name: OTEL_SERVICE_NAME, value: search-service }
          envFrom:
            - secretRef: { name: search-secrets }
            - configMapRef: { name: search-config }
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: 2, memory: 2Gi }
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          startupProbe:
            httpGet: { path: /startup, port: 8080 }
            failureThreshold: 30
          ports:
            - { name: http, containerPort: 8080 }
            - { name: metrics, containerPort: 9464 }
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector: { matchLabels: { app: search-service, role: api } }

7. HPA / KEDA

  • search-api: HPA CPU 70% + custom search_query_qps target 200 rps per pod.
  • search-indexer: KEDA nats-jetstream scaler on consumer pending.
  • search-jobs: KEDA postgres scaler on search.reindex_job where status='queued'.
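To make the search-api targets concrete, the sketch below applies the standard Kubernetes HPA formula, desired = ceil(currentReplicas × currentMetric / targetMetric), to both metrics and takes the larger result, clamped to the min and max from the Runtime section. It is a simplified model: the real controller also applies a tolerance band and stabilization windows, which are omitted here, and the `HpaInput` shape is a hypothetical name for this example.

```typescript
// Targets and bounds from this document: CPU 70%, 200 rps per pod, min 3, max 30.
const CPU_TARGET_PERCENT = 70;
const QPS_TARGET_PER_POD = 200;
const MIN_REPLICAS = 3;
const MAX_REPLICAS = 30;

// Hypothetical input shape for the example.
interface HpaInput {
  currentReplicas: number;
  avgCpuPercent: number; // average CPU utilization across pods, in percent of request
  avgQpsPerPod: number;  // average search_query_qps per pod
}

// Simplified HPA calculation: one proposal per metric, largest wins, then clamp.
function desiredReplicas(input: HpaInput): number {
  const byCpu = Math.ceil(input.currentReplicas * (input.avgCpuPercent / CPU_TARGET_PERCENT));
  const byQps = Math.ceil(input.currentReplicas * (input.avgQpsPerPod / QPS_TARGET_PER_POD));
  const wanted = Math.max(byCpu, byQps);
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, wanted));
}
```

For example, 6 pods at 70% CPU but 400 rps each would scale on the QPS metric to 12 pods, while the same 6 pods at half of both targets would shrink to the minimum of 3.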

8. Rollouts

  • Argo Rollouts canary strategy:
    • 10% for 10 min → automated SLO analysis (p95, error rate, NDCG if fresh).
    • 50% for 20 min.
    • 100%.
  • Automated rollback on SLO breach or search_error_rate > 1% for 5m.
  • Schema migrations (Postgres) gated by migrate dry-run in pre-rollout step.
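The error-rate rollback condition above can be expressed as a check over a time series. The sketch below is an assumption about how "search_error_rate > 1% for 5m" might be evaluated: it requires every sample in the most recent unbroken streak to exceed the threshold and the streak to span at least five minutes. In practice this would normally live in an Argo Rollouts analysis query rather than application code; `Sample` and `shouldRollback` are hypothetical names.

```typescript
// Hypothetical sample shape; errorRate is a fraction (0.01 means 1%).
interface Sample {
  atMs: number;
  errorRate: number;
}

const ERROR_RATE_THRESHOLD = 0.01;
const BREACH_WINDOW_MS = 5 * 60 * 1000;

// Samples are expected in ascending time order.
// Returns true when the newest samples have all breached the threshold for at least 5 minutes.
function shouldRollback(samples: Sample[]): boolean {
  let latestMs: number | null = null;
  let breachStartMs: number | null = null;
  for (const sample of [...samples].reverse()) {
    if (latestMs === null) {
      latestMs = sample.atMs;
    }
    if (sample.errorRate <= ERROR_RATE_THRESHOLD) {
      break; // streak ends at the first healthy sample, walking backwards
    }
    breachStartMs = sample.atMs;
  }
  return latestMs !== null && breachStartMs !== null && latestMs - breachStartMs >= BREACH_WINDOW_MS;
}
```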

9. Configuration & Secrets

| Config | Source |
| --- | --- |
| Non-secret | ConfigMap search-config |
| Secrets | Vault → CSI driver → ephemeral mount |
| Per-tenant policies | search.index_policy table, cached in Redis |
| Feature flags | LaunchDarkly (or OpenFeature) |

10. Image Supply Chain

  • Multi-stage Docker build, distroless or alpine-hardened.
  • SBOM generated with syft and stored with the image.
  • Image signed with cosign (Sigstore); deployment requires signature verification (Kyverno policy).
  • Vulnerability scan gate: zero CRITICAL, ≤ 5 HIGH.
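The scan gate above reduces to a simple predicate over scanner findings. The sketch below assumes findings have already been parsed into a list; the `Finding` shape and `passesScanGate` are hypothetical and do not correspond to any particular scanner's output format.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Hypothetical, scanner-agnostic finding shape.
interface Finding {
  id: string;
  severity: Severity;
}

const MAX_CRITICAL = 0; // zero CRITICAL findings allowed
const MAX_HIGH = 5;     // at most 5 HIGH findings allowed

// Returns true when the image may proceed past the vulnerability gate.
function passesScanGate(findings: Finding[]): boolean {
  const critical = findings.filter((f) => f.severity === "CRITICAL").length;
  const high = findings.filter((f) => f.severity === "HIGH").length;
  return critical <= MAX_CRITICAL && high <= MAX_HIGH;
}
```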

11. Backup & DR

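| Component | RPO | RTO | Strategy |
| --- | --- | --- | --- |
| OpenSearch | 6 h | 2 h | Snapshots + reindex from NATS |
| Postgres | 15 min | 1 h | PITR + read replicas |
| Redis | | 5 min | Ephemeral; cache rewarming tolerated |
| NATS | 1 h | 30 min | Cross-region mirror |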

DR drill: quarterly full region failover.

12. Cost Controls

  • Tenant bucketing in metrics caps cardinality.
  • Embedding batcher minimizes ai-gateway cost (≥ 70% cache hit target).
  • Per-tenant storage quota alert at 80%.
  • Cold archives to object storage every 30d.
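One way to read "tenant bucketing caps cardinality" is that metrics carry a bucket label derived from the tenant ID instead of the raw ID, so the number of label values is bounded regardless of tenant count. The sketch below illustrates that idea with an FNV-1a-style hash; the bucket count of 64, the label format, and the function names are assumptions, not values from the service.

```typescript
// Assumed bucket count; the real value is not stated in this document.
const TENANT_BUCKETS = 64;

// FNV-1a-style 32-bit hash over UTF-16 code units (not bytes).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a tenant ID to a stable metric label with at most TENANT_BUCKETS distinct values.
function tenantBucket(tenantId: string): string {
  return `b${fnv1a(tenantId) % TENANT_BUCKETS}`;
}
```

The trade-off is that per-tenant detail is lost in metrics; per-tenant investigation would then rely on logs or traces instead.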

13. Blue/Green for Index Schema Changes

  • Alias points to blue physical index.
  • Breaking mapping change → new green index built from snapshots + replayed events.
  • Alias swapped on full parity; blue retained 24h.
  • Client sees one consistent alias; no downtime.
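The alias swap in the steps above can be done atomically with a single OpenSearch `POST /_aliases` request containing a `remove` and an `add` action. The sketch below only builds that request body. The parity check is assumed to be a document-count comparison, which is likely weaker than a real parity check, and the names `IndexStats` and `buildAliasSwap` are hypothetical.

```typescript
// One entry of the "actions" array accepted by POST /_aliases.
interface AliasAction {
  add?: { index: string; alias: string };
  remove?: { index: string; alias: string };
}

// Hypothetical summary of an index used for the parity check.
interface IndexStats {
  name: string;
  docCount: number;
}

// Returns the request body for the swap, or null when parity has not been reached.
// Assumption: parity means equal document counts.
function buildAliasSwap(
  alias: string,
  blue: IndexStats,
  green: IndexStats,
): { actions: AliasAction[] } | null {
  if (blue.docCount !== green.docCount) {
    return null;
  }
  return {
    actions: [
      { remove: { index: blue.name, alias } },
      { add: { index: green.name, alias } },
    ],
  };
}
```

Because both actions travel in one request, clients never observe a moment where the alias points at neither index. The blue index itself is left in place, consistent with the 24h retention noted above.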

14. On-Call Access

  • kubectl exec disabled in prod; access via audited ghasi-ops breakglass tool.
  • Every production mutation logged in platform.admin.action.performed.v1.