Deployment Topology

:::info Source
Sourced from services/search-service/DEPLOYMENT_TOPOLOGY.md in the documentation repo.
:::

Inherits platform topology from docs/01-enterprise-architecture.md.

1. Runtime

Attribute	Value
Language	TypeScript / Node.js 22 LTS
Framework	Fastify (HTTP), NATS.js (broker), BullMQ (reindex jobs)
Container base	`node:22-alpine` hardened, non-root
Orchestrator	Kubernetes (EKS/GKE/AKS); Nomad acceptable for edge regions
Pod resources	1 vCPU, 1 GiB RAM baseline; 4 vCPU, 4 GiB RAM for embedding-batcher
HPA	CPU 70%, custom metric `search_query_qps`, min=3 max=30 per region

2. Workload Shape

Three distinct workload profiles are deployed as separate Deployments that share one image:

Workload	Purpose	Scaling driver	Replicas (prod)
`search-api`	HTTP serving queries	QPS + CPU	6 (min 3, max 30)
`search-indexer`	NATS projectors + embedding pipeline	NATS pending msgs	4 (min 2, max 20)
`search-jobs`	Reindex/rebuild workers	queue depth	2 (min 1, max 8)

All three workloads run the same image; the ROLE environment variable (api, indexer, or jobs) selects the bootstrap path.
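A minimal sketch of how a single image might dispatch on ROLE. The function names and the returned strings are hypothetical placeholders, not the service's actual bootstrap code; the only detail taken from this document is the set of allowed values.

```typescript
// Hypothetical role dispatch for a shared image; names are illustrative only.
type Role = "api" | "indexer" | "jobs";

const ROLES: readonly Role[] = ["api", "indexer", "jobs"];

function parseRole(env: Record<string, string | undefined>): Role {
  const raw = env.ROLE;
  if (raw === undefined || ROLES.indexOf(raw as Role) === -1) {
    // Fail fast so a misconfigured Deployment crash-loops visibly
    // instead of silently starting the wrong workload.
    throw new Error(`ROLE must be one of ${ROLES.join("|")}, got: ${String(raw)}`);
  }
  return raw as Role;
}

// Each entry stands in for the real bootstrap path of that workload.
const bootstraps: Record<Role, () => string> = {
  api: () => "starting HTTP server",
  indexer: () => "starting NATS projectors",
  jobs: () => "starting reindex workers",
};

function bootstrap(env: Record<string, string | undefined>): string {
  return bootstraps[parseRole(env)]();
}
```

In a Node.js entry point this would typically be called as `bootstrap(process.env)`. Rejecting unknown values at startup keeps a typo in a manifest from producing a pod that passes probes but does no useful work.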

3. Regions

Primary regions: US, EU, ME, AP.
Each region runs a full stack; tenants never span regions (the marketplace slice is the exception).
Cross-region traffic occurs only for marketplace searches that hit the global index.
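The region rules above can be read as a small routing decision. The sketch below is an illustration under assumptions: the type names, the `scope` field, and the choice to reject a tenant query that arrives in the wrong region are all hypothetical, since this document does not say how a misrouted request is handled.

```typescript
// Hypothetical routing rule; all names here are illustrative.
type Region = "US" | "EU" | "ME" | "AP";
type Scope = "tenant" | "marketplace";

interface QueryContext {
  tenantHomeRegion: Region; // region that owns the tenant's data
  servingRegion: Region;    // region of the pod handling the request
  scope: Scope;
}

type Route =
  | { kind: "local"; region: Region }
  | { kind: "global-index" }
  | { kind: "reject"; reason: string };

function routeQuery(ctx: QueryContext): Route {
  // Marketplace search is the only path allowed to cross regions.
  if (ctx.scope === "marketplace") return { kind: "global-index" };
  // Assumption: a tenant query outside its home region is refused
  // rather than proxied, so tenant data stays regional.
  if (ctx.tenantHomeRegion !== ctx.servingRegion) {
    return { kind: "reject", reason: "tenant data never leaves its home region" };
  }
  return { kind: "local", region: ctx.servingRegion };
}
```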

4. Dependencies

Dep	Version	Notes
OpenSearch	2.13+	Dedicated cluster per region; 3 data + 3 master
Postgres	15+	Primary + 2 read replicas
Redis	7.2+	Cluster mode; 3 shards
NATS JetStream	2.10+	Cross-region mirror; per-region consumers
ai-gateway-service	internal	Service mesh endpoint `ai-gateway.prod.svc`

5. Networking

All pods in private subnets.
Egress via NAT/gateway only (for ai-gateway and identity JWKS refresh).
Mesh: Linkerd or Istio with mTLS (SPIFFE IDs).
Ingress: API Gateway (Kong) in front of search-api.

6. Kubernetes Manifests (abridged)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: search-api
  labels: { app: search-service, role: api }
spec:
  replicas: 6
  selector: { matchLabels: { app: search-service, role: api } }
  template:
    metadata:
      labels: { app: search-service, role: api }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9464"
    spec:
      serviceAccountName: search-service
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: app
          image: registry.ghasi.io/search-service:{{ .Values.image.tag }}
          env:
            - { name: ROLE, value: api }
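            # The Downward API reads the pod's own labels, not node labels;
            # this label must be present on the pod for REGION to resolve.
            - { name: REGION, valueFrom: { fieldRef: { fieldPath: metadata.labels['topology.kubernetes.io/region'] } } }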
            - { name: OTEL_SERVICE_NAME, value: search-service }
          envFrom:
            - secretRef: { name: search-secrets }
            - configMapRef: { name: search-config }
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits:   { cpu: 2,    memory: 2Gi }
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          startupProbe:
            httpGet: { path: /startup, port: 8080 }
            failureThreshold: 30
          ports:
            - { name: http, containerPort: 8080 }
            - { name: metrics, containerPort: 9464 }
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector: { matchLabels: { app: search-service, role: api } }

7. HPA / KEDA

search-api: HPA CPU 70% + custom search_query_qps target 200 rps per pod.
search-indexer: KEDA nats-jetstream scaler on consumer pending.
search-jobs: KEDA postgres scaler on search.reindex_job where status='queued'.
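As a worked example of how the two search-api metrics combine, the sketch below applies the replica formula from the Kubernetes HPA documentation: desired = ceil(current × metric / target), taking the largest proposal across metrics and clamping to the min/max bounds. It deliberately omits the HPA tolerance band and stabilization windows, and the sample load figures are invented for illustration.

```typescript
// Simplified HPA arithmetic; ignores tolerance and stabilization behavior.
interface MetricSample {
  current: number; // observed value, e.g. average CPU % or rps per pod
  target: number;  // configured target, e.g. 70 (% CPU) or 200 (rps per pod)
}

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricSample[],
  min: number,
  max: number,
): number {
  // One proposal per metric: ceil(currentReplicas * current / target).
  const proposals = metrics.map((m) =>
    Math.ceil(currentReplicas * (m.current / m.target)),
  );
  // With several metrics, the HPA acts on the largest proposal.
  const wanted = Math.max(...proposals);
  return Math.min(max, Math.max(min, wanted));
}

// 6 pods at 50% CPU (target 70%) and 300 rps per pod (target 200 rps):
// CPU proposes ceil(6 * 50 / 70) = 5, QPS proposes ceil(6 * 300 / 200) = 9.
const next = desiredReplicas(
  6,
  [
    { current: 50, target: 70 },
    { current: 300, target: 200 },
  ],
  3,
  30,
);
console.log(next); // 9
```

Because the largest proposal wins, a query spike scales the fleet out even while CPU is below its target.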

8. Rollouts

Argo Rollouts canary strategy:
- 10% for 10 min → automated SLO analysis (p95, error rate, NDCG if fresh).
- 50% for 20 min.
- 100%.
Automated rollback triggers on an SLO breach or when search_error_rate > 1% for 5 min.
Postgres schema migrations are gated by a migrate dry run in a pre-rollout step.
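The "search_error_rate > 1% for 5m" rule means the breach must be sustained, not momentary. The sketch below illustrates that semantics over a list of samples. It is not how Argo Rollouts evaluates analysis runs; the `Sample` shape and function name are hypothetical.

```typescript
// Illustrative "sustained breach" check; not an Argo Rollouts API.
interface Sample {
  atMs: number;      // sample timestamp in milliseconds
  errorRate: number; // fraction of failed requests, 0..1
}

function breachedFor(samples: Sample[], threshold: number, windowMs: number): boolean {
  let streakStart: number | null = null; // start of the current breaching streak
  let last: number | null = null;        // timestamp of the newest sample
  const ordered = samples.slice().sort((a, b) => a.atMs - b.atMs);
  for (const s of ordered) {
    if (s.errorRate > threshold) {
      if (streakStart === null) streakStart = s.atMs;
    } else {
      streakStart = null; // any healthy sample resets the streak
    }
    last = s.atMs;
  }
  return streakStart !== null && last !== null && last - streakStart >= windowMs;
}

const MINUTE = 60 * 1000;
// Six one-minute samples, all at 2% errors: the streak spans 5 minutes.
const sustained: Sample[] = [0, 1, 2, 3, 4, 5].map((m) => ({ atMs: m * MINUTE, errorRate: 0.02 }));
const rollback = breachedFor(sustained, 0.01, 5 * MINUTE); // true
```

A single healthy sample inside the window resets the streak, which keeps a brief spike during the 10% step from aborting the rollout.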

9. Configuration & Secrets

Config	Source
Non-secret	ConfigMap `search-config`
Secrets	Vault → CSI driver → ephemeral mount
Per-tenant policies	`search.index_policy` table, cached in Redis
Feature flags	LaunchDarkly (or OpenFeature)

10. Image Supply Chain

Multi-stage Docker build, distroless or alpine-hardened.
An SBOM is generated with syft and stored with the image.
Images are signed with cosign (Sigstore); deployment requires signature verification, enforced by a Kyverno policy.
Vulnerability scan gate: zero CRITICAL, ≤ 5 HIGH.
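The scan gate reduces to a count per severity. The sketch below assumes the scanner's report has already been parsed into a list of findings; the `Finding` shape is hypothetical and does not correspond to any particular scanner's output format.

```typescript
// Hypothetical finding shape; real scanner reports need mapping into it.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  id: string;
  severity: Severity;
}

// Gate from this section: zero CRITICAL findings and at most 5 HIGH.
function passesScanGate(findings: Finding[], maxHigh: number = 5): boolean {
  const count = (sev: Severity): number =>
    findings.filter((f) => f.severity === sev).length;
  return count("CRITICAL") === 0 && count("HIGH") <= maxHigh;
}
```

A CI step could call this on the parsed report and exit non-zero when it returns false, so the image is never pushed for signing.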

11. Backup & DR

Component	RPO	RTO	Strategy
OpenSearch	6h	2h	Snapshots + reindex from NATS
Postgres	15 min	1h	PITR + read replicas
Redis	—	5m	Ephemeral; cache rewarming tolerated
NATS	1h	30m	Cross-region mirror

DR drill: quarterly full region failover.

12. Cost Controls

Tenant bucketing in metrics caps cardinality.
Embedding batcher minimizes ai-gateway cost (≥ 70% cache hit target).
Per-tenant storage quota alert at 80%.
Cold archives to object storage every 30d.
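One common way to cap metric cardinality is to label metrics with a fixed number of tenant buckets instead of raw tenant IDs. The sketch below shows that idea with a 32-bit FNV-1a hash. The bucket count of 64, the `b` prefix, and the function names are assumptions; this document does not specify how bucketing is implemented.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to bytes for ASCII IDs).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Maps any tenant ID to one of `buckets` stable label values, so the
// label has at most `buckets` distinct values regardless of tenant count.
function tenantBucket(tenantId: string, buckets: number = 64): string {
  return "b" + (fnv1a32(tenantId) % buckets);
}
```

The trade-off is that per-tenant detail is lost in metrics; the same tenant always lands in the same bucket, so a noisy bucket can still be narrowed down through logs or traces.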

13. Blue/Green for Index Schema Changes

Alias points to blue physical index.
Breaking mapping change → new green index built from snapshots + replayed events.
Alias swapped on full parity; blue retained 24h.
Client sees one consistent alias; no downtime.
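The alias swap can be issued as a single request to the OpenSearch `_aliases` endpoint, which applies all actions in one request atomically. The sketch below only builds the request body; the alias and index names are hypothetical, and sending the request (client, auth, parity check) is left out.

```typescript
// Builds the body for POST /_aliases; index and alias names are illustrative.
interface AliasTarget {
  index: string;
  alias: string;
}

type AliasAction = { remove: AliasTarget } | { add: AliasTarget };

function buildAliasSwap(
  alias: string,
  blueIndex: string,
  greenIndex: string,
): { actions: AliasAction[] } {
  return {
    actions: [
      // Both actions are sent in one request, so readers never observe
      // the alias pointing at no index or at both indices.
      { remove: { index: blueIndex, alias } },
      { add: { index: greenIndex, alias } },
    ],
  };
}

const swapBody = buildAliasSwap("search-docs", "search-docs-v1", "search-docs-v2");
```

Because only the alias moves, the blue index stays intact for the 24h retention period, and rolling back is the same call with the index arguments reversed.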

14. On-Call Access

kubectl exec disabled in prod; access via audited ghasi-ops breakglass tool.
Every production mutation logged in platform.admin.action.performed.v1.
