api-gateway (Kong) — Deployment Topology

Status: populated
Owner: TBD (Platform / SRE)
Last updated: 2026-04-17
Companion: SERVICE_OVERVIEW · DATA_MODEL · Service Template

1. Runtime

| Property | Value |
| --- | --- |
| Product | Kong Gateway (edition: OSS or Enterprise; see SERVICE_OVERVIEW open questions) |
| Version | Pinned; upgrades follow the Kong LTS track |
| Mode | DB-less (preferred); DB mode is a fallback |
| Container image | Official kong:<version>-alpine (SBOM scanned in CI) |
| Platform | Kubernetes 1.29+ |
| Workload kind | Deployment |
| Replicas | 2 (staging) / 2–6 (prod with HPA) |
| CPU / mem request | 500m / 1 GiB per pod |
| CPU / mem limit | 2000m / 2 GiB per pod |
| HPA | CPU 70 % target; scale 2–6 |
| Pod disruption budget | minAvailable = 2 |

DaemonSet layout is an alternative for dedicated ingress nodes; not chosen by default.
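
The HPA and pod disruption budget settings above interact with the replica count, so a small calculation can help when reviewing them. The following is a minimal sketch, not the cluster's actual controller logic: it approximates the documented Kubernetes HPA rule (desired = ceil(current × metric / target), clamped to the min/max range, with the default 10 % tolerance) and the number of voluntary evictions a minAvailable budget permits. The function names are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=70.0,
                         min_replicas=2, max_replicas=6, tolerance=0.1):
    """Approximate the HPA rule: ceil(current * metric / target), clamped to [min, max]."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Inside the tolerance band the replica count is left unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def pdb_allowed_disruptions(healthy_pods, min_available=2):
    """Voluntary evictions a PodDisruptionBudget with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)
```

Under these assumptions, two pods at 105 % average CPU scale to three, and the result never leaves the 2–6 range. With exactly two healthy pods and minAvailable = 2, the budget permits zero voluntary evictions, which is worth keeping in mind for node drains in a two-replica environment.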

2. Topology diagram

3. Configuration delivery (GitOps)

  1. An engineer opens a PR that modifies ops/kong/<env>.kong.yaml in the application monorepo.
  2. CI runs the contract-test matrix (see TESTING_STRATEGY §2).
  3. On merge to main, CI runs deck gateway sync against staging.
  4. On release tag, CI runs deck gateway sync against production, with an approval gate.
  5. A nightly deck gateway diff job detects drift between the Git state and the running gateway.
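
The nightly drift check in step 5 could be wrapped roughly as follows. This is a sketch under assumptions, not the pipeline's actual job: it assumes a decK version that supports the `deck gateway diff` subcommand and the `--non-zero-exit-code` flag (exit code 2 when a diff is present), which should be verified against the pinned decK release. The function name and the admin address used in any call are hypothetical.

```python
import subprocess


def check_drift(state_file, kong_addr, run=subprocess.run):
    """Run `deck gateway diff` and classify the result as 'in-sync', 'drift', or 'error'."""
    cmd = [
        "deck", "gateway", "diff", state_file,
        "--kong-addr", kong_addr,
        # Assumed decK behaviour: exit 0 = no diff, 2 = diff present, 1 = error.
        "--non-zero-exit-code",
    ]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "in-sync"
    if result.returncode == 2:
        return "drift"
    return "error"
```

The `run` parameter is injected so the classification can be exercised in tests without a live gateway; a scheduled job would call `check_drift` with the environment's state file and alert on anything other than "in-sync".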

Kong pods in DB-less mode read their configuration from a ConfigMap mounted as the declarative_config file. The deck step in CI updates the ConfigMap; Kong then reloads on a SIGHUP sent by the config-watcher sidecar, or through a rolling restart if SIGHUP reload is not configured.
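
The config-watcher sidecar is not specified further here. A minimal sketch of the idea, assuming a polling watcher that hashes the mounted file and signals the Kong master process, might look like this. The function names, the polling interval, and the way the Kong PID is obtained are all assumptions; signalling across containers also assumes the pod shares its process namespace.

```python
import hashlib
import os
import signal
import time


def file_digest(path):
    """Return the SHA-256 hex digest of the file's current contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def poll_once(path, last_digest, on_change):
    """Call on_change() if the content changed; return the digest to remember."""
    digest = file_digest(path)
    if last_digest is not None and digest != last_digest:
        on_change()
    return digest


def watch(path, kong_pid, interval_s=5.0):
    """Poll forever and send SIGHUP to the Kong master process on each change."""
    last = None
    while True:
        last = poll_once(path, last, lambda: os.kill(kong_pid, signal.SIGHUP))
        time.sleep(interval_s)
```

Hashing the content rather than watching modification times is a deliberate choice in this sketch: Kubernetes updates mounted ConfigMaps by swapping a symlink, so the content digest is the more reliable change signal.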

4. Cloudflare upstream

  • Cloudflare sits in front as WAF + DDoS + CDN.
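  • Request path: Cloudflare → Kubernetes LoadBalancer Service → Kong pods.
  • TLS: Cloudflare holds the public certificate. The Cloudflare-to-origin connection terminates at Kong, which presents the origin certificate and, with Cloudflare Authenticated Origin Pulls, verifies Cloudflare's client certificate.
  • Cloudflare applies bot management, IP reputation, and basic L7 rate limiting in front of Kong's finer-grained limits.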

5. Environments

| Env | Domain | Replicas | Redis | DB mode |
| --- | --- | --- | --- | --- |
| dev | n/a (docker compose) | 1 | local | DB-less |
| staging | api.staging.ghasi.io | 2 | shared staging Redis | DB-less |
| prod | api.ghasi.io | 2–6 (HPA) | dedicated cluster | DB-less (see open questions) |
| dr | api.dr.ghasi.io | 2 hot-standby | regional replica | DB-less |

6. Regions

  • Primary: single region (e.g. eu-west-1) — same region as sms-orchestrator and Redis.
  • DR: warm-standby region, Kong config replicated via Git. RTO 4 h, RPO 1 h (matches platform baseline, see 01 Enterprise Architecture §10).

7. Networking

| Concern | Setting |
| --- | --- |
| Public ingress | Cloudflare → cloud LB (L4 or L7) → Kong Service |
| Kong Service type | LoadBalancer (public subnet) or NodePort behind an external LB |
| Admin API | Internal-only Service; NetworkPolicy restricts to SRE/CI pods |
| Upstream calls | Cluster DNS; mTLS preferred (via service mesh or direct) |
| Egress | Allow-list: auth-service, Redis, OTel collector, Loki |

NetworkPolicy excerpts:

  • Deny all traffic to the Kong admin port except from pods labelled role=sre-tooling.
  • Kong pods may egress to cluster DNS, named services, and the OTel/Loki endpoints only.
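
Because NetworkPolicy rules are additive allow-lists, the admin-port restriction above is usually expressed as an ingress policy that allows the proxy ports from any source and the admin port only from the tooling pods. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The policy name, the namespace, and the app=kong pod label are hypothetical; ports 8000/8443 (proxy) and 8001 (admin) are Kong's defaults and may differ in this deployment. Only the role=sre-tooling label comes from the text above.

```python
import json


def kong_ingress_policy(namespace="gateway", admin_port=8001, proxy_ports=(8000, 8443)):
    """Build a NetworkPolicy manifest restricting the Kong admin port to tooling pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "kong-ingress", "namespace": namespace},
        "spec": {
            # Hypothetical label selecting the Kong pods.
            "podSelector": {"matchLabels": {"app": "kong"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                # Proxy ports: no "from" clause, so any source is allowed.
                {"ports": [{"protocol": "TCP", "port": p} for p in proxy_ports]},
                # Admin port: only pods labelled role=sre-tooling in the same namespace.
                {
                    "from": [{"podSelector": {"matchLabels": {"role": "sre-tooling"}}}],
                    "ports": [{"protocol": "TCP", "port": admin_port}],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kong_ingress_policy(), indent=2))
```

A matching egress policy for the allow-listed destinations would follow the same pattern with policyTypes set to include Egress.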

8. Dependencies at runtime

| Dependency | Failure mode |
| --- | --- |
| auth-service (JWKS + key resolution) | Degrades JWT validation after cache TTL; custom plugin rejects new API keys |
| Redis | Rate-limit counters fail per route policy (closed on writes, open on reads) |
| OTel collector | Traces dropped; alerts on sustained export failure |
| Loki | Log buffer fills then drops; not request-path critical |
| Prometheus | No impact on request path |
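
The Redis failure mode (closed on writes, open on reads) can be illustrated with a small decision function. This is a sketch only, not the rate-limiting plugin's implementation: it assumes that "writes" and "reads" map to HTTP methods, and the exception type, function name, and `incr` callback are hypothetical stand-ins for the real counter backend.

```python
class CounterUnavailable(Exception):
    """Raised by the (hypothetical) counter backend when Redis cannot be reached."""


def allow_request(method, limit, incr,
                  fail_closed_methods=("POST", "PUT", "PATCH", "DELETE")):
    """Return True if the request may proceed.

    incr() increments the counter and returns the new count; it raises
    CounterUnavailable during a Redis outage.
    """
    try:
        return incr() <= limit
    except CounterUnavailable:
        # Outage: reject writes (fail closed), let reads through (fail open).
        return method.upper() not in fail_closed_methods
```

During an outage a GET would still be served while a POST would be rejected, which trades some over-admission on read routes for protection of write paths.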

9. Upgrade strategy

  • Config changes: live reload via SIGHUP where configured; otherwise a rolling pod restart.
  • Kong version bumps: blue/green. Stand up a new ReplicaSet with the new image, flip Service selector after smoke test, keep old ReplicaSet for 1 h.
  • Custom plugin releases: ship as part of the Kong image (build-time install). Roll with blue/green.

10. Resource sizing (reference)

For an SMS peak of ~5 000 req/s at the edge with p95 Kong latency < 150 ms:

  • 4 Kong pods @ 1 vCPU each, ~60 % CPU headroom.
  • Redis sustained throughput ~30 k ops/s (rate limit INCR + EXPIRE). 3-node cluster, 4 GiB each.

Final numbers will be ratified at the load test (see TESTING_STRATEGY §7).
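
The reference figures can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the value of three rate-limit checks per request is an assumption chosen because it reproduces the ~30 k ops/s figure with one INCR and one EXPIRE per check; the source does not state how many limits apply per request.

```python
def per_pod_rps(peak_rps, pods):
    """Requests per second each pod must absorb if load is spread evenly."""
    return peak_rps / pods


def redis_ops_per_second(peak_rps, limits_per_request, ops_per_limit=2):
    """Redis operations per second, counting INCR + EXPIRE as two ops per limit check."""
    return peak_rps * limits_per_request * ops_per_limit


if __name__ == "__main__":
    print(per_pod_rps(5000, 4))            # even spread across 4 pods
    print(per_pod_rps(5000, 3))            # same peak with one pod unavailable
    print(redis_ops_per_second(5000, 3))   # assumed 3 limit checks per request
```

At the stated peak, four pods each handle 1 250 req/s, and the assumed three checks per request yield 30 000 Redis ops/s.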

11. Open questions

  • DaemonSet vs Deployment on dedicated ingress nodes — revisit if ingress traffic grows significantly.
  • Service mesh (Linkerd / Istio) for Kong ↔ upstream mTLS vs direct TLS.
  • HPA metric — CPU only or custom metric (e.g. kong_http_requests_total rate).