api-gateway (Kong) — Service Overview
Status: populated
Owner: TBD (Platform / SRE)
Last updated: 2026-04-17
Companion: ADR-0001 Kong edge gateway · 01 Enterprise Architecture · Service Template
1. Purpose
This folder documents the Kong Gateway deployment that fronts the Ghasi-SMS-Gateway platform. Per ADR-0001, the previously-planned custom NestJS api-gateway service is RETIRED. Its edge responsibilities — TLS termination, authentication, rate limiting, correlation, logging — are now performed by Kong; its pre-edge application concerns (payload validation, idempotency, NATS publish) moved to sms-orchestrator.
The services/api-gateway/ directory is therefore not a deployable NestJS service. It is the source-of-truth documentation for:
- Kong Services and Routes (the public API surface).
- Kong plugins enabled per route (auth, rate limiting, correlation, logging, OTel).
- Any custom Kong plugins Ghasi authors (implementation lives in the application monorepo under ops/kong/plugins/; design and contract live here).
- Operational posture for Kong in staging and production (topology, observability, runbook entry points).
2. Why Kong replaces the custom api-gateway
| Concern | Custom NestJS gateway (retired) | Kong (adopted) |
|---|---|---|
| TLS termination | NestJS + cert-manager | Cloudflare + Kong, battle-tested |
| JWT validation | Hand-rolled Firebase verify | jwt plugin + JWKS from auth-service |
| API key auth | Hand-rolled hash + DB lookup | key-auth plugin (+ custom plugin to resolve consumer from auth-service) |
| Rate limiting | ioredis + Lua-ish counters | rate-limiting / rate-limiting-advanced (Redis cluster backend) |
| Request size limits | NestJS body parser config | request-size-limiting plugin |
| IP allow/deny | Hand-rolled guard | ip-restriction plugin |
| Correlation IDs, OTel spans | Custom interceptor | correlation-id + opentelemetry plugins |
| Observability | Custom Prom registry | Kong Prometheus plugin + http-log to Loki |
| Operational cost | Maintain a whole NestJS service | One declarative decK YAML, Helm values |
Rationale: telecom-grade SMS traffic needs first-class per-key / per-account / per-operator rate limits and burst protection. Kong's mature plugin set, declarative config, and operational tooling beat a bespoke NestJS edge for correctness, security posture, and maintenance cost.
3. Route prefixes exposed by Kong
The table below is the authoritative route layout, copied from ADR-0001 §3. It is expanded per endpoint in API_CONTRACTS.
| Path prefix | Upstream service | Auth at edge |
|---|---|---|
| /v1/sms/send, /v1/sms/{id}, /v1/sms/bulk | sms-orchestrator | JWT or API key |
| /v1/dlr/* | dlr-processor | JWT |
| /v1/accounts/*, /v1/auth/*, /v1/api-keys/* | auth-service | JWT (most); /v1/auth/login public |
| /v1/billing/*, /v1/invoices/* | billing-service | JWT |
| /v1/analytics/*, /v1/reports/* | analytics-service | JWT |
| /v1/operators/* | operator-management-service | JWT + admin scope |
| /v1/webhooks/* | webhook-dispatcher | JWT |
| /admin/* | admin-dashboard BFF | JWT + admin scope + IP allow |
| /portal/* | customer-portal BFF | JWT |
Internal east-west traffic (service-to-service HTTP/gRPC, NATS) bypasses Kong. SMPP ingress terminates at smpp-connector, not Kong (ADR §7).
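To make the table easier to reason about, the following is a minimal sketch of how a request path could be resolved to an upstream and an edge auth requirement. It uses plain longest-prefix matching as an approximation; Kong's router applies its own priority rules, so this is not a description of Kong's implementation. The `Route` type, the `ROUTES` subset, and the `resolve` function are hypothetical names introduced only for illustration, and `/v1/sms/` stands in for the three SMS paths listed above.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str    # path prefix taken from the route table
    upstream: str  # upstream service name
    auth: str      # edge auth requirement, kept as free text


# Illustrative subset of the route table; not the full production layout.
ROUTES = [
    Route("/v1/sms/", "sms-orchestrator", "JWT or API key"),
    Route("/v1/dlr/", "dlr-processor", "JWT"),
    Route("/v1/auth/", "auth-service", "JWT (most)"),
    Route("/v1/auth/login", "auth-service", "public"),
    Route("/v1/operators/", "operator-management-service", "JWT + admin scope"),
    Route("/admin/", "admin-dashboard BFF", "JWT + admin scope + IP allow"),
]


def resolve(path: str) -> Optional[Route]:
    """Return the route with the longest matching prefix, or None if no route matches."""
    matches = [r for r in ROUTES if path.startswith(r.prefix)]
    return max(matches, key=lambda r: len(r.prefix), default=None)
```

In this sketch `/v1/auth/login` resolves to the more specific public route rather than the general `/v1/auth/` entry, and a path outside every prefix resolves to `None`, which mirrors the statement that traffic not listed here does not pass through Kong.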
4. Responsibilities owned by Kong
- TLS 1.2+ termination downstream from Cloudflare (Cloudflare remains the WAF/DDoS layer).
- Edge authentication: JWT validation against auth-service JWKS; API key validation via the key-auth plugin; optional custom plugin to map API key → consumer using a cached auth-service lookup.
- Rate limiting: per API key, per account, per operator, and global. Redis-backed (kong:rl:* namespace).
- Request size limiting: default 64 KB for /v1/sms/send; larger per-route overrides for bulk endpoints.
- IP allow/deny: partner integration allowlists, admin-path IP restrictions.
- Correlation and tracing: inject X-Request-Id if missing, propagate traceparent, emit OTel spans.
- Header forwarding: Authorization, X-Tenant-Id, X-Api-Key-Id, Idempotency-Key, Accept-Language, X-Forwarded-For.
- Access logging: headers and metadata only; never SMS message bodies (PII/telecom data).
5. Responsibilities Kong does not own
| Concern | Owner | Why |
|---|---|---|
| Business authorization (account scope, per-resource RBAC) | auth-service + in-service guards | Kong does coarse gating only |
| Tenant isolation (account_id scoping, RLS) | Every service | Authoritative boundary is the DB |
| Idempotency storage and replay | sms-orchestrator | Requires business context (dedupe key scope = API key + endpoint) |
| Zod payload validation (phone E.164, content-type detection, operator lookup) | sms-orchestrator | Business rule correctness |
| Problem+json error shaping | Every upstream service | Consistent with OpenAPI |
| HMAC signing of outbound webhooks | webhook-dispatcher | Not a north-south concern |
| Domain events | Upstream services → NATS | Kong never emits domain events |
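Two of the rows above describe rules that are easier to see in code: the idempotency scope (API key + endpoint) and E.164 phone validation. The sketch below is illustrative only. The platform performs this validation with Zod inside sms-orchestrator; the Python here just demonstrates the rules. The function names, the SHA-256 hashing, and the field separator are assumptions, not the service's actual storage format.

```python
import hashlib
import re

# E.164: a "+" followed by up to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the whole string is a plausible E.164 number."""
    return E164_PATTERN.fullmatch(number) is not None


def dedupe_key(api_key_id: str, endpoint: str, idempotency_key: str) -> str:
    """Derive a storage key scoped to API key + endpoint, per the table above.

    The same Idempotency-Key sent with a different API key or to a different
    endpoint yields a different dedupe key, so replays never cross scopes.
    """
    # Unit-separator character avoids ambiguity between adjacent fields.
    scope = "\x1f".join((api_key_id, endpoint, idempotency_key))
    return hashlib.sha256(scope.encode("utf-8")).hexdigest()
```

Because the scope includes business identifiers that only the upstream service fully understands, this logic belongs in sms-orchestrator rather than at the edge.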
6. Upstream dependencies
| Dependency | Pattern | Purpose |
|---|---|---|
| auth-service (JWKS endpoint + api_keys lookup) | Pull JWKS at startup + refresh; optional sidecar plugin HTTP call | Validate JWTs; resolve API keys to consumers |
| Redis cluster | kong:rl:* namespace | Rate-limiter counters |
| PostgreSQL (if DB mode) or decK YAML in Git (DB-less) | Config store | Route/plugin/consumer definitions |
| Cloudflare | Upstream CDN/WAF | Sends already-TLS-offloaded or TLS-passthrough traffic |
| OTel collector | Trace export | Spans to platform trace backend |
| Loki | http-log target | Access logs (headers only) |
| Prometheus | Metrics scrape | Kong built-in Prometheus plugin |
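The Redis dependency above holds rate-limiter counters under the kong:rl:* namespace. As a rough illustration of what such counters do, here is a minimal fixed-window limiter that uses an in-memory dictionary in place of Redis. Kong's rate-limiting plugins manage their own keys and algorithms, so the key layout (`kong:rl:<dimension>:<identifier>:<bucket>`), the class name, and the fixed-window choice are all assumptions made for this sketch.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per (dimension, identifier) within fixed time windows."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # stands in for the Redis cluster

    def key(self, dimension: str, identifier: str, now: float) -> str:
        """Build a namespaced counter key; the layout is hypothetical."""
        bucket = int(now // self.window)  # index of the current window
        return f"kong:rl:{dimension}:{identifier}:{bucket}"

    def allow(self, dimension: str, identifier: str, now: float) -> bool:
        """Count this request and report whether it is within the limit."""
        k = self.key(dimension, identifier, now)
        self.counters[k] += 1
        return self.counters[k] <= self.limit
```

Keeping the dimension (API key, account, operator, global) in the key lets separate limits share one backend without colliding, which is the property the namespace is meant to provide.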
7. Architecture diagram
8. Key decisions
- Adopt Kong as the only north-south HTTP gateway. See ADR-0001.
- DB-less mode preferred for production; configuration is declarative YAML under ops/kong/ in the application monorepo, applied via decK in CI. DB mode is a fallback if Kong Enterprise features requiring the DB are adopted later.
- Authoritative auth stays in services. Kong performs coarse gating (JWT shape, API key existence, rate limit). Business authorization (account scope, resource RBAC, tenant isolation) remains in upstream services.
- SMPP is out of scope. MNO bindings terminate at smpp-connector; Kong does not proxy SMPP.
- No domain events. Kong emits access logs, metrics, and traces, but no NATS events (ADR §4, EVENT_SCHEMAS).
- Custom plugins avoided where possible. Prefer configuration of built-in plugins; a custom plugin is only justified for API-key → consumer resolution against auth-service when the built-in key-auth + consumer bootstrap is insufficient.
9. Service readiness level
| Level | Description | Target |
|---|---|---|
| L1 | Kong deployed in staging, DB-less, one smoke route green | Slice 0 |
| L2 | All route prefixes wired, JWT + key-auth + rate limit + OTel enabled; Grafana dashboards live | Slice 1 |
| L3 | HA (2–6 replicas), decK pipeline with lint against upstream OpenAPI, alerts wired, runbooks signed | Slice 2 |
| L4 | Production cutover complete, custom NestJS api-gateway decommissioned, chaos-tested | Slice 3 |
10. Open questions
- Kong edition: OSS vs Enterprise vs Konnect (SRE decision; see SERVICE_RISK_REGISTER).
- Rate limiter backend: shared platform Redis vs Kong-dedicated Redis cluster.
- DB-less vs DB: final call before production cutover.
- Custom plugin for API-key → consumer resolution: built-in key-auth with consumer-sync job or a purpose-built plugin querying a cached auth-service endpoint.
- Certificate rotation cadence and automation (cert-manager vs Cloudflare origin cert).