api-gateway (Kong) — Population Report
Date: 2026-04-17
Scope: Re-populated all 17 service docs for the re-scoped api-gateway folder per ADR-0001. Folder is now Kong documentation, not a deployable NestJS service.
1. New epics and stories (Kong adoption)
EP-KONG-01 — Adopt Kong as edge gateway
- Type: Epic
- Status: To Do
- Priority: Highest
- Description: Provision Kong Gateway as the sole north-south HTTP edge for Ghasi-SMS-Gateway. DB-less mode, decK-managed configuration under `ops/kong/`, CI lint against upstream OpenAPI. Deliver staging first, production second.
- Acceptance criteria:
- Kong deployed in staging and production (2–6 replicas with HPA).
- All route prefixes from ADR-0001 §3 wired to upstream services.
- decK GitOps pipeline green; nightly `deck gateway diff` returns no drift.
- CI lint enforces: auth plugin per route, tags, no plaintext secrets, route ↔ upstream OpenAPI parity.
- SERVICE_READINESS §2.1 + §2.2 check-list green.
- Stories: US-KONG-01..US-KONG-06
US-KONG-01 — Provision Kong in staging (DB-less)
- Type: Story
- Parent epic: EP-KONG-01
- Priority: Highest
- Story points: 5
- As a platform engineer I want Kong running in the staging cluster in DB-less mode so that we have a reproducible edge environment for route + plugin validation.
- Acceptance criteria:
- Helm chart or manifests in `ops/kong/` deploy 2 Kong replicas in staging.
- Kong loads `ops/kong/staging.kong.yaml` via ConfigMap on start; reloads on SIGHUP.
- `/health` and `/ready` responsive from inside the cluster.
- Dependencies: Staging cluster ready; Redis available.
US-KONG-02 — Decide DB-less vs DB mode for prod
- Type: Story
- Parent epic: EP-KONG-01
- Priority: High
- Story points: 3
- As an SRE lead I want a recorded decision on Kong DB-less vs DB mode so that prod topology is final before cutover.
- Acceptance criteria:
- ADR (or updated ADR-0001) captures decision + rationale.
- If DB-less (default): no PostgreSQL provisioning needed.
- If DB: PostgreSQL instance sized, backed up, DR-tested.
- Dependencies: Review Kong Enterprise feature needs.
US-KONG-03 — decK GitOps pipeline
- Type: Story
- Parent epic: EP-KONG-01
- Priority: High
- Story points: 5
- As a platform engineer I want Kong configuration applied automatically from Git so that no manual admin-API writes reach prod.
- Acceptance criteria:
- CI job runs `deck gateway sync` on merge to `main` (staging) and on release tag (prod, with approval gate).
- Admin API writes denied from outside CI via NetworkPolicy + token scoping.
- PR pipeline runs `deck file validate` + contract lint.
- Dependencies: CI runner with admin API access.
US-KONG-04 — Configuration contract lint in CI
- Type: Story
- Parent epic: EP-KONG-01
- Priority: High
- Story points: 3
- As a platform engineer I want every Kong Route validated against upstream OpenAPI so that route drift does not reach prod.
- Acceptance criteria:
- CI lint fails if a Route references a path not present in the upstream service's OpenAPI.
- CI lint fails if a Route has no auth plugin and no `public:true` tag.
- CI lint fails if `env`/`owner` tags are missing.
- CI lint fails on plaintext-shaped secrets in YAML.
- Dependencies: Upstream services publish OpenAPI artifacts.
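The auth and tag rules in this story could be checked with a small script along the following lines. This is a minimal sketch, assuming the decK YAML has already been parsed into dictionaries with `name`, `tags`, and `plugins` keys; the function name `lint_route` and the set `AUTH_PLUGINS` are hypothetical, only route-level plugins are inspected, and the OpenAPI parity and secret-shape checks are left out.

```python
# Hypothetical lint helper; not part of decK or Kong.
AUTH_PLUGINS = {"jwt", "key-auth"}  # assumption: plugins that count as "auth"


def lint_route(route: dict) -> list[str]:
    """Return a list of lint errors for one parsed Kong Route."""
    errors = []
    tags = route.get("tags") or []
    plugins = {p.get("name") for p in route.get("plugins") or []}

    # Rule: a Route needs an auth plugin unless it is tagged public:true.
    if not (plugins & AUTH_PLUGINS) and "public:true" not in tags:
        errors.append("no auth plugin and no public:true tag")

    # Rule: env and owner tags are mandatory.
    for prefix in ("env:", "owner:"):
        if not any(t.startswith(prefix) for t in tags):
            errors.append(f"missing {prefix} tag")
    return errors


if __name__ == "__main__":
    route = {"name": "rt-sms-v1-send", "tags": ["env:staging"], "plugins": []}
    for err in lint_route(route):
        print(f"{route['name']}: {err}")
```

A CI job would typically exit non-zero when any route returns a non-empty list.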
US-KONG-05 — Nightly drift detection
- Type: Story
- Parent epic: EP-KONG-01
- Priority: Medium
- Story points: 2
- As an SRE I want automated drift detection so that config changes outside Git are caught within 24 h.
- Acceptance criteria:
- Nightly CI job runs `deck gateway diff` against staging and prod.
- Diff triggers `KongConfigDrift` alert.
- Dependencies: US-KONG-03.
US-KONG-06 — Route catalogue and tagging
- Type: Story
- Parent epic: EP-KONG-01
- Priority: High
- Story points: 5
- As a platform engineer I want every ADR-0001 route prefix represented as a Kong Route with standard tags so that dashboards, alerts, and ownership are consistent.
- Acceptance criteria:
- All 10+ route prefixes from ADR-0001 §3 present in prod config.
- Each Route tagged `env:`, `owner:`, `tier:`.
- Route naming follows `rt-<svc>-<ver>-<purpose>`.
- Dependencies: Upstream services DNS-resolvable.
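The naming and tagging convention can be expressed as a validator. The sketch below is illustrative only: the source does not define the allowed characters, so the pattern assumes lowercase alphanumeric segments, a version written as `v<number>`, and a service segment without hyphens. `check_route` is a hypothetical helper name.

```python
import re

# Assumption: segment syntax is not specified in the story; this pattern is a guess.
ROUTE_NAME = re.compile(
    r"^rt-(?P<svc>[a-z0-9]+)-(?P<ver>v[0-9]+)-(?P<purpose>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)
REQUIRED_TAG_PREFIXES = ("env:", "owner:", "tier:")


def check_route(name: str, tags: list[str]) -> list[str]:
    """Return problems with a Route's name and tags; empty list means compliant."""
    problems = []
    if not ROUTE_NAME.match(name):
        problems.append(f"name {name!r} does not match rt-<svc>-<ver>-<purpose>")
    for prefix in REQUIRED_TAG_PREFIXES:
        # A tag such as "env:" with no value does not count.
        if not any(t.startswith(prefix) and len(t) > len(prefix) for t in tags):
            problems.append(f"missing tag {prefix}<value>")
    return problems


if __name__ == "__main__":
    print(check_route("rt-sms-v1-send", ["env:prod", "owner:platform", "tier:1"]))
    print(check_route("sms-send", ["env:prod"]))
```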
EP-KONG-02 — Migrate authentication to Kong plugins
- Type: Epic
- Status: To Do
- Priority: Highest
- Description: Replace authentication responsibilities of the retired custom api-gateway with Kong's `jwt` and `key-auth` plugins. Pull JWKS from `auth-service`. Optionally author custom `ghasi-api-key-lookup` plugin for runtime API-key resolution without per-consumer Kong rows.
- Acceptance criteria:
- JWT plugin on every JWT route with JWKS fetch from `auth-service`.
- Key-auth plugin on every API-key route; `X-Api-Key` stripped before upstream.
- Custom plugin (if built) ≥ 80 % unit coverage, pinned, code-reviewed.
- No endpoint reachable without auth except explicitly-tagged public paths.
- Stories: US-KONG-07..US-KONG-12
US-KONG-07 — Wire jwt plugin against auth-service JWKS
- Type: Story
- Parent epic: EP-KONG-02
- Priority: Highest
- Story points: 3
- As a customer I want my Firebase/Ghasi-issued JWT accepted by Kong so that my API requests authenticate at the edge.
- Acceptance criteria:
- RS256 only; `iss`, `aud`, `exp`, `kid` validated.
- JWKS cached 5 min; refreshed on `kid` miss.
- `401` problem+json on invalid/expired tokens.
- Dependencies: `auth-service` JWKS live.
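The caching rule (5 min TTL, refresh on `kid` miss) can be modelled as follows. This is a sketch of the behaviour only, not of how Kong's `jwt` plugin is implemented; the class `JwksCache` and its injectable `fetch` and `clock` parameters are hypothetical.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches JWKS keys by kid; refetches when stale or when a kid is unknown."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # returns a JWKS document: {"keys": [...]}
        self._ttl = ttl_seconds    # 5 min, per the acceptance criteria
        self._clock = clock
        self._keys: dict = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", [])}
        self._fetched_at = self._clock()

    def get(self, kid: str) -> Optional[dict]:
        stale = (self._fetched_at is None
                 or self._clock() - self._fetched_at >= self._ttl)
        if stale or kid not in self._keys:
            self._refresh()        # one refetch on expiry or kid miss
        return self._keys.get(kid)  # None means: reject with 401
```

As written, every request with an unknown `kid` triggers a fetch; a production version would likely throttle refreshes so that forged `kid` values cannot flood `auth-service`.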
US-KONG-08 — Wire key-auth plugin for API-key routes
- Type: Story
- Parent epic: EP-KONG-02
- Priority: Highest
- Story points: 3
- As a customer I want my API key accepted by Kong so that server-to-server integrations authenticate without Firebase.
- Acceptance criteria:
- `X-Api-Key` validated; `hide_credentials=true` so it never reaches upstream.
- `401` on invalid/revoked key.
- Rotation verified: two keys active simultaneously during rotation window.
- Dependencies: `auth-service` API-key lifecycle.
US-KONG-09 — Author custom plugin ghasi-api-key-lookup
- Type: Story
- Parent epic: EP-KONG-02
- Priority: High
- Story points: 8
- As a platform engineer I want runtime API-key resolution against `auth-service` so that we avoid pre-provisioning a Kong Consumer per customer key.
- Acceptance criteria:
- Plugin resolves key via hashed lookup, 60s LRU cache, fail-closed on errors.
- Injects `X-Account-Id`, `X-Api-Key-Id`, `X-Tier` to upstream.
- Emits Prometheus counters and histograms per EVENT_SCHEMAS §4.
- Unit test coverage ≥ 80 %.
- Dependencies: `auth-service` `/internal/api-keys/resolve` endpoint.
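The lookup flow (hash the key, consult a 60 s LRU cache, fail closed on resolver errors) might look like the sketch below. It models the logic in Python rather than as a Kong plugin; the `resolve` callable stands in for the `auth-service` call, and SHA-256 plus the response fields `account_id`, `api_key_id`, and `tier` are assumptions, since the source does not specify the hash or the payload shape.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Callable, Optional


class ApiKeyLookup:
    def __init__(self, resolve: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 60.0, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve   # hypothetical auth-service call; receives the key hash
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._cache: OrderedDict = OrderedDict()  # key_hash -> (expires_at, headers)

    def lookup(self, api_key: str) -> Optional[dict]:
        """Return upstream headers for a valid key, or None to reject (401)."""
        key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
        hit = self._cache.get(key_hash)
        if hit and hit[0] > self._clock():
            self._cache.move_to_end(key_hash)   # mark as recently used
            return hit[1]
        try:
            ctx = self._resolve(key_hash)
        except Exception:
            return None                         # fail-closed on resolver errors
        if ctx is None:
            return None                         # unknown or revoked key
        headers = {
            "X-Account-Id": ctx["account_id"],
            "X-Api-Key-Id": ctx["api_key_id"],
            "X-Tier": ctx["tier"],
        }
        self._cache[key_hash] = (self._clock() + self._ttl, headers)
        self._cache.move_to_end(key_hash)
        while len(self._cache) > self._max:
            self._cache.popitem(last=False)     # evict least recently used
        return headers
```

Negative results are deliberately not cached here, so a newly created key is usable at once; revocation can still take up to the TTL to propagate.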
US-KONG-10 — Strip sensitive headers before upstream
- Type: Story
- Parent epic: EP-KONG-02
- Priority: High
- Story points: 2
- As a security reviewer I want credentials removed from upstream traffic so that upstream logs cannot leak them.
- Acceptance criteria:
- `X-Api-Key` not present in upstream request.
- Integration test asserts absence in `sms-orchestrator` logs.
- Dependencies: US-KONG-08.
US-KONG-11 — Admin route IP allow-listing
- Type: Story
- Parent epic: EP-KONG-02
- Priority: Medium
- Story points: 2
- As an SRE I want `/admin/*` restricted to trusted IPs so that the admin dashboard cannot be probed from the internet.
- Acceptance criteria:
- `ip-restriction` plugin on `/admin/*` with SRE + corp-VPN CIDRs.
- Rejects return `403` problem+json.
- Dependencies: IP list from SRE.
US-KONG-12 — JWT algorithm allow-list (RS256-only)
- Type: Story
- Parent epic: EP-KONG-02
- Priority: High
- Story points: 1
- As a security reviewer I want HS256 disabled on JWT validation so that a leaked shared secret cannot forge tokens.
- Acceptance criteria:
- `jwt` plugin config pins `algorithms: [RS256]`.
- Integration test rejects an `alg=HS256` token.
- Dependencies: US-KONG-07.
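The integration test for this story needs a way to tell which algorithm a token claims. The sketch below checks only the JOSE header against an allow-list; it does not verify signatures and is not how the Kong plugin is implemented. The helper name `header_alg_allowed` is hypothetical.

```python
import base64
import json

ALLOWED_ALGS = {"RS256"}


def header_alg_allowed(token: str) -> bool:
    """Check only the JOSE header's alg; signature verification is a separate step."""
    try:
        header_b64 = token.split(".")[0]
        # JWT segments are base64url without padding; restore it before decoding.
        padded = header_b64 + "=" * (-len(header_b64) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
        return header.get("alg") in ALLOWED_ALGS
    except (ValueError, AttributeError):
        return False  # malformed token: reject
```

An `alg` of `HS256` or `none` is rejected before any key material is consulted, which is the property the story asks the test to demonstrate.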
EP-KONG-03 — Migrate rate limiting to Kong
- Type: Epic
- Status: To Do
- Priority: High
- Description: Implement per-API-key, per-account, per-operator, and global rate limits using Kong's `rate-limiting-advanced` plugin with Redis backend. Enforce fail-closed on write endpoints, fail-open on read.
- Acceptance criteria:
- Per-tier limits derived from `auth-service` and applied in decK.
- Redis `kong:rl:*` namespace isolated from application keys.
- Fail-mode matrix documented and integration-tested.
- 429 responses include `Retry-After` and rate-limit headers.
- Stories: US-KONG-13..US-KONG-18
US-KONG-13 — Deploy Redis cluster for rate limiting
- Type: Story
- Parent epic: EP-KONG-03
- Priority: High
- Story points: 3
- As an SRE I want a Redis cluster available to Kong so that the rate-limiter has a persistence-capable backend.
- Acceptance criteria:
- 3-node Redis cluster in staging + prod; TLS + auth.
- `kong:rl:*` keyspace reserved.
- Redis credentials in Kong vault.
- Dependencies: Infra Redis provisioning.
US-KONG-14 — Per-API-key and per-account limits
- Type: Story
- Parent epic: EP-KONG-03
- Priority: High
- Story points: 5
- As a platform operator I want rate limits applied per API key and per account so that one abusive key cannot degrade others.
- Acceptance criteria:
- `rate-limiting-advanced` with `identifier=consumer`; tier-specific limits.
- Integration test: burst to 2× limit → `429` after the limit.
- `Retry-After` header present.
- Dependencies: US-KONG-13, US-KONG-08/09.
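The expected behaviour of the burst test (requests up to the limit pass, the rest get `429` with `Retry-After`) can be illustrated with a fixed-window counter. This is a simplified in-memory model, not the algorithm used by `rate-limiting-advanced`; `FixedWindowLimiter` is a hypothetical name and old windows are never evicted in this sketch.

```python
import math
import time
from typing import Callable


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int = 1,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counts: dict = {}  # (identifier, window_start) -> request count

    def check(self, identifier: str) -> tuple[int, dict]:
        """Return (status, headers) for one request from `identifier`."""
        now = self._clock()
        window_start = int(now // self.window) * self.window
        key = (identifier, window_start)
        self._counts[key] = self._counts.get(key, 0) + 1
        if self._counts[key] > self.limit:
            # Seconds until the current window ends, rounded up, at least 1.
            retry_after = max(1, math.ceil(window_start + self.window - now))
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}
```

Because counters are keyed by identifier, one consumer exhausting its budget does not affect another, which is the isolation property the story describes.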
US-KONG-15 — Global burst guardrail
- Type: Story
- Parent epic: EP-KONG-03
- Priority: Medium
- Story points: 2
- As an SRE I want a platform-wide burst limit so that a traffic spike cannot exhaust upstream capacity.
- Acceptance criteria:
- Global limit of 5 000 req/s (configurable).
- Alert when sustained > 50 % of guardrail.
- Dependencies: US-KONG-14.
US-KONG-16 — Per-operator TPS fencing
- Type: Story
- Parent epic: EP-KONG-03
- Priority: Medium
- Story points: 5
- As a platform operator I want per-operator TPS gates at the edge so that we respect MNO contractual limits.
- Acceptance criteria:
- Operator limits read from `operator-management-service` into decK at generation time.
- Tested with synthetic traffic for two operator IDs.
- Dependencies: `operator-management-service` configs.
US-KONG-17 — Fail-mode matrix (closed on write, open on read)
- Type: Story
- Parent epic: EP-KONG-03
- Priority: High
- Story points: 3
- As an SRE I want explicit fail behaviour when Redis is down so that outages degrade predictably.
- Acceptance criteria:
- `/v1/sms/send`, `/v1/sms/bulk`, `/v1/auth/login` fail-closed.
- Read-only analytics and GET endpoints fail-open.
- Chaos test with Redis down verifies behaviour.
- Dependencies: US-KONG-14.
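The fail-mode matrix reduces to a single decision taken when the limiter backend is unreachable. A minimal sketch, assuming exact path matches for the three listed endpoints and treating any other non-GET/HEAD method as a write, in line with the epic's "fail-closed on write" rule; `allow_when_limiter_down` is a hypothetical name.

```python
FAIL_CLOSED_PATHS = ("/v1/sms/send", "/v1/sms/bulk", "/v1/auth/login")


def allow_when_limiter_down(method: str, path: str) -> bool:
    """Decide whether to let a request through when Redis is unreachable."""
    if path.rstrip("/") in FAIL_CLOSED_PATHS:
        return False   # explicitly fail-closed, regardless of method
    if method.upper() in ("GET", "HEAD"):
        return True    # reads fail-open
    return False       # assumption: all other writes fail-closed
```

A chaos test would assert this table against the gateway's real responses while Redis is stopped.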
US-KONG-18 — Anti-abuse on /v1/auth/login
- Type: Story
- Parent epic: EP-KONG-03
- Priority: High
- Story points: 2
- As a security reviewer I want credential-stuffing resistance at the edge so that brute-force is limited before reaching `auth-service`.
- Acceptance criteria:
- Per-IP limit 5/min on login.
- Bot-detection plugin blocks obvious bot UAs.
- Dependencies: US-KONG-14.
EP-KONG-04 — Kong observability and SRE readiness
- Type: Epic
- Status: To Do
- Priority: High
- Description: Make Kong fully observable before cutover: Prometheus metrics, Loki logs, OTel traces, Grafana dashboards, alerts, runbooks.
- Acceptance criteria:
- All metrics in EVENT_SCHEMAS §4 live.
- 6 dashboards in Grafana.
- All 10 alerts in OBSERVABILITY §6 wired.
- Runbooks authored for each alert.
- Stories: US-KONG-19..US-KONG-24
US-KONG-19 — Enable prometheus plugin and scrape
- Type: Story
- Parent epic: EP-KONG-04
- Priority: High
- Story points: 2
- As an SRE I want Prometheus scraping Kong metrics so that we can build dashboards and alerts.
- Acceptance criteria:
- `/metrics` exposed on internal port.
- ServiceMonitor or scrape config deployed.
- Metrics visible in Prometheus.
- Dependencies: Prometheus in cluster.
US-KONG-20 — Enable opentelemetry plugin
- Type: Story
- Parent epic: EP-KONG-04
- Priority: High
- Story points: 2
- As an SRE I want Kong spans chained to upstream spans so that we can trace requests end-to-end.
- Acceptance criteria:
- Kong emits `kong.request` server spans with attributes per spec.
- `traceparent` propagated to upstream; upstream spans are children.
- Dependencies: OTel collector.
US-KONG-21 — Enable http-log plugin to Loki
- Type: Story
- Parent epic: EP-KONG-04
- Priority: High
- Story points: 2
- As an SRE I want structured access logs in Loki so that we can triage with LogQL.
- Acceptance criteria:
- One JSON object per request with the schema in EVENT_SCHEMAS §3.
- Body logging disabled (verified test).
- Retention 14 d hot / 90 d archive.
- Dependencies: Loki endpoint.
US-KONG-22 — Build Grafana dashboards
- Type: Story
- Parent epic: EP-KONG-04
- Priority: High
- Story points: 5
- As an SRE I want 6 prebuilt Kong dashboards so that we can observe the edge at a glance.
- Acceptance criteria:
- `kong-overview`, `kong-route-drilldown`, `kong-auth`, `kong-rate-limit`, `kong-plugin-latency`, `kong-resource` dashboards live and versioned in `ops/grafana/dashboards/kong/`.
- Dependencies: US-KONG-19/20/21.
US-KONG-23 — Wire alerts and runbooks
- Type: Story
- Parent epic: EP-KONG-04
- Priority: High
- Story points: 5
- As an on-call SRE I want alerts paging the right rotation with a runbook linked so that I can respond within SLA.
- Acceptance criteria:
- 10 alerts from OBSERVABILITY §6 deployed to Alertmanager.
- Runbook authored per alert under `docs/ops/runbooks/kong/`.
- Page routing verified with a synthetic firing.
- Dependencies: US-KONG-22.
US-KONG-24 — Synthetic probes
- Type: Story
- Parent epic: EP-KONG-04
- Priority: Medium
- Story points: 3
- As an SRE I want continuous synthetic probes so that silent failures are caught before customers notice.
- Acceptance criteria:
- Blackbox exporter probes `/health` every 30 s from ≥ 2 regions.
- Synthetic `POST /v1/sms/send` every 5 min in staging + prod with dedicated internal key.
- Dependencies: Blackbox exporter; internal key.
EP-KONG-05 — Cutover and decommission custom api-gateway
- Type: Epic
- Status: To Do
- Priority: High
- Description: Execute the migration plan: dual-running window behind Cloudflare, canary ramp, then decommission the retired NestJS api-gateway service.
- Acceptance criteria:
- 7 consecutive days at 100 % Kong with SLOs met.
- NestJS api-gateway scaled to zero and manifests removed.
- Post-migration updates applied to `01-enterprise-architecture.md`.
- Stories: US-KONG-25..US-KONG-29
US-KONG-25 — Cloudflare dual-origin routing
- Type: Story
- Parent epic: EP-KONG-05
- Priority: Highest
- Story points: 3
- As an SRE I want Cloudflare to weight traffic between Kong and the legacy gateway so that cutover is incremental and reversible.
- Acceptance criteria:
- Weighted origin rules live; weight adjustable in < 1 min.
- Both origins reachable; synthetic probes green for each.
- Dependencies: Kong in prod with 0 % weight.
US-KONG-26 — Canary ramp 5 → 25 → 50 → 100 %
- Type: Story
- Parent epic: EP-KONG-05
- Priority: Highest
- Story points: 5
- As an SRE I want a staged canary ramp so that regressions are contained.
- Acceptance criteria:
- Each step held for the duration in MIGRATION_PLAN §6.
- Side-by-side Grafana view confirms parity each step.
- Any SLO breach → auto rollback to 0 %.
- Dependencies: US-KONG-25, US-KONG-22.
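The ramp logic (hold each step, advance when the hold elapses, roll back to 0 % on any SLO breach) can be written as a pure function. This is a sketch of the decision only; how SLO status and hold timers are obtained, and how the weight is pushed to Cloudflare, are out of scope. `next_weight` is a hypothetical name.

```python
RAMP_STEPS = (5, 25, 50, 100)


def next_weight(current: int, slo_ok: bool, hold_elapsed: bool) -> int:
    """Return the Kong traffic weight (percent) for the next evaluation."""
    if not slo_ok:
        return 0            # any SLO breach: roll back to 0 %
    if not hold_elapsed:
        return current      # keep holding the current step
    for step in RAMP_STEPS:
        if step > current:
            return step     # advance to the next step
    return current          # already at 100 %
```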
US-KONG-27 — Parity test harness
- Type: Story
- Parent epic: EP-KONG-05
- Priority: High
- Story points: 5
- As a platform engineer I want automated request parity checks so that Kong's response semantics match the legacy gateway.
- Acceptance criteria:
- Test harness sends the same request through both gateways and diff-compares responses (ignoring latency / trace headers).
- Runs hourly during dual-run window.
- Dependencies: US-KONG-25.
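The comparison step of the harness could look like the sketch below. Responses are assumed to be plain dictionaries with `status`, `headers`, and `body`; the set of ignored headers is an assumption (`X-Kong-Proxy-Latency` and `X-Kong-Upstream-Latency` are headers Kong adds, the others are illustrative), and `parity_diff` is a hypothetical name.

```python
# Assumption: headers expected to differ between gateways by design.
IGNORED_HEADERS = {
    "date", "traceparent", "x-request-id",
    "x-kong-proxy-latency", "x-kong-upstream-latency",
}


def normalise(response: dict) -> dict:
    """Lower-case header names and drop the ones that are ignored."""
    headers = {k.lower(): v for k, v in response["headers"].items()
               if k.lower() not in IGNORED_HEADERS}
    return {"status": response["status"], "headers": headers, "body": response["body"]}


def parity_diff(legacy: dict, kong: dict) -> list[str]:
    """Return human-readable differences; an empty list means parity."""
    a, b = normalise(legacy), normalise(kong)
    diffs = []
    if a["status"] != b["status"]:
        diffs.append(f"status: {a['status']} != {b['status']}")
    for name in sorted(set(a["headers"]) | set(b["headers"])):
        if a["headers"].get(name) != b["headers"].get(name):
            diffs.append(
                f"header {name}: {a['headers'].get(name)!r} != {b['headers'].get(name)!r}"
            )
    if a["body"] != b["body"]:
        diffs.append("body differs")
    return diffs
```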
US-KONG-28 — Scale NestJS api-gateway to zero
- Type: Story
- Parent epic: EP-KONG-05
- Priority: High
- Story points: 2
- As an SRE I want the legacy gateway kept on warm standby and then removed so that we retain a rollback path for 7 days post-cutover.
- Acceptance criteria:
- From cutover (T+0) the legacy gateway stays on warm standby; at T+7 d it is scaled to zero replicas.
- At T+14 d, manifests deleted; code folder archived.
- Dependencies: US-KONG-26.
US-KONG-29 — Post-migration doc sweep
- Type: Story
- Parent epic: EP-KONG-05
- Priority: Medium
- Story points: 2
- As a documentation owner I want platform docs updated to reflect Kong as the only edge so that readers are not misled by stale NestJS gateway references.
- Acceptance criteria:
- `01-enterprise-architecture.md` change log entry marking migration complete.
- Legacy dashboards / alerts retired.
- `services/api-gateway/_sources/` retained for audit but marked archived.
- Dependencies: US-KONG-28.
2. Updated or retired epics/stories from legacy sources
Source files: _sources/api-gateway/epics.md, _sources/api-gateway/user_stories.md.
Applied status per ADR-0001: the custom NestJS api-gateway service is retired; its responsibilities are redistributed.
2.1 Legacy epics
| Legacy | Title | New status |
|---|---|---|
| GW-EPIC-001 | Authentication & Authorisation | Moved → EP-KONG-02 (Kong JWT + key-auth plugins) and to auth-service (RBAC, account scope). Legacy NestJS middleware obsolete. |
| GW-EPIC-002 | SMS Send & Idempotency | Split. Edge concerns (rate limiting, auth, correlation) moved to Kong (EP-KONG-02/03). Validation, idempotency, NATS publish moved to sms-orchestrator (tracked in that service's epics). |
| GW-EPIC-003 | Message Status | Moved to sms-orchestrator API surface; Kong only routes. |
| GW-EPIC-004 | API Key Management | Moved to auth-service. Kong consumes resolved key context via custom plugin (US-KONG-09). |
| GW-EPIC-005 | Billing & Invoicing | Moved to billing-service. Kong only routes /v1/billing/*, /v1/invoices/*. |
| GW-EPIC-006 | Webhook Management | Moved to webhook-dispatcher. Kong only routes /v1/webhooks/*. |
| GW-EPIC-007 | Observability | Split. Edge observability → EP-KONG-04 (Kong Prom/Loki/OTel). Upstream observability stays per-service. Legacy NestJS /metrics endpoint obsolete. |
2.2 Legacy user stories
| Legacy | Description | New status |
|---|---|---|
| GW-US-001 | Firebase JWT auth | Moved → US-KONG-07 |
| GW-US-002 | API-key auth | Moved → US-KONG-08 / US-KONG-09 |
| GW-US-003 | RBAC role resolution | Moved to auth-service (role at edge not enforced by Kong beyond coarse claim presence) |
| GW-US-004 | SMS send | Moved to sms-orchestrator (validation) + US-KONG-06 (route) |
| GW-US-005 | Rate-limit per account | Moved → US-KONG-14 |
| GW-US-006 | Idempotency key | Moved to sms-orchestrator |
| GW-US-007 | Message status GET | Moved to sms-orchestrator |
| GW-US-008 | API key create | Moved to auth-service |
| GW-US-009 | API key revoke | Moved to auth-service (cache invalidation surfaces via US-KONG-09) |
| GW-US-010 | Billing usage view | Moved to billing-service |
| GW-US-011 | Invoice list | Moved to billing-service |
| GW-US-012 | Webhook test | Moved to webhook-dispatcher (Kong routes only) |
| GW-US-013 | Prometheus metrics | Obsolete for custom gateway. Edge metrics covered by US-KONG-19; upstream metrics stay per-service. |
| GW-US-014 | OTel trace propagation | Split. Edge side: US-KONG-20. Upstream side: per-service. |
| GW-US-015 | K8s health probes | Obsolete for custom gateway. Kong has its own /health, /ready, /status. Upstream readiness (Postgres/Redis/NATS) remains with each service. |
2.3 Net summary
- Obsolete: All legacy GW-EPIC-007 tasks tied to a NestJS process (`/metrics` endpoint, custom health module, NestJS guard implementations, custom Redis rate limiter, custom Firebase verify).
- Moved to other services: Idempotency, payload validation, NATS publish → `sms-orchestrator`. RBAC → `auth-service`. API-key CRUD → `auth-service`. Billing/webhook/message APIs → respective owner services.
- Superseded by new Kong epics: Edge auth, rate limiting, observability, and routing are all covered by EP-KONG-01 through EP-KONG-05.
No legacy story survives in its original form. All legacy GW-US-* items are either retired or re-homed per the table above.
EP-KONG-06 · Adaptive Edge Defence (JA3, per-tenant adaptive rate-limit, tarpit lane)
Context: Cloudflare WAF protects against generic attacks; Kong must add identity-aware adaptive defence at the platform edge, sized for a national backbone. This epic introduces JA3/JA4 client fingerprinting, per-tenant adaptive rate-limit shaping, and a tarpit lane for known-bad clients (deliberate slow responses to drain attacker resources without 429-blocking that just rotates IPs).
US-KONG-030 · Per-Tenant Adaptive Rate-Limit (multi-dimensional)
Type: Feature | Points: 8
Description: As Kong, I need rate-limit dimensions of (consumer_id, api_key_id, tenant_id, route, ip) simultaneously so that a single noisy consumer cannot starve the rest of a tenant and a single tenant cannot starve the platform.
Acceptance Criteria:
- Custom Kong plugin `ghasi-adaptive-ratelimit` consults Redis keys `kong:rl:{dim}:{value}:{window}` per dimension; the most-restrictive limit wins.
- Per-tenant defaults: 1000 RPS, configurable per tenant tier in `auth.tenant_rate_limits` and pulled by the plugin every 60 s.
- Burst budget: 2× sustained for ≤ 10 s; tracked via leaky-bucket in Redis.
- On 429, response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`.
- Metric `kong_ratelimit_throttled_total{dimension,tenant,tier}` increments on each 429.
- Integration test: 1500 RPS for a P3-tier tenant gets ≥ 30% 429 within 5 s.
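One way to read "the most-restrictive limit wins" is that the dimension with the least remaining budget decides the outcome. The sketch below follows that reading, which is an interpretation rather than something the story spells out. A plain dictionary stands in for Redis, burst handling is omitted, and `evaluate` is a hypothetical name; only the key format comes from the acceptance criteria.

```python
def redis_key(dim: str, value: str, window: int) -> str:
    """Build the per-dimension counter key in the documented format."""
    return f"kong:rl:{dim}:{value}:{window}"


def evaluate(request_dims: dict, limits: dict, counters: dict, window: int):
    """Return (allowed, remaining, governing_dimension) for one request.

    request_dims: e.g. {"tenant_id": "t1", "ip": "203.0.113.7"}
    limits:       per-dimension request limit for the window
    counters:     current counts keyed by redis_key(...), standing in for Redis
    """
    remaining_by_dim = {}
    for dim, value in request_dims.items():
        if dim not in limits:
            continue  # no limit configured for this dimension
        used = counters.get(redis_key(dim, value, window), 0)
        remaining_by_dim[dim] = limits[dim] - used
    if not remaining_by_dim:
        return True, None, None
    # The dimension with the least remaining budget governs the decision.
    tightest = min(remaining_by_dim, key=remaining_by_dim.get)
    remaining = remaining_by_dim[tightest]
    return remaining > 0, max(remaining, 0), tightest
```

The governing dimension is returned so it can feed the `dimension` label of the throttling metric.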
US-KONG-031 · JA3 / JA4 Fingerprint Capture and Storage
Type: Feature | Points: 5
Description: As a security engineer, I need every TLS handshake fingerprinted with JA3 and JA4 so that abusive clients can be denied even when they rotate IPs and API keys.
Acceptance Criteria:
- Cloudflare passes `cf-ja3-hash` and `cf-ja4` headers to Kong (or compute via NGINX module if Cloudflare is bypassed).
- Kong writes `ja3_hash`, `ja4`, `ip`, `consumer_id`, `tenant_id`, `route`, `status_code`, `latency_ms` to the access log for a sample of every Nth request (N configurable, default 10).
- Top-N fingerprints exposed at `/internal/edge/fingerprints?top=50&since=1h` (mTLS, security-team only).
- Fingerprint sampling does not add > 1 ms to P99 request latency.
US-KONG-032 · Fingerprint-Based Deny List with TTL
Type: Feature | Points: 5
Description: As a security engineer, I need to push a JA3/JA4 fingerprint or IP/CIDR into a deny list with TTL so that observed attackers are blocked at the edge until the TTL expires.
Acceptance Criteria:
- `POST /v1/internal/edge/deny` (mTLS, security-team only) accepts `{ kind: "ja3"|"ja4"|"ip"|"cidr", value, ttlSeconds, reason, ticketUrl }`.
- Plugin checks deny list (Redis set with TTL) on every request; blocked requests get 403 with `code: "EDGE_DENIED"`.
- Deny entries audit-logged to `auth.audit_log` and emit `edge.deny.added.v1` NATS event.
- `DELETE /v1/internal/edge/deny/:id` removes entry; `GET /v1/internal/edge/deny` lists all active.
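The per-request deny check can be modelled in memory as shown below. This is a sketch assuming the four `kind` values from the request payload; a dictionary with expiry timestamps stands in for Redis TTLs, and the class `DenyList` and its method names are hypothetical.

```python
import ipaddress
import time
from typing import Callable


class DenyList:
    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._entries: dict = {}  # (kind, value) -> expires_at

    def add(self, kind: str, value: str, ttl_seconds: int) -> None:
        if kind not in ("ja3", "ja4", "ip", "cidr"):
            raise ValueError(f"unsupported kind: {kind}")
        self._entries[(kind, value)] = self._clock() + ttl_seconds

    def is_denied(self, ip: str, ja3: str = "", ja4: str = "") -> bool:
        now = self._clock()
        # Drop expired entries first (Redis would do this via key TTL).
        self._entries = {k: exp for k, exp in self._entries.items() if exp > now}
        addr = ipaddress.ip_address(ip)
        for kind, value in self._entries:
            if kind == "ja3" and value == ja3:
                return True
            if kind == "ja4" and value == ja4:
                return True
            if kind == "ip" and ipaddress.ip_address(value) == addr:
                return True
            if kind == "cidr" and addr in ipaddress.ip_network(value, strict=False):
                return True
        return False  # caller proceeds; True maps to 403 EDGE_DENIED
```

The linear scan is fine for illustration; a real plugin would use direct key lookups for fingerprints and exact IPs.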
US-KONG-033 · Tarpit Lane for Known-Bad Clients
Type: Feature | Points: 5
Description: As a security engineer, I need a tarpit lane that intentionally slow-responds to known-bad fingerprints/IPs so that attackers' connection slots are tied up without giving them the explicit 429 signal that would prompt them to rotate.
Acceptance Criteria:
- Plugin checks `tarpit:{ja3hash|ip}` Redis key; if present, response is delayed `min(2^reqIndex, 30)` seconds with HTTP 200 + `{"status":"queued"}` and no actual upstream call.
- Tarpit entries are pushed by `fraud-intel-service` via `POST /v1/internal/edge/tarpit` and TTL-managed (default 1 h).
- Metric `kong_tarpit_active_connections` gauge; alert if > 1000 sustained for 5 min (suggests under-attack).
- Tarpit traffic is not counted against rate-limit budgets (so legitimate users behind same NAT remain unaffected).
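The delay formula `min(2^reqIndex, 30)` works out as follows. The sketch assumes `reqIndex` is a zero-based count of requests already seen from the tarpitted client, which the story does not state; `tarpit_delay_seconds` is a hypothetical name.

```python
def tarpit_delay_seconds(req_index: int, cap: int = 30) -> int:
    """Exponential delay in seconds, capped at `cap`."""
    if req_index < 0:
        raise ValueError("req_index must be non-negative")
    if req_index >= cap.bit_length():
        return cap  # 2**req_index already exceeds the cap; skip the big power
    return min(2 ** req_index, cap)


if __name__ == "__main__":
    print([tarpit_delay_seconds(i) for i in range(7)])
    # prints [1, 2, 4, 8, 16, 30, 30]
```

Under this assumption the cap is reached on the sixth request, so a persistent client holds each connection open for 30 s from then on.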
US-KONG-034 · Geo + ASN Allow/Deny Hints from Cloudflare
Type: Feature | Points: 3
Description: As a security engineer, I need country and ASN hints from Cloudflare available to Kong policies so that origin-aware rules can be applied (e.g., admin endpoints AF-only, regulator endpoints ATRA-ASN-only).
Acceptance Criteria:
- `cf-ipcountry` and `cf-iporg`/`cf-asn` headers trusted only when source IP is in Cloudflare CIDR list.
- Per-route allow/deny lists for country/ASN configurable via Kong declarative config.
- Admin route `/v1/admin/*` denies all non-AF country by default.
- Regulator route `/v1/regulator/*` allows only the ATRA ASN allow-list (env-configurable).
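The trust rule (geo headers count only when the connection comes from Cloudflare) can be sketched as below. The CIDR used is a documentation placeholder, not a real Cloudflare range; the actual list would be loaded from configuration. `trusted_geo_hints` and `admin_allowed` are hypothetical names.

```python
import ipaddress

# Placeholder from the RFC 5737 documentation block; NOT a real Cloudflare range.
TRUSTED_PROXY_CIDRS = [ipaddress.ip_network("192.0.2.0/24")]
GEO_HEADERS = ("cf-ipcountry", "cf-iporg", "cf-asn")


def trusted_geo_hints(source_ip: str, headers: dict) -> dict:
    """Return geo/ASN hints, or {} when the peer is not a trusted proxy."""
    addr = ipaddress.ip_address(source_ip)
    if not any(addr in net for net in TRUSTED_PROXY_CIDRS):
        return {}  # headers could be spoofed by the client; ignore them
    lowered = {k.lower(): v for k, v in headers.items()}
    return {h: lowered[h] for h in GEO_HEADERS if h in lowered}


def admin_allowed(source_ip: str, headers: dict) -> bool:
    """Default policy for /v1/admin/*: allow only country AF."""
    return trusted_geo_hints(source_ip, headers).get("cf-ipcountry") == "AF"
```

Untrusted or missing hints lead to a deny on the admin route, so the default stays closed.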
US-KONG-035 · Edge Defence Dashboard
Type: Feature | Points: 3
Description: As a SOC/NOC analyst, I need a Grafana dashboard showing top fingerprints, top IPs, deny-list size, tarpit population, 429 rate by tenant/tier, and country distribution.
Acceptance Criteria:
- Dashboard `dashboards/edge-defence.json` committed.
- Panels: top-20 JA3 by request count (24h), top-20 IP by 4xx count, deny-list size over time, tarpit gauge, 429 rate, country/ASN treemap.
- Linked alert: `EdgeDeniedSpike` fires when 403 EDGE_DENIED rate > 10×/min.
EP-KONG-07 · mTLS Upstream Policy for Sensitive Routes
Context: Kong already terminates TLS for clients. For routes that touch compliance, regulator, CBC, and sender-ID, the upstream connection must be mTLS-encrypted with SPIFFE workload identities so that compromise of one node pool cannot read sensitive traffic. Aligns with `EP-PLAT-NB-05`.
US-KONG-036 · Upstream mTLS for Compliance, Regulator, and CBC routes
Type: Feature | Points: 5
Description: As a security engineer, I need Kong to use mTLS to the compliance-engine, regulator-portal-service, and cbc-bridge-service upstreams with SPIFFE-issued client SVIDs so that east-west traffic to these high-sensitivity services is bound to workload identity, not network position.
Acceptance Criteria:
- Kong service definitions for these upstreams set `tls_verify: true`, `client_certificate: spire-issued-svid`.
- SPIRE agent injects SVIDs as Kubernetes secrets refreshed every 1 hour; Kong reloads on rotation without dropping connections.
- Upstream mismatch (cert, SAN) returns 502 with `code: "UPSTREAM_MTLS_FAIL"` and emits `kong.mtls.upstream.failed.v1`.
- Integration test verifies: revoked SVID is rejected within 5 min.
US-KONG-037 · Per-Route Upstream Allow-List (Defence in Depth)
Type: Feature | Points: 3
Description: As a security engineer, I need each Kong route bound to exactly one upstream service identity so that misconfiguration cannot accidentally proxy /v1/regulator/* to the wrong service.
Acceptance Criteria:
- Kong declarative config gate: every route has `upstream.spiffe_id` explicitly set.
- CI policy fails if `spiffe_id` is missing for routes matching `/v1/{compliance,regulator,cbc,sender-id}/*`.
- Runtime check: plugin asserts upstream certificate SAN matches the configured SPIFFE ID.
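The CI policy could be implemented as a scan over parsed routes. This is a minimal sketch: the dictionary layout (`paths`, `upstream.spiffe_id`) mirrors the wording of the acceptance criteria but is otherwise an assumption, and `missing_spiffe_ids` is a hypothetical name.

```python
import re

# Matches /v1/{compliance,regulator,cbc,sender-id}/* and the bare prefix.
SENSITIVE = re.compile(r"^/v1/(compliance|regulator|cbc|sender-id)(/|$)")


def missing_spiffe_ids(routes: list[dict]) -> list[str]:
    """Return names of sensitive routes that lack upstream.spiffe_id."""
    failures = []
    for route in routes:
        sensitive = any(SENSITIVE.match(p) for p in route.get("paths", []))
        spiffe_id = (route.get("upstream") or {}).get("spiffe_id")
        if sensitive and not spiffe_id:
            failures.append(route.get("name", "<unnamed>"))
    return failures
```

A non-empty result would fail the pipeline; routes outside the sensitive prefixes are not flagged by this particular check.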
US-KONG-038 · Customer-Webhook Reverse mTLS Channel (optional, per-tenant)
Type: Feature | Points: 5
Description: As an enterprise tenant, I want my webhook endpoint to require Ghasi to present a client certificate so that the source of webhook delivery is verifiable beyond shared HMAC.
Acceptance Criteria:
- Tenant config field `webhook.mtlsClientCert` toggles mTLS-on-egress for outbound webhooks (paired with `EP-HOOK-08`).
- Egress proxy presents the platform-wide client SVID (rotated 1 h via SPIRE).
- Tenant-portal page `/webhooks/mtls` shows the platform CA chain and SPIFFE ID for the tenant's allow-list configuration.
- Integration test: mTLS-required webhook with platform cert reachable; same webhook with no cert is rejected.