
api-gateway (Kong) — Population Report

Date: 2026-04-17
Scope: Re-populated all 17 service docs for the re-scoped api-gateway folder per ADR-0001. The folder now holds Kong documentation, not a deployable NestJS service.


1. New epics and stories (Kong adoption)

EP-KONG-01 — Adopt Kong as edge gateway

  • Type: Epic
  • Status: To Do
  • Priority: Highest
  • Description: Provision Kong Gateway as the sole north-south HTTP edge for Ghasi-SMS-Gateway. DB-less mode, decK-managed configuration under ops/kong/, CI lint against upstream OpenAPI. Deliver staging first, production second.
  • Acceptance criteria:
    • Kong deployed in staging and production (2–6 replicas with HPA).
    • All route prefixes from ADR-0001 §3 wired to upstream services.
    • decK GitOps pipeline green; nightly deck diff returns no drift.
    • CI lint enforces: auth plugin per route, tags, no plaintext secrets, route ↔ upstream OpenAPI parity.
    • SERVICE_READINESS §2.1 + §2.2 checklist green.
  • Stories: US-KONG-01..US-KONG-06

US-KONG-01 — Provision Kong in staging (DB-less)

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: Highest
  • Story points: 5
  • As a platform engineer I want Kong running in the staging cluster in DB-less mode so that we have a reproducible edge environment for route + plugin validation.
  • Acceptance criteria:
    • Helm chart or manifests in ops/kong/ deploy 2 Kong replicas in staging.
    • Kong loads ops/kong/staging.kong.yaml via ConfigMap on start; reloads on SIGHUP.
    • /health and /ready responsive from inside the cluster.
  • Dependencies: Staging cluster ready; Redis available.

US-KONG-02 — Decide DB-less vs DB mode for prod

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: High
  • Story points: 3
  • As an SRE lead I want a recorded decision on Kong DB-less vs DB mode so that prod topology is final before cutover.
  • Acceptance criteria:
    • ADR (or updated ADR-0001) captures decision + rationale.
    • If DB-less (default): no PostgreSQL provisioning needed.
    • If DB: PostgreSQL instance sized, backed up, DR-tested.
  • Dependencies: Review Kong Enterprise feature needs.

US-KONG-03 — decK GitOps pipeline

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: High
  • Story points: 5
  • As a platform engineer I want Kong configuration applied automatically from Git so that no manual admin-API writes reach prod.
  • Acceptance criteria:
    • CI job runs deck gateway sync on merge to main (staging) and on release tag (prod, with approval gate).
    • Admin API writes denied from outside CI via NetworkPolicy + token scoping.
    • PR pipeline runs deck file validate + contract lint.
  • Dependencies: CI runner with admin API access.

US-KONG-04 — Configuration contract lint in CI

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: High
  • Story points: 3
  • As a platform engineer I want every Kong Route validated against upstream OpenAPI so that route drift does not reach prod.
  • Acceptance criteria:
    • CI lint fails if a Route references a path not present in the upstream service's OpenAPI.
    • CI lint fails if a Route has no auth plugin and no public:true tag.
    • CI lint fails if env / owner tags are missing.
    • CI lint fails on plaintext-shaped secrets in YAML.
  • Dependencies: Upstream services publish OpenAPI artifacts.
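The four lint rules can be sketched as follows, assuming the decK file has already been parsed into Python dicts and each upstream's OpenAPI `paths` keys are available as a set. The accepted auth-plugin names, the secret-key regex, and treating `{vault://...}` references as the only allowed form for secret values are assumptions, not an agreed rule set.

```python
# Hypothetical CI lint for the US-KONG-04 rules over parsed decK-style config.
import re

# Auth plugins that satisfy the "auth plugin per route" rule (assumed set).
AUTH_PLUGINS = {"jwt", "key-auth", "ghasi-api-key-lookup"}
REQUIRED_TAG_PREFIXES = ("env:", "owner:")
# Config keys whose string values must be Kong vault references, not literals.
SECRET_KEY = re.compile(r"(?i)(password|secret|token|api_?key)")


def lint_route(route: dict, openapi_paths: set[str]) -> list[str]:
    """Return human-readable lint errors for one Kong Route."""
    errors: list[str] = []
    name = route.get("name", "<unnamed>")
    tags = route.get("tags", [])
    plugins = {p.get("name") for p in route.get("plugins", [])}

    if not plugins & AUTH_PLUGINS and "public:true" not in tags:
        errors.append(f"{name}: no auth plugin and no public:true tag")
    for prefix in REQUIRED_TAG_PREFIXES:
        if not any(t.startswith(prefix) for t in tags):
            errors.append(f"{name}: missing {prefix} tag")
    # Kong Route paths are prefixes by default, so accept any OpenAPI path
    # that starts with the Route path.
    for path in route.get("paths", []):
        if not any(op.startswith(path) for op in openapi_paths):
            errors.append(f"{name}: path {path} not in upstream OpenAPI")
    return errors


def find_plaintext_secrets(node: object, where: str = "$") -> list[str]:
    """Walk parsed YAML and report secret-named keys holding literal strings."""
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            location = f"{where}.{key}"
            if (isinstance(value, str) and SECRET_KEY.search(str(key))
                    and not value.startswith("{vault://")):
                found.append(location)
            else:
                found.extend(find_plaintext_secrets(value, location))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_plaintext_secrets(item, f"{where}[{index}]"))
    return found
```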

US-KONG-05 — Nightly drift detection

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: Medium
  • Story points: 2
  • As an SRE I want automated drift detection so that config changes outside Git are caught within 24 h.
  • Acceptance criteria:
    • Nightly CI job runs deck gateway diff against staging and prod.
    • A non-empty diff triggers the KongConfigDrift alert.
  • Dependencies: US-KONG-03.

US-KONG-06 — Route catalogue and tagging

  • Type: Story
  • Parent epic: EP-KONG-01
  • Priority: High
  • Story points: 5
  • As a platform engineer I want every ADR-0001 route prefix represented as a Kong Route with standard tags so that dashboards, alerts, and ownership are consistent.
  • Acceptance criteria:
    • All 10+ route prefixes from ADR-0001 §3 present in prod config.
    • Each Route tagged env:, owner:, tier:.
    • Route naming follows rt-<svc>-<ver>-<purpose>.
  • Dependencies: Upstream services DNS-resolvable.
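A minimal sketch of the catalogue check, assuming `<ver>` takes the form `v1`, `v2`, and so on, and that service names may contain hyphens (for example `sms-orchestrator`). The function name is hypothetical.

```python
# Hypothetical catalogue check for US-KONG-06: naming convention + required tags.
import re

# rt-<svc>-<ver>-<purpose>; <svc> may itself contain hyphens and <ver> is
# assumed to look like v1, v2, ...
ROUTE_NAME = re.compile(
    r"^rt-(?P<svc>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<ver>v\d+)-(?P<purpose>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)
REQUIRED_TAGS = ("env:", "owner:", "tier:")


def check_route_catalogue(route: dict) -> list[str]:
    """Return the naming and tagging problems for one Route."""
    problems: list[str] = []
    if ROUTE_NAME.match(route.get("name", "")) is None:
        problems.append("name does not follow rt-<svc>-<ver>-<purpose>")
    tags = route.get("tags", [])
    for prefix in REQUIRED_TAGS:
        # A tag like "tier:" with an empty value does not count.
        if not any(t.startswith(prefix) and len(t) > len(prefix) for t in tags):
            problems.append(f"missing {prefix} tag")
    return problems
```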

EP-KONG-02 — Migrate authentication to Kong plugins

  • Type: Epic
  • Status: To Do
  • Priority: Highest
  • Description: Replace authentication responsibilities of the retired custom api-gateway with Kong's jwt and key-auth plugins. Pull JWKS from auth-service. Optionally author a custom ghasi-api-key-lookup plugin for runtime API-key resolution without per-consumer Kong rows.
  • Acceptance criteria:
    • JWT plugin on every JWT route with JWKS fetch from auth-service.
    • Key-auth plugin on every API-key route; X-Api-Key stripped before upstream.
    • Custom plugin (if built) ≥ 80 % unit coverage, pinned, code-reviewed.
    • No endpoint reachable without auth except explicitly-tagged public paths.
  • Stories: US-KONG-07..US-KONG-12

US-KONG-07 — Wire jwt plugin against auth-service JWKS

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: Highest
  • Story points: 3
  • As a customer I want my Firebase/Ghasi-issued JWT accepted by Kong so that my API requests authenticate at the edge.
  • Acceptance criteria:
    • RS256 only; iss, aud, exp, kid validated.
    • JWKS cached 5 min; refreshed on kid miss.
    • 401 problem+json on invalid/expired tokens.
  • Dependencies: auth-service JWKS live.
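The caching rule in these criteria (5 min TTL, forced refresh on an unknown `kid`) can be sketched as below. The `fetch` callable stands in for the HTTP call to auth-service's JWKS endpoint, and the 10 s throttle on kid-miss refreshes is an added assumption so that a flood of tokens with bogus `kid` values cannot hammer auth-service. Signature verification itself is out of scope here.

```python
import time
from typing import Callable, Mapping, Optional


class JwksCache:
    """Cache JWKS keys by kid: TTL expiry plus a forced refresh on kid miss."""

    def __init__(
        self,
        fetch: Callable[[], Mapping[str, dict]],
        ttl: float = 300.0,                  # 5 min, per the acceptance criteria
        min_refresh_interval: float = 10.0,  # hypothetical throttle on kid-miss refreshes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._min_refresh_interval = min_refresh_interval
        self._clock = clock
        self._keys: dict = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        self._keys = dict(self._fetch())
        self._fetched_at = self._clock()

    def get(self, kid: str) -> Optional[dict]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._refresh()
        key = self._keys.get(kid)
        if key is None and now - self._fetched_at >= self._min_refresh_interval:
            # Unknown kid: auth-service may have rotated keys since the last fetch.
            self._refresh()
            key = self._keys.get(kid)
        return key
```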

US-KONG-08 — Wire key-auth plugin for API-key routes

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: Highest
  • Story points: 3
  • As a customer I want my API key accepted by Kong so that server-to-server integrations authenticate without Firebase.
  • Acceptance criteria:
    • X-Api-Key validated; hide_credentials=true so it never reaches upstream.
    • 401 on invalid/revoked key.
    • Rotation verified: two keys active simultaneously during rotation window.
  • Dependencies: auth-service API-key lifecycle.

US-KONG-09 — Author custom plugin ghasi-api-key-lookup

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: High
  • Story points: 8
  • As a platform engineer I want runtime API-key resolution against auth-service so that we avoid pre-provisioning a Kong Consumer per customer key.
  • Acceptance criteria:
    • Plugin resolves keys via hashed lookup with a 60 s LRU cache; fails closed on errors.
    • Injects X-Account-Id, X-Api-Key-Id, X-Tier to upstream.
    • Emits Prometheus counters and histograms per EVENT_SCHEMAS §4.
    • Unit test coverage ≥ 80 %.
  • Dependencies: auth-service /internal/api-keys/resolve endpoint.
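A sketch of the plugin's lookup path, written in Python for illustration (a real Kong plugin would use the Kong PDK). It assumes the resolver, a hypothetical wrapper around auth-service's /internal/api-keys/resolve endpoint, receives a SHA-256 hex digest of the key; the 503 status on resolver errors is also an assumption. Metrics are omitted.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Callable, Optional

# Hypothetical resolver: takes the SHA-256 hex digest of the key and returns
# {"account_id", "api_key_id", "tier"} or None for unknown/revoked keys.
Resolver = Callable[[str], Optional[dict]]


class ApiKeyLookup:
    def __init__(self, resolve: Resolver, ttl: float = 60.0,
                 max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock
        # key hash -> (expires_at, resolved context or None), in LRU order.
        self._cache: OrderedDict = OrderedDict()

    def authenticate(self, api_key: Optional[str]) -> tuple:
        """Return (status, headers to inject upstream)."""
        if not api_key:
            return 401, {}
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        now = self._clock()
        entry = self._cache.get(key_hash)
        if entry is not None and entry[0] > now:
            self._cache.move_to_end(key_hash)
            ctx = entry[1]
        else:
            try:
                ctx = self._resolve(key_hash)
            except Exception:
                # Fail closed: an auth-service error never lets the request through.
                return 503, {}
            self._cache[key_hash] = (now + self._ttl, ctx)
            self._cache.move_to_end(key_hash)
            while len(self._cache) > self._max_entries:
                self._cache.popitem(last=False)  # evict least recently used
        if ctx is None:
            return 401, {}
        return 200, {
            "X-Account-Id": ctx["account_id"],
            "X-Api-Key-Id": ctx["api_key_id"],
            "X-Tier": ctx["tier"],
        }
```

Negative results (unknown or revoked keys) are cached for the same 60 s here, which shields auth-service from invalid-key floods at the cost of a key tried before its creation being rejected for up to a minute.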

US-KONG-10 — Strip sensitive headers before upstream

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: High
  • Story points: 2
  • As a security reviewer I want credentials removed from upstream traffic so that upstream logs cannot leak them.
  • Acceptance criteria:
    • X-Api-Key not present in upstream request.
    • Integration test asserts absence in sms-orchestrator logs.
  • Dependencies: US-KONG-08.

US-KONG-11 — Admin route IP allow-listing

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: Medium
  • Story points: 2
  • As an SRE I want /admin/* restricted to trusted IPs so that the admin dashboard cannot be probed from the internet.
  • Acceptance criteria:
    • ip-restriction plugin on /admin/* with SRE + corp-VPN CIDRs.
    • Rejected requests return 403 problem+json.
  • Dependencies: IP list from SRE.

US-KONG-12 — JWT algorithm allow-list (RS256-only)

  • Type: Story
  • Parent epic: EP-KONG-02
  • Priority: High
  • Story points: 1
  • As a security reviewer I want HS256 disabled on JWT validation so that a leaked shared secret cannot forge tokens.
  • Acceptance criteria:
    • jwt plugin config pins algorithms: [RS256].
    • Integration test rejects an alg=HS256 token.
  • Dependencies: US-KONG-07.
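The allow-list check the integration test exercises can be sketched as a header inspection that runs before any signature verification. `header_alg_allowed` is a hypothetical helper, not part of the Kong jwt plugin; it rejects `HS256`, `none`, and malformed tokens alike.

```python
import base64
import json

ALLOWED_ALGS = frozenset({"RS256"})


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def header_alg_allowed(token: str, allowed: frozenset = ALLOWED_ALGS) -> bool:
    """Reject tokens whose header alg is not in the allow-list (e.g. HS256, none)."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        header = json.loads(_b64url_decode(parts[0]))
    except ValueError:  # bad base64, bad UTF-8 or bad JSON
        return False
    alg = header.get("alg") if isinstance(header, dict) else None
    return isinstance(alg, str) and alg in allowed
```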

EP-KONG-03 — Migrate rate limiting to Kong

  • Type: Epic
  • Status: To Do
  • Priority: High
  • Description: Implement per-API-key, per-account, per-operator, and global rate limits using Kong's rate-limiting-advanced plugin with Redis backend. Enforce fail-closed on write endpoints, fail-open on read.
  • Acceptance criteria:
    • Per-tier limits derived from auth-service and applied in decK.
    • Redis kong:rl:* namespace isolated from application keys.
    • Fail-mode matrix documented and integration-tested.
    • 429 responses include Retry-After and rate-limit headers.
  • Stories: US-KONG-13..US-KONG-18

US-KONG-13 — Deploy Redis cluster for rate limiting

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: High
  • Story points: 3
  • As an SRE I want a Redis cluster available to Kong so that the rate-limiter has a persistence-capable backend.
  • Acceptance criteria:
    • 3-node Redis cluster in staging + prod; TLS + auth.
    • kong:rl:* keyspace reserved.
    • Redis credentials in Kong vault.
  • Dependencies: Infra Redis provisioning.

US-KONG-14 — Per-API-key and per-account limits

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: High
  • Story points: 5
  • As a platform operator I want rate limits applied per API key and per account so that one abusive key cannot degrade others.
  • Acceptance criteria:
    • rate-limiting-advanced with identifier=consumer; tier-specific limits.
    • Integration test: a burst at 2× the limit returns 429 once the limit is exceeded.
    • Retry-After header present.
  • Dependencies: US-KONG-13, US-KONG-08/09.

US-KONG-15 — Global burst guardrail

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: Medium
  • Story points: 2
  • As an SRE I want a platform-wide burst limit so that a traffic spike cannot exhaust upstream capacity.
  • Acceptance criteria:
    • Global limit of 5 000 req/s (configurable).
    • Alert when traffic stays above 50 % of the guardrail.
  • Dependencies: US-KONG-14.

US-KONG-16 — Per-operator TPS fencing

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: Medium
  • Story points: 5
  • As a platform operator I want per-operator TPS gates at the edge so that we respect MNO contractual limits.
  • Acceptance criteria:
    • Operator limits read from operator-management-service into decK at generation time.
    • Tested with synthetic traffic for two operator IDs.
  • Dependencies: operator-management-service configs.

US-KONG-17 — Fail-mode matrix (closed on write, open on read)

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: High
  • Story points: 3
  • As an SRE I want explicit fail behaviour when Redis is down so that outages degrade predictably.
  • Acceptance criteria:
    • /v1/sms/send, /v1/sms/bulk, /v1/auth/login fail-closed.
    • Read-only analytics and GET endpoints fail-open.
    • Chaos test with Redis down verifies behaviour.
  • Dependencies: US-KONG-14.
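The matrix reduces to a small decision function. This sketch assumes that writes not named in the story (any non-GET/HEAD/OPTIONS request) also fail closed, and that a fail-closed rejection uses 503; both are assumptions to confirm when the matrix is documented.

```python
# Hypothetical fail-mode matrix for the rate limiter when Redis is unreachable.
FAIL_CLOSED_PATHS = ("/v1/sms/send", "/v1/sms/bulk", "/v1/auth/login")
READ_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def rate_limit_fail_mode(method: str, path: str) -> str:
    """Return "closed" (reject) or "open" (forward) during a Redis outage."""
    if any(path == p or path.startswith(p + "/") for p in FAIL_CLOSED_PATHS):
        return "closed"
    if method.upper() in READ_METHODS:
        return "open"
    # Writes not named in the story also fail closed (assumed safest default).
    return "closed"


def on_counter_backend_error(method: str, path: str):
    """Status to return when the limiter cannot reach Redis; None means forward upstream."""
    return 503 if rate_limit_fail_mode(method, path) == "closed" else None
```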

US-KONG-18 — Anti-abuse on /v1/auth/login

  • Type: Story
  • Parent epic: EP-KONG-03
  • Priority: High
  • Story points: 2
  • As a security reviewer I want credential-stuffing resistance at the edge so that brute-force is limited before reaching auth-service.
  • Acceptance criteria:
    • Per-IP limit 5/min on login.
    • Bot-detection plugin blocks obvious bot user agents.
  • Dependencies: US-KONG-14.

EP-KONG-04 — Kong observability and SRE readiness

  • Type: Epic
  • Status: To Do
  • Priority: High
  • Description: Make Kong fully observable before cutover: Prometheus metrics, Loki logs, OTel traces, Grafana dashboards, alerts, runbooks.
  • Acceptance criteria:
    • All metrics in EVENT_SCHEMAS §4 live.
    • 6 dashboards in Grafana.
    • All 10 alerts in OBSERVABILITY §6 wired.
    • Runbooks authored for each alert.
  • Stories: US-KONG-19..US-KONG-24

US-KONG-19 — Enable prometheus plugin and scrape

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: High
  • Story points: 2
  • As an SRE I want Prometheus scraping Kong metrics so that we can build dashboards and alerts.
  • Acceptance criteria:
    • /metrics exposed on internal port.
    • ServiceMonitor or scrape config deployed.
    • Metrics visible in Prometheus.
  • Dependencies: Prometheus in cluster.

US-KONG-20 — Enable opentelemetry plugin

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: High
  • Story points: 2
  • As an SRE I want Kong spans chained to upstream spans so that we can trace requests end-to-end.
  • Acceptance criteria:
    • Kong emits kong.request server spans with attributes per spec.
    • traceparent propagated to upstream; upstream spans are children.
  • Dependencies: OTel collector.

US-KONG-21 — Enable http-log plugin to Loki

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: High
  • Story points: 2
  • As an SRE I want structured access logs in Loki so that we can triage with LogQL.
  • Acceptance criteria:
    • One JSON object per request with the schema in EVENT_SCHEMAS §3.
    • Body logging disabled (verified test).
    • Retention 14 d hot / 90 d archive.
  • Dependencies: Loki endpoint.

US-KONG-22 — Build Grafana dashboards

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: High
  • Story points: 5
  • As an SRE I want 6 prebuilt Kong dashboards so that we can observe the edge at a glance.
  • Acceptance criteria:
    • kong-overview, kong-route-drilldown, kong-auth, kong-rate-limit, kong-plugin-latency, kong-resource dashboards live and versioned in ops/grafana/dashboards/kong/.
  • Dependencies: US-KONG-19/20/21.

US-KONG-23 — Wire alerts and runbooks

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: High
  • Story points: 5
  • As an on-call SRE I want alerts paging the right rotation with a runbook linked so that I can respond within SLA.
  • Acceptance criteria:
    • 10 alerts from OBSERVABILITY §6 deployed to Alertmanager.
    • Runbook authored per alert under docs/ops/runbooks/kong/.
    • Page routing verified with a synthetic firing.
  • Dependencies: US-KONG-22.

US-KONG-24 — Synthetic probes

  • Type: Story
  • Parent epic: EP-KONG-04
  • Priority: Medium
  • Story points: 3
  • As an SRE I want continuous synthetic probes so that silent failures are caught before customers notice.
  • Acceptance criteria:
    • Blackbox exporter probes /health every 30 s from ≥ 2 regions.
    • Synthetic POST /v1/sms/send every 5 min in staging + prod using a dedicated internal key.
  • Dependencies: Blackbox exporter; internal key.

EP-KONG-05 — Cutover and decommission custom api-gateway

  • Type: Epic
  • Status: To Do
  • Priority: High
  • Description: Execute the migration plan: dual-running window behind Cloudflare, canary ramp, then decommission the retired NestJS api-gateway service.
  • Acceptance criteria:
    • 7 consecutive days at 100 % Kong with SLOs met.
    • NestJS api-gateway scaled to zero and manifests removed.
    • Post-migration updates applied to 01-enterprise-architecture.md.
  • Stories: US-KONG-25..US-KONG-29

US-KONG-25 — Cloudflare dual-origin routing

  • Type: Story
  • Parent epic: EP-KONG-05
  • Priority: Highest
  • Story points: 3
  • As an SRE I want Cloudflare to weight traffic between Kong and the legacy gateway so that cutover is incremental and reversible.
  • Acceptance criteria:
    • Weighted origin rules live; weight adjustable in < 1 min.
    • Both origins reachable; synthetic probes green for each.
  • Dependencies: Kong in prod with 0 % weight.

US-KONG-26 — Canary ramp 5 → 25 → 50 → 100 %

  • Type: Story
  • Parent epic: EP-KONG-05
  • Priority: Highest
  • Story points: 5
  • As an SRE I want a staged canary ramp so that regressions are contained.
  • Acceptance criteria:
    • Each step held for the duration in MIGRATION_PLAN §6.
    • Side-by-side Grafana view confirms parity each step.
    • Any SLO breach → auto rollback to 0 %.
  • Dependencies: US-KONG-25, US-KONG-22.

US-KONG-27 — Parity test harness

  • Type: Story
  • Parent epic: EP-KONG-05
  • Priority: High
  • Story points: 5
  • As a platform engineer I want automated request parity checks so that Kong's response semantics match the legacy gateway.
  • Acceptance criteria:
    • Test harness sends the same request through both gateways and diff-compares responses (ignoring latency / trace headers).
    • Runs hourly during dual-run window.
  • Dependencies: US-KONG-25.
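One way to implement the diff step, assuming each gateway's response has been captured as status, headers, and body. The ignored-header list is an assumption; JSON bodies are compared after parsing so that key order does not produce false diffs.

```python
import json
from dataclasses import dataclass

# Headers expected to differ between gateways (assumed list). Any header ending
# in "-latency" is ignored too, covering X-Kong-Proxy-Latency and
# X-Kong-Upstream-Latency.
IGNORED_HEADERS = frozenset({"date", "traceparent", "tracestate", "x-request-id", "via", "server"})


@dataclass
class Captured:
    status: int
    headers: dict
    body: bytes


def _normalise_headers(headers: dict) -> dict:
    return {k.lower(): v for k, v in headers.items()
            if k.lower() not in IGNORED_HEADERS and not k.lower().endswith("-latency")}


def _normalise_body(body: bytes):
    try:
        return json.loads(body)
    except ValueError:  # not JSON: compare raw bytes
        return body


def diff_responses(legacy: Captured, kong: Captured) -> list[str]:
    diffs: list[str] = []
    if legacy.status != kong.status:
        diffs.append(f"status {legacy.status} != {kong.status}")
    lh, kh = _normalise_headers(legacy.headers), _normalise_headers(kong.headers)
    for name in sorted(lh.keys() | kh.keys()):
        if lh.get(name) != kh.get(name):
            diffs.append(f"header {name}: {lh.get(name)!r} != {kh.get(name)!r}")
    if _normalise_body(legacy.body) != _normalise_body(kong.body):
        diffs.append("body differs")
    return diffs
```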

US-KONG-28 — Scale NestJS api-gateway to zero

  • Type: Story
  • Parent epic: EP-KONG-05
  • Priority: High
  • Story points: 2
  • As an SRE I want the legacy gateway kept in warm standby and then removed so that we retain a rollback path for 7 days post-cutover.
  • Acceptance criteria:
    • Legacy gateway stays in warm standby from cutover (T+0) and is scaled to zero replicas at T+7 d.
    • After T+14 d, manifests deleted; code folder archived.
  • Dependencies: US-KONG-26.

US-KONG-29 — Post-migration doc sweep

  • Type: Story
  • Parent epic: EP-KONG-05
  • Priority: Medium
  • Story points: 2
  • As a documentation owner I want platform docs updated to reflect Kong as the only edge so that readers are not misled by stale NestJS gateway references.
  • Acceptance criteria:
    • 01-enterprise-architecture.md change log entry marking migration complete.
    • Legacy dashboards / alerts retired.
    • services/api-gateway/_sources/ retained for audit but marked archived.
  • Dependencies: US-KONG-28.

2. Updated or retired epics/stories from legacy sources

Source files: _sources/api-gateway/epics.md, _sources/api-gateway/user_stories.md.

Applied status per ADR-0001: the custom NestJS api-gateway service is retired; its responsibilities are redistributed.

2.1 Legacy epics

Legacy | Title | New status
GW-EPIC-001 | Authentication & Authorisation | Moved → EP-KONG-02 (Kong JWT + key-auth plugins) and to auth-service (RBAC, account scope). Legacy NestJS middleware obsolete.
GW-EPIC-002 | SMS Send & Idempotency | Split. Edge concerns (rate limiting, auth, correlation) moved to Kong (EP-KONG-02/03). Validation, idempotency, NATS publish moved to sms-orchestrator (tracked in that service's epics).
GW-EPIC-003 | Message Status | Moved to sms-orchestrator API surface; Kong only routes.
GW-EPIC-004 | API Key Management | Moved to auth-service. Kong consumes resolved key context via custom plugin (US-KONG-09).
GW-EPIC-005 | Billing & Invoicing | Moved to billing-service. Kong only routes /v1/billing/*, /v1/invoices/*.
GW-EPIC-006 | Webhook Management | Moved to webhook-dispatcher. Kong only routes /v1/webhooks/*.
GW-EPIC-007 | Observability | Split. Edge observability → EP-KONG-04 (Kong Prom/Loki/OTel). Upstream observability stays per-service. Legacy NestJS /metrics endpoint obsolete.

2.2 Legacy user stories

Legacy | Description | New status
GW-US-001 | Firebase JWT auth | Moved → US-KONG-07
GW-US-002 | API-key auth | Moved → US-KONG-08 / US-KONG-09
GW-US-003 | RBAC role resolution | Moved to auth-service (role at edge not enforced by Kong beyond coarse claim presence)
GW-US-004 | SMS send | Moved to sms-orchestrator (validation) + US-KONG-06 (route)
GW-US-005 | Rate-limit per account | Moved → US-KONG-14
GW-US-006 | Idempotency key | Moved to sms-orchestrator
GW-US-007 | Message status GET | Moved to sms-orchestrator
GW-US-008 | API key create | Moved to auth-service
GW-US-009 | API key revoke | Moved to auth-service (cache invalidation surfaces via US-KONG-09)
GW-US-010 | Billing usage view | Moved to billing-service
GW-US-011 | Invoice list | Moved to billing-service
GW-US-012 | Webhook test | Moved to webhook-dispatcher (Kong routes only)
GW-US-013 | Prometheus metrics | Obsolete for custom gateway. Edge metrics covered by US-KONG-19; upstream metrics stay per-service.
GW-US-014 | OTel trace propagation | Split. Edge side: US-KONG-20. Upstream side: per-service.
GW-US-015 | K8s health probes | Obsolete for custom gateway. Kong has its own /health, /ready, /status. Upstream readiness (Postgres/Redis/NATS) remains with each service.

2.3 Net summary

  • Obsolete: All legacy tasks tied to the NestJS process itself (the /metrics endpoint, custom health module, NestJS guard implementations, custom Redis rate limiter, custom Firebase token verification).
  • Moved to other services: Idempotency, payload validation, NATS publish → sms-orchestrator. RBAC → auth-service. API-key CRUD → auth-service. Billing/webhook/message APIs → respective owner services.
  • Superseded by new Kong epics: Edge auth, rate limiting, observability, routing — all covered by EP-KONG-01 through EP-KONG-05.

No legacy story survives in its original form. All legacy GW-US-* items are either retired or re-homed per the table above.


EP-KONG-06 · Adaptive Edge Defence (JA3, per-tenant adaptive rate-limit, tarpit lane)

Context: Cloudflare WAF protects against generic attacks; Kong must add identity-aware adaptive defence at the platform edge, sized for a national backbone. This epic introduces JA3/JA4 client fingerprinting, per-tenant adaptive rate-limit shaping, and a tarpit lane for known-bad clients: deliberately slow responses that drain attacker resources, instead of a 429 block that merely prompts the attacker to rotate IPs.

US-KONG-030 · Per-Tenant Adaptive Rate-Limit (multi-dimensional)

Type: Feature | Points: 8

Description: As Kong, I need rate-limit dimensions of (consumer_id, api_key_id, tenant_id, route, ip) simultaneously so that a single noisy consumer cannot starve the rest of a tenant and a single tenant cannot starve the platform.

Acceptance Criteria:

  • Custom Kong plugin ghasi-adaptive-ratelimit consults Redis keys kong:rl:{dim}:{value}:{window} per dimension; the most-restrictive limit wins.
  • Per-tenant defaults: 1000 RPS, configurable per tenant tier in auth.tenant_rate_limits and pulled by the plugin every 60 s.
  • Burst budget: 2× sustained for ≤ 10 s; tracked via leaky-bucket in Redis.
  • On 429, response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
  • Metric kong_ratelimit_throttled_total{dimension,tenant,tier} increments on each 429.
  • Integration test: at 1500 RPS, ≥ 30 % of a P3-tier tenant's requests receive 429 within 5 s.
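The most-restrictive-wins rule can be illustrated with an in-memory limiter. This sketch deliberately swaps the story's Redis leaky bucket for 1-second fixed windows in process memory and leaves out the 2× burst budget; reporting X-RateLimit-Reset as seconds until the window resets is an assumption.

```python
import math
from collections import defaultdict
from typing import Optional


class MultiDimLimiter:
    """Most-restrictive-wins limiter over several request dimensions.

    Uses 1-second fixed windows in process memory (not Redis, no burst budget).
    """

    def __init__(self, limits_per_second: dict) -> None:
        self.limits = limits_per_second
        self._window: Optional[int] = None
        self._counts: dict = defaultdict(int)     # (dimension, value) -> count
        # Stand-in for kong_ratelimit_throttled_total, keyed by dimension only.
        self.throttled: dict = defaultdict(int)

    def check(self, dims: dict, now: float) -> tuple:
        window = int(now)
        if window != self._window:
            # New window: drop old counters (Redis keys would expire via TTL).
            self._window = window
            self._counts.clear()
        reset_in = str(max(1, math.ceil(window + 1 - now)))

        # Find the dimension with the least remaining budget.
        tightest = None
        for dim, value in dims.items():
            limit = self.limits.get(dim)
            if limit is None:
                continue
            remaining = limit - self._counts[(dim, value)]
            if tightest is None or remaining < tightest[2]:
                tightest = (dim, limit, remaining)
        if tightest is None:
            return 200, {}

        dim, limit, remaining = tightest
        if remaining <= 0:
            self.throttled[dim] += 1
            return 429, {"Retry-After": reset_in, "X-RateLimit-Limit": str(limit),
                         "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": reset_in}
        # Admitted: charge the request against every limited dimension.
        for d, value in dims.items():
            if d in self.limits:
                self._counts[(d, value)] += 1
        return 200, {"X-RateLimit-Limit": str(limit),
                     "X-RateLimit-Remaining": str(remaining - 1),
                     "X-RateLimit-Reset": reset_in}
```

Because every admitted request is charged to all of its dimensions, one consumer exhausting its own budget also draws down the tenant budget that its siblings share, which is the isolation property the story asks for.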

US-KONG-031 · JA3 / JA4 Fingerprint Capture and Storage

Type: Feature | Points: 5

Description: As a security engineer, I need every TLS handshake fingerprinted with JA3 and JA4 so that abusive clients can be denied even when they rotate IPs and API keys.

Acceptance Criteria:

  • Cloudflare passes cf-ja3-hash and cf-ja4 headers to Kong (if Cloudflare is bypassed, Kong computes the fingerprints via an NGINX module instead).
  • For every Nth request (N configurable, default 10), Kong writes ja3_hash, ja4, ip, consumer_id, tenant_id, route, status_code, and latency_ms to the access log.
  • Top-N fingerprints exposed at /internal/edge/fingerprints?top=50&since=1h (mTLS, security-team only).
  • Fingerprint sampling adds no more than 1 ms to P99 request latency.

US-KONG-032 · Fingerprint-Based Deny List with TTL

Type: Feature | Points: 5

Description: As a security engineer, I need to push a JA3/JA4 fingerprint or IP/CIDR into a deny list with TTL so that observed attackers are blocked at the edge until the TTL expires.

Acceptance Criteria:

  • POST /v1/internal/edge/deny (mTLS, security-team only) accepts { kind: "ja3"|"ja4"|"ip"|"cidr", value, ttlSeconds, reason, ticketUrl }.
  • Plugin checks deny list (Redis set with TTL) on every request; blocked requests get 403 with code: "EDGE_DENIED".
  • Deny entries audit-logged to auth.audit_log and emit edge.deny.added.v1 NATS event.
  • DELETE /v1/internal/edge/deny/:id removes entry; GET /v1/internal/edge/deny lists all active.
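The deny-list semantics (four kinds, TTL expiry, 403 with `EDGE_DENIED`) can be sketched with an in-memory store standing in for the Redis set; audit logging and the NATS event are omitted. The linear scan is for clarity only; a Redis implementation would look up each fingerprint, IP, and CIDR by key.

```python
import ipaddress
import time
import uuid
from typing import Callable, Optional

KINDS = {"ja3", "ja4", "ip", "cidr"}


class EdgeDenyList:
    """In-memory stand-in for the Redis-backed deny list (entries expire by TTL)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: dict = {}

    def add(self, kind: str, value: str, ttl_seconds: int, reason: str, ticket_url: str) -> str:
        if kind not in KINDS:
            raise ValueError(f"unsupported kind {kind!r}")
        if ttl_seconds <= 0:
            raise ValueError("ttlSeconds must be positive")
        if kind == "ip":
            value = str(ipaddress.ip_address(value))
        elif kind == "cidr":
            value = str(ipaddress.ip_network(value, strict=False))
        entry_id = uuid.uuid4().hex
        self._entries[entry_id] = {"kind": kind, "value": value, "reason": reason,
                                   "ticketUrl": ticket_url,
                                   "expiresAt": self._clock() + ttl_seconds}
        return entry_id

    def remove(self, entry_id: str) -> bool:
        return self._entries.pop(entry_id, None) is not None

    def active(self) -> dict:
        """Purge expired entries and return the live ones."""
        now = self._clock()
        for entry_id in [i for i, e in self._entries.items() if e["expiresAt"] <= now]:
            del self._entries[entry_id]
        return dict(self._entries)

    def check(self, ip: str, ja3: Optional[str] = None, ja4: Optional[str] = None):
        """Return (403, body) for a denied request, or None to let it continue."""
        addr = ipaddress.ip_address(ip)
        for entry in self.active().values():
            kind, value = entry["kind"], entry["value"]
            if ((kind == "ja3" and ja3 == value) or (kind == "ja4" and ja4 == value)
                    or (kind == "ip" and str(addr) == value)
                    or (kind == "cidr" and addr in ipaddress.ip_network(value))):
                return 403, {"code": "EDGE_DENIED"}
        return None
```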

US-KONG-033 · Tarpit Lane for Known-Bad Clients

Type: Feature | Points: 5

Description: As a security engineer, I need a tarpit lane that intentionally slow-responds to known-bad fingerprints/IPs so that attackers' connection slots are tied up without giving them the explicit 429 signal that would prompt them to rotate.

Acceptance Criteria:

  • Plugin checks the tarpit:{ja3hash|ip} Redis key; if present, the response is delayed by min(2^reqIndex, 30) seconds and returns HTTP 200 + {"status":"queued"} with no actual upstream call.
  • Tarpit entries are pushed by fraud-intel-service via POST /v1/internal/edge/tarpit and TTL-managed (default 1 h).
  • Metric kong_tarpit_active_connections (gauge); alert if > 1000 sustained for 5 min (suggests the platform is under attack).
  • Tarpit traffic is not counted against rate-limit budgets (so legitimate users behind same NAT remain unaffected).
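A sketch of the delay schedule, assuming `reqIndex` is a zero-based count of requests seen from the tarpitted client (the story does not define it). Counts live in a dict here; in Redis they would share the tarpit entry's TTL. The `Tarpit` class and its `active` counter are hypothetical names.

```python
import asyncio
import json

MAX_TARPIT_DELAY_S = 30


def tarpit_delay(req_index: int) -> int:
    """min(2^reqIndex, 30); the exponent is capped early so 2**n never grows large."""
    if req_index < 0:
        raise ValueError("req_index must be non-negative")
    return MAX_TARPIT_DELAY_S if req_index >= 5 else min(2 ** req_index, MAX_TARPIT_DELAY_S)


class Tarpit:
    def __init__(self, sleep=asyncio.sleep) -> None:
        self._sleep = sleep
        self._counts: dict = {}  # client key (ja3 hash or ip) -> requests seen
        self.active = 0          # stand-in for the kong_tarpit_active_connections gauge

    async def handle(self, client_key: str) -> tuple:
        index = self._counts.get(client_key, 0)
        self._counts[client_key] = index + 1
        self.active += 1
        try:
            await self._sleep(tarpit_delay(index))
        finally:
            self.active -= 1
        # No upstream call and no rate-limit budget consumed for tarpitted traffic.
        return 200, json.dumps({"status": "queued"})
```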

US-KONG-034 · Geo + ASN Allow/Deny Hints from Cloudflare

Type: Feature | Points: 3

Description: As a security engineer, I need country and ASN hints from Cloudflare available to Kong policies so that origin-aware rules can be applied (e.g., admin endpoints AF-only, regulator endpoints ATRA-ASN-only).

Acceptance Criteria:

  • cf-ipcountry and cf-iporg/cf-asn headers trusted only when source IP is in Cloudflare CIDR list.
  • Per-route allow/deny lists for country/ASN configurable via Kong declarative config.
  • Admin route /v1/admin/* denies requests from any country other than AF by default.
  • Regulator route /v1/regulator/* allows only the ATRA ASN allow-list (env-configurable).
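The trust rule and per-route policy can be sketched as follows. `192.0.2.0/24` is a documentation range used as a placeholder for Cloudflare's published ranges, `cf-asn` is taken from the story's header list, and `REGULATOR_ASN_ALLOWLIST` is a hypothetical env var name. A request with no trusted country header is denied on admin routes.

```python
import ipaddress
import os
from typing import Mapping, Optional

# Placeholder only: 192.0.2.0/24 (TEST-NET-1) stands in for Cloudflare's
# published edge ranges, which must be synced separately.
EDGE_CIDRS = [ipaddress.ip_network("192.0.2.0/24")]


def load_asn_allowlist(env: Mapping[str, str] = os.environ) -> set:
    # REGULATOR_ASN_ALLOWLIST is a hypothetical env var, e.g. "64500,64501".
    raw = env.get("REGULATOR_ASN_ALLOWLIST", "")
    return {asn.strip() for asn in raw.split(",") if asn.strip()}


def trusted_geo(source_ip: str, headers: Mapping[str, str]) -> tuple:
    """Return (country, asn) hints, or (None, None) if the source is not Cloudflare."""
    addr = ipaddress.ip_address(source_ip)
    if not any(addr in net for net in EDGE_CIDRS):
        return None, None  # spoofable headers from a non-Cloudflare source are ignored
    lowered = {k.lower(): v for k, v in headers.items()}
    country = lowered.get("cf-ipcountry")
    return (country.upper() if country else None), lowered.get("cf-asn")


def geo_decision(path: str, source_ip: str, headers: Mapping[str, str],
                 policies: Mapping[str, dict]) -> Optional[int]:
    """Return 403 if a per-route country/ASN rule rejects the request, else None."""
    country, asn = trusted_geo(source_ip, headers)
    for prefix, rule in policies.items():
        if not path.startswith(prefix):
            continue
        if "allow_countries" in rule and country not in rule["allow_countries"]:
            return 403
        if "allow_asns" in rule and asn not in rule["allow_asns"]:
            return 403
    return None
```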

US-KONG-035 · Edge Defence Dashboard

Type: Feature | Points: 3

Description: As a SOC/NOC analyst, I need a Grafana dashboard showing top fingerprints, top IPs, deny-list size, tarpit population, 429 rate by tenant/tier, and country distribution.

Acceptance Criteria:

  • Dashboard dashboards/edge-defence.json committed.
  • Panels: top-20 JA3 by request count (24h), top-20 IP by 4xx count, deny-list size over time, tarpit gauge, 429 rate, country/ASN treemap.
  • Linked alert: EdgeDeniedSpike fires when 403 EDGE_DENIED rate > 10×/min.

EP-KONG-07 · mTLS Upstream Policy for Sensitive Routes

Context: Kong already terminates TLS for clients. For routes that touch compliance, regulator, CBC, and sender-ID, the upstream connection must use mTLS with SPIFFE workload identities so that an attacker who compromises one node pool cannot read sensitive traffic. Aligns with EP-PLAT-NB-05.

US-KONG-036 · Upstream mTLS for Compliance, Regulator, and CBC routes

Type: Feature | Points: 5

Description: As a security engineer, I need Kong to use mTLS on its upstream connections to compliance-engine, regulator-portal-service, and cbc-bridge-service, presenting SPIFFE-issued client SVIDs, so that east-west traffic to these high-sensitivity services is bound to workload identity, not network position.

Acceptance Criteria:

  • Kong service definitions for these upstreams set tls_verify: true, client_certificate: spire-issued-svid.
  • SPIRE agent injects SVIDs as Kubernetes secrets refreshed every 1 hour; Kong reloads on rotation without dropping connections.
  • Upstream mismatch (cert, SAN) returns 502 with code: "UPSTREAM_MTLS_FAIL" and emits kong.mtls.upstream.failed.v1.
  • Integration test verifies: revoked SVID is rejected within 5 min.

US-KONG-037 · Per-Route Upstream Allow-List (Defence in Depth)

Type: Feature | Points: 3

Description: As a security engineer, I need each Kong route bound to exactly one upstream service identity so that misconfiguration cannot accidentally proxy /v1/regulator/* to the wrong service.

Acceptance Criteria:

  • Kong declarative config gate: every route has upstream.spiffe_id explicitly set.
  • CI policy fails if spiffe_id is missing for routes matching /v1/{compliance,regulator,cbc,sender-id}/*.
  • Runtime check: plugin asserts upstream certificate SAN matches the configured SPIFFE ID.
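Both checks can be sketched as below, treating `upstream.spiffe_id` as a field on each parsed route dict (an assumption about how the gate stores it). `san_matches` assumes the upstream certificate's URI SANs have already been extracted; an X.509-SVID carries exactly one URI SAN. The trust domain in the usage data is hypothetical.

```python
import re

SENSITIVE_PATH = re.compile(r"^/v1/(compliance|regulator|cbc|sender-id)(/|$)")
# Loose shape check: spiffe://<trust-domain>/<path segments>.
SPIFFE_ID = re.compile(r"^spiffe://[a-z0-9.\-]+(/[A-Za-z0-9._\-]+)+$")


def ci_gate(routes: list) -> list:
    """Fail sensitive routes without a SPIFFE ID; flag malformed IDs anywhere."""
    errors = []
    for route in routes:
        name = route.get("name", "<unnamed>")
        spiffe_id = (route.get("upstream") or {}).get("spiffe_id")
        sensitive = any(SENSITIVE_PATH.match(p) for p in route.get("paths", []))
        if sensitive and not spiffe_id:
            errors.append(f"{name}: sensitive route without upstream.spiffe_id")
        elif spiffe_id and not SPIFFE_ID.match(spiffe_id):
            errors.append(f"{name}: malformed SPIFFE ID {spiffe_id!r}")
    return errors


def san_matches(uri_sans: list, expected_spiffe_id: str) -> bool:
    """Runtime check: exactly one URI SAN, equal to the configured SPIFFE ID."""
    return len(uri_sans) == 1 and uri_sans[0] == expected_spiffe_id
```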

US-KONG-038 · Customer-Webhook Reverse mTLS Channel (optional, per-tenant)

Type: Feature | Points: 5

Description: As an enterprise tenant, I want my webhook endpoint to require Ghasi to present a client certificate so that the source of webhook delivery is verifiable beyond shared HMAC.

Acceptance Criteria:

  • Tenant config field webhook.mtlsClientCert toggles mTLS-on-egress for outbound webhooks (paired with EP-HOOK-08).
  • Egress proxy presents the platform-wide client SVID (rotated hourly via SPIRE).
  • Tenant-portal page /webhooks/mtls shows the platform CA chain and SPIFFE ID for the tenant's allow-list configuration.
  • Integration test: mTLS-required webhook with platform cert reachable; same webhook with no cert is rejected.