api-gateway (Kong) — Testing Strategy
Status: populated
Owner: TBD (Platform / SRE)
Last updated: 2026-04-17
Companion: SERVICE_OVERVIEW · API_CONTRACTS · Service Template
1. Purpose
Define how Kong configuration and the custom plugin are tested. Because Kong is driven by declarative configuration rather than NestJS application code, the usual "80% unit coverage" target does not apply verbatim. Instead we target 100 % route coverage through config lint and integration smoke tests, plus unit tests for the custom plugin.
2. Configuration contract tests
These checks run in CI on every PR that touches ops/kong/ or any upstream service's OpenAPI spec.
| Check | Tool | Fail condition |
|---|---|---|
| decK YAML syntactic validity | deck file validate | Invalid YAML / schema |
| Every Route has an auth plugin (or public:true tag) | Custom script | Route without auth |
| Every Route path corresponds to a documented upstream OpenAPI path | Custom script comparing decK → service OpenAPI | Mismatch |
| Every Route/Service is tagged with env, owner | Custom script | Missing tag |
| No Route points to an external host | Custom script | External host |
| No plaintext secrets in YAML | gitleaks + custom regex | Secret-shaped string |
| Upstream host resolves in the target cluster | nslookup in CI (optional) | Unresolvable |
These are the gate: no Kong change merges without passing this matrix.
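The "Custom script" rows in the matrix above could be sketched roughly as follows. This is a minimal illustration, not the actual CI script. It assumes the decK state has already been parsed into a dict with routes and plugins nested under each service, that tags follow a key:value shape such as env:staging (suggested by the public:true tag), that a service-level auth plugin counts as covering its routes, and that "internal" means a cluster-local hostname. The names AUTH_PLUGINS, INTERNAL_SUFFIXES, and lint_state are hypothetical.

```python
# Hypothetical lint for a parsed decK state (dict). All constants are assumptions.
AUTH_PLUGINS = {"jwt", "key-auth", "ghasi-api-key-lookup"}  # assumed auth plugin names
REQUIRED_TAG_PREFIXES = ("env:", "owner:")                  # assumed tag shape
INTERNAL_SUFFIXES = (".svc", ".svc.cluster.local")          # assumed "internal" hosts


def _missing_tags(name, entity):
    """Return one error per required tag prefix absent from the entity."""
    tags = entity.get("tags") or []
    return [
        f"{name}: missing tag {prefix}*"
        for prefix in REQUIRED_TAG_PREFIXES
        if not any(tag.startswith(prefix) for tag in tags)
    ]


def lint_state(state):
    """Return a list of human-readable violations; an empty list means pass."""
    errors = []
    for service in state.get("services") or []:
        sname = service.get("name", "<unnamed service>")
        host = service.get("host") or ""
        if not host.endswith(INTERNAL_SUFFIXES):
            errors.append(f"{sname}: external host {host!r}")
        errors += _missing_tags(sname, service)
        service_plugins = {p["name"] for p in service.get("plugins") or []}
        for route in service.get("routes") or []:
            rname = route.get("name", "<unnamed route>")
            errors += _missing_tags(rname, route)
            plugins = service_plugins | {p["name"] for p in route.get("plugins") or []}
            is_public = "public:true" in (route.get("tags") or [])
            if not is_public and not (plugins & AUTH_PLUGINS):
                errors.append(f"{rname}: route without auth")
    return errors
```

A CI step would fail the PR whenever lint_state returns a non-empty list. Top-level routes and global plugins are not handled in this sketch and would need the same treatment.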
3. Drift detection (periodic)
A nightly CI job runs deck gateway diff against each live Kong (staging, prod):
- No diff → pass.
- Diff detected → KongConfigDrift alert + block the next deploy until resolved.
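One way the nightly job might turn the diff into a pass/alert decision is sketched below. It assumes decK's --non-zero-exit-code flag is available in the installed version (exit code 2 when a diff is present, 1 on error, 0 when in sync); verify this against the decK release in use. The function names and the state-file and address parameters are hypothetical.

```python
import subprocess


def classify_diff(exit_code):
    """Map a decK diff exit code to a verdict (assumes --non-zero-exit-code semantics)."""
    if exit_code == 0:
        return "in-sync"
    if exit_code == 2:
        return "drift"
    return "error"  # decK itself failed; treat separately from drift


def check_drift(state_file, kong_addr):
    """Run the diff against one live Kong and return the verdict string."""
    result = subprocess.run(
        ["deck", "gateway", "diff", state_file,
         "--kong-addr", kong_addr, "--non-zero-exit-code"],
        capture_output=True,
        text=True,
    )
    return classify_diff(result.returncode)
```

The job would call check_drift once per environment, raise the KongConfigDrift alert on "drift", and surface "error" as a job failure so that a broken diff run is not mistaken for a clean one.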
4. Integration tests (staging)
End-to-end tests hitting Kong in staging, covering each route's plugin policies.
| Scenario | Expected |
|---|---|
| POST /v1/sms/send with valid JWT | 202 from sms-orchestrator; X-Account-Id forwarded |
| POST /v1/sms/send with valid API key | 202; X-Api-Key-Id forwarded; X-Api-Key stripped |
| POST /v1/sms/send without auth | 401 problem+json |
| POST /v1/sms/send with expired JWT | 401 problem+json |
| Burst > per-key limit on /v1/sms/send | 429 + Retry-After + rate-limit headers |
| POST /v1/sms/send with 70 KB body | 413 |
| POST /v1/auth/login × 6 from same IP within 1 min | 429 after 5th |
| /admin/* from non-allow-listed IP | 403 |
| /unknown/path | 404 problem+json |
| POST /v1/sms/send with upstream down | 503 |
| POST /v1/sms/send with Redis down (read-only route still works) | /v1/sms/{id} returns 200 (fail-open); /v1/sms/send returns 503 (fail-closed) |
| W3C traceparent end-to-end | Kong + upstream spans linked in trace backend |
| Body logging on /v1/sms/send | Not present in Loki |
Tests run on every PR merge and nightly.
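Several rows of the scenario table reduce to "status code, content type, required headers", which lends itself to a table-driven check. The sketch below shows only the comparison step, under stated assumptions: the HTTP client that produces the status and headers is left out, problem+json is taken to mean the application/problem+json media type, and RateLimit-Remaining is an assumed stand-in for whichever rate-limit headers the gateway actually emits. Expectation, SCENARIOS, and mismatches are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Expectation(NamedTuple):
    status: int
    content_type: Optional[str] = None       # substring expected in Content-Type
    required_headers: Tuple[str, ...] = ()   # header names that must be present


# A subset of the scenario table; header names are assumptions.
SCENARIOS = {
    "send without auth": Expectation(401, "application/problem+json"),
    "burst over per-key limit": Expectation(
        429, required_headers=("Retry-After", "RateLimit-Remaining")),
    "70 KB body": Expectation(413),
    "unknown path": Expectation(404, "application/problem+json"),
}


def mismatches(exp, status, headers):
    """Compare an observed response with an expectation; return the problems found."""
    seen = {name.lower(): value for name, value in headers.items()}
    problems = []
    if status != exp.status:
        problems.append(f"status {status}, expected {exp.status}")
    if exp.content_type and exp.content_type not in seen.get("content-type", ""):
        problems.append(f"content type is not {exp.content_type}")
    for name in exp.required_headers:
        if name.lower() not in seen:
            problems.append(f"missing header {name}")
    return problems
```

Scenarios that depend on side effects (trace linkage, absence of bodies in Loki, forwarded headers seen by the upstream) need backend queries or an echo upstream and do not fit this shape.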
5. Smoke tests in CI
| Test | Cadence |
|---|---|
| curl https://api.staging.ghasi.io/health returns 200 | Every deploy |
| Synthetic SMS send with internal key | Every 5 min (prod + staging) |
| JWKS fetch from Kong pod | Every deploy |
6. Custom plugin unit tests
For ghasi-api-key-lookup:
- 80 %+ line coverage required (per platform testing standard).
- Test cases:
  - Cache hit → no auth-service call.
  - Cache miss → upstream call, success path.
  - Upstream 404 → reject with 401.
  - Upstream 5xx → reject with 503 (do not fail-open).
  - Upstream timeout → serve cached result if present; else 503.
  - Header injection correctness.
  - Metric emission correctness.
Tool: whatever the plugin language dictates (busted for Lua, go test for Go).
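The behavioural cases above can be restated as a small decision table, which may help when writing the real tests in busted or go test. The following is a Python model of the expected outcomes only, not the plugin itself. It rests on one interpretation that the list does not spell out: a "cache hit" means a fresh entry, while the "cached result" served on timeout is a stale (expired but retained) entry. All names are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Upstream(Enum):
    OK = "ok"
    NOT_FOUND = "404"
    SERVER_ERROR = "5xx"
    TIMEOUT = "timeout"


class Outcome(NamedTuple):
    decision: str              # "allow", "401" or "503"
    auth_service_called: bool


def expected_outcome(fresh_entry, stale_entry, upstream):
    """Expected plugin behaviour for one request (model, assumptions as stated)."""
    if fresh_entry:
        return Outcome("allow", False)   # cache hit: no auth-service call
    if upstream is Upstream.OK:
        return Outcome("allow", True)    # cache miss, success path
    if upstream is Upstream.NOT_FOUND:
        return Outcome("401", True)      # unknown key
    if upstream is Upstream.SERVER_ERROR:
        return Outcome("503", True)      # never fail-open on 5xx
    # Timeout: fall back to a stale entry if one exists, else 503.
    return Outcome("allow" if stale_entry else "503", True)
```

Each branch corresponds to one listed test case; header injection and metric emission are not modelled here and need their own assertions.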
7. Load / soak tests
- k6 or vegeta scripts against staging at 2× expected peak TPS for 30 min.
- Verifies Kong pod HPA scales, rate-limit counters are correct, no memory leak.
- Run before any major Kong version upgrade and quarterly.
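To make the "2× expected peak TPS for 30 min" target concrete, the arithmetic can be sketched as below. The peak figure is a caller-supplied input (no value is given in this document), load_profile is a hypothetical helper, and the vegeta flags shown are the standard -rate and -duration options of vegeta attack.

```python
import math


def load_profile(expected_peak_tps, multiplier=2.0, minutes=30):
    """Derive the target rate and total request count for one soak run."""
    rate = math.ceil(expected_peak_tps * multiplier)  # whole requests per second
    total = rate * minutes * 60                       # requests over the full run
    return {
        "rate_per_s": rate,
        "total_requests": total,
        "vegeta_args": [f"-rate={rate}/1s", f"-duration={minutes}m"],
    }
```

For a purely illustrative peak of 150 TPS, this gives a target of 300 requests per second and 540,000 requests over the 30-minute run, which is also the number the rate-limit counters can be reconciled against.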
8. Security tests
- OWASP ZAP scan against the public edge.
- curl smuggling / header injection tests.
- TLS cipher scan via testssl.sh.
- Rate-limit bypass attempts (vary IP, fresh keys, rotate UA).
9. Coverage targets
| Area | Target |
|---|---|
| Config lint | 100 % of routes |
| Integration scenarios | 100 % of plugin combinations in use |
| Custom plugin unit | ≥ 80 % line coverage |
| Drift detection | Runs nightly, < 24 h detection |
10. Open questions
- Should integration suite run against a Kong instance spun up per-PR (ephemeral namespace) or shared staging?
- Contract tests against each upstream OpenAPI — generate route stubs automatically vs hand-maintain?