api-gateway (Kong) — Event Schemas
Status: populated Owner: TBD (Platform / SRE) Last updated: 2026-04-17 Companion: SERVICE_OVERVIEW · OBSERVABILITY · Service Template
1. NATS events
Kong neither produces nor consumes NATS events.
All domain events remain the responsibility of upstream services:
- `sms.outbound.request`: produced by `sms-orchestrator`
- `sms.dlr.inbound`: produced by `smpp-connector`
- `billing.events`, `webhook.dispatch`: produced by `dlr-processor`
- `auth.events`: produced by `auth-service`
See each service's EVENT_SCHEMAS.md.
2. What Kong does emit
Kong emits three telemetry streams, none of which are domain events:
- Access logs → Loki (via `http-log` plugin).
- Metrics → Prometheus (via `prometheus` plugin).
- Traces → OpenTelemetry collector (via `opentelemetry` plugin).
3. Access log schema (Loki)
One JSON object per request, written to Loki with labels `{service="kong", env="<env>", route="<route-name>"}`.
| Field | Type | Notes |
|---|---|---|
| `timestamp` | RFC3339 | Request received |
| `request_id` | string | `X-Request-Id` (UUIDv7) |
| `trace_id` | string | W3C trace ID; high-cardinality, so kept in the log body rather than as a Loki label |
| `client_ip` | string | First entry in `X-Forwarded-For` (Cloudflare-stripped) |
| `method` | string | `POST`, `GET`, ... |
| `path` | string | Request path, query string stripped (PII risk) |
| `route_name` | string | Kong Route name (low-cardinality) |
| `service_name` | string | Kong Service name |
| `account_id` | string \| null | From `X-Account-Id` (injected post-auth) |
| `api_key_id` | string \| null | From `X-Api-Key-Id`; never the key itself |
| `tier` | string \| null | `free` \| … |
| `status` | number | HTTP status returned to client |
| `latency_ms` | number | Total request latency |
| `upstream_latency_ms` | number | Time at upstream |
| `kong_latency_ms` | number | Kong processing time |
| `request_size_bytes` | number | Request body size |
| `response_size_bytes` | number | Response body size |
| `user_agent` | string | Truncated to 256 chars |
| `referer` | string | Optional |
| `rate_limit_remaining` | number \| null | From `X-RateLimit-Remaining-*` |
Never logged:
- Request or response bodies (PII, SMS message content).
- `Authorization` header contents (only presence flag if needed).
- Full API key (only the `api_key_id` identifier).
- Phone numbers, customer message text.
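The schema and exclusions above can be illustrated with a minimal Python sketch of how a record might be assembled. It is not Kong's `http-log` implementation: the function `build_access_log`, its arguments, and the `has_authorization` presence flag are hypothetical names introduced here, and header lookups assume the exact header casing shown in the table.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Hypothetical helper for illustration only; not part of Kong or its plugins.
MAX_USER_AGENT_CHARS = 256


def build_access_log(method, url, headers, status, received_at):
    """Return a log record holding only fields that are safe to ship."""
    forwarded = headers.get("X-Forwarded-For", "")
    # The first entry in X-Forwarded-For is treated as the client address.
    client_ip = forwarded.split(",")[0].strip() or None
    return {
        "timestamp": received_at.astimezone(timezone.utc).isoformat(),
        "request_id": headers.get("X-Request-Id"),
        "client_ip": client_ip,
        "method": method,
        # Keep the path only; the query string is dropped as a PII risk.
        "path": urlsplit(url).path,
        "account_id": headers.get("X-Account-Id"),
        "api_key_id": headers.get("X-Api-Key-Id"),
        "status": status,
        "user_agent": headers.get("User-Agent", "")[:MAX_USER_AGENT_CHARS],
        # Presence flag only (assumed field name); the header value is never copied.
        "has_authorization": "Authorization" in headers,
    }
```

Request and response bodies are never passed to the function, so they cannot leak into the record by accident.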
Retention: 14 days hot in Loki, 90 days cold archive. Longer retention requires explicit business justification (audit).
4. Metric names (Prometheus)
Exposed at `/metrics` on the Kong admin port and scraped by Prometheus, using the standard `prometheus` plugin.
| Metric | Type | Labels | Notes |
|---|---|---|---|
| `kong_http_requests_total` | counter | `service`, `route`, `code`, `source` | Request count |
| `kong_http_latency_ms` | histogram | `service`, `route` | End-to-end latency |
| `kong_upstream_latency_ms` | histogram | `service`, `route` | Upstream-only latency |
| `kong_kong_latency_ms` | histogram | `service`, `route` | Kong processing latency |
| `kong_bandwidth_bytes` | counter | `service`, `route`, `direction` | Ingress/egress bytes |
| `kong_nginx_http_current_connections` | gauge | `state` | Active connections |
| `kong_memory_lua_shared_dict_bytes` | gauge | `shared_dict` | Lua SHM usage |
| `kong_rate_limiting_rejected_total` | counter | `service`, `route`, `limit_by` | Ghasi-specific; emitted by custom plugin or derived from `status=429` |
| `ghasi_api_key_lookup_total` | counter | `result=hit \| miss \| …` | |
| `ghasi_api_key_lookup_latency_seconds` | histogram | (none) | Custom plugin |
| `kong_jwks_refresh_total` | counter | `issuer`, `result` | JWT plugin JWKS refresh |
Budget: keep label cardinality low. Do not put `account_id` on metrics (high cardinality); per-account detail lives in the access logs. Per-route and per-service labels are fine.
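The cardinality budget is easier to see with numbers. The sketch below multiplies the number of distinct values per label to get an upper bound on time series for one metric. Every count in it is hypothetical, chosen only to show the arithmetic; real counts depend on the deployment, and the true figure is lower because each route belongs to a single service.

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on time series for one metric: product of label value counts."""
    return prod(label_cardinalities.values())


# Hypothetical label value counts, for illustration only.
per_route = {"service": 10, "route": 40, "code": 8, "source": 3}
with_accounts = {**per_route, "account_id": 5000}

print(series_count(per_route))      # 9600
print(series_count(with_accounts))  # 48000000
```

Under these assumed counts, adding a single per-account label multiplies the series bound by the number of accounts, which is why that dimension stays in logs.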
5. OpenTelemetry spans
Kong's opentelemetry plugin emits one server span per request.
| Attribute | Example | Notes |
|---|---|---|
| `http.method` | `POST` | |
| `http.route` | `/v1/sms/send` | Template, not concrete path |
| `http.status_code` | `202` | |
| `http.user_agent` | `curl/8.4` | Truncated |
| `net.peer.ip` | `203.0.113.4` | Client IP (post-Cloudflare) |
| `kong.service` | `svc-sms-orchestrator` | Kong Service name |
| `kong.route` | `rt-sms-v1-send` | Kong Route name |
| `kong.consumer` | `csm-acct_01HW9X...` | When resolved |
| `ghasi.account_id` | `acct_01HW9X...` | Custom attribute |
| `ghasi.tier` | `pro` | Custom attribute |
| `traceparent` | W3C | Propagated to upstream |
Kong's server span is the parent of the upstream service's server span; the link is carried by W3C trace context. Upstream services add their own application spans as children.
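To make the parent/child link concrete, here is a minimal sketch of the W3C `traceparent` header (version `00`: `version-traceid-parentid-flags`, lowercase hex). The helper names `parse_traceparent` and `child_traceparent` are hypothetical; Kong's `opentelemetry` plugin handles propagation itself, so this only shows the shape of what travels upstream.

```python
import re
import secrets

# W3C Trace Context, version 00: 32 hex trace ID, 16 hex parent ID, 2 hex flags.
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if malformed."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, int(flags, 16)


def child_traceparent(header):
    """Build the header a gateway would forward: same trace, its own span as parent."""
    parsed = parse_traceparent(header)
    if parsed is None:
        return None
    trace_id, _incoming_parent, flags = parsed
    span_id = secrets.token_hex(8)  # 8 random bytes, rendered as 16 hex characters
    return f"00-{trace_id}-{span_id}-{flags:02x}"
```

The trace ID is preserved end to end while the parent ID changes at each hop, which is what lets the upstream span attach beneath the gateway span.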
Sampling: head-based, 10% default; 100% for 5xx responses; 100% for /v1/auth/login (security-sensitive).
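The sampling rules above can be sketched as a single decision function. This is an illustration under stated assumptions, not the plugin's or collector's actual logic: `should_sample` and its constants are hypothetical names, and mapping the low 64 bits of the trace ID onto the ratio is one common deterministic approach swapped in here. Note that the response status is only known after the upstream replies, so the 5xx rule can only apply where the decision is made or revisited after the response.

```python
DEFAULT_RATIO = 0.10
ALWAYS_SAMPLE_PATHS = {"/v1/auth/login"}


def should_sample(trace_id_hex, path, status=None, ratio=DEFAULT_RATIO):
    """Decide whether to keep a trace under the rules listed above."""
    if path in ALWAYS_SAMPLE_PATHS:
        return True
    # status is None at request start; this branch needs a post-response decision.
    if status is not None and 500 <= status <= 599:
        return True
    # Deterministic ratio: compare the low 64 bits of the trace ID to a threshold.
    threshold = int(ratio * 2**64)
    return int(trace_id_hex[-16:], 16) < threshold
```

Because the ratio check depends only on the trace ID, every service that applies the same rule reaches the same verdict for a given trace.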
6. Retention class (per platform taxonomy)
| Stream | Class | Retention |
|---|---|---|
| Access logs | telemetry (PII-scrubbed) | 14 d hot / 90 d cold |
| Metrics | telemetry | 30 d hot / 13 mo rolled-up |
| Traces | telemetry | 7 d |
No events are persisted to PostgreSQL or NATS JetStream by Kong itself.
7. Open questions
- Do we export access logs to a SIEM in addition to Loki (SOC/compliance)?
- Span sampling: consider tail-based sampling via OTel collector for error-biased retention.