
Infrastructure Baseline

Version: 1.2
Status: Approved
Owner: Platform Infrastructure Team
Last Updated: 2026-04-19
References: ADR-0001 Kong edge gateway, system.md §7–8, AGENT.md §11–12

Change log

  • v1.2 (2026-04-19) — Added Keycloak (base/default IdP + OIDC/SAML broker for tenant external IdP SSO) and compliance-engine (first-class Compliance Layer service) to the infrastructure topology, Docker Compose inventory, Kubernetes namespace strategy, and secrets management. Firebase Auth is retained only as an optional legacy provider.
  • v1.1 (2026-04-17) — Ingress layer updated: Kong Gateway replaces NGINX + custom NestJS api-gateway at the edge. Custom api-gateway pod removed; Kong runs as a deployment with its own HPA and (optional) DB-less configuration. See ADR-0001.
  • v1.0 (2026-04-12) — Initial baseline.

1. Purpose

This document defines the infrastructure topology, environment strategy, Kubernetes configuration standards, observability stack, and secrets management approach for the Ghasi Messaging Gateway platform.


2. Infrastructure Topology


3. Local Development — Docker Compose

Services defined in infra/docker/docker-compose.yml:

| Service | Port | Notes |
|---|---|---|
| postgres | 5432 | Single instance with volume |
| redis | 6379 | Single instance |
| nats | 4222 / 8222 | JetStream enabled; HTTP monitoring endpoint on 8222 |
| api-gateway | 3000 | Hot reload via nodemon |
| sms-orchestrator | 3003 | |
| smpp-connector | 3004 | Connects to the mock SMPP server (smpp-simulator) |
| routing-engine | 3005 | |
| dlr-processor | 3006 | |
| billing-service | 3007 | |
| webhook-dispatcher | 3008 | |
| auth-service | 3009 | Talks to local Keycloak; Firebase emulator optional (legacy provider) |
| analytics-service | 3010 | |
| notification-service | 3011 | |
| operator-management-service | 3012 | |
| compliance-engine | 3013 | gRPC on :50051, REST admin on :3013; pairs with compliance-ai |
| compliance-ai | 8088 | Local LLM (container) for compliance classification |
| keycloak | 8080 / 8443 | Base/default IdP; dev realm ghasi-local; Postgres-backed |
| admin-dashboard | 3001 | Next.js dev server |
| customer-portal | 3002 | Next.js dev server |
| smpp-simulator | 2775 | Mock SMPP operator |
| prometheus | 9090 | Scrapes all services (incl. Keycloak and compliance-engine) |
| grafana | 3100 | Pre-loaded dashboards |
| loki | 3200 | Log aggregation |
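Every service in the inventory publishes a fixed host port. If two services claim the same port, the second container fails to start. The sketch below is a hypothetical CI-side check that flags any host port claimed by more than one service. The `ComposeService` type and `findPortCollisions` function are illustrative and not part of the repository. The sample data is a subset of the table.

```typescript
// Hypothetical shape of one row of the Docker Compose inventory.
interface ComposeService {
  name: string;
  ports: number[]; // host ports the service publishes
}

// Returns host port -> service names, only for ports claimed by more than one service.
function findPortCollisions(services: ComposeService[]): Map<number, string[]> {
  const owners = new Map<number, string[]>();
  for (const svc of services) {
    for (const port of svc.ports) {
      const names: string[] = owners.get(port) ?? [];
      names.push(svc.name);
      owners.set(port, names);
    }
  }
  const collisions = new Map<number, string[]>();
  owners.forEach((names, port) => {
    if (names.length > 1) collisions.set(port, names);
  });
  return collisions;
}

// A subset of the inventory table, for illustration.
const inventory: ComposeService[] = [
  { name: "postgres", ports: [5432] },
  { name: "redis", ports: [6379] },
  { name: "nats", ports: [4222, 8222] },
  { name: "compliance-engine", ports: [3013, 50051] },
  { name: "keycloak", ports: [8080, 8443] },
  { name: "grafana", ports: [3100] },
  { name: "loki", ports: [3200] },
];

console.log(findPortCollisions(inventory).size); // 0
```

Running a check like this against the parsed compose file would catch a collision before `docker compose up` does.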

4. Kubernetes — Production Configuration

4.1 Namespace Strategy

| Namespace | Contents |
|---|---|
| ghasi-prod | All production application services not placed in ghasi-identity (incl. compliance-engine) |
| ghasi-identity | Keycloak (HA), auth-service, compliance-ai (local LLM) |
| ghasi-data | Postgres, Redis, NATS |
| ghasi-obs | Prometheus, Grafana, Loki, OTel Collector |
| ghasi-vault | HashiCorp Vault |

Note. Keycloak and compliance-ai run in a dedicated namespace (ghasi-identity), alongside auth-service, for two reasons. First, they handle sensitive credentials and PII and therefore warrant tighter NetworkPolicies. Second, their scaling and upgrade cadences differ from those of the messaging core.
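The first reason in the note can be enforced with a NetworkPolicy in ghasi-identity. The sketch below builds one as a plain object; kubectl accepts JSON manifests. The policy selects every pod in the namespace and admits ingress only from the following sources:

  • pods in the same namespace;
  • ghasi-prod, for application callers;
  • ghasi-obs, so Prometheus can scrape.

The allow-list, the policy name and the `restrictIngress` helper are assumptions for illustration. If Keycloak's browser login endpoints are served through the edge gateway, the gateway's namespace would also need to be allowed.

Namespaces are matched on the kubernetes.io/metadata.name label, which Kubernetes 1.21+ sets automatically. The policy only takes effect if the cluster's CNI plugin enforces NetworkPolicies.

```typescript
// Minimal subset of the networking.k8s.io/v1 NetworkPolicy shape used here.
interface NetworkPolicyPeer {
  podSelector?: { matchLabels?: Record<string, string> };
  namespaceSelector?: { matchLabels?: Record<string, string> };
}

interface NetworkPolicy {
  apiVersion: "networking.k8s.io/v1";
  kind: "NetworkPolicy";
  metadata: { name: string; namespace: string };
  spec: {
    podSelector: { matchLabels?: Record<string, string> };
    policyTypes: ("Ingress" | "Egress")[];
    ingress: { from: NetworkPolicyPeer[] }[];
  };
}

// Selects every pod in `namespace` and admits ingress only from pods in the
// same namespace or in one of the listed source namespaces.
function restrictIngress(namespace: string, allowedNamespaces: string[]): NetworkPolicy {
  const peers: NetworkPolicyPeer[] = [
    { podSelector: {} }, // same-namespace pods (e.g. auth-service -> Keycloak)
    ...allowedNamespaces.map((ns) => ({
      // Label set automatically on every namespace by Kubernetes 1.21+.
      namespaceSelector: { matchLabels: { "kubernetes.io/metadata.name": ns } },
    })),
  ];
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: { name: "restrict-ingress", namespace }, // hypothetical name
    spec: {
      podSelector: {}, // applies to all pods in the namespace
      policyTypes: ["Ingress"],
      ingress: [{ from: peers }],
    },
  };
}

// Hypothetical allow-list: application services and the Prometheus scraper.
const policy = restrictIngress("ghasi-identity", ["ghasi-prod", "ghasi-obs"]);
console.log(JSON.stringify(policy, null, 2));
```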

4.2 Resource Standards (per service pod)

| Profile | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Light (UI, analytics) | 100m | 500m | 128Mi | 512Mi |
| Standard (API, billing) | 250m | 1000m | 256Mi | 1Gi |
| Heavy (SMPP, orchestrator) | 500m | 2000m | 512Mi | 2Gi |
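Each profile maps directly onto a container's `resources` block. The sketch below assumes manifests are generated from a typed copy of the table. `PROFILES`, `parseQuantity` and `resourcesFor` are hypothetical names, and the parser handles only the m, Mi and Gi suffixes that appear in the table. The request-versus-limit check mirrors a rule the Kubernetes API server enforces: a request may not exceed its limit.

```typescript
type Profile = "light" | "standard" | "heavy";

interface ResourceRequirements {
  requests: { cpu: string; memory: string };
  limits: { cpu: string; memory: string };
}

// Values copied from the Resource Standards table.
const PROFILES: Record<Profile, ResourceRequirements> = {
  light: { requests: { cpu: "100m", memory: "128Mi" }, limits: { cpu: "500m", memory: "512Mi" } },
  standard: { requests: { cpu: "250m", memory: "256Mi" }, limits: { cpu: "1000m", memory: "1Gi" } },
  heavy: { requests: { cpu: "500m", memory: "512Mi" }, limits: { cpu: "2000m", memory: "2Gi" } },
};

// Converts the quantity formats used above to a common unit per resource:
// CPU -> millicores, memory -> MiB. Other Kubernetes suffixes are not handled.
function parseQuantity(q: string): number {
  if (q.endsWith("m")) return Number(q.slice(0, -1));
  if (q.endsWith("Mi")) return Number(q.slice(0, -2));
  if (q.endsWith("Gi")) return Number(q.slice(0, -2)) * 1024;
  throw new Error(`unsupported quantity: ${q}`);
}

// Returns a copy of the profile's resources after checking that no request
// exceeds its limit (the API server would reject such a pod).
function resourcesFor(profile: Profile): ResourceRequirements {
  const r = PROFILES[profile];
  for (const key of ["cpu", "memory"] as const) {
    if (parseQuantity(r.requests[key]) > parseQuantity(r.limits[key])) {
      throw new Error(`${profile}: ${key} request exceeds limit`);
    }
  }
  return { requests: { ...r.requests }, limits: { ...r.limits } };
}

console.log(resourcesFor("heavy"));
```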

4.3 HPA Configuration

  • All services use a HorizontalPodAutoscaler with a CPU utilisation target of 70%.
  • The SMPP connector runs as a StatefulSet, because sticky SMPP sessions require stable pod identity.
  • Minimum replicas: 2 for all production services (high availability).
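With a 70% target, the HPA controller's documented scaling rule is:

desiredReplicas = ceil(currentReplicas × currentUtilisation / targetUtilisation)

The controller skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). It then clamps the result to [minReplicas, maxReplicas].

The function below reproduces that arithmetic for capacity planning only. It omits scale-down stabilisation windows and the handling of unready or missing pods. This document does not specify `maxReplicas`, so the default of 10 is a hypothetical value.

```typescript
// Simplified HPA replica calculation for a single CPU utilisation metric.
// Omits stabilisation windows and unready/missing-pod handling.
function desiredReplicas(
  currentReplicas: number,
  currentUtilisation: number, // average CPU utilisation across pods, in percent
  targetUtilisation = 70,     // baseline target from this document
  minReplicas = 2,            // baseline production minimum
  maxReplicas = 10,           // hypothetical: not specified in this document
  tolerance = 0.1,            // Kubernetes default HPA tolerance
): number {
  const ratio = currentUtilisation / targetUtilisation;
  // Inside the tolerance band the replica count is left unchanged.
  const raw = Math.abs(ratio - 1) <= tolerance ? currentReplicas : Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

console.log(desiredReplicas(4, 140)); // 8: utilisation is double the target
console.log(desiredReplicas(4, 75));  // 4: within the 10% tolerance band
console.log(desiredReplicas(3, 20));  // 2: never below the HA minimum
```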

4.4 Health Endpoints (required on all services)

| Endpoint | Purpose |
|---|---|
| GET /health/live | Kubernetes liveness probe |
| GET /health/ready | Kubernetes readiness probe |
| GET /metrics | Prometheus scrape endpoint |
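This baseline does not mandate a particular split between the two probes. A common one, assumed here, works as follows:

  • Liveness reports only that the process can answer, so a dependency outage does not cause Kubernetes to restart containers.
  • Readiness also checks the service's dependencies (e.g. Postgres, Redis, NATS), so Kubernetes stops routing traffic to a pod that cannot serve it.

The framework-agnostic sketch below computes each probe's response. `DependencyCheck`, `liveResponse` and `readyResponse` are hypothetical names. In a NestJS service the same logic would sit behind the two routes.

```typescript
// A named dependency check (e.g. a Postgres `SELECT 1` or a Redis PING),
// supplied by the service. Hypothetical interface for illustration.
interface DependencyCheck {
  name: string;
  check: () => Promise<boolean>;
}

interface ProbeResponse {
  status: 200 | 503;
  body: { status: "ok" | "error"; failed: string[] };
}

// Liveness: if this handler runs, the process is responsive. No dependency
// checks, so a database outage does not trigger container restarts.
function liveResponse(): ProbeResponse {
  return { status: 200, body: { status: "ok", failed: [] } };
}

// Readiness: every dependency must pass; a thrown error counts as a failure.
async function readyResponse(checks: DependencyCheck[]): Promise<ProbeResponse> {
  const results = await Promise.all(
    checks.map(async (c) => {
      try {
        return { name: c.name, ok: await c.check() };
      } catch {
        return { name: c.name, ok: false };
      }
    }),
  );
  const failed = results.filter((r) => !r.ok).map((r) => r.name);
  return failed.length === 0
    ? { status: 200, body: { status: "ok", failed } }
    : { status: 503, body: { status: "error", failed } };
}
```

Returning 503 from readiness removes the pod from Service endpoints without restarting it. This is usually the desired reaction to a transient Postgres or NATS outage.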

4.5 Ingress Rules

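| Host | Service | TLS |
|---|---|---|
| api.ghasi.io | Kong Gateway (edge gateway; replaces api-gateway per ADR-0001) | Cloudflare-managed cert |
| admin.ghasi.io | admin-dashboard | Cloudflare-managed cert |
| app.ghasi.io | customer-portal | Cloudflare-managed cert |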

5. Secrets Management

  • All secrets are stored in HashiCorp Vault: DB credentials, API keys, SMPP operator credentials, Keycloak admin credentials, Keycloak realm signing keys, per-tenant OIDC/SAML broker client secrets, SAML signing keys, the legacy Firebase service account, and external LLM API keys.
  • K8s Secrets are used as a fallback in environments without Vault.
  • Secrets reach pods in one of two ways. The Vault Agent Sidecar Injector renders them to files under /vault/secrets, which the container can source into environment variables at startup. Alternatively, External Secrets Operator syncs them into K8s Secrets that are exposed as environment variables.
  • Keycloak realm signing keys are managed inside Keycloak but backed up via Vault-sealed exports.
  • Prohibited: Secrets in ConfigMaps, Helm values files, or source code.
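On the Vault Agent Sidecar Injector path, a pod requests its secrets through annotations on the pod template. The injected agent renders each requested secret to a file under /vault/secrets/. The sketch below builds those annotations and a template that turns KV v2 fields into shell `export` lines, so an entrypoint can source the file into environment variables.

The annotation keys are the injector's documented ones. The following are hypothetical:

  • the `vaultAnnotations` helper;
  • the Vault role and the secret path;
  • the assumption of a KV v2 mount at `secret/`.

```typescript
// One file rendered by Vault Agent: the Vault path to read and the fields to
// export as environment variables. Hypothetical shape.
interface InjectedSecret {
  file: string;                // rendered at /vault/secrets/<file>
  path: string;                // KV v2 read path, e.g. "secret/data/..."
  env: Record<string, string>; // ENV_VAR -> field name in the secret
}

// Builds pod-template annotations for the Vault Agent Sidecar Injector.
function vaultAnnotations(role: string, secrets: InjectedSecret[]): Record<string, string> {
  const annotations: Record<string, string> = {
    "vault.hashicorp.com/agent-inject": "true",
    "vault.hashicorp.com/role": role,
  };
  for (const s of secrets) {
    annotations[`vault.hashicorp.com/agent-inject-secret-${s.file}`] = s.path;
    // KV v2 nests the payload under .Data.data; each line is a shell export.
    const exports = Object.entries(s.env)
      .map(([envVar, field]) => `export ${envVar}="{{ .Data.data.${field} }}"`)
      .join("\n");
    annotations[`vault.hashicorp.com/agent-inject-template-${s.file}`] =
      `{{- with secret "${s.path}" -}}\n${exports}\n{{- end }}`;
  }
  return annotations;
}

// Hypothetical role and path. A (hypothetical) entrypoint such as
// `. /vault/secrets/db && exec node dist/main.js` would load the variables.
const annotations = vaultAnnotations("billing-service", [
  { file: "db", path: "secret/data/ghasi/billing/db", env: { DB_USER: "username", DB_PASSWORD: "password" } },
]);
console.log(annotations);
```

Because the values only ever live in Vault and in the rendered in-pod file, this pattern also satisfies the prohibition on secrets in ConfigMaps and Helm values files.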

6. Observability Stack

| Tool | Role | Retention |
|---|---|---|
| Prometheus | Metrics collection and alerting | 30 days |
| Grafana | Dashboards and alert routing | n/a |
| Loki | Log aggregation (Pino JSON logs) | 14 days |
| OpenTelemetry Collector | Trace collection and export | 7 days |

Required Dashboards (Grafana)

  • Platform overview: message throughput, delivery rates, error rates
  • Service-level: latency P50/P95/P99 per service
  • SMPP connector: TPS, bind status per operator
  • Billing: events per hour, invoice generation rate
  • Infrastructure: pod CPU/memory, Postgres connections, Redis hit rate
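The P50/P95/P99 latency panels are normally computed in Prometheus with `histogram_quantile` over latency histogram buckets. That function estimates a quantile by linear interpolation inside the bucket that contains the target rank.

The TypeScript below re-implements that estimate for illustration, for example to sanity-check a dashboard against raw bucket counts. It is a simplified sketch: it assumes positive bucket bounds and 0 ≤ q ≤ 1, and skips Prometheus's handling of NaN and non-monotonic buckets. The bucket boundaries and counts in the usage example are hypothetical.

```typescript
// One cumulative histogram bucket.
interface Bucket {
  upperBound: number; // seconds; the last bucket must be Infinity (+Inf)
  count: number;      // cumulative count of observations <= upperBound
}

// Estimates the q-quantile like Prometheus's histogram_quantile: find the
// first bucket whose cumulative count reaches rank = q * total, then
// interpolate linearly between that bucket's lower and upper bounds.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  let i = 0;
  while (i < buckets.length - 1 && buckets[i].count < rank) i++;
  // Rank fell in the +Inf bucket: report the highest finite upper bound.
  if (i === buckets.length - 1) return buckets[i - 1].upperBound;
  const lower = i === 0 ? 0 : buckets[i - 1].upperBound;
  const prevCount = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - prevCount;
  return lower + (buckets[i].upperBound - lower) * ((rank - prevCount) / inBucket);
}

// Hypothetical request-latency histogram with 100 observations.
const latency: Bucket[] = [
  { upperBound: 0.05, count: 50 },
  { upperBound: 0.1, count: 80 },
  { upperBound: 0.25, count: 95 },
  { upperBound: 0.5, count: 99 },
  { upperBound: Infinity, count: 100 },
];

console.log(histogramQuantile(0.5, latency)); // 0.05
console.log(histogramQuantile(0.99, latency));
```

The accuracy of a P99 panel therefore depends on the bucket layout. Any quantile that lands in the +Inf bucket is reported as the largest finite bound.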

7. CI/CD Pipeline (GitHub Actions)

| Stage | Trigger | Action |
|---|---|---|
| Lint | PR opened | ESLint + TypeScript check |
| Test | PR opened | Unit + integration tests |
| Build | PR merged to main | Docker image build + push to registry |
| Deploy staging | Build success | kubectl apply to staging |
| E2E | Deploy staging complete | Playwright + API E2E suite |
| Deploy production | Manual approval | kubectl apply to production |
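After merge, the table describes a gated chain: build, deploy staging, E2E, deploy production. Production additionally waits for a manual approval.

The sketch below models that gating logic. The stage identifiers and the `canRun` helper are hypothetical. The requirement that E2E succeed before production is an assumption implied by the table's ordering rather than stated in it. In GitHub Actions itself, this ordering would typically be expressed with job `needs` and an environment protection rule that requires reviewers.

```typescript
type Stage = "build" | "deploy-staging" | "e2e" | "deploy-production";
type Status = "pending" | "success" | "failure";

// Post-merge chain from the pipeline table: each stage needs its predecessor
// to have succeeded (the e2e -> production edge is an assumption).
const NEEDS: Record<Stage, Stage | null> = {
  build: null,
  "deploy-staging": "build",
  e2e: "deploy-staging",
  "deploy-production": "e2e",
};

function canRun(stage: Stage, status: Record<Stage, Status>, approved: boolean): boolean {
  const prerequisite = NEEDS[stage];
  if (prerequisite !== null && status[prerequisite] !== "success") return false;
  // Production additionally waits for a manual approval.
  return stage !== "deploy-production" || approved;
}

const run: Record<Stage, Status> = {
  build: "success",
  "deploy-staging": "success",
  e2e: "success",
  "deploy-production": "pending",
};

console.log(canRun("deploy-production", run, false)); // false: awaiting approval
console.log(canRun("deploy-production", run, true));  // true
```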

8. Assumptions and Open Points

| ID | Assumption / Open Point | Owner | Resolution Date |
|---|---|---|---|
| A-001 | Cloud provider and region not specified; assumed Kubernetes-compatible (GKE / EKS / AKS) | Infra Team | TBD |
| A-002 | Postgres HA: Patroni vs managed cloud service (TBD) | Infra Team | TBD |
| A-003 | Redis Cluster vs Redis Sentinel (TBD) | Infra Team | TBD |
| A-004 | NATS cluster deployment: self-managed vs managed (TBD) | Infra Team | TBD |
| A-005 | Container registry: GCR / ECR / GHCR (TBD) | Infra Team | TBD |
| A-006 | ClickHouse for analytics: optional, not in baseline K8s manifests | Analytics Team | TBD |