
Architecture Baseline

Version: 1.2
Status: Approved
Owner: Platform Architecture Team
Last Updated: 2026-04-19
References: ADR-0001 Kong edge gateway, system.md §1–4, AGENT.md §4–5

Change log

  • v1.2 (2026-04-19) — (a) Identity rebaselined: Keycloak is the default/base Identity Provider (IdP); auth-service exposes a pluggable IdP provider abstraction so tenant organisations can federate their own external IdP (OIDC or SAML 2.0) for SSO. Firebase is retained only as a legacy/optional provider. (b) Compliance Engine added as a first-class architectural tier. Every outbound SMS now traverses the Compliance Layer (async, in the NATS consumer) before routing. Container view, outbound SMS sequence, NATS topology, database ownership, and technology stack updated accordingly.
  • v1.1 (2026-04-17) — Kong adopted as the north-south API gateway. Custom NestJS api-gateway service retired; its responsibilities moved to Kong (TLS, auth, rate limiting, correlation, logging) and to sms-orchestrator (validation, idempotency, NATS publish). See ADR-0001. Container view and sequence diagrams updated accordingly.
  • v1.0 (2026-04-12) — Initial baseline with custom NestJS api-gateway.

1. Purpose

This document establishes the authoritative architectural baseline for the Ghasi Messaging Gateway — a telecom-grade SMS gateway platform. It defines service boundaries, communication patterns, data ownership, and system context for all 14 platform units (13 messaging/commerce services plus the compliance-engine).


2. System Context (C4 Level 1)

Identity topology (summary). Kong terminates TLS and validates JWTs/API keys at the edge. auth-service is the platform's canonical identity surface and owns the IdP provider abstraction: a pluggable set of providers of which Keycloak is the base/default. Keycloak itself acts as an OIDC/SAML broker so that a tenant organisation can bring its own corporate IdP (Azure AD, Okta, Google Workspace, ADFS, generic OIDC/SAML) for SSO without any change to downstream services. Firebase remains available as an optional legacy provider for early customers and will be retired per the migration plan. See auth-service SERVICE_OVERVIEW §5.

Compliance Layer (summary). The compliance-engine is a first-class architectural tier alongside ingestion, routing, and transport. The sms-orchestrator NATS consumer invokes it after the tenant has already received a 202 Accepted, so evaluation is asynchronous with respect to the tenant request even though the evaluation call itself is a synchronous gRPC request. Its verdict (ALLOW / FLAG / HOLD / BLOCK) gates whether the message is handed to routing-engine. See compliance-engine SERVICE_OVERVIEW.


3. Container View (C4 Level 2)

3.1 Identity & Access — provider abstraction

auth-service does not hard-code a single external IdP. It exposes an IdP Provider Abstraction with three categories of concrete providers, all implementing the same internal IdentityProvider port (verifyExternalToken, resolveExternalIdentity, provisionUserFromClaims, revokeExternalSession):

| Category | Concrete provider | When used |
| --- | --- | --- |
| Base / Default | KeycloakProvider (OIDC, RS256) | Every tenant by default; also the OIDC/SAML broker for any tenant that enables external SSO |
| External organisation SSO | TenantOIDCProvider, TenantSAMLProvider (brokered through a per-tenant Keycloak realm / IdP mapper) | Enterprise tenants who require SSO against their own corporate IdP (Azure AD / Okta / Google / ADFS / generic OIDC / SAML 2.0) |
| Legacy / optional | FirebaseProvider | Existing Firebase-based customers during the migration window; slated for retirement |

Tenant-level configuration (tenant_identity_providers table, owned by auth-service) selects which provider(s) apply. Downstream services are indifferent to which provider authenticated the user: they only see the canonical platform JWT signed by auth-service and validated by Kong. See auth-service SECURITY_MODEL §1 for the auth flows per provider.
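The provider abstraction described above can be sketched roughly as follows. This is a minimal illustration, not the auth-service implementation: the four method names come from this document, but their signatures, the Claims and ExternalIdentity shapes, the ProviderRegistry class, and the stubProvider helper are all assumptions made for the example. The registry uses an in-memory map as a stand-in for the tenant_identity_providers table.

```typescript
// Hypothetical shapes: the document names the port methods but not their signatures.
type Claims = Record<string, unknown>;

interface ExternalIdentity {
  providerId: string; // e.g. "keycloak" or "tenant-oidc" (illustrative ids)
  subject: string;    // the IdP's subject identifier for the user
}

interface IdentityProvider {
  readonly id: string;
  verifyExternalToken(token: string): Promise<Claims>;
  resolveExternalIdentity(claims: Claims): ExternalIdentity;
  provisionUserFromClaims(claims: Claims): Promise<string>; // assumed to return a platform userId
  revokeExternalSession(sessionId: string): Promise<void>;
}

// In-memory stand-in for the tenant_identity_providers table (assumption).
class ProviderRegistry {
  private readonly providers = new Map<string, IdentityProvider>();
  private readonly tenantBindings = new Map<string, string[]>();
  private readonly defaultProviderId: string;

  constructor(defaultProviderId: string) {
    this.defaultProviderId = defaultProviderId;
  }

  register(provider: IdentityProvider): void {
    this.providers.set(provider.id, provider);
  }

  bindTenant(tenantId: string, providerIds: string[]): void {
    this.tenantBindings.set(tenantId, providerIds);
  }

  // Tenants without an explicit binding fall back to the default provider.
  providersFor(tenantId: string): IdentityProvider[] {
    const ids = this.tenantBindings.get(tenantId) ?? [this.defaultProviderId];
    return ids.map((id) => {
      const provider = this.providers.get(id);
      if (provider === undefined) {
        throw new Error(`Unknown identity provider: ${id}`);
      }
      return provider;
    });
  }
}

// Stub provider so the sketch can be exercised without a real IdP.
function stubProvider(id: string): IdentityProvider {
  return {
    id,
    verifyExternalToken: async (token) => ({ sub: token }),
    resolveExternalIdentity: (claims) => ({ providerId: id, subject: String(claims["sub"]) }),
    provisionUserFromClaims: async (claims) => `user-${String(claims["sub"])}`,
    revokeExternalSession: async () => {},
  };
}
```

In this sketch a tenant bound to a tenant-specific provider resolves to that provider, while an unbound tenant resolves to the default (Keycloak in this baseline). Downstream services never see this choice.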

3.2 Compliance Layer — container-level view

The compliance-engine sits between orchestration and routing. It is a gRPC service with an HTTP REST admin surface and a NATS producer/consumer pair:

| Interface | Purpose | Caller |
| --- | --- | --- |
| gRPC EvaluateCompliance (P95 ≤ 500 ms) | Synchronous rule evaluation for a queued message | sms-orchestrator (NATS consumer) |
| HTTPS /v1/compliance/* (admin) | CRUD on rules, rule-sets, hold-queue review, tenant score overrides | admin-dashboard, Kong-authenticated |
| NATS producer (compliance.*) | Emits audit, hold, block, release, reject, expire, score-change events | Fan-out to notification-service, analytics-service, billing-service |
| NATS consumer (sms.dlr.inbound → stats) | Consumes DLR statistics feeding tenant-score models | From dlr-processor |

Crucially, no outbound SMS reaches routing-engine until the Compliance Layer returns ALLOW (or an admin releases a held message). The pipeline is fail-closed.
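The gating rule above can be expressed as a small predicate. This is a sketch under stated assumptions: the four verdict names come from this document, while GateInput and mayDispatchToRouting are hypothetical names. The handling of FLAG is not specified in this section, so the sketch conservatively treats anything other than ALLOW or an admin-released HOLD as not dispatchable.

```typescript
type Verdict = "ALLOW" | "FLAG" | "HOLD" | "BLOCK";

interface GateInput {
  verdict: Verdict | undefined; // undefined means no verdict has been returned yet
  adminReleased: boolean;       // true once an admin has released a held message
}

// Fail-closed gate: dispatch only on an explicit ALLOW or an admin release of a held message.
function mayDispatchToRouting(input: GateInput): boolean {
  if (input.verdict === "ALLOW") {
    return true;
  }
  if (input.verdict === "HOLD" && input.adminReleased) {
    return true;
  }
  // Missing verdict, FLAG (semantics not defined here), BLOCK, or an unreleased HOLD.
  return false;
}
```

Note that a missing verdict falls through to the final return, which is what makes the gate fail-closed rather than fail-open.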


4. Outbound SMS Pipeline (Sequence)

Fail-closed guarantee. If compliance-engine is unavailable, sms-orchestrator retries via NATS (bounded by DLQ policy). The message remains in EVALUATING and is never dispatched to routing-engine without an explicit ALLOW verdict or admin release.
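One way to picture the retry behaviour is the consumer-side handler below. It is an illustrative sketch only: handleQueuedMessage, the Outcome shape, the EVALUATED state name, and the deliveryAttempt / maxDeliveries parameters are assumptions, and the evaluate callback stands in for the gRPC EvaluateCompliance call. The actual DLQ policy is defined elsewhere.

```typescript
type Verdict = "ALLOW" | "FLAG" | "HOLD" | "BLOCK";

interface Outcome {
  state: "EVALUATING" | "EVALUATED"; // "EVALUATED" is a hypothetical state name
  verdict?: Verdict;
  action: "ack" | "retry" | "dead-letter";
}

// evaluate: hypothetical wrapper around the gRPC EvaluateCompliance call.
async function handleQueuedMessage(
  evaluate: () => Promise<Verdict>,
  deliveryAttempt: number,
  maxDeliveries: number,
): Promise<Outcome> {
  try {
    const verdict = await evaluate();
    return { state: "EVALUATED", verdict, action: "ack" };
  } catch {
    // compliance-engine unavailable: no verdict is invented, the message stays
    // in EVALUATING and is either redelivered or handed to the DLQ policy.
    const action = deliveryAttempt >= maxDeliveries ? "dead-letter" : "retry";
    return { state: "EVALUATING", action };
  }
}
```

The important property is that the error path never produces a verdict, so nothing downstream can mistake an outage for an ALLOW.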


5. DLR Return Path (Sequence)


6. NATS JetStream Topology

Retention notes. compliance.audit.v1 is retained for ≥ 13 months (regulatory evidence window). compliance.message.* and compliance.tenant.tier.changed.v1 use standard 7-day JetStream retention; durable consumers fan events into long-term Postgres storage inside each subscribing service. auth.events captures SSO-relevant signals (external IdP link/unlink, SAML/OIDC session events) for audit.
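The retention notes can be captured as a small lookup, sketched below. The subject names and the 7-day and 13-month figures come from this document; the RetentionRule shape, the prefix-matching strategy, and the choice of 13 × 31 days as a conservative stand-in for "at least 13 months" are assumptions for illustration, not the actual JetStream stream configuration.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface RetentionRule {
  subjectPrefix: string; // simple prefix match (assumption; not NATS wildcard semantics)
  maxAgeDays: number;
}

const RETENTION_RULES: RetentionRule[] = [
  // 13 months rounded up to 13 * 31 = 403 days so the ">= 13 months" floor always holds.
  { subjectPrefix: "compliance.audit.v1", maxAgeDays: 13 * 31 },
  { subjectPrefix: "compliance.message.", maxAgeDays: 7 },
  { subjectPrefix: "compliance.tenant.tier.changed.v1", maxAgeDays: 7 },
];

// Returns the maximum message age in milliseconds, or undefined if no rule applies.
function maxAgeMsFor(subject: string): number | undefined {
  const rule = RETENTION_RULES.find((r) => subject.startsWith(r.subjectPrefix));
  return rule === undefined ? undefined : rule.maxAgeDays * DAY_MS;
}
```

Subjects outside the compliance namespace return undefined here, leaving their retention to whatever policy their owning stream defines.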


7. Database Ownership Map

Keycloak data ownership. Keycloak manages its own schema (keycloak) inside the same PostgreSQL instance but in an isolated logical database or schema; no Ghasi service reads the Keycloak schema directly. auth-service interacts with Keycloak exclusively via its Admin REST API + OIDC endpoints. The auth schema stores Ghasi-owned projections: tenant_identity_providers (which IdP each tenant is bound to), external_identities (link between platform userId and external IdP subject), and idp_session_audit.
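As an illustration of the external_identities projection, the sketch below models the link between a platform userId and an external IdP subject. The field names, the ExternalIdentityIndex class, and the rule that one external subject maps to at most one platform user are assumptions; the real table is owned by auth-service and would be backed by PostgreSQL rather than an in-memory map.

```typescript
// Hypothetical row shape for external_identities.
interface ExternalIdentityRow {
  userId: string;          // platform user
  tenantId: string;
  providerId: string;      // which tenant_identity_providers entry authenticated the user
  externalSubject: string; // subject identifier issued by the external IdP
}

function identityKey(tenantId: string, providerId: string, externalSubject: string): string {
  return JSON.stringify([tenantId, providerId, externalSubject]);
}

class ExternalIdentityIndex {
  private readonly rows = new Map<string, ExternalIdentityRow>();

  // Links an external subject to a platform user; rejects conflicting links.
  link(row: ExternalIdentityRow): void {
    const key = identityKey(row.tenantId, row.providerId, row.externalSubject);
    const existing = this.rows.get(key);
    if (existing !== undefined && existing.userId !== row.userId) {
      throw new Error("External subject is already linked to another user");
    }
    this.rows.set(key, row);
  }

  // Returns true if a link existed and was removed.
  unlink(tenantId: string, providerId: string, externalSubject: string): boolean {
    return this.rows.delete(identityKey(tenantId, providerId, externalSubject));
  }

  resolveUserId(tenantId: string, providerId: string, externalSubject: string): string | undefined {
    return this.rows.get(identityKey(tenantId, providerId, externalSubject))?.userId;
  }
}
```

Keying on tenant, provider, and subject together keeps the same subject string from colliding across tenants or across IdPs.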

Compliance data retention. compliance.audit_log is append-only, partitioned by month, and retained ≥ 13 months. evaluation_log (per-message evaluation trace) uses a shorter 90-day retention with cold-tier archival to object storage.
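Monthly partitioning combined with a 13-month retention floor implies a simple rule for when a partition may be dropped. The sketch below illustrates it; the partition naming scheme (table_YYYY_MM) and both function names are assumptions, and all date arithmetic is done in UTC.

```typescript
// Hypothetical naming scheme: <table>_<YYYY>_<MM>, e.g. audit_log_2026_04.
function monthPartitionName(table: string, at: Date): string {
  const year = at.getUTCFullYear();
  const month = at.getUTCMonth() + 1;
  const mm = month < 10 ? `0${month}` : String(month);
  return `${table}_${year}_${mm}`;
}

// A partition is droppable only when every row in it is older than the retention floor.
// Comparing whole months with a strict ">" keeps the partition whose newest rows
// might still be inside the window, so the check errs on the side of retaining data.
function isPartitionDroppable(
  partitionYear: number,
  partitionMonth: number, // 1-12
  now: Date,
  retainMonths: number,
): boolean {
  const ageInMonths =
    (now.getUTCFullYear() - partitionYear) * 12 + (now.getUTCMonth() + 1 - partitionMonth);
  return ageInMonths > retainMonths;
}
```

For example, on 2026-04-19 with a 13-month floor, the March 2025 partition is kept (its newest rows are under 13 months old) while the February 2025 partition may be dropped.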


8. Architectural Principles

| Principle | Rule | Source |
| --- | --- | --- |
| No shared databases | Each service owns exactly one schema | AGENT.md §5.2 |
| Async-first | Inter-service communication via NATS JetStream by default | system.md §4 |
| Sync only when required | gRPC for latency-sensitive calls (Routing Engine) | system.md §2 |
| DDD enforcement | Domain layer contains zero framework imports | AGENT.md §4.2 |
| Idempotency | All message processing is idempotent via Redis keys | system.md §2 |
| SMPP resilience | Persistent reconnect, operator failover, DLQ | system.md §2 |
| Secret management | Vault or K8s Secrets, never plaintext | AGENT.md §11.1 |
| Observability | Every service exposes Prometheus metrics + OTel traces | AGENT.md §12 |
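The idempotency principle is commonly realised with an atomic "set if absent" on a per-message key. The sketch below shows the idea with an in-memory store standing in for Redis; the KeyStore interface mirrors the semantics of the Redis SET command with the NX and PX options, while the key format, the 24-hour TTL, and the processOnce helper are assumptions rather than the platform's actual scheme.

```typescript
interface KeyStore {
  // Mirrors Redis `SET key value NX PX ttlMs`: resolves true only if the key was newly set.
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
}

// In-memory stand-in so the sketch runs without a Redis instance.
class InMemoryKeyStore implements KeyStore {
  private readonly expiries = new Map<string, number>(); // key -> expiry (epoch ms)

  async setIfAbsent(key: string, _value: string, ttlMs: number): Promise<boolean> {
    const now = Date.now();
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > now) {
      return false; // key still live: this is a duplicate
    }
    this.expiries.set(key, now + ttlMs);
    return true;
  }
}

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // hypothetical 24-hour window

// Runs the handler at most once per (tenantId, idempotencyKey) within the TTL.
async function processOnce(
  store: KeyStore,
  tenantId: string,
  idempotencyKey: string,
  handler: () => Promise<void>,
): Promise<"processed" | "duplicate"> {
  const fresh = await store.setIfAbsent(`idem:${tenantId}:${idempotencyKey}`, "1", IDEMPOTENCY_TTL_MS);
  if (!fresh) {
    return "duplicate";
  }
  await handler();
  return "processed";
}
```

This sketch claims the key before running the handler, so a handler failure leaves the key in place. A production design would need to decide whether to release the key on failure or to record a completion marker instead.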

9. Technology Stack

| Layer | Technology | Key packages / version |
| --- | --- | --- |
| Language | TypeScript | 5.x, strict mode |
| Backend framework | NestJS | @nestjs/core, @nestjs/common, @nestjs/platform-fastify (HTTP adapter), latest stable |
| HTTP adapter | Fastify (via NestJS platform adapter) | @nestjs/platform-fastify 10.x; NestJS drives Fastify internally, no raw Fastify code in services |
| API documentation | @nestjs/swagger | OpenAPI 3.1 generated from decorators |
| Input validation | class-validator + class-transformer + Zod | DTO validation via NestJS Pipes |
| Frontend | Next.js (App Router) | 14+ |
| UI components | ShadCN UI + TailwindCSS | Latest stable |
| Primary DB | PostgreSQL | 16+ |
| ORM | Prisma (via @nestjs/prisma / custom module) | 5.x |
| Caching / rate limiting | Redis (@nestjs/cache-manager, ioredis) | 7+ |
| Message bus | NATS JetStream (nats npm package, via shared nats-client) | 2.10+ |
| Identity Provider (base / default) | Keycloak (self-hosted): realm-per-environment, OIDC + SAML 2.0 broker | 24.x LTS |
| IdP client libraries | openid-client, @node-saml/node-saml, keycloak-admin-client | Latest stable |
| IdP provider abstraction | In-house IdentityProvider port in auth-service with pluggable providers (Keycloak / Tenant-OIDC / Tenant-SAML / Firebase-legacy) | |
| Legacy IdP (optional) | Firebase Authentication (firebase-admin), retained only for migration window | 12.x |
| Auth guards | @nestjs/passport + custom NestJS Guards | Latest stable |
| SCIM (tenant org user provisioning) | scim2-server or equivalent, exposed via auth-service for enterprise tenants | Latest stable |
| SMPP | SMPP 3.4 connector (custom NestJS module) | |
| Logging | Pino via nestjs-pino | Latest stable |
| Tracing | OpenTelemetry SDK | 1.x |
| Container | Docker | 24+ |
| Orchestration | Kubernetes | 1.29+ |
| DNS / WAF | Cloudflare | |
| CI/CD | GitHub Actions | |
| Observability | Prometheus + Grafana + Loki + OpenTelemetry | Latest stable |
| Compliance AI | Local LLM (e.g. llama.cpp / vLLM) with external LLM fallback for classification (@compliance-engine/ai) | Latest stable |

10. Assumptions and Open Points

| ID | Assumption / Open Point | Owner | Resolution Date |
| --- | --- | --- | --- |
| A-001 | Cloud region not specified in system.md; assumed single primary region with optional DR | Infra Team | TBD |
| A-002 | RPO and RTO targets not defined; assumed RPO 1h, RTO 4h as initial baseline | Infra Team | TBD |
| A-003 | ClickHouse integration for analytics is optional scaffolding; not in baseline architecture | Analytics Team | TBD |
| A-004 | gRPC is used for Routing Engine synchronous calls; all other sync calls use REST | Platform Arch | TBD |
| A-005 | Vault is preferred for secrets; K8s Secrets as fallback | Security Team | TBD |
| A-006 | Keycloak runs as a managed deployment inside the cluster (HA pair) with PostgreSQL as its persistence backend. Managed/cloud Keycloak (e.g., Red Hat SSO) remains an option for regulated regions. | Platform Arch + Security | TBD |
| A-007 | Tenant-specific external IdP onboarding (OIDC discovery URL or SAML metadata URL) is self-serve via admin-dashboard → auth-service → Keycloak Admin REST. | Platform Arch | TBD |
| A-008 | compliance-engine local LLM runs as a sidecar or shared in-cluster service; external LLM fallback is region-scoped and governed by data residency policy. | Trust & Safety + Security | TBD |
| A-009 | Compliance Layer is fail-closed: if unavailable, messages remain in EVALUATING and are retried from NATS until the service recovers or DLQ policy fires. Messages are never released to routing without an explicit ALLOW verdict. | Trust & Safety | Approved |