Ghasi SMS Gateway — Product & Engineering Roadmap
Version: 1.1
Status: Normative
Aligned with: architecture_baseline.md, infrastructure_baseline.md, governance.md, all service specs, ADR-0002 (Keycloak multi-IDP), ADR-0003 (Compliance Layer)
Principles: Cloud-first, Multi-tenant SaaS, Clean Architecture + DDD, Event-driven microservices, Security + compliance from day one, Fast time-to-market, Iterative delivery in slices.
Change log
- v1.1 (2026-04-19): Rebaselined for two architectural decisions landed in v1.2 of the enterprise architecture: (a) Keycloak as base/default IdP with a pluggable provider abstraction enabling tenant external OIDC/SAML SSO (ADR-0002); (b) Compliance Layer as first-class tier via the `compliance-engine` microservice (ADR-0003). Added 12 new epics (2 identity, 10 compliance) and 50 new user stories; totals updated accordingly. Resequenced M0 and M1 to ensure compliance evaluation is on the path from the first production-bound SMS.
- v1.0 (2026-04-12): Initial release with 13 services, Firebase identity, no compliance tier.
0. Executive Summary
The Ghasi SMS Gateway is a telecom-grade, multi-tenant SMS aggregation platform. It connects enterprise customers to mobile network operators (MNOs) via SMPP 3.4, providing intelligent routing, delivery tracking, billing, self-service management, and — from v1.1 — a first-class Compliance Layer that gates every outbound message before carrier dispatch.
This roadmap delivers the full 14-service platform in 6 milestones across 18 sprints (2-week cadence, ~9 months total). Each milestone produces a deployable, testable, independently valuable increment.
Total scope: 79 epics, ~243 user stories, 14 microservices, 2 frontend apps, Keycloak identity platform, local-LLM compliance AI.
1. Milestone Overview
| Milestone | Theme | Duration | Sprints | Customers | Primary Value | Monetization | Competitive Edge |
|---|---|---|---|---|---|---|---|
| M0 | Platform Foundation | 4 weeks | 1–2 | Internal / DevOps | Infrastructure, Keycloak + IdP provider abstraction, API skeleton, compliance-engine scaffolding, CI/CD | None | Architecture locked; security + compliance from day one |
| M1 | First Marketable Product | 6 weeks | 3–5 | Alpha testers (internal) | Send SMS end-to-end: API → compliance (observation-mode) → routing → SMPP → DLR | None (free alpha) | Compliance Layer on the path from day one; functional parity with basic SMS APIs |
| M2 | First Sellable Product | 6 weeks | 6–8 | Beta customers (5–10) | Self-service portal, billing, webhooks, compliance rule + hold-queue management, compliance billing waiver | Per-SMS pricing + monthly invoices | Full self-service onboarding; webhook DLR delivery; enforced compliance rules |
| M3 | Competitive Differentiator | 6 weeks | 9–11 | Paid customers (20–50) | Admin dashboard, advanced routing, operator failover, analytics, tenant scoring & risk tiering, tenant external OIDC/SAML SSO | Tiered pricing, seat-pack billing, enterprise SSO upsell | Intelligent routing; real-time operator failover; admin visibility; enterprise SSO; auto-enforced tenant risk tiers |
| M4 | Full Platform GA | 6 weeks | 12–14 | GA launch (100+) | Notifications, full analytics, local-LLM AI classification, compliance reporting & audit, observability hardening | Full pricing tiers, enterprise contracts | Production-grade SLAs; AI-assisted compliance; regulator-ready audit exports; full observability |
| M5 | Post-GA Expansion | 8 weeks | 15–18 | Scale-up | Multi-currency, hot-reload config, advanced analytics, API v2, Firebase legacy retirement, classification accuracy harness | Multi-currency billing, premium tiers | Hot-reload ops; real-time analytics; API evolution; legacy IdP sunset |
Total: ~36 weeks (9 months) across all 18 sprints (M0 through M5); GA at the end of M4 (Sprint 14) lands at ~28 weeks. Post-GA runs continuously.
2. Slicing Strategy
Slice 1 / M0 — Platform Foundation
Capabilities: Kubernetes namespaces (incl. ghasi-identity), database schemas (incl. compliance + keycloak), NATS JetStream streams (incl. compliance.* + auth.events.idp.*), Redis cluster, Keycloak HA deployment with dev+staging realms, IdentityProvider port scaffolded in auth-service with Keycloak provider as default, compliance-engine scaffolded (gRPC + HTTP + schema + mTLS + health), Kong edge gateway, health probes, CI/CD pipeline.
Services:
- `auth-service`: AUTH-EPIC-001 (Keycloak baseline), AUTH-EPIC-002 (IdP provider abstraction), AUTH-EPIC-003 (RBAC), AUTH-EPIC-004 (API key lifecycle)
- `compliance-engine`: EP-CE-01 (Service Foundation: scaffold, schema, mTLS, health)
- Kong: edge gateway with JWT + `ghasi-api-key-lookup` plugins (ADR-0001)
- Infrastructure bootstrap (shared modules, Prisma migrations, Docker Compose, K8s manifests)
Frontend: None (API-only; Postman/curl testing).
Why this slice matters: Nothing else works without identity, authorization, and infrastructure. Freezing the auth contract and the compliance-engine contract early prevents cascading changes. Every downstream service depends on the platform JWT and (from M1) the compliance gRPC. Deploying Keycloak in M0 avoids a disruptive IdP swap later; scaffolding compliance-engine in M0 ensures its gRPC is available the moment M1 starts publishing SMS.
Monetization: None. Investment phase.
Epics:
| Epic | Service | Stories |
|---|---|---|
| AUTH-EPIC-001 | auth-service | Keycloak baseline: realm per env, client config, JWKS exposure, platform JWT issuance — AUTH-US-001, 002, 003, 004 |
| AUTH-EPIC-002 | auth-service | IdP provider abstraction: IdentityProvider port, KeycloakProvider, NativeProvider, FirebaseLegacyProvider adapters, dispatcher — AUTH-US-005, 006, 007 |
| AUTH-EPIC-003 | auth-service | RBAC: roles, scopes, user-role assignment — AUTH-US-008, 009, 010 |
| AUTH-EPIC-004 | auth-service | API key lifecycle + Kong custom plugin lookup — AUTH-US-011, 012, 013 |
| GW-EPIC-001 | Kong (api-gateway) | JWT + key-auth + rate-limit + correlation-id + OTel — GW-US-001, 002, 003 |
| GW-EPIC-007 | Kong (api-gateway) | Observability skeleton — GW-US-013, 014, 015 |
| EP-CE-01 | compliance-engine | Service Foundation: NestJS gRPC + HTTP bootstrap, Prisma migrations for full compliance schema, mTLS for gRPC, health/metrics endpoints — US-CE-001, 002, 003, 004 |
Story count: ~26 | Sprint allocation: Sprint 1–2
Architectural freeze by end of M0: platform JWT claim shape (incl. `idp` claim), `IdentityProvider` port, `compliance` schema, `ComplianceService.v1` proto, Kong plugin policies, Keycloak realm-per-environment convention.
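The IdP provider abstraction named in AUTH-EPIC-002 (a port, per-provider adapters, and a dispatcher) can be pictured with a minimal sketch. This is an illustration only, not the `auth-service` implementation: the `authenticate` method, the `Identity` fields, the `kc:` / `native:` credential prefixes, and the per-tenant `assign` call are all assumptions made for the example. Only the `IdentityProvider`, `KeycloakProvider` and `NativeProvider` names and the `idp` claim come from this roadmap.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Identity:
    subject: str
    idp: str  # would surface as the `idp` claim in the platform JWT


class IdentityProvider(Protocol):
    """The port: every adapter exposes the same authenticate() contract (hypothetical)."""

    provider_id: str

    def authenticate(self, credential: str) -> Optional[Identity]: ...


class KeycloakProvider:
    provider_id = "keycloak"

    def authenticate(self, credential: str) -> Optional[Identity]:
        # Placeholder check; a real adapter would validate a Keycloak OIDC token.
        if credential.startswith("kc:"):
            return Identity(subject=credential[3:], idp=self.provider_id)
        return None


class NativeProvider:
    provider_id = "native"

    def authenticate(self, credential: str) -> Optional[Identity]:
        # Placeholder check standing in for the break-glass path.
        if credential.startswith("native:"):
            return Identity(subject=credential[7:], idp=self.provider_id)
        return None


class ProviderDispatcher:
    """Routes each tenant to its assigned provider, defaulting to Keycloak."""

    def __init__(self, providers, default_id: str = "keycloak") -> None:
        self._providers: Dict[str, IdentityProvider] = {p.provider_id: p for p in providers}
        self._default_id = default_id
        self._tenant_provider: Dict[str, str] = {}

    def assign(self, tenant_id: str, provider_id: str) -> None:
        if provider_id not in self._providers:
            raise KeyError(f"unknown provider: {provider_id}")
        self._tenant_provider[tenant_id] = provider_id

    def authenticate(self, tenant_id: str, credential: str) -> Optional[Identity]:
        provider_id = self._tenant_provider.get(tenant_id, self._default_id)
        return self._providers[provider_id].authenticate(credential)
```

Because callers only see the dispatcher, adding or retiring an adapter (for example the Firebase legacy one) would not change the call sites, which is the property the M0 freeze is meant to protect.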
Slice 2 / M1 — Core SMS Flow (First Marketable Product)
Capabilities: Send SMS via REST API, intelligent prefix-based routing, SMPP 3.4 delivery to operators, delivery receipt processing, message status tracking, idempotent processing, retry + DLQ. Compliance Layer in observation mode — every outbound SMS traverses compliance-engine via gRPC, but the seeded rule set is FLAG-only so no messages are actually held/blocked. This validates the pipeline, fail-closed semantics, and latency budget before enforcement is turned on in M2.
Services:
- Kong (api-gateway config): GW-EPIC-002 (SMS send + idempotency), GW-EPIC-003 (message status)
- `sms-orchestrator`: ORCH-EPIC-001 through 004 (full processing core), ORCH-EPIC-005 (compliance integration)
- `routing-engine`: ROUTE-EPIC-001 (operator selection), ROUTE-EPIC-002 (health monitoring)
- `smpp-connector`: SMPP-EPIC-001 (session mgmt), SMPP-EPIC-002 (PDU processing), SMPP-EPIC-003 (TPS + failover)
- `dlr-processor`: DLR-EPIC-001 (core DLR pipeline)
- `compliance-engine`: EP-CE-02 core subset (KEYWORD, SENDER_ID, GEO_RESTRICTION, full `EvaluateCompliance` handler), EP-CE-08 (async pipeline integration + observation-mode rollout), EP-CE-07 partial (metrics, structured logging, OTel)
Frontend: None (API-only; test via curl/Postman/SDK).
Why this slice matters: This is the critical path to a working product. An SMS sent via the API reaches a handset and a DLR comes back. Everything else (billing, portal, analytics) is layered on top of this flow. Shipping the Compliance Layer on the path from day one — in observation mode — avoids a disruptive insertion later and validates fail-closed behaviour with real traffic under low stakes.
Monetization: Free alpha tier to validate throughput, latency, operator connectivity, and compliance-evaluation budget.
Epics:
| Epic | Service | Stories |
|---|---|---|
| GW-EPIC-002 | api-gateway (Kong) | GW-US-004, 005, 006 |
| GW-EPIC-003 | api-gateway (Kong) | GW-US-007 |
| ORCH-EPIC-001 | sms-orchestrator | ORCH-US-001–007, 012 |
| ORCH-EPIC-002 | sms-orchestrator | ORCH-US-008, 009 |
| ORCH-EPIC-003 | sms-orchestrator | ORCH-US-010, 011 |
| ORCH-EPIC-004 | sms-orchestrator | ORCH-US-013, 014, 015 |
| ORCH-EPIC-005 | sms-orchestrator | Compliance integration: EVALUATING state, gRPC client, verdict handler, fail-closed non-ack — ORCH-US-016, 017, 018 |
| ROUTE-EPIC-001 | routing-engine | ROUTE-US-001, 002, 003 |
| ROUTE-EPIC-002 | routing-engine | ROUTE-US-007, 008, 009, 010 |
| SMPP-EPIC-001 | smpp-connector | SMPP-US-001, 002, 012 |
| SMPP-EPIC-002 | smpp-connector | SMPP-US-003, 004, 005 |
| SMPP-EPIC-003 | smpp-connector | SMPP-US-006, 007, 008 |
| SMPP-EPIC-004 | smpp-connector | SMPP-US-009, 010, 011 |
| DLR-EPIC-001 | dlr-processor | DLR-US-001–006, 009, 010 |
| EP-CE-02 (subset) | compliance-engine | Rule engine core — KEYWORD, SENDER_ID, GEO, full EvaluateCompliance handler — US-CE-005, 010, 011, 013 |
| EP-CE-08 | compliance-engine + sms-orchestrator | Async pipeline integration + observation-mode rollout — US-CE-033, 034, 035, 037 |
| EP-CE-07 (subset) | compliance-engine | Metrics, structured logging with PII masking, OTel — US-CE-028, 029, 030 |
Story count: ~60 | Sprint allocation: Sprint 3–5
M1 exit criterion (new): `compliance.audit.v1` is emitted for every sent SMS in the alpha environment; P95 `EvaluateCompliance` ≤ 500 ms under 200 RPS; fail-closed chaos test passes (kill `compliance-engine` → messages stay in `EVALUATING` and never hit a carrier).
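The fail-closed behaviour in this exit criterion can be sketched as a small decision function. It is a hedged illustration, not the orchestrator's code: the `ALLOW` verdict, the `Outcome` shape, the `BLOCKED` and `DEAD_LETTERED` state names, and reading "redeliver 3x" as a cap of three deliveries are assumptions. The `EVALUATING` and `HELD` states, the `FLAG`/`HOLD`/`BLOCK` verdicts and the `compliance_unavailable` reason appear elsewhere in this roadmap.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

MAX_DELIVERIES = 3  # assumption: one reading of "redeliver 3x"


class Verdict(Enum):
    ALLOW = "ALLOW"  # hypothetical name for a clean pass
    FLAG = "FLAG"
    HOLD = "HOLD"
    BLOCK = "BLOCK"


class ComplianceUnavailable(Exception):
    """Raised when the compliance gRPC call cannot be completed."""


@dataclass(frozen=True)
class Outcome:
    state: str
    ack: bool       # False means the broker will redeliver the message
    dispatch: bool  # True only when the message may proceed to a carrier
    dead_letter_reason: Optional[str] = None


def handle_delivery(evaluate: Callable[[], Verdict], delivery_count: int) -> Outcome:
    try:
        verdict = evaluate()
    except ComplianceUnavailable:
        # Fail closed: no verdict means no dispatch, ever.
        if delivery_count >= MAX_DELIVERIES:
            return Outcome("DEAD_LETTERED", ack=True, dispatch=False,
                           dead_letter_reason="compliance_unavailable")
        return Outcome("EVALUATING", ack=False, dispatch=False)
    if verdict in (Verdict.ALLOW, Verdict.FLAG):
        # FLAG is recorded but does not stop the message (observation mode).
        return Outcome("ROUTING", ack=True, dispatch=True)
    if verdict is Verdict.HOLD:
        return Outcome("HELD", ack=True, dispatch=False)
    return Outcome("BLOCKED", ack=True, dispatch=False)
```

The key property is that every path through the `except` branch returns `dispatch=False`, so an outage can delay or dead-letter a message but cannot let it reach a carrier unevaluated.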
Slice 3 / M2 — First Sellable Product
Capabilities: Customer self-service portal (signup, API keys, test SMS, message logs, webhook management, billing dashboard), billing event pipeline, pricing engine, invoice generation, webhook delivery with HMAC signing and retry. Compliance Layer switched from observation to enforcement: real rule authoring, blocklists, keyword lists, and a working hold-queue with manual review in admin-dashboard. Billing consumes compliance events so non-dispatched messages are not charged.
Services:
- `billing-service`: BILL-EPIC-001 through 004, BILL-EPIC-005 (compliance waiver)
- `customer-portal`: CUST-EPIC-001 through 006, CUST-EPIC-007 (compliance visibility: blocked/held states, appeals)
- `webhook-dispatcher`: HOOK-EPIC-001 through 004
- `dlr-processor`: DLR-EPIC-002 (billing emission), DLR-EPIC-003 (webhook trigger)
- Kong (api-gateway config): GW-EPIC-004 (API key mgmt), GW-EPIC-005 (billing proxy), GW-EPIC-006 (webhook test)
- `compliance-engine`: EP-CE-03 (Hold Queue & Manual Review), EP-CE-04 (Rule & Blocklist Management API), EP-CE-02 remainder (REGEX, RECIPIENT, RATE_VOLUME, TEMPORAL, COMPOSITE rule types)
- `notification-service` (early slice): consumes `compliance.message.*` events to surface portal notifications (NOTIF-US-012, 013 brought forward)
Frontend: customer-portal (Next.js, port 3002). Compliance views (blocked/held messages, appeal form) per EP-CE-10 subset (US-CE-042, 043, 045).
Why this slice matters: This is the earliest path to monetization. Customers can self-onboard, send production SMS, track delivery, receive webhook callbacks, and get billed. Revenue starts here. Turning compliance enforcement on at M2 — before we scale paid customers — contains blast radius if rules need tuning and establishes the evidence trail from first dollar.
Monetization: Per-segment pricing. Monthly invoiced billing. Self-service signup eliminates sales friction for SMB segment. Blocked / held / rejected / expired messages are not billed (EP-CE-08 / US-CE-036 ensures this is reconcilable end-to-end).
Epics:
| Epic | Service | Stories |
|---|---|---|
| BILL-EPIC-001 | billing-service | BILL-US-001, 002, 003, 004 |
| BILL-EPIC-002 | billing-service | BILL-US-005, 006, 007 |
| BILL-EPIC-003 | billing-service | BILL-US-008, 009, 010 |
| BILL-EPIC-004 | billing-service | BILL-US-011, 012, 013, 014, 015 |
| CUST-EPIC-001 | customer-portal | CUST-US-001–004, 020, 021 |
| CUST-EPIC-002 | customer-portal | CUST-US-005, 006, 007 |
| CUST-EPIC-003 | customer-portal | CUST-US-008, 009 |
| CUST-EPIC-004 | customer-portal | CUST-US-010, 011, 012, 013 |
| CUST-EPIC-005 | customer-portal | CUST-US-014, 015, 016, 017 |
| CUST-EPIC-006 | customer-portal | CUST-US-018, 019 |
| HOOK-EPIC-001 | webhook-dispatcher | HOOK-US-001, 002, 006, 007, 014 |
| HOOK-EPIC-002 | webhook-dispatcher | HOOK-US-003, 004, 005, 011 |
| HOOK-EPIC-003 | webhook-dispatcher | HOOK-US-008, 009, 012, 013 |
| HOOK-EPIC-004 | webhook-dispatcher | HOOK-US-010 |
| DLR-EPIC-002 | dlr-processor | DLR-US-007 |
| DLR-EPIC-003 | dlr-processor | DLR-US-008 |
| GW-EPIC-004 | api-gateway (Kong) | GW-US-008, 009 |
| GW-EPIC-005 | api-gateway (Kong) | GW-US-010, 011 |
| GW-EPIC-006 | api-gateway (Kong) | GW-US-012 |
| EP-CE-02 (remainder) | compliance-engine | REGEX, RECIPIENT, RATE_VOLUME, TEMPORAL, COMPOSITE — US-CE-006, 008, 012 + rule types split from US-CE-011 |
| EP-CE-03 | compliance-engine | Hold queue, single-item review, auto-expiry, bulk-review — US-CE-014, 015, 016, 017, 018 |
| EP-CE-04 | compliance-engine | Rule CRUD + versioning, rule-set mgmt, blocklist mgmt, keyword-list mgmt — US-CE-019, 020, 021, 022 |
| BILL-EPIC-005 | billing-service | Compliance event consumer — waive non-dispatched messages — US-CE-036 |
| CUST-EPIC-007 | customer-portal | Compliance message states, appeals UI — US-CE-042, 045 |
| NOTIF-EPIC (early) | notification-service | Hold/block portal alerts — US-CE-043 |
Story count: ~80 | Sprint allocation: Sprint 6–8
M2 exit criterion (new): at least one production tenant has had a message correctly `HELD`, manually released via `admin-dashboard`, billed on eventual DLR, and portal-notified of each state change. Compliance-audit export successfully round-trips to CSV for a 7-day window.
Slice 4 / M3 — Competitive Differentiator
Capabilities: Admin dashboard (operator CRUD, routing rules, user/role management, message logs, billing overview, system health, compliance rule authoring, hold-queue review UI, tenant scoring dashboards), advanced routing (per-account, cost-based, round-robin, priority), operator management with Vault credential storage, analytics pipeline. Tenant external OIDC/SAML SSO via Keycloak broker (the enterprise-unlock capability). Continuous tenant compliance scoring + risk tier enforcement.
Services:
- `admin-dashboard`: ADMDASH-EPIC-001 through 007, ADMDASH-EPIC-008 (compliance console: rules, hold queue, tenant scores, audit log viewer)
- `operator-management-service`: OPS-EPIC-001 through 004
- `routing-engine`: ROUTE-EPIC-003 (rules management), ROUTE-EPIC-004 (observability)
- `smpp-connector`: SMPP-EPIC-005 (observability)
- `analytics-service`: ANLYT-EPIC-001, ANLYT-EPIC-002, ANLYT-EPIC-005 (compliance analytics: audit stream archive, violations, tier transitions)
- `auth-service`: AUTH-EPIC-005 (tenant external OIDC SSO), AUTH-EPIC-006 (tenant external SAML 2.0 SSO), AUTH-EPIC-007 (SCIM 2.0 inbound provisioning)
- `compliance-engine`: EP-CE-05 (Tenant Scoring & Risk Tiering), EP-CE-07 remainder (alerts, runbooks, HPA, deployment hardening), EP-CE-10 remainder (US-CE-044 tenant score visibility in portal)
- `notification-service`: NOTIF-EPIC-001-early slice extended for compliance alert routing + email delivery prefs
Frontend: admin-dashboard (Next.js, port 3001) including the compliance console. Customer-portal surfaces tenant score + tier guidance copy.
Why this slice matters: This is where the platform becomes operationally competitive and enterprise-sellable. Admin visibility, intelligent routing, operator failover driven by health events, analytics, enterprise SSO against the customer's own IdP, and automated tenant risk tiering are the six things enterprise customers will ask about in pre-sales. Together they close the enterprise gap vs commodity SMS APIs.
Monetization: Enterprise tier pricing enabled by admin tooling + SSO. Tiered routing creates upsell opportunity. Tenant risk tiering reduces support cost by auto-enforcing volume limits on risky tenants.
Epics:
| Epic | Service | Stories |
|---|---|---|
| ADMDASH-EPIC-001 | admin-dashboard | ADMDASH-US-001, 002 |
| ADMDASH-EPIC-002 | admin-dashboard | ADMDASH-US-003, 004 |
| ADMDASH-EPIC-003 | admin-dashboard | ADMDASH-US-005–008, 018, 019 |
| ADMDASH-EPIC-004 | admin-dashboard | ADMDASH-US-013, 014, 015 |
| ADMDASH-EPIC-005 | admin-dashboard | ADMDASH-US-009, 010, 017 |
| ADMDASH-EPIC-006 | admin-dashboard | ADMDASH-US-011, 012, 020 |
| ADMDASH-EPIC-007 | admin-dashboard | ADMDASH-US-016 |
| OPS-EPIC-001 | operator-mgmt | OPS-US-001–005 |
| OPS-EPIC-002 | operator-mgmt | OPS-US-006, 007 |
| OPS-EPIC-003 | operator-mgmt | OPS-US-008, 009 |
| OPS-EPIC-004 | operator-mgmt | OPS-US-010, 011, 012 |
| ROUTE-EPIC-003 | routing-engine | ROUTE-US-004, 005, 006, 011, 012 |
| ROUTE-EPIC-004 | routing-engine | ROUTE-US-013, 014 |
| SMPP-EPIC-005 | smpp-connector | SMPP-US-013, 014 |
| ANLYT-EPIC-001 | analytics-service | ANLYT-US-001, 002, 003 |
| ANLYT-EPIC-002 | analytics-service | ANLYT-US-004, 005, 006 |
| ANLYT-EPIC-005 | analytics-service | Compliance analytics — audit archive, violations dashboards, tier-transition reports |
| ADMDASH-EPIC-008 | admin-dashboard | Compliance console — rule authoring, hold-queue review UI, tenant scores + overrides, audit log viewer |
| AUTH-EPIC-005 | auth-service | Tenant external OIDC SSO (brokered via Keycloak) — discovery URL registration, mapper provisioning, SSO start/callback, external_identities linking |
| AUTH-EPIC-006 | auth-service | Tenant external SAML 2.0 SSO (brokered via Keycloak) — metadata intake, SP endpoints, ACS/SLS |
| AUTH-EPIC-007 | auth-service | SCIM 2.0 inbound — Users + Groups CRUD, per-tenant bearer tokens, Keycloak mirror |
| EP-CE-05 | compliance-engine | Tenant scoring worker, REST endpoints, manual tier override — US-CE-023, 024, 025 |
| EP-CE-07 (remainder) | compliance-engine | Alerts + runbook, K8s deployment with HPA + PDB — US-CE-031, 032 |
| EP-CE-10 (score visibility) | compliance-engine + customer-portal | Tenant-visible score + tier + guidance — US-CE-044 |
Story count: ~80 | Sprint allocation: Sprint 9–11
M3 exit criterion: one enterprise tenant onboarded via Azure AD OIDC or Okta SAML in staging; tenant compliance score + tier displayed in both admin and tenant portals; automated SUSPENDED-tier → auto-HOLD enforcement validated end-to-end.
Slice 5 / M4 — Full Platform GA
Capabilities: Notification service (welcome emails, invoice emails, operator alerts, system alerts), full analytics API with caching, DLR observability hardening, local-LLM AI classification for compliance (AI_CLASSIFICATION + DLR_ABUSE rule types), compliance reporting & audit export (TENANT_AUDIT, VIOLATION_SUMMARY, etc.), GDPR erasure (auth + compliance-engine consumers), data retention policies, production observability (dashboards, runbooks, SLOs), a11y (keyboard nav, dark mode).
Services:
- `notification-service`: NOTIF-EPIC-001 through 003 (including compliance-event consumption beyond the early slice)
- `analytics-service`: ANLYT-EPIC-003 (query API), ANLYT-EPIC-004 (retention + observability)
- `dlr-processor`: DLR-EPIC-004 (observability)
- `compliance-engine`: EP-CE-02 finish (AI_CLASSIFICATION, DLR_ABUSE rule types: US-CE-007, 009), EP-CE-06 (Reporting & Audit: US-CE-026, 027), EP-CE-09 (Local LLM Platform: US-CE-038, 039, 040)
- Platform-wide: security hardening, Vault integration, mTLS, final K8s HPA tuning, GDPR erasure end-to-end (incl. compliance hold-queue PII redaction on `auth.user.erased.v1`)
Frontend: Admin dashboard + customer portal polish, a11y stories (ADMDASH-US-021, 022). Admin-dashboard compliance console gains AI-rule authoring + report generation surface.
Why this slice matters: GA readiness. Every SLO has a dashboard and alert. Every runbook is published. AI-assisted compliance closes the last detection gap (sophisticated fraud/phishing that static rules miss). Regulator-ready audit exports (13-month retention + TENANT_AUDIT report) satisfy the evidence ask from banking and telecom auditors. Notification workflows close the loop on invoicing, operator incidents, and compliance events. The platform is contractually supportable.
Monetization: Full pricing tiers. Enterprise SLA contracts. Premium support tier. Compliance tier (enhanced AI rules, longer retention, custom reports) as upsell.
Epics:
| Epic | Service | Stories |
|---|---|---|
| NOTIF-EPIC-001 | notification-service | NOTIF-US-001, 002, 006, 007 |
| NOTIF-EPIC-002 | notification-service | NOTIF-US-003, 004, 005, 008 |
| NOTIF-EPIC-003 | notification-service | NOTIF-US-009, 010, 011 |
| ANLYT-EPIC-003 | analytics-service | ANLYT-US-007, 008, 009, 010 |
| ANLYT-EPIC-004 | analytics-service | ANLYT-US-011, 012 |
| DLR-EPIC-004 | dlr-processor | DLR-US-011, 012 |
| EP-CE-02 (finish) | compliance-engine | AI_CLASSIFICATION + DLR_ABUSE rule types — US-CE-007, 009 |
| EP-CE-06 | compliance-engine | Compliance report generation + audit-log query — US-CE-026, 027 |
| EP-CE-09 (core) | compliance-engine + local-LLM | vLLM deployment, provider abstraction, cost/perf monitoring — US-CE-038, 039, 040 |
| Cross-cutting | admin-dashboard | ADMDASH-US-021, 022 |
| GDPR erasure | auth-service + compliance-engine | auth.user.erased.v1 consumer on compliance-engine redacts hold-queue PII |
Story count: ~30 | Sprint allocation: Sprint 12–14
M4 exit criterion: 500 msg/s sustained load test with compliance enforcement on, AI classification of ≥ 30% of traffic, local LLM P95 ≤ 300 ms; TENANT_AUDIT report generates for a 90-day window in ≤ 5 minutes; GDPR erasure redacts all tenant PII across `auth` and `compliance` schemas within SLA.
Slice 6 / M5 — Post-GA Expansion
Capabilities: Hot-reload SMPP operator config (zero-downtime), multi-currency billing, advanced analytics (real-time streaming), API v2 planning, additional MNO integrations, geographic expansion. Classification accuracy evaluation harness for the compliance AI. Firebase legacy provider retirement — migrate residual Firebase tenants to Keycloak and remove the FirebaseLegacyProvider adapter.
Services:
- `smpp-connector`: SMPP-US-015 (hot-reload)
- `billing-service`: multi-currency expansion
- `analytics-service`: real-time streaming pipeline
- `compliance-engine`: EP-CE-09 finish (US-CE-041: classification accuracy harness)
- `auth-service`: AUTH-EPIC-008 (Firebase legacy retirement): migrate Firebase-only users to Keycloak, remove `FirebaseLegacyProvider`, prune `auth.external_identities` rows for `provider_id='firebase-legacy'`
- All services: performance tuning, capacity planning
Frontend: Enhanced dashboards, real-time analytics views, Firebase legacy deprecation banners + migration wizard in customer portal.
Why this slice matters: Operational excellence and market expansion. Hot-reload eliminates maintenance windows for operator changes. Multi-currency unlocks international markets. Real-time analytics enables premium tier pricing. Retiring Firebase simplifies the identity surface to a single provider class (Keycloak + external brokering) and cuts one operational dependency. The classification accuracy harness turns the AI into a measured and continuously-improving asset rather than a black box.
Monetization: International expansion revenue. Premium analytics tier. Reduced ops cost via hot-reload and single-IdP posture.
Story count: ~15 (new stories for expansion + compliance AI harness + IdP migration) | Sprint allocation: Sprint 15–18
M5 exit criterion: zero Firebase tenants in `tenant_identity_providers` with `status = 'active'`; classification accuracy report shows ≥ baseline F1 across all categories.
3. Critical Path Analysis
3.1 Earliest Path to a Working Product (M1)
Infrastructure (M0) → auth-service (M0) → api-gateway (M0→M1) → sms-orchestrator (M1) → routing-engine (M1) → smpp-connector (M1) → dlr-processor (M1)
Critical chain: auth must be stable before API gateway can validate requests. Orchestrator depends on routing-engine (gRPC). SMPP connector depends on operator configs. DLR processor depends on SMPP connector publishing sms.dlr.inbound.
Duration: 10 weeks (M0 + M1).
3.2 Earliest Path to Monetization (M2)
Working SMS flow (M1) → billing-service (M2) → customer-portal (M2) → webhook-dispatcher (M2)
Duration: 16 weeks (M0 + M1 + M2). Revenue from Sprint 8 onward.
3.3 Parallelizable Workstreams
| Workstream | Can run parallel with | Constraint |
|---|---|---|
| `customer-portal` (CUST-EPIC-001–003) | `billing-service` (BILL-EPIC-001–002) | Portal can stub billing APIs initially |
| `admin-dashboard` (ADMDASH-*) | `operator-management-service` (OPS-*) | Both in M3; dashboard consumes OPS APIs |
| `notification-service` (NOTIF-*) | `analytics-service` (ANLYT-003–004) | Independent event consumers |
| `webhook-dispatcher` (HOOK-*) | `billing-service` (BILL-003–004) | Both consume DLR events independently |
| `routing-engine` advanced (ROUTE-003–004) | `smpp-connector` observability (SMPP-005) | No dependency |
| `compliance-engine` rule types (EP-CE-02) | `sms-orchestrator` ORCH-EPIC-001–004 | Rule types only need a stable `MessageContext`; orchestrator + compliance teams parallel from Sprint 3 |
| Keycloak operations + auth-service IdP abstraction | All other M0 work | Platform-team owns Keycloak; service-team owns auth-service; shared contract freeze at end of Sprint 2 |
| Tenant external SSO (AUTH-EPIC-005, 006, 007) | Admin dashboard compliance console (ADMDASH-EPIC-008) | Both M3; independent but share the admin-dashboard frontend — feature-flag per capability |
| Compliance scoring worker (EP-CE-05) | Compliance hold-queue bulk-review (EP-CE-03 US-018) | Independent subsystems; both consume DLR stats + audit log |
3.4 Architectural Freeze Points
| Element | Freeze by | Reason |
|---|---|---|
| NATS JetStream stream definitions (incl. `compliance.*` + `auth.events.idp.*`) | Sprint 1 | Every async service depends on stream names, subjects, consumer configs |
| PostgreSQL schema conventions (UUID PKs, tenant_id, timestamps) | Sprint 1 | All Prisma schemas derive from this |
| Platform JWT claim shape (incl. `idp` claim) and `IdentityProvider` port | Sprint 2 | Every service validates tokens; any IdP change must go through the port |
| Keycloak realm-per-environment convention + Admin REST client config | Sprint 2 | Provisioning new tenant IdPs depends on this |
| `ComplianceService.v1` gRPC proto + `compliance` schema | Sprint 2 | `sms-orchestrator` integration and rule authoring depend on this |
| RBAC roles (incl. `platform.compliance.*`) + API key format | Sprint 2 | Every service validates scopes against this |
| gRPC routing-engine contract | Sprint 3 | Orchestrator depends on this; changing it cascades |
| SMPP message ID correlation strategy | Sprint 3 | DLR processing, billing, and status tracking all depend on this |
| Billing event schema (incl. compliance waiver) | Sprint 5 | DLR processor, billing service, compliance-engine, analytics all produce/consume this |
| Webhook payload schema | Sprint 5 | Customer integrations depend on this; breaking changes lose trust |
| API v1 response envelope (incl. compliance REST surface) | Sprint 3 | Customer-facing; versioned; must not break |
4. Engineering Roadmap (Detailed)
4.1 M0 — Platform Foundation (Sprint 1–2)
Capabilities delivered:
- Kubernetes cluster with 5 namespaces (`ghasi-prod`, `ghasi-identity`, `ghasi-data`, `ghasi-obs`, `ghasi-vault`)
- PostgreSQL 16 HA, Redis 7 cluster, NATS 3-node JetStream cluster (with `compliance.*` + `auth.events.*` streams)
- Keycloak HA deployment (2 replicas) in `ghasi-identity` with Postgres-backed storage; realms `ghasi-local`, `ghasi-staging` provisioned with Admin-REST bootstrap
- `auth-service` with `IdentityProvider` port + `KeycloakProvider` (default), `NativeProvider` (break-glass), stub `FirebaseLegacyProvider`; login via Keycloak OIDC; JWKS exposure; RBAC + API-key CRUD
- `compliance-engine` scaffolded (NestJS dual-transport gRPC/HTTP, Prisma migrations for full `compliance` schema, mTLS on gRPC, health/metrics)
- Kong edge gateway with `jwt` + `rate-limiting-advanced` + `ghasi-api-key-lookup` custom plugin
- CI/CD pipeline: lint → test → build → deploy (GitHub Actions → K8s)
- Shared packages: `shared-types`, `shared-utils`, `shared-config`, `nats-client`, `db-client`, `logging`, `compliance-proto-client`
- Docker Compose for local dev with mock SMPP simulator, Keycloak with preloaded realm, mock-oidc (simulating a tenant IdP)
Services: Kong, auth-service, compliance-engine (scaffold), Keycloak (infra), infrastructure
Dependencies: K8s cluster, DNS, TLS certificates, Vault, object storage (for report output in later milestones)
Risks & mitigations:
| Risk | Impact | Mitigation |
|---|---|---|
| Keycloak operational learning curve (HA, upgrades, realm import/export) | Delayed M0 | Spike in Sprint 1; document runbook; vendor-neutral OIDC port so Keycloak swap is possible later |
| `compliance-engine` proto churn after scaffolding | Cascading changes in M1 | Freeze `ComplianceService.v1` proto + `EvaluateComplianceRequest`/`Response` at end of Sprint 2 |
| NATS JetStream learning curve | Delayed stream config | Spike in Sprint 1; document patterns |
| Prisma migration conflicts (especially partitioned tables in `compliance`) | Schema drift | Single migration runner; advisory locks; partition provisioning cron tested against staging |
Acceptance criteria:
- `auth-service` health probe returns 200 with DB + Redis + NATS + Keycloak Admin REST ready
- Platform JWT issued by `auth-service` after Keycloak OIDC login validates through Kong `jwt` plugin
- API key created, listed, revoked via REST; sha256 hashing verified; Kong `ghasi-api-key-lookup` plugin resolves key → consumer
- RBAC lookup returns correct role within 50ms (Redis cache hit); `platform.compliance.*` roles provisioned
- `compliance-engine` `/health/ready` returns 200 with all deps up; stub `EvaluateCompliance` handler returns valid response over mTLS
- CI pipeline deploys to staging on merge to main
- Docker Compose `docker compose up` starts all infra + Keycloak + Kong + auth-service + compliance-engine in < 90s
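The "sha256 hashing verified" criterion implies that only a digest of each API key is stored, never the raw key. A minimal sketch of that pattern follows, assuming a hypothetical `gh_` key prefix and helper names invented for this example; the real key format is whatever the API-key epic freezes in Sprint 2.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_api_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest that would be persisted instead of the key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key(prefix: str = "gh_") -> Tuple[str, str]:
    """Create a random key; the raw value is shown to the caller once, the hash is stored."""
    raw_key = prefix + secrets.token_urlsafe(32)  # "gh_" is a made-up prefix
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid leaking match position via timing."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)
```

A plain SHA-256 is reasonable here because the key is long and random; a slow password hash is only needed for low-entropy secrets.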
Release checklist:
- All shared packages published to private registry
- Prisma migrations run cleanly on fresh DB (incl. monthly partitions for `evaluation_log`, `audit_log`, `score_history`)
- NATS streams created with correct retention policies (incl. 13-month `COMPLIANCE_AUDIT`)
- Vault configured with auth-service, compliance-engine, and Keycloak secrets; mTLS PKI engine issuing certs
- Keycloak realm exports committed to repo as disaster-recovery artefact
- Grafana dashboards for auth-service, Kong, Keycloak, compliance-engine deployed
4.2 M1 — First Marketable Product (Sprint 3–5)
Capabilities delivered:
- `POST /v1/sms/send` with validation, rate limiting, idempotency
- `GET /v1/sms/:messageId/status` for status polling
- SMS orchestrator: consume → validate → route → publish → retry → DLQ
- Routing engine: gRPC operator selection, longest-prefix matching, health-aware failover
- SMPP connector: bind, submit_sm, enquire_link, TPS throttling, failover
- DLR processor: receive deliver_sm → normalize → update status → persist receipt
- Full message lifecycle: QUEUED → ROUTING → ROUTED → SENT → DELIVERED/FAILED
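The routing-engine bullet combines longest-prefix matching with health-aware failover. The sketch below shows one way those two ideas compose. The prefix table, the operator names, and the choice to fall back to a shorter prefix when no operator on the longest match is healthy are assumptions for illustration, not the routing-engine's specified behaviour.

```python
from typing import Dict, List, Optional, Set


def select_operator(
    msisdn: str,
    routes: Dict[str, List[str]],
    healthy: Set[str],
) -> Optional[str]:
    """Pick the first healthy operator on the longest matching number prefix.

    `routes` maps a digit prefix to operators in preference order.
    Falls back to shorter prefixes when the longest match has no healthy operator.
    """
    digits = msisdn.lstrip("+")
    for length in range(len(digits), 0, -1):
        candidates = routes.get(digits[:length], [])
        for operator in candidates:
            if operator in healthy:
                return operator
    return None


# Made-up routing table: a specific prefix with two operators, plus a catch-all.
EXAMPLE_ROUTES = {
    "9370": ["op-a", "op-b"],
    "93": ["op-default"],
}
```

With `EXAMPLE_ROUTES`, a number starting `+9370` goes to `op-a` while it is healthy, to `op-b` if `op-a` drops out, and to `op-default` only when both are down.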
Services: api-gateway (SMS endpoints), sms-orchestrator, routing-engine, smpp-connector, dlr-processor, compliance-engine (observation mode)
Frontend: None
Dependencies: M0 complete; at least 1 SMPP operator configured (test/sandbox)
Risks & mitigations:
| Risk | Impact | Mitigation |
|---|---|---|
| SMPP operator sandbox unavailable | Blocks E2E testing | Ship mock SMPP simulator; test against it |
| gRPC contract instability | Cascading changes | Freeze proto in Sprint 3; contract tests |
| Message loss during NATS consumer scaling | Data integrity | Durable consumers + explicit ack; chaos test |
| TPS throttling race conditions | Duplicate SMS | Redis Lua atomic script; integration tests |
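The TPS-throttling mitigation relies on making "check and consume" a single atomic step. The roadmap does this with a Redis Lua script so that it holds across replicas; the sketch below shows the same token-bucket idea in a single process, using a lock for atomicity. The class name, parameters, and injected clock are assumptions for the example.

```python
import threading


class TokenBucket:
    """Single-process token bucket; the lock makes refill + consume one atomic step."""

    def __init__(self, rate_per_sec: float, capacity: int, now: float) -> None:
        self._rate = rate_per_sec
        self._capacity = float(capacity)
        self._tokens = float(capacity)
        self._updated = now
        self._lock = threading.Lock()

    def try_acquire(self, now: float) -> bool:
        """Return True if one submit is allowed at time `now` (seconds)."""
        with self._lock:
            elapsed = max(0.0, now - self._updated)
            # Refill proportionally to elapsed time, never above capacity.
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            self._updated = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

Passing the clock in explicitly keeps the sketch deterministic under test; a caller would normally supply `time.monotonic()`.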
Acceptance criteria:
- SMS sent via API arrives on test handset within 30s (happy path)
- DLR received and message status updated to DELIVERED
- Idempotent resend with same key returns original response
- Rate limit returns 429 when exceeded
- Failed SMPP send retries 3x with exponential backoff
- After 3 failures, message routes to DLQ
- Operator failover triggers when primary disconnects
- P95 routing decision < 50ms
- P95 API-to-NATS-publish < 100ms
- P95 `EvaluateCompliance` < 500ms at 200 RPS
- `compliance.audit.v1` emitted for every evaluated SMS; Grafana dashboard shows audit rate ≈ orchestrator throughput
- Kill-compliance-engine chaos test: messages stay in `EVALUATING`, redeliver 3x, move to `sms.outbound.deadletter` with reason `compliance_unavailable`; zero carrier dispatches
- Observation-mode rule set active: all rules `FLAG`, no HOLD/BLOCK verdicts produced
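The retry and DLQ criteria above can be expressed as one decision function. This sketch follows the "after 3 failures, message routes to DLQ" criterion literally. The function name, the 1-second base delay, and the doubling factor are assumptions, not values taken from the service specs.

```typescript
type NextStep =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// failedAttempts: number of SMPP send failures so far for this message (>= 1).
// baseDelayMs and the doubling factor are assumed values for illustration.
function nextStepAfterFailure(
  failedAttempts: number,
  maxFailures = 3,
  baseDelayMs = 1000,
): NextStep {
  if (failedAttempts >= maxFailures) return { action: "dead-letter" };
  // Exponential backoff: base, 2x base, 4x base, ...
  return { action: "retry", delayMs: baseDelayMs * 2 ** (failedAttempts - 1) };
}
```

With the defaults, the first failure schedules a retry after 1000 ms, the second after 2000 ms, and the third sends the message to the DLQ.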
Release checklist:
- All 5 services passing health probes
- Contract tests (orchestrator ↔ routing-engine gRPC) green
- NATS consumer lag dashboards deployed
- SMPP session state dashboard deployed
- Runbooks: "SMPP operator disconnect", "NATS consumer lag spike", "DLQ depth alert"
- Load test: 100 msg/s sustained for 10 min with < 0.1% loss
4.3 M2 — First Sellable Product (Sprint 6–8)
Capabilities delivered:
- Customer portal: signup, login, API key management, test SMS, message logs, webhook config, billing dashboard
- Billing service: event ingestion, pricing resolution, invoice generation, usage API
- Webhook dispatcher: HMAC-signed delivery, retry, dead-letter, delivery logging
- DLR → billing event emission (exactly-once)
- DLR → webhook dispatch trigger (conditional on callbackUrl)
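HMAC-signed webhook delivery means the dispatcher signs each payload with a per-customer secret and the customer recomputes the signature on receipt. A minimal sketch using Node's built-in `crypto` module follows. The use of SHA-256, hex encoding, and the function names are assumptions; the actual algorithm and header format belong to the webhook-dispatcher spec.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over the raw request body, hex-encoded.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Customer-side check: recompute the signature and compare in constant time.
function verifySignature(secret: string, body: string, receivedHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature must be computed over the exact bytes sent on the wire; re-serialising the JSON before verifying will usually produce a mismatch.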
Services: billing-service, customer-portal, webhook-dispatcher, api-gateway (billing/webhook endpoints), dlr-processor (billing + webhook emission)
Frontend: customer-portal (Next.js)
Dependencies: M1 complete; Stripe/payment integration for invoice payment (or manual initially)
Risks & mitigations:
| Risk | Impact | Mitigation |
|---|---|---|
| Billing event double-counting | Revenue integrity | Redis exactly-once dedup key; reconciliation job |
| Customer portal UX friction | Onboarding drop-off | Iterative UX testing with 3 beta customers |
| Webhook endpoint unreliable | Customer trust | Retry with backoff; dead-letter + admin visibility |
| Invoice generation race condition | Duplicate invoices | PostgreSQL advisory lock; idempotent cron |
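The double-counting mitigation amounts to recording each billing event at most once per message. The sketch below models that with an in-memory set standing in for a Redis set-if-absent check. The class, the key format, and the event shape are hypothetical.

```typescript
interface BillingEvent {
  messageId: string;
  accountId: string;
  segments: number;
}

// In-memory stand-in for a Redis set-if-absent dedup check.
class BillingLedger {
  private readonly seen = new Set<string>();
  private readonly segmentsByAccount = new Map<string, number>();

  // Returns true if the event was recorded, false if it was a duplicate.
  record(ev: BillingEvent): boolean {
    const dedupKey = `billing:dedup:${ev.messageId}`; // hypothetical key format
    if (this.seen.has(dedupKey)) return false;
    this.seen.add(dedupKey);
    const prev = this.segmentsByAccount.get(ev.accountId) ?? 0;
    this.segmentsByAccount.set(ev.accountId, prev + ev.segments);
    return true;
  }

  total(accountId: string): number {
    return this.segmentsByAccount.get(accountId) ?? 0;
  }
}
```

A redelivered DLR for the same `messageId` is then a no-op, which is what lets the nightly reconciliation job compare ledger totals against the event log.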
Acceptance criteria:
- Customer signs up (Keycloak registration flow), creates API key, sends test SMS, sees DLR in message log
- Webhook delivered with valid HMAC signature within 5s of DLR
- Monthly invoice generated with correct segment count and pricing; blocked/held/expired messages are NOT billed
- Webhook retry exhaustion routes to dead-letter; visible in customer portal
- CSV export of message logs works for 10k+ records
- Billing usage API returns correct totals matching event log
- Compliance enforcement is on: seeded rule set has at least one `BLOCK` keyword and one `HOLD` rule; triggering each from customer-portal test SMS produces the expected terminal state and portal notification
- Admin reviewer can `RELEASE` a held message from admin-dashboard and see it subsequently billed on DLR
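The billing criteria above (blocked, held, and expired messages are not billed; a released message is billed once its DLR arrives) can be sketched as a filter over usage rows. The outcome names and the row shape are hypothetical, and whether a failed delivery is billable is not stated here, so this sketch bills on any received DLR.

```typescript
// Hypothetical outcome labels; the real state names live in the service specs.
type ComplianceOutcome = "PASSED" | "RELEASED" | "BLOCKED" | "HELD" | "EXPIRED";

interface UsageRow {
  messageId: string;
  segments: number;
  outcome: ComplianceOutcome;
  dlrReceived: boolean;
}

const NON_BILLABLE = new Set<ComplianceOutcome>(["BLOCKED", "HELD", "EXPIRED"]);

// Sum segments only for messages that cleared compliance and produced a DLR.
function billableSegments(rows: UsageRow[]): number {
  return rows
    .filter((r) => r.dlrReceived && !NON_BILLABLE.has(r.outcome))
    .reduce((sum, r) => sum + r.segments, 0);
}
```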
Release checklist:
- Customer portal deployed behind Keycloak OIDC SSO (default realm); no Firebase dependency in customer portal
- Billing service reconciliation job runs nightly and includes compliance-waiver reconciliation
- Webhook HMAC verification documented in customer docs
- Pricing rules seeded for beta tier; compliance seed rules reviewed by Trust & Safety
- 5 beta customers onboarded and sending production traffic
- Compliance hold-queue SLO dashboard deployed (queue depth, oldest pending, reviewer response time)
4.4 M3 — Competitive Differentiator (Sprint 9–11)
Capabilities delivered:
- Admin dashboard: operator CRUD, routing rule management, user/role management, message logs, billing overview, system health
- Operator management service: Vault-secured credentials, audit trail, health propagation
- Advanced routing: per-account overrides, cost-based selection, priority, round-robin
- Analytics pipeline: event ingestion, hourly/daily aggregation, operator + account metrics
- SMPP connector observability: metrics, structured logging, tracing
Services: admin-dashboard, operator-management-service, routing-engine (advanced), smpp-connector (observability), analytics-service (pipeline)
Frontend: admin-dashboard (Next.js)
Dependencies: M2 complete; Vault configured for operator credentials
Risks & mitigations:
| Risk | Impact | Mitigation |
|---|---|---|
| Vault availability | Operator connections fail | Fallback to K8s Secrets (degraded mode) |
| Routing rule complexity | Edge-case bugs | Property-based testing on rule engine |
| Admin dashboard scope creep | Delayed delivery | Strict epic scope; defer nice-to-haves to M5 |
Acceptance criteria:
- Admin creates operator via dashboard; SMPP connector binds within 30s
- Routing rule change reflected in next routing decision within 5s
- Cost-based routing selects cheapest operator for given prefix
- Operator health event disables operator in routing within 10s
- Analytics dashboard shows correct hourly aggregates within 2 hours of events
- Credential rotation via Vault does not drop SMPP session
- Tenant compliance score recomputed every 15 min; tier transitions emit `compliance.tenant.tier.changed.v1`; SUSPENDED → auto-HOLD observed end-to-end
- Enterprise tenant onboarded via Azure AD OIDC (or Okta SAML) in staging: admin registers discovery URL / metadata, user SSOs into portal, platform JWT issued with `idp=tenant-oidc:<tenantId>` claim, `auth.external_identity.linked.v1` emitted
- SCIM push from tenant IdP provisions 100 users into Keycloak + mirrors into `auth.users` within 5 s
- Admin-dashboard compliance console: author rule, assign to tenant, observe verdict change in real time via SSE
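The cost-based routing criterion, combined with the health-event criterion, reduces to choosing the cheapest operator that is currently healthy for a prefix. A minimal sketch follows, assuming the candidates for the prefix have already been resolved. `OperatorQuote` and `cheapestHealthy` are hypothetical names, and per-segment pricing is an assumed unit.

```typescript
interface OperatorQuote {
  operatorId: string;
  costPerSegment: number; // assumed unit: price per SMS segment
  healthy: boolean;
}

// Choose the cheapest healthy operator; on a price tie the earlier candidate wins.
function cheapestHealthy(candidates: OperatorQuote[]): OperatorQuote | undefined {
  return candidates
    .filter((c) => c.healthy)
    .reduce<OperatorQuote | undefined>(
      (best, c) => (best === undefined || c.costPerSegment < best.costPerSegment ? c : best),
      undefined,
    );
}
```

An operator disabled by a health event is excluded even when it is the cheapest, so the selection degrades to the next-cheapest healthy operator rather than failing.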
Release checklist:
- Admin dashboard deployed behind Keycloak realm with `platform.*` roles; per-environment admin client configured
- Vault policies configured for operator-management-service and Keycloak Admin REST credentials
- Analytics aggregation cron verified over 7-day window
- 20+ customers migrated to tiered pricing
- At least one enterprise tenant on external OIDC SSO in production
- Runbooks: "operator credential rotation", "routing rule misconfiguration", "tenant IdP onboarding", "tenant IdP emergency disable", "tenant compliance score override"
4.5 M4 — Full Platform GA (Sprint 12–14)
Capabilities delivered:
- Notification service: welcome emails, invoice emails, operator alerts, system alerts, delivery preferences
- Analytics query API with Redis caching
- Data retention policies enforced (hourly → daily → archive → purge)
- GDPR compliance: account erasure flow
- Full observability: every service has health probes, Prometheus metrics, OpenTelemetry traces, Grafana dashboards, Loki log aggregation
- HPA tuning: load-tested and validated for 500 msg/s sustained
- Security audit: mTLS between services, Vault for all secrets, no PII in logs
- Accessibility: keyboard navigation, dark mode (admin dashboard)
Services: notification-service, analytics-service (query API + retention), dlr-processor (observability), all services (hardening)
Frontend: Both portals polished; a11y audit clean.
Dependencies: M3 complete; security audit scheduled
Acceptance criteria:
- Welcome email sent on customer signup within 60s
- Invoice email sent on invoice generation
- Operator-down alert reaches admin within 30s
- Analytics API returns cached results in < 100ms
- Data retention: hourly data older than 90d archived; raw events > 365d purged; `compliance.audit_log` retained ≥ 13 months with partition pruning verified
- GDPR erasure deletes all PII within 72h of request across `auth` AND `compliance` schemas (hold-queue body/to redacted; audit log pseudonymised per regulator guidance)
- Load test: 500 msg/s sustained for 30 min with compliance enforcement on, AI classification on ≥ 30% of traffic; P99 < 500ms end-to-end; 0 message loss
- Local LLM P95 inference ≤ 300ms; AI cache hit rate ≥ 70% after 24h warm-up
- TENANT_AUDIT report generates for a 90-day window in ≤ 5 minutes and is regulator-accepted in dry-run review
- All services pass security checklist
- All runbooks reviewed and rehearsed
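The data-retention criterion above can be read as a small decision table keyed on data class and age. The sketch below encodes the thresholds as stated (90 days, 365 days). The type and function names are hypothetical, and because the criterion only sets a 13-month minimum for the compliance audit log, this sketch never purges it.

```typescript
type DataClass = "hourly_aggregate" | "raw_event" | "compliance_audit_log";
type RetentionAction = "keep" | "archive" | "purge";

// Decide what a retention job should do with a record of a given age.
function retentionAction(kind: DataClass, ageDays: number): RetentionAction {
  switch (kind) {
    case "hourly_aggregate":
      return ageDays > 90 ? "archive" : "keep";
    case "raw_event":
      return ageDays > 365 ? "purge" : "keep";
    case "compliance_audit_log":
      // Retained for at least 13 months; handling beyond that is not specified
      // here, so this sketch always keeps it.
      return "keep";
  }
}
```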
Release checklist:
- GA launch checklist signed by engineering + product + security
- Status page configured
- On-call rotation established
- SLA documentation published
- Customer documentation site live
4.6 M5 — Post-GA Expansion (Sprint 15–18)
Capabilities delivered:
- Hot-reload SMPP operator config (SMPP-US-015)
- Multi-currency billing (localized pricing per market)
- Real-time analytics streaming (WebSocket / SSE)
- API v2 planning and deprecation strategy
- Additional MNO integrations (new markets)
- Performance optimization: connection pooling, batch processing
Services: All services (incremental improvements)
Acceptance criteria:
- Operator config change applies without SMPP session drop
- Invoice generated in customer's designated currency
- Real-time analytics dashboard updates within 5s of event
- 3+ new MNO operators integrated
5. Competitive Positioning
5.1 Sequencing Advantage
| Phase | What we have | What competitors lack |
|---|---|---|
| M1 (Week 10) | Working SMS API with DLR | Most competitors ship API-only without DLR tracking |
| M2 (Week 16) | Self-service portal + billing + webhooks | Competitors require manual onboarding; no webhook DLR |
| M3 (Week 22) | Intelligent routing + admin tooling + analytics | Commodity APIs have no routing intelligence or admin visibility |
| M4 (Week 28) | GA with SLAs, compliance, full observability | Small competitors lack compliance tooling; large ones are slow |
| M5 (Week 36) | Multi-currency, hot-reload, real-time analytics | Operational excellence that compounds with scale |
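The week numbers in the table follow from the 2-week sprint cadence: a milestone ends in the week equal to twice its final sprint number. A short check of that arithmetic follows; the helper and constant names are illustrative only.

```typescript
const WEEKS_PER_SPRINT = 2;

// Last calendar week covered by a sprint, counting from week 1.
function endWeekOfSprint(sprint: number): number {
  return sprint * WEEKS_PER_SPRINT;
}

// Final sprint of each milestone, taken from the milestone headings.
const lastSprint: { [milestone: string]: number } = { M1: 5, M2: 8, M3: 11, M4: 14, M5: 18 };

for (const milestone of Object.keys(lastSprint)) {
  console.log(`${milestone}: week ${endWeekOfSprint(lastSprint[milestone])}`);
}
// M1: week 10, M2: week 16, M3: week 22, M4: week 28, M5: week 36
```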
5.2 Key Differentiators
- Intelligent routing — Cost-based, priority, round-robin, health-aware failover. Most commodity SMS APIs route statically.
- Real-time operator failover — SMPP health events propagate to routing within 10s. Competitors often require manual intervention.
- Self-service everything — Customer portal eliminates sales friction. Admin dashboard eliminates ops tickets.
- Webhook-first DLR delivery — Customers get proactive DLR callbacks instead of polling.
- Telecom-grade reliability — NATS JetStream with durable consumers, exactly-once billing, idempotent processing, DLQ with alerting.
- Multi-tenant from day one — Every query, every event, every log scoped to `account_id`. Enterprise customers get isolation guarantees.
6. Timeline (ASCII Gantt)
Week: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Sprint: S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18
M0 Foundation
|████████|
S1 S2
M1 Core SMS Flow
|████████████████|
S3 S4 S5
↑ First SMS delivered
M2 First Sellable Product
|████████████████|
S6 S7 S8
↑ First revenue
M3 Competitive Differentiator
|████████████████|
S9 S10 S11
↑ Enterprise-ready
M4 Full Platform GA
|████████████████|
S12 S13 S14
↑ GA Launch
M5 Post-GA Expansion
|████████████████████████|
S15 S16 S17 S18
═══ SERVICE TIMELINE ═══
Keycloak (infra) |████|
auth-service |████████████████████████████████████| ← IdP abstraction, SSO onboarding, SCIM, Firebase retirement
api-gateway (Kong) |████████████|
compliance-engine |████████████████████████████████| ← scaffold M0; rules+pipeline M1; hold-queue+mgmt M2; scoring M3; AI+reporting M4
compliance-ai (LLM) |████████████| ← M4 onward
sms-orchestrator |████████████████| ← core M1; compliance integration M1; release-path M2
routing-engine |████████████████████████|
smpp-connector |████████████████████████████|
dlr-processor |████████████████████████████████|
billing-service |████████████████████| ← compliance waiver added M2
customer-portal |████████████████████| ← compliance views added M2
webhook-dispatcher |████████████████|
operator-mgmt-svc |████████████|
admin-dashboard |████████████████| ← compliance console added M3
analytics-service |████████████████████| ← compliance analytics added M3
notification-service |████████████████████| ← hold/block alerts brought forward to M2
═══ PARALLEL WORKSTREAMS ═══
Stream A (Core): auth/Keycloak → Kong → orchestrator → smpp → dlr
Stream B (Commerce): billing → portal → webhooks
Stream C (Admin): ops-mgmt → admin-dash → analytics
Stream D (Trust/Safety): compliance-engine → rules → hold-queue → scoring → AI
Stream E (Identity): IdP abstraction → tenant OIDC/SAML SSO → SCIM → Firebase retirement
Stream F (Ops): notif → hardening
7. Sprint-to-Milestone Mapping (Summary)
| Sprint | Milestone | Primary Focus |
|---|---|---|
| Sprint 1 | M0 | K8s, DB, NATS, Redis, shared packages; Keycloak HA deployment + realm bootstrap; auth-service + IdentityProvider port scaffold; compliance-engine service scaffold |
| Sprint 2 | M0 | Auth hardening, API-key lifecycle, Kong plugins (ADR-0001); compliance-engine compliance schema + mTLS + health; proto + schema freeze |
| Sprint 3 | M1 | Kong SMS send routes, sms-orchestrator core, routing-engine gRPC; compliance-engine EvaluateCompliance handler + KEYWORD/SENDER_ID/GEO rule types |
| Sprint 4 | M1 | SMPP connector (bind, submit_sm, TPS); orchestrator retry + DLQ; orchestrator ↔ compliance gRPC integration (observation mode) |
| Sprint 5 | M1 | DLR processor, SMPP DLR handling, E2E SMS flow validation; compliance fail-closed chaos test + observation-mode rollout sign-off |
| Sprint 6 | M2 | Billing event ingestion + pricing, customer-portal Keycloak SSO + API keys; compliance rule + rule-set management REST + hold-queue insertion |
| Sprint 7 | M2 | Webhook dispatcher, customer portal (test SMS, message logs); hold-queue admin review (RELEASE/REJECT), notification-service compliance-event consumer |
| Sprint 8 | M2 | Invoice generation, billing dashboard, webhook management, beta launch; billing compliance-waiver consumer, customer-portal compliance states + appeals, compliance enforcement turned on |
| Sprint 9 | M3 | Operator-management service, admin-dashboard Keycloak SSO + operator CRUD; auth-service tenant external OIDC SSO (AUTH-EPIC-005) |
| Sprint 10 | M3 | Advanced routing rules, admin routing management, analytics ingestion; auth-service tenant SAML SSO (AUTH-EPIC-006) + SCIM (AUTH-EPIC-007); compliance scoring worker (EP-CE-05) |
| Sprint 11 | M3 | Admin message logs, billing overview, system health, analytics aggregation; admin-dashboard compliance console (ADMDASH-EPIC-008); first enterprise tenant onboarded on external SSO |
| Sprint 12 | M4 | Notification service core, analytics query API; local-LLM (compliance-ai) deployment + provider abstraction (EP-CE-09 core) |
| Sprint 13 | M4 | Data retention, GDPR erasure across auth + compliance, DLR observability; AI_CLASSIFICATION + DLR_ABUSE rule types finished; compliance report generation + audit-log query (EP-CE-06) |
| Sprint 14 | M4 | Security audit, load testing (500 msg/s with compliance + AI on), a11y, GA launch checklist |
| Sprint 15 | M5 | Hot-reload SMPP config, multi-currency billing groundwork; compliance classification accuracy harness (US-CE-041) |
| Sprint 16 | M5 | Real-time analytics, additional MNO integrations; Firebase legacy retirement kickoff — migration wizard + communication |
| Sprint 17 | M5 | API v2 planning, performance optimization; Firebase legacy retirement execution — bulk migrate residual users to Keycloak |
| Sprint 18 | M5 | Polish, documentation, geographic expansion prep; remove FirebaseLegacyProvider adapter; close out legacy tenant migration |
8. New Epics Added in v1.1 — At a Glance
Identity (2 new epics, +2 superseding renames):
| Epic | Service | Intent | Milestone |
|---|---|---|---|
| AUTH-EPIC-001 (rebaselined) | auth-service | Keycloak baseline (replaces Firebase baseline) | M0 |
| AUTH-EPIC-002 (new) | auth-service | IdP provider abstraction | M0 |
| AUTH-EPIC-005 (new) | auth-service | Tenant external OIDC SSO (brokered) | M3 |
| AUTH-EPIC-006 (new) | auth-service | Tenant external SAML 2.0 SSO (brokered) | M3 |
| AUTH-EPIC-007 (new) | auth-service | SCIM 2.0 inbound provisioning | M3 |
| AUTH-EPIC-008 (new) | auth-service | Firebase legacy retirement | M5 |
Compliance (10 new epics):
| Epic | Service | Intent | Milestone |
|---|---|---|---|
| EP-CE-01 | compliance-engine | Service Foundation | M0 |
| EP-CE-02 | compliance-engine | Rule Engine Core (10 rule types) | M1 (subset) → M2 (remainder) → M4 (AI + DLR) |
| EP-CE-03 | compliance-engine | Hold Queue & Manual Review | M2 |
| EP-CE-04 | compliance-engine | Rule & Blocklist Management API | M2 |
| EP-CE-05 | compliance-engine | Tenant Scoring & Risk Tiering | M3 |
| EP-CE-06 | compliance-engine | Reporting & Audit | M4 |
| EP-CE-07 | compliance-engine | Observability & Production Hardening | M1 (subset) → M3 (remainder) |
| EP-CE-08 | compliance-engine + sms-orchestrator | Async Pipeline Integration | M1 |
| EP-CE-09 | compliance-engine + compliance-ai | Local LLM Platform | M4 (core) → M5 (accuracy harness) |
| EP-CE-10 | compliance-engine + customer-portal + admin-dashboard | Tenant-Facing Web Portal Integration | M2 (subset) → M3 (subset) |
Consumer-side epics for compliance events:
| Epic | Service | Milestone |
|---|---|---|
| ORCH-EPIC-005 | sms-orchestrator | M1 |
| BILL-EPIC-005 | billing-service | M2 |
| CUST-EPIC-007 | customer-portal | M2 |
| ADMDASH-EPIC-008 | admin-dashboard | M3 |
| ANLYT-EPIC-005 | analytics-service | M3 |
End of Roadmap.