
Ghasi-SMS-Gateway — Critique & Gap Analysis

Version: 1.1 (re-scanned baseline)
Status: Approved (baseline critique)
Owner: Platform Architecture + Trust & Safety + SRE
Last Updated: 2026-04-20
Scope reviewed: docs/01-…14-*.md, docs/architecture/ADR-000{1,2,3,4}, all 14 service docs under services/<svc>/ (incl. _report.md, SERVICE_OVERVIEW.md, JIRA_IMPORT.csv), docs/standards/, docs/roadmap/ROADMAP.md.

Change log

  • v1.1 (2026-04-20) — Re-scanned the repo after correcting an inventory error. The earlier draft overstated the stub problem (it claimed 11 of 14 services were stubs and that 07-epics-and-user-stories.md was empty). Both are wrong: every service has a populated SERVICE_OVERVIEW.md (3KB–12KB) and a _report.md carrying full Jira-ready epics & stories under the EP-{PREFIX}-NN / US-{PREFIX}-NNN scheme; 07-…md is populated with five cross-cutting platform epics (EP-PLAT-01..05). The critique below is recalibrated to that real baseline. Architectural / NFR / national-backbone critique stands.
  • v1.0 (2026-04-20) — Initial draft.

Companion deliverables:


0. Verdict

The platform has a strong micro-architectural skeleton (Kong → orchestrator → compliance → routing → SMPP, NATS JetStream backbone, fail-closed compliance, Keycloak-brokered IdP) and a substantial Jira-ready backlog already authored (60+ epics, 200+ stories, consistent IDs across 14 services). It is materially better than the average vendor starting point.

What is missing is not engineering discipline — it's the telecom-grade national-asset DNA: multi-region resilience, regulator integration, fraud / AIT / SIM-box defence, sender-ID national registry, HLR/MNP authority, cell-broadcast for emergencies, multi-channel fallback, sovereign HSM key custody, service-mesh zero trust, quantitative NFR/SLA catalog, and the operational machinery (NOC, chaos drills, status page, change management) that distinguishes an enterprise SaaS from a national backbone.

One-line summary: strong foundations, mature backlog, but architected for SaaS, not for a national asset; the gap to Twilio/Infobip/Sinch parity is now small — the gap to surpassing them is closed by the additions in ADR-0004 and the new epics in 007.


1. What is Strong (keep, do not regress)

| # | Strength | Evidence | Why it matters |
|---|---|---|---|
| 1 | Mature, ID-consistent backlog already exists across 14 services using EP-{PREFIX}-NN and US-{PREFIX}-NNN | services/*/_report.md: 60+ epics, 200+ stories, all with acceptance criteria and story points | Most "national backbone" pitches do not have this much actually written down. |
| 2 | First-class fail-closed Compliance Layer between orchestrator and routing | 01 §3.2, §4; compliance-engine is the most fully-specified service (12.6 KB SERVICE_OVERVIEW, 763-line _report.md, dedicated JIRA_IMPORT.csv with 10 epics + 45 stories) | Twilio / Infobip treat moderation as a sidecar; you treat it as a tier. Most differentiated control today. |
| 3 | Pluggable IdP abstraction with Keycloak as broker and per-tenant OIDC/SAML | 01 §3.1, ADR-0002 | Enterprise customers (banks, ministries) demand SSO; Twilio's federation story is weaker. |
| 4 | NATS JetStream as the only async substrate, with explicit DLQ + durable consumer policy | 01 §6, 13 §8 | Avoids the Kafka operational tax; correct choice for sub-30 ms inter-service hops. |
| 5 | gRPC for hot paths (Routing Engine SelectOperator, Compliance EvaluateCompliance) with REST for admin/CRUD | 01 §3.2; ADR-0002, ADR-0003 | Right choice for sub-50 ms decisions. |
| 6 | 17-doc service template + Definition of Done with mutation-test thresholds, RLS/tenant guards, AI-provenance VOs, WCAG 2.2, ICU MessageFormat | standards/SERVICE_TEMPLATE.md, standards/DEFINITION_OF_DONE.md | Better engineering discipline than most regional competitors ship with. |
| 7 | Real telecom rigour already authored in smpp-connector/_report.md (bind modes, enquire_link heartbeat, exponential backoff, GSM7/UCS2 encoding, CSMS segmentation, message_payload TLV, message correlation persistence, Redis sliding-window TPS, primary/backup failover) | services/smpp-connector/_report.md US-SC-001..015 | Far stronger than the typical "build SMPP later" placeholder. |
| 8 | 13-month immutable compliance audit log | 01 §6 retention notes; compliance-engine/DATA_MODEL.md | Telecom-regulator-defensible. |
| 9 | Tenant compliance scoring (0–100) + risk tiering + automated thresholds | 13 §4; EP-CE-05 epic | Twilio has none of this surfaced to tenants. |
| 10 | mTLS for gRPC, HMAC for webhooks, explicit rotation cadence per secret class | 13 §5, §7 | Enterprise table stakes done right. |
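
To make strength 10 concrete, here is a minimal sketch of webhook HMAC signing that tolerates key rotation. The signed-message layout (`<timestamp>.<body>`), the 300-second replay tolerance, and the function names are illustrative assumptions, not taken from 13 §5 or §7.

```python
import hashlib
import hmac
import time


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (layout is an assumption)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secrets, body, timestamp, signature, *, now=None, tolerance_s=300):
    """Accept a signature made with any currently valid secret.

    `secrets` holds the active key plus any not-yet-retired previous key,
    so receivers keep working while a rotation is in flight.
    """
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # stale or future-dated: treat as a possible replay
    return any(
        # compare_digest avoids leaking the match position through timing
        hmac.compare_digest(sign_webhook(s, body, timestamp), signature)
        for s in secrets
    )
```

A receiver would call `verify_webhook([new_key, old_key], body, ts, sig)` during the overlap window and drop `old_key` once the rotation cadence has elapsed.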

2. What is Weak (must improve before national-backbone GA)

2.1 Documentation rot and contradictions (real, post re-scan)

The "stub" claim from v1.0 was wrong. The remaining real issues:

  • docs/02-ddd-bounded-contexts.md is mis-titled — its content is the testing strategy, not bounded contexts. Either rename or write the actual context map. The DDD context map is referenced everywhere but exists nowhere.
  • docs/04, 05, 08, 09, 10, 11, 12, 14 are 17-line placeholders (04 event-driven-architecture, 05 api-design, 08 frontend-design-guidelines, 09 frontend-workflows, 10 data-models, 11 risks-and-tradeoffs, 12 observability-telemetry, 14 testing-strategy-qa). None of them is optional for a national backbone. (The files exist as targeted topical docs; their missing content is a documentation gap, not a backlog gap.)
  • Auth contradiction with the customer portal: customer-portal/_report.md US-CUST-01-01 still says login uses Firebase (Firebase Auth signInWithEmailAndPassword, POST /v1/auth/firebase). This contradicts ADR-0002 / PLT-ADR-009 (Keycloak baseline; Firebase is legacy-only). → Update story to Keycloak OIDC PKCE flow with Firebase as a feature-flagged legacy fallback.
  • auth-service/_report.md scope header says: "Firebase federation, session management, and account provisioning". Stories are Keycloak-aware but the scope sentence drifts — clean it up.
  • compliance-engine epic-ID mismatch: services/compliance-engine/_report.md uses canonical IDs EP-CE-01..10 and US-CE-001..045. The same service's services/compliance-engine/JIRA_IMPORT.csv uses CE-E1..10 and CE-1..45. One source of truth must win. Recommendation: EP-CE-* / US-CE-* is canonical (matches the platform EP-{PREFIX}-NN / US-{PREFIX}-NNN registry in 07-…md). The CSV must be regenerated.
  • _sources/<svc>/epics.md legacy artifacts still contain pre-Keycloak language (e.g., auth-service/_sources/auth-service/epics.md calls AUTH-EPIC-001 "Firebase Integration & Account Provisioning"). They are clearly legacy migration aids per README.md, but they should be deleted or marked DEPRECATED to prevent future confusion.

2.2 Architecture: missing layers (national-backbone deltas)

The container diagram is missing tiers a national backbone requires:

| Missing tier | Why it must exist | New service proposed |
|---|---|---|
| National SMS Firewall (inbound + transit) | Detect and block grey-route A2P, AIT (Artificially Inflated Traffic), SIM-box originators, fraudulent OTP harvesting, inbound spam at the border. Today the platform mediates only outbound to MNOs; there is no inbound firewall. | sms-firewall-service |
| HLR/HSS lookup + MNP authority | Without HLR/MSISDN-to-MNO resolution and Mobile Number Portability, "operator selection" is a guess. Twilio buys this; you must own it because Afghanistan has no neutral MNP authority, and Ghasi can be it. | number-intelligence-service |
| Sender-ID Registry | India's DLT, UAE's TRA, KSA's CITC, US 10DLC all require a registered-sender model. ATRA does not yet; Ghasi should ship the registry. | sender-id-registry-service |
| Cell Broadcast / Emergency Alerts adjunct | A national gateway is the natural home for 3GPP TS 23.041 / ETSI EN 302 117 cell-broadcast for civil alerts. Requires MNO RAN integration; unaddressed. | cbc-bridge-service |
| Lawful Intercept + Regulator Reporting | ATRA will require LI (ETSI TS 102 232) and periodic CDR reporting. Not in any doc. | regulator-portal-service + cdr-mediation-service |
| MMS / RCS / WhatsApp Business / Voice-OTP fallback channels | Prompt asks for "intelligent fallback (SMPP → HTTP → USSD)"; no channel beyond SMPP is in the architecture. | channel-router-service |
| CDR (Charging Data Record) pipeline distinct from billing events | Telecom CDR has specific schema (TAP 3.12, RAP) and is the artifact regulators audit. billing.events is not a CDR. | cdr-mediation-service |
| Number-pool / short-code lifecycle | Long-codes, short-codes, alpha-IDs, MSISDN inventory, leasing, reservation, expiry, recall: none modelled. | numbering-service |
| DND / Consent / STOP-keyword national ledger | Compliance has TEMPORAL/RECIPIENT rules but no national consent ledger; STOP keyword routing undefined. | consent-ledger-service |
| Fraud / AIT / SIM-box detection | Twilio loses ~3% of revenue to AIT; without prevention you will too. | fraud-intel-service |
| Multi-region active-active data plane + cross-border sovereign DR | Single region with "DR optional" (01 §10 A-001/A-002) is unacceptable for a national asset. | Topology in ADR-0004 |
| Tenant VPC peering / Direct Connect ingress | Banks and ministries will not accept messaging from the public internet. | Edge enhancement in ADR-0004 |
| Quota / TPS shaping engine: per-tenant × per-operator × per-shortcode × per-priority-lane | Today TPS is a single Redis namespace owned by smpp-connector. | New tier in routing-engine (epic added) |
| Government priority-lane + emergency override | Mentioned nowhere. | EP-PLAT-NB-08 (lane policy) + cbc-bridge-service |
| Verify / Lookup / Notify / Conversation APIs (Twilio parity) | Not in scope; required for developer adoption. | New epics under developer-portal-service + campaign-service |
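
The multi-dimensional quota / TPS shaping row can be illustrated with a small in-process sketch: one token bucket per (dimension, value) pair, and a message is admitted only if every applicable bucket has a token. The class names, the limits map, and the check-then-commit behaviour are assumptions for illustration; a production tier in routing-engine would need shared state (for example in Redis) rather than process memory.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/s up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst      # start full
        self.updated = 0.0

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now


class QuotaShaper:
    """Admit a message only if every configured dimension has capacity."""

    def __init__(self, limits):
        # limits maps (dimension, value) -> (rate per second, burst)
        self._buckets = {
            key: TokenBucket(rate, burst) for key, (rate, burst) in limits.items()
        }

    def admit(self, now: float, **dims: str) -> bool:
        # Dimensions without a configured limit are treated as unconstrained.
        buckets = [
            self._buckets[(d, v)] for d, v in dims.items() if (d, v) in self._buckets
        ]
        for bucket in buckets:
            bucket.refill(now)
        # Check first, then commit, so a rejection consumes no tokens anywhere.
        if any(bucket.tokens < 1.0 for bucket in buckets):
            return False
        for bucket in buckets:
            bucket.tokens -= 1.0
        return True
```

Usage would look like `shaper.admit(time.monotonic(), tenant="bank-a", operator="mno-1", lane="P1")`; the tightest dimension decides, and a rejection on the operator bucket does not burn the tenant's allowance.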

2.3 Architecture: weak choices

  • Single-region Postgres with "HA replica" is not telecom-grade. Postgres needs (a) multi-AZ synchronous + multi-region async (Patroni or managed equivalent), (b) a CDR/event store on object storage with a separate cold-tier query layer (ClickHouse). The note in 01 §10 A-003 ("ClickHouse is optional scaffolding") must be promoted to a hard requirement.
  • smpp-connector as a single StatefulSet with 2 replicas is naive at national scale. Each MNO bind needs (i) per-bind TPS shaping, (ii) per-bind windowing, (iii) per-bind sequence-number management, (iv) per-bind enquire_link cadence, (v) per-bind reconnection backoff with jitter, (vi) per-bind concatenation buffers, (vii) per-bind UDH/TLV handling. The current _report.md already covers most of these as logic (US-SC-001..015) — what is missing is the deployment topology: per-MNO per-direction Deployment/StatefulSet pools with bind affinity. → EP-SC-05 and EP-SC-06 added.
  • Single NATS JetStream cluster with no cross-region mirror (01 §3, 03 §2). For a regulated national service, JetStream needs leaf-node + mirror replication into a DR region. → ADR-0004 §13.
  • Routing Engine today exposes COST / PRIORITY / FAILOVER strategies (US-RE-005..007) — a strong start. Missing: live operator quality scoring (delivery rate, latency, cost), per-route cost tables that vary by hour/day, per-tenant route preferences and exclusions, regulatory route restrictions, gray-route exclusion, QoS lanes (OTP < 3 s, marketing best-effort, government priority). → EP-RE-05 and EP-RE-06 added.
  • compliance-engine AI fallback to "external LLM" opens a data-residency hole (SCT-001, SCT-003) acknowledged but not closed. For Afghanistan, all PII-bearing inference must remain on-prem; external fallback must be feature-flagged off by default and per-tenant opt-in only. → EP-CE-09 (Local LLM Platform) is in place; add story to disable external fallback by default and require explicit per-tenant opt-in.
  • No long-SMS encoding/segment-pricing alignment with Pashto/Dari (UCS-2 → 70 chars/segment vs. GSM7 → 160). smpp-connector handles encoding correctly (US-SC-007); billing-service does not have a story for UCS-2 segment-pricing parity — risk: customers billed wrong on Pashto/Dari. → New story US-BILL-037.
  • webhook-dispatcher retries are not specified with explicit back-off, signing-key rotation, or back-pressure policy. Existing EP-HOOK-01..04 cover delivery and HMAC but not back-pressure under stampede. → EP-HOOK-05 added.
  • api-gateway (Kong) has 29 stories (US-KONG-01..29) — strong. Missing: JA3 fingerprint blocking, adaptive rate-limit per consumer+key dimensions, mTLS upstream policy spec for sensitive routes (compliance, regulator). → EP-KONG-06 added.
  • No idempotency contract for inbound DLR in dlr-processor. Operators sometimes re-deliver DLR PDUs. → US-DLR-015 added (dedup by (operator_id, message_id, status, timestamp_bucket)).
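
The UCS-2 versus GSM7 pricing risk above comes down to segment arithmetic. The sketch below counts billable segments under the standard limits (160 septets single / 153 per concatenated part for GSM7, 70 code units single / 67 per part for UCS-2). The function name and return shape are illustrative assumptions, not the contents of US-SC-007 or US-BILL-037, and national language shift tables are not handled.

```python
import math

# GSM 03.38 default alphabet (one septet each).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters cost two septets (escape + character).
GSM7_EXT = set("^{}\\[~]|€\f")


def count_segments(text: str) -> tuple[str, int]:
    """Return (encoding, billable segment count) for a message body."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXT for ch in text):
        units = sum(2 if ch in GSM7_EXT else 1 for ch in text)
        encoding, single, multi = "GSM7", 160, 153
    else:
        # UCS-2/UTF-16 code units; characters outside the BMP count twice.
        units = len(text.encode("utf-16-be")) // 2
        encoding, single, multi = "UCS2", 70, 67
    # Simplification: ignores the rule that an escape pair or surrogate pair
    # may not be split across parts, which can add one segment at a boundary.
    segments = 1 if units <= single else math.ceil(units / multi)
    return encoding, segments
```

A 72-character Pashto or Dari body therefore bills as two segments, while a 72-character Latin body bills as one, which is the parity billing-service has to reproduce exactly.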

2.4 NFRs and SLAs — almost entirely missing

The platform documents:

  • 90/80/60% test coverage thresholds.
  • gRPC P95 ≤ 500 ms for compliance evaluation.
  • Some per-service P95 latency targets (e.g., orch GET P95 ≤ 50 ms, billing usage P95 ≤ 300 ms, RBAC P95 ≤ 50 ms).
  • "30 days Prometheus retention", "14 days Loki", "7 days OTel".

Those are essentially all the quantitative NFRs the platform commits to. Missing:

  • Throughput targets (msg/s steady, msg/s burst, peak-hour, peak-second).
  • Submit-to-DLR P50/P95/P99 latency budgets per traffic class (OTP, transactional, marketing, broadcast).
  • DLR-receipt SLA per operator.
  • Availability SLOs (99.9 / 99.95 / 99.99) per service tier.
  • Error-budget policy.
  • RPO/RTO commitments (A-002 admits "TBD; assumed RPO 1h / RTO 4h" — for OTP traffic, RPO 1h is unacceptable).
  • Per-tenant quotas / fair-use defaults.
  • Per-priority-lane SLAs.
  • Concurrency, queue-depth, and back-pressure thresholds.
  • Maximum acceptable compliance HOLD review latency.
  • Maximum acceptable webhook-delivery time.
  • TPS-shaping precision and burst policy per MNO contract.

Fixed by EP-PLAT-NB-09 — NFR/SLA Catalog & Error-Budget Policy in 007 and a new 15-nfr-sla-catalog.md doc (to be authored).
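
As a worked example of what an availability SLO and error-budget policy imply, the sketch below converts the three candidate SLOs into allowed downtime and computes how much budget a period has left. The 30-day window and the function names are assumptions; the catalog itself is still to be authored.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of full outage an SLO tolerates over the window."""
    return (100.0 - slo_percent) / 100.0 * window_days * 24 * 60


def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    allowed_failures = (100.0 - slo_percent) / 100.0 * total_requests
    return 1.0 - failed_requests / allowed_failures


for slo in (99.9, 99.95, 99.99):
    print(f"{slo}% -> {error_budget_minutes(slo):.2f} min per 30 days")
```

Over 30 days, 99.9% leaves roughly 43 minutes, 99.95% roughly 22 minutes, and 99.99% a little over 4 minutes, which is why the tier chosen per service matters more than the headline number.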

2.5 Security gaps (national-asset bar)

GapRequired upliftEpic
No HSM / KMS-backed signing for platform JWT, SAML SP keys, webhook HMACPKCS#11 HSM (FIPS 140-2 L3) for: JWT signing, SAML SP keys, webhook HMAC root, SMS-content envelope keys. Vault stays for transit & lifecycle, HSM holds master.EP-PLAT-NB-04
No Zero-Trust east-west policy specifiedService-mesh mTLS + SPIFFE/SPIRE workload identities (Istio or Linkerd).EP-PLAT-NB-05
No DDoS / abuse defence at the edge beyond Cloudflare WAFPer-tenant per-API-key adaptive rate-limit, layer-7 fingerprinting, JA3 blocking, tarpit lane.EP-KONG-06
No threat model artifactSTRIDE per service under docs/security/threat-models/.EP-PLAT-NB-10
No SBOM, no signed imagesSigstore/Cosign image signing + SBOM (CycloneDX) per build, verified by Kyverno/Gatekeeper.EP-PLAT-NB-11
No CIS-Benchmarked node + Pod Security Standards profileAll workloads restricted PSA, runAsNonRoot, read-only root FS, seccomp RuntimeDefault.EP-PLAT-NB-11
No secrets-in-source CI gategitleaks + trufflehog as required CI step (claimed manually in 13 §7 but no enforcement).EP-PLAT-NB-11
Lawful intercept and SIEM forwarding undefinedSecurity-relevant events to a SIEM (Splunk/ELK/QRadar) with WORM retention.EP-REG-01
Customer-portal session security headers missingCSP, COEP, COOP, SRI, Trusted-Types, sub-resource integrity.EP-CUST-07

2.6 Operational excellence — partial

  • Grafana dashboards listed but no NOC dashboard (single pane: per-MNO bind health, queue depth, TPS shaping, DLR latency heatmap, compliance hold queue, fraud signals, regulator alerts). → EP-PLAT-NB-12.
  • No runbook catalogue. Compliance engine mentions runbooks; no other service does. → covered in DoD; track in EP-PLAT-NB-12.
  • No chaos engineering programme. → EP-PLAT-NB-13.
  • No capacity model. What does 10 M msg/h mean in NATS bytes/s, Postgres rows/s, Redis ops/s, Postgres WAL/h, cluster CPU cores, MNO TPS budget? Nowhere computed. → EP-PLAT-NB-09.
  • Status page not specified. → EP-PLAT-NB-14.
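
The capacity-model gap can be framed as simple arithmetic once per-message multipliers are agreed. In the sketch below only the 10 M msg/h figure comes from this critique; every multiplier (event size, events per message, rows per message, Redis operations per message) is a placeholder assumption to be replaced with measured values.

```python
def capacity(
    msgs_per_hour: float,
    *,
    avg_event_bytes: int = 1024,    # assumption: mean NATS event size
    events_per_msg: int = 4,        # assumption: events emitted per SMS
    rows_per_msg: int = 3,          # assumption: Postgres rows written per SMS
    redis_ops_per_msg: int = 5,     # assumption: Redis ops per SMS
) -> dict[str, float]:
    """Translate an hourly message target into per-second substrate load."""
    msg_per_s = msgs_per_hour / 3600
    return {
        "msg_per_s": msg_per_s,
        "nats_bytes_per_s": msg_per_s * events_per_msg * avg_event_bytes,
        "pg_rows_per_s": msg_per_s * rows_per_msg,
        "redis_ops_per_s": msg_per_s * redis_ops_per_msg,
    }


load = capacity(10_000_000)
print({name: round(value, 1) for name, value in load.items()})
```

10 M msg/h is about 2,778 msg/s steady state; peak-second load, WAL volume, CPU cores, and the MNO TPS budget would extend the same model with their own measured factors.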

2.7 Product / commercial gaps

  • billing-service has 36 stories (EP-BILL-01..05) — strong start, with usage queries, invoicing, pricing CRUD, alerts. Missing for national backbone:
    • SLA-backed pricing tiers (committed throughput, reserved capacity, government bulk, OTP premium).
    • Tax engine (VAT/national sales tax).
    • AFN-USD multi-currency + FX policy.
    • Pre-paid wallet + post-paid invoicing dual model.
    • Credit notes / refunds / dispute workflow.
    • Revenue assurance / leakage detection.
    • → EP-BILL-06 and EP-BILL-07 added.
  • No marketplace / template catalog — pre-approved templates (DLT-style) are how India and the Gulf solved spam at scale. → EP-CAMP-01..04 (new service).
  • No partner/reseller programme (sub-tenants under a tenant). → EP-AUTH-06 extension for sub-org model.
  • No SDKs named (Node, Python, Java, .NET, Go, PHP, Flutter, Android, iOS). → EP-DEV-01..04 (new service).
  • No public developer portal beyond customer-portal. → EP-DEV-01.
  • No template-based personalisation engine with merge-fields + conditional content (Twilio Notify equivalent). → EP-CAMP-02.
  • No campaign management UI (segments, schedule, A/B, throttle, kill-switch). → EP-CAMP-01.
  • No 2-way SMS / inbound MO routing to tenant flow specified. → EP-CHAN-03 (new service).
  • No conversational session manager (sticky alpha-ID ↔ MSISDN ↔ tenant correlation across MO/MT pairs). → EP-CHAN-04.
  • No Verify API (managed OTP, Twilio Verify equivalent). → EP-DEV-05 or sub-epic of channel-router-service.
  • No Lookup API (number intelligence as a tenant-callable API). → EP-NI-04.

2.8 Regulatory / sovereignty gaps

  • ATRA (Afghanistan Telecom Regulatory Authority) is not named anywhere; reporting cadence, CDR format, and licensing posture are undefined. → EP-REG-01..03.
  • Data residency policy is one open point (SCT-003); it must be a first-class policy, not a TODO. → EP-PLAT-NB-04 and ADR-0004 §5.
  • GDPR / TCPA / GSMA RCS-BM compliance is mentioned in a single paragraph; it needs an actual control catalogue. → covered by extending the existing EP-CE-06.
  • PII tokenisation (SCT-002) is open; for SMS bodies traversing compliance/AI, a deterministic tokeniser (e.g., FF1) with HSM-held key is the only defensible answer. → EP-PLAT-NB-04.
  • Number-portability legal posture unaddressed. → EP-NI-02.

2.9 Edge cases not in any flow

These are gaps in the flows, not always in the IDs. New stories added under appropriate epics:

  • Operator returns ESME_RTHROTTLED mid-window → new story under EP-SC-04.
  • Operator returns ESME_RSUBMITFAIL mid-window — half-close behaviour → new story.
  • DLR for unknown messageId (operator re-delivers from stale buffer) → US-DLR-015.
  • Concatenated-SMS partial DLR (segment 2 of 3 fails) — segment-aware DLR aggregation → new story under EP-DLR-05.
  • MO message arrives for a sender-ID never registered → EP-SID-03.
  • MNO emits Stop-keyword DLR (recipient opt-out) → EP-CONS-02.
  • Tenant tries to send to recipient on the DND registry → EP-CONS-01.
  • Tenant tries to send during a regulator-imposed quiet window → already in compliance-engine TEMPORAL but no national default rule-set → EP-CE-11 extension.
  • Tenant flips IdP mid-session — token-revocation propagation → EP-AUTH-06.
  • Compliance BLOCK for pre-credited tenant — refund/credit reversal → US-BILL-038 (new).
  • Webhook destination 5xx for >1 hour — circuit-break + tenant-portal alert → EP-HOOK-05.
  • Operator-ID renamed by MNO mid-day — config swap with zero in-flight loss → US-OPS-09 (new).
  • Nation-wide MNO outage — fallback to OTT (WhatsApp / Telegram / Signal-as-OTP) → EP-CHAN-01..02.
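
The DLR re-delivery case (US-DLR-015) uses the dedup key (operator_id, message_id, status, timestamp_bucket). A minimal in-memory sketch of that key follows; the bucket width, TTL, and class name are assumptions, and a real dlr-processor would hold the keys in a shared store (a Redis `SET ... NX EX` per key is one common pattern).

```python
class DlrDeduplicator:
    """Suppress repeated DLRs that share operator, message, status and time bucket."""

    def __init__(self, bucket_s: int = 60, ttl_s: int = 3600):
        self.bucket_s = bucket_s
        self.ttl_s = ttl_s
        self._seen: dict[tuple, float] = {}   # key -> expiry time

    def _evict(self, now: float) -> None:
        for key in [k for k, expiry in self._seen.items() if expiry <= now]:
            del self._seen[key]

    def is_duplicate(self, operator_id: str, message_id: str, status: str,
                     ts: float, now: float) -> bool:
        self._evict(now)
        # Caveat: two copies whose timestamps straddle a bucket boundary get
        # different keys and are both processed, so downstream handling
        # should still be idempotent.
        key = (operator_id, message_id, status, int(ts // self.bucket_s))
        if key in self._seen:
            return True
        self._seen[key] = now + self.ttl_s
        return False
```

Keeping `status` in the key lets a legitimate state change (for example an intermediate receipt followed by a final one) through while dropping byte-identical repeats from a stale operator buffer.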

2.10 Risk register essentially absent

docs/11-risks-and-tradeoffs.md is a 17-line stub; per-service SERVICE_RISK_REGISTER.md files exist via the 17-doc template but are not all populated. For a national-asset programme, the risk register is a board-level artifact and must be populated. → EP-PLAT-NB-15.


3. What is Unclear

| Area | Specific question that must be answered |
|---|---|
| Identity broker scope | Does Keycloak broker for every tenant or only enterprise SSO? Self-serve tenants today appear to use Keycloak directly; spec implies both. Clarify in auth-service/SERVICE_OVERVIEW.md. |
| Compliance verdict semantics | FLAG is named in §4 but its semantics (logged but allowed? logged + sample-routed for human review?) are not defined. |
| Idempotency-Key scope | Per (tenant, key) or global? Retention 48h is documented (US-ORCH-005); confirm conflict semantics for the same key with a different payload. |
| API versioning | "OpenAPI 3.1 with /v1/": deprecation policy N+2? Sunset header policy? |
| Tenant deletion | GDPR erasure mentioned for users; tenant deletion (cascade across 13 schemas) is not. |
| Multi-tenant Postgres | "per-service schema": within a schema, RLS is enforced (DoD §2). Confirm that all tenant-scoped tables across all services have an RLS policy and a contract test asserting it. |
| Currency | billing-service does not pin currency strategy. AFN, USD, multi-currency? FX policy? |
| Legal entity model | Tenant ≠ legal entity; one legal entity may own multiple tenants (sub-orgs). Not modelled. |
| Operator-side outbound IPs | MNOs whitelist source IPs; how does the platform present a stable egress? NAT gateway, dedicated egress pods? → ADR-0004 §6 begins to address. |
| Disaster mode | What is "graceful degradation" if compliance-engine is down for >1 h? Today it just queues; at national scale that becomes a regulator incident. → EP-PLAT-NB-08 (trusted-tenant fast-path) addresses the OTP slice. |
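
For the Idempotency-Key question, one possible answer is sketched below: keys scoped per (tenant, key), a stored payload digest, replay of the first response on an identical retry, and an explicit conflict when the same key arrives with a different payload. This is a proposal to react to, not the behaviour documented in US-ORCH-005; only the 48h retention comes from the source, and all names are hypothetical.

```python
import hashlib

RETENTION_S = 48 * 3600  # 48h retention, as documented in US-ORCH-005


class IdempotencyConflict(Exception):
    """Same (tenant, key) reused with a different payload."""


class IdempotencyStore:
    def __init__(self):
        # (tenant_id, key) -> (payload digest, stored response, expiry time)
        self._entries = {}

    def execute(self, tenant_id: str, key: str, payload: bytes, handler, now: float):
        digest = hashlib.sha256(payload).hexdigest()
        entry = self._entries.get((tenant_id, key))
        if entry is not None and entry[2] > now:
            stored_digest, response, _ = entry
            if stored_digest != digest:
                raise IdempotencyConflict(f"key {key!r} reused with a different payload")
            return response          # identical retry: replay, do not re-send
        response = handler(payload)  # first use, or the earlier entry expired
        self._entries[(tenant_id, key)] = (digest, response, now + RETENTION_S)
        return response
```

Scoping by tenant means two tenants can safely pick the same key; whether a conflict should map to HTTP 409 or 422 is a separate decision for the API design doc.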

4. What is Risky (top 10)

  1. Compliance fail-closed under MNO-OTP burst — A nationwide bank pushing OTPs would queue indefinitely if compliance-engine flaps. Need trusted-tenant fast-path (EP-PLAT-NB-08).
  2. Single-region everything — One DC outage = national SMS outage. Multi-region active-active is non-negotiable for the use case described. → ADR-0004 + EP-PLAT-NB-01..03.
  3. External-LLM PII leakage — Even "fallback only" is a regulator and reputational disaster waiting. Default off, per-tenant opt-in, audited. → US-CE-046 (new story under EP-CE-09).
  4. SMPP connector deployment topology — logic is good, deployment is naive. Per-MNO per-direction pools required. → EP-SC-05..06.
  5. No HLR/MNP authority — number-portability changes will silently break delivery. → number-intelligence-service.
  6. No SIM-box / AIT detection — within 12 months of public launch, fraud rings will arbitrage Ghasi for grey-route termination. → fraud-intel-service.
  7. No regulator integration — ATRA can shut you down on a single CDR audit failure. → regulator-portal-service + cdr-mediation-service.
  8. No emergency / cell-broadcast plan — the moment a public emergency happens, government will demand the gateway. → cbc-bridge-service.
  9. No supply-chain security — npm dependency hijack would compromise the national gateway; SBOM + signed images + locked registries required. → EP-PLAT-NB-11.
  10. Customer-portal Firebase login contradicts ADR-0002 — fix immediately to avoid a regression on go-live. → update US-CUST-01-01.

5. What is Not Enterprise-Grade (must be lifted)

| Symptom | Enterprise-grade target |
|---|---|
| Multiple topical docs are 17-line stubs (04, 05, 08, 09, 10, 11, 12, 14) | Authored to the bar of 01, 03, 13. |
| No NFR/SLA catalog | A 15-nfr-sla-catalog.md with quantitative targets per traffic class, per service tier. → EP-PLAT-NB-09. |
| No DR plan | Documented multi-region active-active, RTO ≤ 5 min for OTP class, ≤ 15 min for transactional, ≤ 60 min for marketing. RPO ≤ 5 s for OTP/transactional, ≤ 60 s for marketing. → ADR-0004. |
| No published support model | 24×7 NOC, T1/T2/T3, P1 ≤ 15 min ack, monthly SLA credits. → EP-PLAT-NB-12. |
| No status page | https://status.ghasi.io with per-MNO and per-API-class signals. → EP-PLAT-NB-14. |
| No compliance certifications roadmap | ISO 27001, ISO 27017/27018, SOC 2 Type II, PCI DSS scope-out attestation, GSMA AA.18 (A2P SMS) accreditation. → EP-PLAT-NB-15. |
| No formal change management | CAB process, change windows, MNO-coordinated change notices. → EP-PLAT-NB-12. |
| No customer success surface | Quarterly business reviews, dedicated TAM model for enterprise tenants. → EP-CUST-08. |
| No localisation discipline beyond ICU MessageFormat tag | Pashto/Dari translation memory, RTL audit, content lengths recomputed for UCS-2. → EP-CUST-09, US-BILL-037. |

6. Mandatory Architectural Uplifts (specified in ADR-0004)

  1. Multi-region active-active across kbl (Kabul) and mzr (Mazar), plus sovereign-DR cold copy in dxb (Dubai).
  2. Control-plane vs. data-plane split on separate node pools.
  3. Per-MNO connector pools (smpp-connector-{mno}-{tx|rx|trx}) with bind affinity.
  4. Twelve new bounded contexts (see §7).
  5. HSM-backed key custody (PKCS#11, FIPS 140-2 L3).
  6. Service mesh with SPIFFE/SPIRE workload identities.
  7. NATS JetStream multi-cluster (super-cluster + leaf nodes).
  8. Postgres Patroni clusters per region; multi-master only on control-plane data.
  9. CDR pipeline distinct from billing events.
  10. National traffic priority lanes (P0–P4).
  11. Trusted-tenant fast-path for vetted regulated tenants.
  12. Chaos engineering programme.
  13. 24×7 NOC tooling.
  14. Status page + customer-facing SLO dashboard.
  15. SBOM + signed images + admission-controlled image policy.
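
Uplift 10 (priority lanes P0–P4) can be pictured as a strict-priority queue with FIFO order inside each lane. The sketch assumes P0 is the most urgent lane; the lane semantics, class name, and strict-priority policy are illustrative assumptions and not the lane policy of ADR-0004 or EP-PLAT-NB-08.

```python
import heapq
import itertools


class LaneQueue:
    """Strict-priority dispatch across lanes 0..4, FIFO within a lane."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker preserves arrival order

    def push(self, lane: int, message: str) -> None:
        if not 0 <= lane <= 4:
            raise ValueError("lane must be between 0 (P0) and 4 (P4)")
        heapq.heappush(self._heap, (lane, next(self._seq), message))

    def pop(self) -> str:
        _lane, _seq, message = heapq.heappop(self._heap)
        return message

    def __len__(self) -> int:
        return len(self._heap)
```

Strict priority can starve the lowest lanes under sustained P0/P1 load, so a real lane policy would likely add per-lane minimum shares or weighted scheduling on top of this ordering.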

7. New Bounded Contexts (12) — naming and prefixes

| New context | Service name (proposed) | Epic prefix | Owner |
|---|---|---|---|
| Number Intelligence | number-intelligence-service | EP-NI-* / US-NI-* | Messaging Core |
| SMS Firewall | sms-firewall-service | EP-FW-* / US-FW-* | Trust & Safety |
| Sender ID Registry | sender-id-registry-service | EP-SID-* / US-SID-* | Trust & Safety + Regulator-facing |
| Numbering / Short-code | numbering-service | EP-NUM-* / US-NUM-* | Commerce |
| CDR / Mediation | cdr-mediation-service | EP-CDR-* / US-CDR-* | Commerce + Regulator |
| Cell Broadcast Bridge | cbc-bridge-service | EP-CBC-* / US-CBC-* | Government / Emergency |
| Channel Router (multi-channel) | channel-router-service | EP-CHAN-* / US-CHAN-* | Messaging Core |
| Fraud Intelligence | fraud-intel-service | EP-FRAUD-* / US-FRAUD-* | Trust & Safety |
| Regulator Liaison | regulator-portal-service | EP-REG-* / US-REG-* | Regulator-facing |
| Developer / SDK Portal | developer-portal-service | EP-DEV-* / US-DEV-* | Product |
| Campaign / Template Manager | campaign-service | EP-CAMP-* / US-CAMP-* | Product |
| Consent / DND Ledger | consent-ledger-service | EP-CONS-* / US-CONS-* | Trust & Safety |

This brings the platform from 14 to 26 services. They are scoped, sequenced, and assigned epic IDs in 07-epics-and-user-stories.md.


8. Twilio / Infobip / Sinch Comparison — Where We Win, Catch Up, Differentiate

CapabilityTwilioInfobipSinchGhasi todayGhasi target
Multi-channel (SMS, MMS, RCS, WhatsApp, Voice)❌ (SMS only)✅ via channel-router-service
Per-MNO direct binds in-countryPartial✅✅ (sovereign + national)
AI content moderationPartial (Trust Hub)PartialPartial✅ (compliance-engine)✅✅ (already ahead)
Tenant compliance scoring exposed to tenant✅✅ (differentiator)
Sender-ID registryUS 10DLC onlyDLT integrationsDLT integrations✅✅ (national authority)
HLR/MNP serviceBuysBuysBuys✅✅ (owns nationally)
Cell-broadcast emergency alerts✅✅ (national authority)
Government priority laneLimitedLimitedLimited✅✅
Regulator-direct CDR export✅✅
Local LLM compliance (no data export)❌ (cloud LLM)✅✅
Fraud / AIT / SIM-box preventionPartialPartialPartial✅ (parity)
Status page + per-route SLO✅ (parity)
Multi-region active-active✅ (parity)
OAuth/SAML enterprise SSO✅ (parity)
RCS Business Messaging✅ post-GA (channel-router)
Voice OTP fallback✅ (channel-router)
Verify API (managed OTP)✅ (developer-portal/channel-router)
Lookup API (number intelligence)✅ (paid)✅✅ (free for nationals)
Notify API (broadcast/segments)✅ (campaign-service)
Conversation API (2-way sticky)✅ (channel-router)
Pricing transparencyPartialPartial✅ (must publish)

9. Engineering Punch List

| # | Action | Owner | Sprint window |
|---|---|---|---|
| 1 | Adopt this critique + extended 007 catalog + Jira CSV | Platform Architecture | Now |
| 2 | Approve ADR-0004 (national resilience blueprint) | Architecture Council | This sprint |
| 3 | Author 15-nfr-sla-catalog.md; bind every NFR to a Prometheus alert | SRE | +2 sprints |
| 4 | Reconcile EP-CE-* vs CE-E* IDs: regenerate services/compliance-engine/JIRA_IMPORT.csv to use canonical IDs | Compliance Eng | This sprint |
| 5 | Fix customer-portal/_report.md US-CUST-01-01 to use Keycloak OIDC PKCE (not Firebase) | Frontend Eng | This sprint |
| 6 | Update auth-service/_report.md scope header to remove "Firebase federation" wording | Identity | This sprint |
| 7 | Rewrite the mis-titled 02-ddd-bounded-contexts.md (currently testing standards) | Platform Arch | +1 sprint |
| 8 | Author docs 04, 05, 08, 09, 10, 11, 12, 14 to the bar of 01, 03, 13 | Per-domain leads | +3 sprints |
| 9 | Stand up the 12 new services (firewall, number-intel, sender-id, numbering, CDR, CBC, channel-router, fraud-intel, regulator-portal, dev-portal, campaign, consent-ledger) as 17-doc skeletons | Platform PM + each domain lead | +2 sprints |
| 10 | Multi-region topology, HSM, service mesh, chaos, NOC dashboards | SRE + Security | Sprint windows S6–S12 |
| 11 | Regulator engagement (ATRA): license posture, CDR schema, LI plan | Legal + Platform Leadership | Continuous |
| 12 | MNO commercial + technical onboarding playbook (per-MNO bind plan, TPS contracts, escalation tree) | MNO Partnerships | Continuous |
| 13 | Public status page + SDKs + developer portal | Product + DevRel | Sprint windows S8–S14 |

End of critique. Continue to the extended epic catalog 07-epics-and-user-stories.md and the Jira import 07-epics-and-user-stories.JIRA_IMPORT.csv.