Numbering Service (numbering-service) — Service Overview
Status: populated
Owner: Commerce Engineering
Last updated: 2026-04-20
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL
1. Purpose — The System of Record for Sendable Identities
The Numbering Service is the canonical inventory and lifecycle manager for every sendable identity issued on the Ghasi-SMS-Gateway national backbone:
| Identity Class | Examples | Allocation Domain |
|---|---|---|
| MSISDN | +93701234567 (Roshan), +93791234567 (Etisalat AF), +93761234567 (MTN AF) | Per-MNO blocks coordinated with ATRA's Afghan Numbering Plan |
| Short code | 1234 (4-digit), 12345 (5-digit), 123456 (6-digit) | National pool, ATRA-issued; tenant-leasable |
| Alpha sender ID | OTPGhasi, MoFA-AF, MoPH | 1–11 characters from the GSM 03.38 default alphabet; per-tenant scope, ATRA-registered |
The service answers a single core question for every other service: "Is this identifier currently allocated, to which tenant, in which lifecycle state, and under what reservation?" It enforces the lifecycle (AVAILABLE → RESERVED → LEASED → SUSPENDED → RECALLED → QUARANTINE → AVAILABLE) and the reservation/hold/release workflow that gates every assignment.
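The lifecycle can be sketched as an explicit transition table. This is a minimal Python illustration, not the service's implementation: the names `State`, `ALLOWED`, and `transition` are hypothetical, the transitions come from the chain above plus the RESERVED/HELD/LEASED rows in section 8, and any transition this document does not state (for example, reinstating a SUSPENDED lease) is deliberately left out.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "AVAILABLE"
    RESERVED = "RESERVED"
    HELD = "HELD"  # hold phase from the reservation workflow (section 8)
    LEASED = "LEASED"
    SUSPENDED = "SUSPENDED"
    RECALLED = "RECALLED"
    QUARANTINE = "QUARANTINE"


# Only transitions stated in this document are listed; everything else is rejected.
ALLOWED = {
    State.AVAILABLE: {State.RESERVED},
    State.RESERVED: {State.AVAILABLE, State.HELD, State.LEASED},
    State.HELD: {State.AVAILABLE, State.LEASED},
    State.LEASED: {State.SUSPENDED, State.RECALLED},
    State.SUSPENDED: {State.RECALLED},
    State.RECALLED: {State.QUARANTINE},
    State.QUARANTINE: {State.AVAILABLE},
}


class IllegalTransition(ValueError):
    """Raised when a requested state change is not in the transition table."""


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not explicitly allowed."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table as data rather than scattered conditionals makes each rule easy to audit, which fits the "explicit, not derived" decision in section 10.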
Distinguish numbering-service from neighbors:
- sender-id-registry-service owns registration KYC and verification of alpha IDs; numbering-service owns the inventory slot and lease record.
- number-intelligence-service owns HLR / MNP / EIR lookups (real-time per-MSISDN intelligence); numbering-service owns the inventory ledger of who can send what.
- sms-orchestrator consumes numbering-service to validate that a `from` field is currently leased to the calling tenant.
The service implements three capabilities:
| Capability | Description |
|---|---|
| Inventory & lifecycle | Authoritative ledger of every MSISDN, short code, and alpha ID; lifecycle state machine; lease/recall/expiry |
| Reservation, hold & release | Two-phase workflow that pins numbers temporarily before commit; auto-release on TTL expiry |
| Per-tenant pool management | Tenant-scoped pools (national, regional, vanity); quotas, cool-down windows, and pool partitioning |
2. Bounded Context
| Dimension | Value |
|---|---|
| Domain | Commerce / Numbering Plan Management |
| Owner squad | Commerce Engineering |
| Deployment unit | Kubernetes Deployment — numbering-service (3 replicas; stateless; PG is single-writer) |
| Communication style | Inbound: gRPC (from sms-orchestrator, sender-id-registry-service, campaign-service) · HTTP REST (admin + tenant portal) · NATS (suspension triggers) · Outbound: NATS JetStream (lifecycle events) |
| Storage | PostgreSQL schema numbering · Redis (cache + reservation TTL locks) |
| Failure mode | Fail-closed on lease checks: if numbering-service is unavailable, no new assignment is honoured; existing leases continue to function via cached snapshots |
3. Position in the Platform
4. Responsibilities
| # | Responsibility |
|---|---|
| R1 | Maintain the canonical inventory of MSISDNs, short codes, and alpha IDs with current lifecycle state and assigned tenant |
| R2 | Enforce the lifecycle state machine (AVAILABLE → RESERVED → LEASED → SUSPENDED → RECALLED → QUARANTINE → AVAILABLE) with explicit transition rules |
| R3 | Validate every assignment request against ATRA/Afghan Numbering Plan rules: E.164 format, MNO block ownership, short-code range allowlists, alpha-ID character class ([A-Za-z0-9 ], 1–11 chars) |
| R4 | Provide a two-phase reservation/hold/release workflow with explicit TTLs (default 15 min reserve; 24 h hold) |
| R5 | Issue lease records with leaseId, effectiveFrom, effectiveUntil, optional auto-renew flag, and tenant scope |
| R6 | Honour recall on regulator order, billing non-payment, abuse signal from compliance-engine, or tenant-initiated release |
| R7 | Enforce per-tenant pool quotas (max leased MSISDNs, max alpha IDs, max active reservations) |
| R8 | Apply post-recall quarantine cool-down (default 90 days for MSISDNs) before returning to AVAILABLE pool |
| R9 | Serve gRPC ValidateLease(identifier, tenantId) with P95 ≤ 20 ms (Redis-cached) for hot-path callers like sms-orchestrator |
| R10 | Publish lifecycle events to NATS: numbering.lease.created, numbering.lease.renewed, numbering.lease.recalled, numbering.reservation.held, numbering.reservation.released, numbering.pool.exhausted |
| R11 | Bulk-import MSISDN block files from MNOs (monthly CSV) and validate against ATRA-issued blocks |
| R12 | Expose tenant-portal self-serve APIs for browse pool, reserve, hold, release, view leases, view quotas |
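The format checks in R3 can be illustrated with a small Python sketch. It covers syntax only: MNO block ownership and short-code range allowlists need inventory data and are not modelled. The MSISDN pattern (`+93` followed by nine digits) is an assumption inferred from the examples in section 1, the alpha-ID class is copied from R3 as written, and `PATTERNS` and `is_well_formed` are hypothetical names.

```python
import re

# Syntax-only patterns; each is matched against the whole identifier.
PATTERNS = {
    # Assumption: +93 plus nine national digits, as in the section 1 examples.
    "MSISDN": re.compile(r"\+93[0-9]{9}"),
    # 4-, 5-, or 6-digit short codes.
    "SHORT_CODE": re.compile(r"[0-9]{4,6}"),
    # Character class and length exactly as stated in R3.
    "ALPHA": re.compile(r"[A-Za-z0-9 ]{1,11}"),
}


def is_well_formed(identity_class: str, identifier: str) -> bool:
    """Return True if the identifier matches the pattern for its class."""
    pattern = PATTERNS.get(identity_class)
    if pattern is None:
        return False  # unknown class: reject rather than guess
    return pattern.fullmatch(identifier) is not None
```

Because the class is passed in explicitly, an all-digit string such as `1234` is never silently reinterpreted as an alpha ID.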
5. Non-Responsibilities
- Does not perform HLR/MNP lookups; number-intelligence-service owns real-time per-MSISDN intelligence
- Does not verify alpha-ID registrant KYC; sender-id-registry-service owns KYC + DNS-TXT/OTP/notarised verification
- Does not route or transmit messages; sms-orchestrator and routing-engine own dispatch
- Does not invoice tenants for leases; billing-service consumes lease events to bill
- Does not decide DND/consent; consent-ledger-service owns recipient consent
6. Upstream / Downstream Dependencies
| Direction | Service | Protocol | Purpose |
|---|---|---|---|
| Inbound (sync) | sms-orchestrator | gRPC (mTLS) | ValidateLease(from, tenantId) per-message; P95 ≤ 20 ms |
| Inbound (sync) | sender-id-registry-service | gRPC (mTLS) | ReserveAlpha, CommitLease, RecallLease after KYC outcome |
| Inbound (sync) | campaign-service | gRPC (mTLS) | ListLeasedNumbers(tenantId) for sender selection |
| Inbound (sync) | customer-portal, admin-dashboard | HTTP REST (JWT/mTLS) | Browse, reserve, lease, release, quota management |
| Inbound (event) | compliance-engine | NATS compliance.tenant.suspended | Trigger lease suspension |
| Inbound (event) | billing-service | NATS billing.account.delinquent | Trigger lease suspension on non-payment |
| Outbound write | PostgreSQL numbering | TCP | Inventory, leases, reservations, audit |
| Outbound cache | Redis | TCP | Lease validation cache (TTL 60s); reservation locks (TTL configurable) |
| Outbound events | NATS JetStream | TCP | Lifecycle events to billing, consent-ledger, customer-portal, analytics |
| Outbound (manual) | ATRA portal | HTTPS (operator action) | Periodic block synchronisation reports |
7. High-Level Flow — Lease Lifecycle
8. Reservation / Hold / Release Workflow
The two-phase model exists because high-volume tenants need to pin numbers during a campaign-build session before committing. Without it, a tenant browsing the pool could see an alpha ID disappear mid-flow.
| Phase | TTL | Purpose | Allowed Transitions |
|---|---|---|---|
| RESERVED | 15 min | Tenant browsing/preview | → AVAILABLE (TTL), → HELD, → LEASED |
| HELD | 24 h | Tenant build phase, KYC pending | → AVAILABLE (TTL/release), → LEASED |
| LEASED | per lease term | Active assignment | → SUSPENDED, → RECALLED |
Reservations and holds are stored in PG and mirrored as Redis keys with EXPIRE, so auto-release is event-driven (Redis keyspace notifications) and backed by a periodic safety-net sweep job.
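The safety-net sweep can be sketched in isolation. The Python below models only the periodic sweep over an in-memory map, not the Redis keyspace-notification path or the PG tables; `Pin`, `pin`, and `sweep_expired` are hypothetical names, and the TTLs are the defaults from R4.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default TTLs from R4: 15 min for RESERVED, 24 h for HELD.
TTL_SECONDS = {"RESERVED": 15 * 60, "HELD": 24 * 60 * 60}


@dataclass
class Pin:
    tenant_id: str
    state: str         # "RESERVED" or "HELD"
    expires_at: float  # epoch seconds


def pin(pins: Dict[str, Pin], identifier: str, tenant_id: str,
        state: str, now: float) -> None:
    """Record (or upgrade) a pin and start its TTL from `now`."""
    pins[identifier] = Pin(tenant_id, state, now + TTL_SECONDS[state])


def sweep_expired(pins: Dict[str, Pin], now: float) -> List[str]:
    """Drop every pin whose TTL has elapsed and return the released identifiers."""
    expired = sorted(i for i, p in pins.items() if p.expires_at <= now)
    for identifier in expired:
        del pins[identifier]  # the identifier goes back to AVAILABLE
    return expired
```

Passing `now` in explicitly keeps the sweep deterministic under test; a real job would read the clock once per run and apply the same comparison to the PG rows.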
Concurrent reserve attempts are resolved by INSERT … ON CONFLICT DO NOTHING against a PG partial unique index on identifier (WHERE state IN (RESERVED, HELD, LEASED)), so only one attempt wins.
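The "only one wins" behaviour can be demonstrated with a partial unique index. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server; SQLite also supports partial unique indexes and `ON CONFLICT DO NOTHING`. The table name, column names, and `try_reserve` are hypothetical and not taken from DATA_MODEL.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger with one-active-row-per-identifier enforcement."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assignment (
            id         INTEGER PRIMARY KEY,
            identifier TEXT NOT NULL,
            tenant_id  TEXT NOT NULL,
            state      TEXT NOT NULL
        );
        -- Partial unique index: at most one active row per identifier.
        CREATE UNIQUE INDEX one_active_per_identifier
            ON assignment (identifier)
            WHERE state IN ('RESERVED', 'HELD', 'LEASED');
        """
    )
    return conn


def try_reserve(conn: sqlite3.Connection, identifier: str, tenant_id: str) -> bool:
    """Attempt a reservation; True means this caller won the identifier."""
    cur = conn.execute(
        "INSERT INTO assignment (identifier, tenant_id, state) "
        "VALUES (?, ?, 'RESERVED') ON CONFLICT DO NOTHING",
        (identifier, tenant_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means another tenant holds it
```

Rows in inactive states fall outside the index predicate, so history for the same identifier can accumulate without blocking a later reservation.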
9. Per-Tenant Pool Management
Per-tenant rules:
- Quotas: maxLeasedMsisdn, maxLeasedShortCode, maxLeasedAlpha, maxActiveReservations (configurable per plan tier)
- Cool-down enforcement: a tenant that recalls a number cannot re-lease the same identifier for the cool-down period (prevents phishing rotation)
- Vanity reservation: a tenant may purchase a long-term reservation on a vanity short code with auto-renew
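The quota and cool-down rules above can be sketched as a single pre-lease check. This is an illustrative Python example under stated assumptions: `TenantQuota` and `may_lease` are hypothetical names, the reason strings are invented for the example, and the cool-down length is passed in by the caller because this section does not fix its value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TenantQuota:
    max_leased: Dict[str, int]    # per identity class, e.g. {"MSISDN": 100}
    max_active_reservations: int


def may_lease(quota: TenantQuota, identity_class: str, leased_count: int,
              last_recall_by_tenant: Optional[datetime], now: datetime,
              cool_down: timedelta) -> Tuple[bool, str]:
    """Check quota first, then cool-down; return (allowed, reason)."""
    if leased_count >= quota.max_leased[identity_class]:
        return False, "quota_exceeded"
    # Cool-down applies only if this tenant previously recalled this identifier.
    if last_recall_by_tenant is not None and now - last_recall_by_tenant < cool_down:
        return False, "cool_down_active"
    return True, "ok"
```

Returning a reason alongside the boolean lets the caller audit-log why a request was refused, in line with the decision to enforce quotas in the service rather than in the database.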
10. Key Design Decisions
| Decision | Rationale |
|---|---|
| Numbering as a separate service from sender-id-registry | KYC verification and inventory slot management are distinct concerns with different change-cadences and ownership |
| Lifecycle states are explicit, not derived | Auditors require unambiguous state at any timestamp; computed states risk drift |
| Two-phase reservation (RESERVED → HELD → LEASED) | Prevents race conditions in high-traffic browsing; aligns with campaign-build UX |
| Quarantine cool-down on MSISDN recall | Prevents abuse rotation: a phisher cannot re-lease a recalled number immediately under a new tenant identity |
| Redis-mirrored reservations | Sub-50ms TTL precision for auto-release; PG remains source of truth on reconciliation |
| ValidateLease is hot-path | Called per outbound message; cache aggressively (60 s TTL); invalidate on lifecycle event |
| Bulk MSISDN block import via signed CSV | MNOs deliver monthly block updates; signature ensures provenance |
| Per-tenant quotas enforced by service, not DB | Keeps quota logic versioned with code; quota changes audit-logged |
| Alpha-ID character class restricted to GSM 03.38 default | Prevents confusable / homoglyph alpha IDs (no MoƒA-AF lookalikes) |
| Vanity reservation as long-term lease + auto-renew | Avoids parallel vanity-reservation workflow; reuses lease audit trail |
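The "cache aggressively, invalidate on lifecycle event" decision for ValidateLease can be illustrated with a small in-process TTL cache. This Python sketch stands in for the Redis cache and is not the service's code: `LeaseCache`, its `loader` callback (the authoritative lookup), and the injectable `clock` are hypothetical; only the 60 s TTL comes from this document.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (identifier, tenant_id)


class LeaseCache:
    def __init__(self, loader: Callable[[str, str], bool],
                 ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # authoritative lookup, e.g. a PG query
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def validate(self, identifier: str, tenant_id: str) -> bool:
        """Serve from cache while fresh; otherwise reload and re-cache."""
        key = (identifier, tenant_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        result = self._loader(identifier, tenant_id)
        self._entries[key] = (result, now + self._ttl_s)
        return result

    def invalidate(self, identifier: str) -> None:
        """Drop all cached answers for an identifier after a lifecycle event."""
        for key in [k for k in self._entries if k[0] == identifier]:
            del self._entries[key]
```

A subscriber to the numbering.lease.* events would call `invalidate` so a recall takes effect well before the 60 s TTL would otherwise expire the entry.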
11. Operational SLOs
| SLO | Target |
|---|---|
| ValidateLease gRPC latency | P95 ≤ 20 ms (Redis-hit), ≤ 50 ms (PG-fallback) |
| Reservation creation latency | P95 ≤ 100 ms |
| Lease creation latency | P95 ≤ 200 ms (includes audit + event publish) |
| TTL expiry precision | within ±2 s of declared TTL |
| Pool exhaustion alert | within 60 s of remaining-capacity < 5 % |
| Cache invalidation lag after lifecycle event | P95 ≤ 1 s |
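As a worked example of how a P95 target from the table above might be checked against a batch of latency samples, the Python sketch below uses the nearest-rank percentile definition. The choice of nearest-rank is an assumption (this document does not say how P95 is computed), and `percentile_nearest_rank` and `meets_slo` are hypothetical names.

```python
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def meets_slo(samples_ms: Sequence[float], target_ms: float, pct: int = 95) -> bool:
    """True if the pct-th percentile latency is within the target."""
    return percentile_nearest_rank(samples_ms, pct) <= target_ms
```

With 20 samples the 95th percentile is the 19th-smallest value, so a single slow outlier does not breach a 20 ms target but two do.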
12. Cross-References
- docs/07-epics-and-user-stories.md §6.4 — epic catalog
- sender-id-registry-service — KYC + verification consumer
- number-intelligence-service — sibling for HLR/MNP intelligence
- sms-orchestrator — primary hot-path consumer
End of SERVICE_OVERVIEW.md