numbering-service — Application Logic
Version: 1.0
Status: Draft
Owner: Commerce Engineering + Platform Engineering
Last Updated: 2026-04-21
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL · SYNC_CONTRACT
1. Use-Case Catalogue
Use cases are grouped by epic (EP-NUM-01 Inventory & Lifecycle, EP-NUM-02 Reservation Workflow, EP-NUM-03 Pool Management) from _report.md. Each UC lists: trigger, inputs, outputs, latency budget, invariants, fail-closed behaviour.
| UC | Name | Trigger | Plane | Latency Budget |
|---|---|---|---|---|
| UC-01 | ValidateLease (hot path) | sms-orchestrator gRPC per-message | gRPC | P95 ≤ 20 ms cache-hit, ≤ 50 ms PG fallback |
| UC-02 | Reserve (15-min TTL) | customer-portal REST, internal gRPC | REST + gRPC | P95 ≤ 100 ms |
| UC-03 | Hold (promote to 24-h TTL) | customer-portal REST | REST | P95 ≤ 100 ms |
| UC-04 | Release (explicit) | customer-portal REST | REST | P95 ≤ 80 ms |
| UC-05 | Assign (lease) | customer-portal REST, internal gRPC | REST + gRPC | P95 ≤ 200 ms |
| UC-06 | RenewLease (autoRenew) | Daily cron | Worker | Batch; per-row ≤ 300 ms |
| UC-07 | Recall | admin REST, compliance.tenant.suspended consumer, billing.account.delinquent consumer | REST + consumer | P95 ≤ 250 ms |
| UC-08 | Reinstate (SUSPENDED → LEASED) | admin REST | REST | P95 ≤ 200 ms |
| UC-09 | ImportLeaseBatch (MNO CSV) | admin REST (multipart) | REST | ≤ 5 min for 100k rows |
| UC-10 | ReservationCleanup (TTL expiry) | Redis keyspace notification + safety cron (60 s) | Worker | P95 ≤ 2 s after TTL |
| UC-11 | QuarantineSweep | 5-minute cron | Worker | Per tick ≤ 30 s |
| UC-12 | DetectConflict | Synchronous on every write; nightly reconciliation cron | Worker | Nightly ≤ 10 min |
| UC-13 | GenerateRegulatorExport | Monthly cron (01:00 UTC on 1st) | Worker | ≤ 15 min |
| UC-14 | EnforceTenantQuota | Called from UC-02 / UC-05 | Library | ≤ 5 ms |
| UC-15 | BulkRecallByTenant | compliance.tenant.suspended consumer, tenant.deleted consumer | Worker | Best-effort within 60 s per tenant |
| UC-16 | Lookup (metadata-by-number) | routing-engine, number-intelligence gRPC | gRPC | P95 ≤ 15 ms |
2. Hot Path — ValidateLease (UC-01)
SLA: P95 ≤ 20 ms (Redis hit); ≤ 50 ms (PG fallback). Called once per outbound message by sms-orchestrator, which makes this the highest-traffic gRPC method on the service.
Steps:
- Input validation. Reject with INVALID_ARGUMENT if identifier fails the per-type regex (E.164 / short-code / alpha-ID) or tenantId is not a UUIDv4.
- Redis GET. num:valid:{type}:{value}:{tenantId} with 60 s TTL. On HIT, decode {state, leaseId, effectiveUntil, tenantId, version} and answer:
  - LEASED + tenant == requester + effectiveUntil > now() → valid: true
  - LEASED + tenant != requester → valid: false, reason: WRONG_TENANT
  - SUSPENDED → valid: false, reason: LEASE_SUSPENDED
  - else → valid: false with mapped reason.
- Redis MISS. Query PG:

      SELECT n.state, n.assigned_tenant_id, l.lease_id, l.effective_until, n.version
      FROM numbering.numbers n
      LEFT JOIN numbering.leases l ON l.lease_id = n.assigned_lease_id
      WHERE n.value = :value AND n.type = :type

  Cache the row in Redis with SETEX 60.
- Not found. Return valid: false, reason: NOT_REGISTERED. Cache the negative result for 30 s to absorb enumeration attacks.
- Fail-closed. If Redis AND PG are unreachable: return UNAVAILABLE. The orchestrator treats this as "do not send"; the message is retried or dead-lettered. At no point does ValidateLease return valid: true without a confirmed read.
Cache invalidation. Every state-mutation UC (UC-02, UC-05, UC-07) publishes num.cache.invalidate.v1 on the in-cluster NATS ephemeral subject; the affected Redis keys are deleted (DEL) within 1 s. If an invalidation message is lost, the 60 s cache TTL is the only backstop.
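The decision table and the fail-closed rule above can be sketched in a few lines. This is a minimal illustration, not the service's implementation: `CachedLease`, `decide`, `validate_lease`, the two reader callables, and the reason codes `LEASE_EXPIRED` and `STATE_<state>` are all hypothetical names introduced here. Negative caching and the Redis write-back are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class CachedLease:
    """Subset of the decoded cache entry / PG row used for the decision."""
    state: str
    tenant_id: Optional[str]
    effective_until: Optional[datetime]


class Unavailable(Exception):
    """No confirmed read was possible; the caller must not send."""


def decide(entry: Optional[CachedLease], requester: str, now: datetime) -> dict:
    """Map a confirmed read to a ValidateLease answer."""
    if entry is None:
        return {"valid": False, "reason": "NOT_REGISTERED"}
    if entry.state == "SUSPENDED":
        return {"valid": False, "reason": "LEASE_SUSPENDED"}
    if entry.state == "LEASED":
        if entry.tenant_id != requester:
            return {"valid": False, "reason": "WRONG_TENANT"}
        if entry.effective_until is not None and entry.effective_until > now:
            return {"valid": True}
        return {"valid": False, "reason": "LEASE_EXPIRED"}  # hypothetical code
    return {"valid": False, "reason": "STATE_" + entry.state}  # hypothetical mapping


def validate_lease(
    read_cache: Callable[[], Optional[CachedLease]],
    read_pg: Callable[[], Optional[CachedLease]],
    requester: str,
    now: datetime,
) -> dict:
    """Cache first, PG on miss; never answer without a confirmed read."""
    try:
        cached = read_cache()
    except ConnectionError:
        cached = None  # Redis down: treat as a miss and fall through to PG
    if cached is not None:
        return decide(cached, requester, now)
    try:
        row = read_pg()
    except ConnectionError as exc:
        raise Unavailable("no confirmed read") from exc
    return decide(row, requester, now)
```

Raising `Unavailable` rather than returning a result keeps the "do not send" outcome distinct from any `valid: false` answer, so the caller cannot confuse an outage with a rejected lease.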
3. Reservation Workflow (UC-02, UC-03, UC-04, UC-10)
UC-02 Reserve (15-min TTL)
Inputs: { identifier, type, tenantId, idempotencyKey }
Steps:
- Check TenantPool.maxActiveReservations; reject with RESERVATION_QUOTA if at limit.
- Transactional write:

      BEGIN;
      UPDATE numbering.numbers
      SET state = 'RESERVED',
          assigned_tenant_id = :tenantId,
          version = version + 1,
          updated_at = now()
      WHERE value = :value AND type = :type
        AND state = 'AVAILABLE' AND version = :expected_version
      RETURNING number_id;
      -- If 0 rows: state changed concurrently → 409 CONFLICT
      INSERT INTO numbering.reservations
        (reservation_id, number_id, tenant_id, kind, created_at, expires_at)
      VALUES (gen_random_uuid(), :numberId, :tenantId, 'RESERVE', now(), now() + interval '15 minutes');
      INSERT INTO numbering.audit (...);   -- hash-chained
      INSERT INTO numbering.outbox (...);  -- number.reserved.v1
      COMMIT;

- Mirror to Redis: SET num:reserve:{numberId} tenantId EX 900. Keyspace notifications drive UC-10.
- Return { reservationId, expiresAt }.
Race handling. Two tenants racing on the same AVAILABLE identifier: CAS on (state = 'AVAILABLE' AND version = :expected) succeeds for exactly one; the loser gets CONFLICT and retries on a different candidate. See EVENT_SCHEMAS number.conflict.detected.v1.
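The compare-and-set behaviour can be reproduced with an in-memory SQLite database. This is only a sketch of the pattern: the table is a cut-down stand-in for numbering.numbers, `reserve` is a hypothetical helper, and the production statement runs on PostgreSQL with RETURNING rather than a row count.

```python
import sqlite3


def reserve(conn: sqlite3.Connection, value: str, tenant_id: str, expected_version: int) -> bool:
    """Attempt the CAS; True means this caller won, False means CONFLICT."""
    cur = conn.execute(
        "UPDATE numbers "
        "SET state = 'RESERVED', assigned_tenant_id = ?, version = version + 1 "
        "WHERE value = ? AND state = 'AVAILABLE' AND version = ?",
        (tenant_id, value, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows: state or version changed concurrently


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers ("
    "value TEXT PRIMARY KEY, state TEXT, assigned_tenant_id TEXT, version INTEGER)"
)
conn.execute("INSERT INTO numbers VALUES ('+93700000001', 'AVAILABLE', NULL, 7)")
conn.commit()

# Both tenants read version 7 before either of them writes.
first = reserve(conn, "+93700000001", "tenant-a", 7)
second = reserve(conn, "+93700000001", "tenant-b", 7)
winner = conn.execute(
    "SELECT state, assigned_tenant_id, version FROM numbers"
).fetchone()
```

The second call matches zero rows because the first one already moved the row to RESERVED and bumped the version, which is the condition the service maps to CONFLICT.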
UC-03 Hold — promote RESERVE → HELD
Same CAS pattern on state = 'RESERVED' AND assigned_tenant_id = :tenantId; new TTL 24 h. Emits number.reserved.v1 with kind: HOLD.
UC-04 Release
Explicit tenant action. Reverts RESERVED|HELD → AVAILABLE with CAS guard on assigned_tenant_id. Emits number.released.v1 with reason: TENANT_RELEASE. Releasing a LEASED identifier is rejected with USE_RECALL_FOR_LEASES.
UC-10 ReservationCleanup
Redis keyspace notifications on key expiry trigger an idempotent worker that:
- Selects expired, unreleased reservations:

      SELECT reservation_id, number_id, kind
      FROM numbering.reservations
      WHERE released_at IS NULL AND expires_at < now()
      ORDER BY expires_at
      LIMIT 1000
      FOR UPDATE SKIP LOCKED;

- For each row, CAS numbers.state RESERVED|HELD → AVAILABLE, set reservations.released_at = now(), release_reason = 'TTL_EXPIRED', emit number.released.v1.
- Safety-net cron every 60 s runs the same SELECT in case Redis notifications were dropped.
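Why the worker is idempotent is easier to see with plain dictionaries standing in for the two tables. The sketch below is illustrative only: `sweep_expired` and the dictionary shapes are hypothetical, and skipping rows whose number has already left RESERVED|HELD is an assumption, not documented behaviour.

```python
from datetime import datetime, timedelta, timezone


def sweep_expired(reservations: list, numbers: dict, now: datetime, limit: int = 1000) -> list:
    """Release expired, unreleased reservations and return the emitted events."""
    due = sorted(
        (r for r in reservations if r["released_at"] is None and r["expires_at"] < now),
        key=lambda r: r["expires_at"],
    )[:limit]
    events = []
    for r in due:
        number = numbers[r["number_id"]]
        if number["state"] not in ("RESERVED", "HELD"):
            continue  # CAS lost: the number moved on (assumed to be skipped)
        number["state"] = "AVAILABLE"
        number["version"] += 1
        r["released_at"] = now
        r["release_reason"] = "TTL_EXPIRED"
        events.append(
            {"type": "number.released.v1", "numberId": r["number_id"], "reason": "TTL_EXPIRED"}
        )
    return events


now = datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)
numbers = {"n1": {"state": "RESERVED", "version": 3}}
reservations = [
    {"number_id": "n1", "expires_at": now - timedelta(minutes=1),
     "released_at": None, "release_reason": None},
]
first_run = sweep_expired(reservations, numbers, now)
second_run = sweep_expired(reservations, numbers, now)  # notification and cron both fire
```

Because the selection filters on released_at IS NULL, a second trigger for the same expiry finds nothing to do and emits no duplicate event.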
4. Lease Workflow (UC-05, UC-06, UC-07, UC-08)
UC-05 Assign (lease)
Inputs: { identifier, type, tenantId, term, autoRenew, vanityFlag?, accountId? }
Steps:
- EnforceTenantQuota (UC-14): reject with QUOTA_EXCEEDED if the tenant is at maxLeased{Msisdn|ShortCode|Alpha}.
- For ALPHA_ID: gRPC call to sender-id-registry-service.IsVerified(alphaId, tenantId). If not verified, reject with ALPHA_NOT_VERIFIED.
- For VANITY short code: verify TenantPool.vanityEnabled = true AND NumberResource.subtype = VANITY; otherwise reject with NOT_VANITY_ELIGIBLE.
- Transactional write:

      BEGIN;
      -- Valid source states: AVAILABLE (bypass), RESERVED (own), HELD (own)
      UPDATE numbering.numbers
      SET state = 'LEASED',
          assigned_tenant_id = :tenantId,
          assigned_lease_id = :newLeaseId,
          version = version + 1
      WHERE value = :value AND type = :type
        AND version = :expected_version
        AND state IN ('AVAILABLE','RESERVED','HELD')
        AND (state = 'AVAILABLE' OR assigned_tenant_id = :tenantId);
      -- 0 rows → 409 with reason code (HELD_BY_OTHER_TENANT / QUARANTINE_ACTIVE / NOT_AVAILABLE)
      INSERT INTO numbering.leases (...);
      UPDATE numbering.reservations SET released_at = now(), release_reason = 'PROMOTED_TO_LEASE' WHERE …;
      INSERT INTO numbering.audit (…);
      INSERT INTO numbering.outbox (…);  -- number.assigned.v1
      COMMIT;

- Invalidate Redis num:valid:* for the identifier.
- Publish number.assigned.v1 → consumed by billing-service (starts billing) and sender-id-registry-service (marks alpha-ID value as inventory-committed).
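The source-state rule in the UPDATE predicate, and the reason code returned when it matches no row, can be expressed as a small pure function. The function name is hypothetical and the state-to-reason mapping is inferred from the error model, so treat it as an assumption rather than the service's exact behaviour.

```python
from typing import Optional


def assign_rejection(state: str, assigned_tenant_id: Optional[str], tenant_id: str) -> Optional[str]:
    """Return None if the Assign CAS predicate would match, else a reason code."""
    if state == "AVAILABLE":
        return None  # direct assignment, no prior reservation needed
    if state in ("RESERVED", "HELD"):
        if assigned_tenant_id == tenant_id:
            return None  # the tenant is promoting its own reservation or hold
        return "HELD_BY_OTHER_TENANT"
    if state == "QUARANTINE":
        return "QUARANTINE_ACTIVE"
    return "NOT_AVAILABLE"  # LEASED, SUSPENDED, RECALLED, ...


examples = {
    "available": assign_rejection("AVAILABLE", None, "tenant-a"),
    "own_hold": assign_rejection("HELD", "tenant-a", "tenant-a"),
    "foreign_reservation": assign_rejection("RESERVED", "tenant-b", "tenant-a"),
    "quarantined": assign_rejection("QUARANTINE", None, "tenant-a"),
    "already_leased": assign_rejection("LEASED", "tenant-b", "tenant-a"),
}
```

In the real flow the UPDATE decides atomically; a function like this would only be used after a zero-row result, to pick the reason code from a fresh read of the row.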
UC-06 RenewLease (daily cron)
- SELECT all leases with auto_renew = true AND effective_until BETWEEN now() AND now() + interval '7 days' AND terminated_at IS NULL.
- For each: gRPC billing-service.PreviewCharge(tenantId, item: 'lease.renewal', ref: leaseId). If billing rejects, emit number.renewal.failed.v1 { reason: 'BILLING_REJECTED' }; do NOT extend.
- On billing ack: INSERT new leases row with previous_lease_id = oldLeaseId, effective_from = oldLease.effective_until, effective_until = oldLease.effective_until + term. Update numbers.assigned_lease_id. Emit number.renewed.v1.
- Grandfathering: if the tenant's current quota would be breached by renewal, renewal proceeds anyway and emits a quota.exceeded_by_renewal warning metric. Existing leases are never forcibly invalidated by quota changes.
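The renewal window and the chaining of the new lease row onto the old one can be sketched as follows. The `Lease` dataclass and both helpers are hypothetical; the billing call, the numbers.assigned_lease_id update, and event emission are omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Lease:
    lease_id: str
    effective_from: datetime
    effective_until: datetime
    auto_renew: bool
    terminated_at: Optional[datetime] = None
    previous_lease_id: Optional[str] = None


def due_for_renewal(lease: Lease, now: datetime, window: timedelta = timedelta(days=7)) -> bool:
    """Mirror of the SELECT filter: auto-renew, not terminated, ending within the window."""
    return (
        lease.auto_renew
        and lease.terminated_at is None
        and now <= lease.effective_until <= now + window  # BETWEEN is inclusive
    )


def renew(old: Lease, new_lease_id: str, term: timedelta) -> Lease:
    """Build the follow-on lease: it starts exactly where the old one ends."""
    return Lease(
        lease_id=new_lease_id,
        effective_from=old.effective_until,
        effective_until=old.effective_until + term,
        auto_renew=old.auto_renew,
        previous_lease_id=old.lease_id,
    )


now = datetime(2026, 4, 21, tzinfo=timezone.utc)
old = Lease("lease-1", now - timedelta(days=362), now + timedelta(days=3), True)
new = renew(old, "lease-2", timedelta(days=365))
```

Starting the new row at the old row's effective_until leaves no gap and no overlap between consecutive leases, regardless of which day inside the 7-day window the cron processes it.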
UC-07 Recall
Inputs: { identifier, type, reason, actorUserId | actorService, ticketId? }
Sources:
- Admin REST (REGULATOR_ORDER, ABUSE): ticketId required, rejected with 422 if missing.
- compliance.tenant.suspended.v1 consumer: bulk recall all tenant leases with reason: ABUSE, ticketId: complianceCaseId.
- billing.account.delinquent.v1 consumer: transitions to SUSPENDED first (grace), then RECALLED after the grace window.
- Tenant self-release: reason: TENANT_RELEASE.
- Lease expiry: daily cron; reason: EXPIRED.
Steps:
- Transactional write: CAS LEASED|SUSPENDED → RECALLED; mark leases.terminated_at, termination_reason.
- Insert QuarantineRecord with quarantineFrom = now(), quarantineUntil = now() + cooloff where:
  - MSISDN: 90 d (platform default; configurable per subtype)
  - SHORT_CODE standard: 30 d; VANITY: 365 d
  - ALPHA_ID: 0 d → direct RECALLED → AVAILABLE in same transaction (no cross-tenant recycling risk)
- Emit number.recalled.v1 + (for non-alpha) number.quarantine.started.v1.
- Invalidate Redis cache.
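The cool-off rules reduce to a lookup plus one special case for alpha IDs. The sketch below uses the platform defaults listed above and ignores per-subtype configuration; the function names and the returned dictionary shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cooloff(number_type: str, subtype: Optional[str] = None) -> timedelta:
    """Platform-default quarantine period for a recalled identifier."""
    if number_type == "MSISDN":
        return timedelta(days=90)
    if number_type == "SHORT_CODE":
        return timedelta(days=365 if subtype == "VANITY" else 30)
    if number_type == "ALPHA_ID":
        return timedelta(0)
    raise ValueError(f"unknown identifier type: {number_type}")


def after_recall(number_type: str, subtype: Optional[str], now: datetime) -> dict:
    """State, quarantine deadline and events that follow a successful recall."""
    period = cooloff(number_type, subtype)
    if period == timedelta(0):
        # Alpha IDs skip quarantine and return to the pool in the same transaction.
        return {"state": "AVAILABLE", "quarantine_until": None,
                "events": ["number.recalled.v1"]}
    return {"state": "QUARANTINE", "quarantine_until": now + period,
            "events": ["number.recalled.v1", "number.quarantine.started.v1"]}


now = datetime(2026, 4, 21, tzinfo=timezone.utc)
msisdn = after_recall("MSISDN", None, now)
vanity = after_recall("SHORT_CODE", "VANITY", now)
alpha = after_recall("ALPHA_ID", None, now)
```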
UC-08 Reinstate
Platform-admin only. Transitions SUSPENDED → LEASED with an audit entry. Requires reason and ticketId (e.g., dispute resolved, billing caught up).
5. MNO Lease Batch Import (UC-09)
Endpoint: POST /v1/admin/numbering/blocks/import multipart ({operatorId, signature, csvFile})
Steps:
- Verify CSV signature against mno_signing_keys (RSA-SHA256, detached signature). On failure: 422 SIGNATURE_INVALID + audit.
- Parse CSV streaming (expected columns: msisdn, prefix, blockType, subtype, validFrom, validUntil). Validate each row:
  - E.164 format (^\+93[0-9]{9}$)
  - prefix matches operatorId's allowed_prefixes (from LeaseContract)
  - validUntil > validFrom
- Idempotent per-row INSERT … ON CONFLICT (value, type) DO NOTHING:
  - Track duplicate vs inserted counters.
- Insert lease_import_batch job row with { imported, duplicates, invalid, operatorId } and emit number.lease.imported.v1 per batch (not per row).
- Invalid rows are written to lease_import_errors with line number + reason for operator review.
- Return summary { batchId, imported, duplicates, invalid }.
Future-dated imports. Rows with validFrom > now() are inserted with state = AVAILABLE but excluded from ListAvailable until the date passes; a daily cron activates them.
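The per-row validation and the three counters can be sketched with the standard csv module. Several details are assumptions made for illustration: dates are taken to be ISO-8601 (YYYY-MM-DD), the reason codes are invented, and a Python set stands in for the ON CONFLICT (value, type) DO NOTHING insert. Signature verification is not shown.

```python
import csv
import io
import re
from datetime import date
from typing import Optional

E164_AF = re.compile(r"^\+93[0-9]{9}$")


def validate_row(row: dict, allowed_prefixes: set) -> Optional[str]:
    """Return None for a valid row, else a (hypothetical) reason code."""
    if not E164_AF.match(row["msisdn"]):
        return "BAD_E164"
    if row["prefix"] not in allowed_prefixes:
        return "PREFIX_NOT_ALLOWED"
    try:
        valid_from = date.fromisoformat(row["validFrom"])
        valid_until = date.fromisoformat(row["validUntil"])
    except ValueError:
        return "BAD_VALIDITY"
    return None if valid_until > valid_from else "BAD_VALIDITY"


def import_csv(text: str, allowed_prefixes: set, existing: set):
    """Stream rows, counting imported / duplicate / invalid; collect error lines."""
    summary = {"imported": 0, "duplicates": 0, "invalid": 0}
    errors = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        reason = validate_row(row, allowed_prefixes)
        if reason is not None:
            summary["invalid"] += 1
            errors.append((line_no, reason))
            continue
        key = (row["msisdn"], "MSISDN")
        if key in existing:  # stands in for ON CONFLICT DO NOTHING
            summary["duplicates"] += 1
        else:
            existing.add(key)
            summary["imported"] += 1
    return summary, errors


sample = "\n".join([
    "msisdn,prefix,blockType,subtype,validFrom,validUntil",
    "+93700000001,+9370,BLOCK,STANDARD,2026-01-01,2027-01-01",
    "+93700000001,+9370,BLOCK,STANDARD,2026-01-01,2027-01-01",
    "12345,+9370,BLOCK,STANDARD,2026-01-01,2027-01-01",
    "+93700000002,+9370,BLOCK,STANDARD,2027-01-01,2026-01-01",
])
summary, errors = import_csv(sample, {"+9370"}, set())
```

Re-submitting the same file against the same `existing` set would report every valid row as a duplicate, which is the property that makes a retried upload safe.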
6. Quarantine Sweep (UC-11)
5-minute cron (distributed Redis lock num:lock:quarantine-sweep):

    UPDATE numbering.numbers
    SET state = 'AVAILABLE',
        quarantine_until = NULL,
        assigned_tenant_id = NULL,
        assigned_lease_id = NULL,
        version = version + 1
    WHERE state = 'QUARANTINE' AND quarantine_until < now()
    RETURNING number_id;

For each released row: mark quarantine_records.completed_at = now(), emit number.quarantine.completed.v1. Idempotent; safe to run concurrently thanks to CAS.
Admin override (NUM-US-006 §4): POST /v1/admin/numbers/{id}/quarantine/release requires justification (min 20 chars), writes override_by/at/justification, and is captured in the daily compliance dashboard.
7. Conflict Detection (UC-12)
Synchronous: Every state-mutation CAS naturally prevents double-assignment; race losers receive CONFLICT and trigger a number.conflict.detected.v1 event.
Nightly reconciliation cron (02:00 UTC):
- Scan for any LeaseContract.prefixRange overlaps across operators (shouldn't happen, but guards against import bugs).
- Verify every LEASED row has exactly one non-terminated Lease row; emit number.conflict.detected.v1 with kind: ORPHAN_LEASE or MISSING_LEASE on any mismatch.
- Verify audit hash chain integrity (see SECURITY_MODEL §4).
8. Regulator Monthly Export (UC-13)
Monthly cron at 01:00 UTC on the 1st (lock: num:lock:regulator-export):
- SELECT inventory snapshot: (value, type, subtype, state, operator_id, current_tenant_id_hashed, current_lease_until, quarantine_until). Tenant IDs are hashed for privacy; ATRA receives aggregates per-operator, not per-tenant detail.
- Serialise to CSV.gz; compute SHA-256; sign with platform RSA key from Vault PKI.
- Upload to s3://ghasi-regulator-exports-{region}/numbering/{yyyy-mm}.csv.gz with object-lock (WORM, 7 y retention).
- Insert regulator_exports row with status = GENERATED.
- Emit numbering.regulator.export.generated.v1, consumed by regulator-portal-service to expose the file to ATRA.
- regulator-portal-service is responsible for the ATRA submission workflow; numbering-service is the data producer only.
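The serialise, compress and digest steps can be sketched with the standard library. Two points are assumptions: the document does not say how tenant IDs are hashed, so a salted SHA-256 is used purely for illustration, and the input rows are assumed to carry a raw `tenant_id` key. RSA signing and the S3 upload are out of scope here.

```python
import csv
import gzip
import hashlib
import io
from typing import Optional

COLUMNS = [
    "value", "type", "subtype", "state", "operator_id",
    "current_tenant_id_hashed", "current_lease_until", "quarantine_until",
]


def hash_tenant(tenant_id: Optional[str], salt: bytes) -> str:
    """Assumed scheme: salted SHA-256 hex digest; empty string when unassigned."""
    if tenant_id is None:
        return ""
    return hashlib.sha256(salt + tenant_id.encode("utf-8")).hexdigest()


def build_export(rows: list, salt: bytes):
    """Return (gzip payload, SHA-256 hex digest of the payload)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        out = {column: row.get(column, "") for column in COLUMNS}
        out["current_tenant_id_hashed"] = hash_tenant(row.get("tenant_id"), salt)
        writer.writerow(out)
    # mtime=0 keeps the gzip header stable so the digest is reproducible.
    payload = gzip.compress(buf.getvalue().encode("utf-8"), mtime=0)
    return payload, hashlib.sha256(payload).hexdigest()


rows = [
    {"value": "+93700000001", "type": "MSISDN", "subtype": "STANDARD", "state": "LEASED",
     "operator_id": "op-1", "tenant_id": "tenant-a",
     "current_lease_until": "2027-01-01", "quarantine_until": None},
]
payload, digest = build_export(rows, salt=b"example-salt")
```

The digest is computed over the compressed bytes, so it matches what is uploaded and what a downstream consumer would verify before decompressing.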
9. Tenant Quota Enforcement (UC-14)
Pre-check invoked by Reserve and Assign. Backed by Redis counter num:quota:{tenantId}:{class} with 5-min TTL, loaded lazily from PG.

    SELECT
      COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'MSISDN') AS leased_msisdn,
      COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'SHORT_CODE') AS leased_short,
      COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'ALPHA_ID') AS leased_alpha,
      COUNT(*) FILTER (WHERE n.state IN ('RESERVED','HELD')) AS active_reservations
    FROM numbering.numbers n
    WHERE n.assigned_tenant_id = :tenantId;

If any counter meets or exceeds the corresponding TenantPool limit: reject with QUOTA_EXCEEDED for the per-class lease counters, or RESERVATION_QUOTA for active_reservations (including current and quota in the response body). Upgrade path: tenant upgrades plan → billing-service publishes tenant.plan.changed → numbering subscribes and updates TenantPool within 60 s.
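The comparison itself is a few lines once the counters are loaded. In this sketch the limits are keyed by the counter names from the query, which is an assumption about how TenantPool limits would be mapped, and the choice between the two error codes follows the error model table in section 13.

```python
from typing import Optional


def check_quota(counters: dict, limits: dict) -> Optional[dict]:
    """Return None when under every limit, else the rejection payload."""
    for name, current in counters.items():
        quota = limits.get(name)
        if quota is not None and current >= quota:  # "meets or exceeds"
            code = "RESERVATION_QUOTA" if name == "active_reservations" else "QUOTA_EXCEEDED"
            return {"code": code, "class": name, "current": current, "quota": quota}
    return None


limits = {"leased_msisdn": 10, "leased_short": 2, "leased_alpha": 5, "active_reservations": 20}
under = check_quota(
    {"leased_msisdn": 9, "leased_short": 1, "leased_alpha": 0, "active_reservations": 3}, limits
)
at_limit = check_quota(
    {"leased_msisdn": 10, "leased_short": 1, "leased_alpha": 0, "active_reservations": 3}, limits
)
flooded = check_quota(
    {"leased_msisdn": 0, "leased_short": 0, "leased_alpha": 0, "active_reservations": 20}, limits
)
```

A caller that only cares about one class, such as Assign for an MSISDN, could pass just that counter and limit.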
10. Bulk Recall (UC-15)
Triggered by compliance.tenant.suspended.v1 and tenant.deleted.v1:
- SELECT number_id FROM numbering.numbers WHERE assigned_tenant_id = :tenantId AND state IN ('RESERVED','HELD','LEASED','SUSPENDED');
- Batch of 100 IDs per worker tick: call Recall (UC-07) or Release (UC-04) with reason: ABUSE (for suspension) or PLATFORM_RECALL (for deletion), all with the same ticketId referencing the upstream case.
- On failure of any individual recall (e.g. CAS lost), the worker retries via NATS redelivery.
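The batching and the choice between Release and Recall can be sketched as a planning function. It assumes the worker also knows each row's state (the SELECT above returns only number_id), and it routes RESERVED|HELD to Release and LEASED|SUSPENDED to Recall based on the source states those use cases accept. The function name and the call shape are hypothetical.

```python
REASON_BY_TRIGGER = {
    "compliance.tenant.suspended.v1": "ABUSE",
    "tenant.deleted.v1": "PLATFORM_RECALL",
}


def plan_bulk_recall(rows: list, trigger: str, ticket_id: str, batch_size: int = 100) -> list:
    """Split (number_id, state) rows into per-tick batches of Recall/Release calls."""
    reason = REASON_BY_TRIGGER[trigger]
    calls = [
        {
            "op": "Release" if state in ("RESERVED", "HELD") else "Recall",
            "numberId": number_id,
            "reason": reason,
            "ticketId": ticket_id,  # same upstream case for every call
        }
        for number_id, state in rows
    ]
    return [calls[i:i + batch_size] for i in range(0, len(calls), batch_size)]


rows = [(f"n{i}", "LEASED") for i in range(249)] + [("n249", "HELD")]
batches = plan_bulk_recall(rows, "tenant.deleted.v1", "case-42")
```

Each batch corresponds to one worker tick; a call that loses its CAS would be retried through redelivery rather than inside this loop.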
11. Lookup for Routing / Intelligence (UC-16)
gRPC Lookup(identifier, type) — consumed by routing-engine and number-intelligence-service. Returns:
{ numberId, value, type, subtype, state, operatorId, mcc, mnc, leaseContractId, assignedTenantId?, assignedLeaseId?, effectiveUntil?, version }
This is a non-tenant-scoped lookup (routing needs to know which MNO owns an MSISDN regardless of lease state). Response does not include PII — assignedTenantId is included only because routing-engine is an internal trusted service; tenant-facing REST never exposes it.
12. Latency Budgets & Rate Limits
| Surface | Limit |
|---|---|
| gRPC ValidateLease | No Kong limit; per-pod concurrency 1000 |
| gRPC Lookup | Per-pod concurrency 500 |
| gRPC Reserve / Assign / Release / Recall | Per-pod concurrency 200 |
| REST admin | 600 req/min per user (Kong rate-limit) |
| REST tenant portal | 120 req/min per tenant |
| REST MNO bulk import | 5 req/hour per platform admin |
| gRPC Reserve (per-tenant) | 60/min enforced in-service to prevent reservation flood |
Alerts on breach are defined in OBSERVABILITY.
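The in-service per-tenant limit on Reserve could be enforced in several ways; the document does not name an algorithm. The sketch below uses a sliding-window log as one possibility. It is per-process only, so a multi-pod deployment would need a shared store, and the class name is hypothetical.

```python
from collections import defaultdict, deque


class PerTenantLimiter:
    """Sliding-window log: at most `limit` calls per tenant in any `window_s` seconds."""

    def __init__(self, limit: int = 60, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # tenant_id -> timestamps of accepted calls

    def allow(self, tenant_id: str, now: float) -> bool:
        hits = self._hits[tenant_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = PerTenantLimiter()
accepted = [limiter.allow("tenant-a", now=0.0) for _ in range(60)]
rejected = limiter.allow("tenant-a", now=30.0)      # 61st call inside the window
other_tenant = limiter.allow("tenant-b", now=30.0)  # limits are independent per tenant
after_window = limiter.allow("tenant-a", now=60.0)  # the first 60 hits have aged out
```

Passing `now` explicitly keeps the limiter deterministic in tests; a production caller would supply a monotonic clock reading.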
13. Error Model
| Code | HTTP / gRPC | When |
|---|---|---|
| NOT_REGISTERED | 404 / NOT_FOUND | identifier not in inventory |
| NOT_AVAILABLE | 409 / FAILED_PRECONDITION | identifier already in non-AVAILABLE state |
| HELD_BY_OTHER_TENANT | 409 / PERMISSION_DENIED | reservation/hold owned by another tenant |
| QUARANTINE_ACTIVE | 409 / FAILED_PRECONDITION | {availableAt} returned in body |
| QUOTA_EXCEEDED | 403 / RESOURCE_EXHAUSTED | per-class quota at limit |
| RESERVATION_QUOTA | 403 / RESOURCE_EXHAUSTED | maxActiveReservations hit |
| ALPHA_NOT_VERIFIED | 409 / FAILED_PRECONDITION | sender-id-registry reports unverified |
| NOT_VANITY_ELIGIBLE | 422 / INVALID_ARGUMENT | non-vanity pool short code |
| CONFLICT | 409 / ABORTED | CAS lost; client should retry with refreshed version |
| SIGNATURE_INVALID | 422 / INVALID_ARGUMENT | MNO CSV signature failed verification |
| INVALID_TRANSITION | 400 / FAILED_PRECONDITION | attempted impossible state transition |
End of APPLICATION_LOGIC.md