
numbering-service — Application Logic

Version: 1.0 | Status: Draft | Owner: Commerce Engineering + Platform Engineering | Last Updated: 2026-04-21
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · DATA_MODEL · SYNC_CONTRACT


1. Use-Case Catalogue

Use cases are grouped by epic as defined in _report.md: EP-NUM-01 Inventory & Lifecycle, EP-NUM-02 Reservation Workflow, and EP-NUM-03 Pool Management. Where applicable, each UC lists its trigger, inputs, outputs, latency budget, invariants, and fail-closed behaviour.

| UC | Name | Trigger | Plane | Latency budget |
|---|---|---|---|---|
| UC-01 | ValidateLease (hot path) | sms-orchestrator gRPC (per message) | gRPC | P95 ≤ 20 ms cache hit; ≤ 50 ms PG fallback |
| UC-02 | Reserve (15-min TTL) | customer-portal REST, internal gRPC | REST + gRPC | P95 ≤ 100 ms |
| UC-03 | Hold (promote to 24-h TTL) | customer-portal REST | REST | P95 ≤ 100 ms |
| UC-04 | Release (explicit) | customer-portal REST | REST | P95 ≤ 80 ms |
| UC-05 | Assign (lease) | customer-portal REST, internal gRPC | REST + gRPC | P95 ≤ 200 ms |
| UC-06 | RenewLease (autoRenew) | Daily cron | Worker | Batch; ≤ 300 ms per row |
| UC-07 | Recall | admin REST; compliance.tenant.suspended consumer; billing.account.delinquent consumer | REST + consumer | P95 ≤ 250 ms |
| UC-08 | Reinstate (SUSPENDED → LEASED) | admin REST | REST | P95 ≤ 200 ms |
| UC-09 | ImportLeaseBatch (MNO CSV) | admin REST (multipart) | REST | ≤ 5 min for 100k rows |
| UC-10 | ReservationCleanup (TTL expiry) | Redis keyspace notification + safety cron (60 s) | Worker | P95 ≤ 2 s after TTL |
| UC-11 | QuarantineSweep | 5-minute cron | Worker | ≤ 30 s per tick |
| UC-12 | DetectConflict | Synchronous on every write; nightly reconciliation cron | Worker | Nightly run ≤ 10 min |
| UC-13 | GenerateRegulatorExport | Monthly cron (01:00 UTC on the 1st) | Worker | ≤ 15 min |
| UC-14 | EnforceTenantQuota | Called from UC-02 / UC-05 | Library | ≤ 5 ms |
| UC-15 | BulkRecallByTenant | compliance.tenant.suspended consumer; tenant.deleted consumer | Worker | Best-effort, within 60 s per tenant |
| UC-16 | Lookup (metadata by number) | routing-engine, number-intelligence gRPC | gRPC | P95 ≤ 15 ms |

2. Hot Path — ValidateLease (UC-01)

SLA: P95 ≤ 20 ms (Redis hit); ≤ 50 ms (PG fallback). sms-orchestrator calls it once per outbound message, which makes it the highest-traffic gRPC method on the service.

Steps:

  1. Input validation. Reject with INVALID_ARGUMENT if identifier fails per-type regex (E.164 / short-code / alpha-ID) or tenantId is not a UUIDv4.
  2. Redis GET. num:valid:{type}:{value}:{tenantId} with 60 s TTL. On HIT, decode {state, leaseId, effectiveUntil, tenantId, version} and answer:
    • LEASED + tenant == requester + effectiveUntil > now() → valid: true
    • LEASED + tenant != requester → valid: false, reason: WRONG_TENANT
    • SUSPENDED → valid: false, reason: LEASE_SUSPENDED
    • else → valid: false with mapped reason.
  3. Redis MISS. Query PG:
    SELECT n.state, n.assigned_tenant_id, l.lease_id, l.effective_until, n.version
    FROM numbering.numbers n
    LEFT JOIN numbering.leases l ON l.lease_id = n.assigned_lease_id
    WHERE n.value = :value AND n.type = :type
    Cache the row in Redis with SETEX (60 s TTL).
  4. Not found. Return valid: false, reason: NOT_REGISTERED. Cache the negative result for 30 s to absorb enumeration attacks.
  5. Fail-closed. If Redis AND PG are unreachable: return UNAVAILABLE. The orchestrator treats this as "do not send" — the message is retried or dead-lettered. At no point does ValidateLease return valid: true without a confirmed read.
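Steps 2 to 5 reduce to a small decision function plus a fail-closed read path. The sketch below is illustrative only: `LeaseView`, `Verdict`, `Unavailable`, the `MISS` sentinel and the injected `cache_get` / `pg_get` callables are hypothetical names, `NOT_LEASED` is a hypothetical stand-in for the "mapped reason" catch-all, and writing the SETEX entry back to Redis is omitted.

```python
# Sketch only: all names here are hypothetical, not the service's actual code.
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class LeaseView:
    """Decoded cache/PG row, mirroring the fields listed in step 2."""
    state: str
    tenant_id: Optional[str]
    lease_id: Optional[str]
    effective_until: Optional[datetime]
    version: int


@dataclass(frozen=True)
class Verdict:
    valid: bool
    reason: Optional[str] = None


class Unavailable(Exception):
    """No confirmed read from Redis or PG; maps to gRPC UNAVAILABLE."""


MISS = object()  # sentinel: key absent from Redis (distinct from a cached negative)


def decide(view: Optional[LeaseView], requester: str, now: datetime) -> Verdict:
    if view is None:  # not in inventory (possibly a cached negative result)
        return Verdict(False, "NOT_REGISTERED")
    if view.state == "LEASED":
        if view.tenant_id != requester:
            return Verdict(False, "WRONG_TENANT")
        if view.effective_until is not None and view.effective_until > now:
            return Verdict(True)
        return Verdict(False, "NOT_LEASED")  # hypothetical code: lease past effectiveUntil
    if view.state == "SUSPENDED":
        return Verdict(False, "LEASE_SUSPENDED")
    return Verdict(False, "NOT_LEASED")  # hypothetical catch-all for other states


def validate_lease(requester: str, now: datetime,
                   cache_get: Callable[[], object],
                   pg_get: Callable[[], Optional[LeaseView]]) -> Verdict:
    try:
        cached = cache_get()  # LeaseView, None (cached negative) or MISS
    except ConnectionError:
        cached = MISS  # Redis down: fall back to PG
    if cached is not MISS:
        return decide(cached, requester, now)
    try:
        row = pg_get()
    except ConnectionError as exc:
        # Fail closed: never answer valid=true without a confirmed read.
        raise Unavailable("Redis and PG unreachable") from exc
    return decide(row, requester, now)
```

Keeping `decide` pure makes the cache-hit and PG-fallback paths share one rule set, so the two cannot drift apart.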

Cache invalidation. Every state-mutating UC (UC-02, UC-05, UC-07) publishes num.cache.invalidate.v1 on the in-cluster NATS ephemeral subject, and the affected Redis keys are deleted (DEL) within 1 s. Because the subject is ephemeral, an invalidation can be lost; the 60 s cache TTL is the backstop.
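Because the cache key embeds the requesting tenantId, invalidating one identifier means deleting every tenant's copy of its key. A minimal sketch, assuming a plain dict stands in for Redis and using hypothetical helper names; against a real Redis the same glob pattern could be walked with SCAN ... MATCH and the keys removed with DEL (or UNLINK).

```python
# Sketch only: a dict stands in for Redis; helper names are hypothetical.
import fnmatch
from typing import Dict


def valid_key(num_type: str, value: str, tenant_id: str) -> str:
    # Key layout from UC-01 step 2: num:valid:{type}:{value}:{tenantId}
    return f"num:valid:{num_type}:{value}:{tenant_id}"


def invalidate_identifier(cache: Dict[str, object], num_type: str, value: str) -> int:
    """Delete every tenant's cached verdict for one identifier; return the count."""
    pattern = f"num:valid:{num_type}:{value}:*"
    doomed = [k for k in cache if fnmatch.fnmatchcase(k, pattern)]
    for k in doomed:
        del cache[k]
    return len(doomed)
```

Scanning by pattern keeps the writer ignorant of which tenants ever queried the identifier; the trade-off is a keyspace walk per invalidation, which a per-identifier index set (not specified in this document) would avoid.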


3. Reservation Workflow (UC-02, UC-03, UC-04, UC-10)

UC-02 Reserve (15-min TTL)

Inputs: { identifier, type, tenantId, idempotencyKey }

Steps:

  1. Check TenantPool.maxActiveReservations — reject with RESERVATION_QUOTA if at limit.
  2. Transactional write:
    BEGIN;
    UPDATE numbering.numbers
    SET state = 'RESERVED',
        assigned_tenant_id = :tenantId,
        version = version + 1,
        updated_at = now()
    WHERE value = :value AND type = :type
      AND state = 'AVAILABLE' AND version = :expected_version
    RETURNING number_id;
    -- If 0 rows: state changed concurrently → 409 CONFLICT
    INSERT INTO numbering.reservations
      (reservation_id, number_id, tenant_id, kind, created_at, expires_at)
    VALUES (gen_random_uuid(), :numberId, :tenantId, 'RESERVE', now(), now() + interval '15 minutes');
    INSERT INTO numbering.audit (...);  -- hash-chained
    INSERT INTO numbering.outbox (...); -- number.reserved.v1
    COMMIT;
  3. Mirror to Redis: SET num:reserve:{numberId} tenantId EX 900 (15 min). The key's expiry notification drives UC-10.
  4. Return { reservationId, expiresAt }.

Race handling. Two tenants racing on the same AVAILABLE identifier: CAS on (state = 'AVAILABLE' AND version = :expected) succeeds for exactly one; the loser gets CONFLICT and retries on a different candidate. See EVENT_SCHEMAS number.conflict.detected.v1.
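The race can be reproduced with an in-memory stand-in for the conditional UPDATE. This is a sketch only: `InMemoryNumbers`, `NumberRow` and `Conflict` are hypothetical, and a `threading.Lock` substitutes for the row lock PG holds while the UPDATE runs.

```python
# Sketch only: hypothetical in-memory model of the UC-02 CAS UPDATE.
import threading
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NumberRow:
    state: str = "AVAILABLE"
    tenant_id: Optional[str] = None
    version: int = 0


class Conflict(Exception):
    """CAS lost: maps to 409 CONFLICT / gRPC ABORTED."""


class InMemoryNumbers:
    """Stand-in for numbering.numbers; the lock plays the role of PG's row lock."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], NumberRow] = {}
        self._lock = threading.Lock()

    def add(self, num_type: str, value: str) -> None:
        self._rows[(num_type, value)] = NumberRow()

    def read_version(self, num_type: str, value: str) -> int:
        return self._rows[(num_type, value)].version

    def reserve(self, num_type: str, value: str, tenant_id: str, expected_version: int) -> int:
        with self._lock:
            row = self._rows[(num_type, value)]
            # Same predicate as: WHERE state = 'AVAILABLE' AND version = :expected_version
            if row.state != "AVAILABLE" or row.version != expected_version:
                raise Conflict(f"{value}: state={row.state}, version={row.version}")
            row.state, row.tenant_id = "RESERVED", tenant_id
            row.version += 1
            return row.version
```

Both tenants read version 0; the first CAS bumps it to 1, so the second caller's predicate no longer matches and it receives `Conflict`, exactly one winner as described above.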

UC-03 Hold — promote RESERVE → HELD

Applies the same CAS pattern, guarded on state = 'RESERVED' AND assigned_tenant_id = :tenantId, moves the number to HELD, and sets a new 24 h TTL. Emits number.reserved.v1 with kind: HOLD.

UC-04 Release

Explicit tenant action. Reverts RESERVED|HELD → AVAILABLE with CAS guard on assigned_tenant_id. Emits number.released.v1 with reason: TENANT_RELEASE. Releasing a LEASED identifier is rejected with USE_RECALL_FOR_LEASES.

UC-10 ReservationCleanup

Redis keyspace notifications on key expiry trigger an idempotent worker that:

  1. SELECT reservation_id, number_id, kind FROM numbering.reservations WHERE released_at IS NULL AND expires_at < now() ORDER BY expires_at LIMIT 1000 FOR UPDATE SKIP LOCKED;
  2. For each row, CAS numbers.state RESERVED|HELD → AVAILABLE, set reservations.released_at = now(), release_reason = 'TTL_EXPIRED', emit number.released.v1.
  3. Safety-net cron every 60 s runs the same SELECT in case Redis notifications were dropped.
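The worker and the safety-net cron can share one idempotent tick. A minimal in-memory sketch with hypothetical names; it assumes (not stated above) that a reservation is closed even when its number is no longer RESERVED/HELD, and that number.released.v1 is emitted only when the number actually reverted.

```python
# Sketch only: hypothetical names; lists/dicts stand in for PG tables.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Reservation:
    reservation_id: str
    number_id: str
    kind: str
    expires_at: datetime
    released_at: Optional[datetime] = None
    release_reason: Optional[str] = None


def expired_batch(rows: List[Reservation], now: datetime, limit: int = 1000) -> List[Reservation]:
    # Mirrors: WHERE released_at IS NULL AND expires_at < now() ORDER BY expires_at LIMIT 1000
    due = [r for r in rows if r.released_at is None and r.expires_at < now]
    return sorted(due, key=lambda r: r.expires_at)[:limit]


def cleanup_tick(rows: List[Reservation], number_state: Dict[str, str],
                 now: datetime, limit: int = 1000) -> List[str]:
    """Release expired reservations; return IDs for which number.released.v1 is emitted."""
    emitted = []
    for r in expired_batch(rows, now, limit):
        if number_state.get(r.number_id) in ("RESERVED", "HELD"):  # CAS guard
            number_state[r.number_id] = "AVAILABLE"
            emitted.append(r.reservation_id)
        # Assumption: close the reservation even if the CAS found another state.
        r.released_at = now
        r.release_reason = "TTL_EXPIRED"
    return emitted
```

A second tick over the same data finds nothing (released_at is set), which is what makes a duplicate keyspace notification or an overlapping cron run harmless.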

4. Lease Workflow (UC-05, UC-06, UC-07, UC-08)

UC-05 Assign (lease)

Inputs: { identifier, type, tenantId, term, autoRenew, vanityFlag?, accountId? }

Steps:

  1. EnforceTenantQuota (UC-14): reject with QUOTA_EXCEEDED if the tenant is at its maxLeased limit for the identifier's class (MSISDN, short code, or alpha-ID).
  2. For ALPHA_ID: gRPC call to sender-id-registry-service.IsVerified(alphaId, tenantId). If not verified, reject with ALPHA_NOT_VERIFIED.
  3. For VANITY short code: verify TenantPool.vanityEnabled = true AND NumberResource.subtype = VANITY; otherwise reject with NOT_VANITY_ELIGIBLE.
  4. Transactional write:
    BEGIN;
    -- Valid source states: AVAILABLE (bypass), RESERVED (own), HELD (own)
    UPDATE numbering.numbers
    SET state = 'LEASED',
        assigned_tenant_id = :tenantId,
        assigned_lease_id = :newLeaseId,
        version = version + 1
    WHERE value = :value AND type = :type
      AND version = :expected_version
      AND state IN ('AVAILABLE','RESERVED','HELD')
      AND (state = 'AVAILABLE' OR assigned_tenant_id = :tenantId);
    -- 0 rows → 409; re-read the row to pick the reason code
    -- (HELD_BY_OTHER_TENANT / QUARANTINE_ACTIVE / NOT_AVAILABLE, or CONFLICT if only the version moved)
    INSERT INTO numbering.leases (...);
    UPDATE numbering.reservations SET released_at = now(), release_reason = 'PROMOTED_TO_LEASE' WHERE ...;
    INSERT INTO numbering.audit (...);
    INSERT INTO numbering.outbox (...); -- number.assigned.v1
    COMMIT;
  5. Invalidate Redis num:valid:* for the identifier.
  6. number.assigned.v1, written to the outbox in step 4, is consumed by billing-service (starts billing) and sender-id-registry-service (marks the alpha-ID value as inventory-committed).
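The CAS predicate in step 4 and the choice of 409 reason after a 0-row UPDATE can be sketched as two small functions. The document does not say how the reason is chosen; this sketch assumes the row is re-read after the failed UPDATE, and the function names are hypothetical.

```python
# Sketch only: hypothetical helpers; assumes the row is re-read after a 0-row UPDATE.
from typing import Optional

ASSIGNABLE = ("AVAILABLE", "RESERVED", "HELD")


def cas_matches(state: str, owner: Optional[str], requester: str) -> bool:
    # State half of the UC-05 WHERE clause (the version check is separate).
    return state in ASSIGNABLE and (state == "AVAILABLE" or owner == requester)


def rejection_code(state: str, owner: Optional[str], requester: str) -> str:
    """Pick the 409 reason for a UC-05 UPDATE that returned 0 rows."""
    if cas_matches(state, owner, requester):
        # State is still acceptable, so only the version moved: plain CAS loss.
        return "CONFLICT"
    if state == "QUARANTINE":
        return "QUARANTINE_ACTIVE"
    if state in ("RESERVED", "HELD") and owner != requester:
        return "HELD_BY_OTHER_TENANT"
    return "NOT_AVAILABLE"
```

Separating CONFLICT from the state-based codes tells the client whether a retry with a refreshed version can succeed (CONFLICT) or cannot (the others).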

UC-06 RenewLease (daily cron)

  1. SELECT all leases with auto_renew = true AND effective_until BETWEEN now() AND now() + interval '7 days' AND terminated_at IS NULL.
  2. For each: gRPC billing-service.PreviewCharge(tenantId, item: 'lease.renewal', ref: leaseId). If billing rejects, emit number.renewal.failed.v1 { reason: 'BILLING_REJECTED' }; do NOT extend.
  3. On billing ack: INSERT a new leases row with previous_lease_id = oldLeaseId, effective_from = oldLease.effective_until, and effective_until = oldLease.effective_until + term. Point numbers.assigned_lease_id at the new lease and emit number.renewed.v1.
  4. Grandfathering: if the tenant's current quota would be breached by renewal, renewal proceeds anyway and emits a quota.exceeded_by_renewal warning metric — existing leases are never forcibly invalidated by quota changes.
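Candidate selection (step 1) and the successor lease (step 3) can be sketched as follows. Assumptions, all hypothetical: `term` is modelled as a fixed `timedelta`, the billing PreviewCharge call and grandfathering metric are omitted, and leases that already have a successor are skipped, since a daily scan over a 7-day window would otherwise match the same lease up to seven times.

```python
# Sketch only: hypothetical model; term as a fixed duration is an assumption.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Lease:
    lease_id: str
    auto_renew: bool
    effective_from: datetime
    effective_until: datetime
    term: timedelta
    terminated_at: Optional[datetime] = None
    previous_lease_id: Optional[str] = None


def renewal_candidates(leases: List[Lease], now: datetime,
                       window: timedelta = timedelta(days=7)) -> List[Lease]:
    # Assumed guard: skip leases that already have a successor.
    renewed = {lease.previous_lease_id for lease in leases if lease.previous_lease_id}
    return [lease for lease in leases
            if lease.auto_renew and lease.terminated_at is None
            and now <= lease.effective_until <= now + window  # BETWEEN is inclusive
            and lease.lease_id not in renewed]


def successor(old: Lease, new_lease_id: str) -> Lease:
    # Step 3: the new lease starts where the old one ends and runs for one more term.
    return Lease(new_lease_id, old.auto_renew, old.effective_until,
                 old.effective_until + old.term, old.term,
                 previous_lease_id=old.lease_id)
```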

UC-07 Recall

Inputs: { identifier, type, reason, actorUserId | actorService, ticketId? }

Sources:

  • Admin REST (REGULATOR_ORDER, ABUSE): ticketId required, rejected with 422 if missing.
  • compliance.tenant.suspended.v1 consumer: bulk recall all tenant leases with reason: ABUSE, ticketId: complianceCaseId.
  • billing.account.delinquent.v1 consumer: first transitions the lease to SUSPENDED for a grace window, then to RECALLED if it is still SUSPENDED when the grace window elapses.
  • Tenant self-release: reason: TENANT_RELEASE.
  • Lease expiry: daily cron; reason: EXPIRED.

Steps:

  1. Transactional write: CAS LEASED|SUSPENDED → RECALLED; set leases.terminated_at and leases.termination_reason.
  2. Insert QuarantineRecord with quarantineFrom = now(), quarantineUntil = now() + cooloff where:
    • MSISDN: 90 d (platform default; configurable per subtype)
    • SHORT_CODE standard: 30 d; VANITY: 365 d
    • ALPHA_ID: 0 d → direct RECALLED → AVAILABLE in same transaction (no cross-tenant recycling risk)
  3. Emit number.recalled.v1 + (for non-alpha) number.quarantine.started.v1.
  4. Invalidate Redis cache.
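The cool-off rules in step 2 map cleanly to a lookup. A minimal sketch with hypothetical function names; it assumes, based on the UC-11 sweep selecting state = 'QUARANTINE', that non-alpha identifiers move from RECALLED to a QUARANTINE state, and it ignores per-subtype MSISDN overrides.

```python
# Sketch only: hypothetical helpers for the UC-07 cool-off table.
from datetime import datetime, timedelta
from typing import Optional, Tuple


def cooloff(num_type: str, subtype: Optional[str]) -> timedelta:
    if num_type == "MSISDN":
        return timedelta(days=90)  # platform default; per-subtype overrides not modelled
    if num_type == "SHORT_CODE":
        return timedelta(days=365) if subtype == "VANITY" else timedelta(days=30)
    if num_type == "ALPHA_ID":
        return timedelta(0)
    raise ValueError(f"unknown identifier type: {num_type}")


def post_recall_state(num_type: str, subtype: Optional[str],
                      now: datetime) -> Tuple[str, Optional[datetime]]:
    """Return the state after RECALLED, plus quarantineUntil (None if no quarantine)."""
    period = cooloff(num_type, subtype)
    if period == timedelta(0):
        return "AVAILABLE", None  # ALPHA_ID: back to inventory in the same transaction
    return "QUARANTINE", now + period
```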

UC-08 Reinstate

Platform-admin only. Transitions SUSPENDED → LEASED with an audit entry. Requires reason and ticketId (e.g., dispute resolved, billing caught up).
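Sections 3 and 4 together imply a state machine. The sketch below collects the transitions named in those sections into an allow-list that a write path could consult before issuing its CAS; the RECALLED → QUARANTINE edge is inferred from UC-07 and UC-11, and `ALLOWED`, `check_transition` and `InvalidTransition` are hypothetical names.

```python
# Sketch only: transitions collected from sections 3 and 4; QUARANTINE edges inferred.
from typing import FrozenSet, Tuple

ALLOWED: FrozenSet[Tuple[str, str]] = frozenset({
    ("AVAILABLE", "RESERVED"),   # UC-02 Reserve
    ("RESERVED", "HELD"),        # UC-03 Hold
    ("RESERVED", "AVAILABLE"),   # UC-04 Release / UC-10 TTL expiry
    ("HELD", "AVAILABLE"),       # UC-04 Release / UC-10 TTL expiry
    ("AVAILABLE", "LEASED"),     # UC-05 Assign (bypass)
    ("RESERVED", "LEASED"),      # UC-05 Assign (own reservation)
    ("HELD", "LEASED"),          # UC-05 Assign (own hold)
    ("LEASED", "SUSPENDED"),     # UC-07 delinquency grace
    ("LEASED", "RECALLED"),      # UC-07 Recall
    ("SUSPENDED", "RECALLED"),   # UC-07 Recall after grace
    ("SUSPENDED", "LEASED"),     # UC-08 Reinstate
    ("RECALLED", "QUARANTINE"),  # UC-07 cool-off (MSISDN, SHORT_CODE), inferred
    ("RECALLED", "AVAILABLE"),   # UC-07 ALPHA_ID (0 d cool-off)
    ("QUARANTINE", "AVAILABLE"), # UC-11 sweep
})


class InvalidTransition(Exception):
    """Maps to INVALID_TRANSITION (400 / FAILED_PRECONDITION)."""


def check_transition(src: str, dst: str) -> None:
    if (src, dst) not in ALLOWED:
        raise InvalidTransition(f"{src} -> {dst}")
```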


5. MNO Lease Batch Import (UC-09)

Endpoint: POST /v1/admin/numbering/blocks/import multipart ({operatorId, signature, csvFile})

Steps:

  1. Verify CSV signature against mno_signing_keys (RSA-SHA256, detached signature). On failure: 422 SIGNATURE_INVALID + audit.
  2. Stream-parse the CSV (expected columns: msisdn, prefix, blockType, subtype, validFrom, validUntil) and validate each row:
    • E.164 format (^\+93[0-9]{9}$)
    • prefix matches operatorId's allowed_prefixes (from LeaseContract)
    • validUntil > validFrom
  3. Idempotent per-row INSERT … ON CONFLICT (value, type) DO NOTHING:
    • Track duplicate vs inserted counters.
  4. Insert lease_import_batch job row with { imported, duplicates, invalid, operatorId } and emit number.lease.imported.v1 per batch (not per row).
  5. Write invalid rows to lease_import_errors with line number and reason for operator review.
  6. Return summary { batchId, imported, duplicates, invalid }.
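Per-row validation (step 2) and the duplicate/inserted counters (step 3) can be sketched with the standard `csv` module. Signature verification and the DB writes are omitted; the reason codes (`BAD_E164`, `PREFIX_NOT_ALLOWED`, `BAD_DATE`, `BAD_VALIDITY_WINDOW`) are hypothetical, ISO-8601 dates are assumed, and an in-memory set stands in for the (value, type) unique constraint behind ON CONFLICT DO NOTHING.

```python
# Sketch only: hypothetical reason codes; a set stands in for the unique constraint.
import csv
import io
import re
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

E164_AF = re.compile(r"^\+93[0-9]{9}$")  # pattern from step 2


def row_error(row: Dict[str, str], allowed_prefixes: Set[str]) -> Optional[str]:
    """Return a reason code for an invalid row, or None if the row is valid."""
    if not E164_AF.match(row.get("msisdn") or ""):
        return "BAD_E164"
    if row.get("prefix") not in allowed_prefixes:
        return "PREFIX_NOT_ALLOWED"
    try:
        valid_from = date.fromisoformat(row["validFrom"])
        valid_until = date.fromisoformat(row["validUntil"])
    except (KeyError, TypeError, ValueError):
        return "BAD_DATE"
    if valid_until <= valid_from:
        return "BAD_VALIDITY_WINDOW"
    return None


def import_rows(csv_text: str, allowed_prefixes: Set[str],
                existing: Set[Tuple[str, str]]) -> Tuple[Dict[str, int], List[Tuple[int, str]]]:
    summary = {"imported": 0, "duplicates": 0, "invalid": 0}
    errors: List[Tuple[int, str]] = []  # (line number, reason) for lease_import_errors
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        reason = row_error(row, allowed_prefixes)
        if reason:
            summary["invalid"] += 1
            errors.append((reader.line_num, reason))
            continue
        key = (row["msisdn"], "MSISDN")
        if key in existing:  # ON CONFLICT (value, type) DO NOTHING
            summary["duplicates"] += 1
        else:
            existing.add(key)
            summary["imported"] += 1
    return summary, errors
```

Because duplicates are counted rather than rejected, re-uploading the same file is safe and simply reports every row as a duplicate.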

Future-dated imports. Rows with validFrom > now() are inserted with state = AVAILABLE but excluded from ListAvailable until validFrom passes; a daily cron then makes them listable.


6. Quarantine Sweep (UC-11)

5-minute cron (distributed Redis lock num:lock:quarantine-sweep):

UPDATE numbering.numbers
SET state = 'AVAILABLE',
    quarantine_until = NULL,
    assigned_tenant_id = NULL,
    assigned_lease_id = NULL,
    version = version + 1
WHERE state = 'QUARANTINE' AND quarantine_until < now()
RETURNING number_id;

For each released row: mark quarantine_records.completed_at = now(), emit number.quarantine.completed.v1. Idempotent; safe to run concurrently thanks to CAS.
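The idempotence claim follows from the WHERE clause: once a row is AVAILABLE it no longer matches. A minimal in-memory sketch of the UPDATE, with a hypothetical `Number` record standing in for the table row; the quarantine_records update and event emission are left as a comment.

```python
# Sketch only: in-memory version of the sweep UPDATE; names are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Number:
    number_id: str
    state: str
    quarantine_until: Optional[datetime] = None
    assigned_tenant_id: Optional[str] = None
    assigned_lease_id: Optional[str] = None
    version: int = 0


def sweep(numbers: List[Number], now: datetime) -> List[str]:
    """Apply the sweep UPDATE in memory; return the number_ids it would RETURN."""
    released = []
    for n in numbers:
        # NULL quarantine_until never satisfies "< now()" in SQL, hence the None check.
        if n.state == "QUARANTINE" and n.quarantine_until is not None and n.quarantine_until < now:
            n.state = "AVAILABLE"
            n.quarantine_until = None
            n.assigned_tenant_id = None
            n.assigned_lease_id = None
            n.version += 1
            released.append(n.number_id)  # then: complete quarantine_records, emit event
    return released
```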

Admin override (NUM-US-006 §4): POST /v1/admin/numbers/{id}/quarantine/release requires a justification of at least 20 characters, records override_by, override_at, and the justification, and appears on the daily compliance dashboard.


7. Conflict Detection (UC-12)

Synchronous: Every state-mutation CAS naturally prevents double-assignment; race losers receive CONFLICT and trigger a number.conflict.detected.v1 event.

Nightly reconciliation cron (02:00 UTC):

  • Scan for LeaseContract.prefixRange overlaps across operators (this should never happen; the check guards against import bugs).
  • Verify every LEASED row has exactly one non-terminated Lease row — emit number.conflict.detected.v1 with kind: ORPHAN_LEASE or MISSING_LEASE on any mismatch.
  • Verify audit hash chain integrity (see SECURITY_MODEL §4).
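The prefix-range overlap scan can be done with a sort-and-sweep. A sketch assuming, hypothetically, that each prefixRange is an inclusive integer interval tagged with its operator; the representation and function name are not from the document.

```python
# Sketch only: assumes prefixRange is an inclusive integer interval per operator.
from typing import List, Tuple

Range = Tuple[str, int, int]  # (operator_id, first, last), inclusive


def cross_operator_overlaps(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return every pair of overlapping ranges that belong to different operators."""
    hits: List[Tuple[Range, Range]] = []
    active: List[Range] = []  # ranges whose end reaches the current start
    for rng in sorted(ranges, key=lambda r: r[1]):
        _, start, _ = rng
        active = [a for a in active if a[2] >= start]  # drop ranges that ended earlier
        hits.extend((a, rng) for a in active if a[0] != rng[0])
        active.append(rng)
    return hits
```

Keeping every still-open range, rather than only the one with the largest end, is what catches a short range from one operator nested inside another operator's longer range.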

8. Regulator Monthly Export (UC-13)

Monthly cron at 01:00 UTC on the 1st (lock: num:lock:regulator-export):

  1. SELECT inventory snapshot: (value, type, subtype, state, operator_id, current_tenant_id_hashed, current_lease_until, quarantine_until) — tenant IDs are hashed for privacy; ATRA receives aggregates per-operator, not per-tenant detail.
  2. Serialise to CSV.gz; compute SHA-256; sign with platform RSA key from Vault PKI.
  3. Upload to s3://ghasi-regulator-exports-{region}/numbering/{yyyy-mm}.csv.gz with object-lock (WORM, 7 y retention).
  4. Insert regulator_exports row with status = GENERATED.
  5. Emit numbering.regulator.export.generated.v1 — consumed by regulator-portal-service to expose the file to ATRA.
  6. regulator-portal-service is responsible for the ATRA submission workflow; numbering-service is the data producer only.
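Steps 1 to 3 (serialise, compress, digest, object key) can be sketched with the standard library. Assumptions: tenant IDs are hashed with a keyed HMAC-SHA256 (the document only says "hashed"), the input row shape is hypothetical, and RSA signing via Vault PKI and the S3 upload are omitted.

```python
# Sketch only: HMAC keying and the input row shape are assumptions; signing/upload omitted.
import csv
import gzip
import hashlib
import hmac
import io
from typing import Dict, Iterable, Optional, Tuple

COLUMNS = ["value", "type", "subtype", "state", "operator_id",
           "current_tenant_id_hashed", "current_lease_until", "quarantine_until"]


def hash_tenant(tenant_id: Optional[str], key: bytes) -> str:
    # Keyed hash so the values cannot be recomputed without the platform key.
    if not tenant_id:
        return ""
    return hmac.new(key, tenant_id.encode(), hashlib.sha256).hexdigest()


def build_export(rows: Iterable[Dict[str, Optional[str]]], key: bytes) -> Tuple[bytes, str]:
    """Return (csv.gz bytes, SHA-256 hex digest of those bytes)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for r in rows:
        writer.writerow([r["value"], r["type"], r["subtype"], r["state"], r["operator_id"],
                         hash_tenant(r.get("current_tenant_id"), key),
                         r.get("current_lease_until", ""), r.get("quarantine_until", "")])
    blob = gzip.compress(buf.getvalue().encode("utf-8"), mtime=0)  # mtime=0: reproducible bytes
    return blob, hashlib.sha256(blob).hexdigest()


def object_key(region: str, year: int, month: int) -> str:
    return f"s3://ghasi-regulator-exports-{region}/numbering/{year:04d}-{month:02d}.csv.gz"
```

Fixing the gzip mtime makes a re-run for the same month produce byte-identical output, so its digest and signature can be compared against the stored regulator_exports row.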

9. Tenant Quota Enforcement (UC-14)

Pre-check invoked by Reserve and Assign. Backed by Redis counter num:quota:{tenantId}:{class} with 5-min TTL, loaded lazily from PG.

SELECT
COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'MSISDN') AS leased_msisdn,
COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'SHORT_CODE') AS leased_short,
COUNT(*) FILTER (WHERE n.state IN ('LEASED','SUSPENDED') AND n.type = 'ALPHA_ID') AS leased_alpha,
COUNT(*) FILTER (WHERE n.state IN ('RESERVED','HELD')) AS active_reservations
FROM numbering.numbers n
WHERE n.assigned_tenant_id = :tenantId;

If any counter meets or exceeds the corresponding TenantPool limit, reject with QUOTA_EXCEEDED (RESERVATION_QUOTA for active reservations); the response body includes the current count and the quota. Upgrade path: the tenant upgrades its plan → billing-service publishes tenant.plan.changed → numbering-service consumes it and updates TenantPool within 60 s.
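The limit check itself is a comparison against the counters returned by the query above. A sketch with hypothetical names; which counter each request is checked against (reservations for Reserve, the identifier's class for Assign) is our reading of UC-02 and UC-05, and the Redis counter cache is omitted.

```python
# Sketch only: hypothetical names; the request-to-counter mapping is an interpretation.
from typing import Dict, Optional

COUNTER_FOR = {
    ("RESERVE", None): "active_reservations",
    ("ASSIGN", "MSISDN"): "leased_msisdn",
    ("ASSIGN", "SHORT_CODE"): "leased_short",
    ("ASSIGN", "ALPHA_ID"): "leased_alpha",
}


class QuotaError(Exception):
    def __init__(self, code: str, current: int, quota: int) -> None:
        super().__init__(f"{code}: {current}/{quota}")
        self.code, self.current, self.quota = code, current, quota


def enforce_quota(op: str, num_type: Optional[str],
                  counters: Dict[str, int], limits: Dict[str, int]) -> None:
    counter = COUNTER_FOR[(op, None if op == "RESERVE" else num_type)]
    current, quota = counters.get(counter, 0), limits[counter]
    if current >= quota:  # "meets or exceeds"
        code = "RESERVATION_QUOTA" if counter == "active_reservations" else "QUOTA_EXCEEDED"
        raise QuotaError(code, current, quota)  # current and quota go in the response body
```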


10. Bulk Recall (UC-15)

Triggered by compliance.tenant.suspended.v1 and tenant.deleted.v1:

  1. SELECT number_id FROM numbering.numbers WHERE assigned_tenant_id = :tenantId AND state IN ('RESERVED','HELD','LEASED','SUSPENDED');
  2. Per worker tick, process a batch of 100 IDs: call Recall (UC-07) for LEASED/SUSPENDED rows and Release (UC-04) for RESERVED/HELD rows, with reason: ABUSE (suspension) or PLATFORM_RECALL (deletion), all carrying the same ticketId that references the upstream case.
  3. On failure of any individual recall (e.g. CAS lost), the worker retries via NATS redelivery.
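Planning the batches is a pure function of the step 1 result. A sketch with hypothetical names: each row is routed to Recall or Release by state, since UC-04 refuses to release a LEASED identifier, and the reason follows the triggering event.

```python
# Sketch only: hypothetical planner for the UC-15 worker ticks.
from typing import Iterator, List, Tuple

Action = Tuple[str, str, str, str]  # (use case, number_id, reason, ticketId)

REASON_BY_TRIGGER = {
    "compliance.tenant.suspended.v1": "ABUSE",
    "tenant.deleted.v1": "PLATFORM_RECALL",
}


def plan_bulk_recall(rows: List[Tuple[str, str]], trigger: str, ticket_id: str,
                     batch_size: int = 100) -> Iterator[List[Action]]:
    """Yield per-tick batches; rows are (number_id, state) from the step 1 SELECT."""
    reason = REASON_BY_TRIGGER[trigger]
    actions: List[Action] = []
    for number_id, state in rows:
        # Leases go through Recall; reservations and holds go through Release.
        use_case = "UC-07" if state in ("LEASED", "SUSPENDED") else "UC-04"
        actions.append((use_case, number_id, reason, ticket_id))
    for i in range(0, len(actions), batch_size):
        yield actions[i:i + batch_size]
```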

11. Lookup for Routing / Intelligence (UC-16)

gRPC Lookup(identifier, type) — consumed by routing-engine and number-intelligence-service. Returns:

{ numberId, value, type, subtype, state, operatorId, mcc, mnc, leaseContractId, assignedTenantId?, assignedLeaseId?, effectiveUntil?, version }

This is a non-tenant-scoped lookup: routing needs to know which MNO owns an MSISDN regardless of lease state. The response contains no PII; assignedTenantId is included only because routing-engine is a trusted internal service, and tenant-facing REST never exposes it.
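The response shape above, and the rule that assignedTenantId stays internal, can be expressed as a record plus a projection. A sketch only: fields are rendered in snake_case, and `LookupResult`, `INTERNAL_ONLY` and `tenant_facing` are hypothetical illustrations rather than the actual contract (see API_CONTRACTS).

```python
# Sketch only: hypothetical record mirroring the Lookup response fields.
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LookupResult:
    number_id: str
    value: str
    type: str
    subtype: str
    state: str
    operator_id: str
    mcc: str
    mnc: str
    lease_contract_id: str
    version: int
    assigned_tenant_id: Optional[str] = None
    assigned_lease_id: Optional[str] = None
    effective_until: Optional[str] = None


INTERNAL_ONLY = {"assigned_tenant_id"}


def tenant_facing(result: LookupResult) -> Dict[str, Any]:
    """Project a Lookup result for a tenant-facing surface by dropping internal-only fields."""
    return {k: v for k, v in asdict(result).items() if k not in INTERNAL_ONLY}
```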


12. Latency Budgets & Rate Limits

| Surface | Limit |
|---|---|
| gRPC ValidateLease | No Kong limit; per-pod concurrency 1000 |
| gRPC Lookup | Per-pod concurrency 500 |
| gRPC Reserve / Assign / Release / Recall | Per-pod concurrency 200 |
| REST admin | 600 req/min per user (Kong rate limit) |
| REST tenant portal | 120 req/min per tenant |
| REST MNO bulk import | 5 req/hour per platform admin |
| gRPC Reserve (per tenant) | 60 req/min, enforced in-service to prevent reservation floods |

Alerts on breach are defined in OBSERVABILITY.
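The in-service per-tenant Reserve limit (60/min) needs some limiter algorithm; the document does not name one. The sketch below uses a token bucket, our choice rather than a stated design: it allows a burst of up to 60 and then one request per second. It is single-process only; enforcing the limit across pods would need shared state (for example in Redis), which is an assumption left out here. `PerTenantLimiter` is a hypothetical name.

```python
# Sketch only: token bucket is an assumed algorithm; state is per process.
from typing import Dict


class PerTenantLimiter:
    """Token bucket per tenant: `per_minute` tokens, refilled continuously."""

    def __init__(self, per_minute: int = 60) -> None:
        self.capacity = float(per_minute)
        self.refill_per_s = per_minute / 60.0
        self._tokens: Dict[str, float] = {}
        self._last: Dict[str, float] = {}

    def allow(self, tenant_id: str, now: float) -> bool:
        """`now` is a monotonic timestamp in seconds (e.g. time.monotonic())."""
        tokens = self._tokens.get(tenant_id, self.capacity)
        elapsed = now - self._last.get(tenant_id, now)
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_s)
        self._last[tenant_id] = now
        allowed = tokens >= 1.0
        self._tokens[tenant_id] = tokens - 1.0 if allowed else tokens
        return allowed
```

A fixed one-minute window would also satisfy "60/min" but permits up to 120 requests across a window boundary; the continuous refill avoids that spike.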


13. Error Model

| Code | HTTP / gRPC | When |
|---|---|---|
| NOT_REGISTERED | 404 / NOT_FOUND | Identifier not in inventory |
| NOT_AVAILABLE | 409 / FAILED_PRECONDITION | Identifier already in a non-AVAILABLE state |
| HELD_BY_OTHER_TENANT | 409 / PERMISSION_DENIED | Reservation or hold owned by another tenant |
| QUARANTINE_ACTIVE | 409 / FAILED_PRECONDITION | Identifier is in quarantine; {availableAt} returned in the body |
| QUOTA_EXCEEDED | 403 / RESOURCE_EXHAUSTED | Per-class lease quota at limit |
| RESERVATION_QUOTA | 403 / RESOURCE_EXHAUSTED | maxActiveReservations reached |
| ALPHA_NOT_VERIFIED | 409 / FAILED_PRECONDITION | sender-id-registry reports the alpha-ID as unverified |
| NOT_VANITY_ELIGIBLE | 422 / INVALID_ARGUMENT | Pool is not vanity-enabled or short code is not a VANITY subtype |
| CONFLICT | 409 / ABORTED | CAS lost; client should retry with a refreshed version |
| SIGNATURE_INVALID | 422 / INVALID_ARGUMENT | MNO CSV signature failed verification |
| INVALID_TRANSITION | 400 / FAILED_PRECONDITION | Attempted an impossible state transition |

End of APPLICATION_LOGIC.md