Number Intelligence Service — Application Logic

Version: 1.0 | Status: Draft | Owner: Messaging Core | Last Updated: 2026-04-21 | Companion: DOMAIN_MODEL · API_CONTRACTS · SYNC_CONTRACT · SECURITY_MODEL

1. Use Cases

Use cases are organised by caller plane: hot path (gRPC, sub-15 ms P95), batch (cron, SFTP intake), and tenant-facing (REST via Kong, billable).

UC-Lookup: ResolveMsisdn (gRPC hot path)

Trigger: Any authorised internal caller (routing-engine, sms-firewall-service, compliance-engine, channel-router-service, fraud-intel-service) invokes NumberIntelligenceService/ResolveMsisdn(e164, opts).

Input: ResolveMsisdnRequest { e164, scope, opts: { maxStalenessSeconds, forceFresh, tpsWaitMs }, traceId }

Output: MsisdnAttribution { mno, originalMno?, lineType, country, mnpStatus, riskFlags[], source, confidence, cachedAt, stalenessSeconds, tier }

SLA: P95 ≤ 15 ms aggregate (assumes ≥ 99 % cache hit), P99 ≤ 50 ms. Worst-case under forced live_hlr is bounded by the MAP timeout (1500 ms) or REST timeout (800 ms) per SERVICE_OVERVIEW §10.

Steps:

Input validation. Normalise e164 to NFKC, then require a match against ^\+[1-9]\d{6,14}$; reject with INVALID_ARGUMENT on failure.
Compute msisdnHash with the platform pepper.
LRU probe. lru.get(msisdnHash) — in-process lru-cache v10; TTL 60 s; 100 000 entries/pod. On HIT return immediately with tier = LRU, source preserved from the cached record.
Redis probe. GET numint:lookup:{hash}: one serialised record per msisdnHash containing { mno, lineType, country, mnpStatus, vlr, cachedAt, source, confidence }. On HIT populate LRU and return with tier = REDIS.
Postgres probe. SELECT * FROM numint.number_records WHERE msisdn_hash = $1 on the replica pool. On HIT with cachedAt >= now() - ttl_by_class(source):
- Populate Redis with per-class TTL (LINE_TYPE 30 d, MNO 24 h, VLR 5 min; see SERVICE_OVERVIEW §9).
- Populate LRU; return tier = PG.
Stale or missing — apply refresh policy:
- If opts.forceFresh = true OR opts.maxStalenessSeconds is violated by the PG row, proceed to UC-HlrProbe.
- Else return PG row with confidence = LOW and tier = PG; do not probe live HLR.
MNP overlay. Before returning, consult PortabilityRecord most-recent row for this msisdnHash: if a port is recorded after NumberRecord.cachedAt, overwrite mnoId with recipientMnoId and emit numint.mnp.divergence.v1 for fraud correlation (async).
MNP divergence detection. When the overlay in step 7 overwrites mnoId, add MNP_DIVERGENCE to riskFlags on the response.
Write-through (async). On any live/MNP update, enqueue an outbox write (UC-WriteThrough) so the durable store grows toward steady-state coverage.
Metric side-effects. Increment numint_lookup_total{tier, source, confidence}, record numint_lookup_duration_seconds histogram.
Return to caller.
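Steps 1 to 6 above can be sketched as a tiered probe. This is a minimal illustration, not the service implementation: plain dicts stand in for lru-cache, Redis and Postgres, HMAC-SHA256 is an assumed hash construction, and `PEPPER`, `ttlClass` and `ESCALATE` are hypothetical names. The text does not say how forceFresh interacts with an LRU or Redis hit, so the sketch does not special-case it, and it assumes a missing PG row always escalates to UC-HlrProbe.

```python
import hashlib
import hmac
import re
import unicodedata

# Pattern from step 1; re.ASCII restricts \d to 0-9.
E164_RE = re.compile(r"^\+[1-9]\d{6,14}$", re.ASCII)
PEPPER = b"demo-pepper"  # assumption: the real pepper comes from a secret store
# Freshness windows in seconds from step 5 (LINE_TYPE 30 d, MNO 24 h, VLR 5 min).
TTL_BY_CLASS = {"LINE_TYPE": 30 * 86400, "MNO": 24 * 3600, "VLR": 5 * 60}
ESCALATE = {"action": "HLR_PROBE"}  # hypothetical marker for "proceed to UC-HlrProbe"


def msisdn_hash(e164: str) -> str:
    """Steps 1-2: normalise, validate, then compute a keyed hash."""
    normalised = unicodedata.normalize("NFKC", e164)
    if not E164_RE.fullmatch(normalised):
        raise ValueError("INVALID_ARGUMENT")
    return hmac.new(PEPPER, normalised.encode("ascii"), hashlib.sha256).hexdigest()


def resolve(e164, lru, redis, pg, now, max_staleness=None, force_fresh=False):
    """Steps 3-6: probe LRU, Redis, PG in order, then apply the refresh policy."""
    key = msisdn_hash(e164)
    if key in lru:  # step 3
        return {**lru[key], "tier": "LRU"}
    if key in redis:  # step 4
        lru[key] = redis[key]
        return {**redis[key], "tier": "REDIS"}
    row = pg.get(key)  # step 5
    if row is None:
        return ESCALATE  # assumption: a missing row always goes to the live probe
    age = now - row["cachedAt"]
    if age <= TTL_BY_CLASS[row["ttlClass"]]:
        redis[key] = row  # real code would set the per-class TTL here
        lru[key] = row
        return {**row, "tier": "PG"}
    # Step 6: the row is stale.
    if force_fresh or (max_staleness is not None and age > max_staleness):
        return ESCALATE
    return {**row, "confidence": "LOW", "tier": "PG"}
```

A second call for the same number after a fresh PG hit returns tier = LRU, because the first call populated both cache tiers.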

Error codes:

gRPC status	Condition
`INVALID_ARGUMENT`	Malformed E.164
`RESOURCE_EXHAUSTED`	Per-pod concurrency cap (default 10 000 in-flight)
`DEADLINE_EXCEEDED`	> 1 s default deadline (rare — usually cache returns sub-ms)
`UNAVAILABLE`	PG and Redis are both down, no live HLR fallback succeeds, and the prefix table also fails to load. While the prefix table is available, return `{ source: "PREFIX_FALLBACK", confidence: UNKNOWN }` instead of an error
`INTERNAL`	Unhandled exception

Fail-degraded rationale. routing-engine has a prefix-table fallback; sms-firewall-service has allow-by-default for unknown origins; compliance-engine GEO_RESTRICTION treats UNKNOWN country as the most-restrictive class. A typed answer with LOW/UNKNOWN confidence is always more useful than an error.
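The fail-degraded behaviour can be illustrated with a longest-prefix fallback. This is a sketch under assumptions: the prefix table layout, the `PREFIX_TABLE` contents and the `resolve_degraded` helper are hypothetical, and any exception from the primary lookup is treated as "stores unavailable".

```python
# Hypothetical prefix table: E.164 prefix -> (mno, country).
PREFIX_TABLE = {
    "+9370": ("MNO_A", "AF"),
    "+9379": ("MNO_B", "AF"),
    "+93": (None, "AF"),
}


def prefix_fallback(e164, table):
    """Return a typed answer from the longest matching prefix, if any."""
    mno, country = None, None
    for end in range(len(e164), 1, -1):  # longest prefix first
        hit = table.get(e164[:end])
        if hit is not None:
            mno, country = hit
            break
    return {"mno": mno, "country": country,
            "source": "PREFIX_FALLBACK", "confidence": "UNKNOWN"}


def resolve_degraded(lookup, e164, table):
    """Prefer a low-confidence typed answer over an error."""
    try:
        return lookup(e164)
    except Exception:
        if table is None:  # even the prefix table failed to load
            raise RuntimeError("UNAVAILABLE")
        return prefix_fallback(e164, table)
```

Callers still receive the usual response shape, so routing-engine and compliance-engine can apply their own handling of UNKNOWN confidence.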

UC-BulkLookup: ResolveBatch (gRPC server-streaming)

Trigger: sms-orchestrator bulk-submit pipeline or tenant SDK calls ResolveBatch(repeated e164).

Input: ResolveBatchRequest { entries: string[ ≤ 1000 ], opts, traceId }

Output: Stream of MsisdnAttribution in input order, one message per entry.

SLA: P95 ≤ 80 ms for 500-entry batch at ≥ 95 % cache-hit.

Steps:

Validate size (> 1000 → RESOURCE_EXHAUSTED); validate each entry (invalid → emit one error slot for that index, not a whole-batch fail).
Deduplicate. Group by msisdnHash; consult cascade once per unique hash; replicate into each slot.
Parallel cascade. For unique entries: LRU batch-get → Redis MGET → Postgres WHERE msisdn_hash = ANY($1).
Live-HLR fan-out (bounded). If forceFresh or stale beyond limit, enqueue live probes through the per-MNO TPS governor; entries exceeding the 2 s internal deadline return with source = FALLBACK_PREFIX, confidence = LOW, tier = FALLBACK.
Emit in input order.

Error handling: Per-slot errors are returned inline; a partial failure never fails the whole batch unless input validation of the request envelope fails.
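The deduplicate-then-replicate flow with inline per-slot errors might look like the sketch below. It is illustrative only: a list stands in for the gRPC stream, the raw E.164 string stands in for msisdnHash as the grouping key, and `resolve_one` is a hypothetical callback for the cascade.

```python
import re

E164_RE = re.compile(r"\+[1-9][0-9]{6,14}")


def resolve_batch(entries, resolve_one, max_entries=1000):
    """Resolve each unique valid entry once and emit results in input order."""
    if len(entries) > max_entries:
        raise ValueError("RESOURCE_EXHAUSTED")  # envelope-level failure
    # Deduplicate while preserving first-seen order.
    unique = {}
    for e164 in entries:
        if E164_RE.fullmatch(e164) and e164 not in unique:
            unique[e164] = None
    for e164 in unique:
        unique[e164] = resolve_one(e164)  # cascade runs once per unique entry
    out = []
    for index, e164 in enumerate(entries):
        if e164 in unique:
            out.append({"index": index, **unique[e164]})
        else:
            out.append({"index": index, "error": "INVALID_ARGUMENT"})  # per-slot error
    return out
```

An invalid entry occupies its own slot with an error, so a malformed number never shifts the positions of the remaining results.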

UC-HlrProbe: Live HLR lookup via ni-hlr-gateway

Trigger: UC-Lookup or UC-BulkLookup escalates to live, OR admin explicitly invokes ProbeHlr(e164).

Steps:

TPS gate. EVAL Lua against Redis token bucket numint:tps:hlr:{mnoHint} (capacity = MnoSnapshot.tpsLimit, refill = the same number of tokens per second). If the bucket is empty, wait up to opts.tpsWaitMs (default 200 ms) for a token. If the wait expires without a token, emit HlrProbe { status: THROTTLED } and return the most-recent persisted answer with source = stale_throttled, confidence = MEDIUM.
Transport selection. Read MnoSnapshot.hlrEndpoint.kind:
- MAP → dispatch through ni-hlr-gateway LiveLookup(e164, mnoHint) gRPC. Gateway builds a MAP SendRoutingInfoForSM per 3GPP TS 29.002; application context shortMsgGatewayContext-v3; timeout 1500 ms.
- REST → gateway issues POST {endpoint}/v1/hlr/lookup with client-credentials JWT; timeout 800 ms.
Response normalisation. The gateway returns { imsi, vlr, lineType, mnoId }. NI derives country from the E.164 CC (MCC derivation from IMSI first 3 digits is used as a secondary confirmation).
PCAP sampling. At 0.1 % sampling, the gateway captures the full MAP TCAP+SCCP PDU (encrypted with KMS key numint-pcap-kek) to MinIO numint-hlr-pcap/ for post-incident review.
Write-through to PG + Redis + LRU (UC-WriteThrough).
Record HlrProbe row (append-only) with status, durationMs, resultSnapshot.
Emit numint.hlr_probe.completed.v1 (async, for fraud-intel VLR-change correlation).

Error codes:

Condition	Behaviour
`DEADLINE_EXCEEDED` (MAP timeout)	Status `TIMEOUT`; caller gets last-known answer with `confidence = LOW`
`MAP_ABORT` (SS7 MAP abort)	Status `MAP_ABORT`; same as timeout
`REST 5xx`	Status `REST_5XX`; retry once then stale fallback
`ADAPTER_DOWN`	Gateway DaemonSet pod unreachable → retry against sibling pod; alert `NumIntHlrAdapterDown`
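The TPS wait, transport timeout and stale fallback of UC-HlrProbe can be sketched as follows. All collaborators are injected and hypothetical: `try_acquire` stands in for the Redis token bucket, `send` for the ni-hlr-gateway call, and the 20 ms polling interval is an assumption that the text does not specify.

```python
# Transport timeouts from the steps above, in milliseconds.
TIMEOUT_MS = {"MAP": 1500, "REST": 800}


def probe_hlr(e164, endpoint_kind, try_acquire, send, last_known,
              clock, sleep, tps_wait_ms=200, poll_ms=20):
    """Wait for a TPS token, dispatch the probe, fall back to stale data on failure."""
    deadline = clock() + tps_wait_ms / 1000.0
    while not try_acquire():
        if clock() >= deadline:
            # The wait expired without a token: answer from the persisted record.
            return {"status": "THROTTLED",
                    "answer": {**last_known, "source": "stale_throttled",
                               "confidence": "MEDIUM"}}
        sleep(poll_ms / 1000.0)
    try:
        result = send(e164, timeout_ms=TIMEOUT_MS[endpoint_kind])
    except TimeoutError:
        # Timeout row of the error table: last-known answer with LOW confidence.
        return {"status": "TIMEOUT", "answer": {**last_known, "confidence": "LOW"}}
    return {"status": "OK", "answer": result}
```

Passing `clock` and `sleep` as parameters keeps the wait loop testable without real delays.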

UC-WriteThrough: Authoritative attribution UPSERT

Trigger: Any successful live HLR or MNP reconciliation result.

Steps:

Begin PG transaction.
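INSERT INTO numint.number_records (...) ON CONFLICT (msisdn_hash) DO UPDATE SET mno_id = EXCLUDED.mno_id, line_type = EXCLUDED.line_type, vlr = EXCLUDED.vlr, imsi_prefix = EXCLUDED.imsi_prefix, last_seen = now(), lookup_count = number_records.lookup_count + 1, version = number_records.version + 1, cached_at = now() WHERE number_records.version = $expected_version. The existing-row columns are table-qualified because PostgreSQL treats an unqualified lookup_count or version on the right-hand side of DO UPDATE as ambiguous with EXCLUDED.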
If mno_id or mnp_status changed vs prior, insert numint.outbox row for numint.attribution.changed.v1.
SETEX numint:lookup:{hash} with per-class TTL.
Commit; LRU update happens outside the transaction.
The caller-facing response does not wait on write; failures retry through the outbox with exponential backoff (max 6 attempts).

Idempotency. Same (msisdn_hash, mno_id, line_type) as the current row → version bump only; no event emitted.
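The version guard and the change-detection rule can be modelled in memory as below. This is a sketch, not the SQL path: dicts and lists stand in for the table and the outbox, the outcome strings are hypothetical, and it assumes a first-time insert counts as an attribution change (the text does not say).

```python
CHANGE_FIELDS = ("mno_id", "mnp_status")


def write_through(store, outbox, msisdn_hash, new, expected_version):
    """Apply an optimistic-concurrency upsert; queue an event only on real change."""
    current = store.get(msisdn_hash)
    if current is None:
        store[msisdn_hash] = {**new, "version": 1, "lookup_count": 1}
        # Assumption: a brand-new row is reported as a change.
        outbox.append({"subject": "numint.attribution.changed.v1",
                       "msisdn_hash": msisdn_hash})
        return "INSERTED"
    if current["version"] != expected_version:
        return "VERSION_CONFLICT"  # mirrors the WHERE version = $expected_version guard
    changed = any(current.get(f) != new.get(f) for f in CHANGE_FIELDS)
    store[msisdn_hash] = {**current, **new,
                          "version": current["version"] + 1,
                          "lookup_count": current["lookup_count"] + 1}
    if changed:
        outbox.append({"subject": "numint.attribution.changed.v1",
                       "msisdn_hash": msisdn_hash})
    return "UPDATED"
```

Re-applying an identical attribution bumps the version but leaves the outbox untouched, which matches the idempotency rule above.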

UC-MnpReconciliationDaily: Per-MNO SFTP intake

Trigger: Kubernetes CronJob mnp-recon at 02:30 Asia/Kabul daily; one job per MNO (fan-out). Runs only in kbl region.

Steps:

Distributed lock. SET numint:lock:mnp_recon:{mnoId} NX EX 1800 — prevents concurrent runs during cron re-runs.
Fetch. Pull sftp://{mno-sftp}/mnp/{yyyy-mm-dd}.csv (CSV: msisdn,donor_mno,recipient_mno,port_date,direction).
Archive raw file. PUT s3://numint-mnp-raw/{mnoId}/{yyyy}/{mm}/{dd}.csv with sha256:{hash} tag.
Validate each row: E.164 regex; valid mno_id; port_date <= today; CSV schema version header.
Conflict detection pre-insert. For each row compute msisdnHash; read the most-recent existing PortabilityRecord for this MSISDN. If a different recipientMnoId with port_date within ±2 days exists, this is a conflict — insert into numint.reconciliation_conflicts and SKIP the port insert (the active NumberRecord.mnpStatus is not updated).
Insert valid, non-conflicting rows into numint.portability_history via INSERT … ON CONFLICT DO NOTHING keyed on (msisdn_hash, port_date, recipient_mno_id, source_feed).
Materialise NumberRecord. For each new port, UPDATE numint.number_records SET mno_id = :recipient, original_mno_id = COALESCE(original_mno_id, :donor), mnp_status = 'PORTED_IN', version = version + 1, cached_at = now().
Chain hash. Update ReconciliationRun.recordHash = sha256(canonical(run-payload) || prevChainHash) — per-MNO chain.
Invalidate cache. DEL numint:lookup:{hash} for every changed MSISDN; emit numint.attribution.changed.v1 per row to warm subscriber caches (routing-engine, sms-firewall-service).
Summary event. Publish numint.reconciliation.completed.v1 { runId, mnoId, totalRecords, accepted, rejected, conflictsCount, durationMs, fileSha256 }.
Failure handling. If SFTP fetch fails, the job retries hourly until 23:00 same day; after that it escalates to P1 via NumIntMnpReconciliationStale alert.

Budget. Each MNO's daily file is typically < 100 k rows; whole run completes P99 ≤ 4 h end-to-end per SERVICE_OVERVIEW §1.
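Row validation and the ±2 day conflict check from the steps above can be sketched as follows. The sketch assumes the file starts with a header row naming the five columns, keys prior ports by raw MSISDN rather than msisdnHash for readability, and uses hypothetical names (`classify_rows`, `latest_ports`).

```python
import csv
import io
import re
from datetime import date, timedelta

E164_RE = re.compile(r"\+[1-9][0-9]{6,14}")
CONFLICT_WINDOW = timedelta(days=2)


def classify_rows(csv_text, known_mnos, latest_ports, today):
    """Split feed rows into accepted, rejected (invalid) and conflicting."""
    accepted, rejected, conflicts = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            port_date = date.fromisoformat(row.get("port_date") or "")
        except ValueError:
            rejected.append(row)
            continue
        valid = (
            E164_RE.fullmatch(row.get("msisdn") or "") is not None
            and row.get("donor_mno") in known_mnos
            and row.get("recipient_mno") in known_mnos
            and port_date <= today
        )
        if not valid:
            rejected.append(row)
            continue
        prior = latest_ports.get(row["msisdn"])
        if (prior is not None
                and prior["recipient_mno"] != row["recipient_mno"]
                and abs(prior["port_date"] - port_date) <= CONFLICT_WINDOW):
            conflicts.append(row)  # goes to reconciliation_conflicts, port insert skipped
            continue
        accepted.append(row)
    return accepted, rejected, conflicts
```

The three list lengths map directly onto the accepted, rejected and conflictsCount fields of the summary event.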

UC-MnpConflictResolve: Admin dispute resolution

Trigger: Platform admin invokes POST /v1/admin/mnp/conflicts/{conflictId}/resolve with { resolution, note }.

Steps:

Load ReconciliationConflict by id; reject if resolution IS NOT NULL (already resolved).
Apply the chosen resolution:
- A_WINS → insert the candidate A PortabilityRecord; materialise NumberRecord.
- B_WINS → insert the candidate B PortabilityRecord; materialise NumberRecord.
- KEEP_BOTH_PENDING_VENDOR_CONFIRM → leave state as-is; mark conflict as deferred; create a follow-up ticket.
- DISCARDED → ignore both (e.g., both turned out to be erroneous reports).
Write AuditLog entry { entityType: 'MNP_CONFLICT', action: 'RESOLVE', before, after }.
Emit numint.mnp.changed.v1 if a record was committed.

UC-EirCheck: LookupEir(imei)

Trigger: sms-firewall-service or fraud-intel-service calls LookupEir(imei).

Steps:

Validate Luhn per 3GPP TS 23.003 §6.2.1; reject malformed with INVALID_ARGUMENT.
GET numint:eir:{imeiHash} (Redis) → HIT returns { status, reasonCode, reportedBy[], lastUpdated }.
On MISS: SELECT … FROM numint.eir_records WHERE imei_hash = $1. On row return, populate Redis with 24 h TTL.
On row absent: return { status: UNKNOWN } — never error (known-unknowns are a legitimate response shape).
Emit numint_eir_lookup_total{outcome} metric.
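A minimal sketch of the Luhn check and the Redis-then-Postgres lookup follows. Dicts stand in for both stores, plain SHA-256 is an assumed stand-in for the real imeiHash construction, and the sample IMEI in the usage note is a commonly cited test value rather than one taken from this document.

```python
import hashlib


def imei_is_valid(imei: str) -> bool:
    """Check a 15-digit IMEI against its Luhn check digit."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        return False
    total = 0
    for position, ch in enumerate(reversed(imei)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def lookup_eir(imei, redis, pg):
    """Redis first, then Postgres; an absent row is a valid UNKNOWN answer."""
    if not imei_is_valid(imei):
        raise ValueError("INVALID_ARGUMENT")
    key = hashlib.sha256(imei.encode("ascii")).hexdigest()  # assumption: real hash is keyed
    if key in redis:
        return redis[key]
    row = pg.get(key)
    if row is None:
        return {"status": "UNKNOWN"}
    redis[key] = row  # real code would apply the 24 h TTL here
    return row
```

For example, `imei_is_valid("490154203237518")` holds, while changing the final digit makes the check fail.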

UC-LookupBillingEvent: Per-call billing meter

Trigger: Public Lookup API REST call (GET /v1/lookup/{msisdn} or POST /v1/lookup/batch).

Steps:

After successful response, determine SKU:
- lookup.v1 — standard (cache or PG path; maxStaleness >= 86400).
- lookup.fresh.v1 — forced live probe (maxStaleness < 86400 AND a live HLR was actually issued).
Compute msisdnHash with per-tenant salt (tenantSalt loaded from Vault KV secret/ghasi/numint/tenant-salts/{tenantId}).
Insert numint.outbox row with subject billing.metering.recorded.v1 and payload { tenantId, sku, quantity: 1, occurredAt, requestId, msisdnHash }.
Insert matching numint.lookup_audit row (hash-chained).
Response header X-Metering-Status: ok (or degraded if outbox insert failed; the call still returns 200 but the tenant is not billed — SRE alert fires).
Internal gRPC callers (SPIFFE SAN in {routing-engine, sms-firewall-service, compliance-engine, channel-router-service, fraud-intel-service}) are not metered.
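SKU selection and the metering payload can be sketched as below. The steps above do not name a SKU for a request with maxStaleness < 86400 that was served without a live probe; this sketch assumes it bills as lookup.v1. HMAC-SHA256 with the tenant salt is likewise an assumed hash construction, and the helper names are hypothetical.

```python
import hashlib
import hmac

FRESH_THRESHOLD_SECONDS = 86400


def select_sku(max_staleness_seconds, live_probe_issued):
    """Bill the fresh SKU only when a live HLR probe was actually issued."""
    if max_staleness_seconds < FRESH_THRESHOLD_SECONDS and live_probe_issued:
        return "lookup.fresh.v1"
    return "lookup.v1"  # assumption: every other successful call is standard


def build_metering_row(tenant_id, tenant_salt, e164, sku, request_id, occurred_at):
    """Build the outbox row; the MSISDN only appears as a per-tenant salted hash."""
    digest = hmac.new(tenant_salt, e164.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "subject": "billing.metering.recorded.v1",
        "payload": {"tenantId": tenant_id, "sku": sku, "quantity": 1,
                    "occurredAt": occurred_at, "requestId": request_id,
                    "msisdnHash": digest},
    }


def meter(outbox_insert, row):
    """Return the X-Metering-Status value; the lookup response itself is unaffected."""
    try:
        outbox_insert(row)
        return "ok"
    except Exception:
        return "degraded"
```

Because the salt differs per tenant, the same number produces unrelated hashes in different tenants' billing records.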

UC-CacheWarmCold: Warm-on-deploy

Trigger: Pod startup (readiness gate stays OFF until the warm threshold in the final step is reached) or cron numint-cache-warm hourly.

Steps:

Query top-N MSISDNs by lookup_count from numint.number_records (default N = 500 000, tuned per pod capacity).
Load into Redis via SETEX in pipelined batches of 1 000.
Emit numint.cache.refreshed.v1 { kind: "warm_on_deploy", keys: N, durationMs }.
Flip readiness to ready once ≥ 80 % of the target is loaded.
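The batching and the 80 % readiness threshold can be sketched like this. `write_batch` and `set_ready` are hypothetical callbacks standing in for the pipelined SETEX round trip and the readiness probe, and flipping ready immediately for an empty target is an assumption.

```python
def warm_cache(records, write_batch, set_ready, batch_size=1000, ready_ratio=0.8):
    """Load records in fixed-size batches and flip readiness at the threshold."""
    target = len(records)
    loaded = 0
    ready = False
    for start in range(0, target, batch_size):
        batch = records[start:start + batch_size]
        write_batch(batch)  # stand-in for one pipelined SETEX batch
        loaded += len(batch)
        if not ready and loaded >= ready_ratio * target:
            set_ready(loaded)  # pod can take traffic while the tail still loads
            ready = True
    if not ready:  # empty target: nothing to wait for (assumption)
        set_ready(loaded)
    return {"kind": "warm_on_deploy", "keys": loaded}
```

Loading continues after readiness flips, so the final event still reports the full key count.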

UC-TenantQuotaEnforce: Rate limit + monthly cap

Trigger: Every Public Lookup API call.

Steps:

Plan snapshot. Load TenantLookupQuota from Redis cache numint:quota:{tenantId} (TTL 60 s); on MISS read PG.
RPS bucket. Lua-eval Redis token bucket numint:tps:lookup:{tenantId}; capacity = plan RPS (default 10). On empty, 429 + Retry-After.
Monthly counter. INCR numint:quota:lookup:{tenantId}:{yyyymm}; expire at 1st of next month 00:00 Asia/Kabul; if result > monthlyQuota, 429.
Fresh-lookup separate bucket. For maxStaleness < 86400, enforce freshLookupRpsLimit separately (lower cap — defends SS7 quota).
Plan changes. Consume billing.tenant.plan.changed.v1 → invalidate local cache → new caps take effect within 60 s.
Quota audit. Emit audit.lookup.quota_exceeded.v1 when 429 occurs (for tenant churn analysis).
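The monthly counter key and its expiry at the first of the next month, Kabul time, can be computed as in the sketch below. Asia/Kabul is modelled as a fixed UTC+04:30 offset, which holds because the zone has no daylight saving; dicts stand in for Redis INCR and EXPIREAT, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

KABUL = timezone(timedelta(hours=4, minutes=30))  # fixed offset, no DST


def monthly_key_and_expiry(tenant_id, now_utc):
    """Return the counter key for the current Kabul month and its reset epoch."""
    local = now_utc.astimezone(KABUL)
    if local.month == 12:
        year, month = local.year + 1, 1
    else:
        year, month = local.year, local.month + 1
    reset = datetime(year, month, 1, tzinfo=KABUL)
    key = f"numint:quota:lookup:{tenant_id}:{local:%Y%m}"
    return key, int(reset.timestamp())


def check_monthly_quota(counters, expiries, tenant_id, monthly_quota, now_utc):
    """Increment the monthly counter and return the HTTP status to send."""
    key, reset_at = monthly_key_and_expiry(tenant_id, now_utc)
    counters[key] = counters.get(key, 0) + 1  # stand-in for INCR
    expiries[key] = reset_at                  # stand-in for EXPIREAT
    return 429 if counters[key] > monthly_quota else 200
```

A call at 20:00 UTC on 30 April already counts against May, because it is past midnight in Kabul.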

UC-AuditChainVerify: Daily hash-chain integrity check

Trigger: CronJob numint-audit-verifier at 04:30 Asia/Kabul daily (and on demand via admin endpoint).

Steps:

Acquire distributed lock numint:lock:audit_verifier.
For LookupAuditEntry and PortabilityRecord partitions updated in last 24 h: re-compute chain tail-to-head; compare recordHash with stored values.
On mismatch: insert AuditLog { action: AUDIT_INTEGRITY_BROKEN }; page on-call with NumIntAuditChainBroken CRITICAL; freeze writes (manual un-freeze only).
On success: emit numint.audit.chain_verified.v1 (ops marker).
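Chain verification amounts to recomputing each recordHash from its payload and its predecessor. The sketch below follows the recordHash = sha256(canonical(payload) || prevChainHash) formula given for ReconciliationRun; the canonical form (sorted-key compact JSON), the all-zero genesis value and the helper names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: hash preceding the first entry


def canonical(payload) -> bytes:
    """Deterministic serialisation so equal payloads always hash equally."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def chain_hash(payload, prev_hash: str) -> str:
    return hashlib.sha256(canonical(payload) + prev_hash.encode("ascii")).hexdigest()


def append_entry(chain, payload):
    prev = chain[-1]["recordHash"] if chain else GENESIS
    chain.append({"payload": payload, "recordHash": chain_hash(payload, prev)})


def first_broken_index(chain):
    """Return the index of the first entry whose stored hash does not match, else None."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        if chain_hash(entry["payload"], prev) != entry["recordHash"]:
            return index
        prev = entry["recordHash"]
    return None
```

Tampering with any payload is detected at that entry, because its stored hash no longer matches the recomputed value.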

2. Performance Optimisation

2.1 Fast-path ordering (sub-5 ms budget)

Evaluation order in UC-Lookup is chosen so the majority of calls terminate at step 3 (the LRU probe):

LRU (P50 0.2 ms) — absorbs OTP-storm repeat lookups (same MSISDN hit 3-10× within seconds).
Redis (P50 1.5 ms) — per-region hot cache; 6-node cluster sized for ≥ 95 % hit ratio.
Postgres (P50 6 ms) — replica pool; monthly-partitioned number_records.
Live HLR (P50 250 ms MAP / 80 ms REST) — MNO-facing; TPS-governed.

2.2 Budget enforcement

Each ResolveMsisdn call has an internal 12 ms budget (leaves 3 ms for gRPC serialisation). Per-step budget sub-allocation:

Step	Budget
Validation + hash	0.1 ms
LRU get	0.2 ms
Redis get	3 ms
PG select	8 ms
MNP overlay	2 ms

If the budget is exhausted mid-step, the default from UC-Lookup step 6 applies: return the PG row with confidence = LOW.
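One way to enforce both limits is to give each step the smaller of its own allocation and whatever remains of the 12 ms total. The per-step figures in the table sum to 13.3 ms, so this sketch treats them as per-step caps rather than an additive plan; that reading, the `LookupBudget` class and the step keys are assumptions.

```python
# Per-step caps from the table above, in milliseconds.
STEP_BUDGET_MS = {
    "validate_hash": 0.1,
    "lru": 0.2,
    "redis": 3.0,
    "pg": 8.0,
    "mnp_overlay": 2.0,
}
TOTAL_BUDGET_MS = 12.0


class LookupBudget:
    """Tracks the overall deadline for one ResolveMsisdn call."""

    def __init__(self, clock_ms):
        self._clock_ms = clock_ms
        self._start = clock_ms()

    def remaining_ms(self):
        return max(0.0, TOTAL_BUDGET_MS - (self._clock_ms() - self._start))

    def timeout_for(self, step):
        """Time this step may use; 0.0 means skip it and apply the fallback."""
        return min(STEP_BUDGET_MS[step], self.remaining_ms())
```

A caller would pass `timeout_for("pg")` as the query timeout and fall back to the low-confidence answer when it returns 0.0.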

2.3 Redis-Lua atomic TPS gate

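-- Atomic token-bucket gate: returns 1 if a token was taken, 0 if the bucket is empty.
-- KEYS[1] = numint:tps:hlr:{mno}
-- ARGV[1] = capacity, ARGV[2] = refill_per_sec, ARGV[3] = now_ms
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])
local now_ms   = tonumber(ARGV[3])
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
-- A missing key starts as a full bucket stamped with the current time.
local tokens = tonumber(bucket[1]) or capacity
local ts     = tonumber(bucket[2]) or now_ms
-- Refill for the elapsed time, clamped at capacity; a negative elapsed time counts as zero.
local elapsed = math.max(0, now_ms - ts) / 1000.0
tokens = math.min(capacity, tokens + elapsed * refill)
local granted = 0
if tokens >= 1 then
  tokens = tokens - 1
  granted = 1
end
-- Persist state and refresh the idle expiry on both outcomes.
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 3600)
return granted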
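The refill arithmetic of the script above can be exercised without Redis using an in-memory model. This port is only a test aid under the assumption that a dict mirrors the hash fields; it omits atomicity and key expiry, which the Lua script gets from Redis.

```python
def take_token(state, capacity, refill_per_sec, now_ms):
    """Mirror of the Lua gate: return 1 if a token was taken, else 0."""
    tokens = state.get("tokens", capacity)  # a missing bucket starts full
    ts = state.get("ts", now_ms)
    elapsed = max(0, now_ms - ts) / 1000.0
    tokens = min(capacity, tokens + elapsed * refill_per_sec)
    granted = 1 if tokens >= 1 else 0
    if granted:
        tokens -= 1
    state["tokens"] = tokens
    state["ts"] = now_ms
    return granted
```

With capacity 2 and a refill of 2 tokens per second, two immediate calls succeed, a third is refused, and one more token becomes available 500 ms later.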

3. MNP Conflict Resolution Heuristics

When two MNOs claim the same ported number within ±2 days:

Heuristic	Weight
Which MNO's file has a more recent `sourceFeed` timestamp?	30 %
Which candidate has the later `portDate`?	25 %
Does a recent `HlrProbe` confirm one of the two MNOs as current?	30 %
Does `fraud-intel-service` flag one side as a known conflict-prone MNO for this MSISDN range?	15 %

A weighted score above 0.7 lets the platform propose an auto-resolution; a score of 0.7 or below surfaces the conflict for manual review. Auto-resolutions remain reviewable for a 5-day undo window.
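The scoring can be sketched as a weighted vote. The text gives the weights and the threshold but not how a candidate's score is assembled; this sketch assumes each heuristic votes for candidate A, candidate B or neither, and that a candidate's score is the sum of the weights that voted for it. Weights are kept as integer percentages to avoid floating-point edge cases at the threshold, and the heuristic keys are hypothetical.

```python
# Weights from the table above, as integer percentages.
WEIGHTS = {
    "feed_recency": 30,
    "later_port_date": 25,
    "hlr_confirmation": 30,
    "fraud_intel": 15,
}
AUTO_THRESHOLD = 70  # a score must be strictly above this


def propose_resolution(votes):
    """votes maps heuristic name -> "A", "B" or None (no signal)."""
    scores = {"A": 0, "B": 0}
    for heuristic, winner in votes.items():
        if winner in scores:
            scores[winner] += WEIGHTS[heuristic]
    best = max(scores, key=scores.get)
    if scores[best] > AUTO_THRESHOLD:
        return f"{best}_WINS", scores
    return "MANUAL_REVIEW", scores
```

A candidate backed by feed recency, port date and an HLR probe scores 85 and qualifies for a proposal; one backed by everything except the HLR probe scores exactly 70 and goes to manual review.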

4. SLA Budgets (summary)

Use case	P50	P95	P99
UC-Lookup cache-hit	1 ms	5 ms	10 ms
UC-Lookup PG fallback	6 ms	15 ms	30 ms
UC-Lookup live HLR forced	250 ms	600 ms	1200 ms
UC-BulkLookup (500 entries, 95 % cache)	40 ms	80 ms	150 ms
UC-EirCheck	2 ms	8 ms	20 ms
UC-MnpReconciliationDaily per MNO	30 min	90 min	4 h
UC-AuditChainVerify 24 h window	60 s	120 s	5 min
Public Lookup GET (REST)	8 ms	200 ms	500 ms
Public Lookup batch POST (100 entries)	200 ms	800 ms	2000 ms
